Kernel Problems since 1 Day

Holliefant
Hi there,

I abandoned the IBM changer and wanted to set up a new library to test further with my TSM. But I updated from GitHub beforehand, and now I get this error when I try to connect the changer to TSM:

Mar  5 09:41:09 mhvtl-test kernel: [66430.566892] vtllibrary[7457] trap divide error ip:7f29fcc38888 sp:7fff323d0640 error:0 in libvtlscsi.so[7f29fcc2a000+1a000]

Any idea why this could be? It happens with the default conf as well. I have tried almost every combination.

Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
Try removing libvtlscsi.so (/usr/lib*/libvtlscsi.so) and running 'make install' again.

It sounds like it's picking up an old version of libvtlscsi.so.

Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

claudio
Hi there,
I have the same problem. I removed libvtlscsi.so from /usr/lib and then ran make && 'make install' again, but the problem is still there.

Here is the trap from my syslog:

Mar  7 15:39:06 iscsiserver vtllibrary[1858]: num_available_elements(): Determing 57 elements of type ANY starting at 1, returning 57
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr(): Building READ ELEMENT STATUS Header struct
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():  Starting slot: 1, number of configured slots: 57
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():  Element Status Data HEADER: 00 01 00 39 00 00 6c 08
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():  Decoded:
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():   First element Address    : 1
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():   Number elements reported : 57
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: fill_element_status_data_hdr():   Total byte count         : 27656 (0x6c08)
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: decode_element_status(): Element Status Data
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: decode_element_status():   First element reported       : 1
Mar  7 15:39:06 iscsiserver vtllibrary[1858]: decode_element_status():   Number of elements available : 57
Mar  7 15:39:06 iscsiserver kernel: [  892.185705] mhvtl: CDB (127) 16 bytes
Mar  7 15:39:06 iscsiserver kernel: [  892.185715]  b8 10 00 01 00 39 01 00 0d 30 00 00 00 00 00 00
Mar  7 15:39:06 iscsiserver kernel: [  892.187150] vtllibrary[1858] trap divide error ip:7f7f4e510ae5 sp:7fff58843520 error:0 in libvtlscsi.so (deleted)[7f7f4e504000+1a000]
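
For readers following the log, the CDB bytes dumped by the kernel can be decoded by hand. The sketch below is not part of mhvtl; it assumes the READ ELEMENT STATUS field layout from the SMC specification, and the function and dictionary names are made up for illustration. The kernel dumps 16 bytes, but the fields of interest sit in the first 12.

```python
# Illustrative decoder for a READ ELEMENT STATUS (opcode 0xB8) CDB.
# Assumption: field layout as described in the SMC specification.
# All names here are hypothetical, not taken from the mhvtl source.

ELEMENT_TYPES = {
    0: "ANY",
    1: "MEDIUM TRANSPORT (picker)",
    2: "STORAGE",
    3: "IMPORT/EXPORT (MAP)",
    4: "DATA TRANSFER (drive)",
}


def decode_read_element_status(cdb_hex):
    """Decode the hex dump of a READ ELEMENT STATUS CDB into named fields."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) < 12 or cdb[0] != 0xB8:
        raise ValueError("not a READ ELEMENT STATUS CDB")
    return {
        "voltag": bool(cdb[1] & 0x10),                       # byte 1, bit 4
        "element_type": ELEMENT_TYPES.get(cdb[1] & 0x0F, "reserved"),
        "start_address": int.from_bytes(cdb[2:4], "big"),    # bytes 2-3
        "num_elements": int.from_bytes(cdb[4:6], "big"),     # bytes 4-5
        "dvcid": bool(cdb[6] & 0x01),                        # byte 6, bit 0
        "allocation_length": int.from_bytes(cdb[7:10], "big"),  # bytes 7-9
    }


# CDB bytes copied from the kernel log line above.
fields = decode_read_element_status(
    "b8 10 00 01 00 39 01 00 0d 30 00 00 00 00 00 00"
)
for name, value in fields.items():
    print(f"{name:18} {value}")
```

The decoded start address (1), element count (57) and type (ANY) agree with the num_available_elements() line at the top of the log.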

These are the config files:
device.conf
library_contents.50
mhvtl.conf

OS: Ubuntu 12.04.1 LTS
vtlcmd -V Version: 1.4.6-git-85c5f69

Thanks in advance ...

Regards
Claudio



Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
Thanks for the details. With the info, I've reproduced the error and will submit a patch shortly.

Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
A commit is now in GitHub which should resolve the issue. Can somebody please confirm?
  ==================
commit 79e918595834fd750ecce90b996df43da7b0bb99
Author: Mark Harvey <markh794@gmail.com>
Date:   Fri Mar 8 07:37:28 2013 +1100

    Fix incorrect handling of 'ANY' slot type

    Resulted in segfault in vtllibrary

    This 'read element status' would trigger the segfault.
    sg_raw -r 1k /dev/sg12 b8 10 00 01 00 39 01 00 0d 30 00 00 00 00 00 00

    Reference:
    http://mhvtl-linux-virtual-tape-library-community-forums.966029.n3.nabble.co

    Reported-by: Holliefant on mhvtl-linux-virtual-tape-library-community-forums
    Signed-off-by: Mark Harvey <markh794@gmail.com>
  ==================
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

claudio
Hi Mark, thanks for the patch. I tried the new release and now the library doesn't crash, but my app isn't able to initialize the library. The app error is:
Invalid element type : 0

vtlcmd -V Version: 1.4.6-git-79e9185

With 1.4.4 the app worked fine.

Messages file attached; the blank separation marks where my app tried to initialize the library.

messages


Thanks in advance ....

Regards from Italy
Claudio








Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
- 18:55:25 localhost vtllibrary[29844]: num_available_elements(): Determing 5 elements of type ANY starting at 1, returning 5
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Available count: 5, type: 0
- 18:55:25 localhost vtllibrary[29844]: fill_element_status_page_hdr(): Element Status Page Header: 00 80 00 34 00 00 01 04
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Slot location: 1, DVCID: 1, VOLTAG: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Returning 86 bytes
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Count: 1, max_count: 5, slot: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Slot location: 704, DVCID: 1, VOLTAG: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Returning 52 bytes
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Count: 2, max_count: 5, slot: 704
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Slot location: 768, DVCID: 1, VOLTAG: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Returning 52 bytes
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Count: 3, max_count: 5, slot: 768
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Slot location: 769, DVCID: 1, VOLTAG: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Returning 52 bytes
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Count: 4, max_count: 5, slot: 769
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Slot location: 1024, DVCID: 1, VOLTAG: 1
- 18:55:25 localhost vtllibrary[29844]: fill_element_descriptor(): Returning 52 bytes
- 18:55:25 localhost vtllibrary[29844]: fill_element_page(): Count: 5, max_count: 5, slot: 1024
OK, back to the drawing board (for another patch).
The 'query' is for 5 elements of any 'type'.
On the first pass through the loop, we find 1 drive (slot 1), followed by 1 'picker' (slot 704), followed by 2 MAP slots (768 & 769), and the 5th element is of type Storage at slot 1024.
Then the code increments from the 'drive' slot, starts again at slot 704 (picker), and goes over the same slots a second time.
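
As a rough illustration of the behaviour the query expects (not the actual vtllibrary code), the sketch below walks a hypothetical element map modeled on the log above, visits every address once in ascending order, and groups the result by element type. The ELEMENTS table and the select_any() helper are assumptions made up for this example.

```python
# Hypothetical element map modeled on the log above: address -> element type.
# This is an illustration only; it does not mirror mhvtl's data structures.
ELEMENTS = {
    1: "drive",
    704: "picker",
    768: "map",
    769: "map",
    1024: "storage",
    1025: "storage",
}


def select_any(start, count):
    """Return up to `count` elements at or above `start`, grouped by type.

    Each address is visited exactly once, in ascending order, so a query
    that spans several element types never goes over the same slot twice.
    Grouping relies on each type occupying one contiguous address range.
    """
    groups = []
    for addr in sorted(a for a in ELEMENTS if a >= start)[:count]:
        etype = ELEMENTS[addr]
        if groups and groups[-1][0] == etype:
            groups[-1][1].append(addr)      # same type: extend current page
        else:
            groups.append((etype, [addr]))  # new type: start a new page
    return groups


# The query from the log: 5 elements of type ANY starting at address 1.
pages = select_any(start=1, count=5)
for etype, addrs in pages:
    print(etype, addrs)
```

With this map the five elements come back as one drive, one picker, two MAP slots and one storage slot, each reported a single time.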

Now I understand the bug, the patch should be coming 'real soon now'..

Many thanks for hanging in, testing and reporting bugs..
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
Next fix now submitted:
commit cdd7d4e9c368579ecc5ac8c43d5a7a12d284ba00
Author: Mark Harvey <markh794@gmail.com>
Date:   Sat Mar 9 14:11:26 2013 +1100

    READ ELEMENT STATUS: query 'ANY' spaning multiple slot types

    - This fixes returned data of type 'ANY' if address range spans multiple
    slot types.
    - Also included is a fix to dump the element data correctly

    tested on library config with one drive - this queries 5 slots of type 'any'
    sg_raw -r 1k /dev/sg12 b8 10 00 01 00 05 01 00 01 88 00 00
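
For anyone who wants to vary the test query, the CDB bytes passed to sg_raw can be generated rather than typed by hand. This is a minimal sketch assuming the READ ELEMENT STATUS layout from the SMC specification; build_read_element_status() is a hypothetical helper, and /dev/sg12 is simply the device node from the commit message, which will differ on other systems.

```python
# Illustrative builder for a 12-byte READ ELEMENT STATUS CDB.
# Assumption: field layout per the SMC specification; the helper name
# and its parameters are made up for this example.

def build_read_element_status(element_type, start, count, alloc_len,
                              voltag=True, dvcid=True):
    cdb = bytearray(12)
    cdb[0] = 0xB8                                    # operation code
    cdb[1] = (0x10 if voltag else 0) | (element_type & 0x0F)
    cdb[2:4] = start.to_bytes(2, "big")              # starting element address
    cdb[4:6] = count.to_bytes(2, "big")              # number of elements
    cdb[6] = 0x01 if dvcid else 0x00                 # DVCID bit
    cdb[7:10] = alloc_len.to_bytes(3, "big")         # allocation length
    return bytes(cdb)                                # bytes 10-11 stay zero


# Same query as in the commit message: 5 elements of type 'any' from address 1.
cdb = build_read_element_status(element_type=0, start=1, count=5,
                                alloc_len=0x188)
cdb_hex = " ".join(f"{b:02x}" for b in cdb)

# /dev/sg12 is the device used in the commit message; adjust as needed.
print(f"sg_raw -r 1k /dev/sg12 {cdb_hex}")
```

The script only prints the command line; it does not send anything to a device.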
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

claudio
Hi Mark, thanks for your time.
I have no good news: my application still isn't able to initialize the library. The major error now is:

 Invalid Element Status Page: 16711680, 65535

The ApplicationTrace file contains the application debug output for the three latest versions, and the messages file is from Version: 1.4.6-git-cdd7d4e


ApplicationTrace.txt
messages_v1.4.6-git-cdd7d4e

I hope this helps.
thanks in advance
Claudio

Re: Kernel Problems since 1 Day

claudio
Hi Mark, watching the operation I noticed that the calculated total byte count is 456 and the next offset is 542.

   Offset (main header length)=8, total length=456 (offset + byte count) (8 + 448)
   Offset=8, byte count=86, descriptor length= 86
   Next offset=102, current element descriptor set type=3
   Offset=102, byte count=52, descriptor length= 52
   Next offset=162, current element descriptor set type=0
   Offset=162, byte count=104, descriptor length= 52
   Next offset=274, current element descriptor set type=2
   Offset=274, byte count=260, descriptor length= 52
   Next offset=542, current element descriptor set type=1
   Major, cause = Invalid Element Status Page: 16711680, 65535)

I think I should have:
one DRIVE (4) byte count 86x1 + offset
one PICKER (1) byte count 52x1 + offset
two MAP (3) byte count 52x2 + offset
five STORAGE (2) byte count 52x5 + offset

For my configuration the result should be 8 + 534 (offset + byte count) = 542, which agrees with the last "Next offset" value.

In the messages_v1.4.6-git-cdd7d4e file, in the "Building READ ELEMENT STATUS Header struct" section for type ANY, the total byte count is 448.
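
The arithmetic above can be checked with a few lines of code. This sketch takes the element counts and descriptor lengths from the trace and assumes, as the trace offsets suggest, an 8-byte data header followed by one 8-byte page header per element type. The constant and variable names are made up for illustration.

```python
# Recompute the offsets from the application trace above.
# Assumption: 8-byte element status data header, then one 8-byte element
# status page header in front of each group of descriptors.
DATA_HEADER_LEN = 8
PAGE_HEADER_LEN = 8

# (label, number of elements, descriptor length) as listed in the post.
PAGES = [
    ("drive", 1, 86),
    ("picker", 1, 52),
    ("map", 2, 52),
    ("storage", 5, 52),
]

offset = DATA_HEADER_LEN
offsets = []
for label, count, desc_len in PAGES:
    offset += PAGE_HEADER_LEN + count * desc_len
    offsets.append(offset)
    print(f"{label:8} next offset = {offset}")

byte_count = offset - DATA_HEADER_LEN    # everything after the data header
shortfall = byte_count - 448             # 448 is the value in the messages file

print("expected byte count:", byte_count)
print("shortfall versus reported 448:", shortfall)
```

The computed offsets reproduce the 102, 162, 274 and 542 seen in the trace, and the gap between the expected 534 and the reported 448 is 86 bytes, the same as the length of the single drive descriptor.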

I'm sorry if I misunderstood, thanks for your time.
Regards
Claudio


Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
I've not forgotten this. Work is getting in the way of fun... hopefully a fix soon.

Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
OK, 3rd time lucky..

I've just pushed (hopefully) the final fix for READ ELEMENT STATUS fallout after rewriting the op code function.

commit 3c8baf5e1e1e840200e518fccf588f1f1377858a
Author: Mark Harvey <markh794@gmail.com>
Date:   Thu Mar 14 07:49:25 2013 +1100

    READ ELEMENT STATUS: Add byte count of drives to total
   
    Missed adding drive structure size to total byte count
Please let me know the results of any testing.
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

claudio
Hi Mark,
I tried the new patch but the issue is still there. After a few attempts, I took the liberty of changing the following value
in file smc.c, line 856:
- elem_byte_count += byte_count;
+ elem_byte_count = byte_count;
Now it works. The total length has the same value as the last offset. I'm happy.
With the new release I no longer even get the error on tapeusage and so on; this is fantastic.

Many thanks for your time.
Regards (ciao)
Claudio.


Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
Many thanks for the update.

I need to re-visit the T10 specs and my code with a fresh set of eyes..
I would have thought the change you introduced would have reported 8 bytes too short (i.e. the header size).
However testing with real world software proves otherwise..
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

Mark Harvey
Administrator
In reply to this post by claudio
Hi Claudio,

Many thanks for pointing out a fix..

Update (#4)

As always, many thanks for your willingness to help out and hanging in there until I finally figure it out..

commit eed22ddb8f8ff28453f13e50afbb47838a16ccb8
Author: Mark Harvey <markh794@gmail.com>
Date:   Fri Mar 15 17:40:08 2013 +1100

    READ ELEMENT STATUS: Correct 'byte count of report available'
   
    As per smc4r15, 6.12.2:
    The byte count does not include the byte count of the actual
    element status data header.
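
To make the rule in the commit message concrete, here is a small sketch that packs and unpacks the 8-byte element status data header. It is not the mhvtl implementation; the helper names are hypothetical, and the example values (9 elements, 542 bytes in total) are borrowed from Claudio's earlier trace.

```python
import struct

# Illustrative pack/unpack of the 8-byte element status data header.
# Assumed layout: first element address (2 bytes), number of elements
# (2 bytes), one reserved byte, byte count of report available (3 bytes).
HEADER_LEN = 8


def pack_data_header(first_addr, num_elements, total_len):
    """Build the header; the byte count excludes the header itself."""
    byte_count = total_len - HEADER_LEN
    return (struct.pack(">HHB", first_addr, num_elements, 0)
            + byte_count.to_bytes(3, "big"))


def unpack_data_header(hdr):
    """Return (first element address, number of elements, byte count)."""
    first_addr, num_elements = struct.unpack(">HH", hdr[:4])
    return first_addr, num_elements, int.from_bytes(hdr[5:8], "big")


# 542 bytes in total (header included) for 9 elements, as in the trace.
hdr = pack_data_header(first_addr=1, num_elements=9, total_len=542)
print(hdr.hex(), unpack_data_header(hdr))

# The header logged earlier in this thread decodes the same way.
logged = unpack_data_header(bytes.fromhex("00 01 00 39 00 00 6c 08"))
print(logged)
```

With a total length of 542 the header carries a byte count of 534, i.e. the total minus the 8 header bytes.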
Regards from Australia
Mark Harvey

Re: Kernel Problems since 1 Day

claudio
Hi Mark, I tried the latest release and it works fine.
Thanks again for your availability.

Regards
Claudio