Hardware Error 0x0403 on load

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Hardware Error 0x0403 on load

Tim Jones
Hi Folks,

Just ran into an oddity when trying to configure a library with multiple drives.  It varies between launches, but eventually one of the drives will report a hardware error of 0x0403 (LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED).

I have verified that all drives are seen as IBM TD5 and all of the tapes in the stack are labeled as L5 media.

You can duplicate this with the default configuration of 4 drives in the first library with the following loop:

for x in 0 1 2 3
do
    mtx -f /dev/sg6 load 1 $x
    sleep 2
    mtx -f /dev/sg6 unload 1 $x
    sleep 2
done

I added the sleep calls thinking that it may be a timing error with no difference.  Once it happens, executing a single load to the failing drive will continue to result in this error.

Configuration -
CentOS 7.5 1804
Current mhvtl pull from github as of this morning (04/02)
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
Can you post messages (syslog) ?

Perhaps change VERBOSE = 3 in /etc/mhvtl/mhvtl.conf

Rerun until failure, then post messages 



Sent from my iPhone

On 3 Apr 2019, at 04:01, Tim Jones [via mhVTL - A Linux Virtual Tape Library] <[hidden email]> wrote:

Hi Folks,

Just ran into an oddity when trying to configure a library with multiple drives.  It varies between launches, but eventually one of the drives will report a hardware error of 0x0403 (LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED).

I have verified that all drives are seen as IBM TD5 and all of the tapes in the stack are labeled as L5 media.

You can duplicate this with the default configuration of 4 drives in the first library with the following loop:

for x in 0 1 2 3
do
    mtx -f /dev/sg6 load 1 $x
    sleep 2
    mtx -f /dev/sg6 unload 1 $x
    sleep 2
done

I added the sleep calls thinking that it may be a timing error with no difference.  Once it happens, execute a single load to the failing drive will continue to result in this error.

Configuration -
CentOS 7.5 1804
Current mhvtl pull from github as of this morning (04/02)



If you reply to this email, your message will be added to the discussion below:
http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/Hardware-Error-0x0403-on-load-tp4026226.html
To start a new topic under mhVTL - A Linux Virtual Tape Library, email [hidden email]
To unsubscribe from mhVTL - A Linux Virtual Tape Library, click here.
NAML
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
In reply to this post by Tim Jones
I’ve reproduced the error. 

Interesting 

No idea ATM. But hopefully it won’t be too challenging 

Sent from my iPhone

On 3 Apr 2019, at 07:19, Mark Harvey <[hidden email]> wrote:

Can you post messages (syslog) ?

Perhaps change VERBOSE = 3 in /etc/mhvtl/mhvtl.conf

Rerun until failure, then post messages 



Sent from my iPhone

On 3 Apr 2019, at 04:01, Tim Jones [via mhVTL - A Linux Virtual Tape Library] <[hidden email]> wrote:

Hi Folks,

Just ran into an oddity when trying to configure a library with multiple drives.  It varies between launches, but eventually one of the drives will report a hardware error of 0x0403 (LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED).

I have verified that all drives are seen as IBM TD5 and all of the tapes in the stack are labeled as L5 media.

You can duplicate this with the default configuration of 4 drives in the first library with the following loop:

for x in 0 1 2 3
do
    mtx -f /dev/sg6 load 1 $x
    sleep 2
    mtx -f /dev/sg6 unload 1 $x
    sleep 2
done

I added the sleep calls thinking that it may be a timing error with no difference.  Once it happens, execute a single load to the failing drive will continue to result in this error.

Configuration -
CentOS 7.5 1804
Current mhvtl pull from github as of this morning (04/02)



If you reply to this email, your message will be added to the discussion below:
http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/Hardware-Error-0x0403-on-load-tp4026226.html
To start a new topic under mhVTL - A Linux Virtual Tape Library, email [hidden email]
To unsubscribe from mhVTL - A Linux Virtual Tape Library, click here.
NAML
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
This post was updated on .
Just to confirm it is the same issue.

run loop until it errors (take note of the drive the tape load failed).

Or brute-force check all tape drives:
My dev config
[3:0:0:0]    mediumx STK      L700             0106  /dev/sch0  /dev/sg8 
[3:0:1:0]    tape    IBM      ULT3580-TD5      0106  /dev/st5   /dev/sg7 
[3:0:2:0]    tape    IBM      ULT3580-TD5      0106  /dev/st7   /dev/sg11
[3:0:3:0]    tape    IBM      ULT3580-TD4      0106  /dev/st3   /dev/sg5 
[3:0:4:0]    tape    IBM      ULT3580-TD4      0106  /dev/st6   /dev/sg10

Then,
# for a in 5 7 3 6; do echo st$a;  mt -f /dev/nst$a status; done
st5
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
st7
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
st3
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN
st6
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (1010000):
 ONLINE IM_REP_EN
[root@tempcentos ~]#

(which is really, really bad !)

And to confirm it is really, really bad... lsof shows the same virtual media opened by 4 different devices (daemons)..
# lsof | grep E01001
vtltape   17913                 root    5u      REG              253,0         0   67880002 /opt/mhvtl/E01001L4/data
vtltape   17913                 root    6u      REG              253,0         0   67880003 /opt/mhvtl/E01001L4/indx
vtltape   17913                 root    7u      REG              253,0      1536   67880005 /opt/mhvtl/E01001L4/meta
vtltape   17914                 root    5u      REG              253,0         0   67880002 /opt/mhvtl/E01001L4/data
vtltape   17914                 root    6u      REG              253,0         0   67880003 /opt/mhvtl/E01001L4/indx
vtltape   17914                 root    7u      REG              253,0      1536   67880005 /opt/mhvtl/E01001L4/meta
vtltape   17915                 root    5u      REG              253,0         0   67880002 /opt/mhvtl/E01001L4/data
vtltape   17915                 root    6u      REG              253,0         0   67880003 /opt/mhvtl/E01001L4/indx
vtltape   17915                 root    7u      REG              253,0      1536   67880005 /opt/mhvtl/E01001L4/meta
vtltape   17918                 root    5u      REG              253,0         0   67880002 /opt/mhvtl/E01001L4/data
vtltape   17918                 root    6u      REG              253,0         0   67880003 /opt/mhvtl/E01001L4/indx
vtltape   17918                 root    7u      REG              253,0      1536   67880005 /opt/mhvtl/E01001L4/meta
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
I've identified the root cause and have a fix in testing..

The error is due to a race condition on media unload.
i.e. vtllibrary -> send msg via queue to vtltape : On media load, vtltape send ack msg back to vtllibrary before acknowledging the load was successful. This handshake did not occur on media unload.
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
Commit the fix to GitHub

commit cbbe7d47f538bccd2aa7e24219a9b19da539a064
Author: Mark Harvey <markh794@gmail.com>
Date:   Wed Apr 3 14:11:47 2019 +1100

    Fix a media unload race condition
    
    http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/Hardware-Error-0x0403-on-load-td4026226.html
    
    Signed-off-by: Mark Harvey <markh794@gmail.com>

Please check it resolves the issue you were seeing.
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Tim Jones
In reply to this post by Mark Harvey
The race is what I suspected - which is the reason for my sleep calls in the loop.

Good find.

I'll update things on the test systems this morning.

Also - you misspelled mktape (mkape) in the else portion of the check for mktape in the make_vtl_media script.  You don't notice this if you pass in the -m argument.

Finally, opeSUSE LEAP:

make
cd kernel
make
sudo -s
make install
cd ..
make install
ldconfig
systemctl enable mhvtl.target
systemctl daemon-reload
reboot

et voila!

~> lsscsi -g
[0:0:0:0]    disk    ATA      WDC WD5000BPKT-0 1A01  /dev/sda   /dev/sg0
[0:0:1:0]    disk    ATA      WDC WD5000BPKT-0 1A01  /dev/sdd   /dev/sg3
[0:0:2:0]    disk    ATA      WDC WD5000BPKT-0 1A01  /dev/sde   /dev/sg4
[0:0:3:0]    disk    ATA      WDC WD5000BPKT-0 1A01  /dev/sdf   /dev/sg5
[1:0:0:0]    disk    ATA      OCZ-VERTEX4      1.5   /dev/sdb   /dev/sg1
[2:0:0:0]    disk    ATA      SPCC Solid State 7A    /dev/sdc   /dev/sg2
[7:0:0:0]    mediumx STK      L700             0106  /dev/sch0  /dev/sg8
[7:0:1:0]    tape    IBM      ULT3580-TD5      0106  /dev/st1   /dev/sg7
[7:0:2:0]    tape    IBM      ULT3580-TD5      0106  /dev/st0   /dev/sg6
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Tim Jones
In reply to this post by Mark Harvey
Hi Mark,

Using the patch from yesterday and now all load attempts fail with the 0403 ASC/ASCQ.

Tim
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
Ouch..

Can I ask for the output of:

- lsof | grep vtl
- mt -f /dev/nstX status (of all of the mhVTL tape devices).
- /var/log/messages might be good too
 Or at least: 
- egrep "lload|msg :" /var/log/messages



On 5 Apr 2019, at 6:06 am, Tim Jones [via mhVTL - A Linux Virtual Tape Library] <[hidden email]> wrote:

Hi Mark,

Using the patch from yesterday and now all load attempts fail with the 0403 ASC/ASCQ.

Tim



If you reply to this email, your message will be added to the discussion below:
http://mhvtl-a-linux-virtual-tape-library.966029.n3.nabble.com/Hardware-Error-0x0403-on-load-tp4026226p4026233.html
To start a new topic under mhVTL - A Linux Virtual Tape Library, [hidden email]
To unsubscribe from mhVTL - A Linux Virtual Tape Library, click here.
NAML

Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Tim Jones
It must be something tied to openSUSE LEAP.  My other systems seem happy with this current code.

I'll increase the debug level and do a bit more sleuthing tomorrow.
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
Perhaps a stale libvtl*.so being picked up ?

e.g.
# ls -l /usr/lib/libvtl*
-rwxr-xr-x. 1 root root 2377338 Apr  5 15:43 /usr/lib/libvtlcart.so
-rwxr-xr-x. 1 root root  410555 Apr  5 15:43 /usr/lib/libvtlscsi.so

And check with "ldd /usr/bin/vtltape" & "ldd /usr/bin/vtllibrary" as to which ones are being loaded.
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Mark Harvey
Administrator
My testing has exposed a logic flaw with the existing 'fix'.

if you "mt -f /dev/nstX offline' *before* attempting to move the tape from the slot, the move fails..

Looks like I need a 'tape loaded, but not threaded' and a 'tape loaded, threaded and ready for action'.

More patches coming :)
Regards from Australia
Mark Harvey
Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Tim Jones
Here's two that we look for explicitly:

        case 0x0401: strcat ( str, "Logical unit not ready, in process of becoming ready" ); break;
        case 0x0402: strcat ( str, "Logical unit not ready, initializing command required" ); break;

The second is representative of the tape being unloaded from the drive (REWOFF), but still in the mouth of the drive.

Sometime, we also see:

        case 0x3F11: strcat ( str, "Medium auxiliary memory accessible" ); break;

This occurs when tape is offline (in the mouth of the drive) because the RFID/MIC reader can still read the MAM/MIC.

Reply | Threaded
Open this post in threaded view
|

Re: Hardware Error 0x0403 on load

Tim Jones
I think I've found the culprit -

I changed the drive types to TD6 instead of TD5.  When I changed back to TD5 and remade the tapes as L5 media, it's working again.

I'll look into the model entries and get the 6, 7, and 8 devices added for IBM and HPE.

Tim