Kernel crash with mhvtl

Kernel crash with mhvtl

rohr22
I received the following kernel crash while trying to write to an mhvtl tape with version 1.5.3:

      KERNEL: vmlinux                          
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Fri Dec 11 17:24:09 2015
      UPTIME: 1 days, 06:44:55
LOAD AVERAGE: 1.88, 1.45, 1.15
       TASKS: 10897
    NODENAME: ---------------
     RELEASE: 2.6.32-573.el6.ppc64
     VERSION: #1 SMP Wed Jul 1 18:21:11 EDT 2015
     MACHINE: ppc64  (3550 Mhz)
      MEMORY: 24 GB
       PANIC: "Unable to handle kernel paging request for data at address 0x00100070"
         PID: 30724
     COMMAND: "vtltape"
        TASK: c00000043794ce00  [THREAD_INFO: c0000005ea85c000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

The backtrace showed:
crash> bt
PID: 30724  TASK: c00000043794ce00  CPU: 0   COMMAND: "vtltape"
 #0 [c0000005ea85f4f0] .crash_kexec at c0000000000ec0e4
 #1 [c0000005ea85f6f0] .die at c000000000031638
 #2 [c0000005ea85f7a0] .bad_page_fault at c000000000044bd8
 #3 [c0000005ea85f820] handle_page_fault at c000000000005228
 Data Access error  [300] exception frame:
 R0:  0000000000000002    R1:  c0000005ea85fb10    R2:  d000000005fbcbf8  
 R3:  c00000044c9fb780    R4:  0000000000000200    R5:  00000fffffffe868  
 R6:  00000fffffffe868    R7:  0000000000000000    R8:  0000000000000005  
 R9:  0000000000100100    R10: c0000000001e6660    R11: c0000005eabc9718  
 R12: d000000005fb2ea8    R13: c000000001072500    R14: 0000000000000003  
 R15: 0000000000000000    R16: 00000000100377a0    R17: 000000802d59a980  
 R18: 0000000010021e10    R19: 00000fffffffe8b0    R20: 00000fffffffea70  
 R21: 00000000100376e0    R22: 0000000010021e18    R23: 0000000010037868  
 R24: 0000000000000000    R25: 00000000100378b8    R26: 0000000000000000  
 R27: 00000fffffffe868    R28: 0000000000000200    R29: ffffffffffffffed  
 R30: d000000005fbcc08    R31: 0000000000100070  
 NIP: d000000005fb2310    MSR: 8000000000009032    OR3: c000000000f1cb10
 CTR: c0000000005e7b80    LR:  d000000005fb202c    XER: 0000000000000000
 CCR: 0000000022002248    MQ:  0000000000000001    DAR: 0000000000100070
 DSISR: 0000000040000000     Syscall Result: 0000000000000000
 #4 [c0000005ea85fb10] .vtl_c_ioctl at d000000005fb2310 [mhvtl]
 [Link Register ]  [c0000005ea85fb10] .vtl_c_ioctl at d000000005fb202c  (unreliable)
 #5 [c0000005ea85fc00] .vfs_ioctl at c0000000001e5ce4
 #6 [c0000005ea85fc90] .do_vfs_ioctl at c0000000001e5f30
 #7 [c0000005ea85fd80] .sys_ioctl at c0000000001e6714
 #8 [c0000005ea85fe30] syscall_exit at c000000000008564
 syscall  [c00] exception frame:
 R0:  0000000000000036    R1:  00000fffffffe760    R2:  00000080785332d8  
 R3:  0000000000000003    R4:  0000000000000200    R5:  00000fffffffe868  
 R6:  0000000000000000    R7:  0000000000000000    R8:  0000000000000005  
 R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000  
 R12: 0000000000000000    R13: 000000807834df80    R14: 0000000000000003  
 R15: 0000000000000000    R16: 00000000100377a0    R17: 000000802d59a980  
 R18: 0000000010021e10    R19: 00000fffffffe8b0    R20: 00000fffffffea70  
 R21: 00000000100376e0    R22: 0000000010021e18    R23: 0000000010037868  
 R24: 0000000000000000    R25: 00000000100378b8    R26: 00000000100376e0  
 R27: 0000000010037790    R28: 00000000100377a0    R29: 0000000010038258  
 R30: 00000fffffffe858    R31: 00000fffffffe868  
 NIP: 0000008078470270    MSR: 800000000000d032    OR3: 0000000000000003
 CTR: 00000080784701d0    LR:  000000001000cbf0    XER: 0000000000000000
 CCR: 0000000048002248    MQ:  0000000000000001    DAR: 0000008078467d70
 DSISR: 0000000040000000     Syscall Result: 00000000014a8000

Is this a known issue and is a fix available?

Thank you,
Peter

Re: Kernel crash with mhvtl

Mark Harvey
Administrator
Hello Peter,

Unfortunately, this is a new bug report. It's also the first report I've seen of the vtl running on PPC :)

I have no method to troubleshoot/diagnose this. Analyzing a kernel oops (unfortunately) exceeds my debugging skills. Hopefully the syslog will show which ioctl() was in use at the time of the crash.

Do you have the syslog (typically /var/log/messages) leading up to this crash?

Enabling kernel debugging may shed more light on what was occurring at the time.

Note: I would dearly love to move away from this custom kernel module (a hacked scsi_debug) to the newer SCSI target driver now shipped with the Linux kernel. I've not found the time to make the changes. With Christmas/New Year fast approaching, I cannot see any free time to do this until February at the earliest.

I wish I had better news for you.
Regards from Australia
Mark Harvey

Re: Kernel crash with mhvtl

rohr22
Mark, it is interesting that this is the first time you have heard of the vtl running on PPC. Our PPC system stores words in big-endian format. I think the kernel crash only occurs when we try to create files over 4 GB in size, that is, sizes that no longer fit in 32 bits. Perhaps that combination is causing the crash. Could you briefly review the code to see whether this mix could be responsible?
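
As a rough illustration of the kind of hazard suggested above, here is a generic sketch. It is not code taken from mhvtl, and the size value is made up. It shows how a 64-bit size read back through a 32-bit field at the same address gives different results on big-endian and little-endian layouts:

```python
import struct

# Hypothetical file size just over 4 GB; not taken from the crash.
size = (1 << 32) + 4096

# Lay the value out in memory as a 64-bit integer in each byte order.
big = struct.pack(">Q", size)
little = struct.pack("<Q", size)

# Read only the first four bytes, as code would if it treated the
# field as a 32-bit integer at the same address.
big_as_u32 = struct.unpack(">I", big[:4])[0]
little_as_u32 = struct.unpack("<I", little[:4])[0]

print(hex(big_as_u32))     # high word of the size on big-endian
print(hex(little_as_u32))  # low word of the size on little-endian
```

On a big-endian layout the first four bytes hold the high word, which stays zero until the size reaches 4 GB. Whether anything like this actually exists in the mhvtl code is not established here.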

Thank you,
Peter

Re: Kernel crash with mhvtl

rohr22
Hi, Mark. We are still running mhvtl on PPC64 systems and are still getting periodic kernel crashes. Yesterday a kernel crash occurred, and running "crash vmcore /usr/lib/debug/lib/modules/2.6.32-573.el6.ppc64/vmlinux" showed:

This GDB was configured as "powerpc64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-573.el6.ppc64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 80
        DATE: Wed Oct  5 20:23:03 2016
      UPTIME: 05:00:15
LOAD AVERAGE: 4.87, 1.74, 1.24
       TASKS: 4549
    NODENAME: ................................
     RELEASE: 2.6.32-573.el6.ppc64
     VERSION: #1 SMP Wed Jul 1 18:21:11 EDT 2015
     MACHINE: ppc64  (3000 Mhz)
      MEMORY: 30 GB
       PANIC: "Unable to handle kernel paging request for data at address 0x5bc020000fffe8"
         PID: 8512
     COMMAND: "vtltape"
        TASK: c00000075ce5e5c0  [THREAD_INFO: c00000076c230000]
         CPU: 40
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 8512   TASK: c00000075ce5e5c0  CPU: 40  COMMAND: "vtltape"
 #0 [c00000076c2334f0] .crash_kexec at c0000000000ec0e4
 #1 [c00000076c2336f0] .die at c000000000031638
 #2 [c00000076c2337a0] .bad_page_fault at c000000000044bd8
 #3 [c00000076c233820] handle_page_fault at c000000000005228
 Data Access error  [300] exception frame:
 R0:  0000000000000000    R1:  c00000076c233b10    R2:  d000000005d8cbf8  
 R3:  c000000001041a00    R4:  0000000000000200    R5:  00000000008881f8  
 R6:  00000ffffb35c638    R7:  0000000000000000    R8:  0000000000000005  
 R9:  005bc02000100078    R10: c000000000d92000    R11: c00000075d183718  
 R12: d000000005d82ea8    R13: c000000001078900    R14: 0000000000000003  
 R15: 0000000000000000    R16: 00000000100377a0    R17: 000000801b38a980  
 R18: 0000000010021e10    R19: 00000ffffb35c680    R20: 00000ffffb35c840  
 R21: 00000000100376e0    R22: 0000000010021e18    R23: 0000000010037868  
 R24: 0000000000000000    R25: 00000000100378b8    R26: 0000000000000000  
 R27: 00000ffffb35c638    R28: 0000000000000200    R29: ffffffffffffffed  
 R30: d000000005d8cc08    R31: 005bc020000fffe8  
 NIP: d000000005d82310    MSR: 8000000000009032    OR3: c000000000f1cb10
 CTR: c0000000005e7b80    LR:  d000000005d8202c    XER: 0000000000000000
 CCR: 0000000022002248    MQ:  0000000000000001    DAR: 005bc020000fffe8
 DSISR: 0000000040000000     Syscall Result: 0000000000000000
 #4 [c00000076c233b10] .vtl_c_ioctl at d000000005d82310 [mhvtl]
 [Link Register ]  [c00000076c233b10] .vtl_c_ioctl at d000000005d8202c  (unreliable)
 #5 [c00000076c233c00] .vfs_ioctl at c0000000001e5ce4
 #6 [c00000076c233c90] .do_vfs_ioctl at c0000000001e5f30
 #7 [c00000076c233d80] .sys_ioctl at c0000000001e6714
 #8 [c00000076c233e30] syscall_exit at c000000000008564
 syscall  [c00] exception frame:
 R0:  0000000000000036    R1:  00000ffffb35c530    R2:  00000080227132d8  
 R3:  0000000000000003    R4:  0000000000000200    R5:  00000ffffb35c638  
 R6:  0000000000000000    R7:  0000000000000000    R8:  0000000000000005  
 R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000  
 R12: 0000000000000000    R13: 000000802252dfa0    R14: 0000000000000003  
 R15: 0000000000000000    R16: 00000000100377a0    R17: 000000801b38a980  
 R18: 0000000010021e10    R19: 00000ffffb35c680    R20: 00000ffffb35c840  
 R21: 00000000100376e0    R22: 0000000010021e18    R23: 0000000010037868  
 R24: 0000000000000000    R25: 00000000100378b8    R26: 00000000100376e0  
 R27: 0000000010037790    R28: 00000000100377a0    R29: 0000000010038258  
 R30: 00000ffffb35c628    R31: 00000ffffb35c638  
 NIP: 0000008022650270    MSR: 800000000000d032    OR3: 0000000000000003
 CTR: 00000080226501d0    LR:  000000001000cbf0    XER: 0000000000000000
 CCR: 0000000048002248    MQ:  0000000000000001    DAR: 0000010011900000
 DSISR: 0000000042000000     Syscall Result: 0000000000000000
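
One detail can be read directly from the two "Data Access error" frames in this thread: in both crashes the faulting address (DAR, also held in R31) sits a fixed distance below the value in R9, and NIP is the same distance from LR. The sketch below does that arithmetic using only values copied from the dumps. The LIST_POISON1 comparison assumes the definition in the kernel's include/linux/poison.h with no pointer delta configured; it is a guess at an interpretation, not a diagnosis:

```python
# Values copied from the two "Data Access error" frames in this thread.
crashes = {
    "2015-12-11": {"R9": 0x0000000000100100, "DAR": 0x0000000000100070,
                   "NIP": 0xd000000005fb2310, "LR": 0xd000000005fb202c},
    "2016-10-05": {"R9": 0x005bc02000100078, "DAR": 0x005bc020000fffe8,
                   "NIP": 0xd000000005d82310, "LR": 0xd000000005d8202c},
}

# LIST_POISON1 as defined in include/linux/poison.h, assuming
# POISON_POINTER_DELTA is 0 on this kernel build (an assumption).
LIST_POISON1 = 0x00100100

for date, regs in crashes.items():
    delta = regs["R9"] - regs["DAR"]          # distance from R9 to fault address
    nip_minus_lr = regs["NIP"] - regs["LR"]   # distance between NIP and LR
    poisoned = regs["R9"] == LIST_POISON1     # does R9 hold the poison value?
    print(date, hex(delta), hex(nip_minus_lr), poisoned)
```

Both crashes give an R9-to-DAR distance of 0x90, and R9 in the first crash equals LIST_POISON1. That would be consistent with vtl_c_ioctl following a list entry that had already been removed, but this is only inferred from the registers.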

I am using the mhvtl from mhvtl-2015-04-14.tgz. Do you think the above problem has been resolved in the most current version of mhvtl?

Thank you,
Peter