mfi troubles (Unexpected Sense)

Discussion:

Dmitry Morozovsky

2018-07-15 20:27:11 UTC

Colleagues,

One of my servers has started to exhibit unexpected delays, possibly related to the disk
subsystem.

It's a Supermicro with mfi, ZFS on a set of RAID0 volumes (yes, I know we picked the
wrong controller, but replacing it is out of the question, at least for now).

Now the kernel log is filled with messages like

mfi0: 60006 (585001405s/0x0002/info) - Unexpected sense: PD 0e(e0x08/s5) Path
500304800021bf31, CDB: 8f 00 00 00 00 00 14 77 5c 1e 00 00 10 00 00 00, Sense:
3/11/00

every few seconds
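For readers following along, here is a minimal Python sketch that splits such a line (after joining its wrapped pieces) into fields. It assumes that "PD 0e(e0x08/s5)" means device ID 0x0e in enclosure 0x08, slot 5; that is our reading of the LSI event format, not something confirmed from the mfi(4) source. The regular expression and the names parse_event and SenseEvent are hypothetical. The resulting device ID could then be compared with the drive list from 'mfiutil show drives'.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical pattern for the joined "Unexpected sense" event line.
# The PD/enclosure/slot meanings are an assumption (see above).
UNEXPECTED_SENSE = re.compile(
    r"Unexpected sense: PD (?P<pd>[0-9a-f]+)"
    r"\(e0x(?P<enc>[0-9a-f]+)/s(?P<slot>\d+)\)\s+"
    r"Path (?P<path>[0-9a-f]+), "
    r"CDB: (?P<cdb>[0-9a-f]{2}(?: [0-9a-f]{2})*), "
    r"Sense: (?P<sense>[0-9a-f]+/[0-9a-f]+/[0-9a-f]+)",
    re.IGNORECASE,
)


@dataclass
class SenseEvent:
    pd: int                       # physical drive device ID (hex in the log)
    enclosure: int                # assumed enclosure ID (hex, after "e0x")
    slot: int                     # assumed slot number (after "s")
    path: int                     # 64-bit address printed after "Path"
    cdb: bytes                    # raw SCSI command descriptor block
    sense: Tuple[int, int, int]   # (sense key, ASC, ASCQ), printed in hex


def parse_event(line: str) -> Optional[SenseEvent]:
    """Return the parsed fields, or None if the line is not such an event."""
    m = UNEXPECTED_SENSE.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(x, 16) for x in m["sense"].split("/"))
    return SenseEvent(
        pd=int(m["pd"], 16),
        enclosure=int(m["enc"], 16),
        slot=int(m["slot"]),
        path=int(m["path"], 16),
        cdb=bytes.fromhex(m["cdb"]),
        sense=(key, asc, ascq),
    )
```

Fed the message above as one string, it should yield pd=0x0e, enclosure=8, slot=5 and sense=(3, 0x11, 0).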

I tried to find the place in the source that produces these lines, but failed :(

A hard reboot, including a full power-off, was tried but did not help.

Any hints to diagnose this further?

Ah, and this is stable/10 from Nov 2017

Please keep me CC'd, as I'm not subscribed to -scsi@.

Thanks!
--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ***@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ***@rinet.ru ***
------------------------------------------------------------------------
_______________________________________________
freebsd-***@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi-***@freebsd.org"

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Dmitry Morozovsky

2018-07-15 21:18:32 UTC

Rick,

Post by Rick
"Unexpected sense" is generally something an LSI controller says when a disk
is going bad. PD identifies the physical disk involved.
If this is RAID 0, I would be very afraid: you are about to lose all of your
data. Actually, I would always be very afraid with RAID 0. Make sure you have
a good backup.

As I said, it's ZFS with raidz2 (12 disks, each in its own RAID0 volume, because this
particular controller does not support exporting disks as JBOD), so data integrity is
not an issue (yet).

The question is: how can I identify the affected disk? mfiutil says
everything's OK, and no red lights are on in the physical drive cages.

Thanks!

Douglas Gilbert

2018-07-15 22:46:20 UTC

Post by Dmitry Morozovsky
Colleagues,
One of my servers has started to exhibit unexpected delays, possibly related to the disk
subsystem.
It's a Supermicro with mfi, ZFS on a set of RAID0 volumes (yes, I know we picked the
wrong controller, but replacing it is out of the question, at least for now).
Now the kernel log is filled with messages like
mfi0: 60006 (585001405s/0x0002/info) - Unexpected sense: PD 0e(e0x08/s5) Path
500304800021bf31, CDB: 8f 00 00 00 00 00 14 77 5c 1e 00 00 10 00 00 00, Sense:
3/11/00
every few seconds

That's a SCSI VERIFY(16) command with a sense key of medium error and
additional sense of 'unrecovered read error'. [Note to FreeBSD SCSI
maintainers: how about some leading '0x' or trailing 'h' for hex numbers ??]
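The decoding above can be reproduced with a short Python sketch (every number in the mfi message is hex). It covers only the VERIFY(16) layout (LBA in bytes 2 to 9, verification length in bytes 10 to 13, both big-endian), a handful of sense keys, and the single ASC/ASCQ pair from this log; the tables and function names are hypothetical, and a real tool such as sg_decode_sense from sg3_utils covers far more of the SPC tables.

```python
# Small subset of the SPC sense key table.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Only the ASC/ASCQ pair seen in this thread.
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}


def decode_verify16(cdb: bytes) -> dict:
    """Pull the LBA and verification length out of a VERIFY(16) CDB."""
    if len(cdb) != 16 or cdb[0] != 0x8F:
        raise ValueError("not a VERIFY(16) CDB")
    return {
        "command": "VERIFY(16)",
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13
    }


def decode_sense(triple: str) -> str:
    """Decode a 'key/asc/ascq' triple as printed by mfi (all hex)."""
    key, asc, ascq = (int(x, 16) for x in triple.split("/"))
    key_text = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    asc_text = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{key_text}: {asc_text}"


if __name__ == "__main__":
    cdb = bytes.fromhex("8f 00 00 00 00 00 14 77 5c 1e 00 00 10 00 00 00")
    print(decode_verify16(cdb))
    print(decode_sense("3/11/00"))
```

For the CDB in this thread, decode_verify16 should report LBA 0x14775c1e (343366686) and 4096 blocks, and decode_sense("3/11/00") returns "MEDIUM ERROR: UNRECOVERED READ ERROR".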

Translation: a disk is dying, probably associated with NAA 0x500304800021bf31.
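As a side note on the identifier itself: a 64-bit value whose first hex digit is 5 is an NAA "IEEE Registered" designator, made of a 4-bit NAA field, a 24-bit IEEE company ID (OUI) and a 36-bit vendor-specific part. The minimal Python sketch below (split_naa5 is a hypothetical helper) pulls those fields apart; the company ID can then be looked up in the IEEE OUI registry to see which vendor assigned the address.

```python
def split_naa5(naa: int) -> dict:
    """Split a 64-bit NAA IEEE Registered (type 5) identifier into fields."""
    if naa >> 60 != 0x5:
        raise ValueError("not an NAA type 5 identifier")
    return {
        "naa": naa >> 60,                              # top 4 bits
        "company_id": (naa >> 36) & 0xFFFFFF,          # next 24 bits (OUI)
        "vendor_specific": naa & ((1 << 36) - 1),      # low 36 bits
    }


if __name__ == "__main__":
    fields = split_naa5(0x500304800021bf31)
    print({k: hex(v) for k, v in fields.items()})
```

For 0x500304800021bf31 this gives NAA 5, company ID 0x003048 and vendor-specific part 0x00021bf31.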

Is there any enclosure management, i.e. a device like /dev/ses*?

If so, try 'sg_ses /dev/ses<n>' and look for that NAA (or a number close to
it, within 3). I'll assume you have an exact match.
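The "within 3" comparison can be scripted. The sketch below is a hypothetical helper, not part of sg3_utils: it assumes SAS addresses appear in the sg_ses output as 0x followed by 16 hex digits, and collects every such value within a small window of the target. A companion function lists the /dev/ses* nodes mentioned above.

```python
import glob
import re
from typing import List

# Assumption: addresses appear as "0x" + 16 hex digits in sg_ses output.
HEX64 = re.compile(r"0x([0-9a-fA-F]{16})\b")


def ses_devices() -> List[str]:
    """Enclosure services nodes present on this host (may be empty)."""
    return sorted(glob.glob("/dev/ses*"))


def near_matches(sg_ses_text: str, target: int, window: int = 3) -> List[int]:
    """Unique addresses in the text whose distance from target is <= window."""
    found: List[int] = []
    for m in HEX64.finditer(sg_ses_text):
        addr = int(m.group(1), 16)
        if abs(addr - target) <= window and addr not in found:
            found.append(addr)
    return found
```

The text would come from capturing the sg_ses output for each device returned by ses_devices(); the default window of 3 simply mirrors the advice above.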

Then try 'sg_ses -A 0x500304800021bf31 --set=ident /dev/ses<n>'

That should cause an LED to flash on the carrier of the damaged disk.
To stop it flashing, substitute "clear" for "set" in the previous invocation.
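If this has to be repeated for several disks or enclosures, the two invocations can be generated rather than typed. A minimal Python sketch, assuming sg3_utils is installed; ident_led_cmd and run are hypothetical helpers, /dev/ses0 is a placeholder for whichever /dev/ses<n> matched, and run() is never called here because it needs root and real hardware.

```python
import shlex
import subprocess
from typing import List


def ident_led_cmd(sas_addr: int, ses_dev: str, on: bool = True) -> List[str]:
    """Build the sg_ses command that sets (on=True) or clears the ident LED."""
    action = "--set=ident" if on else "--clear=ident"
    return ["sg_ses", "-A", f"0x{sas_addr:016x}", action, ses_dev]


def run(argv: List[str]) -> None:
    """Echo and execute a command; raises CalledProcessError on failure."""
    print("+", shlex.join(argv))
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    addr = 0x500304800021bf31
    print(shlex.join(ident_led_cmd(addr, "/dev/ses0")))            # start blinking
    print(shlex.join(ident_led_cmd(addr, "/dev/ses0", on=False)))  # stop blinking
```

Building the argument list separately from running it makes it easy to review the exact command before anything touches the enclosure.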

Post by Dmitry Morozovsky
I tried to find the place in the source that produces these lines, but failed :(
A hard reboot, including a full power-off, was tried but did not help.
Any hints to diagnose this further?
Ah, and this is stable/10 from Nov 2017

Good luck
Doug Gilbert

P.S. I'm currently trying to recover data from a disk whose heads got
stuck ... so I know the feeling.
