Discussion:
Not all disks come back after power cycling a JBOD
Alan Somers
2020-03-30 16:05:30 UTC
If I remove a hot-swappable SCSI drive and reinsert it, FreeBSD always
seems to handle that just fine. But if instead I unplug or power off an
entire JBOD, then reattach it, frequently FreeBSD fails to
recreate all of the device nodes. Using "mpsutil show devices" or "mprutil
show devices" I can see all of the devices that I'm expecting. However,
"camcontrol devlist" doesn't show them, and "camcontrol rescan" doesn't
help.

This has been the situation for as long as I can remember, several years at
least. But now it's starting to cause problems for me. Before I try to
debug this myself, does anybody know anything about the problem?
_______________________________________________
freebsd-***@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi-***@freebsd.org"

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jan Bramkamp
2020-04-01 13:42:39 UTC
The only time I encountered a similar problem, a "camcontrol rescan all"
was enough to discover all disks. If that also works in your case, you
could probably write an ugly devd rule, triggered by the addition of new
ses devices, to hide the problem.
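
A minimal sketch of how the rescan suggestion could be checked from a script, assuming the usual "camcontrol devlist" line layout in which each line ends with a parenthesised peripheral list such as "(pass0,da0)". The function names and the list of expected disks are hypothetical, not part of any FreeBSD tool. It only reports what is still missing after the rescan; it does not fix anything.

```python
import re
import subprocess

# Matches the parenthesised peripheral list that ends a
# "camcontrol devlist" line, e.g. "(pass3,da3)".
_PERIPH_RE = re.compile(r"\(([^()]*)\)\s*$")


def parse_devlist(text):
    """Return the set of peripheral names (da0, pass0, ses0, ...) found in devlist output."""
    names = set()
    for line in text.splitlines():
        match = _PERIPH_RE.search(line)
        if match:
            names.update(n.strip() for n in match.group(1).split(",") if n.strip())
    return names


def missing_disks(expected, devlist_text):
    """Return, sorted, the expected names that the devlist output does not mention."""
    return sorted(set(expected) - parse_devlist(devlist_text))


def rescan_and_check(expected):
    """Run "camcontrol rescan all", then report which expected disks are still absent.

    Needs root on a FreeBSD host, so it is defined here but never called.
    """
    subprocess.run(["camcontrol", "rescan", "all"], check=True)
    out = subprocess.run(["camcontrol", "devlist"], check=True,
                         capture_output=True, text=True).stdout
    return missing_disks(expected, out)
```

A devd rule could call something like rescan_and_check(["da0", "da1"]) and log the result. If the list is still non-empty after the rescan, this workaround is not enough for that system.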
Andriy Gapon
2020-04-02 08:47:47 UTC
I have been trying to help a user with this problem on the mpr driver.
The problem seemed to occur at the controller or expander level.
At least, I could not find any problem in the driver.

Some things we saw:
- the problem could be reproduced with Linux as well
- it was always the same slots / expander ports that could get the problem

We collected logs after doing these things:
- dev.mpr.0.debug_level=0x6ff
- camcontrol debug -I -P -c -p <bus>

From what I could see in the logs, the affected disks were in a permanent
reset state, and that is what the controller kept reporting.
The driver kept getting SasTopologyChangeList events in which the affected
disks oscillated between PHYLinkStatusChange and TargetMissing.
E.g., PHY 3 and 5 here:
EventDataLength: 6
AckRequired: 0
Event: SasTopologyChangeList (0x1c)
EventContext: 0x0
EnclosureHandle: 0x2
ExpanderDevHandle: 0x9
NumPhys: 39
NumEntries: 3
StartPhyNum: 3
ExpStatus: Responding (0x3)
PhysicalPort: 0
PHY[3].AttachedDevHandle: 0x000d
PHY[3].LinkRate: 12.0Gbps (0xb0)
PHY[3].PhyStatus: PHYLinkStatusChange
PHY[4].AttachedDevHandle: 0x000e
PHY[4].LinkRate: 12.0Gbps (0xbb)
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].AttachedDevHandle: 0x000f
PHY[5].LinkRate: 12.0Gbps (0xb0)
PHY[5].PhyStatus: PHYLinkStatusChange

EventDataLength: 6
AckRequired: 0
Event: SasTopologyChangeList (0x1c)
EventContext: 0x0
EnclosureHandle: 0x2
ExpanderDevHandle: 0x9
NumPhys: 39
NumEntries: 3
StartPhyNum: 3
ExpStatus: Responding (0x3)
PhysicalPort: 0
PHY[3].AttachedDevHandle: 0x000d
PHY[3].LinkRate: LinkRate Unknown (0xb)
PHY[3].PhyStatus: TargetMissing
PHY[4].AttachedDevHandle: 0x000e
PHY[4].LinkRate: 12.0Gbps (0xbb)
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].AttachedDevHandle: 0x000f
PHY[5].LinkRate: LinkRate Unknown (0xb)
PHY[5].PhyStatus: TargetMissing
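
The oscillation in the two events above can be picked out of a long debug log mechanically. The following is a minimal sketch, assuming the log keeps the "PHY[n].PhyStatus: <status>" line format shown here. The function name is hypothetical. It also ignores ExpanderDevHandle, so with more than one expander the PHY numbers would have to be grouped per expander first.

```python
import re
from collections import defaultdict

# One match per "PHY[n].PhyStatus: <status>" line in the driver debug output.
_PHY_STATUS_RE = re.compile(r"PHY\[(\d+)\]\.PhyStatus:\s*(\S+)")


def flapping_phys(log_text):
    """Return, sorted, the PHY numbers seen as both PHYLinkStatusChange and TargetMissing."""
    seen = defaultdict(set)
    for phy, status in _PHY_STATUS_RE.findall(log_text):
        seen[int(phy)].add(status)
    wanted = {"PHYLinkStatusChange", "TargetMissing"}
    # A PHY counts as flapping only if both statuses were reported for it.
    return sorted(phy for phy, statuses in seen.items() if wanted <= statuses)
```

Fed the two events above, this returns PHY 3 and PHY 5 and skips PHY 4, whose status is always PHYLinkStatusUnchanged.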

There were also SasDeviceStatusChange events like this:
mpr0: EventReply :
EventDataLength: 7
AckRequired: 0
Event: SasDeviceStatusChange (0xf)
EventContext: 0x20
TaskTag: 0xffff
ReasonCode: Internal Device Reset
ASC: 0x0
ASCQ: 0x0
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd

mpr0: EventReply :
EventDataLength: 7
AckRequired: 0
Event: SasDeviceStatusChange (0xf)
EventContext: 0x20
TaskTag: 0xffff
ReasonCode: Cmp Internal Device Reset
ASC: 0x0
ASCQ: 0x0
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd

Finally, the user discovered that after running "sas3flash -reset" the
controller (and FreeBSD) was able to see all disks again.

If anyone has any thoughts / suggestions they are very welcome!
--
Andriy Gapon
Alan Somers
2020-04-02 18:51:53 UTC
Thanks for the tip, avg! sas2flash -reset worked. At least, it worked for
the case where "mpsutil show devices" shows missing devices. There was one
case where "mprutil show devices" looked fine. But I haven't been able to
reproduce that failure yet. I'll let you know if I ever do. In the
meantime, I'll add sas2flash/sas3flash to my toolkit.
-Alan