What is ENXIO? MSI allocation regression [Was Re: svn commit: r321714 - in head/sys/dev: mpr mps]
Harry Schmalzbauer
2018-06-04 10:51:59 UTC

mps0: <Avago Technologies (LSI) SAS2008> port 0x4000-0x40ff mem 0xc3bc0000-0xc3bc3fff,0xc3b80000-0xc3bbffff irq 19 at device 0.0 on pci7
mps0: Firmware: 20.00.04.00, Driver: 21.02.00.00-fbsd
185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>
mps0: Cannot allocate INTx interrupt
mps0: mps_iocfacts_allocate failed to setup interrupts
mps0: mps_attach IOC Facts based allocation failed with error 6
panic: resource_list_release: resource entry is not busy
cpuid = 6
#0 0xffffffff805e32d7 at kdb_backtrace+0x67
#1 0xffffffff805a1d26 at vpanic+0x186
#2 0xffffffff805a1b93 at panic+0x43
#3 0xffffffff805d71c6 at resource_list_release+0x1c6
#4 0xffffffff8040fef1 at mps_pci_free+0xe1
#5 0xffffffff8040fa23 at mps_pci_attach+0x1b3
#6 0xffffffff805d6594 at device_attach+0x3a4
#7 0xffffffff805d774d at bus_generic_attach+0x3d
#8 0xffffffff8044ac05 at pci_attach+0xd5
#9 0xffffffff805d6594 at device_attach+0x3a4
#10 0xffffffff805d774d at bus_generic_attach+0x3d
#11 0xffffffff80364761 at acpi_pcib_pci_attach+0xa1
#12 0xffffffff805d6594 at device_attach+0x3a4
#13 0xffffffff805d774d at bus_generic_attach+0x3d
#14 0xffffffff8044ac05 at pci_attach+0xd5
#15 0xffffffff805d6594 at device_attach+0x3a4
#16 0xffffffff805d774d at bus_generic_attach+0x3d
#17 0xffffffff80363e4d at acpi_pcib_acpi_attach+0x42d
Uptime: 1s

Fixed in r321799, thanks for the report.
Fix confirmed; merged together with r321733 (and r321737) to 11.1, and the
panic vanished.
Late in the 11.2 phase, I identified this commit as a regression for MSI
(non-X) allocation.
I have an idea what probably causes the problem here (INTx is allocated,
although the device has MSI (and MSI-X) capability):
disable_msix is not 0 (I need to disable MSI-X because of ESXi passthrough…).

Corresponding lines:
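/* mps_pci_alloc_interrupts() in sys/dev/mps/mps_pci.c, after r321714 */
{
        device_t dev;
        int error, msgs;

        dev = sc->mps_dev;
        error = 0;              /* was ENXIO before r321714 */
        msgs = 0;

        /* Try MSI-X first, unless it is disabled. */
        if ((sc->disable_msix == 0) &&
            ((msgs = pci_msix_count(dev)) >= MPS_MSI_COUNT))
                error = mps_alloc_msix(sc, MPS_MSI_COUNT);
        /*
         * Fall back to MSI only if error != 0.  If the MSI-X branch was
         * skipped (disabled, or too few vectors), error is still 0 here,
         * so MSI is never attempted.
         */
        if ((error != 0) && (sc->disable_msi == 0) &&
            ((msgs = pci_msi_count(dev)) >= MPS_MSI_COUNT))
                error = mps_alloc_msi(sc, MPS_MSI_COUNT);
        if (error != 0)
                msgs = 0;

        /* msi_msgs == 0 makes the driver fall back to INTx. */
        sc->msi_msgs = msgs;
        return (error);
}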

Before r321714, error was initialized to ENXIO, which, if it is != 0,
would explain the problem.
Unfortunately I have no idea what ENXIO means, where it's defined, and,
most importantly, how to find the place where it is declared/defined.
Only joe and vi are available here; any hints are highly appreciated.

I can confirm that MSI allocation works with
mps.ko_21.02.00.00-fbsd-r321415 in my ESXi-passthrough setup without MSI-X,
although the driver doesn't print a message that MSI was allocated, as
other drivers do. That's only cosmetic, though.
But the MSI->INTx regression is a severe one for me, and I'd like to
fix it myself, but I'm missing so many fundamental skills :-(

Thanks,

-harry
Scott Long
2018-06-04 22:22:01 UTC
Hi Harry,

You are correct about the bug. Please change the line at the top of the function that reads

error = 0;

to

error = ENXIO;

Let me know if that fixes the MSI problem for you.

Scott
Harry Schmalzbauer
2018-06-05 07:18:06 UTC
Hello Scott,

thanks for your hint.
Unfortunately I have a lot more problems: the system (11.2-RC1)
deadlocks for several seconds under iSCSI load...
This is far easier to reproduce, and has a heavier impact, with mps(4)
using INTx than with MSI, but last night's backup runs triggered that
extreme slowdown even though mps(4) had allocated MSI: lockups of up to
20 seconds, during which not even the terminal is updated.
All those updates are queued, though, so after about 10-20 seconds the
screen flickers and shows all the queued output.

One symptom is that systat(1) shows 25% intr usage, which is one core.
It's a ZFS machine, so high sys usage is normal, but intr is usually
about 10% with GbE traffic.
Only while the slowdown/lockup happens does intr usage stay constantly at 25%.

I can't imagine ctld(8) or ZFS causing this, but who knows; I don't at
the moment.
I'll have to revert to 11.1 and see if things change; the machine ran
10.? before, without such problems.

BTW, does anybody have a link where I can get info about ENXIO?

Thanks,

-harry
Scott Long
2018-06-05 17:54:32 UTC
Post by Harry Schmalzbauer
BTW, does anybody have a link where I can get info about ENXIO?
ENXIO means that the device is not available. I use it in the driver to signal when the hardware cannot be accessed. The manual page for error codes is "man errno".

Scott
Harry Schmalzbauer
2018-06-11 18:28:44 UTC
On 05.06.2018 at 19:54, Scott Long wrote:

Post by Scott Long
Post by Harry Schmalzbauer
BTW, does anybody have a link where I can get info about ENXIO?
ENXIO means that the device is not available. I use it in the driver to signal when the hardware cannot be accessed. The manual page for error codes is "man errno".
Oh, I see, there's a man page :-)

I haven't had time to look into it, but since you confirmed that ENXIO != 0,
I simply made that change, and now mps(4) allocates MSI again in my setup.
For completeness, my diff:
Index: src/sys/dev/mps/mps_pci.c
===================================================================
--- sys/dev/mps/mps_pci.c (revision 334948)
+++ sys/dev/mps/mps_pci.c (working copy)
@@ -244,7 +244,7 @@
        int error, msgs;

        dev = sc->mps_dev;
-       error = 0;
+       error = ENXIO;
        msgs = 0;

        if ((sc->disable_msix == 0) &&

Unfortunately my other, real problem persists: iSCSI sessions lock up
the machine (11.2-RC2). It's not a deadlock, since the machine recovers
within a few minutes, but until the iSCSI sessions time out it is
completely locked; not a single Ethernet/IP packet gets processed.

Unfortunately I'm very short on testing resources here and don't know
how to trace ctld or whatever else is involved to find the lock cycle.
So far I can only tell that it happens only with Server 2016 iSCSI
connections (using 4k block size).

I'll open a separate thread/PR as soon as I find out anything…

Thanks,

-harry

P.S.: I guess it's far too late to get that into 11.2?
Harry Schmalzbauer
2018-10-26 11:32:25 UTC
Post by Harry Schmalzbauer
Late in the 11.2 phase, I identified this commit as a regression
for MSI (non-x) allocation.

Could somebody please take care of
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229267, possibly in
time for 12.0-RELEASE?

Thanks,
-harry
Harry Schmalzbauer
2018-11-12 17:03:22 UTC
To my understanding, it's obvious that the way
mps_pci_alloc_interrupts() currently works is unintended.
This might not affect too many people, but is there a reason not to fix it?

I already created a corresponding problem report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229267
Anything else I should do?

Thanks,

-harry
Scott Long
2018-11-13 18:02:25 UTC
Hi Harry,

Sorry for ignoring this for so long. I’m going to commit a fix today, but it won’t be the same one-line change.
Upon reviewing the code, I’m going to refactor it so that it’s less confusing and less prone to these kinds of mistakes.
Thank you for the continued reminders to finish this.

Scott
Harry Schmalzbauer
2018-11-13 18:11:11 UTC
Hi Scott,

thanks a lot; in fact I'm not surprised that you came up with a better
solution than that quick fix :-)
I had hoped someone else would do an intermediate commit to get it into
12.0 in time, so you wouldn't feel any time pressure; good work takes the
time it takes, as long as the right person is doing the job.

Unfortunately I don't have a non-production setup where I could test
before release/12.0 is branched; that might change...

best,

-harry
Scott Long
2018-11-13 18:45:10 UTC
Post by Harry Schmalzbauer
Unfortunately I don't have a non-productive setup where I could test before release/12.0 will be branched – might be subject to change...
12.0 has completely different code from 11.x, and from my review of it last night it should be fine. If you have evidence that what’s currently in 12 is not working, please let me know ASAP.

Scott
Harry Schmalzbauer
2018-11-13 19:07:00 UTC
Post by Scott Long
12.0 has completely different code from 11.x, and from my review of it last night it should be fine. If you have evidence that what’s currently in 12 is not working, please let me know ASAP.
Sorry for the confusion, I missed that.
I just checked and found that I still apply the patch (it applies without
errors) to my local stable/12 source tree for local releases... That's
probably a mistake.
I can't remember whether I ever checked if MSI fallback allocation works
without the patch on stable/12 (certainly not on stable/12 itself, but on
-current back then).

As mentioned, I don't have a non-production machine of that kind for
testing, but that's superfluous anyway if you know the code paths differ
in that part.

Please ignore my references to 12.0, sorry.

-harry
