| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Exactly when the catastrophic error polling timer function runs is not
important, so use round_jiffies() to save unnecessary wakeups.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
|
|
|
|
|
| |
Since we use del_timer_sync() anyway, there's no need for an
additional flag to tell the timer not to rearm.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
|
|
|
|
| |
They don't get updated by git and so they're worse than useless.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|
|
|
|
|
| |
Fix up for make allyesconfig.
Signed-Off-By: David Howells <dhowells@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trigger device remove and then add when a catastrophic error is
detected in hardware. This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.
Since this might interefere with debugging the root cause, add a
module option to suppress this behaviour.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|\
| |
| |
| | |
master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
|
| |
| |
| |
| |
| |
| |
| | |
Fix a typo in the rearming of the catastrophic error polling timer: we
should rearm the timer as long as the stop flag is _not_ set.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
|/
|
|
|
|
|
|
|
|
| |
Fix more include file problems that surfaced since I submitted the previous
fix-missing-includes.patch. This should now allow not to include sched.h
from module.h, which is done by a followup patch.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add some initial support for detecting and reporting catastrophic
errors reported by Mellanox HCAs. We start a periodic timer which
polls the catastrophic error reporting buffer in device memory. If an
error is detected, we dump the contents of the buffer for port-mortem
debugging, and report a fatal asynchronous error to higher levels.
In the future we can try to recover from these errors by resetting the
device, but this will require some work in higher-level code as well.
Let's get this in now, so that we at least get catastrophic errors
reported in logs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
|