[PATCH] nbd: fix TX/RX race condition

Janos Haar of First NetCenter Bt. reported numerous crashes involving the
NBD driver. With his help, this was tracked down to bogus bio vectors, which
in turn were the result of a race condition between the receive and transmit
routines in the NBD driver.

The bug manifests itself like this:

    CPU0                            CPU1
    do_nbd_request
      add req to queuelist
      nbd_send_request
        send req head
        for each bio
          kmap
          send
                                    nbd_read_stat
                                    nbd_find_request
                                    nbd_end_request
          kunmap

When CPU1 finishes nbd_end_request, the request and all its associated bios
are freed. So when CPU0 calls kunmap, whose argument is derived from the
last bio, it may crash.

Under normal circumstances, the race occurs only on the last bio. However,
if an error is encountered on the remote NBD server (such as an incorrect
magic number in the request), or if there is a bug in the server, it is
possible for nbd_end_request to occur at any time after the request has been
added to the queuelist.

The following patch fixes this problem by making sure that requests are not
added to the queuelist until after their transmission has completed.

In order for the receiving side to be ready for responses involving requests
that are still being transmitted, the patch introduces the concept of the
active request. When a response matches the current active request, its
processing is delayed until after the transmission has come to a stop.

This has been tested by Janos and it has been successful in curing this race
condition.

From: Herbert Xu <herbert@gondor.apana.org.au>

  Here is an updated patch which removes the active_req wait in
  nbd_clear_queue and the associated memory barrier.

  I've also clarified this in the comment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: <djani22@dynamicweb.hu>
Cc: Paul Clements <Paul.Clements@SteelEye.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

author: Herbert Xu <herbert@gondor.apana.org.au> 2006-01-06 00:09:47 -0800
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-01-06 08:33:20 -0800
commit: 4b2f0260c74324abca76ccaa42d426af163125e7 (patch)
tree: 881f76200dc3489b11497528feb72d6eae93bddb /include/linux/nbd.h
parent: bd6a59b22fd3bd044bb14978b885bcd042a10e8e (diff)
1 files changed, 8 insertions, 0 deletions
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index 090e210..f95d51f 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -37,18 +37,26 @@ enum {
 /* userspace doesn't need the nbd_device structure */
 #ifdef __KERNEL__
 
+#include <linux/wait.h>
+
 /* values for flags field */
 #define NBD_READ_ONLY 0x0001
 #define NBD_WRITE_NOCHK 0x0002
 
+struct request;
+
 struct nbd_device {
 	int flags;
 	int harderror;		/* Code of hard error			*/
 	struct socket * sock;
 	struct file * file; 	/* If == NULL, device is not ready, yet	*/
 	int magic;
+
 	spinlock_t queue_lock;
 	struct list_head queue_head;/* Requests are added here...	*/
+	struct request *active_req;
+	wait_queue_head_t active_wq;
+
 	struct semaphore tx_lock;
 	struct gendisk *disk;
 	int blksize;
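
Note: this view is limited to include/linux/nbd.h, so the driver-side use of
active_req and active_wq is not shown here. As a rough illustration of the
scheme the commit message describes, the following is a minimal userspace
sketch, not the driver code. It assumes pthreads in place of the kernel's
spinlock and wait queue, and the names struct dev, dev_init, tx_begin, tx_end
and rx_find are hypothetical. The sender marks a request active before
sending, queues it only once sending is done, and the receiver waits while
the request named in a reply is still the active one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct request. */
struct request {
	int id;
	struct request *next;		/* link in the pending list */
};

/* Hypothetical analogue of the nbd_device fields touched by the patch. */
struct dev {
	pthread_mutex_t lock;		/* plays the role of queue_lock */
	pthread_cond_t active_wq;	/* plays the role of active_wq */
	struct request *queue_head;	/* fully sent, awaiting a reply */
	struct request *active_req;	/* currently being transmitted */
};

void dev_init(struct dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->active_wq, NULL);
	d->queue_head = NULL;
	d->active_req = NULL;
}

/* TX side: mark the request active before any of it is sent. */
void tx_begin(struct dev *d, struct request *req)
{
	pthread_mutex_lock(&d->lock);
	assert(d->active_req == NULL);	/* one transmitter at a time */
	d->active_req = req;
	pthread_mutex_unlock(&d->lock);
}

/* TX side: sending is done; only now does the request become findable. */
void tx_end(struct dev *d, struct request *req)
{
	pthread_mutex_lock(&d->lock);
	req->next = d->queue_head;
	d->queue_head = req;
	d->active_req = NULL;
	pthread_cond_broadcast(&d->active_wq);	/* wake any waiting receiver */
	pthread_mutex_unlock(&d->lock);
}

/*
 * RX side: a reply names req. If req is still being transmitted, wait
 * until the sender is done, then unlink it from the pending list.
 * Returns req, or NULL if no such request is pending.
 */
struct request *rx_find(struct dev *d, struct request *req)
{
	struct request **pp, *found = NULL;

	pthread_mutex_lock(&d->lock);
	while (d->active_req == req)
		pthread_cond_wait(&d->active_wq, &d->lock);
	for (pp = &d->queue_head; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;	/* unlink from the list */
			req->next = NULL;
			found = req;
			break;
		}
	}
	pthread_mutex_unlock(&d->lock);
	return found;
}
```

In this sketch a sender thread would call tx_begin(), transmit the request,
then call tx_end(), while a receiver thread calls rx_find() for each reply
and completes the request it returns. The sketch leaves out the failed-send
path, where a request would be completed with an error instead of being
queued.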