aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nfsd41: Fix a crash when a callback is retriedBoaz Harrosh2010-08-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a callback is retried at nfsd4_cb_recall_done() due to some error, the returned rpc reply crashes here: @@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res, u32 dummy; __be32 *p; + BUG_ON(!res); if (res->cbs_minorversion == 0) return 0; [BUG_ON added for demonstration] This is because the nfsd4_cb_done_sequence() has NULLed out the task->tk_msg.rpc_resp pointer. Also eventually the rpc would use the new slot without making sure it is free by calling nfsd41_cb_setup_sequence(). This problem was introduced by a 4.1 protocol addition patch: [0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1 Which was overlooking the possibility of an RPC callback retries. For not-4.1 case redoing the _prepare is harmless. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: fix startup/shutdown order bugJ. Bruce Fields2010-08-061-14/+16
| | | | | | | | | | | | | We must create the server before we can call init_socks or check the number of threads. Symptoms were a NULL pointer dereference in nfsd_svc(). Problem identified by Jeff Layton. Also fix a minor cleanup-on-error case in nfsd_startup(). Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: minor nfsd read api cleanupJ. Bruce Fields2010-07-305-13/+16
| | | | | | | | | Christoph points that the NFSv2/v3 callers know which case they want here, so we may as well just call the file=NULL case directly instead of making this conditional. Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* gcc-4.6: nfsd: fix initialized but not read warningsAndi Kleen2010-07-294-13/+2
| | | | | | | | | | | | | | | | | | | | | Fixes at least one real minor bug: the nfs4 recovery dir sysctl would not return its status properly. Also I finished Al's 1e41568d7378d ("Take ima_path_check() in nfsd past dentry_open() in nfsd_open()") commit, it moved the IMA code, but left the old path initializer in there. The rest is just dead code removed I think, although I was not fully sure about the "is_borc" stuff. Some more review would be still good. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: share file descriptors between stateid'sJ. Bruce Fields2010-07-293-123/+221
| | | | | | | | | | | | | | | | The vfs doesn't really allow us to "upgrade" a file descriptor from read-only to read-write, and our attempt to do so in nfs4_upgrade_open is ugly and incomplete. Move to a different scheme where we keep multiple opens, shared between open stateid's, in the nfs4_file struct. Each file will be opened at most 3 times (for read, write, and read-write), and those opens will be shared between all clients and openers. On upgrade we will do another open if necessary instead of attempting to upgrade an existing open. We keep count of the number of readers and writers so we know when to close the shared files. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix openmode checking on IO using lock stateidJ. Bruce Fields2010-07-291-1/+3
| | | | | | | | | | | It is legal to perform a write using the lock stateid that was originally associated with a read lock, or with a file that was originally opened for read, but has since been upgraded. So, when checking the openmode, check the mode associated with the open stateid from which the lock was derived. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: miscellaneous process_open2 cleanupJ. Bruce Fields2010-07-291-9/+17
| | | | | | Move more work into helper functions. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: don't pretend to support write delegationsJ. Bruce Fields2010-07-291-0/+7
| | | | | | | | | | | | | | | | | The delegation code mostly pretends to support either read or write delegations. However, correct support for write delegations would require, for example, breaking of delegations (and/or implementation of cb_getattr) on stat. Currently all that stops us from handing out delegations is a subtle reference-counting issue. Avoid confusion by adding an earlier check that explicitly refuses write delegations. For now, though, I'm not going so far as to rip out existing half-support for write delegations, in case we get around to using that soon. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: bypass readahead cache when have struct fileJ. Bruce Fields2010-07-271-24/+38
| | | | | | | | | | | | | | | | | The readahead cache compensates for the fact that the NFS server currently does an open and close on every IO operation in the NFSv2 and NFSv3 case. In the NFSv4 case we have long-lived struct files associated with client opens, so there's no need for this. In fact, concurrent IO's using trying to modify the same file->f_ra may cause problems. So, don't bother with the readahead cache in that case. Note eventually we'll likely do this in the v2/v3 case as well by keeping a cache of struct files instead of struct file_ra_state's. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: minor nfsd_svc() cleanupJ. Bruce Fields2010-07-231-6/+7
| | | | | | More idiomatic to put the error case in the if clause. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: move more into nfsd_startup()J. Bruce Fields2010-07-231-34/+35
| | | | | | | | This is just cleanup--it's harmless to call nfsd_rachache_init, nfsd_init_socks, and nfsd_reset_versions more than once. But there's no point to it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: just keep single lockd reference for nfsdJeff Layton2010-07-232-21/+14
| | | | | | | | | | | | | Right now, nfsd keeps a lockd reference for each socket that it has open. This is unnecessary and complicates the error handling on startup and shutdown. Change it to just do a lockd_up when starting the first nfsd thread just do a single lockd_down when taking down the last nfsd thread. Because of the strange way the sv_count is handled this requires an extra flag to tell whether the nfsd_serv holds a reference for lockd or not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: clean up nfsd_create_serv error handlingJeff Layton2010-07-231-3/+2
| | | | | | | | There doesn't seem to be any need to reset the nfssvc_boot time if the nfsd startup failed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: fix error handling in __write_ports_addxprtJeff Layton2010-07-231-2/+4
| | | | | | | | | | | | | | | | | | __write_ports_addxprt calls nfsd_create_serv. That increases the refcount of nfsd_serv (which is tracked in sv_nrthreads). The service only decrements the thread count on error, not on success like __write_ports_addfd does, so using this interface leaves the nfsd thread count high. Fix this by having this function call svc_destroy() on error to release the reference (and possibly to tear down the service) and simply decrement the refcount without tearing down the service on success. This makes the sv_threads handling work basically the same in both __write_ports_addxprt and __write_ports_addfd. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: fix error handling when starting nfsd with rpcbind downJeff Layton2010-07-231-4/+8
| | | | | | | | | | | | | | | | | | | The refcounting for nfsd is a little goofy. What happens is that we create the nfsd RPC service, attach sockets to it but don't actually start the threads until someone writes to the "threads" procfile. To do this, __write_ports_addfd will create the nfsd service and then will decrement the refcount when exiting but won't actually destroy the service. This is fine when there aren't errors, but when there are this can cause later attempts to start nfsd to fail. nfsd_serv will be set, and that causes __write_versions to return EBUSY. Fix this by calling svc_destroy on nfsd_serv when this function is going to return error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd4: fix v4 state shutdown error pathsJeff Layton2010-07-232-20/+43
| | | | | | | | | | | | | | | | | | If someone tries to shut down the laundry_wq while it isn't up it'll cause an oops. This can happen because write_ports can create a nfsd_svc before we really start the nfs server, and we may fail before the server is ever started. Also make sure state is shutdown on error paths in nfsd_svc(). Use a common global nfsd_up flag instead of nfs4_init, and create common helper functions for nfsd start/shutdown, as there will be other work that we want done only when we the number of nfsd threads transitions between zero and nonzero. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* nfsd: remove unused assignment from nfsd_linkJ. Bruce Fields2010-07-231-2/+1
| | | | | | | Trivial cleanup, since "dest" is never used. Reported-by: Anshul Madan <Anshul.Madan@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIRChuck Lever2010-07-071-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Some well-known NFSv3 clients drop their directory entry caches when they receive replies with no WCC data. Without this data, they employ extra READ, LOOKUP, and GETATTR requests to ensure their directory entry caches are up to date, causing performance to suffer needlessly. In order to return WCC data, our server has to have both the pre-op and the post-op attribute data on hand when a reply is XDR encoded. The pre-op data is filled in when the incoming fh is locked, and the post-op data is filled in when the fh is unlocked. Unfortunately, for REMOVE, RMDIR, MKNOD, and MKDIR, the directory fh is not unlocked until well after the reply has been XDR encoded. This means that encode_wcc_data() does not have wcc_data for the parent directory, so none is returned to the client after these operations complete. By unlocking the parent directory fh immediately after the internal operations for each NFS procedure is complete, the post-op data is filled in before XDR encoding starts, so it can be returned to the client properly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: comment nitpickJ. Bruce Fields2010-07-061-1/+1
| | | | | Reported-by: "Madan, Anshul" <Anshul.Madan@netapp.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* sunrpc: make the cache cleaner workqueue deferrableArtem Bityutskiy2010-07-063-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the cache_cleaner workqueue deferrable, to prevent unnecessary system wake-ups, which is very important for embedded battery-powered devices. do_cache_clean() is called every 30 seconds at the moment, and often makes the system wake up from its power-save sleep state. With this change, when the workqueue uses a deferrable timer, the do_cache_clean() invocation will be delayed and combined with the closest "real" wake-up. This improves the power consumption situation. Note, I tried to create a DECLARE_DELAYED_WORK_DEFERRABLE() helper macro, similar to DECLARE_DELAYED_WORK(), but failed because of the way the timer wheel core stores the deferrable flag (it is the LSBit in the time->base pointer). My attempt to define a static variable with this bit set ended up with the "initializer element is not constant" error. Thus, I have to use run-time initialization, so I created a new cache_initialize() function which is called once when sunrpc is being initialized. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix delegation recall race use-after-freeJ. Bruce Fields2010-06-241-0/+1
| | | | | | | | | | When the rarely-used callback-connection-changing setclientid occurs simultaneously with a delegation recall, we rerun the recall by requeueing it on a workqueue. But we also need to take a reference on the delegation in that case, since the delegation held by the rpc itself will be released by the rpc_release callback. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix deleg leak on callback errorJ. Bruce Fields2010-06-241-1/+3
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove some debugging codeJ. Bruce Fields2010-06-221-2/+0
| | | | | | This is overkill. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: nfs4callback encode_stateid helper functionBenny Halevy2010-06-221-3/+13
| | | | | | | | | | | | To be used also for the pnfs cb_layoutrecall callback Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd4: fix cb_recall encoding] "nfsd: nfs4callback encode_stateid helper function" forgot to reserve more space after return from the new helper. Reported-by: Michael Groshans <groshans@citi.umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: translate memory errors to delay, not serverfaultJ. Bruce Fields2010-06-221-3/+3
| | | | | | | If the server is out of memory is better for clients to back off and retry than to just error out. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4; fix session reference count leakJ. Bruce Fields2010-06-222-1/+1
| | | | | | | Note the session has to be put() here regardless of what happens to the client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: don't bother storing callback reply tagJ. Bruce Fields2010-05-311-6/+5
| | | | | | We don't use this, and probably never will. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix use of op_share_accessJ. Bruce Fields2010-05-311-4/+12
| | | | | | | | | NFSv4.1 adds additional flags to the share_access argument of the open call. These flags need to be masked out in some of the existing code, but current code does that inconsistently. Tested-by: Michael Groshans <groshans@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: treat more recall errors as failuresJ. Bruce Fields2010-05-311-9/+8
| | | | | | | | | | | If a recall fails for some unexpected reason, instead of ignoring it and treating it like a success, it's safer to treat it as a failure, preventing further delgation grants and returning CB_PATH_DOWN. Also put put switches in a (two me) more logical order, with normal case first. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove extra put() on callback errorsJ. Bruce Fields2010-05-311-5/+1
| | | | | | | Since rpc_call_async() guarantees that the release method will be called even on failure, this put is wrong. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Linux 2.6.35-rc1Linus Torvalds2010-05-301-2/+2
| | | | .. and thus endeth the merge window.
* Merge branch 'slub/urgent' of ↵Linus Torvalds2010-05-302-29/+15
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: SLUB: Allow full duplication of kmalloc array for 390 slub: move kmem_cache_node into it's own cacheline
| * SLUB: Allow full duplication of kmalloc array for 390Christoph Lameter2010-05-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Commit 756dee75872a2a764b478e18076360b8a4ec9045 ("SLUB: Get rid of dynamic DMA kmalloc cache allocation") makes S390 run out of kmalloc caches. Increase the number of kmalloc caches to a safe size. Cc: <stable@kernel.org> [ .33 and .34 ] Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
| * slub: move kmem_cache_node into it's own cachelineAlexander Duyck2010-05-242-28/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is meant to improve the performance of SLUB by moving the local kmem_cache_node lock into it's own cacheline separate from kmem_cache. This is accomplished by simply removing the local_node when NUMA is enabled. On my system with 2 nodes I saw around a 5% performance increase w/ hackbench times dropping from 6.2 seconds to 5.9 seconds on average. I suspect the performance gain would increase as the number of nodes increases, but I do not have the data to currently back that up. Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15713 Cc: <stable@kernel.org> Reported-by: Alex Shi <alex.shi@intel.com> Tested-by: Alex Shi <alex.shi@intel.com> Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
* | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2010-05-301-0/+7
|\ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: mutex: Fix optimistic spinning vs. BKL
| * | mutex: Fix optimistic spinning vs. BKLTony Breeds2010-05-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we can hit a nasty case with optimistic spinning on mutexes: CPU A tries to take a mutex, while holding the BKL CPU B tried to take the BLK while holding the mutex This looks like a AB-BA scenario but in practice, is allowed and happens due to the auto-release on schedule() nature of the BKL. In that case, the optimistic spinning code can get us into a situation where instead of going to sleep, A will spin waiting for B who is spinning waiting for A, and the only way out of that loop is the need_resched() test in mutex_spin_on_owner(). This patch fixes it by completely disabling spinning if we own the BKL. This adds one more detail to the extensive list of reasons why it's a bad idea for kernel code to be holding the BKL. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: <stable@kernel.org> LKML-Reference: <20100519054636.GC12389@ozlabs.org> [ added an unlikely() attribute to the branch ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-05-308-16/+45
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf tui: Fix last use_browser problem related to .perfconfig perf symbols: Add the build id cache to the vmlinux path perf tui: Reset use_browser if stdout is not a tty ring-buffer: Move zeroing out excess in page to ring buffer code ring-buffer: Reset "real_end" when page is filled
| * \ \ Merge branch 'tip/perf/core' of ↵Ingo Molnar2010-05-292-8/+17
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
| | * | | ring-buffer: Move zeroing out excess in page to ring buffer codeSteven Rostedt2010-05-252-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the trace splice code zeros out the excess bytes in the page before sending it off to userspace. This is to make sure userspace is not getting anything it should not be when reading the pages, because the excess data was never initialized to zero before writing (for perfomance reasons). But the splice code has no business in doing this work, it should be done by the ring buffer. With the latest changes for recording lost events, the splice code gets it wrong anyway. Move the zeroing out of excess bytes into the ring buffer code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | * | | ring-buffer: Reset "real_end" when page is filledSteven Rostedt2010-05-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to store the "lost events" requires knowing the real end of the page. Since the 'commit' includes the padding at the end of a page a "real_end" variable was used to keep track of the end not including the padding. If events were lost, the reader can place the count of events in the padded area if there is enough room. The bug this patch fixes is that when we fill the page we do not reset the real_end variable, and if the writer had wrapped a few times, the real_end would be incorrect. This patch simply resets the real_end if the page was filled. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | perf tui: Fix last use_browser problem related to .perfconfigArnaldo Carvalho de Melo2010-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we moved to using ~/.perfconfig to set the value of use_browser, it changed from a boolean to an int so that the convention used for use_pager was followed. That convention is: -1: unspecified, that is what use_{browser,pager} is initialized 0: Don't use the browser (should be TUI), because was explicitely set to 0/off/false on ~/.perfconfig [tui] cmd =, or because we're redirecting the stdout to a file or piping it to some other command (!isatty()). 1: Use the TUI Some code was not properly audited and continued testing it as a boolean, this seems to be the last one. Reported-by: Frédéric Weisbecker <fweisbec@gmail.com> Tested-by: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf symbols: Add the build id cache to the vmlinux pathArnaldo Carvalho de Melo2010-05-263-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So that if the kernel DSO has a build id because record inserted it in the perf.data build id table in the header, or a BUILD_ID event was inserted in the stream, we first look at the build id cache ($HOME/.debug/). If we find it there, try to use it, allowing offline annotation in addition to 'perf report'. Reported-by: Stephane Eranian <eranian@google.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | perf tui: Reset use_browser if stdout is not a ttyArnaldo Carvalho de Melo2010-05-262-1/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newt initialization routines weren't being called because the output was a file (perf annotate > /tmp/bla) but use_browser was still 1, because ~/.perfconfig had it as 'on', so, later on newt routines segfaulted. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | | ia64: revert __node_random additionLinus Torvalds2010-05-301-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This partially reverts commit 4ec37de89d8c758ee8115e0e64b3f994910789ee ("[IA64] Fix build breakage"), since the commit that made it necessary got reverted earlier (see commit 35926ff5fba8, 'Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"') Even if we ever re-introduce this, there is no reason to make __node_random be some architecture-specific function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-05-307-82/+500
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: mm: export generic_pipe_buf_*() to modules fuse: support splice() reading from fuse device fuse: allow splice to move pages mm: export remove_from_page_cache() to modules mm: export lru_cache_add_*() to modules fuse: support splice() writing to fuse device fuse: get page reference for readpages fuse: use get_user_pages_fast() fuse: remove unneeded variable
| * | | | mm: export generic_pipe_buf_*() to modulesMiklos Szeredi2010-05-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed by fuse device code which wants to create pipe buffers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | fuse: support splice() reading from fuse deviceMiklos Szeredi2010-05-251-41/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow userspace filesystem implementation to use splice() to read from the fuse device. The userspace filesystem can now transfer data coming from a WRITE request to an arbitrary file descriptor (regular file, block device or socket) without having to go through a userspace buffer. The semantics of using splice() to read messages are: 1) with a single splice() call move the whole message from the fuse device to a temporary pipe 2) read the header from the pipe and determine the message type 3a) if message is a WRITE then splice data from pipe to destination 3b) else read rest of message to userspace buffer Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | fuse: allow splice to move pagesMiklos Szeredi2010-05-253-15/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. zero copy read capability. The following steps are performed when trying to move a page into the page cache: - buf->ops->confirm() to make sure the new page is uptodate - buf->ops->steal() to try to remove the new page from it's previous place - remove_from_page_cache() on the old page - add_to_page_cache_locked() on the new page If any of the above steps fail (non fatally) then the code falls back to copying the page. In particular ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page. Also since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed. A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc... These could be moved into generic splice/steal code. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | mm: export remove_from_page_cache() to modulesMiklos Szeredi2010-05-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed to enable moving pages into the page cache in fuse with splice(..., SPLICE_F_MOVE). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | mm: export lru_cache_add_*() to modulesMiklos Szeredi2010-05-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed to enable moving pages into the page cache in fuse with splice(..., SPLICE_F_MOVE). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>