aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/workqueue.h
Commit message (Collapse)AuthorAgeFilesLines
* Block: use a freezable workqueue for disk-event pollingAlan Stern2012-03-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 upstream. This patch (as1519) fixes a bug in the block layer's disk-events polling. The polling is done by a work routine queued on the system_nrt_wq workqueue. Since that workqueue isn't freezable, the polling continues even in the middle of a system sleep transition. Obviously, polling a suspended drive for media changes and such isn't a good thing to do; in the case of USB mass-storage devices it can lead to real problems requiring device resets and even re-enumeration. The patch fixes things by creating a new system-wide, non-reentrant, freezable workqueue and using it for disk-events polling. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* workqueue: fix build failure introduced by s/freezeable/freezable/Tejun Heo2011-02-211-3/+3
| | | | | | | | | | wq:fixes-2.6.38 does s/WQ_FREEZEABLE/WQ_FREEZABLE and wq:for-2.6.39 adds new usage of the flag. The combination of the two creates a build failure after merge. Fix it by renaming all freezeables to freezables. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
* Merge branch 'master' into for-2.6.39Tejun Heo2011-02-211-4/+4
|\
| * workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'Tejun Heo2011-02-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>
* | workqueue: add system_freezeable_wqTejun Heo2011-02-091-0/+4
|/ | | | | | | | Add system wide freezeable workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
* Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2011-01-071-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits) usb: don't use flush_scheduled_work() speedtch: don't abuse struct delayed_work media/video: don't use flush_scheduled_work() media/video: explicitly flush request_module work ioc4: use static work_struct for ioc4_load_modules() init: don't call flush_scheduled_work() from do_initcalls() s390: don't use flush_scheduled_work() rtc: don't use flush_scheduled_work() mmc: update workqueue usages mfd: update workqueue usages dvb: don't use flush_scheduled_work() leds-wm8350: don't use flush_scheduled_work() mISDN: don't use flush_scheduled_work() macintosh/ams: don't use flush_scheduled_work() vmwgfx: don't use flush_scheduled_work() tpm: don't use flush_scheduled_work() sonypi: don't use flush_scheduled_work() hvsi: don't use flush_scheduled_work() xen: don't use flush_scheduled_work() gdrom: don't use flush_scheduled_work() ... Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c as per Tejun.
| * workqueue: deprecate cancel_rearming_delayed_work[queue]()Tejun Heo2010-12-151-2/+2
| | | | | | | | | | | | | | | | There's no in-kernel user left for these two obsolete functions. Mark them deprecated and schedule for removal during 2.6.39 cycle. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'timers-for-linus' of ↵Linus Torvalds2011-01-061-0/+8
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: MAINTAINERS: Update timer related entries timers: Use this_cpu_read timerqueue: Make timerqueue_getnext() static inline hrtimer: fix timerqueue conversion flub hrtimers: Convert hrtimers to use timerlist infrastructure timers: Fixup allmodconfig build issue timers: Rename timerlist infrastructure to timerqueue timers: Introduce timerlist infrastructure. hrtimer: Remove stale comment on curr_timer timer: Warn when del_timer_sync() is called in hardirq context timer: Del_timer_sync() can be used in softirq context timer: Make try_to_del_timer_sync() the same on SMP and UP posix-timers: Annotate lock_timer() timer: Permit statically-declared work with deferrable timers time: Use ARRAY_SIZE macro in timecompare.c timer: Initialize the field slack of timer_list timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS time: Compensate for rounding on odd-frequency clocksources Fix up trivial conflict in MAINTAINERS
| * timer: Permit statically-declared work with deferrable timersPhil Carmody2010-10-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, you have to just define a delayed_work uninitialised, and then initialise it before first use. That's a tad clumsy. At risk of playing mind-games with the compiler, fooling it into doing pointer arithmetic with compile-time-constants, this lets clients properly initialise delayed work with deferrable timers statically. This patch was inspired by the issues which lead Artem Bityutskiy to commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue deferrable"). Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | workqueues: s/ON_STACK/ONSTACK/Andrew Morton2010-10-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Silly though it is, completions and wait_queue_heads use foo_ONSTACK (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK, __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I guess workqueues should do the same thing. s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/ s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/ Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | workqueue: remove in_workqueue_context()Tejun Heo2010-10-191-4/+0
| | | | | | | | | | | | | | | | | | | | Commit a25909a4 (lockdep: Add an in_workqueue_context() lockdep-based test function) added in_workqueue_context() but there hasn't been any in-kernel user and the lockdep annotation in workqueue is scheduled to change. Remove the unused function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | workqueue: add and use WQ_MEM_RECLAIM flagTejun Heo2010-10-111-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark WQ_RESCUER as internal and replace all external WQ_RESCUER usages to WQ_MEM_RECLAIM. This makes the API users express the intent of the workqueue instead of indicating the internal mechanism used to guarantee forward progress. This is also to make it cleaner to add more semantics to WQ_MEM_RECLAIM. For example, if deemed necessary, memory reclaim workqueues can be made highpri. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Steven Whitehouse <swhiteho@redhat.com>
* | workqueue: implement flush[_delayed]_work_sync()Tejun Heo2010-09-191-0/+2
| | | | | | | | | | | | | | | | | | | | Implement flush[_delayed]_work_sync(). These are flush functions which also make sure no CPU is still executing the target work from earlier queueing instances. These are similar to cancel[_delayed]_work_sync() except that the target work item is flushed instead of cancelled. Signed-off-by: Tejun Heo <tj@kernel.org>
* | workqueue: cleanup flush/cancel functionsTejun Heo2010-09-191-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the following cleanup changes. * Relocate flush/cancel function prototypes and definitions. * Relocate wait_on_cpu_work() and wait_on_work() before try_to_grab_pending(). These will be used to implement flush_work_sync(). * Make all flush/cancel functions return bool instead of int. * Update wait_on_cpu_work() and wait_on_work() to return %true if they actually waited. * Add / update comments. This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
* | workqueue: implement alloc_ordered_workqueue()Tejun Heo2010-09-191-0/+18
| | | | | | | | | | | | | | | | | | alloc_ordered_workqueue() creates a workqueue which processes each work itemp one by one in the queued order. This will be used to replace create_freezeable_workqueue() and create_singlethread_workqueue(). Signed-off-by: Tejun Heo <tj@kernel.org>
* | workqueue: add documentationTejun Heo2010-09-131-0/+4
|/ | | | | | | | | | | | | Update copyright notice and add Documentation/workqueue.txt. Randy Dunlap, Dave Chinner: misc fixes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-By: Florian Mickler <florian@mickler.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Dave Chinner <david@fromorbit.com>
* workqueue: fix cwq->nr_active underflowTejun Heo2010-08-251-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | cwq->nr_active is used to keep track of how many work items are active for the cpu workqueue, where 'active' is defined as either pending on global worklist or executing. This is used to implement the max_active limit and workqueue freezing. If a work item is queued after nr_active has already reached max_active, the work item doesn't increment nr_active and is put on the delayed queue and gets activated later as previous active work items retire. try_to_grab_pending() which is used in the cancellation path unconditionally decremented nr_active whether the work item being cancelled is currently active or delayed, so cancelling a delayed work item makes nr_active underflow. This breaks max_active enforcement and triggers BUG_ON() in destroy_workqueue() later on. This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which is set while a work item in on the delayed list and making try_to_grab_pending() decrement nr_active iff the work item is currently active. The addition of the flag enlarges cwq alignment to 256 bytes which is getting a bit too large. It's scheduled to be reduced back to 128 bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next devel cycle. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Berg <johannes@sipsolutions.net>
* workqueue: improve destroy_workqueue() debuggabilityTejun Heo2010-08-241-0/+2
| | | | | | | | | | | | | | | | | | | Now that the worklist is global, having works pending after wq destruction can easily lead to oops and destroy_workqueue() have several BUG_ON()s to catch these cases. Unfortunately, BUG_ON() doesn't tell much about how the work became pending after the final flush_workqueue(). This patch adds WQ_DYING which is set before the final flush begins. If a work is requested to be queued on a dying workqueue, WARN_ON_ONCE() is triggered and the request is ignored. This clearly indicates which caller is trying to queue a work on a dying workqueue and keeps the system working in most cases. Locking rule comment is updated such that the 'I' rule includes modifying the field from destruction path. Signed-off-by: Tejun Heo <tj@kernel.org>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2010-08-071-25/+129
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_*cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
| * workqueue: mark init_workqueues() as early_initcall()Suresh Siddha2010-08-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark init_workqueues() as early_initcall() and thus it will be initialized before smp bringup. init_workqueues() registers for the hotcpu notifier and thus it should cope with the processors that are brought online after the workqueues are initialized. x86 smp bringup code uses workqueues and uses a workaround for the cold boot process (as the workqueues are initialized post smp_init()). Marking init_workqueues() as early_initcall() will pave the way for cleaning up this code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>
| * workqueue: fix how cpu number is stored in work->dataTejun Heo2010-07-221-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once a work starts execution, its data contains the cpu number it was on instead of pointing to cwq. This is added by commit 7a22ad75 (workqueue: carry cpu number in work data once execution starts) to reliably determine the work was last on even if the workqueue itself was destroyed inbetween. Whether data points to a cwq or contains a cpu number was distinguished by comparing the value against PAGE_OFFSET. The assumption was that a cpu number should be below PAGE_OFFSET while a pointer to cwq should be above it. However, on architectures which use separate address spaces for user and kernel spaces, this doesn't hold as PAGE_OFFSET is zero. Fix it by using an explicit flag, WORK_STRUCT_CWQ, to mark what the data field contains. If the flag is set, it's pointing to a cwq; otherwise, it contains a cpu number. Reported on s390 and microblaze during linux-next testing. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sachin Sant <sachinp@in.ibm.com> Reported-by: Michal Simek <michal.simek@petalogix.com> Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: Michal Simek <monstr@monstr.eu>
| * workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND insteadTejun Heo2010-07-021-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WQ_SINGLE_CPU combined with @max_active of 1 is used to achieve full ordering among works queued to a workqueue. The same can be achieved using WQ_UNBOUND as unbound workqueues always use the gcwq for WORK_CPU_UNBOUND. As @max_active is always one and benefits from cpu locality isn't accessible anyway, serving them with unbound workqueues should be fine. Drop WQ_SINGLE_CPU support and use WQ_UNBOUND instead. Note that most single thread workqueue users will be converted to use multithread or non-reentrant instead and only the ones which require strict ordering will keep using WQ_UNBOUND + @max_active of 1. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: implement unbound workqueueTejun Heo2010-07-021-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements unbound workqueue which can be specified with WQ_UNBOUND flag on creation. An unbound workqueue has the following properties. * It uses a dedicated gcwq with a pseudo CPU number WORK_CPU_UNBOUND. This gcwq is always online and disassociated. * Workers are not bound to any CPU and not concurrency managed. Works are dispatched to workers as soon as possible and the only applied limitation is @max_active. IOW, all unbound workqeueues are implicitly high priority. Unbound workqueues can be used as simple execution context provider. Contexts unbound to any cpu are served as soon as possible. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Howells <dhowells@redhat.com>
| * workqueue: prepare for WQ_UNBOUND implementationTejun Heo2010-07-021-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation of WQ_UNBOUND addition, make the following changes. * Add WORK_CPU_* constants for pseudo cpu id numbers used (currently only WORK_CPU_NONE) and use them instead of NR_CPUS. This is to allow another pseudo cpu id for unbound cpu. * Reorder WQ_* flags. * Make workqueue_struct->cpu_wq a union which contains a percpu pointer, regular pointer and an unsigned long value and use kzalloc/kfree() in UP allocation path. This will be used to implement unbound workqueues which will use only one cwq on SMPs. * Move alloc_cwqs() allocation after initialization of wq fields, so that alloc_cwqs() has access to wq->flags. * Trivial relocation of wq local variables in freeze functions. These changes don't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: implement cpu intensive workqueueTejun Heo2010-06-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements cpu intensive workqueue which can be specified with WQ_CPU_INTENSIVE flag on creation. Works queued to a cpu intensive workqueue don't participate in concurrency management. IOW, it doesn't contribute to gcwq->nr_running and thus doesn't delay excution of other works. Note that although cpu intensive works won't delay other works, they can be delayed by other works. Combine with WQ_HIGHPRI to avoid being delayed by other works too. As the name suggests this is useful when using workqueue for cpu intensive works. Workers executing cpu intensive works are not considered for workqueue concurrency management and left for the scheduler to manage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
| * workqueue: implement high priority workqueueTejun Heo2010-06-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements high priority workqueue which can be specified with WQ_HIGHPRI flag on creation. A high priority workqueue has the following properties. * A work queued to it is queued at the head of the worklist of the respective gcwq after other highpri works, while normal works are always appended at the end. * As long as there are highpri works on gcwq->worklist, [__]need_more_worker() remains %true and process_one_work() wakes up another worker before it start executing a work. The above two properties guarantee that works queued to high priority workqueues are dispatched to workers and start execution as soon as possible regardless of the state of other works. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org>
| * workqueue: implement several utility APIsTejun Heo2010-06-291-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the following utility APIs. workqueue_set_max_active() : adjust max_active of a wq workqueue_congested() : test whether a wq is contested work_cpu() : determine the last / current cpu of a work work_busy() : query whether a work is busy * Anton Blanchard fixed missing ret initialization in work_busy(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Anton Blanchard <anton@samba.org>
| * workqueue: s/__create_workqueue()/alloc_workqueue()/, and add system workqueuesTejun Heo2010-06-291-11/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes changes to make new workqueue features available to its users. * Now that workqueue is more featureful, there should be a public workqueue creation function which takes paramters to control them. Rename __create_workqueue() to alloc_workqueue() and make 0 max_active mean WQ_DFL_ACTIVE. In the long run, all create_workqueue_*() will be converted over to alloc_workqueue(). * To further unify access interface, rename keventd_wq to system_wq and export it. * Add system_long_wq and system_nrt_wq. The former is to host long running works separately (so that flush_scheduled_work() dosen't take so long) and the latter guarantees any queued work item is never executed in parallel by multiple CPUs. These will be used by future patches to update workqueue users. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: increase max_active of keventd and kill current_is_keventd()Tejun Heo2010-06-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define WQ_MAX_ACTIVE and create keventd with max_active set to half of it which means that keventd now can process upto WQ_MAX_ACTIVE / 2 - 1 works concurrently. Unless some combination can result in dependency loop longer than max_active, deadlock won't happen and thus it's unnecessary to check whether current_is_keventd() before trying to schedule a work. Kill current_is_keventd(). (Lockdep annotations are broken. We need lock_map_acquire_read_norecurse()) Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com>
| * workqueue: implement concurrency managed dynamic worker poolTejun Heo2010-06-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of creating a worker for each cwq and putting it into the shared pool, manage per-cpu workers dynamically. Works aren't supposed to be cpu cycle hogs and maintaining just enough concurrency to prevent work processing from stalling due to lack of processing context is optimal. gcwq keeps the number of concurrent active workers to minimum but no less. As long as there's one or more running workers on the cpu, no new worker is scheduled so that works can be processed in batch as much as possible but when the last running worker blocks, gcwq immediately schedules new worker so that the cpu doesn't sit idle while there are works to be processed. gcwq always keeps at least single idle worker around. When a new worker is necessary and the worker is the last idle one, the worker assumes the role of "manager" and manages the worker pool - ie. creates another worker. Forward-progress is guaranteed by having dedicated rescue workers for workqueues which may be necessary while creating a new worker. When the manager is having problem creating a new worker, mayday timer activates and rescue workers are summoned to the cpu and execute works which might be necessary to create new workers. Trustee is expanded to serve the role of manager while a CPU is being taken down and stays down. As no new works are supposed to be queued on a dead cpu, it just needs to drain all the existing ones. Trustee continues to try to create new workers and summon rescuers as long as there are pending works. If the CPU is brought back up while the trustee is still trying to drain the gcwq from the previous offlining, the trustee will kill all idles ones and tell workers which are still busy to rebind to the cpu, and pass control over to gcwq which assumes the manager role as necessary. Concurrency managed worker pool reduces the number of workers drastically. Only workers which are necessary to keep the processing going are created and kept. Also, it reduces cache footprint by avoiding unnecessarily switching contexts between different workers. Please note that this patch does not increase max_active of any workqueue. All workqueues can still only process one work per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: implement WQ_NON_REENTRANTTejun Heo2010-06-291-0/+1
| | | | | | | | | | | | | | | | | | With gcwq managing all the workers and work->data pointing to the last gcwq it was on, non-reentrance can be easily implemented by checking whether the work is still running on the previous gcwq on queueing. Implement it. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: carry cpu number in work data once execution startsTejun Heo2010-06-291-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To implement non-reentrant workqueue, the last gcwq a work was executed on must be reliably obtainable as long as the work structure is valid even if the previous workqueue has been destroyed. To achieve this, work->data will be overloaded to carry the last cpu number once execution starts so that the previous gcwq can be located reliably. This means that cwq can't be obtained from work after execution starts but only gcwq. Implement set_work_{cwq|cpu}(), get_work_[g]cwq() and clear_work_data() to set work data to the cpu number when starting execution, access the overloaded work data and clear it after cancellation. queue_delayed_work_on() is updated to preserve the last cpu while in-flight in timer and other callers which depended on getting cwq from work after execution starts are converted to depend on gcwq instead. * Anton Blanchard fixed compile error on powerpc due to missing linux/threads.h include. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Anton Blanchard <anton@samba.org>
| * workqueue: make single thread workqueue shared worker pool friendlyTejun Heo2010-06-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reimplement st (single thread) workqueue so that it's friendly to shared worker pool. It was originally implemented by confining st workqueues to use cwq of a fixed cpu and always having a worker for the cpu. This implementation isn't very friendly to shared worker pool and suboptimal in that it ends up crossing cpu boundaries often. Reimplement st workqueue using dynamic single cpu binding and cwq->limit. WQ_SINGLE_THREAD is replaced with WQ_SINGLE_CPU. In a single cpu workqueue, at most single cwq is bound to the wq at any given time. Arbitration is done using atomic accesses to wq->single_cpu when queueing a work. Once bound, the binding stays till the workqueue is drained. Note that the binding is never broken while a workqueue is frozen. This is because idle cwqs may have works waiting in delayed_works queue while frozen. On thaw, the cwq is restarted if there are any delayed works or unbound otherwise. When combined with max_active limit of 1, single cpu workqueue has exactly the same execution properties as the original single thread workqueue while allowing sharing of per-cpu workers. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: reimplement workqueue freeze using max_activeTejun Heo2010-06-291-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, workqueue freezing is implemented by marking the worker freezeable and calling try_to_freeze() from dispatch loop. Reimplement it using cwq->limit so that the workqueue is frozen instead of the worker. * workqueue_struct->saved_max_active is added which stores the specified max_active on initialization. * On freeze, all cwq->max_active's are quenched to zero. Freezing is complete when nr_active on all cwqs reach zero. * On thaw, all cwq->max_active's are restored to wq->saved_max_active and the worklist is repopulated. This new implementation allows having single shared pool of workers per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: implement per-cwq active work limitTejun Heo2010-06-291-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add cwq->nr_active, cwq->max_active and cwq->delayed_work. nr_active counts the number of active works per cwq. A work is active if it's flushable (colored) and is on cwq's worklist. If nr_active reaches max_active, new works are queued on cwq->delayed_work and activated later as works on the cwq complete and decrement nr_active. cwq->max_active can be specified via the new @max_active parameter to __create_workqueue() and is set to 1 for all workqueues for now. As each cwq has only single worker now, this double queueing doesn't cause any behavior difference visible to its users. This will be used to reimplement freeze/thaw and implement shared worker pool. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: reimplement work flushing using linked worksTejun Heo2010-06-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | A work is linked to the next one by having WORK_STRUCT_LINKED bit set and these links can be chained. When a linked work is dispatched to a worker, all linked works are dispatched to the worker's newly added ->scheduled queue and processed back-to-back. Currently, as there's only single worker per cwq, having linked works doesn't make any visible behavior difference. This change is to prepare for multiple shared workers per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: reimplement workqueue flushing using color coded worksTejun Heo2010-06-291-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reimplement workqueue flushing using color coded works. wq has the current work color which is painted on the works being issued via cwqs. Flushing a workqueue is achieved by advancing the current work colors of cwqs and waiting for all the works which have any of the previous colors to drain. Currently there are 16 possible colors, one is reserved for no color and 15 colors are useable allowing 14 concurrent flushes. When color space gets full, flush attempts are batched up and processed together when color frees up, so even with many concurrent flushers, the new implementation won't build up huge queue of flushers which has to be processed one after another. Only works which are queued via __queue_work() are colored. Works which are directly put on queue using insert_work() use NO_COLOR and don't participate in workqueue flushing. Currently only works used for work-specific flush fall in this category. This new implementation leaves only cleanup_workqueue_thread() as the user of flush_cpu_workqueue(). Just make its users use flush_workqueue() and kthread_stop() directly and kill cleanup_workqueue_thread(). As workqueue flushing doesn't use barrier request anymore, the comment describing the complex synchronization around it in cleanup_workqueue_thread() is removed together with the function. This new implementation is to allow having and sharing multiple workers per cpu. Please note that one more bit is reserved for a future work flag by this patch. This is to avoid shifting bits and updating comments later. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: update cwq alignementTejun Heo2010-06-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | work->data field is used for two purposes. It points to cwq it's queued on and the lower bits are used for flags. Currently, two bits are reserved which is always safe as 4 byte alignment is guaranteed on every architecture. However, future changes will need more flag bits. On SMP, the percpu allocator is capable of honoring larger alignment (there are other users which depend on it) and larger alignment works just fine. On UP, percpu allocator is a thin wrapper around kzalloc/kfree() and don't honor alignment request. This patch introduces WORK_STRUCT_FLAG_BITS and implements alloc/free_cwqs() which guarantees max(1 << WORK_STRUCT_FLAG_BITS, __alignof__(unsigned long long) alignment both on SMP and UP. On SMP, simply wrapping percpu allocator is enough. On UP, extra space is allocated so that cwq can be aligned and the original pointer can be stored after it which is used in the free path. * Alignment problem on UP is reported by Michal Simek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Michal Simek <michal.simek@petalogix.com>
| * workqueue: define masks for work flags and conditionalize STATIC flagsTejun Heo2010-06-291-8/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | Work flags are about to see more traditional mask handling. Define WORK_STRUCT_*_BIT as the bit position constant and redefine WORK_STRUCT_* as bit masks. Also, make WORK_STRUCT_STATIC_* flags conditional While at it, re-define these constants as enums and use WORK_STRUCT_STATIC instead of hard-coding 2 in WORK_DATA_STATIC_INIT(). Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: merge feature parameters into flagsTejun Heo2010-06-291-10/+15
| | | | | | | | | | | | | | | | | | Currently, __create_workqueue_key() takes @singlethread and @freezeable paramters and store them separately in workqueue_struct. Merge them into a single flags parameter and field and use WQ_FREEZEABLE and WQ_SINGLE_THREAD. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: misc/cosmetic updatesTejun Heo2010-06-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the following updates in preparation of concurrency managed workqueue. None of these changes causes any visible behavior difference. * Add comments and adjust indentations to data structures and several functions. * Rename wq_per_cpu() to get_cwq() and swap the position of two parameters for consistency. Convert a direct per_cpu_ptr() access to wq->cpu_wq to get_cwq(). * Add work_static() and Update set_wq_data() such that it sets the flags part to WORK_STRUCT_PENDING | WORK_STRUCT_STATIC if static | @extra_flags. * Move santiy check on work->entry emptiness from queue_work_on() to __queue_work() which all queueing paths share. * Make __queue_work() take @cpu and @wq instead of @cwq. * Restructure flush_work() and __create_workqueue_key() to make them easier to modify. Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: kill RT workqueueTejun Heo2010-06-291-11/+9
| | | | | | | | | | | | | | With stop_machine() converted to use cpu_stop, RT workqueue doesn't have any user left. Kill RT workqueue support. Signed-off-by: Tejun Heo <tj@kernel.org>
* | lockdep: Add an in_workqueue_context() lockdep-based test functionPaul E. McKenney2010-06-141-0/+4
|/ | | | | | | | | | | | | | Some recent uses of RCU make use of workqueues. In these uses, execution within the context of a specific workqueue takes the place of the usual RCU read-side primitives such as rcu_read_lock(), and flushing of workqueues takes the place of the usual RCU grace-period primitives. Checking for correct use of rcu_dereference() in such cases requires a test of whether the code is executing in the context of a particular workqueue. This commit adds an in_workqueue_context() function that provides this test. This new function is only defined when lockdep is enabled, which allows it to be used as the second argument of rcu_dereference_check(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* workqueue: Add debugobjects supportThomas Gleixner2009-11-161-11/+27
| | | | | | | | | | Add debugobject support to track the life time of work_structs. While at it, remove duplicate definition of INIT_DELAYED_WORK_ON_STACK(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: add 'flush_delayed_work()' to run and wait for delayed workLinus Torvalds2009-10-141-0/+1
| | | | | It basically turns a delayed work into an immediate work, and then waits for it to finish.
* Change "useing" -> "using".Dmitri Vorobiev2009-09-211-1/+1
| | | | | Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* workqueues: introduce __cancel_delayed_work()Oleg Nesterov2009-09-051-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | cancel_delayed_work() has to use del_timer_sync() to guarantee the timer function is not running after return. But most users doesn't actually need this, and del_timer_sync() has problems: it is not useable from interrupt, and it depends on every lock which could be taken from irq. Introduce __cancel_delayed_work() which calls del_timer() instead. The immediate reason for this patch is http://bugzilla.kernel.org/show_bug.cgi?id=13757 but hopefully this helper makes sense anyway. As for 13757 bug, actually we need requeue_delayed_work(), but its semantics are not yet clear. Merge this patch early to resolves cross-tree interdependencies between input and infiniband. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* workqueue: add to_delayed_work() helper functionJean Delvare2009-04-021-0/+5
| | | | | | | | | | | | | | | | | | | It is a fairly common operation to have a pointer to a work and to need a pointer to the delayed work it is contained in. In particular, all delayed works which want to rearm themselves will have to do that. So it would seem fair to offer a helper function for this operation. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2009-01-261-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: add and use INIT_WORK_ON_STACK rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR relay: fix lock imbalance in relay_late_setup_files oprofile: fix uninitialized use of struct op_entry rcu: move Kconfig menu softlock: fix false panic which can occur if softlockup_thresh is reduced rcu: add __cpuinit to rcu_init_percpu_data()
| * debugobjects: add and use INIT_WORK_ON_STACKThomas Gleixner2009-01-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Impact: Fix debugobjects warning debugobject enabled kernels spit out a warning in hpet code due to a workqueue which is initialized on stack. Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it in hpet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>