From 955234755ce4a2c33cfc558912aa8f2148cc1fc6 Mon Sep 17 00:00:00 2001
From: Paul Wise <pabs3@bonedaddy.net>
Date: Sat, 1 Aug 2009 21:30:31 +0900
Subject: vfat: change the default from shortname=lower to shortname=mixed

Because, with "shortname=lower", copying one FAT filesystem tree to
another FAT filesystem tree using Linux results in semantically
different filesystems. (E.g.: Filenames which were once "all
uppercase" are now "all lowercase").

So, this changes the default of "shortname=lower" to "shortname=mixed".

Signed-off-by: Paul Wise <pabs3@bonedaddy.net>
[change fat_show_options()]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---
 Documentation/filesystems/vfat.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index b58b84b..eed520f 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -102,7 +102,7 @@ shortname=lower|win95|winnt|mixed
 		 winnt: emulate the Windows NT rule for display/create.
 		 mixed: emulate the Windows NT rule for display,
 			emulate the Windows 95 rule for create.
-		 Default setting is `lower'.
+		 Default setting is `mixed'.
 
 tz=UTC        -- Interpret timestamps as UTC rather than local time.
                  This option disables the conversion of timestamps
-- 
cgit v1.1


From 8365388827663bd6fb773e3623ed9023c0f82b1d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 29 Sep 2009 15:59:34 -0400
Subject: ext4: Update documentation about quota mount options

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 Documentation/filesystems/ext4.txt | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 18b5ec8..bf4f4b7 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -282,9 +282,16 @@ stripe=n		Number of filesystem blocks that mballoc will try
 			to use for allocation size and alignment. For RAID5/6
 			systems this should be the number of data
 			disks *  RAID chunk size in file system blocks.
-delalloc	(*)	Deferring block allocation until write-out time.
-nodelalloc		Disable delayed allocation. Blocks are allocation
-			when data is copied from user to page cache.
+
+delalloc	(*)	Defer block allocation until just before ext4
+			writes out the block(s) in question.  This
+			allows ext4 to better allocation decisions
+			more efficiently.
+nodelalloc		Disable delayed allocation.  Blocks are allocated
+			when the data is copied from userspace to the
+			page cache, either via the write(2) system call
+			or when an mmap'ed page which was previously
+			unallocated is written for the first time.
 
 max_batch_time=usec	Maximum amount of time ext4 should wait for
 			additional filesystem operations to be batch
-- 
cgit v1.1


From a72cb4bc8590d222ac27205444d7f0dcf47ab1d5 Mon Sep 17 00:00:00 2001
From: Miguel de Barros <miguel.de.barros@bluewin.ch>
Date: Sun, 27 Sep 2009 22:11:21 +0200
Subject: ALSA: hda - Analog Devices AD1984A add HP Touchsmart model

Reference: ALSA bug #0004614
https://bugtrack.alsa-project.org/alsa-bug/view.php?id=4614

port-A (0x11)      - front hp-out
port-D (0x12)      - rear line out
port-E (0x1c)      - front mic-in
port-F (0x16)      - Internal speakers
digital-mic (0x17) - Internal mic

init verbs, mixers, jack sensing and PCI_QUIRK to support this hardware

Signed-off-by: Miguel de Barros <miguel.de.barros@bluewin.ch>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 Documentation/sound/alsa/HD-Audio-Models.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index 97eebd6..a2643cf 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -209,6 +209,7 @@ AD1884A / AD1883 / AD1984A / AD1984B
   laptop	laptop with HP jack sensing
   mobile	mobile devices with HP jack sensing
   thinkpad	Lenovo Thinkpad X300
+  touchsmart	HP Touchsmart
 
 AD1884
 ======
-- 
cgit v1.1


From 296c355cd6443d89fa251885a8d78778fe111dc4 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Wed, 30 Sep 2009 00:32:42 -0400
Subject: ext4: Use tracepoints for mb_history trace file

The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.

By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 Documentation/filesystems/proc.txt | 1 -
 1 file changed, 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index b5aee78..2c48f94 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1113,7 +1113,6 @@ Table 1-12: Files in /proc/fs/ext4/<devname>
 ..............................................................................
  File            Content                                        
  mb_groups       details of multiblock allocator buddy cache of free blocks
- mb_history      multiblock allocation history
 ..............................................................................
 
 
-- 
cgit v1.1


From ea3acb199a5d7e4da1de0a4288eba993b29f33b9 Mon Sep 17 00:00:00 2001
From: Jason Wessel <jason.wessel@windriver.com>
Date: Thu, 24 Sep 2009 09:08:30 -0500
Subject: x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg

Commit c953094 ("early_printk: Allow more than one early console")
introduced a regression in the parsing of the earlyprintk= kernel
arguments.

If you specify "earlyprintk=serial,ttyS0,115200" as a kernel
argument, the "serial,ttyS" should be parsed as a single argument
and not as "serial" and then "ttyS".

Also update the documentation to reflect you can specify the ttyS
directly without the "serial" argument.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
LKML-Reference: <4ABB7D5E.6000301@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/kernel-parameters.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6fa7292..9107b38 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -671,6 +671,7 @@ and is between 256 and 4096 characters. It is defined in the file
 	earlyprintk=	[X86,SH,BLACKFIN]
 			earlyprintk=vga
 			earlyprintk=serial[,ttySn[,baudrate]]
+			earlyprintk=ttySn[,baudrate]
 			earlyprintk=dbgp[debugController#]
 
 			Append ",keep" to not disable it when the real console
-- 
cgit v1.1


From 610ea6c671685a09afff7ba521bdccda21c84c76 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij@stericsson.com>
Date: Thu, 1 Oct 2009 14:31:22 +0100
Subject: ARM: 5738/1: Correct TCM documentation

It turns out that the TCM memory can be remap:ed by the MMU just
like any other memory.

Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 Documentation/arm/tcm.txt | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/arm/tcm.txt b/Documentation/arm/tcm.txt
index 074f4be..77fd937 100644
--- a/Documentation/arm/tcm.txt
+++ b/Documentation/arm/tcm.txt
@@ -29,11 +29,13 @@ TCM location and size. Notice that this is not a MMU table: you
 actually move the physical location of the TCM around. At the
 place you put it, it will mask any underlying RAM from the
 CPU so it is usually wise not to overlap any physical RAM with
-the TCM. The TCM memory exists totally outside the MMU and will
-override any MMU mappings.
+the TCM.
 
-Code executing inside the ITCM does not "see" any MMU mappings
-and e.g. register accesses must be made to physical addresses.
+The TCM memory can then be remapped to another address again using
+the MMU, but notice that the TCM if often used in situations where
+the MMU is turned off. To avoid confusion the current Linux
+implementation will map the TCM 1 to 1 from physical to virtual
+memory in the location specified by the machine.
 
 TCM is used for a few things:
 
-- 
cgit v1.1


From d6f4965d7d2e718eb9b223cb06db5f6a53b73507 Mon Sep 17 00:00:00 2001
From: Andrew Patterson <andrew.patterson@hp.com>
Date: Thu, 17 Sep 2009 13:47:03 -0500
Subject: cciss: Allow triggering of rescan of logical drive topology via sysfs
 entry

Added /sys/bus/pci/devices/<dev>/ccissX/rescan sysfs entry used
to kick off a rescan that discovers logical drive topology changes.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
index 0a92a7c..ac3429d 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -31,3 +31,10 @@ Date:		March 2009
 Kernel Version: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	A symbolic link to /sys/block/cciss!cXdY
+
+Where:		/sys/bus/pci/devices/<dev>/ccissX/rescan
+Date:		August 2009
+Kernel Version:	2.6.31
+Contact:	iss_storagedev@hp.com
+Description:	Kicks of a rescan of the controller to discover logical
+		drive topology changes.
-- 
cgit v1.1


From ce84a8aeac4a4a2cc421b3145dd2fb7cae860e4d Mon Sep 17 00:00:00 2001
From: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Date: Thu, 17 Sep 2009 13:48:10 -0500
Subject: cciss: Add lunid attribute to each logical drive in /sys

Add lunid attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/lunid for controller X,
logical drive Y

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
index ac3429d..5a6c8d3 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -38,3 +38,10 @@ Kernel Version:	2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Kicks of a rescan of the controller to discover logical
 		drive topology changes.
+
+Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/lunid
+Date:		August 2009
+Kernel Version: 2.6.31
+Contact:	iss_storagedev@hp.com
+Description:	Displays the 8-byte LUN ID used to address logical
+		drive Y of controller X.
-- 
cgit v1.1


From 3ff1111dc6e27524eeef267ab0ca9b5690594748 Mon Sep 17 00:00:00 2001
From: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Date: Thu, 17 Sep 2009 13:48:21 -0500
Subject: cciss: Add a "raid_level" attribute to each logical drive in /sys

and change get rid of some magic numbers in raid lavel decoding.

Add raid_level attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/raid_level for controller X,
logical drive Y

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
index 5a6c8d3..8d02602 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -45,3 +45,10 @@ Kernel Version: 2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Displays the 8-byte LUN ID used to address logical
 		drive Y of controller X.
+
+Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/raid_level
+Date:		August 2009
+Kernel Version: 2.6.31
+Contact:	iss_storagedev@hp.com
+Description:	Displays the RAID level of logical drive Y of
+		controller X.
-- 
cgit v1.1


From e272afecaf18912e971374df4605496975942e5c Mon Sep 17 00:00:00 2001
From: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Date: Thu, 17 Sep 2009 13:48:26 -0500
Subject: cciss: Add usage_count attribute to each logical drive in /sys

Add usage_count attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/usage_count for controller X,
logical drive Y.  The usage count is the number of times
the device has currently been opened.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
 Documentation/ABI/testing/sysfs-bus-pci-devices-cciss | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
index 8d02602..4f29e5f1 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -52,3 +52,10 @@ Kernel Version: 2.6.31
 Contact:	iss_storagedev@hp.com
 Description:	Displays the RAID level of logical drive Y of
 		controller X.
+
+Where:		/sys/bus/pci/devices/<dev>/ccissX/cXdY/usage_count
+Date:		August 2009
+Kernel Version: 2.6.31
+Contact:	iss_storagedev@hp.com
+Description:	Displays the usage count (number of opens) of logical drive Y
+		of controller X.
-- 
cgit v1.1


From 4932be778952d4f3c278cbdef0d717358849aa8c Mon Sep 17 00:00:00 2001
From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Thu, 1 Oct 2009 15:44:06 -0700
Subject: docs: update patch size in SubmittingPatches

This patch size comment is like so last millenium.  Update it to modern
times.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b7f9d3b..72651f7 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -232,7 +232,7 @@ your e-mail client so that it sends your patches untouched.
 When sending patches to Linus, always follow step #7.
 
 Large changes are not appropriate for mailing lists, and some
-maintainers.  If your patch, uncompressed, exceeds 40 kB in size,
+maintainers.  If your patch, uncompressed, exceeds 300 kB in size,
 it is preferred that you store your patch on an Internet-accessible
 server, and provide instead a URL (link) pointing to your patch.
 
-- 
cgit v1.1


From 3bfc13c239fd56ebc1ac98a914c6c6b8b0045478 Mon Sep 17 00:00:00 2001
From: HighPoint Linux Team <linux@highpoint-tech.com>
Date: Fri, 11 Sep 2009 17:21:27 +0800
Subject: [SCSI] hptiop: Add RR44xx adapter support

Most code changes were made to support RR44xx adapters.
- add more PCI device ID.
- using PCI BAR[2] to access RR44xx IOP.
- using PCI BAR[0] to check and clear RR44xx IRQ.

Signed-off-by: HighPoint Linux Team <linux@highpoint-tech.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
---
 Documentation/scsi/hptiop.txt | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/scsi/hptiop.txt b/Documentation/scsi/hptiop.txt
index a6eb4ad..9605179 100644
--- a/Documentation/scsi/hptiop.txt
+++ b/Documentation/scsi/hptiop.txt
@@ -3,6 +3,25 @@ HIGHPOINT ROCKETRAID 3xxx/4xxx ADAPTER DRIVER (hptiop)
 Controller Register Map
 -------------------------
 
+For RR44xx Intel IOP based adapters, the controller IOP is accessed via PCI BAR0 and BAR2:
+
+     BAR0 offset    Register
+            0x11C5C Link Interface IRQ Set
+            0x11C60 Link Interface IRQ Clear
+
+     BAR2 offset    Register
+            0x10    Inbound Message Register 0
+            0x14    Inbound Message Register 1
+            0x18    Outbound Message Register 0
+            0x1C    Outbound Message Register 1
+            0x20    Inbound Doorbell Register
+            0x24    Inbound Interrupt Status Register
+            0x28    Inbound Interrupt Mask Register
+            0x30    Outbound Interrupt Status Register
+            0x34    Outbound Interrupt Mask Register
+            0x40    Inbound Queue Port
+            0x44    Outbound Queue Port
+
 For Intel IOP based adapters, the controller IOP is accessed via PCI BAR0:
 
      BAR0 offset    Register
@@ -93,7 +112,7 @@ The driver exposes following sysfs attributes:
 
 
 -----------------------------------------------------------------------------
-Copyright (C) 2006-2007 HighPoint Technologies, Inc. All Rights Reserved.
+Copyright (C) 2006-2009 HighPoint Technologies, Inc. All Rights Reserved.
 
   This file is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
-- 
cgit v1.1


From b607bd900051efc3308c4edc65dd98b34b230021 Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Fri, 2 Oct 2009 09:55:19 -0700
Subject: net: Fix wrong sizeof

Which is why I have always preferred sizeof(struct foo) over
sizeof(var).

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/timestamping/timestamping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index 43d1431..a7936fe 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -381,7 +381,7 @@ int main(int argc, char **argv)
 	memset(&hwtstamp, 0, sizeof(hwtstamp));
 	strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
 	hwtstamp.ifr_data = (void *)&hwconfig;
-	memset(&hwconfig, 0, sizeof(&hwconfig));
+	memset(&hwconfig, 0, sizeof(hwconfig));
 	hwconfig.tx_type =
 		(so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
 		HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
-- 
cgit v1.1


From 7069331dbe7155f23966f5944109f909fea0c7e4 Mon Sep 17 00:00:00 2001
From: Philipp Reisner <philipp.reisner@linbit.com>
Date: Fri, 2 Oct 2009 02:40:05 +0000
Subject: connector: Provide the sender's credentials to the callback

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/connector/cn_test.c     | 2 +-
 Documentation/connector/connector.txt | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c
index 1711adc..b07add3 100644
--- a/Documentation/connector/cn_test.c
+++ b/Documentation/connector/cn_test.c
@@ -34,7 +34,7 @@ static char cn_test_name[] = "cn_test";
 static struct sock *nls;
 static struct timer_list cn_test_timer;
 
-static void cn_test_callback(struct cn_msg *msg)
+static void cn_test_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
 {
 	pr_info("%s: %lu: idx=%x, val=%x, seq=%u, ack=%u, len=%d: %s.\n",
 	        __func__, jiffies, msg->id.idx, msg->id.val,
diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
index 81e6bf6..78c9466 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -23,7 +23,7 @@ handling, etc...  The Connector driver allows any kernelspace agents to use
 netlink based networking for inter-process communication in a significantly
 easier way:
 
-int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
+int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
 void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
 
 struct cb_id
@@ -53,15 +53,15 @@ struct cn_msg
 Connector interfaces.
 /*****************************************/
 
-int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
+int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
 
  Registers new callback with connector core.
 
  struct cb_id *id		- unique connector's user identifier.
 				  It must be registered in connector.h for legal in-kernel users.
  char *name			- connector's callback symbolic name.
- void (*callback) (void *)	- connector's callback.
-				  Argument must be dereferenced to struct cn_msg *.
+ void (*callback) (struct cn..)	- connector's callback.
+				  cn_msg and the sender's credentials
 
 
 void cn_del_callback(struct cb_id *id);
-- 
cgit v1.1


From 211a641770c2ae191c9589fee69a8fb67f67fffd Mon Sep 17 00:00:00 2001
From: "Justin P. Mattock" <justinmattock@gmail.com>
Date: Tue, 29 Sep 2009 15:13:58 -0700
Subject: ieee1394: update URLs in debugging-via-ohci1394.txt

Update URLs of the userspace tools to use ohci1394_dma=early for
debugging.

Seems the address ftp://ftp.suse.de/private/bk/firewire/tools/* is not
very helpful.  After a quick search, seems this was talked about:
http://www.mail-archive.com/kgdb-bugreport@lists.sourceforge.net/msg02761.html
(can't find the original thread).

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 Documentation/debugging-via-ohci1394.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/debugging-via-ohci1394.txt
index 59a91e5..611f5a5 100644
--- a/Documentation/debugging-via-ohci1394.txt
+++ b/Documentation/debugging-via-ohci1394.txt
@@ -64,14 +64,14 @@ be used to view the printk buffer of a remote machine, even with live update.
 
 Bernhard Kaindl enhanced firescope to support accessing 64-bit machines
 from 32-bit firescope and vice versa:
-- ftp://ftp.suse.de/private/bk/firewire/tools/firescope-0.2.2.tar.bz2
+- http://halobates.de/firewire/firescope-0.2.2.tar.bz2
 
 and he implemented fast system dump (alpha version - read README.txt):
-- ftp://ftp.suse.de/private/bk/firewire/tools/firedump-0.1.tar.bz2
+- http://halobates.de/firewire/firedump-0.1.tar.bz2
 
 There is also a gdb proxy for firewire which allows to use gdb to access
 data which can be referenced from symbols found by gdb in vmlinux:
-- ftp://ftp.suse.de/private/bk/firewire/tools/fireproxy-0.33.tar.bz2
+- http://halobates.de/firewire/fireproxy-0.33.tar.bz2
 
 The latest version of this gdb proxy (fireproxy-0.34) can communicate (not
 yet stable) with kgdb over an memory-based communication module (kgdbom).
@@ -178,7 +178,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
 
 Notes
 -----
-Documentation and specifications: ftp://ftp.suse.de/private/bk/firewire/docs
+Documentation and specifications: http://halobates.de/firewire/
 
 FireWire is a trademark of Apple Inc. - for more information please refer to:
 http://en.wikipedia.org/wiki/FireWire
-- 
cgit v1.1


From f58ee00f1547ceb17b610ecfce2aa9097f1f9737 Mon Sep 17 00:00:00 2001
From: Andi Kleen <ak@linux.intel.com>
Date: Sun, 4 Oct 2009 02:28:42 +0200
Subject: HWPOISON: Add brief hwpoison description to Documentation

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/vm/hwpoison.txt | 136 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 Documentation/vm/hwpoison.txt

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
new file mode 100644
index 0000000..3ffadf8
--- /dev/null
+++ b/Documentation/vm/hwpoison.txt
@@ -0,0 +1,136 @@
+What is hwpoison?
+
+Upcoming Intel CPUs have support for recovering from some memory errors
+(``MCA recovery''). This requires the OS to declare a page "poisoned",
+kill the processes associated with it and avoid using it in the future.
+
+This patchkit implements the necessary infrastructure in the VM.
+
+To quote the overview comment:
+
+ * High level machine check handler. Handles pages reported by the
+ * hardware as being corrupted usually due to a 2bit ECC memory or cache
+ * failure.
+ *
+ * This focusses on pages detected as corrupted in the background.
+ * When the current CPU tries to consume corruption the currently
+ * running process can just be killed directly instead. This implies
+ * that if the error cannot be handled for some reason it's safe to
+ * just ignore it because no corruption has been consumed yet. Instead
+ * when that happens another machine check will happen.
+ *
+ * Handles page cache pages in various states. The tricky part
+ * here is that we can access any page asynchronous to other VM
+ * users, because memory failures could happen anytime and anywhere,
+ * possibly violating some of their assumptions. This is why this code
+ * has to be extremely careful. Generally it tries to use normal locking
+ * rules, as in get the standard locks, even if that means the
+ * error handling takes potentially a long time.
+ *
+ * Some of the operations here are somewhat inefficient and have non
+ * linear algorithmic complexity, because the data structures have not
+ * been optimized for this case. This is in particular the case
+ * for the mapping from a vma to a process. Since this case is expected
+ * to be rare we hope we can get away with this.
+
+The code consists of a the high level handler in mm/memory-failure.c,
+a new page poison bit and various checks in the VM to handle poisoned
+pages.
+
+The main target right now is KVM guests, but it works for all kinds
+of applications. KVM support requires a recent qemu-kvm release.
+
+For the KVM use there was need for a new signal type so that
+KVM can inject the machine check into the guest with the proper
+address. This in theory allows other applications to handle
+memory failures too. The expection is that near all applications
+won't do that, but some very specialized ones might.
+
+---
+
+There are two (actually three) modi memory failure recovery can be in:
+
+vm.memory_failure_recovery sysctl set to zero:
+	All memory failures cause a panic. Do not attempt recovery.
+	(on x86 this can be also affected by the tolerant level of the
+	MCE subsystem)
+
+early kill
+	(can be controlled globally and per process)
+	Send SIGBUS to the application as soon as the error is detected
+	This allows applications who can process memory errors in a gentle
+	way (e.g. drop affected object)
+	This is the mode used by KVM qemu.
+
+late kill
+	Send SIGBUS when the application runs into the corrupted page.
+	This is best for memory error unaware applications and default
+	Note some pages are always handled as late kill.
+
+---
+
+User control:
+
+vm.memory_failure_recovery
+	See sysctl.txt
+
+vm.memory_failure_early_kill
+	Enable early kill mode globally
+
+PR_MCE_KILL
+	Set early/late kill mode/revert to system default
+	arg1: PR_MCE_KILL_CLEAR: Revert to system default
+	arg1: PR_MCE_KILL_SET: arg2 defines thread specific mode
+		PR_MCE_KILL_EARLY: Early kill
+		PR_MCE_KILL_LATE:  Late kill
+		PR_MCE_KILL_DEFAULT: Use system global default
+PR_MCE_KILL_GET
+	return current mode
+
+
+---
+
+Testing:
+
+madvise(MADV_POISON, ....)
+	(as root)
+	Poison a page in the process for testing
+
+
+hwpoison-inject module through debugfs
+	/sys/debug/hwpoison/corrupt-pfn
+
+Inject hwpoison fault at PFN echoed into this file
+
+
+Architecture specific MCE injector
+
+x86 has mce-inject, mce-test
+
+Some portable hwpoison test programs in mce-test, see blow.
+
+---
+
+References:
+
+http://halobates.de/mce-lc09-2.pdf
+	Overview presentation from LinuxCon 09
+
+git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
+	Test suite (hwpoison specific portable tests in tsrc)
+
+git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
+	x86 specific injector
+
+
+---
+
+Limitations:
+
+- Not all page types are supported and never will. Most kernel internal
+objects cannot be recovered, only LRU pages for now.
+- Right now hugepage support is missing.
+
+---
+Andi Kleen, Oct 2009
+
-- 
cgit v1.1


From f546c65cd59275c7b95eba4f9b3ab83b38a5e9cb Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Sun, 4 Oct 2009 22:53:40 +0200
Subject: i2c: Move misc devices documentation

Some times ago the eeprom and max6875 drivers moved to
drivers/misc/eeprom, but their documentation did not follow. It's
finally time to get rid of Documentation/i2c/chips.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ben Gardner <gardner.ben@gmail.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
---
 Documentation/i2c/chips/eeprom     |  96 ---------------------------------
 Documentation/i2c/chips/max6875    | 108 -------------------------------------
 Documentation/misc-devices/eeprom  |  96 +++++++++++++++++++++++++++++++++
 Documentation/misc-devices/max6875 | 108 +++++++++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+), 204 deletions(-)
 delete mode 100644 Documentation/i2c/chips/eeprom
 delete mode 100644 Documentation/i2c/chips/max6875
 create mode 100644 Documentation/misc-devices/eeprom
 create mode 100644 Documentation/misc-devices/max6875

(limited to 'Documentation')

diff --git a/Documentation/i2c/chips/eeprom b/Documentation/i2c/chips/eeprom
deleted file mode 100644
index f7e8104..0000000
--- a/Documentation/i2c/chips/eeprom
+++ /dev/null
@@ -1,96 +0,0 @@
-Kernel driver eeprom
-====================
-
-Supported chips:
-  * Any EEPROM chip in the designated address range
-    Prefix: 'eeprom'
-    Addresses scanned: I2C 0x50 - 0x57
-    Datasheets: Publicly available from:
-                Atmel (www.atmel.com),
-                Catalyst (www.catsemi.com),
-                Fairchild (www.fairchildsemi.com),
-                Microchip (www.microchip.com),
-                Philips (www.semiconductor.philips.com),
-                Rohm (www.rohm.com),
-                ST (www.st.com),
-                Xicor (www.xicor.com),
-                and others.
-
-        Chip     Size (bits)    Address
-        24C01     1K            0x50 (shadows at 0x51 - 0x57)
-        24C01A    1K            0x50 - 0x57 (Typical device on DIMMs)
-        24C02     2K            0x50 - 0x57
-        24C04     4K            0x50, 0x52, 0x54, 0x56
-                                (additional data at 0x51, 0x53, 0x55, 0x57)
-        24C08     8K            0x50, 0x54 (additional data at 0x51, 0x52,
-                                0x53, 0x55, 0x56, 0x57)
-        24C16    16K            0x50 (additional data at 0x51 - 0x57)
-        Sony      2K            0x57
-
-        Atmel     34C02B  2K    0x50 - 0x57, SW write protect at 0x30-37
-        Catalyst  34FC02  2K    0x50 - 0x57, SW write protect at 0x30-37
-        Catalyst  34RC02  2K    0x50 - 0x57, SW write protect at 0x30-37
-        Fairchild 34W02   2K    0x50 - 0x57, SW write protect at 0x30-37
-        Microchip 24AA52  2K    0x50 - 0x57, SW write protect at 0x30-37
-        ST        M34C02  2K    0x50 - 0x57, SW write protect at 0x30-37
-
-
-Authors:
-        Frodo Looijaard <frodol@dds.nl>,
-        Philip Edelbrock <phil@netroedge.com>,
-        Jean Delvare <khali@linux-fr.org>,
-        Greg Kroah-Hartman <greg@kroah.com>,
-        IBM Corp.
-
-Description
------------
-
-This is a simple EEPROM module meant to enable reading the first 256 bytes
-of an EEPROM (on a SDRAM DIMM for example). However, it will access serial
-EEPROMs on any I2C adapter. The supported devices are generically called
-24Cxx, and are listed above; however the numbering for these
-industry-standard devices may vary by manufacturer.
-
-This module was a programming exercise to get used to the new project
-organization laid out by Frodo, but it should be at least completely
-effective for decoding the contents of EEPROMs on DIMMs.
-
-DIMMS will typically contain a 24C01A or 24C02, or the 34C02 variants.
-The other devices will not be found on a DIMM because they respond to more
-than one address.
-
-DDC Monitors may contain any device. Often a 24C01, which responds to all 8
-addresses, is found.
-
-Recent Sony Vaio laptops have an EEPROM at 0x57. We couldn't get the
-specification, so it is guess work and far from being complete.
-
-The Microchip 24AA52/24LCS52, ST M34C02, and others support an additional
-software write protect register at 0x30 - 0x37 (0x20 less than the memory
-location). The chip responds to "write quick" detection at this address but
-does not respond to byte reads. If this register is present, the lower 128
-bytes of the memory array are not write protected. Any byte data write to
-this address will write protect the memory array permanently, and the
-device will no longer respond at the 0x30-37 address. The eeprom driver
-does not support this register.
-
-Lacking functionality:
-
-* Full support for larger devices (24C04, 24C08, 24C16). These are not
-typically found on a PC. These devices will appear as separate devices at
-multiple addresses.
-
-* Support for really large devices (24C32, 24C64, 24C128, 24C256, 24C512).
-These devices require two-byte address fields and are not supported.
-
-* Enable Writing. Again, no technical reason why not, but making it easy
-to change the contents of the EEPROMs (on DIMMs anyway) also makes it easy
-to disable the DIMMs (potentially preventing the computer from booting)
-until the values are restored somehow.
-
-Use:
-
-After inserting the module (and any other required SMBus/i2c modules), you
-should have some EEPROM directories in /sys/bus/i2c/devices/* of names such
-as "0-0050". Inside each of these is a series of files, the eeprom file
-contains the binary data from EEPROM.
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875
deleted file mode 100644
index 10ca43c..0000000
--- a/Documentation/i2c/chips/max6875
+++ /dev/null
@@ -1,108 +0,0 @@
-Kernel driver max6875
-=====================
-
-Supported chips:
-  * Maxim MAX6874, MAX6875
-    Prefix: 'max6875'
-    Addresses scanned: None (see below)
-    Datasheet:
-        http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf
-
-Author: Ben Gardner <bgardner@wabtec.com>
-
-
-Description
------------
-
-The Maxim MAX6875 is an EEPROM-programmable power-supply sequencer/supervisor.
-It provides timed outputs that can be used as a watchdog, if properly wired.
-It also provides 512 bytes of user EEPROM.
-
-At reset, the MAX6875 reads the configuration EEPROM into its configuration
-registers.  The chip then begins to operate according to the values in the
-registers.
-
-The Maxim MAX6874 is a similar, mostly compatible device, with more intputs
-and outputs:
-             vin     gpi    vout
-MAX6874        6       4       8
-MAX6875        4       3       5
-
-See the datasheet for more information.
-
-
-Sysfs entries
--------------
-
-eeprom        - 512 bytes of user-defined EEPROM space.
-
-
-General Remarks
----------------
-
-Valid addresses for the MAX6875 are 0x50 and 0x52.
-Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56.
-The driver does not probe any address, so you must force the address.
-
-Example:
-$ modprobe max6875 force=0,0x50
-
-The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
-addresses.  For example, for address 0x50, it also reserves 0x51.
-The even-address instance is called 'max6875', the odd one is 'dummy'.
-
-
-Programming the chip using i2c-dev
-----------------------------------
-
-Use the i2c-dev interface to access and program the chips.
-Reads and writes are performed differently depending on the address range.
-
-The configuration registers are at addresses 0x00 - 0x45.
-Use i2c_smbus_write_byte_data() to write a register and
-i2c_smbus_read_byte_data() to read a register.
-The command is the register number.
-
-Examples:
-To write a 1 to register 0x45:
-  i2c_smbus_write_byte_data(fd, 0x45, 1);
-
-To read register 0x45:
-  value = i2c_smbus_read_byte_data(fd, 0x45);
-
-
-The configuration EEPROM is at addresses 0x8000 - 0x8045.
-The user EEPROM is at addresses 0x8100 - 0x82ff.
-
-Use i2c_smbus_write_word_data() to write a byte to EEPROM.
-
-The command is the upper byte of the address: 0x80, 0x81, or 0x82.
-The data word is the lower part of the address or'd with data << 8.
-  cmd = address >> 8;
-  val = (address & 0xff) | (data << 8);
-
-Example:
-To write 0x5a to address 0x8003:
-  i2c_smbus_write_word_data(fd, 0x80, 0x5a03);
-
-
-Reading data from the EEPROM is a little more complicated.
-Use i2c_smbus_write_byte_data() to set the read address and then
-i2c_smbus_read_byte() or i2c_smbus_read_i2c_block_data() to read the data.
-
-Example:
-To read data starting at offset 0x8100, first set the address:
-  i2c_smbus_write_byte_data(fd, 0x81, 0x00);
-
-And then read the data
-  value = i2c_smbus_read_byte(fd);
-
-  or
-
-  count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer);
-
-The block read should read 16 bytes.
-0x84 is the block read command.
-
-See the datasheet for more details.
-
diff --git a/Documentation/misc-devices/eeprom b/Documentation/misc-devices/eeprom
new file mode 100644
index 0000000..f7e8104
--- /dev/null
+++ b/Documentation/misc-devices/eeprom
@@ -0,0 +1,96 @@
+Kernel driver eeprom
+====================
+
+Supported chips:
+  * Any EEPROM chip in the designated address range
+    Prefix: 'eeprom'
+    Addresses scanned: I2C 0x50 - 0x57
+    Datasheets: Publicly available from:
+                Atmel (www.atmel.com),
+                Catalyst (www.catsemi.com),
+                Fairchild (www.fairchildsemi.com),
+                Microchip (www.microchip.com),
+                Philips (www.semiconductor.philips.com),
+                Rohm (www.rohm.com),
+                ST (www.st.com),
+                Xicor (www.xicor.com),
+                and others.
+
+        Chip     Size (bits)    Address
+        24C01     1K            0x50 (shadows at 0x51 - 0x57)
+        24C01A    1K            0x50 - 0x57 (Typical device on DIMMs)
+        24C02     2K            0x50 - 0x57
+        24C04     4K            0x50, 0x52, 0x54, 0x56
+                                (additional data at 0x51, 0x53, 0x55, 0x57)
+        24C08     8K            0x50, 0x54 (additional data at 0x51, 0x52,
+                                0x53, 0x55, 0x56, 0x57)
+        24C16    16K            0x50 (additional data at 0x51 - 0x57)
+        Sony      2K            0x57
+
+        Atmel     34C02B  2K    0x50 - 0x57, SW write protect at 0x30-37
+        Catalyst  34FC02  2K    0x50 - 0x57, SW write protect at 0x30-37
+        Catalyst  34RC02  2K    0x50 - 0x57, SW write protect at 0x30-37
+        Fairchild 34W02   2K    0x50 - 0x57, SW write protect at 0x30-37
+        Microchip 24AA52  2K    0x50 - 0x57, SW write protect at 0x30-37
+        ST        M34C02  2K    0x50 - 0x57, SW write protect at 0x30-37
+
+
+Authors:
+        Frodo Looijaard <frodol@dds.nl>,
+        Philip Edelbrock <phil@netroedge.com>,
+        Jean Delvare <khali@linux-fr.org>,
+        Greg Kroah-Hartman <greg@kroah.com>,
+        IBM Corp.
+
+Description
+-----------
+
+This is a simple EEPROM module meant to enable reading the first 256 bytes
+of an EEPROM (on a SDRAM DIMM for example). However, it will access serial
+EEPROMs on any I2C adapter. The supported devices are generically called
+24Cxx, and are listed above; however the numbering for these
+industry-standard devices may vary by manufacturer.
+
+This module was a programming exercise to get used to the new project
+organization laid out by Frodo, but it should be at least completely
+effective for decoding the contents of EEPROMs on DIMMs.
+
+DIMMS will typically contain a 24C01A or 24C02, or the 34C02 variants.
+The other devices will not be found on a DIMM because they respond to more
+than one address.
+
+DDC Monitors may contain any device. Often a 24C01, which responds to all 8
+addresses, is found.
+
+Recent Sony Vaio laptops have an EEPROM at 0x57. We couldn't get the
+specification, so it is guess work and far from being complete.
+
+The Microchip 24AA52/24LCS52, ST M34C02, and others support an additional
+software write protect register at 0x30 - 0x37 (0x20 less than the memory
+location). The chip responds to "write quick" detection at this address but
+does not respond to byte reads. If this register is present, the lower 128
+bytes of the memory array are not write protected. Any byte data write to
+this address will write protect the memory array permanently, and the
+device will no longer respond at the 0x30-37 address. The eeprom driver
+does not support this register.
+
+Lacking functionality:
+
+* Full support for larger devices (24C04, 24C08, 24C16). These are not
+typically found on a PC. These devices will appear as separate devices at
+multiple addresses.
+
+* Support for really large devices (24C32, 24C64, 24C128, 24C256, 24C512).
+These devices require two-byte address fields and are not supported.
+
+* Enable Writing. Again, no technical reason why not, but making it easy
+to change the contents of the EEPROMs (on DIMMs anyway) also makes it easy
+to disable the DIMMs (potentially preventing the computer from booting)
+until the values are restored somehow.
+
+Use:
+
+After inserting the module (and any other required SMBus/i2c modules), you
+should have some EEPROM directories in /sys/bus/i2c/devices/* of names such
+as "0-0050". Inside each of these is a series of files, the eeprom file
+contains the binary data from EEPROM.
diff --git a/Documentation/misc-devices/max6875 b/Documentation/misc-devices/max6875
new file mode 100644
index 0000000..10ca43c
--- /dev/null
+++ b/Documentation/misc-devices/max6875
@@ -0,0 +1,108 @@
+Kernel driver max6875
+=====================
+
+Supported chips:
+  * Maxim MAX6874, MAX6875
+    Prefix: 'max6875'
+    Addresses scanned: None (see below)
+    Datasheet:
+        http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf
+
+Author: Ben Gardner <bgardner@wabtec.com>
+
+
+Description
+-----------
+
+The Maxim MAX6875 is an EEPROM-programmable power-supply sequencer/supervisor.
+It provides timed outputs that can be used as a watchdog, if properly wired.
+It also provides 512 bytes of user EEPROM.
+
+At reset, the MAX6875 reads the configuration EEPROM into its configuration
+registers.  The chip then begins to operate according to the values in the
+registers.
+
+The Maxim MAX6874 is a similar, mostly compatible device, with more intputs
+and outputs:
+             vin     gpi    vout
+MAX6874        6       4       8
+MAX6875        4       3       5
+
+See the datasheet for more information.
+
+
+Sysfs entries
+-------------
+
+eeprom        - 512 bytes of user-defined EEPROM space.
+
+
+General Remarks
+---------------
+
+Valid addresses for the MAX6875 are 0x50 and 0x52.
+Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56.
+The driver does not probe any address, so you must force the address.
+
+Example:
+$ modprobe max6875 force=0,0x50
+
+The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
+addresses.  For example, for address 0x50, it also reserves 0x51.
+The even-address instance is called 'max6875', the odd one is 'dummy'.
+
+
+Programming the chip using i2c-dev
+----------------------------------
+
+Use the i2c-dev interface to access and program the chips.
+Reads and writes are performed differently depending on the address range.
+
+The configuration registers are at addresses 0x00 - 0x45.
+Use i2c_smbus_write_byte_data() to write a register and
+i2c_smbus_read_byte_data() to read a register.
+The command is the register number.
+
+Examples:
+To write a 1 to register 0x45:
+  i2c_smbus_write_byte_data(fd, 0x45, 1);
+
+To read register 0x45:
+  value = i2c_smbus_read_byte_data(fd, 0x45);
+
+
+The configuration EEPROM is at addresses 0x8000 - 0x8045.
+The user EEPROM is at addresses 0x8100 - 0x82ff.
+
+Use i2c_smbus_write_word_data() to write a byte to EEPROM.
+
+The command is the upper byte of the address: 0x80, 0x81, or 0x82.
+The data word is the lower part of the address or'd with data << 8.
+  cmd = address >> 8;
+  val = (address & 0xff) | (data << 8);
+
+Example:
+To write 0x5a to address 0x8003:
+  i2c_smbus_write_word_data(fd, 0x80, 0x5a03);
+
+
+Reading data from the EEPROM is a little more complicated.
+Use i2c_smbus_write_byte_data() to set the read address and then
+i2c_smbus_read_byte() or i2c_smbus_read_i2c_block_data() to read the data.
+
+Example:
+To read data starting at offset 0x8100, first set the address:
+  i2c_smbus_write_byte_data(fd, 0x81, 0x00);
+
+And then read the data
+  value = i2c_smbus_read_byte(fd);
+
+  or
+
+  count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer);
+
+The block read should read 16 bytes.
+0x84 is the block read command.
+
+See the datasheet for more details.
+
-- 
cgit v1.1


From b835d7fbd54c42d7b9abb5e8a64f32690ebfad43 Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Sun, 4 Oct 2009 22:53:41 +0200
Subject: max6875: Discard obsolete detect method

There is no point in implementing a detect callback for the MAX6875, as
this device can't be detected. It was there solely to handle "force"
module parameters to instantiate devices, but now we have a better sysfs
interface that can do the same.

So we can get rid of the ugly module parameters and the detect callback.
This basically divides the binary module size by 2.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Ben Gardner <gardner.ben@gmail.com>
---
 Documentation/misc-devices/max6875 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/misc-devices/max6875 b/Documentation/misc-devices/max6875
index 10ca43c..1e89ee3 100644
--- a/Documentation/misc-devices/max6875
+++ b/Documentation/misc-devices/max6875
@@ -42,10 +42,12 @@ General Remarks
 
 Valid addresses for the MAX6875 are 0x50 and 0x52.
 Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56.
-The driver does not probe any address, so you must force the address.
+The driver does not probe any address, so you explicitly instantiate the
+devices.
 
 Example:
-$ modprobe max6875 force=0,0x50
+$ modprobe max6875
+$ echo max6875 0x50 > /sys/bus/i2c/devices/i2c-0/new_device
 
 The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
 addresses.  For example, for address 0x50, it also reserves 0x51.
-- 
cgit v1.1


From 0314b020c49c1d6cd182d2b89775bfa6686660db Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Sun, 4 Oct 2009 22:53:41 +0200
Subject: ds2482: Discard obsolete detect method

There is no point in implementing a detect callback for the DS2482, as
this device can't be detected. It was there solely to handle "force"
module parameters to instantiate devices, but now we have a better sysfs
interface that can do the same.

So we can get rid of the ugly module parameters and the detect callback.
This shrinks the binary module size by 21%.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ben Gardner <gardner.ben@gmail.com>
---
 Documentation/w1/masters/ds2482 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/w1/masters/ds2482 b/Documentation/w1/masters/ds2482
index 9210d6f..299b91c 100644
--- a/Documentation/w1/masters/ds2482
+++ b/Documentation/w1/masters/ds2482
@@ -24,8 +24,8 @@ General Remarks
 
 Valid addresses are 0x18, 0x19, 0x1a, and 0x1b.
 However, the device cannot be detected without writing to the i2c bus, so no
-detection is done.
-You should force the device address.
+detection is done. You should instantiate the device explicitly.
 
-$ modprobe ds2482 force=0,0x18
+$ modprobe ds2482
+$ echo ds2482 0x18 > /sys/bus/i2c/devices/i2c-0/new_device
 
-- 
cgit v1.1


From 2d2a7cff1b63cde1e2d981eea8ae9e69ae9ce96d Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Sun, 4 Oct 2009 22:53:42 +0200
Subject: ltc4215/ltc4245: Discard obsolete detect methods

There is no point in implementing a detect callback for the LTC4215
and LTC4245, as these devices can't be detected. It was there solely
to handle "force" module parameters to instantiate devices, but now
we have a better sysfs interface that can do the same.

So we can get rid of the ugly module parameters and the detect
callbacks. This shrinks the binary module sizes by 36% and 46%,
respectively.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ira W. Snyder <iws@ovro.caltech.edu>
---
 Documentation/hwmon/ltc4215 | 7 ++++---
 Documentation/hwmon/ltc4245 | 7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/ltc4215 b/Documentation/hwmon/ltc4215
index 2e6a21e..c196a18 100644
--- a/Documentation/hwmon/ltc4215
+++ b/Documentation/hwmon/ltc4215
@@ -22,12 +22,13 @@ Usage Notes
 -----------
 
 This driver does not probe for LTC4215 devices, due to the fact that some
-of the possible addresses are unfriendly to probing. You will need to use
-the "force" parameter to tell the driver where to find the device.
+of the possible addresses are unfriendly to probing. You will have to
+instantiate the devices explicitly.
 
 Example: the following will load the driver for an LTC4215 at address 0x44
 on I2C bus #0:
-$ modprobe ltc4215 force=0,0x44
+$ modprobe ltc4215
+$ echo ltc4215 0x44 > /sys/bus/i2c/devices/i2c-0/new_device
 
 
 Sysfs entries
diff --git a/Documentation/hwmon/ltc4245 b/Documentation/hwmon/ltc4245
index bae7a3a..02838a4 100644
--- a/Documentation/hwmon/ltc4245
+++ b/Documentation/hwmon/ltc4245
@@ -23,12 +23,13 @@ Usage Notes
 -----------
 
 This driver does not probe for LTC4245 devices, due to the fact that some
-of the possible addresses are unfriendly to probing. You will need to use
-the "force" parameter to tell the driver where to find the device.
+of the possible addresses are unfriendly to probing. You will have to
+instantiate the devices explicitly.
 
 Example: the following will load the driver for an LTC4245 at address 0x23
 on I2C bus #1:
-$ modprobe ltc4245 force=1,0x23
+$ modprobe ltc4245
+$ echo ltc4245 0x23 > /sys/bus/i2c/devices/i2c-1/new_device
 
 
 Sysfs entries
-- 
cgit v1.1


From 03f1805ad0ce5aae02bfe40c29b230abb63179ac Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Sun, 4 Oct 2009 22:53:45 +0200
Subject: i2c: Minor documentation update

The sysfs path to i2c adapters has changed recently, update the
documentation to reflect that change.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
---
 Documentation/i2c/instantiating-devices | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices
index c740b7b..e894902 100644
--- a/Documentation/i2c/instantiating-devices
+++ b/Documentation/i2c/instantiating-devices
@@ -188,7 +188,7 @@ segment, the address is sufficient to uniquely identify the device to be
 deleted.
 
 Example:
-# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device
+# echo eeprom 0x50 > /sys/bus/i2c/devices/i2c-3/new_device
 
 While this interface should only be used when in-kernel device declaration
 can't be done, there is a variety of cases where it can be helpful:
-- 
cgit v1.1


From 896a7cf8d846a9e86fb823be16f4f14ffeb7f074 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 2 Oct 2009 20:24:59 +0000
Subject: pktgen: Fix multiqueue handling

It is not currently possible to instruct pktgen to use one selected tx queue.

When Robert added multiqueue support in commit 45b270f8, he added
an interval (queue_map_min, queue_map_max), and his code doesnt take
into account the case of min = max, to select one tx queue exactly.

I suspect a high performance setup on a eight txqueue device wants
to use exactly eight cpus, and assign one tx queue to each sender.

This patchs makes pktgen select the right tx queue, not the first one.

Also updates Documentation to reflect Robert changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/networking/pktgen.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index c6cf4a3..61bb645 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -90,6 +90,11 @@ Examples:
  pgset "dstmac 00:00:00:00:00:00"    sets MAC destination address
  pgset "srcmac 00:00:00:00:00:00"    sets MAC source address
 
+ pgset "queue_map_min 0" Sets the min value of tx queue interval
+ pgset "queue_map_max 7" Sets the max value of tx queue interval, for multiqueue devices
+                         To select queue 1 of a given device,
+                         use queue_map_min=1 and queue_map_max=1
+
  pgset "src_mac_count 1" Sets the number of MACs we'll range through.  
                          The 'minimum' MAC is what you set with srcmac.
 
@@ -101,6 +106,9 @@ Examples:
                               IPDST_RND, UDPSRC_RND,
                               UDPDST_RND, MACSRC_RND, MACDST_RND 
                               MPLS_RND, VID_RND, SVID_RND
+                              QUEUE_MAP_RND # queue map random
+                              QUEUE_MAP_CPU # queue map mirrors smp_processor_id()
+
 
  pgset "udp_src_min 9"   set UDP source port min, If < udp_src_max, then
                          cycle through the port range.
-- 
cgit v1.1


From f1af9f58546e2d98ef078fa30b2ef80a9042131e Mon Sep 17 00:00:00 2001
From: Tilman Schmidt <tilman@imap.cc>
Date: Tue, 6 Oct 2009 12:18:00 +0000
Subject: Documentation: expand isdn/INTERFACE.CAPI document

- Note that send_message() may be called in interrupt context.
- Describe the storage of CAPI messages and payload data in SKBs.
- Add more details to the description of the _cmsg structure.
- Describe kernelcapi debugging output.

Impact: documentation
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 Documentation/isdn/INTERFACE.CAPI | 83 +++++++++++++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 16 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI
index 686e107..5fe8de5 100644
--- a/Documentation/isdn/INTERFACE.CAPI
+++ b/Documentation/isdn/INTERFACE.CAPI
@@ -60,10 +60,9 @@ open() operation on regular files or character devices.
 
 After a successful return from register_appl(), CAPI messages from the
 application may be passed to the driver for the device via calls to the
-send_message() callback function. The CAPI message to send is stored in the
-data portion of an skb. Conversely, the driver may call Kernel CAPI's
-capi_ctr_handle_message() function to pass a received CAPI message to Kernel
-CAPI for forwarding to an application, specifying its ApplID.
+send_message() callback function. Conversely, the driver may call Kernel
+CAPI's capi_ctr_handle_message() function to pass a received CAPI message to
+Kernel CAPI for forwarding to an application, specifying its ApplID.
 
 Deregistration requests (CAPI operation CAPI_RELEASE) from applications are
 forwarded as calls to the release_appl() callback function, passing the same
@@ -142,6 +141,7 @@ u16  (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb)
 	to accepting or queueing the message. Errors occurring during the
 	actual processing of the message should be signaled with an
 	appropriate reply message.
+	May be called in process or interrupt context.
 	Calls to this function are not serialized by Kernel CAPI, ie. it must
 	be prepared to be re-entered.
 
@@ -154,7 +154,8 @@ read_proc_t *ctr_read_proc
 	system entry, /proc/capi/controllers/<n>; will be called with a
 	pointer to the device's capi_ctr structure as the last (data) argument
 
-Note: Callback functions are never called in interrupt context.
+Note: Callback functions except send_message() are never called in interrupt
+context.
 
 - to be filled in before calling capi_ctr_ready():
 
@@ -171,14 +172,40 @@ u8 serial[CAPI_SERIAL_LEN]
 	value to return for CAPI_GET_SERIAL
 
 
-4.3 The _cmsg Structure
+4.3 SKBs
+
+CAPI messages are passed between Kernel CAPI and the driver via send_message()
+and capi_ctr_handle_message(), stored in the data portion of a socket buffer
+(skb).  Each skb contains a single CAPI message coded according to the CAPI 2.0
+standard.
+
+For the data transfer messages, DATA_B3_REQ and DATA_B3_IND, the actual
+payload data immediately follows the CAPI message itself within the same skb.
+The Data and Data64 parameters are not used for processing. The Data64
+parameter may be omitted by setting the length field of the CAPI message to 22
+instead of 30.
+
+
+4.4 The _cmsg Structure
 
 (declared in <linux/isdn/capiutil.h>)
 
 The _cmsg structure stores the contents of a CAPI 2.0 message in an easily
-accessible form. It contains members for all possible CAPI 2.0 parameters, of
-which only those appearing in the message type currently being processed are
-actually used. Unused members should be set to zero.
+accessible form. It contains members for all possible CAPI 2.0 parameters,
+including subparameters of the Additional Info and B Protocol structured
+parameters, with the following exceptions:
+
+* second Calling party number (CONNECT_IND)
+
+* Data64 (DATA_B3_REQ and DATA_B3_IND)
+
+* Sending complete (subparameter of Additional Info, CONNECT_REQ and INFO_REQ)
+
+* Global Configuration (subparameter of B Protocol, CONNECT_REQ, CONNECT_RESP
+  and SELECT_B_PROTOCOL_REQ)
+
+Only those parameters appearing in the message type currently being processed
+are actually used. Unused members should be set to zero.
 
 Members are named after the CAPI 2.0 standard names of the parameters they
 represent. See <linux/isdn/capiutil.h> for the exact spelling. Member data
@@ -190,18 +217,19 @@ u16         for CAPI parameters of type 'word'
 
 u32         for CAPI parameters of type 'dword'
 
-_cstruct    for CAPI parameters of type 'struct' not containing any
-	    variably-sized (struct) subparameters (eg. 'Called Party Number')
+_cstruct    for CAPI parameters of type 'struct'
 	    The member is a pointer to a buffer containing the parameter in
 	    CAPI encoding (length + content). It may also be NULL, which will
 	    be taken to represent an empty (zero length) parameter.
+	    Subparameters are stored in encoded form within the content part.
 
-_cmstruct   for CAPI parameters of type 'struct' containing 'struct'
-	    subparameters ('Additional Info' and 'B Protocol')
+_cmstruct   alternative representation for CAPI parameters of type 'struct'
+	    (used only for the 'Additional Info' and 'B Protocol' parameters)
 	    The representation is a single byte containing one of the values:
-	    CAPI_DEFAULT: the parameter is empty
-	    CAPI_COMPOSE: the values of the subparameters are stored
-	    individually in the corresponding _cmsg structure members
+	    CAPI_DEFAULT: The parameter is empty/absent.
+	    CAPI_COMPOSE: The parameter is present.
+	    Subparameter values are stored individually in the corresponding
+	    _cmsg structure members.
 
 Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert
 messages between their transport encoding described in the CAPI 2.0 standard
@@ -297,3 +325,26 @@ char *capi_cmd2str(u8 Command, u8 Subcommand)
 	be NULL if the command/subcommand is not one of those defined in the
 	CAPI 2.0 standard.
 
+
+7. Debugging
+
+The module kernelcapi has a module parameter showcapimsgs controlling some
+debugging output produced by the module. It can only be set when the module is
+loaded, via a parameter "showcapimsgs=<n>" to the modprobe command, either on
+the command line or in the configuration file.
+
+If the lowest bit of showcapimsgs is set, kernelcapi logs controller and
+application up and down events.
+
+In addition, every registered CAPI controller has an associated traceflag
+parameter controlling how CAPI messages sent from and to tha controller are
+logged. The traceflag parameter is initialized with the value of the
+showcapimsgs parameter when the controller is registered, but can later be
+changed via the MANUFACTURER_REQ command KCAPI_CMD_TRACE.
+
+If the value of traceflag is non-zero, CAPI messages are logged.
+DATA_B3 messages are only logged if the value of traceflag is > 2.
+
+If the lowest bit of traceflag is set, only the command/subcommand and message
+length are logged. Otherwise, kernelcapi logs a readable representation of
+the entire message.
-- 
cgit v1.1


From aa07a99412f56ad56faecbaa683f3bc0ae99abc2 Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bart.vanassche@gmail.com>
Date: Wed, 7 Oct 2009 15:35:55 -0700
Subject: IB: Fix typo in udev rule documentation

The proper syntax for udev rules is KERNEL==... instead of KERNEL=...

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Reported-by: Lukasz Jurewicz <lukasz.jurewicz@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
 Documentation/infiniband/user_mad.txt   | 4 ++--
 Documentation/infiniband/user_verbs.txt | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/infiniband/user_mad.txt b/Documentation/infiniband/user_mad.txt
index 744687d..8a36695 100644
--- a/Documentation/infiniband/user_mad.txt
+++ b/Documentation/infiniband/user_mad.txt
@@ -128,8 +128,8 @@ Setting IsSM Capability Bit
   To create the appropriate character device files automatically with
   udev, a rule like
 
-    KERNEL="umad*", NAME="infiniband/%k"
-    KERNEL="issm*", NAME="infiniband/%k"
+    KERNEL=="umad*", NAME="infiniband/%k"
+    KERNEL=="issm*", NAME="infiniband/%k"
 
   can be used.  This will create device nodes named
 
diff --git a/Documentation/infiniband/user_verbs.txt b/Documentation/infiniband/user_verbs.txt
index f847501..afe3f8d 100644
--- a/Documentation/infiniband/user_verbs.txt
+++ b/Documentation/infiniband/user_verbs.txt
@@ -58,7 +58,7 @@ Memory pinning
   To create the appropriate character device files automatically with
   udev, a rule like
 
-    KERNEL="uverbs*", NAME="infiniband/%k"
+    KERNEL=="uverbs*", NAME="infiniband/%k"
 
   can be used.  This will create device nodes named
 
-- 
cgit v1.1


From c73602ad31cdcf7e6651f43d12f65b5b9b825b6f Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Date: Wed, 7 Oct 2009 16:32:22 -0700
Subject: ksm: more on default values

Adjust the max_kernel_pages default to a quarter of totalram_pages,
instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from
highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be
pinning 32MB of lowmem with their rmap_items, so no need for the more
obscure calculation (nor for its own special init function).

There is no way for the user to switch KSM on if CONFIG_SYSFS is not
enabled, so in that case default run to KSM_RUN_MERGE.

Update KSM Documentation and Kconfig to reflect the new defaults.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/ksm.txt | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
index 72a22f6..262d8e6 100644
--- a/Documentation/vm/ksm.txt
+++ b/Documentation/vm/ksm.txt
@@ -52,15 +52,15 @@ The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/,
 readable by all but writable only by root:
 
 max_kernel_pages - set to maximum number of kernel pages that KSM may use
-                   e.g. "echo 2000 > /sys/kernel/mm/ksm/max_kernel_pages"
+                   e.g. "echo 100000 > /sys/kernel/mm/ksm/max_kernel_pages"
                    Value 0 imposes no limit on the kernel pages KSM may use;
                    but note that any process using MADV_MERGEABLE can cause
                    KSM to allocate these pages, unswappable until it exits.
-                   Default: 2000 (chosen for demonstration purposes)
+                   Default: quarter of memory (chosen to not pin too much)
 
 pages_to_scan    - how many present pages to scan before ksmd goes to sleep
-                   e.g. "echo 200 > /sys/kernel/mm/ksm/pages_to_scan"
-                   Default: 200 (chosen for demonstration purposes)
+                   e.g. "echo 100 > /sys/kernel/mm/ksm/pages_to_scan"
+                   Default: 100 (chosen for demonstration purposes)
 
 sleep_millisecs  - how many milliseconds ksmd should sleep before next scan
                    e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs"
@@ -70,7 +70,8 @@ run              - set 0 to stop ksmd from running but keep merged pages,
                    set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run",
                    set 2 to stop ksmd and unmerge all pages currently merged,
                          but leave mergeable areas registered for next run
-                   Default: 1 (for immediate use by apps which register)
+                   Default: 0 (must be changed to 1 to activate KSM,
+                               except if CONFIG_SYSFS is disabled)
 
 The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/:
 
@@ -86,4 +87,4 @@ pages_volatile embraces several different kinds of activity, but a high
 proportion there would also indicate poor use of madvise MADV_MERGEABLE.
 
 Izik Eidus,
-Hugh Dickins, 30 July 2009
+Hugh Dickins, 24 Sept 2009
-- 
cgit v1.1


From 7823da36ce8e42d66941887eb922768d259763f2 Mon Sep 17 00:00:00 2001
From: Paul Menage <menage@google.com>
Date: Wed, 7 Oct 2009 16:32:26 -0700
Subject: cgroups: update documentation of cgroups tasks and procs files

Update documentation of cgroups tasks and procs files

Document the cgroup.procs file.

Clarify the semantics of the cgroup.procs and tasks files.  Although the
current cgroup.procs interface returns a sorted and uniqified list of
pids, potential future performance enhancements could result in those
properties being removed - explicitly document this aspect of the API.

There are no existing users of cgroup.procs, so compatibility isn't an
issue.  There are users of the "tasks" file, but none that would appear to
break in the event of the sorted property being broken.  The standard
"libcpuset" explicitly sorts the results of reading from the tasks file,
and "libcg" and other users don't appear to care about ordering.

Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/cgroups/cgroups.txt | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 455d4e6..0b33bfe 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -227,7 +227,14 @@ as the path relative to the root of the cgroup file system.
 Each cgroup is represented by a directory in the cgroup file system
 containing the following files describing that cgroup:
 
- - tasks: list of tasks (by pid) attached to that cgroup
+ - tasks: list of tasks (by pid) attached to that cgroup.  This list
+   is not guaranteed to be sorted.  Writing a thread id into this file
+   moves the thread into this cgroup.
+ - cgroup.procs: list of tgids in the cgroup.  This list is not
+   guaranteed to be sorted or free of duplicate tgids, and userspace
+   should sort/uniquify the list if this property is required.
+   Writing a tgid into this file moves all threads with that tgid into
+   this cgroup.
  - notify_on_release flag: run the release agent on exit?
  - release_agent: the path to use for release notifications (this file
    exists in the top cgroup only)
@@ -374,7 +381,7 @@ Now you want to do something with this cgroup.
 
 In this directory you can find several files:
 # ls
-notify_on_release tasks
+cgroup.procs notify_on_release tasks
 (plus whatever files added by the attached subsystems)
 
 Now attach your shell to this cgroup:
-- 
cgit v1.1


From 253fb02d62571e5455eedc9e39b9d660e86a40f0 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:27 -0700
Subject: pagemap: export KPF_HWPOISON

This flag indicates a hardware detected memory corruption on the page.
Any future access of the page data may bring down the machine.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 2 ++
 Documentation/vm/pagemap.txt  | 4 ++++
 2 files changed, 6 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index fa1a30d..87f5722 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -69,6 +69,7 @@
 #define KPF_COMPOUND_TAIL	16
 #define KPF_HUGE		17
 #define KPF_UNEVICTABLE		18
+#define KPF_HWPOISON		19
 #define KPF_NOPAGE		20
 
 /* [32-] kernel hacking assistances */
@@ -116,6 +117,7 @@ static char *page_flag_names[] = {
 	[KPF_COMPOUND_TAIL]	= "T:compound_tail",
 	[KPF_HUGE]		= "G:huge",
 	[KPF_UNEVICTABLE]	= "u:unevictable",
+	[KPF_HWPOISON]		= "X:hwpoison",
 	[KPF_NOPAGE]		= "n:nopage",
 
 	[KPF_RESERVED]		= "r:reserved",
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 600a304..2fdd84a 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -57,6 +57,7 @@ There are three components to pagemap:
     16. COMPOUND_TAIL
     16. HUGE
     18. UNEVICTABLE
+    19. HWPOISON
     20. NOPAGE
 
 Short descriptions to the page flags:
@@ -86,6 +87,9 @@ Short descriptions to the page flags:
 17. HUGE
     this is an integral part of a HugeTLB page
 
+19. HWPOISON
+    hardware detected memory corruption on this page: don't touch the data!
+
 20. NOPAGE
     no page frame exists at the requested address
 
-- 
cgit v1.1


From a1bbb5ec39042fb762e7f4bcc634da0d87834193 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:28 -0700
Subject: pagemap: document KPF_KSM and show it in page-types

It indicates to the system admin that processes mapping such pages may be
eating less physical memory than the reported numbers by legacy tools.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 2 ++
 Documentation/vm/pagemap.txt  | 4 ++++
 2 files changed, 6 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 87f5722..9899fa2 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -71,6 +71,7 @@
 #define KPF_UNEVICTABLE		18
 #define KPF_HWPOISON		19
 #define KPF_NOPAGE		20
+#define KPF_KSM			21
 
 /* [32-] kernel hacking assistances */
 #define KPF_RESERVED		32
@@ -119,6 +120,7 @@ static char *page_flag_names[] = {
 	[KPF_UNEVICTABLE]	= "u:unevictable",
 	[KPF_HWPOISON]		= "X:hwpoison",
 	[KPF_NOPAGE]		= "n:nopage",
+	[KPF_KSM]		= "x:ksm",
 
 	[KPF_RESERVED]		= "r:reserved",
 	[KPF_MLOCKED]		= "m:mlocked",
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index 2fdd84a..df09b96 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -59,6 +59,7 @@ There are three components to pagemap:
     18. UNEVICTABLE
     19. HWPOISON
     20. NOPAGE
+    21. KSM
 
 Short descriptions to the page flags:
 
@@ -93,6 +94,9 @@ Short descriptions to the page flags:
 20. NOPAGE
     no page frame exists at the requested address
 
+21. KSM
+    identical memory pages dynamically shared between one or more processes
+
     [IO related page flags]
  1. ERROR     IO error occurred
  3. UPTODATE  page has up-to-date data
-- 
cgit v1.1


From 0c57effe27eb6544eb44d5fac563b7334e3bc771 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:28 -0700
Subject: page-types: add GPL note

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 9899fa2..e46fb5b 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -2,7 +2,10 @@
  * page-types: Tool for querying page flags
  *
  * Copyright (C) 2009 Intel corporation
- * Copyright (C) 2009 Wu Fengguang <fengguang.wu@intel.com>
+ *
+ * Authors: Wu Fengguang <fengguang.wu@intel.com>
+ *
+ * Released under the General Public License (GPL).
  */
 
 #define _LARGEFILE64_SOURCE
-- 
cgit v1.1


From 31bbf66eaaaf50ba79e50ab7d3c89531b31c0614 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:29 -0700
Subject: page-types: introduce checked_open()

This helps merge duplicate code (now and future) and outstand the main
logic.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index e46fb5b..6bdcc06 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -214,6 +214,18 @@ static void fatal(const char *x, ...)
 	exit(EXIT_FAILURE);
 }
 
+int checked_open(const char *pathname, int flags)
+{
+	int fd = open(pathname, flags);
+
+	if (fd < 0) {
+		perror(pathname);
+		exit(EXIT_FAILURE);
+	}
+
+	return fd;
+}
+
 
 /*
  * page flag names
@@ -534,11 +546,7 @@ static void walk_addr_ranges(void)
 {
 	int i;
 
-	kpageflags_fd = open(PROC_KPAGEFLAGS, O_RDONLY);
-	if (kpageflags_fd < 0) {
-		perror(PROC_KPAGEFLAGS);
-		exit(EXIT_FAILURE);
-	}
+	kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY);
 
 	if (!nr_addr_ranges)
 		add_addr_range(0, ULONG_MAX);
@@ -631,11 +639,7 @@ static void parse_pid(const char *str)
 	opt_pid = parse_number(str);
 
 	sprintf(buf, "/proc/%d/pagemap", opt_pid);
-	pagemap_fd = open(buf, O_RDONLY);
-	if (pagemap_fd < 0) {
-		perror(buf);
-		exit(EXIT_FAILURE);
-	}
+	pagemap_fd = checked_open(buf, O_RDONLY);
 
 	sprintf(buf, "/proc/%d/maps", opt_pid);
 	file = fopen(buf, "r");
-- 
cgit v1.1


From 4a1b6726feee3132905996ae4cd1c7dc2104e721 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:30 -0700
Subject: page-types: make standalone pagemap/kpageflags read routines

Refactor the code to be more modular and easier to reuse.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 154 ++++++++++++++++++++++++------------------
 1 file changed, 89 insertions(+), 65 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 6bdcc06..18e891a 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -161,8 +161,6 @@ static unsigned long	pg_start[MAX_VMAS];
 static unsigned long	pg_end[MAX_VMAS];
 static unsigned long	voffset;
 
-static int		pagemap_fd;
-
 #define MAX_BIT_FILTERS	64
 static int		nr_bit_filters;
 static uint64_t		opt_mask[MAX_BIT_FILTERS];
@@ -170,7 +168,7 @@ static uint64_t		opt_bits[MAX_BIT_FILTERS];
 
 static int		page_size;
 
-#define PAGES_BATCH	(64 << 10)	/* 64k pages */
+static int		pagemap_fd;
 static int		kpageflags_fd;
 
 #define HASH_SHIFT	13
@@ -226,6 +224,62 @@ int checked_open(const char *pathname, int flags)
 	return fd;
 }
 
+/*
+ * pagemap/kpageflags routines
+ */
+
+static unsigned long do_u64_read(int fd, char *name,
+				 uint64_t *buf,
+				 unsigned long index,
+				 unsigned long count)
+{
+	long bytes;
+
+	if (index > ULONG_MAX / 8)
+		fatal("index overflow: %lu\n", index);
+
+	if (lseek(fd, index * 8, SEEK_SET) < 0) {
+		perror(name);
+		exit(EXIT_FAILURE);
+	}
+
+	bytes = read(fd, buf, count * 8);
+	if (bytes < 0) {
+		perror(name);
+		exit(EXIT_FAILURE);
+	}
+	if (bytes % 8)
+		fatal("partial read: %lu bytes\n", bytes);
+
+	return bytes / 8;
+}
+
+static unsigned long kpageflags_read(uint64_t *buf,
+				     unsigned long index,
+				     unsigned long pages)
+{
+	return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages);
+}
+
+static unsigned long pagemap_read(uint64_t *buf,
+				  unsigned long index,
+				  unsigned long pages)
+{
+	return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages);
+}
+
+static unsigned long pagemap_pfn(uint64_t val)
+{
+	unsigned long pfn;
+
+	if (val & PM_PRESENT)
+		pfn = PM_PFRAME(val);
+	else
+		pfn = 0;
+
+	return pfn;
+}
+
 
 /*
  * page flag names
@@ -432,79 +486,53 @@ static void add_page(unsigned long offset, uint64_t flags)
 	total_pages++;
 }
 
+#define KPAGEFLAGS_BATCH	(64 << 10)	/* 64k pages */
 static void walk_pfn(unsigned long index, unsigned long count)
 {
+	uint64_t buf[KPAGEFLAGS_BATCH];
 	unsigned long batch;
-	unsigned long n;
+	unsigned long pages;
 	unsigned long i;
 
-	if (index > ULONG_MAX / KPF_BYTES)
-		fatal("index overflow: %lu\n", index);
-
-	lseek(kpageflags_fd, index * KPF_BYTES, SEEK_SET);
-
 	while (count) {
-		uint64_t kpageflags_buf[KPF_BYTES * PAGES_BATCH];
-
-		batch = min_t(unsigned long, count, PAGES_BATCH);
-		n = read(kpageflags_fd, kpageflags_buf, batch * KPF_BYTES);
-		if (n == 0)
+		batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH);
+		pages = kpageflags_read(buf, index, batch);
+		if (pages == 0)
 			break;
-		if (n < 0) {
-			perror(PROC_KPAGEFLAGS);
-			exit(EXIT_FAILURE);
-		}
-
-		if (n % KPF_BYTES != 0)
-			fatal("partial read: %lu bytes\n", n);
-		n = n / KPF_BYTES;
 
-		for (i = 0; i < n; i++)
-			add_page(index + i, kpageflags_buf[i]);
+		for (i = 0; i < pages; i++)
+			add_page(index + i, buf[i]);
 
-		index += batch;
-		count -= batch;
+		index += pages;
+		count -= pages;
 	}
 }
 
-
-#define PAGEMAP_BATCH	4096
-static unsigned long task_pfn(unsigned long pgoff)
+#define PAGEMAP_BATCH	(64 << 10)
+static void walk_vma(unsigned long index, unsigned long count)
 {
-	static uint64_t buf[PAGEMAP_BATCH];
-	static unsigned long start;
-	static long count;
-	uint64_t pfn;
+	uint64_t buf[PAGEMAP_BATCH];
+	unsigned long batch;
+	unsigned long pages;
+	unsigned long pfn;
+	unsigned long i;
 
-	if (pgoff < start || pgoff >= start + count) {
-		if (lseek64(pagemap_fd,
-			    (uint64_t)pgoff * PM_ENTRY_BYTES,
-			    SEEK_SET) < 0) {
-			perror("pagemap seek");
-			exit(EXIT_FAILURE);
-		}
-		count = read(pagemap_fd, buf, sizeof(buf));
-		if (count == 0)
-			return 0;
-		if (count < 0) {
-			perror("pagemap read");
-			exit(EXIT_FAILURE);
-		}
-		if (count % PM_ENTRY_BYTES) {
-			fatal("pagemap read not aligned.\n");
-			exit(EXIT_FAILURE);
-		}
-		count /= PM_ENTRY_BYTES;
-		start = pgoff;
-	}
+	while (count) {
+		batch = min_t(unsigned long, count, PAGEMAP_BATCH);
+		pages = pagemap_read(buf, index, batch);
+		if (pages == 0)
+			break;
 
-	pfn = buf[pgoff - start];
-	if (pfn & PM_PRESENT)
-		pfn = PM_PFRAME(pfn);
-	else
-		pfn = 0;
+		for (i = 0; i < pages; i++) {
+			pfn = pagemap_pfn(buf[i]);
+			voffset = index + i;
+			if (pfn)
+				walk_pfn(pfn, 1);
+		}
 
-	return pfn;
+		index += pages;
+		count -= pages;
+	}
 }
 
 static void walk_task(unsigned long index, unsigned long count)
@@ -524,11 +552,7 @@ static void walk_task(unsigned long index, unsigned long count)
 		index   = min_t(unsigned long, pg_end[i], end);
 
 		assert(voffset < index);
-		for (; voffset < index; voffset++) {
-			unsigned long pfn = task_pfn(voffset);
-			if (pfn)
-				walk_pfn(pfn, 1);
-		}
+		walk_vma(voffset, index - voffset);
 	}
 }
 
-- 
cgit v1.1


From e577ebde9fb161bdc87511763c75dfad291059e2 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:30 -0700
Subject: page-types: make voffset local variables

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 18e891a..56f33b6 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -159,7 +159,6 @@ static unsigned long	opt_size[MAX_ADDR_RANGES];
 static int		nr_vmas;
 static unsigned long	pg_start[MAX_VMAS];
 static unsigned long	pg_end[MAX_VMAS];
-static unsigned long	voffset;
 
 #define MAX_BIT_FILTERS	64
 static int		nr_bit_filters;
@@ -328,7 +327,8 @@ static char *page_flag_longname(uint64_t flags)
  * page list and summary
  */
 
-static void show_page_range(unsigned long offset, uint64_t flags)
+static void show_page_range(unsigned long voffset,
+			    unsigned long offset, uint64_t flags)
 {
 	static uint64_t      flags0;
 	static unsigned long voff;
@@ -354,7 +354,8 @@ static void show_page_range(unsigned long offset, uint64_t flags)
 	count  = 1;
 }
 
-static void show_page(unsigned long offset, uint64_t flags)
+static void show_page(unsigned long voffset,
+		      unsigned long offset, uint64_t flags)
 {
 	if (opt_pid)
 		printf("%lx\t", voffset);
@@ -435,7 +436,6 @@ static uint64_t well_known_flags(uint64_t flags)
 	return flags;
 }
 
-
 /*
  * page frame walker
  */
@@ -467,7 +467,8 @@ static int hash_slot(uint64_t flags)
 	exit(EXIT_FAILURE);
 }
 
-static void add_page(unsigned long offset, uint64_t flags)
+static void add_page(unsigned long voffset,
+		     unsigned long offset, uint64_t flags)
 {
 	flags = expand_overloaded_flags(flags);
 
@@ -478,16 +479,18 @@ static void add_page(unsigned long offset, uint64_t flags)
 		return;
 
 	if (opt_list == 1)
-		show_page_range(offset, flags);
+		show_page_range(voffset, offset, flags);
 	else if (opt_list == 2)
-		show_page(offset, flags);
+		show_page(voffset, offset, flags);
 
 	nr_pages[hash_slot(flags)]++;
 	total_pages++;
 }
 
 #define KPAGEFLAGS_BATCH	(64 << 10)	/* 64k pages */
-static void walk_pfn(unsigned long index, unsigned long count)
+static void walk_pfn(unsigned long voffset,
+		     unsigned long index,
+		     unsigned long count)
 {
 	uint64_t buf[KPAGEFLAGS_BATCH];
 	unsigned long batch;
@@ -501,7 +504,7 @@ static void walk_pfn(unsigned long index, unsigned long count)
 			break;
 
 		for (i = 0; i < pages; i++)
-			add_page(index + i, buf[i]);
+			add_page(voffset + i, index + i, buf[i]);
 
 		index += pages;
 		count -= pages;
@@ -525,9 +528,8 @@ static void walk_vma(unsigned long index, unsigned long count)
 
 		for (i = 0; i < pages; i++) {
 			pfn = pagemap_pfn(buf[i]);
-			voffset = index + i;
 			if (pfn)
-				walk_pfn(pfn, 1);
+				walk_pfn(index + i, pfn, 1);
 		}
 
 		index += pages;
@@ -537,8 +539,9 @@ static void walk_vma(unsigned long index, unsigned long count)
 
 static void walk_task(unsigned long index, unsigned long count)
 {
-	int i = 0;
 	const unsigned long end = index + count;
+	unsigned long start;
+	int i = 0;
 
 	while (index < end) {
 
@@ -548,11 +551,11 @@ static void walk_task(unsigned long index, unsigned long count)
 		if (pg_start[i] >= end)
 			return;
 
-		voffset = max_t(unsigned long, pg_start[i], index);
-		index   = min_t(unsigned long, pg_end[i], end);
+		start = max_t(unsigned long, pg_start[i], index);
+		index = min_t(unsigned long, pg_end[i], end);
 
-		assert(voffset < index);
-		walk_vma(voffset, index - voffset);
+		assert(start < index);
+		walk_vma(start, index - start);
 	}
 }
 
@@ -577,7 +580,7 @@ static void walk_addr_ranges(void)
 
 	for (i = 0; i < nr_addr_ranges; i++)
 		if (!opt_pid)
-			walk_pfn(opt_offset[i], opt_size[i]);
+			walk_pfn(0, opt_offset[i], opt_size[i]);
 		else
 			walk_task(opt_offset[i], opt_size[i]);
 
@@ -879,7 +882,7 @@ int main(int argc, char *argv[])
 	walk_addr_ranges();
 
 	if (opt_list == 1)
-		show_page_range(0, 0);  /* drain the buffer */
+		show_page_range(0, 0, 0);  /* drain the buffer */
 
 	if (opt_no_summary)
 		return 0;
-- 
cgit v1.1


From 48640d69f5f06fe951a4de065a7f374cb9ee6040 Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:31 -0700
Subject: page-types: introduce kpageflags_flags()

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 56f33b6..11b9a03 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -436,6 +436,16 @@ static uint64_t well_known_flags(uint64_t flags)
 	return flags;
 }
 
+static uint64_t kpageflags_flags(uint64_t flags)
+{
+	flags = expand_overloaded_flags(flags);
+
+	if (!opt_raw)
+		flags = well_known_flags(flags);
+
+	return flags;
+}
+
 /*
  * page frame walker
  */
@@ -470,10 +480,7 @@ static int hash_slot(uint64_t flags)
 static void add_page(unsigned long voffset,
 		     unsigned long offset, uint64_t flags)
 {
-	flags = expand_overloaded_flags(flags);
-
-	if (!opt_raw)
-		flags = well_known_flags(flags);
+	flags = kpageflags_flags(flags);
 
 	if (!bit_mask_ok(flags))
 		return;
-- 
cgit v1.1


From a54fed9f70a2765f4476e1ce3d691a2f31df258f Mon Sep 17 00:00:00 2001
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Wed, 7 Oct 2009 16:32:32 -0700
Subject: page-types: add hwpoison/unpoison feature

For hwpoison stress testing.  The debugfs mount point is assumed to be
/debug/.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/vm/page-types.c | 73 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 11b9a03..3ec4f2a 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -170,6 +170,13 @@ static int		page_size;
 static int		pagemap_fd;
 static int		kpageflags_fd;
 
+static int		opt_hwpoison;
+static int		opt_unpoison;
+
+static char		*hwpoison_debug_fs = "/debug/hwpoison";
+static int		hwpoison_inject_fd;
+static int		hwpoison_forget_fd;
+
 #define HASH_SHIFT	13
 #define HASH_SIZE	(1 << HASH_SHIFT)
 #define HASH_MASK	(HASH_SIZE - 1)
@@ -447,6 +454,53 @@ static uint64_t kpageflags_flags(uint64_t flags)
 }
 
 /*
+ * page actions
+ */
+
+static void prepare_hwpoison_fd(void)
+{
+	char buf[100];
+
+	if (opt_hwpoison && !hwpoison_inject_fd) {
+		sprintf(buf, "%s/corrupt-pfn", hwpoison_debug_fs);
+		hwpoison_inject_fd = checked_open(buf, O_WRONLY);
+	}
+
+	if (opt_unpoison && !hwpoison_forget_fd) {
+		sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs);
+		hwpoison_forget_fd = checked_open(buf, O_WRONLY);
+	}
+}
+
+static int hwpoison_page(unsigned long offset)
+{
+	char buf[100];
+	int len;
+
+	len = sprintf(buf, "0x%lx\n", offset);
+	len = write(hwpoison_inject_fd, buf, len);
+	if (len < 0) {
+		perror("hwpoison inject");
+		return len;
+	}
+	return 0;
+}
+
+static int unpoison_page(unsigned long offset)
+{
+	char buf[100];
+	int len;
+
+	len = sprintf(buf, "0x%lx\n", offset);
+	len = write(hwpoison_forget_fd, buf, len);
+	if (len < 0) {
+		perror("hwpoison forget");
+		return len;
+	}
+	return 0;
+}
+
+/*
  * page frame walker
  */
 
@@ -485,6 +539,11 @@ static void add_page(unsigned long voffset,
 	if (!bit_mask_ok(flags))
 		return;
 
+	if (opt_hwpoison)
+		hwpoison_page(offset);
+	if (opt_unpoison)
+		unpoison_page(offset);
+
 	if (opt_list == 1)
 		show_page_range(voffset, offset, flags);
 	else if (opt_list == 2)
@@ -624,6 +683,8 @@ static void usage(void)
 "            -l|--list                 Show page details in ranges\n"
 "            -L|--list-each            Show page details one by one\n"
 "            -N|--no-summary           Don't show summay info\n"
+"            -X|--hwpoison             hwpoison pages\n"
+"            -x|--unpoison             unpoison pages\n"
 "            -h|--help                 Show this usage message\n"
 "addr-spec:\n"
 "            N                         one page at offset N (unit: pages)\n"
@@ -833,6 +894,8 @@ static struct option opts[] = {
 	{ "list"      , 0, NULL, 'l' },
 	{ "list-each" , 0, NULL, 'L' },
 	{ "no-summary", 0, NULL, 'N' },
+	{ "hwpoison"  , 0, NULL, 'X' },
+	{ "unpoison"  , 0, NULL, 'x' },
 	{ "help"      , 0, NULL, 'h' },
 	{ NULL        , 0, NULL, 0 }
 };
@@ -844,7 +907,7 @@ int main(int argc, char *argv[])
 	page_size = getpagesize();
 
 	while ((c = getopt_long(argc, argv,
-				"rp:f:a:b:lLNh", opts, NULL)) != -1) {
+				"rp:f:a:b:lLNXxh", opts, NULL)) != -1) {
 		switch (c) {
 		case 'r':
 			opt_raw = 1;
@@ -870,6 +933,14 @@ int main(int argc, char *argv[])
 		case 'N':
 			opt_no_summary = 1;
 			break;
+		case 'X':
+			opt_hwpoison = 1;
+			prepare_hwpoison_fd();
+			break;
+		case 'x':
+			opt_unpoison = 1;
+			prepare_hwpoison_fd();
+			break;
 		case 'h':
 			usage();
 			exit(0);
-- 
cgit v1.1


From d0153ca35d344d9b640dc305031b0703ba3f30f0 Mon Sep 17 00:00:00 2001
From: Alok Kataria <akataria@vmware.com>
Date: Tue, 29 Sep 2009 10:25:24 -0700
Subject: x86, vmi: Mark VMI deprecated and schedule it for removal

Add text in feature-removal.txt indicating that VMI will be removed in
the 2.6.37 timeframe.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1254193238.13456.48.camel@ank32.eng.vmware.com>
[ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/feature-removal-schedule.txt | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..04e6c81 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -451,3 +451,33 @@ Why:	OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
 	will also allow making ALSA OSS emulation independent of
 	sound_core.  The dependency will be broken then too.
 Who:	Tejun Heo <tj@kernel.org>
+
+----------------------------
+
+What:	Support for VMware's guest paravirtuliazation technique [VMI] will be
+	dropped.
+When:	2.6.37 or earlier.
+Why:	With the recent innovations in CPU hardware acceleration technologies
+	from Intel and AMD, VMware ran a few experiments to compare these
+	techniques to guest paravirtualization technique on VMware's platform.
+	These hardware assisted virtualization techniques have outperformed the
+	performance benefits provided by VMI in most of the workloads. VMware
+	expects that these hardware features will be ubiquitous in a couple of
+	years, as a result, VMware has started a phased retirement of this
+	feature from the hypervisor. We will be removing this feature from the
+	Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
+	technical reasons (read opportunity to remove major chunk of pvops)
+	arise.
+
+	Please note that VMI has always been an optimization and non-VMI kernels
+	still work fine on VMware's platform.
+	Latest versions of VMware's product which support VMI are,
+	Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
+	releases for these products will continue supporting VMI.
+
+	For more details about VMI retirement take a look at this,
+	http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
+
+Who:	Alok N Kataria <akataria@vmware.com>
+
+----------------------------
-- 
cgit v1.1


From 6dbce5218298c0c77a740b3ffe97cb8c304e1710 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 17 Sep 2009 17:37:12 +0200
Subject: ext3: Update documentation about ext3 quota mount options

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/filesystems/ext3.txt | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 570f9bd..05d5cf1 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -123,10 +123,18 @@ resuid=n		The user ID which may use the reserved blocks.
 
 sb=n			Use alternate superblock at this location.
 
-quota
-noquota
-grpquota
-usrquota
+quota			These options are ignored by the filesystem. They
+noquota			are used only by quota tools to recognize volumes
+grpquota		where quota should be turned on. See documentation
+usrquota		in the quota-tools package for more details
+			(http://sourceforge.net/projects/linuxquota).
+
+jqfmt=<quota type>	These options tell filesystem details about quota
+usrjquota=<file>	so that quota information can be properly updated
+grpjquota=<file>	during journal replay. They replace the above
+			quota options. See documentation in the quota-tools
+			package for more details
+			(http://sourceforge.net/projects/linuxquota).
 
 bh		(*)	ext3 associates buffer heads to data pages to
 nobh			(a) cache disk block mapping information
-- 
cgit v1.1


From 54930531a00af5a1c33361a02e67dd1802110465 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai@suse.de>
Date: Sun, 11 Oct 2009 17:38:29 +0200
Subject: ALSA: hda - Fix mute sound with STAC9227/9228 codecs

On FSC laptops, the sound gets muted gradually when the volume is chnaged.
This is due to the wrong volume-knob widget setup.  The delta bit (bit 7)
shouldn't be set for these devices.

This patch adds a new quirk to set the value 0x7f to the widget 0x24
instead of 0xff.

Reference: Novell bnc#546006
	http://bugzilla.novell.com/show_bug.cgi?id=546006

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 Documentation/sound/alsa/HD-Audio-Models.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index a2643cf..4bf953b 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -359,6 +359,7 @@ STAC9227/9228/9229/927x
   5stack-no-fp	D965 5stack without front panel
   dell-3stack	Dell Dimension E520
   dell-bios	Fixes with Dell BIOS setup
+  volknob	Fixes with volume-knob widget 0x24
   auto		BIOS setup (default)
 
 STAC92HD71B*
-- 
cgit v1.1


From 99b830aa553668a2c433e4cbff130224a75c74bb Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel@csr.com>
Date: Mon, 12 Oct 2009 15:45:13 +0000
Subject: USB: rename Documentation/ABI/.../sysfs-class-usb_host

The usb_host class is no more.  Rename its documentation file (which
only contained WUSB specific files) to .../sysfs-class-uwb_rc-wusbhc.

Signed-off-by: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-class-usb_host     | 25 ----------------------
 .../ABI/testing/sysfs-class-uwb_rc-wusbhc          | 25 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 25 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-class-usb_host
 create mode 100644 Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-class-usb_host b/Documentation/ABI/testing/sysfs-class-usb_host
deleted file mode 100644
index 46b66ad1..0000000
--- a/Documentation/ABI/testing/sysfs-class-usb_host
+++ /dev/null
@@ -1,25 +0,0 @@
-What:           /sys/class/usb_host/usb_hostN/wusb_chid
-Date:           July 2008
-KernelVersion:  2.6.27
-Contact:        David Vrabel <david.vrabel@csr.com>
-Description:
-                Write the CHID (16 space-separated hex octets) for this host controller.
-                This starts the host controller, allowing it to accept connection from
-                WUSB devices.
-
-                Set an all zero CHID to stop the host controller.
-
-What:           /sys/class/usb_host/usb_hostN/wusb_trust_timeout
-Date:           July 2008
-KernelVersion:  2.6.27
-Contact:        David Vrabel <david.vrabel@csr.com>
-Description:
-                Devices that haven't sent a WUSB packet to the host
-                within 'wusb_trust_timeout' ms are considered to have
-                disconnected and are removed.  The default value of
-                4000 ms is the value required by the WUSB
-                specification.
-
-                Since this relates to security (specifically, the
-                lifetime of PTKs and GTKs) it should not be changed
-                from the default.
diff --git a/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc
new file mode 100644
index 0000000..4e8106f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-uwb_rc-wusbhc
@@ -0,0 +1,25 @@
+What:           /sys/class/uwb_rc/uwbN/wusbhc/wusb_chid
+Date:           July 2008
+KernelVersion:  2.6.27
+Contact:        David Vrabel <david.vrabel@csr.com>
+Description:
+                Write the CHID (16 space-separated hex octets) for this host controller.
+                This starts the host controller, allowing it to accept connection from
+                WUSB devices.
+
+                Set an all zero CHID to stop the host controller.
+
+What:           /sys/class/uwb_rc/uwbN/wusbhc/wusb_trust_timeout
+Date:           July 2008
+KernelVersion:  2.6.27
+Contact:        David Vrabel <david.vrabel@csr.com>
+Description:
+                Devices that haven't sent a WUSB packet to the host
+                within 'wusb_trust_timeout' ms are considered to have
+                disconnected and are removed.  The default value of
+                4000 ms is the value required by the WUSB
+                specification.
+
+                Since this relates to security (specifically, the
+                lifetime of PTKs and GTKs) it should not be changed
+                from the default.
-- 
cgit v1.1


From 1243ba98e3abecd12e9e90dd390801b95ef070f2 Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Wed, 14 Oct 2009 12:43:22 -0600
Subject: Update flex_arrays.txt

The 2.6.32 merge window brought a number of changes to the flexible array
API; this patch updates the documentation to match the new state of
affairs.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/flexible-arrays.txt | 43 +++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 11 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt
index 84eb268..cb8a3a0 100644
--- a/Documentation/flexible-arrays.txt
+++ b/Documentation/flexible-arrays.txt
@@ -1,5 +1,5 @@
 Using flexible arrays in the kernel
-Last updated for 2.6.31
+Last updated for 2.6.32
 Jonathan Corbet <corbet@lwn.net>
 
 Large contiguous memory allocations can be unreliable in the Linux kernel.
@@ -40,6 +40,13 @@ argument is passed directly to the internal memory allocation calls.  With
 the current code, using flags to ask for high memory is likely to lead to
 notably unpleasant side effects.
 
+It is also possible to define flexible arrays at compile time with:
+
+    DEFINE_FLEX_ARRAY(name, element_size, total);
+
+This macro will result in a definition of an array with the given name; the
+element size and total will be checked for validity at compile time.
+
 Storing data into a flexible array is accomplished with a call to:
 
     int flex_array_put(struct flex_array *array, unsigned int element_nr,
@@ -76,16 +83,30 @@ particular element has never been allocated.
 Note that it is possible to get back a valid pointer for an element which
 has never been stored in the array.  Memory for array elements is allocated
 one page at a time; a single allocation could provide memory for several
-adjacent elements.  The flexible array code does not know if a specific
-element has been written; it only knows if the associated memory is
-present.  So a flex_array_get() call on an element which was never stored
-in the array has the potential to return a pointer to random data.  If the
-caller does not have a separate way to know which elements were actually
-stored, it might be wise, at least, to add GFP_ZERO to the flags argument
-to ensure that all elements are zeroed.
-
-There is no way to remove a single element from the array.  It is possible,
-though, to remove all elements with a call to:
+adjacent elements.  Flexible array elements are normally initialized to the
+value FLEX_ARRAY_FREE (defined as 0x6c in <linux/poison.h>), so errors
+involving that number probably result from use of unstored array entries.
+Note that, if array elements are allocated with __GFP_ZERO, they will be
+initialized to zero and this poisoning will not happen.
+
+Individual elements in the array can be cleared with:
+
+    int flex_array_clear(struct flex_array *array, unsigned int element_nr);
+
+This function will set the given element to FLEX_ARRAY_FREE and return
+zero.  If storage for the indicated element is not allocated for the array,
+flex_array_clear() will return -EINVAL instead.  Note that clearing an
+element does not release the storage associated with it; to reduce the
+allocated size of an array, call:
+
+    int flex_array_shrink(struct flex_array *array);
+
+The return value will be the number of pages of memory actually freed.
+This function works by scanning the array for pages containing nothing but
+FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work
+if the array's pages are allocated with __GFP_ZERO.
+
+It is possible to remove all elements of an array with a call to:
 
     void flex_array_free_parts(struct flex_array *array);
 
-- 
cgit v1.1


From cdc321ff0af78e818c97d4787f62bf52bdf9db2a Mon Sep 17 00:00:00 2001
From: Eric Paris <eparis@redhat.com>
Date: Mon, 29 Jun 2009 11:13:30 -0400
Subject: inotify: deprecate the inotify kernel interface

In 2.6.33 there will be no users of the inotify interface.  Mark it for
removal as fsnotify is more generic and is easier to use.

Signed-off-by: Eric Paris <eparis@redhat.com>
---
 Documentation/feature-removal-schedule.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 04e6c81..bc693ff 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -418,6 +418,14 @@ When:	2.6.33
 Why:	Should be implemented in userspace, policy daemon.
 Who:	Johannes Berg <johannes@sipsolutions.net>
 
+---------------------------
+
+What:	CONFIG_INOTIFY
+When:	2.6.33
+Why:	last user (audit) will be converted to the newer more generic
+	and more easily maintained fsnotify subsystem
+Who:	Eric Paris <eparis@redhat.com>
+
 ----------------------------
 
 What:	lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
-- 
cgit v1.1


From e95646c3ec33c8ec0693992da4332a6b32eb7e31 Mon Sep 17 00:00:00 2001
From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Wed, 30 Sep 2009 11:17:21 +0200
Subject: virtio: let header files include virtio_ids.h

Rusty,

commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 Documentation/lguest/lguest.c | 1 -
 1 file changed, 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index ba9373f..098de5b 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -42,7 +42,6 @@
 #include <signal.h>
 #include "linux/lguest_launcher.h"
 #include "linux/virtio_config.h"
-#include <linux/virtio_ids.h>
 #include "linux/virtio_net.h"
 #include "linux/virtio_blk.h"
 #include "linux/virtio_console.h"
-- 
cgit v1.1


From 67b394f7f26d84edb7294cc6528ab7ca6daa2ad1 Mon Sep 17 00:00:00 2001
From: Jiri Olsa <jolsa@redhat.com>
Date: Fri, 23 Oct 2009 19:36:18 -0400
Subject: tracing: Fix comment typo and documentation example

Trivial patch to fix a documentation example and to fix a
comment.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.871719877@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 Documentation/trace/ftrace.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 957b22f..8179692 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1231,6 +1231,7 @@ something like this simple program:
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <string.h>
 
 #define _STR(x) #x
 #define STR(x) _STR(x)
@@ -1265,6 +1266,7 @@ const char *find_debugfs(void)
                return NULL;
        }
 
+       strcat(debugfs, "/tracing/");
        debugfs_found = 1;
 
        return debugfs;
-- 
cgit v1.1


From 115a57c5b31ab560574fe1a09deaba2ae89e77b5 Mon Sep 17 00:00:00 2001
From: "Darrick J. Wong" <djwong@us.ibm.com>
Date: Mon, 26 Oct 2009 16:50:07 -0700
Subject: hwmon: enhance the sysfs API for power meters

Augment the documentation of the hwmon sysfs API to accomodate ACPI power
meters and the current desired behavior of power capping hardware drivers.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/hwmon/sysfs-interface | 57 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index dcbd502..82def88 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -353,10 +353,20 @@ power[1-*]_average		Average power use
 				Unit: microWatt
 				RO
 
-power[1-*]_average_interval	Power use averaging interval
+power[1-*]_average_interval	Power use averaging interval.  A poll
+				notification is sent to this file if the
+				hardware changes the averaging interval.
 				Unit: milliseconds
 				RW
 
+power[1-*]_average_interval_max	Maximum power use averaging interval
+				Unit: milliseconds
+				RO
+
+power[1-*]_average_interval_min	Minimum power use averaging interval
+				Unit: milliseconds
+				RO
+
 power[1-*]_average_highest	Historical average maximum power use
 				Unit: microWatt
 				RO
@@ -365,6 +375,18 @@ power[1-*]_average_lowest	Historical average minimum power use
 				Unit: microWatt
 				RO
 
+power[1-*]_average_max		A poll notification is sent to
+				power[1-*]_average when power use
+				rises above this value.
+				Unit: microWatt
+				RW
+
+power[1-*]_average_min		A poll notification is sent to
+				power[1-*]_average when power use
+				sinks below this value.
+				Unit: microWatt
+				RW
+
 power[1-*]_input		Instantaneous power use
 				Unit: microWatt
 				RO
@@ -381,6 +403,39 @@ power[1-*]_reset_history	Reset input_highest, input_lowest,
 				average_highest and average_lowest.
 				WO
 
+power[1-*]_accuracy		Accuracy of the power meter.
+				Unit: Percent
+				RO
+
+power[1-*]_alarm		1 if the system is drawing more power than the
+				cap allows; 0 otherwise.  A poll notification is
+				sent to this file when the power use exceeds the
+				cap.  This file only appears if the cap is known
+				to be enforced by hardware.
+				RO
+
+power[1-*]_cap			If power use rises above this limit, the
+				system should take action to reduce power use.
+				A poll notification is sent to this file if the
+				cap is changed by the hardware.  The *_cap
+				files only appear if the cap is known to be
+				enforced by hardware.
+				Unit: microWatt
+				RW
+
+power[1-*]_cap_hyst		Margin of hysteresis built around capping and
+				notification.
+				Unit: microWatt
+				RW
+
+power[1-*]_cap_max		Maximum cap that can be set.
+				Unit: microWatt
+				RO
+
+power[1-*]_cap_min		Minimum cap that can be set.
+				Unit: microWatt
+				RO
+
 **********
 * Energy *
 **********
-- 
cgit v1.1


From 468727ab1273a0f95562befa611a3ce39778599c Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:15 -0600
Subject: Documentation: ABI: rename sysfs-devices-cache_disable properly

Rename sysfs-devices-cache_disable to sysfs-devices-system-cpu, in
order to keep a stricter correlation between a sysfs directory and
its documentation.

Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-cache_disable | 18 ------------------
 Documentation/ABI/testing/sysfs-devices-system-cpu    | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 18 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-devices-cache_disable
 create mode 100644 Documentation/ABI/testing/sysfs-devices-system-cpu

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-cache_disable b/Documentation/ABI/testing/sysfs-devices-cache_disable
deleted file mode 100644
index 175bb4f..0000000
--- a/Documentation/ABI/testing/sysfs-devices-cache_disable
+++ /dev/null
@@ -1,18 +0,0 @@
-What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
-Date:      August 2008
-KernelVersion:	2.6.27
-Contact:	mark.langsdorf@amd.com
-Description:	These files exist in every cpu's cache index directories.
-		There are currently 2 cache_disable_# files in each
-		directory.  Reading from these files on a supported
-		processor will return that cache disable index value
-		for that processor and node.  Writing to one of these
-		files will cause the specificed cache index to be disabled.
-
-		Currently, only AMD Family 10h Processors support cache index
-		disable, and only for their L3 caches.  See the BIOS and
-		Kernel Developer's Guide at
-		http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
-		for formatting information and other details on the
-		cache index disable.
-Users:    joachim.deguara@amd.com
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
new file mode 100644
index 0000000..175bb4f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -0,0 +1,18 @@
+What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
+Date:      August 2008
+KernelVersion:	2.6.27
+Contact:	mark.langsdorf@amd.com
+Description:	These files exist in every cpu's cache index directories.
+		There are currently 2 cache_disable_# files in each
+		directory.  Reading from these files on a supported
+		processor will return that cache disable index value
+		for that processor and node.  Writing to one of these
+		files will cause the specificed cache index to be disabled.
+
+		Currently, only AMD Family 10h Processors support cache index
+		disable, and only for their L3 caches.  See the BIOS and
+		Kernel Developer's Guide at
+		http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
+		for formatting information and other details on the
+		cache index disable.
+Users:    joachim.deguara@amd.com
-- 
cgit v1.1


From 2ceb3fb0a78c671892b01319ac8e3baede33a78c Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:20 -0600
Subject: Documentation: ABI: document /sys/devices/system/cpu/

This interface has been around for a long time, but hasn't been
officially documented.

Document the top level sysfs directory for CPU attributes.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 175bb4f..86126b1 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -1,3 +1,15 @@
+What:		/sys/devices/system/cpu/
+Date:		pre-git history
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+		A collection of both global and individual CPU attributes
+
+		Individual CPU attributes are contained in subdirectories
+		named by the kernel's logical CPU number, e.g.:
+
+		/sys/devices/system/cpu/cpu#/
+
+
 What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
 Date:      August 2008
 KernelVersion:	2.6.27
-- 
cgit v1.1


From d93fc863d2d2cea1057996c39cef368f41741448 Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:25 -0600
Subject: Documentation: ABI: /sys/devices/system/cpu/ topology files

Add brief descriptions for the following sysfs files:

	/sys/devices/system/cpu/kernel_max
	/sys/devices/system/cpu/offline
	/sys/devices/system/cpu/online
	/sys/devices/system/cpu/possible
	/sys/devices/system/cpu/present

Excerpted the relevant information from Documentation/cputopology.txt
and pointed back to cputopology.txt as the authoritative source of
information.

Cc: Mike Travis <travis@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 86126b1..871acdb 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -10,6 +10,34 @@ Description:
 		/sys/devices/system/cpu/cpu#/
 
 
+What:		/sys/devices/system/cpu/kernel_max
+		/sys/devices/system/cpu/offline
+		/sys/devices/system/cpu/online
+		/sys/devices/system/cpu/possible
+		/sys/devices/system/cpu/present
+Date:		December 2008
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	CPU topology files that describe kernel limits related to
+		hotplug. Briefly:
+
+		kernel_max: the maximum cpu index allowed by the kernel
+		configuration.
+
+		offline: cpus that are not online because they have been
+		HOTPLUGGED off or exceed the limit of cpus allowed by the
+		kernel configuration (kernel_max above).
+
+		online: cpus that are online and being scheduled.
+
+		possible: cpus that have been allocated resources and can be
+		brought online if they are present.
+
+		present: cpus that have been identified as being present in
+		the system.
+
+		See Documentation/cputopology.txt for more information.
+
+
 What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
 Date:      August 2008
 KernelVersion:	2.6.27
-- 
cgit v1.1


From 663fb2fc733006f685400fb44551303b72b61a88 Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:31 -0600
Subject: Documentation: ABI: /sys/devices/system/cpu/cpu#/ topology files

Add brief descriptions for the following sysfs files:

	/sys/devices/system/cpu/cpu#/topology/core_id
	/sys/devices/system/cpu/cpu#/topology/core_siblings
	/sys/devices/system/cpu/cpu#/topology/core_siblings_list
	/sys/devices/system/cpu/cpu#/topology/physical_package_id
	/sys/devices/system/cpu/cpu#/topology/thread_siblings
	/sys/devices/system/cpu/cpu#/topology/thread_siblings_list

The descriptions in Documentation/cputopology.txt weren't very
informative, so I attempted a better description based on code
reading and hopeful guessing.

Updated Documentation/cputopology.txt with the better descriptions and
fixed some style issues.

Cc: Mike Travis <travis@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++++++++++++++++++
 Documentation/cputopology.txt                      | 47 ++++++++++++++--------
 2 files changed, 69 insertions(+), 17 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 871acdb..2ade5c0 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -38,6 +38,45 @@ Description:	CPU topology files that describe kernel limits related to
 		See Documentation/cputopology.txt for more information.
 
 
+What:		/sys/devices/system/cpu/cpu#/topology/core_id
+		/sys/devices/system/cpu/cpu#/topology/core_siblings
+		/sys/devices/system/cpu/cpu#/topology/core_siblings_list
+		/sys/devices/system/cpu/cpu#/topology/physical_package_id
+		/sys/devices/system/cpu/cpu#/topology/thread_siblings
+		/sys/devices/system/cpu/cpu#/topology/thread_siblings_list
+Date:		December 2008
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	CPU topology files that describe a logical CPU's relationship
+		to other cores and threads in the same physical package.
+
+		One cpu# directory is created per logical CPU in the system,
+		e.g. /sys/devices/system/cpu/cpu42/.
+
+		Briefly, the files above are:
+
+		core_id: the CPU core ID of cpu#. Typically it is the
+		hardware platform's identifier (rather than the kernel's).
+		The actual value is architecture and platform dependent.
+
+		core_siblings: internal kernel map of cpu#'s hardware threads
+		within the same physical_package_id.
+
+		core_siblings_list: human-readable list of the logical CPU
+		numbers within the same physical_package_id as cpu#.
+
+		physical_package_id: physical package id of cpu#. Typically
+		corresponds to a physical socket number, but the actual value
+		is architecture and platform dependent.
+
+		thread_siblings: internel kernel map of cpu#'s hardware
+		threads within the same core as cpu#
+
+		thread_siblings_list: human-readable list of cpu#'s hardware
+		threads within the same core as cpu#
+
+		See Documentation/cputopology.txt for more information.
+
+
 What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
 Date:      August 2008
 KernelVersion:	2.6.27
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index b41f3e5..f1c5c4b 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -1,15 +1,28 @@
 
-Export cpu topology info via sysfs. Items (attributes) are similar
+Export CPU topology info via sysfs. Items (attributes) are similar
 to /proc/cpuinfo.
 
 1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
-represent the physical package id of  cpu X;
+
+	physical package id of cpuX. Typically corresponds to a physical
+	socket number, but the actual value is architecture and platform
+	dependent.
+
 2) /sys/devices/system/cpu/cpuX/topology/core_id:
-represent the cpu core id to cpu X;
+
+	the CPU core ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
 3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
-represent the thread siblings to cpu X in the same core;
+
+	internel kernel map of cpuX's hardware threads within the same
+	core as cpuX
+
 4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
-represent the thread siblings to cpu X in the same physical package;
+
+	internal kernel map of cpuX's hardware threads within the same
+	physical_package_id.
 
 To implement it in an architecture-neutral way, a new source file,
 drivers/base/topology.c, is to export the 4 attributes.
@@ -32,32 +45,32 @@ not defined by include/asm-XXX/topology.h:
 3) thread_siblings: just the given CPU
 4) core_siblings: just the given CPU
 
-Additionally, cpu topology information is provided under
+Additionally, CPU topology information is provided under
 /sys/devices/system/cpu and includes these files.  The internal
 source for the output is in brackets ("[]").
 
-    kernel_max: the maximum cpu index allowed by the kernel configuration.
+    kernel_max: the maximum CPU index allowed by the kernel configuration.
 		[NR_CPUS-1]
 
-    offline:	cpus that are not online because they have been
+    offline:	CPUs that are not online because they have been
 		HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
-		of cpus allowed by the kernel configuration (kernel_max
+		of CPUs allowed by the kernel configuration (kernel_max
 		above). [~cpu_online_mask + cpus >= NR_CPUS]
 
-    online:	cpus that are online and being scheduled [cpu_online_mask]
+    online:	CPUs that are online and being scheduled [cpu_online_mask]
 
-    possible:	cpus that have been allocated resources and can be
+    possible:	CPUs that have been allocated resources and can be
 		brought online if they are present. [cpu_possible_mask]
 
-    present:	cpus that have been identified as being present in the
+    present:	CPUs that have been identified as being present in the
 		system. [cpu_present_mask]
 
 The format for the above output is compatible with cpulist_parse()
 [see <linux/cpumask.h>].  Some examples follow.
 
-In this example, there are 64 cpus in the system but cpus 32-63 exceed
+In this example, there are 64 CPUs in the system but cpus 32-63 exceed
 the kernel max which is limited to 0..31 by the NR_CPUS config option
-being 32.  Note also that cpus 2 and 4-31 are not online but could be
+being 32.  Note also that CPUs 2 and 4-31 are not online but could be
 brought online as they are both present and possible.
 
      kernel_max: 31
@@ -67,8 +80,8 @@ brought online as they are both present and possible.
         present: 0-31
 
 In this example, the NR_CPUS config option is 128, but the kernel was
-started with possible_cpus=144.  There are 4 cpus in the system and cpu2
-was manually taken offline (and is the only cpu that can be brought
+started with possible_cpus=144.  There are 4 CPUs in the system and cpu2
+was manually taken offline (and is the only CPU that can be brought
 online.)
 
      kernel_max: 127
@@ -78,4 +91,4 @@ online.)
         present: 0-3
 
 See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
-as well as more information on the various cpumask's.
+as well as more information on the various cpumasks.
-- 
cgit v1.1


From e6dcfa7c61c4d31797a12d738bfe0bdec0ca2be1 Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:36 -0600
Subject: Documentation: ABI:
 /sys/devices/system/cpu/sched_[mc|smt]_power_savings

Document sched_[mc|smt]_power_savings by reading existing code and
git logs.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 2ade5c0..8cdda1c 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -9,6 +9,30 @@ Description:
 
 		/sys/devices/system/cpu/cpu#/
 
+What:		/sys/devices/system/cpu/sched_mc_power_savings
+		/sys/devices/system/cpu/sched_smt_power_savings
+Date:		June 2006
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Discover and adjust the kernel's multi-core scheduler support.
+
+		Possible values are:
+
+		0 - No power saving load balance (default value)
+		1 - Fill one thread/core/package first for long running threads
+		2 - Also bias task wakeups to semi-idle cpu package for power
+		    savings
+
+		sched_mc_power_savings is dependent upon SCHED_MC, which is
+		itself architecture dependent.
+
+		sched_smt_power_savings is dependent upon SCHED_SMT, which
+		is itself architecture dependent.
+
+		The two files are independent of each other. It is possible
+		that one file may be present without the other.
+
+		Introduced by git commit 5c45bf27.
+
 
 What:		/sys/devices/system/cpu/kernel_max
 		/sys/devices/system/cpu/offline
-- 
cgit v1.1


From c1fb5c475126b77b47ba762f5b48535cd0420d24 Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 21:45:41 -0600
Subject: Documentation: ABI: /sys/devices/system/cpu/cpuidle/

Document cpuidle sysfs attributes by reading code, Documentation/cpuidle/,
and git logs.

Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 8cdda1c..968d8ba 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -101,6 +101,26 @@ Description:	CPU topology files that describe a logical CPU's relationship
 		See Documentation/cputopology.txt for more information.
 
 
+What:		/sys/devices/system/cpu/cpuidle/current_driver
+		/sys/devices/system/cpu/cpuidle/current_governer_ro
+Date:		September 2007
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Discover cpuidle policy and mechanism
+
+		Various CPUs today support multiple idle levels that are
+		differentiated by varying exit latencies and power
+		consumption during idle.
+
+		Idle policy (governor) is differentiated from idle mechanism
+		(driver)
+
+		current_driver: displays current idle mechanism
+
+		current_governor_ro: displays current idle policy
+
+		See files in Documentation/cpuidle/ for more information.
+
+
 What:      /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
 Date:      August 2008
 KernelVersion:	2.6.27
-- 
cgit v1.1


From 657348a056eea4a27be20cf8e22c98a252597447 Mon Sep 17 00:00:00 2001
From: Alex Chiang <achiang@hp.com>
Date: Wed, 21 Oct 2009 22:15:30 -0600
Subject: Documentation: ABI: /sys/devices/system/cpu/cpu#/node

Describe NUMA node symlink created for CPUs when CONFIG_NUMA is set.

Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 968d8ba..a703b9e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -62,6 +62,21 @@ Description:	CPU topology files that describe kernel limits related to
 		See Documentation/cputopology.txt for more information.
 
 
+
+What:		/sys/devices/system/cpu/cpu#/node
+Date:		October 2009
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:	Discover NUMA node a CPU belongs to
+
+		When CONFIG_NUMA is enabled, a symbolic link that points
+		to the corresponding NUMA node directory.
+
+		For example, the following symlink is created for cpu42
+		in NUMA node 2:
+
+		/sys/devices/system/cpu/cpu42/node2 -> ../../node/node2
+
+
 What:		/sys/devices/system/cpu/cpu#/topology/core_id
 		/sys/devices/system/cpu/cpu#/topology/core_siblings
 		/sys/devices/system/cpu/cpu#/topology/core_siblings_list
-- 
cgit v1.1


From 23aebca486429b74c35b41ac5cac7ce97609fd6a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai@suse.de>
Date: Mon, 2 Nov 2009 14:10:59 +0100
Subject: ALSA: dummy - Fix descriptions of pcm_substreams parameter

Now up to 128 substreams are supported.

Reported-by: Adrian Bridgett <adrian@smop.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 Documentation/sound/alsa/ALSA-Configuration.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 1c8eb45..fd9a2f6 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -522,7 +522,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
     pcm_devs       - Number of PCM devices assigned to each card
                      (default = 1, up to 4)
     pcm_substreams - Number of PCM substreams assigned to each PCM
-                     (default = 8, up to 16)
+                     (default = 8, up to 128)
     hrtimer        - Use hrtimer (=1, default) or system timer (=0)
     fake_buffer    - Fake buffer allocations (default = 1)
 
-- 
cgit v1.1


From d4da6c9ccf648f3f1cb5bf9d981a62c253d30e28 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 2 Nov 2009 10:15:27 -0800
Subject: Revert "ext4: Remove journal_checksum mount option and enable it by
 default"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts commit d0646f7b636d067d715fab52a2ba9c6f0f46b0d7, as
requested by Eric Sandeen.

It can basically cause an ext4 filesystem to miss recovery (and thus get
mounted with errors) if the journal checksum does not match.

Quoth Eric:

   "My hand-wavy hunch about what is happening is that we're finding a
    bad checksum on the last partially-written transaction, which is
    not surprising, but if we have a wrapped log and we're doing the
    initial scan for head/tail, and we abort scanning on that bad
    checksum, then we are essentially running an unrecovered filesystem.

    But that's hand-wavy and I need to go look at the code.

    We lived without journal checksums on by default until now, and at
    this point they're doing more harm than good, so we should revert
    the default-changing commit until we can fix it and do some good
    power-fail testing with the fixes in place."

See

	http://bugzilla.kernel.org/show_bug.cgi?id=14354

for all the gory details.

Requested-by: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Alexey Fisher <bug-track@fisher-privat.net>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mathias Burén <mathias.buren@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/filesystems/ext4.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index bf4f4b7..6d94e06 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -134,9 +134,15 @@ ro                   	Mount filesystem read only. Note that ext4 will
                      	mount options "ro,noload" can be used to prevent
 		     	writes to the filesystem.
 
+journal_checksum	Enable checksumming of the journal transactions.
+			This will allow the recovery code in e2fsck and the
+			kernel to detect corruption in the kernel.  It is a
+			compatible change and will be ignored by older kernels.
+
 journal_async_commit	Commit block can be written to disk without waiting
 			for descriptor blocks. If enabled older kernels cannot
-			mount the device.
+			mount the device. This will enable 'journal_checksum'
+			internally.
 
 journal=update		Update the ext4 file system's journal to the current
 			format.
-- 
cgit v1.1


From 38bb0849a6799e05bb474a88f198522ae4ecace2 Mon Sep 17 00:00:00 2001
From: Frans Pop <elendil@planet.nl>
Date: Mon, 26 Oct 2009 08:38:59 +0100
Subject: thermal: sysfs-api.txt - reformat for improved readability

The document currently uses large indentations which make the text
too wide for easy readability. Also improve general consistency.

Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/thermal/sysfs-api.txt | 380 ++++++++++++++++++------------------
 1 file changed, 192 insertions(+), 188 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index 70d68ce..895337f 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -1,5 +1,5 @@
 Generic Thermal Sysfs driver How To
-=========================
+===================================
 
 Written by Sujith Thomas <sujith.thomas@intel.com>, Zhang Rui <rui.zhang@intel.com>
 
@@ -10,20 +10,20 @@ Copyright (c)  2008 Intel Corporation
 
 0. Introduction
 
-The generic thermal sysfs provides a set of interfaces for thermal zone devices (sensors)
-and thermal cooling devices (fan, processor...) to register with the thermal management
-solution and to be a part of it.
+The generic thermal sysfs provides a set of interfaces for thermal zone
+devices (sensors) and thermal cooling devices (fan, processor...) to register
+with the thermal management solution and to be a part of it.
 
-This how-to focuses on enabling new thermal zone and cooling devices to participate
-in thermal management.
-This solution is platform independent and any type of thermal zone devices and
-cooling devices should be able to make use of the infrastructure.
+This how-to focuses on enabling new thermal zone and cooling devices to
+participate in thermal management.
+This solution is platform independent and any type of thermal zone devices
+and cooling devices should be able to make use of the infrastructure.
 
-The main task of the thermal sysfs driver is to expose thermal zone attributes as well
-as cooling device attributes to the user space.
-An intelligent thermal management application can make decisions based on inputs
-from thermal zone attributes (the current temperature and trip point temperature)
-and throttle appropriate devices.
+The main task of the thermal sysfs driver is to expose thermal zone attributes
+as well as cooling device attributes to the user space.
+An intelligent thermal management application can make decisions based on
+inputs from thermal zone attributes (the current temperature and trip point
+temperature) and throttle appropriate devices.
 
 [0-*]	denotes any positive number starting from 0
 [1-*]	denotes any positive number starting from 1
@@ -31,77 +31,77 @@ and throttle appropriate devices.
 1. thermal sysfs driver interface functions
 
 1.1 thermal zone device interface
-1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *name, int trips,
-				void *devdata, struct thermal_zone_device_ops *ops)
-
-	This interface function adds a new thermal zone device (sensor) to
-	/sys/class/thermal folder as thermal_zone[0-*].
-	It tries to bind all the thermal cooling devices registered at the same time.
-
-	name: the thermal zone name.
-	trips: the total number of trip points this thermal zone supports.
-	devdata: device private data
-	ops: thermal zone device call-backs.
-		.bind: bind the thermal zone device with a thermal cooling device.
-		.unbind: unbind the thermal zone device with a thermal cooling device.
-		.get_temp: get the current temperature of the thermal zone.
-		.get_mode: get the current mode (user/kernel) of the thermal zone.
-			   "kernel" means thermal management is done in kernel.
-			   "user" will prevent kernel thermal driver actions upon trip points
-			   so that user applications can take charge of thermal management.
-		.set_mode: set the mode (user/kernel) of the thermal zone.
-		.get_trip_type: get the type of certain trip point.
-		.get_trip_temp: get the temperature above which the certain trip point
-				will be fired.
+1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *name,
+		int trips, void *devdata, struct thermal_zone_device_ops *ops)
+
+    This interface function adds a new thermal zone device (sensor) to
+    /sys/class/thermal folder as thermal_zone[0-*]. It tries to bind all the
+    thermal cooling devices registered at the same time.
+
+    name: the thermal zone name.
+    trips: the total number of trip points this thermal zone supports.
+    devdata: device private data
+    ops: thermal zone device call-backs.
+	.bind: bind the thermal zone device with a thermal cooling device.
+	.unbind: unbind the thermal zone device with a thermal cooling device.
+	.get_temp: get the current temperature of the thermal zone.
+	.get_mode: get the current mode (user/kernel) of the thermal zone.
+	    - "kernel" means thermal management is done in kernel.
+	    - "user" will prevent kernel thermal driver actions upon trip points
+	      so that user applications can take charge of thermal management.
+	.set_mode: set the mode (user/kernel) of the thermal zone.
+	.get_trip_type: get the type of certain trip point.
+	.get_trip_temp: get the temperature above which the certain trip point
+			will be fired.
 
 1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 
-	This interface function removes the thermal zone device.
-	It deletes the corresponding entry form /sys/class/thermal folder and unbind all
-	the thermal cooling devices it uses.
+    This interface function removes the thermal zone device.
+    It deletes the corresponding entry form /sys/class/thermal folder and
+    unbind all the thermal cooling devices it uses.
 
 1.2 thermal cooling device interface
 1.2.1 struct thermal_cooling_device *thermal_cooling_device_register(char *name,
-					void *devdata, struct thermal_cooling_device_ops *)
-
-	This interface function adds a new thermal cooling device (fan/processor/...) to
-	/sys/class/thermal/ folder as cooling_device[0-*].
-	It tries to bind itself to all the thermal zone devices register at the same time.
-	name: the cooling device name.
-	devdata: device private data.
-	ops: thermal cooling devices call-backs.
-		.get_max_state: get the Maximum throttle state of the cooling device.
-		.get_cur_state: get the Current throttle state of the cooling device.
-		.set_cur_state: set the Current throttle state of the cooling device.
+		void *devdata, struct thermal_cooling_device_ops *)
+
+    This interface function adds a new thermal cooling device (fan/processor/...)
+    to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
+    to all the thermal zone devices register at the same time.
+    name: the cooling device name.
+    devdata: device private data.
+    ops: thermal cooling devices call-backs.
+	.get_max_state: get the Maximum throttle state of the cooling device.
+	.get_cur_state: get the Current throttle state of the cooling device.
+	.set_cur_state: set the Current throttle state of the cooling device.
 
 1.2.2 void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 
-	This interface function remove the thermal cooling device.
-	It deletes the corresponding entry form /sys/class/thermal folder and unbind
-	itself from all	the thermal zone devices using it.
+    This interface function remove the thermal cooling device.
+    It deletes the corresponding entry form /sys/class/thermal folder and
+    unbind itself from all the thermal zone devices using it.
 
 1.3 interface for binding a thermal zone device with a thermal cooling device
 1.3.1 int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
-			int trip, struct thermal_cooling_device *cdev);
+		int trip, struct thermal_cooling_device *cdev);
 
-	This interface function bind a thermal cooling device to the certain trip point
-	of a thermal zone device.
-	This function is usually called in the thermal zone device .bind callback.
-	tz: the thermal zone device
-	cdev: thermal cooling device
-	trip: indicates which trip point the cooling devices is associated with
-		 in this thermal zone.
+    This interface function bind a thermal cooling device to the certain trip
+    point of a thermal zone device.
+    This function is usually called in the thermal zone device .bind callback.
+    tz: the thermal zone device
+    cdev: thermal cooling device
+    trip: indicates which trip point the cooling devices is associated with
+	  in this thermal zone.
 
 1.3.2 int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
-				int trip, struct thermal_cooling_device *cdev);
+		int trip, struct thermal_cooling_device *cdev);
 
-	This interface function unbind a thermal cooling device from the certain trip point
-	of a thermal zone device.
-	This function is usually called in the thermal zone device .unbind callback.
-	tz: the thermal zone device
-	cdev: thermal cooling device
-	trip: indicates which trip point the cooling devices is associated with
-		in this thermal zone.
+    This interface function unbind a thermal cooling device from the certain
+    trip point of a thermal zone device. This function is usually called in
+    the thermal zone device .unbind callback.
+    tz: the thermal zone device
+    cdev: thermal cooling device
+    trip: indicates which trip point the cooling devices is associated with
+	  in this thermal zone.
 
 2. sysfs attributes structure
 
@@ -114,153 +114,157 @@ if hwmon is compiled in or built as a module.
 
 Thermal zone device sys I/F, created once it's registered:
 /sys/class/thermal/thermal_zone[0-*]:
-	|-----type:			Type of the thermal zone
-	|-----temp:			Current temperature
-	|-----mode:			Working mode of the thermal zone
-	|-----trip_point_[0-*]_temp:	Trip point temperature
-	|-----trip_point_[0-*]_type:	Trip point type
+    |---type:			Type of the thermal zone
+    |---temp:			Current temperature
+    |---mode:			Working mode of the thermal zone
+    |---trip_point_[0-*]_temp:	Trip point temperature
+    |---trip_point_[0-*]_type:	Trip point type
 
 Thermal cooling device sys I/F, created once it's registered:
 /sys/class/thermal/cooling_device[0-*]:
-	|-----type :			Type of the cooling device(processor/fan/...)
-	|-----max_state:		Maximum cooling state of the cooling device
-	|-----cur_state:		Current cooling state of the cooling device
+    |---type:			Type of the cooling device(processor/fan/...)
+    |---max_state:		Maximum cooling state of the cooling device
+    |---cur_state:		Current cooling state of the cooling device
 
 
-These two dynamic attributes are created/removed in pairs.
-They represent the relationship between a thermal zone and its associated cooling device.
-They are created/removed for each
-thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device successful execution.
+Then next two dynamic attributes are created/removed in pairs. They represent
+the relationship between a thermal zone and its associated cooling device.
+They are created/removed for each successful execution of
+thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
 
-/sys/class/thermal/thermal_zone[0-*]
-	|-----cdev[0-*]:		The [0-*]th cooling device in the current thermal zone
-	|-----cdev[0-*]_trip_point:	Trip point that cdev[0-*] is associated with
+/sys/class/thermal/thermal_zone[0-*]:
+    |---cdev[0-*]:		[0-*]th cooling device in current thermal zone
+    |---cdev[0-*]_trip_point:	Trip point that cdev[0-*] is associated with
 
 Besides the thermal zone device sysfs I/F and cooling device sysfs I/F,
-the generic thermal driver also creates a hwmon sysfs I/F for each _type_ of
-thermal zone device. E.g. the generic thermal driver registers one hwmon class device
-and build the associated hwmon sysfs I/F for all the registered ACPI thermal zones.
+the generic thermal driver also creates a hwmon sysfs I/F for each _type_
+of thermal zone device. E.g. the generic thermal driver registers one hwmon
+class device and build the associated hwmon sysfs I/F for all the registered
+ACPI thermal zones.
+
 /sys/class/hwmon/hwmon[0-*]:
-	|-----name:			The type of the thermal zone devices.
-	|-----temp[1-*]_input:		The current temperature of thermal zone [1-*].
-	|-----temp[1-*]_critical:	The critical trip point of thermal zone [1-*].
+    |---name:			The type of the thermal zone devices
+    |---temp[1-*]_input:	The current temperature of thermal zone [1-*]
+    |---temp[1-*]_critical:	The critical trip point of thermal zone [1-*]
+
 Please read Documentation/hwmon/sysfs-interface for additional information.
 
 ***************************
 * Thermal zone attributes *
 ***************************
 
-type				Strings which represent the thermal zone type.
-				This is given by thermal zone driver as part of registration.
-				Eg: "acpitz" indicates it's an ACPI thermal device.
-				In order to keep it consistent with hwmon sys attribute,
-				this should be a short, lowercase string,
-				not containing spaces nor dashes.
-				RO
-				Required
-
-temp				Current temperature as reported by thermal zone (sensor)
-				Unit: millidegree Celsius
-				RO
-				Required
-
-mode				One of the predefined values in [kernel, user]
-				This file gives information about the algorithm
-				that is currently managing the thermal zone.
-				It can be either default kernel based algorithm
-				or user space application.
-				RW
-				Optional
-				kernel	= Thermal management in kernel thermal zone driver.
-				user	= Preventing kernel thermal zone driver actions upon
-					  trip points so that user application can take full
-					  charge of the thermal management.
-
-trip_point_[0-*]_temp		The temperature above which trip point will be fired
-				Unit: millidegree Celsius
-				RO
-				Optional
-
-trip_point_[0-*]_type 		Strings which indicate the type of the trip point
-				E.g. it can be one of critical, hot, passive,
-				    active[0-*] for ACPI thermal zone.
-				RO
-				Optional
-
-cdev[0-*]			Sysfs link to the thermal cooling device node where the sys I/F
-				for cooling device throttling control represents.
-				RO
-				Optional
-
-cdev[0-*]_trip_point		The trip point with which cdev[0-*] is associated in this thermal zone
-				-1 means the cooling device is not associated with any trip point.
-				RO
-				Optional
-
-******************************
-* Cooling device  attributes *
-******************************
-
-type				String which represents the type of device
-				eg: For generic ACPI: this should be "Fan",
-				"Processor" or "LCD"
-				eg. For memory controller device on intel_menlow platform:
-				this should be "Memory controller"
-				RO
-				Required
-
-max_state			The maximum permissible cooling state of this cooling device.
-				RO
-				Required
-
-cur_state			The current cooling state of this cooling device.
-				the value can any integer numbers between 0 and max_state,
-				cur_state == 0 means no cooling
-				cur_state == max_state means the maximum cooling.
-				RW
-				Required
+type
+	Strings which represent the thermal zone type.
+	This is given by thermal zone driver as part of registration.
+	E.g: "acpitz" indicates it's an ACPI thermal device.
+	In order to keep it consistent with hwmon sys attribute; this should
+	be a short, lowercase string, not containing spaces nor dashes.
+	RO, Required
+
+temp
+	Current temperature as reported by thermal zone (sensor).
+	Unit: millidegree Celsius
+	RO, Required
+
+mode
+	One of the predefined values in [kernel, user].
+	This file gives information about the algorithm that is currently
+	managing the thermal zone. It can be either default kernel based
+	algorithm or user space application.
+	kernel	= Thermal management in kernel thermal zone driver.
+	user	= Preventing kernel thermal zone driver actions upon
+		  trip points so that user application can take full
+		  charge of the thermal management.
+	RW, Optional
+
+trip_point_[0-*]_temp
+	The temperature above which trip point will be fired.
+	Unit: millidegree Celsius
+	RO, Optional
+
+trip_point_[0-*]_type
+	Strings which indicate the type of the trip point.
+	E.g. it can be one of critical, hot, passive, active[0-*] for ACPI
+	thermal zone.
+	RO, Optional
+
+cdev[0-*]
+	Sysfs link to the thermal cooling device node where the sys I/F
+	for cooling device throttling control represents.
+	RO, Optional
+
+cdev[0-*]_trip_point
+	The trip point with which cdev[0-*] is associated in this thermal
+	zone; -1 means the cooling device is not associated with any trip
+	point.
+	RO, Optional
+
+*****************************
+* Cooling device attributes *
+*****************************
+
+type
+	String which represents the type of device, e.g:
+	- for generic ACPI: should be "Fan", "Processor" or "LCD"
+	- for memory controller device on intel_menlow platform:
+	  should be "Memory controller".
+	RO, Required
+
+max_state
+	The maximum permissible cooling state of this cooling device.
+	RO, Required
+
+cur_state
+	The current cooling state of this cooling device.
+	The value can any integer numbers between 0 and max_state:
+	- cur_state == 0 means no cooling
+	- cur_state == max_state means the maximum cooling.
+	RW, Required
 
 3. A simple implementation
 
-ACPI thermal zone may support multiple trip points like critical/hot/passive/active.
-If an ACPI thermal zone supports critical, passive, active[0] and active[1] at the same time,
-it may register itself as a thermal_zone_device (thermal_zone1) with 4 trip points in all.
-It has one processor and one fan, which are both registered as thermal_cooling_device.
-If the processor is listed in _PSL method, and the fan is listed in _AL0 method,
-the sys I/F structure will be built like this:
+ACPI thermal zone may support multiple trip points like critical, hot,
+passive, active. If an ACPI thermal zone supports critical, passive,
+active[0] and active[1] at the same time, it may register itself as a
+thermal_zone_device (thermal_zone1) with 4 trip points in all.
+It has one processor and one fan, which are both registered as
+thermal_cooling_device.
+
+If the processor is listed in _PSL method, and the fan is listed in _AL0
+method, the sys I/F structure will be built like this:
 
 /sys/class/thermal:
 
 |thermal_zone1:
-	|-----type:			acpitz
-	|-----temp:			37000
-	|-----mode:			kernel
-	|-----trip_point_0_temp:	100000
-	|-----trip_point_0_type:	critical
-	|-----trip_point_1_temp:	80000
-	|-----trip_point_1_type:	passive
-	|-----trip_point_2_temp:	70000
-	|-----trip_point_2_type:	active0
-	|-----trip_point_3_temp:	60000
-	|-----trip_point_3_type:	active1
-	|-----cdev0:			--->/sys/class/thermal/cooling_device0
-	|-----cdev0_trip_point:		1	/* cdev0 can be used for passive */
-	|-----cdev1:			--->/sys/class/thermal/cooling_device3
-	|-----cdev1_trip_point:		2	/* cdev1 can be used for active[0]*/
+    |---type:			acpitz
+    |---temp:			37000
+    |---mode:			kernel
+    |---trip_point_0_temp:	100000
+    |---trip_point_0_type:	critical
+    |---trip_point_1_temp:	80000
+    |---trip_point_1_type:	passive
+    |---trip_point_2_temp:	70000
+    |---trip_point_2_type:	active0
+    |---trip_point_3_temp:	60000
+    |---trip_point_3_type:	active1
+    |---cdev0:			--->/sys/class/thermal/cooling_device0
+    |---cdev0_trip_point:	1	/* cdev0 can be used for passive */
+    |---cdev1:			--->/sys/class/thermal/cooling_device3
+    |---cdev1_trip_point:	2	/* cdev1 can be used for active[0]*/
 
 |cooling_device0:
-	|-----type:			Processor
-	|-----max_state:		8
-	|-----cur_state:		0
+    |---type:			Processor
+    |---max_state:		8
+    |---cur_state:		0
 
 |cooling_device3:
-	|-----type:			Fan
-	|-----max_state:		2
-	|-----cur_state:		0
+    |---type:			Fan
+    |---max_state:		2
+    |---cur_state:		0
 
 /sys/class/hwmon:
 
 |hwmon0:
-	|-----name:			acpitz
-	|-----temp1_input:		37000
-	|-----temp1_crit:		100000
+    |---name:			acpitz
+    |---temp1_input:		37000
+    |---temp1_crit:		100000
-- 
cgit v1.1


From 29226ed3c3b5cd0b2b0b1fb40ffeac3f796b80e9 Mon Sep 17 00:00:00 2001
From: Frans Pop <elendil@planet.nl>
Date: Mon, 26 Oct 2009 08:39:00 +0100
Subject: thermal: sysfs-api.txt - document passive attribute for thermal zones

Signed-off-by: Frans Pop <elendil@planet.nl>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/thermal/sysfs-api.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index 895337f..a87dc27 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -199,6 +199,15 @@ cdev[0-*]_trip_point
 	point.
 	RO, Optional
 
+passive
+	Attribute is only present for zones in which the passive cooling
+	policy is not supported by native thermal driver. Default is zero
+	and can be set to a temperature (in millidegrees) to enable a
+	passive trip point for the zone. Activation is done by polling with
+	an interval of 1 second.
+	Unit: millidegrees Celsius
+	RW, Optional
+
 *****************************
 * Cooling device attributes *
 *****************************
-- 
cgit v1.1


From 3806e94b0148350c72f9a3214274026b6ca03f49 Mon Sep 17 00:00:00 2001
From: Crane Cai <crane.cai@amd.com>
Date: Sat, 7 Nov 2009 13:10:46 +0100
Subject: i2c-piix4: Modify code name SB900 to Hudson-2

Change SB900 to its formal code name Hudson-2.

Signed-off-by: Crane Cai <crane.cai@amd.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
---
 Documentation/i2c/busses/i2c-piix4 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4
index c5b37c5..ac540c7 100644
--- a/Documentation/i2c/busses/i2c-piix4
+++ b/Documentation/i2c/busses/i2c-piix4
@@ -8,7 +8,7 @@ Supported adapters:
     Datasheet: Only available via NDA from ServerWorks
   * ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges
     Datasheet: Not publicly available
-  * AMD SB900
+  * AMD Hudson-2
     Datasheet: Not publicly available
   * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
     Datasheet: Publicly available at the SMSC website http://www.smsc.com
-- 
cgit v1.1


From 7ab8f5244d6a8695688a91c8a3f6d3f40356ff97 Mon Sep 17 00:00:00 2001
From: Sunil Mushran <sunil.mushran@oracle.com>
Date: Mon, 2 Nov 2009 13:38:10 -0800
Subject: ocfs2: Refresh documentation

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 Documentation/filesystems/ocfs2.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index c2a0871..c58b9f5 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -20,15 +20,16 @@ Lots of code taken from ext3 and other projects.
 Authors in alphabetical order:
 Joel Becker   <joel.becker@oracle.com>
 Zach Brown    <zach.brown@oracle.com>
-Mark Fasheh   <mark.fasheh@oracle.com>
+Mark Fasheh   <mfasheh@suse.com>
 Kurt Hackel   <kurt.hackel@oracle.com>
+Tao Ma        <tao.ma@oracle.com>
 Sunil Mushran <sunil.mushran@oracle.com>
 Manish Singh  <manish.singh@oracle.com>
+Tiger Yang    <tiger.yang@oracle.com>
 
 Caveats
 =======
 Features which OCFS2 does not support yet:
-	- quotas
 	- Directory change notification (F_NOTIFY)
 	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
 
@@ -70,7 +71,6 @@ commit=nrsec	(*)	Ocfs2 can be told to sync all its data and metadata
 			performance.
 localalloc=8(*)		Allows custom localalloc size in MB. If the value is too
 			large, the fs will silently revert it to the default.
-			Localalloc is not enabled for local mounts.
 localflocks		This disables cluster aware flock.
 inode64			Indicates that Ocfs2 is allowed to create inodes at
 			any location in the filesystem, including those which
-- 
cgit v1.1


From 1b98c00bf3a8a417be6412d8a3ed867a72b18f68 Mon Sep 17 00:00:00 2001
From: Josh Triplett <josh@joshtriplett.org>
Date: Fri, 16 Oct 2009 14:06:13 -0700
Subject: Documentation/vm/page-types.c: Declare checked_open static

Nothing outside of Documentation/vm/page-types.c references
checked_open.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---
 Documentation/vm/page-types.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 3ec4f2a..4793c6a 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -218,7 +218,7 @@ static void fatal(const char *x, ...)
 	exit(EXIT_FAILURE);
 }
 
-int checked_open(const char *pathname, int flags)
+static int checked_open(const char *pathname, int flags)
 {
 	int fd = open(pathname, flags);
 
-- 
cgit v1.1


From 3d7a641e544e428191667e8b1f83f96fa46dbd65 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:10:23 +0000
Subject: SLOW_WORK: Wait for outstanding work items belonging to a module to
 clear

Wait for outstanding slow work items belonging to a module to clear when
unregistering that module as a user of the facility.  This prevents the put_ref
code of a work item from being taken away before it returns.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index ebc50f8..f12fda3 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -64,9 +64,11 @@ USING SLOW WORK ITEMS
 Firstly, a module or subsystem wanting to make use of slow work items must
 register its interest:
 
-	 int ret = slow_work_register_user();
+	 int ret = slow_work_register_user(struct module *module);
 
-This will return 0 if successful, or a -ve error upon failure.
+This will return 0 if successful, or a -ve error upon failure.  The module
+pointer should be the module interested in using this facility (almost
+certainly THIS_MODULE).
 
 
 Slow work items may then be set up by:
@@ -110,7 +112,12 @@ operation.  When all a module's slow work items have been processed, and the
 module has no further interest in the facility, it should unregister its
 interest:
 
-	slow_work_unregister_user();
+	slow_work_unregister_user(struct module *module);
+
+The module pointer is used to wait for all outstanding work items for that
+module before completing the unregistration.  This prevents the put_ref() code
+from being taken away before it completes.  module should almost certainly be
+THIS_MODULE.
 
 
 ===============
-- 
cgit v1.1


From 4d8bb2cbccf6dccaada509aafeb01c6205c9d8c4 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Thu, 19 Nov 2009 18:10:39 +0000
Subject: SLOW_WORK: Make slow_work_ops ->get_ref/->put_ref optional

Make the ability for the slow-work facility to take references on a work item
optional as not everyone requires this.

Even the internal slow-work stubs them out, so those can be got rid of too.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index f12fda3..c655c51 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -125,7 +125,7 @@ ITEM OPERATIONS
 ===============
 
 Each work item requires a table of operations of type struct slow_work_ops.
-All members are required:
+Only ->execute() is required, getting and putting of a reference are optional.
 
  (*) Get a reference on an item:
 
-- 
cgit v1.1


From 0160950297c08f8233c89b9f9e7dd59cfb080809 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Thu, 19 Nov 2009 18:10:43 +0000
Subject: SLOW_WORK: Add support for cancellation of slow work

Add support for cancellation of queued slow work and delayed slow work items.
The cancellation functions will wait for items that are pending or undergoing
execution to be discarded by the slow work facility.

Attempting to enqueue work that is in the process of being cancelled will
result in ECANCELED.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index c655c51..2e384bd 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -108,7 +108,17 @@ on the item, 0 otherwise.
 
 
 The items are reference counted, so there ought to be no need for a flush
-operation.  When all a module's slow work items have been processed, and the
+operation.  But as the reference counting is optional, means to cancel
+existing work items are also included:
+
+	cancel_slow_work(&myitem);
+
+can be used to cancel pending work.  The above cancel function waits for
+existing work to have been executed (or prevent execution of them, depending
+on timing).
+
+
+When all a module's slow work items have been processed, and the
 module has no further interest in the facility, it should unregister its
 interest:
 
-- 
cgit v1.1


From 6b8268b17a1ffc942bc72d7d00274e433d6b6719 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Thu, 19 Nov 2009 18:10:47 +0000
Subject: SLOW_WORK: Add delayed_slow_work support

This adds support for starting slow work with a delay, similar
to the functionality we have for workqueues.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index 2e384bd..a9d1b0f 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -41,6 +41,13 @@ expand files, provided the time taken to do so isn't too long.
 Operations of both types may sleep during execution, thus tying up the thread
 loaned to it.
 
+A further class of work item is available, based on the slow work item class:
+
+ (*) Delayed slow work items.
+
+These are slow work items that have a timer to defer queueing of the item for
+a while.
+
 
 THREAD-TO-CLASS ALLOCATION
 --------------------------
@@ -95,6 +102,10 @@ Slow work items may then be set up by:
 
      or:
 
+	delayed_slow_work_init(&myitem, &myitem_ops);
+
+     or:
+
 	vslow_work_init(&myitem, &myitem_ops);
 
      depending on its class.
@@ -104,7 +115,9 @@ A suitably set up work item can then be enqueued for processing:
 	int ret = slow_work_enqueue(&myitem);
 
 This will return a -ve error if the thread pool is unable to gain a reference
-on the item, 0 otherwise.
+on the item, 0 otherwise, or (for delayed work):
+
+	int ret = delayed_slow_work_enqueue(&myitem, my_jiffy_delay);
 
 
 The items are reference counted, so there ought to be no need for a flush
@@ -112,6 +125,7 @@ operation.  But as the reference counting is optional, means to cancel
 existing work items are also included:
 
 	cancel_slow_work(&myitem);
+	cancel_delayed_slow_work(&myitem);
 
 can be used to cancel pending work.  The above cancel function waits for
 existing work to have been executed (or prevent execution of them, depending
-- 
cgit v1.1


From 8fba10a42d191de612e60e7009c8f0313f90a9b3 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:10:51 +0000
Subject: SLOW_WORK: Allow the work items to be viewed through a /proc file

Allow the executing and queued work items to be viewed through a /proc file
for debugging purposes.  The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued threads indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 60 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index a9d1b0f..f120238 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -149,7 +149,8 @@ ITEM OPERATIONS
 ===============
 
 Each work item requires a table of operations of type struct slow_work_ops.
-Only ->execute() is required, getting and putting of a reference are optional.
+Only ->execute() is required; the getting and putting of a reference and the
+describing of an item are all optional.
 
  (*) Get a reference on an item:
 
@@ -179,6 +180,16 @@ Only ->execute() is required, getting and putting of a reference are optional.
      This should perform the work required of the item.  It may sleep, it may
      perform disk I/O and it may wait for locks.
 
+ (*) View an item through /proc:
+
+	void (*desc)(struct slow_work *work, struct seq_file *m);
+
+     If supplied, this should print to 'm' a small string describing the work
+     the item is to do.  This should be no more than about 40 characters, and
+     shouldn't include a newline character.
+
+     See the 'Viewing executing and queued items' section below.
+
 
 ==================
 POOL CONFIGURATION
@@ -203,3 +214,50 @@ The slow-work thread pool has a number of configurables:
      is bounded to between 1 and one fewer than the number of active threads.
      This ensures there is always at least one thread that can process very
      slow work items, and always at least one thread that won't.
+
+
+==================================
+VIEWING EXECUTING AND QUEUED ITEMS
+==================================
+
+If CONFIG_SLOW_WORK_PROC is enabled, a proc file is made available:
+
+	/proc/slow_work_rq
+
+through which the list of work items being executed and the queues of items to
+be executed may be viewed.  The owner of a work item is given the chance to
+add some information of its own.
+
+The contents look something like the following:
+
+    THR PID   ITEM ADDR        FL MARK  DESC
+    === ===== ================ == ===== ==========
+      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
+      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
+      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
+      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
+      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
+      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
+      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
+      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
+    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
+    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
+    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
+    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
+    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
+    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
+    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
+    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
+    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
+    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
+    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
+    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
+    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
+    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK
+
+In the 'THR' column, executing items show the thread they're occupying and
+queued threads indicate which queue they're on.  'PID' shows the process ID of
+a slow-work thread that's executing something.  'FL' shows the work item flags.
+'MARK' indicates how long since an item was queued or began executing.  Lastly,
+the 'DESC' column permits the owner of an item to give some information.
+
-- 
cgit v1.1


From 31ba99d304494cb28fa8671ccc769c5543e1165d Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:10:53 +0000
Subject: SLOW_WORK: Allow the owner of a work item to determine if it is
 queued or not

Add a function (slow_work_is_queued()) to permit the owner of a work item to
determine if the item is queued or not.

The work item is counted as being queued if it is actually on the queue, not
just if it is pending.  If it is executing and pending, then it is not on the
queue, but will rather be put back on the queue when execution finishes.

This permits a caller to quickly work out if it may be able to put another,
dependent work item on the queue behind it, or whether it will have to wait
till that is finished.

This can be used by CacheFiles to work out whether the creation a new object
can be immediately deferred when it has to wait for an old object to be
deleted, or whether a wait must take place.  If a wait is necessary, then the
slow-work thread can otherwise get blocked, preventing the deletion from
taking place.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index f120238..0169c9d 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -144,6 +144,21 @@ from being taken away before it completes.  module should almost certainly be
 THIS_MODULE.
 
 
+================
+HELPER FUNCTIONS
+================
+
+The slow-work facility provides a function by which it can be determined
+whether or not an item is queued for later execution:
+
+	bool queued = slow_work_is_queued(struct slow_work *work);
+
+If it returns false, then the item is not on the queue (it may be executing
+with a requeue pending).  This can be used to work out whether an item on which
+another depends is on the queue, thus allowing a dependent item to be queued
+after it.
+
+
 ===============
 ITEM OPERATIONS
 ===============
-- 
cgit v1.1


From 3bde31a4ac225cb5805be02eff6eaaf7e0766ccd Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:10:57 +0000
Subject: SLOW_WORK: Allow a requeueable work item to sleep till the thread is
 needed

Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.

Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.

In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.

However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.

So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().

This function will then sleep until either the item's event occurs or another
work item appears on the queue.  If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.

This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind.  As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/slow-work.txt | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index 0169c9d..52bc314 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -158,6 +158,50 @@ with a requeue pending).  This can be used to work out whether an item on which
 another depends is on the queue, thus allowing a dependent item to be queued
 after it.
 
+If the above shows an item on which another depends not to be queued, then the
+owner of the dependent item might need to wait.  However, to avoid locking up
+the threads unnecessarily be sleeping in them, it can make sense under some
+circumstances to return the work item to the queue, thus deferring it until
+some other items have had a chance to make use of the yielded thread.
+
+To yield a thread and defer an item, the work function should simply enqueue
+the work item again and return.  However, this doesn't work if there's nothing
+actually on the queue, as the thread just vacated will jump straight back into
+the item's work function, thus busy waiting on a CPU.
+
+Instead, the item should use the thread to wait for the dependency to go away,
+but rather than using schedule() or schedule_timeout() to sleep, it should use
+the following function:
+
+	bool requeue = slow_work_sleep_till_thread_needed(
+			struct slow_work *work,
+			signed long *_timeout);
+
+This will add a second wait and then sleep, such that it will be woken up if
+either something appears on the queue that could usefully make use of the
+thread - and behind which this item can be queued, or if the event the caller
+set up to wait for happens.  True will be returned if something else appeared
+on the queue and this work function should perhaps return, of false if
+something else woke it up.  The timeout is as for schedule_timeout().
+
+For example:
+
+	wq = bit_waitqueue(&my_flags, MY_BIT);
+	init_wait(&wait);
+	requeue = false;
+	do {
+		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (!test_bit(MY_BIT, &my_flags))
+			break;
+		requeue = slow_work_sleep_till_thread_needed(&my_work,
+							     &timeout);
+	} while (timeout > 0 && !requeue);
+	finish_wait(wq, &wait);
+	if (!test_bit(MY_BIT, &my_flags)
+		goto do_my_thing;
+	if (requeue)
+		return; // to slow_work
+
 
 ===============
 ITEM OPERATIONS
-- 
cgit v1.1


From 4fbf4291aa15926cd4fdca0ffe9122e89d0459db Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:04 +0000
Subject: FS-Cache: Allow the current state of all objects to be dumped

Allow the current state of all fscache objects to be dumped by doing:

	cat /proc/fs/fscache/objects

By default, all objects and all fields will be shown.  This can be restricted
by adding a suitable key to one of the caller's keyrings (such as the session
keyring):

	keyctl add user fscache:objlist "<restrictions>" @s

The <restrictions> are:

	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)

And paired restrictions:

	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have slow work queued
	s	Show objects that don't have slow work queued

If neither side of a restriction pair is given, then both are implied.  For
example:

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 81 +++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 9e94b94..cac09e1 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -299,6 +299,87 @@ proc files.
      jiffy range covered, and the SECS field the equivalent number of seconds.
 
 
+===========
+OBJECT LIST
+===========
+
+If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
+list of all the objects currently allocated and allow them to be viewed
+through:
+
+	/proc/fs/fscache/objects
+
+This will look something like:
+
+	[root@andromeda ~]# head /proc/fs/fscache/objects
+	OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA       OBJECT_KEY, AUX_DATA
+	======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
+	   17e4b        2 ACTV     0   0   0   0  0     0 7b  4 0 8 | NFS.fh           DT  0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
+	   1693a        2 ACTV     0   0   0   0  0     0 7b  4 0 8 | NFS.fh           DT  0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
+
+where the first set of columns before the '|' describe the object:
+
+	COLUMN	DESCRIPTION
+	=======	===============================================================
+	OBJECT	Object debugging ID (appears as OBJ%x in some debug messages)
+	PARENT	Debugging ID of parent object
+	STAT	Object state
+	CHLDN	Number of child objects of this object
+	OPS	Number of outstanding operations on this object
+	OOP	Number of outstanding child object management operations
+	IPR
+	EX	Number of outstanding exclusive operations
+	READS	Number of outstanding read operations
+	EM	Object's event mask
+	EV	Events raised on this object
+	F	Object flags
+	S	Object slow-work work item flags
+
+and the second set of columns describe the object's cookie, if present:
+
+	COLUMN		DESCRIPTION
+	===============	=======================================================
+	NETFS_COOKIE_DEF Name of netfs cookie definition
+	TY		Cookie type (IX - index, DT - data, hex - special)
+	FL		Cookie flags
+	NETFS_DATA	Netfs private data stored in the cookie
+	OBJECT_KEY	Object key	} 1 column, with separating comma
+	AUX_DATA	Object aux data	} presence may be configured
+
+The data shown may be filtered by attaching the a key to an appropriate keyring
+before viewing the file.  Something like:
+
+		keyctl add user fscache:objlist <restrictions> @s
+
+where <restrictions> are a selection of the following letters:
+
+	K	Show hexdump of object key (don't show if not given)
+	A	Show hexdump of object aux data (don't show if not given)
+
+and the following paired letters:
+
+	C	Show objects that have a cookie
+	c	Show objects that don't have a cookie
+	B	Show objects that are busy
+	b	Show objects that aren't busy
+	W	Show objects that have pending writes
+	w	Show objects that don't have pending writes
+	R	Show objects that have outstanding reads
+	r	Show objects that don't have outstanding reads
+	S	Show objects that have slow work queued
+	s	Show objects that don't have slow work queued
+
+If neither side of a letter pair is given, then both are implied.  For example:
+
+	keyctl add user fscache:objlist KB @s
+
+shows objects that are busy, and lists their object keys, but does not dump
+their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
+not implied.
+
+By default all objects and all fields will be shown.
+
+
 =========
 DEBUGGING
 =========
-- 
cgit v1.1


From 52bd75fdb135d6133d878ae60c6e7e3f4ebc1cfc Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:08 +0000
Subject: FS-Cache: Add counters for entry/exit to/from cache operation
 functions

Count entries to and exits from cache operation table functions.  Maintain
these as a single counter that's added to or removed from as appropriate.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index cac09e1..b6c32c0 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -274,6 +274,22 @@ proc files.
 		dfr=N	Number of async ops queued for deferred release
 		rel=N	Number of async ops released
 		gc=N	Number of deferred-release async ops garbage collected
+	CacheOp	alo=N	Number of in-progress alloc_object() cache ops
+		luo=N	Number of in-progress lookup_object() cache ops
+		luc=N	Number of in-progress lookup_complete() cache ops
+		gro=N	Number of in-progress grab_object() cache ops
+		upo=N	Number of in-progress update_object() cache ops
+		dro=N	Number of in-progress drop_object() cache ops
+		pto=N	Number of in-progress put_object() cache ops
+		syn=N	Number of in-progress sync_cache() cache ops
+		atc=N	Number of in-progress attr_changed() cache ops
+		rap=N	Number of in-progress read_or_alloc_page() cache ops
+		ras=N	Number of in-progress read_or_alloc_pages() cache ops
+		alp=N	Number of in-progress allocate_page() cache ops
+		als=N	Number of in-progress allocate_pages() cache ops
+		wrp=N	Number of in-progress write_page() cache ops
+		ucp=N	Number of in-progress uncache_page() cache ops
+		dsp=N	Number of in-progress dissociate_pages() cache ops
 
 
  (*) /proc/fs/fscache/histogram
-- 
cgit v1.1


From 5753c441889253e4323eee85f791a1d64cf08196 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:19 +0000
Subject: FS-Cache: Permit cache retrieval ops to be interrupted in the initial
 wait phase

Permit the operations to retrieve data from the cache or to allocate space in
the cache for future writes to be interrupted whilst they're waiting for
permission for the operation to proceed.  Typically this wait occurs whilst the
cache object is being looked up on disk in the background.

If an interruption occurs, and the operation has not yet been given the
go-ahead to run, the operation is dequeued and cancelled, and control returns
to the read operation of the netfs routine with none of the requested pages
having been read or in any way marked as known by the cache.

This means that the initial wait is done interruptibly rather than
uninterruptibly.

In addition, extra stats values are made available to show the number of ops
cancelled and the number of cache space allocations interrupted.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index b6c32c0..0a77868 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -250,6 +250,7 @@ proc files.
 		ok=N	Number of successful alloc reqs
 		wt=N	Number of alloc reqs that waited on lookup completion
 		nbf=N	Number of alloc reqs rejected -ENOBUFS
+		int=N	Number of alloc reqs aborted -ERESTARTSYS
 		ops=N	Number of alloc reqs submitted
 		owt=N	Number of alloc reqs waited for CPU time
 	Retrvls	n=N	Number of retrieval (read) requests seen
@@ -271,6 +272,7 @@ proc files.
 	Ops	pend=N	Number of times async ops added to pending queues
 		run=N	Number of times async ops given CPU time
 		enq=N	Number of times async ops queued for processing
+		can=N	Number of async ops cancelled
 		dfr=N	Number of async ops queued for deferred release
 		rel=N	Number of async ops released
 		gc=N	Number of deferred-release async ops garbage collected
-- 
cgit v1.1


From 1bccf513ac49d44604ba1cddcc29f5886e70f1b6 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:25 +0000
Subject: FS-Cache: Fix lock misorder in fscache_write_op()

FS-Cache has two structs internally for keeping track of the internal state of
a cached file: the fscache_cookie struct, which represents the netfs's state,
and fscache_object struct, which represents the cache's state.  Each has a
pointer that points to the other (when both are in existence), and each has a
spinlock for pointer maintenance.

Since netfs operations approach these structures from the cookie side, they get
the cookie lock first, then the object lock.  Cache operations, on the other
hand, approach from the object side, and get the object lock first.  It is not
then permitted for a cache operation to get the cookie lock whilst it is
holding the object lock lest deadlock occur; instead, it must do one of two
things:

 (1) increment the cookie usage counter, drop the object lock and then get both
     locks in order, or

 (2) simply hold the object lock as certain parts of the cookie may not be
     altered whilst the object lock is held.

It is also not permitted to follow either pointer without holding the lock at
the end you start with.  To break the pointers between the cookie and the
object, both locks must be held.

fscache_write_op(), however, violates the locking rules: It attempts to get the
cookie lock without (a) checking that the cookie pointer is a valid pointer,
and (b) holding the object lock to protect the cookie pointer whilst it follows
it.  This is so that it can access the pending page store tree without
interference from __fscache_write_page().

This is fixed by splitting the cookie lock, such that the page store tracking
tree is protected by its own lock, and checking that the cookie pointer is
non-NULL before we attempt to follow it whilst holding the object lock.

The new lock is subordinate to both the cookie lock and the object lock, and so
should be taken after those.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 0a77868..9cf2cfb 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -269,6 +269,9 @@ proc files.
 		oom=N	Number of store reqs failed -ENOMEM
 		ops=N	Number of store reqs submitted
 		run=N	Number of store reqs granted CPU time
+		pgs=N	Number of pages given store req processing time
+		rxd=N	Number of store reqs deleted from tracking tree
+		olm=N	Number of store reqs over store limit
 	Ops	pend=N	Number of times async ops added to pending queues
 		run=N	Number of times async ops given CPU time
 		enq=N	Number of times async ops queued for processing
-- 
cgit v1.1


From e3d4d28b1c8cc7c26536a50b43d86ccd39878550 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:32 +0000
Subject: FS-Cache: Handle read request vs lookup, creation or other cache
 failure

FS-Cache doesn't correctly handle the netfs requesting a read from the cache
on an object that failed or was withdrawn by the cache.  A trace similar to
the following might be seen:

	CacheFiles: Lookup failed error -105
	[exe   ] unexpected submission OP165afe [OBJ6cac OBJECT_LC_DYING]
	[exe   ] objstate=OBJECT_LC_DYING [OBJECT_LC_DYING]
	[exe   ] objflags=0
	[exe   ] objevent=9 [fffffffffffffffb]
	[exe   ] ops=0 inp=0 exc=0
	Pid: 6970, comm: exe Not tainted 2.6.32-rc6-cachefs #50
	Call Trace:
	 [<ffffffffa0076477>] fscache_submit_op+0x3ff/0x45a [fscache]
	 [<ffffffffa0077997>] __fscache_read_or_alloc_pages+0x187/0x3c4 [fscache]
	 [<ffffffffa00b6480>] ? nfs_readpage_from_fscache_complete+0x0/0x66 [nfs]
	 [<ffffffffa00b6388>] __nfs_readpages_from_fscache+0x7e/0x176 [nfs]
	 [<ffffffff8108e483>] ? __alloc_pages_nodemask+0x11c/0x5cf
	 [<ffffffffa009d796>] nfs_readpages+0x114/0x1d7 [nfs]
	 [<ffffffff81090314>] __do_page_cache_readahead+0x15f/0x1ec
	 [<ffffffff81090228>] ? __do_page_cache_readahead+0x73/0x1ec
	 [<ffffffff810903bd>] ra_submit+0x1c/0x20
	 [<ffffffff810906bb>] ondemand_readahead+0x227/0x23a
	 [<ffffffff81090762>] page_cache_sync_readahead+0x17/0x19
	 [<ffffffff8108a99e>] generic_file_aio_read+0x236/0x5a0
	 [<ffffffffa00937bd>] nfs_file_read+0xe4/0xf3 [nfs]
	 [<ffffffff810b2fa2>] do_sync_read+0xe3/0x120
	 [<ffffffff81354cc3>] ? _spin_unlock_irq+0x2b/0x31
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff811848e5>] ? selinux_file_permission+0x5d/0x10f
	 [<ffffffff81352bdb>] ? thread_return+0x3e/0x101
	 [<ffffffff8117d7b0>] ? security_file_permission+0x11/0x13
	 [<ffffffff810b3b06>] vfs_read+0xaa/0x16f
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff810b3c84>] sys_read+0x45/0x6c
	 [<ffffffff8100ae2b>] system_call_fastpath+0x16/0x1b

The object state might also be OBJECT_DYING or OBJECT_WITHDRAWING.

This should be handled by simply rejecting the new operation with ENOBUFS.
There's no need to log an error for it.  Events of this type now appear in the
stats file under Ops:rej.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 9cf2cfb..057a3c7 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -276,6 +276,7 @@ proc files.
 		run=N	Number of times async ops given CPU time
 		enq=N	Number of times async ops queued for processing
 		can=N	Number of async ops cancelled
+		rej=N	Number of async ops rejected due to object lookup/create failure
 		dfr=N	Number of async ops queued for deferred release
 		rel=N	Number of async ops released
 		gc=N	Number of deferred-release async ops garbage collected
-- 
cgit v1.1


From 201a15428bd54f83eccec8b7c64a04b8f9431204 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:35 +0000
Subject: FS-Cache: Handle pages pending storage that get evicted under OOM
 conditions

Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt   |  4 ++++
 Documentation/filesystems/caching/netfs-api.txt | 21 ++++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 057a3c7..7097fd2 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -272,6 +272,10 @@ proc files.
 		pgs=N	Number of pages given store req processing time
 		rxd=N	Number of store reqs deleted from tracking tree
 		olm=N	Number of store reqs over store limit
+	VmScan	nos=N	Number of release reqs against pages with no pending store
+		gon=N	Number of release reqs against pages stored by time lock granted
+		bsy=N	Number of release reqs ignored due to in-progress store
+		can=N	Number of page stores cancelled due to release req
 	Ops	pend=N	Number of times async ops added to pending queues
 		run=N	Number of times async ops given CPU time
 		enq=N	Number of times async ops queued for processing
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
index 2666b1e..1902c57 100644
--- a/Documentation/filesystems/caching/netfs-api.txt
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -641,7 +641,7 @@ data file must be retired (see the relinquish cookie function below).
 
 Furthermore, note that this does not cancel the asynchronous read or write
 operation started by the read/alloc and write functions, so the page
-invalidation and release functions must use:
+invalidation functions must use:
 
 	bool fscache_check_page_write(struct fscache_cookie *cookie,
 				      struct page *page);
@@ -654,6 +654,25 @@ to see if a page is being written to the cache, and:
 to wait for it to finish if it is.
 
 
+When releasepage() is being implemented, a special FS-Cache function exists to
+manage the heuristics of coping with vmscan trying to eject pages, which may
+conflict with the cache trying to write pages to the cache (which may itself
+need to allocate memory):
+
+	bool fscache_maybe_release_page(struct fscache_cookie *cookie,
+					struct page *page,
+					gfp_t gfp);
+
+This takes the netfs cookie, and the page and gfp arguments as supplied to
+releasepage().  It will return false if the page cannot be released yet for
+some reason and if it returns true, the page has been uncached and can now be
+released.
+
+To make a page available for release, this function may wait for an outstanding
+storage request to complete, or it may attempt to cancel the storage request -
+in which case the page will not be stored in the cache this time.
+
+
 ==========================
 INDEX AND DATA FILE UPDATE
 ==========================
-- 
cgit v1.1


From 60d543ca724be155c2b6166e36a00c80b21bd810 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:11:45 +0000
Subject: FS-Cache: Start processing an object's operations on that object's
 death

Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 7097fd2..3c23411 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -253,6 +253,7 @@ proc files.
 		int=N	Number of alloc reqs aborted -ERESTARTSYS
 		ops=N	Number of alloc reqs submitted
 		owt=N	Number of alloc reqs waited for CPU time
+		abt=N	Number of alloc reqs aborted due to object death
 	Retrvls	n=N	Number of retrieval (read) requests seen
 		ok=N	Number of successful retr reqs
 		wt=N	Number of retr reqs that waited on lookup completion
@@ -262,6 +263,7 @@ proc files.
 		oom=N	Number of retr reqs failed -ENOMEM
 		ops=N	Number of retr reqs submitted
 		owt=N	Number of retr reqs waited for CPU time
+		abt=N	Number of retr reqs aborted due to object death
 	Stores	n=N	Number of storage (write) requests seen
 		ok=N	Number of successful store reqs
 		agn=N	Number of store reqs on a page already pending storage
-- 
cgit v1.1


From fee096deb4f33897937b974cb2c5168bab7935be Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Thu, 19 Nov 2009 18:12:05 +0000
Subject: CacheFiles: Catch an overly long wait for an old active object

Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [<ffffffff81353153>] __wait_on_bit+0x43/0x76
	 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270
	 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e
	 [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells <dhowells@redhat.com>
---
 Documentation/filesystems/caching/fscache.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'Documentation')

diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
index 3c23411..a91e2e2 100644
--- a/Documentation/filesystems/caching/fscache.txt
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -235,6 +235,7 @@ proc files.
 		neg=N	Number of negative lookups made
 		pos=N	Number of positive lookups made
 		crt=N	Number of objects created by lookup
+		tmo=N	Number of lookups timed out and requeued
 	Updates	n=N	Number of update cookie requests seen
 		nul=N	Number of upd reqs given a NULL parent
 		run=N	Number of upd reqs granted CPU time
-- 
cgit v1.1


From c69f677cc852f3f7b2342ab2f1598670a463d576 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Fri, 20 Nov 2009 20:48:31 +0100
Subject: fbdev: Migrate mailing lists to vger

The fbdev mailing lists at SourceForge have been migrated to a single
mailing list at kernel.org: linux-fbdev@vger.kernel.org.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/fb/framebuffer.txt | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/fb/framebuffer.txt b/Documentation/fb/framebuffer.txt
index b3e3a03..fe79e3c 100644
--- a/Documentation/fb/framebuffer.txt
+++ b/Documentation/fb/framebuffer.txt
@@ -312,10 +312,8 @@ and to the following documentation:
 8. Mailing list
 ---------------
 
-There are several frame buffer device related mailing lists at SourceForge:
-  - linux-fbdev-announce@lists.sourceforge.net, for announcements,
-  - linux-fbdev-user@lists.sourceforge.net, for generic user support,
-  - linux-fbdev-devel@lists.sourceforge.net, for project developers.
+There is a frame buffer device related mailing list at kernel.org:
+linux-fbdev@vger.kernel.org.
 
 Point your web browser to http://sourceforge.net/projects/linux-fbdev/ for
 subscription information and archive browsing.
-- 
cgit v1.1


From f13a48bd798a159291ca583b95453171b88b7448 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells@redhat.com>
Date: Tue, 1 Dec 2009 15:36:11 +0000
Subject: SLOW_WORK: Move slow_work's proc file to debugfs

Move slow_work's debugging proc file to debugfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Requested-and-acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 Documentation/slow-work.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
index 52bc314..9dbf447 100644
--- a/Documentation/slow-work.txt
+++ b/Documentation/slow-work.txt
@@ -279,9 +279,9 @@ The slow-work thread pool has a number of configurables:
 VIEWING EXECUTING AND QUEUED ITEMS
 ==================================
 
-If CONFIG_SLOW_WORK_PROC is enabled, a proc file is made available:
+If CONFIG_SLOW_WORK_DEBUG is enabled, a debugfs file is made available:
 
-	/proc/slow_work_rq
+	/sys/kernel/debug/slow_work/runqueue
 
 through which the list of work items being executed and the queues of items to
 be executed may be viewed.  The owner of a work item is given the chance to
-- 
cgit v1.1