aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* [POWERPC] Early serial debug support for PPC44xDavid Gibson2007-05-089-26/+97
| | | | | | | | | | This adds support for early serial debugging via the built in port on IBM/AMCC PowerPC 44x CPUs. It uses a bolted TLB entry in address space 1 for the UART's mapping, allowing robust debugging both before and after the initialization of the MMU. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Support for the Ebony 440GP reference board in arch/powerpcDavid Gibson2007-05-0815-11/+1418
| | | | | | | | | | | | This adds platform support code for the Ebony (440GP) evaluation board. This includes both code in arch/powerpc/platforms/44x for board initialization, and zImage wrapper code to correctly tweak the flattened device tree based on information from the firmware. The zImage supports both IBM OpenBIOS (aka "treeboot") and old versions of uboot which don't support a flattened device tree. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Add device tree for EbonyDavid Gibson2007-05-081-0/+307
| | | | | | | | | Add a device tree for the Ebony evaluation board (440GP based). This tree is not complete or finalized. This tree needs a version of dtc recent enough to include reference-to-labels to process. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for nowDavid Gibson2007-05-085-11/+35
| | | | | | | | | | | | | | | | | | | This prepares for Ebony/440 support by creating an arch/powerpc/platforms/44x directory. It is populated with a single misc_44x.S file, into which is moved the 44x specific reset code from head_44x.S (on the grounds that we should really stop clogging up the head_* files with random asm helper routines). At the same time, we disable the (empty save Kconfig and Makefile) arch/powerpc/platforms/4xx directory from the arch/powerpc/platforms Makefile. Contrary to the comment in arch/powerpc/platforms/4xx/Makefile, attempting to build such an empty Makefile will fail, thus breaking compile for the 44x platforms we're about to add. It can go back in once we start porting some of the 40x platforms (and thus it becomes non-empty). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] MPIC U3/U4 MSI backendMichael Ellerman2007-05-084-6/+205
| | | | | | | | | | | | | | | | MPIC U3/U4 MSI backend. Based on code from Segher, heavily hacked by me. This only deals with MSI on U3/U4 MPICs, aka. CPC 9x5. If we find a U3/U4 then we enable this backend, ie. take over the ppc_md MSI hooks. We might need more elaborate logic in future to decide which backend is enabled. We need our own irq_chip so that we can do MSI masking/unmasking on the device itself. We also need to mask explicitly on shutdown to make sure we don't get bitten by lazy-disable semantics. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] MPIC MSI allocatorMichael Ellerman2007-05-085-1/+222
| | | | | | | | | | | To support MSI on MPIC we need a way to reserve and allocate hardware irq numbers, this patch implements an allocator for that purpose. New firmware platforms must define a "msi-available-ranges" property on their MPIC node for MSI to work. For U3/U4 we do a best-guess setup. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Enable MSI mappings for MPICMichael Ellerman2007-05-081-0/+45
| | | | | | | | | | On some Apple machines the HT MSI mappings are not enabled by firmware, so we need to do it by hand. We can't use the pci routines as this code runs too early. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Tell Phyp we support MSIMichael Ellerman2007-05-081-1/+7
| | | | | | | | Tell Phyp we support MSI via the client architecture support mechanism. Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] RTAS MSI implementationMichael Ellerman2007-05-082-0/+271
| | | | | | | | | | | | | | | | | | | | | | Implement MSI support via RTAS (RTAS = run-time firmware on pSeries machines). For now we assumes that if the required RTAS tokens for MSI are present, then we want to use the RTAS MSI routines. When RTAS is managing MSIs for us, it will/may enable MSI on devices that support it by default. This is contrary to the Linux model where a device is in LSI mode until the driver requests MSIs. To remedy this we add a pci_irq_fixup call, which disables MSI if they've been assigned by firmware and the device also supports LSI. Devices that don't support LSI at all will be left as is, drivers are still expected to call pci_enable_msi() before using the device. At the moment there is no pci_irq_fixup on pSeries, so we can just set it unconditionally. If other platforms use the RTAS MSI backend they'll need to check that still holds. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] PowerPC MSI infrastructureMichael Ellerman2007-05-084-0/+48
| | | | | | | | | | | | | This provides the architecture specific hooks to support MSI on powerpc. We implement the newly added arch_setup_msi_irqs() and arch_teardown_msi_irqs(), and then delegate to ppc_md routines. Platforms that don't implement MSI will leave the ppc_md calls blank, arch_msi_check_device() will detect this and return ENOSYS. Drivers should detect this error and continue to use LSI. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Rip out the existing powerpc msi stubsMichael Ellerman2007-05-082-32/+0
| | | | | | | | Rip out the existing powerpc msi stubs. These were the start of an implementation based on ppc_md calls, but were never used in mainline. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Remove use of 4level-fixup.h for ppc32David Gibson2007-05-085-47/+31
| | | | | | | | | | | | | | For 32-bit systems, powerpc still relies on the 4level-fixup.h hack, to pretend that the generic pagetable handling stuff is 3-levels rather than 4. This patch removes this, instead using the newer pgtable-nopmd.h to handle the elision of both the pud and pmd pagetable levels (ppc32 pagetables are actually 2 levels). This removes a little extraneous code, and makes it more easily compared to the 64-bit pagetable code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
* [POWERPC] Add powerpc PCI-E reset API implementationBrian King2007-05-081-0/+30
| | | | | | | | | | | | | | Adds the pSeries platform implementation for a new PCI API which can be used to issue various types of PCI-E reset, including PCI-E warm reset and PCI-E hot reset. This is needed for an ipr PCI-E adapter which does not properly implement BIST. Running BIST on this adapter results in PCI-E errors. The only reliable reset mechanism that exists on this hardware is PCI Fundamental reset (warm reset). Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* Merge branch 'linux-2.6'Paul Mackerras2007-05-082278-48339/+147975
|\
| * Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2007-05-072-9/+9
| |\ | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SCSI] esp_scsi: Fix section mismatch warnings. [VIDEO] sunxvr2500: Fix PCI device ID table.
| | * [SCSI] esp_scsi: Fix section mismatch warnings.Martin Habets2007-05-071-1/+1
| | | | | | | | | | | | | | | Signed-off-by: Martin Habets <errandir_news@mph.eclipse.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * [VIDEO] sunxvr2500: Fix PCI device ID table.David S. Miller2007-05-071-8/+8
| | | | | | | | | | | | | | | | | | Noticed by Meelis Roos. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Fix bluetooth HCI sysfs compileLinus Torvalds2007-05-071-2/+2
| |/ | | | | | | | | | | | | More fallout from the removal of "struct subsystem" from the core device model. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds2007-05-0724-106/+1861
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] update memory attribute aliasing documentation & test cases [IA64] fail mmaps that span areas with incompatible attributes [IA64] allow WB /sys/.../legacy_mem mmaps [IA64] make ioremap avoid unsupported attributes [IA64] rename ioremap variables to match i386 [IA64] relax per-cpu TLB requirement to DTC [IA64] remove per-cpu ia64_phys_stacked_size_p8 [IA64] Fix example error injection program [IA64] Itanium MC Error Injection Tool: pal_mc_error_inject() interface [IA64] Itanium MC Error Injection Tool: Makefile changes [IA64] Itanium MC Error Injection Tool: Driver sysfs interface [IA64] Itanium MC Error Injection Tool: Doc and sample application [IA64] Itanium MC Error Injection Tool: Kernel configuration
| | * Pull mem-attribute into release branchTony Luck2007-04-306-58/+392
| | |\
| | | * [IA64] update memory attribute aliasing documentation & test casesBjorn Helgaas2007-03-302-34/+284
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Updates documentation and adds some test cases. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * [IA64] fail mmaps that span areas with incompatible attributesBjorn Helgaas2007-03-301-3/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Example memory map (from HP sx1000 with VGA enabled): 0x00000 - 0x9FFFF supports only WB (cacheable) access 0xA0000 - 0xBFFFF supports only UC (uncacheable) access 0xC0000 - 0xFFFFF supports only WB (cacheable) access Some versions of X map the entire 0x00000-0xFFFFF area at once. With the example above, this mmap must fail because there's no memory attribute that's safe for the entire area. Prior to this patch, we performed the mmap with a UC mapping. When X accessed the WB memory at 0xC0000, it caused an MCA. The crash can happen when mapping 0xC0000 from either /dev/mem or a /sys/.../legacy_mem file. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * [IA64] allow WB /sys/.../legacy_mem mmapsBjorn Helgaas2007-03-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow cacheable mmaps of legacy_mem if WB access is supported for the region. The "legacy_mem" file often contains a shadow option ROM, and some versions of X depend on this. Tim Yamin <plasm@roo.me.uk> reported that this change fixes X on a Dell PowerEdge 3250. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * [IA64] make ioremap avoid unsupported attributesBjorn Helgaas2007-03-302-10/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Example memory map (from HP sx1000 with VGA enabled): 0x00000 - 0x9FFFF supports only WB (cacheable) access 0xA0000 - 0xBFFFF supports only UC (uncacheable) access 0xC0000 - 0xFFFFF supports only WB (cacheable) access pci_read_rom() indirectly uses ioremap(0xC0000) to read the shadow VGA option ROM. ioremap() used to default to a 16MB or 64MB UC kernel identity mapping, which would cause an MCA when reading 0xC0000 since only WB is supported there. X uses reads the option ROM to initialize devices. A smaller test case is: # echo 1 > /sys/bus/pci/devices/0000:aa:03.0/rom # cp /sys/bus/pci/devices/0000:aa:03.0/rom x To avoid this, we can use the same ioremap_page_range() strategy that most architectures use for all ioremaps. These page table mappings come out of the vmalloc area. On ia64, these are in region 5 (0xA... addresses) and typically use 16KB or 64KB mappings instead of 16MB or 64MB mappings. The smaller mappings give more flexibility to use the correct attributes. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * [IA64] rename ioremap variables to match i386Bjorn Helgaas2007-03-301-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change, just use the same names as i386. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | * | Pull percpu-dtc into release branchTony Luck2007-04-3012-48/+63
| | |\ \
| | | * | [IA64] relax per-cpu TLB requirement to DTCChen, Kenneth W2007-02-064-41/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up one TLB entry for application, or even kernel if access pattern to per-cpu data area has high temporal locality. Since per-cpu is mapped at the top of region 7 address, we just need to add special case in alt_dtlb_miss. The physical address of per-cpu data is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for alt_dtlb_miss is not affected as we can hide all the latency. It was measured that alt_dtlb_miss handler has 23 cycles latency before and after the patch. The performance effect is massive for applications that put lots of tlb pressure on CPU. Workload environment like database online transaction processing or application uses tera-byte of memory would benefit the most. Measurement with industry standard database benchmark shown an upward of 1.6% gain. While smaller workloads like cpu, java also showing small improvement. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | [IA64] remove per-cpu ia64_phys_stacked_size_p8Chen, Kenneth W2007-02-068-7/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's not efficient to use a per-cpu variable just to store how many physical stack register a cpu has. Ever since the incarnation of ia64 up till upcoming Montecito processor, that variable has "glued" to 96. Having a variable in memory means that the kernel is burning an extra cacheline access on every syscall and kernel exit path. Such "static" value is better served with the instruction patching utility exists today. Convert ia64_phys_stacked_size_p8 into dynamic insn patching. This also has a pleasant side effect of eliminating access to per-cpu area while psr.ic=0 in the kernel exit path. (fixable for per-cpu DTC work, but why bother?) There are some concerns with the default value that the instruc- tion encoded in the kernel image. It shouldn't be concerned. The reasons are: (1) cpu_init() is called at CPU initialization. In there, we find out physical stack register size from PAL and patch two instructions in kernel exit code. The code in question can not be executed before the patching is done. (2) current implementation stores zero in ia64_phys_stacked_size_p8, and that's what the current kernel exit path loads the value with. With the new code, it is equivalent that we store reg size 96 in ia64_phys_stacked_size_p8, thus creating a better safety net. Given (1) above can never fail, having (2) is just a bonus. All in all, this patch allow one less memory reference in the kernel exit path, thus reducing syscall and interrupt return latency; and avoid polluting potential useful data in the CPU cache. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | * | | Pull error-inject into release branchTony Luck2007-04-306-0/+1406
| | |\ \ \
| | | * | | [IA64] Fix example error injection programTony Luck2007-02-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Progam accessed using /sys/devices/system/node/node0/cpu%d/err_inject/ This path only exists for CONFIG_NUMA=y systems. Better to use /sys/devices/system/cpu/cpu%d/err_inject/ which is available on all systems. Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | | [IA64] Itanium MC Error Injection Tool: pal_mc_error_inject() interfaceFenghua Yu2007-01-291-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements pal_mc_error_inject() interface in kernel. Both physical mode and virtual mode are supported. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | | [IA64] Itanium MC Error Injection Tool: Makefile changesFenghua Yu2007-01-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has Makefile changes. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | | [IA64] Itanium MC Error Injection Tool: Driver sysfs interfaceFenghua Yu2007-01-291-0/+293
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This kernel driver patch provides sysfs interface for user application to call pal_mc_error_inject() procedure. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | | [IA64] Itanium MC Error Injection Tool: Doc and sample applicationFenghua Yu2007-01-291-0/+1068
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains a documention and sample application. Since the sample application has ~1000 lines of code, it might not be suitable in a kernel documention in kenrel tree. If you think this is not good place to hold the sample application, please let me know and I'm open to other choices e.g. sourceforge etc. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| | | * | | [IA64] Itanium MC Error Injection Tool: Kernel configurationFenghua Yu2007-01-292-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has kenrel configuration changes for the MC Error Injection Tool. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | | | | Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2007-05-0715-205/+546
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux: gfs2: nfs lock support for gfs2 lockd: add code to handle deferred lock requests lockd: always preallocate block in nlmsvc_lock() lockd: handle test_lock deferrals lockd: pass cookie in nlmsvc_testlock lockd: handle fl_grant callbacks lockd: save lock state on deferral locks: add fl_grant callback for asynchronous lock return nfsd4: Convert NFSv4 to new lock interface locks: add lock cancel command locks: allow {vfs,posix}_lock_file to return conflicting lock locks: factor out generic/filesystem switch from setlock code locks: factor out generic/filesystem switch from test_lock locks: give posix_test_lock same interface as ->lock locks: make ->lock release private data before returning in GETLK case locks: create posix-to-flock helper functions locks: trivial removal of unnecessary parentheses
| | * | | | | gfs2: nfs lock support for gfs2Marc Eshel2007-05-062-10/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add NFS lock support to GFS2. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
| | * | | | | lockd: add code to handle deferred lock requestsMarc Eshel2007-05-064-7/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite nlmsvc_lock() to use the asynchronous interface. As with testlock, we answer nlm requests in nlmsvc_lock by first looking up the block and then using the results we find in the block if B_QUEUED is set, and calling vfs_lock_file() otherwise. If this a new lock request and we get -EINPROGRESS return on a non-blocking request then we defer the request. Also modify nlmsvc_unlock() to call the filesystem method if appropriate. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | lockd: always preallocate block in nlmsvc_lock()Marc Eshel2007-05-061-23/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally we could skip ever having to allocate a block in the case where the client asks for a non-blocking lock, or asks for a blocking lock that succeeds immediately. However we're going to want to always look up a block first in order to check whether we're revisiting a deferred lock call, and to be prepared to handle the case where the filesystem returns -EINPROGRESS--in that case we want to make sure the lock we've given the filesystem is the one embedded in the block that we'll use to track the deferred request. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | lockd: handle test_lock deferralsMarc Eshel2007-05-063-16/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite nlmsvc_testlock() to use the new asynchronous interface: instead of immediately doing a posix_test_lock(), we first look for a matching block. If the subsequent test_lock returns anything other than -EINPROGRESS, we then remove the block we've found and return the results. If it returns -EINPROGRESS, then we defer the lock request. In the case where the block we find in the first step has B_QUEUED set, we bypass the vfs_test_lock entirely, instead using the block to decide how to respond: with nlm_lck_denied if B_TIMED_OUT is set. with nlm_granted if B_GOT_CALLBACK is set. by dropping if neither B_TIMED_OUT nor B_GOT_CALLBACK is set Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | lockd: pass cookie in nlmsvc_testlockMarc Eshel2007-05-064-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | lockd: handle fl_grant callbacksMarc Eshel2007-05-061-4/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add code to handle file system callback when the lock is finally granted. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | lockd: save lock state on deferralMarc Eshel2007-05-062-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block. This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in a nlm_block structure which we insert into lockd's global block list. That new function isn't called yet, so it's dead code until a later patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | locks: add fl_grant callback for asynchronous lock returnMarc Eshel2007-05-062-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the succesful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant requests arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | nfsd4: Convert NFSv4 to new lock interfaceMarc Eshel2007-05-061-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert NFSv4 to the new lock interface. We don't define any callback for now, so we're not taking advantage of the asynchronous feature--that's less critical for the multi-threaded nfsd then it is for the single-threaded lockd. But this does allow a cluster filesystems to export cluster-coherent locking to NFS. Note that it's cluster filesystems that are the issue--of the filesystems that define lock methods (nfs, cifs, etc.), most are not exportable by nfsd. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
| | * | | | | locks: add lock cancel commandMarc Eshel2007-05-063-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| | * | | | | locks: allow {vfs,posix}_lock_file to return conflicting lockMarc Eshel2007-05-065-36/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| | * | | | | locks: factor out generic/filesystem switch from setlock codeMarc Eshel2007-05-062-32/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| | * | | | | locks: factor out generic/filesystem switch from test_lockJ. Bruce Fields2007-05-062-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
| | * | | | | locks: give posix_test_lock same interface as ->lockMarc Eshel2007-05-068-51/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>