aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'pm+acpi-fixes-3.12-rc1' of ↵Linus Torvalds2013-09-1213-188/+328
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "All of these commits are fixes that have emerged recently and some of them fix bugs introduced during this merge window. Specifics: 1) ACPI-based PCI hotplug (ACPIPHP) fixes related to spurious events After the recent ACPIPHP changes we've seen some interesting breakage on a system that triggers device check notifications during boot for non-existing devices. Although those notifications are really spurious, we should be able to deal with them nevertheless and that shouldn't introduce too much overhead. Four commits to make that work properly. 2) Memory hotplug and hibernation mutual exclusion rework This was maent to be a cleanup, but it happens to fix a classical ABBA deadlock between system suspend/hibernation and ACPI memory hotplug which is possible if they are started roughly at the same time. Three commits rework memory hotplug so that it doesn't acquire pm_mutex and make hibernation use device_hotplug_lock which prevents it from racing with memory hotplug. 3) ACPI Intel LPSS (Low-Power Subsystem) driver crash fix The ACPI LPSS driver crashes during boot on Apple Macbook Air with Haswell that has slightly unusual BIOS configuration in which one of the LPSS device's _CRS method doesn't return all of the information expected by the driver. Fix from Mika Westerberg, for stable. 4) ACPICA fix related to Store->ArgX operation AML interpreter fix for obscure breakage that causes AML to be executed incorrectly on some machines (observed in practice). From Bob Moore. 5) ACPI core fix for PCI ACPI device objects lookup There still are cases in which there is more than one ACPI device object matching a given PCI device and we don't choose the one that the BIOS expects us to choose, so this makes the lookup take more criteria into account in those cases. 6) Fix to prevent cpuidle from crashing in some rare cases If the result of cpuidle_get_driver() is NULL, which can happen on some systems, cpuidle_driver_ref() will crash trying to use that pointer and the Daniel Fu's fix prevents that from happening. 7) cpufreq fixes related to CPU hotplug Stephen Boyd reported a number of concurrency problems with cpufreq related to CPU hotplug which are addressed by a series of fixes from Srivatsa S Bhat and Viresh Kumar. 8) cpufreq fix for time conversion in time_in_state attribute Time conversion carried out by cpufreq when user space attempts to read /sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state won't work correcty if cputime_t doesn't map directly to jiffies. Fix from Andreas Schwab. 9) Revert of a troublesome cpufreq commit Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) was intended to address some known concurrency problems in cpufreq related to the ordering of transitions, but unfortunately it introduced several problems of its own, so I decided to revert it now and address the original problems later in a more robust way. 10) Intel Haswell CPU models for intel_pstate from Nell Hardcastle. 11) cpufreq fixes related to system suspend/resume The recent cpufreq changes that made it preserve CPU sysfs attributes over suspend/resume cycles introduced a possible NULL pointer dereference that caused it to crash during the second attempt to suspend. Three commits from Srivatsa S Bhat fix that problem and a couple of related issues. 12) cpufreq locking fix cpufreq_policy_restore() should acquire the lock for reading, but it acquires it for writing. Fix from Lan Tianyu" * tag 'pm+acpi-fixes-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) cpufreq: Acquire the lock in cpufreq_policy_restore() for reading cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu cpufreq: Restructure if/else block to avoid unintended behavior cpufreq: Fix crash in cpufreq-stats during suspend/resume intel_pstate: Add Haswell CPU models Revert "cpufreq: make sure frequency transitions are serialized" cpufreq: Use signed type for 'ret' variable, to store negative error values cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock cpufreq: Split __cpufreq_remove_dev() into two parts cpufreq: Fix wrong time unit conversion cpufreq: serialize calls to __cpufreq_governor() cpufreq: don't allow governor limits to be changed when it is disabled ACPI / bind: Prefer device objects with _STA to those without it ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks ACPI / hotplug / PCI: Use _OST to notify firmware about notify status ACPI / hotplug / PCI: Avoid doing too much for spurious notifies ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field. ACPI / hotplug / PCI: Don't trim devices before scanning the namespace ...
| * Merge branch 'pm-cpufreq'Rafael J. Wysocki2013-09-121-17/+32
| |\ | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq: cpufreq: Acquire the lock in cpufreq_policy_restore() for reading cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu cpufreq: Restructure if/else block to avoid unintended behavior cpufreq: Fix crash in cpufreq-stats during suspend/resume
| | * cpufreq: Acquire the lock in cpufreq_policy_restore() for readingLan Tianyu2013-09-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cpufreq_policy_restore() before system suspend policy is read from percpu's cpufreq_cpu_data_fallback. It's a read operation rather than a write one, so take the lock for reading in there. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpuSrivatsa S. Bhat2013-09-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If update_policy_cpu() is invoked with the existing policy->cpu itself as the new-cpu parameter, then a lot of things can go terribly wrong. In its present form, update_policy_cpu() always assumes that the new-cpu is different from policy->cpu and invokes other functions to perform their respective updates. And those functions implement the actual update like this: per_cpu(..., new_cpu) = per_cpu(..., last_cpu); per_cpu(..., last_cpu) = NULL; Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu references vanish into thin air! (memory leak). From there, it leads to more problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference and hence tries to create a new sysfs-group; but sysfs already had created the group earlier, so it complains that it cannot create a duplicate filename. In short, the repercussions of a rather innocuous invocation of update_policy_cpu() can turn out to be pretty nasty. Ideally update_policy_cpu() should handle this situation (new == last) gracefully, and not lead to such severe problems. So fix it by adding an appropriate check. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Restructure if/else block to avoid unintended behaviorSrivatsa S. Bhat2013-09-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __cpufreq_remove_dev_prepare(), the code which decides whether to remove the sysfs link or nominate a new policy cpu, is governed by an if/else block with a rather complex set of conditionals. Worse, they harbor a subtlety which leads to certain unintended behavior. The code looks like this: if (cpu != policy->cpu && !frozen) { sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { new_cpu = cpufreq_nominate_new_policy_cpu(...); ... update_policy_cpu(..., new_cpu); } The original intention was: If the CPU going offline is not policy->cpu, just remove the link. On the other hand, if the CPU going offline is the policy->cpu itself, handover the policy->cpu job to some other surviving CPU in that policy. But because the 'if' condition also includes the 'frozen' check, now there are *two* possibilities by which we can enter the 'else' block: 1. cpu == policy->cpu (intended) 2. cpu != policy->cpu && frozen (unintended) Due to the second (unintended) scenario, we end up spuriously nominating a CPU as the policy->cpu, even when the existing policy->cpu is alive and well. This can cause problems further down the line, especially when we end up nominating the same policy->cpu as the new one (ie., old == new), because it totally confuses update_policy_cpu(). To avoid this mess, restructure the if/else block to only do what was originally intended, and thus prevent any unwelcome surprises. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Fix crash in cpufreq-stats during suspend/resumeSrivatsa S. Bhat2013-09-111-13/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen Warren reported that the cpufreq-stats code hits a NULL pointer dereference during the second attempt to suspend a system. He also pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight init/teardown during suspend/resume". That commit actually ensured that the cpufreq-stats table and the cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during suspend/resume, which makes it all the more surprising. However, it turns out that the root-cause is not that we access an already freed memory, but that the reference to the allocated memory gets moved around and we lose track of that during resume, leading to the reported crash in a subsequent suspend attempt. In the suspend path, during CPU offline, the value of policy->cpu is updated by choosing one of the surviving CPUs in that policy, as long as there is atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is invoked to update the reference to the stats structure by assigning it to the new CPU. However, in the resume path, during CPU online, we end up assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats know about this. Thus the reference to the stats structure remains (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt, during CPU offline, we end up accessing an incorrect location to get the stats structure, which eventually leads to the NULL pointer dereference. Fix this by letting cpufreq-stats know about the update of the policy->cpu during CPU online in the resume path. (Also, move the update_policy_cpu() function higher up in the file, so that __cpufreq_add_dev() can invoke it). Reported-and-tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | Merge branch 'pm-cpufreq'Rafael J. Wysocki2013-09-114-35/+76
| |\ \ | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-cpufreq: intel_pstate: Add Haswell CPU models Revert "cpufreq: make sure frequency transitions are serialized" cpufreq: Use signed type for 'ret' variable, to store negative error values cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock cpufreq: Split __cpufreq_remove_dev() into two parts cpufreq: Fix wrong time unit conversion cpufreq: serialize calls to __cpufreq_governor() cpufreq: don't allow governor limits to be changed when it is disabled
| | * intel_pstate: Add Haswell CPU modelsNell Hardcastle2013-09-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable the intel_pstate driver for Haswell CPUs. One missing Ivy Bridge model (0x3E) is also included. Models referenced from tools/power/x86/turbostat/turbostat.c:has_nehalem_turbo_ratio_limit Signed-off-by: Nell Hardcastle <nell@spicious.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * Revert "cpufreq: make sure frequency transitions are serialized"Rafael J. Wysocki2013-09-102-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) attempted to serialize frequency transitions by adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE notifications. However, it assumed that the notifications will always originate from the driver's .target() callback, but they also can be triggered by cpufreq_out_of_sync() and that leads to warnings like this on some systems: WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317 __cpufreq_notify_transition+0x238/0x260() In middle of another frequency transition accompanied by a call trace similar to this one: [<ffffffff81720daa>] dump_stack+0x46/0x58 [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320 [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50 [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260 [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70 [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0 [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160 [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160 [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5 [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0 [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90 [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140 [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0 [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30 [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff81081060>] process_one_work+0x170/0x4a0 [<ffffffff81082121>] worker_thread+0x121/0x390 [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170 [<ffffffff81088fe0>] kthread+0xc0/0xd0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 For this reason, revert commit 7c30ed5 along with the fix 266c13d (cpufreq: Fix serialization of frequency transitions) on top of it and we will revisit the serialization problem later. Reported-by: Alessandro Bono <alessandro.bono@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Use signed type for 'ret' variable, to store negative error valuesSrivatsa S. Bhat2013-09-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are places where the variable 'ret' is declared as unsigned int and then used to store negative return values such as -EINVAL. Fix them by declaring the variable as a signed quantity. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writesSrivatsa S. Bhat2013-09-102-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary and partial solution to the race condition between writing to a cpufreq sysfs file and taking a CPU offline. Now that we have a proper and complete solution to that problem, remove the temporary fix. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplugSrivatsa S. Bhat2013-09-101-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions that are used to write to cpufreq sysfs files (such as store_scaling_max_freq()) are not hotplug safe. They can race with CPU hotplug tasks and lead to problems such as trying to acquire an already destroyed timer-mutex etc. Eg: __cpufreq_remove_dev() __cpufreq_governor(policy, CPUFREQ_GOV_STOP); policy->governor->governor(policy, CPUFREQ_GOV_STOP); cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: mutex_destroy(&cpu_cdbs->timer_mutex) cpu_cdbs->cur_policy = NULL; <PREEMPT> store() __cpufreq_set_policy() __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); policy->governor->governor(policy, CPUFREQ_GOV_LIMITS); case CPUFREQ_GOV_LIMITS: mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex) if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL So use get_online_cpus()/put_online_cpus() in the store_*() functions, to synchronize with CPU hotplug. However, there is an additional point to note here: some parts of the CPU teardown in the cpufreq subsystem are done in the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the get/put_online_cpus() functions alone is insufficient; we should also ensure that we don't race with those latter steps in the hotplug sequence. We can easily achieve this by checking if the CPU is online before proceeding with the store, since the CPU would have been marked offline by the time the CPU_POST_DEAD notifiers are executed. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lockSrivatsa S. Bhat2013-09-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going offline. But because we destroy the kobject towards the end of the CPU offline phase, there are certain race windows where a task can try to write to a cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking that CPU offline, and this can bump up the kobject refcount, which in turn might hinder the CPU offline task from running to completion. (It can also cause other more serious problems such as trying to acquire a destroyed timer-mutex etc., depending on the exact stage of the cleanup at which the task managed to take a new refcount). To fix the race window, we will need to synchronize those store_*() call-sites with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that in turn can cause a total deadlock because it can end up waiting for the CPU offline task to complete, with incremented refcount! Write to sysfs CPU offline task -------------- ---------------- kobj_refcnt++ Acquire cpu_hotplug.lock get_online_cpus(); Wait for kobj_refcnt to drop to zero **DEADLOCK** A simple way to avoid this problem is to perform the kobject cleanup in the CPU offline path, with the cpu_hotplug.lock *released*. That is, we can perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock released. Doing this helps us avoid deadlocks due to holding kobject refcounts and waiting on each other on the cpu_hotplug.lock. (Note: We can't move all of the cpufreq CPU offline steps to the CPU_POST_DEAD stage, because certain things such as stopping the governors have to be done before the outgoing CPU is marked offline. So retain those parts in the CPU_DOWN_PREPARE stage itself). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Split __cpufreq_remove_dev() into two partsSrivatsa S. Bhat2013-09-101-12/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During CPU offline, the cpufreq core invokes __cpufreq_remove_dev() to perform work such as stopping the cpufreq governor, clearing the CPU from the policy structure etc, and finally cleaning up the kobject. There are certain subtle issues related to the kobject cleanup, and it would be much easier to deal with them if we separate that part from the rest of the cleanup-work in the CPU offline phase. So split the __cpufreq_remove_dev() function into 2 parts: one that handles the kobject cleanup, and the other that handles the rest of the work. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: Fix wrong time unit conversionAndreas Schwab2013-09-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time spent by a CPU under a given frequency is stored in jiffies unit in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of the frequency. This is what is displayed in the following file on the right column: cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state 2301000 19835820 2300000 3172 [...] Now cpufreq converts this jiffies unit delta to clock_t before returning it to the user as in the above file. And that conversion is achieved using the API cputime64_to_clock_t(). Although it accidentally works on traditional tick based cputime accounting, where cputime_t maps directly to jiffies, it doesn't work with other types of cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs or any granularity preffered by the architecture. For example we get a buggy zero delta on full dyntick configurations: cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state 2301000 0 2300000 0 [...] Fix this with using the proper jiffies_64_t to clock_t conversion. Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: serialize calls to __cpufreq_governor()Viresh Kumar2013-09-102-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't take a big lock around __cpufreq_governor() as this causes recursive locking for some cases. But calls to this routine must be serialized for every policy. Otherwise we can see some unpredictable events. For example, consider following scenario: __cpufreq_remove_dev() __cpufreq_governor(policy, CPUFREQ_GOV_STOP); policy->governor->governor(policy, CPUFREQ_GOV_STOP); cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: mutex_destroy(&cpu_cdbs->timer_mutex) cpu_cdbs->cur_policy = NULL; <PREEMPT> store() __cpufreq_set_policy() __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); policy->governor->governor(policy, CPUFREQ_GOV_LIMITS); case CPUFREQ_GOV_LIMITS: mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex) if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL And so store() will eventually result in a crash if cur_policy is NULL at this point. Introduce an additional variable which would guarantee serialization here. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * cpufreq: don't allow governor limits to be changed when it is disabledViresh Kumar2013-09-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __cpufreq_governor() returns with -EBUSY when governor is already stopped and we try to stop it again, but when it is stopped we must not allow calls to CPUFREQ_GOV_LIMITS event as well. This patch adds this check in __cpufreq_governor(). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | Merge branch 'acpi-bind'Rafael J. Wysocki2013-09-111-11/+24
| |\ \ | | | | | | | | | | | | | | | | * acpi-bind: ACPI / bind: Prefer device objects with _STA to those without it
| | * | ACPI / bind: Prefer device objects with _STA to those without itRafael J. Wysocki2013-09-091-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported at https://bugzilla.kernel.org/show_bug.cgi?id=60829, there still are cases in which do_find_child() doesn't choose the ACPI device object it is "expected" to choose if there are more such objects matching one PCI device present. This particular problem may be worked around by making do_find_child() return device obejcts witn _STA whose result indicates that the device is enabled before device objects without _STA if there's more than one device object to choose from. This change doesn't affect the case in which there's only one matching ACPI device object per PCI device. References: https://bugzilla.kernel.org/show_bug.cgi?id=60829 Reported-by: Peter Wu <lekensteyn@gmail.com> Tested-by: Felix Lisczyk <felix.lisczyk@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | Merge branch 'pm-cpuidle'Rafael J. Wysocki2013-09-101-1/+2
| |\ \ \ | | | | | | | | | | | | | | | | | | | | * pm-cpuidle: cpuidle: Check the result of cpuidle_get_driver() against NULL
| | * | | cpuidle: Check the result of cpuidle_get_driver() against NULLDaniel Fu2013-08-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the current CPU has no cpuidle driver, drv will be NULL in cpuidle_driver_ref(). Check if that is the case before trying to bump up the driver's refcount to prevent the kernel from crashing. [rjw: Subject and changelog] Signed-off-by: Daniel Fu <danifu@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | Merge branch 'acpi-assorted'Rafael J. Wysocki2013-09-101-1/+2
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | * acpi-assorted: ACPI / LPSS: don't crash if a device has no MMIO resources
| | * | | | ACPI / LPSS: don't crash if a device has no MMIO resourcesMika Westerberg2013-09-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel LPSS devices that are enumerated from ACPI have both MMIO and IRQ resources returned in their _CRS method. However, Apple Macbook Air with Haswell has LPSS devices enumerated from PCI bus instead and _CRS method returns only an interrupt number (but the device has _HID set that causes the scan handler to match it). The current ACPI / LPSS code sets pdata->dev_desc only when MMIO resource is found for the device and in case of Macbook Air it is never found. That leads to a NULL pointer dereference in register_device_clock(). Correct this by always setting the pdata->dev_desc. Reported-and-tested-by: Imre Kaloz <kaloz@openwrt.org> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | Merge branch 'acpica'Rafael J. Wysocki2013-09-101-64/+102
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpica: ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.
| | * | | | | ACPICA: Fix for a Store->ArgX when ArgX contains a reference to a field.Bob Moore2013-09-061-64/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change fixes a problem where a Store operation to an ArgX object that contained a reference to a field object did not complete the automatic dereference and then write to the actual field object. Instead, the object type of the field object was inadvertently changed to match the type of the source operand. The new behavior will actually write to the field object (buffer field or field unit), thus matching the correct ACPI-defined behavior. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | Merge branch 'acpi-pci-hotplug'Rafael J. Wysocki2013-09-101-14/+47
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-pci-hotplug: ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks ACPI / hotplug / PCI: Use _OST to notify firmware about notify status ACPI / hotplug / PCI: Avoid doing too much for spurious notifies ACPI / hotplug / PCI: Don't trim devices before scanning the namespace
| | * | | | | | ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checksRafael J. Wysocki2013-09-091-7/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current ACPIPHP notify handler we always go directly for a rescan of the parent bus if we get a device check notification for a device that is not a bridge. However, this obviously is overzealous if nothing really changes, because this way we may rescan the whole PCI hierarchy pretty much in vain. That happens on Alex Williamson's machine whose ACPI tables contain device objects that are supposed to coresspond to PCIe root ports, but those ports aren't physically present (or at least they aren't visible in the PCI config space to us). The BIOS generates multiple device check notifies for those objects during boot and for each of them we go straight for the parent bus rescan, but the parent bus is the root bus in this particular case. In consequence, we rescan the whole PCI bus from the top several times in a row, which is completely unnecessary, increases boot time by 50% (after previous fixes) and generates excess dmesg output from the PCI subsystem. Fix the problem by checking if we can find anything new in the slot corresponding to the device we've got a device check notify for and doing nothig if that's not the case. The spec (ACPI 5.0, Section 5.6.6) appears to mandate this behavior, as it says: Device Check. Used to notify OSPM that the device either appeared or disappeared. If the device has appeared, OSPM will re-enumerate from the parent. If the device has disappeared, OSPM will invalidate the state of the device. OSPM may optimize out re-enumeration. Therefore, according to the spec, we are free to do nothing if nothing changes. References: https://bugzilla.kernel.org/show_bug.cgi?id=60865 Reported-and-tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | ACPI / hotplug / PCI: Use _OST to notify firmware about notify statusRafael J. Wysocki2013-09-071-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spec suggests that we should use _OST to notify the platform about the status of notifications it sends us, for example so that it doesn't repeate a notification that has been handled already. This turns out to help reduce the amount of diagnostic output from the ACPIPHP subsystem and speed up boot on at least one system that generates multiple device check notifies for PCIe devices on the root bus during boot. Reported-and-tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | ACPI / hotplug / PCI: Avoid doing too much for spurious notifiesRafael J. Wysocki2013-09-071-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes we may get a spurious device check or bus check notify for a hotplug device and in those cases we should avoid doing all of the configuration work needed when something actually changes. To that end, check the return value of pci_scan_slot() in enable_slot() and bail out early if it is 0. This turns out to help reduce the amount of diagnostic output from the ACPIPHP subsystem and speed up boot on at least one system that generates multiple device check notifies for PCIe devices on the root bus during boot. Reported-and-tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | | ACPI / hotplug / PCI: Don't trim devices before scanning the namespaceRafael J. Wysocki2013-09-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In acpiphp_bus_add() we first remove device objects corresponding to the given handle and the ACPI namespace branch below it, which are then re-created by acpi_bus_scan(). This used to be done to clean up after surprise removals, but now we do the cleanup through trim_stale_devices() which checks if the devices in question are actually gone before removing them, so the device hierarchy trimming in acpiphp_bus_add() is not necessary any more and, moreover, it may lead to problems if it removes device objects corresponding to devices that are actually present. For this reason, remove the leftover acpiphp_bus_trim() from acpiphp_bus_add(). Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | Merge branch 'acpi-hotplug'Rafael J. Wysocki2013-09-104-45/+43
| |\ \ \ \ \ \ \ | | |_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-hotplug: PM / hibernate / memory hotplug: Rework mutual exclusion PM / hibernate: Create memory bitmaps after freezing user space ACPI / scan: Change ordering of locks for device hotplug
| | * | | | | | PM / hibernate / memory hotplug: Rework mutual exclusionRafael J. Wysocki2013-08-313-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since all of the memory hotplug operations have to be carried out under device_hotplug_lock, they won't need to acquire pm_mutex if device_hotplug_lock is held around hibernation. For this reason, make the hibernation code acquire device_hotplug_lock after freezing user space processes and release it before thawing them. At the same tim drop the lock_system_sleep() and unlock_system_sleep() calls from lock_memory_hotplug() and unlock_memory_hotplug(), respectively. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com>
| | * | | | | | PM / hibernate: Create memory bitmaps after freezing user spaceRafael J. Wysocki2013-08-312-32/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hibernation core uses special memory bitmaps during image creation and restoration and traditionally those bitmaps are allocated before freezing tasks, because in the past GFP_KERNEL allocations might not work after all tasks had been frozen. However, this is an anachronism, because hibernation_snapshot() now calls hibernate_preallocate_memory() which allocates memory for the image upfront anyway, so the memory bitmaps may be allocated after freezing user space safely. For this reason, move all of the create_basic_memory_bitmaps() calls after freeze_processes() and all of the corresponding free_basic_memory_bitmaps() calls before thaw_processes(). This will allow us to hold device_hotplug_lock around hibernation without the need to worry about freezing issues with user space processes attempting to acquire it via sysfs attributes after the creation of memory bitmaps and before the freezing of tasks. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com>
| | * | | | | | ACPI / scan: Change ordering of locks for device hotplugRafael J. Wysocki2013-08-311-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the ordering of device hotplug locks in scan.c so that acpi_scan_lock is always acquired after device_hotplug_lock. This will make it possible to use device_hotplug_lock around some code paths that acquire acpi_scan_lock safely (most importantly system suspend and hibernation). Apart from that, acpi_scan_lock is platform-specific and device_hotplug_lock is general, so the new ordering appears to be more appropriate from the overall design viewpoint. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Toshi Kani <toshi.kani@hp.com>
* | | | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2013-09-1229-82/+381
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes. The -g perf report lockup you reported is only partially addressed, patches that fix the excessive runtime are still being worked on" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix uncore PCI fixed counter handling uprobes: Fix utask->depth accounting in handle_trampoline() perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING perf: Fix up MMAP2 buffer space reservation perf tools: Add attr->mmap2 support perf kvm: Fix sample_type manipulation perf evlist: Fix id pos in perf_evlist__open() perf trace: Handle perf.data files with no tracepoints perf session: Separate progress bar update when processing events perf trace: Check if MAP_32BIT is defined perf hists: Fix formatting of long symbol names perf evlist: Fix parsing with no sample_id_all bit set perf tools: Add test for parsing with no sample_id_all bit perf trace: Check control+C more often
| * | | | | | | | perf/x86: Fix uncore PCI fixed counter handlingStephane Eranian2013-09-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a bug in the handling of SNB-EP/IVB-EP uncore PCI fixed counters, e.g., IMC. It would cause erratic values to be returned for the IMC clockticks event. This was due to a bogus hwc->config value which was then written to PCI config space. The erratic values can be seen via: $ perf stat -a -C 0 -e uncore_imc_0/clockticks/ -I 1000 sleep 10 The fixed counter has most fields marked as reserved with hw reset values of 0. Yet the kernel was defaulting to a hwc->config = ~0 and that was causing the issues. This patch sets the hwc->config values for fixed uncore event to 0. Now, the values of IMC clockticks is correct. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: peterz@infradead.org Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20130909195350.GA17643@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | uprobes: Fix utask->depth accounting in handle_trampoline()Oleg Nesterov2013-09-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently utask->depth is simply the number of allocated/pending return_instance's in uprobe_task->return_instances list. handle_trampoline() should decrement this counter every time we handle/free an instance, but due to typo it does this only if ->chained == T. This means that in the likely case this counter is never decremented and the probed task can't report more than MAX_URETPROBE_DEPTH events. Reported-by: Mikhail Kulemin <Mikhail.Kulemin@ru.ibm.com> Reported-by: Hemant Kumar Shaw <hkshaw@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Anton Arapov <anton@redhat.com> Cc: masami.hiramatsu.pt@hitachi.com Cc: srikar@linux.vnet.ibm.com Cc: systemtap@sourceware.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20130911154726.GA8093@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDINGStephane Eranian2013-09-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only be measured on counters 0-3 when HT is off. When HT is on, you only have counters 0-3. If you program it on the eight counters for 1s on a 3GHz IVB laptop running a noploop, you see: 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING 3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING Clearly the last 4 values are bogus. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: ak@linux.intel.com Cc: zheng.z.yan@intel.com Cc: dhsharp@google.com Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2013-09-1220-67/+229
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: * Handle perf.data files with no tracepoints in 'perf trace', fixing a segfault. * Fix up MMAP2 buffer space reservation, a problem that was caught via 'perf test' consistency tests. * Add attr->mmap2 support in the tools, a patch that should've been merged together with the kernel counterpart: 13d7a24 "perf: Add attr->mmap2 attribute to an event". Merging it allowed us to catch the MMAP buffer space reservation problem via 'perf test'. From Stephane Eranian. The tools deals with older kernels by disabling this feature, resetting the perf_event_attr.mmap2 bit, when -EINVAL is returned by perf_event_open, just like with perf_event_attr.{sample_id_all,exclude_{guest,host}}. When such fallback happens the perf_missing_features.mmap2 flag is set to true and can be used by tooling that strictly needs this feature to check for its availability on the running kernel. * Make sure we can find PERF_SAMPLE_ID in the variable part of PERF_RECORD_ ring buffer records in 'perf kvm', where direct manipulation of sample_type was being done. Fixed by making use of the perf_evlist__set_sample_bit() helper and by setting the evlist->id_pos in perf_evlist__open(), from Adrian Hunter. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | | | perf: Fix up MMAP2 buffer space reservationArnaldo Carvalho de Melo2013-09-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ino_generation field was added in the PERF_RECORD_MMAP2 record in the 13d7a24 cset but no space for it was allocated, corrupting the PERF_FORMAT_{TIME,CPU,TID,etc} area (sample_type/sample_id_all), fix it. Detected with one of the regression tests done by 'perf test': [root@sandy ~]# perf test -v 7 7: Validate PERF_RECORD_* events & perf_sample fields : --- start --- 61315294449606 0 PERF_RECORD_SAMPLE 61315294453161 0 PERF_RECORD_SAMPLE 61315294454441 0 PERF_RECORD_SAMPLE 61315294455709 0 PERF_RECORD_SAMPLE 61315295600899 0 PERF_RECORD_COMM: sleep:6500 27917287430500 342521613 PERF_RECORD_MMAP2 6500/6500: [0x400000(0x7000) @ 0 00:1d 311442 9016]: /usr/bin/sleep MMAP2 going backwards in time, prev=61315295600899, curr=27917287430500 MMAP2 with unexpected cpu, expected 0, got 342521613 MMAP2 with unexpected pid, expected 6500, got 1701606191 MMAP2 with unexpected tid, expected 6500, got 28773 27917287430500 342561333 PERF_RECORD_MMAP2 6500/6500: [0x3b7e000000(0x223000) @ 0 00:1d 309186 9016]: /usr/lib64/ld-2.16.so MMAP2 with unexpected cpu, expected 0, got 342561333 MMAP2 with unexpected pid, expected 6500, got 1932408369 MMAP2 with unexpected tid, expected 6500, got 111 27917287430500 342600095 PERF_RECORD_MMAP2 6500/6500: [0x7fffbd7dc000(0x1000) @ 0x7fffbd7dc000 00:00 0 0]: [vdso] MMAP2 with unexpected cpu, expected 0, got 342600095 MMAP2 with unexpected pid, expected 6500, got 1935963739 MMAP2 with unexpected tid, expected 6500, got 23919 27917287430500 342882834 PERF_RECORD_MMAP2 6500/6500: [0x3b7e400000(0x3b8000) @ 0 00:1d 309187 9016]: /usr/lib64/libc-2.16.so MMAP2 with unexpected cpu, expected 0, got 342882834 MMAP2 with unexpected pid, expected 6500, got 909192754 MMAP2 with unexpected tid, expected 6500, got 7303982 61316297195411 0 PERF_RECORD_EXIT(6500:6500):(6500:6500) ---- end ---- Validate PERF_RECORD_* events & perf_sample fields: FAILED! [root@sandy ~]# After this patch: [root@sandy ~]# perf test 7 7: Validate PERF_RECORD_* events & perf_sample fields : Ok [root@sandy ~]# Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-heeuv986b8ha7whqg4o3he7c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf tools: Add attr->mmap2 supportStephane Eranian2013-09-1117-25/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the new PERF_RECORD_MMAP2 record type exposed by the kernel. This is an extended PERF_RECORD_MMAP record. It adds for each file-backed mapping the device major, minor number and the inode number and generation. This triplet uniquely identifies the source of a file-backed mapping. It can be used to detect identical virtual mappings between processes, for instance. The patch will prefer MMAP2 over MMAP. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1377079825-19057-3-git-send-email-eranian@google.com [ Cope with 314add6 "Change machine__findnew_thread() to set thread pid", fix 'perf test' regression test entry affected, use perf_missing_features.mmap2 to fallback to not using .mmap2 in older kernels, so that new tools can work with kernels where this feature is not present ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf kvm: Fix sample_type manipulationAdrian Hunter2013-09-091-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Manipulating the sample_type of an evsel requires the use of: perf_evsel__set_sample_bit() and perf_evsel__reset_sample_bit() Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378496412-2424-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf evlist: Fix id pos in perf_evlist__open()Adrian Hunter2013-09-091-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure the id_pos is correct when perf_evlist__open() is used. This fixes a problem introduced in 7556257 that broke 'perf kvm stat live' in that this tool wasn't updated to use the sample_type bits setting helpers. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378496412-2424-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf trace: Handle perf.data files with no tracepointsArnaldo Carvalho de Melo2013-09-091-33/+7
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before: perf trace -i perf.data Segmentation fault (core dumped) # After: # perf trace -i perf.data Data file does not have raw_syscalls:sys_enter events # When there are no tracepoints in a perf.data file the struct pevent that contains the list of tracepoints that will be used to lookup the tracepoint id by name will not be populated, causing a NULL deref. And we don't need to do all that dance to look at pevents for an entry with a slighly different name to then lookup the tracepoint by its id on the evlist, just use the perf_evlist__find_tracepoint_by_name() routine, that will find the tracepoint, if present. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-egcm21k1e6gcyxpcgjxtmsq3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2013-09-068-10/+146
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: * Fix parsing with no sample_id_all bit set, this regression prevents perf from reading old perf.data files generated in systems where perf_event_attr.sample_id_all isn't available, from Adrian Hunter. * Add signal checking to the inner 'perf trace' event processing loop, allowing faster response to control+C. * Fix formatting of long symbol names removing the hardcoding of a buffer size used to format histogram entries, which was truncating the lines. * Separate progress bar update when processing events, reducing potentially big overhead in not needed TUI progress bar screen updates, from Jiri Olsa. * Fix 'perf trace' build in architectures where MAP_32BIT is not defined, from Kyle McMartin. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | | | perf session: Separate progress bar update when processing eventsJiri Olsa2013-09-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when processing events in the __perf_session__process_events function we update a progress bar based on the file_size. During the same processing we update the progress bar from within flush_sample_queue which is based on number of samples count. Having 2 different based updates is causing the progress bar to jump heavily back and forth giving not much usefull info. Fixing this by keeping only __perf_session__process_events based progress bar update. And turning on flush_sample_queue progress bar update only for final flushing. This reduces the number of time the progress bar update function is called and it significantly reduces the loading time for TUI, where the progress bar update takes quite a lot of time. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20130905091449.GC1100@krava.brq.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf trace: Check if MAP_32BIT is definedKyle McMartin2013-09-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MAP_32BIT is defined only on x86... this means perf fails to build on all other platforms. Signed-off-by: Kyle McMartin <kyle@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20130905142947.GA25882@merlin.infradead.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf hists: Fix formatting of long symbol namesArnaldo Carvalho de Melo2013-09-051-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had a hardcoded buffer for formatting histogram entries, truncating long symbol names (C++ anyone?). Fix it by using hists__sort_list_width() before formatting the first histogram entry to calculate the max lenght needed by traversing the overheads and columns lists (sort order). Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-vdfkkyfdp8rboh7j9344o3ss@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf evlist: Fix parsing with no sample_id_all bit setAdrian Hunter2013-09-051-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The perf_evlist__event2evsel() is changed to handle non-sample events (such as mmap events) that have no id sample appended i.e. when sample_id_all is not set. Note that such events have a fixed format, so that the selected event (evsel) they are associated with is immaterial. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378325897-3840-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | | | | | perf tools: Add test for parsing with no sample_id_all bitAdrian Hunter2013-09-054-1/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a test for parsing a non-sample event when there is more than one selected event but no sample_id_all bit set. The test fails because of a bug in the evlist logic. That is fixed in a separate patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378325897-3840-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>