It6251 fixes #4

hackwerk · 2015-06-08T22:18:37Z

Hi,

Here are the fixes that I have been running with. I have seen few situations I'm really blocked out of the screen. They also mirror the fixes I use in U-boot.

Job

Holding the DRM struct_mutex over a call to dma_alloc_coherent() is asking for trouble; the DRM struct_mutex is held inside several other system-wide locks, and when CMA is enabled, this causes the CMA lock to be nested inside struct_mutex. In conjunction with other system locks, this eventually causes a AB-BA lock ordering bug, which becomes most apparent when using CPU hotplug: [ INFO: possible circular locking dependency detected ] 3.19.0-rc6+ #1497 Not tainted ------------------------------------------------------- bash/2154 is trying to acquire lock: (console_lock){+.+.+.}, at: [<c006b75c>] console_cpu_notify+0x28/0x34 but task is already holding lock: (cpu_hotplug.lock#2){+.+.+.}, at: [<c00270f0>] cpu_hotplug_begin+0x64/0xb8 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (cpu_hotplug.lock#2){+.+.+.}: [<c06ecaac>] mutex_lock_nested+0x5c/0x3d8 [<c00dad28>] lru_add_drain_all+0x1c/0x18c [<c010fbe8>] migrate_prep+0x10/0x18 [<c00d5d74>] alloc_contig_range+0xd0/0x2cc [<c0111448>] cma_alloc+0xe0/0x1ac [<c0397268>] dma_alloc_from_contiguous+0x3c/0x44 [<c001a7bc>] __alloc_from_contiguous+0x3c/0xe8 [<c09a3b48>] atomic_pool_init+0x6c/0x15c [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c099ee44>] kernel_init_freeable+0x110/0x1dc [<c06e3104>] kernel_init+0x10/0xec [<c000ecd0>] ret_from_fork+0x14/0x24 -> #4 (lock){+.+...}: [<c06ecaac>] mutex_lock_nested+0x5c/0x3d8 [<c00dad28>] lru_add_drain_all+0x1c/0x18c [<c010fbe8>] migrate_prep+0x10/0x18 [<c00d5d74>] alloc_contig_range+0xd0/0x2cc [<c0111448>] cma_alloc+0xe0/0x1ac [<c0397268>] dma_alloc_from_contiguous+0x3c/0x44 [<c001a7bc>] __alloc_from_contiguous+0x3c/0xe8 [<c09a3b48>] atomic_pool_init+0x6c/0x15c [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c099ee44>] kernel_init_freeable+0x110/0x1dc [<c06e3104>] kernel_init+0x10/0xec [<c000ecd0>] ret_from_fork+0x14/0x24 -> #3 (cma_mutex){+.+.+.}: [<c06ecaac>] mutex_lock_nested+0x5c/0x3d8 [<c0111438>] cma_alloc+0xd0/0x1ac [<c0397268>] dma_alloc_from_contiguous+0x3c/0x44 [<c001a7bc>] __alloc_from_contiguous+0x3c/0xe8 [<c001abbc>] __dma_alloc+0x15c/0x28c [<c001ae38>] arm_dma_alloc+0xa0/0xa8 [<c0508a24>] etnaviv_iommu_domain_init+0x54/0x138 [<c0508b54>] etnaviv_iommu_domain_alloc+0x4c/0xd8 [<c0507c8c>] etnaviv_gpu_init+0x380/0x620 [<c0503ff8>] etnaviv_load+0xc0/0x128 [<c0369244>] drm_dev_register+0xac/0x10c [<c036ad1c>] drm_platform_init+0x48/0xd4 [<c050382c>] etnaviv_bind+0x18/0x20 [<c038d138>] try_to_bring_up_master+0x140/0x17c [<c038d2f0>] component_master_add_with_match+0x84/0xe0 [<c0504114>] etnaviv_pdev_probe+0xb4/0x104 [<c0392ed8>] platform_drv_probe+0x50/0xac [<c03916f0>] driver_probe_device+0x114/0x234 [<c03918ac>] __driver_attach+0x9c/0xa0 [<c038fd3c>] bus_for_each_dev+0x5c/0x90 [<c03911e0>] driver_attach+0x24/0x28 [<c0390e58>] bus_add_driver+0xe0/0x1d8 [<c03920ac>] driver_register+0x80/0xfc [<c0392d60>] __platform_driver_register+0x50/0x64 [<c09d8b08>] etnaviv_init+0x2c/0x4c [<c0008a5c>] do_one_initcall+0x88/0x1d8 [<c099ee44>] kernel_init_freeable+0x110/0x1dc [<c06e3104>] kernel_init+0x10/0xec [<c000ecd0>] ret_from_fork+0x14/0x24 -> #2 (&dev->struct_mutex){+.+.+.}: [<c06ecaac>] mutex_lock_nested+0x5c/0x3d8 [<c0364508>] drm_gem_mmap+0x3c/0xd4 [<c037ef88>] drm_gem_cma_mmap+0x14/0x2c [<c00fd020>] mmap_region+0x3d0/0x6a4 [<c00fd5d8>] do_mmap_pgoff+0x2e4/0x374 [<c00e6c18>] vm_mmap_pgoff+0x6c/0x9c [<c00fbb98>] SyS_mmap_pgoff+0x94/0xb8 [<c000ec00>] ret_fast_syscall+0x0/0x4c -> #1 (&mm->mmap_sem){++++++}: [<c00f4a68>] might_fault+0x64/0x98 [<c033dd3c>] con_set_unimap+0x160/0x25c [<c03381c8>] vt_ioctl+0x126c/0x1328 [<c032bfdc>] tty_ioctl+0x498/0xc5c [<c012493c>] do_vfs_ioctl+0x84/0x66c [<c0124f60>] SyS_ioctl+0x3c/0x60 [<c000ec00>] ret_fast_syscall+0x0/0x4c -> #0 (console_lock){+.+.+.}: [<c00627f0>] lock_acquire+0xb0/0x124 [<c00697cc>] console_lock+0x44/0x6c [<c006b75c>] console_cpu_notify+0x28/0x34 [<c00431b0>] notifier_call_chain+0x4c/0x8c [<c00432cc>] __raw_notifier_call_chain+0x1c/0x24 [<c0026d74>] __cpu_notify+0x34/0x50 [<c0026da8>] cpu_notify+0x18/0x1c [<c0026eec>] cpu_notify_nofail+0x10/0x1c [<c06e3484>] _cpu_down+0x100/0x248 [<c06e35f8>] cpu_down+0x2c/0x48 [<c03939a8>] cpu_subsys_offline+0x14/0x18 [<c038f6d0>] device_offline+0x90/0xc0 [<c038f7e0>] online_store+0x4c/0x74 [<c038d46c>] dev_attr_store+0x20/0x2c [<c017c0f8>] sysfs_kf_write+0x54/0x58 [<c017b430>] kernfs_fop_write+0xfc/0x1ac [<c0114274>] vfs_write+0xac/0x1b4 [<c01145b0>] SyS_write+0x44/0x90 [<c000ec00>] ret_fast_syscall+0x0/0x4c other info that might help us debug this: Chain exists of: console_lock --> lock --> cpu_hotplug.lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock#2); lock(lock); lock(cpu_hotplug.lock#2); lock(console_lock); *** DEADLOCK *** 8 locks held by bash/2154: #0: (sb_writers#5){.+.+.+}, at: [<c0114354>] vfs_write+0x18c/0x1b4 #1: (&of->mutex){+.+.+.}, at: [<c017b3bc>] kernfs_fop_write+0x88/0x1ac #2: (s_active#40){.+.+.+}, at: [<c017b3c4>] kernfs_fop_write+0x90/0x1ac #3: (device_hotplug_lock){+.+.+.}, at: [<c038e4d8>] lock_device_hotplug_sysfs+0x14/0x54 #4: (&dev->mutex){......}, at: [<c038f684>] device_offline+0x44/0xc0 #5: (cpu_add_remove_lock){+.+.+.}, at: [<c0026de0>] cpu_maps_update_begin+0x18/0x20 #6: (cpu_hotplug.lock){++++++}, at: [<c002708c>] cpu_hotplug_begin+0x0/0xb8 #7: (cpu_hotplug.lock#2){+.+.+.}, at: [<c00270f0>] cpu_hotplug_begin+0x64/0xb8 stack backtrace: CPU: 0 PID: 2154 Comm: bash Not tainted 3.19.0-rc6+ #1497 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [<c0012280>] (dump_backtrace) from [<c0012418>] (show_stack+0x18/0x1c) [<c0012400>] (show_stack) from [<c06e8eac>] (dump_stack+0x7c/0x98) [<c06e8e30>] (dump_stack) from [<c06e6520>] (print_circular_bug+0x28c/0x2d8) [<c06e6294>] (print_circular_bug) from [<c00621a8>] (__lock_acquire+0x1acc/0x1bb0) [<c00606dc>] (__lock_acquire) from [<c00627f0>] (lock_acquire+0xb0/0x124) [<c0062740>] (lock_acquire) from [<c00697cc>] (console_lock+0x44/0x6c) [<c0069788>] (console_lock) from [<c006b75c>] (console_cpu_notify+0x28/0x34) [<c006b734>] (console_cpu_notify) from [<c00431b0>] (notifier_call_chain+0x4c/0x8c) [<c0043164>] (notifier_call_chain) from [<c00432cc>] (__raw_notifier_call_chain+0x1c/0x24) [<c00432b0>] (__raw_notifier_call_chain) from [<c0026d74>] (__cpu_notify+0x34/0x50) [<c0026d40>] (__cpu_notify) from [<c0026da8>] (cpu_notify+0x18/0x1c) [<c0026d90>] (cpu_notify) from [<c0026eec>] (cpu_notify_nofail+0x10/0x1c) [<c0026edc>] (cpu_notify_nofail) from [<c06e3484>] (_cpu_down+0x100/0x248) [<c06e3384>] (_cpu_down) from [<c06e35f8>] (cpu_down+0x2c/0x48) [<c06e35cc>] (cpu_down) from [<c03939a8>] (cpu_subsys_offline+0x14/0x18) [<c0393994>] (cpu_subsys_offline) from [<c038f6d0>] (device_offline+0x90/0xc0) [<c038f640>] (device_offline) from [<c038f7e0>] (online_store+0x4c/0x74) [<c038f794>] (online_store) from [<c038d46c>] (dev_attr_store+0x20/0x2c) [<c038d44c>] (dev_attr_store) from [<c017c0f8>] (sysfs_kf_write+0x54/0x58) [<c017c0a4>] (sysfs_kf_write) from [<c017b430>] (kernfs_fop_write+0xfc/0x1ac) [<c017b334>] (kernfs_fop_write) from [<c0114274>] (vfs_write+0xac/0x1b4) [<c01141c8>] (vfs_write) from [<c01145b0>] (SyS_write+0x44/0x90) [<c011456c>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x4c) The locking ordering for each of the chain backtraces are: 5: cpu_hotplug.lock (in lru_add_drain_all, get_online_cpus) lock (in lru_add_drain_all) cma_mutex (in cma_alloc) 4: lock (in lru_add_drain_all), cma_mutex (in cma_alloc) 3: cma_mutex (in cma_alloc) drm dev->struct_mutex (in etnaviv_load) 2: drm dev->struct_mutex (in drm_gem_mmap) mm->mmap_sem (in vm_mmap_pgoff) 1: mm->mmap_sem (in might_fault) console_lock (in con_set_unimap, console_lock) 0: console_lock (in console_cpu_notify, console_lock) cpu_hotplug.lock (in _cpu_down, cpu_hotplug_begin) Hence the dependency chain of: cpu_hotplug.lock -> console_lock -> mmap_sem -> struct_mutex -> cma_mutex -> cpu_hotplug.lock *deadlock* The operation which etnadrm needs to lock is not the allocations, but the addition of the etnaviv_obj to the inactive list (to prevent the list becoming corrupted.) Move this to a separate operation which is performed once all the setup of the object is complete, and move the locking such that the allocation and setup is unlocked. This is overall more efficient, as we permit multiple expensive operations to occur in parallel (memory allocation) while only locking what we need. Signed-off-by: Russell King <[email protected]>

When a43eec3 ("bpf: introduce bpf_perf_event_output() helper") added PERF_COUNT_SW_BPF_OUTPUT we ended up with a new entry in the event_symbols_sw array that wasn't initialized, thus set to NULL, fix print_symbol_events() to check for that case so that we don't crash if this happens again. (gdb) bt #0 __match_glob (ignore_space=false, pat=<optimized out>, str=<optimized out>) at util/string.c:198 #1 strglobmatch (str=<optimized out>, pat=pat@entry=0x7fffffffe61d "stall") at util/string.c:252 #2 0x00000000004993a5 in print_symbol_events (type=1, syms=0x872880 <event_symbols_sw+160>, max=11, name_only=false, event_glob=0x7fffffffe61d "stall") at util/parse-events.c:1615 #3 print_events (event_glob=event_glob@entry=0x7fffffffe61d "stall", name_only=false) at util/parse-events.c:1675 #4 0x000000000042c79e in cmd_list (argc=1, argv=0x7fffffffe390, prefix=<optimized out>) at builtin-list.c:68 #5 0x00000000004788a5 in run_builtin (p=p@entry=0x871758 <commands+120>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe390) at perf.c:370 #6 0x0000000000420ab0 in handle_internal_command (argv=0x7fffffffe390, argc=2) at perf.c:429 #7 run_argv (argv=0x7fffffffe110, argcp=0x7fffffffe11c) at perf.c:473 #8 main (argc=2, argv=0x7fffffffe390) at perf.c:588 (gdb) p event_symbols_sw[PERF_COUNT_SW_BPF_OUTPUT] $4 = {symbol = 0x0, alias = 0x0} (gdb) A patch to robustify perf to not segfault when the next counter gets added in the kernel will follow this one. Reported-by: Ingo Molnar <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

hackwerk added 2 commits June 8, 2015 19:42

IT6251: Some fixes to the initialization

73fd810

IT6251: More fixes and more robust retry.

9065c43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

It6251 fixes #4

It6251 fixes #4

Uh oh!

hackwerk commented Jun 8, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

It6251 fixes #4

Are you sure you want to change the base?

It6251 fixes #4

Uh oh!

Conversation

hackwerk commented Jun 8, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant