Skip to content

Conversation

@j77197272
Copy link

kerneltoast and others added 30 commits May 7, 2024 18:29
rc can be used uninitialized here. Since the function is supposed to
return 0 to indicate success at this point, just return 0 instead of rc.

Signed-off-by: Sultan Alsawaf <[email protected]>
This reverts commit 3609953a05c88e692e2fb595867994840cfe6b95.
- clean up some commit
See if this helps reduce lags when GPU is throttling due to temp limitations.

Signed-off-by: zeroblade1984 <[email protected]>
Signed-off-by: negrroo <[email protected]>
Even with an affinity mask that has multiple CPUs set, IRQs always run
on the first CPU in their affinity mask. Drivers that register an IRQ
affinity notifier (such as pm_qos) will therefore have an incorrect
assumption of where an IRQ is affined.

Fix the IRQ affinity mask deception by forcing it to only contain one
set CPU.

Signed-off-by: Sultan Alsawaf <[email protected]>
Signed-off-by: Danny Lin <[email protected]>
The following error conditions jump to the put_dwc3 part of the error
path, which unregisters the bus client. However, the bus client is not
actually registered until after this code, so nothing should attempt to
unregister it before that.

Signed-off-by: Sultan Alsawaf <[email protected]>
Populating all the children of the ssusb node when only the dwc3 child is
used results in the extra memory allocated being leaked.

To fix the memory leak, explicitly populate only the child that this
driver cares about (the dwc3 child).

This fixes the following memory leak:
unreferenced object 0xffffffd42a4a2000 (size 1024):
  comm "kworker/2:0", pid 27, jiffies 4294877705 (age 84.353s)
  hex dump (first 32 bytes):
    80 b2 4b 2a d4 ff ff ff ff ff ff ff 00 00 00 00  ..K*............
    10 ac cf 33 d4 ff ff ff 00 70 50 2a d4 ff ff ff  ...3.....pP*....
  backtrace:
    [<ffffff81d67fc814>] __kmalloc+0x1a4/0x2f8
    [<ffffff81d6d21464>] platform_device_alloc+0x2c/0x80
    [<ffffff81d7157828>] of_device_alloc+0x48/0x1c0
    [<ffffff81d71579f4>] of_platform_device_create_pdata+0x54/0xf8
    [<ffffff81d7157d14>] of_platform_bus_create+0x27c/0x440
    [<ffffff81d71580ac>] of_platform_populate+0x74/0x118
    [<ffffff81d6ea4fb4>] dwc3_msm_probe+0x85c/0x1560
    [<ffffff81d6d20e18>] platform_drv_probe+0x58/0xb8
    [<ffffff81d6d1f124>] driver_probe_device+0x234/0x400
    [<ffffff81d6d1f4c0>] __device_attach_driver+0x88/0x140
    [<ffffff81d6d1d310>] bus_for_each_drv+0x78/0xc8
    [<ffffff81d6d1ed24>] __device_attach+0xd4/0x150
    [<ffffff81d6d1f5b0>] device_initial_probe+0x10/0x18
    [<ffffff81d6d1d5f8>] bus_probe_device+0x90/0x98
    [<ffffff81d6d1e5d0>] deferred_probe_work_func+0x90/0xe0
    [<ffffff81d66c32dc>] process_one_work+0x1dc/0x4a0

Signed-off-by: Sultan Alsawaf <[email protected]>
Android will unset SD_LOAD_BALANCE for single core cluster domain and
for some product it is true to have a single core cluster and the MC
domain thus lacks the SD_LOAD_BALANCE flag. This will cause
select_task_rq_fair logic break and the task will spin forever
in that core.

Fixes: 00bbe7d "ANDROID: sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability"

Bug: 141334320
Test: boot and see task on core7 scheduled correctly
Change-Id: I7c2845b1f7bc1d4051eb3ad6a5f9838fb0b1ba04
Signed-off-by: Wei Wang <[email protected]>
Currently the iowait_boost feature in schedutil makes the frequency
go to max on iowait wakeups.  This feature was added to handle a case
that Peter described where the throughput of operations involving
continuous I/O requests [1] is reduced due to running at a lower
frequency, however the lower throughput itself causes utilization to
be low and hence causing frequency to be low hence its "stuck".

Instead of going to max, its also possible to achieve the same effect
by ramping up to max if there are repeated in_iowait wakeups
happening. This patch is an attempt to do that. We start from a lower
frequency (policy->min) and double the boost for every consecutive
iowait update until we reach the maximum iowait boost frequency
(iowait_boost_max).

I ran a synthetic test (continuous O_DIRECT writes in a loop) on an
x86 machine with intel_pstate in passive mode using schedutil.  In
this test the iowait_boost value ramped from 800MHz to 4GHz in 60ms.
The patch achieves the desired improved throughput as the existing
behavior.

[1] https://patchwork.kernel.org/patch/9735885/

Suggested-by: Peter Zijlstra <[email protected]>
Suggested-by: Viresh Kumar <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Make iowait_boost and iowait_boost_max as unsigned int since its unit
is kHz and this is consistent with struct cpufreq_policy.  Also change
the local variables in sugov_iowait_boost() to match this.

Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
When idle injection is used to cap power, we need to override the
governor's choice of idle states.

For this reason, make it possible the deepest idle state selection to
be enforced by setting a flag on a given CPU to achieve the maximum
potential power draw reduction.

Change-Id: I9737e99c4f3f4bc38016b313e76b50cec4cf56cb
Signed-off-by: Jacob Pan <[email protected]>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Correctly check for list empty condition to get least cluster
latency.

Change-Id: I56f11e4a80c24d337ee231f6d0bc461a90f1a2ea
Signed-off-by: Maulik Shah <[email protected]>
Select default level for cluster lpms when there are pending
rpm requests.

Change-Id: I67019ff616e60b86a609856ad2bfe0ca9aa2ff8c
Signed-off-by: Raghavendra Kakarla <[email protected]>
Doing more aggressive balance if a sched_group is overloaded.

Change-Id: I00950c23c67a40b3431b68ac7ce2a1e470e563ed
Signed-off-by: Runmin Wang <[email protected]>
A more energy efficient update of the IO wait boosting mechanism has
been introduced in:

   commit a5a0809 ("cpufreq: schedutil: Make iowait boost more energy
efficient")

where the boost value is expected to be:

 - doubled at each successive wakeup from IO
   staring from the minimum frequency supported by a CPU

 - reset when a CPU is not updated for more then one tick
   by either disabling the IO wait boost or resetting its value to the
   minimum frequency if this new update requires an IO boost.

This approach is supposed to "ignore" boosting for sporadic wakeups from
IO, while still getting the frequency boosted to the maximum to benefit
long sequence of wakeup from IO operations.

However, these assumptions are not always satisfied.
For example, when an IO boosted CPU enters idle for more the one tick
and then wakes up after an IO wait, since in sugov_set_iowait_boost() we
first check the IOWAIT flag, we keep doubling the iowait boost instead
of restarting from the minimum frequency value.

This misbehavior could happen mainly on non-shared frequency domains,
thus defeating the energy efficiency optimization, but it can also
happen on shared frequency domain systems.

Let fix this issue in sugov_set_iowait_boost() by:
 - first check the IO wait boost reset conditions
   to eventually reset the boost value
 - then applying the correct IO boost value
   if required by the caller

Fixes: a5a0809 (cpufreq: schedutil: Make iowait boost more energy
efficient)
Reported-by: Viresh Kumar <[email protected]>
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Joel Fernandes (Google) <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Yaroslav Furman <[email protected]> - backport to 4.4
Signed-off-by: Danny Lin <[email protected]>
Reducing this interval allows us to not drop to lower frequencies during
a regular scroll without finger release while using a relatively low boost
ms value.

Signed-off-by: Alex Naidis <[email protected]>
Signed-off-by: Park Ju Hyung <[email protected]>
Signed-off-by: KenHV <[email protected]>
We have a 16 KiB stack; buffers of 4 KiB and 512 B work perfectly fine
in place of a small 128-byte buffer and especially no on-stack buffer at
all. This avoids dynamic memory allocation more often, improving
performance.

Signed-off-by: Sultan Alsawaf <[email protected]>
When period gets restarted after some idle time, start_cfs_bandwidth()
doesn't update the expiration information, expire_cfs_rq_runtime() will
see cfs_rq->runtime_expires smaller than rq clock and go to the clock
drift logic, wasting needless CPU cycles on the scheduler hot path.

Update the global expiration in start_cfs_bandwidth() to avoid frequent
expire_cfs_rq_runtime() calls once a new period begins.

Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Ben Segall <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: RuRuTiaSaMa <[email protected]>
Binder code is very hot, so checking frequently to see if a debug
message should be printed is a waste of cycles. We're not debugging
binder, so just stub out the debug prints to compile them out entirely.

Signed-off-by: Sultan Alsawaf <[email protected]>
Signed-off-by: Oktapra Amtono <[email protected]>
Signed-off-by: clarencelol <[email protected]>
Signed-off-by: Anush02198 <[email protected]>
Signed-off-by: Divyanshu-Modi <[email protected]>
Signed-off-by: Tashfin Shakeer Rhythm <[email protected]>
Free the pages parallely for a task that receives SIGKILL using the
oom_reaper. This freeing of pages will help to give the pages to buddy
system well advance.
This reaps for the process which received SIGKILL through
either sys_kill from user or kill_pid from kernel and that sending
process has CAP_KILL capability.
Also sysctl interface, reap_mem_on_sigkill, is added to turn on/off this
feature.

[ExactExampl]: make it enabled by default

Change-Id: I21adb95de5e380a80d7eb0b87d9b5b553f52e28a
Signed-off-by: Charan Teja Reddy <[email protected]>
(cherry picked from commit f9920cfa7ecf420e6a1ced2b53920f3ea9ddfc19)
* Since commit 5f51fc208d551731eb6c37aa147c5a3ad111d45d makes oom_reaper work much more often it is good idea to silence it`s logs to keep dmesg readable
Signed-off-by: Francisco Franco <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Tight loop of adreno_spin_idle() causing RT throttling.
Relax the tight loop by giving chance to other thread.

Change-Id: Ic23d4551c0cc0b5f2fa7844ca73444d1412d480c
Signed-off-by: Prakash Kamliya <[email protected]>
Signed-off-by: Raphiel Rollerscaperers <[email protected]>
until wake_up_interruptible_sync returns, couldn't another thread
just take control of the CPU anyway before the return.
This call will also prevent a reschedule.

Should be use wake_up_interruptible() call to immediately reschedule
target_thread instead of using wake_up_interruptible_sync() to block
the target_thread until CPU available to run.

Also protect target_thread with spinlock.

Change-Id: Ib2702bc798bf6927cee753811b39032f0aefe166
Signed-off-by: Jignesh Patel <[email protected]>
Reviewed-on: https://gerrit.mot.com/1264379
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Igor Kovalenko <[email protected]>
Submit-Approved: Jira Key
Call cleancache_init_fs with the VFS super_block object when mounting,
like filesystems are supposed to if they use cleancache. ext4 was used
as a reference.

Signed-off-by: kdrag0n <[email protected]>
Signed-off-by: Twisted Prime <[email protected]>
Peter Zijlstra and others added 12 commits May 10, 2024 10:47
[ Upstream commit 175f0e25abeaa2218d431141ce19cf1de70fa82d ]

As already enforced by the WARN() in __set_cpus_allowed_ptr(), the rules
for running on an online && !active CPU are stricter than just being a
kthread, you need to be a per-cpu kthread.

If you're not strictly per-CPU, you have better CPUs to run on and
don't need the partially booted one to get your work done.

The exception is to allow smpboot threads to bootstrap the CPU itself
and get kernel 'services' initialized before we allow userspace on it.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 955dbdf ("sched: Allow migrating kthreads into online but inactive CPUs")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
There are two places in the cpufreq core in which low-level driver
callbacks may be invoked for an inactive cpufreq policy, which isn't
guaranteed to work in general.  Both are due to possible races with
CPU offline.

First, in cpufreq_get(), the policy may become inactive after
the check against policy->cpus in cpufreq_cpu_get() and before
policy->rwsem is acquired, in which case using it going forward may
not be correct.

Second, an analogous situation is possible in cpufreq_update_policy().

Avoid using inactive policies by adding policy_is_inactive() checks
to the code in the above places.

Change-Id: Ifb10b31b2f56760f2bd795706a3000a884f96187
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Git-commit: 182e36a
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Santosh Mardi <[email protected]>
This allows pm_qos votes with, say, 100 us for example to select power
levels with exit latencies equal to 100 us. The extra microsecond of
exit latency doesn't hurt.

Signed-off-by: Sultan Alsawaf <[email protected]>
Signed-off-by: Lau <[email protected]>
Signed-off-by: GeoPD <[email protected]>
Calling clock_debug_print_enabled with print_parent = true
during suspend may cause a scheduling while atomic violation.
Call with print_parent = false instead to prevent the violation.

Bug: 132511008
Change-Id: I80f646d77d0cc98b4004084022ce1dce0e80cc93
Signed-off-by: Jonglin Lee <[email protected]>
Signed-off-by: GeoPD <[email protected]>
…asks"

* Revert "sched/core: Fix PI boosting between RT and DEADLINE tasks"
* The commit creates frequent jitter spikes and affects boosting behavior.

This reverts commit 37b31f4.

Signed-off-by: Chenyang Zhong <[email protected]>
Change-Id: Ib400a8522c471bbdb807953d8ccdf89d7e4997ee
* If disable, this seems to impact battery drain and performance negatively.

Signed-off-by: Tashfin Shakeer Rhythm <[email protected]>
Change-Id: Ia6d68f246aeda1ef9ef8b0691b8f46495ebcdc76
Signed-off-by: TogoFire <[email protected]>
…mall cores

We want boosted tasks to run on big cores. But CAF's load balancer
changes do not account for SchedTune boosting, so this allows for
boosted tasks to be migrated to a suboptimal core. Let's mitigate
this by setting the LBF_IGNORE_BIG_TASKS for tasks migrating from a
larger capacity core to a smaller one and to check if the task is
SchedTune boosted. If both are true, do not migrate this task.

Signed-off-by: Zachariah Kennedy <[email protected]>
Signed-off-by: twisted <[email protected]>
quiet_vmstat() is an expensive function that only makes sense when we
go into NOHZ.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Diab Neiroukh <[email protected]>
Signed-off-by: Lau <[email protected]>
Currently iowait doesn't distinguish background/foreground tasks and we
have seen cases where a device run to high frequency unnecessarily when
running some background I/O. This patch limits iowait boost to tasks with
prefer_idle only. Specifically, on Pixel, those are foreground and top
app tasks.

Bug: 130308826
Test: Boot and trace
Change-Id: I2d892beeb4b12b7e8f0fb2848c23982148648a10
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Lau <[email protected]>
…rent

The OR logic always evaluates to true. Fix it by changing the
logic to AND, since the comment says ESR measurement current
should be between 60 and 240.

Signed-off-by: Chenyang Zhong <[email protected]>
Change-Id: I7b0707f59f8fbd1a2c4a0b2072c44c3dfbe00b7b
@j77197272 j77197272 changed the title Added workaround for apple dac Workaround for low volume with Apple USB-C to 3.5mm Adapter Aug 12, 2024
@DhineshCool DhineshCool force-pushed the unity branch 2 times, most recently from acff04f to 158bbf5 Compare August 14, 2024 14:21
DhineshCool pushed a commit that referenced this pull request Aug 14, 2024
Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage()
with detailed analysis and a nice repro.

unix_stream_sendpage() tries to add data to the last skb in the peer's
recv queue without locking the queue.

If the peer's FD is passed to another socket and the socket's FD is
passed to the peer, there is a loop between them.  If we close both
sockets without receiving FD, the sockets will be cleaned up by garbage
collection.

The garbage collection iterates such sockets and unlinks skb with
FD from the socket's receive queue under the queue's lock.

So, there is a race where unix_stream_sendpage() could access an skb
locklessly that is being released by garbage collection, resulting in
use-after-free.

To avoid the issue, unix_stream_sendpage() must lock the peer's recv
queue.

Note the issue does not exist in 6.5+ thanks to the recent sendpage()
refactoring.

This patch is originally written by Linus Torvalds.

BUG: unable to handle page fault for address: ffff988004dd6870
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP PTI
CPU: 4 PID: 297 Comm: garbage_uaf Not tainted 6.1.46 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc_node+0xa2/0x1e0
Code: c0 0f 84 32 01 00 00 41 83 fd ff 74 10 48 8b 00 48 c1 e8 3a 41 39 c5 0f 85 1c 01 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 40 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 a1 41 8b 44
RSP: 0018:ffffc9000079fac0 EFLAGS: 00000246
RAX: 0000000000000070 RBX: 0000000000000005 RCX: 000000000001a284
RDX: 000000000001a244 RSI: 0000000000400cc0 RDI: 000000000002eee0
RBP: 0000000000400cc0 R08: 0000000000400cc0 R09: 0000000000000003
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888003970f00
R13: 00000000ffffffff R14: ffff988004dd6800 R15: 00000000000000e8
FS:  00007f174d6f3600(0000) GS:ffff88807db00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff988004dd6870 CR3: 00000000092be000 CR4: 00000000007506e0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xa9/0x1e0
 ? fixup_exception+0x1d/0x310
 ? exc_page_fault+0xa8/0x150
 ? asm_exc_page_fault+0x22/0x30
 ? kmem_cache_alloc_node+0xa2/0x1e0
 ? __alloc_skb+0x16c/0x1e0
 __alloc_skb+0x16c/0x1e0
 alloc_skb_with_frags+0x48/0x1e0
 sock_alloc_send_pskb+0x234/0x270
 unix_stream_sendmsg+0x1f5/0x690
 sock_sendmsg+0x5d/0x60
 ____sys_sendmsg+0x210/0x260
 ___sys_sendmsg+0x83/0xd0
 ? kmem_cache_alloc+0xc6/0x1c0
 ? avc_disable+0x20/0x20
 ? percpu_counter_add_batch+0x53/0xc0
 ? alloc_empty_file+0x5d/0xb0
 ? alloc_file+0x91/0x170
 ? alloc_file_pseudo+0x94/0x100
 ? __fget_light+0x9f/0x120
 __sys_sendmsg+0x54/0xa0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x69/0xd3
RIP: 0033:0x7f174d639a7d
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 8a c1 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 de c1 f4 ff 48
RSP: 002b:00007ffcb563ea50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f174d639a7d
RDX: 0000000000000000 RSI: 00007ffcb563eab0 RDI: 0000000000000007
RBP: 00007ffcb563eb10 R08: 0000000000000000 R09: 00000000ffffffff
R10: 00000000004040a0 R11: 0000000000000293 R12: 00007ffcb563ec28
R13: 0000000000401398 R14: 0000000000403e00 R15: 00007f174d72c000
 </TASK>

Bug: 299922588
Fixes: 869e7c6 ("net: af_unix: implement stream sendpage support")
Reported-by: Bing-Jhong Billy Jheng <[email protected]>
Reviewed-by: Bing-Jhong Billy Jheng <[email protected]>
Co-developed-by: Linus Torvalds <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 790c2f9d15b594350ae9bca7b236f2b1859de02c)
Signed-off-by: Lee Jones <[email protected]>
Change-Id: Ied4a40e6ae0e594ef84f36287bc6194602935585
DhineshCool pushed a commit that referenced this pull request Aug 14, 2024
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if #8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if #8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if #1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

(cherry picked from commit db025ef403939125f1541a5ff92ad30a2be5253c)
(cherry picked from commit f2bc75b94628ada76167949eeb81ad9e7deb5123)
DhineshCool pushed a commit that referenced this pull request Aug 14, 2024
[ Upstream commit cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f ]

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  $ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
DhineshCool pushed a commit that referenced this pull request Aug 14, 2024
This patch implements deduplication feature in zram. The purpose
of this work is naturally to save amount of memory usage by zram.

Android is one of the biggest users to use zram as swap and it's
really important to save amount of memory usage. There is a paper
that reports that duplication ratio of Android's memory content is
rather high [1]. And, there is a similar work on zswap that also
reports that experiments has shown that around 10-15% of pages
stored in zswp are duplicates and deduplicate them provides some
benefits [2].

Also, there is a different kind of workload that uses zram as blockdev
and store build outputs into it to reduce wear-out problem of real
blockdev. In this workload, deduplication hit is very high due to
temporary files and intermediate object files. Detailed analysis is
on the bottom of this description.

Anyway, if we can detect duplicated content and avoid to store duplicated
content at different memory space, we can save memory. This patch
tries to do that.

Implementation is almost simple and intuitive but I should note
one thing about implementation detail.

To check duplication, this patch uses checksum of the page and
collision of this checksum could be possible. There would be
many choices to handle this situation but this patch chooses
to allow entry with duplicated checksum to be added to the hash,
but, not to compare all entries with duplicated checksum
when checking duplication. I guess that checksum collision is quite
rare event and we don't need to pay any attention to such a case.
Therefore, I decided the most simplest way to implement the feature.
If there is a different opinion, I can accept and go that way.

Following is the result of this patch.

Test result #1 (Swap):
Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

Last two metrics added to mm_stat are related to this work.
First one, dup_data_size, is amount of saved memory by avoiding
to store duplicated page. Later one, meta_data_size, is the amount of
data structure to support deduplication. If dup > meta, we can judge
that the patch improves memory usage.

In Adnroid, we can save 5% of memory usage by this work.

Test result #2 (Blockdev):
build the kernel and store output to ext4 FS on zram

<no-dedup>
Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0

<dedup>
Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038  3945792

There is no performance degradation and save 23% memory.

Test result #3 (Blockdev):
copy android build output dir(out/host) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0

<dedup>
Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336

It shows performance degradation roughly 13% and save 24% memory. Maybe,
it is due to overhead of calculating checksum and comparison.

Test result #4 (Blockdev):
copy android build output dir(out/target/common) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0

<dedup>
Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same. Even if there is overhead
of calculating checksum and comparison, large hit ratio compensate it since
hit leads to less compression attempt.

I checked the detailed reason of savings on kernel build workload and
there are some cases that deduplication happens.

1) *.cmd
Build command is usually similar in one directory so content of these file
are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd
and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file,
respectively.

2) intermediate object files
built-in.o and temporary object file have the similar contents. More than
50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin
have the similar contents.

Android test has similar case that some of object files(.class
and .so) are similar with another ones.
(./host/linux-x86/lib/libartd.so and
./host/linux-x86-lib/libartd-comiler.so)

Anyway, benefit seems to be largely dependent on the workload so
following patch will make this feature optional. However, this feature
can help some usecases so is deserved to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems,
dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization,
lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Change-Id: I8fe80c956c33f88a6af337d50d9e210e5c35ce37
Reviewed-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
Link: https://lore.kernel.org/patchwork/patch/787162/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:26
Signed-off-by: Charan Teja Reddy <[email protected]>
Signed-off-by: Marco Zanin <[email protected]>
Signed-off-by: snnbyyds <[email protected]>
DhineshCool pushed a commit that referenced this pull request Aug 15, 2024
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if #8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if #8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if #1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

(cherry picked from commit db025ef403939125f1541a5ff92ad30a2be5253c)
(cherry picked from commit f2bc75b94628ada76167949eeb81ad9e7deb5123)
DhineshCool pushed a commit that referenced this pull request Aug 15, 2024
[ Upstream commit cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f ]

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  $ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
DhineshCool pushed a commit that referenced this pull request Dec 7, 2024
[ Upstream commit 4d322dce82a1d44f8c83f0f54f95dd1b8dcf46c9 ]

syzbot reported a lockdep splat [1].

Blamed commit hinted about the possible lockdep
violation, and code used unix_state_lock_nested()
in an attempt to silence lockdep.

It is not sufficient, because unix_state_lock_nested()
is already used from unix_state_double_lock().

We need to use a separate subclass.

This patch adds a distinct enumeration to make things
more explicit.

Also use swap() in unix_state_double_lock() as a clean up.

v2: add a missing inline keyword to unix_state_lock_nested()

[1]
WARNING: possible circular locking dependency detected
6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted

syz-executor.1/2542 is trying to acquire lock:
 ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863

but task is already holding lock:
 ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&u->lock/1){+.+.}-{2:2}:
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
        sk_diag_dump_icons net/unix/diag.c:87 [inline]
        sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157
        sk_diag_dump net/unix/diag.c:196 [inline]
        unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220
        netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264
        __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370
        netlink_dump_start include/linux/netlink.h:338 [inline]
        unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319
       sock_diag_rcv_msg+0xe3/0x400
        netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543
        sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
        netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
        netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367
        netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        sock_write_iter+0x39a/0x520 net/socket.c:1160
        call_write_iter include/linux/fs.h:2085 [inline]
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0xa74/0xca0 fs/read_write.c:590
        ksys_write+0x1a0/0x2c0 fs/read_write.c:643
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b

-> #0 (rlock-AF_UNIX){+.+.}-{2:2}:
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
        skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
        unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        ____sys_sendmsg+0x592/0x890 net/socket.c:2584
        ___sys_sendmsg net/socket.c:2638 [inline]
        __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
        __do_sys_sendmmsg net/socket.c:2753 [inline]
        __se_sys_sendmmsg net/socket.c:2750 [inline]
        __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&u->lock/1);
                               lock(rlock-AF_UNIX);
                               lock(&u->lock/1);
  lock(rlock-AF_UNIX);

 *** DEADLOCK ***

1 lock held by syz-executor.1/2542:
  #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

stack backtrace:
CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
  check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187
  check_prev_add kernel/locking/lockdep.c:3134 [inline]
  check_prevs_add kernel/locking/lockdep.c:3253 [inline]
  validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
  __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
  skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
  unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg net/socket.c:745 [inline]
  ____sys_sendmsg+0x592/0x890 net/socket.c:2584
  ___sys_sendmsg net/socket.c:2638 [inline]
  __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
  __do_sys_sendmmsg net/socket.c:2753 [inline]
  __se_sys_sendmmsg net/socket.c:2750 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f26d887cda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9
RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004
RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68

Fixes: 2aac7a2 ("unix_diag: Pending connections IDs NLA")
Reported-by: syzbot <[email protected]>
Change-Id: I1edd4036bd364c8fa0d83d4675b31f5b7f9d9958
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 875f31aaa67e306098befa5e798a049075910fa7)
DhineshCool pushed a commit that referenced this pull request Dec 7, 2024
…nix_gc().

commit 1971d13ffa84a551d29a81fdf5b5ec5be166ac83 upstream.

syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().

One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.

So, the splat is false-positive.

Let's add a dedicated lock class for the latter to suppress the splat.

Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.

[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
 -----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302

but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

 -> #1 (unix_gc_lock){+.+.}-{2:2}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
       unix_detach_fds net/unix/af_unix.c:1819 [inline]
       unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
       skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
       skb_release_all net/core/skbuff.c:1200 [inline]
       __kfree_skb net/core/skbuff.c:1216 [inline]
       kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
       kfree_skb include/linux/skbuff.h:1262 [inline]
       manage_oob net/unix/af_unix.c:2672 [inline]
       unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
       unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
       do_splice_read fs/splice.c:985 [inline]
       splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
       do_splice+0xf2d/0x1880 fs/splice.c:1379
       __do_splice fs/splice.c:1436 [inline]
       __do_sys_splice fs/splice.c:1652 [inline]
       __se_sys_splice+0x331/0x4a0 fs/splice.c:1634
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

 -> #0 (&u->lock){+.+.}-{2:2}:
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
       kthread+0x2f0/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(unix_gc_lock);
                               lock(&u->lock);
                               lock(unix_gc_lock);
  lock(&u->lock);

 *** DEADLOCK ***

3 locks held by kworker/u8:1/11:
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
 #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
 #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
 #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261

stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:351 [inline]
 __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
 kthread+0x2f0/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>

Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Change-Id: If679e3c1dd8a6ae69a421de905905b4fe1e353a5
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit b29dcdd0582c00cd6ee0bd7c958d3639aa9db27f)
DhineshCool pushed a commit that referenced this pull request Dec 7, 2024
Author: @tanish2k09 (email: [email protected])

What is it?
Kernel-based Lapse ("K-Lapse") is a linear RGB scaling module that 'shifts' RGB based on time (of the day/selected by user), or (since v2.0) brightness. This concept is inspired from
LineageOS (formerly known as 'CyanogenMod') ROM's feature "livedisplay" which also changes the display settings (RGB, hue, temperature, etc) based on time.

Why did you decide to make this? (Tell me a story).
I (personally) am a big fan of the livedisplay feature found on LineageOS ROM. I used it every single day, since Android Lollipop. Starting from Android Nougat, a native night mode
solution was added to AOSP and it felt like livedisplay was still way superior, thanks to its various options (you could say it spoiled me, sure). I also maintained a kernel (Venom
kernel) for the device I was using at that time. It was all good until the OEM dropped support for the device at Android M, and XDA being XDA, was already working on N ROMs. The issue
was, these ROMs weren't LineageOS or based on it, so livedisplay was... gone. I decided I'll try to bring that feature to every other ROM. How would I do that? Of course! The kernel! It
worked on every single ROM, it was the key! I started to work on it ASAP and here it is, up on GitHub, licensed under GPL (check klapse.c), open to everyone :)

How does it work?
Think of it like a fancy night mode, but not really. Klapse is dependent on an RGB interface (like Gamma on MTK and KCAL on SD chipsets). It fetches time from the kernel, converts it to
local time, and selects and RGB set based on the time. The result is really smooth shifting of RGB over time.

How does it really work (dev)?
Klapse mode 1 (time-based scaling) uses a method void klapse_pulse(void) that should ideally be called every minute. This can be done by injecting a pulse call inside another method that
is called repeatedly naturally, like cpufreq or atomic or frame commits. It can be anything, whatever you like, even a kthread, as long as it is called repeatedly naturally. To execute
every 60 seconds, use jiffies or ktime, or any similar method. The pulse function fetches the current time and makes calculations based on the current hour and the values of the tunables
listed down below.

Klapse mode 2 (brightness-based scaling) uses a method void set_rgb_slider(<type> bl_lvl) where is the data type of the brightness level used in your kernel source. (OnePlus 6 uses u32
data type for bl_lvl) set_rgb_slider needs to be called/injected inside a function that sets brightness for your device. (OnePlus 6 uses dsi_panel.c for that, check out the diff for that
file in /op6)

What all stuff can it do?

1, Emulate night mode with the proper RGB settings
2, Smoothly scale from one set of RGB to another set of RGB in integral intervals over time.
3, Reduce perceived brightness using brightness_factor by reducing the amount of color on screen. Allows lower apparent brightness than system permits.
4, Scale RGB based on brightness of display (low brightness usually implies a dark environment, where yellowness is probably useful).
5, Automate the perceived brightness independent of whether klapse is enabled, using its own set of start and stop hours.
6, Be more efficient,faster by residing inside the kernel instead of having to use the HWC HAL like android's night mode.
7, (On older devices) Reduce stuttering or frame lags caused by native night mode.
8, An easier solution against overlay-based apps that run as service in userspace/Android and sometimes block apps asking for permissions.
9, Give you a Livedisplay alternative if it doesn't work in your ROM.
10, Impress your crush so you can get a date (Hey, don't forget to credit me if it works).

Alright, so this is a replacement for night mode?
NO! Not at all. One can say this is merely an alternative for LineageOS' Livedisplay, but inside a kernel. Night mode is a sub-function of both Livedisplay and KLapse. Most comparisons
here were made with night mode because that's what an average user uses, and will relate to the most. There is absolutely no reason for your Android kernel to not have KLapse. Go ahead
and add it or ask your kernel maintainer to. It's super-easy!

What can it NOT do (yet)?

1, Calculate scaling to the level of minutes, like "Start from 5:37pm till 7:19am". --TODO
2, Make coffee for you.
3, Fly you to the moon. Without a heavy suit.
4, Get you a monthly subscription of free food, cereal included.

All these following tunables are found in their respective files in /sys/klapse/

1. enable_klapse : A switch to enable or disable klapse. Values : 0 = off, 1 = on (since v2.0, 2 = brightness-dependent mode)
2. klapse_start_hour : The hour at which klapse should start scaling the RGB values from daytime to target (see next points). Values : 0-23
3. klapse_stop_hour : The hour by which klapse should scale back the RGB values from target to daytime (see next points). Values : 0-23
4. daytime_rgb : The RGB set that must be used for all the time outside of start and stop hour range.
5. target_rgb : The RGB set that must be scaled towards for all the time inside of start and stop hour range.
6. klapse_scaling_rate : Controls how soon the RGB reaches from daytime to target inside of start and stop hour range. Once target is reached, it remains constant till 30 minutes before
   stop hour, where target RGB scales back to daytime RGB.
7. brightness_factor : From the name itself, this value has the ability to bend perception and make your display appear as if it is at a lesser brightness level than it actually is at.
   It works by reducing the RGB values by the same factor. Values : 2-10, (10 means accurate brightness, 5 means 50% of current brightness, you get it)
8. brightness_factor_auto : A switch that allows you to automatically set the brightness factor in a set time range. Value : 0 = off, 1 = on
9. brightness_factor_auto_start_hour : The hour at which brightness_factor should be applied. Works only if #8 is 1. Values : 0-23
10. brightness_factor_auto_stop_hour : The hour at which brightness_factor should be reverted to 10. Works only if #8 is 1. Values : 0-23
11. backlight_range : The brightness range within which klapse should scale from daytime to target_rgb. Works only if #1 is 2. Values : MIN_BRIGHTNESS-MAX_BRIGHTNESS

(cherry picked from commit db025ef403939125f1541a5ff92ad30a2be5253c)
(cherry picked from commit f2bc75b94628ada76167949eeb81ad9e7deb5123)
DhineshCool pushed a commit that referenced this pull request Dec 7, 2024
[ Upstream commit cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f ]

Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.

This leads to the following splat:

    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    Modules linked in:
    CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
    Hardware name: ARM Juno development board (r0) (DT)
    Workqueue: events cpuset_hotplug_workfn
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    lr : build_sched_domains (kernel/sched/topology.c:1966)
    Call trace:
    build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
    partition_sched_domains_locked (kernel/sched/topology.c:2250)
    rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
    rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
    cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
    process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
    worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
    kthread (kernel/kthread.c:255)
    ret_from_fork (arch/arm64/kernel/entry.S:1167)
    Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)

The faulty line in question is:

  cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));

and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.

Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.

The above splat was obtained on my Juno r0 with the following reproducer:

  $ cgcreate -g cpuset:asym
  $ cgset -r cpuset.cpus=0-3 asym
  $ cgset -r cpuset.mems=0 asym
  $ cgset -r cpuset.cpu_exclusive=1 asym

  $ cgcreate -g cpuset:smp
  $ cgset -r cpuset.cpus=4-5 smp
  $ cgset -r cpuset.mems=0 smp
  $ cgset -r cpuset.cpu_exclusive=1 smp

  $ cgset -r cpuset.sched_load_balance=0 .

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  $ echo 0 > /sys/devices/system/cpu/cpu5/online

Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
DhineshCool pushed a commit that referenced this pull request Dec 7, 2024
This patch implements deduplication feature in zram. The purpose
of this work is naturally to save amount of memory usage by zram.

Android is one of the biggest users to use zram as swap and it's
really important to save amount of memory usage. There is a paper
that reports that duplication ratio of Android's memory content is
rather high [1]. And, there is a similar work on zswap that also
reports that experiments has shown that around 10-15% of pages
stored in zswp are duplicates and deduplicate them provides some
benefits [2].

Also, there is a different kind of workload that uses zram as blockdev
and store build outputs into it to reduce wear-out problem of real
blockdev. In this workload, deduplication hit is very high due to
temporary files and intermediate object files. Detailed analysis is
on the bottom of this description.

Anyway, if we can detect duplicated content and avoid to store duplicated
content at different memory space, we can save memory. This patch
tries to do that.

Implementation is almost simple and intuitive but I should note
one thing about implementation detail.

To check duplication, this patch uses checksum of the page and
collision of this checksum could be possible. There would be
many choices to handle this situation but this patch chooses
to allow entry with duplicated checksum to be added to the hash,
but, not to compare all entries with duplicated checksum
when checking duplication. I guess that checksum collision is quite
rare event and we don't need to pay any attention to such a case.
Therefore, I decided the most simplest way to implement the feature.
If there is a different opinion, I can accept and go that way.

Following is the result of this patch.

Test result #1 (Swap):
Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

Last two metrics added to mm_stat are related to this work.
First one, dup_data_size, is amount of saved memory by avoiding
to store duplicated page. Later one, meta_data_size, is the amount of
data structure to support deduplication. If dup > meta, we can judge
that the patch improves memory usage.

In Adnroid, we can save 5% of memory usage by this work.

Test result #2 (Blockdev):
build the kernel and store output to ext4 FS on zram

<no-dedup>
Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0

<dedup>
Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038  3945792

There is no performance degradation and save 23% memory.

Test result #3 (Blockdev):
copy android build output dir(out/host) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0

<dedup>
Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336

It shows performance degradation roughly 13% and save 24% memory. Maybe,
it is due to overhead of calculating checksum and comparison.

Test result #4 (Blockdev):
copy android build output dir(out/target/common) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0

<dedup>
Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same. Even if there is overhead
of calculating checksum and comparison, large hit ratio compensate it since
hit leads to less compression attempt.

I checked the detailed reason of savings on kernel build workload and
there are some cases that deduplication happens.

1) *.cmd
Build command is usually similar in one directory so content of these file
are very similar. In my system, more than 789 lines in fs/ext4/.namei.o.cmd
and fs/ext4/.inode.o.cmd are the same in 944 and 938 lines of the file,
respectively.

2) intermediate object files
built-in.o and temporary object file have the similar contents. More than
50% of fs/ext4/ext4.o is the same with fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1 and .tmp_vmlinux2 and arch/x86/boo/compressed/vmlinux.bin
have the similar contents.

Android test has similar case that some of object files(.class
and .so) are similar with another ones.
(./host/linux-x86/lib/libartd.so and
./host/linux-x86-lib/libartd-comiler.so)

Anyway, benefit seems to be largely dependent on the workload so
following patch will make this feature optional. However, this feature
can help some usecases so is deserved to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems,
dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization,
lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Change-Id: I8fe80c956c33f88a6af337d50d9e210e5c35ce37
Reviewed-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
Link: https://lore.kernel.org/patchwork/patch/787162/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:26
Signed-off-by: Charan Teja Reddy <[email protected]>
Signed-off-by: Marco Zanin <[email protected]>
Signed-off-by: snnbyyds <[email protected]>
@DhineshCool DhineshCool force-pushed the unity branch 2 times, most recently from d1efd6c to 77fc80b Compare February 20, 2025 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.