feat: Match fingerprints by instruction filters #329
base: dev
Still a work in progress, but so far it works well.
The documentation should be updated to match the new usage.
… `@context` usage, simplify instruction filter block calls.
src/main/kotlin/app/revanced/patcher/patch/BytecodePatchContext.kt
Outdated
@@ -0,0 +1,633 @@
@file:Suppress("unused")
There are multiple issues I have with this file, and I'm still confused by past discussions.
- If InstructionFilter filters instructions, as its name implies, it is not supposed to have context about methods or classes, only about instructions, regardless of the reason. If context about classes or methods is needed, "InstructionFilter" needs a different name, because it is something other than what the current name implies.
- Some filters are currently implemented in Patcher under assumptions of relevance that should not be made. Today field references, method calls, consts, and object instantiations may be relevant; tomorrow it may be class references, string values, or other things that would require changing this file to fit a new assumption. APIs shouldn't be offered based on assumptions. The current assumptions steer people toward the existing filters so that they avoid implementing the interface for whatever filter they actually need. Instead, a universal filter API should be offered that makes no assumptions about what might or might not be useful and leaves that decision to the API consumer. Those filters can still be implemented in a separate module outside the patcher module, but they can't be part of the patcher module based only on assumptions of relevance.
- Comments should follow the current style. Constructor parameters should be documented with @param in the constructor's doc comment. Constructor comments should start with a sentence explaining what the class is or does. Currently some jump straight to examples (such as in LiteralFilter).
- Currently no DSL API is present, even though the patches and fingerprint APIs are fully DSL. Something like this would be acceptable:
fingerprint {
    instructions {
        field(...)
        Opcode.X()
        Opcode.Y()
        "string"()
        123()
    }
}
However, with the filter API the previously simple usage of opcode patterns now involves more boilerplate. Before:
fingerprint { opcodes(Opcode.X, Opcode.Y, ...) }
After:
fingerprint {
    Opcode.X()
    Opcode.Y()
    ...()
}
Every filter is added via its own invocation, so the number of calls grows linearly with the number of opcodes.
An alternative API would be:
fingerprint {
    instructions {
        opcodes(Opcode.X, Opcode.Y) // Also works for one
        field()
        string("")
        ...
    }
}
- Filters look to me more like something suitable for the custom block than a replacement for the opcode pattern. As explained in another review comment, fingerprints describe a method. Filters are not a direct attribute of a method, which makes them suitable at most in custom (where context about both the class and the method already exists, further suggesting it is the right place). I'm not sure how you'd pull that off, but replacing a direct attribute that a fingerprint can describe is something to avoid.
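To make the proposed `instructions { ... }` block concrete, here is a minimal Kotlin sketch. Everything in it (`Op`, `Filter`, `InstructionsBuilder`, `FingerprintBuilder`, and this `fingerprint` function) is a hypothetical stand-in, not the patcher's API; it only shows how a single `opcodes(...)` call can expand into one filter per opcode, so long opcode runs don't need one invocation each.

```kotlin
// Hypothetical stand-ins; the real patcher works with dexlib2 opcodes and instructions.
enum class Op { CONST_STRING, IGET, INVOKE_VIRTUAL, MOVE_RESULT }

sealed interface Filter {
    data class OpcodeFilter(val op: Op) : Filter
    data class FieldFilter(val name: String) : Filter
    data class StringFilter(val value: String) : Filter
}

class InstructionsBuilder {
    val filters = mutableListOf<Filter>()

    // One call adds one filter per opcode, so an opcode run stays a single line.
    fun opcodes(vararg ops: Op) {
        ops.forEach { filters.add(Filter.OpcodeFilter(it)) }
    }

    fun field(name: String) { filters.add(Filter.FieldFilter(name)) }

    fun string(value: String) { filters.add(Filter.StringFilter(value)) }
}

class FingerprintBuilder {
    val filters = mutableListOf<Filter>()

    // Collects the filters declared inside the nested block, in declaration order.
    fun instructions(block: InstructionsBuilder.() -> Unit) {
        filters.addAll(InstructionsBuilder().apply(block).filters)
    }
}

// Hypothetical entry point returning the collected filters.
fun fingerprint(block: FingerprintBuilder.() -> Unit): List<Filter> =
    FingerprintBuilder().apply(block).filters
```

For example, `fingerprint { instructions { opcodes(Op.INVOKE_VIRTUAL, Op.MOVE_RESULT); field("mValue"); string("r_pfvc") } }` yields four filters in declaration order.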
There will always be usage of method filters, field filters, and literal constants.
It's up to the patches to declare new filters if desired, such as ResourceMappingPatch declaring its own filter that finds decoded resource literals.
Instruction filter no longer has a classDef parameter. The enclosing method is passed as a parameter; most filters don't use it, but some need it to check how many instructions the method has, and others to parse out the enclosing class.
It's important to note this is not a replacement for the custom block. It checks instructions on a more fine-grained scale, checks the ordering of the instructions, and produces a list of indexes for those matches that the patch itself then uses. There should be little to no usage of method.indexOfFirst(previousIndex) { /* do checking here */ }, as these checks are now part of the fingerprint itself.
With just opcodes you only get patternMatchResults, which is the start/end. With instruction filters you get the index of each filter, since there can be variable spacing between filters.
This is an expansion of what opcodes previously did, which is why opcode filters still exist and can still be used.
Previously, for method calls you could only declare an opcode such as invoke_interface, with no way to indicate what it invokes, especially when it's a non-obfuscated class such as List.add(). Now you can declare more specific usage of these opcodes rather than just patterns, which are fragile, can match completely unrelated code, and frequently break when the compiler adds a single extra register move opcode.
DSL-style declarations can be added; that's not an issue.
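A rough sketch of how ordered filters with variable spacing can yield one index per filter, using a simplified, hypothetical instruction model (`Insn`, `Filter`, and `matchFilters` are illustrative names, not the PR's code). The first filter may match anywhere; each later filter is matched greedily after the previous match, within an optional `maxAfter` limit. This greedy approach does not backtrack on later filters, so the real matcher may behave differently in edge cases.

```kotlin
// Hypothetical, simplified model: an instruction is an opcode name plus an optional reference.
data class Insn(val opcode: String, val reference: String? = null)

// A filter tests a single instruction. maxAfter caps how many non-matching instructions may
// sit between the previous filter's match and this one (Int.MAX_VALUE means any spacing).
// The first filter's maxAfter is ignored.
class Filter(val maxAfter: Int = Int.MAX_VALUE, val test: (Insn) -> Boolean)

// Returns the index matched by each filter, in order, or null if the filters don't all match.
fun matchFilters(insns: List<Insn>, filters: List<Filter>): List<Int>? {
    if (filters.isEmpty()) return emptyList()
    // Try every position for the first filter; later filters are matched greedily.
    for (start in insns.indices) {
        if (!filters[0].test(insns[start])) continue
        val result = mutableListOf(start)
        for (filter in filters.drop(1)) {
            val from = result.last() + 1
            // Long arithmetic avoids overflow when maxAfter is Int.MAX_VALUE.
            val to = minOf(insns.lastIndex.toLong(), from.toLong() + filter.maxAfter).toInt()
            val found = (from..to).firstOrNull { filter.test(insns[it]) } ?: break
            result += found
        }
        if (result.size == filters.size) return result
    }
    return null
}
```

Unlike a start/end pattern match, the result holds an index for every filter, so a patch can read the position of each instruction of interest directly.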
Here's a simple example, where all the index searching was previously in the patch execute:
Before:
internal val shortsBottomBarContainerFingerprint = fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
literal { bottomBarContainer }
}
shortsBottomBarContainerFingerprint.method.apply {
// Search for indexes after the fact, after the fingerprint already resolved.
// First instruction of interest.
val resourceIndex = indexOfFirstLiteralInstruction(bottomBarContainer)
// Second instruction of interest.
val index = indexOfFirstInstructionOrThrow(resourceIndex) {
getReference<MethodReference>()?.name == "getHeight"
}
// Third instruction of interest.
val heightRegister = getInstruction<OneRegisterInstruction>(index + 1).registerA
addInstructions(
index + 2,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
Now the indexOfFirst() logic is in the fingerprint itself:
internal val shortsBottomBarContainerFingerprint by fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
instructions(
// First instruction of interest.
ResourceMappingFilter("id", "bottom_bar_container"),
// Here lies other unrelated instructions.
// Second instruction of interest.
MethodFilter(methodName = "getHeight"),
// Third instruction of interest.
OpcodeFilter(Opcode.MOVE_RESULT)
)
}
shortsBottomBarContainerFingerprint.let {
it.method.apply {
val index = it.filterMatches.last().index
val heightRegister = getInstruction<OneRegisterInstruction>(index).registerA
addInstructions(
index + 1,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
}
InstructionFilter can be renamed, but I'm unsure what other name to use.
MethodFilter might be more appropriately named MethodCallFilter, since it matches method calls based on the specifics of that call.
FieldFilter could be renamed to something like FieldAccessFilter, since it matches iget, iput, sget, sput, etc.
Edit: Renamed to MethodCallFilter and FieldAccessFilter.
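A minimal sketch of what the renamed filters match on, assuming simplified reference types (`MethodRef`, `FieldRef`) in place of dexlib2's reference classes. Every criterion is optional, so obfuscated parts can simply be left out; the property names here are illustrative and may not match the PR's constructors.

```kotlin
// Hypothetical simplified references; the real filters inspect dexlib2 references.
data class MethodRef(
    val definingClass: String,
    val name: String,
    val parameters: List<String>,
    val returnType: String,
)

data class FieldRef(val definingClass: String, val name: String, val type: String)

// null means "don't care", so only the non-obfuscated parts of a call need to be declared.
data class MethodCallFilter(
    val definingClass: String? = null,
    val name: String? = null,
    val parameters: List<String>? = null,
    val returnType: String? = null,
) {
    fun matches(ref: MethodRef): Boolean =
        (definingClass == null || definingClass == ref.definingClass) &&
            (name == null || name == ref.name) &&
            (parameters == null || parameters == ref.parameters) &&
            (returnType == null || returnType == ref.returnType)
}

// Same idea for iget/iput/sget/sput style accesses: match any subset of class, name, and type.
data class FieldAccessFilter(
    val definingClass: String? = null,
    val name: String? = null,
    val type: String? = null,
) {
    fun matches(ref: FieldRef): Boolean =
        (definingClass == null || definingClass == ref.definingClass) &&
            (name == null || name == ref.name) &&
            (type == null || type == ref.type)
}
```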
There will always be the usage of method filters, field, and literal constants.
It's up to the patches to declare new filters if desired. Such as ResourceMappingPatch declaring it's own filter that finds decoded resource literals.
While the current filters and functions like addInstructions are useful for certain cases, they can't cover every scenario. Assuming that filters will always be necessary is a flawed approach, as there may be situations where none of the existing filters are suitable. These utilities are based on assumptions of common usage, which can limit flexibility.
The same applies to addInstructions. It's essentially just adding an item to a list, but introducing this specific functionality as an extension function is an overspecialization that conflicts with the goal of maintaining a generic library. Filters should follow the same principle: while the filter interface is generic, providing a predefined set of filters creates unnecessary constraints. I'd prefer to move the actual filter implementations to an external module, ideally separate from the patcher repo.
Instruction filter no longer has a classDef parameter. The instruction method is passed as a parameter and most filters don't use it, but some require it to check how many indexes the method has and others to parse out the enclosing class.
An instruction filter should only have context about the instructions. Bringing the method into this context is problematic in terms of abstraction; a filter for instructions that relies on a method does not sound right. If it is somehow necessary, it means you need to rethink what "instruction filters" actually are. Perhaps they are more than just that, given that you need context about the method.
This is an expansion of what opcodes previously did, which is why opcode filters still exist and can still be used.
There should be one clear way to handle instruction fingerprinting. If we’re moving forward with filters, the old opcode filter approach should be replaced with the new approach and reimplemented in it if necessary.
Now you can declare more specific usage of these opcodes and not just patterns which are fragile, can match completely unrelated stuff, and frequently break when just a single extra register move opcode is added by the compiler.
I think this also shows a specialization in one direction that is assumed to be likely useful. However, it is nonetheless a specialization that shouldn't happen in a library that is supposed to be abstract. For example, you can now filter for method references, but what about filtering only for the field type in field references? You'd then ask people to implement the interface for that situation, and you would have failed to provide a universal API via the existing filter implementations, because, albeit likely useful, they are after all specialized for specific use cases.
val index = it.filterMatches.last().index
Regarding the API, it can also be useful to declare a filter so that you can reference it later on. This avoids having to rely on the index into filterMatches. In your example it could look like this:
val opcodeFilter = OpcodeFilter(Opcode.MOVE_RESULT)
internal val shortsBottomBarContainerFingerprint by fingerprint {
accessFlags(AccessFlags.PUBLIC, AccessFlags.FINAL)
returns("V")
parameters("Landroid/view/View;", "Landroid/os/Bundle;")
strings("r_pfvc")
instructions(
// First instruction of interest.
ResourceMappingFilter("id", "bottom_bar_container"),
// Here lies other unrelated instructions.
// Second instruction of interest.
MethodFilter(methodName = "getHeight"),
// Third instruction of interest.
opcodeFilter()
)
}
shortsBottomBarContainerFingerprint.let {
it.method.apply {
val index = opcodeFilter.index // Notice the reference to the val opcodeFilter
val heightRegister = getInstruction<OneRegisterInstruction>(index).registerA
addInstructions(
index + 1,
"""
invoke-static { v$heightRegister }, $FILTER_CLASS_DESCRIPTOR->getNavigationBarHeight(I)I
move-result v$heightRegister
"""
)
}
}
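The lookup-by-filter idea can be sketched with a match result keyed by the filter instance itself. `Filter`, `MatchResult`, and `match` are hypothetical names over a toy instruction list of strings; because `Filter` does not override `equals`, each declared filter `val` is its own identity key.

```kotlin
// Hypothetical filter over toy instructions represented as strings.
class Filter(val description: String, val test: (String) -> Boolean)

// Match results keyed by filter identity, so a declared filter val can be used as the lookup key.
class MatchResult(private val indexByFilter: Map<Filter, Int>) {
    fun indexOf(filter: Filter): Int =
        indexByFilter[filter] ?: error("Filter '${filter.description}' was not matched")
}

// Simple greedy in-order matching; returns null if any filter finds no match.
fun match(instructions: List<String>, filters: List<Filter>): MatchResult? {
    val found = LinkedHashMap<Filter, Int>()
    var from = 0
    for (filter in filters) {
        val index = (from until instructions.size)
            .firstOrNull { filter.test(instructions[it]) } ?: return null
        found[filter] = index
        from = index + 1
    }
    return MatchResult(found)
}
```

With `val moveResult = Filter("move-result") { it == "move-result" }` declared once, a patch could read `result.indexOf(moveResult)` instead of `filterMatches.last().index`.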
InstructionFilter can be renamed, but I'm unsure what other name to use.
After all, you don't just filter based on instructions. This is why I had initially assumed it to be similar to the custom block of fingerprints, and for that reason a different name is needed. Can you explain why you need anything other than instructions to filter instructions?
Regarding the API, it can also be useful to declare a filter so that you can reference it later on. This avoids having to rely on the index into filterMatches. In your example it could look like this:
The example you list is exactly how this works right now, and the code shown is taken from the patches PR of this change.
Assuming that filters will always be necessary is a flawed approach, as there may be situations where none of the existing filters are suitable.
The built-in filters are no different from the previous opcode declarations, except that filters allow declaring more of the opcode call.
Before, opcode declarations were very limited, such as INVOKE_STATIC, which is literally any static call with no way to restrict it to specific defining classes, method names, return types, or parameters. Now you can specify what the method call is, such as methodCall(definingClass = "Ljava/lang/String;", name = "toString"), which makes fingerprinting much simpler and more precise.
The Dalvik instruction set is never going to remove any of the opcodes used by the built-in filters, so there is no risk of the built-in filters becoming outdated.
I don't see any reason to make a different project just for the nine built-in filters, especially since declaring more precise method and field access calls is very basic. Making a separate project means a project needs to import both patcher and the basic filter declarations, which means they're not really separate projects and should be one project.
If a patch wants to declare a very specific instruction filter, it can do that in its own project (just as the RV Patches have their own resourceLiteral filter).
Of note, with this PR I added YT 20.02 support in only a few minutes, because small changes in opcode patterns no longer break fingerprints: instruction fingerprinting allows picking out the instructions common to all versions and ignoring junk opcodes such as register moves.
but how about filtering for the field type only in field references?
You can already do exactly that with this PR. A field access filter can declare any part of the access and leave out the parts that are obfuscated or should be ignored, such as: fieldAccess(type = "Ljava/lang/String;")
Can you explain why you need anything other than instructions to filter instructions?
The enclosing method is needed for methodCall() and fieldAccess() to support the `this` keyword: it's impossible to declare an obfuscated class for the method or field access, but `this` can indicate that it's a call to the enclosing class. The declaring class (which is part of the method object) is needed to support this functionality. The enclosing method is also used by lastInstruction(), since it needs to know how many instructions the enclosing method has. I don't see any issues with passing along the enclosing method of the instruction, as it allows more flexibility, as shown here.
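A minimal sketch of why a filter might need the enclosing method, assuming simplified `Insn` and `Method` types instead of dexlib2's. The names `methodCall` and `lastInstruction` come from the discussion, but these signatures are hypothetical: `this` is resolved against the enclosing method's defining class at match time, and "last instruction" needs the method's instruction count.

```kotlin
// Hypothetical simplified types standing in for dexlib2's Method and Instruction.
data class Insn(val opcode: String, val calledClass: String? = null, val calledName: String? = null)
data class Method(val definingClass: String, val instructions: List<Insn>)

// A filter receives the enclosing method alongside the instruction and its index.
fun interface InstructionFilter {
    fun matches(method: Method, instruction: Insn, index: Int): Boolean
}

// "this" is resolved to the enclosing method's (possibly obfuscated) class when matching.
fun methodCall(definingClass: String, name: String) = InstructionFilter { method, insn, _ ->
    val expected = if (definingClass == "this") method.definingClass else definingClass
    insn.calledClass == expected && insn.calledName == name
}

// Matching "the last instruction" requires knowing how many instructions the method has.
fun lastInstruction(inner: InstructionFilter) = InstructionFilter { method, insn, index ->
    index == method.instructions.lastIndex && inner.matches(method, insn, index)
}
```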
There should be one clear way to handle instruction fingerprinting. If we’re moving forward with filters, the old opcode filter approach should be replaced with the new approach and reimplemented in it if necessary.
I deprecated the old opcodes() method, but unless someone wants to spend possibly 10+ hours updating all the old code (I definitely do not want to), it's much easier and more reliable to deprecate but still support the old patch code.
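If the deprecated `opcodes()` patterns were reimplemented on top of filters, one possible translation is sketched below (hypothetical names; opcodes are plain strings here). Consecutive opcodes become filters with `maxAfter = 0`; a `null` entry is treated as a one-instruction wildcard, which is an assumption about the old pattern semantics, and widens the next filter's `maxAfter`. Since `maxAfter` is an upper bound, the translation is slightly looser than an exact pattern.

```kotlin
// Hypothetical opcode filter with a spacing limit relative to the previous match.
data class OpcodeFilter(val opcode: String, val maxAfter: Int)

// Translates a legacy contiguous opcode pattern into filters. The first filter may match
// anywhere; each later filter must follow within the number of wildcard (null) slots before it.
// Leading wildcards are dropped, which only loosens the match.
fun opcodesAsFilters(vararg opcodes: String?): List<OpcodeFilter> {
    val filters = mutableListOf<OpcodeFilter>()
    var gap = 0 // wildcard slots seen since the last concrete opcode
    for (op in opcodes) {
        if (op == null) {
            gap++
            continue
        }
        val maxAfter = if (filters.isEmpty()) Int.MAX_VALUE else gap
        filters.add(OpcodeFilter(op, maxAfter))
        gap = 0
    }
    return filters
}
```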
…ters # Conflicts: # docs/2_2_0_patch_anatomy.md
Not a full review, but can be looked at until the next review batch.
@@ -108,4 +108,4 @@ val resources = patcherResult.resources

The next page teaches the fundamentals of ReVanced Patches.

-Continue: [🧩 Introduction to ReVanced Patches](2_patches_intro.md)
+Continue: [🧩 Introduction to ReVanced Patches](2_0_0_patches_intro.md)
What's the reason for naming the files like this? If they are renamed, they should be named like this in the other repos where we have the same kind of docs as well for consistency.
I renamed them so the files are sorted by number and show in the same order as presented.
https://github.com/ReVanced/revanced-patcher/blob/main/docs/2_patches_intro.md
Notice that in the file selector on the left, 2 comes after 2_1, which is confusing when navigating through the docs.
But if they're renamed, the file sorting is correct and matches the navigation:
https://github.com/ReVanced/revanced-patcher/blob/f675f865bc435dcd149a8602582a12785190a385/docs/2_0_0_patches_intro.md
The other repos' docs could be renamed if any of them have the same mismatch between file order and navigation order.
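The ordering problem comes from plain character-by-character comparison. A small Kotlin sketch, assuming the file browser sorts roughly lexicographically; `2_1_example.md` and `2_1_0_example.md` are hypothetical placeholders for the page numbered 2_1.

```kotlin
// Names before the rename; "2_1_example.md" is a hypothetical placeholder for the 2_1 page.
val oldNames = listOf("2_patches_intro.md", "2_1_example.md")

// Names after the rename, using the numbering from this PR plus the same placeholder.
val newNames = listOf(
    "2_2_1_fingerprinting.md",
    "2_0_0_patches_intro.md",
    "2_2_0_patch_anatomy.md",
    "2_1_0_example.md",
)

// String.compareTo compares character codes: '1' (0x31) sorts before 'p' (0x70),
// so "2_1_..." lands ahead of "2_patches_intro.md" even though page 2 comes first.
fun fileOrder(names: List<String>): List<String> = names.sorted()
```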
@@ -85,7 +85,7 @@ val disableAdsPatch = bytecodePatch(
 // Business logic of the patch to disable ads in the app.
 execute {
 // Fingerprint to find the method to patch.
-val showAdsFingerprint = fingerprint {
+val showAdsFingerprint by fingerprint {
The `by` API isn't documented anywhere. I had to look up the source code to understand that it exists, how it differs, and when to use = or by. How does the new API in this PR behave differently from current dev?
The `by` semantics allow capturing the fingerprint name (showAdsFingerprint), and the name is then shown in stack traces when fingerprints don't resolve.
This was briefly discussed in one of the other conversations in this PR.
Without this PR, it's more difficult to tell which fingerprint failed to resolve, and for some code it's impossible to tell which fingerprint failed, such as when looping over multiple fingerprints:
SEVERE: "GmsCore support" failed:
app.revanced.patcher.patch.PatchException: The patch "GmsCore support" depends on "ResourcePatch", which raised an exception:
app.revanced.patcher.patch.PatchException: The patch "ResourcePatch" depends on "BytecodePatch", which raised an exception:
app.revanced.patcher.patch.PatchException: Failed to match the fingerprint: app.revanced.patcher.Fingerprint@3574481d
at app.revanced.patcher.Fingerprint.getException(Fingerprint.kt:254)
at app.revanced.patcher.Fingerprint.getMatch(Fingerprint.kt:263)
at app.revanced.patcher.Fingerprint.getMethod(Fingerprint.kt:392)
at app.revanced.patches.youtube.misc.gms.AccountCredentialsInvalidTextPatchKt.accountCredentialsInvalidTextPatch$lambda$6$lambda$5(AccountCredentialsInvalidTextPatch.kt:63)
And with this change the problem is immediately clear:
SEVERE: "GmsCore support" failed:
app.revanced.patcher.patch.PatchException: The patch "GmsCore support" depends on "ResourcePatch@1521568953", which raised an exception:
app.revanced.patcher.patch.PatchException: The patch "ResourcePatch@1521568953" depends on "BytecodePatch@1059300256", which raised an exception:
app.revanced.patcher.patch.PatchException: Failed to match the fingerprint: specificNetworkErrorViewControllerFingerprint
at app.revanced.patcher.Fingerprint.patchException(Fingerprint.kt:280)
at app.revanced.patcher.Fingerprint.match(Fingerprint.kt:292)
at app.revanced.patcher.Fingerprint.getInstructionMatches(Fingerprint.kt:457)
at app.revanced.patches.youtube.misc.gms.AccountCredentialsInvalidTextPatchKt.accountCredentialsInvalidTextPatch$lambda$3$lambda$2(AccountCredentialsInvalidTextPatch.kt:37)
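The name capture relies on Kotlin's property delegation: `provideDelegate` receives the `KProperty` being declared, so its `name` is available. A minimal sketch using the standard `PropertyDelegateProvider` and `ReadOnlyProperty` interfaces; the `Fingerprint` class and this `fingerprint` function are simplified stand-ins, not the PR's implementation.

```kotlin
import kotlin.properties.PropertyDelegateProvider
import kotlin.properties.ReadOnlyProperty

// Hypothetical stand-in for a fingerprint; only the name and a match outcome matter here.
class Fingerprint(val name: String, private val matches: () -> Boolean) {
    fun match(): String =
        if (matches()) "matched" else throw IllegalStateException("Failed to match the fingerprint: $name")
}

// provideDelegate runs once per declaration and sees the property, so property.name is known.
fun fingerprint(matches: () -> Boolean) =
    PropertyDelegateProvider<Any?, ReadOnlyProperty<Any?, Fingerprint>> { _, property ->
        val fp = Fingerprint(property.name, matches)
        ReadOnlyProperty<Any?, Fingerprint> { _, _ -> fp }
    }

// Declared with `by`, so the fingerprint knows it is called "showAdsFingerprint".
val showAdsFingerprint by fingerprint { false }
```

With `=` instead of `by`, there is no property to inspect, which is why an unnamed fingerprint can only fall back to something like the default `Fingerprint@3574481d` string.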
docs/2_2_1_fingerprinting.md
Outdated
If a single instruction varies slightly between different app targets but otherwise the fingerprint
is still the same, the `anyInstruction()` wrapper can be used to specify variations of the
same instruction, such as:
`anyInstruction(string("string in early app target"), string("updated string in latest app target"))`
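A sketch of how an `anyInstruction()` wrapper can combine alternatives, assuming a hypothetical single-method `InstructionFilter` and a toy `Insn` type; `string()` here only checks `const-string` operands and is not the PR's implementation.

```kotlin
// Hypothetical toy instruction: an opcode plus an optional string operand.
data class Insn(val opcode: String, val string: String? = null)

fun interface InstructionFilter {
    fun matches(insn: Insn): Boolean
}

// Matches a const-string instruction with exactly this value.
fun string(value: String) = InstructionFilter { it.opcode == "const-string" && it.string == value }

// Matches if any alternative matches, e.g. a string that changed between app targets.
fun anyInstruction(vararg filters: InstructionFilter) =
    InstructionFilter { insn -> filters.any { it.matches(insn) } }
```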
This is an odd API. I think instruction filters should be generalized into a lambda where the user can evaluate however they want, such as:
fingerprint {
// this == FingerprintBuilder
instruction {
// this == InstructionFilterBuilder
}
}
This would define the basis of revanced patcher API.
A module with specific abstractions can be implemented such as:
InstructionFilterBuilder.referencesString("")
InstructionFilterBuilder.opcode(Jumbo)
as well as
FingerprintBuilder.instructionReferencesString("")
FingerprintBuilder.instructionOpcode(Jumbo)
Now the patcher module would stay generic, independent of any specific use case, thanks to the open "lambda" API providing full generic context about a method, while a separate module, which users of the patcher library can depend on, can add support for specific scenarios via extension APIs. Adding this extension module would add APIs such as:
fingerprint {
// this == FingerprintBuilder
instructionReferencesString("")
instruction {
// this == InstructionFilterBuilder
referencesString("")
}
}
As you can see, the patcher module is kept generic, providing a generic API without overspecializing, while a separate, optional "handy" module provides ready-made APIs if the user of the patcher library wishes to use them. Please reflect this architectural suggestion.
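The proposed split can be sketched as follows. All types and functions are hypothetical: the core exposes only a generic predicate block, and the convenience helpers are extension functions that could live in a separate, optional module.

```kotlin
// Hypothetical toy instruction used by both layers.
data class Insn(val opcode: String, val stringOperand: String? = null)

// --- core (generic): only an open predicate API ---
class InstructionFilterBuilder {
    internal val predicates = mutableListOf<(Insn) -> Boolean>()
    fun predicate(p: (Insn) -> Boolean) { predicates.add(p) }
}

class FingerprintBuilder {
    val filters = mutableListOf<(Insn) -> Boolean>()

    // An instruction filter matches when all predicates declared in its block match.
    fun instruction(block: InstructionFilterBuilder.() -> Unit) {
        val preds = InstructionFilterBuilder().apply(block).predicates.toList()
        filters.add { insn -> preds.all { it(insn) } }
    }
}

fun fingerprint(block: FingerprintBuilder.() -> Unit): List<(Insn) -> Boolean> =
    FingerprintBuilder().apply(block).filters

// --- optional helper module: extension functions only ---
fun InstructionFilterBuilder.referencesString(value: String) = predicate { it.stringOperand == value }
fun InstructionFilterBuilder.opcode(name: String) = predicate { it.opcode == name }
fun FingerprintBuilder.instructionReferencesString(value: String) = instruction { referencesString(value) }
```

The core never names a specific kind of instruction; every specialized helper is built only from `predicate` and `instruction`.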
The issue with allowing anything like that is that users will be writing logic similar to what's already here (method calls, constant strings/numbers, field references).
All use cases are already covered by the filters provided. Anything custom can be either a custom inline declaration (as described above) or a custom filter class (which is what the resource mapping patch does, where it's a subclass of literal() that uses a resource mapping literal).
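A sketch of a custom filter built as a subclass of a literal filter, assuming a hypothetical `ResourceTable` in place of ResourceMappingPatch's mappings. The expected value is a lambda, so the resource id is only looked up when matching runs rather than when the fingerprint is declared.

```kotlin
// Hypothetical toy instruction carrying an optional literal operand.
data class Insn(val opcode: String, val literal: Long? = null)

// A literal filter whose expected value is computed lazily at match time.
open class LiteralFilter(private val expected: () -> Long) {
    fun matches(insn: Insn): Boolean = insn.literal == expected()
}

// Hypothetical stand-in for decoded resource mappings: (type, name) -> resource id.
class ResourceTable(private val ids: Map<Pair<String, String>, Long>) {
    operator fun get(type: String, name: String): Long =
        ids[type to name] ?: error("No resource $type/$name")
}

// Custom filter defined outside the core: a literal filter that resolves a resource id.
class ResourceLiteralFilter(table: ResourceTable, type: String, name: String) :
    LiteralFilter({ table[type, name] })
```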
Adds instruction filters to support more flexible instruction fingerprinting.
Changes
Fingerprints can still use opcode patterns, or they can use instruction filters that allow more precise matching.
Basic support exists for matching instructions using:
Projects can define their own custom instruction filters, such as ResourceMappingPatch with its own kind of LiteralFilter that matches resource literal values (no more mucking about with using a ResourcePatch to first set a resource value that a fingerprint then uses).
Variable space allowed between instruction filters
By default, all filters allow any amount of space between them. But if filters always immediately follow each other, or there is a rough estimate of the maximum number of indexes until the next instruction, a maximum distance can be set. An example is an opcode filter of MOVE_RESULT or MOVE_RESULT_OBJECT after a method call, where the maximum instruction spacing is always 0.
Breaking changes
Fuzzy pattern match is now obsolete, as its functionality is now part of the filtering itself. Variable spacing is allowed between instruction filters, and unimportant instructions are ignored by simply not defining instruction filters for them.
Fingerprints are now declared using `by` semantics.
Before:
internal val shortsBottomBarContainerFingerprint = fingerprint {
After:
internal val shortsBottomBarContainerFingerprint by fingerprint {
If a fingerprint fails to resolve, stack traces now include the fingerprint name.
Example fingerprint before and after this change
Before:
Now the indexOfFirst() logic is in the fingerprint itself: