Commit eb2f0eb

Author: Gal Ben David
V0.6.0 (#4)

- added file_name rules to support both matching file content and file names
- code refactoring
- implemented ContentRule and FileNameRule. Better code arrangement.
1 parent 1cdbb65 commit eb2f0eb

6 files changed, +497 -251 lines changed

README.md

Lines changed: 33 additions & 14 deletions
@@ -79,19 +79,31 @@ This class holds all the added rules for fast reuse.


 ```python
-def add_rule(
+def add_content_rule(
     self,
     name: str,
-    match_pattern: str,
-    match_whitelist_patterns: typing.List[str],
-    match_blacklist_patterns: typing.List[str],
+    regex_pattern: str,
+    whitelist_regex_patterns: typing.List[str],
+    blacklist_regex_patterns: typing.List[str],
 ) -> None
 ```
-The `add_rule` function adds a new rule to an internal list of rules that could be reused multiple times against different repositories. The same name can be used multiple times and would lead to results which can hold the same name.
+The `add_content_rule` function adds a new rule to an internal list of rules that could be reused multiple times against different repositories. The same name can be used multiple times and would lead to results which can hold the same name. Content rule means that the regex pattern would be tested against the content of the files.
 - `name` - The name of the rule so it can be identified.
-- `match_pattern` - The regex pattern (RE2 syntax) to match against the content of the commited files.
-- `match_whitelist_patterns` - A list of regex patterns (RE2 syntax) to match against the content of the committed file to filter in results. Only one of the patterns should be matched to pass through the result. There is an OR relation between the patterns.
-- `match_blacklist_patterns` - A list of regex patterns (RE2 syntax) to match against the content of the committed file to filter out results. Only one of the patterns should be matched to omit the result. There is an OR relation between the patterns.
+- `regex_pattern` - The regex pattern (RE2 syntax) to match against the content of the commited files.
+- `whitelist_regex_patterns` - A list of regex patterns (RE2 syntax) to match against the content of the committed file to filter in results. Only one of the patterns should be matched to pass through the result. There is an OR relation between the patterns.
+- `blacklist_regex_patterns` - A list of regex patterns (RE2 syntax) to match against the content of the committed file to filter out results. Only one of the patterns should be matched to omit the result. There is an OR relation between the patterns.
+
+
+```python
+def add_file_name_rule(
+    self,
+    name: str,
+    regex_pattern: str,
+) -> None
+```
+The `add_file_name_rule` function adds a new rule to an internal list of rules that could be reused multiple times against different repositories. The same name can be used multiple times and would lead to results which can hold the same name. File name rule means that the regex pattern would be tested against the file names.
+- `name` - The name of the rule so it can be identified.
+- `regex_pattern` - The regex pattern (RE2 syntax) to match against the file names of the commited files.


 ```python
@@ -119,7 +131,7 @@ def scan(
     self,
     repository_path: str,
     branch_glob_pattern: '*',
-    from_timestamp: int,
+    from_timestamp: int = 0,
 ) -> typing.List[typing.Dict[str, str]]
 ```
 The `scan` function is the main function in the library. Calling this function would trigger a new scan that would return a list of matches. The scan function is a multithreaded operation, that would utilize all the available core in the system. The results would not include the file content but only the regex matching group. To retrieve the full file content one should take the `results['oid']` and to call `get_file_content` function.
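
Since `from_timestamp` now defaults to `0`, callers only need to pass it when they want to restrict a scan to recent commits. The sketch below shows one way to compute such a value. It assumes the parameter is a Unix timestamp in seconds, which the diff does not state explicitly, and `timestamp_days_ago` is a hypothetical helper, not part of PyRepScan.

```python
import datetime
import typing


def timestamp_days_ago(
    days: int,
    now: typing.Optional[datetime.datetime] = None,
) -> int:
    """Return the Unix timestamp (in seconds) for `days` days before `now`.

    Hypothetical helper for illustration; `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.datetime.now(tz=datetime.timezone.utc)

    return int((now - datetime.timedelta(days=days)).timestamp())


# Possible usage, assuming `grs` is a configured pyrepscan.GitRepositoryScanner
# and that `from_timestamp` is interpreted as seconds since the epoch:
#
# results = grs.scan(
#     repository_path='/repository/path',
#     branch_glob_pattern='*',
#     from_timestamp=timestamp_days_ago(7),
# )
```

Omitting `from_timestamp` altogether keeps the previous behaviour of the README example, which passed `0` explicitly.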
@@ -162,11 +174,19 @@ import pyrepscan
 grs = pyrepscan.GitRepositoryScanner()

 # Adds a specific rule, can be called multiple times or none
-grs.add_rule(
+grs.add_content_rule(
     name='First Rule',
-    match_pattern=r'''(-----BEGIN PRIVATE KEY-----)''',
-    match_whitelist_patterns=[],
-    match_blacklist_patterns=[],
+    regex_pattern=r'(-----BEGIN PRIVATE KEY-----)',
+    whitelist_regex_patterns=[],
+    blacklist_regex_patterns=[],
+)
+grs.add_file_name_rule(
+    name='Second Rule',
+    regex_pattern=r'.+\.pem',
+)
+grs.add_file_name_rule(
+    name='Third Rule',
+    regex_pattern=r'(prod|dev|stage).+key',
 )

 # Add file extensions to ignore during the search
@@ -189,7 +209,6 @@ grs.add_ignored_file_path(
 results = grs.scan(
     repository_path='/repository/path',
     branch_glob_pattern='*',
-    from_timestamp=0,
 )

 # Results is a list of dicts. Each dict is in the following format:
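
To illustrate the difference between the two rule types this commit introduces, here is a rough pure-Python sketch of the semantics the README text describes. It is not PyRepScan's implementation: it uses the standard `re` module instead of RE2, the helper names `content_rule_matches` and `file_name_rule_matches` are hypothetical, and it assumes that an empty whitelist lets every match through and that patterns are searched anywhere in the text rather than anchored.

```python
import re
import typing


def content_rule_matches(
    content: str,
    regex_pattern: str,
    whitelist_regex_patterns: typing.List[str],
    blacklist_regex_patterns: typing.List[str],
) -> typing.List[str]:
    """Return the matched texts of a content rule, or [] if filtered out."""
    # Blacklist: a single matching pattern is enough to omit the result (OR relation).
    if any(re.search(pattern, content) for pattern in blacklist_regex_patterns):
        return []

    # Whitelist: when non-empty, at least one pattern must match (OR relation).
    # Assumption: an empty whitelist applies no filtering at all.
    if whitelist_regex_patterns and not any(
        re.search(pattern, content) for pattern in whitelist_regex_patterns
    ):
        return []

    return [match.group(0) for match in re.finditer(regex_pattern, content)]


def file_name_rule_matches(
    file_name: str,
    regex_pattern: str,
) -> bool:
    """Return True when the pattern is found in the file name (content is ignored)."""
    return re.search(regex_pattern, file_name) is not None
```

With the rules from the README example, a content rule inspects what a file contains, while a file name rule such as `r'.+\.pem'` flags a file by its path alone, whatever it contains.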

setup.cfg

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+[aliases]
+test=pytest
+
+[tool:pytest]
+addopts = --tb=native -s

setup.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@

 setuptools.setup(
     name='PyRepScan',
-    version='0.5.2',
+    version='0.6.0',
     author='Gal Ben David',
     author_email='[email protected]',
     url='https://github.com/intsights/PyRepScan',
@@ -25,8 +25,12 @@
     keywords='git repository leaks scanner detector libgit2 re2 c++',
     python_requires='>=3.6',
     zip_safe=False,
+    setup_requires=[
+        'pytest-runner',
+    ],
     tests_require=[
         'gitpython',
+        'pytest',
     ],
     package_data={},
     include_package_data=True,
