@kx499-zz

This employs both single-value matching and full-text extraction (think re.findall) to support pulling indicators out of blobs of text such as email bodies. It also supports indicator validators to help remove false positives after regex extraction. The functions are exposed so you can call them separately from an analyzer or automatically from the iterable function. The iterable function first calls check_type and, if there is no match, goes on to process the full-text regex. This is an iteration of this PR #1
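For readers who want to picture the flow described here, this is a minimal sketch and not the code from the PR. Only check_type is named in the description. The simplified IPv4 pattern, the validator, and the other function names are hypothetical stand-ins chosen for illustration.

```python
import ipaddress
import re

# Hypothetical, simplified pattern; the patterns in the PR are more elaborate.
IPV4_PATTERN = r"(?:\d{1,3}\.){3}\d{1,3}"


def is_valid_ipv4(candidate):
    """Validator: reject regex hits that are not real IPv4 addresses."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def check_type(value):
    """Single-value matching: the whole string must be one indicator."""
    if re.fullmatch(IPV4_PATTERN, value) and is_valid_ipv4(value):
        return "ip"
    return None


def extract_from_text(text):
    """Full-text extraction: find every candidate, then validate each one."""
    hits = re.findall(IPV4_PATTERN, text)
    return [{"dataType": "ip", "data": h} for h in hits if is_valid_ipv4(h)]


def extract(value):
    """Try the single-value match first; fall back to full-text extraction."""
    data_type = check_type(value)
    if data_type:
        return [{"dataType": data_type, "data": value}]
    return extract_from_text(value)
```

In this sketch, a bare value such as "8.8.8.8" is typed directly by check_type, while a sentence containing "10.0.0.1" and "999.1.2.3" goes through the full-text path and keeps only the first hit, because the validator rejects the second.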

@nadouani
Contributor

Hello @kx499 thanks for the PR.

Can you please remove the .DS_Store file.
The code you are submitting is a good candidate for unit tests, could you add some of them to cover your changes?

Thanks

@kx499-zz
Author

kx499-zz commented Jun 27, 2019

Thanks, will do. I'm not very familiar with unit tests, but I'll work up some tests.

@kx499-zz
Author

@nadouani I made the updates. Let me know what you think and whether any other updates are needed.

@kx499-zz
Author

Any word on this? It's been a few months so I figured I'd check in

@nadouani nadouani changed the base branch from master to develop August 20, 2019 11:21
@kx499-zz
Author

@nadouani is there anything else needed for this? I'm looking to develop/update some analyzers based on this code and was hoping it could either get committed or we could discuss other ways of accomplishing the same thing.

@iwitz

iwitz commented Oct 3, 2019

@kx499 Thanks for the PR! Could you add a closing '>' after the opening '<' in the following line in extractor.py? Otherwise the closing angle bracket is captured by the regular expression:

ft_r = '(' + \
       '(?:(?:meows?|h[Xxt]{2}ps?)://)?(?:(?:(?:[a-zA-Z0-9\-]+\[?\.\]?)+[a-z]{2,8})' + \
       '|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\[?\.\]?){3}(?:25[0-5]|2[0-4][0-9]' + \
       '|[01]?[0-9][0-9]?))/[^\s\<>"]+' + \
       ')'

(This is the modified line. Note that if a URL itself contains an angle bracket, the capture may stop early.)
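To make the effect concrete, here is a small sketch using a simplified stand-in for the tail of the pattern, not the full ft_r regex. It assumes the unmodified character class was `[^\s\<"]`, which is inferred from the request above, and the example URLs are made up.

```python
import re

# Simplified stand-ins for the tail of the URL pattern; not the full ft_r regex.
BEFORE = r'https?://[^\s\<"]+'   # assumed original: '>' is allowed in the match
AFTER = r'https?://[^\s\<>"]+'   # modified: '>' ends the match

wrapped = 'See <http://example.com/path> for details'
print(re.findall(BEFORE, wrapped))  # ['http://example.com/path>']
print(re.findall(AFTER, wrapped))   # ['http://example.com/path']

# Trade-off: a URL that really contains '>' is cut short.
print(re.findall(AFTER, 'http://example.com/a>b'))  # ['http://example.com/a']
```

The last line shows the early stop mentioned above. For URLs wrapped in angle brackets, as is common in email bodies, the modified class seems like the better default.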

Also, @nadouani it'd be fantastic if you could have a look at this or at PR #1 😃
