Performs extraction of RFC822 e-mail and attachments. Produces "raw" artifacts for further classification.
Minimalistic: uses only the standard Python "email" package.
Consumes:
{
"type": "sample",
"stage": "recognized",
"extension": "eml"
},
{
"type": "sample",
"payload": {
"magic": "SMTP mail,*"
},
"mime": [
"message/rfc822"
]
}
Produces:
{
"type": "sample",
"kind": "raw"
}
First of all, make sure you have set up the core system: https://github.com/CERT-Polska/karton
Do not forget to add your karton.ini in this folder.
Then, simply install the Karton dependencies and run the service.
$ python3 -m venv venv && source venv/bin/activate
$ pip install -r requirements.txt
$ python3 karton-email-extractor.py
In theory, the sflock library used by karton-archive-extractor can extract eml files, but it has some hardcoded behavior that I don't like:
- It does not extract the attachments if the filename is empty.
- "text/plain" and "text/html" parts are not extracted because of a hardcoded whitelist.
- It decodes from Latin-1.
I also prefer not to extract image files, to limit the volume of data produced.
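
To illustrate the behavior described above, here is a minimal sketch that uses only the standard "email" package. It is not the actual code of this service: the function name `extract_parts`, the `skip_images` flag, and the `part-N` fallback filename are hypothetical names chosen for the example. It keeps parts with an empty filename, keeps "text/plain" and "text/html" bodies, returns raw bytes without assuming any charset, and skips images by default.

```python
from email import message_from_bytes, policy


def extract_parts(raw: bytes, skip_images: bool = True):
    """Return (filename, content_type, payload_bytes) for each leaf MIME part."""
    msg = message_from_bytes(raw, policy=policy.default)
    parts = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            # Containers (multipart/*, message/rfc822) have no payload of their own.
            continue
        if skip_images and part.get_content_maintype() == "image":
            # Skip images to limit the volume of data produced.
            continue
        # decode=True only undoes the transfer encoding (base64, quoted-printable);
        # the result stays as bytes, so no charset such as Latin-1 is assumed.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Hypothetical fallback name so parts without a filename are still kept.
        filename = part.get_filename() or f"part-{index}"
        parts.append((filename, part.get_content_type(), payload))
    return parts
```

Each returned tuple could then be wrapped into a "raw" sample for further classification; how that is done depends on the Karton setup and is left out of this sketch.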