Commit b6b071a

scripts/es-sarif/ for MRVA->SARIF->Elasticsearch
1 parent b43fff2 commit b6b071a

8 files changed: +1756 -0 lines changed

scripts/es-sarif/.gitignore

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.env.local
.env.*.local
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be added to the global gitignore or merged into this project gitignore. For a PyCharm
# project, it is recommended to include/store project specific gitignore file(s) within
# the project root.
# https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
.idea/

# IDE specific files
.vscode/
*.swp
*.swo
*~

# OS specific files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

elastic-start-local/
mrva/

scripts/es-sarif/activate.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/bin/bash
# Convenience script to activate the SARIF Elasticsearch Indexer environment

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$SCRIPT_DIR/.venv"

if [ ! -d "$VENV_DIR" ]; then
    echo "Virtual environment not found. Run setup.sh first." >&2
    exit 1
fi

echo "Activating SARIF Elasticsearch Indexer environment..."
echo "Python version: $("$VENV_DIR/bin/python" --version)"
echo "To deactivate, run: deactivate"
echo

# Start a new shell with the virtual environment activated (paths quoted to tolerate spaces)
exec bash --rcfile <(echo "source \"$VENV_DIR/bin/activate\"; PS1='(es-sarif) \u@\h:\w\$ '")

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# SARIF Files Elasticsearch Indexer

This script creates a fresh Elasticsearch index and indexes SARIF 2.1.0 results from multiple SARIF files into it.

## Requirements

- Python 3.11+
- SARIF files conforming to the SARIF 2.1.0 specification (such as those produced by `gh mrva`)
- Accessible URLs for running instances of Elasticsearch (aka "es") and Kibana (e.g. via `Quick Setup` below)

## Usage

```bash
python index-sarif-results-in-elasticsearch.py <sarif_files_list.txt> <elasticsearch_index_name>
```

## Input File Format

The SARIF files list should be a plain text file with one relative file path per line:

```text
output_misra-c-and-cpp-default_top-1000/solvespace/solvespace/solvespace_solvespace_18606.sarif
output_misra-c-and-cpp-default_top-1000/solvespace/solvespace/solvespace_solvespace_18607.sarif
# Comments starting with # are ignored
```

**Note**: Paths are resolved relative to the directory containing the list file.
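
The list-file rules above can be sketched in a few lines of Python. This is an illustration only, not the code from the indexer script: the function name `read_sarif_list` is hypothetical, and skipping blank lines is an assumption (the README only states that `#` comments are ignored).

```python
from pathlib import Path


def read_sarif_list(list_file: str) -> list[Path]:
    """Return the SARIF paths named in a list file.

    Each path is resolved against the directory that contains the list file.
    """
    list_path = Path(list_file).resolve()
    base_dir = list_path.parent
    paths = []
    for raw_line in list_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Comment lines are ignored; skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        paths.append(base_dir / line)
    return paths
```

Resolving against the list file's directory, rather than the current working directory, means the same list can be used no matter where the script is launched from.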

## Quick Setup

1. **Set up Python environment:**

   ```bash
   ## Change to the directory that contains this document
   cd scripts/es-sarif
   bash setup-venv.sh
   source .venv/bin/activate
   ```

1. **Set up Elasticsearch and Kibana with Docker:**

   ```bash
   curl -fsSL https://elastic.co/start-local | sh
   ```

1. **Run the indexer:**

   ```bash
   ## from the `scripts/es-sarif` directory
   python index-sarif-results-in-elasticsearch.py mrva/sessions/sarif-files.txt codeql-coding-standards-misra-sarif
   ```

The `elastic-start-local` setup provides:

- Elasticsearch at `http://localhost:9200`
- Kibana at `http://localhost:5601`
- API key stored in `elastic-start-local/.env` as `ES_LOCAL_API_KEY`
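
As a rough sketch of how a client could pick up that API key, the snippet below parses the `.env` file with the standard library and builds an `Authorization: ApiKey ...` header for HTTP requests to Elasticsearch. The helper names `load_env_file` and `auth_headers` are hypothetical, the parsing assumes simple `KEY=VALUE` lines, and whether the indexer script reads the key this way is not stated here.

```python
from pathlib import Path


def load_env_file(env_path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (assumed format)."""
    values = {}
    for raw_line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Dropping surrounding quotes is an assumption about the file format.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def auth_headers(env_path: str = "elastic-start-local/.env") -> dict[str, str]:
    """Build an HTTP header carrying the local Elasticsearch API key."""
    api_key = load_env_file(env_path)["ES_LOCAL_API_KEY"]
    return {"Authorization": f"ApiKey {api_key}"}
```

Keeping the key in `.env` and reading it at run time avoids putting credentials on the command line or in shell history.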

## Example Queries

The queries below use Kibana Dev Tools Console syntax.

Search for high-severity results:

```json
GET /codeql-coding-standards-misra-sarif/_search
{
  "query": { "term": { "level": "error" } }
}
```

Find results for a specific rule:

```json
GET /codeql-coding-standards-misra-sarif/_search
{
  "query": { "term": { "ruleId": "CERT-C-MSC30-C" } }
}
```

## Managing Elasticsearch Services

Control the Docker services:

```bash
cd elastic-start-local
./start.sh     # Start services
./stop.sh      # Stop services
./uninstall.sh # Remove everything (deletes all data)
```
