Commit 20aee02

Merge branch 'main' into gsoc-report-25
2 parents ab0eb74 + bfbe7b3 commit 20aee02

6 files changed: +231 -0 lines changed

docs/source/archive/gsoc-toc.rst

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ GSoC 2025
:maxdepth: 2

gsoc/reports/2025/scancode_toolkit_alok
+ gsoc/reports/2025/vulnerablecode_michael

GSoC 2024
---------
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
VulnerableCode: On-demand live evaluation of packages
=====================================================

Organization - `AboutCode <https://www.aboutcode.org>`_
-----------------------------------------------------------

| **Michael Ehab Mikhail**
| GitHub: `michaelehab <https://github.com/michaelehab>`_
| LinkedIn: `@michaelehab16 <https://www.linkedin.com/in/michaelehab16/>`_
| Project: `VulnerableCode
  <https://github.com/aboutcode-org/vulnerablecode>`_
| Official GSoC project page: `Project Link
  <https://summerofcode.withgoogle.com/programs/2025/projects/uF0kzMAg>`_
| GSoC Proposal: `Proposal Link
  <https://docs.google.com/document/d/1Tkk4MoPWXFj9r_U5cp3E4AhJW6QlHxTElyzpII_f4LM/edit?usp=sharing>`_

Overview
--------

VulnerableCode traditionally relied on **batch importers** to fetch
and store all advisories from a source at once. While effective for
building complete databases, batch importers are slow and
resource-heavy for developers who only need vulnerability
data for a **single package**.

This project introduces **live importers**, a new class of
importers that operate in a *package-first* mode. Instead of
pulling all advisories, they run against a single
PackageURL (PURL), returning only the advisories affecting
that package. This makes vulnerability evaluation
**faster, more efficient, and more personalized**, since the
database is gradually filled with only the advisories
that matter to each user.

To support this, I added:

* A new **LIVE_IMPORTERS_REGISTRY** that tracks the available live importers.
* A new **API endpoint** that accepts a **PURL**, enqueues the compatible
  live importer pipelines into a Redis queue, and has workers execute them
  asynchronously.
* Integration with **VulnTotal** and its **browser extension**, enabling users
  to evaluate packages in real time through a seamless interface.

This work bridges the gap between **batch-first databases** and
**package-first queries**, improving VulnerableCode's flexibility and enabling
better integration with developer workflows.

.. note::
   A PURL (Package URL) is a universal way to identify and locate software
   packages. `More on PURL <https://github.com/package-url/purl-spec>`_
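
For readers new to PURLs, the sketch below splits a PURL into the parts a live importer cares about: type, namespace, name, and version. It is a simplified, stdlib-only illustration. The helper name ``parse_simple_purl`` is hypothetical, qualifiers and subpaths are discarded, and it does not implement the full purl-spec rules; a complete parser such as ``PackageURL.from_string`` from the ``packageurl-python`` library is the safer choice in real code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class SimplePurl:
    """Core PURL components; qualifiers and subpath are ignored here."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Split 'pkg:type/namespace/name@version' (simplified, not spec-complete)."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    remainder = purl[len("pkg:"):]
    # Discard the optional #subpath and ?qualifiers suffixes.
    remainder = remainder.split("#", 1)[0].split("?", 1)[0]
    # The version, if any, follows the last '@'.
    version = None
    if "@" in remainder:
        remainder, version = remainder.rsplit("@", 1)
        version = unquote(version)
    purl_type, _, path = remainder.partition("/")
    if not purl_type or not path:
        raise ValueError(f"missing type or name in PURL: {purl!r}")
    # Everything before the last '/' is the namespace (e.g. a Maven group).
    namespace, _, name = path.rpartition("/")
    return SimplePurl(
        type=purl_type.lower(),
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=version,
    )
```

For example, ``parse_simple_purl("pkg:npm/%40angular/core@16.0.0")`` yields type ``npm``, namespace ``@angular``, name ``core``, and version ``16.0.0``.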

Project Design and Architecture
-------------------------------

The new live importers system builds on the existing batch importers while
introducing a parallel registry and an asynchronous execution model for
package-first runs.

Importer Registries
^^^^^^^^^^^^^^^^^^^

* ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2).
* ``LIVE_IMPORTERS_REGISTRY`` holds live importers.

Each live importer:

* Inherits from its batch importer (when logic can be reused), or directly
  from ``VulnerableCodeBaseImporterPipelineV2`` when a separate
  implementation is needed.
* Declares ``supported_types``, which defines the compatible package
  ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc.).
* Implements a package-first ``collect_advisories()`` method, which
  restricts results to advisories relevant to the given PURL.
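
The registry and importer contract described above can be sketched as follows. All class names, pipeline IDs, and the in-memory ``FAKE_UPSTREAM`` data are hypothetical stand-ins, not VulnerableCode's actual classes (the real live importers build on ``VulnerableCodeBaseImporterPipelineV2``). The sketch only shows the shape of the design: each importer declares ``supported_types``, the registry is searched by PURL type, and ``collect_advisories()`` looks up one package instead of walking a whole advisory feed.

```python
from typing import Dict, Iterator, List, Tuple, Type

# Hypothetical upstream data keyed by (PURL type, package name). A real live
# importer would query its advisory source for the one requested package.
FAKE_UPSTREAM: Dict[Tuple[str, str], List[str]] = {
    ("pypi", "django"): ["EXAMPLE-PYPI-1", "EXAMPLE-PYPI-2"],
    ("npm", "lodash"): ["EXAMPLE-NPM-1"],
}


class LiveImporterSketch:
    """Hypothetical base class standing in for a live importer pipeline."""

    pipeline_id = "base_live_sketch"
    supported_types: Tuple[str, ...] = ()

    def __init__(self, purl_type: str, name: str):
        self.purl_type = purl_type
        self.name = name

    def collect_advisories(self) -> Iterator[str]:
        # Package-first: fetch only the requested package's advisories
        # instead of iterating over the whole source.
        yield from FAKE_UPSTREAM.get((self.purl_type, self.name), [])


class PypiLiveImporterSketch(LiveImporterSketch):
    pipeline_id = "pypi_live_sketch"
    supported_types = ("pypi",)


class MultiEcosystemLiveImporterSketch(LiveImporterSketch):
    pipeline_id = "multi_ecosystem_live_sketch"
    supported_types = ("pypi", "npm", "maven")


# Separate registry for live importers, keyed by pipeline ID.
LIVE_IMPORTERS_REGISTRY_SKETCH: Dict[str, Type[LiveImporterSketch]] = {
    cls.pipeline_id: cls
    for cls in (PypiLiveImporterSketch, MultiEcosystemLiveImporterSketch)
}


def compatible_importers(purl_type: str) -> List[Type[LiveImporterSketch]]:
    """Return registered live importers whose supported_types include purl_type."""
    return [
        cls
        for cls in LIVE_IMPORTERS_REGISTRY_SKETCH.values()
        if purl_type in cls.supported_types
    ]
```

Here ``compatible_importers("pypi")`` returns both sketch importers, ``compatible_importers("npm")`` returns only the multi-ecosystem one, and an unsupported type such as ``cargo`` returns an empty list.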

Live importer executions are asynchronous: once triggered, they are placed in
a Redis-backed job queue and processed by dedicated workers. This keeps the
main API thread from blocking and allows multiple evaluations to run safely
in parallel.
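
The asynchronous execution model can be illustrated without Redis. In the minimal sketch below, a standard-library ``queue.Queue`` and a few worker threads stand in for the Redis-backed queue and its workers; ``InProcessJobQueue`` is a hypothetical name used only here. As with a real job queue, ``enqueue()`` returns a job ID right away and the work happens on a worker, so the caller is not held up by a slow importer. With the RQ library itself, the enqueue step would look roughly like ``Queue("live", connection=Redis()).enqueue(func, *args)``.

```python
import queue
import threading
import uuid
from typing import Callable, Dict


class InProcessJobQueue:
    """Stdlib stand-in for a Redis-backed job queue (illustration only)."""

    def __init__(self, num_workers: int = 2):
        self._jobs = queue.Queue()
        self._results: Dict[str, object] = {}
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def enqueue(self, func: Callable[..., object], *args) -> str:
        """Return a job ID immediately; a worker runs func(*args) later."""
        job_id = str(uuid.uuid4())
        self._jobs.put((job_id, func, args))
        return job_id

    def _work(self) -> None:
        while True:
            item = self._jobs.get()
            if item is None:  # shutdown sentinel
                self._jobs.task_done()
                return
            job_id, func, args = item
            try:
                result = func(*args)
            except Exception as exc:  # record the failure, keep the worker alive
                result = exc
            with self._lock:
                self._results[job_id] = result
            self._jobs.task_done()

    def result(self, job_id: str) -> object:
        """Return the job's result (or exception), or None if not finished."""
        with self._lock:
            return self._results.get(job_id)

    def join(self) -> None:
        """Block until every job enqueued so far has been processed."""
        self._jobs.join()

    def shutdown(self) -> None:
        """Send one sentinel per worker and wait for all workers to exit."""
        for _ in self._workers:
            self._jobs.put(None)
        for worker in self._workers:
            worker.join()
```

Recording exceptions as results mirrors the fault-tolerance goal: one failing importer does not stop the worker from processing the next job.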

.. figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png
   :alt: Class architecture of the importer registries
   :align: center
   :width: 70%

   Class architecture showing the relationship between ``IMPORTERS_REGISTRY``
   and ``LIVE_IMPORTERS_REGISTRY``.

API Endpoint
^^^^^^^^^^^^

The new API endpoint handles live evaluation requests.

* Input:

  * ``purl`` (required)

* Execution:

  * Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types``
    match the PURL's type.
  * Enqueues a pipeline run for each of these live importers on the ``live``
    RQ queue.
  * Returns the **Live Run ID**, information about the pipelines to
    run, and the status URL.
  * The status URL shows the current state of the live evaluation run
    and of its individual pipeline runs.

* Output:

  * Once the workers complete execution, the resulting advisories are imported
    into the database and exposed as JSON through the status endpoint.
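
The execution steps above can be sketched as a plain function. This is an illustration under stated assumptions, not the actual view: the importer table, the response field names, and the status URL path ``/api/live-evaluation/<id>/status/`` are hypothetical, and ``enqueue`` is injected so that a deployment could pass an RQ-backed callable while a test passes a simple list append.

```python
import uuid
from typing import Callable, Dict, List, Tuple

# Hypothetical live importer table: pipeline ID -> supported PURL types.
LIVE_IMPORTER_TYPES: Dict[str, Tuple[str, ...]] = {
    "pypi_live_sketch": ("pypi",),
    "npm_live_sketch": ("npm",),
    "multi_ecosystem_live_sketch": ("pypi", "npm", "maven"),
}


def handle_live_evaluation(purl: str, enqueue: Callable[[str, str, str], None]) -> Dict:
    """Select compatible live importers, enqueue them, and respond immediately."""
    if not purl.startswith("pkg:") or "/" not in purl:
        raise ValueError(f"invalid PURL: {purl!r}")
    # Simplified type extraction, e.g. "pkg:pypi/django@4.2.0" -> "pypi".
    purl_type = purl[len("pkg:"):].split("/", 1)[0].lower()

    pipelines: List[str] = [
        pipeline_id
        for pipeline_id, types in LIVE_IMPORTER_TYPES.items()
        if purl_type in types
    ]
    if not pipelines:
        raise ValueError(f"no live importer supports PURL type {purl_type!r}")

    # One live run groups one pipeline run per compatible importer.
    live_run_id = str(uuid.uuid4())
    for pipeline_id in pipelines:
        enqueue(live_run_id, pipeline_id, purl)  # workers execute these later

    return {
        "live_run_id": live_run_id,
        "pipelines": pipelines,
        "status_url": f"/api/live-evaluation/{live_run_id}/status/",  # hypothetical path
    }
```

Because the handler only enqueues and returns, its response time does not depend on how long the importers take; the client follows ``status_url`` for results.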

.. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png
   :alt: Live Pipeline Run Class
   :align: center
   :width: 70%

   The Live Pipeline Run class and how it groups multiple PipelineRuns.
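
One way to derive a single state for a live run from its grouped pipeline runs, as reported through the status URL, is sketched below. The status names, the precedence rules (queued, then running, then failed, then succeeded), and the helper names are assumptions made for illustration and may differ from VulnerableCode's actual ``PipelineRun`` states.

```python
from enum import Enum
from typing import Dict, Iterable


class RunStatus(str, Enum):
    """Hypothetical pipeline run states."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def live_run_status(pipeline_statuses: Iterable[RunStatus]) -> RunStatus:
    """Derive one overall status for a live run from its pipeline runs."""
    statuses = list(pipeline_statuses)
    if all(s is RunStatus.QUEUED for s in statuses):  # also covers an empty run
        return RunStatus.QUEUED
    if any(s in (RunStatus.QUEUED, RunStatus.RUNNING) for s in statuses):
        # At least one pipeline has started and not all have finished.
        return RunStatus.RUNNING
    if any(s is RunStatus.FAILED for s in statuses):
        return RunStatus.FAILED
    return RunStatus.SUCCEEDED


def status_payload(live_run_id: str, runs: Dict[str, RunStatus]) -> Dict:
    """Build a JSON-ready status document for the live run and each pipeline."""
    return {
        "live_run_id": live_run_id,
        "status": live_run_status(runs.values()).value,
        "pipelines": {pipeline_id: status.value for pipeline_id, status in runs.items()},
    }
```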

.. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png
   :alt: Live Importers API request flow
   :align: center
   :width: 70%

   Flow of the API endpoint: selecting compatible live importers and executing
   them in parallel.

Integration with VulnTotal
^^^^^^^^^^^^^^^^^^^^^^^^^^

The new API was integrated into VulnTotal as an optional datasource:

* VulnTotal now reads the ``VCIO_HOST``, ``VCIO_PORT``, and ``ENABLE_LIVE_EVAL``
  settings from the local ``.env`` file.
* If live evaluation is enabled, VulnTotal queries VulnerableCode in
  package-first mode.
* This allows VulnTotal to use both its other datasources **and**
  the user's gradually built local database, improving coverage and
  personalization.
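
A minimal sketch of how a client could turn the three ``.env`` settings into a VulnerableCode base URL follows. The parsing is deliberately simple (``KEY=VALUE`` lines, ``#`` comments, optional quotes), and the accepted truthy values for ``ENABLE_LIVE_EVAL`` and the ``http://`` scheme are assumptions; VulnTotal's actual configuration loading may differ.

```python
from typing import Dict, Optional

# Assumed spellings accepted as "on" for ENABLE_LIVE_EVAL.
TRUTHY = {"1", "true", "yes", "on"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def live_eval_base_url(env: Dict[str, str]) -> Optional[str]:
    """Return the local VulnerableCode base URL, or None if live evaluation is off."""
    enabled = env.get("ENABLE_LIVE_EVAL", "").strip().lower() in TRUTHY
    host = env.get("VCIO_HOST")
    port = env.get("VCIO_PORT")
    if not (enabled and host and port):
        return None
    return f"http://{host}:{port}"
```

For example, a ``.env`` containing ``VCIO_HOST=localhost``, ``VCIO_PORT=8000``, and ``ENABLE_LIVE_EVAL=true`` yields ``http://localhost:8000``; with the flag unset or false, the live datasource stays disabled.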

Integration with VulnTotal Browser Extension
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The VulnTotal browser extension was updated to support live importers:

* Users can enable the "Local VulnerableCode" datasource and the live evaluation option.
* When enabled, package lookups are forwarded to the new API, retrieving
  advisories in real time.
* This reduces setup effort: developers can get live vulnerability checks
  directly in their browser, provided they run a local VulnerableCode instance.
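
Because live evaluation is asynchronous, a client such as the extension submits a PURL and then polls the returned status URL until the run finishes. The extension itself is JavaScript; the Python sketch below only illustrates that polling loop. The terminal status values ``succeeded`` and ``failed`` and the JSON shape are assumptions, and ``fetch_status`` is injected so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

# Assumed terminal states of a live evaluation run (hypothetical names).
TERMINAL_STATUSES = {"succeeded", "failed"}


def http_status_fetcher(status_url: str) -> Callable[[], Dict]:
    """Return a callable that GETs the status URL and decodes its JSON body."""
    def fetch() -> Dict:
        with urllib.request.urlopen(status_url) as response:
            return json.load(response)
    return fetch


def wait_for_live_run(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the run reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError("live evaluation did not finish before the timeout")
        sleep(poll_interval)
```

Polling keeps the client simple and matches the status-URL design; a push mechanism would avoid repeated requests but needs a long-lived connection.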

.. figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif
   :alt: Live evaluation demo in VulnTotal browser extension
   :align: center
   :width: 70%

   VulnTotal and its browser extension consuming the new live evaluation API.

Linked Pull Requests
--------------------

.. list-table::
   :widths: 10 40 20
   :header-rows: 1

   * - Sr. no
     - Name
     - Link
   * - 1
     - Add Live Evaluation API endpoint and PyPa live pipeline importer
     - `aboutcode-org/vulnerablecode#1969
       <https://github.com/aboutcode-org/vulnerablecode/pull/1969>`_
   * - 2
     - Add Gitlab Live V2 Importer
     - `aboutcode-org/vulnerablecode#1910
       <https://github.com/aboutcode-org/vulnerablecode/pull/1910>`_
   * - 3
     - Add Curl Live Importer V2
     - `aboutcode-org/vulnerablecode#1923
       <https://github.com/aboutcode-org/vulnerablecode/pull/1923>`_
   * - 4
     - Add Elixir Security Live V2 Importer
     - `aboutcode-org/vulnerablecode#1935
       <https://github.com/aboutcode-org/vulnerablecode/pull/1935>`_
   * - 5
     - Add NPM Live Importer V2
     - `aboutcode-org/vulnerablecode#1941
       <https://github.com/aboutcode-org/vulnerablecode/pull/1941>`_
   * - 6
     - Add GitHub OSV Live V2 Importer Pipeline
     - `aboutcode-org/vulnerablecode#1977
       <https://github.com/aboutcode-org/vulnerablecode/pull/1977>`_
   * - 7
     - Add Postgres Live V2 Importer Pipeline
     - `aboutcode-org/vulnerablecode#1982
       <https://github.com/aboutcode-org/vulnerablecode/pull/1982>`_
   * - 8
     - Add PySec Live V2 Importer Pipeline
     - `aboutcode-org/vulnerablecode#1983
       <https://github.com/aboutcode-org/vulnerablecode/pull/1983>`_
   * - 9
     - Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation
     - `aboutcode-org/vulnerablecode#1985
       <https://github.com/aboutcode-org/vulnerablecode/pull/1985>`_
   * - 10
     - Integrate Local VulnerableCode datasource and live evaluation
     - `aboutcode-org/vulntotal-extension#17
       <https://github.com/aboutcode-org/vulntotal-extension/pull/17>`_


Closing Thoughts
----------------

This project was an exciting step forward from my 2024 GSoC work. By moving
from batch importers to package-first live importers, we enabled a faster,
more personalized, and more flexible way of building vulnerability databases.

I especially enjoyed designing the **registry + API architecture** and
integrating Redis queues and workers for asynchronous execution. This improved
scalability, responsiveness, and fault tolerance: the API does not block while
evaluations run, and multiple live evaluations can run in parallel. I also
valued discussing the design with my mentors and integrating it across
**VulnerableCode, VulnTotal, and the browser extension**.

This work lays the foundation for even richer interactivity
in the ecosystem and brings vulnerability evaluation closer
to developers' workflows.

I appreciated the weekly status calls and the feedback I received from my
mentors and the amazing team, who were very helpful and supportive:
`Philippe Ombredanne <https://github.com/pombredanne>`_,
`Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_,
`Tushar Goel <https://github.com/TG1999>`_, and
`Keshav Priyadarshi <https://github.com/keshav-space>`_.
