| 1 | +VulnerableCode: On-demand live evaluation of packages |
| 2 | +===================================================== |
| 3 | + |
| 4 | +Organization - `AboutCode <https://www.aboutcode.org>`_ |
| 5 | +----------------------------------------------------------- |
| 6 | +| **Michael Ehab Mikhail** |
| 7 | +| GitHub: `michaelehab <https://github.com/michaelehab>`_ |
| 8 | +| LinkedIn: `@michaelehab16 <https://www.linkedin.com/in/michaelehab16/>`_ |
| 9 | +| Project: `VulnerableCode |
| 10 | + <https://github.com/aboutcode-org/vulnerablecode>`_ |
| 11 | +| Official GSoC project page: `Project Link |
| 12 | + <https://summerofcode.withgoogle.com/programs/2025/projects/uF0kzMAg>`_ |
| 13 | +| GSoC Proposal: `Proposal Link |
| 14 | + <https://docs.google.com/document/d/1Tkk4MoPWXFj9r_U5cp3E4AhJW6QlHxTElyzpII_f4LM/edit?usp=sharing>`_ |
| 15 | + |
| 16 | +Overview |
| 17 | +-------- |
| 18 | + |
| 19 | +VulnerableCode traditionally relied on **batch importers** to fetch |
| 20 | +and store all advisories from a source at once. While effective for |
| 21 | +building complete databases, batch importers are slow and |
| 22 | +resource-heavy for developers who only need vulnerability |
| 23 | +data for a **single package**. |
| 24 | + |
| 25 | +This project introduces **live importers**, a new class of |
| 26 | +importers that operate in a *package-first* mode. Instead of |
| 27 | +pulling all advisories, they run against a single |
| 28 | +PackageURL (PURL), returning only the advisories affecting |
| 29 | +that package. This makes vulnerability evaluation |
| 30 | +**faster, more efficient, and more personalized**, since the |
| 31 | +database is gradually filled with only the advisories |
| 32 | +that matter to each user. |
| 33 | + |
| 34 | +To support this, I added: |
| 35 | + |
| 36 | +* A new **LIVE_IMPORTERS_REGISTRY** that tracks available live importers. |
| 37 | +* A new **API endpoint** that accepts a **PURL**, enqueues compatible |
| 38 | + live importer pipelines into a Redis queue, and executes them asynchronously |
| 39 | + via workers. |
| 40 | +* Integration with **VulnTotal** and its **browser extension**, enabling users |
| 41 | + to evaluate packages in real-time through a seamless interface. |
| 42 | + |
| 43 | +This work bridges the gap between **batch-first databases** and |
| 44 | +**package-first queries**, improving VulnerableCode's flexibility and enabling |
| 45 | +better integration with developer workflows. |
| 46 | + |
| 47 | +.. note:: |
| 48 | + A PURL (Package URL) is a universal way to identify and locate software |
| 49 | + packages. `More on PURL <https://github.com/package-url/purl-spec>`_ |
| 50 | + |
| 51 | + |
| 52 | +Project Design and Architecture |
| 53 | +------------------------------- |
| 54 | + |
| 55 | +The new live importers system builds on existing batch importers, while introducing |
| 56 | +a parallel registry and asynchronous execution model for package-first runs. |
| 57 | + |
| 58 | +Importer Registries |
| 59 | +^^^^^^^^^^^^^^^^^^^ |
| 60 | + |
| 61 | +* ``IMPORTERS_REGISTRY`` continues to hold batch importers (V1/V2). |
| 62 | +* ``LIVE_IMPORTERS_REGISTRY`` holds live importers. |
| 63 | + |
| 64 | +Each live importer: |
| 65 | + |
| 66 | +* Inherits from its batch importer (when logic can be reused), or directly |
| 67 | + from ``VulnerableCodeBaseImporterPipelineV2`` when a separate |
| 68 | + implementation is needed. |
| 69 | +* Declares a ``supported_types`` array, defining compatible package |
| 70 | + ecosystems (``"pypi"``, ``"npm"``, ``"maven"``, ``"generic"``, etc.). |
| 71 | +* Implements a package-first ``collect_advisories()`` method, which |
| 72 | + restricts results to advisories relevant to the given PURL. |
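
The three properties listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: ``LiveImporterBase`` is a hypothetical stand-in for ``VulnerableCodeBaseImporterPipelineV2``, the ``pipeline_id`` attribute, the fake upstream data, and the dict-shaped registry are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Type


class LiveImporterBase:
    """Hypothetical stand-in for the real V2 base pipeline class."""

    pipeline_id: str = ""
    supported_types: List[str] = []

    def __init__(self, purl: str) -> None:
        self.purl = purl

    def collect_advisories(self) -> Iterator[dict]:
        raise NotImplementedError


class ExamplePyPILiveImporter(LiveImporterBase):
    """Illustrative live importer that only serves ``pypi`` packages."""

    pipeline_id = "example_pypi_live"
    supported_types = ["pypi"]

    # Fake upstream advisories, used only to make the sketch runnable.
    _UPSTREAM = [
        {"id": "EX-1", "package": "pkg:pypi/demo"},
        {"id": "EX-2", "package": "pkg:pypi/other"},
    ]

    def collect_advisories(self) -> Iterator[dict]:
        # Package-first: yield only advisories for the requested package.
        package = self.purl.split("@", 1)[0]  # drop the version part
        for advisory in self._UPSTREAM:
            if advisory["package"] == package:
                yield advisory


# Assumed registry shape: pipeline id -> importer class.
LIVE_IMPORTERS_REGISTRY: Dict[str, Type[LiveImporterBase]] = {
    cls.pipeline_id: cls for cls in [ExamplePyPILiveImporter]
}
```

Calling ``list(ExamplePyPILiveImporter("pkg:pypi/demo@1.0").collect_advisories())`` in this sketch yields only the ``EX-1`` entry, which is the behaviour a package-first ``collect_advisories()`` is meant to have.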
| 73 | + |
| 74 | +Live importer executions are asynchronous: once triggered, they are placed in |
| 75 | +a Redis-backed job queue and processed by dedicated workers. This prevents |
| 76 | +blocking the main API thread and allows multiple evaluations to run safely |
| 77 | +in parallel. |
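
The queue-and-workers pattern described above can be illustrated with the standard library alone. The sketch below deliberately swaps the Redis-backed queue for an in-process ``queue.Queue`` and the dedicated workers for threads, so it shows the shape of the idea rather than the project's implementation. The job dictionaries and the ``"completed"`` status string are assumptions.

```python
import queue
import threading
from typing import Dict, List


def run_jobs(job_list: List[dict], num_workers: int = 2) -> Dict[str, str]:
    """Process jobs on background workers and return a status per pipeline."""
    jobs: queue.Queue = queue.Queue()
    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: no more work for this worker
                    return
                # Placeholder for actually running an importer pipeline.
                status = "completed"
                with lock:
                    results[job["pipeline_id"]] = status
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for job in job_list:
        jobs.put(job)  # the caller returns immediately after enqueueing
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results
```

Because the caller only enqueues work, it is never blocked by a slow importer, and several jobs can be picked up by different workers at the same time.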
| 78 | + |
| 79 | +.. figure:: /_static/gsoc2025/vulnerablecode_michael/registries.png |
| 80 | + :alt: Class architecture of importers registries |
| 81 | + :align: center |
| 82 | + :width: 70% |
| 83 | + |
| 84 | + Class architecture showing relationship between ``IMPORTERS_REGISTRY`` and |
| 85 | + ``LIVE_IMPORTERS_REGISTRY``. |
| 86 | + |
| 87 | +API Endpoint |
| 88 | +^^^^^^^^^^^^ |
| 89 | + |
| 90 | +The new API endpoint is responsible for handling live evaluation requests. |
| 91 | + |
| 92 | +* Input: |
| 93 | + |
| 94 | + * ``purl`` (required) |
| 95 | +* Execution: |
| 96 | + |
| 97 | + * Checks ``LIVE_IMPORTERS_REGISTRY`` for importers whose ``supported_types`` |
| 98 | + match the PURL. |
| 99 | + * Enqueues the pipeline runs of these live importers in a ``live`` RQ queue. |
| 100 | + * Returns the **Live Run ID**, information about the pipelines to |
| 101 | + run, and the status URL. |
| 102 | + * The status URL shows the current state of a live evaluation run |
| 103 | + and its individual pipeline runs. |
| 104 | + |
| 105 | +* Output: |
| 106 | + |
| 107 | + * Once workers complete execution, the resulting advisories are imported |
| 108 | + into the database and exposed as JSON through the status endpoint. |
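
The request flow above (match the PURL type, enqueue one job per compatible importer, return a run ID and status URL) might look like the sketch below. Everything here is hypothetical: the registry contents, the response keys, the status URL path, and the in-memory ``queue.Queue`` standing in for the ``live`` RQ queue. The PURL parsing is also simplified and only extracts the type.

```python
import queue
import uuid
from typing import Dict, List

# Assumed registry shape for this sketch: pipeline id -> supported types.
EXAMPLE_REGISTRY: Dict[str, List[str]] = {
    "example_pypi_live": ["pypi"],
    "example_npm_live": ["npm"],
    "example_multi_live": ["pypi", "generic"],
}


def purl_type(purl: str) -> str:
    """Return the package type of a PURL such as ``pkg:pypi/django@4.2``."""
    scheme, _, rest = purl.partition(":")
    rest = rest.lstrip("/")
    if scheme != "pkg" or "/" not in rest:
        raise ValueError(f"Not a valid PURL: {purl!r}")
    return rest.split("/", 1)[0].lower()


def start_live_evaluation(purl: str, registry: Dict[str, List[str]],
                          job_queue: queue.Queue) -> dict:
    """Enqueue compatible pipelines and describe the run to the caller."""
    package_type = purl_type(purl)
    pipeline_ids = [
        pipeline_id
        for pipeline_id, supported_types in registry.items()
        if package_type in supported_types
    ]
    run_id = str(uuid.uuid4())
    for pipeline_id in pipeline_ids:
        job_queue.put({"run_id": run_id, "pipeline_id": pipeline_id, "purl": purl})
    return {
        "live_run_id": run_id,
        "pipelines": pipeline_ids,
        # Hypothetical path; the real status URL may look different.
        "status_url": f"/live-runs/{run_id}/status",
    }
```

A client would then poll the returned status URL until every pipeline run in the group has finished.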
| 109 | + |
| 110 | +.. figure:: /_static/gsoc2025/vulnerablecode_michael/live_pipeline_run.png |
| 111 | + :alt: Live Pipeline Run Class |
| 112 | + :align: center |
| 113 | + :width: 70% |
| 114 | + |
| 115 | + Live Pipeline Run Class and how it groups multiple PipelineRuns. |
| 116 | + |
| 117 | +.. figure:: /_static/gsoc2025/vulnerablecode_michael/api.png |
| 118 | + :alt: Live Importers API request flow |
| 119 | + :align: center |
| 120 | + :width: 70% |
| 121 | + |
| 122 | + Flow of API endpoint: selecting compatible live importers and executing |
| 123 | + them in parallel. |
| 124 | + |
| 125 | +Integration with VulnTotal |
| 126 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 127 | + |
| 128 | +The new API was integrated into VulnTotal as an optional datasource: |
| 129 | + |
| 130 | +* VulnTotal now reads the ``VCIO_HOST``, ``VCIO_PORT``, and |
| 131 | + ``ENABLE_LIVE_EVAL`` settings from the local ``.env`` file. |
| 132 | +* If enabled, VulnTotal queries VulnerableCode in package-first mode. |
| 133 | +* This allows VulnTotal to use both its proprietary datasources **and** |
| 134 | + the user's gradually built local database, improving coverage and |
| 135 | + personalization. |
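
As a rough illustration of how these three settings could be consumed, the sketch below reads them from a mapping such as ``os.environ``. The default host and port, the set of accepted truthy strings, and the ``http`` scheme are assumptions for the example, not VulnTotal's documented behaviour.

```python
import os
from typing import Mapping, Optional


def live_eval_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Derive the local VulnerableCode base URL and the live-eval switch."""
    env = os.environ if env is None else env
    # Assumed convention: a handful of common truthy spellings enable it.
    flag = env.get("ENABLE_LIVE_EVAL", "").strip().lower()
    enabled = flag in {"1", "true", "yes", "on"}
    host = env.get("VCIO_HOST", "localhost")  # assumed default
    port = env.get("VCIO_PORT", "8000")  # assumed default
    return {"enabled": enabled, "base_url": f"http://{host}:{port}"}
```

Passing an explicit mapping instead of relying on the process environment keeps the function easy to exercise in isolation.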
| 136 | + |
| 137 | +Integration with VulnTotal Browser Extension |
| 138 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 139 | + |
| 140 | +The VulnTotal browser extension was updated to support live importers: |
| 141 | + |
| 142 | +* Users can enable the "Local VulnerableCode" datasource and live evaluation option. |
| 143 | +* When enabled, package lookups are forwarded to the new API, retrieving |
| 144 | + advisories in real-time. |
| 145 | +* This reduces setup effort: developers can get live vulnerability checks |
| 146 | + directly in their browser, provided they have a local VulnerableCode instance. |
| 147 | + |
| 148 | +.. figure:: /_static/gsoc2025/vulnerablecode_michael/extension_demo.gif |
| 149 | + :alt: Live evaluation demo in VulnTotal browser extension |
| 150 | + :align: center |
| 151 | + :width: 70% |
| 152 | + |
| 153 | + VulnTotal and its browser extension consuming the new live evaluation API. |
| 154 | + |
| 155 | +Linked Pull Requests |
| 156 | +-------------------- |
| 157 | + |
| 158 | +.. list-table:: |
| 159 | + :widths: 10 40 20 |
| 160 | + :header-rows: 1 |
| 161 | + |
| 162 | + * - Sr. no |
| 163 | + - Name |
| 164 | + - Link |
| 165 | + * - 1 |
| 166 | + - Add Live Evaluation API endpoint and PyPa live pipeline importer |
| 167 | + - `aboutcode-org/vulnerablecode#1969 |
| 168 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1969>`_ |
| 169 | + * - 2 |
| 170 | + - Add Gitlab Live V2 Importer |
| 171 | + - `aboutcode-org/vulnerablecode#1910 |
| 172 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1910>`_ |
| 173 | + * - 3 |
| 174 | + - Add Curl Live Importer V2 |
| 175 | + - `aboutcode-org/vulnerablecode#1923 |
| 176 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1923>`_ |
| 177 | + * - 4 |
| 178 | + - Add Elixir Security Live V2 Importer |
| 179 | + - `aboutcode-org/vulnerablecode#1935 |
| 180 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1935>`_ |
| 181 | + * - 5 |
| 182 | + - Add NPM Live Importer V2 |
| 183 | + - `aboutcode-org/vulnerablecode#1941 |
| 184 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1941>`_ |
| 185 | + * - 6 |
| 186 | + - Add GitHub OSV Live V2 Importer Pipeline |
| 187 | + - `aboutcode-org/vulnerablecode#1977 |
| 188 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1977>`_ |
| 189 | + * - 7 |
| 190 | + - Add Postgres Live V2 Importer Pipeline |
| 191 | + - `aboutcode-org/vulnerablecode#1982 |
| 192 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1982>`_ |
| 193 | + * - 8 |
| 194 | + - Add PySec Live V2 Importer Pipeline |
| 195 | + - `aboutcode-org/vulnerablecode#1983 |
| 196 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1983>`_ |
| 197 | + * - 9 |
| 198 | + - Add Local VulnerableCode Datasource in VulnTotal and allow live evaluation |
| 199 | + - `aboutcode-org/vulnerablecode#1985 |
| 200 | + <https://github.com/aboutcode-org/vulnerablecode/pull/1985>`_ |
| 201 | + * - 10 |
| 202 | + - Integrate Local VulnerableCode datasource and live evaluation |
| 203 | + - `aboutcode-org/vulntotal-extension#17 |
| 204 | + <https://github.com/aboutcode-org/vulntotal-extension/pull/17>`_ |
| 205 | + |
| 206 | + |
| 207 | +Closing Thoughts |
| 208 | +------------------- |
| 209 | + |
| 210 | +This project was an exciting step forward from my 2024 GSoC work. By moving |
| 211 | +from batch importers to package-first live importers, we enabled a faster, |
| 212 | +more personalized, and more flexible way of building vulnerability databases. |
| 213 | + |
| 214 | +I especially enjoyed designing the **registry + API architecture** and |
| 215 | +integrating Redis queues and workers for asynchronous execution. This improved |
| 216 | +scalability, responsiveness, and fault tolerance, ensuring the API never blocks |
| 217 | +and multiple live evaluations can run in parallel. I also appreciated discussing |
| 218 | +the design with my mentors and integrating it across |
| 219 | +**VulnerableCode, VulnTotal, and the browser extension**. |
| 220 | + |
| 221 | +This work lays the foundation for even richer interactivity |
| 222 | +in the ecosystem and brings vulnerability evaluation closer |
| 223 | +to developers' workflows. |
| 224 | + |
| 225 | +I appreciated the weekly status calls and the feedback I received from my |
| 226 | +mentors and the amazing team, who were really helpful and supportive: |
| 227 | +`Philippe Ombredanne <https://github.com/pombredanne>`_, |
| 228 | +`Ayan Sinha Mahapatra <https://github.com/AyanSinhaMahapatra>`_, |
| 229 | +`Tushar Goel <https://github.com/TG1999>`_, |
| 230 | +`Keshav Priyadarshi <https://github.com/keshav-space>`_ |