Commit 055f10a

Merge pull request #1060 from google/badges
Update README badges
2 parents 0758673 + bd12bcd commit 055f10a

2 files changed, 20 additions and 17 deletions

README.md

Lines changed: 7 additions & 3 deletions
@@ -1,12 +1,16 @@
11
# Magika
22

3-
[![image](https://img.shields.io/pypi/v/magika.svg)](https://pypi.python.org/pypi/magika)<!-- [![image](https://img.shields.io/pypi/l/magika.svg)](https://pypi.python.org/pypi/magika) -->
3+
[![image](https://img.shields.io/pypi/v/magika.svg)](https://pypi.python.org/pypi/magika)
4+
[![NPM Version](https://img.shields.io/npm/v/magika)](https://npmjs.com/package/magika)
5+
[![image](https://img.shields.io/pypi/l/magika.svg)](https://pypi.python.org/pypi/magika)
46
[![image](https://img.shields.io/pypi/pyversions/magika.svg)](https://pypi.python.org/pypi/magika)
5-
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/google/magika/badge)](https://securityscorecards.dev/viewer/?uri=github.com/google/magika)
67
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8706/badge)](https://www.bestpractices.dev/en/projects/8706)
78
![CodeQL](https://github.com/google/magika/workflows/CodeQL/badge.svg)
89
[![Actions status](https://github.com/google/magika/actions/workflows/python-build-package.yml/badge.svg)](https://github.com/google/magika/actions)
9-
[![PyPI Monthly Downloads](https://img.shields.io/pypi/dm/magika)](https://pypi.org/project/magika/)
10+
[![PyPI Monthly Downloads](https://static.pepy.tech/badge/magika/month)](https://pepy.tech/projects/magika)
11+
[![PyPI Downloads](https://static.pepy.tech/badge/magika)](https://pepy.tech/projects/magika)
12+
13+
<!-- [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/google/magika/badge)](https://securityscorecards.dev/viewer/?uri=github.com/google/magika) -->
1014

1115
Magika is a novel AI-powered file type detection tool that relies on the recent advance of deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized model that only weighs about a few MBs, and enables precise file identification within milliseconds, even when running on a single CPU. Magika has been trained and evaluated on a dataset of ~100M samples across 200+ content types (covering both binary and textual file formats), and it achieves an average ~99% accuracy on our test set.
1216

python/README.md

Lines changed: 13 additions & 14 deletions
@@ -1,13 +1,16 @@
11
# Magika Python Package
22

3-
[![image](https://img.shields.io/pypi/v/magika.svg)](https://pypi.python.org/pypi/magika)<!-- [![image](https://img.shields.io/pypi/l/magika.svg)](https://pypi.python.org/pypi/magika) -->
3+
[![image](https://img.shields.io/pypi/v/magika.svg)](https://pypi.python.org/pypi/magika)
4+
[![NPM Version](https://img.shields.io/npm/v/magika)](https://npmjs.com/package/magika)
5+
[![image](https://img.shields.io/pypi/l/magika.svg)](https://pypi.python.org/pypi/magika)
46
[![image](https://img.shields.io/pypi/pyversions/magika.svg)](https://pypi.python.org/pypi/magika)
5-
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/google/magika/badge)](https://securityscorecards.dev/viewer/?uri=github.com/google/magika)
67
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8706/badge)](https://www.bestpractices.dev/en/projects/8706)
78
![CodeQL](https://github.com/google/magika/workflows/CodeQL/badge.svg)
89
[![Actions status](https://github.com/google/magika/actions/workflows/python-build-package.yml/badge.svg)](https://github.com/google/magika/actions)
9-
[![PyPI Monthly Downloads](https://img.shields.io/pypi/dm/magika)](https://pypi.org/project/magika/)
10+
[![PyPI Monthly Downloads](https://static.pepy.tech/badge/magika/month)](https://pepy.tech/projects/magika)
11+
[![PyPI Downloads](https://static.pepy.tech/badge/magika)](https://pepy.tech/projects/magika)
1012

13+
<!-- [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/google/magika/badge)](https://securityscorecards.dev/viewer/?uri=github.com/google/magika) -->
1114

1215
Magika is a novel AI-powered file type detection tool that relies on the recent advance of deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized model that only weighs about a few MBs, and enables precise file identification within milliseconds, even when running on a single CPU. Magika has been trained and evaluated on a dataset of ~100M samples across 200+ content types (covering both binary and textual file formats), and it achieves an average ~99% accuracy on our test set.
1316

@@ -19,12 +22,12 @@ You can find more information on which content types are supported, extended doc
1922
2023
> **IMPORTANT**: This latest 0.6.1 version has a few breaking changes from the latest stable version, 0.5.1. Please consult the [CHANGELOG.md](https://github.com/google/magika/blob/main/python/CHANGELOG.md#061---2025-03-19) and the [migration guide](https://github.com/google/magika/blob/main/python/CHANGELOG.md#breaking-changes-and-migration-guide).
2124
22-
2325
## Installing Magika
2426

2527
Magika is available as `magika` on [PyPI](https://pypi.org/project/magika):
2628

2729
To install the most recent stable version:
30+
2831
```shell
2932
$ pip install magika
3033
```
@@ -33,7 +36,6 @@ If you intend to use Magika only as a command line, you may want to use `$ pipx
3336

3437
If you want to test out the latest release candidate, you can install it with `pip install --pre magika`.
3538

36-
3739
## Using Magika as a command-line tool
3840

3941
> Beginning with version `0.6.0`, the magika Python package includes a pre-compiled Rust-based command-line tool, replacing the previous Python version. This binary is distributed as platform-specific wheels for most common architectures. For unsupported platforms, a pure-Python wheel is also available, providing the legacy Python client as a fallback.
@@ -168,10 +170,8 @@ Options:
168170
Print version
169171
```
170172

171-
172173
Check the [Rust CLI docs](https://github.com/google/magika/blob/main/rust/cli/README.md) for more information.
173174

174-
175175
## Using Magika as a Python module
176176

177177
> Note: The Python API introduced in version `0.6.0` closely resembles the previous version, but includes several enhancements and a few breaking changes. Migrating existing clients should be relatively straightforward. Where possible, we have maintained compatibility with the old API and added deprecation warnings. For a complete list of changes and migration guidance, consult the [CHANGELOG.md](https://github.com/google/magika/blob/main/python/CHANGELOG.md).
@@ -203,26 +203,26 @@ ini
203203
ini
204204
```
205205

206-
207206
## Documentation on core concepts
208207

209208
To get the most out of Magika, it's worth learning about its core concepts. You can read about the models, prediction modes, output structure, and content type knowledge base in the documentation [here](https://github.com/google/magika/blob/main/docs/concepts.md).
210209

211-
212210
### API documentation
213211

214212
First, create a `Magika` instance: `magika = Magika()`.
215213

216214
The constructor accepts the following optional arguments:
215+
217216
- `model_dir`: path to a model to use; defaults to the latest available model.
218217
- `prediction_mode`: which prediction mode to use; defaults to `PredictionMode.HIGH_CONFIDENCE`.
219218
- `no_dereference`: controls whether symlinks should be dereferenced; defaults to `False`.
220219

221220
Once instantiated, the `Magika` object exposes methods to identify the content type of a `bytes` object, of files identified by their paths, and of an already-open binary stream:
221+
222222
- `magika.identify_bytes(b"test")`: takes as input a `bytes` object and predicts its content type.
223223
- `magika.identify_path("test.txt")`: takes as input one `str | os.PathLike` object and predicts its content type.
224224
- `magika.identify_paths(["test.txt", "test2.txt"])`: takes as input a list of `str | os.PathLike` objects and returns the predicted type for each of them.
225-
- `magika.identify_stream(stream: typing.BinaryIO)`: takes as input an *already open* binary file-like object (e.g., the output of `open(file_path, 'rb')`) and returns its predicted content type. Keep in mind that Magika will `seek()` around the stream, and that the stream *is not closed* (closing is the responsibility of the caller).
225+
- `magika.identify_stream(stream: typing.BinaryIO)`: takes as input an _already open_ binary file-like object (e.g., the output of `open(file_path, 'rb')`) and returns its predicted content type. Keep in mind that Magika will `seek()` around the stream, and that the stream _is not closed_ (closing is the responsibility of the caller).
226226

227227
If you are dealing with large files, the `identify_path`, `identify_paths`, and `identify_stream` variants are generally better: their implementation `seek()`s around the file/stream to extract the needed features, without loading the entire content in memory.
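
To illustrate why seeking is cheaper than loading a whole file, here is a minimal sketch of reading only the beginning and the end of an already-open binary stream. It is not Magika's implementation: the helper name `sample_head_and_tail`, the `block_size` parameter, and the choice of which regions to read are assumptions made for illustration only. Unlike the behavior described above, this sketch restores the caller's stream position before returning; like `identify_stream`, it leaves the stream open.

```python
import io
from typing import BinaryIO, Tuple


def sample_head_and_tail(stream: BinaryIO, block_size: int = 4096) -> Tuple[bytes, bytes]:
    """Hypothetical helper: read up to `block_size` bytes from the start and
    from the end of a seekable binary stream, without reading the middle."""
    original_pos = stream.tell()  # remember where the caller left the stream
    try:
        size = stream.seek(0, io.SEEK_END)  # seek() returns the new absolute position
        stream.seek(0)
        head = stream.read(min(block_size, size))
        stream.seek(max(size - block_size, 0))
        tail = stream.read(block_size)
        return head, tail
    finally:
        # Restore the position; the stream is intentionally NOT closed here.
        stream.seek(original_pos)


# Usage with an in-memory stream; a file from open(path, "rb") works the same way.
demo = io.BytesIO(b"A" * 10 + b"B" * 10)
demo.seek(5)
head, tail = sample_head_and_tail(demo, block_size=4)
```

Only two small reads are issued regardless of the stream's size, which is the property that makes the path-based and stream-based variants preferable for large files.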
228228

@@ -267,25 +267,24 @@ class ContentTypeLabel(StrEnum):
267267
[...]
268268
```
269269

270-
271270
### Additional APIs
272271

273272
- `get_output_content_types()`: Returns a list of all possible content type labels that Magika can output (i.e., the possible values of `MagikaResult.prediction.output.label`). This is the recommended method for most users that want to have a list of what is the output space of Magika.
274-
- `get_model_content_types()`: Returns a list of all possible content type labels the *deep learning model* can output (i.e., `MagikaResult.prediction.dl.label`). Useful for debugging, most users should refer to `get_output_content_types()`.
273+
- `get_model_content_types()`: Returns a list of all possible content type labels the _deep learning model_ can output (i.e., `MagikaResult.prediction.dl.label`). Useful for debugging, most users should refer to `get_output_content_types()`.
275274
- `get_module_version()` and `get_model_version()`: Returns the module version and the model's name being used, respectively.
276275

277-
278276
## Development setup
279277

280278
- `magika` uses `uv` as a project and dependency management tool. To install all the dependencies: `$ cd python; uv sync`.
281279
- To run the test suite: `$ cd python; uv run pytest tests -m "not slow"`. Check the GitHub Actions workflows for more information.
282280
- We use the `maturin` backend to combine the Rust CLI with the python codebase in the `magika` python package. This process is automated via the [build python package GitHub action](https://github.com/google/magika/blob/main/.github/workflows/python-build-package.yml).
283281

284-
285282
## Research Paper and Citation
283+
286284
We describe how we developed Magika and the choices we made in our research paper, which was accepted at the International Conference on Software Engineering (ICSE) 2025. A pre-print of our paper is available on arxiv: [https://arxiv.org/abs/2409.13768](https://arxiv.org/abs/2409.13768).
287285

288286
If you use this software for your research, please cite it as:
287+
289288
```bibtex
290289
@InProceedings{fratantonio25:magika,
291290
author = {Yanick Fratantonio and Luca Invernizzi and Loua Farah and Kurt Thomas and Marina Zhang and Ange Albertini and Francois Galilee and Giancarlo Metitieri and Julien Cretin and Alexandre Petit-Bianco and David Tao and Elie Bursztein},
