This repository was archived by the owner on Jun 3, 2025. It is now read-only.
* altered emoji and title font sizes to match other readmes
* fix yaml code block indentation
* aligned indentation 2nd time
* fix yaml indentation
* edited tables to sync with docs, added urls for new readmes, and edited grammar
* removed border
* fixed resources section
* altered urls to tasks in the nlp inference section
* edited grammar and URL issues
* edited grammar
* updating squad model stubs
README.md: 34 additions & 41 deletions
@@ -76,25 +76,25 @@ pip install deepsparse
## 🔌 DeepSparse Server
- The DeepSparse Server allows you to serve models and pipelines in deployment in CLI. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command:
+ The DeepSparse Server allows you to serve models and pipelines from the terminal. The server runs on top of the popular FastAPI web framework and Uvicorn web server. Install the server using the following command:
```bash
pip install deepsparse[server]
```
- **⭐ Single Model ⭐**
+ ### Single Model
Once installed, the following example CLI command is available for running inference with a single BERT model:
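What follows is a minimal sketch of such an invocation, assuming the question-answering task and an illustrative SparseZoo model stub; exact flag spellings and stub paths vary between DeepSparse releases.

```bash
# Illustrative only: serve a single BERT question-answering model.
# The stub below is a placeholder; any SparseZoo stub or local ONNX path can be used.
deepsparse.server \
    --task question_answering \
    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95"
```

Once the server is running, requests go to its HTTP endpoint; FastAPI also exposes interactive API docs at `/docs` by default.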
To look up arguments run: `deepsparse.server --help`.
- **⭐ Multiple Models ⭐**
+ ### Multiple Models
To serve multiple models in your deployment you can easily build a `config.yaml`. In the example below, we define two BERT models in our configuration for the question answering task:
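A sketch of what such a `config.yaml` can look like, assuming two question-answering endpoints backed by illustrative SparseZoo stubs; the key names, aliases, and stub paths are placeholders, and the exact schema depends on the server version.

```yaml
# Illustrative config.yaml: two BERT question-answering models served side by side.
# Keys and model stubs are placeholders; see the server docs for the exact schema.
models:
  - task: question_answering
    model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
    batch_size: 1
    alias: question_answering/sparse
  - task: question_answering
    model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
    batch_size: 1
    alias: question_answering/dense
```

The server is then pointed at the file rather than a single model, for example `deepsparse.server --config_file config.yaml` (flag name assumed; check `deepsparse.server --help`).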
[Getting Started with CLI Benchmarking](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark_model) includes examples of select inference scenarios:
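As a rough illustration, a benchmark run is a single command against a model path or SparseZoo stub. The entry-point name below follows the linked directory and was renamed to `deepsparse.benchmark` in later releases; the stub is a placeholder.

```bash
# Illustrative benchmark invocation; scenario, batch-size, and core-count
# options are covered in the linked guide.
deepsparse.benchmark_model \
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95"
```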
DeepSparse can accept ONNX models from two sources:
- 1. `SparseZoo ONNX`: our open-source collection of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML.](https://github.com/neuralmagic/sparseml)
- 2. `Custom ONNX`: Your own ONNX model, can be dense or sparse. Plug in your model to compare performance with other solutions.
+ - **SparseZoo ONNX**: our open-source collection of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).
+ - **Custom ONNX**: your own ONNX model, can be dense or sparse. Plug in your model to compare performance with other solutions.
- ONNX IR version has not been tested at this time
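Either source is supplied the same way wherever a model path is expected; for example (flags and paths are illustrative, matching the assumed server syntax above):

```bash
# SparseZoo stub: the model is downloaded and cached automatically (stub is a placeholder).
deepsparse.server --task question_answering \
    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-aggressive_95"

# Custom ONNX: point at a local file, dense or sparse.
deepsparse.server --task question_answering \
    --model_path ./my_model.onnx
```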
The [GitHub repository](https://github.com/neuralmagic/deepsparse) includes package APIs along with examples to quickly get started benchmarking and inferencing sparse models.
- __ __
## Scheduling Single-Stream, Multi-Stream, and Elastic Inference
The DeepSparse Engine offers up to three types of inferences based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md).
@@ -216,7 +215,6 @@ PRO TIP: The most common use cases for the multi-stream scheduler are where para
3 ⚡ Elastic scheduling: requests execute in parallel, but not multiplexed on individual NUMA nodes.
Use Case: A workload that might benefit from the elastic scheduler is one in which multiple requests need to be handled simultaneously, but where performance is hindered when those requests have to share an L3 cache.
- __ __
## 🧰 CPU Hardware Support
@@ -233,34 +231,29 @@ Here is a table detailing specific support for some algorithms over different mi
## Resources
- <table>
- <tr><th> Documentation </th><th> Versions </th><th> Info </th></tr>
@@ -270,7 +263,7 @@ Here is a table detailing specific support for some algorithms over different mi
Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! [Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md)
- For user help or questions about DeepSparse, sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community.
+ For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community.
For more general questions about Neural Magic, complete this [form.](http://neuralmagic.com/contact/)