This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Commit 500d132

Add Twitter NLP sentiment analysis example to accompany the corresponding video and social push (#311)
* Add Twitter NLP sentiment analysis example to accompany the corresponding video and social push
* minor fixes
* updates from reviews
* make style
* update sentiment analysis models
* update for make style
1 parent 0b5f6a0 commit 500d132

File tree

5 files changed: +344 −1 lines changed

examples/twitter-nlp/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Twitter NLP Inference Examples

This directory contains examples for scraping, processing, and classifying Twitter data
using the DeepSparse engine for a >=10x increase in inference performance on commodity CPUs.

## Installation

The dependencies for this example can be installed using `pip`:
```bash
pip3 install -r requirements.txt
```

## Sentiment Analysis Example

The `analyze_sentiment.py` script analyzes and classifies tweets as either positive or negative
based on their contents.
For example, you can analyze the general sentiment of crypto or other common topics across Twitter.

To use it, first gather the desired number of tweets for your topic(s) and save them as a text file to use with `analyze_sentiment.py`.
The script expects one tweet per row, with each tweet formatted as a JSON object containing a `"tweet"` key that maps to the text content.
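For illustration, a minimal sketch of that file format and how it can be parsed (the tweet texts here are hypothetical, not from a real scrape):

```python
import json

# Two example lines of a tweets file: each line is a standalone
# JSON object with a "tweet" key mapping to the text content.
lines = [
    '{"tweet": "Loving the new release!"}',
    '{"tweet": "Not a fan of the latest update."}',
]

# Parse one tweet per line, the same shape analyze_sentiment.py expects
tweets = [json.loads(line)["tweet"] for line in lines]
print(tweets)
```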

An example script, `scrape.py`, is given to show this in action.
Note that it uses the Twint library, which does not abide by Twitter's terms of service.
The script is given as an example only; users are expected to use Twitter's developer pathways and APIs in place of this script.
```bash
python scrape.py --topic '#crypto' --total_tweets 1000
```

Next, use `analyze_sentiment.py` along with sparsified sentiment analysis models from the [SparseZoo](https://sparsezoo.neuralmagic.com/?domain=nlp&sub_domain=sentiment_analysis&page=1)
to performantly analyze the general sentiment across the gathered tweets:
```bash
python analyze_sentiment.py \
    --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/12layer_pruned80_quant-none-vnni" \
    --tweets_file "#crypto.txt"
```
examples/twitter-nlp/analyze_sentiment.py

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# flake8: noqa

"""
Script to analyze the sentiment of a given file of tweets from Twitter
in batch processing mode.

##########
Command help:
Usage: analyze_sentiment.py [OPTIONS]

  Analyze the sentiment of the tweets given in the tweets_file and print out
  the results.

Options:
  --model_path TEXT       The path to the sentiment analysis model to
                          load. Either a model.onnx file, a model folder
                          containing the model.onnx and supporting files, or a
                          SparseZoo model stub.
  --tweets_file TEXT      The path to the tweets json txt file to analyze
                          sentiment for.
  --batch_size INTEGER    The batch size to process the tweets with. A higher
                          batch size may increase performance at the expense
                          of memory resources and individual latency.
  --total_tweets INTEGER  The total number of tweets to analyze from the
                          tweets_file. Defaults to None which will run through
                          all tweets contained in the file.
  --help                  Show this message and exit.

##########
Example running a sparse, quantized sentiment analysis model:
python analyze_sentiment.py \
    --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/12layer_pruned80_quant-none-vnni" \
    --tweets_file /PATH/TO/OUTPUT/FROM/scrape.py

##########
Example running a dense, unoptimized sentiment analysis model:
python analyze_sentiment.py \
    --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/base-none" \
    --tweets_file /PATH/TO/OUTPUT/FROM/scrape.py
"""

import json
from itertools import cycle, islice
from typing import Any, Dict, List, Optional

import click

from deepsparse.transformers import pipeline
from rich import print


def _load_tweets(tweets_file: str):
    # Read one JSON object (one tweet) per line
    tweets = []
    with open(tweets_file, "r") as file:
        for line in file.readlines():
            tweets.append(json.loads(line))

    return tweets


def _prep_data(tweets: List[Dict], total_num: int) -> List[str]:
    # If total_num is set, cycle through the tweets until total_num is reached
    if total_num:
        tweets = islice(cycle(tweets), total_num)

    return [tweet["tweet"].strip().replace("\n", "") for tweet in tweets]


def _batched_model_input(tweets: List[str], batch_size: int) -> Optional[List[str]]:
    # Pop the next full batch off the front of tweets;
    # returns None once fewer than batch_size tweets remain
    if batch_size > len(tweets):
        return None

    batched = tweets[0:batch_size]
    del tweets[0:batch_size]

    return batched


def _classified_positive(sentiment: Dict[str, Any]):
    return sentiment["label"] == "LABEL_1"


def _display_results(batch, sentiments):
    for text, sentiment in zip(batch, sentiments):
        color = "green" if _classified_positive(sentiment) else "magenta"
        print(f"[{color}]{text}[/{color}]")


@click.command()
@click.option(
    "--model_path",
    type=str,
    help="The path to the sentiment analysis model to load. "
    "Either a model.onnx file, a model folder containing the model.onnx "
    "and supporting files, or a SparseZoo model stub.",
)
@click.option(
    "--tweets_file",
    type=str,
    help="The path to the tweets json txt file to analyze sentiment for.",
)
@click.option(
    "--batch_size",
    type=int,
    default=16,
    help="The batch size to process the tweets with. "
    "A higher batch size may increase performance at the expense of memory resources "
    "and individual latency.",
)
@click.option(
    "--total_tweets",
    type=int,
    default=None,
    help="The total number of tweets to analyze from the tweets_file. "
    "Defaults to None which will run through all tweets contained in the file.",
)
def analyze_tweets_sentiment(
    model_path: str, tweets_file: str, batch_size: int, total_tweets: int
):
    """
    Analyze the sentiment of the tweets given in the tweets_file and
    print out the results.
    """
    text_pipeline = pipeline(
        task="text-classification",
        model_path=model_path,
        batch_size=batch_size,
    )
    tweets = _load_tweets(tweets_file)
    tweets = _prep_data(tweets, total_tweets)
    tot_sentiments = []

    while True:
        batch = _batched_model_input(tweets, batch_size)
        if batch is None:
            break
        sentiments = text_pipeline(batch)
        _display_results(batch, sentiments)
        tot_sentiments.extend(sentiments)

    num_positive = sum(
        [1 if _classified_positive(sent) else 0 for sent in tot_sentiments]
    )
    num_negative = sum(
        [1 if not _classified_positive(sent) else 0 for sent in tot_sentiments]
    )
    print("\n\n\n")
    print("###########################################################################")
    print(f"Completed analyzing {len(tot_sentiments)} tweets for sentiment.")

    if num_positive >= num_negative:
        print(
            f"[green]General sentiment is positive with "
            f"{100*num_positive/float(len(tot_sentiments)):.0f}% in favor.[/green]"
        )
    else:
        print(
            f"[magenta]General sentiment is negative with "
            f"{100*num_negative/float(len(tot_sentiments)):.0f}% against.[/magenta]"
        )
    print("###########################################################################")


if __name__ == "__main__":
    analyze_tweets_sentiment()
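The `_batched_model_input` helper above pops fixed-size batches off the front of the tweet list and stops once a full batch can no longer be formed, so any final partial batch is dropped. A standalone sketch of that pattern, with the model call left out and hypothetical item strings:

```python
from typing import List, Optional


def batched_input(items: List[str], batch_size: int) -> Optional[List[str]]:
    # Pop the next full batch off the front; None once fewer
    # than batch_size items remain (partial batch is dropped)
    if batch_size > len(items):
        return None
    batch = items[:batch_size]
    del items[:batch_size]
    return batch


items = [f"tweet {i}" for i in range(5)]
batches = []
while True:
    batch = batched_input(items, 2)
    if batch is None:
        break
    batches.append(batch)

print(batches)  # two full batches of 2; "tweet 4" is left unprocessed
```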
examples/twitter-nlp/requirements.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
click==8.0.1
deepsparse>=0.11
git+https://github.com/twintproject/twint@e7c8a0c764f6879188e5c21e25fb6f1f856a7221#egg=twint
rich==12.2.0

examples/twitter-nlp/scrape.py

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Twitter scraping script using Twint.
Give a topic, or multiple, and it will pull down the desired number of tweets
that match.
Writes the results as JSON to the given output_file.
If None given, will write the results under a new file named after the given topic.


##########
Command help:
Usage: scrape.py [OPTIONS]

  Twitter scraping script using Twint. Give a topic, or multiple, and it will
  pull down the desired number of tweets that match. Writes the results as
  JSON to the given output_file. If None given, will write the results under a
  new file named after the given topic.

Options:
  -t, --topic TEXT        The topics to scrape twitter for, either keywords or
                          hashtags. For example: '--topic #crypto'. Multiple
                          topics can be used as well, for example: '-t #crypto
                          -t #bitcoin'
  --total_tweets INTEGER  The total number of tweets to gather from Twitter.
                          Note, the API used from Twitter has a maximum date
                          range of around 1 week.
  --output_file TEXT      The output file to write the tweets to. If not
                          supplied, will create a new file using the topics as
                          names.
  --help                  Show this message and exit.

##########
Example command for scraping Twitter for #crypto tweets:
python scrape.py --topic '#crypto' --total_tweets 1000
"""

from typing import List, Optional

import click
import twint


@click.command()
@click.option(
    "--topic",
    "-t",
    multiple=True,
    help="The topics to scrape twitter for, either keywords or hashtags. "
    "For example: '--topic #crypto'. "
    "Multiple topics can be used as well, for example: '-t #crypto -t #bitcoin'",
)
@click.option(
    "--total_tweets",
    type=int,
    default=100,
    help="The total number of tweets to gather from Twitter. "
    "Note, the API used from Twitter has a maximum date range of around 1 week.",
)
@click.option(
    "--output_file",
    type=str,
    default=None,
    help="The output file to write the tweets to. "
    "If not supplied, will create a new file using the topics as names.",
)
def scrape_tweets(topic: List[str], total_tweets: int, output_file: Optional[str]):
    """
    Twitter scraping script using Twint.
    Give a topic, or multiple, and it will pull down the desired number of tweets
    that match.
    Writes the results as JSON lines as text to the given output_file.
    If None given, will write the results under a new file named after the given topic.
    """
    print(
        "WARNING: Twint does not abide by Twitter's terms of service. "
        "The script listed here is given only as an example for searching. "
        "Users should use Twitter's accepted APIs and developer console for search."
    )
    config = twint.Config()
    topics_str = " ".join(
        [f"({top})" if top.startswith("#") else top for top in topic]
    )  # wrap hashtag topics in parentheses for the search query
    config.Custom_query = (
        f"{topics_str} min_faves:2 lang:en -filter:links -filter:replies "
    )
    config.Limit = total_tweets
    config.Store_json = True
    config.Output = f"{'_'.join(topic)}.txt" if not output_file else output_file

    print(f"Scraping {total_tweets} tweets")
    twint.run.Search(config)
    print(f"Finished scraping, tweets written to {config.Output}")


if __name__ == "__main__":
    scrape_tweets()
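The custom query above wraps hashtag topics in parentheses before joining them with Twitter search operators; a standalone sketch of that formatting step (the topic list is hypothetical):

```python
topics = ["#crypto", "#bitcoin", "ethereum"]

# Wrap hashtag topics in parentheses, as scrape.py does when building
# its Custom_query string for the Twitter search
topics_str = " ".join(f"({top})" if top.startswith("#") else top for top in topics)
query = f"{topics_str} min_faves:2 lang:en -filter:links -filter:replies"
print(query)  # (#crypto) (#bitcoin) ethereum min_faves:2 lang:en -filter:links -filter:replies
```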

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ ensure_newline_before_comments = True
 force_grid_wrap = 0
 include_trailing_comma = True
 known_first_party = deepsparse,sparsezoo
-known_third_party = bs4,requests,packaging,setuptools,numpy,onnx,onnxruntime,flask,flask_cors,tqdm,transformers,pydantic,click,yaml
+known_third_party = bs4,requests,packaging,setuptools,numpy,onnx,onnxruntime,flask,flask_cors,tqdm,transformers,pydantic,click,yaml,twint,colorama
 sections = FUTURE,STDLIB,THIRDPARTY,FIRSTPARTY,LOCALFOLDER

 line_length = 88
