Description
In the fall, PSAW worked for me. Currently, it does not. I can import PSAW but when I try to execute
from psaw import PushshiftAPI
# Initialize PushShift
api = PushshiftAPI()
the system tries to initialize the API for 5 or 6 minutes, then times out.
I'm looking for a replacement and think PRAW might be suitable. (I'm running this in Colab, so the !pip install commands are needed to load PRAW into the virtual environment.) One drawback is that students will need to set up a Reddit account (if they don't already have one) and then create an API key. For read-only access, though, they don't need to go through the OAuth process.
The code needs to be edited to reflect the different way PRAW returns data from Reddit, but it is pretty straightforward, as the following snippets show.
!pip install praw
!pip install --upgrade https://github.com/praw-dev/praw/archive/master.zip
import praw
reddit = praw.Reddit(
client_id="xxx",
client_secret="xxx",
user_agent="script by u/xxx",
)
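Since each student will have their own API key, one option is to keep the key out of the shared notebook and read it from environment variables instead. The sketch below is only an illustration: the helper name `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` are made up for this example and are not something PRAW requires.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect the keyword arguments for praw.Reddit from environment variables.

    The variable names below are hypothetical, chosen for this sketch only.
    """
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a readable message if a student forgot to set something.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Usage (assumes praw is installed and the three variables are set):
# import praw
# reddit = praw.Reddit(**load_reddit_credentials())
```

In Colab the variables could be set in a separate cell with `os.environ[...] = ...` before this helper is called, so the cell that creates the `reddit` object never shows a key.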
# snippet to show how to get similar data from AITA as in the textbook
for submission in reddit.subreddit("AmITheAsshole").hot(limit=10):
    print(submission.selftext)
# to search ALL subreddits by keyword, PRAW uses a different syntax:
for submission in reddit.subreddit("all").search("Missy Elliott", limit=10):
    print(f"{submission.subreddit} | {submission.title} | {submission.selftext:<30}\n")
# PRAW results can be put into a DF pretty easily:
# this code is adapted from https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c
import pandas as pd

rows = []  # collect one dict per post, then build the DataFrame once
# loop through each post returned by the search
for post in reddit.subreddit("all").search("Missy Elliott", limit=100):
    # keep only the fields needed for the analysis
    rows.append({
        'subreddit': str(post.subreddit),  # Subreddit object -> its name
        'title': post.title,
        'selftext': post.selftext,
        'upvote_ratio': post.upvote_ratio,
        'ups': post.ups,
        'downs': post.downs,
        'score': post.score
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from the list
df = pd.DataFrame(rows)
Once the data about Missy Elliott is in the DF, the parts of the lesson that relate to analyzing the DF should follow.
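To check the row-building step before students have working Reddit credentials, the conversion can be tried on stand-in objects. This is a rough sketch, not part of PRAW: `submission_to_row` and `FIELDS` are hypothetical names, and the fake post below is placeholder data that only mimics the attributes used above.

```python
from types import SimpleNamespace

# Attributes copied into each row, matching the columns used above.
FIELDS = ("title", "selftext", "upvote_ratio", "ups", "downs", "score")


def submission_to_row(post):
    """Turn one submission-like object into a plain dict of column values."""
    row = {"subreddit": str(post.subreddit)}
    for name in FIELDS:
        # Use None when an attribute is absent instead of raising.
        row[name] = getattr(post, name, None)
    return row


# Stand-in object so the sketch runs without network access or an API key.
fake_posts = [
    SimpleNamespace(
        subreddit="example_subreddit",
        title="Example title",
        selftext="Example body",
        upvote_ratio=0.9,
        ups=10,
        downs=0,
        score=10,
    ),
]
rows = [submission_to_row(post) for post in fake_posts]
```

With real data, the same list comprehension could run over the search results, and `pd.DataFrame(rows)` would give the DataFrame the rest of the lesson works with.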