BlueSkyScraper

A simple BlueSky Scraper in Python3 using Selenium Libray

In this example, I was collecting posts that cite DOIs. You can change the search URL to what you want. A file with all the links on the page and the full page will be saved at the end. BlueSky enables you to scroll back 6 months of posts and also you may need to limit your scrape based on the amount of memory you have. A chromium browser with the correct chromedriver is required. The script will wait for you to log into your account and change the view tab from top to latest. You can type "E" anytime to end the scrape. This code was generated with the help of ChatGPT 3.5

Dependencies: Ensure you have the necessary libraries installed:

pip install selenium pandas webdriver-manager keyboard

Running the Script:

Run the script.
Wait for the browser to open and load the page.
Log to your BlueSky Account - optional
Manually click on the "Latest" tab.
Press Enter in the console where the script is running to continue execution.
The script will scroll down and automatically save the files either when the end of the page is reached or when you press 'E'.

python3 blueskyscrape.py

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
blueskyscrape.py		blueskyscrape.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BlueSkyScraper

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

BlueSkyScraper

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages