njbowen/Cellosaurus_Ancestry

Cellosaurus_Ancestry

A Python script to stream, filter, and process the Cellosaurus XML for cell line ancestry values

Cellosaurus XML Processor

This project contains a Python script that streams and processes the XML file from the Cellosaurus database. The script keeps only the cell lines that have ancestry data and outputs, for each one, the disease site, ancestry percentages, and other key information.

Generated Files

CSV file of cell lines and ancestry

XLSX file of cell lines and ancestry

Features

  • Streaming: Processes large XML files efficiently, without loading them entirely into memory.
  • Filtering: Keeps only the cell lines that have ancestry data.
  • Output: Generates a CSV file with the results and an Excel file with the same results sorted by descending African ancestry percentage as reported by Cellosaurus.
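The three features above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual script: the element and attribute names (`cell-line`, `genome-ancestry`, `population-list`, `population-name`, `population-percentage`) are assumptions about the Cellosaurus XML layout, the sample data is invented, and `iter_ancestry_rows` is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny invented stand-in for the Cellosaurus XML.
# Element and attribute names are assumptions about the real schema.
SAMPLE_XML = b"""<Cellosaurus>
  <cell-line-list>
    <cell-line>
      <name-list><name type="identifier">Line-A</name></name-list>
      <genome-ancestry>
        <population-list>
          <population population-name="African" population-percentage="12.5"/>
          <population population-name="European" population-percentage="87.5"/>
        </population-list>
      </genome-ancestry>
    </cell-line>
    <cell-line>
      <name-list><name type="identifier">Line-B</name></name-list>
    </cell-line>
    <cell-line>
      <name-list><name type="identifier">Line-C</name></name-list>
      <genome-ancestry>
        <population-list>
          <population population-name="African" population-percentage="80.0"/>
        </population-list>
      </genome-ancestry>
    </cell-line>
  </cell-line-list>
</Cellosaurus>"""


def iter_ancestry_rows(source):
    """Yield one dict per cell line that carries ancestry data (hypothetical helper)."""
    # iterparse reads the file incrementally instead of building the whole tree first.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "cell-line":
            continue
        populations = elem.findall("./genome-ancestry/population-list/population")
        if populations:  # filtering step: skip cell lines without ancestry data
            row = {"name": elem.findtext("./name-list/name", default="")}
            for pop in populations:
                row[pop.get("population-name")] = float(pop.get("population-percentage"))
            yield row
        elem.clear()  # drop the processed element's children to keep memory low


# Sort by descending African ancestry; a missing value is treated as 0.
rows = sorted(
    iter_ancestry_rows(io.BytesIO(SAMPLE_XML)),
    key=lambda r: r.get("African", 0.0),
    reverse=True,
)

# Write the result as CSV (to an in-memory buffer here, a file in practice).
fieldnames = ["name"] + sorted({key for r in rows for key in r if key != "name"})
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

On the real data, `iter_ancestry_rows` would presumably be given the path of the downloaded XML file rather than an in-memory buffer. An Excel copy could then be produced with pandas, for example by building a `DataFrame` from `rows` and calling `to_excel`, which relies on the `openpyxl` dependency listed under Installation.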

Installation

Clone this repository and install the required dependencies:

git clone https://github.com/njbowen/Cellosaurus_Ancestry.git
cd Cellosaurus_Ancestry
pip install pandas openpyxl requests
