ai-model-for-connectomics

AI Model for Connectomics is a project that investigates the neurotransmitter identities of ventral nerve cord (VNC) neurons in Drosophila.

Set up

Getting the notebooks running requires two things: a conda environment, and authentication tokens for FlyWire and CAVE, which are stored in your .env file.

Creating your `.env` file

You can start from .env.example. The file should define two variables:

CAVE_AUTH_TOKEN="this should be a 32 character long random string"
FLYWIRE_AUTH_TOKEN="this should be a 32 character long random string"

The load_data/connect_clients.py module will automatically generate new tokens for you to use if you don't have them stored in the .env file.
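As a rough illustration of how these two variables could be read from Python, the sketch below parses a .env-style file using only the standard library. The helper names `read_env_file` and `missing_tokens` are hypothetical and are not taken from load_data/connect_clients.py; the repository may well rely on a dedicated package for this instead.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY="value" lines from a .env-style file into a dict.

    Hypothetical helper for illustration; not the repository's own loader.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_tokens(values, required=("CAVE_AUTH_TOKEN", "FLYWIRE_AUTH_TOKEN")):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

Calling `missing_tokens(read_env_file(".env"))` before opening a notebook gives a quick check that both tokens are in place.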

Set up virtual environment

The project is designed to run in a conda environment. Use the standard commands to create and activate it:

conda env create -f environment.yml
conda activate fau_connectomics

notebooks

Jupyter notebooks that explore the data. Note that to access the CAVE and FlyWire clients you will need your own secret keys.

NB: The corresponding output data is stored in subfolders. For all the root IDs investigated, I suggest using the fauai-13_data folder.

fauai-9_GatherSynapsePredictions.ipynb

Purpose: Uses the CAVEclient and FlyWire to extract the Eckstein et al. synaptic predictions, look at their distributions, and get a general feel for the data.

Major findings:

Created a simple 8-panel plot to visualise the neurotransmitter predictions for all presynaptic terminals belonging to a given neuron.

Top row: Left to right
- Heatmap of NT predictions: row = each synapse, col = NT identity, colour indicates the softmax scores from the CNN.
- Vectors of heatmap predictions: each line represents the softmax scores for a particular presynapse.
- Stripplot showing the determined identity of each presynapse (based on the largest softmax score of the presynapse).
- Barplot showing the raw counts of presynapses for each NT.
Bottom row: Left to right
- Coronal (xy) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Sagittal (yz) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Horizontal (xz) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Barplot showing the proportion of presynapses for each NT.

We found that requiring a maximum softmax score of at least 0.25 per synapse removed many of the worst estimates.
We also saw that spatial positioning has to be considered, as there are neurotransmitter-specific hotspots. Even when such a hotspot contains few synapses, it appears to be important.
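The softmax threshold can be illustrated with a small sketch. It assumes each synapse comes with six softmax scores, one per neurotransmitter class of the Eckstein et al. classifier; the column order in `NT_NAMES` and the function name `assign_nt` are assumptions made for this example and may differ from the notebook.

```python
# Assumed column order of the softmax scores; check against the real data.
NT_NAMES = ("gaba", "acetylcholine", "glutamate", "serotonin", "octopamine", "dopamine")


def assign_nt(softmax_rows, nt_names=NT_NAMES, min_confidence=0.25):
    """Label each synapse with its highest-scoring NT.

    Synapses whose best softmax score is below min_confidence get None,
    so they can be dropped from later counts.
    """
    labels = []
    for scores in softmax_rows:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_index] >= min_confidence:
            labels.append(nt_names[best_index])
        else:
            labels.append(None)
    return labels
```

With six classes the largest softmax score can never fall below 1/6, so a cut-off of 0.25 only discards synapses whose prediction is close to uniform.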

fauai-10_IdentifyCotransmission.ipynb

Purpose: Building on the initial exploration, this notebook develops methods to identify co-transmitting neurons using both ratio-based thresholds and spatial clustering approaches.

Major findings:

Implemented mean-shift clustering to identify spatial clusters of synapses with consistent neurotransmitter predictions
Established dual criteria for cotransmission detection:
- Ratio threshold method: NT must comprise a minimum percentage of total synapses
- Clustering method: Spatial clusters of synapses with consistent NT predictions
Combined both methods to create robust cotransmission predictions
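The ratio threshold method listed above could look roughly like the following. This is a sketch under stated assumptions rather than the notebook's code: the function names are hypothetical, the default of 0.2 borrows the ratio threshold reported for a later notebook, and ratios are computed over the synapses that kept a label (entries that are not `None`).

```python
from collections import Counter


def nts_above_ratio(nt_labels, ratio_threshold=0.2):
    """Return {nt: fraction} for NTs that reach ratio_threshold of the synapses.

    nt_labels holds one NT name per synapse; None marks a discarded synapse.
    """
    kept = [label for label in nt_labels if label is not None]
    if not kept:
        return {}
    counts = Counter(kept)
    return {
        nt: count / len(kept)
        for nt, count in counts.items()
        if count / len(kept) >= ratio_threshold
    }


def is_cotransmission_candidate(nt_labels, ratio_threshold=0.2):
    """Flag a neuron when at least two NTs pass the ratio test."""
    return len(nts_above_ratio(nt_labels, ratio_threshold)) >= 2
```

How this flag is combined with the clustering result is left open here, since the exact rule is not spelled out in this README.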

fauai-11_EvaluateAllVncCellTypes.ipynb

Purpose: Systematic evaluation of all VNC (Ventral Nerve Cord) cell types to identify neurotransmitter expression patterns and cotransmission across different neuron classifications.

Major findings:

Evaluated neurons across three classification systems: cell_type, cell_class, and cell_flow
Processed neurons in batches to handle large-scale analysis efficiently
Applied cotransmission detection methods across different cell type categories (afferent, efferent, ascending neurons, etc.)
Generated comprehensive results files organized by classification system
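Batch processing of this kind can be sketched with a small generator. The name `iter_batches` and the batch size in the usage note are illustrative assumptions; the notebook may batch its root IDs differently.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

A loop such as `for batch in iter_batches(root_ids, 100): ...` then lets intermediate results be written to disk after each batch, so a failed request does not lose the whole run.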

fauai-12_ViewCoTransmissionPatterns.ipynb

Purpose: Visualization and statistical analysis of cotransmission patterns identified in the previous analyses.

Major findings:

Created comprehensive visualizations showing:
- Distribution of number of neurotransmitters per neuron
- Frequency of specific neurotransmitter combinations
- Cotransmission rates across different cell types
Identified potential false positives requiring further investigation
Established baseline statistics for cotransmission prevalence
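The first two summaries (neurotransmitters per neuron and combination frequencies) can be computed with a counter, as in this minimal sketch. The input format, a mapping from a neuron identifier to the NT names detected for it, and the function name are assumptions for illustration.

```python
from collections import Counter


def summarise_nt_combinations(nts_per_neuron):
    """Count NTs per neuron and how often each NT combination occurs.

    nts_per_neuron maps a neuron identifier to the NT names detected for it.
    Returns (neurons per NT count, neurons per sorted NT combination).
    """
    n_nts = Counter(len(set(nts)) for nts in nts_per_neuron.values())
    combos = Counter(
        " + ".join(sorted(set(nts))) for nts in nts_per_neuron.values()
    )
    return n_nts, combos
```

Sorting the names before joining them means that "gaba + glutamate" and "glutamate + gaba" are counted as the same combination.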

fauai-13_DetailedCotransmissionPatterns.ipynb

Purpose: In-depth investigation of cotransmission patterns to identify and address false positives, refining the detection methodology.

Major findings:

Investigated sources of false positives in cotransmission detection
Refined threshold parameters:
- Softmax threshold: 0.25
- Bandwidth quantile for clustering: 0.01
- Minimum synapses ratio: 0.05 (cluster must be ≥5% of neuron's synapses)
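The minimum synapses ratio can be illustrated as a filter applied after clustering. The sketch assumes that every synapse already carries a cluster label from a mean-shift step and an NT label from the softmax threshold; the function name and input layout are hypothetical, and taking the most common NT as a cluster's identity is a simplification of "consistent NT predictions".

```python
from collections import Counter


def significant_clusters(cluster_labels, nt_labels, min_synapses_ratio=0.05):
    """Return {cluster_id: dominant_nt} for clusters that are large enough.

    cluster_labels and nt_labels are parallel lists with one entry per synapse.
    A cluster is kept when it holds at least min_synapses_ratio of all synapses.
    """
    if len(cluster_labels) != len(nt_labels):
        raise ValueError("cluster_labels and nt_labels must have equal length")
    total = len(cluster_labels)
    nts_by_cluster = {}
    for cluster_id, nt in zip(cluster_labels, nt_labels):
        nts_by_cluster.setdefault(cluster_id, []).append(nt)
    kept = {}
    for cluster_id, nts in nts_by_cluster.items():
        if len(nts) / total >= min_synapses_ratio:
            # Simplification: the most common NT stands in for the cluster.
            kept[cluster_id] = Counter(nts).most_common(1)[0][0]
    return kept
```

If the clustering is done with scikit-learn, which is an assumption here, the bandwidth quantile would correspond to the `quantile` argument of `sklearn.cluster.estimate_bandwidth`.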
Tested various combinations of ratio and clustering thresholds
Generated detailed results organized by classification system (cell_type, cell_class, cell_flow)

fauai-14_Investigate_CellTypeClassification.ipynb

Purpose: Comprehensive analysis of neurotransmitter expression patterns across different cell type classifications.

Major findings:

Created clustered heatmaps showing NT expression patterns across cell types
Identified optimal threshold parameters:
- Ratio threshold: 0.2 (20%)
- Bandwidth: 0.01
- Min synapses ratio: 0.05
Analyzed cotransmission patterns for specific cell type groupings
Generated interactive visualizations for exploring cell type-specific NT patterns

fauai-15_EvaluateSpecificNeurons.ipynb

Purpose: Detailed evaluation of specific neurons requested by collaborators (Pena lab), including comprehensive visualizations of their neurotransmitter profiles.

Major findings:

Analyzed 10 specific neurons of interest provided by collaborators:
- All neurons were from the 'efferent' category in the 'flow' classification
Generated detailed 8-panel visualizations for each neuron showing:
- Synapse prediction heatmaps and vectors
- Spatial distribution of synapses on neuron skeleton (3 views)
- NT counts and ratios
- Softmax confidence distributions
Applied optimized threshold parameters (ratio: 0.2, bandwidth: 0.01, min_synapses: 0.05)
Created summary outputs (CSV and figures) for collaborator communication

fauai-16_AnalyseAllCODEXdata.ipynb

Purpose: Comprehensive analysis of all VNC CODEX data, synthesizing results from all previous analyses to identify overall cotransmission patterns and neurotransmitter combinations.

Major findings:

Consolidated data from all three classification systems (cell_type, cell_class, cell_flow)
Analyzed all possible NT combinations to identify common and rare patterns
Created hierarchical clustering visualizations showing:
- Cell types clustered by NT expression similarity
- Interactive plotly dendrograms with heatmaps
Identified final optimal threshold settings for the entire dataset
Generated comprehensive statistics on:
- NT expression frequencies across cell types
- Most common cotransmission combinations
- Cell type-specific NT profiles

Data

The data folder contains datasets and utilities for working with FAFB and hemibrain connectome data:

Supplemental Data from Paper

Located in supplemental_data_from_paper/:

DataS1_CellTypesUsedForGroundTruth.csv - Cell type classifications used for validation
DataS2_NeuronalReconstructionsUsedForGroundTruth.csv - Reference neuronal reconstructions
DataS3_HemiBrainReconstructionData.csv - Hemibrain connectome reconstruction data
DataS4_FAFBreconstruction.csv - FAFB reconstruction data (also available as simplified_DataS4.csv)
DataS6_SummaryResults.csv - Summary statistics and results
DataS7.csv - Additional supplementary data

Test Data from Collaborators

Located in test_data_from_alex/:

hemi_lineages.json - Hemibrain lineage information
skeletons.json / skeletons_fixed.json - Neuron skeleton data in JSON format
synapses_xyz_pos.json - Synapse spatial coordinates
single_neuron/ - Individual neuron data for testing

Utility Scripts

fix_json.py - Converts JSONL (JSON Lines) format to standard JSON array format
read_jsonl_example.py - Example script for reading JSONL files
single_neuron_presynapses.parquet - Parquet format data for single neuron presynaptic terminals
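The JSONL-to-JSON conversion described for fix_json.py can be sketched with the standard library alone. This is an illustration of the idea, not the script's actual contents; the function name and the choice to skip blank lines are assumptions.

```python
import json


def jsonl_to_json_array(jsonl_path, json_path):
    """Read one JSON value per line and write them out as a single JSON array.

    Returns the number of records written.
    """
    records = []
    with open(jsonl_path, encoding="utf-8") as source:
        for line in source:
            if line.strip():  # ignore blank lines
                records.append(json.loads(line))
    with open(json_path, "w", encoding="utf-8") as target:
        json.dump(records, target, indent=2)
    return len(records)
```

For example, `jsonl_to_json_array("skeletons.json", "skeletons_fixed.json")` would produce a file that `json.load` can read in one call, assuming skeletons.json is the JSONL input.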

load_data

The load_data folder contains Python modules for data loading and analysis:

Core Modules

connect_clients.py - Functions to connect to CAVE and FlyWire clients
- Handles authentication token management
- Provides secure connection setup with environment variable support
- Prompts for tokens if not found in .env file
get_synapse_locations.py - Functions to extract synapse location data from FAFB
- Initializes FAFB CAVE client
- Retrieves synapse coordinates and metadata
- Handles voxel resolution conversions
fafb_cotransmission_investigation.py - Core analysis functions for cotransmission detection
- get_codex_synapse_predictions() - Retrieves NT predictions for a given root ID
- get_neuron_skeleton() - Fetches neuron skeleton geometry
- identify_nt_contributions() - Analyzes NT composition per neuron
- Implements mean-shift clustering for spatial analysis
- Generates visualization functions for NT distributions
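The voxel resolution conversion mentioned for get_synapse_locations.py amounts to scaling each coordinate by the voxel size. The sketch below assumes a voxel size of 4 x 4 x 40 nm for FAFB; that value, the constant name, and both function names are assumptions and should be checked against the dataset information reported by the client.

```python
# Assumed (x, y, z) voxel size in nanometres; verify against the dataset.
FAFB_VOXEL_SIZE_NM = (4, 4, 40)


def voxels_to_nm(position, voxel_size=FAFB_VOXEL_SIZE_NM):
    """Convert an (x, y, z) voxel position to nanometres."""
    return tuple(coord * size for coord, size in zip(position, voxel_size))


def nm_to_voxels(position_nm, voxel_size=FAFB_VOXEL_SIZE_NM):
    """Convert an (x, y, z) position in nanometres back to voxel units."""
    return tuple(coord / size for coord, size in zip(position_nm, voxel_size))
```

Keeping synapse positions and skeleton coordinates in the same unit matters for the spatial clustering, because the z axis is much coarser than x and y in voxel space.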

Usage

These modules are imported by the analysis notebooks to provide consistent data access and processing functions. They require authentication tokens stored in a .env file (see setup instructions above).
