AI Model for Connectomics is a project that investigates the neurotransmitter identities of neurons in the ventral nerve cord (VNC) of Drosophila.
You need two things to get all the notebooks up and running: a conda environment, and authentication tokens for FlyWire and CAVE, which you will need to store in your .env file.
You can use the .env.example file as a template, but it should store these two variables:
CAVE_AUTH_TOKEN="this should be a 32 character long random string"
FLYWIRE_AUTH_TOKEN="this should be a 32 character long random string"
If these tokens are not stored in your .env file, the load_data/connect_clients.py module will generate new ones for you.
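A minimal sketch of the token lookup described above, using only the standard library. It assumes the variable names from .env.example; `read_env_file` and `get_token` are hypothetical helpers, not the actual functions in load_data/connect_clients.py, and the parser is a simplified stand-in for a library such as python-dotenv.

```python
import os
from getpass import getpass
from pathlib import Path


def read_env_file(path=".env"):
    """Minimal KEY="value" parser (a simplified stand-in for python-dotenv)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_token(name, env_path=".env", prompt=getpass):
    """Return a token from the environment, then from .env, else ask the user."""
    token = os.environ.get(name) or read_env_file(env_path).get(name)
    if not token:
        token = prompt(f"{name} not found; paste your token: ")
    return token
```

Checking the environment first means a token exported in your shell takes precedence over the .env file. Generating a brand-new token is not shown here; the returned string can be handed to the client, for example through the `auth_token` argument of `caveclient.CAVEclient`.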
The project is designed to run in a conda environment, so use the standard commands to create and activate it:
conda env create -f environment.yml
conda activate fau_connectomics
The Jupyter notebooks below explore the data. Note that to access the CAVE and FlyWire clients you will need your own secret tokens.
NB: the corresponding output data is stored in subfolders. For the full set of root IDs investigated, I suggest using the fauai-13_data folder.
Purpose: This notebook uses CAVEclient and FlyWire to extract the Eckstein et al. synaptic neurotransmitter predictions, examine their distributions and get a general feel for the data.
Major findings:
- Created an 8-panel plot visualising the predicted neurotransmitter of every presynaptic terminal belonging to a given neuron.
- Top row (left to right):
- Heatmap of NT predictions: rows = synapses, columns = NT identities, colour = softmax score from the CNN.
- Vectors of the heatmap predictions: each line shows the softmax scores of one presynapse.
- Strip plot of the assigned identity of each presynapse (the NT with the largest softmax score).
- Bar plot of the raw number of presynapses assigned to each NT.
- Bottom row (left to right):
- Coronal (xy) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Sagittal (yz) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Horizontal (xz) plot showing the structure of the neuron and the colour-coded positions of each presynaptic terminal.
- Bar plot of the proportion of presynapses assigned to each NT.
- We found that discarding synapses whose maximum softmax score was below 0.25 removed many of the worst estimates.
- We also saw that spatial position has to be considered, because the predictions form neurotransmitter-specific hotspots. Even when a hotspot contains only a few synapses, it seems to be vital.
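The thresholding and identity assignment above can be sketched with NumPy. This assumes the predictions arrive as an (n_synapses, 6) array of softmax scores; the class order in `NT_CLASSES` is a hypothetical placeholder, so match it to the columns of the actual prediction table.

```python
import numpy as np

# Hypothetical column order; match it to the real prediction table.
NT_CLASSES = np.array(["gaba", "acetylcholine", "glutamate", "serotonin", "octopamine", "dopamine"])


def assign_nt(softmax, threshold=0.25):
    """Drop low-confidence synapses and label the rest by their top-scoring NT.

    softmax: (n_synapses, n_classes) array of CNN softmax scores.
    Returns (keep, labels): a boolean mask over all synapses and the NT label
    of each kept synapse.
    """
    softmax = np.asarray(softmax, dtype=float)
    keep = softmax.max(axis=1) >= threshold      # the maximum score must reach the threshold
    labels = NT_CLASSES[softmax.argmax(axis=1)]  # identity = class with the largest score
    return keep, labels[keep]
```

With six classes a uniform prediction scores about 0.17 per class, so a 0.25 cut-off removes synapses where the CNN has no clear favourite.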
Purpose: Building on the initial exploration, this notebook develops methods to identify co-transmitting neurons using both ratio-based thresholds and spatial clustering approaches.
Major findings:
- Implemented mean-shift clustering to identify spatial clusters of synapses with consistent neurotransmitter predictions
- Established dual criteria for cotransmission detection:
- Ratio threshold method: NT must comprise a minimum percentage of total synapses
- Clustering method: Spatial clusters of synapses with consistent NT predictions
- Combined both methods to create robust cotransmission predictions
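A sketch of the two criteria using scikit-learn's `MeanShift` and `estimate_bandwidth`. Several details are assumptions rather than the notebook's confirmed method: positions are taken to be in physical units (e.g. nm), one bandwidth is estimated per neuron from all of its synapses, mean shift is then run on each NT's synapses separately, and the methods are combined as a union (an NT is reported if either flags it). Defaults follow the parameters listed for the later notebooks.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth


def ratio_nts(labels, ratio_threshold=0.2):
    """NTs making up at least `ratio_threshold` of the neuron's synapses."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return set()
    nts, counts = np.unique(labels, return_counts=True)
    return {str(nt) for nt, c in zip(nts, counts) if c / labels.size >= ratio_threshold}


def cluster_nts(positions, labels, bandwidth=None, quantile=0.01, min_synapses_ratio=0.05):
    """NTs with at least one mean-shift spatial cluster holding
    >= `min_synapses_ratio` of all the neuron's synapses."""
    positions = np.asarray(positions, dtype=float)
    labels = np.asarray(labels)
    if bandwidth is None:
        # Assumption: one bandwidth per neuron, estimated from all synapse positions
        bandwidth = estimate_bandwidth(positions, quantile=quantile)
    if bandwidth <= 0:
        return set()  # too few synapses for this quantile to give a usable bandwidth
    min_size = min_synapses_ratio * labels.size
    found = set()
    for nt in np.unique(labels):
        pts = positions[labels == nt]
        if len(pts) < min_size:
            continue  # this NT cannot reach the cluster-size threshold
        cluster_ids = MeanShift(bandwidth=bandwidth).fit(pts).labels_
        if np.bincount(cluster_ids).max() >= min_size:
            found.add(str(nt))
    return found


def cotransmitted_nts(positions, labels, ratio_threshold=0.2, min_synapses_ratio=0.05, **cluster_kwargs):
    # Assumption: an NT is reported if EITHER method flags it (set union)
    return ratio_nts(labels, ratio_threshold) | cluster_nts(
        positions, labels, min_synapses_ratio=min_synapses_ratio, **cluster_kwargs
    )
```

With `quantile=0.01`, `estimate_bandwidth` uses `int(0.01 * n)` neighbours per point (at least one), and because the query points include the point itself, a neuron with fewer than 200 synapses gets a bandwidth of 0; the sketch then reports no spatial clusters instead of raising an error.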
Purpose: Systematic evaluation of all VNC (Ventral Nerve Cord) cell types to identify neurotransmitter expression patterns and cotransmission across different neuron classifications.
Major findings:
- Evaluated neurons across three classification systems: cell_type, cell_class, and cell_flow
- Processed neurons in batches to handle large-scale analysis efficiently
- Applied cotransmission detection methods across different cell type categories (afferent, efferent, ascending neurons, etc.)
- Generated comprehensive results files organized by classification system
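Batch processing of this kind can be sketched in plain Python. `batched` and `process_in_batches` are hypothetical helpers, and `analyse_batch` stands in for whatever per-batch analysis the notebook runs (for example fetching predictions for a list of root IDs); it is assumed to return one result per root ID.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most `batch_size` consecutive items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def process_in_batches(root_ids, analyse_batch, batch_size=100):
    """Apply `analyse_batch` (list of root IDs -> list of results) batch by batch."""
    results = []
    for batch in batched(root_ids, batch_size):
        results.extend(analyse_batch(batch))
    return results
```

On Python 3.12 and later, the standard library's `itertools.batched` (which yields tuples) can replace the helper.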
Purpose: Visualization and statistical analysis of cotransmission patterns identified in the previous analyses.
Major findings:
- Created comprehensive visualizations showing:
- Distribution of number of neurotransmitters per neuron
- Frequency of specific neurotransmitter combinations
- Cotransmission rates across different cell types
- Identified potential false positives requiring further investigation
- Established baseline statistics for cotransmission prevalence
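The summary statistics listed above can be computed from a mapping of root IDs to detected NT sets. The input layout (`neuron_nts`, `neuron_cell_type`) is an assumption for illustration, and a neuron counts as cotransmitting here when more than one NT is detected.

```python
from collections import Counter, defaultdict


def cotransmission_summary(neuron_nts, neuron_cell_type):
    """neuron_nts: root_id -> set of detected NTs.
    neuron_cell_type: root_id -> cell type label.
    Returns (NTs-per-neuron distribution, combination counts, cotransmission rate per cell type).
    """
    n_nt_distribution = Counter(len(nts) for nts in neuron_nts.values())
    combination_counts = Counter(tuple(sorted(nts)) for nts in neuron_nts.values())
    per_type = defaultdict(lambda: [0, 0])  # cell type -> [n cotransmitting, n total]
    for root_id, nts in neuron_nts.items():
        counts = per_type[neuron_cell_type[root_id]]
        counts[0] += int(len(nts) > 1)
        counts[1] += 1
    rate_by_type = {ct: co / total for ct, (co, total) in per_type.items()}
    return n_nt_distribution, combination_counts, rate_by_type
```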
Purpose: In-depth investigation of cotransmission patterns to identify and address false positives, refining the detection methodology.
Major findings:
- Investigated sources of false positives in cotransmission detection
- Refined threshold parameters:
- Softmax threshold: 0.25
- Bandwidth quantile for clustering: 0.01
- Minimum synapses ratio: 0.05 (a cluster must contain ≥5% of the neuron's synapses)
- Tested various combinations of ratio and clustering thresholds
- Generated detailed results organized by classification system (cell_type, cell_class, cell_flow)
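Testing combinations of thresholds amounts to a grid search. In this sketch `detect` is a hypothetical callable wrapping whichever cotransmission detector is used, and the grid values are left to the caller; it simply records how many neurons are called cotransmitting for each parameter pair.

```python
from itertools import product


def sweep_thresholds(neurons, detect, ratio_thresholds, min_synapses_ratios):
    """Count cotransmitting neurons (more than one NT detected) for every
    combination of ratio threshold and minimum cluster-size ratio.

    neurons: per-neuron inputs handed to `detect` unchanged.
    detect:  callable(neuron, ratio_threshold, min_synapses_ratio) -> set of NTs.
    """
    neurons = list(neurons)  # allow a generator to be reused for every combination
    results = {}
    for ratio, min_ratio in product(ratio_thresholds, min_synapses_ratios):
        results[(ratio, min_ratio)] = sum(
            len(detect(neuron, ratio, min_ratio)) > 1 for neuron in neurons
        )
    return results
```

Plotting these counts against the parameters shows where small changes in a threshold stop changing the number of cotransmitting calls.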
Purpose: Comprehensive analysis of neurotransmitter expression patterns across different cell type classifications.
Major findings:
- Created clustered heatmaps showing NT expression patterns across cell types
- Identified optimal threshold parameters:
- Ratio threshold: 0.2 (20%)
- Bandwidth quantile: 0.01
- Min synapses ratio: 0.05
- Analyzed cotransmission patterns for specific cell type groupings
- Generated interactive visualizations for exploring cell type-specific NT patterns
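The matrix behind a clustered heatmap can be built with pandas and ordered with SciPy's `linkage` and `leaves_list`. The long-format input table (one row per detected NT per neuron, so neurons without any detected NT are not counted) and average linkage with Euclidean distance are assumptions; seaborn's `clustermap` performs a similar row and column clustering when drawing the figure.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def nt_expression_matrix(neuron_table: pd.DataFrame) -> pd.DataFrame:
    """Fraction of neurons in each cell type whose detected NTs include each NT.

    neuron_table: one row per (root_id, nt) detection, with columns
    'root_id', 'cell_type' and 'nt'.
    """
    n_per_type = neuron_table.groupby("cell_type")["root_id"].nunique()
    n_expressing = (
        neuron_table.groupby(["cell_type", "nt"])["root_id"].nunique().unstack(fill_value=0)
    )
    return n_expressing.div(n_per_type, axis=0)


def cluster_order(matrix: pd.DataFrame, method: str = "average") -> pd.DataFrame:
    """Reorder rows so cell types with similar NT profiles sit next to each other."""
    if len(matrix) < 2:
        return matrix
    tree = linkage(matrix.to_numpy(), method=method, metric="euclidean")
    return matrix.iloc[leaves_list(tree)]
```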
Purpose: Detailed evaluation of specific neurons requested by collaborators (Pena lab), including comprehensive visualizations of their neurotransmitter profiles.
Major findings:
- Analyzed 10 specific neurons of interest provided by collaborators:
- All neurons were from the 'efferent' category in the 'flow' classification
- Generated detailed 8-panel visualizations for each neuron showing:
- Synapse prediction heatmaps and vectors
- Spatial distribution of synapses on neuron skeleton (3 views)
- NT counts and ratios
- Softmax confidence distributions
- Applied optimized threshold parameters (ratio: 0.2, bandwidth quantile: 0.01, min synapses ratio: 0.05)
- Created summary outputs (CSV and figures) for collaborator communication
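The 8-panel layout described for the first notebook can be set up with Matplotlib as sketched below. The panel titles paraphrase that description, `eight_panel_figure` and `plot_nt_ratios` are hypothetical helpers, and only the NT-ratio panel is filled in as an example.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

PANEL_TITLES = [
    ["NT prediction heatmap", "Softmax score vectors", "Presynapse NT identity", "NT counts"],
    ["Coronal (xy)", "Sagittal (yz)", "Horizontal (xz)", "NT ratios"],
]


def eight_panel_figure(root_id, figsize=(20, 9)):
    """Create the empty 2 x 4 grid with one title per panel."""
    fig, axes = plt.subplots(2, 4, figsize=figsize)
    for titles, row in zip(PANEL_TITLES, axes):
        for title, ax in zip(titles, row):
            ax.set_title(title)
    fig.suptitle(f"Root ID {root_id}")
    return fig, axes


def plot_nt_ratios(ax, labels):
    """Fill the bottom-right panel: share of presynapses assigned to each NT."""
    nts, counts = np.unique(np.asarray(labels), return_counts=True)
    ax.bar(nts.tolist(), counts / counts.sum())
    ax.set_ylabel("Proportion of presynapses")
```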
Purpose: Comprehensive analysis of all VNC CODEX data, synthesizing results from all previous analyses to identify overall cotransmission patterns and neurotransmitter combinations.
Major findings:
- Consolidated data from all three classification systems (cell_type, cell_class, cell_flow)
- Analyzed all possible NT combinations to identify common and rare patterns
- Created hierarchical clustering visualizations showing:
- Cell types clustered by NT expression similarity
- Interactive Plotly dendrograms with heatmaps
- Identified final optimal threshold settings for the entire dataset
- Generated comprehensive statistics on:
- NT expression frequencies across cell types
- Most common cotransmission combinations
- Cell type-specific NT profiles
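Analysing all possible NT combinations means also counting the combinations that never occur. This sketch assumes the six transmitter classes of the Eckstein et al. predictions (GABA, acetylcholine, glutamate, serotonin, octopamine, dopamine); the exact label strings are illustrative and should match the prediction table.

```python
from collections import Counter
from itertools import combinations

# Assumption: the six NT classes of the Eckstein et al. predictions; names are illustrative
NT_CLASSES = ("acetylcholine", "dopamine", "gaba", "glutamate", "octopamine", "serotonin")


def all_combination_counts(neuron_nts, classes=NT_CLASSES):
    """Count neurons for every non-empty NT combination, including unseen ones (count 0)."""
    observed = Counter(frozenset(nts) for nts in neuron_nts)
    counts = {}
    for k in range(1, len(classes) + 1):
        for combo in combinations(classes, k):
            counts[combo] = observed[frozenset(combo)]
    return counts
```

Six classes give 2^6 - 1 = 63 non-empty combinations; sorting the result by count separates common patterns from rare or absent ones.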
The data folder contains datasets and utilities for working with FAFB and hemibrain connectome data:
Located in supplemental_data_from_paper/:
- DataS1_CellTypesUsedForGroundTruth.csv - Cell type classifications used for validation
- DataS2_NeuronalReconstructionsUsedForGroundTruth.csv - Reference neuronal reconstructions
- DataS3_HemiBrainReconstructionData.csv - Hemibrain connectome reconstruction data
- DataS4_FAFBreconstruction.csv - FAFB reconstruction data (also available as simplified_DataS4.csv)
- DataS6_SummaryResults.csv - Summary statistics and results
- DataS7.csv - Additional supplementary data
Located in test_data_from_alex/:
- hemi_lineages.json - Hemibrain lineage information
- skeletons.json / skeletons_fixed.json - Neuron skeleton data in JSON format
- synapses_xyz_pos.json - Synapse spatial coordinates
- single_neuron/ - Individual neuron data for testing
- fix_json.py - Converts JSONL (JSON Lines) format to standard JSON array format
- read_jsonl_example.py - Example script for reading JSONL files
- single_neuron_presynapses.parquet - Parquet format data for single neuron presynaptic terminals
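A minimal sketch of the JSONL-to-JSON conversion that fix_json.py performs, using only the standard library. This is an illustration of the format change, not the repository's script; `read_jsonl` and `jsonl_to_json` are hypothetical names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file: one JSON value per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def jsonl_to_json(src, dst):
    """Rewrite a JSONL file as a single JSON array; return the number of records."""
    records = read_jsonl(src)
    Path(dst).write_text(json.dumps(records), encoding="utf-8")
    return len(records)
```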
The load_data folder contains Python modules for data loading and analysis:
- connect_clients.py - Functions to connect to CAVE and FlyWire clients
- Handles authentication token management
- Provides secure connection setup with environment variable support
- Prompts for tokens if they are not found in the .env file
- get_synapse_locations.py - Functions to extract synapse location data from FAFB
- Initializes FAFB CAVE client
- Retrieves synapse coordinates and metadata
- Handles voxel resolution conversions
- fafb_cotransmission_investigation.py - Core analysis functions for cotransmission detection
- get_codex_synapse_predictions() - Retrieves NT predictions for a given root ID
- get_neuron_skeleton() - Fetches neuron skeleton geometry
- identify_nt_contributions() - Analyzes NT composition per neuron
- Implements mean-shift clustering for spatial analysis
- Provides visualization functions for NT distributions
These modules are imported by the analysis notebooks to provide consistent data access and processing functions. They require authentication tokens stored in a .env file (see the setup instructions above).
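The voxel resolution conversion handled in get_synapse_locations.py can be sketched as a per-axis scaling. This assumes the commonly used FAFB/FlyWire voxel size of 4 × 4 × 40 nm (x, y, z); check the resolution reported for the table you query before relying on it. The helper names are hypothetical.

```python
import numpy as np

# Assumption: FAFB/FlyWire voxel size of 4 x 4 x 40 nm (x, y, z)
FAFB_VOXEL_NM = np.array([4.0, 4.0, 40.0])


def voxels_to_nm(xyz_voxels, voxel_size=FAFB_VOXEL_NM):
    """Convert (n, 3) voxel coordinates to nanometres."""
    return np.asarray(xyz_voxels, dtype=float) * voxel_size


def nm_to_voxels(xyz_nm, voxel_size=FAFB_VOXEL_NM):
    """Inverse conversion, rounded to the nearest voxel."""
    return np.rint(np.asarray(xyz_nm, dtype=float) / voxel_size).astype(int)
```

Because the voxels are anisotropic, converting to nanometres before any distance-based step such as mean-shift clustering keeps z distances on the same scale as x and y.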