53 changes: 0 additions & 53 deletions README.md
@@ -224,59 +224,6 @@ the data that you want to integrate.
./weave.py --oncokb /path_to_file/test_genomics_oncokbannotation.csv
```


### Gene Ontology adapter

**Gene Ontology** is one of the largest biomedical knowledge bases. This
adapter integrates data about the molecular functions of gene products and
the biological processes in which these genes are involved.

- Molecular function: GO annotations with relation type `enables`
or `contributes_to`.
- Biological process: GO annotations with relation type `involved_in`.
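The mapping from GAF relation types to the two integrated categories can be sketched as a small lookup. This is a hypothetical helper, not code taken from `adapters/gene_ontology.py`; it also assumes that negated annotations (GAF qualifiers prefixed with `NOT|`) are skipped, which the text above does not state.

```python
from typing import Optional

# Relation types (GAF "Qualifier" values) listed above for each category.
MOLECULAR_FUNCTION = {"enables", "contributes_to"}
BIOLOGICAL_PROCESS = {"involved_in"}


def classify_qualifier(qualifier: str) -> Optional[str]:
    """Hypothetical helper: map a GAF qualifier to the category it feeds.

    Returns "molecular_function", "biological_process", or None when the
    annotation is not integrated.
    """
    # Assumption: negated annotations ("NOT|enables", ...) are skipped.
    if qualifier.startswith("NOT|"):
        return None
    if qualifier in MOLECULAR_FUNCTION:
        return "molecular_function"
    if qualifier in BIOLOGICAL_PROCESS:
        return "biological_process"
    return None  # other relation types, e.g. "located_in", are ignored
```

For example, `classify_qualifier("involved_in")` returns `"biological_process"`.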

**To integrate the data, three files are necessary:**
- `--gene_ontology` option for GO annotations in GAF format [Download GO annotations](http://current.geneontology.org/products/pages/downloads.html)
- `--gene_ontology_owl` option for GO ontology in OWL format [Download GO ontology](https://geneontology.org/docs/download-ontology/)
- `--gene_ontology_genes` option for the list of genes whose GO annotations
should be integrated (see `adapters/Hugo_Symbol_genes.conf` for an example;
by default, the list of genes from the OncoKB database is used).
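The adapter's own loading code is not shown here. As a minimal sketch using only the standard library, the annotation and gene-list inputs could be read as follows. It assumes the genes file lists one HUGO symbol per line (with `#` comments allowed), and relies on GAF being a tab-separated format whose header lines start with `!`, with the DB Object Symbol, Qualifier and GO ID in columns 3, 4 and 5. The names `load_gene_list` and `read_gaf` are hypothetical, and the OWL ontology is not handled.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator

# GAF 2.x column indices (0-based).
SYMBOL_COL = 2     # DB Object Symbol
QUALIFIER_COL = 3  # Qualifier (relation type)
GO_ID_COL = 4      # GO ID


def load_gene_list(lines: Iterable[str]) -> set[str]:
    """Assumed format: one gene symbol per line; blanks and '#' lines ignored."""
    genes = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            genes.add(line)
    return genes


def read_gaf(lines: Iterable[str], genes: set[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (symbol, qualifier, GO ID) for annotations of the selected genes."""
    data_lines = (line for line in lines if not line.startswith("!"))
    # QUOTE_NONE: GAF fields are plain tab-separated text, quotes are not special.
    for row in csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > GO_ID_COL and row[SYMBOL_COL] in genes:
            yield row[SYMBOL_COL], row[QUALIFIER_COL], row[GO_ID_COL]
```

With real files, both functions accept open file objects, e.g. `read_gaf(open("goa_human.gaf"), load_gene_list(open("Hugo_Symbol_genes.conf")))`.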

**Example of use:**

``` sh
./weave.py --gene_ontology /path_to_file/goa_human.gaf --gene_ontology_owl /path_to_file/go.owl --gene_ontology_genes /path_to_file/Hugo_Symbol_genes.conf
```

To integrate annotations with other relation types, you can modify the
`adapters/gene_ontology.py` file by adding the following code to the
**class Gene_ontology** (example for the `involved_in` edge type):

``` python
# Create new columns that depend on the edge type.
df['GO_involved_in'] = None

# Keep only the chosen edge types and the annotations
# for genes from OncoKB.
df = df[((df['Qualifier'].isin(['enables', 'involved_in', 'contributes_to'])) &
         (df['DB_Object_Symbol'].isin(included_genes)))]
```
You also need to add code to the `separate_edges_types` method:

``` python
# Function to copy GO_term to the related column for future OntoWeaver mapping,
# based on the Qualifier column (relation type).
def separate_edges_types(row):
    if row['Qualifier'] == 'enables':
        row['GO_enables'] = row['GO_term']
    elif row['Qualifier'] == 'involved_in':
        row['GO_involved_in'] = row['GO_term']
    return row
```
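Instead of adding one `elif` branch per relation type, the same copying step could be driven by a lookup table. This is a minimal sketch, assuming the target columns follow the `GO_<qualifier>` naming used above; `GO_contributes_to` is a hypothetical column name that the adapter would also have to create.

```python
# Hypothetical table-driven variant of separate_edges_types.
# Each Qualifier value (relation type) names the column that receives GO_term.
QUALIFIER_TO_COLUMN = {
    "enables": "GO_enables",
    "involved_in": "GO_involved_in",
    "contributes_to": "GO_contributes_to",  # assumed column name
}


def separate_edges_types(row):
    """Copy GO_term into the column matching the row's Qualifier.

    Works on a plain dict or on a pandas row (Series); rows with an
    unmapped Qualifier are returned unchanged.
    """
    column = QUALIFIER_TO_COLUMN.get(row["Qualifier"])
    if column is not None:
        row[column] = row["GO_term"]
    return row
```

With pandas, it would be applied row by row with `df = df.apply(separate_edges_types, axis=1)`. Supporting a new relation type then means adding one dictionary entry, plus the matching column and the node and edge types in `gene_ontology.yaml`.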

Finally, you need to specify the node and edge types for the `GO_involved_in`
column in `gene_ontology.yaml`.


### Open Targets adapter

Open Targets is a public database that aims to systematically identify and
4 changes: 3 additions & 1 deletion config/biopathnet.yaml
@@ -9,9 +9,11 @@ biocypher:
root_node: entity

biopathnet:
file_format: txt
file_format: txt:bn
entity_types_file_stem: entity_types
entity_names_file_stem: entity_names
background_graph_file_stem: brg
skg_file_stem: skg
targeted_relation: "(alteration, variant biomarker for treatment, drug)"
include_properties: False

16 changes: 16 additions & 0 deletions config/owl.yaml
@@ -0,0 +1,16 @@
biocypher:
debug: false
offline: true
dbms: owl

# Ontology configuration
head_ontology:
url: https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl
root_node: entity

owl:
edge_model: ObjectProperty
file_format: turtle
labels_order: "Ascending" # Default: From more specific to more generic.
node_labels_order: "Ascending" # Default: use labels_order.
edge_labels_order: "Leaves"
118 changes: 81 additions & 37 deletions config/schema.yaml
@@ -4,12 +4,11 @@

# Defined in alphabetical order

alteration:
short mutation:
is_a: sequence variant
represented_as: node
label_in_input: alteration
label_in_input: short_mutation
properties:
gene_symbol_alteration: str
citation_PM_ids: str
consequence: str
homogenous: str
@@ -23,7 +22,47 @@ alteration:
refCount: int64
altCount: int64
expressed: bool
ensembl_id_alteration: str
# ensembl_id_alteration: str

copy number amplification:
is_a: sequence variant
represented_as: node
label_in_input: copy_number_amplification
properties:
citation_PM_ids: str
consequence: str
homogenous: str
mutation_effect_description: str
data_source: str
oncogenic: str
reference_genome: str
tumor_type: str
tumor_type_summary: str
variant_summary: str
refCount: int64
altCount: int64
expressed: bool
# ensembl_id_alteration: str

structural variant:
is_a: sequence variant
represented_as: node
label_in_input: structural_variant
properties:
citation_PM_ids: str
consequence: str
homogenous: str
mutation_effect_description: str
data_source: str
oncogenic: str
reference_genome: str
tumor_type: str
tumor_type_summary: str
variant_summary: str
refCount: int64
altCount: int64
expressed: bool
# ensembl_id_alteration: str

disease:
represented_as: node
@@ -202,6 +241,12 @@ protein:
ncbi_tax_id: str
data_source: str

# Treatment

treatment:
represented_as: node
input_label: treatment

########################
# EDGES
########################
@@ -212,7 +257,7 @@ protein:

### CARRIES

# To allow queries for patients carrying samples, and samples carrying alterations,
# To allow queries for patients carrying samples, and samples carrying variants,
# without mixing with "effects" causes.
carries:
is_a: causes
@@ -232,12 +277,12 @@ patient carries sample:
data_source: str
edglelabel: str

sample carries alteration:
sample carries variant:
is_a: carries
represented_as: edge
label_in_input: sample_carries_alteration
label_in_input: sample_carries_variant
source: sample
target: alteration
target: sequence variant
properties:
data_source: str
edglelabel: str
@@ -246,16 +291,16 @@ sample carries alteration:

# A gene is linked to its gene status (gain or loss of function),
# which are represented as nodes, so as to allow a causal path
# to go through alteration -> gene status -> transcript activity.
# to go through variant -> gene status -> transcript activity.
# Hence, outcomes have at least two instances:
# - Gene:GoF, and
# - Gene:LoF.

alteration causes gene status:
variant causes gene status:
is_a: causes
represented_as: edge
label_in_input: alteration_causes_gene_status
source: alteration
label_in_input: variant_causes_gene_status
source: sequence variant
target: gene status
properties:
data_source: str
@@ -267,13 +312,17 @@ alteration causes gene status:
# as predictive markers for treatment response,
# based on clinical evidence categorized by evidence levels.

alteration biomarker for drug:
is_a: biomarker for
variant biomarker for treatment:
is_a: sequence variant modulates treatment association
represented_as: edge
label_in_input: alteration_biomarker_for_drug
source: alteration
target: drug
properties:
label_in_input: variant_biomarker_for_treatment
source: sequence variant
target: treatment
properties:
level_of_evidence: str
cgi_level: str
citations: str
tumorType: str
data_source: str
edglelabel: str

@@ -302,10 +351,6 @@ gene status affects gene:


### GO -- TO BE FIXED
# annotation:
# is_a: named thing
# represented_as: node
# label_in_input: annotation

biological process:
is_a: named thing
@@ -314,20 +359,6 @@ biological process:
properties:
data_source: str

# annotation for gene:
# is_a: association
# represented_as: edge
# label_in_input: annotation_for_gene
# source: annotation
# target: gene

# involved in:
# is_a: association
# represented_as: edge
# label_in_input: involved_in
# source: annotation
# target: biological process

gene to biological process:
is_a: association
represented_as: edge
@@ -345,9 +376,10 @@ biological process to gene:
source: biological process
target: gene
properties:
# edglelabel: str
data_source: str

### FUNCTIONAL PROTEIN PROTEIN INTERACTIONS

undirected molecular interaction:
is_a: pairwise molecular interaction
represented_as: edge
@@ -447,16 +479,28 @@ inhibition:
extra_attrs: str
evidences: str

### TRANSCRIPT TO GENE RELATIONSHIP

transcript to gene relationship:
# is_a: transcript to gene relationship
represented_as: edge
input_label: transcript_to_gene_relationship
properties:
data_source: str

### DRUG HAS TARGET

drug has target:
is_a: drug to gene association
represented_as: edge
label_in_input: drug_has_target
properties:
data_source: str

treatment has part drug:
is_a: association
represented_as: edge
label_in_input: treatment_has_part_drug
properties:
data_source: str

16 changes: 8 additions & 8 deletions make.sh
@@ -65,7 +65,7 @@ echo "Activate virtual environment..." >&2
source $(dirname $(uv python find))/activate


if [[ "$2" == "config/neo4j.yaml" ]] ; then
if [[ "$CONFIG" == "config/neo4j.yaml" ]] ; then
echo "Stop Neo4j server..." >&2
neo_version=$(neo4j-admin --version | cut -d. -f 1)
if [[ "$neo_version" -eq 4 ]]; then
@@ -78,23 +78,23 @@ fi

echo "Weave data..." >&2

echo "CONFIG = $CONFIG" >&2

cmd="uv run python3 ${py_args} $script_dir/weave.py \
--config $CONFIG \
--copy-number-amplifications-external $decider_dir/cnas_external.csv \
--short-mutations-local $decider_dir/short_mutations_local.csv \
--short-mutations-external $decider_dir/short_mutations_external.csv \
--copy-number-amplifications-local $decider_dir/cnas_local.csv \
--copy-number-amplifications-external $decider_dir/cnas_external.csv \
--omnipath-networks $data_dir/omnipath_networks/omnipath_webservice_interactions__latest.tsv.gz \
--open-targets-drug-molecule $data_dir/OT/drug_molecule/ \
--open-targets-drug_mechanism_of_action $data_dir/OT/drug_mechanism_of_action/ \
--open-targets-target $data_dir/OT/target/ \
--cgi $decider_dir/treatments_cgi.csv \
--config $CONFIG \
${weave_args}" # \
# --omnipath-networks $data_dir/omnipath_networks/omnipath_networks_different_type_entity_type_source_and_entity_type_target_shorter.tsv \
# --structural-variants $decider_dir/structural_variants.xlsx \
# --clinical $data_dir/DECIDER/clinical/clinical_export.xlsx \
# --gene_ontology_genes $data_dir/DECIDER/$data_version/OncoKB_gene_symbols.conf \
# --oncokb $data_dir/DECIDER/$data_version/treatments.csv \
# --gene_ontology $data_dir/GO/goa_human.gaf.gz \
# --gene_ontology_owl $data_dir/GO/go.owl \
# --gene_ontology_reverse


echo "Weaving command:" >&2