-
Notifications
You must be signed in to change notification settings - Fork 0
Molecule curation
Add two fields/tables to the Molecule OA:
- Field name "GO target" ; Table name "mop_gotarget"
- Field name "Gene target"; Table name "mop_genetarget"
Do not dump these fields, no model change requested yet- accruing data before requesting the changes.
- Assign WBMoleculeIDs to all WB molecules.
- Add a WBMolID field to the OA between WBPaper and Public name
- Assign WBMolIDs to all molecules based on the pgid.
- Change the dumper on tazendra acedb/karen/Molecule/dump_molecule_ace.pl to dump the WBMolID as the Molecule name as shown below.
Molecule : "WBMolecule:00000001" Public_name "beta-selinene" Database "NLM_MeSH" "UID" "C087920" Database "CTD" "ChemicalID" "C087920" Database "ChemIDplus" "17066-67-0" Database "ChEBI" "CHEBI_ID" "10443" Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C09723"
- Change ...{ print OUT "Database\t\"ChemIDplus\" \"$hash{chemi}{$joinkey}\"\n"; } to
...{ print OUT "Database\t\"ChemIDplus\" \"RN\" \"$hash{chemi}{$joinkey}\"\n"; }
- Fix SMMID print out line to read SMID-DB
- Dump "Molecule_use" as "Molecule_use" (this is already in the model)
- Dump "Paper" as "Reference" (this requires model change approval)
- automate dump on Wed morning for pick up by spica at 8am.
0 4 * * wed /home/acedb/karen/Molecule/dump_molecule_ace.pl writes to /home/acedb/karen/Molecule/Molecule.ace
- spica pick up will also grab database.ace flat file in same directory: acedb/karen/Molecule/
1. Line 42 'ChEMI' needs to be changed to 'ChEBI'
if ($hash{chebi}{$joinkey}) { print OUT "Database\t\"ChEMI\" \"CHEBI_ID\" \"$hash{chebi}...
2. Lines 40 and 41 need to be suppressed when $molecule = "WBMol:"
print OUT "Database\t\"NLM_MeSH\" \"UID\" \"$molecule\"\n"; print OUT "Database\t\"CTD\" \"ChemicalID\" \"$molecule\"\n";
3. After changes to OA - for the .ace dump create storage and print out for SMMID DB values
if ($hash{smmid}{$joinkey}) { print OUT "Database\t\"SMMID DB\" \"$hash{smmid}{$joinkey}\"\n"; }
on tazendra acedb/karen/Molecule/dump_molecule_ace.pl (also /home/acedb/public_html/karen/)
dumps molecule OA postgres tables starting WS219
dumps to:
/home/acedb/public_html/karen/Molecule.ace
/home/acedb/karen/Molecule/dump_molecule.ace.pl
which is symlinked to : http://tazendra.caltech.edu/~acedb/karen/Molecule.ace (a publicly accessible list of the molecules)
(if you want to keep copies, make a copy before the dump; the dump script overwrites the file)
the original source file is now Molecule.ace.20100728
NOTE: Variation (strain, transgene) phenotype-molecule associations are dumped through usepackage.pl.
(tazendra or mangolassi) ~postgres/public_html/cgi-bin/oa/wormOA.pm
use &initWormFields to load a specific OA, each of them has a 3 letter code that corresponds to the postgres table name. 'mop' is for the molecule oa and the 'mop_' tables in postgres
mop calls &initWormMopFields currently around line 1200
mop hash set :
$fields{mop}{id}{type} = 'text';
$fields{mop}{id}{label} = 'pgid';
$fields{mop}{id}{tab} = 'all';
$fields{mop}{paper}{type} = 'multiontology';
$fields{mop}{paper}{label} = 'WBPaper';
$fields{mop}{paper}{tab} = 'all';
$fields{mop}{paper}{ontology_type} = 'WBPaper';
$fields{mop}{name}{type} = 'text';
$fields{mop}{name}{label} = 'Name';
$fields{mop}{name}{tab} = 'all';
most correspond to a postgres table (except mop {id}, which is the pgid. So in the example above, there are 2 postgres tables, mop_paper and mop_name
- {type} is the type of data the table holds text/bigtext/ontology/dropdown/toggle/multiontology/multidropdown
- {label} is the field name in the OA
- {tab} is which tab number it should show in. 'all' shows up in all tabs, used in OAs with no numbered tabs
- {ontology_type} refers to the type of ontology/multiontology values used for autocomplete and validation.
- {ontology_table} is used if the ontology_type is 'obo', and the value refers to the obo_table that is updated via cronjob or populated just once. e.g.>br/>
$fields{mop}{chebi}{type} = 'ontology';
$fields{mop}{chebi}{label} = 'ChEBI_ID';
$fields{mop}{chebi}{tab} = 'all';
$fields{mop}{chebi}{ontology_type} = 'obo';
$fields{mop}{chebi}{ontology_table} = 'chebi';
in this case
$fields{mop}{chebi}{ontology_type} = 'obo';
$fields{mop}{chebi}{ontology_table} = 'chebi';
maps to obo_<name|data|syn>_chebi
if the {type} is dropdown, there's a {dropdown_type} which refers to the type of data like the {ontology_type}, but the values are hardcoded further down in the code instead of stored in postgres for querying.
For the molecule OA the tables are:
mop_cellcomp
mop_genetarget
# dump mop_ data into Molecule.ace for citace upload 2010 07 31
use strict;
use diagnostics;
use DBI;
my $dbh = DBI->connect ( "dbi:Pg:dbname=testdb", "", "") or die "Cannot connect to database!\n";
my @tables = qw( molecule publicname synonym chemi chebi kegg );
my %hash;
my $result;
my $directory = '/home/acedb/karen/Molecule';
chdir ($directory) or die "Cannot chdir to $directory : $!";
my $outfile = 'Molecule.ace';
open (OUT, ">$outfile") or die "Cannot create $outfile : $!";
foreach my $table (@tables) {
my $pgtable = 'mop_' . $table;
$result = $dbh->prepare( "SELECT * FROM $pgtable" );
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
while (my @row = $result->fetchrow) {
if ($row[0]) { $hash{$table}{$row[0]} = $row[1]; } }
}
foreach my $joinkey (sort {$a<=>$b} keys %{ $hash{molecule} }) {
my ($molecule, $publicname, $syns, $chemi, $chebi, $kegg);
if ($hash{molecule}{$joinkey}) { $molecule = $hash{molecule}{$joinkey}; }
next unless $molecule;
# next if ($molecule =~ m/WBMol/); # always print objects for Karen 2011 07 05
print OUT "Molecule : \"$molecule\"\n";
if ($hash{publicname}{$joinkey}) { print OUT "Public_name\t\"$hash{publicname}{$joinkey}\"\n"; }
if ($hash{synonym}{$joinkey}) {
my (@syns) = split/ \| /, $hash{synonym}{$joinkey};
foreach my $syn (@syns) { print OUT "Synonym\t\"$syn\"\n"; } }
print OUT "Database\t\"NLM_MeSH\" \"UID\" \"$molecule\"\n";
print OUT "Database\t\"CTD\" \"ChemicalID\" \"$molecule\"\n";
if ($hash{chebi}{$joinkey}) { print OUT "Database\t\"ChEMI\" \"CHEBI_ID\" \"$hash{chebi}{$joinkey}\"\n"; }
if ($hash{chemi}{$joinkey}) { print OUT "Database\t\"ChemIDplus\" \"$hash{chemi}{$joinkey}\"\n"; }
if ($hash{kegg}{$joinkey}) { print OUT "Database\t\"KEGG COMPOUND\" \"ACCESSION_NUMBER\" \"$hash{kegg} {$joinkey}\"\n"; }
print OUT "\n";
} # foreach my $joinkey (sort {$a<=>$b} keys %{ $hash{molecule} })
close (OUT) or die "Cannot close $outfile : $!";
__END__
Molecule : "C087920"
Public_name "beta-selinene"
Database "NLM_MeSH" "UID" "C087920"
Database "CTD" "ChemicalID" "C087920"
Database "ChemIDplus" "17066-67-0"
Database "ChEBI" "CHEBI_ID" "10443"
Database "KEGG COMPOUND" "ACCESSION_NUMBER" "C09723"
my $result = $dbh->prepare( "SELECT * FROM two_comment WHERE two_comment ~ ?" );
$result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
while (my @row = $result->fetchrow) {
if ($row[0]) {
$row[0] =~ s/
//g;
$row[1] =~ s/
//g;
$row[2] =~ s/
//g;
print "$row[0]\t$row[1]\t$row[2]\n";
} # if ($row[0])
} # while (@row = $result->fetchrow)
__END__
my $result = $dbh->prepare( 'SELECT * FROM two_comment WHERE two_comment ~ ?' );
$result->execute('elegans') or die "Cannot prepare statement: $DBI::errstr\n";
$result->execute("doesn't") or die "Cannot prepare statement: $DBI::errstr\n";
my $var = "doesn't";
$result->execute($var) or die "Cannot prepare statement: $DBI::errstr\n";
my $data = 'data';
unless (is_utf8($data)) { from_to($data, "iso-8859-1", "utf8"); }
my $result = $dbh->do( "DELETE FROM friend WHERE firstname = 'bl\"ah'" );
(also do for INSERT and UPDATE if don't have a variable to interpolate with ? )
can cache prepared SELECTs with $dbh->prepare_cached( &c. );
if ($result->rows == 0) { print "No names matched.\n\n"; } # test if no return
$result->finish; # allow reinitializing of statement handle (done with query)
$dbh->disconnect; # disconnect from DB
http://209.85.173.132/search?q=cache:5CFTbTlhBGMJ:www.perl.com/pub/1999/10/DBI.html+dbi+prepare+execute&cd=4&hl=en&ct=clnk&gl=us
interval stuff :
SELECT * FROM afp_passwd WHERE joinkey NOT IN (SELECT joinkey FROM afp_lasttouched) AND joinkey NOT IN (SELECT joinkey FROM cfp_curator) AND afp_timestamp < CURRENT_TIMESTAMP - interval '21 days' AND afp_timestamp > CURRENT_TIMESTAMP - interval '28 days';
casting stuff to substring on other types :
SELECT * FROM afp_passwd WHERE CAST (afp_timestamp AS TEXT) ~ '2009-05-14';
to concatenate string to query result :
SELECT 'WBPaper' || joinkey FROM pap_identifier WHERE pap_identifier ~ 'pmid';
to get :
SELECT DISTINCT(gop_paper_evidence) FROM gop_paper_evidence WHERE gop_paper_evidence NOT IN (SELECT 'WBPaper' || joinkey FROM pap_identifier WHERE pap_identifier ~ 'pmid') AND gop_paper_evidence != '';