This repository was archived by the owner on Mar 21, 2019. It is now read-only.
Overview
cstubben edited this page Sep 5, 2014 · 5 revisions
The pmcOAI function loads PMC Open Access articles into an XMLInternalDocument. Other functions parse the XML document, including:
- pmcText splits the XML into a list of subsections, where each subsection is a vector of paragraphs or sentences
- pmcTable extracts tables into a list of data frames
- pmcSupp lists supplementary files and optionally downloads them
- pmcRef returns a data frame containing references
- pmcMetadata lists metadata fields
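To illustrate what "a list of subsections, where each subsection is a vector of paragraphs" might look like, here is a minimal sketch that uses only the XML package on a small made-up document. It does not call pmcText and is not how the package is implemented; the inline XML, the XPath expressions, and the variable names are all assumptions chosen for illustration.

```r
library(XML)  # assumed to be installed; pmcXML itself is not used here

# A tiny, made-up article in a JATS-like layout (not a real PMC record)
xml <- '<article><body>
  <sec><title>Introduction</title><p>First paragraph.</p><p>Second paragraph.</p></sec>
  <sec><title>Results</title><p>Only paragraph.</p></sec>
</body></article>'

# Parsing from a string returns an XMLInternalDocument
doc <- xmlParse(xml, asText = TRUE)

# One list element per top-level section, each a character vector of paragraphs
secs <- getNodeSet(doc, "//body/sec")
sections <- lapply(secs, function(s) xpathSApply(s, "./p", xmlValue))
names(sections) <- sapply(secs, function(s) xpathSApply(s, "./title", xmlValue))

str(sections)
```

Real PMC articles nest sections and mix in figures, tables, and captions, so a production parser needs considerably more handling than this.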
The package was initially described in BMC Bioinformatics, and that paper focused on extracting locus tags mentioned in full text and tables. You can use this code to find Burkholderia pseudomallei locus tags:
bpgff <- read.ncbi.ftp( "Burkholderia_pseudomallei/GCF_000011545", "gff")
tags <- "(BPSL0* OR BPSL1* OR BPSL2* OR BPSL3* OR BPSS0* OR BPSS1* OR BPSS2*)"
bp <- ncbiPMC(paste(tags, "AND (Burkholderia[TITLE] OR Burkholderia[ABSTRACT]) AND open access[FILTER]"))
pmcLoop(bp, bpgff, prefix = "BPS[SL]", suffix = "[abc]", file = "bp.tab")

Check the links for more details.
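The prefix and suffix arguments above look like regular-expression fragments. As a rough sketch of how such fragments could be combined to find locus tags in text, the base R example below builds one pattern and extracts the matches. This is an assumption for illustration only, not pmcLoop's actual matching logic; the four-digit tag layout, the optional suffix, and the sample sentences are all hypothetical.

```r
prefix <- "BPS[SL]"
suffix <- "[abc]"

# Assumed tag layout: prefix, four digits, then an optional suffix letter
pattern <- paste0("\\b", prefix, "[0-9]{4}", suffix, "?\\b")

# Made-up sentences, not taken from any article
text <- c("A made-up sentence mentioning BPSL1549 and BPSS1492a.",
          "No locus tags are mentioned here.")

# gregexpr finds every match per element; regmatches extracts the matched strings
found <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
tags <- unique(unlist(found))
print(tags)
```

Anchoring the pattern with word boundaries keeps it from matching inside longer identifiers; whether the package does the same is not stated here.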