.. xenopus supplement file, created by ARichards

==========
Database
==========

About the database
_____________________

The database used for this analysis is built on PostgreSQL and several Python modules, including the object relational mapper provided by SQLAlchemy. The main purpose of the database is to handle Gene Ontology [Ashburner00]_ information at a detailed level. The relational schema shown below was produced with a tool called sqlalchemy_schemadisplay. The database population scripts, along with a number of classes for interfacing with the data, are provided by a project called hts-integrate.

.. figure:: dbschema.png
   :scale: 70%
   :align: center
   :alt: database schema
   :figclass: align-center

Database contents
___________________

The following files were downloaded on **September 15, 2014**.

* ``go.obo``
* ``taxdump.tar.gz``
* ``gene_info.gz``
* ``gene2go.gz``
* ``gene2refseq.gz``
* ``gene_association.goa_uniprot.gz``
* ``idmapping.dat.gz``
* ``uniprot_sprot.fasta.gz``
* ``uniprot_trembl.fasta.gz``

The UniProt and gene-centric data from these files were parsed and used to populate the database. The number of rows in each table is shown below.

.. code-block:: none

   There are 1262260 entries in the taxa table
   There are 681732 entries in the genes table
   There are 777608 entries in the uniprot table
   There are 42627 entries in the go_terms table
   There are 7463568 entries in the go_annotations table

Links
_________

* Gene Ontology
* GO annotation file format
* GO annotation README
* RefSeq accession number and molecule types
* NCBI FTP README
* UniProt database
* UniProt README
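
The row counts reported under Database contents could be reproduced with one ``COUNT(*)`` query per table. The following is only a minimal sketch under stated assumptions: it uses Python's built-in ``sqlite3`` module with an in-memory database as a stand-in for the PostgreSQL and SQLAlchemy setup described above. The table names are taken from the row-count listing, while the single ``id`` column and the helper names ``count_rows``, ``format_counts`` and ``make_demo_db`` are hypothetical and not part of hts-integrate.

```python
import sqlite3

# Table names come from the row-count listing in this document. The single
# "id" column used below is a placeholder, NOT the real database schema.
TABLES = ["taxa", "genes", "uniprot", "go_terms", "go_annotations"]


def count_rows(conn, tables=TABLES):
    """Return a {table_name: row_count} dict for the given tables."""
    counts = {}
    for name in tables:
        # Table names cannot be bound as SQL parameters, so only names
        # from the fixed list above are ever interpolated into the query.
        if name not in TABLES:
            raise ValueError("unknown table: %s" % name)
        row = conn.execute("SELECT COUNT(*) FROM %s" % name).fetchone()
        counts[name] = row[0]
    return counts


def format_counts(counts):
    """Format the counts in the same style as the listing in this document."""
    return ["There are %d entries in the %s table" % (n, t)
            for t, n in counts.items()]


def make_demo_db():
    """Build an in-memory SQLite stand-in with a few dummy rows per table."""
    conn = sqlite3.connect(":memory:")
    for i, name in enumerate(TABLES, start=1):
        conn.execute("CREATE TABLE %s (id INTEGER PRIMARY KEY)" % name)
        # Insert i dummy rows so that every table has a different count.
        conn.executemany("INSERT INTO %s (id) VALUES (?)" % name,
                         [(k,) for k in range(i)])
    conn.commit()
    return conn


if __name__ == "__main__":
    for line in format_counts(count_rows(make_demo_db())):
        print(line)
```

Against the real database, the same ``SELECT COUNT(*)`` statements would be issued over a PostgreSQL connection instead of the demo connection; the dummy rows here exist only so that the sketch runs on its own.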