This short (~10 min) tutorial will take you through some of the domain assignment and phylogenetic features of the SUPERFAMILY database.

Open your favourite web browser and navigate to the Genome3D website:  http://genome3d.eu/

Click on the Search button in the top navigation, then click on Human in the list of filters:  http://genome3d.eu/search?page=1&species=human

Here you'll find the first page of all Human proteins in Genome3D, taken from the UniProt database. Using the search bar at the top of the page, search for "Q3KNS1" and click on the first result, titled PTHD3_HUMAN:  http://genome3d.eu/uniprot/id/Q3KNS1/annotations
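If you want to jump straight to a protein without clicking through the search page, the two Genome3D URL patterns used in this tutorial can be built from a UniProt accession. The sketch below only assembles the URL strings. The function names are hypothetical helpers, and the patterns are copied from the links in this tutorial rather than from any documented Genome3D API, so they may change.

```python
from urllib.parse import urlencode

# Base address taken from the links in this tutorial.
GENOME3D_BASE = "http://genome3d.eu"


def genome3d_search_url(query):
    """Build the search-results URL for a free-text query (hypothetical helper)."""
    return f"{GENOME3D_BASE}/search?{urlencode({'q': query})}"


def genome3d_annotations_url(accession):
    """Build the annotations-page URL for a UniProt accession (hypothetical helper)."""
    return f"{GENOME3D_BASE}/uniprot/id/{accession}/annotations"


if __name__ == "__main__":
    for accession in ("Q3KNS1", "Q3KNS6"):
        print(genome3d_search_url(accession))
        print(genome3d_annotations_url(accession))
```

Paste either printed URL into your browser to land on the same page you would reach by following the steps by hand.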

From here you can see an overview of the predicted domains from the various Genome3D partners. Down the right of the page you can see each of the predicted superfamilies and which structural classification hierarchy it came from (SCOP or CATH). Note the gold, silver and bronze ratings, which indicate the degree to which the SCOP and CATH classifications agree.

Look at the first figure: the numbers along the bottom mark the protein sequence position, and the coloured bars show each Genome3D partner's domain predictions.

From the figure we can see that three of the partner prediction methods agree on one domain (Multidrug efflux transporter AcrB transmembrane domain), but there is no obvious consensus for the first domains in the sequence. Let's try another protein. Click back to return to the search results and search for "Q3KNS6":  http://genome3d.eu/search?q=Q3KNS6

Click on the first result, reading Zinc finger protein 829:  http://genome3d.eu/uniprot/id/Q3KNS6/annotations

Here we can see better consensus between the different prediction methods. Look at the SUPERFAMILY domain prediction row, where two kinds of domain are predicted: a repeated beta-beta-alpha zinc finger domain and a single Krüppel-associated box (KRAB) domain.

To find out a little more about this specific domain arrangement or "architecture" we can jump to the SUPERFAMILY online resource by clicking on the SUPERFAMILY link to the right of the domain predictions:  http://supfam.org/SUPERFAMILY/cgi-bin/gene.cgi?genome=up&seqid=Q3KNS6
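The SUPERFAMILY links in this tutorial follow a simple query-string pattern, so the same pages can be reached for other proteins or superfamilies. The following is a minimal sketch that rebuilds the gene.cgi and scop.cgi links used here. The helper names are hypothetical, and reading `genome=up` as "the UniProt sequence set" is an assumption based on the link above, not on SUPERFAMILY documentation.

```python
from urllib.parse import urlencode

# CGI directory taken from the links in this tutorial.
SUPERFAMILY_CGI = "http://supfam.org/SUPERFAMILY/cgi-bin"


def superfamily_gene_url(seqid, genome="up"):
    """Domain-assignment page for one sequence (hypothetical helper).

    genome="up" mirrors the tutorial link; we assume it selects UniProt.
    """
    params = {"genome": genome, "seqid": seqid}
    return f"{SUPERFAMILY_CGI}/gene.cgi?{urlencode(params)}"


def superfamily_scop_url(sunid):
    """Overview page for one SCOP node, e.g. a superfamily (hypothetical helper)."""
    return f"{SUPERFAMILY_CGI}/scop.cgi?{urlencode({'sunid': sunid})}"


if __name__ == "__main__":
    print(superfamily_gene_url("Q3KNS6"))
    print(superfamily_scop_url(109640))  # 109640 is the KRAB domain id used later on
```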

This is the SUPERFAMILY page listing the domains assigned to the UniProt entry Q3KNS6. From here you can see a list of all the assigned domains, their SCOP family and superfamily assignments, and their E-values, along with some links to other information.

Let's see what other species have genes with this identical architecture. Scroll back to the top and click on the third link under the red and blue figure, labelled "See the phylogenetic distribution for this domain architecture":  http://supfam.org/SUPERFAMILY/cgi-bin/createtree.cgi?tophl=1;highlight=arc_109640,57667,57667,57667,57667,57667,57667
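The link above appears to encode the whole domain architecture in its `highlight` parameter: an `arc_` prefix followed by a comma-separated list of numeric identifiers. The sketch below pulls that list out of the URL and counts the repeats. It assumes the numbers are SCOP superfamily identifiers; 109640 is the KRAB domain id that appears in a later link in this tutorial, and 57667 presumably corresponds to the repeated zinc finger domain. The function name is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_architecture(url):
    """Return the list of numeric ids held in the 'highlight' parameter.

    SUPERFAMILY tree links separate parameters with ';' rather than '&',
    so the query string is split by hand.
    """
    query = urlsplit(url).query
    params = dict(part.split("=", 1) for part in query.split(";") if "=" in part)
    highlight = params["highlight"]
    # Assumption: the 'arc_' prefix marks a full architecture, not a single domain.
    if highlight.startswith("arc_"):
        highlight = highlight[len("arc_"):]
    return [int(item) for item in highlight.split(",")]


TREE_URL = (
    "http://supfam.org/SUPERFAMILY/cgi-bin/createtree.cgi"
    "?tophl=1;highlight=arc_109640,57667,57667,57667,57667,57667,57667"
)

architecture = parse_architecture(TREE_URL)
domain_counts = Counter(architecture)

if __name__ == "__main__":
    print("Domains in order:", architecture)
    for sunid, count in domain_counts.items():
        print(f"id {sunid}: {count} copy/copies")
```

Read this way, the architecture is one KRAB domain plus six copies of a second domain, which matches the "single KRAB plus repeated zinc finger" picture from the Genome3D page.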

This tree shows which of the species in SUPERFAMILY contain exactly this architecture: green branches indicate assignment to the architecture, and blue indicates no hit. You can see that our architecture exists in all primates, many mammals, and even the Green Anole lizard Anolis carolinensis. This specific KRAB-related architecture appears to have arisen in a common ancestor of reptiles, birds and mammals. Of special interest are the level of conservation in primates compared with other mammals and the loss among non-mammals, which perhaps make this an interesting target for human research.

Let's look at the KRAB domain in isolation. Press back to return to the list of domains in our domain architecture (with the blue and red domain diagram at the top), scroll down to the 4th domain under the Domain assignment details heading and click on the KRAB Domain link:  http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?sunid=109640

Here we can see an overview of the SCOP classification and functional terms from dcGO[1] specific to the KRAB domain. Notice that the table of Gene Ontology terms contains many mentions of regulation and negative regulation, reflecting this domain's known role as a transcriptional repressor. For proteins with a novel domain architecture, however, these terms are predicted using just the domain composition and annotation from other, hand-annotated proteins.

Let's look at the distribution of this domain in our tree of life. Click on the Taxonomic Distribution tab at the top of the page, then, at the bottom of the following page, click on Plot Tree as SVG:  http://supfam.org/SUPERFAMILY/cgi-bin/createtree.cgi?tophl=1;genomes=;highlight=109640

This tree is laid out as before: each leaf is a species, green indicates strong hits against this superfamily in that species, and blue indicates none. Here we find something unusual: a large subtree of animals where every species is assigned this superfamily and, further down, another large subtree where just one species has an assignment, Brugia malayi. Let's see if we can figure out what's going on. Click on Brugia malayi in the tree:  http://supfam.org/SUPERFAMILY/cgi-bin/info.cgi?genome=r0

On this page you can see a table of assignment statistics, and a larger table showing all superfamily assignments. Let's look for our KRAB domain: search the page (using ctrl+f in your browser) for 'KRAB' and you'll find one hit towards the bottom of the table. Click on it:  http://supfam.org/SUPERFAMILY/cgi-bin/genome.cgi?sf=109640&listtype=sf&cgi_r0=yes

This page lists each assignment of the KRAB domain in the Brugia malayi genome in SUPERFAMILY. You can see a single assignment in the grey-outlined box (typically you would see more here), which has an E-value of 9.94e-20, indicating that this is unlikely to be a false positive assignment given the sequence.
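To get a feel for how small 9.94e-20 is, recall that an E-value is the number of hits at least this good that you would expect to see by chance in a search of the same size. The sketch below compares the reported value with a cutoff and expresses it on a -log10 scale. The cutoff of 1e-4 is an illustrative assumption chosen for this example, not a value taken from this tutorial, and the function name is hypothetical.

```python
import math

# Illustrative cutoff only; check the SUPERFAMILY documentation for the real one.
ASSUMED_CUTOFF = 1e-4


def summarise_evalue(evalue, cutoff=ASSUMED_CUTOFF):
    """Return whether the hit beats the cutoff and by how many orders of magnitude."""
    neg_log10 = -math.log10(evalue)
    margin = neg_log10 - (-math.log10(cutoff))
    return {
        "passes_cutoff": evalue < cutoff,
        "neg_log10_evalue": neg_log10,
        "orders_of_magnitude_below_cutoff": margin,
    }


summary = summarise_evalue(9.94e-20)

if __name__ == "__main__":
    print(f"Passes cutoff: {summary['passes_cutoff']}")
    print(f"-log10(E-value): {summary['neg_log10_evalue']:.2f}")
    print(f"Orders of magnitude below cutoff: "
          f"{summary['orders_of_magnitude_below_cutoff']:.2f}")
```

A hit roughly fifteen orders of magnitude below such a cutoff is very unlikely to be a chance match, which is why the explanations considered at the end of this tutorial concern where the sequence came from rather than whether the domain assignment is real.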

Press back to navigate back to the genome page:  http://supfam.org/SUPERFAMILY/cgi-bin/gen_list.cgi?genome=r0 and click on the Genome information tab at the top of the page:  http://supfam.org/SUPERFAMILY/cgi-bin/info.cgi?genome=r0;listtype=sf;subgenome=

This page shows metadata about the Brugia malayi WS218 genome in SUPERFAMILY, including where the genome sequences were sourced, the NCBI taxonomy name, synonyms, and when the genome was added to the database.

Look at the Names and Synonyms section of the table and you will see 'agent of lymphatic filariasis'. A quick web search confirms that Brugia malayi is a parasitic worm that causes elephantiasis in humans. Interesting!

There are two possible reasons for the unusual assignment of the KRAB-containing protein outside of tetrapods:

1: Horizontal gene transfer, perhaps due to the parasitic nature of Brugia malayi

2: Contamination during sequencing

Which one is right? If you think you know, or if you have any other feedback about SUPERFAMILY let us know:


Thanks for participating.

[1]  http://supfam.org/SUPERFAMILY/dcGO/