The SUPERFAMILY database provides SCOP structural domain annotation of protein sequences at the superfamily and family levels using a library of HMMs.

SUPERFAMILY inherits its domain definitions and classifications directly from SCOP. A library of hidden Markov models (HMMs), produced by a hand-curated, semi-automated pipeline, represents all domains of known structure. The HMMs are designed to find the most distantly related homologues for which there is a genuine evolutionary relationship (a common superfamily). Once the superfamily has been detected, the family is identified by a hybrid pairwise/profile method for sub-classification. Full-length multi-domain proteins are expressed as architectures consisting of their domains, the order in which they appear, and the superfamilies to which they belong. In this way the HMM library is used to annotate the proteins of all completely sequenced genomes with their domains of known structure, to classify them into superfamilies and families within the SCOP hierarchy, and to determine the complete domain architecture of each protein.
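As a rough illustration of what a domain architecture is, the following Python sketch orders a protein's domain assignments along the sequence and lists the superfamily of each. It is not SUPERFAMILY's own code: the `DomainHit` record, the `domain_architecture` function, and the superfamily labels `sf_A` and `sf_B` are hypothetical names chosen for this example, and the sketch assumes the HMM hits have already been resolved into non-overlapping domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One domain assignment on a protein (hypothetical record layout)."""
    superfamily: str  # superfamily label; the values used below are made up
    start: int        # first residue of the domain, 1-based
    end: int          # last residue of the domain, inclusive


def domain_architecture(hits):
    """Return the superfamilies of the domains in N- to C-terminal order."""
    # Sort by position so the result reflects the order of domains on the sequence.
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    return tuple(h.superfamily for h in ordered)


# Illustrative input: three domains listed out of order, two from the same superfamily.
hits = [
    DomainHit("sf_B", 120, 260),
    DomainHit("sf_A", 5, 110),
    DomainHit("sf_B", 270, 400),
]

architecture = domain_architecture(hits)
print(",".join(architecture))
```

Returning a tuple keeps the order of the domains and makes the architecture hashable, so proteins sharing an architecture could be grouped by using it as a dictionary key.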

Group Leader: Julian Gough