BioXSD: The common XML Schema (canonical data model) for bioinformatic Web services and everyday bioinformatics.
Defines XML exchange formats for: sequences, sequence annotations, sequence similarity/alignments, references and supporting data-types
Any integer number (~-9.2x10^18 ... ~9.2x10^18). Represented by a 64-bit (8B) signed long integer (long, long long, int64, same as xs:long, different from xs:integer)
Non-negative integer number (0 ... ~9.2x10^18). Represented by a 64-bit (8B) signed long integer
Non-zero integer number (~-9.2x10^18 ... -1, 1 ... ~9.2x10^18). Represented by a 64-bit (8B) signed long integer
Insertion-point specific integer number (-1, 1 ... ~9.2x10^18). Represented by a 64-bit (8B) signed long integer
Positive integer number (1 ... ~9.2x10^18). Represented by a 64-bit (8B) signed long integer
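The five integer ranges above can be sketched as simple range checks. This is a minimal illustration, not the schema's own definition: the helper names are hypothetical, and only the numeric bounds are taken from the descriptions.

```python
# Hypothetical helper names; only the numeric ranges come from the descriptions above.
INT64_MIN = -(2**63)      # about -9.2 x 10^18
INT64_MAX = 2**63 - 1     # about  9.2 x 10^18


def is_integer(value: int) -> bool:
    """Any integer that fits a 64-bit signed long."""
    return INT64_MIN <= value <= INT64_MAX


def is_non_negative_integer(value: int) -> bool:
    return 0 <= value <= INT64_MAX


def is_non_zero_integer(value: int) -> bool:
    return value != 0 and is_integer(value)


def is_insertion_point_integer(value: int) -> bool:
    """-1 or any positive integer; 0 and values below -1 are excluded."""
    return value == -1 or 1 <= value <= INT64_MAX


def is_positive_integer(value: int) -> bool:
    return 1 <= value <= INT64_MAX
```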
Any decimal number with double precision (-INF, ~-1.7E308 ... 0, ~5E-324 subnormal ... ~2.2E-308 ... ~1.7E308, INF; +NaN). Represented by a 64-bit (8B) signed floating point number (double, long double, same as xs:double, different from xs:decimal)
Probability with double precision (0 ... 1). Represented by a 64-bit (8B) signed floating point number
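The limits of a 64-bit double can be inspected directly in Python, whose float is a C double (IEEE 754 binary64 on common platforms). The sketch below lists them next to a hypothetical is_probability check; the function name is an assumption, not part of the schema.

```python
import sys

# CPython floats are C doubles, which are IEEE 754 binary64 on common platforms.
LARGEST_FINITE = sys.float_info.max    # 1.7976931348623157e+308
SMALLEST_NORMAL = sys.float_info.min   # 2.2250738585072014e-308
SMALLEST_SUBNORMAL = 5e-324            # halving it underflows to 0.0


def is_probability(value: float) -> bool:
    """Hypothetical check: a double in the closed interval [0, 1]."""
    # NaN fails both comparisons, so it is rejected as well.
    return 0.0 <= value <= 1.0
```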
Quantitative or qualitative certainty
An absolute Uniform Resource Identifier (URI), possibly a Web link.
NB. Supports a subset of RFC 3986 generic syntax (selected schemes, DNS only, no user info, constrained port and characters)
Any plain text (possibly formatted)
Name of a public or private database (predefined or arbitrary name)
Name of a public or private controlled vocabulary/ontology (predefined or arbitrary name)
Phase of an incomplete peptide-coding nucleotide sequence, in the direction of translation
Nucleotide sequence in any letter case, possibly with ambiguous ("degenerate") bases
Amino-acid sequence in capital letters, possibly with ambiguous residues (Asx, Xle, Glx, Xaa/Unk) and additional residues (Pyl and Sec)
Nucleotide or amino-acid sequence in any letter case, possibly with ambiguous bases and residues
Nucleotide sequence in any letter case, without ambiguous ("degenerate") bases
Amino-acid sequence in capital letters, without ambiguous and additional residues (Pyl, Sec)
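The distinction between ambiguous and unambiguous sequences can be sketched with regular expressions. The alphabets below are assumptions based on the IUPAC one-letter codes; the patterns actually used by the schema may differ (for example in which extra characters they allow), and the names are hypothetical.

```python
import re

# Assumed alphabets (IUPAC one-letter codes); the schema's own patterns may differ.
NUCLEOTIDE_UNAMBIGUOUS = re.compile(r"[ACGTUacgtu]+")
NUCLEOTIDE_AMBIGUOUS = re.compile(r"[ACGTURYSWKMBDHVNacgturyswkmbdhvn]+")
AMINO_ACID_UNAMBIGUOUS = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")
# B = Asx, J = Xle, Z = Glx, X = Xaa/Unk, O = Pyl, U = Sec
AMINO_ACID_AMBIGUOUS = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBJZXOU]+")


def matches(pattern: re.Pattern, sequence: str) -> bool:
    """True if the whole sequence consists of letters allowed by the pattern."""
    return pattern.fullmatch(sequence) is not None
```

For instance, matches(NUCLEOTIDE_AMBIGUOUS, "acgtNNry") holds, while matches(NUCLEOTIDE_UNAMBIGUOUS, "ACGN") does not.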
Nucleotide or amino-acid sequence record including the generic sequence and optional metadata
Reference to where the sequence originates from: a database entry or an explicit super-sequence
Nucleotide sequence record including the generic nucleotide sequence and optional metadata
Reference to where the sequence originates from: a database entry or an explicit super-sequence
Amino-acid sequence record including the generic amino-acid sequence and optional metadata
Reference to where the sequence originates from: a database entry or an explicit super-sequence
Nucleotide sequence record including the unambiguous nucleotide sequence and optional metadata
Reference to where the sequence originates from: a database entry or an explicit super-sequence
Amino-acid sequence record including the unambiguous amino-acid sequence and optional metadata
Reference to where the sequence originates from: a database entry or an explicit super-sequence
Particular genetic code (codon encoding)
Codon code consisting of 3 bases (possibly ambiguous, "degenerate")
One amino-acid (possibly ambiguous). NB. If the same codon codes for multiple amino-acids, use multiple 'codon' elements and fill in the 'note' attribute
Reference to a database
General URI of the database
Reference to an entry in a database
Date when this reference was fully valid: created or last time updated
Reference to a controlled vocabulary (ontology)
General URI of the controlled vocabulary (ontology)
Reference to a term from a controlled vocabulary (ontology)
Human-readable name of the term. NB. Need not always be up to date or canonical.
(Use also termUri or accession: up-to-date name and other properties of the term should be downloadable given the termUri or accession)
Reference to a species (can be used also for a phenotype, cell line, tissue, sample, geo location, ...)
Custom human-readable name of the species. NB. Need not always be up to date or canonical. Use also entryUri or accession if possible
Formal reference to a sequence in a database or an explicit super-sequence
Coordinates of the sub-sequence within the referenced sequence
Explicit super-sequence, in case it is not desired or not possible to point to a database entry
Custom name of the super-sequence, in case it is not desired or not possible to point to a database entry
A reference to a SOAP Web service
Date when this Web-service reference was fully valid: created or last time updated
Identifier for local references within a data record
Generalisation of bioinformatic accession numbers (stable primary keys/identifiers)
UniProt accession number, optionally with the sequence version or the splice-variant suffix
UniProt accession number, without the sequence version and splice-variant suffix
GenBank/EMBL/DDBJ nucleotide accession number
GenBank/EMBL/DDBJ protein accession number
GenBank/EMBL/DDBJ WGS accession number
GenBank/EMBL/DDBJ MGA accession number
NCBI Taxonomy ID (0 ... 999999999; i.e. a subset of 32-bit (4B) signed int)
NCBI ID of a genetic code (1 ... 99)
Term ID in an OBO-Foundry ontology
Human-readable display name of the feature
Alternative human-readable display name of the feature
More generic class of features containing this feature (human-readable display name of the class)
Specific properties of a feature
Occurrence of a feature in a reference sequence. Positioned or non-positioned (applied to the whole sequence)
NB. Should be 0 for non-translated but transcribed features. Should be 'strand'*(('min'-1) mod 3 + 1) for translated features within a whole-chromosome annotation
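The frame rule above can be written out as a short function. This is a sketch under two assumptions not stated in the text: 'strand' is encoded as +1 or -1, and 'min' is a 1-based position. The function and parameter names are hypothetical.

```python
def frame(strand: int, min_pos: int, translated: bool = True) -> int:
    """Frame per the rule above; 'strand' is assumed to be +1 or -1, 'min' 1-based."""
    if not translated:
        # Non-translated but transcribed features get frame 0.
        return 0
    # 'strand' * (('min' - 1) mod 3 + 1), giving 1, 2 or 3 with the sign of the strand.
    return strand * ((min_pos - 1) % 3 + 1)
```

Under these assumptions, a forward-strand feature starting at position 4 gets frame 1, and a reverse-strand feature with 'min' 5 gets frame -2.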
NB. Corresponds to a position in the reference sequence (not to a position within the feature occurrence)
Sequence annotated with sequence features
Score of a prediction
NB. Corresponds to a position in the reference sequence (not to a position within the feature occurrence)
Position in the sequence, referring to a single point (base, residue, or C-alpha atom), possibly with a certain level of uncertainty
Certain position in the sequence, referring to a single point (base, residue, or C-alpha atom)
Position of an insertion into the sequence, possibly with a certain level of uncertainty
Certain position of an insertion into the sequence
Position outside of the reference sequence, referring to a single point (a nucleotide, residue, or C-alpha atom), possibly uncertain
Certain position outside of the reference sequence, referring to a single point (a nucleotide, residue, or C-alpha atom)
Position in the sequence referring to a continuous segment of the sequence, possibly uncertain
NB. Keep 'min' < 'max', use 'strand' if necessary. Leave empty (set 'nil') only if 'certainty'="Unknown"
NB. Keep 'max' > 'min', use 'strand' if necessary. Leave empty (set 'nil') only if 'certainty'="Unknown"
Certain position in the sequence referring to a continuous segment of the sequence
NB. Keep 'min' < 'max', use 'strand' if necessary
NB. Keep 'max' > 'min', use 'strand' if necessary
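The 'min' < 'max' constraint on a certain segment can be illustrated with a small data class. The class itself and the +1/-1 encoding of 'strand' are assumptions made for this sketch; the point is only that direction is expressed through 'strand' rather than by swapping 'min' and 'max'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a certain segment position."""
    min: int
    max: int
    strand: int = 1   # assumed encoding: +1 forward, -1 reverse

    def __post_init__(self) -> None:
        # Direction is carried by 'strand', never by swapping the bounds.
        if not self.min < self.max:
            raise ValueError("'min' must be smaller than 'max'; use 'strand' for direction")
```

A reverse-strand segment would then be written as Segment(10, 20, strand=-1) rather than Segment(20, 10).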
Position outside of the reference sequence, referring to a continuous segment, possibly uncertain
NB. Keep 'min' < 'max', use 'strand' if necessary. Leave empty (set 'nil') only if 'certainty'="Unknown"
NB. Keep 'max' > 'min', use 'strand' if necessary. Leave empty (set 'nil') only if 'certainty'="Unknown"
Certain position outside of the reference sequence, referring to a continuous segment
NB. Keep 'min' < 'max', use 'strand' if necessary
NB. Keep 'max' > 'min', use 'strand' if necessary
Position in the sequence, referring either to a subsequence, a single point, an insertion, or outside of the sequence, possibly uncertain
NB. Leave empty (set 'nil') only if 'certainty'="Unknown"
Insertion to the right of the given point (-1 for preceding the sequence). NB. Leave empty (set 'nil') only if 'certainty'="Unknown"
Certain position in the sequence, referring either to a subsequence, a single point, an insertion, or outside of the sequence
Insertion to the right of the given point (-1 for preceding the sequence)
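The insertion-point convention can be sketched as follows, assuming 1-based positions within the sequence. The function name and the choice to reject points beyond the sequence length are assumptions of this example.

```python
def insert_at(sequence: str, insertion_point: int, inserted: str) -> str:
    """Insert to the right of a 1-based position; -1 means before the first residue."""
    if insertion_point == -1:
        return inserted + sequence
    if not 1 <= insertion_point <= len(sequence):
        raise ValueError("insertion point must be -1 or within 1..len(sequence)")
    # Everything up to and including the given point stays to the left of the insert.
    return sequence[:insertion_point] + inserted + sequence[insertion_point:]
```

With the sequence "ACGT", insertion point -1 places the insert before A, insertion point 1 places it between A and C, and insertion point 4 appends it after T.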
Position outside of the sequence, possibly uncertain
NB. Leave empty (set 'nil') only if 'certainty'="Unknown"
Insertion to the right of the given point. NB. Leave empty (set 'nil') only if 'certainty'="Unknown"
Certain position outside of the sequence
Insertion to the right of the given point
Single gap in an aligned sequence
Frame-shift in an aligned amino-acid sequence
Coordinates of the locally aligned sub-sequence. (Not present means global alignment)
Coordinates of the locally aligned sub-sequence. (Not present means global alignment)
Coordinates of the locally aligned sub-sequence. (Not present means global alignment)
Coordinates of the locally aligned sub-sequence. (Not present means global alignment)
Coordinates of the locally aligned sub-sequence. (Not present means global alignment)
Alignment of 2..n generic nucleotide or amino-acid sequences
Alignment of 2..n generic nucleotide sequences
Alignment of 2..n generic amino-acid sequences
Alignment of 2..n nucleotide sequences
Alignment of 2..n amino-acid sequences
Predefined, recommended verdict of predicted or experimental evidence
Predefined, recommended reliability of experimental evidence
Qualitative certainty tag
The referred value is completely unknown (as opposed to the certainty being unknown)
Correspondence of the outside positions to the reference sequence
The outside position is in the nucleotide sequence of either the explicitly given chromosome or the chromosome of the reference sequence.
Outside-positions 1..m are positions in the chromosome
The outside position is in an explicitly referenced nucleotide or amino-acid supersequence (respectively) of the reference sequence.
Outside-positions 1..m are in the explicitly referenced supersequence
The outside position is in a nucleotide or amino-acid supersequence (respectively) of the reference subsequence.
Positions 1..n correspond; position -1 is the one preceding position 1 (the next point to the left)
The outside position is in a nucleotide sequence.
Position 1 corresponds to the 1st base of the 1st translated codon within the reference isoform; position -1 is the next one to the left of 1
(NB. CDSs include the start and stop codons: do not use this correspondence option for outside features of CDSs)
The outside position is in a nucleotide sequence.
Position 1 corresponds to the 1st base transcribed within the reference isoform; position -1 is the next one to the left of 1
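In the relative numbering described above, position -1 lies directly to the left of position 1, so there is no position 0. A minimal sketch of converting such a position to a 1-based coordinate in the containing sequence follows. The function name, the 'origin' parameter (the absolute coordinate of relative position 1) and the forward orientation are all assumptions of this example.

```python
def outside_to_absolute(position: int, origin: int) -> int:
    """Map a zero-free relative position to a 1-based absolute coordinate.

    'origin' (hypothetical parameter) is the absolute coordinate of relative
    position 1; forward orientation is assumed.
    """
    if position == 0:
        raise ValueError("position 0 does not exist; -1 is directly left of 1")
    if position > 0:
        return origin + position - 1
    # Negative positions skip the missing zero: -1 maps to origin - 1.
    return origin + position
```

If relative position 1 sits at absolute coordinate 100, then position 2 maps to 101 and position -1 maps to 99.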
Predefined, recommended database name of a public database
Predefined, recommended ontology name of a public ontology