A data model for biomolecular sequences, features, alignments, references

Suitable as an exchange format for Web services and programmatic libraries



Enhanced and standardised provenance metadata are under development
We are improving the provenance metadata in BioXSD by implementing a rich but simple model that will be aligned with the W3C PROV standard. It will most likely be part of the next minor release. Thanks go, among others, to the OpenBio Codefest 2013 for input and discussions.
For desired additions into the next release, please contact

BioXSD version 1.1, release 1.1.2
BioXSD 1.1.2, a minor release of BioXSD 1.1, was released on 13 May 2013.
Every minor release of BioXSD is compatible with all previous minor releases within the same major release, so BioXSD 1.1.2 is backwards compatible with BioXSD 1.1 data (i.e. it is less restrictive).
For the list of changes, please refer to the CHANGELOG in the BioXSD-1.1.xsd file.

The 50 BioXSD-compatible Web services challenge
Volunteers are still warmly welcome to join the 50 BioXSD-compatible Web services challenge!
The aim is to provide compatible Web services and other tools that adopt BioXSD as one of their input/output formats.
Implementations are already under way at CBS in Greater Copenhagen (Denmark), Rostlab in Greater Munich (Germany), IBCP in Lyon (France), CBU in Bergen (Norway), and at a couple of other sites.
For more details, help, and consultation, please contact


BioXSD defines exchange formats for basic types of bioinformatics data. It aims to serve as the common, canonical XML format for such data.

A canonical data format does not mean "the only format", but an exchange format that can be common to several tools (as one of the multiple formats the tools support). Tools can produce and consume BioXSD directly, or BioXSD can be used as an intermediate canonical format rich enough to enable conversions among diverse formats. Using a common exchange format enables smooth integration of compatible tools into analysis workflows.
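The idea of an intermediate canonical format can be sketched in a few lines: each tool format needs only one reader and one writer against a shared record, instead of a converter for every pair of formats. The sketch below is only an illustration under stated assumptions. The SeqRecord class is a hypothetical in-memory stand-in for the canonical format, not the actual BioXSD structure, and the "tsv" format and all function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SeqRecord:
    """Hypothetical neutral record; a stand-in for the canonical format,
    not the actual BioXSD data structure."""
    name: str
    sequence: str


def read_fasta(text: str) -> List[SeqRecord]:
    """Parse minimal FASTA: '>' starts a record, other lines extend it."""
    records: List[SeqRecord] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append(SeqRecord(name=line[1:].strip(), sequence=""))
        elif records:
            records[-1].sequence += line
    return records


def write_fasta(records: List[SeqRecord]) -> str:
    return "".join(f">{r.name}\n{r.sequence}\n" for r in records)


def read_tsv(text: str) -> List[SeqRecord]:
    """Parse an invented two-column format: name<TAB>sequence."""
    records: List[SeqRecord] = []
    for line in text.splitlines():
        if line.strip():
            name, sequence = line.split("\t", 1)
            records.append(SeqRecord(name.strip(), sequence.strip()))
    return records


def write_tsv(records: List[SeqRecord]) -> str:
    return "".join(f"{r.name}\t{r.sequence}\n" for r in records)


# One reader and one writer per format; every conversion goes through SeqRecord.
READERS: Dict[str, Callable[[str], List[SeqRecord]]] = {
    "fasta": read_fasta,
    "tsv": read_tsv,
}
WRITERS: Dict[str, Callable[[List[SeqRecord]], str]] = {
    "fasta": write_fasta,
    "tsv": write_tsv,
}


def convert(text: str, source: str, target: str) -> str:
    """Convert between any two registered formats via the neutral record."""
    return WRITERS[target](READERS[source](text))
```

Adding a third format to this sketch means writing one reader and one writer, after which it can be converted to and from every format already registered.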

BioXSD is a rich but not too complicated XML-Schema-based exchange format for sequences, alignments, feature records, and references to external resources. Specialised standard XML formats such as SBML, MAGE-ML, GCDML, PDBML, PSI MI MIF, PhyloXML or NeXML are orthogonal efforts and should be used where applicable. BioXSD, however, aims to fill the gap between these specialised XML formats.

BioXSD enables deployment of globally and smoothly interoperable bioinformatics tools on the World Wide Web of Services. BioXSD supports WS-I compliant Web services and interoperates with ordinary SOAP and XML libraries for common programming languages, and naturally also with the REST architecture. No infrastructure other than standard HTTP, XML, and optionally SOAP is necessary for using BioXSD-compatible Web services.
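As a rough illustration of how little infrastructure a REST-style call needs, the following sketch prepares an HTTP POST carrying an XML payload using only the Python standard library. The endpoint URL and the payload element are placeholders invented for the example; they do not refer to any real service or to actual BioXSD element names.

```python
import urllib.request

# Placeholder endpoint (assumption); substitute the URL of a real service.
SERVICE_URL = "http://example.org/hypothetical-service"


def build_request(xml_payload: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Prepare a plain HTTP POST whose body is an XML document."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )


# "<sequenceRecord/>" is a made-up payload, not a real BioXSD element.
request = build_request("<sequenceRecord/>")

# Sending would be a single call, omitted here because the URL is a placeholder:
# with urllib.request.urlopen(request) as response:
#     reply_xml = response.read().decode("utf-8")
```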

BioXSD is an initiative coming from the scientific community: from the EMBRACE project partners.

The EMBRACE standards:

Diagram of EMBRACE standards

BioXSD data-type definitions are annotated with the EDAM (EMBRACE Data And Methods) ontology and with the main Semantic Web vocabularies. BioXSD thus offers ready-made building blocks for Web-service interfaces with a globally defined, controlled meaning (semantics).

BioXSD has been developed by analysing existing requirements, tools, Web services, data formats, and ontologies. Feasibility was tested at different pilot providers, using diverse libraries and programming languages.

BioXSD types can be used directly where applicable, or they can be included in other standard or custom types, extended, or restricted. With services that use other or proprietary formats, BioXSD can be used as the canonical intermediate exchange format.
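Including and extending a type from another schema generally follows the standard XML Schema pattern of an xs:import plus an xs:extension in the user's own namespace. The sketch below embeds such a user schema in a Python string and parses it to check that it is well-formed. The bx namespace URI, the type name bx:SomeRecordType, the user's namespace and the ScoredRecord type are all placeholders assumed for the example; consult the BioXSD Schema itself for the real namespace and type names.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# All names below except the xs: vocabulary are placeholders (assumptions):
# the bx namespace URI, bx:SomeRecordType, and the user's namespace and type.
USER_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:bx="http://example.org/BioXSD-placeholder"
           xmlns:my="http://example.org/my-tool"
           targetNamespace="http://example.org/my-tool"
           elementFormDefault="qualified">
  <!-- Import the canonical schema rather than copying its definitions. -->
  <xs:import namespace="http://example.org/BioXSD-placeholder"
             schemaLocation="BioXSD-1.1.xsd"/>
  <!-- Extend an imported type in the user's own namespace. -->
  <xs:complexType name="ScoredRecord">
    <xs:complexContent>
      <xs:extension base="bx:SomeRecordType">
        <xs:sequence>
          <xs:element name="score" type="xs:double"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""

# Parsing confirms well-formedness only; it does not validate against XSD rules.
schema = ET.fromstring(USER_SCHEMA)
imported = schema.find(f"{{{XS}}}import")
extension = schema.find(f".//{{{XS}}}extension")
```

Keeping the extension in a separate target namespace leaves the imported definitions untouched, so data that uses only the imported types remains valid against the original schema.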

Open collaboration within the community: BioXSD welcomes feature requests and new collaborations!

To submit your requirements, please write to A request-tracking system will be available in the future.




Please reference this publication if you use BioXSD:

Kalaš, M., Puntervoll, P., Joseph, A., Bartaševičiūtė (now Karosiene), E., Töpfer, A., Venkataraman, P., Pettifer, S., Bryne, J.C., Ison, J., Blanchet, C., Rapacki, K. and Jonassen, I. (2010). BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics, 26, i540-i546.
doi: 10.1093/bioinformatics/btq391   PMID: 20823319

If you make use of the optimised sequence/genome feature representation, please also reference:

Gundersen, S., Kalaš, M., Abul, O., Frigessi, A., Hovig, E. and Sandve, G.K. (2011). Identifying elemental genomic track types and representing them uniformly. BMC Bioinformatics, 12, 494.
doi: 10.1186/1471-2105-12-494   PMID: 22208806

The XML Schema

BioXSD 1.1 is available at This stable version is available for implementations and open for additions and further requirements. Suggestions for changes are welcome and may be reflected in future versions. This is the canonical Schema location to be imported in document XSDs (such as in Web services' WSDLs) or to be synchronised with.

(There is no 'worked-around' version available yet for Web-service providers using Python ZSI for their SOAP stack. It may, however, become available soon. Please contact with requests.)

BioXSD 1.0 is available at This is the canonical Schema location to be imported in document XSDs (such as in Web services' WSDLs) or to be synchronised with.

(For Web-service providers using Python ZSI for their SOAP stack: Because some important basic features are missing in the Python Zolera SOAP Infrastructure (ZSI) library, a special version for generating ZSI code is at Do not forget to get the ZSI patch for the empty-complexType bug. This XSD is "SOAP-compatible" with the normal XSD: services in Python should be generated from WSDLs importing the BioXSD-x.x.zsi.Workaround.xsd Schema, but the WSDLs of the deployed services should then import the normal BioXSD-x.x.xsd Schema.)

BioXSD Schemas are available under the Creative Commons BY-ND 3.0 license, with inclusion, extension, and restriction additionally allowed in the user's own XML namespace. Contributions to new canonical versions, in the XML namespace, are welcome under supervision of the BioXSD consortium (in order to keep BioXSD a common, canonical data model).

For release information including CHANGELOGs, please refer to the bottom of the XSD files.


A concise overview of the BioXSD format is at ./QuickReference.

The full technical reference for BioXSD version 1.1 is at ./technicalDocumentation/BioXSD-1.1.

Documentation of BioXSD version 1.0 is available at ./technicalDocumentation/BioXSD-1.0.

Examples of data

Examples of feature data represented in BioXSD 1.1 format.

Examples of diverse types of bioinformatics data represented in BioXSD 1.0 format are available in an example file. This example file contains examples of sequence records, annotated sequences, and multiple-sequence alignments.

Example workflows

Example workflows (analysis pipelines) combine multiple bioinformatics Web services using BioXSD.

These workflows show that such services interoperate smoothly.

Compliant tools

Thanks to the providers of bioinformatics tools who started adopting BioXSD as pilot users, the number of BioXSD-compatible services and software is rising. The tools adapted so far are:

Web services:


If you have started using BioXSD for your software, services, libraries, or other tools, please let us know by sending an email to For maintenance and support purposes we would love to know about the providers using BioXSD. A registration system will be available in the future.

Ongoing development

Contributions from the community are warmly welcome and needed!

Volunteers are especially welcome to join the 50 BioXSD-compatible Web services challenge.


BioXSD has been and is further being developed as part of multiple collaborative projects. There has never been any funding directed exclusively to BioXSD.

Contributions and advice to the development of BioXSD
CBU, University of Bergen, Norway: Matúš Kalaš, Inge Jonassen, Pål Puntervoll (until 2010; now Uni Miljø, Bergen), Jan Christian Bryne (until 2010; later Oslo University Hospital), Armin Töpfer (until 2011; also CeBiTec, Bielefeld, Germany; later D-BSSE, ETH Zürich, Basel, Switzerland), Prabu Venkataraman (until 2011; later Fiskeridirektoratet, Bergen)
CBS, DTU, Greater Copenhagen, Denmark: Kristoffer Rapacki, Jon Ison (at CBS, DTU since 2014), Edita Karosiene (until 2010)
Oslo University Hospital, Norway: Sveinung Gundersen
IBCP, CNRS, Lyon, France: Christophe Blanchet (now also IFB, Gif-sur-Yvette, France), Alexandre Joseph (until 2010)
EBI, EMBL, Hinxton, U.K.: Jon Ison (until 2010; now see at CBS, DTU above), Rodrigo Lopez
CS, University of Manchester, U.K.: Steve Pettifer
Rostlab, TUM, Greater Munich, Germany: László Kaján (until 2013; now itelligence, Poznań, Poland)
... and multiple supporters at diverse research institutions

Research Council of Norway (to eSysbio, to the FUGE Bioinformatics Platform, and ELIXIR.NO to the Norwegian Bioinformatics Platform)
Villum Foundation (to the Center for Disease Systems Biology)
l'Agence Nationale de la Recherche (to HIPCAL)
Alexander von Humboldt Foundation (through the German Ministry for Research and Education)
European Commission FP6 and FP7 (to EMBRACE and ELIXIR)


Last update: 2014-November-17