About Omnicrobe


In recent years, developments in molecular technologies have led to exponential growth in experimental data and publications, many of which are openly available yet scattered across separate sources. It is therefore now crucial to provide researchers with bioinformatics applications that offer unified access to both data and the related scientific articles. With the help of text mining tools, such applications can rapidly access and process textual data, link it with other data and make the results available for microbiology research and technology.


The Omnicrobe application combines knowledge from the literature with knowledge available in other databases on microbial biodiversity, which makes it possible to compare them for further analysis. The key knowledge concerns microbial biotopes and microbial phenotypes.

  Information extraction

Information extraction tools analyze and standardize information content. They automatically analyze textual descriptions of microorganism biotopes so that biotope descriptions originating from different experiments can be compared at a large scale. Here, analysis means not only extracting the relevant spans of text, but also normalizing or categorizing them against reference resources (e.g. taxonomy of organisms, ontology of habitats, ontology of phenotypes, ontology of physico-chemical properties, etc.). Information retrieval is supported by a powerful semantic search engine that enables ontology-based queries.
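To illustrate what an ontology-based query can mean, here is a minimal sketch under stated assumptions: a query on a class also returns documents annotated with any of its subclasses. The ontology fragment, the documents and the helpers `descendants` and `search` are all hypothetical; this is a generic illustration, not a description of how the Omnicrobe search engine is implemented.

```python
from collections import defaultdict

# Hypothetical fragment of a habitat ontology: each class maps to its parent
# (an "is-a" link). These class names are illustrative, not taken from OntoBiotope.
PARENT = {
    "dairy farm": "farm",
    "poultry farm": "farm",
    "farm": "agricultural habitat",
}

# Hypothetical documents, each annotated with normalized habitat classes.
ANNOTATIONS = {
    "doc1": {"dairy farm"},
    "doc2": {"poultry farm"},
    "doc3": {"cheese"},
}


def descendants(term: str) -> set[str]:
    """Return the class itself plus every class below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in PARENT.items():
        children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term: str) -> list[str]:
    """Return documents annotated with the queried class or any subclass of it."""
    targets = descendants(term)
    return sorted(doc for doc, classes in ANNOTATIONS.items() if classes & targets)


print(search("farm"))        # ['doc1', 'doc2']
print(search("dairy farm"))  # ['doc1']
```

A plain keyword search for "farm" would miss a document that only mentions a dairy farm under another wording; querying through the hierarchy retrieves it because its annotation is a subclass of the queried class.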

More on Omnicrobe

  Public data

The information managed by the Omnicrobe application is publicly accessible online. It offers numerous cross-disciplinary uses in fields such as food security, ecology and human health. Omnicrobe's main source of information is scientific references from PubMed. Omnicrobe also includes an increasing volume of textual and non-textual information from relevant biological databases such as Biological Resource Centers (e.g. INRAE CIRM, DSMZ) and major genetic databases (GenBank). The available data is the result of an automatic, predictive text-mining process. Users must be aware that this information is not curated.

  Text mining process

The text-mining process behind Omnicrobe has been set up by INRAE using the AlvisNLP environment. It consists of extracting the relevant information, mostly textual, from scientific literature and databases. Words or word groups are identified and assigned a type ("habitat", "phenotype", "use" or "taxon"). They are then normalized, meaning they are assigned either a finer category (e.g. cheese as habitat) or an ID that is shared with other public databases (e.g. 1639 is the ID of Listeria monocytogenes in the NCBI taxonomy). Reference semantic resources, such as nomenclatures and ontologies, define these IDs or categories. For example, "Irish dairy farms", "dairy cattle farms" and "dairy farms environment" are all mapped to the same habitat reference class, "dairy farm", of the OntoBiotope ontology.
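The typing and normalization steps described above can be pictured with a minimal sketch. It assumes a hypothetical in-memory lexicon and exact, case-insensitive matching; it is not the AlvisNLP pipeline, and real normalization must also cope with spelling variants and expressions that no lexicon lists. The names `LEXICON`, `Annotation` and `annotate` are illustrative only; the mappings themselves come from the examples in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical miniature reference resources: surface form -> (type, reference).
# Only the NCBI ID 1639 and the "dairy farm" mappings come from the text above.
LEXICON = {
    "listeria monocytogenes": ("taxon", "1639"),
    "irish dairy farms": ("habitat", "dairy farm"),
    "dairy cattle farms": ("habitat", "dairy farm"),
    "dairy farms environment": ("habitat", "dairy farm"),
}


@dataclass
class Annotation:
    span: str         # text as it appears in the source
    entity_type: str  # e.g. "habitat" or "taxon"
    reference: str    # shared ID or reference class


def annotate(span: str) -> Optional[Annotation]:
    """Type and normalize a span by exact (case-insensitive) lookup; None if unknown."""
    entry = LEXICON.get(span.strip().lower())
    if entry is None:
        return None
    entity_type, reference = entry
    return Annotation(span, entity_type, reference)


for text in ["Irish dairy farms", "dairy cattle farms", "Listeria monocytogenes"]:
    a = annotate(text)
    print(text, "->", a.entity_type, a.reference)
```

Because the three dairy-farm variants resolve to the same reference class, descriptions that use different wording can still be grouped and compared.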

  Who benefits?

Researchers - A rapid overview of microorganisms and their functions in their ecosystems, leading to a better ability to understand, control and use them in food processing.
Agro-industrial Technology Institutes - More efficient access to research results, thereby boosting innovation.
Agrofood Companies and artisans - Information that helps identify food microbes more quickly, thereby increasing food safety and speeding up the development of new products.
Food Safety Agencies - A better ability to determine which microbes might interfere with food products and where harmful microbes originate.

  How to cite Omnicrobe?

The Omnicrobe database and the associated data are free to use and available under the CC-BY license. If you share or adapt them, you must give appropriate credit, i.e. provide a link to the license, indicate whether changes were made and cite the paper: Dérozier S, Bossy R, Deléger L, Ba M, Chaix E, et al. (2023) Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach. PLOS ONE 18(1): e0272473. https://doi.org/10.1371/journal.pone.0272473.

About us

  Core group

  • Text-mining team: Bibliome at MaIAGE (Applied Mathematics and Computer Science, from Genomes to the Environment), INRAE, Université Paris-Saclay
  • Bioinformatics and Statistics for omics data team: StatInfOmics at MaIAGE, INRAE, Université Paris-Saclay
  • Bioinformatics platform: Migale at MaIAGE, INRAE, Université Paris-Saclay
  • Microbiology team: Microbiology of Milk and Egg Sectors at STLO, INRAE, AgroCampus Ouest
  Participant teams

  • UR BIA, INRAE, Nantes
  • URTAL, INRAE, Poligny
  • UMR GMPA, INRAE, Grignon
  • UMR SPO, INRAE, Montpellier
  • LRF, INRAE, Aurillac
  • CIRM BIA, INRAE, Rennes
  • DipSO, INRAE, Université Paris-Saclay, Versailles
  • CIRM Levures, INRAE, Grignon
  • MICALIS, INRAE, Université Paris-Saclay, Jouy-en-Josas
  Bibliography

    Omnicrobe database


    Data production by text-mining and ontologies



    Analysis of needs


    Using Omnicrobe for knowledge discovery

       Oral presentation at meetings

    Terms of use and Copyright

    Access to Omnicrobe is free for academic/non-commercial users. The Omnicrobe web pages can be browsed and all text downloads can be freely copied. Republication of information is permitted provided that the source is indicated (see the Source column in the Omnicrobe interface). Redistribution or commercial use of data from BacDive requires written permission from the Leibniz-Institut DSMZ.


  • PubMed: information is available under CC-BY License. PubMed must be cited as the source.
  • GenBank: NCBI places no restrictions on the use or distribution of the GenBank data.
  • BacDive: the use of information is permitted provided that BacDive is indicated as the source, or with prior approval from the Leibniz-Institut DSMZ. Any redistribution of data or commercial use requires written permission from the Leibniz-Institut DSMZ.
  • CIRM-BIA: information is available under CC-BY License. CIRM-BIA must be cited as the source.
  • CIRM-CFBP: information is available under CC-BY License. CIRM-CFBP must be cited as the source.
  • CIRM-Levures: information is available under CC-BY License. CIRM-Levures must be cited as the source.
  Disclaimer

    The authors of the Omnicrobe information have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners who implicitly or explicitly participated in the creation and publication of this information accept any responsibility for consequences that might arise from using its content.


    Logo vector created by kreativkolors - www.freepik.com