Omnicrobe

About Omnicrobe

Context

In recent years, developments in molecular technologies have led to an exponential growth in experimental data and publications, many of which are openly available but scattered across separate sources. It is therefore crucial to provide researchers with bioinformatics applications that offer unified access to both the data and the related scientific articles. Text-mining tools make it possible to rapidly access and process textual data, link it with other data and make the results available for microbiology research and technology.

Goal

The Omnicrobe application combines knowledge extracted from the literature with knowledge available in other microbial biodiversity databases, which makes it possible to compare them for further analysis. The key knowledge covers microbial biotopes and microbial phenotypes.

Information extraction

Information extraction tools analyze and standardize information content. They automatically analyze textual descriptions of microorganism biotopes so that biotope descriptions originating from different experiments can be compared on a large scale. Here, analysis means not only extracting the relevant spans of text, but also normalizing or categorizing them with reference resources (e.g. taxonomy of organisms, ontology of habitats, ontology of phenotypes, ontology of physico-chemical properties). Information retrieval is supported by a semantic search engine that enables ontology-based queries.
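To illustrate what an ontology-based query means, here is a minimal Python sketch, not Omnicrobe's actual search engine: a query for a habitat class also returns records annotated with any of its subclasses. The class hierarchy, the records and the functions `descendants` and `query` are all hypothetical and chosen only for illustration; they are not taken from OntoBiotope or from Omnicrobe data.

```python
from collections import defaultdict

# Hypothetical toy is-a hierarchy (child -> parent). Illustrative only,
# not the real structure of the OntoBiotope ontology.
PARENT = {
    "dairy farm": "farm",
    "poultry farm": "farm",
    "farm": "agricultural habitat",
    "cheese": "dairy product",
    "dairy product": "food",
}

# Hypothetical extracted records: (taxon, normalized habitat class).
RECORDS = [
    ("Taxon A", "dairy farm"),
    ("Taxon A", "cheese"),
    ("Taxon B", "poultry farm"),
]


def descendants(cls):
    """Return cls together with every class below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in PARENT.items():
        children[parent].add(child)
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result


def query(habitat_class):
    """Return records whose habitat is habitat_class or one of its subclasses."""
    wanted = descendants(habitat_class)
    return [record for record in RECORDS if record[1] in wanted]
```

For example, `query("farm")` returns the records for both "dairy farm" and "poultry farm", even though neither record is annotated with "farm" itself; a plain keyword search would miss them.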

More on Omnicrobe

Public data

The information managed by the Omnicrobe application is publicly accessible online. It can be used in many fields, such as food security, ecology and human health. Omnicrobe's main source of information is scientific references from PubMed. Omnicrobe also includes a growing volume of textual and non-textual information from relevant biological databases, such as those of Biological Resource Centers (e.g. INRAE CIRM, DSMZ), and from major genetic databases (GenBank). The available data are the result of an automatic, predictive text-mining process. Users must be aware that this information is not curated.

Text mining process

The text-mining process behind Omnicrobe was set up by INRAE using the AlvisNLP environment. It consists of extracting relevant information, mostly textual, from the scientific literature and databases. Words or word groups are identified and assigned a type ("habitat", "phenotype", "use" or "taxon"). They are then normalized, meaning they are assigned either a finer category (e.g. cheese as a habitat) or an ID that is shared with other public databases (e.g. 1639 is the ID of Listeria monocytogenes in the NCBI taxonomy). Reference semantic resources, such as nomenclatures and ontologies, define these IDs and categories. For example, "Irish dairy farms", "dairy cattle farms" and "dairy farms environment" are all mapped to the same habitat reference class, "dairy farm", in the OntoBiotope ontology.
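The typing and normalization steps can be pictured as a lookup from surface forms to typed references. The Python sketch below is a deliberately simplified, dictionary-based illustration, not the AlvisNLP pipeline: the `LEXICON` table, the `Entity` class, the `normalize` function and the `NCBITaxon:` prefix are hypothetical, while the example forms, the "dairy farm" class and the taxon ID 1639 come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    text: str         # span as it appears in the source text
    entity_type: str  # "habitat", "phenotype", "use" or "taxon"
    reference: str    # normalized category or shared external ID


# Hypothetical toy lexicon: normalized surface form -> (type, reference).
# "NCBITaxon:" is a prefix chosen for this sketch to mark NCBI Taxonomy IDs.
LEXICON = {
    "irish dairy farms": ("habitat", "dairy farm"),
    "dairy cattle farms": ("habitat", "dairy farm"),
    "dairy farms environment": ("habitat", "dairy farm"),
    "cheese": ("habitat", "cheese"),
    "listeria monocytogenes": ("taxon", "NCBITaxon:1639"),
}


def normalize(span: str) -> Optional[Entity]:
    """Type and normalize a text span by exact lookup; None if unknown."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(span.lower().split())
    hit = LEXICON.get(key)
    if hit is None:
        return None
    entity_type, reference = hit
    return Entity(text=span, entity_type=entity_type, reference=reference)
```

With this table, `normalize("Irish dairy farms")` and `normalize("dairy cattle farms")` both yield the reference "dairy farm", so the two mentions can be grouped. A real system must also handle variants that are not listed, which an exact-match dictionary cannot do.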

Who benefits?

Researchers - A rapid overview of microorganisms and their functions in their ecosystems, leading to a better ability to understand, control and use them in food processing.
Agro-industrial technology institutes - More efficient access to research results, which boosts innovation.
Agri-food companies and artisans - Information that helps identify food microbes more quickly, thereby increasing food safety and speeding up the development of new products.
Food safety agencies - A better ability to determine which microbes might affect food products and where harmful microbes originate.

How to cite Omnicrobe?

The Omnicrobe database and the associated data are free to use and available under the CC-BY license. If you share or adapt them, you must give appropriate credit, i.e. provide a link to the license, indicate whether changes were made and cite the paper: Dérozier S, Bossy R, Deléger L, Ba M, Chaix E, et al. (2023) Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach. PLOS ONE 18(1): e0272473. https://doi.org/10.1371/journal.pone.0272473.

About us

Core group

Text-mining team: Bibliome at MaIAGE (Applied Mathematics and Computer Science, from Genomes to the Environment), INRAE, Université Paris-Saclay

Bioinformatics and Statistics for omics data team: StatInfOmics at MaIAGE, INRAE, Université Paris-Saclay

Bioinformatics platform: Migale at MaIAGE, INRAE, Université Paris-Saclay

Microbiology team: Microbiology of Milk and Egg Sectors at STLO, INRAE, AgroCampus Ouest

Participant teams

UR BIA, INRAE, Nantes

UMR SECALIM, INRAE, Nantes

URTAL, INRAE, Poligny

UMR GMPA, INRAE, Grignon

UMR SPO, INRAE, Montpellier

LRF, INRAE, Aurillac

CIRM BIA, INRAE, Rennes

DipSO, INRAE, Université Paris-Saclay, Versailles

CIRM Levures, INRAE, Grignon

MICALIS, INRAE, Université Paris-Saclay, Jouy-en-Josas

Bibliography

Omnicrobe database

Posters

Dérozier, S., Bossy, R., Deléger, L., Ba, M., Chaix, E., Loux, V., Falentin, H., Nédellec, C. Omnicrobe, an open-access database of microbial habitats, phenotypes and uses extracted from text. Presented at JOBIM 2022, Rennes (2022-07-05 - 2022-07-08).
Falentin, H., Harlé, O., Dérozier, S., Deléger, L., Chaix, E., Ba, M., Bossy, R., Loux, V., Nédellec, C. Omnicrobe : une base de données d’habitats et de phénotypes microbiens. Presented at 23ème édition du colloque du Club des Bactéries Lactiques, Rennes (2022-06-08 - 2022-06-10).
Dérozier, S., Deléger, L., Chaix, E., Mekdad, R., Ba, M., Bossy, R., Sicard, D., Loux, V., Falentin, H. & Nédellec, C. Florilege: a database gathering microbial habitats, phenotypes and uses. Presented at JOBIM 2020, virtual edition (2020-06-30 - 2020-07-03).
Chaix, E., Dérozier, S., Deléger, L., Falentin, H., Bohuon, J.B., Ba, M., Bossy, R., Sicard, D., Loux, V. & Nédellec, C. Florilege: an integrative database using text mining and ontologies. In: Abstract JOBIM 2018 (p. 563). Presented at JOBIM 2018, Marseille, FRA (2018-07-03 - 2018-07-06).
Falentin, H., Chaix, E., Dérozier, S., Weber, M., Buchin, S., Dridi, B., Deutsch, S.-M., Valence-Bertel, F., Casaregola, S., Renault, P., Champomier-Vergès, M.-C., Thierry, A., Zagorec, M., Irlinger, F., Delbès, C., Aubin, S., Bessières, P., Loux, V., Bossy, R., Dibie, J., Sicard, D., Nédellec, C. Florilege: a database gathering microbial phenotypes of food interest. Presented at 4th International Conference on Microbial Diversity 2017, Bari, ITA (2017-10-24 - 2017-10-26).

Data production by text-mining and ontologies

Publications

Bossy, R., Deléger, L., Chaix, E., Ba, M., Nédellec, C. (2019). Bacteria Biotope at BioNLP Open Shared Tasks 2019. In: Proceedings of the 5th Workshop on BioNLP Open Shared Tasks, joint to EMNLP-IJCNLP 2019, Hong Kong, November 2019. DOI: 10.18653/v1/D19-5719.
Chaix, E., Deléger, L., Bossy, R., Nédellec, C. (2018). Text-mining tools for extracting information about microbial biodiversity in food. Food Microbiology, 1-13. DOI: 10.1016/j.fm.2018.04.011.
Nédellec, C., Chaix, E., Bossy, R., Deléger, L., Dérozier, S., Bohuon, J.-B., Loux, V. (2018). L'ontologie OntoBiotope pour l'étude de la biodiversité microbienne. Actes des Journées Francophones d'Extraction et de Gestion des Connaissances (EGC'2018), Paris, janvier 2018. pp.353-358.
Nédellec, C., Bossy, R., Chaix, E., Deléger, L. (2017). Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity. In: Proceedings of the 4th International Microbial Diversity Conference (pp. 221-227), ed. Marco Gobbetti. Bari: Simtra. ISBN 978-88-943010-0-7, arXiv:1805.04107.

Report

Bossy, R., Nédellec, C., Jourde, J., Ba, M., Chaix, E., Deléger, L. (2019). Bacteria biotope annotation guidelines. 30 p. hal-02787110.

Analysis of needs

Publications

Chaix, E., Aubin, S., Deléger, L., Nédellec, C. (2017). Text-mining needs of the food microbiology research community. Presented at 2017 EFITA WCCA Congress, Montpellier, FRA (2017-07-02 - 2017-07-06).
Przybyła, P., Shardlow, M., Aubin, S., Bossy, R., de Castilho, R. E., Piperidis, S., McNaught, J., Ananiadou, S. (2016). Text mining resources for the life sciences. Database, November (25), 1-30. DOI: 10.1093/database/baw145.

Using Omnicrobe for knowledge discovery

Oral presentation at meetings

Hélène Falentin. Florilège : une base de données de phénotypes microbiens d’intérêt agro-alimentaire. Journées Qualiment, 4 February 2020, Paris, France.
Hélène Falentin, Stéphanie-Marie Deutsch, Valérie Gagnaire, Anne Thierry, Sandra Dérozier, Claire Nédellec. Bioinformatics tools as a way to select microbial strains for fermented food products. 15th Symposium on Bacterial Genetics and Ecology "Ecosystem drivers in a changing planet" (BAGECO), 27 May 2019, Lisbon, Portugal.
Hélène Falentin, Claire Nédellec, Estelle Chaix, Bedis Dridi, Philippe Bessières, et al. Florilege: a database gathering microbial phenotypes of food interest. 2017 Scientific MEM days: Journées scientifiques MEM (Métaomiques et écosystèmes microbiens), Jan 2017, Paris, France.

Terms of use and Copyright

Access to Omnicrobe is free for academic/non-commercial users. The Omnicrobe webpage can be browsed and all text downloads can be freely copied. Republication of information is permitted provided that the source is indicated (see the Source column in the Omnicrobe interface). Redistribution or commercial use of data from BacDive requires written permission from the Leibniz-Institut DSMZ.

Sources

PubMed: information is available under CC-BY License. PubMed must be cited as the source.

GenBank: NCBI places no restrictions on the use or distribution of GenBank data.

BacDive: the use of information is permitted provided that BacDive is indicated as the source, or with prior approval from the Leibniz-Institut DSMZ. Any redistribution of data or commercial use requires written permission from the Leibniz-Institut DSMZ.

CIRM-BIA: information is available under CC-BY License. CIRM-BIA must be cited as the source.

CIRM-CFBP: information is available under CC-BY License. CIRM-CFBP must be cited as the source.

CIRM-Levures: information is available under CC-BY License. CIRM-Levures must be cited as the source.

Disclaimer

The authors of the Omnicrobe information have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners who implicitly or explicitly participated in the creation and publication of this information accept any responsibility for consequences arising from the use of its content.

Credits

Logo vector created by kreativkolors - www.freepik.com