Development of a Unified Output Format for Text Parsers in the Ontology Construction System From Text Documents

Andrii Chornyi; D. G. Dosyn

The problem of effectively constructing ontologies from text documents remains unresolved, leaving a critical gap in modern knowledge extraction methodologies. One of the primary obstacles is the lack of a standardized output format across NLP tools, particularly text parsers, which form the first step of multi-stage knowledge extraction pipelines. Several widely used text parsers exist, and each excels at particular tasks, so combining multiple parsers can yield more comprehensive ontology construction. This approach, however, raises the problem of reconciling their disparate output formats.
To address this challenge, we propose storing parser outputs in a graph database as subject-predicate-object triples, so that the outputs of different parsers can be integrated and further processed by rule-based transformations written as SPARQL queries. A key advantage of this approach is that new transformation rules can be executed dynamically, which makes ontology generation more flexible and efficient.
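As a rough illustration of the triple-based unification, the Java sketch below converts one parser's dependency edges into subject-predicate-object triples and applies a single transformation rule to them. It is a minimal sketch under stated assumptions, not the authors' implementation: an in-memory list stands in for the graph database, the `dep:` predicate names and the `DepEdge` input type are hypothetical, and a plain Java method emulates what would be a SPARQL `CONSTRUCT` query over a real triple store.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of a unified triple format for parser output.
 * Assumptions (hypothetical, not from the paper): an in-memory list stands in
 * for the graph database, the "dep:" predicate names are invented, and a Java
 * method stands in for a SPARQL CONSTRUCT rule.
 */
public class TripleStoreSketch {

    /** One subject-predicate-object statement. */
    record Triple(String s, String p, String o) {}

    /** Hypothetical dependency edge as one parser might emit it: head, relation, dependent. */
    record DepEdge(String head, String rel, String dependent) {}

    /** Converts parser-specific edges into the shared triple format. */
    static List<Triple> fromDependencies(List<DepEdge> edges) {
        List<Triple> out = new ArrayList<>();
        for (DepEdge e : edges) {
            out.add(new Triple(e.head(), "dep:" + e.rel(), e.dependent()));
        }
        return out;
    }

    /**
     * Example transformation rule, roughly equivalent to the SPARQL query
     *   CONSTRUCT { ?subj ?verb ?obj }
     *   WHERE { ?verb dep:nsubj ?subj . ?verb dep:obj ?obj }
     * A verb with both a subject and an object becomes one semantic triple.
     */
    static List<Triple> subjectVerbObject(List<Triple> graph) {
        List<Triple> result = new ArrayList<>();
        for (Triple subj : graph) {
            if (!subj.p().equals("dep:nsubj")) {
                continue;
            }
            for (Triple obj : graph) {
                // Join on the shared verb node, as the two WHERE patterns do.
                if (obj.p().equals("dep:obj") && obj.s().equals(subj.s())) {
                    result.add(new Triple(subj.o(), subj.s(), obj.o()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "The parser extracts relations." as head-relation-dependent edges.
        List<DepEdge> edges = List.of(
                new DepEdge("extracts", "nsubj", "parser"),
                new DepEdge("extracts", "obj", "relations"),
                new DepEdge("parser", "det", "The"));
        List<Triple> graph = fromDependencies(edges);
        System.out.println(subjectVerbObject(graph));
    }
}
```

Running `main` prints `[Triple[s=parser, p=extracts, o=relations]]`. Because a second parser would only need its own `from...` converter into the same `Triple` type, every rule written against the triple format applies to all parsers; in a real graph database the rule would presumably be stored as query text, so it could be changed without recompiling the agent.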
As part of our research, we developed an intelligent agent in Java that constructs semantic graphs from natural language text using a rule-based approach. We used the agent to evaluate how the execution time of syntax-semantic transformation rules depends on variables such as text corpus size and dataset sample dimensions. This evaluation was made possible by implementing first-level reflection for the transformation rule under study.
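The kind of measurement described above can be sketched as a small timing harness. The example is hypothetical: it looks up a transformation rule by name with Java's `java.lang.reflect` API (our stand-in for the paper's first-level reflection, whose exact mechanism is not described in this abstract), runs it on synthetic corpora of increasing size, and reports the elapsed time. The rule `countTokenChanges` and the generated tokens `w0` to `w99` are placeholders, not the authors' rules or data.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical timing harness for one transformation rule.
 * The rule is resolved by name through Java reflection (an assumption, used
 * as a stand-in for first-level reflection) and timed on corpora of growing size.
 */
public class RuleTimingSketch {

    /** Placeholder linear-time rule: counts positions where the token changes. */
    public static int countTokenChanges(List<String> tokens) {
        int changes = 0;
        for (int i = 1; i < tokens.size(); i++) {
            if (!tokens.get(i - 1).equals(tokens.get(i))) {
                changes++;
            }
        }
        return changes;
    }

    /** Builds a synthetic corpus of n tokens cycling through w0 to w99. */
    static List<String> syntheticCorpus(int n) {
        List<String> tokens = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tokens.add("w" + (i % 100));
        }
        return tokens;
    }

    /** Resolves the named rule via reflection, runs it once, and returns elapsed nanoseconds. */
    static long timeRule(String ruleName, List<String> corpus) throws Exception {
        Method rule = RuleTimingSketch.class.getMethod(ruleName, List.class);
        long start = System.nanoTime();
        rule.invoke(null, corpus); // static method, so no receiver object
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            long elapsed = timeRule("countTokenChanges", syntheticCorpus(n));
            System.out.printf("corpus size %d tokens: %d microseconds%n", n, elapsed / 1_000);
        }
    }
}
```

Single `System.nanoTime` measurements are noisy because of JIT warm-up, so a more careful evaluation would repeat each run and discard the first iterations, or use a benchmarking harness such as JMH. Selecting the rule by name is what allows the same harness to time any newly added rule without changing the measurement code.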
The results demonstrate that our approach, standardizing parser outputs via a graph database, proves effective in terms of both computational complexity and processing speed. By streamlining the ontology construction process, the method paves the way for automated learning of intelligent agents from textual information, opening new possibilities in knowledge extraction and representation.

Keywords: natural language processing; ontology; automatic ontology construction; automated learning; syntax-semantic patterns
