Development of a Unified Output Format for Text Parsers in the Ontology Construction System From Text Documents

2025; pp. 170-188
1 Lviv Polytechnic National University, Information Systems and Networks Department, Lviv, Ukraine
2 Lviv Polytechnic National University, Information Systems and Networks Department, Lviv, Ukraine

The problem of effectively constructing ontologies from text documents remains unresolved and represents a critical gap in modern knowledge extraction methodology. One of the primary obstacles is the lack of a standardized output format across NLP tools, particularly text parsers, which serve as the first step in multi-stage knowledge extraction pipelines. Several widely used text parsers exist, and because each excels at specific functions, combining multiple parsers can yield more comprehensive ontologies. However, this approach raises the problem of reconciling their disparate output formats.
To address this challenge, we propose storing parser outputs in a graph database as subject-predicate-object triples, which allows them to be integrated and further processed through rule-based transformations expressed as SPARQL queries. A key advantage of this approach is that new transformation rules can be executed dynamically, giving greater flexibility and efficiency in ontology generation.
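To make the unified format concrete, the following minimal Python sketch maps the outputs of two parsers onto one set of subject-predicate-object triples and serializes them as N-Triples for loading into a triple store. Both input formats (`from_rows`, a CoNLL-like row list, and `from_edges`, a token list with 0-based edge dictionaries), the `hasWord` predicate, the `s1/t1` node naming and the `http://example.org/parse/` base IRI are hypothetical choices for illustration; they are neither the actual output structures of any particular parser nor the authors' schema.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def node(sentence_id: str, index: int) -> str:
    """Unified token identifier: sentence id plus 1-based token position."""
    return f"{sentence_id}/t{index}"


def from_rows(sentence_id: str, rows: List[Tuple[int, str, int, str]]) -> Set[Triple]:
    """Hypothetical format A: (index, word, head, relation) rows, 1-based, head 0 = root."""
    triples: Set[Triple] = set()
    for index, word, head, relation in rows:
        triples.add(Triple(node(sentence_id, index), "hasWord", word))
        if head != 0:  # the root token has no governor edge
            triples.add(Triple(node(sentence_id, head), relation, node(sentence_id, index)))
    return triples


def from_edges(sentence_id: str, tokens: List[str], edges: List[Dict]) -> Set[Triple]:
    """Hypothetical format B: token list plus {"gov", "dep", "type"} edges, 0-based."""
    triples: Set[Triple] = {
        Triple(node(sentence_id, i + 1), "hasWord", word) for i, word in enumerate(tokens)
    }
    for edge in edges:
        # shift 0-based indices to the unified 1-based node identifiers
        triples.add(Triple(node(sentence_id, edge["gov"] + 1), edge["type"],
                           node(sentence_id, edge["dep"] + 1)))
    return triples


def to_ntriples(triples: Set[Triple], base: str = "http://example.org/parse/") -> str:
    """Serialize to N-Triples; the base IRI is a placeholder, words become string literals."""
    lines = []
    for s, p, o in sorted(triples):
        if p == "hasWord":
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{base}{o}>"
        lines.append(f"<{base}{s}> <{base}{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    a = from_rows("s1", [(1, "Agents", 2, "nsubj"), (2, "read", 0, "root"), (3, "texts", 2, "obj")])
    b = from_edges("s1", ["Agents", "read", "texts"],
                   [{"gov": 1, "dep": 0, "type": "nsubj"}, {"gov": 1, "dep": 2, "type": "obj"}])
    print(a == b)  # both parsers' outputs collapse onto the same triples
    print(to_ntriples(a))
```

Running it prints `True` followed by five N-Triples lines. Once both outputs share one triple vocabulary, downstream rules can operate on the graph without knowing which parser produced a given triple.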
As part of our research, we developed an intelligent agent in Java that constructs semantic graphs from natural language text using a rule-based approach. We used the agent to evaluate how the execution time of syntax-semantic transformation rules depends on variables such as text corpus size and dataset sample dimensions. This evaluation was made possible by implementing first-level reflection for the transformation rule under study.
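The rule-based transformation step can be illustrated with a deliberately naive, in-memory Python sketch. It is not the authors' Java agent and does not use a real SPARQL engine: `match` and `solve` hand-roll the evaluation of a basic graph pattern (the role played by the WHERE block of a SPARQL CONSTRUCT query), and the `Rule` class, the subject-verb-object rule `SVO_WHERE`/`SVO_CONSTRUCT` and the synthetic `toy_corpus` generator are all hypothetical. Each rule records its own execution time per run, which is only a loose analogue of the first-level reflection mentioned above; it shows one way timing can be collected against corpus size.

```python
import time
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern ('?x' terms are variables) with one stored triple."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None
    return extended


def solve(patterns: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    """Naive backtracking evaluation of a basic graph pattern (full scan per pattern)."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(patterns[1:], graph, extended)


class Rule:
    """Hypothetical rule: WHERE patterns plus a CONSTRUCT template; times its own runs."""

    def __init__(self, name: str, where: List[Triple], construct: List[Triple]) -> None:
        self.name = name
        self.where = where
        self.construct = construct
        self.timings: List[Tuple[int, float]] = []  # (input triples, seconds)

    def apply(self, graph: Set[Triple]) -> Set[Triple]:
        start = time.perf_counter()
        result: Set[Triple] = set()
        for binding in solve(self.where, graph, {}):
            for template in self.construct:
                # substitute bound variables; constants pass through unchanged
                s, p, o = (binding.get(term, term) for term in template)
                result.add((s, p, o))
        self.timings.append((len(graph), time.perf_counter() - start))
        return result


# Hypothetical syntax-to-semantics rule: subject-verb-object from nsubj/obj edges.
SVO_WHERE: List[Triple] = [
    ("?v", "nsubj", "?s"), ("?v", "obj", "?o"),
    ("?v", "hasWord", "?verb"), ("?s", "hasWord", "?subj"), ("?o", "hasWord", "?obj"),
]
SVO_CONSTRUCT: List[Triple] = [("?subj", "?verb", "?obj")]


def toy_corpus(n_sentences: int) -> Set[Triple]:
    """Synthetic dependency triples for n sentences of the form 'agentI reads textI'."""
    graph: Set[Triple] = set()
    for i in range(n_sentences):
        t1, t2, t3 = (f"s{i}/t{k}" for k in (1, 2, 3))
        graph |= {
            (t1, "hasWord", f"agent{i}"), (t2, "hasWord", "reads"), (t3, "hasWord", f"text{i}"),
            (t2, "nsubj", t1), (t2, "obj", t3),
        }
    return graph


if __name__ == "__main__":
    svo = Rule("svo", SVO_WHERE, SVO_CONSTRUCT)
    for n in (10, 20, 40):
        svo.apply(toy_corpus(n))
    for size, seconds in svo.timings:
        print(f"{size:>4} input triples: {seconds * 1000:.2f} ms")
```

Because every pattern triggers a full scan here, the toy's running time grows much faster than a database-backed version would; a graph database would typically answer the equivalent SPARQL query using its indexes, so timings from this sketch are not comparable with the measurements reported in the paper.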
The results demonstrate that our approach, standardizing parser outputs via a graph database, proves effective in terms of both computational complexity and processing speed. By streamlining the ontology construction process, our method paves the way for advanced automated learning of intelligent agents from textual information and opens new possibilities in knowledge extraction and representation.
