The article outlines the tasks, approaches and stages involved in developing a technology for parsing the text of a multilingual explanatory terminology dictionary. The research was conducted on the “Dictionary of Ukrainian Biological Terminology”. This dictionary was chosen from the wide variety of available dictionaries because terminology dictionaries provide a lexical-semantic basis for building systems that intelligently process professional texts, which provide information on specific subject areas.
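The parsing of a single dictionary entry can be illustrated with a minimal sketch. The entry layout it assumes (a headword, optional foreign-language equivalents in parentheses, then a colon and the definition) is hypothetical and is not the actual markup of the “Dictionary of Ukrainian Biological Terminology”. The names `ENTRY_RE`, `Entry` and `parse_entry` are likewise illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entry layout (an assumption, not the dictionary's real markup):
#   HEADWORD (lang1: equivalent; lang2: equivalent): definition text
ENTRY_RE = re.compile(
    r"^(?P<head>[^(:]+?)\s*"          # headword, up to a parenthesis or colon
    r"(?:\((?P<equivs>[^)]*)\))?\s*"  # optional block of foreign equivalents
    r":\s*(?P<definition>.+)$"        # definition text after the colon
)


@dataclass
class Entry:
    headword: str
    equivalents: dict = field(default_factory=dict)  # language code -> term
    definition: str = ""


def parse_entry(line):
    """Split one entry line into headword, equivalents and definition."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    equivalents = {}
    if m.group("equivs"):
        # Equivalents are separated by ';', each written as 'lang: term'.
        for part in m.group("equivs").split(";"):
            lang, _, term = part.partition(":")
            if term.strip():
                equivalents[lang.strip()] = term.strip()
    return Entry(m.group("head").strip(), equivalents, m.group("definition").strip())
```

For example, `parse_entry("клітина (lat: cellula; en: cell): структурна одиниця живого організму")` yields the headword `клітина`, the equivalents `{"lat": "cellula", "en": "cell"}` and the definition text. A real dictionary would need rules matched to its actual typography.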
The article addresses the scientific and practical problem of automatically detecting meaningful keywords and categorizing Ukrainian content in Internet systems on the basis of linguistic analysis of textual information. It presents a theoretical and experimental substantiation of methods for the linguistic analysis of Ukrainian content using Porter stemming.
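The principle behind Porter-style stemming for keyword detection can be shown in a minimal sketch: strip the longest listed ending that lies after the first vowel (the so-called RV region), then count the resulting stems. The ending list, the stop-word list and the minimum stem length below are small illustrative assumptions; a published Ukrainian Porter stemmer uses a considerably larger, ordered rule set.

```python
import re
from collections import Counter

VOWELS = set("аеиіоуюяєї")

# Illustrative subset of Ukrainian inflectional endings (an assumption, not the
# full rule set of a published stemmer). Longest endings are tried first.
ENDINGS = sorted(
    ["ами", "ями", "ові", "ого", "ому", "ими", "ють", "ять",
     "ій", "ий", "ів", "ах", "ях", "ом", "ем", "ою", "ею",
     "а", "я", "и", "і", "у", "ю", "о", "е"],
    key=len,
    reverse=True,
)
MIN_STEM = 2  # hypothetical minimum stem length
STOPWORDS = {"і", "й", "та", "а", "в", "у", "на", "з", "що", "це", "для"}  # hypothetical


def rv_start(word):
    """Index where the RV region begins: just after the first vowel."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return i + 1
    return len(word)


def stem(word):
    """Strip the longest listed ending that lies entirely inside RV."""
    word = word.lower().replace("ʼ", "").replace("'", "")
    limit = max(rv_start(word), MIN_STEM)
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= limit:
            return word[: -len(ending)]
    return word


def keywords(text, top_n=5):
    """Return the most frequent stems of content words in the text."""
    tokens = re.findall(r"[а-яіїєґʼ']+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOPWORDS and len(t) > 2]
    return Counter(stems).most_common(top_n)
```

Because inflected forms such as *клітина* and *клітини* reduce to the same stem *клітин*, their occurrences are counted together, which is what makes stemming useful for keyword detection and content categorization.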
The article considers an algorithm for assessing post quality based on a set of selected parameters. To solve this problem, the following tools are used: Jsoup, a Java library for parsing HTML code, and MATLAB tools for building the decision tree that assesses post quality.
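The two stages of this pipeline (HTML parsing, then classification with a decision tree) can be sketched as follows. Several assumptions apply: Python's standard `html.parser` stands in for Jsoup, the three features (word, link and image counts) are hypothetical, and the thresholds in `assess_quality` are invented placeholders rather than splits learned in MATLAB. A trained tree would replace that function.

```python
from html.parser import HTMLParser


class PostFeatureParser(HTMLParser):
    """Collects simple, hypothetical quality features from a post's HTML."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Counts whitespace-separated words in all text nodes
        # (this simple sketch does not skip <script> or <style> content).
        self.words += len(data.split())


def extract_features(html):
    parser = PostFeatureParser()
    parser.feed(html)
    parser.close()
    return {"words": parser.words, "links": parser.links, "images": parser.images}


def assess_quality(f):
    """Hand-written decision tree with placeholder splits (assumptions)."""
    if f["words"] < 50:
        return "low"      # very short posts
    if f["links"] > 10:
        return "low"      # link-heavy posts treated as likely spam
    if f["images"] >= 1 or f["words"] >= 300:
        return "high"
    return "medium"
```

Keeping feature extraction separate from classification mirrors the split between the parsing library and the decision-tree tool: the tree can be retrained or swapped without touching the HTML-handling code.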
Several parsing algorithms are presented and described in the article, and the performance of the selected algorithms is compared.
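A performance comparison of this kind can be organized as a small timing harness that runs every parser on the same input and first checks that all of them produce the same result. The two parsers below, for a hypothetical `key=value;key=value` format, are placeholders and not the algorithms studied in the article. The harness uses `timeit.repeat` and keeps the fastest run, following the `timeit` documentation's observation that slower runs mostly reflect interference from other processes.

```python
import re
import timeit


def parse_with_split(text):
    """Placeholder parser 1: split ';'-separated 'key=value' pairs with str methods."""
    return dict(pair.split("=", 1) for pair in text.split(";") if pair)


PAIR_RE = re.compile(r"([^=;]+)=([^;]*)")


def parse_with_regex(text):
    """Placeholder parser 2: parse the same format with a regular expression."""
    return dict(PAIR_RE.findall(text))


def benchmark(parsers, text, repeat=5, number=200):
    """Return the best time per call, in seconds, for each named parser."""
    reference = None
    results = {}
    for name, parse in parsers.items():
        # Check correctness first: all parsers must agree on the same input.
        output = parse(text)
        if reference is None:
            reference = output
        elif output != reference:
            raise ValueError(f"parser {name!r} disagrees with the others")
        totals = timeit.repeat(lambda: parse(text), repeat=repeat, number=number)
        results[name] = min(totals) / number
    return results


if __name__ == "__main__":
    sample = ";".join(f"k{i}=v{i}" for i in range(1000))
    timings = benchmark({"split": parse_with_split, "regex": parse_with_regex}, sample)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Verifying agreement before timing prevents a fast but incorrect parser from appearing to win the comparison.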