An overview of the basic principles and approaches for extracting and processing information from web pages is presented. Based on this analysis of data parsing, a methodology is proposed for developing a client-server system built on Selenium, a tool for automating work in web browsers. A third-party API is used as the user interface to simplify and speed up system development, and it lets users access the system without downloading additional software. Following this methodology, a client-server system of our own was developed that receives, parses, and collects the information presented on web pages. Cloud services were analyzed as candidates for deploying this web data collection system, and the viability of the system was assessed during long-term autonomous operation after deployment in a cloud service.
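The abstract does not specify how page content is processed once Selenium has loaded a page. The sketch below assumes the rendered HTML has already been obtained; in Selenium's Python bindings, `driver.page_source` returns it as a string. The extraction step could then look like the following. The sketch uses Python's standard-library `html.parser` as a stand-in parser. The names `ItemTextParser` and `extract_items`, and the `product` class in the sample markup, are hypothetical and are not taken from the authors' system.

```python
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ItemTextParser(HTMLParser):
    """Collect the text of every element whose class attribute contains target_class.

    Hypothetical helper: the target class is an illustrative assumption.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while inside a matching element
        self.current = []     # text fragments of the element being collected
        self.items = []       # finished, whitespace-normalized texts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            if self.depth:
                self.current.append(" ")  # treat <br>, <img>, ... as a word break
            return
        if self.depth:
            self.depth += 1               # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.items.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_items(html: str, target_class: str) -> list:
    """Return the text of all elements carrying target_class, in document order."""
    parser = ItemTextParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.items


sample = """
<ul>
  <li class="product">Laptop <b>999 USD</b></li>
  <li class="product featured">Phone<br>499 USD</li>
  <li class="ad">Buy now!</li>
</ul>
"""
print(extract_items(sample, "product"))  # → ['Laptop 999 USD', 'Phone 499 USD']
```

In this sketch, Selenium fetches and renders the page, and a separate parsing step extracts the data. This keeps the parsing logic testable without a running browser. Selenium can also select elements directly in the live page, so this is only one possible design.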