COMBINED DATA PARTITIONING METHOD FOR BIG DATA IN INFORMATION SYSTEMS

Volodymyr Solohub; Mykola Beshley

As data volumes grow rapidly, information systems must not only store and serve massive datasets but also maintain stable performance across diverse query types. A critical challenge is balancing the efficiency of analytical (OLAP) operations with the responsiveness of transactional (OLTP) processing. Traditional approaches to data organization in relational DBMSs often lose effectiveness at large scale, resulting in longer processing times, reduced flexibility, and more complex database management. This motivates new partitioning optimization methods that deliver both high performance and scalability in hybrid information systems. This article investigates existing data partitioning methods in information systems that handle large volumes of structured data while simultaneously serving OLAP and OLTP workloads. The table partitioning mechanisms of modern DBMSs are analyzed, and the strengths and limitations of each approach are identified with respect to performance, scalability, and ease of data management. A combined partitioning method (range + list) is proposed for hybrid information systems that concurrently process analytical and transactional workloads. Unlike traditional approaches, the proposed method goes beyond applying combined partitioning to analytical tasks: its impact on both query performance and transaction processing speed is evaluated comprehensively. To conduct the study, a unified simulation model of data processing was built on a star schema with a sales fact table, supporting both business analytics queries and transactional CRUD operations. The results show that the method balances OLAP and OLTP performance, improves the scalability and flexibility of information systems, and can be considered a universal approach to managing large-scale data. Experimental findings confirm that combined partitioning reduces analytical query execution time by 30–40% without significant degradation of CRUD performance, making it an effective tool for improving the performance of large-scale information systems.
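
The combined (range + list) idea can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the partitioning columns (`sale_date` for the range level, `region` for the list level), the quarterly boundaries, the region groups, and the `PartitionedSales` class are all assumptions chosen for illustration. The sketch shows how a row is routed to a (range, list) partition on insert and how an analytical query can skip partitions that cannot contain matching rows.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical range level: quarterly partitions over sale_date.
# Dates before the first bound land in "q1", dates on or after the last in "q4".
RANGE_BOUNDS = [date(2024, 4, 1), date(2024, 7, 1), date(2024, 10, 1)]
RANGE_NAMES = ["q1", "q2", "q3", "q4"]

# Hypothetical list level: region codes grouped into sub-partitions.
LIST_GROUPS = {"eu": {"UA", "PL", "DE"}, "na": {"US", "CA"}}
DEFAULT_GROUP = "other"


def range_partition(sale_date):
    """Return the range partition whose interval contains sale_date."""
    return RANGE_NAMES[bisect_right(RANGE_BOUNDS, sale_date)]


def list_partition(region):
    """Return the list partition that enumerates this region code."""
    for name, members in LIST_GROUPS.items():
        if region in members:
            return name
    return DEFAULT_GROUP


class PartitionedSales:
    """Toy sales fact table split by (range partition, list partition)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Transactional path: one row touches exactly one partition.
        key = (range_partition(row["sale_date"]), list_partition(row["region"]))
        self.partitions[key].append(row)
        return key

    def total_amount(self, start, end, region):
        """Sum amounts for one region with start <= sale_date <= end.

        Returns (total, rows_scanned) so the effect of pruning is visible.
        """
        lo = bisect_right(RANGE_BOUNDS, start)
        hi = bisect_right(RANGE_BOUNDS, end)
        group = list_partition(region)
        total = 0
        scanned = 0
        # Analytical path: visit only the range partitions overlapping the
        # date window, and within them only the matching list partition.
        for name in RANGE_NAMES[lo:hi + 1]:
            for row in self.partitions.get((name, group), []):
                scanned += 1
                if start <= row["sale_date"] <= end and row["region"] == region:
                    total += row["amount"]
        return total, scanned
```

Calling `insert` with a row such as `{"sale_date": date(2024, 5, 6), "region": "UA", "amount": 30}` returns the partition key it was routed to, and `total_amount` reports how many rows were actually scanned, which makes it easy to compare against a full scan of the table. In a real DBMS the same routing and pruning would be handled by the engine's declarative partitioning rather than by application code.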
