Method of data deduplication and distribution in cloud storage during data backup

2019; pp. 1-12
1 Karpenko Physico-Mechanical Institute of the National Academy of Sciences of Ukraine, Lviv, Ukraine
2 Karpenko Physico-Mechanical Institute of the NAS of Ukraine
3 Lviv Polytechnic National University, Information Systems and Networks Department; Osnabrück University, Institute of Computer Science
4 Information Systems and Networks Department, Lviv Polytechnic National University

An intelligent system for data deduplication and distribution in cloud storage was developed. The resulting software has a user-friendly interface for backing up and restoring data. An analytical review of the methodological principles of the research was carried out: existing approaches to data backup that use deduplication and distribution in cloud storage were analyzed, and their advantages and disadvantages were highlighted. The advantages and disadvantages of modern data deduplication technologies are considered in detail. This analysis justified the design and implementation of an intelligent system for data deduplication and distribution in cloud storage. A systematic analysis of the subject domain was performed. The goals of operating and developing the system, as well as its purpose and place of operation, were formulated, and the expected effects of introducing the software product were determined. A conceptual model of the system was developed and described in detail. Detailed use case, state transition, sequence, component, and class diagrams are given; together they made it possible to determine the system's behavior and to define and formulate the necessary business processes. Alternative approaches were analyzed, with their advantages and disadvantages, and the following effective methods were selected: hybrid block-level deduplication, data chunking based on Rabin fingerprints, data distribution based on the hash values of the deduplication units, and a distributed index. Based on this analysis, Rust was selected for the client, Scala for the server, Akka for managing distributed computation, and Amazon S3 as the cloud storage.
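The abstract names three techniques: content-defined chunking with Rabin fingerprints, deduplication by the hash of each chunk, and placement of chunks by hash value with a distributed index. The Rust sketch below illustrates how these pieces fit together; it is not the authors' implementation and rests on several assumptions. It uses a Rabin-Karp-style polynomial rolling hash modulo a prime, a common simplification of Rabin's fingerprint over GF(2) polynomials. Std's `DefaultHasher` (SipHash) stands in for a cryptographic chunk hash such as SHA-256, and in-memory `HashMap`s stand in for Amazon S3 storage nodes and the distributed index. All names (`chunk_boundaries`, `chunk_id`, `DedupStore`) and the window, mask, and chunk-size constants are hypothetical, and the "hybrid" aspect of the authors' deduplication is not modeled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical toy parameters, kept small so the demo yields many chunks.
const WINDOW: usize = 16; // bytes covered by the rolling hash
const BASE: u64 = 257; // polynomial base (Rabin-Karp style)
const MODULUS: u64 = 1_000_000_007; // prime modulus instead of GF(2) arithmetic
const MASK: u64 = (1 << 6) - 1; // cut when the low 6 bits are all ones
const MIN_CHUNK: usize = 32;
const MAX_CHUNK: usize = 256;

/// Content-defined chunking: returns (start, end) ranges that cover `data`.
fn chunk_boundaries(data: &[u8]) -> Vec<(usize, usize)> {
    // BASE^(WINDOW-1) mod MODULUS: weight of the byte leaving the window.
    let mut pow = 1u64;
    for _ in 0..WINDOW - 1 {
        pow = pow * BASE % MODULUS;
    }
    let (mut chunks, mut start, mut hash) = (Vec::new(), 0usize, 0u64);
    for i in 0..data.len() {
        if i >= WINDOW {
            let out = data[i - WINDOW] as u64 * pow % MODULUS; // outgoing byte
            hash = (hash + MODULUS - out) % MODULUS;
        }
        hash = (hash * BASE + data[i] as u64) % MODULUS; // shift in new byte
        let len = i + 1 - start;
        // Cut where the fingerprint matches the mask, within size limits.
        if (len >= MIN_CHUNK && (hash & MASK) == MASK) || len >= MAX_CHUNK {
            chunks.push((start, i + 1));
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push((start, data.len()));
    }
    chunks
}

/// Chunk identifier (std SipHash for the sketch; SHA-256 in a real system).
fn chunk_id(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk.hash(&mut h);
    h.finish()
}

/// Hypothetical store: each map stands in for one storage node (e.g. an S3
/// bucket or prefix) plus its shard of the distributed chunk index.
struct DedupStore {
    nodes: Vec<HashMap<u64, Vec<u8>>>,
    stored_bytes: usize,
}

impl DedupStore {
    fn new(num_nodes: usize) -> Self {
        DedupStore { nodes: vec![HashMap::new(); num_nodes], stored_bytes: 0 }
    }
    /// Placement by hash value: a given chunk always maps to the same node.
    fn node_for(&self, id: u64) -> usize {
        (id % self.nodes.len() as u64) as usize
    }
    /// Uploads only unseen chunks; returns the ordered ids ("recipe") of `data`.
    fn backup(&mut self, data: &[u8]) -> Vec<u64> {
        let mut recipe = Vec::new();
        for (s, e) in chunk_boundaries(data) {
            let id = chunk_id(&data[s..e]);
            let node = self.node_for(id);
            if !self.nodes[node].contains_key(&id) {
                self.stored_bytes += e - s;
                self.nodes[node].insert(id, data[s..e].to_vec());
            }
            recipe.push(id);
        }
        recipe
    }
    /// Reassembles the original bytes from a recipe.
    fn restore(&self, recipe: &[u64]) -> Vec<u8> {
        let mut out = Vec::new();
        for id in recipe {
            out.extend_from_slice(&self.nodes[self.node_for(*id)][id]);
        }
        out
    }
}

fn main() {
    let mut x: u32 = 12345; // LCG state for pseudo-random toy input
    let v1: Vec<u8> = (0..4000)
        .map(|_| { x = x.wrapping_mul(1_103_515_245).wrapping_add(12_345); (x >> 16) as u8 })
        .collect();
    // Version 2 of the "file": 8 bytes inserted at offset 100.
    let v2 = [&v1[..100], &b"INSERTED"[..], &v1[100..]].concat();
    let mut store = DedupStore::new(4);
    let r1 = store.backup(&v1);
    let r2 = store.backup(&v2);
    assert_eq!(store.restore(&r1), v1);
    assert_eq!(store.restore(&r2), v2);
    println!("input {} bytes, stored {} bytes", v1.len() + v2.len(), store.stored_bytes);
}
```

Because cut points depend only on the bytes inside the rolling window (plus the size limits), an insertion near the start of a file typically changes only the chunks around the edit, so the second backup uploads much less than the full file. Placing each chunk and its index entry on the node selected by the chunk's hash means a duplicate check touches exactly one node.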
The software is described, and the steps the user follows to operate it are considered. The designed system was tested on several control samples, and the results were analyzed.