IMPLEMENTATION OF AN APACHE SPARK COMPUTING CLUSTER BASED ON RASPBERRY PI MICROCOMPUTERS

1 Lviv Polytechnic National University, Ukraine
2 Lviv Polytechnic National University

The paper presents the implementation of an Apache Spark distributed computing cluster based on Raspberry Pi microcomputers. The cluster consists of three Raspberry Pi 4 devices (one master node and two worker nodes), each equipped with 8 GB of RAM and a high-speed network connection. The configuration was tuned by adjusting the SPARK_WORKER_MEMORY and SPARK_WORKER_CORES parameters to make maximum use of the available hardware resources. Communication between nodes was secured through authentication with 4096-bit SSH keys. The cluster was verified with a test application that demonstrated efficient distribution of the computational load across the nodes. The developed solution costs $400, one quarter of the cost of using equivalent cloud resources for one year. The results show that the Raspberry Pi cluster provides all the capabilities needed for hands-on learning of distributed computing technologies, offering physical access to every system component at low cost.
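
As an illustration of the worker tuning mentioned above, the following minimal sketch derives the two spark-env.sh settings for a single node. Only the parameter names SPARK_WORKER_MEMORY and SPARK_WORKER_CORES and the 8 GB of RAM per node come from the paper; the helper names, the 1 GB reserved for the operating system, and the use of all four CPU cores of the Raspberry Pi 4 are assumptions made for the example, not the values used in the described cluster.

```python
from typing import Dict


def worker_env(total_ram_gb: int, cores: int, os_reserve_gb: int = 1) -> Dict[str, str]:
    """Return spark-env.sh variables for one standalone Spark worker.

    os_reserve_gb is a hypothetical headroom left for the operating system;
    the paper does not state how much memory was held back.
    """
    if os_reserve_gb >= total_ram_gb:
        raise ValueError("OS reserve must be smaller than total RAM")
    return {
        # Spark accepts memory sizes with a unit suffix such as "7g" or "7168m".
        "SPARK_WORKER_MEMORY": f"{total_ram_gb - os_reserve_gb}g",
        # Number of cores the worker may hand out to executors.
        "SPARK_WORKER_CORES": str(cores),
    }


def render_spark_env(env: Dict[str, str]) -> str:
    """Format the variables as lines suitable for conf/spark-env.sh."""
    return "\n".join(f"export {name}={value}" for name, value in env.items())


if __name__ == "__main__":
    # One Raspberry Pi 4 worker: 8 GB RAM (from the paper), 4 cores (assumed all used).
    print(render_spark_env(worker_env(total_ram_gb=8, cores=4)))
```

Under these assumptions each worker would offer 7 GB and four cores to Spark; the reserve can be raised if other services run on the same node.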
