This study examines the capabilities and limitations of small, 4-bit quantized language models that run locally on Apple Silicon. Four models are benchmarked on a dataset of natural language prompts, measuring runtime metrics (inference time, memory usage, and token throughput) as well as output behavior. The study provides an empirical assessment of the feasibility of deploying language models on resource-constrained devices. The results highlight the trade-offs inherent to small language models and underscore the roles of model size, quantization, and prompt tuning in balancing performance, efficiency, and usability. Building on these insights, future work will extend the evaluation to multi-turn agentic dialogues, analyze the semantic quality of model output, and pursue further optimizations to improve local inference performance.
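To make the measurement setup concrete, the sketch below shows one way such runtime metrics could be collected. It is a minimal illustration, not the study's actual harness: `generate` is a hypothetical stand-in for a locally loaded 4-bit model's generation call (e.g. through a framework such as mlx-lm or llama.cpp, neither confirmed here), and whitespace splitting approximates a real tokenizer's token count.

```python
import time
import resource


def benchmark(generate, prompts):
    """Run each prompt through `generate` and collect runtime metrics.

    `generate` is any callable mapping a prompt string to generated
    text. Token counts are approximated by whitespace splitting, a
    stand-in for a real tokenizer.
    """
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        output = generate(prompt)
        elapsed = time.perf_counter() - start
        # Peak resident set size for the process so far
        # (bytes on macOS, kilobytes on Linux).
        peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        tokens = len(output.split())
        results.append({
            "prompt": prompt,
            "seconds": elapsed,
            "peak_rss": peak_rss,
            "tokens_per_s": tokens / elapsed if elapsed > 0 else 0.0,
        })
    return results


if __name__ == "__main__":
    # Hypothetical generation callable standing in for a local model.
    def fake_generate(prompt: str) -> str:
        return "example output for: " + prompt

    for row in benchmark(fake_generate, ["Explain quantization briefly."]):
        print(row)
```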