Small Language Models

A Comparative Study of Inference Frameworks for Node.js Microservices on Edge Devices

Deploying small language models (SLMs) on edge devices has become increasingly viable due to advances in model compression and efficient inference frameworks. Running small models on-device offers significant benefits, including privacy through local processing, reduced latency, and increased autonomy. This paper presents a comparative review and analysis of Node.js inference frameworks that operate on-device, evaluating them in terms of performance, memory consumption, isolation, and deployability.

Evaluating Small Quantized Language Models on Apple Silicon

This study examines the capabilities and limitations of small, 4-bit quantized language models running locally on Apple Silicon. Four models were benchmarked on a dataset of natural language prompts, measuring runtime metrics (inference time, memory usage, and token throughput) as well as output behavior. The study provides an empirical assessment of the feasibility of deploying language models on resource-constrained devices.
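Of the runtime metrics listed, token throughput is simply the number of generated tokens divided by elapsed wall time. A small sketch, with illustrative numbers that are not taken from the study:

```javascript
// Token-throughput sketch: tokens generated divided by elapsed wall time.
// The token count and timing below are illustrative values, not measurements
// from any particular model or tokenizer.
function tokensPerSecond(tokenCount, elapsedMs) {
  if (elapsedMs <= 0) throw new RangeError('elapsedMs must be positive');
  return tokenCount / (elapsedMs / 1000);
}

// e.g. 128 tokens generated in 4000 ms of wall time
console.log(tokensPerSecond(128, 4000)); // → 32 tokens/s
```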