Intermediate

Running local LLMs and VLMs on the Arduino UNO Q with yzma

Discover how to run local LLMs and VLMs directly on the Arduino UNO Q with Ron Evans' yzma project, which brings llama.cpp to Go and makes edge AI inference possible on the board.

Posted by Project Supporter Team

Internet of Things, BT & Wireless

Components, Tools and Machines

1x Arduino UNO Q

Apps and platforms

1x Edge Impulse Studio

Project description

The Arduino UNO Q is a game-changer: a hybrid board running full Debian Linux alongside a real-time STM32 microcontroller, in the lovely classic Arduino UNO form factor. When I heard experts say that it was not possible to run LLMs or VLMs on it because the Arduino UNO Q was not powerful enough, I started researching.

My good friend Ron Evans (the legendary maker behind Gobot, TinyGo, and much more) loves Go so much that he created yzma, a clean Go wrapper for llama.cpp. Ron's philosophy is simple: libraries shouldn't lock you into one language. With yzma, Go developers can integrate high-performance, hardware-accelerated inference without CGo headaches. It supports a huge range of GGUF models from Hugging Face.

The two models in this tutorial are part of Hugging Face's Smol family, designed for lightweight, on-device AI. They're optimized for edge hardware like the Arduino UNO Q's Arm-based Linux environment. Quantized GGUF versions (via llama.cpp compatibility) make them runnable locally with low memory and compute. The LLM has about 135 million parameters (tiny for an instruct model) and the VLM about 500 million (the smallest practical video-capable VLM at release). The VLM can process images and short videos together with text prompts to generate descriptions or answers. These models exemplify the push towards Edge AI and on-device multimodal AI.
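To get a feel for why these models fit on the board, here is a minimal Go sketch that turns a parameter count into a rough weight size. The parameter counts are the rounded figures from the text. The bits-per-weight values are assumptions: Q8_0 stores each block of 32 weights in 34 bytes (8.5 bits per weight), while the 4.85 used for Q4_K_M is a commonly quoted average for larger models and varies from model to model. The function name `estimateWeightBytes` is only illustrative.

```go
package main

import "fmt"

// estimateWeightBytes returns a rough size for the model weights alone,
// given a parameter count and an average number of bits per weight.
func estimateWeightBytes(params int64, bitsPerWeight float64) int64 {
	return int64(float64(params) * bitsPerWeight / 8)
}

func main() {
	// Parameter counts are rounded; bits-per-weight values are assumptions
	// (see the paragraph above), not measurements of the actual files.
	models := []struct {
		name   string
		params int64
		bpw    float64
	}{
		{"SmolLM2-135M Q4_K_M", 135_000_000, 4.85},
		{"SmolVLM2-500M Q8_0", 500_000_000, 8.5},
	}
	for _, m := range models {
		mib := float64(estimateWeightBytes(m.params, m.bpw)) / (1024 * 1024)
		fmt.Printf("%-22s ~%.0f MiB of weights\n", m.name, mib)
	}
}
```

Treat the result as a lower bound. The downloaded GGUF files can be larger, because some tensors are kept at higher precision, and inference also needs memory for the context and intermediate buffers.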

Below is how I got the LLMs and VLMs running locally on my Arduino UNO Q, first on the 4GB version and afterwards on the 2GB version.

Step-by-step tutorial

Prerequisites

  1. Arduino UNO Q 2GB or 4GB versions
  2. SSH access
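If you are not sure which version of the board you are connected to, a small Go program can read the total RAM from `/proc/meminfo` once Go is installed (Step 1). This is a minimal sketch that assumes the standard Linux `MemTotal:` line format; the helper name `parseMemTotalKB` is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotalKB extracts the MemTotal value (in kB) from /proc/meminfo text.
func parseMemTotalKB(meminfo string) (int64, error) {
	for _, line := range strings.Split(meminfo, "\n") {
		if strings.HasPrefix(line, "MemTotal:") {
			fields := strings.Fields(line)
			if len(fields) < 2 {
				return 0, fmt.Errorf("malformed MemTotal line: %q", line)
			}
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("could not read /proc/meminfo:", err)
		return
	}
	kb, err := parseMemTotalKB(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// The kernel reserves some memory, so expect a value somewhat
	// below the nominal 2 or 4 GB.
	fmt.Printf("Total RAM: %.1f GiB\n", float64(kb)/(1024*1024))
}
```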

Step 1: Install Go and yzma

First SSH into your Arduino UNO Q and clone the yzma repository.

https://github.com/hybridgroup/yzma/tree/main

Alternatively, clone it on your own machine and copy the files to the Arduino UNO Q with VS Code, `scp`, or a similar tool.

Start by installing Go, then install the yzma command and add Go's bin folder to your PATH:

sudo apt update

sudo apt install golang


go install github.com/hybridgroup/yzma/cmd/yzma@v1.9.0


echo 'export PATH=$PATH:$(go env GOPATH)/bin' >> ~/.bashrc && source ~/.bashrc

Check the installation by running `yzma version`.

Running `go version` should print something like: go version go1.24.4 linux/arm64

Step 2: Install llama.cpp libraries via yzma

Now create a `lib` folder inside the `yzma` folder where you cloned the repository (for example with `mkdir -p /home/arduino/yzma/lib`), then install the libraries needed to run llama.cpp via yzma:

export YZMA_LIB=/home/arduino/yzma/lib


yzma install -u --processor cpu --os trixie
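Before moving on, it can help to confirm that the install step actually placed shared libraries in the folder that `YZMA_LIB` points to. The sketch below only lists files that look like Linux shared objects. The exact file names that `yzma install` produces are not stated in this tutorial, so the check matches on the `.so` suffix rather than on specific names, and `sharedLibs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sharedLibs keeps only names that look like Linux shared objects
// (for example foo.so or foo.so.1).
func sharedLibs(names []string) []string {
	var libs []string
	for _, n := range names {
		if strings.HasSuffix(n, ".so") || strings.Contains(n, ".so.") {
			libs = append(libs, n)
		}
	}
	return libs
}

func main() {
	dir := os.Getenv("YZMA_LIB")
	if dir == "" {
		fmt.Println("YZMA_LIB is not set")
		return
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Println("cannot read", dir+":", err)
		return
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	libs := sharedLibs(names)
	fmt.Printf("%d shared libraries in %s\n", len(libs), dir)
	for _, l := range libs {
		fmt.Println(" ", l)
	}
}
```

If the count is zero, re-run the install command and check that `YZMA_LIB` was exported in the same shell session.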

Now you should be able to start running the examples available in the yzma project.

Run a tiny LLM on the Arduino UNO Q

With the libraries installed, you can start testing. We will begin with the chat example, which lets you chat with an LLM running locally on the board.

First download a small instruct model like SmolLM2-135M:

yzma model get -u https://huggingface.co/bartowski/SmolLM2-135M-Instruct-GGUF/resolve/main/SmolLM2-135M-Instruct-Q4_K_M.gguf

Launch the chat example from the root of the cloned `yzma` folder:

go run ./examples/chat/ -model ~/models/SmolLM2-135M-Instruct-Q4_K_M.gguf -lib ./lib/ -v

Type your prompts when the USER prompt appears and enjoy local text generation!

Run a small VLM (Vision + Language) on the Arduino UNO Q

Next we are going to run the vlm example. Download the SmolVLM2-500M model and its matching mmproj (multimodal projector) file:

yzma model get -u https://huggingface.co/ggml-org/SmolVLM2-500M-Video-Instruct-GGUF/resolve/main/SmolVLM2-500M-Video-Instruct-Q8_0.gguf


yzma model get -u https://huggingface.co/ggml-org/SmolVLM2-500M-Video-Instruct-GGUF/resolve/main/mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf

Launch the VLM example:

go run ./examples/vlm/ -model ~/models/SmolVLM2-500M-Video-Instruct-Q8_0.gguf -mmproj ~/models/mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf -image ./images/domestic_llama.jpg -p "What is in this picture?" -v

This command tests the model on the sample llama image. Next, try uploading your own picture to the Arduino UNO Q and prompting the model with it:

arduino@Arduino4:~/yzma$ go run ./examples/vlm/ -model ~/models/SmolVLM2-500M-Video-Instruct-Q8_0.gguf -mmproj ~/models/mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf -image ./images/pens.jpg -p "What is in this picture?"

Image used in the VLM example

The image shows a collection of various markers lying on a surface, possibly a desk or a table. The markers, each with a unique color and design, are scattered across the surface in a seemingly random arrangement. The markers, which range from light to dark shades, include a mix of different shapes and sizes, with some being cylindrical while others are rectangular.
Some of the markers, like the ones in the top left corner of the image, are predominantly pink and have a white marking on the tip, while others, such as the ones in the top right corner, are predominantly blue and have a white marking on the tip. The markers in the bottom left corner, which are light in color, are mostly cylindrical with a white marking on the tip, and the ones in the bottom right corner, which are predominantly pink, have a white marking on the tip.
The markers are arranged in a somewhat haphazard manner, with some overlapping each other and others positioned closer to the edges of the image. The image does not provide any specific context or labels for the markers, so it's not possible to determine the exact brand or type of the markers. However, based on their color and design, it's likely that these markers are used for personal or artistic purposes, possibly for writing, drawing, or coloring.
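To describe several pictures in one go, the same command can be wrapped in a small Go program. This is a sketch under a few assumptions: it is run from the root of the cloned `yzma` folder, the model files are in `~/models` as in the commands above, and only the flags already shown in this tutorial (`-model`, `-mmproj`, `-image`, `-p`) are used. The helper names `expandHome` and `vlmArgs` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~/" with the given home directory,
// because os/exec does not perform shell tilde expansion.
func expandHome(path, home string) string {
	if strings.HasPrefix(path, "~/") {
		return filepath.Join(home, path[2:])
	}
	return path
}

// vlmArgs builds the arguments for `go run ./examples/vlm/`
// using the flags shown in the tutorial.
func vlmArgs(model, mmproj, image, prompt string) []string {
	return []string{
		"run", "./examples/vlm/",
		"-model", model,
		"-mmproj", mmproj,
		"-image", image,
		"-p", prompt,
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot find home directory:", err)
		return
	}
	model := expandHome("~/models/SmolVLM2-500M-Video-Instruct-Q8_0.gguf", home)
	mmproj := expandHome("~/models/mmproj-SmolVLM2-500M-Video-Instruct-Q8_0.gguf", home)

	// Image paths come from the command line, e.g. ./images/pens.jpg
	for _, image := range os.Args[1:] {
		cmd := exec.Command("go", vlmArgs(model, mmproj, image, "What is in this picture?")...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		fmt.Println("==", image)
		if err := cmd.Run(); err != nil {
			fmt.Println("run failed:", err)
		}
	}
}
```

Each image starts a fresh process that reloads the model, which is simple but slow. Loading the model once and reusing it would need yzma's Go API directly, which is outside the scope of this sketch.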

Why is this important

In this tutorial, yzma and the Arduino UNO Q run real multimodal AI inference locally on a $50 board. This makes a good starting template for edge robotics or connected-home experiments.

Try it yourself: clone https://github.com/hybridgroup/yzma, grab a tiny model, and start prompting your Arduino UNO Q today.

Attributions

Huge thanks to Ron Evans for building tools that let us dream bigger on tiny hardware!

Have you run local models on the UNO Q yet? Drop your experiences in the comments. I'd love to hear!
