I'm always on the lookout for innovative ways to enhance my coding experience, and this week's Reading Notes are filled with exciting discoveries! From cutting-edge UI libraries to secure sandbox environments for AI agents, I've curated a selection of articles that showcase the latest programming trends and technologies.
Whether you're interested in harnessing the power of Docker sandboxes or exploring the potential of smart glasses integration, there's something on this list for everyone.
Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.
You need a machine with a GPU running Linux, macOS, or Windows (with WSL). I use uv, a fast Python package and project manager, but pip + venv works too.
Clone the vLLM Reka Plugin
Reka models require a dedicated plugin to run under vLLM. Not all models need this extra step, but Reka's architecture does. Clone the plugin repository and enter the directory:
git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka
The repository contains the plugin code and a serve.sh script you will use to start the service.
Download the Reka Edge Model
Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:
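A minimal sketch of the download step, assuming the weights are published on the Hugging Face Hub (the repository id below is a guess based on the model name — verify the exact id in the plugin README or on the Hub):

```shell
# Install the Hugging Face Hub CLI (inside your uv/venv environment)
pip install -U "huggingface_hub[cli]"

# Pull the weights into ./models/reka-edge-2603.
# NOTE: the repository id is an assumption; check the plugin README
# for the official one before running this.
huggingface-cli download RekaAI/reka-edge-2603 \
  --local-dir ./models/reka-edge-2603
```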
This is a large model, so make sure you have enough disk space and a stable connection.
Start the Service
Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:
uv run bash serve.sh ./models/reka-edge-2603
The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.
As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:
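The original settings are not reproduced here, but as a hypothetical illustration of the kind of overrides serve.sh accepts (the variable names below are assumptions — the real ones are listed at the top of serve.sh and in the README), a 12 GB card like the RTX 5070 mainly needs the memory utilization and context length tuned down:

```shell
# Hypothetical overrides -- variable names are assumptions, not the
# actual ones from serve.sh; check the script and README for yours.
GPU_MEMORY_UTILIZATION=0.90 \
MAX_MODEL_LEN=8192 \
uv run bash serve.sh ./models/reka-edge-2603
```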
With the backend running, it's time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:
git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d
Open http://localhost:8080 in your browser, then add a new connection with these settings:
Name: local (or any label you want)
IP address: your machine's local network IP (e.g. 192.168.x.x)
API key: leave blank or enter anything — no key is required for a local connection
Model: reka-edge-2603
Click Test to confirm the connection, then save it.
Try It: Image Description and Object Detection
Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.
Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (e.g. "face"), and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.
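If you want to poke the service without the app, you can send the same kind of request with curl. This sketch assumes the standard vLLM OpenAI-compatible endpoint on port 8000 (the actual port depends on serve.sh) and a local photo.jpg:

```shell
# Encode the image and send a description request to the vLLM service.
# Assumptions: OpenAI-compatible API on localhost:8000 (check serve.sh
# for the real port) and a photo.jpg in the current directory.
IMG_B64=$(base64 -w0 photo.jpg)   # use `base64 -i photo.jpg` on macOS

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "reka-edge-2603",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64,'"$IMG_B64"'"}}
      ]
    }]
  }'
```

Swapping the text prompt for a detection request (e.g. asking for bounding boxes around a "face") is how the app drives object detection; the exact response format for boxes is model-specific.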
Switch to the Reka Cloud API
If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.
The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.
What You Can Build
The service you just set up accepts any image or video via HTTP. Point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.
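The batch idea can be sketched as a short script. Assumptions: the service exposes vLLM's OpenAI-compatible API on port 8000 (check serve.sh), the images live in ./photos, and jq is installed to extract the text from each response:

```shell
#!/usr/bin/env bash
# Minimal batch pipeline sketch: describe every JPEG in a folder by
# POSTing it to the local vLLM service, one request per image.
set -euo pipefail

API="http://localhost:8000/v1/chat/completions"   # assumption: default vLLM port
PROMPT="Describe this image in one sentence."

for img in ./photos/*.jpg; do
  b64=$(base64 -w0 "$img")
  desc=$(curl -s "$API" -H "Content-Type: application/json" -d '{
      "model": "reka-edge-2603",
      "messages": [{"role": "user", "content": [
        {"type": "text", "text": "'"$PROMPT"'"},
        {"type": "image_url",
         "image_url": {"url": "data:image/jpeg;base64,'"$b64"'"}}
      ]}]
    }' | jq -r '.choices[0].message.content')
  echo "$img: $desc"
done
```

Changing PROMPT to a detection request turns the same loop into a bounding-box extractor.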
The tech landscape is constantly evolving, and keeping up with the latest developments can be overwhelming. From AI-powered tools like Ollama and OpenClaw, to new ways of programming with Aspire Docs and Azure CLI, it seems like there's always something new to explore. In this edition of Reading Notes, I'll share some of the interesting things that caught my eye recently, from AI advancements to developer tools and beyond.
NDI Tools: The Unsung Hero of Video Production (Golnaz) - I started using NDI tools for my stream back in 2020, when I first brought a friend onto my stream. They are amazing, and this post is a must-read.
Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, podcasts and books that catch my interest during the week.