I'm always on the lookout for innovative ways to enhance my coding experience, and this week's Reading Notes are filled with exciting discoveries! From cutting-edge UI libraries to secure sandbox environments for AI agents, I've curated a selection of articles that showcase the latest programming trends and technologies.
Whether you're interested in harnessing the power of Docker sandboxes or exploring the potential of smart glasses integration, there's something on this list for everyone.
Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.
You need a machine with a GPU running Linux, macOS, or Windows (with WSL). I use uv, a fast Python package and project manager, but pip + venv works too if you prefer.
Clone the vLLM Reka Plugin
Reka models require a dedicated plugin to run under vLLM. Not all models need this extra step, but Reka's architecture does. Clone the plugin repository and enter the directory:
git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka
The repository contains the plugin code and a serve.sh script you will use to start the service.
Download the Reka Edge Model
Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:
This is a large model, so make sure you have enough disk space and a stable connection.
Start the Service
Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:
uv run bash serve.sh ./models/reka-edge-2603
The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.
As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:
With the backend running, it's time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:
git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d
Open http://localhost:8080 in your browser, then add a new connection with these settings:
Name: local (or any label you want)
IP address: your machine's local network IP (e.g. 192.168.x.x)
API key: leave blank or enter anything — no key is required for a local connection
Model: reka-edge-2603
Click Test to confirm the connection, then save it.
Try It: Image Description and Object Detection
Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.
Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (e.g. "face") and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.
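Outside the app, the same service can be called from a script. The sketch below is a minimal example, assuming serve.sh exposes vLLM's OpenAI-compatible /v1/chat/completions endpoint on port 8000 (vLLM's default) and that the model is served under the name reka-edge-2603; check serve.sh for the actual port and model name. The function names are hypothetical, not part of the plugin.

```python
import base64
import json
import urllib.request


def build_description_request(image_bytes, prompt, model="reka-edge-2603",
                              base_url="http://localhost:8000"):
    """Build an OpenAI-style chat completion request carrying one image.

    Assumptions: the port, the served model name, and a JPEG input.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # The image travels inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_image(path, prompt="Describe this image."):
    """Send the image to the running service and return the text reply."""
    with open(path, "rb") as f:
        request = build_description_request(f.read(), prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling describe_image("photo.jpg") needs the service to be running; build_description_request can be inspected on its own without a network connection.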
Switch to the Reka Cloud API
If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.
The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.
What You Can Build
The service you just set up accepts any image or video via HTTP: point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.
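As a rough sketch of that folder idea, the loop below walks a directory and applies one prompt to every media file. The ask_model callable is hypothetical: it stands in for whatever function sends a single file to your service (local or cloud) and returns the answer. The list of file extensions is also an assumption.

```python
from pathlib import Path

# Assumed set of extensions; adjust to the media you actually have.
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".mp4"}


def find_media(folder):
    """Return media files under `folder`, sorted for a stable order."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    )


def run_batch(folder, prompt, ask_model):
    """Apply one prompt to every media file and collect results per file.

    `ask_model(path, prompt)` is a hypothetical callable that talks to
    the service and returns its reply.
    """
    results = {}
    for path in find_media(folder):
        try:
            results[str(path)] = ask_model(path, prompt)
        except Exception as exc:  # keep going if one file fails
            results[str(path)] = f"error: {exc}"
    return results
```

Changing the prompt argument is all it takes to switch the same loop from descriptions to tags or bounding boxes.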
Reka just released Reka Edge, a compact but powerful vision-language model that runs entirely on your own machine. No API keys, no cloud, no data leaving your computer. I work at Reka and putting together this tutorial was genuinely fun; I hope you enjoy running it as much as I did.
In three steps, you'll go from zero to asking an AI what's in any image or video.
What You'll Need
A machine with enough RAM to run a 7B parameter model (~16 GB recommended)
Git
uv, a fast Python package manager. Install it with:
curl -LsSf https://astral.sh/uv/install.sh | sh
This works on macOS, Linux, and Windows (WSL). If you're on Windows without WSL, grab the Windows installer instead.
Step 1: Get the Model and Inference Code
Clone the Reka Edge repository from Hugging Face. This includes both the model weights and the inference code:
git clone https://huggingface.co/RekaAI/reka-edge-2603
cd reka-edge-2603
Step 2: Fetch the Large Files
Hugging Face stores large files (model weights and images) using Git LFS. After cloning, these files exist on disk only as small pointer files, without the actual content.
First, make sure Git LFS is installed. The command varies by platform:
Then pull all large files, including model weights and media samples:
git lfs pull
Grab a coffee while it downloads; the model weights are several GB.
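If you want to confirm the download really replaced the pointers, a small check like the one below can help. It relies on the fact that a Git LFS pointer is a short text file whose first line starts with "version https://git-lfs.github.com/spec/v1". The helper names are hypothetical and not part of the repository.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is still a Git LFS pointer rather than real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unfetched_files(repo_dir):
    """List files under repo_dir that `git lfs pull` has not replaced yet."""
    root = Path(repo_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts  # skip Git internals
        and is_lfs_pointer(p)
    )
```

An empty list from unfetched_files(".") suggests every large file has been downloaded.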
Step 3: Ask the Model About an Image or Video
To analyze an image, use the sample included in the media/ folder:
uv run example.py \
--image ./media/hamburger.jpg \
--prompt "What is in this image?"
Or pass a video with --video:
uv run example.py \
--video ./media/many_penguins.mp4 \
--prompt "What is in this?"
The model will load, process your input, and print a description, all locally, all private.
Try different prompts to unlock more:
"Describe this scene in detail."
"What text is visible in this image?"
"Is there anything unusual or unexpected here?"
What's Actually Happening?
You don't need this to use the model, but if you're anything like me and can't help wondering what's going on under the hood, here's the magic behind example.py:
1. It picks the best hardware available.
The script checks whether your machine has a GPU (CUDA for Nvidia, Metal for Apple Silicon) and uses it automatically. If neither is available, it falls back to the CPU. This affects speed, not quality.
2. It loads the model into memory.
The 7 billion parameter model is read from the folder you cloned. This is the "weights": billions of numbers that encode everything the model has learned. Loading takes ~30 seconds depending on your hardware.
processor = AutoProcessor.from_pretrained(args.model, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(args.model, ...).eval()
3. It packages your input into a structured message.
Your image (or video) and your text prompt are wrapped together into a conversation-style format, the same way a chat message works, except one part is visual instead of text.
4. It converts everything into numbers.
The processor translates your image into a grid of numerical patches and your prompt into tokens (small chunks of text, each mapped to a number). The model only understands numbers, so this step bridges the gap.
5. The model generates a response, token by token.
Starting from your input, the model predicts the most likely next word, then the next, up to 256 tokens. It stops when it hits a natural end-of-response marker.
6. It converts the numbers back into text and prints it.
The token IDs are decoded back into human-readable words and printed to your terminal. No internet involved at any point.
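To make steps 1 and 3 a bit more concrete, here is a minimal sketch. It is not the code from example.py: the function names are hypothetical, the message layout follows the chat format commonly used with Hugging Face processors, and "mps" is the name PyTorch uses for the Metal backend on Apple Silicon.

```python
def pick_device():
    """Prefer CUDA, then Apple's Metal backend ("mps"), then the CPU."""
    try:
        import torch  # optional, so the sketch still runs without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # missing on old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def build_messages(prompt, image=None, video=None):
    """Wrap one visual input and a text prompt into a chat-style message.

    The dictionary keys are an assumption about the expected format.
    """
    if (image is None) == (video is None):
        raise ValueError("pass exactly one of image or video")
    if image is not None:
        visual = {"type": "image", "image": image}
    else:
        visual = {"type": "video", "video": video}
    return [{
        "role": "user",
        "content": [visual, {"type": "text", "text": prompt}],
    }]
```

The processor would then turn a message list like this into the numerical inputs described in step 4.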
If you prefer watching to reading, here is the video version:
That's Pretty Cool, Right?
A single script. No API key. No cloud. You just ran a 7 billion parameter vision-language model entirely on your own machine, and it works whether you're on a Mac, Linux, or Windows with WSL, which is what I was using when I wrote this.
This works great as a one-off script: drop in a file, ask a question, get an answer. But what if you wanted to build something on top of it? A web app, a tool that watches a folder, or anything that needs to talk to the model repeatedly?
That's exactly what the next post is about. I'll show you how to wrap Edge as a local API, so instead of running a script, you have a service running on your machine that any app can plug into. Same model, same privacy, but now it's a proper building block.
I've spent most of my career building software in C# and .NET, and only used Python in IoT projects. When I wanted to build a fun project (an app that uses AI to roast videos), I knew it was the perfect opportunity to finally dig into Python web development.
The question was: where do I start? I hopped into a brainstorming session with Reka's AI chat and asked about options for building web apps in Python. It mentioned Flask, and I remembered friends talking about it being lightweight and perfect for getting started. That sounded right.
In this post, I share how I built "Roast My Life," a Flask app using the Reka Vision API.
The Vision (Pun Intended)
The app needed three core things:
List videos: Show me what videos are in my collection
Upload videos: Let me add new ones via URL
Roast a video: Send a selected video to an AI and get back some hilarious commentary
See it in action
Part 1: Getting Started Environment Setup
The first hurdle was always going to be environment setup. I'm serious about keeping my Python projects isolated, so I did the standard dance:
Before even touching dependencies, I scaffolded a bare-bones Flask app. One thing I enjoy in C# is that all dependencies are restored in one shot, so I do the same in my Python projects with a requirements.txt file instead of installing things ad hoc (pip install flask, then freezing later).
Dropping that file in first makes the setup snippet below deterministic. When you run pip install -r requirements.txt, pip installs the exact versions I tested with, and you won't accidentally grab a breaking major update.
Here's the shell dance that activates the virtual environment and installs everything:
To get that API key, I visited the Reka Platform and grabbed a free one. Seriously, a free key for playing with AI vision APIs? I was in.
With python app.py, I fired up the Flask development server and opened http://127.0.0.1:5000 in my browser. The UI was there, but... it was dead. Nothing worked.
Perfect. Time to build.
The Backend: Flask Routing and API Integration
Coming from ASP.NET Core's controller-based routing and Blazor, Flask's decorator-based approach felt right at home. All the code goes in the app.py file, and each route is defined with a simple decorator. But first things first: loading configuration from the .env file using python-dotenv:
from flask import Flask, request, jsonify
import requests
import os
from dotenv import load_dotenv
app = Flask(__name__)
# Load environment variables (like appsettings.json)
load_dotenv()
api_key = os.environ.get('API_KEY')
base_url = os.environ.get('BASE_URL')
The imported packages are the same ones that need to be listed in requirements.txt. We then retrieve the API key and base URL from environment variables, just like in .NET Core.
Now, to get roasted, we first need to upload a video to the Reka Vision API. Here's the code; I'll go over some details after.
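One optional refinement, sketched below, is to fail fast when a setting is missing instead of discovering it on the first request. The load_settings helper is hypothetical (it is not in the app); only the API_KEY and BASE_URL names come from the code above.

```python
import os


def load_settings(env=None):
    """Read API_KEY and BASE_URL, raising early if either is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("API_KEY", "BASE_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {
        "api_key": env["API_KEY"],
        # Trim the trailing slash once so routes can append paths safely.
        "base_url": env["BASE_URL"].rstrip("/"),
    }
```

Called right after load_dotenv(), this would surface a misconfigured .env file at startup.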
@app.route('/api/upload_video', methods=['POST'])
def upload_video():
    """Upload a video to Reka Vision API"""
    data = request.get_json() or {}
    video_name = data.get('video_name', '').strip()
    video_url = data.get('video_url', '').strip()
    if not video_name or not video_url:
        return jsonify({"error": "Both video_name and video_url are required"}), 400
    if not api_key:
        return jsonify({"error": "API key not configured"}), 500
    try:
        response = requests.post(
            f"{base_url.rstrip('/')}/videos/upload",
            headers={"X-Api-Key": api_key},
            data={
                'video_name': video_name,
                'index': 'true',  # Required: tells Reka to process the video
                'video_url': video_url
            },
            timeout=30
        )
        response_data = response.json() if response.ok else {}
        if response.ok:
            video_id = response_data.get('video_id', 'unknown')
            return jsonify({
                "success": True,
                "video_id": video_id,
                "message": "Video uploaded successfully"
            })
        else:
            error_msg = response_data.get('error', f"HTTP {response.status_code}")
            return jsonify({"success": False, "error": error_msg}), response.status_code
    except requests.Timeout:
        return jsonify({"success": False, "error": "Request timed out"}), 504
    except Exception as e:
        return jsonify({"success": False, "error": f"Upload failed: {str(e)}"}), 500
Once the information from the frontend is validated, we make a POST request to the Reka Vision API's /videos/upload endpoint. The parameters are sent as form data, and we include the API key in the headers for authentication. Here I was using URLs to upload videos, but you can also upload local files by adjusting the request accordingly. As you can see, it's pretty straightforward, and the documentation from Reka made it easy to understand what was needed.
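To exercise the route without the frontend, a small client like the sketch below can help. It assumes the Flask development server is running at http://127.0.0.1:5000; the helper names are hypothetical. Because the route reads its input with request.get_json(), the body has to be sent as JSON with a matching Content-Type header.

```python
import json
import urllib.error
import urllib.request


def build_upload_request(video_name, video_url, app_url="http://127.0.0.1:5000"):
    """Build the JSON POST that the /api/upload_video route expects."""
    payload = {"video_name": video_name, "video_url": video_url}
    return urllib.request.Request(
        f"{app_url.rstrip('/')}/api/upload_video",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(video_name, video_url):
    """Call the local Flask route and return its JSON body, even on errors."""
    request = build_upload_request(video_name, video_url)
    try:
        with urllib.request.urlopen(request, timeout=35) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # The route answers errors with JSON too, so read the body.
        return json.load(err)
```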
The Magic: Sending Roast Requests to Reka Vision API
Here's where things get interesting. Once a video is uploaded, we can ask the AI to analyze it and generate content. The Reka Vision API supports conversational queries about video content:
from typing import Any, Dict  # belongs with the other imports at the top of app.py

def call_reka_vision_qa(video_id: str) -> Dict[str, Any]:
    """Call the Reka Video QA API to generate a roast"""
    headers = {'X-Api-Key': api_key} if api_key else {}
    payload = {
        "video_id": video_id,
        "messages": [
            {
                "role": "user",
                "content": "Write a funny and gentle roast about the person, or the voice in this video. Reply in markdown format."
            }
        ]
    }
    try:
        resp = requests.post(
            f"{base_url}/qa/chat",
            headers=headers,
            json=payload,
            timeout=30
        )
        # On failure, return an error dict instead of parsing the body
        data = resp.json() if resp.ok else {"error": f"HTTP {resp.status_code}"}
        if not resp.ok and 'error' not in data:
            data['error'] = f"HTTP {resp.status_code} calling chat endpoint"
        return data
    except requests.Timeout:
        return {"error": "Request to chat API timed out"}
    except Exception as e:
        return {"error": f"Chat API call failed: {e}"}
Here we pass the video ID and a prompt asking for a "funny and gentle roast." The API responds with AI-generated content, which we can then send back to the frontend for display. I try to give more "freedom" to the AI by asking it to reply in markdown format, which makes the output more engaging.
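One way to hand that result back to the frontend is sketched below. The helper is hypothetical and makes no assumption about the fields inside a successful reply: it only relies on the "error" key that call_reka_vision_qa sets on failure. The 502 status is my choice for an upstream failure, not something the API prescribes.

```python
def roast_result_to_response(data):
    """Map the dict returned by call_reka_vision_qa to (body, status)."""
    if not isinstance(data, dict):
        return {"success": False, "error": "unexpected response"}, 502
    if "error" in data:
        # The upstream call failed; report it as a bad gateway.
        return {"success": False, "error": data["error"]}, 502
    # Pass the API reply through untouched for the frontend to render.
    return {"success": True, "result": data}, 200
```

In a Flask route this would typically end with return jsonify(body), status.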
What really stood out to me was how approachable the Reka Vision API is. You don't need any special SDK—just the requests library making standard HTTP calls. And honestly, it doesn't matter what language you're used to; an HTTP call is pretty much always simple to do. Whether you're coming from .NET, Python, JavaScript, or anything else, you're just sending JSON and getting JSON back.
Authentication is refreshingly straightforward: just pop your API key in the header and you're good to go. No complex SDKs, no multi-step authentication flows, no wrestling with binary data streams. The conversational interface lets you ask questions in natural language, and you get back structured JSON responses with clear fields.
One thing worth noting: in this example, the videos are pre-uploaded and indexed, which means the responses come back fast. But here's the impressive part—the AI actually looks at the video content. It's not just reading a transcript or metadata; it's genuinely analyzing the visual elements. That's what makes the roasts so spot-on and contextual.
Final Thoughts
The Reka Vision API itself deserves credit for making video AI accessible. No complicated SDKs, no multi-GB model downloads, no GPU requirements. Just simple HTTP requests and powerful AI capabilities. I'm not saying I'm switching to Python full-time, but expect to see me sharing more Python projects in the future!
Every Monday, I share my "reading notes": the articles, blog posts, podcast episodes, and books that caught my interest during the week. It's a mix of current news and what I consumed.
What are Azure CLI Extensions? (Michael Crump) - An interesting first article of a series. This one introduces us to the extension... Hmmm. I think I have an idea.
Top 10 C# Developer Books for Summer 2019 (Claudio Bernasconi) - Great list of books to get started with C# or as a developer. I'll definitely refer people to it when asked where to start.
A really amazing book packed with very interesting advice. Things that you kind of already knew, or at least had a feeling you knew, are clearly explained.
After reading (or listening to) this book, you will know why, and you can decide to fight it or change the when... improve your performance and use your time and energy on something else.