Franky's Notes

Reading Notes #700

Labels: ai, antigravity, aspire, best practices, cli, docker, donet, foundry, google, gorden, grill-me, icons, ide, nvidia, readingnotes, speach, Tools, vibe-conding

Seven hundred weeks.

When I started taking notes about the articles I was reading, I never imagined I would still be doing it 700 weeks later.

Back then, my notes lived on a USB key. I carried a small personal wiki with me and used it to save interesting articles, ideas, and discoveries. It was a simple way to build my own searchable knowledge base so I could find things again when I needed them.

In 2011, I started sharing those notes publicly on my blog, Franky's Notes. A few months later, I made another important change: I switched from writing in French to writing in English. At the time, I wasn't fluent, but I wanted to improve. "Notes de lecture" became "Reading Notes", and every week became an opportunity to learn something new while practicing a language that would eventually become a big part of my career.

Over the years, the format evolved. Articles were joined by podcasts, books, videos, and whatever else helped me learn and stay curious. Technology changes constantly, and one of the things I enjoy most about working in this industry is that there is always something new to discover.

What never changed was the habit itself.

Most mornings start the same way: a coffee, my e-reader, and a few articles. Throughout the week, I collect the things that made me think, taught me something, or simply felt worth sharing. Then, every Monday, I publish a new edition.

Seven hundred weeks later, these reading notes have become much more than a list of links. They are a record of what caught my attention, what I was learning, and how both technology and I have changed over the years.

If you've been reading along for a while, thank you. If you're new here, I hope you discover something interesting in the links below.

Suggestion of the week

Harness, Scaffold, and the AI Agent Terms Worth Getting Right (Sergio Paniego, Aritra Roy Gosthipaty) - This post shares a list of definitions. If there are terms used in the "AI world" whose meaning you are not sure of, this post is for you.

AI

Building an On-Device Voice Assistant with Microsoft Foundry Local | Microsoft Community Hub (Lee Stott) - Cool project! Really need to look into the local Foundry.
Building Fluent Icon Finder with the WinUI Copilot Skill (TheJoeFin) - I like the idea of these vibe-coded tools, as I often spend a lot of time searching for the right icon.
The Untrusted Autonomous Workload and AI Sandboxes (Vladimir Mikhalev) - Very nice post that shares how Docker sandboxes could help us do our work safely.
9 Things People Get Wrong With /grill-me and /grill-with-docs (Matt Pocock) - These are excellent tips to use with this skill, but they are also true in many other circumstances.
Meet Gordon: AI Agent for Container Workflows | Docker (Nuno Coracao, Deanna Sparks) - Gordon looks like the specialized agent we are all looking for. Looking forward to seeing if it delivers as advertised.

Programming

aspire init is no longer one-size-fits-all. The aspireify skill and your coding agent tailor the AppHost to your repo. (Erik Lieben) - I would say an interesting twist in this new version. A smart upgrade (pun intended)

Miscellaneous

An important update: Transitioning Gemini CLI to Antigravity CLI - I migrated this week to Antigravity IDE 2.0, and it works great. I had an intense coding session with the CLI (thanks to the crappy weather), and it just worked! No issues, no adaptation, no conversion. 5/5

~frank

Reading Notes #699

Labels: ai, conference, contribution, dotnet, foundry, github, html, local, markdown, Nuget, oss, performance, readingnotes, video, wasm

This week's reading notes bring you the latest insights into AI, .NET, open-source development, and even a few social hacks! From exploring background tasks in Blazor to the fascinating debate on Markdown vs. HTML for AI output, this roundup has something for everyone.

Jean-Olivier P. presenting at MsDevMtl user group

Let me know if you find anything particularly interesting; I'd love to hear your thoughts!

Programming

Running background tasks in Blazor with Web Workers: Exploring the .NET 11 preview - Part 1 (Andrew Lock) - This web worker role looks promising. This post explains the what, why, and how.
NuGet Package Pruning: Cleaner Dependencies and Actionable Vulnerability Reports - .NET Blog (Nikolche Kolev) - This post explains one feature that is now turned on by default in .NET that will improve our dependency graph.

AI

Anthropic Engineer Debates Use of Markdown vs. HTML in AI Agent Output (Paul Thurrott) - Interesting debate. Initially, I was hell no! But the more I think about it, the less certain I am. But we can always ask to generate HTML, so I guess it's okay to keep it lighter and more minimalist by default.
Using Azure Local Foundry CLI with PowerShell (Olivier Miossec) - A nice short tutorial to get local AI.

Open Source

GitHub for Beginners: Getting started with OSS contributions (Kedasha Kerr) - Short tutorial that provides tips to find a project and start your first contribution

Podcasts

513: Agents Over Chat: The Future of Developer Workflows (Merge Conflict) - James and Frank discuss token optimization, model choices, and the workflow; do you make one master big plan or do you do multiple passes...
How to Stop Being Socially Awkward (According to Science) with Behavioral Scientist Vanessa Van Edwards (A Bit of Optimism) - I really appreciated this episode, and I wish many others would find it. It's packed with little gems that will make you feel better.

Miscellaneous

If You Didn't Capture It, Did It Really Happen? (Golnaz Alibeigi) - Will people attend an in-person event if it's available online? That's what this post answers.

Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, and books that catch my interest during the week.

~frank

Reading Notes #698

Labels: ai, devops, docker, documentation, metric, optimization, readingnotes, sandbox, Security, skill

The world of AI is exploding, and with that explosion comes a crucial question: how do we keep these powerful agents in check? Traditional security methods might not cut it anymore, so developers are turning to innovative sandboxing techniques. Let's explore some of the most promising approaches and see which ones emerge as the frontrunners in this AI safety race.

AI

Comparing Sandboxing Approaches for AI Agents (Siri Varma Vegiraju) - Good news, we have options! But right now, what looks the easiest and safest is the Docker sandbox
I stopped using /grill-me for coding. Now, I use /grill-with-docs (Matt Pocock) - Looking forward to trying this one, as the original grill-me was a revelation for me.

Programming

What's New in Aspire 13.3 (Maddy Montaquila) - Oh, I love that new WithBrowserLog()! I need to upgrade all my solutions ASAP!

DevOps

Top 15 CI/CD Metrics: What to Track & Why They Matter (James Walker) - Are our workflows and pipeline healthy? Could we improve them? This post shares metrics to help us answer these questions.

Podcasts

The Real Reason You Feel Empty (Even When Life Looks Good) with Musician Mike Posner (A Bit of Optimism) - Fun discussion about life and how the goal in life shouldn't be getting to the end but enjoying the journey.
Use What Works with Dylan Beattie (.NET Rocks!) - This episode will provide great information for anyone who is interested in using an open source project to build their solution. You should not see OSS as only being free, and you should consider how you can help the maintainers.
AI SREs, Chat With Your Infrastructure with Anyshift (Agentic DevOps : AI Engineering for Infrastructure) - An interesting tool from Anyshift.io that digs into our infra to build a diagram and helps us see what works and understand what doesn't. I might go try the free trial.
The Science of Achieving Goals: How to Change Your Life in 5 Simple Steps (The Mel Robbins Podcast) - We all have goals; some are attainable faster than others. For those goals that require more planning, this episode shares simple steps (very simple indeed) to achieve them. It's explained as only Mel can do it.

I've made it a habit to share the fascinating articles, blog posts, and books that cross my path each week. Think of this as an open invitation: if you stumble upon something intriguing, don't hesitate to share it!
Let's build a community of curious minds.

~frank

Reading Notes #697

Labels: ai, cli, data, database, docker, ghostty, github, local, openwebui, outage, readingnotes, Security, service, skill, sql, Tools, triage

This week’s reading notes cover a wide range of topics, from local AI workflows and Docker agent fleets to data privacy, SQL tips, and developer tooling updates. There’s also an interesting look at how AI may be reshaping platforms like GitHub, alongside practical articles and podcasts packed with ideas for developers and tech enthusiasts alike.

Programming

The new Run dialog: faster, cleaner, and more capable (Clint Rutkas) - Nice boost to performance for Run dialog (currently only in the insider version)
Ghostty Is Leaving GitHub (Mitchell Hashimoto) - I didn't realize it was that bad! It's true that I spend less time there, but did AI cause all those outages (by generating peaks of traffic)?
Run Gemma 4 with Ollama locally, and keep the Aspire LLM Insights (sparkles and all) (Erik Lieben) - Oh yes! Interesting tutorial that explains how to hook local AI to Aspire (a dev tool)

Data

Things I Think I Think... Data Privacy - Data is the new gold, the new oil, now more than ever. This post shares an opinion of our future.

AI

AI Agent Fleet for CI/CD: How Docker Ships Faster (Manuel de la Peña) - This is a nice post that describes a new tool that's great and also provides good tips on how to write your skills.
Generate Images Locally with Docker Model Runner (Ignasi Lopez Luna) - I have a very similar setup, and I love having a local AI for my little tasks
Burn Through Your Backlog With My /triage Skill (Matt Pocock) - This post explains a triage skill to help you in your repos. There is also a video version.

Databases

Why COALESCE might be the most useful SQL function you’re not using right (Lukas Vileikis) - If you don't know COALESCE, this post is for you. If you don't know any alternatives, this post is for you. If you're not sure when to use which, this post is also for you.

Podcasts

Docker AI, what’s new with MCP, Agents, Sandboxes, and more (DevOps and Docker Talk: Cloud Native Interviews and Tooling) - Michael Irwin from Docker is on this episode and they go through alllll the recent releases and some major upcoming stuff. Really interesting episode.
Making opinionated AI tooling decisions with Nimbalyst's Greg Hinkle (Hanselminutes with Scott Hanselman) - Interesting discussion around Nimbalyst, an AI tool that has a few pretty cool tricks up its sleeves

If you have interesting content, share it!

~frank

Apps That See: Bringing Vision AI to Your Projects

Labels: ai, conference, demo, github, oss, post, vision

I was wearing a t-shirt with a partial Reka logo at the edge of the frame. I never said the word "Reka" in that segment. The model caught the logo, connected it to the topic I was discussing, and mentioned it unprompted in the output it generated.

That is not a transcript trick. The model was watching.

At the AI Agents Conference 2026, I gave a talk called "Apps That See" — six live demos showing how to build applications that understand images and video. Every project is open source and ready to clone. This post walks through each one so you have enough context to pick it up, run it, and adapt it to something useful in your own work.

Vision AI Is Accessible Now

Not long ago, working with visual AI meant GPU clusters, specialized teams, and weeks of training. Today a compressed 4B model like Qwen or Gemma 3 runs on a regular laptop and handles image description well enough to prototype. Step up to a 7B model like Reka Edge and the quality improves meaningfully. It also runs locally: a gaming PC with a decent GPU is enough. No server required.

For tasks that need more power, cloud APIs give you faster results without local hardware requirements. The tradeoff is that your images and video go to a third-party provider. For corridor cameras or stock photos that is usually acceptable. For private or sensitive content, local is the better default.

The practical pattern: start local to build and test, then decide whether the task actually requires cloud.

What You Can Build With This

Accessibility: Describe a scene in real time for visually impaired users, or identify objects on demand.
Content creation: Extract structure from a video and turn it into a blog post, caption set, or highlight reel.
Productivity: Search through thousands of videos for a specific object or topic, even when the title gives no indication of the content.
Automation: Trigger actions only when specific visual conditions are met, such as an unrecognized person entering a room.
Fun: Most developers' first contact with AI is building something for themselves, and that is a perfectly valid starting point.

Demo 1: Caption This — Generate a Prompt from Any Image

Source: fboucher/caption-this

If you work with image generation models, you end up with a lot of images to test and compare. Writing the text prompt that would reproduce a specific image is tedious. This tool does it for you: give it an image, get back a prompt you can use to regenerate something similar.

The demo uses an HTTP client extension in VS Code to call the API directly, with no SDK. Pass an image and ask for a plain-text prompt that would recreate it. One prompt detail that improved results noticeably: add "no markdown" to the instruction.

POST https://api.reka.ai/v1/chat
Content-Type: application/json

{
  "model": "reka-flash",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image_url", "image_url": { "url": "https://..." } },
      { "type": "text", "text": "Write a prompt in plain text, no markdown, that would generate the exact same image." }
    ]
  }]
}

One thing to know when testing this across different models: some accept an image URL directly, others require the image as a base64-encoded string. Same task, same prompt, different input contract. If you plan to swap models in your app, account for this difference from the start.
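One way to absorb that difference is to build the image part of the message in a single place. The sketch below is illustrative only: the function names are hypothetical, and wrapping the base64 string in a data URL inside the same image_url field is an assumption borrowed from OpenAI-style APIs. Some backends may expect the encoded string in a different field, so check each provider's documentation.

```python
import base64


def url_image_part(url: str) -> dict:
    """Image entry for models that accept a plain image URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def inline_image_part(data: bytes, mime: str) -> dict:
    """Image entry for models that need the image as base64.

    Assumption: the backend accepts a data URL in the image_url field.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}


def build_messages(image_part: dict, prompt: str) -> list:
    """Combine an image entry and a text prompt into one user message."""
    return [{
        "role": "user",
        "content": [image_part, {"type": "text", "text": prompt}],
    }]
```

With this split, swapping models only changes which helper produces the image part; the prompt and the rest of the message stay identical.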

Demo 2: Media Library — Compare Vision Models Side by Side

Source: fboucher/media-library

This is a web app that connects to multiple vision backends and lets you switch between them at runtime. The motivation: benchmark Reka Edge (running locally, via OpenRouter, or directly through the Reka API) against other models on real tasks.

Object detection surfaces the biggest portability problem. Some models return bounding boxes in an HTML-style bracket format with pixel coordinates. Others use a 2D box structure with a different coordinate scheme. If you code against one format and then swap models, your rendering breaks. There is no standard here — handle the differences at the application layer, not the model layer.
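Handling this at the application layer can look like the sketch below: convert every model-specific box into one internal shape, and only convert back to pixels when drawing. Both input formats here are illustrative assumptions (corner points in pixels, and a [y_min, x_min, y_max, x_max] list on a 0 to 1000 scale); they are not taken from the Media Library code, so adapt the parsers to what your models actually return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Internal format: corners as fractions of the image size (0.0 to 1.0)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def from_pixels(label, x1, y1, x2, y2, width, height) -> Box:
    """Hypothetical format A: corner points in pixels."""
    return Box(label, x1 / width, y1 / height, x2 / width, y2 / height)


def from_box_2d(label, box_2d, scale=1000) -> Box:
    """Hypothetical format B: [y_min, x_min, y_max, x_max] on a 0..scale grid."""
    y_min, x_min, y_max, x_max = box_2d
    return Box(label, x_min / scale, y_min / scale, x_max / scale, y_max / scale)


def to_pixels(box: Box, width: int, height: int) -> tuple:
    """Convert back to pixel corners for drawing on a given image size."""
    return (
        round(box.x_min * width),
        round(box.y_min * height),
        round(box.x_max * width),
        round(box.y_max * height),
    )
```

The rendering code then depends only on Box, so adding a model means adding one small parser rather than touching the drawing logic.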

The app uses the OpenAI API format as the common interface across all backends. Any model with a compatible endpoint can be swapped in with minimal changes. It does not eliminate the per-model quirks, but it reduces the friction of switching to a configuration change rather than a rewrite.
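As a rough illustration of "a configuration change rather than a rewrite", the sketch below keeps everything backend-specific in a small connection record and builds the same OpenAI-style request from it. The Connection class, the helper name, and the URLs in the usage note are hypothetical placeholders, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Everything that differs between backends lives here."""
    name: str
    base_url: str   # e.g. the root of an OpenAI-compatible API, ending in /v1
    model: str
    api_key: str = ""


def chat_request(conn: Connection, messages: list) -> tuple:
    """Return (url, headers, body) for an OpenAI-style chat completion call."""
    url = conn.base_url.rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if conn.api_key:
        # Local servers often need no key, so only send one when configured.
        headers["Authorization"] = f"Bearer {conn.api_key}"
    body = {"model": conn.model, "messages": messages}
    return url, headers, body
```

For example, Connection("local", "http://localhost:8000/v1", "reka-edge-2603") and Connection("cloud", "https://api.example.invalid/v1", "reka-edge-2603", "my-key") would go through the same chat_request call; both URLs are placeholders to replace with your own endpoints.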

Video input is supported too, though far fewer models handle it than images. Of the models tested, Reka Edge is the standout for video — the others either reject it or behave inconsistently.

Demo 3: Video2Blog — Turn a Video into a Structured Post

Source: fboucher/video2blog

I built this for myself. I do a lot of tutorial videos and I wanted a tool that would turn a recording into a structured blog post without me having to write one from scratch.

The tool sends the video to a vision model with a detailed prompt: target structure, tone, format, and an instruction to flag moments where a screenshot would add value. The model returns timestamps — it cannot extract frames itself, but it tells you exactly where to look, and you pull them locally with ffmpeg.
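The frame-extraction step can be sketched as follows. It assumes the model returns timestamps as "SS", "MM:SS", or "HH:MM:SS" strings, which may not match the exact format Video2Blog asks for, and the helper names are hypothetical. The ffmpeg options used (-ss to seek, -frames:v 1 to write a single frame, -y to overwrite) are standard.

```python
import subprocess


def parse_timestamp(ts: str) -> float:
    """Convert "SS", "MM:SS" or "HH:MM:SS" into seconds."""
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def frame_command(video: str, ts: str, out_path: str) -> list:
    """Build the ffmpeg command that grabs one frame at the given timestamp."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{parse_timestamp(ts):.3f}",  # seek before decoding the input
        "-i", video,
        "-frames:v", "1",                     # write exactly one video frame
        out_path,
    ]


def extract_frames(video: str, timestamps: list, prefix: str = "shot") -> list:
    """Run ffmpeg once per timestamp and return the screenshot paths."""
    paths = []
    for i, ts in enumerate(timestamps, start=1):
        out_path = f"{prefix}-{i:02d}.png"
        subprocess.run(frame_command(video, ts, out_path), check=True)
        paths.append(out_path)
    return paths
```

Calling extract_frames("tutorial.mp4", ["00:42", "03:15"]) would write shot-01.png and shot-02.png next to the script, provided ffmpeg is installed and on the PATH.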

That creates one architectural quirk worth knowing: the video lives in two places. ffmpeg needs it locally to extract frames. The hosted model needs it uploaded to analyze content. For a one-evening project it works well enough, and I use it often enough that it has paid for itself many times over.

After the first draft, you stay in a conversation loop: change the tone, translate to French, swap a timestamp, restructure a section. The model holds context and iterates with you until the result is what you want.

Demo 4: Video Analyzer — Search and Query Your Video Library

Source: reka-ai/api-examples-dotnet

Most video search runs on titles, descriptions, and transcribed audio. This demo searches by what is actually visible on screen.

The app pre-indexes a video library by sending each video through a vision model ahead of time. When a query arrives, the heavy work is already done. A search for "robot arm" returns the right video — a clip of a robotic arm animation. It also returns a false positive: fast-moving hands apparently looked close enough to fool the model. Useful, not perfect, and worth designing around in your UX.
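To make the "index first, query later" idea concrete, here is a deliberately naive sketch. It is not how the demo is implemented: the describe callable stands in for whatever call sends a video to a vision model, and the search is plain keyword matching over the stored descriptions.

```python
def build_index(library: dict, describe) -> dict:
    """Run the expensive vision step once per video, ahead of any query.

    library maps a video id to its path; describe is any callable that
    takes a path and returns a text description of what is on screen.
    """
    return {video_id: describe(path).lower() for video_id, path in library.items()}


def search(index: dict, query: str) -> list:
    """Return the ids of videos whose description contains every query term."""
    terms = query.lower().split()
    return sorted(
        video_id
        for video_id, text in index.items()
        if all(term in text for term in terms)
    )
```

Because the index holds only text, queries are cheap. The false positive mentioned above would still show up here: if the model describes fast-moving hands as a robot arm, the index faithfully stores that mistake.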

The Q&A feature goes further. You pick a video and ask a specific question. "What database was used?" returned MySQL — and noted it was running in a Docker container. The model identified that from watching the screen, not from audio. No transcript needed.

From there, you can generate study materials from any recorded session. The demo produces a multiple-choice quiz with answer options, correct answers, and explanations. The model is doing comprehension, not transcription.

Demo 5: Roast My Life — What the Model Actually Sees

Source: reka-ai/api-examples-python

I never mentioned the pictures on my wall. The model did.

In a video about Python and AI, the model's generated blog post made a remark about the artwork hanging behind me. I had said nothing about it. The model noticed, mentioned it, and moved on as if it were obvious.

Then there was the t-shirt moment described at the top of this post. A partial logo, half out of frame, no mention of it anywhere in the audio — and the model connected it to the topic anyway.

This demo is named Roast My Life because the model ends up commenting on things you never intended to share. But the real point is what it reveals: a vision model is not a smarter transcript. It is watching. The larger models do this particularly well, and once you see it, it changes how you think about what these tools can do — and what they will pick up without you asking.

Demo 6: N8N Automation — No-Code Video Clipping Pipeline

Sources: N8N Reka Vision integration

Vision AI does not always need custom code. This demo wires everything together in N8N, a visual workflow tool, with no programming required.

The trigger is a new video published to YouTube. The workflow finds an engaging clip, reformats it from horizontal to vertical, adds captions in a specific style (all lowercase, specific colors — chosen to be obviously distinct from any default), and sends an email with the finished clip attached. The whole thing runs automatically.

For developers, this pattern is worth knowing even if you code everything else. Many real business workflows have a vision AI step that fits cleanly into a larger automation, and a no-code tool is often the fastest way to ship it.

Watch the Full Talk

The demos above are the written version. The live version, with the actual code running, models responding in real time, and a few things going sideways in interesting ways, is on YouTube.

All the Code

The demos span Python, C#, raw HTTP, Go, and N8N. Vision AI is not tied to a specific stack — if your environment can make an HTTP request, it can call a vision model.

All projects:

Reading Notes #696

Labels: agent, ai, api, dotnet, editing, foundry, git, openapi, readingnotes, sdk, session, swagger, video, vscode

This week's collection highlights the rapid evolution of AI agents, exploring their asynchronous capabilities, deployment journeys, and their impact on DevOps and video editing. On the programming front, we explore new Git features and API versioning with OpenAPI in .NET 10. We also dive into some fascinating podcast discussions ranging from the GUI vs. CLI debate to generational perspectives in the workplace.

Enjoy the reading!

AI

All your agents are going async — /dev/knill (Zak Knill) - An interesting solution is explained in this post to ease our communication with AI agents.
AI Didn't Kill Video Editing (It Made Some Nice Motion Graphics) (Golnaz) - It's so easy and convenient to think that AI is doing the work and could replace humans if you don't know what the tasks really are.
From Local to Production: The Complete Developer Journey for Building, Composing, and Deploying AI Agents | Microsoft Foundry Blog (Takuto, jeffhollan) - Finally reaching v1! Hurray! And a new name for a VS Code extension.

Programming

New features in Git 2.54: easier rebasing, hooks, and statistics (Andrew Lock) - A nice post that explains how to rebase correctly with the new features included in the last release
Combining API versioning with OpenAPI in .NET 10 applications - .NET Blog (Sander ten Brinke) - There is a lot in this post. Close to the end, the author explains how to configure SwaggerUI or Scalar to visualize your API. Nice, since the API documentation is now generated by OpenAPI.

Podcast

510: AI Agents: Claws, Copilot, GUI vs CLI Debate (Merge Conflict) - In this episode, James is put in the spotlight and needs to talk about this phone situation. Honestly, interesting discussion and they finally end up talking about the AI agent, you know, CLI versus UI.
The Real Reason Young People Don't Have 'The Hunger' for Work (And What Leaders Need to Hear) with Generations Expert Dr. Eliza Filby (A Bit of Optimism) - I'm sure, like me, you have heard "Gen Z is doing this" or "millennials are like that". It's interesting to go deeper and understand why the Boomers, Gen X, and all those generations do what they do, and also to understand the phases and the years, such as why all those generations seem to get closer and closer.
Can AI Agents Safely Become DevOps Engineers? (Agentic DevOps : AI + Infra Ops) - I read and consume a lot of AI content as a developer; it's interesting to see the DevOps side waking up and building AI agents that focus on DevOps.
Chet Husk: .NET Tooling - Episode 399 (AI DevOps Podcast) - I never thought about that competition, the human versus the machine, related to the consumption of the output of a CLI. This was a very interesting episode about the performance and the priorities.

~frank

Reading Notes #695

Labels: agent, ai, claude, copilot, design, docker, foundry, gpt, model, obs, quality, readingnotes, Security, streaming, Visual Studio, youtube

A mix of thoughtful perspectives and practical updates this week. From evolving AI tools and model selection guidance to changes in developer workflows and tooling, there’s plenty to reflect on. Add in insights on streaming and a strong push toward more secure environments, and you get a well-rounded set of reads worth your time.

Suggestion of the week

Why MicroVMs: The Architecture Behind Docker Sandboxes (Srini Sekaran, Craig Gumbley) - Yes! Officially launched, so we can finally include some security in our processes!

AI

Introducing Claude Design by Anthropic Labs - Looking forward to trying the new interface that seems to allow a more artistic interaction.
The importance of people who care – Rachel Andrew - Interesting post about people who care in a world where doing a lot of stuff fast is seen as a goal.
Changes to GitHub Copilot Individual plans (Joe Binder) - Big changes for Copilot that will probably affect your workflow. This post shares the details of this disruption and the reasons behind it; it's all for a good reason.
GPT-5 vs GPT-4.1 - choosing the right model for your use case - Microsoft Foundry - Yes, this post is about two specific models in Foundry, but it's still true that it can definitely help to understand when to pick one model over the other in general

Programming

It's Time for a Visual Studio Upgrade - This post does a comparison between the old and the new versions of the Visual Studio IDE and shares details about the most impactful changes.

Miscellaneous

Livestreaming Before It Was Cool (Golnaz) - Curious to learn more about the streaming options, from the different platforms to the tools, and the pros and cons of each? This post is for you, and on top of that, you get the Microsoft story.

~frank

Reading Notes #694

Labels: ai, arm, blazor, cli, devop, docker, dotnet, foudry, local, maui, readingnotes, sandboxes, Security

A fast-moving mix this week: AI tooling, ARM readiness, Docker sandboxes, and real-world lessons from agents. Practical insights across .NET, DevOps, and local-first workflows.

Suggestion of the week

Token Exhaustion: What Three AI Coding Agents Taught Me in a Single Week (Alexandre Brisebois) - Great post! Everyone needs to learn this lesson. Now you have an alternative instead of the hard way.

AI

How Intelligent is AI? (Shawn Wildermuth) - Interesting thoughts about the current situation between code, developers, and AI
How to Analyze Hugging Face for Arm64 Readiness (Ajeet Singh Raina) - This is a great tutorial that explains in detail how, in 7 steps, you can produce a stable arm container image.
Foundry Local is now Generally Available (sam kemp) - Interesting solution when you need to stay local.
Copilot CLI vs VS Code - When I Use Each (Golnaz) - Learn when some prefer using a CLI or an editor. There are so many different angles and things to do.

Programming

Running AI agents with customized templates using docker sandbox (Andrew Lock) - The post I have been waiting for! This post goes further than the happy path and shows how to add the tools you want or the OS you need.
Why .NET MAUI Popups Lag and How to Fix Performance Issues (Kompelli Sravan Kumar) - Good to know, as performance has such a huge impact on the user experience.

DevOps

Reclaim Developer Hours through Smarter Vulnerability Prioritization with Docker and Mend.io (Adam Dawson, Dor Hayun) - Better understand vulnerabilities with the new integration of Mend.io into Docker.

Podcasts

Our Favorite Agent Setups (Agentic DevOps) - Nice discussion that goes through many AI harnesses, agents, models, and what they are playing with right now. OpenClaw, OpenCode, Claude Code, Copilot, and all of it.
Michael Perry: AI-assisted Development - Episode 397 (AI DevOps Podcast) - Interesting discussion about AI-assisted Development (or can we say programming?) with a focus on skills and how they could be defined.

Reading Notes #693

Labels: agent, ai, blazor, company, docker, dotnet, fluentui, Migration, readingnotes, reka, sanbox, sandbox, Security, smart glasses, Tools, vision

I'm always on the lookout for innovative ways to enhance my coding experience, and this week's Reading Notes are filled with exciting discoveries! From cutting-edge UI libraries to secure sandbox environments for AI agents, I've curated a selection of articles that showcase the latest programming trends and technologies.

Whether you're interested in harnessing the power of Docker sandboxes or exploring the potential of smart glasses integration, there's something on this list for everyone.

Programming

What's new for the Microsoft Fluent UI Blazor library 5.0 RC2 (Vincent Baaij) - V5 already! Wow, time flies! Tons of features in this release, looking forward to trying the pinned column from the grid.

AI

Docker Sandboxes: Run Agents in YOLO Mode, Safely (Eric Jia, Srini Sekaran, Timir Karia) - From what I saw, it is very secure, and it is that simple to set up. Looking forward to doing more with it.
Running AI agents safely in a microVM using docker sandbox (Andrew Lock) - This post answers so many questions! I heard a lot about those sandboxes and tried a few things, but had so many questions. Great post Andrew.
Experimenting with AI subagents (Nicolas Fränkel) - An interesting story with what seems to be a good vision of the future.
Old Protocols, New Pipes (Mark Downie) - Well said! And I hope you had a good time doing it.
I Recorded 13 Hours of My Day With Smart Glasses for AI. Here's What I Built and What I Learned (tash-2s) - A really cool project that uses AI to make a pile of videos into some kind of dataset. A new way to do daily journaling.

Miscellaneous

Claude Code Windows Migration Guide: Move Your Setup (Dirk Strauss) - There is always a part that is easy in starting a new computer and a part that requires more effort.

~frank

How to Serve a Vision AI Model Locally with vLLM and Reka Edge

Labels: ai, local, oss, post, reka, vision, vllm

Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.

The vLLM plugin is available at github.com/reka-ai/vllm-reka.
The demo Media Library app is at github.com/fboucher/media-library.

Prerequisites

You need a machine with a GPU running Linux, macOS, or Windows (with WSL). I use uv, a fast Python package and project manager, but pip + venv works too if you prefer.

Clone the vLLM Reka Plugin

Reka models require a dedicated plugin to run under vLLM. Not all models need this extra step, but Reka's architecture requires it. Clone the plugin repository and enter the directory:

git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka

The repository contains the plugin code and a serve.sh script you will use to start the service.

Download the Reka Edge Model

Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:

uv sync
uv pip install huggingface_hub
uvx hf download RekaAI/reka-edge-2603 --local-dir ./models/reka-edge-2603

This is a large model, so make sure you have enough disk space and a stable connection.

Start the Service

Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:

uv run bash serve.sh ./models/reka-edge-2603

The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.

As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:

export GPU_MEM=0.80
export MAX_LEN=4096
export MAX_BATCH_TOKENS=4096
export MAX_IMAGES=2
export MAX_VIDEOS=1
export VIDEO_NUM_FRAMES=4
uv run bash serve.sh ./models/reka-edge-2603

Connect the Media Library App

With the backend running, it's time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:

git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d

Open http://localhost:8080 in your browser, then add a new connection with these settings:

Name: local (or any label you want)
IP address: your machine's local network IP (e.g. 192.168.x.x)
API key: leave blank or enter anything — no key is required for a local connection
Model: reka-edge-2603

Click Test to confirm the connection, then save it.
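If the test fails, it can help to check the backend directly, outside the app. vLLM's OpenAI-compatible server lists the loaded models at /v1/models, so a small script can confirm the service is reachable. This is a sketch with assumptions: port 8000 is vLLM's default and serve.sh may use another one, and the IP address in the example is a made-up value to replace with your own.

```python
from __future__ import annotations

import json
import urllib.request


def models_url(host: str, port: int = 8000) -> str:
    """Build the URL of the model listing endpoint (port 8000 is an assumption)."""
    return f"http://{host}:{port}/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract the model names from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(host: str, port: int = 8000, timeout: float = 5.0) -> list[str]:
    """Ask the running service which models it is serving."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as response:
        return model_ids(json.load(response))


if __name__ == "__main__":
    # Example address only: use your machine's local network IP.
    print(list_models("192.168.1.20"))
```

If reka-edge-2603 shows up in the list, the connection settings in the app should work with the same IP address.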

Try It: Image Description and Object Detection

Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.
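To see roughly what such a request looks like, here is a sketch that sends an image to the service from a script. It assumes the service exposes vLLM's OpenAI-compatible chat completions endpoint on port 8000, and that the image travels as a base64 data URL. The app itself may build its request differently, and photo.jpg is just an example file name.

```python
from __future__ import annotations

import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(image_path: str, prompt: str, model: str = "reka-edge-2603") -> dict:
    """Build a chat completion body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
                ],
            }
        ],
        "max_tokens": 256,
    }


def describe_image(
    image_path: str,
    prompt: str = "Describe this image.",
    base_url: str = "http://localhost:8000/v1",  # assumption: vLLM default port
) -> str:
    """Send the image to the service and return the model's text answer."""
    body = json.dumps(build_request(image_path, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(describe_image("photo.jpg"))
```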

Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (e.g. "face") and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.

Switch to the Reka Cloud API

If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.

The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.
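In your own scripts, the same idea can be modelled as a small table of connections where only the URL and the key change. This is a sketch, not how the app stores its connections. Everything here is an assumption to check against the Reka documentation: the base URLs are placeholders, the Bearer header is the usual convention for OpenAI-compatible APIs, and the MEDIA_BACKEND and REKA_API_KEY variable names are mine.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    base_url: str
    api_key: str = ""
    model: str = "reka-edge-2603"  # same model name for local and cloud

    def headers(self) -> dict[str, str]:
        """HTTP headers for a request; the key is only sent when one is set."""
        headers = {"Content-Type": "application/json"}
        if self.api_key:
            # Assumption: Bearer-token authentication.
            headers["Authorization"] = f"Bearer {self.api_key}"
        return headers


CONNECTIONS = {
    # Placeholder URLs: replace them with your local IP and the Reka API endpoint.
    "local": Connection("local", "http://192.168.x.x:8000/v1"),
    "cloud": Connection(
        "cloud",
        "https://REKA_API_BASE_URL",
        api_key=os.environ.get("REKA_API_KEY", ""),
    ),
}


def active_connection() -> Connection:
    """Pick the connection named by MEDIA_BACKEND, defaulting to local."""
    return CONNECTIONS[os.environ.get("MEDIA_BACKEND", "local")]


if __name__ == "__main__":
    connection = active_connection()
    print(connection.name, connection.base_url, connection.model)
```

With this shape, development runs against the local service by default, and setting MEDIA_BACKEND=cloud switches to the API without touching the rest of the code.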

What You Can Build

The service you just set up accepts any image or video over HTTP. Point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.
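A batch pipeline like that can be very small. The sketch below walks a folder and passes each media file, with a prompt, to a function that talks to the model. That function (ask_model) is deliberately left as a parameter, because how you call the service depends on your setup. The stand-in used at the bottom only echoes the file name, and the list of file extensions is my own assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Assumption: the file types worth sending; adjust to what your model accepts.
MEDIA_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".mp4"}


def find_media(folder: str) -> list[Path]:
    """Return the media files directly inside `folder`, sorted by name."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in MEDIA_SUFFIXES
    )


def run_batch(
    folder: str,
    prompt: str,
    ask_model: Callable[[Path, str], str],
) -> dict[str, str]:
    """Apply the same prompt to every media file and collect the answers."""
    results: dict[str, str] = {}
    for path in find_media(folder):
        try:
            results[path.name] = ask_model(path, prompt)
        except OSError as error:  # covers file and network (URLError) failures
            results[path.name] = f"error: {error}"
    return results


if __name__ == "__main__":
    # Stand-in for a real call to the service (hypothetical helper).
    def fake_model(path: Path, prompt: str) -> str:
        return f"{prompt} -> {path.name}"

    for name, answer in run_batch(".", "List five tags", fake_model).items():
        print(name, ":", answer)
```

Changing the prompt from tags to a description or a detection request changes what the batch extracts; the loop stays the same.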

References

Reka Edge model: huggingface.co/RekaAI/reka-edge-2603
vLLM Reka plugin: github.com/reka-ai/vllm-reka
Media Library app: github.com/fboucher/media-library
Reka API platform: platform.reka.ai

Reading Notes #692

Labels: ai, apple, aspire, azure, Blog, cli, cloud, documentation, dotnet, git, ollama, openclaw, readingnotes, Tools

The tech landscape is constantly evolving, and keeping up with the latest developments can be overwhelming. From AI-powered tools like Ollama and OpenClaw, to new ways of programming with Aspire Docs and Azure CLI, it seems like there's always something new to explore. In this edition of Reading Notes, I'll share some of the interesting things that caught my eye recently, from AI advancements to developer tools and beyond.

Suggestion of the week

The Future of Tech Blogging in the Age of AI (Mark Heath) - Great post, and the timing is perfect. Personally, I agree with all of it. What are your thoughts?

AI

Ollama is now powered by MLX on Apple Silicon in preview - Great news for those who bought the new M5 Apple computer: Ollama is now much faster for you!
The simplest and fastest way to setup OpenClaw - Hard to be easier than this. I wish we could do the same to run it inside a Docker sandbox.
Meet GitAgent: The Docker for AI Agents that is Finally Solving the Fragmentation between LangChain, AutoGen, and Claude Code - MarkTechPost (Michal Sutter) - Another day, another AI tool. It looks interesting, curious to dig in and see it in action.

Programming

Aspire Docs in Your Terminal (and Your AI's Brain) (David Pine) - Really cool idea, and I will try it for sure. Having options is great, and this is done perfectly here.

Cloud

Azure Developer CLI (azd) - March 2026: Run and Debug AI Agents Locally, GitHub Copilot Integration, & Container App Jobs (PuiChee (PC) Chan) - Nice update from this great deployment tool.

Miscellaneous

NDI Tools: The Unsung Hero of Video Production (Golnaz) - I started using NDI tools for my stream back in 2020, when I was bringing one of my friends into my stream. Amazing tools, and this post is a must-read.

Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, podcasts and books that catch my interest during the week.

If you have interesting content, share it!

~frank
