Reading Notes #700

Seven hundred weeks.

When I started taking notes about the articles I was reading, I never imagined I would still be doing it 700 weeks later.


Back then, my notes lived on a USB key. I carried a small personal wiki with me and used it to save interesting articles, ideas, and discoveries. It was a simple way to build my own searchable knowledge base so I could find things again when I needed them.

In 2011, I started sharing those notes publicly on my blog, Franky's Notes. A few months later, I made another important change: I switched from writing in French to writing in English. At the time, I wasn't fluent, but I wanted to improve. "Notes de lecture" became "Reading Notes", and every week became an opportunity to learn something new while practicing a language that would eventually become a big part of my career.

Over the years, the format evolved. Articles were joined by podcasts, books, videos, and whatever else helped me learn and stay curious. Technology changes constantly, and one of the things I enjoy most about working in this industry is that there is always something new to discover.

What never changed was the habit itself.

Most mornings start the same way: a coffee, my e-reader, and a few articles. Throughout the week, I collect the things that made me think, taught me something, or simply felt worth sharing. Then, every Monday, I publish a new edition.

Seven hundred weeks later, these reading notes have become much more than a list of links. They are a record of what caught my attention, what I was learning, and how both technology and I have changed over the years.

If you've been reading along for a while, thank you. If you're new here, I hope you discover something interesting in the links below.

Suggestion of the week

AI

Programming

Miscellaneous

~frank

Reading Notes #699

This week's reading notes bring you the latest insights into AI, .NET, open-source development, and even a few social hacks! From exploring background tasks in Blazor to the fascinating debate on Markdown vs. HTML for AI output, this roundup has something for everyone.

Jean-Olivier P. presenting at MsDevMtl user group

Let me know if you find anything particularly interesting; I'd love to hear your thoughts!

Programming

AI

Open Source

Podcasts

Miscellaneous


Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, and books that catch my interest during the week. 

~frank

Reading Notes #698

The world of AI is exploding, and with that explosion comes a crucial question: how do we keep these powerful agents in check? Traditional security methods might not cut it anymore, so developers are turning to innovative sandboxing techniques. Let's explore some of the most promising approaches and see which ones emerge as the frontrunners in this AI safety race.




AI

Programming

DevOps

Podcasts


I've made it a habit to share the fascinating articles, blog posts, and books that cross my path each week. Think of this as an open invitation: if you stumble upon something intriguing, don't hesitate to share it!
Let's build a community of curious minds.

~frank

Reading Notes #697

This week’s reading notes cover a wide range of topics, from local AI workflows and Docker agent fleets to data privacy, SQL tips, and developer tooling updates. There’s also an interesting look at how AI may be reshaping platforms like GitHub, alongside practical articles and podcasts packed with ideas for developers and tech enthusiasts alike.


Programming

Data

AI

Databases

Podcasts


Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, and books that catch my interest during the week.

If you have interesting content, share it!

~frank

Apps That See: Bringing Vision AI to Your Projects

I was wearing a t-shirt with a partial Reka logo at the edge of the frame. I never said the word "Reka" in that segment. The model caught the logo, connected it to the topic I was discussing, and mentioned it unprompted in the output it generated.

That is not a transcript trick. The model was watching.

At the AI Agents Conference 2026, I gave a talk called "Apps That See" — six live demos showing how to build applications that understand images and video. Every project is open source and ready to clone. This post walks through each one so you have enough context to pick it up, run it, and adapt it to something useful in your own work.

Vision AI Is Accessible Now

Not long ago, working with visual AI meant GPU clusters, specialized teams, and weeks of training. Today a compressed 4B model like Qwen or Gemma 3 runs on a regular laptop and handles image description well enough to prototype. Step up to a 7B model like Reka Edge and the quality improves meaningfully. It also runs locally: a gaming PC with a decent GPU is enough. No server required.

For tasks that need more power, cloud APIs give you faster results without local hardware requirements. The tradeoff is that your images and video go to a third-party provider. For corridor cameras or stock photos that is usually acceptable. For private or sensitive content, local is the better default.

The practical pattern: start local to build and test, then decide whether the task actually requires cloud.

What You Can Build With This

  • Accessibility: Describe a scene in real time for visually impaired users, or identify objects on demand.
  • Content creation: Extract structure from a video and turn it into a blog post, caption set, or highlight reel.
  • Productivity: Search through thousands of videos for a specific object or topic, even when the title gives no indication of the content.
  • Automation: Trigger actions only when specific visual conditions are met, such as an unrecognized person entering a room.
  • Fun: Most developers' first contact with AI is building something for themselves, and that is a perfectly valid starting point.


Demo 1: Caption This — Generate a Prompt from Any Image


Source: fboucher/caption-this

If you work with image generation models, you end up with a lot of images to test and compare. Writing the text prompt that would reproduce a specific image is tedious. This tool does it for you: give it an image, get back a prompt you can use to regenerate something similar.

The demo uses an HTTP client extension in VS Code to call the API directly, with no SDK. Pass an image and ask for a plain-text prompt that would recreate it. One prompt detail that improved results noticeably: adding "no markdown" to the instruction.

POST https://api.reka.ai/v1/chat
Content-Type: application/json

{
  "model": "reka-flash",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image_url", "image_url": { "url": "https://..." } },
      { "type": "text", "text": "Write a prompt in plain text, no markdown, that would generate the exact same image." }
    ]
  }]
}

One thing to know when testing this across different models: some accept an image URL directly, others require the image as a base64-encoded string. Same task, same prompt, different input contract. If you plan to swap models in your app, account for this difference from the start.
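One way to absorb that difference is to hide it behind a single helper. This is a minimal sketch, not code from the demo: the `image_part` name and the `needs_base64` flag are hypothetical, and it assumes the target model accepts a base64 data URI in the same `image_url` field, which you should confirm against the provider's documentation.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str, needs_base64: bool = False) -> dict:
    """Build the image entry of a chat message.

    source is either an http(s) URL or a path to a local file.
    needs_base64 is a hypothetical per-model switch you would set in config.
    """
    is_remote = source.startswith(("http://", "https://"))
    if is_remote and not needs_base64:
        # The model accepts a URL: pass it through untouched.
        return {"type": "image_url", "image_url": {"url": source}}
    if is_remote:
        # Keep the sketch offline: downloading is left to the caller.
        raise ValueError("download the image first, then pass the local path")
    # Local file: encode it as a base64 data URI (assumed input contract).
    mime = mimetypes.guess_type(source)[0] or "application/octet-stream"
    encoded = base64.b64encode(Path(source).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}}
```

With a helper like this, the rest of the request stays the same and only the flag changes per model.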

Demo 2: Media Library — Compare Vision Models Side by Side


Source: fboucher/media-library

This is a web app that connects to multiple vision backends and lets you switch between them at runtime. The motivation: benchmark Reka Edge (running locally, via OpenRouter, or directly through the Reka API) against other models on real tasks.

Object detection surfaces the biggest portability problem. Some models return bounding boxes in an HTML-style bracket format with pixel coordinates. Others use a 2D box structure with a different coordinate scheme. If you code against one format and then swap models, your rendering breaks. There is no standard here — handle the differences at the application layer, not the model layer.
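Handling this at the application layer can look like the sketch below. Both input formats are assumptions chosen for illustration, not the exact output of any model named here: a tagged string with pixel coordinates, and a `box_2d` list in `[ymin, xmin, ymax, xmax]` order on a 0 to 1000 scale. The point is that everything converts to one internal `Box` type before rendering.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in pixels, origin at the top-left corner."""
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical format A: "<box>x1,y1,x2,y2</box>" with pixel coordinates.
_TAG_BOX = re.compile(r"<box>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</box>")


def from_tagged_text(text: str) -> list[Box]:
    """Extract every tagged box found in the model's text reply."""
    return [Box(*map(int, match.groups())) for match in _TAG_BOX.finditer(text)]


def from_box_2d(items: list[dict], width: int, height: int) -> list[Box]:
    """Hypothetical format B: {"box_2d": [ymin, xmin, ymax, xmax]} scaled 0-1000."""
    boxes = []
    for item in items:
        ymin, xmin, ymax, xmax = item["box_2d"]
        boxes.append(Box(
            left=round(xmin * width / 1000),
            top=round(ymin * height / 1000),
            right=round(xmax * width / 1000),
            bottom=round(ymax * height / 1000),
        ))
    return boxes
```

The rendering code then only ever sees `Box`, so adding a model means adding one parser, not touching the drawing logic.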

The app uses the OpenAI API format as the common interface across all backends. Any model with a compatible endpoint can be swapped in with minimal changes. It does not eliminate the per-model quirks, but it reduces the friction of switching to a configuration change rather than a rewrite.
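To illustrate what "a configuration change rather than a rewrite" can mean, here is a rough sketch assuming every backend exposes an OpenAI-style `/chat/completions` route. The `Connection` class, the entries in `CONNECTIONS`, the URLs, and the model name are all placeholders made up for this example, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    base_url: str
    model: str
    api_key: str = ""


# Hypothetical entries; replace with the endpoints and models you actually use.
CONNECTIONS = {
    "local": Connection("http://localhost:8000/v1", "your-model-name"),
    "cloud": Connection("https://api.example.com/v1", "your-model-name", api_key="YOUR_KEY"),
}


def build_chat_request(conn: Connection, prompt: str, image_url: str) -> tuple[str, dict, dict]:
    """Return (url, headers, body) for an OpenAI-style chat completion call."""
    headers = {"Content-Type": "application/json"}
    if conn.api_key:
        # Only send credentials when the connection defines a key.
        headers["Authorization"] = f"Bearer {conn.api_key}"
    body = {
        "model": conn.model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return f"{conn.base_url}/chat/completions", headers, body
```

Switching backends is then a matter of picking a different key in `CONNECTIONS`; the per-model quirks still need their own handling.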

Video input is supported too, though far fewer models handle it than images. Of the models tested, Reka Edge is the standout for video — the others either reject it or behave inconsistently.

Demo 3: Video2Blog — Turn a Video into a Structured Post


Source: fboucher/video2blog

I built this for myself. I do a lot of tutorial videos and I wanted a tool that would turn a recording into a structured blog post without me having to write one from scratch.

The tool sends the video to a vision model with a detailed prompt: target structure, tone, format, and an instruction to flag moments where a screenshot would add value. The model returns timestamps — it cannot extract frames itself, but it tells you exactly where to look, and you pull them locally with ffmpeg.
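The frame extraction step could be scripted along these lines. It is a sketch, not the tool's actual code: the `frame_commands` helper, the file names, and the assumption that timestamps arrive as `HH:MM:SS` strings are all illustrative. The ffmpeg flags used (`-ss`, `-i`, `-frames:v`, `-y`) are standard ones.

```python
import shlex


def frame_commands(video_path: str, timestamps: list[str], out_dir: str = "frames") -> list[list[str]]:
    """Build one ffmpeg command per timestamp (assumed format HH:MM:SS)."""
    commands = []
    for index, stamp in enumerate(timestamps, start=1):
        output = f"{out_dir}/frame_{index:02d}.png"
        # -ss before -i seeks to the timestamp; -frames:v 1 writes a single frame;
        # -y overwrites an existing output file without prompting.
        commands.append(["ffmpeg", "-y", "-ss", stamp, "-i", video_path, "-frames:v", "1", output])
    return commands


# Print the commands so they can be reviewed before running them.
for command in frame_commands("tutorial.mp4", ["00:01:12", "00:04:30"]):
    print(shlex.join(command))
```

To actually run them, pass each list to `subprocess.run(command, check=True)` after creating the output directory.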

That creates one architectural quirk worth knowing: the video lives in two places. ffmpeg needs it locally to extract frames. The hosted model needs it uploaded to analyze content. For a one-evening project it works well enough, and I use it often enough that it has paid for itself many times over.

After the first draft, you stay in a conversation loop: change the tone, translate to French, swap a timestamp, restructure a section. The model holds context and iterates with you until the result is what you want.

Demo 4: Video Analyzer — Search and Query Your Video Library


Source: reka-ai/api-examples-dotnet

Most video search runs on titles, descriptions, and transcribed audio. This demo searches by what is actually visible on screen.

The app pre-indexes a video library by sending each video through a vision model ahead of time. When a query arrives, the heavy work is already done. A search for "robot arm" returns the right video — a clip of a robotic arm animation. It also returns a false positive: fast-moving hands apparently looked close enough to fool the model. Useful, not perfect, and worth designing around in your UX.
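The index-first idea can be sketched in a few lines. This is only an illustration: plain keyword matching stands in for whatever matching the demo really uses, and `describe` is a hypothetical callable that would wrap the vision model call.

```python
def build_index(videos: list[str], describe) -> dict[str, str]:
    """Run the slow vision step once per video and keep the resulting text."""
    return {video: describe(video).lower() for video in videos}


def search(index: dict[str, str], query: str) -> list[str]:
    """Return videos whose stored description contains every query word."""
    words = query.lower().split()
    return [video for video, text in index.items() if all(word in text for word in words)]


# Stand-in descriptions so the sketch runs without a model.
fake_descriptions = {
    "arm.mp4": "A robot arm animation picks up a cube.",
    "typing.mp4": "Hands typing quickly on a keyboard.",
}
index = build_index(list(fake_descriptions), fake_descriptions.get)
print(search(index, "robot arm"))  # prints ['arm.mp4']
```

Because the query only touches stored text, search stays fast no matter how slow the model is. False positives still come from whatever the model wrote in the description, so the UX should let users dismiss a bad match.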

The Q&A feature goes further. You pick a video and ask a specific question. "What database was used?" returned MySQL — and noted it was running in a Docker container. The model identified that from watching the screen, not from audio. No transcript needed.

From there, you can generate study materials from any recorded session. The demo produces a multiple-choice quiz with answer options, correct answers, and explanations. The model is doing comprehension, not transcription.

Demo 5: Roast My Life — What the Model Actually Sees


Source: reka-ai/api-examples-python

I never mentioned the pictures on my wall. The model did.

In a video about Python and AI, the model's generated blog post made a remark about the artwork hanging behind me. I had said nothing about it. The model noticed, mentioned it, and moved on as if it were obvious.

Then there was the t-shirt moment described at the top of this post. A partial logo, half out of frame, no mention of it anywhere in the audio — and the model connected it to the topic anyway.

This demo is named Roast My Life because the model ends up commenting on things you never intended to share. But the real point is what it reveals: a vision model is not a smarter transcript. It is watching. The larger models do this particularly well, and once you see it, it changes how you think about what these tools can do — and what they will pick up without you asking.

Demo 6: N8N Automation — No-Code Video Clipping Pipeline


Sources: N8N Reka Vision integration

Vision AI does not always need custom code. This demo wires everything together in N8N, a visual workflow tool, with no programming required.

The trigger is a new video published to YouTube. The workflow finds an engaging clip, reformats it from horizontal to vertical, adds captions in a specific style (all lowercase, specific colors — chosen to be obviously distinct from any default), and sends an email with the finished clip attached. The whole thing runs automatically.

For developers, this pattern is worth knowing even if you code everything else. Many real business workflows have a vision AI step that fits cleanly into a larger automation, and a no-code tool is often the fastest way to ship it.


Watch the Full Talk

The demos above are the written version. The live version, with the actual code running, models responding in real time, and a few things going sideways in interesting ways, is on YouTube.


All the Code

The demos span Python, C#, raw HTTP, Go, and N8N. Vision AI is not tied to a specific stack — if your environment can make an HTTP request, it can call a vision model.

All projects:


Reading Notes #696

This week's collection highlights the rapid evolution of AI agents, exploring their asynchronous capabilities, deployment journeys, and their impact on DevOps and video editing. On the programming front, we explore new Git features and API versioning with OpenAPI in .NET 10. We also dive into some fascinating podcast discussions ranging from the GUI vs. CLI debate to generational perspectives in the workplace. 
Enjoy the reading!

AI

Programming

Podcast


~frank

Reading Notes #695

A mix of thoughtful perspectives and practical updates this week. From evolving AI tools and model selection guidance to changes in developer workflows and tooling, there’s plenty to reflect on. Add in insights on streaming and a strong push toward more secure environments, and you get a well-rounded set of reads worth your time.


Suggestion of the week

AI

Programming

Miscellaneous

  • Livestreaming Before It Was Cool (Golnaz) - Curious to learn more about the streaming options, from the different platforms to the tools, and the pros and cons of each? This post is for you, and on top of that, you get the Microsoft story.
~frank


Reading Notes #694

A fast-moving mix this week: AI tooling, ARM readiness, Docker sandboxes, and real-world lessons from agents. Practical insights across .NET, DevOps, and local-first workflows.


Suggestion of the week

AI

Programming

DevOps

Podcasts

  • Our Favorite Agent Setups (Agentic DevOps) - Nice discussion that goes through many AI harnesses, agents, models, and what they are playing with right now. OpenClaw, OpenCode, Claude Code, Copilot, and all of it.

  • Michael Perry: AI-assisted Development - Episode 397 (AI DevOps Podcast) - Interesting discussion about AI-assisted Development (or can we say programming?) with a focus on skills and how they could be defined.


Reading Notes #693

I'm always on the lookout for innovative ways to enhance my coding experience, and this week's Reading Notes are filled with exciting discoveries! From cutting-edge UI libraries to secure sandbox environments for AI agents, I've curated a selection of articles that showcase the latest programming trends and technologies. 

Whether you're interested in harnessing the power of Docker sandboxes or exploring the potential of smart glasses integration, there's something on this list for everyone.


Programming

AI

Miscellaneous

~frank

How to Serve a Vision AI Model Locally with vLLM and Reka Edge

Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.

Prerequisites

You need a machine with a GPU running Linux, macOS, or Windows (with WSL). I use uv, a fast Python package and project manager, but pip + venv works too if you prefer.

Clone the vLLM Reka Plugin

Reka models require a dedicated plugin to run under vLLM. Not all models need this extra step, but Reka's architecture does. Clone the plugin repository and enter the directory:

git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka

The repository contains the plugin code and a serve.sh script you will use to start the service.

Download the Reka Edge Model

Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:

uv sync
uv pip install huggingface_hub
uvx hf download RekaAI/reka-edge-2603 --local-dir ./models/reka-edge-2603

This is a large model, so make sure you have enough disk space and a stable connection.

Start the Service

Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:

uv run bash serve.sh ./models/reka-edge-2603

The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.

As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:

export GPU_MEM=0.80
export MAX_LEN=4096
export MAX_BATCH_TOKENS=4096
export MAX_IMAGES=2
export MAX_VIDEOS=1
export VIDEO_NUM_FRAMES=4
uv run bash serve.sh ./models/reka-edge-2603
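Once the service is listening, any HTTP client can talk to it. The sketch below assumes serve.sh exposes vLLM's OpenAI-compatible API on port 8000 (vLLM's default) and that the served model name is reka-edge-2603; check the repository README if your setup differs. The `make_request` and `describe` helpers are names made up for this example.

```python
import json
import urllib.request

# Assumption: the service exposes an OpenAI-compatible API on port 8000.
BASE_URL = "http://localhost:8000/v1"


def make_request(image_url: str, prompt: str, model: str = "reka-edge-2603") -> urllib.request.Request:
    """Build (but do not send) a chat completion request with one image."""
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe(image_url: str, prompt: str = "Describe this image.") -> str:
    """Send the request to the running service and return the model's text."""
    with urllib.request.urlopen(make_request(image_url, prompt), timeout=120) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `describe("https://...")` with a reachable image URL should return the description as plain text, and the request will show up in the terminal where the service runs.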

Connect the Media Library App

With the backend running, it is time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:

git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d

Open http://localhost:8080 in your browser, then add a new connection with these settings:

  • Name: local (or any label you want)
  • IP address: your machine's local network IP (e.g. 192.168.x.x)
  • API key: leave blank or enter anything — no key is required for a local connection
  • Model: reka-edge-2603

Click Test to confirm the connection, then save it.


Try It: Image Description and Object Detection

Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.

Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (e.g., "face") and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.



Switch to the Reka Cloud API

If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.

The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.

What You Can Build

The service you just set up accepts any image or video via HTTP: point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.
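A folder-based batch run could be sketched like this. The `batch` and `to_data_uri` helpers are hypothetical, and `ask` is any function you supply that sends one image plus a prompt to the service and returns the reply. Sending images as base64 data URIs is an assumption that suits local files; confirm your endpoint accepts them.

```python
import base64
import mimetypes
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def to_data_uri(path: Path) -> str:
    """Encode a local image as a base64 data URI."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def batch(folder: str, prompt: str, ask) -> dict[str, str]:
    """Send every image in folder to ask(data_uri, prompt) and collect the replies."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip anything that is not a known image type.
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = ask(to_data_uri(path), prompt)
    return results
```

Changing the `prompt` argument is all it takes to go from descriptions to tags or bounding boxes; the loop itself does not change.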

References

Reading Notes #692

The tech landscape is constantly evolving, and keeping up with the latest developments can be overwhelming. From AI-powered tools like Ollama and OpenClaw, to new ways of programming with Aspire Docs and Azure CLI, it seems like there's always something new to explore. In this edition of Reading Notes, I'll share some of the interesting things that caught my eye recently, from AI advancements to developer tools and beyond.


Suggestion of the week

AI

Programming

Cloud

Miscellaneous


Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, podcasts and books that catch my interest during the week.

If you have interesting content, share it!

~frank