Showing posts with label demo. Show all posts

Apps That See: Bringing Vision AI to Your Projects

I was wearing a t-shirt with a partial Reka logo at the edge of the frame. I never said the word "Reka" in that segment. The model caught the logo, connected it to the topic I was discussing, and mentioned it unprompted in the output it generated.

That is not a transcript trick. The model was watching.

At the AI Agents Conference 2026, I gave a talk called "Apps That See" — six live demos showing how to build applications that understand images and video. Every project is open source and ready to clone. This post walks through each one so you have enough context to pick it up, run it, and adapt it to something useful in your own work.

Vision AI Is Accessible Now

Not long ago, working with visual AI meant GPU clusters, specialized teams, and weeks of training. Today a compressed 4B model like Qwen or Gemma 3 runs on a regular laptop and handles image description well enough to prototype. Step up to a 7B model like Reka Edge and the quality improves meaningfully. It also runs locally: a gaming PC with a decent GPU is enough. No server required.

For tasks that need more power, cloud APIs give you faster results without local hardware requirements. The tradeoff is that your images and video go to a third-party provider. For corridor cameras or stock photos that is usually acceptable. For private or sensitive content, local is the better default.

The practical pattern: start local to build and test, then decide whether the task actually requires cloud.

What You Can Build With This

  • Accessibility: Describe a scene in real time for visually impaired users, or identify objects on demand.
  • Content creation: Extract structure from a video and turn it into a blog post, caption set, or highlight reel.
  • Productivity: Search through thousands of videos for a specific object or topic, even when the title gives no indication of the content.
  • Automation: Trigger actions only when specific visual conditions are met, such as an unrecognized person entering a room.
  • Fun: Most developers' first contact with AI is building something for themselves, and that is a perfectly valid starting point.


Demo 1: Caption This — Generate a Prompt from Any Image


Source: fboucher/caption-this

If you work with image generation models, you end up with a lot of images to test and compare. Writing the text prompt that would reproduce a specific image is tedious. This tool does it for you: give it an image, get back a prompt you can use to regenerate something similar.

The demo uses an HTTP client extension in VS Code to call the API directly, with no SDK. Pass an image and ask for a plain-text prompt that would recreate it. One prompt detail that improved results noticeably: add the words "no markdown" to the instruction.

POST https://api.reka.ai/v1/chat
Content-Type: application/json

{
  "model": "reka-flash",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image_url", "image_url": { "url": "https://..." } },
      { "type": "text", "text": "Write a prompt in plain text, no markdown, that would generate the exact same image." }
    ]
  }]
}

One thing to know when testing this across different models: some accept an image URL directly, others require the image as a base64-encoded string. Same task, same prompt, different input contract. If you plan to swap models in your app, account for this difference from the start.
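One way to absorb that difference is to build the image entry in a single helper. The sketch below is a minimal illustration, not code from the demo: the `ACCEPTS_URL` table and the model names in it are hypothetical, and sending base64 as a `data:` URI inside `image_url` is a common convention on OpenAI-style endpoints that you should confirm against each provider's documentation.

```python
import base64
from typing import Optional

# Hypothetical capability table: fill it in from each provider's docs.
ACCEPTS_URL = {"model-a": True, "model-b": False}


def url_image_part(url: str) -> dict:
    """Image entry for models that fetch the image themselves."""
    return {"type": "image_url", "image_url": {"url": url}}


def inline_image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Image entry for models that want the bytes inline as base64."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def image_part(model: str, url: str, data: Optional[bytes] = None,
               mime_type: str = "image/png") -> dict:
    """Pick the input contract based on the target model."""
    if ACCEPTS_URL.get(model, False):
        return url_image_part(url)
    if data is None:
        raise ValueError(f"{model} needs the image bytes, not just a URL")
    return inline_image_part(data, mime_type)
```

The rest of the app calls `image_part(...)` and never needs to know which contract a given model expects.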

Demo 2: Media Library — Compare Vision Models Side by Side


Source: fboucher/media-library

This is a web app that connects to multiple vision backends and lets you switch between them at runtime. The motivation: benchmark Reka Edge running locally — via OpenRouter or directly through the Reka API — against other models on real tasks.

Object detection surfaces the biggest portability problem. Some models return bounding boxes in an HTML-style bracket format with pixel coordinates. Others use a 2D box structure with a different coordinate scheme. If you code against one format and then swap models, your rendering breaks. There is no standard here — handle the differences at the application layer, not the model layer.
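Handling this at the application layer can be as simple as converting every model's output into one internal box type before rendering. The sketch below assumes two made-up formats for illustration: a `<bbox>x1,y1,x2,y2</bbox>` tag in pixels, and a `box_2d` list ordered `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale. The real formats depend on the models you use, so treat both parsers as placeholders.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel-space rectangle, the only shape the renderer has to know."""
    x1: int
    y1: int
    x2: int
    y2: int


# Hypothetical format A: "<bbox>x1,y1,x2,y2</bbox>" with pixel coordinates.
_TAG = re.compile(
    r"<bbox>\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*</bbox>"
)


def from_tagged_text(text: str) -> list[Box]:
    """Pull every tagged box out of a model's text answer."""
    return [Box(*map(int, m.groups())) for m in _TAG.finditer(text)]


def from_box_2d(items: list[dict], width: int, height: int) -> list[Box]:
    """Hypothetical format B: [ymin, xmin, ymax, xmax] on a 0 to 1000 scale."""
    boxes = []
    for item in items:
        ymin, xmin, ymax, xmax = item["box_2d"]
        boxes.append(Box(
            round(xmin * width / 1000),
            round(ymin * height / 1000),
            round(xmax * width / 1000),
            round(ymax * height / 1000),
        ))
    return boxes
```

Adding a new model then means writing one more small parser, while the drawing code stays untouched.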

The app uses the OpenAI API format as the common interface across all backends. Any model with a compatible endpoint can be swapped in with minimal changes. It does not eliminate the per-model quirks, but it reduces the friction of switching to a configuration change rather than a rewrite.
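To show what "a configuration change rather than a rewrite" can look like, here is a rough sketch with the backends kept in a small table. The base URLs and model names are placeholders, and the `/chat/completions` path and `Bearer` header follow the usual OpenAI convention; a given provider may expect something different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    base_url: str
    model: str


# Placeholder values for illustration only.
BACKENDS = {
    "local": Backend("http://localhost:8000/v1", "my-local-model"),
    "hosted": Backend("https://example.com/v1", "my-hosted-model"),
}


def build_request(backend_name: str, prompt: str, image_url: str,
                  api_key: str) -> dict:
    """Return the URL, headers, and JSON body for one vision request."""
    backend = BACKENDS[backend_name]
    return {
        "url": f"{backend.base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": backend.model,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }],
        },
    }
```

The returned dictionary can be passed to an HTTP client, for example `requests.post(**build_request(...))`. Only the table entry changes between backends; the message payload stays identical.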

Video input is supported too, though far fewer models handle it than images. Of the models tested, Reka Edge is the standout for video — the others either reject it or behave inconsistently.

Demo 3: Video2Blog — Turn a Video into a Structured Post


Source: fboucher/video2blog

I built this for myself. I do a lot of tutorial videos and I wanted a tool that would turn a recording into a structured blog post without me having to write one from scratch.

The tool sends the video to a vision model with a detailed prompt: target structure, tone, format, and an instruction to flag moments where a screenshot would add value. The model returns timestamps — it cannot extract frames itself, but it tells you exactly where to look, and you pull them locally with ffmpeg.
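The frame-pulling step could look roughly like this. It is a sketch rather than the project's actual code: the function names and the output file naming are my own, and it assumes `ffmpeg` is on the PATH and that the model returns timestamps as `SS`, `MM:SS`, or `HH:MM:SS` strings.

```python
import subprocess
from pathlib import Path


def to_seconds(timestamp: str) -> float:
    """Convert "SS", "MM:SS", or "HH:MM:SS" into seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def frame_command(video: Path, timestamp: str, out_dir: Path) -> list[str]:
    """Build the ffmpeg call that saves a single frame at the timestamp."""
    out_file = out_dir / f"frame_{int(to_seconds(timestamp)):06d}.png"
    return [
        "ffmpeg", "-y",        # overwrite an existing screenshot
        "-ss", timestamp,      # seek to the moment the model flagged
        "-i", str(video),
        "-frames:v", "1",      # write exactly one video frame
        str(out_file),
    ]


def extract_frames(video: Path, timestamps: list[str], out_dir: Path) -> None:
    """Run ffmpeg once per timestamp returned by the model."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for ts in timestamps:
        subprocess.run(frame_command(video, ts, out_dir), check=True)
```

Calling `extract_frames(Path("talk.mp4"), ["00:01:30", "00:04:05"], Path("frames"))` would write one PNG per flagged moment.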

That creates one architectural quirk worth knowing: the video lives in two places. ffmpeg needs it locally to extract frames. The hosted model needs it uploaded to analyze content. For a one-evening project it works well enough, and I use it often enough that it has paid for itself many times over.

After the first draft, you stay in a conversation loop: change the tone, translate to French, swap a timestamp, restructure a section. The model holds context and iterates with you until the result is what you want.

Demo 4: Video Analyzer — Search and Query Your Video Library


Source: reka-ai/api-examples-dotnet

Most video search runs on titles, descriptions, and transcribed audio. This demo searches by what is actually visible on screen.

The app pre-indexes a video library by sending each video through a vision model ahead of time. When a query arrives, the heavy work is already done. A search for "robot arm" returns the right video — a clip of a robotic arm animation. It also returns a false positive: fast-moving hands apparently looked close enough to fool the model. Useful, not perfect, and worth designing around in your UX.
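The index-first pattern is easy to prototype. The toy below is not how the demo searches: it stands in for the idea with plain keyword matching over descriptions that a vision model would have produced ahead of time, and the sample descriptions are invented.

```python
def search(index: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Rank videos by how many query words appear in their description.

    index maps a video id to the description a vision model generated
    during the (slow) indexing pass; the query itself touches no model.
    """
    words = query.lower().split()
    hits = []
    for video_id, description in index.items():
        text = description.lower()
        score = sum(1 for word in words if word in text)
        if score:
            hits.append((video_id, score))
    # Highest score first; ties keep their original order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented descriptions standing in for model output.
INDEX = {
    "v1": "A robotic arm animation assembling parts",
    "v2": "Hands typing quickly on a keyboard",
    "v3": "A robot vacuum cleaning a floor",
}
```

Here `search(INDEX, "robot arm")` ranks `v1` first and still returns `v3` as a weaker match, which is the kind of near miss a results page should present as "possibly relevant" rather than as a certain hit.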

The Q&A feature goes further. You pick a video and ask a specific question. "What database was used?" returned MySQL — and noted it was running in a Docker container. The model identified that from watching the screen, not from audio. No transcript needed.

From there, you can generate study materials from any recorded session. The demo produces a multiple-choice quiz with answer options, correct answers, and explanations. The model is doing comprehension, not transcription.

Demo 5: Roast My Life — What the Model Actually Sees


Source: reka-ai/api-examples-python

I never mentioned the pictures on my wall. The model did.

In a video about Python and AI, the model's generated blog post made a remark about the artwork hanging behind me. I had said nothing about it. The model noticed, mentioned it, and moved on as if it were obvious.

Then there was the t-shirt moment described at the top of this post. A partial logo, half out of frame, no mention of it anywhere in the audio — and the model connected it to the topic anyway.

This demo is named Roast My Life because the model ends up commenting on things you never intended to share. But the real point is what it reveals: a vision model is not a smarter transcript. It is watching. The larger models do this particularly well, and once you see it, it changes how you think about what these tools can do — and what they will pick up without you asking.

Demo 6: N8N Automation — No-Code Video Clipping Pipeline


Sources: N8N Reka Vision integration

Vision AI does not always need custom code. This demo wires everything together in N8N, a visual workflow tool, with no programming required.

The trigger is a new video published to YouTube. The workflow finds an engaging clip, reformats it from horizontal to vertical, adds captions in a specific style (all lowercase, specific colors — chosen to be obviously distinct from any default), and sends an email with the finished clip attached. The whole thing runs automatically.

For developers, this pattern is worth knowing even if you code everything else. Many real business workflows have a vision AI step that fits cleanly into a larger automation, and a no-code tool is often the fastest way to ship it.


Watch the Full Talk

The demos above are the written version. The live version, with the actual code running, models responding in real time, and a few things going sideways in interesting ways, is on YouTube.


All the Code

The demos span Python, C#, raw HTTP, Go, and N8N. Vision AI is not tied to a specific stack — if your environment can make an HTTP request, it can call a vision model.

All projects:


Writing My First Custom n8n Node: A Step-by-Step Guide

Recently, I decided to create a custom node for n8n, the workflow automation tool I've been using. I'm not an expert in Node.js development, but I wanted to understand how n8n nodes work under the hood. This blog post shares my journey and the steps that actually worked for me.

French version here

Why I Did This

Before starting this project, I was curious about how n8n nodes are built. The best way to learn something is by doing it, so I decided to create a simple custom node following n8n's official tutorial. Now that I understand the basics, I'm planning to build a more complex node featuring AI Vision capabilities, but that's for another blog post!

The Challenge

I started with the official n8n tutorial: Build a declarative-style node. While the tutorial is well-written, I ran into some issues along the way. The steps didn't work exactly as described, so I had to figure out what was missing. This post documents what actually worked for me, in case you're facing similar challenges. I already have an n8n instance running in a container. In Step 8, I'll explain how I run a second instance for development purposes.

Prerequisites

Before you start, you'll need:

  • Node.js and npm - I used Node.js version 24.12.0
  • Basic understanding of JavaScript/TypeScript - you don't need to be an expert

Step 1: Fixing the Missing Prerequisites

I didn't have Node.js installed on my machine, so my first step was getting that sorted out. Instead of installing Node.js directly, I used nvm (Node Version Manager), which makes it easy to manage different Node.js versions. Installation details are available on the nvm GitHub repository. Once nvm was set up, I installed Node.js version 24.12.0.

Most of the time, I use VS Code as my code editor. I created a new profile and used the template for Node.js development to get the right extensions and settings.

Step 2: Cloning the Starter Repository

n8n provides a n8n-nodes-starter on GitHub that includes all the basic files and dependencies you need. You can clone it or use it as a template for your own project. Since this was just a "learning exercise" for me, I cloned the repository directly:

git clone https://github.com/n8n-io/n8n-nodes-starter
cd n8n-nodes-starter

Step 3: Getting Started with the Tutorial

I won't repeat the entire tutorial here; it's clear enough, but I'll highlight some details along the way that I found useful.

The tutorial makes you create a "NasaPics" node and provides a logo for it. It's great, but I suggest you use your own logo images and have light and dark versions. Add both images in a new folder icons (same level as nodes and the credentials folder). Having two versions of the logo will make your node look better, whatever theme the user is using in n8n (light or dark). The tutorial only adds the logo in NasaPics.node.ts, but I found that adding it also in the credentials file NasaPicsApi.credentials.ts makes the node look more consistent.

Replace or add the logo line with this, and add Icon to the import statement at the top of the file:

icon: Icon = { light: 'file:MyLogo-dark.svg', dark: 'file:MyLogo-light.svg' };

Note: the darker logo should be used in light mode, and vice versa.

Step 4: Following the Tutorial (With Adjustments)

Here's where things got interesting. I followed the official tutorial to create the node files, but I had to make some adjustments that weren't mentioned in the documentation.

Adjustment 1: Making the Node Usable as a Tool

In the NasaPics.node.ts file, I added this line just before the properties array:

requestDefaults: {
      baseURL: 'https://api.nasa.gov',
      headers: {
         Accept: 'application/json',
         'Content-Type': 'application/json',
      },
   },
   usableAsTool: true, // <-- Added this line
   properties: [
      // Resources and operations will go here

This setting allows the node to be used as a tool within n8n workflows and also fixes warnings from the lint tool.

Adjustment 2: Securing the API Key Field

In the NasaPicsApi.credentials.ts file, I added a typeOptions to make the API key field a password field. This ensures the API key is hidden when users enter it, which is a security best practice.

properties: INodeProperties[] = [
   {
      displayName: 'API Key',
      name: 'apiKey',
      type: 'string',
      typeOptions: { password: true }, // <-- Added this line
      default: '',
   },
];

A Note on Errors

I noticed there were some other errors showing up in the credentials file. If you read the error message, you'll see that it's complaining about missing test properties. To fix this, I added a test property at the end of the class that implements ICredentialTestRequest. I also added the interface import at the top of the file.

authenticate: IAuthenticateGeneric = {
   type: 'generic',
   properties: {
      qs: {
         api_key: '={{$credentials.apiKey}}',
      },
   },
};

// Add this at the end of the class
test: ICredentialTestRequest = {
   request: {
      baseURL: 'https://api.nasa.gov/',
      url: '/user',
      method: 'GET',
   },
};

Step 5: Building and Linking the Package

Once I had all my files ready, it was time to build the node. From the root of my node project folder, I ran:

npm i
npm run build
npm link

During the build process, pay attention to the package name that gets generated. In my case, it was n8n-nodes-nasapics. You'll need this name in the next steps.

> n8n-nodes-nasapics@0.1.0 build
> n8n-node build

┌   n8n-node build 
│
◓  Building TypeScript files
│
◇  TypeScript build successful
│
◇  Copied static files
│
└  ✓ Build successful

Step 6: Setting Up the n8n Custom Folder

n8n looks for custom nodes in a specific location: ~/.n8n/custom/. If this folder doesn't exist, you need to create it:

mkdir -p ~/.n8n/custom
cd ~/.n8n/custom

Then initialize a new npm package in this folder: run npm init and press Enter to accept all the defaults.

Step 7: Linking Your Node to n8n

Now comes the magic part - linking your custom node so n8n can find it. Replace n8n-nodes-nasapics with whatever your package name is. From the ~/.n8n/custom folder, run:

npm link n8n-nodes-nasapics

Step 8: Running n8n

This is where my setup differs from the standard tutorial. As mentioned at the beginning, I already have an instance of n8n running in a container and didn't want to install it. So I decided to run a second container using a different port. Here's the command I used:

docker run -d --name n8n-DEV -p 5680:5678 \
  -e N8N_COMMUNITY_PACKAGES_ENABLED=true \
  -v ~/.n8n/custom/node_modules/n8n-nodes-nasapics:/home/node/.n8n/custom/node_modules/n8n-nodes-nasapics \
  n8nio/n8n

Let me break down what this command does:

  • -d: Runs the container in detached mode (in the background)
  • --name n8n-DEV: Names the container for easy reference
  • -p 5680:5678: Maps port 5678 from the container to port 5680 on my machine so it doesn't conflict with my existing n8n instance
  • -e N8N_COMMUNITY_PACKAGES_ENABLED=true: Enables community packages — you need this to use custom nodes
  • -v: Mounts my custom node folder into the container, which lets me try my custom node without having to publish it.
  • n8nio/n8n: The official n8n container image

If you're running n8n directly on your machine (not in a container), you can simply start it.

Step 9: Testing Your Node

Once n8n-DEV is running, open your browser and navigate to it. Create a new workflow and search for your node. In my case, I searched for "NasaPics" and my custom node appeared!

To test it:

  1. Add your node to the workflow
  2. Configure the credentials with a NASA API key (you can get one for free at api.nasa.gov)
  3. Execute the node
  4. Check if the data is retrieved correctly

Updating Your Node

During development, you'll likely need to make changes to your node's code. Once done, rebuild it with npm run build and restart the n8n container with docker restart n8n-DEV to see the changes.
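If you repeat that cycle often, a tiny helper can run both steps in order. This is only a convenience sketch: it assumes the container is named n8n-DEV as in Step 8, that `npm` and `docker` are on the PATH, and the function names are my own.

```python
import subprocess
from pathlib import Path


def refresh_commands(container: str = "n8n-DEV") -> list[list[str]]:
    """The two commands from this section, in the order they must run."""
    return [
        ["npm", "run", "build"],           # rebuild the node package
        ["docker", "restart", container],  # make n8n reload the custom node
    ]


def refresh(project_dir: Path, container: str = "n8n-DEV") -> None:
    """Run the rebuild and restart, stopping at the first failure."""
    for command in refresh_commands(container):
        subprocess.run(command, cwd=project_dir, check=True)
```

Calling `refresh(Path("n8n-nodes-starter"))` from a script keeps the restart from happening when the build fails, because `check=True` raises on a non-zero exit code.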

What's Next?

Now that I understand the basics of building custom n8n nodes, I'm ready to tackle something more ambitious. My next project will be creating a node that uses AI Vision capabilities. Spoiler alert: It's done and I'll be sharing the details in an upcoming blog post!

If you're interested in creating your own custom nodes, I encourage you to give it a try. Start with something simple, like I did, and build from there. Don't be afraid to experiment and make mistakes - that's how we learn!

Resources

From Hours to Minutes: AI That Finds Tech Events for You

TL;DR

I built an AI research agent that actually browses the live web and finds tech events, no search loops, no retry logic, no hallucinations. Just ask a question and get structured JSON back with the reasoning steps included. The secret? An API that handles multi-step research automatically. Built with .NET/Blazor in a weekend. Watch the video | Get the code | Free API key
(French version)

Happy New Year! I wanted to share something I recently presented at the AI Agents Conference 2025: how to build intelligent research assistants that can search the live web and return structured, reliable results.

Coming back from the holidays, I'm reminded of a universal problem: information overload. Whether it's finding relevant tech conferences, catching up on industry news, or wading through piles of documentation that accumulated during time off, we all need tools that can quickly search and synthesize information for us. That's what Reka Research does: it's an agentic AI that browses the web (or your private documents), answers complex questions, and turns hours of research into minutes. I built a practical demo to show this in action: an Event Finder that searches the live internet for upcoming tech conferences.

The full presentation is available on YouTube if you want to follow along: How to Build Agentic Web Research Assistants

The Problem: Finding Events Isn't Just a Simple Search

Let me paint a picture. You want to find upcoming tech conferences about AI in your area. You need specific information: the event name, start and end dates, location, and most importantly, the registration URL.

A simple web search or basic LLM query falls short because:

  • You might get outdated information
  • The first search result rarely contains all required details
  • You need to cross-reference multiple sources
  • Without structure, the data is hard to use in an application

This is where Reka's Research API shines. It doesn't just search, it reasons through multiple steps, aggregates information, and returns structured, grounded results.

Event finder interface

The Solution: Multi-Step Research That Actually Works

The core innovation here is multi-step grounding. Instead of making a single query and hoping for the best, the Research API acts like a diligent human researcher:

  1. It makes an initial search based on your query
  2. Checks what information is missing
  3. Performs additional targeted searches
  4. Aggregates and validates the data
  5. Returns a complete, structured response

As a developer, you simply send your question, and the API handles the complex iteration. No need to build your own search loops or retry logic.
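To make the five steps concrete, here is a toy version of that loop. It only illustrates the idea and says nothing about how the Research API is implemented: `REQUIRED_FIELDS`, `research`, and the `fake_search` stub are all invented for this sketch.

```python
from typing import Callable

REQUIRED_FIELDS = ("name", "start_date", "city", "url")


def research(query: str, search: Callable[[str], dict],
             max_steps: int = 5) -> tuple[dict, list[str]]:
    """Search, check what is missing, search again, then return the result.

    Returns the aggregated fields plus the list of queries that were run,
    which plays the role of the reasoning steps.
    """
    result: dict = {}
    steps: list[str] = []
    next_query = query
    for _ in range(max_steps):
        steps.append(next_query)
        for key, value in search(next_query).items():
            result.setdefault(key, value)  # keep the first value found
        missing = [f for f in REQUIRED_FIELDS if f not in result]
        if not missing:
            break
        # Target the next search at the first missing field.
        next_query = f"{query} {missing[0]}"
    return result, steps


def fake_search(query: str) -> dict:
    """Stub standing in for a real web search."""
    if query.endswith("url"):
        return {"url": "https://example.com/register"}
    return {"name": "Example Conf", "start_date": "2026-05-01",
            "city": "Montreal"}
```

With the stub, `research("AI conference Montreal", fake_search)` needs two searches: the first fills everything except the URL, and the second targets it. This is the bookkeeping you would otherwise write and maintain yourself.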

How It Works: The Developer Experience

Here's what surprised me most: the simplicity. You define your data structure, ask a question, and the API handles all the complex research orchestration. No retry logic, no search loop management.

The key is structured output. Instead of parsing messy text, you tell the API exactly what JSON schema you want:

public class TechEvent
{
    public string? Name { get; set; }
    public DateTime? StartDate { get; set; }
    public DateTime? EndDate { get; set; }
    public string? City { get; set; }
    public string? Country { get; set; }
    public string? Url { get; set; }
}

Then you send your query with the schema, and the API returns data that matches that structure. It uses an OpenAI-compatible format, so if you've worked with ChatGPT's API, this feels instantly familiar.
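For readers outside .NET, the same idea can be sketched in Python. Everything here is illustrative: the schema, the `events` wrapper key, and the `response_format` shape follow the OpenAI structured-output convention, and the exact fields the Research API expects are not shown in this post, so check the official documentation before relying on them.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

_EVENT_FIELDS = ("name", "start_date", "end_date", "city", "country", "url")

# JSON Schema mirroring the TechEvent class above (snake_case is my choice).
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "events": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {f: {"type": "string"} for f in _EVENT_FIELDS},
            },
        },
    },
    "required": ["events"],
}

# OpenAI-style wrapper; an assumption, not a confirmed Reka request shape.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "tech_events", "schema": EVENT_SCHEMA},
}


@dataclass
class TechEvent:
    name: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    city: Optional[str] = None
    country: Optional[str] = None
    url: Optional[str] = None


def parse_events(raw_json: str) -> list[TechEvent]:
    """Turn the JSON text returned by the model into typed objects."""
    def to_date(value: Optional[str]) -> Optional[date]:
        return date.fromisoformat(value) if value else None

    return [
        TechEvent(
            name=item.get("name"),
            start_date=to_date(item.get("start_date")),
            end_date=to_date(item.get("end_date")),
            city=item.get("city"),
            country=item.get("country"),
            url=item.get("url"),
        )
        for item in json.loads(raw_json).get("events", [])
    ]
```

Because every field is optional, a partially filled event still parses, which matches the nullable properties in the C# class.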

The real magic? You also get back the reasoning steps, the actual web searches it performed and how it arrived at the answer. Perfect for debugging and understanding the agent's thought process.

I walk through the complete implementation, including domain filtering, location-aware search, and handling the async research calls in the video. The full source code is on GitHub if you want to dive deeper.


Try It Yourself

The complete source code is on GitHub. Clone it, grab a free API key, and you'll have it running in under 5 minutes.

I'm curious what you'll build with this. Research agents that monitor news? Product comparison tools? Documentation synthesizers? The API works for any web research task. If you build something, tag me.  I'd love to see it.

Happy New Year! 🎉

AI Vision: Turning Your Videos into Comedy Gold (or Cringe)

I've spent most of my career building software in C# and .NET, and only used Python in IoT projects. When I wanted to build a fun project, an app that uses AI to roast videos, I knew it was the perfect opportunity to finally dig into Python web development.

The question was: where do I start? I hopped into a brainstorming session with Reka's AI chat and asked about options for building web apps in Python. It mentioned Flask, and I remembered friends talking about it being lightweight and perfect for getting started. That sounded right.

In this post, I share how I built "Roast My Life," a Flask app using the Reka Vision API.

The Vision (Pun Intended)

The app needed three core things:

  1. List videos: Show me what videos are in my collection
  2. Upload videos: Let me add new ones via URL
  3. Roast a video: Send a selected video to an AI and get back some hilarious commentary

See it in action

Part 1: Getting Started and Environment Setup

The first hurdle was always going to be environment setup. I'm serious about keeping my Python projects isolated, so I did the standard virtual environment dance.

Before even touching dependencies, I scaffolded a super bare-bones Flask app. One thing I enjoy in C# is that all dependencies are brought in one shot, so I like doing the same with my Python projects by using requirements.txt instead of installing things ad hoc (pip install flask, then freezing later).

Dropping that file in first means the setup snippet below is deterministic. When you run pip install -r requirements.txt, pip installs the exact versions I tested with, and you won't accidentally grab a breaking major update.
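That guarantee only holds when every line in the file is pinned with `==`. A quick way to check is a small script like the one below. It is a rough sketch of my own, not part of the project, and the version number in the usage note is only an example, not the one the repository uses.

```python
import re

# A pinned requirement looks like "name==1.2.3" or "name[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^\s;]+")


def unpinned(requirements_text: str) -> list[str]:
    """Return the requirement lines that do not pin an exact version."""
    loose = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            loose.append(line)
    return loose
```

For example, `unpinned("flask==3.0.3\nrequests>=2.0\n")` returns `["requests>=2.0"]`, flagging the one line that could drift between installs.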

Here's the shell dance that activates the virtual environment and installs everything:

cd roast_my_life/workshop
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then came the configuration. We will need an API key, and I don't want to have it hardcoded, so I created a .env file to store my API credentials:

API_KEY=YOUR_REKA_API_KEY
BASE_URL=https://vision-agent.api.reka.ai

To get that API key, I visited the Reka Platform and grabbed a free one. Seriously, a free key for playing with AI vision APIs? I was in.

With python app.py, I fired up the Flask development server and opened http://127.0.0.1:5000 in my browser. The UI was there, but... it was dead. Nothing worked.

Perfect. Time to build.

The Backend: Flask Routing and API Integration

Coming from ASP.NET Core's controller-based routing and Blazor, Flask's decorator-based approach felt just like home. All the code goes in the app.py file, and each route is defined with a simple decorator. But first things first: loading configuration from the .env file using python-dotenv:

import os
from typing import Any, Dict  # type hints used by call_reka_vision_qa

import requests
from dotenv import load_dotenv
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load environment variables (like appsettings.json)
load_dotenv()
api_key = os.environ.get('API_KEY')
base_url = os.environ.get('BASE_URL')

The third-party packages imported here (flask, requests, and python-dotenv) are the same ones that need to be in requirements.txt. We then retrieve the API key and base URL from environment variables, just like in .NET Core.
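One optional refinement, which is my suggestion rather than something the project does, is to fail at startup when a setting is missing instead of discovering it on the first request. The helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return a required setting or stop with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing required setting {name}; add it to your .env file"
        )
    return value
```

You would call it right after `load_dotenv()`, for example `api_key = require_env("API_KEY")`, so a forgotten .env file shows up as one readable error at launch.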

Now, to get roasted, we first need to upload a video to the Reka Vision API. Here's the code; I'll go over some details after.


@app.route('/api/upload_video', methods=['POST'])
def upload_video():
    """Upload a video to Reka Vision API"""
    data = request.get_json() or {}
    video_name = data.get('video_name', '').strip()
    video_url = data.get('video_url', '').strip()
    
    if not video_name or not video_url:
        return jsonify({"error": "Both video_name and video_url are required"}), 400
    
    if not api_key:
        return jsonify({"error": "API key not configured"}), 500
    
    try:
        response = requests.post(
            f"{base_url.rstrip('/')}/videos/upload",
            headers={"X-Api-Key": api_key},
            data={
                'video_name': video_name,
                'index': 'true',  # Required: tells Reka to process the video
                'video_url': video_url
            },
            timeout=30
        )
        
        # Parse the body on errors too, so the API's own error message is kept
        try:
            response_data = response.json()
        except ValueError:
            response_data = {}
        
        if response.ok:
            video_id = response_data.get('video_id', 'unknown')
            return jsonify({
                "success": True,
                "video_id": video_id,
                "message": "Video uploaded successfully"
            })
        else:
            error_msg = response_data.get('error', f"HTTP {response.status_code}")
            return jsonify({"success": False, "error": error_msg}), response.status_code
            
    except requests.Timeout:
        return jsonify({"success": False, "error": "Request timed out"}), 504
    except Exception as e:
        return jsonify({"success": False, "error": f"Upload failed: {str(e)}"}), 500

Once the information from the frontend is validated we make a POST request to the Reka Vision API's /videos/upload endpoint. The parameters are sent as form data, and we include the API key in the headers for authentication. Here I was using URLs to upload videos, but you can also upload local files by adjusting the request accordingly. As you can see, it's pretty straightforward, and the documentation from Reka made it easy to understand what was needed.

The Magic: Sending Roast Requests to Reka Vision API

Here's where things get interesting. Once a video is uploaded, we can ask the AI to analyze it and generate content. The Reka Vision API supports conversational queries about video content:

def call_reka_vision_qa(video_id: str) -> Dict[str, Any]:
    """Call the Reka Video QA API to generate a roast"""
    
    headers = {'X-Api-Key': api_key} if api_key else {}
    
    payload = {
        "video_id": video_id,
        "messages": [
            {
                "role": "user",
                "content": "Write a funny and gentle roast about the person, or the voice in this video. Reply in markdown format."
            }
        ]
    }
    
    try:
        resp = requests.post(
            f"{base_url}/qa/chat",
            headers=headers,
            json=payload,
            timeout=30
        )
        
        data = resp.json() if resp.ok else {"error": f"HTTP {resp.status_code}"}
        
        if not resp.ok and 'error' not in data:
            data['error'] = f"HTTP {resp.status_code} calling chat endpoint"
        
        return data
        
    except requests.Timeout:
        return {"error": "Request to chat API timed out"}
    except Exception as e:
        return {"error": f"Chat API call failed: {e}"}

Here we pass the video ID and a prompt asking for a "funny and gentle roast." The API responds with AI-generated content, which we can then send back to the frontend for display. I try to give more "freedom" to the AI by asking it to reply in markdown format, which makes the output more engaging.

Try It Yourself!

The complete project is available on GitHub: reka-ai/api-examples-python

What Makes the Reka Vision API So Nice to Use

What really stood out to me was how approachable the Reka Vision API is. You don't need any special SDK—just the requests library making standard HTTP calls. And honestly, it doesn't matter what language you're used to; an HTTP call is pretty much always simple to do. Whether you're coming from .NET, Python, JavaScript, or anything else, you're just sending JSON and getting JSON back.

Authentication is refreshingly straightforward: just pop your API key in the header and you're good to go. No complex SDKs, no multi-step authentication flows, no wrestling with binary data streams. The conversational interface lets you ask questions in natural language, and you get back structured JSON responses with clear fields.

One thing worth noting: in this example, the videos are pre-uploaded and indexed, which means the responses come back fast. But here's the impressive part—the AI actually looks at the video content. It's not just reading a transcript or metadata; it's genuinely analyzing the visual elements. That's what makes the roasts so spot-on and contextual.

Final Thoughts

The Reka Vision API itself deserves credit for making video AI accessible. No complicated SDKs, no multi-GB model downloads, no GPU requirements. Just simple HTTP requests and powerful AI capabilities. I'm not saying I'm switching to Python full-time, but expect to see me sharing more Python projects in the future!

References and Resources


Reading Notes #248

Programming


Databases


Miscellaneous