
How to Serve a Vision AI Model Locally with vLLM and Reka Edge

Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.

Prerequisites

You need a machine with a GPU and either Linux, macOS, or Windows (with WSL). I use uv, a fast Python package and project manager, but pip + venv works too if you prefer.

Clone the vLLM Reka Plugin

Reka models require a dedicated plugin to run under vLLM. Not all models need this extra step, but Reka's architecture does. Clone the plugin repository and enter the directory:

git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka

The repository contains the plugin code and a serve.sh script you will use to start the service.

Download the Reka Edge Model

Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:

uv sync
uv pip install huggingface_hub
uvx hf download RekaAI/reka-edge-2603 --local-dir ./models/reka-edge-2603

This is a large model, so make sure you have enough disk space and a stable connection.

Start the Service

Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:

uv run bash serve.sh ./models/reka-edge-2603

The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.

As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:

export GPU_MEM=0.80
export MAX_LEN=4096
export MAX_BATCH_TOKENS=4096
export MAX_IMAGES=2
export MAX_VIDEOS=1
export VIDEO_NUM_FRAMES=4
uv run bash serve.sh ./models/reka-edge-2603
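
Once the service is up, you can sanity-check it with a quick request. This assumes serve.sh exposes vLLM's standard OpenAI-compatible API on its default port 8000; adjust the port if the script configures a different one:

# Lists the model(s) the service is currently serving
curl -s http://localhost:8000/v1/models | jq .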

Connect the Media Library App

With the backend running, it's time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:

git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d
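
The first build can take a few minutes. If the page doesn't come up, the container logs usually explain why:

docker compose logs -f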

Open http://localhost:8080 in your browser, then add a new connection with these settings:

  • Name: local (or any label you want)
  • IP address: your machine's local network IP (e.g. 192.168.x.x)
  • API key: leave blank or enter anything — no key is required for a local connection
  • Model: reka-edge-2603

Click Test to confirm the connection, then save it.


Try It: Image Description and Object Detection

Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.

Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (ex: "face") and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.



Switch to the Reka Cloud API

If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.

The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.

What You Can Build

The service you just set up accepts any image or video via HTTP — point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.
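
Here is a rough sketch of that batch idea, assuming the OpenAI-compatible endpoint from earlier on port 8000 and a folder of JPEGs; the model name should match the path you passed to serve.sh:

#!/usr/bin/env bash
# Describe every JPEG in ./photos and print one caption per file.
for img in ./photos/*.jpg; do
  b64=$(base64 -w0 "$img")   # -w0: no line wrapping (plain `base64` on macOS)
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{
      \"model\": \"./models/reka-edge-2603\",
      \"messages\": [{
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$b64\"}},
          {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"}
        ]
      }]
    }" | jq -r '.choices[0].message.content'
done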


Adding Keycloak Authentication to an Existing .NET Aspire Application

By the end of this post, you'll have a working login/logout flow backed by Keycloak, running locally via Aspire and deployable via Docker Compose.

If your Aspire app doesn't have authentication yet, this is your fastest path to a real identity provider. This tutorial walks through wiring Keycloak OIDC into an existing .NET Aspire + Blazor Server app: from AppHost registration to login/logout UI, using production code from NoteBookmark, an open-source bookmark manager built with .NET Aspire and Blazor.

[French version available]

Step 1: Add Aspire.Hosting.Keycloak to AppHost

Aspire provides first-class Keycloak support through the Aspire.Hosting.Keycloak package. Add it to your AppHost project:

For AppHost project

dotnet add package Aspire.Hosting.Keycloak

Run dotnet restore to pull the package.


Step 2: Register Keycloak in AppHost.cs

With the package installed, register Keycloak as a resource in your AppHost. Aspire will spin up a Keycloak container, wire its connection details into dependent projects, and ensure proper startup ordering.


// ...

// Add Keycloak authentication server
var keycloak = builder.AddKeycloak("keycloak", port: 8080)
    .WithDataVolume(); // Persist Keycloak data across container restarts

if (builder.Environment.IsDevelopment())
{
    // ...

    builder.AddProject<NoteBookmark_BlazorApp>("blazor-app")
        // ...
        .WithReference(keycloak)  // <-- reference Keycloak
        .WaitFor(keycloak)  // <-- wait for Keycloak to be ready
        .WithExternalHttpEndpoints()
        .PublishAsDockerComposeService((resource, service) =>
        {
            service.ContainerName = "notebookmark-blazor";
        });
}

Key Changes:

  • AddKeycloak("keycloak", port: 8080): Registers a Keycloak resource listening on port 8080.
  • WithDataVolume(): Persists Keycloak's configuration and realm data across container restarts. Without this, you'd lose your realm setup every time the container stops.
  • .WithReference(keycloak): Injects Keycloak connection settings (base URL, etc.) into the BlazorApp as environment variables.
  • .WaitFor(keycloak): Ensures Keycloak is fully started before launching the Blazor app. This is critical: if your app starts before Keycloak is ready, OIDC discovery will fail.

 

Step 3: Set Up Keycloak for Non-Aspire (aka prod) Deployments

This post focuses on the Aspire dev setup, but for production (Docker Compose, Kubernetes), you need a standalone Keycloak. Here's what that looks like and why.

Aspire can actually help you bridge the gap. AddDockerComposeEnvironment() in AppHost generates a draft Docker Compose file from your Aspire model; it's a great starting point before customizing for production, and worth checking out if you want a head start.

The final compose files for both Keycloak and the NoteBookmark app are available in the NoteBookmark repo.

A few things worth noting about the setup:

  • Postgres as the backing store: Keycloak uses a dedicated Postgres instance (not the app's database) to persist realm configuration, users, and sessions.
  • KC_HTTP_ENABLED: "true": Allows HTTP traffic internally. In production, Keycloak runs behind a reverse proxy (nginx, Traefik) that handles TLS termination: HTTPS externally, HTTP internally.
  • KC_FEATURES: "token-exchange": Enables the token exchange feature, needed if you want service-to-service auth flows.

Step 4: Configure Keycloak Realm and OIDC Client

This configuration is required in both production and development environments, but only needs to be done once per environment. In dev, thanks to .WithDataVolume(), all Keycloak settings are persisted between run and debug sessions, so you only configure it once and it survives restarts.

Once Keycloak is running, configure it:

  1. Navigate to http://localhost:8080 and log in with your admin credentials.
  2. Create a new realm:
    • Click Create Realm
    • Name: notebookmark (match the realm in your Authority URL below)
  3. Create an OIDC client:
    • Clients → Create Client
    • Client ID: notebookmark
    • Client Protocol: openid-connect
    • Access Type: confidential (generates a client secret)
    • Valid Redirect URIs: http://localhost:5173/* (adjust for your Blazor app's URL)
    • Web Origins: http://localhost:5173
  4. Go to the Credentials tab and copy the Client Secret: you'll need this in your app config.
Keycloak client configuration screen
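
Before wiring up the app, you can confirm the realm is reachable through Keycloak's standard OIDC discovery endpoint (adjust host and realm name to your setup):

# Should return the realm's OIDC metadata: issuer, token endpoint, etc.
curl -s http://localhost:8080/realms/notebookmark/.well-known/openid-configuration | jq '.issuer, .token_endpoint'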


Step 5: Add OpenID Connect to the Blazor App

Now wire up the authentication pipeline in your Blazor Server app.

Add the NuGet Package


For BlazorApp project

dotnet add package Microsoft.AspNetCore.Authentication.OpenIdConnect

Update Program.cs

BlazorApp/Program.cs:

using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authentication.OpenIdConnect;

//...

// Add authentication
builder.Services.AddAuthentication(options =>
{
    options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
})
.AddCookie(CookieAuthenticationDefaults.AuthenticationScheme)
.AddOpenIdConnect(OpenIdConnectDefaults.AuthenticationScheme, options =>
{
    var authority = builder.Configuration["Keycloak:Authority"];
    options.Authority = authority;
    options.ClientId = builder.Configuration["Keycloak:ClientId"];
    options.ClientSecret = builder.Configuration["Keycloak:ClientSecret"];
    options.ResponseType = "code";
    options.SaveTokens = true;
    options.GetClaimsFromUserInfoEndpoint = true;

    // Allow overriding RequireHttpsMetadata via configuration.
    // Relax the requirement when running in a container against HTTP Keycloak.
    var requireHttpsConfigured = builder.Configuration.GetValue<bool?>("Keycloak:RequireHttpsMetadata");
    var isRunningInContainer = string.Equals(
        System.Environment.GetEnvironmentVariable("DOTNET_RUNNING_IN_CONTAINER"),
        "true",
        StringComparison.OrdinalIgnoreCase);

    if (requireHttpsConfigured.HasValue)
    {
        options.RequireHttpsMetadata = requireHttpsConfigured.Value;
    }
    else
    {
        var defaultRequireHttps = !builder.Environment.IsDevelopment();
        if (isRunningInContainer &&
            !string.IsNullOrEmpty(authority) &&
            authority.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
        {
            defaultRequireHttps = false;
        }
        options.RequireHttpsMetadata = defaultRequireHttps;
    }

    options.Scope.Clear();
    options.Scope.Add("openid");
    options.Scope.Add("profile");
    options.Scope.Add("email");

    options.TokenValidationParameters = new()
    {
        NameClaimType = "preferred_username",
        RoleClaimType = "roles"
    };

    // Configure logout to pass id_token_hint to Keycloak
    options.Events = new OpenIdConnectEvents
    {
        OnRedirectToIdentityProviderForSignOut = async context =>
        {
            var idToken = await context.HttpContext.GetTokenAsync("id_token");
            if (!string.IsNullOrEmpty(idToken))
            {
                context.ProtocolMessage.IdTokenHint = idToken;
            }
        }
    };
});

builder.Services.AddAuthorization();
builder.Services.AddCascadingAuthenticationState();
builder.Services.AddHttpContextAccessor();

// ... existing Razor Components, FluentUI, etc. ...

var app = builder.Build();
app.MapDefaultEndpoints();

// ... existing middleware ...

// CRITICAL: UseAuthentication BEFORE UseAuthorization
app.UseAuthentication();
app.UseAuthorization();

app.MapRazorComponents<App>()
    .AddInteractiveServerRenderMode();

// Authentication endpoints
app.MapGet("/authentication/login", async (HttpContext context, string? returnUrl) =>
{
    var authProperties = new AuthenticationProperties { RedirectUri = returnUrl ?? "/" };
    await context.ChallengeAsync(OpenIdConnectDefaults.AuthenticationScheme, authProperties);
});

app.MapGet("/authentication/logout", async (HttpContext context) =>
{
    var authProperties = new AuthenticationProperties { RedirectUri = "/" };
    await context.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme);
    await context.SignOutAsync(OpenIdConnectDefaults.AuthenticationScheme, authProperties);
});

app.Run();


Configuration


Create or update appsettings.json in the BlazorApp project:

{
  "Keycloak": {
    "Authority": "http://localhost:8080/realms/notebookmark",
    "ClientId": "notebookmark",
    "ClientSecret": "your-client-secret-from-keycloak",
    "RequireHttpsMetadata": false
  }
}

For production Docker Compose deployments, use environment variables in your docker-compose.yaml:

environment:
  Keycloak__Authority: ${KEYCLOAK_AUTHORITY}
  Keycloak__ClientId: ${KEYCLOAK_CLIENT_ID}
  Keycloak__ClientSecret: ${KEYCLOAK_CLIENT_SECRET}
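
To confirm those values actually reach the app, you can dump the running container's environment (the container name comes from the PublishAsDockerComposeService call in Step 2):

# Should print Keycloak__Authority, Keycloak__ClientId, and Keycloak__ClientSecret
docker exec notebookmark-blazor printenv | grep '^Keycloak__'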
In the development environment, Aspire's .WithReference(keycloak) automatically injects environment variables like services__keycloak__http__0 for the Keycloak base URL. You can read this in your config or manually set the Authority URL as shown above.

 

Handling HTTP vs HTTPS: The RequireHttpsMetadata Gotcha

By default, the OpenID Connect middleware requires HTTPS for metadata discovery (RequireHttpsMetadata = true). This is a security best practice for production, but it causes problems in local/container dev environments where Keycloak runs on HTTP.

The code above implements a smart fallback:

  1. Check explicit configuration first: If Keycloak:RequireHttpsMetadata is set in config, use that value.
  2. Detect container environment: If running in a container (DOTNET_RUNNING_IN_CONTAINER=true) and the Authority URL is HTTP, disable the HTTPS requirement.
  3. Default to HTTPS in production: Outside of Development mode, default to requiring HTTPS.

This ensures:

  • Local dev works seamlessly with HTTP Keycloak
  • Container-to-container communication works (HTTP internally)
  • Production enforces HTTPS (assuming you've configured it properly)

Note: In production, run Keycloak behind a reverse proxy (nginx, Traefik, etc.) that handles TLS termination. Your app sees https://yourdomain.com, Keycloak internally runs on HTTP.

That's the server-side setup done. Now let's build the Blazor UI pieces that make auth visible to users.


Step 6: Blazor UI: Login, Logout, and Route Protection

With the backend authentication pipeline configured, it's time to build the UI components that let users actually log in, log out, and interact with protected content. We'll create three key pieces: the login/logout pages, a login display component, and routing configuration that enforces authorization.

Keycloak login screen

The Login and Logout Razor Pages

First, we need pages to trigger authentication flows. These aren't typical Blazor pages with markup—they're redirect triggers that hand off control to Keycloak.

Login.razor

Create Components/Pages/Login.razor:

@page "/login"
@attribute [AllowAnonymous]
@using Microsoft.AspNetCore.Authorization
@using Microsoft.AspNetCore.Authentication
@using Microsoft.AspNetCore.Authentication.OpenIdConnect
@inject NavigationManager Navigation
@inject IHttpContextAccessor HttpContextAccessor

@code {
    protected override async Task OnInitializedAsync()
    {
        var uri = new Uri(Navigation.Uri);
        var query = System.Web.HttpUtility.ParseQueryString(uri.Query);
        var returnUrl = query["returnUrl"] ?? "/";

        var httpContext = HttpContextAccessor.HttpContext;
        if (httpContext != null)
        {
            var authProperties = new AuthenticationProperties
            {
                RedirectUri = returnUrl
            };
            await httpContext.ChallengeAsync(OpenIdConnectDefaults.AuthenticationScheme, authProperties);
        }
    }
}

What's happening here?

  • No markup: This page doesn't render anything. Its job is to initiate the OpenID Connect authentication challenge, which redirects the browser to Keycloak.
  • ChallengeAsync: This triggers the OIDC middleware to redirect the user to Keycloak's login page.
  • Return URL: We capture the returnUrl query parameter so users land back where they started after logging in.
  • [AllowAnonymous]: Critical! Without this, the page would require authentication to access, creating a redirect loop.

Logout.razor

Create Components/Pages/Logout.razor:

@page "/logout"
@attribute [AllowAnonymous]
@using Microsoft.AspNetCore.Authorization
@using Microsoft.AspNetCore.Authentication
@using Microsoft.AspNetCore.Authentication.Cookies
@using Microsoft.AspNetCore.Authentication.OpenIdConnect
@inject IHttpContextAccessor HttpContextAccessor

@code {
    protected override async Task OnInitializedAsync()
    {
        var httpContext = HttpContextAccessor.HttpContext;
        if (httpContext != null)
        {
            var properties = new AuthenticationProperties
            {
                RedirectUri = "/"
            };
            await httpContext.SignOutAsync(OpenIdConnectDefaults.AuthenticationScheme, properties);
            await httpContext.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme);
        }
    }
}


Why sign out of TWO schemes?


This is where many implementations fail. OpenID Connect uses a dual authentication scheme:

  1. OpenIdConnect scheme: Handles the protocol dance with Keycloak (redirects, token exchange, logout).
  2. Cookie scheme: Manages the local session in your Blazor app.

When logging out, you must sign out of both, in this order:

  1. OIDC first: This redirects to Keycloak's logout endpoint, ending the SSO session.
  2. Cookie second: This clears the local authentication cookie.

Signing out of only the cookie leaves the Keycloak session active—users can click "Login" and get back in without re-entering credentials. Signing out only from OIDC leaves the local cookie intact, so the app still thinks they're logged in.

The RedirectUri in the authentication properties controls where users land after the Keycloak logout completes. We send them to the home page.

 

Step 7: The LoginDisplay Component

Now we need a UI element to show login state and provide login/logout actions. This typically lives in your app's header or navigation bar.

Note: NoteBookmark uses FluentUI Blazor (the <Fluent...> components). It's not a requirement, but it definitely looks great ;)

Create Components/Layout/LoginDisplay.razor:

@rendermode InteractiveServer
@using Microsoft.AspNetCore.Components.Authorization
@inject NavigationManager Navigation

<AuthorizeView>
    <Authorized>
        <FluentStack Orientation="Orientation.Horizontal" HorizontalGap="8" 
                     HorizontalAlignment="HorizontalAlignment.Right" 
                     VerticalAlignment="VerticalAlignment.Center">
            <span>Hello, @context.User.Identity?.Name</span>
            <FluentButton Appearance="Appearance.Lightweight" OnClick="Logout" 
                          IconStart="@(new Icons.Regular.Size16.ArrowExit())">
                Logout
            </FluentButton>
        </FluentStack>
    </Authorized>
    <NotAuthorized>
        <FluentButton Appearance="Appearance.Accent" OnClick="Login" 
                      IconStart="@(new Icons.Regular.Size16.Person())">
            Login
        </FluentButton>
    </NotAuthorized>
</AuthorizeView>

@code {
    private void Login()
    {
        var returnUrl = Navigation.ToBaseRelativePath(Navigation.Uri);
        if (string.IsNullOrEmpty(returnUrl)) returnUrl = "/";
        Navigation.NavigateTo($"/login?returnUrl={Uri.EscapeDataString(returnUrl)}", forceLoad: false);
    }

    private void Logout()
    {
        Navigation.NavigateTo("/logout", forceLoad: false);
    }
}

Key implementation details:

  • @rendermode InteractiveServer: This is essential. <AuthorizeView> needs to access AuthenticationStateProvider, which requires an interactive render mode. Without this, the component renders as static HTML and won't respond to auth state changes.
  • <AuthorizeView>: This component automatically shows/hides content based on authentication state. The context parameter provides access to the User claims principal.
  • Return URL on login: We pass the current page URL so users return to where they were after authenticating.
  • forceLoad: false: We use in-app navigation. The Login.razor and Logout.razor pages will handle the actual HTTP redirects.

Add this component to your MainLayout.razor or header component:

<LoginDisplay />

visual of the LoginDisplay

Step 8: Protecting Routes and Pages

With login/logout working, you need to enforce authorization rules. Blazor provides two mechanisms: page-level protection with [Authorize] and inline content protection with <AuthorizeView>.

Updating Routes.razor

First, modify Components/Routes.razor to support authorization-aware routing:

@using Microsoft.AspNetCore.Components.Authorization
@using Microsoft.AspNetCore.Authorization

<FluentDesignTheme StorageName="theme" @rendermode="@InteractiveServer" />

<CascadingAuthenticationState>
    <Router AppAssembly="typeof(Program).Assembly">
        <Found Context="routeData">
            <AuthorizeRouteView RouteData="routeData" DefaultLayout="typeof(Layout.MainLayout)">
                <NotAuthorized>
                    @if (context.User.Identity?.IsAuthenticated != true)
                    {
                        <FluentStack Orientation="Orientation.Vertical" VerticalGap="20" 
                                     HorizontalAlignment="HorizontalAlignment.Center" 
                                     Style="margin-top: 100px;">
                            <FluentIcon Value="@(new Icons.Regular.Size48.LockClosed())" Color="Color.Accent" />
                            <h2>Authentication Required</h2>
                            <p>You need to be logged in to access this page.</p>
                            <FluentButton Appearance="Appearance.Accent" 
                                OnClick="@(() => NavigationManager.NavigateTo(
                                    "/login?returnUrl=" + Uri.EscapeDataString(
                                        NavigationManager.ToBaseRelativePath(NavigationManager.Uri)), 
                                    forceLoad: false))">
                                Login
                            </FluentButton>
                        </FluentStack>
                    }
                    else
                    {
                        <FluentStack Orientation="Orientation.Vertical" VerticalGap="20" 
                                     HorizontalAlignment="HorizontalAlignment.Center" 
                                     Style="margin-top: 100px;">
                            <FluentIcon Value="@(new Icons.Regular.Size48.ShieldError())" Color="Color.Error" />
                            <h2>Access Denied</h2>
                            <p>You don't have permission to access this page.</p>
                            <FluentButton Appearance="Appearance.Accent" 
                                OnClick="@(() => NavigationManager.NavigateTo("/", forceLoad: false))">
                                Go to Home
                            </FluentButton>
                        </FluentStack>
                    }
                </NotAuthorized>
            </AuthorizeRouteView>
            <FocusOnNavigate RouteData="routeData" Selector="h1" />
        </Found>
    </Router>
</CascadingAuthenticationState>

@code {
    [Inject] private NavigationManager NavigationManager { get; set; } = default!;
}


What changed?

  1. <CascadingAuthenticationState>: This wraps the entire router and makes authentication state available to all child components. Without it, <AuthorizeView> and [Authorize] attributes won't work.

  2. <AuthorizeRouteView>: Replaces the standard RouteView. This component checks the [Authorize] attribute on routed pages and enforces authorization rules.

  3. <NotAuthorized> with two states: This is subtle but important. The <NotAuthorized> content renders when authorization fails, but there are two scenarios:

    • Not authenticated (context.User.Identity?.IsAuthenticated != true): The user isn't logged in. Show a "Login" button.
    • Authenticated but not authorized (else): The user is logged in but lacks permission (e.g., wrong role). Show an "Access Denied" message.

Protecting Pages with [Authorize]

To require authentication for an entire page, add the [Authorize] attribute:

@page "/posts"
@attribute [Authorize]
@using Microsoft.AspNetCore.Authorization

<PageTitle>My Posts</PageTitle>

<h1>My Posts</h1>

<!-- Your protected content here -->

Now, unauthenticated users who navigate to /posts will see the "Authentication Required" message from Routes.razor, not the page content.

Note: [Authorize] also supports roles and policies (e.g. [Authorize(Roles = "Admin")]) for more granular access control; that's a topic for a future post.


Testing it out:

  1. Run your Aspire app host: dotnet run --project NoteBookmark.AppHost
  2. Navigate to your Blazor app in the browser.
  3. Click "Login"—you should redirect to Keycloak, authenticate, and return.
  4. You'll see "Hello, [your name]" in the header.
  5. Navigate to a page marked [Authorize] without logging in—you'll see the auth required message.
  6. Click "Logout"—you'll sign out of both the app and Keycloak.

Your Blazor app now has full OpenID Connect authentication with Keycloak, with a clean separation between the auth mechanics (Login/Logout pages), UI (LoginDisplay), and enforcement (Routes.razor + [Authorize]).


Conclusion

You've now integrated Keycloak authentication into your .NET Aspire application. The key pieces:

  1. Aspire orchestration: AddKeycloak(), .WithReference(), and .WaitFor() handle container lifecycle and configuration injection.
  2. OIDC pipeline: The standard ASP.NET Core authentication middleware, configured for Keycloak's OIDC endpoints.
  3. HTTP flexibility: Logic to handle HTTP Keycloak in dev while enforcing HTTPS in production.
  4. Persistent data: WithDataVolume() ensures your Keycloak realm config survives restarts.

This pattern scales beyond Keycloak: Aspire's resource model works the same way for databases, message queues, and other services. Once you've mastered .WithReference() and .WaitFor(), you can compose complex distributed systems with confidence.

The full working implementation is available in the NoteBookmark repository, including the AppHost configuration, Blazor components, and Docker Compose files referenced throughout this post.




Private Vision AI: Run Reka Edge Entirely on Your Machine

Reka just released Reka Edge, a compact but powerful vision-language model that runs entirely on your own machine. No API keys, no cloud, no data leaving your computer. I work at Reka and putting together this tutorial was genuinely fun; I hope you enjoy running it as much as I did.

[Originally published at dev.to/reka]

In three steps, you'll go from zero to asking an AI what's in any image or video.

What You'll Need

  • A machine with enough RAM to run a 7B parameter model (~16 GB recommended)
  • Git
  • uv, a fast Python package manager. Install it with:
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    This works on macOS, Linux, and Windows (WSL). If you're on Windows without WSL, grab the Windows installer instead.

Step 1: Get the Model and Inference Code

Clone the Reka Edge repository from Hugging Face. This includes both the model weights and the inference code:

git clone https://huggingface.co/RekaAI/reka-edge-2603
cd reka-edge-2603

Step 2: Fetch the Large Files

Hugging Face stores large files (model weights and images) using Git LFS. After cloning, these files exist on disk, but they're only small pointers, not the actual content.

First, make sure Git LFS is installed. The command varies by platform:

# macOS
brew install git-lfs

# Linux / WSL (Ubuntu/Debian)
sudo apt install git-lfs

Then initialize it:

git lfs install

Then pull all large files, including model weights and media samples:

git lfs pull

Grab a coffee while it downloads; the model weights are several GB.
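
You can check that the real weights came down (rather than LFS pointers) by looking at file sizes; for Hugging Face models the weights are typically .safetensors files:

# Pointer files are a few hundred bytes; real weight shards are GB-scale
ls -lh *.safetensors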


Step 3: Ask the Model About an Image or Video

To analyze an image, use the sample included in the media/ folder:

uv run example.py \
  --image ./media/hamburger.jpg \
  --prompt "What is in this image?"
the prompt and the burger image

Or pass a video with --video:

uv run example.py \
  --video ./media/many_penguins.mp4 \
  --prompt "What is in this?"

The model will load, process your input, and print a description, all locally, all private.

Try different prompts to unlock more:

  • "Describe this scene in detail."
  • "What text is visible in this image?"
  • "Is there anything unusual or unexpected here?"

What's Actually Happening? 

You don't need this to use the model, but if you're anything like me and can't help wondering what's going on under the hood, here's the magic behind example.py:

1. It picks the best hardware available. The script checks whether your machine has a GPU (CUDA for Nvidia, Metal for Apple Silicon) and uses it automatically. If neither is available, it falls back to the CPU. This affects speed, not quality.

if torch.cuda.is_available():
    device = torch.device("cuda")  # Nvidia GPU
elif mps_ok:  # mps_ok: the script's earlier check that Apple's Metal (MPS) backend is usable
    device = torch.device("mps")  # Apple Silicon GPU
else:
    device = torch.device("cpu")  # fallback: slower, but same output quality

2. It loads the model into memory. The 7 billion parameter model is read from the folder you cloned. This is the "weights": billions of numbers that encode everything the model has learned. Loading takes ~30 seconds depending on your hardware.

processor = AutoProcessor.from_pretrained(args.model, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(args.model, ...).eval()

3. It packages your input into a structured message. Your image (or video) and your text prompt are wrapped together into a conversation-style format, the same way a chat message works, except one part is visual instead of text.

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": args.image},
        {"type": "text", "text": args.prompt},
    ],
}]

4. It converts everything into numbers. The processor translates your image into a grid of numerical patches and your prompt into tokens (small chunks of text, each mapped to a number). The model only understands numbers, so this step bridges the gap.

inputs = processor.apply_chat_template(
    messages, tokenize=True, return_tensors="pt", return_dict=True
)

5. The model generates a response, token by token. Starting from your input, the model predicts the most likely next word, then the next, up to 256 tokens. It stops when it hits a natural end-of-response marker.

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

6. It converts the numbers back into text and prints it. The token IDs are decoded back into human-readable words and printed to your terminal. No internet involved at any point.

output_text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
print(output_text)


Here's the Video

If you prefer watching to reading, here's the video version:

 

That's Pretty Cool, Right?

A single script. No API key. No cloud. You just ran a 7 billion parameter vision-language model entirely on your own machine, and it works whether you're on a Mac, Linux, or Windows with WSL, which is what I was using when I wrote this.

This works great as a one-off script: drop in a file, ask a question, get an answer. But what if you wanted to build something on top of it? A web app, a tool that watches a folder, or anything that needs to talk to the model repeatedly?

That's exactly what the next post is about. I'll show you how to wrap Edge as a local API, so instead of running a script, you have a service running on your machine that any app can plug into. Same model, same privacy, but now it's a proper building block.


~frank 

Reading Notes #689

Another week, another batch of interesting reads. This edition covers AI video experiments, extending coding agents with .NET skills, open source contributions, and a few podcast episodes worth adding to your queue.


AI

Programming

Open Source

Podcasts

Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, podcasts and books that catch my interest during the week.

If you have interesting content, share it!

~frank


Reading Notes #686

This week's Reading Notes is packed with AI insights, open-source discoveries, programming tips, and podcast episodes that will leave you eager to dive in. From Ralph Wiggum's coding secrets to the dangers of one-shot glamour, we've got it all covered. So grab your favourite beverage, settle in, and get ready to level up your tech game!

AI


Open Source


Programming


Podcast

~frank


Reading Notes #683

A lot of good stuff crossed my radar this week. From Aspire’s continued evolution and local AI workflows with Ollama, to smarter, more contextual help in GitHub Copilot, the theme is clear: better tools, used more intentionally. I also bookmarked a few thoughtful pieces on leadership and communication that are worth slowing down for. Plenty here to explore, whether you’re deep in code or thinking about how teams actually work.

Meetup MsDevMtl

Programming

AI

Open Source

  • The end of the curl bug-bounty (Daniel Stenberg) - I didn't know about this effort, and of course it's sad to learn about it only now that it's ending, but I'm glad those programs exist.

Miscellaneous

  • Why I Still Write Code as an Engineering Manager (James Sturtevant) - There is still hope, everyone! But more seriously, an inspiring post that managers should read.

  • The Art of the Oner (Golnaz) - Another great post from Golnaz, this time about helping the message land: how and why one-take shots help when presenting, and the effort they represent.

Sharing my Reading Notes is a habit I started a long time ago, where I share a list of all the articles, blog posts, and books that catch my interest during the week.

If you have interesting content, share it!

~frank



Automatically Create AI Clips with This n8n Template

I'm excited to share that my new n8n template has been approved and is now available for everyone to use! This template automates the process of creating AI-generated video clips from YouTube videos and sending notifications directly to your inbox.

French version of this post here

Try the template here: https://link.reka.ai/n8n-template-api


What Does This Template Do?

If you've ever wanted to automatically create short clips from long YouTube videos, this template is for you. It watches a YouTube channel of your choice, and whenever a new video is published, it uses AI to generate engaging short clips perfect for social media. You get notified by email when your clip is ready to download.

How It Works

The workflow is straightforward and runs completely on autopilot:

  1. Monitor YouTube channels - The template watches the RSS feed of any YouTube channel you specify (the feed URL pattern is shown right after this list). When a new video appears, the automation kicks off.

  2. Request AI clip generation - Using Reka's Vision API, the workflow sends the video for AI processing. You have full control over the output:

    • Write a custom prompt to guide the AI on what kind of clip to create
    • Choose whether to include captions
    • Set minimum and maximum clip duration
  3. Smart status checking - When the clips are ready, you receive a success email with your download link. As a safety feature, if the job takes too long, you'll get an error notification instead.
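
For reference, a YouTube channel's RSS feed follows a fixed pattern; swap in the channel ID from the channel's URL:

# Fetch the feed to confirm you have the right channel ID
curl -s "https://www.youtube.com/feeds/videos.xml?channel_id=CHANNEL_ID" | head -n 20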

Getting Started is Easy

The best part? You can install this template with just one click from the n8n Templates page. No complex setup required!

After installation, you'll just need two quick things:

  • A free Reka AI API key (get yours from Reka)
  • A Gmail account (or use any email provider you like)

That's it! The template comes ready to use. Simply add your YouTube channel RSS feed, connect your API key, and you're ready to start generating clips automatically. The whole setup takes just a few minutes.

If you run into any questions or want to share what you've built, join the Reka Discord community. I'd love to hear how you're using this template!

Show Me

In this short video, I show you how to get the template into your n8n instance and how to configure it.

Happy clipping!

Writing My First Custom n8n Node: A Step-by-Step Guide

Recently, I decided to create a custom node for n8n, the workflow automation tool I've been using. I'm not an expert in Node.js development, but I wanted to understand how n8n nodes work under the hood. This blog post shares my journey and the steps that actually worked for me.

French version here

Why I Did This

Before starting this project, I was curious about how n8n nodes are built. The best way to learn something is by doing it, so I decided to create a simple custom node following n8n's official tutorial. Now that I understand the basics, I'm planning to build a more complex node featuring AI Vision capabilities, but that's for another blog post!

The Challenge

I started with the official n8n tutorial: Build a declarative-style node. While the tutorial is well-written, I ran into some issues along the way. The steps didn't work exactly as described, so I had to figure out what was missing. This post documents what actually worked for me, in case you're facing similar challenges. I already have an n8n instance running in a container. In Step 8, I'll explain how I run a second instance for development purposes.

Prerequisites

Before you start, you'll need:

  • Node.js and npm - I used Node.js version 24.12.0
  • Basic understanding of JavaScript/TypeScript - you don't need to be an expert

Step 1: Fixing the Missing Prerequisites

I didn't have Node.js installed on my machine, so my first step was getting that sorted out. Instead of installing Node.js directly, I used nvm (Node Version Manager), which makes it easy to manage different Node.js versions. Installation details are available on the nvm GitHub repository. Once nvm was set up, I installed Node.js version 24.12.0.

Most of the time, I use VS Code as my code editor. I created a new profile and used the template for Node.js development to get the right extensions and settings.

Step 2: Cloning the Starter Repository

n8n provides an n8n-nodes-starter repository on GitHub that includes all the basic files and dependencies you need. You can clone it or use it as a template for your own project. Since this was just a "learning exercise" for me, I cloned the repository directly:

git clone https://github.com/n8n-io/n8n-nodes-starter
cd n8n-nodes-starter

Step 3: Getting Started with the Tutorial

I won't repeat the entire tutorial here; it's clear enough, but I'll highlight some details along the way that I found useful.

The tutorial makes you create a "NasaPics" node and provides a logo for it. It's great, but I suggest you use your own logo images and have light and dark versions. Add both images in a new folder icons (same level as nodes and the credentials folder). Having two versions of the logo will make your node look better, whatever theme the user is using in n8n (light or dark). The tutorial only adds the logo in NasaPics.node.ts, but I found that adding it also in the credentials file NasaPicsApi.credentials.ts makes the node look more consistent.

Replace or add the logo line with this, and add Icon to the import statement at the top of the file:

icon: Icon = { light: 'file:MyLogo-dark.svg', dark: 'file:MyLogo-light.svg' };

Note: the darker logo should be used in light mode, and vice versa.

Step 4: Following the Tutorial (With Adjustments)

Here's where things got interesting. I followed the official tutorial to create the node files, but I had to make some adjustments that weren't mentioned in the documentation.

Adjustment 1: Making the Node Usable as a Tool

In the NasaPics.node.ts file, I added this line just before the properties array:

requestDefaults: {
      baseURL: 'https://api.nasa.gov',
      headers: {
         Accept: 'application/json',
         'Content-Type': 'application/json',
      },
   },
   usableAsTool: true, // <-- Added this line
   properties: [
      // Resources and operations will go here

This setting allows the node to be used as a tool within n8n workflows and also fixes warnings from the lint tool.

Adjustment 2: Securing the API Key Field

In the NasaPicsApi.credentials.ts file, I added a typeOptions to make the API key field a password field. This ensures the API key is hidden when users enter it, which is a security best practice.

properties: INodeProperties[] = [
   {
      displayName: 'API Key',
      name: 'apiKey',
      type: 'string',
      typeOptions: { password: true }, // <-- Added this line
      default: '',
   },
];

A Note on Errors

I noticed there were some other errors showing up in the credentials file. If you read the error message, you'll see that it's complaining about missing test properties. To fix this, I added a test property at the end of the class that implements ICredentialTestRequest. I also added the interface import at the top of the file.

authenticate: IAuthenticateGeneric = {
   type: 'generic',
   properties: {
      qs: {
         api_key: '={{$credentials.apiKey}}',
      },
   },
};

// Add this at the end of the class
test: ICredentialTestRequest = {
   request: {
      baseURL: 'https://api.nasa.gov/',
      url: '/user',
      method: 'GET',
   },
};

Step 5: Building and Linking the Package

Once I had all my files ready, it was time to build the node. From the root of my node project folder, I ran:

npm i
npm run build
npm link

During the build process, pay attention to the package name that gets generated. In my case, it was n8n-nodes-nasapics. You'll need this name in the next steps.

> n8n-nodes-nasapics@0.1.0 build
> n8n-node build

┌   n8n-node build 
│
◓  Building TypeScript files│
◇  TypeScript build successful
│
◇  Copied static files
│
└  ✓ Build successful

Step 6: Setting Up the n8n Custom Folder

n8n looks for custom nodes in a specific location: ~/.n8n/custom/. If this folder doesn't exist, you need to create it:

mkdir -p ~/.n8n/custom
cd ~/.n8n/custom

Then initialize a new npm package in this folder: run npm init and press Enter to accept all the defaults.

Step 7: Linking Your Node to n8n

Now comes the magic part - linking your custom node so n8n can find it. Replace n8n-nodes-nasapics with whatever your package name is. From the ~/.n8n/custom folder, run:

npm link n8n-nodes-nasapics
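
If you want to double-check the link worked, list what n8n will see in the custom folder; you should find a symlink pointing back to your project:

# The package should appear as a symlink to your working copy
ls -l ~/.n8n/custom/node_modules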

Step 8: Running n8n

This is where my setup differs from the standard tutorial. As mentioned at the beginning, I already have an instance of n8n running in a container and didn't want to install it. So I decided to run a second container using a different port. Here's the command I used:

docker run -d --name n8n-DEV -p 5680:5678 \
  -e N8N_COMMUNITY_PACKAGES_ENABLED=true \
  -v ~/.n8n/custom/node_modules/n8n-nodes-nasapics:/home/node/.n8n/custom/node_modules/n8n-nodes-nasapics \
  n8nio/n8n

Let me break down what this command does:

  • -d: Runs the container in detached mode (in the background)
  • --name n8n-DEV: Names the container for easy reference
  • -p 5680:5678: Maps port 5678 from the container to port 5680 on my machine so it doesn't conflict with my existing n8n instance
  • -e N8N_COMMUNITY_PACKAGES_ENABLED=true: Enables community packages — you need this to use custom nodes
  • -v: Mounts my custom node folder into the container, which lets me try my custom node without having to publish it.
  • n8nio/n8n: The official n8n container image

If you're running n8n directly on your machine (not in a container), you can simply start it.

Step 9: Testing Your Node

Once n8n-DEV is running, open your browser and navigate to it. Create a new workflow and search for your node. In my case, I searched for "NasaPics" and my custom node appeared!

To test it:

  1. Add your node to the workflow
  2. Configure the credentials with a NASA API key (you can get one for free at api.nasa.gov)
  3. Execute the node
  4. Check if the data is retrieved correctly

Updating Your Node

During development, you'll likely need to make changes to your code (aka your node). Once done, rebuild with npm run build and restart the n8n container with docker restart n8n-DEV to see the changes.

What's Next?

Now that I understand the basics of building custom n8n nodes, I'm ready to tackle something more ambitious. My next project will be creating a node that uses AI Vision capabilities. Spoiler alert: It's done and I'll be sharing the details in an upcoming blog post!

If you're interested in creating your own custom nodes, I encourage you to give it a try. Start with something simple, like I did, and build from there. Don't be afraid to experiment and make mistakes - that's how we learn!


Building an AI-Powered YouTube Clipper using n8n

My colleague Annie loves clipping videos from her favorite creators. You know that feeling when you catch a great moment and turn it into a perfect short? That's her jam. But she kept running into this frustrating problem: by the time she saw a new video and got around to clipping it, everyone else had already done it. She was always late to the party.

When she told me about this, I thought, "What if we could automatically clip videos the moment they're published?" That way, she'd have her clips ready to post while the content is still fresh.

So I put my experience with integration tools to work and built something for her—and for anyone else who has this same problem. And you know what? I'm pretty excited to share it with you.

French version here: Automatiser le clipping vidéo YouTube avec l'IA et n8n

What I Created

I put together an open-source n8n template that automatically clips YouTube videos using AI. Here's how it works:

  1. It watches for new videos from your favorite YouTube channel
  2. Sends the video to Reka's AI to create clips automatically
  3. Checks when the clips are ready and sends you an email with the download link

The whole thing runs on n8n (it's a free automation platform), and it uses Reka's Clips API to do the AI magic. Best part? It's completely free to use and set up.

How It Actually Works

I built this using two n8n workflows that work together:

Workflow 1: Submit Reel Creation


This one's the watcher. It monitors a YouTube channel's RSS feed, and the moment a new video drops, it springs into action:

  • Grabs the video URL
  • Sends it to Reka's API with instructions like "Create an engaging short video highlighting the best moments"
  • Gets back a job ID so we can track the progress
  • Saves everything to an n8n data table

The cool thing is you can customize how the clips are made. Want vertical videos for TikTok? Done. Need subtitles? Got it. You can set the clip length anywhere from 0 to 30 seconds. It's all in the JSON configuration.

{
  "video_urls": ["{{ $json.link }}"],
  "prompt": "Create an engaging short video highlighting the best moments",
  "generation_config": {
    "template": "moments",
    "num_generations": 1,
    "min_duration_seconds": 0,
    "max_duration_seconds": 30
  },
  "rendering_config": {
    "subtitles": true,
    "aspect_ratio": "9:16"
  }
}

Workflow 2: Check Reel Status


This one's the patient checker. Since AI takes time to analyze a video and create clips (could be several minutes depending on the video length), we need to check in periodically:

  • Looks at all the pending jobs in our data table
  • Asks Reka's API "Hey, is this one done yet?"
  • When a clip is ready, sends you an email with the download link
  • Marks the job as complete so we don't check it again

I set mine to check every 15-30 minutes. No need to spam the API—good things take time! 😉

Setting It Up (It's Easier Than You Think)

When I was helping Annie set this up (you can watch the full walkthrough below), we got it working in just a few minutes. Here's what you need to do:

Step 1: Create Your Data Table

In n8n, create a new data table. Here's a pro tip I learned the hard way: don't name it "videos"—use something like "clip_jobs" or "reel_records" instead. Trust me on this one; it'll save you some headaches.

Your table needs four columns (all strings):

  • video_title - The name of the video
  • video_url - The YouTube URL
  • job_id - The ID Reka gives us to track the clip
  • job_status - Where we are in the process (queued, processing, completed, etc.)

Step 2: Import the Workflows

Download the two JSON files from the GitHub repo and import them into n8n. They'll show up with some errors at first—that's totally normal! We need to configure them.

Step 3: Configure "Submit Reel Creation"

  1. RSS Feed Trigger: Replace my YouTube channel ID with the one you want to monitor. You can find any channel's ID in their channel URL.

  2. API Key: Head to platform.reka.ai and grab your free API key. Pop it into the Bearer Auth field. Give it a memorable name like "Reka API key" so you know what it is later.

  3. Clip Settings: This is where you tell the AI what kind of clips you want. The default settings create one vertical video (9:16 aspect ratio) up to 30 seconds long with subtitles. But you can change anything:

    • The prompt ("Create an engaging short video highlighting the best moments")
    • Duration limits
    • Aspect ratio (square, vertical, horizontal—your choice)
    • Whether to include subtitles
  4. Data Table: Connect it to that table you created in Step 1.

Step 4: Configure "Check Reel Status"

  1. Trigger: Start with the manual trigger while you're testing. Once everything works, switch it to a schedule trigger (I recommend every 15-30 minutes).

  2. API Key: Same deal as before—add your Reka API key.

  3. Email: Update the email node with your email address. You can customize the subject and body if you want, but the default works great.

  4. Data Table: Make sure all the data table nodes point to your table from Step 1.

Watching It Work

When Annie and I tested it live, that moment when the first clip job came through with a "queued" status? That was exciting. Then checking back and seeing "completed"? Even better. And when that email arrived with the download link? Perfect.

The clips Reka AI creates are actually really good. It analyzes the entire video, finds the best key moments (or whatever your prompt asks for), adds subtitles, and packages it all up in a format ready for social media.

Wrap Up

This tool works great whether you're a clipper enthusiast or a content creator looking to generate clips for your own channel. Once you set it up, it just runs. New video drops at 3 AM? Your clip is already processing. You wake up to a download link in your inbox.

It's open source and free to use. Take it, customize it, make it your own. And if you come up with improvements or have ideas, I'd love to hear about them. Share your updates on GitHub or join the conversation in the Reka Community Discord.

Watch the Full Setup

I recorded the entire setup process with Annie (she was testing it for the first time). You can see every step, every click, and yes, even the little mistakes we made along the way. That's real learning right there.


Get Started

Ready to try it? Here's everything you need:

🔗 n8n template/ Github: https://link.reka.ai/n8n-clip
🔗 Reka API key: https://link.reka.ai/free (renewable & free)


~frank



Ask AI from Anywhere: No GUI, No Heavy Clients, No Friction

Ever wished you could ask AI from anywhere without needing an interface? Imagine just typing ? and your question in any terminal the moment it pops into your head, and getting the answer right away! In this post, I explain how I wrote a tiny shell script that turns this idea into reality, transforming the terminal into a universal AI client. You can query Reka, OpenAI, or a local Ollama model from any editor, tab, or pipeline—no GUI, no heavy clients, no friction.

Small, lightweight, and surprisingly powerful: once you make it part of your workflow, it becomes indispensable.

💡 All the code scripts are available at: https://github.com/reka-ai/terminal-tools


The Core Idea

There is almost always a terminal within reach—embedded in your editor, sitting in a spare tab, or already where you live while building, debugging, and piping data around. So why break your flow to open a separate chat UI? I wanted to just type a single character (?) plus my question and get an answer right there. No window hopping. No heavy client.

How It Works

The trick is delightfully small: send a single JSON POST request to whichever AI provider you feel like (Reka, OpenAI, Ollama locally, etc.):

# Example: Reka
curl https://api.reka.ai/v1/chat \
     -H "X-Api-Key: <API_KEY>" \
     -H "Content-Type: application/json" \
     -d '{
           "messages": [
             {
               "role": "user",
               "content": "What is the origin of thanksgiving?"
             }
           ],
           "model": "reka-core",
           "stream": false
         }'

# Example: Ollama local
curl http://127.0.0.1:11434/api/chat \
     -d '{
           "model": "llama3",
           "messages": [
             {
               "role": "user",
               "content": "What is the origin of thanksgiving?"
             }
           ],
           "stream": false
         }'

Once we get the response, we extract the answer field from it. A thin shell wrapper turns that into a universal “ask” verb for your terminal. Add a short alias (?) and you have the most minimalist AI client imaginable.

Let's go into the details

Let me walk you through the core script step-by-step using reka-chat.sh, so you can customize it the way you like. Maybe this is a good moment to mention that Reka has a free tier that's more than enough for this. Go grab your key—after all, it's free!

The script (reka-chat.sh) does four things:

  1. Captures your question
  2. Loads an API key from ~/.config/reka/api_key
  3. Sends a JSON payload to the chat endpoint with curl.
  4. Extracts the answer using jq for clean plain text.

1. Capture Your Question

This part of the script is a pure laziness hack. I wanted to save keystrokes by not requiring quotes when passing a question as an argument. So ? What is 32C in F works just as well as ? "What is 32C in F".

# No arguments: read the question from stdin if something is piped in, otherwise give up.
if [ $# -eq 0 ]; then
    if [ ! -t 0 ]; then
        QUERY="$(cat)"
    else
        exit 1
    fi
else
    # Arguments present: join them all into one question, so quotes are optional.
    QUERY="$*"
fi

2. Load Your API Key

If you're running Ollama locally you don't need any key, but for all other AI providers you do. I store mine in a locked-down file at ~/.config/reka/api_key, then read and trim trailing whitespace like this:

API_KEY_FILE="$HOME/.config/reka/api_key"
API_KEY=$(cat "$API_KEY_FILE" | tr -d '[:space:]')
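
If you haven't created that file yet, here's one way to do it, locked down so only you can read it (replace the placeholder with your real key):

# Store the key once, with owner-only permissions
mkdir -p ~/.config/reka
printf '%s' 'YOUR_API_KEY' > ~/.config/reka/api_key
chmod 600 ~/.config/reka/api_key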

3. Send The JSON Payload

Building the JSON payload is the heart of the script, including the API_ENDPOINT, API_KEY, and obviously our QUERY. Here’s how I do it for Reka:

RESPONSE=$(curl -s -X POST "$API_ENDPOINT" \
     -H "X-Api-Key: $API_KEY" \
     -H "Content-Type: application/json" \
     -d "{
  \"messages\": [
    {
      \"role\": \"user\",
      \"content\": $(echo "$QUERY" | jq -R -s .)
    }
  ],
  \"model\": \"reka-core\",
  \"stream\": false
}")

4. Extract The Answer

Finally, we parse the JSON response with jq to pull out just the answer text. If jq isn't installed we display the raw response, but a formatted answer is much nicer. If you are customizing for another provider, you may need to adjust the JSON path here. You can add echo "$RESPONSE" >> data_sample.json to the script to log raw responses for tinkering.

With Reka, the response looks like this:

{
    "id": "cb7c371b-3a7b-48d2-829d-70ffacf565c6",
    "model": "reka-core",
    "usage": {
        "input_tokens": 16,
        "output_tokens": 460,
        "reasoning_tokens": 0
    },
    "responses": [
        {
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": " The origin of Thanksgiving ..."
            }
        }
    ]
}

The value we want to display is the `content` field inside `responses[0].message`. Using `jq`, we do:

echo "$RESPONSE" | jq -r '.responses[0].message.content // .error // "Error: Unexpected response format"'

Putting It All Together

Now that we have the script, make it executable with chmod +x reka-chat.sh, and let's add an alias to your shell config to make it super easy to use. Add one line to your .zshrc or .bashrc that looks like this:

alias \?="$REKA_CHAT_SCRIPT"

Because ? is a special character in the shell, we escape it with a backslash. After adding this line, reload your shell configuration with source ~/.zshrc or source ~/.bashrc, and you are all set!

The Result

Now you can ask questions directly from your terminal. Wanna know the origin of Thanksgiving? Ask it like this:

? What is the origin of Thanksgiving

And if you want to keep the quotes, you do you!

Extra: Web research

I couldn't stop there! Reka also supports web research, which means it can fetch and read web pages to provide more informed answers. Following the same pattern described previously, I wrote a similar script called reka-research.sh that sends a request to Reka's research endpoint. This obviously takes a bit more time to answer, as it's making different web queries and processing them, but the results are often worth the wait—and they are up to date! I used the alias ?? for this one.

On the GitHub repository, you can find both scripts (reka-chat.sh and reka-research.sh) along with a script to create the aliases automatically. Feel free to customize them to fit your workflow and preferred AI provider. Enjoy the newfound superpower of instant AI access right from your terminal!

What's Next?

With this setup, the possibilities are endless. Reka supports questions related to audio and video, which could be interesting to explore next. The project is open source, so feel free to contribute or suggest improvements. You can also join the Reka community on Discord to share your experiences and learn from others.






Check-In Doc MCP Server: A Handy Way to Search Only the Docs You Trust

Ever wished you could ask a question and have the answer come only from a handful of trusted documentation sites—no random blogs, no stale forum posts? That’s exactly what the Check-In Doc MCP Server does. It’s a lightweight Model Context Protocol (MCP) server you can run locally (or host) to funnel questions to selected documentation domains and get a clean AI-generated answer back.

What It Is

The project (GitHub: https://github.com/fboucher/check-in-doc-mcp) is a Dockerized MCP server that:

  • Accepts a user question.
  • Calls the Reka AI Research API with constraints (only allowed domains).
  • Returns a synthesized answer based on live documentation retrieval.

You control which sites are searchable by passing a comma-separated list of domains (e.g. docs.reka.ai,docs.github.com). That keeps results focused, reliable, and relevant.

What Is the Reka AI Research API?

Reka AI’s Research API lets you blend language model reasoning with targeted, on‑the‑fly web/document retrieval. Instead of a model hallucinating an answer from static training data, it can:

  • Perform limited domain‑scoped web searches.
  • Pull fresh snippets.
  • Integrate them into a structured response.

In this project, we use the research feature with a web_search block specifying:

  • allowed_domains: Only the documentation sites you trust.
  • max_uses: Caps how many retrieval calls it makes per query (controls cost & latency).

Details used here:

  • Model: reka-flash-research
  • Endpoint: http://api.reka.ai/v1/chat/completions
  • Auth: Bearer API key (generated from the Reka dashboard: https://link.reka.ai/free)

How It Works Internally

The core logic lives in ResearchService (src/Domain/ResearchService.cs). Simplified flow:

  1. Initialization
    Stores the API key + array of allowed domains, sets model & endpoint, logs a safe startup message.

  2. Build Request Payload
    The CheckInDoc(string question) method creates a JSON payload:

    var requestPayload = new {
      model,
      messages = new[] { new { role = "user", content = question } },
      research = new {
        web_search = new {
          allowed_domains = allowedDomains,
          max_uses = 4
        }
      }
    };
    
  3. Send Request
    Creates an HttpRequestMessage (POST), adds Authorization: Bearer <APIKEY>, and sends the JSON to Reka.

  4. Parse Response
    Deserializes into a RekaResponse domain object, returns the first answer string.
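
If you want to poke at the API outside the MCP server, here's a minimal curl sketch of the same request, using the model, endpoint, and payload shape listed above; the question and domain are placeholders:

# Ask a question constrained to one documentation domain
curl -s http://api.reka.ai/v1/chat/completions \
  -H "Authorization: Bearer $APIKEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "reka-flash-research",
        "messages": [{"role": "user", "content": "How do I create a personal access token?"}],
        "research": {
          "web_search": {
            "allowed_domains": ["docs.github.com"],
            "max_uses": 4
          }
        }
      }'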

Adding It to VS Code (MCP Extension)

You can run it as a Docker-based MCP server. Two simple approaches:

Option 1: Via “Add MCP Server” UI

  1. In VS Code (with MCP extension), click Add MCP Server.
  2. Choose type: Docker image.
  3. Image name: fboucher/check-in-doc-mcp.
  4. Enter allowed domains and your Reka API key when prompted.

Option 2: Via mcp.json (Recommended)

Alternatively, you can manually configure it in your mcp.json file. This will make sure your API key isn't displayed in plain text. Add or merge this configuration:

{
  "servers": {
    "check-in-docs": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "ALLOWED_DOMAINS=${input:allowed_domains}
        ",
        "-e",
        "APIKEY=${input:apikey}",
        "fboucher/check-in-doc-mcp"
      ]
    }
  },
  "inputs": [
    {
      "id": "allowed_domains",
      "type": "promptString",
      "description": "Enter the comma-separated list of documentation domains to allow (e.g. docs.reka.ai,docs.github.com):"
    },
    {
      "id": "apikey",
      "type": "promptString",
      "password": true,
      "description": "Enter your Reka Platform API key:"
    }
  ]
}

How to Use It

To use it, ask to "check in doc" something, or use the SearchInDoc tool directly in your MCP-enabled environment. Just ask a question, and it will search only the specified documentation domains.

Final Thoughts

It’s intentionally simple—no giant orchestration layer. Just a clean bridge between a question, curated domains, and a research-enabled model. Sometimes that’s all you need to get focused, trustworthy answers.

If this sparks an idea, clone it and adapt away. If you improve it (citations, richer error handling, multi-turn context)—send a PR!

Watch a quick demo

