Franky's Notes: vision

Showing posts with label vision. Show all posts

Apps That See: Bringing Vision AI to Your Projects

Labels: ai, conference, demo, github, oss, vision

I was wearing a t-shirt with a partial Reka logo at the edge of the frame. I never said the word "Reka" in that segment. The model caught the logo, connected it to the topic I was discussing, and mentioned it unprompted in the output it generated.

That is not a transcript trick. The model was watching.

At the AI Agents Conference 2026, I gave a talk called "Apps That See" — six live demos showing how to build applications that understand images and video. Every project is open source and ready to clone. This post walks through each one so you have enough context to pick it up, run it, and adapt it to something useful in your own work.

Vision AI Is Accessible Now

Not long ago, working with visual AI meant GPU clusters, specialized teams, and weeks of training. Today a compressed 4B model like Qwen or Gemini 3 runs on a regular laptop and handles image description well enough to prototype. Step up to a 7B model like Reka Edge and the quality improves meaningfully. It also runs locally: a gaming PC with a decent GPU is enough. No server required.

For tasks that need more power, cloud APIs give you faster results without local hardware requirements. The tradeoff is that your images and video go to a third-party provider. For corridor cameras or stock photos that is usually acceptable. For private or sensitive content, local is the better default.

The practical pattern: start local to build and test, then decide whether the task actually requires cloud.

What You Can Build With This

Accessibility: Describe a scene in real time for visually impaired users, or identify objects on demand.
Content creation: Extract structure from a video and turn it into a blog post, caption set, or highlight reel.
Productivity: Search through thousands of videos for a specific object or topic, even when the title gives no indication of the content.
Automation: Trigger actions only when specific visual conditions are met, such as an unrecognized person entering a room.
Fun: Most developers' first contact with AI is building something for themselves, and that is a perfectly valid starting point.

Demo 1: Caption This — Generate a Prompt from Any Image

Source: fboucher/caption-this

If you work with image generation models, you end up with a lot of images to test and compare. Writing the text prompt that would reproduce a specific image is tedious. This tool does it for you: give it an image, get back a prompt you can use to regenerate something similar.

The demo uses an HTTP client extension in VS Code to call the API directly, no SDK. Pass an image, ask for a plain-text prompt that would recreate it. One prompt detail that improved results noticeably: add no markdown to the instruction.

POST https://api.reka.ai/v1/chat
Content-Type: application/json

{
  "model": "reka-flash",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image_url", "image_url": { "url": "https://..." } },
      { "type": "text", "text": "Write a prompt in plain text, no markdown, that would generate the exact same image." }
    ]
  }]
}

One thing to know when testing this across different models: some accept an image URL directly, others require the image as a base64-encoded string. Same task, same prompt, different input contract. If you plan to swap models in your app, account for this difference from the start.

Demo 2: Media Library — Compare Vision Models Side by Side

Source: fboucher/media-library

This is a web app that connects to multiple vision backends and lets you switch between them at runtime. The motivation: benchmark Reka Edge running locally — via OpenRouter or directly through the Reka API — against other models on real tasks.

Object detection surfaces the biggest portability problem. Some models return bounding boxes in an HTML-style bracket format with pixel coordinates. Others use a 2D box structure with a different coordinate scheme. If you code against one format and then swap models, your rendering breaks. There is no standard here — handle the differences at the application layer, not the model layer.

The app uses the OpenAI API format as the common interface across all backends. Any model with a compatible endpoint can be swapped in with minimal changes. It does not eliminate the per-model quirks, but it reduces the friction of switching to a configuration change rather than a rewrite.

Video input is supported too, though far fewer models handle it than images. Of the models tested, Reka Edge is the standout for video — the others either reject it or behave inconsistently.

Demo 3: Video2Blog — Turn a Video into a Structured Post

Source: fboucher/video2blog

I built this for myself. I do a lot of tutorial videos and I wanted a tool that would turn a recording into a structured blog post without me having to write one from scratch.

The tool sends the video to a vision model with a detailed prompt: target structure, tone, format, and an instruction to flag moments where a screenshot would add value. The model returns timestamps — it cannot extract frames itself, but it tells you exactly where to look, and you pull them locally with ffmpeg.

That creates one architectural quirk worth knowing: the video lives in two places. ffmpeg needs it locally to extract frames. The hosted model needs it uploaded to analyze content. For a one-evening project it works well enough, and I use it often enough that it has paid for itself many times over.

After the first draft, you stay in a conversation loop: change the tone, translate to French, swap a timestamp, restructure a section. The model holds context and iterates with you until the result is what you want.

Demo 4: Video Analyzer — Search and Query Your Video Library

Source: reka-ai/api-examples-dotnet

Most video search runs on titles, descriptions, and transcribed audio. This demo searches by what is actually visible on screen.

The app pre-indexes a video library by sending each video through a vision model ahead of time. When a query arrives, the heavy work is already done. A search for "robot arm" returns the right video — a clip of a robotic arm animation. It also returns a false positive: fast-moving hands apparently looked close enough to fool the model. Useful, not perfect, and worth designing around in your UX.

The Q&A feature goes further. You pick a video and ask a specific question. "What database was used?" returned MySQL — and noted it was running in a Docker container. The model identified that from watching the screen, not from audio. No transcript needed.

From there, you can generate study materials from any recorded session. The demo produces a multiple-choice quiz with answer options, correct answers, and explanations. The model is doing comprehension, not transcription.

Demo 5: Roast My Life — What the Model Actually Sees

Source: reka-ai/api-examples-python

I never mentioned the pictures on my wall. The model did.

In a video about Python and AI, the model's generated blog post made a remark about the artwork hanging behind me. I had said nothing about it. The model noticed, mentioned it, and moved on as if it were obvious.

Then there was the t-shirt moment described at the top of this post. A partial logo, half out of frame, no mention of it anywhere in the audio — and the model connected it to the topic anyway.

This demo is named Roast My Life because the model ends up commenting on things you never intended to share. But the real point is what it reveals: a vision model is not a smarter transcript. It is watching. The larger models do this particularly well, and once you see it, it changes how you think about what these tools can do — and what they will pick up without you asking.

Demo 6: N8N Automation — No-Code Video Clipping Pipeline

Sources: N8N Reka Vision integration

Vision AI does not always need custom code. This demo wires everything together in N8N, a visual workflow tool, with no programming required.

The trigger is a new video published to YouTube. The workflow finds an engaging clip, reformats it from horizontal to vertical, adds captions in a specific style (all lowercase, specific colors — chosen to be obviously distinct from any default), and sends an email with the finished clip attached. The whole thing runs automatically.

For developers, this pattern is worth knowing even if you code everything else. Many real business workflows have a vision AI step that fits cleanly into a larger automation, and a no-code tool is often the fastest way to ship it.

Watch the Full Talk

The demos above are the written version. The live version, with the actual code running, models responding in real time, and a few things going sideways in interesting ways, is on YouTube.

All the Code

The demos span Python, C#, raw HTTP, Go, and N8N. Vision AI is not tied to a specific stack — if your environment can make an HTTP request, it can call a vision model.

All projects:

Reading Notes #693

Labels: agent, ai, blazor, company, docker, dotnet, fluentui, Migration, readingnotes, reka, sanbox, sandbox, Security, smart glasses, Tools, vision

I'm always on the lookout for innovative ways to enhance my coding experience, and this week's Reading Notes are filled with exciting discoveries! From cutting-edge UI libraries to secure sandbox environments for AI agents, I've curated a selection of articles that showcase the latest programming trends and technologies.

Whether you're interested in harnessing the power of Docker sandboxes or exploring the potential of smart glasses integration, there's something on this list for everyone.

Programming

What's new for the Microsoft Fluent UI Blazor library 5.0 RC2 (Vincent Baaij) - V5 already! Wow, time flies! Tons of features in this release, looking forward to try the pinned column from the grid.

AI

Docker Sandboxes: Run Agents in YOLO Mode, Safely (Eric Jia, Srini Sekaran,Timir Karia) - From what I saw, it is very secure, and it is that simple to set up. Looking forward to do more with it.
Running AI agents safely in a microVM using docker sandbox (Andrew Lock) - This post answers so many questions! I heard a lot about those sandboxes and tried a few things, but had so many questions. Great post Andrew.
Experimenting with AI subagents (Nicolas Fränkel) - An interesting story with what seems to be a good vision of the future.
Old Protocols, New Pipes (Mark Downie) - Well said! And I hope you had a good time doing it.
I Recorded 13 Hours of My Day With Smart Glasses for AI. Here's What I Built and What I Learned (tash-2s) - A really cool project that uses AI to make a pile of videos into some kind of dataset. A new to do daily journaling.

Miscellaneous

Claude Code Windows Migration Guide: Move Your Setup (Dirk Strauss) - There is always a part that is easy in starting a new computer and a part that requires more effort.

~frank

How to Serve a Vision AI Model Locally with vLLM and Reka Edge

Labels: ai, local, oss, reka, vision, vllm

Running an AI model as a one-shot script is useful, but it forces you to restart the model every time you need a result. Setting it up as a service lets any application send requests to it continuously, without reloading the model. This guide shows how to serve Reka Edge using vLLM and an open-source plugin, then connect a web app to it for image description and object detection.

The vLLM plugin is available at github.com/reka-ai/vllm-reka.
The demo Media Library app is at github.com/fboucher/media-library.

Prerequisites

You need a machine with a GPU and either Linux, macOS, or Windows (with WSL). I use UV, a fast Python package and project manager, or pip + venv if you prefer.

Clone the vLLM Reka Plugin

Reka models require a dedicated plugin to run under vLLM, not all models need this extra step, but Reka's architecture requires it. Clone the plugin repository and enter the directory:

git clone https://github.com/reka-ai/vllm-reka
cd vllm-reka

The repository contains the plugin code and a serve.sh script you will use to start the service.

Download the Reka Edge Model

Before starting the service, you need the model weights locally. Install the Hugging Face Hub CLI and use it to pull the reka-edge-2603 model into your project directory:

uv sync
uv pip install huggingface_hub
uvx hf download RekaAI/reka-edge-2603 --local-dir ./models/reka-edge-2603

This is a large model, so make sure you have enough disk space and a stable connection.

Start the Service

Once the model is downloaded, start the vLLM service using the serve.sh script included in the plugin:

uv run bash serve.sh ./models/reka-edge-2603

The script accepts environment variables to configure which model to load and how much GPU memory to allocate. If your GPU cannot fit the model at default settings, open serve.sh and adjust the variables at the top. The repository README lists the available options. The service takes a few seconds to load the model weights, then starts listening for HTTP requests.

As an example with an NVIDIA GeForce RTX 5070, here are the settings I used to run the model:

export GPU_MEM=0.80
export MAX_LEN=4096
export MAX_BATCH_TOKENS=4096
export MAX_IMAGES=2
export MAX_VIDEOS=1
export VIDEO_NUM_FRAMES=4
uv run bash serve.sh ./models/reka-edge-2603

Connect the Media Library App

With the backend running, time to start the Media Library app. Clone the repository, jump into the directory, and run it with Docker:

git clone https://github.com/fboucher/media-library
cd media-library
docker compose up --build -d

Open http://localhost:8080 in your browser, then add a new connection with these settings:

Name: local (or any label you want)
IP address: your machine's local network IP (e.g. 192.168.x.x)
API key: leave blank or enter anything — no key is required for a local connection
Model: reka-edge-2603

Click Test to confirm the connection, then save it.

Try It: Image Description and Object Detection

Select an image in the app and choose your local connection, then click Fill with AI. The app sends the image to your vLLM service, and the model returns a natural language description. You can watch the request hit your backend in the terminal where the service is running.

Reka Edge also supports object detection. Type a prompt asking the model to locate a specific feature (ex: "face") and the model returns bounding-box coordinates. The app renders these as red boxes overlaid on the image. This works for any region you can describe in a prompt.

Switch to the Reka Cloud API

If your local GPU is too slow for production use, you can point the app at the Reka APIs instead. Add a new connection in the app and set the base URL to the Reka API endpoint. Get your API key from platform.reka.ai. OpenRouter is another option if you prefer a unified API across providers.

The model name stays the same (reka-edge-2603), so switching between local and cloud is just a matter of selecting a different connection in the app. The cloud API is noticeably faster because Reka's servers are more powerful than a local GPU (at least mine :) ). During development, use the local service to avoid burning credits; switch to the API for speed when you need it.

What You Can Build

The service you just set up accepts any image, or video via HTTP — point a script at a folder and you have a batch pipeline for descriptions, tags, or bounding boxes. Swap the prompt and you change what it extracts. The workflow is the same whether you are running locally or through the API.

References

Reka Edge model: huggingface.co/RekaAI/reka-edge-2603
vLLM Reka plugin: github.com/reka-ai/vllm-reka
Media Library app: github.com/fboucher/media-library
Reka API platform: platform.reka.ai

Private Vision AI: Run Reka Edge Entirely on Your Machine

Labels: ai, edge, huggingface, local, oss, python, vision

Reka just released Reka Edge, a compact but powerful vision-language model that runs entirely on your own machine. No API keys, no cloud, no data leaving your computer. I work at Reka and putting together this tutorial was genuinely fun; I hope you enjoy running it as much as I did.

[Originally published at dev.to/reka]

In three steps, you'll go from zero to asking an AI what's in any image or video.

What You'll Need

A machine with enough RAM to run a 7B parameter model (~16 GB recommended)
Git
uv, a fast Python package manager. Install it with:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```
This works on macOS, Linux, and Windows (WSL). If you're on Windows without WSL, grab the Windows installer instead.

Step 1: Get the Model and Inference Code

Clone the Reka Edge repository from Hugging Face. This includes both the model weights and the inference code:

git clone https://huggingface.co/RekaAI/reka-edge-2603
cd reka-edge-2603

Step 2: Fetch the Large Files

Hugging Face stores large files (model weights and images) using Git LFS. After cloning, these files exist on disk but contain only small pointer files, not the actual content.

First, make sure Git LFS is installed. The command varies by platform:

# macOS
brew install git-lfs

# Linux / WSL (Ubuntu/Debian)
sudo apt install git-lfs

Then initialize it:

git lfs install

Then pull all large files, including model weights and media samples:

git lfs pull

Grab a coffee while it downloads, the model weights are several GB.

Step 3: Ask the Model About an Image or Video

To analyze an image, use the sample included in the media/ folder:

uv run example.py \
  --image ./media/hamburger.jpg \
  --prompt "What is in this image?"

Or pass a video with --video:

uv run example.py \
  --video ./media/many_penguins.mp4 \
  --prompt "What is in this?"

The model will load, process your input, and print a description, all locally, all private.

Try different prompts to unlock more:

"Describe this scene in detail."
"What text is visible in this image?"
"Is there anything unusual or unexpected here?"

What's Actually Happening?

You don't need this to use the model, but if you're anything like me and can't help wondering what's going on under the hood, here's the magic behind example.py:

1. It picks the best hardware available. The script checks whether your machine has a GPU (CUDA for Nvidia, Metal for Apple Silicon) and uses it automatically. If neither is available, it falls back to the CPU. This affects speed, not quality.

if torch.cuda.is_available():
    device = torch.device("cuda")
elif mps_ok:
    device = torch.device("mps")
else:
    device = torch.device("cpu")

2. It loads the model into memory. The 7 billion parameter model is read from the folder you cloned. This is the "weights": billions of numbers that encode everything the model has learned. Loading takes ~30 seconds depending on your hardware.

processor = AutoProcessor.from_pretrained(args.model, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(args.model, ...).eval()

3. It packages your input into a structured message. Your image (or video) and your text prompt are wrapped together into a conversation-style format, the same way a chat message works, except one part is visual instead of text.

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": args.image},
        {"type": "text", "text": args.prompt},
    ],
}]

4. It converts everything into numbers. The processor translates your image into a grid of numerical patches and your prompt into tokens (small chunks of text, each mapped to a number). The model only understands numbers, so this step bridges the gap.

inputs = processor.apply_chat_template(
    messages, tokenize=True, return_tensors="pt", return_dict=True
)

5. The model generates a response, token by token. Starting from your input, the model predicts the most likely next word, then the next, up to 256 tokens. It stops when it hits a natural end-of-response marker.

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

6. It converts the numbers back into text and prints it. The token IDs are decoded back into human-readable words and printed to your terminal. No internet involved at any point.

output_text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
print(output_text)

Here the video

If you prefer watching and reading, here is the video version:

That's Pretty Cool, Right?

A single script. No API key. No cloud. You just ran a 7 billion parameter vision-language model entirely on your own machine, and it works whether you're on a Mac, Linux, or Windows with WSL, which is what I was using when I wrote this.

This works great as a one-off script: drop in a file, ask a question, get an answer. But what if you wanted to build something on top of it? A web app, a tool that watches a folder, or anything that needs to talk to the model repeatedly?

That's exactly what the next post is about. I'll show you how to wrap Edge as a local API, so instead of running a script, you have a service running on your machine that any app can plug into. Same model, same privacy, but now it's a proper building block.

~frank

AI Vision: Turning Your Videos into Comedy Gold (or Cringe)

Labels: ai, api, demo, github, post, reka, video, vision

I've spent most of my career building software in C# and .NET, and only used Python in IoT projects. When I wanted to build a fun project—an app that uses AI to roast videos, I knew it was the perfect opportunity to finally dig into Python web development.

The question was: where do I start? I hopped into a brainstorming session with Reka's AI chat and asked about options for building web apps in Python. It mentioned Flask, and I remembered friends talking about it being lightweight and perfect for getting started. That sounded right.

In this post, I share how I built "Roast My Life," a Flask app using the Reka Vision API.

The Vision (Pun Intended)

The app needed three core things:

List videos: Show me what videos are in my collection
Upload videos: Let me add new ones via URL
Roast a video: Send a selected video to an AI and get back some hilarious commentary

See it in action

Part 1: Getting Started Environment Setup

The first hurdle was always going to be environment setup. I'm serious about keeping my Python projects isolated, so I did the standard dance:

Before even touching dependencies, I scaffolded a super bare-bones Flask app. Then one thing I enjoy from C# is that all dependencies are brought in one shot, so I like doing the same with my python projects using requirements.txt instead of installing things ad‑hoc (pip install flask then later freezing).

Dropping that file in first means the setup snippet below is deterministic. When you run pip install -r requirements.txt, Flask spins up using the exact versions I tested with, and you won't accidentally grab a breaking major update.

Here's the shell dance that activates the virtual environment and installs everything:

cd roast_my_life/workshop
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then came the configuration. We will need an API key and I don't want to have it hardoced so I created a .env file to store my API credentials:

API_KEY=YOUR_REKA_API_KEY
BASE_URL=https://vision-agent.api.reka.ai

To get that API key, I visited the Reka Platform and grabbed a free one. Seriously, a free key for playing with AI vision APIs? I was in.

With python app.py, I fired up the Flask development server and opened http://127.0.0.1:5000 in my browser. The UI was there, but... it was dead. Nothing worked.

Perfect. Time to build.

The Backend: Flask Routing and API Integration

Coming from ASP.NET Core's controller-based routing and Blazor, Flask's decorator-based approach felt just like home. All the code code goes in the app.py file, and each route is defined with a simple decorator. But first things first: loading configuration from the .env file using python-dotenv:

from flask import Flask, request, jsonify
import requests
import os
from dotenv import load_dotenv

app = Flask(__name__)

# Load environment variables (like appsettings.json)
load_dotenv()
api_key = os.environ.get('API_KEY')
base_url = os.environ.get('BASE_URL')

All the imports packages are the same ones that needs to be in the requirements.txt. And we retreive the API key and base URL from environment variables, just like in .NET Core.

Now, to be able to get roasted we need first to upload a video to the Reka Vision API. Here's the code—I'll go over some details after.


@app.route('/api/upload_video', methods=['POST'])
def upload_video():
    """Upload a video to Reka Vision API"""
    data = request.get_json() or {}
    video_name = data.get('video_name', '').strip()
    video_url = data.get('video_url', '').strip()
    
    if not video_name or not video_url:
        return jsonify({"error": "Both video_name and video_url are required"}), 400
    
    if not api_key:
        return jsonify({"error": "API key not configured"}), 500
    
    try:
        response = requests.post(
            f"{base_url.rstrip('/')}/videos/upload",
            headers={"X-Api-Key": api_key},
            data={
                'video_name': video_name,
                'index': 'true',  # Required: tells Reka to process the video
                'video_url': video_url
            },
            timeout=30
        )
        
        response_data = response.json() if response.ok else {}
        
        if response.ok:
            video_id = response_data.get('video_id', 'unknown')
            return jsonify({
                "success": True,
                "video_id": video_id,
                "message": "Video uploaded successfully"
            })
        else:
            error_msg = response_data.get('error', f"HTTP {response.status_code}")
            return jsonify({"success": False, "error": error_msg}), response.status_code
            
    except requests.Timeout:
        return jsonify({"success": False, "error": "Request timed out"}), 504
    except Exception as e:
        return jsonify({"success": False, "error": f"Upload failed: {str(e)}"}), 500

Once the information from the frontend is validated we make a POST request to the Reka Vision API's /videos/upload endpoint. The parameters are sent as form data, and we include the API key in the headers for authentication. Here I was using URLs to upload videos, but you can also upload local files by adjusting the request accordingly. As you can see, it's pretty straightforward, and the documentation from Reka made it easy to understand what was needed.

The Magic: Sending Roast Requests to Reka Vision API

Here's where things get interesting. Once a video is uploaded, we can ask the AI to analyze it and generate content. The Reka Vision API supports conversational queries about video content:

def call_reka_vision_qa(video_id: str) -> Dict[str, Any]:
    """Call the Reka Video QA API to generate a roast"""
    
    headers = {'X-Api-Key': api_key} if api_key else {}
    
    payload = {
        "video_id": video_id,
        "messages": [
            {
                "role": "user",
                "content": "Write a funny and gentle roast about the person, or the voice in this video. Reply in markdown format."
            }
        ]
    }
    
    try:
        resp = requests.post(
            f"{base_url}/qa/chat",
            headers=headers,
            json=payload,
            timeout=30
        )
        
        data = resp.json() if resp.ok else {"error": f"HTTP {resp.status_code}"}
        
        if not resp.ok and 'error' not in data:
            data['error'] = f"HTTP {resp.status_code} calling chat endpoint"
        
        return data
        
    except requests.Timeout:
        return {"error": "Request to chat API timed out"}
    except Exception as e:
        return {"error": f"Chat API call failed: {e}"}

Here we pass the video ID and a prompt asking for a "funny and gentle roast." The API responds with AI-generated content, which we can then send back to the frontend for display. I try to give more "freedom" to the AI by asking it to reply in markdown format, which makes the output more engaging.

Try It Yourself!

The complete project is available on GitHub: reka-ai/api-examples-python

What Makes Reka Vision API So nice to use

What really stood out to me was how approachable the Reka Vision API is. You don't need any special SDK—just the requests library making standard HTTP calls. And honestly, it doesn't matter what language you're used to; an HTTP call is pretty much always simple to do. Whether you're coming from .NET, Python, JavaScript, or anything else, you're just sending JSON and getting JSON back.

Authentication is refreshingly straightforward: just pop your API key in the header and you're good to go. No complex SDKs, no multi-step authentication flows, no wrestling with binary data streams. The conversational interface lets you ask questions in natural language, and you get back structured JSON responses with clear fields.

One thing worth noting: in this example, the videos are pre-uploaded and indexed, which means the responses come back fast. But here's the impressive part—the AI actually looks at the video content. It's not just reading a transcript or metadata; it's genuinely analyzing the visual elements. That's what makes the roasts so spot-on and contextual.

Final Thoughts

The Reka Vision API itself deserves credit for making video AI accessible. No complicated SDKs, no multi-GB model downloads, no GPU requirements. Just simple HTTP requests and powerful AI capabilities. I'm not saying I'm switching to Python full-time, but expect to see me sharing more Python projects in the future!

References and Resources

GitHub Repository: reka-ai/api-examples-python
Join Reka community and share what you build! You can find me there as fboucheros.

Reading Notes #415

Labels: ai, api, azure, cloud, cognitiveservices, docker, dotnet, function, google, iot, mongo, python, readingnotes, terminal, vision, visualstudio, vso, windows, wsl

Every Monday, I share my "reading notes". Those are the articles, blog posts, podcast episodes, and books that catch my interest during the week and that I found interesting. It's a mix of the actuality and what I consumed.

Cloud

Creating a RESTful Serverless API with Azure Functions and MongoDB (Will Velida) - This is a great tutorial that uses Azure function mongo and c# to build a project from scratch. It is simple enough to make sense and be easy to understand.

Running Containers in Azure with Rancher (Tom Callway) - This post introduces Rancher, a tool thatt looks really interesting.

Step-by-Step: Finding and Analyzing Microsoft Azure Cloud Usage Costs (Sonia Cuff ) - Great post that provides a few tips to better see or highlight some cost sending in Azure.

Programming

Build your own Azure CLI Extensions (Michael Crump) - A very cool tutorial that explains how to do our own extension.

How to set up Docker within Windows System for Linux (WSL2) on Windows 10 (Scott Hanselman) - Nice update; tons of info about wsl, docker, ssh and more.

Pulling your DEV.TO Stats into a Google Sheet (Jeremy Morgan) - That's a cool little project, and I'm pretty sure it can be used by a lot of authors...on devto.

Using Cognitive Services: Custom Vision Service with Azure IoT Edge - Cloud | Mobile | Edge (Jared Rhodes) - A nice tutorial that shows how to use Python and vscode to create an awesome app.

What's New in Visual Studio Online | Visual Studio Blog (Allison Buchholtz-Au) - Wow, a turbo mode! But this time(remember the turbo on our old 486 beige pc), it makes totally sens...

Miscellaneous

What are Azure CLI Extensions? (Michael Crump) - An interesting first article of a series. This one introduces us to the extension... Hmmm. I think I have an idea.

Reading Notes #382

Labels: azure, blazor, books, cloud, cognitiveservices, container, csharp, dependency, di, EF, function, injection, kubernetes, readingnotes, storage azcopy, sync, vision

Cloud

Getting Started with the Cognitive Services Computer Vision API - The Wit and Ramblings of David Giard (David Giard ) - Nice little tutorial to get started quickly with the Vision API.
Kubernetes: What it is and what it isn’t (Jason Haley) - Cool source of resources related to how to learn Kubernetes
Sync Folder with Azure Blob Storage (Thomas Maurer) - Wow, this is so sick! Very happy about this feature.
Using Entity Framework with Azure Functions (Jeff Hollan) - Pretty cool demo about the new dependency injection feature of Azure Function.

Programming

Clever little C# and ASP.NET Core features that make me happy (Scott Hanselman) - Nice post about some C# gems...If I can say that.
10 Top Blazor Tools Just a NuGet Away -- Visual Studio Magazine (David Ramel) - Wow, so many great projects...
Top 10 C# Developer Books for Summer 2019 (Claudio Bernasconi) - Great list of books to get started with C# or as a developer. I'll definitely refer people to it when asked where to start.

Reading Notes #331

Labels: api, architrcture, azure, cloud, container, github, instance, linux, logicapps, microsoft, readingnotes, vision, vscode, windows

Cloud

Advanced Architecture for ASP.NET Core Web API (Chris Woodruff) - Nice and detailed post that describes the architecture pattern.
Analyzing images using Azure Cognitive Services (Gunnar Pei[man) - Cool post that introduces the API and shows some examples.
Cloning Azure Logic App to create a new one (Abhijit Jana) - This is soooooo sick! (In a good way)
Getting Started with Azure Container Instances (Mark Heath) - Nice new course available on Pluralsight.
Quickly rollback Azure Logic Apps to the Previous Version (Abhijit Jana) - Really cool!

Programming

Carriage Returns and Line Feeds will ultimately bite you - Some Git Tips (Scott Hanselman) - I didn't know that what I'm doing (coding CLI) was weird... But yeah I can feel the pain.
How I Increased My Productivity With Visual Studio Code (Michael Hoffmann) - Nice little post, learn how to get better with VsCode.
Microsoft + GitHub = Empowering Developers (Satya Nadella) - This was a big news this week. I profoundly think its for the best.

Books

Wait, What?: And Life's Other Essential Questions (James E. Ryan) - Simple, short, but very interesting book.

Miscellaneous

When Do People Read Blogs? The Best Times to Post (null) - Looks like timing may not be everything... But it's a big part of the equation.

Reading Notes #321

Labels: ai, angular, arm, aspnet, azure, Backup, bestpractices, cloud, cognitiveservices, database, dotnetcore, google, k8s, markdown, microservice, microsoft, nodejs, readingnotes, vision

Suggestion of the week

Creating an ASP.NET Core Markdown TagHelper and Parser (Rick Strahl) - Amazing post that shares a cool taghelper for our projects, and explains all steps to build it.

Cloud

Beyond The Obvious With API Vision On Cognitive Services (Víctor Moreno ) - This post is a nice and details overview of the cognitive services.
Deployment Pipelines For Versioned Azure Resource Manager Template Deployments ( Peter Groenewegen) - Great post that explains all the best practices to manage healthy ARM template.
Differences between Azure Functions v1 and v2 in C# (Damien Guard) - Nice overview showing the differences in the code.
Secure your backups, not just your data! (Aruna Somendra) - I do most of that already...but yeah not all of it. What's YOUR backup plan?

Programming

Azure vs GCP part 15: Microservices (Kenichiro Nakamura) - Interesting post that compares two microservice platforms... A must.
Converting a Ghost blog to a Progressive Web App (Dean Hume) - Very interesting post that explains every step in its conversion journey.
How to deploy Angular application to GitHub (Dhananjay Kumar) - Nice post to get started with Angular and Angular CLI.
Demystifying Serverless with Cecil Phillip (Scott Hanselman) - Very interesting discussion about serverless or less server programming...

Databases

The world of Automatic Tuning – SQL Server 2017 | All About SQL on WordPress.com (blobeater) - I was also surprised, but yes it's true the auto tunning really works.

Miscellaneous

From Windows to the Cloud (Tim Sneath) - Wow! Well... Speechless, just read it.

Books

When: The Scientific Secrets of Perfect Timing (Daniel H. Pink)

A really amazing book packed of very interesting advice. Things that you kind of already knew, or at least had a feeling you maybe knew are clearly explained to you.

After reading (or listening) this book, you will know why, and you can decide to fight it or change the when... improve your performance and use your time and energy on something else.

ISBN: 0525589333

Reading Notes #319

Labels: analitics, api, ApplicationInsights, arm, azure, cloud, database, dotnetcore, manageinstance, metric, ml, mysql, postgrest, readingnotes, swagger, Tools, vision, xamarin

Cloud

Azure Log Analytics ML: Using the evaluate operator with the app() or workspace() scope function (Iris Classon) - A nice blog post that shares some beat stuff to make our queries to work.
Custom Vision service deployment on Azure with ARM template (Vivien Chevallier) - The perfect post to get started following the best practices.

Programming

Setting up Application Insights took 10 minutes. It created two days of work for me. (Scott Hanselman) - Application Insights is really an incredible tool. A must-have in any online project... Not only the cloud one, just need to have internet access.
Using Swagger to automatically generate the client code (Evgeny Zborovsky) - A nice tutorial that explains how to do cool stuff with swagger.

Databases

Making Azure the best place for all your applications and data (Rohan Kumar) - A few great products that turn GA are announced in this post.

Miscellaneous

Finding What's Changed in Your Code (Peter Vogel) - This post is like a big slice of wisdom cake...
Follow Your Passion Is WRONG? (What About MONEY?) (John Sonmez) - Follow your passion or what I personally refer to "fun".It's very important.

Reading Notes #317

Labels: azure, cli, cloud, cognitive services, cognitiveservices, databases, github, Migration, opensource, readingnotes, sql, storage, training, vision

Suggestion of the week

Migrating to Azure SQL Database with zero downtime for read-only workloads (Joseph Sack) - Learn the best practices and more about the tools available to you in this great post.

Cloud

Analyzing a karting image with Azure Cognitive Services (Vivien Chevallier) - It's time to play with these Cognitive services. This post is a nice quick start.
Microsoft updates Cognitive Services terms (Microsoft Azure) - Good to know.
Oops! I Deleted My Blobs! What Can I Do? (Gaurav Mantri) - A nice new feature! Is it time to upgrade all our cloud solutions to use it? It could be a good idea.

Programming

az webapp new - Azure CLI extension to create and deploy a .NET Core or nodejs site in one command (Scott Hanselman) - Azure CLI is a super handy tool. New commands are being created, it's time for us developers to get involve read this post and clone that GitHub repo!

Miscellaneous

10 Characteristics of a Successful Internal IT Champion (Richard Seroter) - Awesome post! Next time changes are in the pipeline, you should think about these 10 little things.
Azure Cosmos DB training webinars | Azure Roadmap | Microsoft Azure (Microsoft Team) - Fantastic news, I know a lot of people were waiting for this kind of training.

Pages

Vision AI Is Accessible Now

What You Can Build With This

Demo 1: Caption This — Generate a Prompt from Any Image

Demo 2: Media Library — Compare Vision Models Side by Side

Demo 3: Video2Blog — Turn a Video into a Structured Post

Demo 4: Video Analyzer — Search and Query Your Video Library

Demo 5: Roast My Life — What the Model Actually Sees

Demo 6: N8N Automation — No-Code Video Clipping Pipeline

Watch the Full Talk

All the Code

Programming

AI

Miscellaneous

Prerequisites

Clone the vLLM Reka Plugin

Download the Reka Edge Model

Start the Service

Connect the Media Library App

Try It: Image Description and Object Detection

Switch to the Reka Cloud API

What You Can Build

References

What You'll Need

Step 1: Get the Model and Inference Code

Step 2: Fetch the Large Files

Step 3: Ask the Model About an Image or Video

What's Actually Happening?

Here the video

That's Pretty Cool, Right?

The Vision (Pun Intended)

See it in action

Part 1: Getting Started Environment Setup

The Backend: Flask Routing and API Integration

The Magic: Sending Roast Requests to Reka Vision API

Try It Yourself!

What Makes Reka Vision API So nice to use

Final Thoughts

References and Resources

Cloud

Programming

Miscellaneous

Cloud

Programming

Cloud

Programming

Books

Miscellaneous

Suggestion of the week

Cloud

Programming

Databases

Miscellaneous

Books

Cloud

Programming

Databases

Miscellaneous

Suggestion of the week

Cloud

Programming

Miscellaneous