AI Vision: Turning Your Videos into Comedy Gold (or Cringe)

I've spent most of my career building software in C# and .NET, and only used Python in IoT projects. When I wanted to build a fun project, an app that uses AI to roast videos, I knew it was the perfect opportunity to finally dig into Python web development.

The question was: where do I start? I hopped into a brainstorming session with Reka's AI chat and asked about options for building web apps in Python. It mentioned Flask, and I remembered friends talking about it being lightweight and perfect for getting started. That sounded right.

In this post, I share how I built "Roast My Life," a Flask app using the Reka Vision API.

The Vision (Pun Intended)

The app needed three core things:

  1. List videos: Show me what videos are in my collection
  2. Upload videos: Let me add new ones via URL
  3. Roast a video: Send a selected video to an AI and get back some hilarious commentary

See it in action

Part 1: Getting Started with Environment Setup

The first hurdle was always going to be environment setup. I'm serious about keeping my Python projects isolated, so I did the standard virtual environment dance.

Before even touching dependencies, I scaffolded a super bare-bones Flask app. One thing I enjoy in C# is that all dependencies are brought in in one shot, so I like doing the same in my Python projects with requirements.txt instead of installing things ad hoc (pip install flask, then freezing later).

Dropping that file in first means the setup snippet below is deterministic. When you run pip install -r requirements.txt, pip installs the exact versions I tested with (as long as they are pinned in the file), and you won't accidentally grab a breaking major update.
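If you want to double-check that a requirements file really pins everything, a few lines of Python can do it. This is only a sketch: the find_unpinned helper and the sample file contents (including the version numbers) are made up for illustration and are not the project's actual requirements.txt.

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as "-r other.txt"
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical file contents; the version numbers are placeholders
sample = """
flask==3.0.0
requests>=2.0
python-dotenv
"""
print(find_unpinned(sample))
```

Anything the function returns is a dependency that could drift to a newer release on the next install.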

Here's the shell dance that activates the virtual environment and installs everything:

cd roast_my_life/workshop
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then came the configuration. We need an API key, and I didn't want it hardcoded, so I created a .env file to store my API credentials:

API_KEY=YOUR_REKA_API_KEY
BASE_URL=https://vision-agent.api.reka.ai

To get that API key, I visited the Reka Platform and grabbed a free one. Seriously, a free key for playing with AI vision APIs? I was in.

With python app.py, I fired up the Flask development server and opened http://127.0.0.1:5000 in my browser. The UI was there, but... it was dead. Nothing worked.

Perfect. Time to build.

The Backend: Flask Routing and API Integration

Coming from ASP.NET Core's controller-based routing and Blazor, Flask's decorator-based approach felt just like home. All the code goes in the app.py file, and each route is defined with a simple decorator. But first things first: loading configuration from the .env file using python-dotenv:

from flask import Flask, request, jsonify
import requests
import os
from dotenv import load_dotenv

app = Flask(__name__)

# Load environment variables (like appsettings.json)
load_dotenv()
api_key = os.environ.get('API_KEY')
base_url = os.environ.get('BASE_URL')

The third-party packages imported here (flask, requests, and python-dotenv, which provides the dotenv module) are the same ones that need to be listed in requirements.txt. We then retrieve the API key and base URL from environment variables, just like in .NET Core.
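Since os.environ.get quietly returns None for a missing variable, it can be handy to fail fast at startup instead of discovering the problem on the first request. Here is a minimal sketch of that idea; load_config is a hypothetical helper of mine, not something from the project, and it would be called right after load_dotenv().

```python
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the two settings the app needs and fail fast if either is missing."""
    required = ("API_KEY", "BASE_URL")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {
        "api_key": env["API_KEY"].strip(),
        # Normalize once so routes can append paths without double slashes
        "base_url": env["BASE_URL"].strip().rstrip("/"),
    }


# Example with an explicit mapping instead of the real environment
config = load_config({"API_KEY": "demo-key", "BASE_URL": "https://vision-agent.api.reka.ai/"})
print(config["base_url"])
```

Passing the mapping in as a parameter keeps the function easy to test without touching real environment variables.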

Now, to get roasted, we first need to upload a video to the Reka Vision API. Here's the code; I'll go over some details after.

@app.route('/api/upload_video', methods=['POST'])
def upload_video():
    """Upload a video to Reka Vision API"""
    data = request.get_json() or {}
    video_name = data.get('video_name', '').strip()
    video_url = data.get('video_url', '').strip()
    
    if not video_name or not video_url:
        return jsonify({"error": "Both video_name and video_url are required"}), 400
    
    if not api_key:
        return jsonify({"error": "API key not configured"}), 500
    
    try:
        response = requests.post(
            f"{base_url.rstrip('/')}/videos/upload",
            headers={"X-Api-Key": api_key},
            data={
                'video_name': video_name,
                'index': 'true',  # Required: tells Reka to process the video
                'video_url': video_url
            },
            timeout=30
        )
        
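        if response.ok:
            response_data = response.json()
        else:
            # Keep the API's error detail when the failure body is JSON
            try:
                response_data = response.json()
            except ValueError:
                response_data = {}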
        
        if response.ok:
            video_id = response_data.get('video_id', 'unknown')
            return jsonify({
                "success": True,
                "video_id": video_id,
                "message": "Video uploaded successfully"
            })
        else:
            error_msg = response_data.get('error', f"HTTP {response.status_code}")
            return jsonify({"success": False, "error": error_msg}), response.status_code
            
    except requests.Timeout:
        return jsonify({"success": False, "error": "Request timed out"}), 504
    except Exception as e:
        return jsonify({"success": False, "error": f"Upload failed: {str(e)}"}), 500

Once the information from the frontend is validated, we make a POST request to the Reka Vision API's /videos/upload endpoint. The parameters are sent as form data, and we include the API key in the headers for authentication. Here I used URLs to upload videos, but you can also upload local files by adjusting the request accordingly. As you can see, it's pretty straightforward, and the documentation from Reka made it easy to understand what was needed.
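The validation step can also live in a small pure function, which makes it easy to test without spinning up Flask. The sketch below is my own variation, not the code from the project: validate_upload is a hypothetical name, and the http(s) check is an extra assumption that goes beyond the "both fields are required" rule in the route.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def validate_upload(data: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Return (cleaned_fields, None) on success or (None, error_message)."""
    video_name = str(data.get("video_name") or "").strip()
    video_url = str(data.get("video_url") or "").strip()

    if not video_name or not video_url:
        return None, "Both video_name and video_url are required"

    parsed = urlparse(video_url)
    # Extra check beyond the original route: only accept absolute http(s) URLs
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None, "video_url must be an absolute http or https URL"

    return {"video_name": video_name, "video_url": video_url}, None


print(validate_upload({"video_name": "demo", "video_url": "https://example.com/a.mp4"}))
print(validate_upload({"video_name": "demo", "video_url": "not-a-url"}))
```

Inside the route, a non-empty error message would translate into the same 400 response as before.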

The Magic: Sending Roast Requests to Reka Vision API

Here's where things get interesting. Once a video is uploaded, we can ask the AI to analyze it and generate content. The Reka Vision API supports conversational queries about video content:

from typing import Any, Dict

def call_reka_vision_qa(video_id: str) -> Dict[str, Any]:
    """Call the Reka Video QA API to generate a roast"""
    
    headers = {'X-Api-Key': api_key} if api_key else {}
    
    payload = {
        "video_id": video_id,
        "messages": [
            {
                "role": "user",
                "content": "Write a funny and gentle roast about the person, or the voice in this video. Reply in markdown format."
            }
        ]
    }
    
    try:
        resp = requests.post(
            f"{base_url}/qa/chat",
            headers=headers,
            json=payload,
            timeout=30
        )
        
        if resp.ok:
            data = resp.json()
        else:
            # Keep any error detail the API returned; fall back to the status code
            try:
                data = resp.json()
            except ValueError:
                data = {}
            if 'error' not in data:
                data['error'] = f"HTTP {resp.status_code} calling chat endpoint"
        
        return data
        
    except requests.Timeout:
        return {"error": "Request to chat API timed out"}
    except Exception as e:
        return {"error": f"Chat API call failed: {e}"}

Here we pass the video ID and a prompt asking for a "funny and gentle roast." The API responds with AI-generated content, which we can then send back to the frontend for display. I try to give more "freedom" to the AI by asking it to reply in markdown format, which makes the output more engaging.
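Because the whole "personality" of the roast lives in the prompt, it can be worth pulling the payload construction into its own function so the prompt is easy to swap. This is just a sketch that mirrors the payload shown above; build_roast_payload, DEFAULT_PROMPT, and the abc123 video ID are names I made up for illustration.

```python
from typing import Any, Dict

DEFAULT_PROMPT = (
    "Write a funny and gentle roast about the person, or the voice in this video. "
    "Reply in markdown format."
)


def build_roast_payload(video_id: str, prompt: str = DEFAULT_PROMPT) -> Dict[str, Any]:
    """Build the JSON body for the chat call, mirroring the payload shown above."""
    if not video_id:
        raise ValueError("video_id is required")
    return {
        "video_id": video_id,
        "messages": [{"role": "user", "content": prompt}],
    }


# Placeholder video ID and an alternative prompt, purely for illustration
payload = build_roast_payload("abc123", "Roast this video in the style of a sports commentator.")
print(payload["messages"][0]["content"])
```

The returned dictionary can be passed straight to the json= argument of requests.post.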

Try It Yourself!

The complete project is available on GitHub: reka-ai/api-examples-python

What Makes the Reka Vision API So Nice to Use

What really stood out to me was how approachable the Reka Vision API is. You don't need any special SDK—just the requests library making standard HTTP calls. And honestly, it doesn't matter what language you're used to; an HTTP call is pretty much always simple to do. Whether you're coming from .NET, Python, JavaScript, or anything else, you're just sending JSON and getting JSON back.

Authentication is refreshingly straightforward: just pop your API key in the header and you're good to go. No complex SDKs, no multi-step authentication flows, no wrestling with binary data streams. The conversational interface lets you ask questions in natural language, and you get back structured JSON responses with clear fields.
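To illustrate that nothing here depends on a particular HTTP client, the same call can be prepared with only Python's standard library. Treat this as a sketch under the same assumptions as the code earlier in the post (the /qa/chat path and the X-Api-Key header); build_chat_request is a hypothetical helper, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the chat endpoint using only the standard library."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/qa/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,  # same header the requests-based code uses
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and video ID, purely for illustration
req = build_chat_request("https://vision-agent.api.reka.ai", "demo-key", {"video_id": "abc123", "messages": []})
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=30)
```

In practice I would still reach for requests, but it is nice to know the API key header and a JSON body are all it takes.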

One thing worth noting: in this example, the videos are pre-uploaded and indexed, which means the responses come back fast. But here's the impressive part—the AI actually looks at the video content. It's not just reading a transcript or metadata; it's genuinely analyzing the visual elements. That's what makes the roasts so spot-on and contextual.

Final Thoughts

The Reka Vision API itself deserves credit for making video AI accessible. No complicated SDKs, no multi-GB model downloads, no GPU requirements. Just simple HTTP requests and powerful AI capabilities. I'm not saying I'm switching to Python full-time, but expect to see me sharing more Python projects in the future!

References and Resources