Automated video cropping platform

AIClipper.video

A serverless SaaS platform that automatically converts long-form YouTube videos into vertical Shorts, tuned for mixed-language regional audiences.

Outcome

Delivered a fully automated media rendering pipeline that transcribes, isolates viral hooks, and crops landscape videos into portrait formats.

Problem

Content creators spend hours manually transcribing and cropping videos for vertical formats, get poor transcriptions of local slang, and face high compute bills.

Approach

Split video processing into individual steps orchestrated by AWS Step Functions: audio extraction, transcription, AI curation, and rendering.
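As a rough illustration of that split, the four stages could be chained as sequential Task states in an Amazon States Language definition. This is a minimal sketch, not the project's actual state machine: the state names, the ARN prefix, and the retry settings are all assumptions made for the example.

```python
import json

# Hypothetical state names; the source only names the four stages themselves.
STAGES = ["ExtractAudio", "Transcribe", "CurateClips", "RenderClips"]


def build_state_machine(stages, arn_prefix):
    """Chain each stage as a Task state, in order, ending after the last one."""
    states = {}
    for i, name in enumerate(stages):
        state = {
            "Type": "Task",
            # Assumed naming scheme: one Lambda function per stage.
            "Resource": f"{arn_prefix}:{name}",
            # Illustrative retry policy so one failed step can be retried alone.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 5,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if i + 1 < len(stages):
            state["Next"] = stages[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Sketch of a sequential clip pipeline",
        "StartAt": stages[0],
        "States": states,
    }


if __name__ == "__main__":
    # Placeholder account and region, for illustration only.
    prefix = "arn:aws:lambda:us-east-1:123456789012:function"
    print(json.dumps(build_state_machine(STAGES, prefix), indent=2))
```

Because each stage is its own state, a failure in rendering can be retried without repeating transcription, and each function can be given its own memory and timeout settings.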

Architecture

A pnpm monorepo containing a Next.js 15 client authenticated by AWS Cognito, and a backend built on Serverless Framework with AWS Lambda, Step Functions, DynamoDB, Python, Groq Whisper API, DeepSeek curation model, and FFmpeg.

Result

Reduced video editing workflow time from 30 minutes to under 2 minutes per clip.

Lessons learned

Running compute-heavy FFmpeg tasks as independent Step Functions states, rather than inside one monolithic Lambda function, avoids timeout errors and lets each step be scaled and sized on its own.

Constraints

  • Serverless execution limits on Lambda (15-minute maximum per invocation for heavy processing).
  • High-accuracy transcription of mixed Indonesian, English, and slang expressions.
  • Interactive subtitle synchronization.

Technical decisions

  • Used the Groq Whisper API for fast transcription with word-level timestamps.
  • Selected DeepSeek via BytePlus/ModelArk to analyze transcript timelines and identify viral segments.
  • Used Python (yt-dlp) and FFmpeg to segment and crop the original video stream efficiently.
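To make the segment-and-crop step concrete, the sketch below builds an FFmpeg command that cuts one clip and crops it to 9:16. It assumes a simple centered crop window, which the source does not confirm; the function names and file paths are hypothetical. The FFmpeg options used (`-ss`, `-t`, `-i`, `-vf crop=w:h:x:y`, `-c:a copy`) are standard ones.

```python
def portrait_crop_filter(width, height, ratio_w=9, ratio_h=16):
    """Return an FFmpeg crop filter for a centered ratio_w:ratio_h window.

    Assumption: the window keeps the full frame height and is centered
    horizontally. A real pipeline might track the speaker instead.
    """
    crop_w = height * ratio_w // ratio_h
    crop_w -= crop_w % 2  # keep the width even for 4:2:0 encoders
    if crop_w > width:
        raise ValueError("source is too narrow for the requested ratio")
    x = (width - crop_w) // 2
    return f"crop={crop_w}:{height}:{x}:0"


def build_ffmpeg_command(src, dst, start, end, width, height):
    """Build an argument list that cuts [start, end] seconds and crops to portrait."""
    duration = end - start
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",     # seek to the segment start in the input
        "-t", f"{duration:.3f}",   # read only the segment duration
        "-i", src,
        "-vf", portrait_crop_filter(width, height),
        "-c:a", "copy",            # leave the audio stream untouched
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical paths and timings for a 1920x1080 source.
    cmd = build_ffmpeg_command("input.mp4", "clip.mp4", 12.5, 42.5, 1920, 1080)
    print(" ".join(cmd))
```

Returning an argument list instead of a shell string means the command can be passed directly to `subprocess.run` without quoting issues.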

Key features

  • AI-driven viral hook segment detection.
  • Word-level timestamped dynamic subtitles.
  • Fully serverless orchestration pipeline.
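One way word-level timestamps could drive dynamic subtitles is to group a few words per cue and emit SRT. The sketch below is illustrative only: the word record shape (`word`, `start`, `end`, with times in seconds) and the three-words-per-cue grouping are assumptions, not the project's documented format.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=3):
    """Group consecutive words into short cues and render them as SRT text.

    Each word is assumed to be a dict with "word", "start", and "end" keys.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up sample words mixing Indonesian and English.
    sample = [
        {"word": "ini", "start": 0.0, "end": 0.4},
        {"word": "literally", "start": 0.4, "end": 0.9},
        {"word": "gila", "start": 0.9, "end": 1.25},
        {"word": "sih", "start": 1.25, "end": 1.5},
    ]
    print(words_to_srt(sample))
```

Short cues keep the on-screen text close to the spoken word, which is the effect dynamic subtitles aim for; the cue size is a tunable parameter.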