SEMIAUTOMATIC
by DARYL ANSELMO
— JUN 2025
ABSTRACT
As genAI tools proliferate, challenges emerge when integrating new models into established workflows. New tools are released constantly, each with different requirements and approaches. Every genAI project must now continuously confront how these new pieces might fit together within its own framework, which requires a systematic process and a clear point of view.
Semiautomatic represents Anselmo's 14th consecutive deep dive: 91 days of automation and API integration to reveal how building pipelines forces explicit creative decisions. What started as small experiments evolved into a complete genAI lab capable of transforming minimal user input into finished video files through end-to-end automation. It raised fundamental questions: Is automation merely about efficiency gains, or does systematic integration challenge the creative process itself? Beyond simple optimization, can automation lead to new innovations and creative expansion?
The project ultimately produced a suite of core automation tools, each codifying existing knowledge as a discrete step that can be chained together into a usable production pipeline. Building this system demanded critical evaluations: Which steps truly matter versus those driven by habit? As new models emerge constantly, how can systems be future-proofed for integration? The result provides a clear baseline for deciding whether a new tool should be evaluated at all, shifting from reactive adoption to strategic integration.
From image captioning through custom model training to media generation and post-production processing, Semiautomatic demonstrates how systematic automation becomes a mechanism for clarification and creative expansion rather than simple workflow optimization.
PROCESS
getcaptions
Automatically generates captions from image datasets for use in model training. Eliminates the manual process of writing descriptive text for training data, enabling rapid dataset preparation for custom model development.
Fig 2: getcaptions generates machine-captioned training data for model development from a folder of images
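The folder-to-captions step can be sketched as follows. This is a minimal illustration, not the actual getcaptions code: the function names, the list of image extensions, and the convention of writing a same-named .txt file next to each image are assumptions, and the captioner here is a placeholder standing in for a real vision-language model call.

```python
from pathlib import Path
from typing import Callable

# Assumed set of image types; the real tool may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def caption_folder(folder: Path, captioner: Callable[[Path], str]) -> list[Path]:
    """Write one .txt caption file next to each image; return the files written."""
    written = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore captions, notes, and other non-image files
        caption_file = image.with_suffix(".txt")
        caption_file.write_text(captioner(image).strip() + "\n", encoding="utf-8")
        written.append(caption_file)
    return written


def placeholder_captioner(image: Path) -> str:
    # Hypothetical stand-in: a real captioner would send the image to a model.
    return f"a photo, file {image.stem}"
```

Passing the captioner in as a function keeps the folder handling independent of whichever captioning model is used.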
ostris_trainer
Sends captioned datasets to ostris-ai-toolkit via the Replicate API and returns custom LoRA models in .safetensors format. Together with getcaptions, transforms a single folder of images into personalized models for immediate deployment, eliminating manual training configuration and file management.
Fig 3: ostris_trainer automates the complete training pipeline from captioned data to production-ready LoRA models
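One piece of file management that a trainer wrapper likely has to handle is packaging the dataset. The sketch below assumes the trainer accepts a zip archive of images with same-named .txt captions; that format, the function name, and the extension list are assumptions, and the upload and training request themselves are deliberately left out.

```python
import zipfile
from pathlib import Path

# Assumed set of image types; the real tool may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def package_dataset(folder: Path, archive: Path) -> list[str]:
    """Zip every image that has a same-named .txt caption; return archived names."""
    names = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for image in sorted(folder.iterdir()):
            caption = image.with_suffix(".txt")
            if image.suffix.lower() not in IMAGE_EXTENSIONS or not caption.exists():
                continue  # skip non-images and images without a caption
            for member in (image, caption):
                zf.write(member, arcname=member.name)
                names.append(member.name)
    return names
```

Skipping uncaptioned images keeps the archive consistent, so every image sent for training has text paired with it.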
augmentprompt
Automatically transforms simple user inputs into detailed, optimized prompts for different models and use cases. Supports image, video and audio modalities, eliminating manual prompt engineering across platforms.
Fig 4: augmentprompt expands minimal input into optimized prompts tailored for specific AI models and modalities
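The idea of expanding a short input differently per modality can be illustrated with simple templates. This is only a sketch under stated assumptions: the template wording, the style parameter, and the function name are hypothetical, and the actual tool may rely on a language model rather than fixed templates.

```python
# Hypothetical per-modality templates; the wording is illustrative only.
MODALITY_TEMPLATES = {
    "image": "{subject}, {style}, detailed composition, natural lighting",
    "video": "{subject}, {style}, slow camera move, subtle motion",
    "audio": "ambient sound of {subject}, {style}",
}


def augment_prompt(subject: str, modality: str, style: str = "cinematic") -> str:
    """Expand a short subject into a fuller prompt for the given modality."""
    try:
        template = MODALITY_TEMPLATES[modality]
    except KeyError:
        raise ValueError(f"unsupported modality: {modality!r}") from None
    return template.format(subject=subject.strip(), style=style)
```

For example, augment_prompt("red fox", "video") returns "red fox, cinematic, slow camera move, subtle motion".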
viewcomfy_api
Sends prompts to a custom ComfyUI deployment running personalized workflows and LoRA models. This creates a dedicated image generation service tailored to specific creative requirements, accessible through CLI. Rather than being limited to standard AI platforms, the system provides programmatic access to custom-trained models and workflows for consistent, personalized image generation without manual ComfyUI interaction.
Fig 5: viewcomfy_api provides CLI access to a custom image generation service running personalized ComfyUI workflows and LoRA models
magnific_upscale
Sends images to Magnific via the Freepik API and returns enhanced images. Integrates into automated workflows with concurrency support and configurable parameters, eliminating manual image enhancement steps.
Fig 6: magnific_upscale enhances images through Magnific, with concurrent batch operations
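Concurrent batch operation of this kind can be sketched with a thread pool, which suits work that mostly waits on remote requests. The names below are hypothetical, and fake_upscale is a local stand-in that only renames the file; no Freepik or Magnific call is made.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def run_batch(items: list[str], worker: Callable[[str], str], max_workers: int = 4) -> list[str]:
    """Apply worker to every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def fake_upscale(path: str) -> str:
    # Hypothetical stand-in for a remote upscaling request.
    p = Path(path)
    return p.stem + "_upscaled" + p.suffix
```

Because pool.map preserves input order, each result can be matched to its source image even when requests finish out of order.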
getvideo
Sends an image and prompt to video models via the fal API and returns video files. Supports Kling 1.5 to 2.1, Hailuo 2.0, and Seedance with configurable parameters including duration, end-frames, and loops. Features concurrent worker support for batch processing.
Fig 7: getvideo generates video content from images and prompts across multiple AI video platforms
getaudio
Sends video files and prompts to the MMAudio v2 model via the fal API and returns videos with synchronized generated audio. This handles the complexity of video file transmission and processing, creating contextually appropriate sound effects that match video content. Integrates directly with the video pipeline to eliminate manual audio editing and synchronization. Supports concurrent requests.
autopost
Automatically applies post-production to local video files including color-correction, sharpening, and vignette via ffmpeg; upscaling/interpolation via Topaz (CLI or API); high-quality analog film grain (configurable); and optimal file compression for delivery.
Fig 9: autopost applies multi-stage enhancement including color correction, upscaling, interpolation, film grain, and file compression
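The ffmpeg portion of such a post-production pass might look like the sketch below, which only builds the command and does not run it. The filter values, the x264 settings, and the function name are assumptions chosen for illustration; ffmpeg's noise filter is used as a simple stand-in for film grain, and the Topaz upscaling and interpolation steps are omitted.

```python
def build_post_command(src: str, dst: str, grain: int = 8, crf: int = 18) -> list[str]:
    """Build an ffmpeg command for color correction, sharpening, vignette, and grain."""
    filters = [
        "eq=contrast=1.05:saturation=1.1",  # mild color correction
        "unsharp=5:5:0.8",                  # luma sharpening
        "vignette=PI/5",                    # darkened corners
        f"noise=alls={grain}:allf=t",       # temporal noise as simple grain
    ]
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", ",".join(filters),
        "-c:v", "libx264", "-crf", str(crf), "-preset", "slow",
        "-c:a", "copy",
        dst,
    ]
```

The returned list can be passed to subprocess.run on a machine with ffmpeg installed; building it as a list avoids shell quoting issues with file names.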
DISCOVERIES
Building individual automation tools revealed how they could work together systematically. Two main workflows emerged that demonstrate how chaining these tools creates capabilities greater than the sum of their parts.
full_clip_train
The training pipeline combines getcaptions and ostris_trainer to transform raw image folders into custom LoRA models without manual intervention. This reduces traditional bottlenecks of dataset preparation and model training, enabling rapid experimentation with personalized models.
Images go in, trained models come out. The system handles captioning, dataset organization, and training pipeline management automatically, producing .safetensors files ready for immediate testing and iteration in generation workflows.
Fig 10: full_clip_train produces a custom LoRA model from a folder of raw images
full_clip_generate
The generation pipeline chains multiple tools to create complete video content from minimal input: getprompts transforms simple text into optimized prompts, viewcomfy_api generates images using custom models, magnific_upscale enhances quality, getvideo creates animation, getaudio adds synchronized sound, and autopost applies post-production and packages media for final delivery.
Text goes in, finished videos come out. This comprehensive pipeline transforms simple user inputs into polished video pieces, incorporating custom-trained models, professional enhancement, synchronized audio, and polished post-production effects.
Fig 11: full_clip_generate completes the automation chain from simple concept to finished video
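Chaining of this kind can be sketched as a list of stage functions that each take and return a shared state. Everything here is hypothetical: the stage names, the state keys, and the file names are placeholders, and each stub stands in for a tool that would call an external service.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(state: dict, stages: list[Stage]) -> dict:
    """Pass a state dict through each stage in order, recording what ran."""
    for stage in stages:
        history = state.get("history", []) + [stage.__name__]
        state = {**stage(state), "history": history}
    return state


# Hypothetical stub stages; real ones would call the corresponding tools.
def make_prompt(state: dict) -> dict:
    return {**state, "prompt": f"{state['text']}, cinematic"}


def make_image(state: dict) -> dict:
    return {**state, "image": "frame.png"}


def make_video(state: dict) -> dict:
    return {**state, "video": "clip.mp4"}
```

Calling run_pipeline({"text": "red fox"}, [make_prompt, make_image, make_video]) yields a state holding the prompt, the image and video names, and the ordered history of stages. Keeping every stage behind the same signature is what allows a step to be swapped for a newer model without touching the rest of the chain.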
Integration Architecture
The system shows how individual tools work together as a complete genAI lab. Custom models from the training workflow integrate directly into the generation workflow, creating personalized creative capabilities.
Each module includes configurable parameters for different creative needs (quality versus speed, style preferences, output formats), making the system flexible rather than rigid. This flexibility extends throughout the entire pipeline, allowing creative control while maintaining the benefits of automation.
Connecting multiple platforms (fal, Replicate, Freepik, Topaz, viewComfy) through APIs demonstrates that diverse AI services can be orchestrated into unified workflows. This multi-platform approach emerged from a key discovery: the best models aren't always available through APIs, and different services have different strengths and limitations. Rather than being tied to one platform, the system uses each service for what it does best.
Performance and Scalability
Concurrency in several tools (magnific_upscale, getvideo, etc.) enables batch operations and faster output. The command-line backend supports both interactive and programmatic use.
Configurable parameters throughout the pipeline allow for quality versus speed trade-offs, making the system adaptable to different project requirements and available compute.
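A quality versus speed trade-off can be expressed as named presets with per-run overrides. The field names, preset names, and values below are all assumptions for illustration and do not reflect the actual configuration of the tools.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical settings; the real tools expose their own parameters.
    upscale_factor: int = 2
    video_seconds: int = 5
    max_workers: int = 4
    add_grain: bool = True


PRESETS = {
    "draft": PipelineConfig(upscale_factor=1, video_seconds=5, max_workers=8, add_grain=False),
    "final": PipelineConfig(upscale_factor=4, video_seconds=10, max_workers=2, add_grain=True),
}


def load_config(preset: str, **overrides) -> PipelineConfig:
    """Start from a named preset and apply any per-run overrides."""
    return replace(PRESETS[preset], **overrides)
```

For example, load_config("draft", video_seconds=10) keeps the fast draft settings but lengthens the clip, leaving the stored preset unchanged.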
Creative Impact
The complete genAI lab approach shows how systematic automation eliminates traditional production bottlenecks. From concept to finished video, the pipeline reduces tedious manual work while maintaining creative control through smart, adjustable settings.
A key insight emerged around prioritizing rapid development over perfected individual tools. This approach enabled faster end-to-end creative experimentation, revealing that speed expands the creative process. The ability to quickly test ideas, iterate on approaches, and explore creative directions proved more valuable than pursuing production-perfect individual components. Rather than eliminating creative work, removing production friction unlocks latent creativity, making previously impossible projects feasible, and enabling exploration of new ideas that would have been too resource-intensive to pursue manually.
This approach also enables personalized outputs for different projects, clients, or audiences. By training custom models and managing custom prompts for specific requirements, the same automation pipeline can produce distinctly different output depending on the context and intended audience, opening up new creative opportunities.
Semiautomatic suggests a shift from tool-by-tool AI adoption toward more comprehensive "pipeline thinking": a new paradigm for media production that unlocks creative possibilities and offers a path to more personalized, custom media.
BIO
Daryl Anselmo is a Canadian-American artist, director, advisor, and founder. He is the co-creator of the original NBA Street and Def Jam franchises for Electronic Arts, was the Art/Creative Director for FarmVille 2 at Zynga, and served for many years as a Director of Art for The Walt Disney Company.
Now an artist and proponent of the creative use of AI-based workflows, Daryl has lectured at numerous institutions including Stanford University, SIGGRAPH, UC Berkeley, and Google. His work was showcased on the Main Stage at TED 2023.
Currently splitting his time between San Francisco and Vancouver, Daryl is obsessed with technology and writes his own code. He is currently deepening his art practice and providing consulting and creative services for select clients.
INFO
- 91 days of research and development, documented daily between March 30 and June 28, 2025
- 8 core automation tools built in python: getcaptions, ostris_trainer, getprompts, viewcomfy_api, magnific_upscale, getvideo, getaudio, autopost
- Platforms integrated: fal, Replicate, Freepik (Magnific), Topaz, viewComfy
- Complete genAI lab: full_clip pipeline with both training and generation workflows
- Note: This is not open source (yet) and comes without support. However, if you are an interested builder, feel free to contact the author to discuss potential collaborations and/or access to these tools.