AI SLOP

by DARYL ANSELMO

— SEP 2024

ABSTRACT

2024 marked significant advances in, and increased adoption of, AI-based video models and workflows.


AI Slop is a collection of 91 reels cut in 91 days, from June 30 to September 28, 2024. It represents a deep dive into the state of the art in genAI video and audio tools, exploring both open-source and commercially available models.


This round, Anselmo’s 11th consecutive, used the following tools: Luma Dream Machine, KlingAI, AnimateDiff (via ComfyUI), Krea, Topaz Video AI, Magnific, Midjourney, Stable Diffusion (SDXL), Arcana Labs, Flux, Udio and Suno. All reels were cut in Adobe Premiere Pro.

Fig 1.1: coney island - September 12, 2024

Fig 1.2: feel the agi - August 2, 2024

Fig 1.3: a portal to the pizza dimension - August 26, 2024


PROCESS

AI Slop began as an experimental process, with many techniques tested along the way. By the end of the series, a structured workflow had emerged: generate images, convert them into video, generate the music, edit, then polish and post.

Step 1: Image Generation


The foundation of the process is generating images to set the visual tone for each reel. The goal of this step is to create a set of images that can be animated, edited and refined into a cohesive video.


At the start of AI Slop, SDXL was used to generate images for direct injection into AnimateDiff workflows in ComfyUI. As the project evolved, Midjourney became the primary tool for its speed and its ability to refine personal aesthetics. Flux was released mid-project and was briefly explored, but wasn't fully integrated.
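For reference, local SDXL generation outside ComfyUI can be approximated with Hugging Face's diffusers library. This is a minimal sketch, not the project's actual ComfyUI graph; the checkpoint ID, prompt, and resolution are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# The public SDXL base checkpoint; an illustrative choice, not
# necessarily the exact checkpoint used in the project.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# 768x1344 is one of SDXL's trained resolution buckets, close to 9:16.
image = pipe(
    prompt="retro-dystopian boardwalk at dusk, film grain",  # hypothetical prompt
    width=768,
    height=1344,
).images[0]
image.save("frame_001.png")
```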


Midjourney is one of the most accessible image generators, but it also offers a surprisingly deep suite of tools for power users. One of the most powerful features is aesthetic personalization, which allows users to refine output to align with their artistic taste. The feature activates after a user provides enough image rankings, gradually tilting results toward personal preferences. After ranking hundreds of images, outputs during AI Slop evolved into a distinct retro-dystopian aesthetic.


Fig 2.1: Default Midjourney outputs

Fig 2.2: The same outputs after aesthetic personalization

An aspect ratio of 9:16 is optimal for Instagram reels. For any given reel, about 40-50 images were generated. Zoomed-out variants of several images were also generated to allow for zoom-cuts in the edit (the zoom geometry is sketched below).


Fig 3.1: Base image zoomed out by a factor of 1.5x twice

Fig 3.2: Zoomed-out variants offer flexibility in post-production
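Midjourney's Zoom Out feature performs this step natively. As a rough illustration of the geometry only, the Pillow sketch below centers a base image on a canvas 1.5x larger, leaving a border that an outpainting model would fill; file names are placeholders.

```python
from PIL import Image

def zoom_out_canvas(src: Image.Image, factor: float = 1.5) -> Image.Image:
    """Center src on a canvas `factor` times larger; the border is left
    for an outpainting model to fill."""
    w, h = src.size
    canvas = Image.new("RGB", (int(w * factor), int(h * factor)), "black")
    canvas.paste(src, ((canvas.width - w) // 2, (canvas.height - h) // 2))
    return canvas

base = Image.open("base_image.png")   # e.g. a 9:16 Midjourney output
once = zoom_out_canvas(base)          # zoomed out by 1.5x
twice = zoom_out_canvas(once)         # 1.5x twice = 2.25x total
twice.save("zoomed_out_2.25x.png")
```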

Step 2: Video Generation


With a full set of images prepared, the next step was to transform them into video clips.


AI Slop initially used AnimateDiff running locally in ComfyUI before migrating to commercially available img2vid tools including Luma Dream Machine, Runway Gen-3, and KlingAI.


KlingAI emerged as the main tool, delivering consistent quality with the ability to generate multiple clips concurrently. Luma Dream Machine provided quick generation but (at the time) sometimes produced static shots rather than depth-aware scenes with animated characters. Despite its capabilities, Runway Gen-3 wasn't incorporated into any final reels due to its lack of support for portrait orientation.


For each reel, between 30 and 50 video clips (5 seconds each) were generated, providing sufficient material for editing.

Fig 4: For 'coney island', 42 video clips were generated using Kling v1.5

Step 3: Music Generation


With all visual elements prepared, music was the next step in the process. Early reels incorporated tracks licensed from Instagram's catalog, but over the course of the project, AI audio generation tools became increasingly core to the workflow. Two primary tools emerged as essential: Suno and Udio.


For each reel, the process typically involved generating around 10 (and sometimes up to 30) musical tracks, then shortlisting 2-5 of the best options for the final edit. This approach provided enough variety to find the right sonic match for each visual sequence while maintaining efficiency.

Bleeding Tooth Fungus, with Baroque (August 22)
In-N-Out vs Shake Shack (September 18)
Coney Island (September 12)

Fig 5.1: Signature Tracks from Suno (Click to Play)

Voidpunk Luminar (September 13)
A Guacalypse (September 14)
Macaroni Business (September 21)

Fig 5.2: Signature Tracks from Udio (Click to Play)

Importantly, all music generated for AI Slop was entirely instrumental. This deliberate choice eliminated the complexity that vocals would have introduced to a workflow optimized for speed and consistency.

Step 4: The Edit


With all audiovisual elements prepared, editing brings everything together into the final product. The edit was done "the old-fashioned way"; it was the only part of the workflow that used no automation or genAI.


The edit begins with importing all generated video clips and shortlisted music tracks into Adobe Premiere Pro using a standard 1080x1920 template optimized for Instagram Reels at 60fps.

Fig 6: For 'coney island', 3:35 of footage was cut down to 24 seconds

First comes a series of "listening sessions", where the video clips are played alongside each shortlisted music track to identify which audio and visuals complement each other. After these sessions, one music track emerges as the foundation for the final edit.


Once the track is selected, the sound edit takes priority. This involves identifying and extracting the most compelling section of the music, typically two four-bar loops or a 20-30 second segment that pairs well with the visuals. The music forms the backbone of the entire edit, with cuts placed on musical beats, typically at the start of bars or half-bars, to maintain rhythm.
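To make the beat math concrete: in 4/4 time, a bar lasts beats-per-bar x 60 / BPM seconds. A minimal sketch, assuming a hypothetical 120 BPM track (the tempo is not taken from the project):

```python
# Hypothetical tempo; the project's actual tracks varied.
bpm = 120
beats_per_bar = 4  # assuming 4/4 time

bar_seconds = beats_per_bar * 60 / bpm      # 2.0 s per bar at 120 BPM
loop_seconds = 2 * 4 * bar_seconds          # two four-bar loops = 16.0 s

# Candidate cut points on bar and half-bar boundaries within the loop.
half_bar = bar_seconds / 2
cuts = [i * half_bar for i in range(int(loop_seconds / half_bar) + 1)]
print(cuts)  # [0.0, 1.0, 2.0, ..., 16.0]
```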


Every video clip is watched, then sorted by visual interest. High-interest clips with the strongest visual impact bookend the sequence, medium-interest clips make up the body of the edit, and the rest are used in transitions or cut entirely.
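The sorting pass was done by eye in Premiere, but the heuristic itself is simple enough to caricature in code. A toy sketch, with made-up clip names and interest scores:

```python
# Toy illustration of the sorting heuristic: strongest clips open and
# close the loop, medium clips fill the body, the weakest are dropped.
def arrange(clips: list[tuple[str, int]], keep: int) -> list[str]:
    """clips: (name, interest score 1-10); keep: clips in the final cut."""
    ranked = sorted(clips, key=lambda c: c[1], reverse=True)
    kept = [name for name, _ in ranked[:keep]]
    # Top two clips bookend the sequence; the rest form the body.
    return [kept[0], *kept[2:], kept[1]]

print(arrange([("boardwalk", 9), ("ferris", 8), ("hotdog", 6), ("crowd", 4)], keep=3))
# ['boardwalk', 'hotdog', 'ferris']
```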


The AI video generators tended to produce unnaturally slow or wandering camera movements, so each clip was individually reviewed and typically sped up to 1.5-2x its original speed to create the illusion of more natural motion.
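The retiming was done with Premiere's speed controls; for anyone scripting the same adjustment, an equivalent ffmpeg-based sketch (file names and factor are placeholders) might look like this:

```python
import subprocess

def speed_up(src: str, dst: str, factor: float = 1.75) -> None:
    """Retime a video-only clip by rescaling its presentation timestamps."""
    subprocess.run([
        "ffmpeg", "-i", src,
        "-vf", f"setpts=PTS/{factor}",   # e.g. 1.75x faster playback
        "-an",                           # the AI clips here carry no audio
        dst,
    ], check=True)

speed_up("clip_raw.mp4", "clip_fast.mp4")
```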


The edit was then continually iterated, testing each cut for flow, pacing and implied narrative. Subtle motion effects were added with Ken Burns-style zooms, and cuts were matched to audio half-measures. Color grading and image enhancement finished the edit: Sharpen, Unsharp Mask and Noise effects were added to every clip, and Lumetri in Premiere was used to color grade and match the shots prior to posting.

Step 5: Final Output


With the edit complete, the final step was to prepare the output for social media. Reels were exported at 1080x1920 resolution and 60fps, using the H.264 format with AAC audio and 2-Pass Variable Bit Rate encoding at maximum quality.
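Premiere handled the actual export; a rough ffmpeg analogue of those settings is sketched below. The 12 Mbps bitrate target is an assumed value, not taken from the project.

```python
import subprocess

# Shared video settings: H.264 at 1080x1920, 60fps, 12 Mbps target (assumed).
common = ["-c:v", "libx264", "-b:v", "12M", "-r", "60",
          "-vf", "scale=1080:1920"]

# Pass 1 analyzes the footage; the output itself is discarded.
subprocess.run(["ffmpeg", "-y", "-i", "edit.mov", *common,
                "-pass", "1", "-an", "-f", "mp4", "/dev/null"], check=True)
# Pass 2 encodes using the pass-1 log, adding AAC audio.
subprocess.run(["ffmpeg", "-i", "edit.mov", *common,
                "-pass", "2", "-c:a", "aac", "reel_final.mp4"], check=True)
```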


Final enhancement was done in Topaz Video AI, applying both frame interpolation and image enhancement models. This process added another layer of polish to improve fidelity and fluidity.

Fig 7: original 30fps on the left, enhanced 60fps with frame interpolation and enhancement on the right. from 'artisanal intelligence' - September 17, 2024

This process took about 10 minutes to compute locally on a 4090-equipped workstation, or about 3 hours on a MacBook Pro while on the road. Once completed, the reel was ready to post.



DISCOVERIES

This project aimed to build familiarity with cutting-edge audio and video AI tools, strengthen editing skills in non-narrative media, and explore darker, surreal themes beyond the artist's typical aesthetic.


AnimateDiff


Early reels used AnimateDiff in ComfyUI. At the time, AnimateDiff stood out as one of the most powerful options for local video generation, particularly suited to mesmerizing, short-form loops. While moderate technical setup was needed and long clips posed challenges, AnimateDiff provided complete local control over the generation process. `Ipiv's Morph` img2vid workflow is recommended as a starting point for newcomers.
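For orientation, the same model family is also exposed outside ComfyUI through Hugging Face's diffusers library. The sketch below shows the simpler text-to-video path rather than the `Ipiv's Morph` img2vid graph; the checkpoints are the public examples from the diffusers documentation, not necessarily those used here.

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter
from diffusers.utils import export_to_gif

# Public AnimateDiff motion module paired with an SD 1.5-family base model.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# Generate a short clip; 16 frames is the module's native clip length.
frames = pipe(
    prompt="ghostly baroque interior, candlelight, slow dolly",  # hypothetical prompt
    num_frames=16,
).frames[0]
export_to_gif(frames, "loop.gif")
```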


Fig 8.1: natural pools - July 3, 2024

Fig 8.2: orange gardens - July 4, 2024

Fig 8.3: ghostly baroque - July 11, 2024

Fig 8.4: tonal microshifts - July 23, 2024

Video Enhancement


Earlier pieces relied on Krea's video enhancement function, which featured strength and resemblance sliders, upscaling and interpolation settings, prompt input (auto-generated by default) and presets for "cinematic," "animation," and "render." However, subtle visual shifts occurred every ~1.3 seconds during static shots, a processing artifact evident in 'tonal microshifts' above.


Midway through the project, Topaz Video AI became central to the workflow. Extensive testing revealed optimal settings: Rhea for upscaling and detail enhancement combined with Chronos for frame interpolation at 60fps. Final enhancements included subtle film grain and sharpening.


While similar results could be achieved through other methods (such as FlowFrames, or integrating RIFE, GIMM-VFI, or upscaling nodes into ComfyUI), Topaz offered a streamlined and efficient process suitable for production.
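For example, a rough free approximation of the 30-to-60fps interpolation step (not the Topaz pipeline used here) is ffmpeg's motion-compensated minterpolate filter:

```python
import subprocess

# Motion-compensated interpolation (mci) from 30fps source to 60fps output.
subprocess.run([
    "ffmpeg", "-i", "reel_30fps.mp4",
    "-vf", "minterpolate=fps=60:mi_mode=mci",
    "-c:a", "copy",
    "reel_60fps.mp4",
], check=True)
```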


Flux


Flux by Black Forest Labs was released mid-project in August 2024. Initial tests produced promising results in 'pueblo palate cleanser' (which also combined Flux with KlingAI for the first time). Additional experiments in 'roman thermae' and 'flux landscape tests, with water features' further demonstrated its potential. While Flux wasn't fully integrated into AI Slop due to its mid-production release, early explorations yielded promising results that warrant further investigation in future projects.


Fig 9.1: pueblo palate cleanser - August 10, 2024

Fig 9.2: roman thermae - August 23, 2024

Fig 9.3: flux landscape tests, with water features - August 27, 2024

Fig 9.4: romanesque ghosts - September 1, 2024

Arcana


Exclusive access to Arcana AI in late August provided an opportunity to experiment with its suite of image models and artist tools. Three key pieces, 'plague lab', 'l'oeuf baroque', and 'santa cruz forest', showcased Arcana's capabilities, producing results closely aligned with the project's overall creative direction.


Fig 10.1: plague lab - August 30, 2024

Fig 10.2: l'oeuf baroque - September 6, 2024

Fig 10.3: santa cruz forest - September 7, 2024

The Sound of Slop


Music became an increasingly central element of AI Slop, featuring an exploration of over 80 AI-generated tracks produced with Suno and Udio. The project's audio approach evolved in two distinct phases: July to mid-August relied primarily on Suno for textural, atmospheric compositions, then mid-August through September shifted toward Udio for more refined pieces resembling traditionally composed music. This progression reflected a growing preference for Udio's framework, though both platforms were important to the project.


Suno (v3 at time of writing) excelled with instrument-based prompts. Terms like "xylophone," "claves," "didgeridoo," "sitar," and "synth" produced results true to their real-world counterparts. Suno's near-instantaneous generation allowed for rapid iteration and experimentation.


In contrast, Udio demonstrated strength in genre-based prompting. It excelled with prompts like "orchestral trailer," "dark synthwave," "acid," and "horror," while responding exceptionally well to emotional descriptors such as "eerie," "melancholic," and "epic." Though slower than Suno, requiring full track generation before playback, Udio's exceptional quality justified the additional processing time.


While focused primarily on music generation, limited experimentation with sound design occurred through ElevenLabs, as heard in 'Drip Gurgles.' Though minimal in this project, these sound effects and vocal capabilities represent a promising area for future exploration.


A Pivot to the Dark Side


AI Slop marked a deliberate shift from the landscapes, architectural compositions, and interior designs of previous projects toward darker aesthetics. This transformation emerged organically from creative interest while recognizing that "the medium is the message": social media algorithms combined with AI video are particularly effective vehicles for the unsettling and surreal.


Four key pieces document this evolution: 'pastel synesthesia' introduced unconventional audio elements; 'unsettling azulejo' incorporated darker notes while bridging technical workflows; 'dark greek revival party' committed more fully to darker visual language; and finally, 'gothic campaign' completed the transformation with an unapologetic embrace of unsettling aesthetics, prioritizing authentic creative expression over audience retention.


Fig 11.1: pastel synesthesia - July 15, 2024

Fig 11.2: unsettling azulejo - July 16, 2024

Fig 11.3: dark greek revival party - July 21, 2024

Fig 11.4: gothic campaign - August 9, 2024

Food Trauma


Food-related themes developed into the deliberate thematic series 'food trauma', a continuing exploration of surreal and unsettling food visuals. AI Slop features the first 14 works in the 'food trauma' series, including notable entries like 'california donut people', 'a portal to the pizza dimension', 'after hours waffle house', and 'coney island.'


'food trauma' uses food imagery as a vehicle to explore personal struggles, transforming comfort into discomfort.


Fig 12: Examples from the 'food trauma' series

Intelligence


'feel the agi' marked the first major viral success of the project and introduced the 'intelligence' series featuring bots with exposed brains. This visual motif became a recurring signature, appearing in key works such as 'ai powered dating concierges', 'bot congress', and 'emotional intelligence'.


'intelligence' uses the brain metaphor to critique aspects of AI culture in a dystopian context.


Fig 13: Examples from the 'intelligence' series

Wildlife


Another significant theme emerged in the 'wildlife' series, featuring surreal and unsettling human-animal subjects. Pieces such as 'lovecraftian lighthouse', 'frog house tour', 'kingdom of jellyfish', and 'scarab encampment' tap into our primal fascination with animal transformation and instinct, confronting these aspects of human nature.


Fig 14: Examples from the 'wildlife' series

Capstone


The project culminated with the capstone reel 'ai slop,' bringing together the visual aesthetic, thematic elements and technical process developed throughout the series. The final piece serves as both summation and statement, using the refined workflow and darker visual language established by the preceding 90 reels.

Fig 15: Stills from the capstone piece: ai slop - September 28, 2024


COLLECTION

The full collection will be posted shortly.


BIO

Daryl Anselmo is a Canadian-American artist, director, advisor, and founder. He is the co-creator of the original NBA Street and Def Jam franchises for Electronic Arts, was the Art/Creative Director for FarmVille 2 at Zynga, and served for many years as a Director of Art for The Walt Disney Company.


Now an artist and proponent of the creative use of AI-based workflows, Daryl has lectured at numerous institutions and venues, including Stanford University, SIGGRAPH, UC Berkeley, and Google. His work was showcased on the Main Stage at TED 2023.


Currently splitting his time between San Francisco and Vancouver, Daryl is obsessed with technology and writes his own code. He is currently deepening his art practice and providing consulting and creative services for select clients.


INFO

- 91 reels, released daily between June 30 and September 28, 2024

- Tools used: Luma Dream Machine, KlingAI, AnimateDiff (via ComfyUI), Krea, Topaz Video AI, Magnific, Midjourney, Stable Diffusion (SDXL), Arcana Labs, Flux, Adobe Premiere Pro, Udio and Suno

- 326 total hours spent


A limited run of prints is available here.