Today I posted another AI-generated film. (Two full minutes this time!) It’s paired with a soundtrack that I made in my little home studio.
As mentioned in my last post — to me, the music is the most satisfying part of the work. The studio is the real playground. I’m not really a guitarist, and on the bass and keys I’m straight faking it, but it’s great fun chasing grooves and melodies.
But the AI stuff is what’s new, and to anyone curious about what the process looks like, I offer this brief account.
I’m still just exploring the potential of the medium. What I’ve made so far evokes the fever dream of a stoned teenager. But that’s not surprising: it’s a rush, cranking out the wild hallucinations that proceed from the prompts and the models.
For anyone thinking of trying it out, most of what I learned in the beginning was with the generous help of the website Stable Diffusion Art.
You can do all of this locally on certain PCs with powerful graphics cards, but the rest of us only need to connect to Google Colab servers for processing power, and then launch the Automatic 1111 Web-UI:
(Again, this is all explained in full at Stable Diffusion Art)
Once you’re connected to the A1111 interface, you load a model, aka a checkpoint. Above, I’ve chosen one named “Deliberate.” A checkpoint is an image generator trained on thousands or millions of other images.
The place to find checkpoints (and much else) is Civitai:
Back to A1111. You can generate single images using text prompts (txt2img), resize images (including video frames) using img2img, and a lot more. You can also make videos using Deforum, which is the active tab in the A1111 screenshot above.
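If you’d rather script it than click around, A1111 also answers over HTTP when launched with the --api flag. Here’s a rough sketch of a txt2img call — the endpoint path and field names follow A1111’s API schema, but the address, prompt text, and helper names are my own placeholders:

```python
# Sketch: driving A1111's txt2img over its HTTP API instead of the browser.
# Assumes the Web UI was launched with --api and is reachable at the default
# local address; adjust API_URL for a Colab tunnel.
import base64
import json
from urllib import request

API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # default local address

def build_payload(prompt: str, negative: str = "", steps: int = 20) -> dict:
    """Assemble the JSON body for a single txt2img request."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": steps,
        "width": 512,
        "height": 512,
    }

def generate(prompt: str) -> bytes:
    """POST the payload and decode the first returned image (base64 PNG)."""
    body = json.dumps(build_payload(prompt)).encode()
    req = request.Request(API_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        result = json.load(resp)
    return base64.b64decode(result["images"][0])

# Usage, with a live server running:
#   png_bytes = generate("skeleton trees in fog, cinematic lighting")
```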
Looking closer… you can also see the “Prompts” secondary tab within Deforum.
(In the other tabs — Run, Keyframes, Output — you specify the frame size, frame rate, total number of frames, the motion of the camera, the “cadence” of brand-new frames vs. interpolated filler frames, etc.)
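To make those tabs concrete, here’s the kind of thing they hold, written out as a plain dict. The names mirror common Deforum settings (W, H, fps, max_frames, diffusion_cadence, and “frame:(value)” motion strings), but treat the values as illustrative, not a complete settings file:

```python
# Illustrative Deforum-style settings; values are made up for this example.
deforum_settings = {
    # Run / Output tabs: frame size, frame rate, clip length
    "W": 768,
    "H": 432,
    "fps": 15,
    "max_frames": 450,            # 450 frames / 15 fps = a 30-second clip
    # Keyframes tab: camera motion, keyframed as "frame:(value)" strings
    "animation_mode": "3D",
    "translation_z": "0:(1.5)",   # steady push forward
    "angle": "0:(0)",             # no roll
    "zoom": "0:(1.0)",
    # Cadence: diffuse every Nth frame, interpolate the frames in between
    "diffusion_cadence": 2,
}

# Rough clip length implied by the settings above
clip_seconds = deforum_settings["max_frames"] / deforum_settings["fps"]
```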
The first set of prompts above — skeleton trees, etc. — is keyed to specific frames. Deforum uses those as signposts and generates smooth transitions between them. Then there’s a place to add a second set of prompts — both positive and negative — that applies to every frame equally:
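Under the hood, those keyframed prompts are just a mapping from frame number to prompt text. The frame numbers and prompts below are invented for illustration — Deforum interpolates smoothly between each pair of signposts:

```python
# Sketch of Deforum's keyframed-prompt structure; frame numbers and prompt
# text here are made up for illustration.
import json

animation_prompts = {
    "0":   "a forest of skeleton trees, thick fog",
    "120": "the fog lifts, golden light through the branches",
    "240": "the forest dissolves into a field of stars",
}

# The second, every-frame pair — applied equally to all frames:
positive_suffix = "cinematic, highly detailed"
negative_prompt = "blurry, watermark, text"

# Deforum accepts the keyframed set as JSON in the Prompts tab
prompts_json = json.dumps(animation_prompts, indent=2)
```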
This is where you can get into the weird lexicon of prompt engineering. Helpful samples, with prompts included, are found at sites such as Playground AI — itself a place where you can experiment with your own text-to-image generation.
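To give a flavor of that lexicon: in A1111, parentheses and “:1.2”-style weights boost a term’s influence, and the negative prompt lists what you don’t want. The specific terms below are just examples, not a recipe:

```
Positive: portrait of a wanderer in a neon-lit alley, (cinematic lighting:1.2),
          35mm film grain, highly detailed
Negative: blurry, deformed hands, watermark, text
```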
So… that is the briefest description of the tools and the process. It takes a bit of figurin’ but not too much, if you make the effort. What’s cool is the way the tools train you, through trial and error, to achieve results that you are happy with.
Notice I didn’t say “the results that you want” — because to a considerable extent, what you get when you click the big orange “Generate” button is unpredictable. But there is definitely an art to getting past the threshold of plain ugliness and into some wild, uncanny, and beautiful pictures.
(Some previous efforts here)