OpenAI’s Sora video generation tool can produce output so good you might question reality, but much of what it generates looks unnatural enough to spot easily. If you want better Sora AI videos, try these prompting tips and techniques for an instant improvement.
3. Focus on Style and Aesthetic
Sora is one of many AI text-to-video tools, but it comes from OpenAI, arguably the biggest player in the AI market.
Sora seems to understand the “vibe” of your prompts better than highly specific details. I wrote a fairly specific prompt to get a dramatic shot of a cowboy: “Show a dramatic cowboy giving a flirtatious smirk while lifting up his boots.” The smirk and the boot lifting never materialized, but the shot was dramatic and clean.
Likewise, I gave another prompt with a specific stylistic direction: “In the style of a found footage horror film, show a cute kitten approaching the viewer in a dark alley.” The output captured the aesthetic of a found footage horror film, but the kitten walked away from the camera instead of toward it, and its head kept turning around to look back at the viewer. The cat’s movements were also quite unnatural.
Finally, I tried a sillier prompt describing something that can’t happen in real life: “T-rex walking through the Shibuya Scramble crosswalk in Tokyo.” I’ve spent a considerable amount of time in this exact spot, and the output has a few flaws. It captures the general “vibe” of Shibuya, but the details are wrong, and the T-rex is cartoonish and stationary.
Even when I asked for a more photorealistic T-rex that actually walked, it stayed stationary, and Shibuya remained stuck in the uncanny valley.
2. Limit Complexity for Cleaner Results
Sora and other AI video tools (some of which are free) clearly don’t handle complex motion or action well. You may have seen curated, seemingly high-quality AI-generated videos making waves across social media, such as the “Egypt 3099” video made with Kling AI. In almost all of these cases, the aesthetics are clean and impressive, but motion and complexity are extremely limited.
The “Egypt 3099” example is particularly impressive, but largely because its creator, whether intentionally or not, kept complex motion and interactions to a minimum. So when you write a Sora prompt, spend your detail on describing the aesthetic style of the video, and keep motion, actions, and object interactions simple.
1. Sora Doesn’t Handle Object Interactions Well
One area of complexity that Sora handles particularly poorly is object interaction. The physics of just about any moving object in a Sora video look highly unnatural, albeit comedic. I tried the classic “Will Smith Eating Spaghetti” prompt, and the results were telling: the person doesn’t resemble Will Smith at all, and the spaghetti apparently gets absorbed into the fork.
I made several attempts to get something that moved while still looking passably good. Working within the limitations above, focusing on style and keeping the action simple, I finally got a decent result. The prompt was “Create a dramatic, wide, panning shot from a distance of a knight riding a horse through a medieval countryside at sunset.” Apart from the horse’s slightly unnatural gallop, the results are impressive.
Of everything I prompted with Sora, my best output was the cowboy example. For now, AI video tools like Sora produce low-quality, unnatural output without very specific prompting, and even with skilled prompt engineering, you must work within fairly strict limits to make anything look natural. Nevertheless, AI text-to-video is improving rapidly, and these videos will probably be indistinguishable from real footage within a few years, a prospect that brings both excitement and fear for many people.