Sora 2 and Sora 2 Pro are the latest video-generation models from OpenAI. They can generate videos of 4, 8, or 12 seconds, in four resolutions:

720x1280
1280x720
1024x1792 (Pro only)
1792x1024 (Pro only)
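
Since every request is billed, it can help to check these limits before sending anything. Here is a minimal sketch of such a check. The constant and method names are my own, and the model identifier "sora-2-pro" is an assumption (only "sora-2" appears in the responses shown in this article).

```ruby
# Hypothetical pre-flight check based on the limits listed in this article.
BASE_SIZES = %w[720x1280 1280x720].freeze
PRO_ONLY_SIZES = %w[1024x1792 1792x1024].freeze
ALLOWED_SECONDS = %w[4 8 12].freeze

# Returns a list of problems; an empty list means the combination looks valid.
# Assumption: the Pro model is identified by "sora-2-pro".
def video_request_problems(model:, size:, seconds:)
  problems = []
  if PRO_ONLY_SIZES.include?(size)
    problems << "#{size} is Pro only" unless model == "sora-2-pro"
  elsif !BASE_SIZES.include?(size)
    problems << "unknown size #{size}"
  end
  # Accept both 8 and "8", since the API reports seconds as a string.
  problems << "seconds must be 4, 8, or 12" unless ALLOWED_SECONDS.include?(seconds.to_s)
  problems
end
```

For example, video_request_problems(model: "sora-2", size: "1792x1024", seconds: 12) returns ["1792x1024 is Pro only"].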

This article records my first experience with these models.

Test #1: Simple landscape video

Let's start with a simple one-sentence prompt: zebra dancing in a pool.

The video was ready for download after about 3 minutes. It shows what the prompt requested, and the background music is in sync with the dance moves. OK, that's not too bad.
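
Because generation takes minutes, the flow is asynchronous: create a job, then poll its status. The sketch below shows one way to do that with Ruby's standard library. It is not an official client. I am assuming the create endpoint is POST https://api.openai.com/v1/videos and accepts multipart form fields, and that a job can be fetched by appending its ID to that URL. All method names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

VIDEOS_URL = "https://api.openai.com/v1/videos".freeze

# Form fields for a generation job; the API reports "seconds" as a string.
def video_fields(prompt:, model: "sora-2", size: "1280x720", seconds: 4)
  { "model" => model, "prompt" => prompt, "size" => size, "seconds" => seconds.to_s }
end

# Assumption: the create endpoint accepts these fields as multipart form data.
def create_video(api_key, fields)
  uri = URI(VIDEOS_URL)
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  request.set_form(fields.to_a, "multipart/form-data")
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Assumption: GET on the video URL returns the current job status.
def fetch_video(api_key, video_id)
  uri = URI("#{VIDEOS_URL}/#{video_id}")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Polls until the job leaves "queued"/"in_progress" or the attempts run out.
# The block supplies the status hash, so the loop can be tried without network access.
# Returns the final status hash, or nil if the job never finished in time.
def wait_for_video(max_attempts: 60, interval: 10)
  max_attempts.times do
    video = yield
    return video unless %w[queued in_progress].include?(video["status"])
    sleep(interval)
  end
  nil
end
```

In practice I would call create_video with video_fields(prompt: "zebra dancing in a pool"), then pass a block that calls fetch_video with the returned ID to wait_for_video. Capping the number of attempts matters, because a job is not guaranteed to ever finish.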

Test #2: The moderation

How about a video featuring OpenAI's CEO saying something he never said?

Place Sam Altman in a grey t-shirt as the sole onscreen actor, standing at a classroom whiteboard and writing "A for AI" while striking through "Apple" with a thick marker, body angled three-quarters to the camera. 
Have him speak one line in English: "Apple is no longer relevant." (speech present: yes; language: English) and keep the room silent except for his voice

The answer is NO! This request is not allowed.

{"code" => "moderation_blocked", "message" => "Your request was blocked by our moderation system."}
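
A client may want to tell a moderation block apart from other failures, since retrying the same prompt is pointless. Here is a small sketch. The method names are hypothetical, and because I am not certain whether the error always arrives at the top level (as printed above) or nested under an "error" key, the code checks both shapes.

```ruby
# Pulls the error code out of a parsed API reply.
# Assumption: the code may sit at the top level or under an "error" key.
def error_code(reply)
  nested = reply["error"]
  (nested.is_a?(Hash) ? nested["code"] : nil) || reply["code"]
end

# True when the reply reports a moderation block.
def moderation_blocked?(reply)
  error_code(reply) == "moderation_blocked"
end
```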

Test #3: Longer Prompt

Next, I tried creating an 8-second video with a much longer prompt. The prompt describes the actor, the scene and what I want to see in 2 cuts.

Place an athletic orange cat with a small red cape and expressive eyes, streamlined body, flying between glass-and-steel skyscrapers toward a sea port. 
Have the cat bank slightly left, paws forward and cape fluttering, centered in frame as it speeds past windows and ledges in a low-angle tracking shot (0–5 s). 
Then cut to a wide aerial pull-back that reveals the harbor, container cranes, and water as the cat continues toward the port (5–8 s). 
Keep no spoken lines; make ambient city noise and a distant wailing police siren audible, use no background music; light the scene with harsh noon sunlight, strong key/rim highlights and soft fill from reflected glass for an urgent mood.

Pretty impressive! The requirements are all met.

Test #4: Longer Prompt & Longer Video

Next, let's make a 12-second video with 4 cuts, plus voice-over and background music.

Show the oil-tank truck (heavy-duty, chrome trim) racing right-to-left along a coastal highway, low-angle tracking with a gentle dolly-in and lens flare (0–3 s). 
Show the dump truck (robust, mud-splattered) driving left-to-right on a dusty construction road, medium-wide pan with slight handheld grit for texture (3–6 s). 
Show the freight truck (long-haul, aerodynamic trailer) overtaking center-frame on an interstate, high-side smooth tracking with subtle rack-focus on the cab (6–9 s). 
Show the crane truck (urban, boom stowed) turning a corner into frame, tight low-angle tracking to emphasize silhouette and hydraulics (9–12 s). 
Include a confident English voiceover: "Built to work, engineered to last." (VO across shots) and energetic corporate-rock music with driving drums, electric guitar and synth pad; use late-afternoon golden-hour locations (highway, construction road, interstate, urban avenue) with warm key sunlight, soft fill and rim highlights on metal to create a powerful, reliable mood.
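
The prompts in Tests #3 and #4 mark each cut with a time range. Keeping those ranges consistent by hand is error-prone, so here is a hypothetical helper that derives them from per-shot durations. This is only a string-building convenience of my own. Nothing in the API requires this format.

```ruby
# Hypothetical helper: takes [description, duration_in_seconds] pairs and
# appends a running time range such as "(0–3 s)" to each description.
def timed_shot_prompt(shots)
  start = 0
  lines = shots.map do |description, duration|
    finish = start + duration
    line = "#{description} (#{start}–#{finish} s)."
    start = finish
    line
  end
  lines.join("\n")
end
```

The durations should add up to the requested video length (4, 8, or 12 seconds). Audio and lighting instructions can be appended as a final line.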

Test #5: Non-English Prompt

Can we supply our prompt in a language other than English?

展示主角：戴猫咪面具、黑短发、穿整齐办公套装的年轻女性，右手握铁锅，左手提橘色急救箱，穿过一扇厚重门向外走；
镜头从背后跟拍并轻微跟随移动（0–4 s），环境为城市废墟，黄昏，弱主光配冷色补光，烟雾与尘埃营造荒凉氛围。
切到俯瞰镜头由上往下缓慢下降并拉远，看到她从门口走出并停下环顾四周，镜头慢速倾斜并轻微放大（4–8 s）；
保留破碎城市的逆光与rim light，長陰影与塵霧層次。转为主角视角向上看向撕裂的天空并带微幅手持抖動與上移（8–12 s），
讓她低聲說 "Good Bye PJ"（英语，台詞存在，短促、有回声）。
加入背景音樂（存在）：稀薄環境氛圍樂，哀傷合成墊與遙遠鋼琴、低頻打擊，節奏慢，音量在台詞前後微微起伏以烘托感傷。

English translation of the prompt above:

Show the protagonist: a young woman wearing a cat mask, with short black hair and a neat office suit, holding an iron wok in her right hand and carrying an orange first-aid kit in her left, walking out through a heavy door;
the camera follows her from behind with slight tracking movement (0–4 s); the setting is urban ruins at dusk, with a weak key light and cool fill light, and smoke and dust creating a desolate atmosphere.
Cut to an overhead shot that slowly descends and pulls back, showing her step out of the doorway and stop to look around, with the camera tilting slowly and zooming in slightly (4–8 s);
keep the backlighting and rim light of the shattered city, with long shadows and layers of dust haze. Switch to the protagonist's point of view looking up at a torn sky, with slight handheld shake and upward movement (8–12 s),
and have her say softly "Good Bye PJ" (English; dialogue present; short, with echo).
Add background music (present): sparse ambient music with a mournful synth pad, distant piano, and low-frequency percussion; slow tempo, with the volume rising and falling slightly around the line to heighten the sadness.

It works! There are some minor issues, such as the missing first-aid box in the first cut and the incorrect subtitle in the third cut. These mistakes could probably be fixed with better prompts.

Test #6: Edit the video via remix

If the video mostly matches expectations but has some minor flaws, you can supply an additional prompt to fix it via the remix endpoint.

https://api.openai.com/v1/videos/{video_id}/remix

Let's first generate a video of a dancing zebra:

The video shows a zebra dancing with its back to the camera. We want it to face the camera instead. Let's try an additional prompt:

face the camera while continuing dancing

And there we go: the main actor and the background remain the same, but the action has changed.

With the remix endpoint, you cannot change the model, size, or duration. The only input is the prompt, and a new video ID is returned.
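
Here is a minimal sketch of a remix call using Ruby's standard library. The method names are hypothetical, and I am assuming the endpoint expects a POST with a JSON body whose only field is "prompt".

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a remix request for an existing video.
# Assumption: the endpoint takes a JSON body containing only "prompt".
def build_remix_request(api_key, video_id, prompt)
  uri = URI("https://api.openai.com/v1/videos/#{video_id}/remix")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(prompt: prompt)
  request
end

# Sends the request and returns the parsed reply, which describes a new video.
def remix_video(api_key, video_id, prompt)
  request = build_remix_request(api_key, video_id, prompt)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  JSON.parse(response.body)
end
```

The returned hash carries its own "id", so the remixed video has to be polled and downloaded like any other job.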

# response
{
  "id": "video_2",
  "object": "video",
  "created_at": 1760249237,
  "status": "queued",
  "completed_at": null,
  "error": null,
  "expires_at": null,
  "model": "sora-2",
  "progress": 0,
  "remixed_from_video_id": "video_1",
  "seconds": "4",
  "size": "1280x720"
}

Observation #1: Watermark

Unlike videos generated via the mobile app, API-generated videos have no visible watermark.

Observation #2: Forever in-progress bug?

In some of the tests, the video was never generated, and its status still showed "in progress" hours later. Several other developers appear to have reported the same problem on OpenAI's community forums, but there is no official statement so far. In addition, the request was still billed and my credit was deducted.

{"video_status" =>
  {"id" => "video_68e5...",
   "object" => "video",
   "created_at" => 1759853145,
   "status" => "in_progress",
   "completed_at" => nil,
   "error" => nil,
   "expires_at" => nil,
   "model" => "sora-2",
   "progress" => 10,
   "remixed_from_video_id" => nil,
   "seconds" => "8",
   "size" => "1280x720"}
}
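
Until this is fixed, a client can at least detect such jobs instead of polling forever. Here is a minimal sketch that uses the "status" and "created_at" fields from the response above. The method name and the one-hour threshold are my own arbitrary choices.

```ruby
# Flags a job that is still unfinished long after it was created.
# The one-hour default is an arbitrary threshold chosen for illustration.
def stuck?(video, now: Time.now.to_i, max_age_seconds: 3600)
  return false unless %w[queued in_progress].include?(video["status"])
  now - video["created_at"] > max_age_seconds
end
```

A job flagged this way can be logged with its ID, which makes it easier to raise with support later.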

Hope they fix this issue and refund my credit. :)

OpenAI Sora 2 API first experience