Video Generation

The video generation tool lets the agent take a prompt and optional reference media (image / video / audio), render a video in the background, and drop it straight onto a path in the agent or space Drive.

How it works at a glance

Asynchronous. The call returns right away; you can keep chatting while the video renders.
Written to Drive. On success the video lands on the path you chose, and the message bubble shows a preview card.
Cancellable. Queued or running jobs can be stopped from the chat UI.

Enabling the tool

Open Agent Settings → Tools → Video and toggle the group on. If the group isn't visible in your account, video generation isn't available in your current environment — reach out to your admin.

Input parameters

Parameter	Required	Notes
`prompt`	yes	Natural-language description of the video
`file_path`	yes	Output path. Relative paths land on the agent Drive; `/space/...` writes to the space Drive
`attachments`	no	Array of reference media — each entry is a URL, asset ID, Drive path string, or explicit object `{ path? / url?, type?, role?, name?, mimeType? }`
`model`	no	Override the default model
`durationSeconds`	no	4–15 seconds; the usable range depends on the model tier (see Capability matrix)
`aspectRatio`	no	`adaptive / 21:9 / 16:9 / 4:3 / 1:1 / 3:4 / 9:16`
`resolution`	no	`480p / 720p / 1080p` (not every model supports 1080p)
`generateAudio`	no	Defaults to on
`watermark`	no	Defaults to off
`returnLastFrame`	no	Ask for the final frame so you can chain another clip
`webSearch`	no	Allow reference web search for pure text-to-video prompts
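
As an illustration of how these parameters fit together, the sketch below assembles a parameter dictionary and rejects values outside the enumerations listed in the table. The helper name `build_video_request` is hypothetical and not part of the tool; the agent normally fills in these parameters itself, so treat this only as a way to check values before asking for a render.

```python
# Allowed values copied from the parameter table above.
ASPECT_RATIOS = {"adaptive", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
RESOLUTIONS = {"480p", "720p", "1080p"}
OPTIONAL_KEYS = {
    "attachments", "model", "durationSeconds", "aspectRatio", "resolution",
    "generateAudio", "watermark", "returnLastFrame", "webSearch",
}


def build_video_request(prompt, file_path, **options):
    """Hypothetical helper: assemble and sanity-check tool parameters."""
    if not prompt or not file_path:
        raise ValueError("prompt and file_path are required")

    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    ratio = options.get("aspectRatio")
    if ratio is not None and ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {ratio}")

    resolution = options.get("resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")

    # Only the parameters the caller set are included; everything else
    # is left to the tool's own defaults.
    return {"prompt": prompt, "file_path": file_path, **options}


request = build_video_request(
    "A red kite drifting over a foggy harbor at dawn, slow aerial pan",
    "videos/kite.mp4",
    aspectRatio="16:9",
    resolution="720p",
    returnLastFrame=True,
)
```

Model-dependent limits such as duration are deliberately not checked here, since they vary by tier.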

`role` values inside `attachments`

role	Media type	Typical use
`reference_image`	image	General reference (style, subject)
`first_frame`	image	First frame of the video
`last_frame`	image	Last frame — pair with `first_frame` for first-to-last-frame mode
`reference_video`	video	Reference clip for edit / extend flows
`reference_audio`	audio	Reference audio for voiceover or background

If you omit `role`, a sensible default is picked based on the media type.
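
The exact defaults are not listed here, so the sketch below only shows what such a fallback could look like. The mapping in `ASSUMED_DEFAULT_ROLES` and the helper `with_default_role` are assumptions for illustration, not the tool's documented behavior; set `role` explicitly whenever the distinction matters (for example `first_frame` versus `reference_image`).

```python
import mimetypes

# ASSUMPTION: one plausible default per media type; the tool's real
# defaults are not documented in this page.
ASSUMED_DEFAULT_ROLES = {
    "image": "reference_image",
    "video": "reference_video",
    "audio": "reference_audio",
}


def with_default_role(attachment):
    """Return a copy of an attachment object with `role` filled in."""
    if attachment.get("role"):
        return dict(attachment)  # an explicit role always wins

    mime = attachment.get("mimeType")
    if mime is None:
        # Fall back to guessing from the file extension.
        source = attachment.get("path") or attachment.get("url") or ""
        mime, _ = mimetypes.guess_type(source)

    media_type = (mime or "").split("/")[0]
    if media_type not in ASSUMED_DEFAULT_ROLES:
        raise ValueError(f"cannot infer media type for {attachment!r}")
    return {**attachment, "role": ASSUMED_DEFAULT_ROLES[media_type]}


attachments = [
    with_default_role({"path": "shots/hero.png"}),
    with_default_role({"path": "shots/end.png", "role": "last_frame"}),
    with_default_role({"url": "https://example.com/voice", "mimeType": "audio/mpeg"}),
]
```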

Output path

Relative path (e.g. `videos/demo.mp4`): writes to the agent Drive
Absolute path starting with `/space/`: writes to the space Drive
A missing extension is normalized to `.mp4`
The chat bubble's video card renders a preview directly from this path
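
The path rules above can be sketched as a small function. `resolve_output` is a hypothetical name, and rejecting absolute paths outside `/space/` is an assumption made here because that case is not described.

```python
import posixpath


def resolve_output(file_path):
    """Return (drive, normalized_path) for an output path."""
    if file_path.startswith("/space/"):
        drive = "space"
    elif file_path.startswith("/"):
        # ASSUMPTION: other absolute paths are not covered by the rules
        # above, so this sketch refuses them instead of guessing.
        raise ValueError(f"undocumented absolute path: {file_path}")
    else:
        drive = "agent"

    # A path without an extension gets .mp4 appended.
    _, extension = posixpath.splitext(file_path)
    if not extension:
        file_path += ".mp4"
    return drive, file_path
```

For example, `resolve_output("videos/demo")` yields `("agent", "videos/demo.mp4")`, while `resolve_output("/space/launch/teaser.mp4")` yields `("space", "/space/launch/teaser.mp4")`.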

Capability matrix

Capability varies by model tier — pick one based on what you need:

Tier	Text→Video	First frame	First+last	Multimodal ref (image/video/audio)	Edit	Extend	Max resolution	Duration
Flagship	✓	✓	✓	Full	✓	✓	1080p	4–15s
Flagship fast	✓	✓	✓	Full	✓	✓	720p	4–15s
Pro	✓	✓	✓	Image only	✗	✗	1080p	4–12s
Lite i2v	✗	✓	✗	Image only	✗	✗	720p	2–12s
Lite t2v	✓	✗	✗	✗	✗	✗	720p	2–12s

Flagship tiers can output audio when `generateAudio` is `true`.
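
To make the matrix easier to query, the sketch below transcribes it into data and filters tiers by resolution, duration, and required capabilities. The `Tier` class and `tiers_for` function are hypothetical helpers; the tier names are the ones in the table and are not necessarily valid values for the `model` parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    text_to_video: bool
    first_frame: bool
    first_last: bool
    reference: str        # "full", "image", or "none"
    edit: bool
    extend: bool
    max_resolution: int   # vertical pixels, e.g. 1080 for 1080p
    min_seconds: int
    max_seconds: int


# Rows transcribed from the capability matrix above.
TIERS = [
    Tier("Flagship", True, True, True, "full", True, True, 1080, 4, 15),
    Tier("Flagship fast", True, True, True, "full", True, True, 720, 4, 15),
    Tier("Pro", True, True, True, "image", False, False, 1080, 4, 12),
    Tier("Lite i2v", False, True, False, "image", False, False, 720, 2, 12),
    Tier("Lite t2v", True, False, False, "none", False, False, 720, 2, 12),
]


def tiers_for(resolution=720, seconds=5, needs=()):
    """List tier names that satisfy the request.

    `needs` holds names of boolean Tier fields, e.g. ("edit",).
    """
    matches = []
    for tier in TIERS:
        if tier.max_resolution < resolution:
            continue
        if not tier.min_seconds <= seconds <= tier.max_seconds:
            continue
        if all(getattr(tier, need) for need in needs):
            matches.append(tier.name)
    return matches
```

For instance, `tiers_for(resolution=1080, seconds=5, needs=("text_to_video",))` returns `["Flagship", "Pro"]`, and a 3-second clip from a first frame, `tiers_for(seconds=3, needs=("first_frame",))`, leaves only `["Lite i2v"]`.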

Media specs

Images — jpeg / png / webp / bmp / tiff / gif / heic / heif. ≤ 30 MB each, aspect ratio 0.4–2.5, side length 300–6000 px. Counts: 1 for first-frame, 2 for first+last, 1–9 for multimodal reference, 1–4 for lite reference.

Videos — mp4 / mov, H.264 or H.265 + AAC / MP3. 2–15 s each, up to 3 clips, ≤ 15 s total. 480p / 720p / 1080p, 24–60 fps.

Audio — wav / mp3, 2–15 s each, up to 3 clips, ≤ 15 s total, ≤ 15 MB each.
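
A pre-flight check against these limits can save a failed job. The sketch below works on metadata you already have (format, size, dimensions, clip lengths) rather than reading files. The function names are hypothetical, and two details are assumptions: aspect ratio is taken as width divided by height, and 1 MB is taken as 1024 × 1024 bytes.

```python
IMAGE_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif", "heic", "heif"}
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # ASSUMPTION: 1 MB = 1024 * 1024 bytes


def image_problems(fmt, size_bytes, width, height):
    """Return a list of human-readable violations for one image."""
    problems = []
    if fmt.lower() not in IMAGE_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 30 MB")
    # ASSUMPTION: aspect ratio means width / height.
    if not 0.4 <= width / height <= 2.5:
        problems.append("aspect ratio is outside 0.4-2.5")
    if not all(300 <= side <= 6000 for side in (width, height)):
        problems.append("side length is outside 300-6000 px")
    return problems


def clip_problems(durations):
    """Check video or audio clip lengths (seconds) against the limits."""
    problems = []
    if len(durations) > 3:
        problems.append("more than 3 clips")
    if any(not 2 <= d <= 15 for d in durations):
        problems.append("each clip must be 2-15 s")
    if sum(durations) > 15:
        problems.append("total length exceeds 15 s")
    return problems
```

An empty list means the media passes these checks; codec, frame rate, and per-role image counts are not covered by this sketch.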

Prompt tips

Formula: subject + action, scene + action, camera + action.

Be concrete. Avoid stacking abstract adjectives.
Front-load the important parts (subject, action, camera).
Iterate on prompt first, then reference media; swap vague terms for specific descriptions.
Text-to-video is high-variance: use it to explore ideas, and use image-to-video when you need a stable look.
When using image-to-video, upload a high-quality first frame; frame quality strongly influences the result.

Aspect ratio and cropping

If `aspectRatio` differs from the input image's ratio, the backend center-crops the image along the shorter dimension, keeping the crop region fully inside the original. To limit how much is cropped away, keep `aspectRatio` close to the input image's ratio, or use `adaptive` to let the backend match it.
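
To estimate how much of a first frame survives the crop, the sketch below computes the largest centered rectangle with the requested ratio that fits inside the source image. This is one plausible reading of the cropping rule, not the backend's confirmed algorithm; `center_crop_box` is a hypothetical helper and its rounding (truncation to whole pixels) is an assumption.

```python
from fractions import Fraction


def center_crop_box(width, height, aspect_ratio):
    """Return (left, top, crop_width, crop_height) in pixels."""
    if aspect_ratio == "adaptive":
        # The backend matches the input ratio, so nothing is cropped.
        return 0, 0, width, height

    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    target = Fraction(ratio_w, ratio_h)

    if Fraction(width, height) > target:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = int(height * target), height
    else:
        # Source is taller than (or equal to) the target: keep full width.
        crop_w, crop_h = width, int(width / target)

    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h
```

Under this reading, a 1920×1080 image requested at `1:1` keeps only the central 1080×1080 region, `center_crop_box(1920, 1080, "1:1")` giving `(420, 0, 1080, 1080)`, which shows why a mismatched ratio can cut off subjects near the edges.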

Cancelling a job

Queued or running jobs can be stopped directly from the message bubble. The status flips to "cancelled" right away and no more work is done in the background.

Limits

Intermediate state and the temporary video URL are kept for 24 hours and then cleaned up; videos already written to Drive are unaffected.
Reference media containing real human faces is rejected.
Per-account RPM and concurrency limits apply; if you hit a rate limit, the error surfaces in the message bubble.
Generation time depends on length, resolution, and model — usually 30 s to a few minutes. Closing the session pauses status updates; reopening the session resumes them.