Buda

Video Generation

Turn text or reference media into videos with the generate_video tool.

The video generation tool lets the agent take a prompt plus optional reference media (image, video, or audio), render a video in the background, and write the result directly to a path in the agent Drive or the space Drive.

How it works at a glance

  • Asynchronous. The call returns right away; you can keep chatting while the video renders.
  • Written to Drive. On success, the video is saved to the path you chose, and the message bubble shows a preview card.
  • Cancellable. Queued or running jobs can be stopped from the chat UI.

Enabling the tool

Open Agent Settings → Tools → Video and toggle the group on. If the group isn't visible in your account, video generation isn't available in your current environment; contact your admin.

Input parameters

| Parameter | Required | Notes |
| --- | --- | --- |
| prompt | yes | Natural-language description of the video |
| file_path | yes | Output path. Relative paths write to the agent Drive; /space/... writes to the space Drive |
| attachments | no | Array of reference media. Each entry is a URL, asset ID, Drive path string, or explicit object { path? / url?, type?, role?, name?, mimeType? } |
| model | no | Overrides the default model |
| durationSeconds | no | 4–15 seconds; the allowed range depends on the model tier (see Capability matrix) |
| aspectRatio | no | adaptive / 21:9 / 16:9 / 4:3 / 1:1 / 3:4 / 9:16 |
| resolution | no | 480p / 720p / 1080p (not every model supports 1080p) |
| generateAudio | no | Defaults to on |
| watermark | no | Defaults to off |
| returnLastFrame | no | Also return the final frame so you can chain another clip |
| webSearch | no | Allow web search for reference material; applies only to pure text-to-video prompts |
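
A minimal sketch of assembling the arguments for a generate_video call, assuming the tool accepts a JSON-style object keyed by the parameter names in the table above. build_generate_video_args and its snake_case keyword names are hypothetical and exist only for illustration; only the output keys come from this page. Duration is not range-checked because the valid range depends on the model tier.

```python
from typing import Any, Optional

# Allowed values copied from the parameter table.
ASPECT_RATIOS = {"adaptive", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
RESOLUTIONS = {"480p", "720p", "1080p"}


def build_generate_video_args(
    prompt: str,
    file_path: str,
    *,
    attachments: Optional[list] = None,
    duration_seconds: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
    generate_audio: Optional[bool] = None,
    return_last_frame: Optional[bool] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the argument object for generate_video."""
    if not prompt or not file_path:
        raise ValueError("prompt and file_path are required")
    if aspect_ratio is not None and aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio}")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")

    args: dict[str, Any] = {"prompt": prompt, "file_path": file_path}
    # Omit optional fields the caller did not set, so the tool's own
    # defaults apply (e.g. generateAudio on, watermark off).
    optional = {
        "attachments": attachments,
        "durationSeconds": duration_seconds,
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
        "generateAudio": generate_audio,
        "returnLastFrame": return_last_frame,
    }
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


args = build_generate_video_args(
    "A paper boat drifts down a rain-soaked street, camera tracking low",
    "videos/paper-boat.mp4",
    aspect_ratio="16:9",
    duration_seconds=8,
)
print(args)
```

Leaving unset fields out, rather than sending explicit defaults, keeps the call short and lets the tool decide anything the caller did not care about.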

role values inside attachments

| role | Media type | Typical use |
| --- | --- | --- |
| reference_image | image | General reference (style, subject) |
| first_frame | image | First frame of the video |
| last_frame | image | Last frame; pair with first_frame for first-to-last-frame mode |
| reference_video | video | Reference clip for edit / extend flows |
| reference_audio | audio | Reference audio for voiceover or background |

If you omit role, a sensible default is picked based on the media type.
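
A hypothetical pre-flight check that each attachment's role matches the media type in the role table, followed by a first-to-last-frame attachment list. The check_attachment_roles name, the frame paths, and the assumption that the object's type field holds "image", "video", or "audio" are illustrative, not confirmed by this page.

```python
from typing import Any

# Role -> expected media type, copied from the role table.
ROLE_MEDIA_TYPE = {
    "reference_image": "image",
    "first_frame": "image",
    "last_frame": "image",
    "reference_video": "video",
    "reference_audio": "audio",
}


def check_attachment_roles(attachments: list[Any]) -> None:
    """Hypothetical check; raises ValueError on an unknown role or a role/type mismatch."""
    for att in attachments:
        if not isinstance(att, dict):
            continue  # plain URL / asset ID / Drive path strings carry no role
        role = att.get("role")
        if role is None:
            continue  # the tool picks a default role from the media type
        expected = ROLE_MEDIA_TYPE.get(role)
        if expected is None:
            raise ValueError(f"unknown role: {role}")
        media_type = att.get("type")  # assumed to be "image" / "video" / "audio"
        if media_type is not None and media_type != expected:
            raise ValueError(f"role {role} expects {expected}, got {media_type}")


# First-to-last-frame mode: one first_frame image plus one last_frame image.
first_to_last = [
    {"path": "frames/start.png", "type": "image", "role": "first_frame"},
    {"path": "frames/end.png", "type": "image", "role": "last_frame"},
]
check_attachment_roles(first_to_last)
```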

Output path

  • Relative path (e.g. videos/demo.mp4) — writes to the agent Drive
  • Absolute path starting with /space/ — writes to the space Drive
  • If the path has no extension, .mp4 is appended
  • The chat bubble's video card renders a preview directly from this path
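
The path rules above can be mirrored client-side, for example to predict where a file will land before calling the tool. This is a sketch; resolve_output_path and the "agent" / "space" labels are hypothetical. Absolute paths outside /space/ are rejected because this page does not say how they are handled.

```python
import posixpath


def resolve_output_path(file_path: str) -> tuple[str, str]:
    """Hypothetical helper mirroring the output-path rules.

    Returns (drive, path), where drive is "agent" or "space".
    """
    if file_path.startswith("/space/"):
        drive = "space"
    elif file_path.startswith("/"):
        # Only relative paths and /space/ paths are documented.
        raise ValueError(f"unsupported absolute path: {file_path}")
    else:
        drive = "agent"
    _, ext = posixpath.splitext(file_path)
    if not ext:
        file_path += ".mp4"  # a missing extension becomes .mp4
    return drive, file_path


print(resolve_output_path("videos/demo"))           # ('agent', 'videos/demo.mp4')
print(resolve_output_path("/space/launch/teaser"))  # ('space', '/space/launch/teaser.mp4')
```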

Capability matrix

Capabilities vary by model tier; pick a tier based on what you need:

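| Tier | Text→Video | First frame | First+last | Multimodal ref (image/video/audio) | Edit | Extend | Max resolution | Duration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Flagship | | | | Full | | | 1080p | 4–15 s |
| Flagship fast | | | | Full | | | 720p | 4–15 s |
| Pro | | | | Image only | | | 1080p | 4–12 s |
| Lite i2v | | | | Image only | | | 720p | 2–12 s |
| Lite t2v | | | | | | | 720p | 2–12 s |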

Flagship tiers can output audio when generateAudio is true.
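
A sketch for narrowing down tiers by the two numeric columns of the capability matrix, maximum resolution and duration. The tier names are the display labels from the matrix, not model IDs to pass as model; tiers_supporting is hypothetical, and the feature columns (first frame, edit, extend, and so on) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_lines: int    # vertical resolution, e.g. 1080 for 1080p
    min_seconds: int
    max_seconds: int


# Max resolution and duration per tier, copied from the capability matrix.
TIERS = [
    Tier("Flagship", 1080, 4, 15),
    Tier("Flagship fast", 720, 4, 15),
    Tier("Pro", 1080, 4, 12),
    Tier("Lite i2v", 720, 2, 12),
    Tier("Lite t2v", 720, 2, 12),
]


def tiers_supporting(resolution: str, seconds: int) -> list[str]:
    """Hypothetical helper: tiers whose limits cover the requested output."""
    lines = int(resolution.rstrip("p"))  # "1080p" -> 1080
    return [
        t.name
        for t in TIERS
        if lines <= t.max_lines and t.min_seconds <= seconds <= t.max_seconds
    ]


print(tiers_supporting("1080p", 15))  # ['Flagship']
print(tiers_supporting("720p", 3))    # ['Lite i2v', 'Lite t2v']
```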

Media specs

Images — jpeg / png / webp / bmp / tiff / gif / heic / heif. ≤ 30 MB each, aspect ratio 0.4–2.5, side length 300–6000 px. Counts: 1 for first-frame, 2 for first+last, 1–9 for multimodal reference, 1–4 for lite reference.
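
A hypothetical pre-flight check against the image limits above, useful for catching a rejected upload before starting a job. It assumes "30 MB" means 30 × 1024 × 1024 bytes (the backend may count 10^6 bytes per MB) and does not check the per-mode image counts.

```python
ALLOWED_IMAGE_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif", "heic", "heif"}
FORMAT_ALIASES = {"jpg": "jpeg", "tif": "tiff"}
# Assumption: "30 MB" read as 30 * 1024 * 1024 bytes; the backend may use 10**6.
MAX_IMAGE_BYTES = 30 * 1024 * 1024


def image_problems(fmt: str, size_bytes: int, width: int, height: int) -> list[str]:
    """Hypothetical check; an empty list means the image passes these limits.

    Assumes width and height are positive pixel counts.
    """
    problems = []
    fmt = FORMAT_ALIASES.get(fmt.lower(), fmt.lower())
    if fmt not in ALLOWED_IMAGE_FORMATS:
        problems.append(f"format {fmt} is not supported")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 30 MB")
    # The 0.4-2.5 range is symmetric (2.5 == 1 / 0.4), so width/height and
    # height/width give the same result.
    ratio = width / height
    if not 0.4 <= ratio <= 2.5:
        problems.append(f"aspect ratio {ratio:.2f} is outside 0.4-2.5")
    for side in (width, height):
        if not 300 <= side <= 6000:
            problems.append(f"side of {side} px is outside 300-6000 px")
    return problems


print(image_problems("png", 2_000_000, 1920, 1080))  # []
print(image_problems("jpg", 2_000_000, 3000, 1000))  # ['aspect ratio 3.00 is outside 0.4-2.5']
```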

Videos — mp4 / mov, H.264 or H.265 + AAC / MP3. 2–15 s each, up to 3 clips, ≤ 15 s total. 480p / 720p / 1080p, 24–60 fps.

Audio — wav / mp3, 2–15 s each, up to 3 clips, ≤ 15 s total, ≤ 15 MB each.
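
Reference videos and reference audio share the same count and duration limits, so one hypothetical helper can check either list; call it once for the video clips and once for the audio clips. This sketch covers only counts and durations, not formats, codecs, frame rates, or the 15 MB audio size cap.

```python
def clip_problems(durations_s: list[float]) -> list[str]:
    """Hypothetical check: 2-15 s per clip, at most 3 clips, at most 15 s total."""
    problems = []
    if len(durations_s) > 3:
        problems.append(f"{len(durations_s)} clips; at most 3 allowed")
    for i, d in enumerate(durations_s):
        if not 2 <= d <= 15:
            problems.append(f"clip {i} is {d} s; each clip must be 2-15 s")
    total = sum(durations_s)
    if total > 15:
        problems.append(f"total {total} s exceeds 15 s")
    return problems


print(clip_problems([5, 5, 5]))  # []
print(clip_problems([8, 8]))     # ['total 16 s exceeds 15 s']
```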

Prompt tips

Formula: subject + action, scene + action, camera + action.

  • Be concrete. Avoid stacking abstract adjectives.
  • Front-load the important parts (subject, action, camera).
  • Iterate on the prompt first, then on the reference media; replace vague terms with specific descriptions.
  • Text-to-video is high-variance — use it to prospect ideas; use image-to-video when you need a stable look.
  • When using image-to-video, upload a high-quality first frame; frame quality strongly influences the result.
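
The prompt formula can be applied mechanically: keep one concrete clause per part and put subject + action first. compose_prompt is a hypothetical helper and the sample clauses are invented for illustration; the tool itself simply takes the resulting string as prompt.

```python
def compose_prompt(subject_action: str, scene_action: str = "", camera_action: str = "") -> str:
    """Hypothetical helper: join the formula's parts in order, subject first."""
    if not subject_action.strip():
        raise ValueError("subject + action is required")
    parts = []
    for part in (subject_action, scene_action, camera_action):
        part = part.strip().rstrip(".")
        if part:
            parts.append(part[0].upper() + part[1:])  # capitalize each clause
    return ". ".join(parts) + "."


print(compose_prompt(
    "a golden retriever leaps to catch a red frisbee",
    "an empty beach at sunset, waves breaking behind it",
    "slow-motion tracking shot at the dog's eye level",
))
```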

Aspect ratio and cropping

If aspectRatio differs from the input image's ratio, the backend center-crops the image to the requested ratio, trimming evenly from both edges of one dimension so the crop region stays fully inside the original. To avoid losing content, keep aspectRatio close to the input image's ratio, or use adaptive to let the backend match it.
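
To preview how much of an image survives the crop, here is the usual largest-centered-crop computation. This is a sketch: it assumes the backend keeps the largest centered region that fits, and its pixel rounding may differ. center_crop_box is hypothetical, and treating adaptive as "no crop" is an assumption.

```python
def center_crop_box(width: int, height: int, aspect_ratio: str) -> tuple[int, int, int, int]:
    """Hypothetical helper: largest centered crop with the requested ratio
    that fits inside a width x height image.

    Returns (left, top, crop_width, crop_height) in pixels.
    """
    if aspect_ratio == "adaptive":
        # Assumption: adaptive matches the input ratio, so nothing is cropped.
        return 0, 0, width, height
    rw, rh = (int(x) for x in aspect_ratio.split(":"))
    # Compare width/height with rw/rh using integers to avoid float rounding.
    if width * rh > height * rw:
        # Input is wider than the target: keep full height, trim left and right.
        crop_h = height
        crop_w = height * rw // rh
    else:
        # Input is taller than (or equal to) the target: keep full width,
        # trim top and bottom.
        crop_w = width
        crop_h = width * rh // rw
    return (width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h


print(center_crop_box(1920, 1080, "1:1"))   # (420, 0, 1080, 1080)
print(center_crop_box(1080, 1920, "16:9"))  # (0, 656, 1080, 607)
```

The second call shows why a mismatched ratio is costly: a portrait 1080×1920 image cropped to 16:9 keeps only 607 of its 1920 rows.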

Cancelling a job

Queued or running jobs can be stopped directly from the message bubble. The status flips to "cancelled" right away and no more work is done in the background.

Limits

  • Intermediate state and the temporary video URL are kept for 24 hours and then cleaned up; videos already written to Drive are unaffected.
  • Reference media containing real human faces is rejected.
  • Per-account limits on requests per minute (RPM) and concurrency apply; if you hit a rate limit, the error appears in the message bubble.
  • Generation time depends on length, resolution, and model — usually 30 s to a few minutes. Closing the session pauses status updates; reopening the session resumes them.
