Advanced configuration for the dual-stream engine.
Thinking Tokens
MotionLab operates differently from standard diffusion models. Before generation begins, our engine utilizes a proprietary "Thinking Token" architecture.
The Implication: You do not need to repeat yourself or "spam" the prompt with tags. State your intent clearly in natural language. The model "ponders" the relationship between elements to ensure the audio matches the visual physics.
Cinematic Structure
The engine is trained to understand cinematic structure. For optimal results, write your prompt as a screenplay scene in the present tense, following this template:
[Context/Setting] + [Subject Action] + [Audio Atmosphere] + [Dialogue]
Avoid: "Cool robot, fighting, explosions, loud noise, HD."
Prefer: "INT. HANGAR. A rusted battle-droid repairs its own arm. Sparks fly with a metallic hissing sound. The droid looks up and whispers: 'System critical.'"
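The template above can be assembled programmatically when prompts are generated in bulk. The following is a minimal sketch, not part of MotionLab: the function name build_scene_prompt and its parameters are hypothetical, and it only joins the four slots into present-tense prose.

```python
def build_scene_prompt(setting: str, action: str, audio: str, dialogue: str = "") -> str:
    """Join the four template slots into one screenplay-style prompt.

    Hypothetical helper: slot order follows
    [Context/Setting] + [Subject Action] + [Audio Atmosphere] + [Dialogue].
    """
    sentences = []
    for slot in (setting, action, audio, dialogue):
        text = slot.strip()
        if not text:
            continue  # empty slots are skipped
        # Close each slot with sentence punctuation so the slots read as prose.
        if text[-1] not in ".!?'\"":
            text += "."
        sentences.append(text)
    return " ".join(sentences)


prompt = build_scene_prompt(
    setting="INT. HANGAR",
    action="A rusted battle-droid repairs its own arm",
    audio="Sparks fly with a metallic hissing sound",
    dialogue='The droid looks up and whispers: "System critical."',
)
print(prompt)
```

Keeping each slot as a full sentence, rather than a comma-separated tag, matches the natural-language style recommended above.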
Audio-Visual Sync
We utilize an Asymmetric Dual-Stream Architecture (14B Video / 5B Audio). This means audio is not an afterthought; it is generated simultaneously and shares timing with the video.
To trigger lip-sync, place the spoken line inside quotation marks and describe the emotion of the voice before the quote (e.g., Screaming in terror: "Watch out!").
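The quoting rule above is easy to break when dialogue comes from user input that already contains quotation marks. A small sketch of one way to format dialogue consistently; format_dialogue and quoted_spans are hypothetical helpers, and the regular expression is only a local preview of which text is quoted, not a description of how the engine parses prompts.

```python
import re


def format_dialogue(emotion: str, line: str) -> str:
    """Return 'emotion: "line"', with the emotion placed before the quote."""
    # Swap embedded double quotes for single quotes so the spoken line
    # stays a single quoted span.
    spoken = line.strip().replace('"', "'")
    return f'{emotion.strip().rstrip(":")}: "{spoken}"'


def quoted_spans(prompt: str) -> list[str]:
    """List every double-quoted span in the prompt, in order of appearance."""
    return re.findall(r'"([^"]+)"', prompt)


cue = format_dialogue("Screaming in terror", "Watch out!")
print(cue)
print(quoted_spans('The droid whispers: "System critical." Then silence.'))
```

Running quoted_spans over a finished prompt is a quick way to confirm that only intended dialogue sits inside quotation marks.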
Camera
Do not describe camera movements in the text prompt.
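A pre-flight check can catch camera directions before a prompt is submitted. This is a heuristic sketch only: the term list below is illustrative and not taken from MotionLab, find_camera_terms is a hypothetical name, and whole-word matching will still produce false positives (a frying "pan", for instance).

```python
import re

# Illustrative term list (an assumption); extend it to suit your prompts.
CAMERA_TERMS = ("pan", "tilt", "zoom", "dolly", "tracking shot", "crane shot")


def find_camera_terms(prompt: str) -> list[str]:
    """Return the camera terms that occur in the prompt as whole words."""
    found = []
    for term in CAMERA_TERMS:
        # Allow an optional plural/third-person "s" (e.g. "pans", "zooms").
        pattern = rf"\b{re.escape(term)}s?\b"
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            found.append(term)
    return found


print(find_camera_terms("Slow zoom on the droid, then the camera pans left."))
print(find_camera_terms("A rusted battle-droid repairs its own arm."))
```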
VJ & Animation
MotionLab is specialized for continuous motion.
Use keywords like "Seamless loop," "Cyclic motion," and "Infinite flow."
Keep the background description minimal, or specify a solid background, if you intend to use the Extraction Module later for transparent overlays.
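The two tips above can be combined in a small prompt builder for loop content. This is a sketch under stated assumptions: build_loop_prompt is a hypothetical helper, and the example background wording ("Solid black background") is illustrative rather than a documented MotionLab value.

```python
# Keywords quoted from the guidance above.
LOOP_KEYWORDS = ("Seamless loop", "Cyclic motion", "Infinite flow")


def build_loop_prompt(subject: str, keywords=LOOP_KEYWORDS[:1], background: str = "") -> str:
    """Compose a loop prompt: subject, optional plain background, loop keywords."""
    sentences = [subject.strip().rstrip(".") + "."]
    if background.strip():
        # A short background sentence keeps the scene simple for later extraction.
        sentences.append(background.strip().rstrip(".") + ".")
    if keywords:
        sentences.append(", ".join(keywords) + ".")
    return " ".join(sentences)


loop_prompt = build_loop_prompt(
    "Liquid chrome ribbons ripple",
    keywords=("Seamless loop", "Cyclic motion"),
    background="Solid black background",
)
print(loop_prompt)
```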
Negative Prompt
To ensure structural integrity, instruct the engine on what to reject.