Media Generation Workflows

Generated file — do not edit here. Source: JIGGAI/ClawKitchen / docs/MEDIA_GENERATION.md Commit: 91c806e4ff2380a3a4d2e465499c5ce91b15365f Edit: https://github.com/JIGGAI/ClawKitchen/blob/91c806e4ff2380a3a4d2e465499c5ce91b15365f/docs/MEDIA_GENERATION.md

Overview

ClawKitchen provides integrated media generation capabilities through workflow nodes that can create images, videos, and audio content. This system uses skill auto-discovery to work with various media generation providers while maintaining a consistent workflow interface.

Media Node Types

media-image

Generates static images from text prompts.

Common Use Cases:

Marketing visuals and hero images
Social media graphics
Product illustrations
Blog post featured images
Presentation slides

Configuration:

Agent Assignment: Team member who executes the generation
Image Prompt: Text description of desired image
Media Type: Set to “image” (auto-configured)
Provider: Auto-discovered from available skills

media-video

Creates video content from text descriptions.

Common Use Cases:

Product demo videos
Marketing commercials
Educational content
Social media video posts
Explainer animations

Configuration:

Agent Assignment: Team member who executes the generation
Video Prompt: Description of desired video content
Media Type: Set to “video” (auto-configured)
Provider: Auto-discovered from available skills

media-audio

Generates audio content including voiceovers and music.

Common Use Cases:

Podcast content
Voiceovers for videos
Background music
Audio advertisements
Training narrations

Configuration:

Agent Assignment: Team member who executes the generation
Audio Prompt: Description of desired audio content
Media Type: Set to “audio” (auto-configured)
Provider: Auto-discovered from available skills

Skill Auto-Discovery System

ClawKitchen automatically detects available media generation capabilities by scanning:

OpenClaw Skills Directory: ~/.openclaw/skills/
Workspace Skills: Team-specific skills in workspace
Supported Providers: CellCog, OpenAI, and custom skills

How Auto-Discovery Works

Skill Scanning: System scans for generation scripts in skill directories
Capability Detection: Identifies image, video, and audio generation scripts
Provider Selection: Chooses best available provider for each media type
Environment Setup: Configures API keys and settings from OpenClaw config
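
The scanning and capability-detection steps above can be pictured roughly as follows. This is a minimal sketch, not ClawKitchen's actual implementation: the script naming convention (`generate_image.*`, `generate_video.*`, `generate_audio.*`) and the function name `discover_media_skills` are assumptions made for illustration. Only the `~/.openclaw/skills/` location comes from this page.

```python
from pathlib import Path

# Hypothetical convention: a skill advertises a capability by shipping a
# script whose file stem is generate_<media type>.
MEDIA_TYPES = ("image", "video", "audio")


def discover_media_skills(skill_roots):
    """Map each media type to the skill names that appear to support it."""
    found = {media_type: [] for media_type in MEDIA_TYPES}
    for root in skill_roots:
        root = Path(root).expanduser()
        if not root.is_dir():
            continue  # a missing skills directory contributes nothing
        for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            # Collect the stems of every file anywhere inside the skill.
            stems = {f.stem for f in skill_dir.rglob("*") if f.is_file()}
            for media_type in MEDIA_TYPES:
                if f"generate_{media_type}" in stems:
                    found[media_type].append(skill_dir.name)
    return found
```

Calling `discover_media_skills(["~/.openclaw/skills"])` would return a dictionary with one list of skill names per media type. Choosing the "best" provider from each list is left out because this page does not describe the selection criteria.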

Supported Providers

CellCog Integration

Multi-modal AI: Support for images, videos, and audio
High Quality: Production-ready media generation
Coordination: Multi-agent orchestration for complex content

OpenAI Integration

DALL-E: Image generation via OpenAI API
Whisper: Audio processing (speech-to-text transcription)
API Integration: Direct OpenAI service integration

Custom Skills

Extensible: Add custom generation providers via skill system
Flexible: Support for specialized or proprietary generation tools
Configurable: Custom parameters and settings per provider

Workflow Integration Patterns

Sequential Generation

Create multiple related media assets in sequence:

Draft Content (LLM) → Generate Image → Generate Video → Approval → Publish

Example Use Case: Blog post with featured image and promotional video

Parallel Generation

Generate multiple media types simultaneously:

                  → Generate Hero Image
Draft Content →   → Generate Social Video  → Combine Assets → Approval  
                  → Generate Audio Clip

Example Use Case: Multi-platform marketing campaign
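
To illustrate the fan-out and combine shape of this pattern, here is a small sketch using Python's standard thread pool. The function `generate_asset` is a stand-in for a real provider call and is purely hypothetical. In ClawKitchen the branching is expressed through workflow nodes rather than code like this.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_asset(media_type, prompt):
    """Placeholder for a provider call; returns a fake artifact record."""
    return {"mediaType": media_type, "prompt": prompt}


def generate_in_parallel(requests):
    """Run all generation requests concurrently and keep the input order."""
    with ThreadPoolExecutor(max_workers=len(requests) or 1) as pool:
        futures = [pool.submit(generate_asset, m, p) for m, p in requests]
        # Collecting results in submission order is the "combine" step.
        return [future.result() for future in futures]


assets = generate_in_parallel([
    ("image", "Hero image for the spring campaign"),
    ("video", "15-second social teaser"),
    ("audio", "Short audio sting"),
])
```

Because results are gathered in submission order, the downstream "Combine Assets" step can rely on a stable ordering even though the branches finish at different times.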

Conditional Generation

Generate media based on content analysis:

Analyze Content → Route by Type → Generate Appropriate Media → Review → Publish

Example Use Case: Automatic media selection based on content category

Iterative Refinement

Generate, review, and refine media through multiple cycles:

Initial Generation → Human Review → Refine Prompt → Re-generate → Final Approval

Example Use Case: High-stakes marketing visuals requiring multiple revisions
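
The review loop can be sketched as a bounded retry in which reviewer feedback is folded into the next prompt. The names `refine_until_approved`, `generate`, and `review` are hypothetical. A real workflow would route the review through an approval node rather than a callback.

```python
def refine_until_approved(prompt, generate, review, max_rounds=3):
    """Regenerate until the reviewer approves or the round limit is hit.

    generate(prompt) returns an asset; review(asset) returns a tuple
    (approved, feedback). Both callables are supplied by the caller.
    """
    asset = None
    for round_number in range(1, max_rounds + 1):
        asset = generate(prompt)
        approved, feedback = review(asset)
        if approved:
            return {"asset": asset, "approved": True, "rounds": round_number}
        # Fold the reviewer's feedback into the next prompt.
        prompt = f"{prompt} Revision note: {feedback}"
    return {"asset": asset, "approved": False, "rounds": max_rounds}
```

Capping the number of rounds keeps generation cost predictable when a reviewer and a provider fail to converge.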

Prompt Engineering for Media

Image Prompts

Effective image prompts include:

Visual Style

Art style: “photorealistic”, “minimalist”, “hand-drawn”
Color scheme: “vibrant colors”, “monochromatic”, “sunset palette”
Composition: “centered subject”, “rule of thirds”, “close-up portrait”

Subject Matter

Main subject: “professional businesswoman”, “modern laptop”, “city skyline”
Context: “in office setting”, “on wooden desk”, “during golden hour”
Mood: “confident and approachable”, “sleek and modern”, “warm and inviting”

Technical Details

Resolution: “high resolution”, “4K quality”, “print ready”
Format: “landscape orientation”, “square format”, “vertical social media”
Lighting: “soft natural lighting”, “dramatic shadows”, “bright and airy”
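
The three groups above (visual style, subject matter, technical details) can be assembled from named parts so that every prompt covers the same ground. The helper below is an illustrative sketch. `build_image_prompt` and its parameters are hypothetical and not part of ClawKitchen.

```python
def build_image_prompt(subject, context="", mood="", style="", technical=()):
    """Join the non-empty prompt parts into one sentence-style prompt."""
    description = " ".join(part for part in (subject, context) if part)
    sentences = [description]
    if mood:
        sentences.append(f"Mood: {mood}")
    if style:
        sentences.append(f"Style: {style}")
    tech = ", ".join(t for t in technical if t)
    if tech:
        # Capitalize only the first character so terms like "4K" survive.
        sentences.append(tech[0].upper() + tech[1:])
    return ". ".join(sentences) + "."
```

For example, `build_image_prompt("modern laptop", context="on wooden desk", style="minimalist")` yields `modern laptop on wooden desk. Style: minimalist.`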

Example Effective Prompt:

"A modern, clean illustration of a workflow automation dashboard displayed on a sleek laptop screen. The dashboard shows colorful workflow nodes connected by flowing lines. The setting is a bright, contemporary office with soft natural lighting. Style: minimalist vector art with a technology color palette of blues and greens. High resolution, landscape orientation."

Video Prompts

Effective video prompts specify:

Content Structure

Duration: “30-second video”, “brief 10-second clip”, “2-minute explanation”
Pacing: “fast-paced montage”, “slow and contemplative”, “energetic presentation”
Narrative: “product demonstration”, “customer testimonial”, “feature walkthrough”

Visual Elements

Camera work: “smooth camera movements”, “close-up shots”, “wide establishing shots”
Transitions: “smooth fade transitions”, “quick cuts”, “seamless scene changes”
Text overlay: “minimal text labels”, “animated titles”, “call-to-action buttons”

Audio Elements

Background music: “upbeat electronic music”, “subtle ambient sounds”, “no background music”
Voiceover: “professional female narrator”, “energetic male voice”, “conversational tone”
Sound effects: “subtle interface sounds”, “ambient office noise”, “minimal sound design”

Example Effective Prompt:

"A 30-second product demo video showing a user creating a workflow in ClawKitchen. Start with a close-up of hands typing, then zoom out to show the visual editor interface. Highlight drag-and-drop functionality with smooth animations. End with the completed workflow running successfully. Style: clean and professional with subtle motion graphics. Include upbeat background music and minimal text overlays showing key features."

Audio Prompts

Effective audio prompts include:

Voice Characteristics

Gender and age: “professional female voice, mid-30s”, “authoritative male narrator”
Tone: “warm and friendly”, “confident and professional”, “casual and conversational”
Accent: “neutral American accent”, “slight British accent”, “international English”

Content Style

Pace: “moderate speaking pace”, “slightly faster for energy”, “slow and deliberate”
Emphasis: “emphasize key benefits”, “casual conversational style”, “educational tone”
Structure: “introduction, main points, call-to-action”, “storytelling format”

Technical Specifications

Quality: “studio quality recording”, “podcast-ready audio”, “broadcast standard”
Format: “mono voice track”, “stereo with ambient sound”, “voice-only no effects”
Length: “2-minute narration”, “30-second voiceover”, “brief 15-second intro”

Example Effective Prompt:

"A 90-second professional voiceover explaining the benefits of workflow automation for small businesses. The narrator should be a confident, approachable female voice with a warm tone. The script should cover time savings, reduced errors, and improved team coordination. Pace should be moderate with natural pauses. Studio quality recording with no background music or effects."

Template Variables in Media Nodes

Use template variables to create dynamic media prompts:

From Upstream Nodes

{
  "id": "generate_post_image",
  "kind": "media-image",
  "input": { "from": ["draft_content"] },
  "action": {
    "image_prompt": "Create a featured image for this blog post: {{draft_content.title}}. Style should match this theme: {{draft_content.category}}",
    "mediaType": "image"
  }
}

Workflow Variables

{
  "id": "generate_branded_video", 
  "kind": "media-video",
  "action": {
    "video_prompt": "Create a {{workflow.brand_style}} video for {{workflow.campaign_name}} featuring {{workflow.target_audience}}",
    "mediaType": "video"
  }
}

Dynamic Content

{
  "id": "personalized_audio",
  "kind": "media-audio", 
  "action": {
    "audio_prompt": "Record a personalized greeting for {{customer.name}} about {{product.name}} with a {{customer.preferred_tone}} tone",
    "mediaType": "audio"
  }
}
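
To make the substitution concrete, the sketch below resolves `{{dotted.path}}` placeholders against nested dictionaries. It is an assumption about how such templates could be rendered, not a description of ClawKitchen's resolver. In particular, leaving unresolved placeholders in place is a choice made here for visibility.

```python
import re

# Matches {{ name }} or {{ name.with.dots }}, allowing inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_template(template, context):
    """Replace {{dotted.path}} placeholders with values from nested dicts."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # leave unresolved placeholders visible
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)
```

With a context of `{"draft_content": {"title": "Launch Day"}}`, the template `Image for: {{draft_content.title}}` renders as `Image for: Launch Day`.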

Managing Generated Assets

File Organization

Generated media assets are organized in the workflow run directory:

shared-context/workflow-runs/{runId}/
├── run.json
├── node-outputs/
│   ├── 001-generate_image.json
│   └── 002-generate_video.json
└── artifacts/
    ├── hero_image_final.png
    ├── promo_video_v1.mp4
    └── voiceover_final.mp3
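
Given this layout, a downstream script could enumerate a run's artifacts as shown below. This is a sketch under stated assumptions: `list_run_artifacts` is a hypothetical helper, `base_dir` is expected to point at the `shared-context` directory, and the extension-to-type mapping only covers the sample files in the tree above.

```python
from pathlib import Path

# Assumed mapping, based only on the sample file names shown above.
EXTENSION_TYPES = {".png": "image", ".mp4": "video", ".mp3": "audio"}


def list_run_artifacts(base_dir, run_id):
    """Return (file name, media type) pairs for one run's artifacts."""
    artifacts_dir = Path(base_dir) / "workflow-runs" / run_id / "artifacts"
    if not artifacts_dir.is_dir():
        return []  # run not found or no artifacts produced yet
    return [
        (path.name, EXTENSION_TYPES.get(path.suffix.lower(), "unknown"))
        for path in sorted(artifacts_dir.iterdir())
        if path.is_file()
    ]
```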

Asset Metadata

Each generated asset includes metadata:

Original Prompt: Text prompt used for generation
Provider Used: Which skill/provider generated the asset
Generation Time: When asset was created
File Information: Size, format, dimensions/duration
Quality Metrics: Provider-specific quality scores
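
The metadata fields listed above could be modeled as a simple record. The field names below are hypothetical, since this page does not give the actual schema. The sketch only shows one plausible shape for the five categories.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AssetMetadata:
    original_prompt: str   # text prompt used for generation
    provider: str          # skill/provider that produced the asset
    path: str              # location of the generated file
    size_bytes: int        # file size
    media_format: str      # for example "png", "mp4", "mp3"
    # Generation time, recorded as an ISO 8601 UTC timestamp.
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Provider-specific quality scores, if the provider reports any.
    quality_metrics: dict = field(default_factory=dict)


example = AssetMetadata(
    original_prompt="Hero image for the spring campaign",
    provider="example-provider",
    path="artifacts/hero_image_final.png",
    size_bytes=204800,
    media_format="png",
)
record = asdict(example)  # plain dict, ready to serialize as JSON
```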

Asset Delivery

Generated assets can be used in subsequent workflow nodes:

Reference by Path: Use file path in tool nodes
Template Variables: Insert asset URLs in content
Approval Workflows: Include assets in approval requests
Publishing: Automatically attach to social media posts

Troubleshooting Media Generation

Common Issues

No Media Providers Available

Check if media generation skills are installed
Verify OpenClaw skill directory contains generation scripts
Confirm API keys are configured in OpenClaw config
Review skill compatibility with current OpenClaw version

Generation Timeouts

Increase node timeout settings for complex media
Check provider service status and rate limits
Verify network connectivity to generation services
Consider breaking complex prompts into simpler requests
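
When timeouts stem from transient provider slowness or rate limits, retrying with exponential backoff is a common complement to the checks above. The sketch below is generic Python, not a ClawKitchen feature: `call_with_retries` is a hypothetical helper, and it assumes the provider call signals a timeout by raising `TimeoutError`.

```python
import time


def call_with_retries(generate, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call generate() and retry on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return generate()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the delay can be replaced in tests. In normal use the default `time.sleep` applies.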

Poor Quality Output

Refine prompts with more specific descriptions
Add technical specifications (resolution, format)
Include style and mood descriptors
Test prompts with different providers

Large File Sizes

Specify output format and compression in prompts
Configure provider-specific quality settings
Consider file size limits for downstream usage
Implement post-processing compression if needed

Debugging Media Workflows

Check Provider Availability: Verify generation skills are detected
Review Prompt Quality: Test prompts manually with providers
Monitor Resource Usage: Watch for memory/disk constraints
Validate Configurations: Ensure API keys and settings are correct
Test Incrementally: Start with simple prompts and build complexity

Optimization Tips

Performance

Use appropriate resolution/quality for intended use
Cache frequently used prompts and results
Implement parallel generation for multiple assets
Consider provider-specific optimization settings
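
Caching prompts and results can be as simple as keying stored output by a hash of the request. The class below is an in-memory sketch with hypothetical names (`PromptCache`, `get_or_generate`). A production setup would more likely persist entries to disk and handle expiry.

```python
import hashlib
import json


class PromptCache:
    """In-memory cache keyed by media type, provider, and prompt text."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(media_type, provider, prompt):
        # Serialize the parts as JSON so field boundaries stay unambiguous.
        payload = json.dumps([media_type, provider, prompt])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, media_type, provider, prompt, generate):
        """Return the cached result, calling generate(prompt) only on a miss."""
        key = self._key(media_type, provider, prompt)
        if key not in self._entries:
            self._entries[key] = generate(prompt)
        return self._entries[key]
```

Including the provider in the key prevents a result generated by one provider from being served for a request aimed at another.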

Cost Management

Monitor API usage and costs across providers
Implement prompt reuse for similar content
Use lower-cost providers for draft/preview content
Set up alerts for unusual usage patterns

Quality Assurance

Implement human review steps for critical assets
Create prompt templates for consistent brand style
Establish quality guidelines and approval criteria
Maintain asset libraries for reuse and reference

Integration with Publishing Workflows

Social Media Publishing

Generate Content → Create Image → Create Video → Schedule Posts → Monitor Performance

Blog Publishing

Write Article → Generate Featured Image → Create Social Assets → Publish → Share

Marketing Campaigns

Campaign Brief → Generate Assets → A/B Test Variants → Select Winners → Launch Campaign

Product Documentation

Feature Description → Generate Screenshots → Create Demo Video → Update Docs → Release

Media generation workflows enable teams to create comprehensive, multi-format content automatically while maintaining quality and brand consistency.
