Source: JIGGAI/ClawKitchen/docs/MEDIA_GENERATION.md (commit 91c806e4ff2380a3a4d2e465499c5ce91b15365f)
Overview
ClawKitchen provides integrated media generation through workflow nodes that create images, videos, and audio. The system uses skill auto-discovery to work with various media generation providers while keeping a consistent workflow interface.
Media Node Types
media-image
Generates static images from text prompts.
Common Use Cases:
- Marketing visuals and hero images
- Social media graphics
- Product illustrations
- Blog post featured images
- Presentation slides
Configuration:
- Agent Assignment: Team member who executes the generation
- Image Prompt: Text description of desired image
- Media Type: Set to “image” (auto-configured)
- Provider: Auto-discovered from available skills
media-video
Creates video content from text descriptions.
Common Use Cases:
- Product demo videos
- Marketing commercials
- Educational content
- Social media video posts
- Explainer animations
Configuration:
- Agent Assignment: Team member who executes the generation
- Video Prompt: Description of desired video content
- Media Type: Set to “video” (auto-configured)
- Provider: Auto-discovered from available skills
media-audio
Generates audio content, including voiceovers and music.
Common Use Cases:
- Podcast content
- Voiceovers for videos
- Background music
- Audio advertisements
- Training narrations
Configuration:
- Agent Assignment: Team member who executes the generation
- Audio Prompt: Description of desired audio content
- Media Type: Set to “audio” (auto-configured)
- Provider: Auto-discovered from available skills
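All three node types share the same configuration shape. The sketch below models that shape as a Python dataclass. The field names (`kind`, `agent`, `prompt`, `provider`) are hypothetical and are not ClawKitchen's actual schema. The media type is derived from the node type, which matches the "auto-configured" behavior described above. A provider of `None` stands for "auto-discover".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from node type to the auto-configured media type.
NODE_MEDIA_TYPES = {
    "media-image": "image",
    "media-video": "video",
    "media-audio": "audio",
}


@dataclass
class MediaNode:
    kind: str                       # "media-image", "media-video", or "media-audio"
    agent: str                      # team member who executes the generation
    prompt: str                     # text description of the desired output
    provider: Optional[str] = None  # None means "use auto-discovery"

    @property
    def media_type(self) -> str:
        # Derived from the node type rather than set by hand.
        try:
            return NODE_MEDIA_TYPES[self.kind]
        except KeyError:
            raise ValueError(f"unknown media node kind: {self.kind!r}") from None


node = MediaNode(kind="media-image", agent="designer",
                 prompt="Minimalist hero image of a modern laptop")
print(node.media_type)  # image
```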
Skill Auto-Discovery System
ClawKitchen automatically detects available media generation capabilities by scanning:
- OpenClaw Skills Directory: ~/.openclaw/skills/
- Workspace Skills: Team-specific skills in the workspace
Supported providers include CellCog, OpenAI, and custom skills.
How Auto-Discovery Works
- Skill Scanning: System scans for generation scripts in skill directories
- Capability Detection: Identifies image, video, and audio generation scripts
- Provider Selection: Chooses best available provider for each media type
- Environment Setup: Configures API keys and settings from OpenClaw config
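The first three steps can be sketched as follows. This is a minimal illustration, not ClawKitchen's implementation. It makes two hypothetical assumptions: each skill is a subdirectory of a skill root, and a capability is signaled by a script named `generate_image.*`, `generate_video.*`, or `generate_audio.*`. It also models "best available" as a simple preference list. Environment setup is left out, and the skill names in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Optional

MEDIA_TYPES = ("image", "video", "audio")


def discover_providers(skill_roots: Iterable[Path]) -> Dict[str, List[str]]:
    """Map each media type to the skills that appear to support it.

    Hypothetical convention: each skill is a subdirectory of a root, and a
    generation script is named generate_<media type>.<ext>.
    """
    found: Dict[str, List[str]] = {m: [] for m in MEDIA_TYPES}
    for root in skill_roots:
        if not root.is_dir():
            continue  # a missing root contributes nothing
        for skill_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for media in MEDIA_TYPES:
                # Matches generate_image.py, generate_image.sh, and so on.
                if any(skill_dir.glob(f"generate_{media}.*")):
                    if skill_dir.name not in found[media]:
                        found[media].append(skill_dir.name)
    return found


def select_provider(candidates: List[str], preference: List[str]) -> Optional[str]:
    """Return the first preferred candidate, else the first one found, else None."""
    for name in preference:
        if name in candidates:
            return name
    return candidates[0] if candidates else None


# Demo against a throwaway directory instead of the real ~/.openclaw/skills/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for skill, scripts in {"cellcog": ["generate_image.py", "generate_video.py"],
                           "openai": ["generate_image.py"]}.items():
        (root / skill).mkdir()
        for script in scripts:
            (root / skill / script).touch()
    providers = discover_providers([root, root / "missing"])

print(providers)  # {'image': ['cellcog', 'openai'], 'video': ['cellcog'], 'audio': []}
print(select_provider(providers["image"], ["openai", "cellcog"]))  # openai
```

In real use the roots would be something like `Path.home() / ".openclaw" / "skills"` plus the workspace's skill directory.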
Supported Providers
CellCog Integration
- Multi-modal AI: Support for images, videos, and audio
- High Quality: Production-ready media generation
- Coordination: Multi-agent orchestration for complex content
OpenAI Integration
- DALL-E: Image generation via OpenAI API
- Whisper: Audio transcription and processing
- API Integration: Direct OpenAI service integration
Custom Skills
- Extensible: Add custom generation providers via skill system
- Flexible: Support for specialized or proprietary generation tools
- Configurable: Custom parameters and settings per provider
Workflow Integration Patterns
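The patterns below differ mainly in how nodes depend on one another. The following sketch contrasts them in plain Python using three stand-ins:
- `generate()` is a hypothetical stub that takes the place of a media node.
- The keyword check in `conditional()` takes the place of real content analysis.
- The `review` callback takes the place of a human or agent review step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def generate(media_type: str, prompt: str) -> str:
    """Hypothetical stand-in for a media node; returns a fake asset path."""
    slug = prompt.lower().replace(" ", "-")[:24]
    return f"assets/{media_type}/{slug}"


def sequential(topic: str) -> List[str]:
    # Each step's output feeds the next step's prompt.
    image = generate("image", f"Hero image for {topic}")
    video = generate("video", f"30-second demo of {topic}, styled like {image}")
    return [image, video]


def parallel(topic: str) -> Dict[str, str]:
    # Independent assets share no data, so they can run concurrently.
    prompts = {
        "image": f"Hero image for {topic}",
        "video": f"Short teaser for {topic}",
        "audio": f"30-second voiceover about {topic}",
    }
    with ThreadPoolExecutor() as pool:
        futures = {m: pool.submit(generate, m, p) for m, p in prompts.items()}
        return {m: f.result() for m, f in futures.items()}


def conditional(content: str) -> List[str]:
    # A trivial keyword check stands in for real content analysis.
    wanted = ["image"]
    if "demo" in content.lower():
        wanted.append("video")
    return [generate(m, content) for m in wanted]


def iterative(prompt: str, review: Callable[[str], str], max_rounds: int = 3) -> str:
    # review() returns "" to approve, or feedback to fold into the next prompt.
    # max_rounds caps the total number of generations.
    asset = generate("image", prompt)
    for _ in range(max_rounds - 1):
        feedback = review(asset)
        if not feedback:
            break
        prompt = f"{prompt}, {feedback}"
        asset = generate("image", prompt)
    return asset


print(sequential("launch"))
print(parallel("launch"))
```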
Sequential Generation
Create multiple related media assets in sequence.
Parallel Generation
Generate multiple media types simultaneously.
Conditional Generation
Generate media based on content analysis.
Iterative Refinement
Generate, review, and refine media through multiple cycles.
Prompt Engineering for Media
Image Prompts
Effective image prompts include:
Visual Style
- Art style: “photorealistic”, “minimalist”, “hand-drawn”
- Color scheme: “vibrant colors”, “monochromatic”, “sunset palette”
- Composition: “centered subject”, “rule of thirds”, “close-up portrait”
Subject and Content
- Main subject: “professional businesswoman”, “modern laptop”, “city skyline”
- Context: “in office setting”, “on wooden desk”, “during golden hour”
- Mood: “confident and approachable”, “sleek and modern”, “warm and inviting”
Technical Details
- Resolution: “high resolution”, “4K quality”, “print ready”
- Format: “landscape orientation”, “square format”, “vertical social media”
- Lighting: “soft natural lighting”, “dramatic shadows”, “bright and airy”
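Assembling prompts from named descriptors keeps them consistent across runs and across team members. The sketch below assumes a hypothetical set of field names. It joins them in a fixed order that was chosen for illustration and is not a provider requirement.

```python
from typing import Dict

# Join order is an illustrative assumption, not a provider rule.
FIELD_ORDER = [
    "subject", "context", "mood",          # subject and content
    "style", "colors", "composition",      # visual style
    "lighting", "format", "resolution",    # technical details
]


def build_image_prompt(parts: Dict[str, str]) -> str:
    """Join the non-empty descriptors into one comma-separated prompt."""
    unknown = set(parts) - set(FIELD_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt fields: {sorted(unknown)}")
    return ", ".join(parts[k].strip() for k in FIELD_ORDER if parts.get(k, "").strip())


prompt = build_image_prompt({
    "subject": "modern laptop",
    "context": "on wooden desk",
    "style": "photorealistic",
    "lighting": "soft natural lighting",
    "format": "landscape orientation",
})
print(prompt)
# modern laptop, on wooden desk, photorealistic, soft natural lighting, landscape orientation
```

Rejecting unknown fields catches typos such as `"colour"` that would otherwise be silently dropped.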
Video Prompts
Effective video prompts specify:
Content Structure
- Duration: “30-second video”, “brief 10-second clip”, “2-minute explanation”
- Pacing: “fast-paced montage”, “slow and contemplative”, “energetic presentation”
- Narrative: “product demonstration”, “customer testimonial”, “feature walkthrough”
Visual Elements
- Camera work: “smooth camera movements”, “close-up shots”, “wide establishing shots”
- Transitions: “smooth fade transitions”, “quick cuts”, “seamless scene changes”
- Text overlay: “minimal text labels”, “animated titles”, “call-to-action buttons”
Audio Elements
- Background music: “upbeat electronic music”, “subtle ambient sounds”, “no background music”
- Voiceover: “professional female narrator”, “energetic male voice”, “conversational tone”
- Sound effects: “subtle interface sounds”, “ambient office noise”, “minimal sound design”
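Duration is easy to leave out of a video prompt. A hypothetical pre-flight check could extract it from phrases such as "30-second video" or "2-minute explanation", for example to warn when a prompt gives no duration. The regular expression below is an assumption. It covers only the seconds/minutes phrasings shown above.

```python
import re
from typing import Optional

# Matches "30-second", "10 second", "2-minute", "1.5 minutes" (hyphen or space).
_DURATION = re.compile(r"(\d+(?:\.\d+)?)[-\s](second|minute)s?\b", re.IGNORECASE)


def requested_duration_seconds(prompt: str) -> Optional[float]:
    """Return the first duration mentioned in the prompt, in seconds, or None."""
    match = _DURATION.search(prompt)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return value * 60 if unit == "minute" else value


print(requested_duration_seconds("brief 10-second clip"))              # 10.0
print(requested_duration_seconds("2-minute explanation"))              # 120.0
print(requested_duration_seconds("fast-paced montage, quick cuts"))    # None
```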
Audio Prompts
Effective audio prompts include:
Voice Characteristics
- Gender and age: “professional female voice, mid-30s”, “authoritative male narrator”
- Tone: “warm and friendly”, “confident and professional”, “casual and conversational”
- Accent: “neutral American accent”, “slight British accent”, “international English”
Delivery Style
- Pace: “moderate speaking pace”, “slightly faster for energy”, “slow and deliberate”
- Emphasis: “emphasize key benefits”, “casual conversational style”, “educational tone”
- Structure: “introduction, main points, call-to-action”, “storytelling format”
Technical Details
- Quality: “studio quality recording”, “podcast-ready audio”, “broadcast standard”
- Format: “mono voice track”, “stereo with ambient sound”, “voice-only no effects”
- Length: “2-minute narration”, “30-second voiceover”, “brief 15-second intro”
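A length request such as "30-second voiceover" is easier to satisfy when the script already fits the target duration. The rough sketch below assumes an average pace of about 150 words per minute. That figure is a common ballpark for moderate narration, not a ClawKitchen or provider setting. Lower the rate for "slow and deliberate" delivery.

```python
import math


def word_budget(duration_seconds: float, words_per_minute: float = 150.0) -> int:
    """Approximate maximum script length for a voiceover of the given duration."""
    if duration_seconds <= 0 or words_per_minute <= 0:
        raise ValueError("duration and pace must be positive")
    return math.floor(duration_seconds * words_per_minute / 60)


def fits(script: str, duration_seconds: float, words_per_minute: float = 150.0) -> bool:
    """True if the script's word count is within the budget for that duration."""
    return len(script.split()) <= word_budget(duration_seconds, words_per_minute)


print(word_budget(30))       # 75
print(word_budget(120))      # 300
print(word_budget(15, 120))  # 30
```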
Template Variables in Media Nodes
Use template variables to create dynamic media prompts. Variable values can come from upstream nodes, workflow variables, or other dynamic content.
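Template variables let a media prompt reuse values from upstream node outputs, workflow variables, or other dynamic content. The minimal substitution sketch below assumes a hypothetical `{{name}}` / `{{node.field}}` placeholder syntax, and ClawKitchen's actual syntax may differ. It fails loudly on unresolved names rather than sending a half-filled prompt to a provider.

```python
from __future__ import annotations

import re
from collections.abc import Mapping
from typing import Any

# Hypothetical placeholder syntax: {{name}} or {{node_id.field}}.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_prompt(template: str, context: Mapping[str, Any]) -> str:
    """Replace each placeholder with a value looked up in `context`.

    Dotted names walk nested mappings (for example, upstream node outputs).
    An unknown name raises KeyError instead of yielding an incomplete prompt.
    """
    def lookup(match: re.Match) -> str:
        value: Any = context
        for key in match.group(1).split("."):
            if not isinstance(value, Mapping) or key not in value:
                raise KeyError(f"unresolved template variable: {match.group(1)}")
            value = value[key]
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


context = {
    "brand": "Acme",                                        # workflow variable
    "research": {"headline": "Faster builds, fewer bugs"},  # upstream node output
}
print(render_prompt("Hero image for {{brand}}: {{ research.headline }}", context))
# Hero image for Acme: Faster builds, fewer bugs
```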
Managing Generated Assets
File Organization
Generated media assets are organized in the workflow run directory.
Asset Metadata
Each generated asset includes metadata:
- Original Prompt: Text prompt used for generation
- Provider Used: Which skill/provider generated the asset
- Generation Time: When asset was created
- File Information: Size, format, dimensions/duration
- Quality Metrics: Provider-specific quality scores
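The metadata fields above map naturally onto a small record type. The sketch below uses hypothetical field names and illustrative values. It serializes the record to JSON so the record could be stored next to the asset, and shows that it round-trips without loss.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class AssetMetadata:
    prompt: str                          # original prompt used for generation
    provider: str                        # skill/provider that produced the asset
    generated_at: str                    # ISO 8601 timestamp, UTC
    path: str                            # location inside the run directory
    size_bytes: int
    file_format: str                     # e.g. "png", "mp4", "mp3"
    width: Optional[int] = None          # images and video only
    height: Optional[int] = None
    duration_s: Optional[float] = None   # video and audio only
    quality: Dict[str, float] = field(default_factory=dict)  # provider-specific scores


meta = AssetMetadata(
    prompt="Minimalist hero image of a modern laptop",
    provider="example-image-skill",      # hypothetical provider name
    generated_at=datetime.now(timezone.utc).isoformat(),
    path="assets/hero.png",
    size_bytes=482_113,
    file_format="png",
    width=1792,
    height=1024,
)

record = json.dumps(asdict(meta), indent=2)      # what might be written beside the asset
restored = AssetMetadata(**json.loads(record))   # rebuild the record from JSON
print(restored == meta)  # True
```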
Asset Delivery
Generated assets can be used in subsequent workflow nodes:
- Reference by Path: Use the file path in tool nodes
- Template Variables: Insert asset URLs in content
- Approval Workflows: Include assets in approval requests
- Publishing: Automatically attach to social media posts
Troubleshooting Media Generation
Common Issues
No Media Providers Available
- Check that media generation skills are installed
- Verify OpenClaw skill directory contains generation scripts
- Confirm API keys are configured in OpenClaw config
- Review skill compatibility with current OpenClaw version
Generation Timeouts
- Increase node timeout settings for complex media
- Check provider service status and rate limits
- Verify network connectivity to generation services
- Consider breaking complex prompts into simpler requests
Poor Output Quality
- Refine prompts with more specific descriptions
- Add technical specifications (resolution, format)
- Include style and mood descriptors
- Test prompts with different providers
Large File Sizes
- Specify output format and compression in prompts
- Configure provider-specific quality settings
- Consider file size limits for downstream usage
- Implement post-processing compression if needed
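Timeouts and rate limits are often transient, so retrying with backoff can help alongside a longer node timeout. The generic sketch below assumes that the provider call raises `TimeoutError` or `ConnectionError` on transient failures. Real provider SDKs may raise their own exception types. `with_retries` is a hypothetical helper, not part of ClawKitchen.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Waits about 2s, 4s, 8s, ... plus up to 1s of random jitter so that
            # parallel nodes hitting the same rate limit do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("unreachable")


# Usage: a fake provider call that times out twice, then succeeds.
calls = {"n": 0}


def flaky_generate() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("provider timed out")
    return "assets/teaser.mp4"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_generate, sleep=delays.append)
print(result, calls["n"], len(delays))  # assets/teaser.mp4 3 2
```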
Debugging Media Workflows
- Check Provider Availability: Verify generation skills are detected
- Review Prompt Quality: Test prompts manually with providers
- Monitor Resource Usage: Watch for memory/disk constraints
- Validate Configurations: Ensure API keys and settings are correct
- Test Incrementally: Start with simple prompts and build complexity
Optimization Tips
Performance
- Use appropriate resolution/quality for the intended use
- Cache frequently used prompts and results
- Implement parallel generation for multiple assets
- Consider provider-specific optimization settings
Cost Management
- Monitor API usage and costs across providers
- Implement prompt reuse for similar content
- Use lower-cost providers for draft/preview content
- Set up alerts for unusual usage patterns
Quality Control
- Implement human review steps for critical assets
- Create prompt templates for consistent brand style
- Establish quality guidelines and approval criteria
- Maintain asset libraries for reuse and reference
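Caching frequently used prompts avoids paying twice for the same asset. The minimal in-memory sketch below assumes that reusing an earlier result is acceptable for identical (provider, media type, prompt) requests. A production cache would likely persist entries and expire them.

```python
import hashlib
import json
from typing import Callable, Dict


class PromptCache:
    """In-memory cache of generated asset paths keyed by provider, media type, and prompt."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(provider: str, media_type: str, prompt: str) -> str:
        # Collapse runs of whitespace so formatting-only differences share an entry.
        # Case is kept because it can matter (for example, text overlays).
        normalized = " ".join(prompt.split())
        # Encoding a JSON list keeps field boundaries unambiguous before hashing.
        payload = json.dumps([provider, media_type, normalized])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, provider: str, media_type: str, prompt: str,
                        generate: Callable[[str], str]) -> str:
        k = self.key(provider, media_type, prompt)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = generate(prompt)  # only pay for generation on a miss
        return self._store[k]


generated = []


def fake_generate(prompt: str) -> str:
    """Stand-in for a real provider call."""
    generated.append(prompt)
    return f"assets/image-{len(generated)}.png"


cache = PromptCache()
a = cache.get_or_generate("example-skill", "image", "Modern  laptop on desk", fake_generate)
b = cache.get_or_generate("example-skill", "image", "Modern laptop on desk", fake_generate)
print(a, b, cache.hits, cache.misses)  # assets/image-1.png assets/image-1.png 1 1
```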
