Seamlessly Integrate Actors into AI-generated environments [ADVANCED]
[FULL GUIDE] · Aug 29, 2025 · by Mickmumpitz

This powerful workflow enables you to seamlessly integrate real actors into AI-generated environments with professional results. Using Wan 2.1 models and camera tracking, you can create stunning composites where your actors appear naturally within computer-generated backgrounds. Whether you're a filmmaker, content creator, or visual effects enthusiast, this comprehensive guide will walk you through the entire process from preparing your footage to fine-tuning the final output.
ADVANCED WORKFLOW GUIDE
ATTENTION! This is the guide for the advanced workflow. While everyone can read this guide, only Patreon supporters can download the actual workflow file. If you want to support the creation of these workflows, guides, and YouTube videos, please consider becoming a patron!
The advanced workflow improves upon the Free SMPL Workflow by creating iterative batch-by-batch video generation groups, allowing us to overcome the 81-frame generation limit of Wan 2.1.
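To illustrate what "batch-by-batch" means in practice, here is a small Python sketch of how a longer clip could be split into consecutive generation windows of at most 81 frames. The one-frame overlap (handing the last generated frame to the next batch as context) is an assumption made for this sketch, not necessarily the workflow's exact internals.

```python
# Illustration only: split a clip that is longer than Wan 2.1's 81-frame limit
# into consecutive generation batches. The one-frame overlap (re-using the last
# generated frame as context for the next batch) is an assumption made for this
# sketch, not necessarily the advanced workflow's exact mechanism.

def split_into_batches(total_frames: int, batch_size: int = 81, overlap: int = 1):
    """Return (start, end) frame ranges that cover the whole clip."""
    batches = []
    start = 0
    while start < total_frames:
        end = min(start + batch_size, total_frames)
        batches.append((start, end))
        if end == total_frames:
            break
        start = end - overlap  # the next batch re-uses the overlapping frame(s)
    return batches

print(split_into_batches(200))  # [(0, 81), (80, 161), (160, 200)]
```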
🎨 Workflow Sections
⬜ Input / Output / Model Loaders
🟩 Preparation
🟪 Video Generation
🟨 Important Notes

Installation
Download the .json file and drag and drop it into your ComfyUI window.
Install the missing custom nodes via the manager and restart ComfyUI.
Speed Optimizations
Note that you can optimize speed by installing Triton and SageAttention. However, be aware that installing these packages for the Windows portable version of ComfyUI is for advanced users and can destroy your ComfyUI setup.
You can find a guide for installation here: https://civitai.com/articles/12851/easy-installation-triton-and-sageattention
Download Models
Kijai's combined and quantized models:
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors
📁 ComfyUI/models/loras
Wan2_1-VACE_module_14B_fp8_e4m3fn.safetensors:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan2_1-VACE_module_14B_fp8_e4m3fn.safetensors
📁 ComfyUI/models/diffusion_models
Wan2_1-T2V-14B_fp8_e4m3fn.safetensors:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan2_1-T2V-14B_fp8_e4m3fn.safetensors
📁 ComfyUI/models/diffusion_models
umt5-xxl-enc-bf16.safetensors:
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/umt5-xxl-enc-bf16.safetensors
📁 ComfyUI/models/text_encoders
GGUF Models:
Download the largest GGUF model that fits your card's VRAM here:
https://huggingface.co/QuantStack/Wan2.1_14B_VACE-GGUF/tree/main
We had good results with:
- Wan2.1_14B_VACE-Q5_1.gguf
- Wan2.1_14B_VACE-Q6_K.gguf
- Wan2.1_14B_VACE-Q8_0.gguf
📁 ComfyUI/models/unet
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors (same LoRA as above):
https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors
📁 ComfyUI/models/loras
wan_2.1_vae.safetensors:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
📁 ComfyUI/models/vae
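If you prefer scripting the downloads over grabbing each file manually, a minimal sketch using the huggingface_hub Python package could look like the following. It assumes huggingface_hub is installed and that the ComfyUI folder sits in your working directory; adjust the GGUF quantization to whatever fits your card.

```python
# Sketch: fetch the models listed above into the expected ComfyUI folders.
# Assumes `pip install huggingface_hub` and a ./ComfyUI install directory.
import os
import shutil
from huggingface_hub import hf_hub_download

FILES = [
    ("Kijai/WanVideo_comfy", "Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank32.safetensors", "ComfyUI/models/loras"),
    ("Kijai/WanVideo_comfy", "Wan2_1-VACE_module_14B_fp8_e4m3fn.safetensors", "ComfyUI/models/diffusion_models"),
    ("Kijai/WanVideo_comfy", "Wan2_1-T2V-14B_fp8_e4m3fn.safetensors", "ComfyUI/models/diffusion_models"),
    ("Kijai/WanVideo_comfy", "umt5-xxl-enc-bf16.safetensors", "ComfyUI/models/text_encoders"),
    # Pick the GGUF quantization that fits your VRAM (Q5_1 / Q6_K / Q8_0).
    ("QuantStack/Wan2.1_14B_VACE-GGUF", "Wan2.1_14B_VACE-Q8_0.gguf", "ComfyUI/models/unet"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged", "split_files/vae/wan_2.1_vae.safetensors", "ComfyUI/models/vae"),
]

for repo_id, filename, target_dir in FILES:
    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
    # Files stored in repo subfolders keep that relative path; flatten them.
    flat_path = os.path.join(target_dir, os.path.basename(filename))
    if os.path.abspath(path) != os.path.abspath(flat_path):
        shutil.move(path, flat_path)
    print("downloaded:", flat_path)
```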
You can find the WORKFLOWS & EXAMPLE FILES here:
Before You Begin: Thank you for considering supporting us! Since these workflows can be complex, we recommend testing the free versions first to ensure compatibility with your system. We cannot guarantee full compatibility with every system, which is why we always provide the main functionality for free. Please take a moment to read through the entire guide, especially if you encounter any issues.
Prepare Input Footage and Track
To prepare the footage for the main workflow, use the MASK ACTOR + CAMERA TRACK workflow first.

We created this easy workflow to quickly process clips and generate mask and tracking outputs.
At the beginning, you can upload your clip.
You can also set a shot number or name that will be used for all outputs.
We added an option to activate interpolation. Long clips that don't fit into memory or require extensive processing time benefit greatly from halving the number of frames that need to be processed; the other half is then generated by interpolation.
Toggles let you easily switch the optional workflow sections on and off.

GENERATE MASK OUTPUTS
This section should work automatically in most cases. If you need a different resolution, you can adjust it using the Resize Image node.

GET FIRST FRAME
This section allows you to export the first frame, which is useful when you want to use image-to-image (I2I) workflows.

CAMERA TRACK
This section generates a camera track output for the next workflow. It initially masks out the area where the actor appears to prevent tracking points from being created there. You can adjust the mask size based on your specific needs.
Additionally, we've maintained the option to input vector data, as we had in the creature path workflow.
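Outside of ComfyUI, the same masking idea can be sketched with plain OpenCV: only place tracking points where the (dilated) actor mask is empty. This is just an illustration with placeholder file names, not the workflow's own tracking nodes.

```python
# Sketch of the masking idea: detect trackable corners only OUTSIDE the
# actor's (dilated) mask so no tracking points land on the moving person.
# "frame.png" and "actor_mask.png" are placeholder file names.
import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
actor_mask = cv2.imread("actor_mask.png", cv2.IMREAD_GRAYSCALE)

# Grow the actor mask a little so points are not placed right on its edge,
# then invert it: goodFeaturesToTrack only searches where the mask is > 0.
grown = cv2.dilate(actor_mask, np.ones((25, 25), np.uint8))
track_area = cv2.bitwise_not(grown)

corners = cv2.goodFeaturesToTrack(
    frame, maxCorners=200, qualityLevel=0.01, minDistance=10, mask=track_area
)
count = 0 if corners is None else len(corners)
print(f"{count} tracking points placed outside the actor area")
```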

MODEL LOADERS
This workflow has two model loader groups:
FP8 Models → Using Kijai's fp8 Wan2.1 14B Models - extremely powerful but very VRAM intensive!
GGUF Models → Memory-efficient with options for various VRAM capacities.
ℹ️ Note: The active sampler in the SAMPLING section automatically determines which model is used. You don't need to manually disable or enable model groups to conserve memory.

Step 2: Video Input & Resolution
In this section you can set up your input video.
Resolution: 720p is recommended for the best quality.
Batch Size: Set the number of frames processed per batch.
Skip: Offset the start frame of your footage.
Frame Load Cap: 81 frames is recommended, as WAN 2.1 was trained on 81-frame data. However, we've had successful results with 40 frames more or less. If you encounter issues, they may be related to frame count.
Interpolation: This uses only every second frame of the input video and interpolates the skipped frames afterward, reducing the number of frames that have to be generated for the full range (see the sketch below).
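To make the interpolation frame math concrete, here is a small stand-in sketch: only every second frame is generated, and the gaps are filled afterwards. A real setup uses a motion-aware interpolator (RIFE/FILM-style nodes); the naive 50/50 blend below only demonstrates how the frame counts work out.

```python
# Sketch of the interpolation idea: generate only every 2nd frame, then fill
# the gaps afterwards. The naive 50/50 blend stands in for a proper
# motion-aware interpolator and only illustrates the frame math.
import numpy as np

def decimate(frames):
    """Keep every second frame -> roughly halves what has to be generated."""
    return frames[::2]

def reinterpolate(half_frames):
    """Rebuild the full frame count by inserting a blend between neighbours."""
    full = []
    for a, b in zip(half_frames, half_frames[1:]):
        full.append(a)
        full.append(((a.astype(np.float32) + b.astype(np.float32)) / 2).astype(a.dtype))
    full.append(half_frames[-1])
    return full

clip = [np.zeros((720, 1280, 3), np.uint8) for _ in range(81)]
generated = decimate(clip)            # 41 frames actually go through Wan
restored = reinterpolate(generated)   # 81 frames again after interpolation
print(len(generated), len(restored))  # 41 81
```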
Upload Video Inputs
Masked Actor: Upload your actor against a 50% gray background, which yields the best results with VACE; other backgrounds can cause bugs like brightness changes (a compositing sketch follows this list).
Actor Alpha Mask: Needed for the VACE Encoder Nodes.
External Camera Tracking:
For Unique Perspective we used After Effects. We exported the tracks by creating red boxes in 3D space as reference points. VACE can interpret these boxes for camera tracking.
Alternatively, use our MASK + TRACK workflow — which includes a CAMERA TRACKING section.
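If your rotoscoped actor footage has an alpha channel, putting it on a 50% gray plate and exporting the matching alpha mask is straightforward to script. The sketch below uses Pillow with placeholder file names and would be run per frame; it only illustrates the expected inputs and is not part of the workflow itself.

```python
# Sketch: composite an RGBA actor frame onto a 50% gray plate (RGB 128,128,128)
# and export the matching alpha mask for the VACE encoder.
# "actor_rgba.png" is a placeholder; run this per frame of your rotoscoped clip.
from PIL import Image

actor = Image.open("actor_rgba.png").convert("RGBA")
gray_plate = Image.new("RGBA", actor.size, (128, 128, 128, 255))  # 50% gray

composite = Image.alpha_composite(gray_plate, actor).convert("RGB")
composite.save("masked_actor.png")

# The alpha channel, saved as a black/white image, doubles as the actor mask.
alpha = actor.split()[-1]
alpha.save("actor_alpha_mask.png")
```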

Step 4: Background / Reference Image
You can upload an image to use as a reference background for the video. If you prefer to generate the video solely from text prompts, this step can be disabled by toggling the switch to 2.

Step 5: Start Frame Options
There are three options for the start frame:
Actor + Background Image: Places the masked actor on top of the background image for maximum adherence to the reference image.
Background Image Only: Uses only the background image, which is useful when the actor shouldn't appear in the initial frame.
NO Startframe: Gives WAN more flexibility to generate the scene environment.

Step 6: Control Images
This section creates the Control-Images for VACE, primarily to provide data about camera movement and person positioning.
You have 4 options:
Track Only: Passes the camera track loaded in the VIDEO INPUT section.
Track + Pose: Improves interaction with the environment and enhances overall tracking accuracy.
Canny + Pose: Ideal for pre-designed backgrounds, for example scenes created in Blender and exported as a video (see the Canny sketch after this list).
Input CN Video: Save the control output and re-upload it here; recommended for memory-intensive workflows to save time and resources.
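For the Canny part of the Canny + Pose option, the control frames are essentially edge maps of your pre-rendered background video. A rough OpenCV equivalent looks like this; the file name and thresholds are placeholders, and the workflow itself builds these control images with its own preprocessor nodes.

```python
# Sketch: turn a pre-rendered background video (e.g. exported from Blender)
# into Canny edge control frames. File name and thresholds are placeholders.
import cv2

cap = cv2.VideoCapture("blender_background.mp4")
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # tweak thresholds per shot
    cv2.imwrite(f"control_canny_{frame_idx:05d}.png", edges)
    frame_idx += 1
cap.release()
```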

Step 7: Sampling and Fine-Tuning
Select your preferred sampling method. When the FP8 MODEL SAMPLER is active, the workflow will use Kijai's fp8 models. When the GGUF MODEL SAMPLER is active, it will use GGUF models. Ensure that only one sampling method—either FP8 or GGUF—is active at any given time.

Fast Groups Muter: Use this to toggle consecutive iterations on or off. This feature allows you to fine-tune each iteration's seed values and prompts before moving on to the next batch.
VACE Strength: Adjust this value to fine-tune your results. The second VACE Encode processes the actor footage, determining how closely the appearance matches the input versus how well it integrates into the background. If you notice jitter at the maximum value of 1, reducing it even slightly can help eliminate this issue.
Dilate/Erode Mask: Increase this value to shrink the actor's mask when artificial outlines appear (see the erosion sketch at the end of this step).
Prompts: Describe clearly what is happening in the shot.
WanVideo Sampler Steps (FP8 only): With the lightx2v LoRA, 6 steps is optimal. We recommend sticking with 6 steps in most cases, as higher values may reduce the quality of the inserted person or create artifacts.
When working with fp8 models, you can create quick low-quality preview generations using just 2 steps—in our testing, these previews closely matched the final results, allowing you to efficiently test different seeds. Note that this tip doesn't apply to the GGUF sampler.
As always, experiment with different seeds to find the best possible animation.
This Video Combine node allows you to check the results of the current batch.
Here, all the generated videos up to this iteration are merged into a single video clip.
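The Dilate/Erode setting mentioned above boils down to morphologically shrinking (or growing) the actor mask so that a few halo pixels around the roto edge are cut away. Here is a standalone OpenCV sketch of the shrinking case, with a placeholder file name and kernel size.

```python
# Sketch: erode (shrink) the actor mask by a few pixels so a bright outline
# around the roto edge gets cut away instead of ending up in the composite.
# "actor_alpha_mask.png" and the 5x5 kernel are placeholder values.
import cv2
import numpy as np

mask = cv2.imread("actor_alpha_mask.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

shrunk = cv2.erode(mask, kernel, iterations=1)   # trims ~2 px from the edge
grown = cv2.dilate(mask, kernel, iterations=1)   # the opposite adjustment
cv2.imwrite("actor_mask_eroded.png", shrunk)
cv2.imwrite("actor_mask_dilated.png", grown)
```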

Additional Tips
Video sampling previews are now available, allowing you to see the result in real-time!

Set the Preview method in the ComfyUI Manager to Auto.

In the settings, go to the 🎥VHS tab and enable "Display animated preview when sampling". Make sure you have the latest version of the node pack.

© 2025 Mickmumpitz