The New VFX Pipeline: AI-Powered Tools Reshaping Visual Effects#

A deep dive into the new tools and techniques transforming how visual effects are created

Introduction#

I recently had the chance to spearhead innovation in AI filmmaking at TCL, developing novel techniques for integrating AI into traditional visual effects and filmmaking workflows. The real use cases for AI are very different from simply prompting shots into existence. AI isn't a radical sea change for the industry so much as a collection of new efficiencies and workflows.


In this article I'll go through some VFX and filmmaking steps/departments and how they were impacted by AI. The ethos at TCL was to use AI anywhere and everywhere possible, as the film Next Stop Paris was funded as a research project.

Environments + Assets#

Gaussian Splats#

Gaussian Splats represent a revolutionary approach to 3D scene representation that moves beyond traditional polygon meshes and point clouds. Instead of storing geometric vertices and faces, Gaussian Splats encode 3D scenes as a collection of oriented 3D Gaussians—essentially ellipsoids with position, scale, rotation, and colour attributes. Each splat acts as a tiny, flexible 3D brush stroke that can be rendered from any viewpoint in real-time.
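As a rough illustration of that data layout (not any particular renderer's or file format's schema), a single splat and the covariance used when projecting it to screen space can be sketched like this:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """One 3D Gaussian primitive (illustrative layout, not a specific file format)."""
    position: np.ndarray   # (3,) world-space centre of the Gaussian
    scale: np.ndarray      # (3,) per-axis extent of the ellipsoid
    rotation: np.ndarray   # (4,) orientation as a unit quaternion (w, x, y, z)
    color: np.ndarray      # (3,) RGB; real pipelines often store spherical-harmonic coefficients instead
    opacity: float         # blending weight when splats are composited front-to-back

def covariance(splat: GaussianSplat) -> np.ndarray:
    """Covariance Sigma = R S S^T R^T, which defines the ellipsoid the renderer rasterizes."""
    w, x, y, z = splat.rotation
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(splat.scale)
    return R @ S @ S.T @ R.T
```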

The real-world applications of Gaussian Splats in visual effects are particularly compelling for scene reconstruction and virtual production. Take, for example, our recent work on the flower market scene—a complex urban environment that would have been prohibitively expensive to model traditionally.

We captured the scene using 360-degree photography of individual planters and architectural elements, then processed these through Gaussian Splat reconstruction. The result was a fully navigable 3D environment that maintained photorealistic detail while being lightweight enough for real-time rendering in Unreal Engine.

The advantages over traditional methods became immediately apparent: we could extend the scene beyond the original capture area by intelligently duplicating and repositioning Gaussian elements. Camera tracking worked seamlessly, as the splat-based representation naturally handled parallax and perspective changes without the artifacts common in traditional mesh-based approaches.

CG Modelling#

Geometry becomes a control surface for generative models rather than the final rendered product.

DMP#

The most obvious use case for generative AI is the digital matte painting: a static image, worked up in ComfyUI with inpainting and a willingness to finish in Photoshop.

[Image comparison: DMP vs. inpaint reference A for I2V]

A lot of the pixels we see on screen for any given shot are often "beyond parallax", and a static 2D solution is the right choice. The days of painstakingly stitching together photos and brush strokes are essentially over.

The same train appears twice in the show. ComfyUI's IP-Adapter and inpainting were used to align the two shots. The environments were worked up separately from the train for crowd control, and later stitched back together with SAM2. A true medley of AI orchestration.


One of the most powerful aspects of working with AI models is that the diffusion process runs through a U-Net: the noise passes through a series of hidden layers as it is converted into an image that aligns with the input tokens. Interestingly, some layers are effectively deputized to serve a specific function; SDXL, for example, has layers that govern style and others that govern image structure. IP-Adapter taps into these and lets artists control them independently.
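The production work was done in ComfyUI; as a minimal sketch of the same idea using Hugging Face diffusers (the reference image path and prompt are illustrative, not our production assets), IP-Adapter conditioning looks roughly like this:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter injects image features into the U-Net's cross-attention layers,
# so a reference still can steer identity/style while the prompt steers content.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference image steers generation

reference = load_image("train_reference.png")  # hypothetical still of the hero train
image = pipe(
    prompt="vintage train at a Paris platform, overcast morning",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("train_matched.png")
```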

Crowds#

One of the great use cases for hybrid VFX is creating animated DMPs using an image-to-video (I2V) workflow. For the Gare du Nord scene, one of our compositors generated hundreds of candidate DMPs in ComfyUI; from there we could land on one that sufficiently met the creative spec, inpaint or outpaint a still initial crowd population, then run I2V on Runway, making sure to include the keyword "static camera". Voilà, a crowd sim essentially for free.
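A minimal sketch of the still-generation half of that workflow, using a diffusers inpainting pipeline in place of the production ComfyUI graph (file names, prompt, and model choice are illustrative):

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Step 1 of the hybrid crowd workflow: paint a crowd into a still DMP.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

plate = load_image("gare_du_nord_plate.png")   # hypothetical still of the empty concourse
mask = load_image("crowd_region_mask.png")     # white where the crowd should appear

still_crowd = pipe(
    prompt="busy train station concourse, crowd of commuters, natural daylight",
    image=plate,
    mask_image=mask,
    strength=0.9,
).images[0]
still_crowd.save("gare_du_nord_crowd_still.png")

# Step 2: hand the approved still to an image-to-video service (Runway in our case)
# with "static camera" in the prompt so only the crowd moves.
```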

Layout#

Layout is the one step that should not be automated

An early research video where it became clear that AI video models have an inherent Euclidean understanding of the world baked somewhere in their latent space. They are trackable, and they open up workflows where crossing between live action, CG, and AI as needed is entirely possible.

Lighting and Rendering#

Traditional ray tracing is deterministic—given the same inputs (geometry, materials, lighting), it always produces identical results. AI rendering, however, is inherently statistical, generating images through learned patterns rather than physical simulation. This fundamental difference opens new possibilities for VFX workflows.

By using depth and normal control networks, we can guide AI generation with precise geometric information from our 3D scenes. This control allows us to skip the computationally expensive lighting and rendering steps entirely. Instead of calculating light bounces and material interactions, we can project our 3D geometry into 2D depth and normal maps, then use these as conditioning inputs for AI image generation.

Traditional Raytracing (Deterministic Rendering)#

The train car interiors were rendered traditionally using Arnold with NVIDIA's OptiX denoiser.

AI + Controlnets (Statistical Rendering)#

Harnessing that inherent randomness, and achieving control and consistency while still getting the speedups and creative flexibility, is the central challenge of statistical rendering.
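For illustration, here is a minimal depth-ControlNet sketch in diffusers, assuming a depth pass exported from the 3D scene (paths, prompt, and model choices are placeholders rather than our exact production setup):

```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# A depth map projected from the layout geometry conditions the generation,
# standing in for the expensive lighting and rendering steps.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

depth_map = load_image("train_interior_depth.png")  # hypothetical depth pass from the layout scene
image = pipe(
    prompt="train car interior, warm evening light through the windows, photographic",
    image=depth_map,                     # geometric conditioning
    controlnet_conditioning_scale=0.8,   # how strictly the output follows the depth
    num_inference_steps=30,
).images[0]
image.save("train_interior_ai_render.png")
```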

The sweet spot lies in using AI for early-stage work where speed matters more than physical accuracy, then transitioning to traditional rendering for final shots where precision is critical. This hybrid approach leverages the strengths of both technologies, using AI to explore creative possibilities quickly and traditional rendering to deliver final quality.


Quality vs Speed Trade-offs#

While AI rendering can't match the physical accuracy of ray tracing for final shots, it excels at speed and flexibility. For previsualization, concept art, and rapid prototyping, AI generation provides results in seconds rather than hours. The trade-off is control—we sacrifice deterministic precision for statistical approximation, but gain unprecedented speed and creative exploration capabilities.

Compositing#

Beyond Green Screen - Matte Extraction#

Even when shooting on a green screen you still need mattes: garbage mattes, core mattes, and the like.

Segment Anything Model (SAM2)

Traditional green screen workflows require controlled lighting, specific materials, and extensive post-processing to achieve clean mattes. VitMatte revolutionizes this process by using AI to extract alpha channels from natural footage without requiring any special setup. This capability dramatically accelerates production by eliminating the need for green screen setups, allowing filmmakers to shoot in any location with any lighting conditions.

The AI model understands complex visual relationships, automatically distinguishing between foreground subjects and background elements based on visual cues rather than color separation. This means actors can perform against natural backgrounds—city streets, forests, or even moving crowds—and VitMatte will generate clean mattes that would be impossible to achieve with traditional chroma key techniques.

VitMatte operates through a sophisticated vision transformer architecture that processes images at multiple scales simultaneously. The model analyzes spatial relationships, texture patterns, and semantic information to determine what belongs to the foreground versus background. Unlike traditional matting approaches that rely on color differences, VitMatte can handle complex scenarios like hair, transparent objects, and fine details that typically require manual rotoscoping.

The system works by first generating a coarse segmentation mask, then refining it through multiple attention layers that focus on boundary regions and fine details. This multi-scale approach allows it to capture both broad strokes and intricate details, producing mattes that rival hand-drawn rotoscoping in quality while requiring only seconds of processing time.
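As a minimal sketch of how VitMatte is typically driven (here via the Hugging Face transformers port, with hypothetical file names), a rough trimap goes in and a soft alpha comes out:

```python
import torch
from PIL import Image
from transformers import VitMatteImageProcessor, VitMatteForImageMatting

# VitMatte refines a rough trimap (e.g. a dilated/eroded SAM2 mask) into a soft alpha.
processor = VitMatteImageProcessor.from_pretrained("hustvl/vitmatte-small-composition-1k")
model = VitMatteForImageMatting.from_pretrained("hustvl/vitmatte-small-composition-1k")

image = Image.open("plate_frame_0101.png").convert("RGB")   # hypothetical plate frame
trimap = Image.open("trimap_frame_0101.png").convert("L")   # black=bg, white=fg, grey=unknown

inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
with torch.no_grad():
    alpha = model(**inputs).alphas  # (1, 1, H, W) soft matte in [0, 1]

Image.fromarray((alpha[0, 0].numpy() * 255).astype("uint8")).save("alpha_frame_0101.png")
```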

We discovered that V2V's most powerful application was in lighting transfer—a technique that became crucial for our workflow. When we needed to match lighting conditions between different shots or locations, we would re-render our 3D elements with the target lighting setup, then use V2V to transfer that lighting style to our live-action plates. This approach was particularly effective for scenes where we needed to integrate CG elements into footage shot under different conditions.

Matte Extraction#

Despite its impressive capabilities, SAM2 has significant limitations that require careful consideration in production workflows. The most critical constraint is that SAM2-generated mattes are typically too rough for direct use in final composites—they serve best as guides or starting points rather than finished products.

The quality of SAM2's output heavily depends on screen space detail. Objects that occupy substantial screen real estate with clear visual boundaries work well, while small or indistinct elements often produce unusable results. This limitation becomes particularly apparent when working with distant subjects or objects with complex, fine details.

To work around these limitations, we've developed a hybrid approach:

  • Use SAM2 for initial object identification and rough masking
  • Apply the results as guides for traditional roto tools
  • Combine with VitMatte for edge refinement and detail enhancement
  • Reserve manual roto for the most critical areas requiring pixel-perfect accuracy

This approach leverages SAM2's strengths while acknowledging its limitations, creating a more efficient pipeline that reduces manual work without sacrificing quality.
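For reference, a minimal image-level SAM2 sketch of that first step, using Meta's sam2 package with an illustrative checkpoint name and a single click prompt:

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Step 1 of the hybrid approach: a rough, prompt-driven mask to hand off to roto/VitMatte.
# Checkpoint name and file paths are illustrative; check the sam2 repo for current models.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

frame = np.array(Image.open("plate_frame_0101.png").convert("RGB"))
predictor.set_image(frame)

# A single positive click on the hero actor; more points or a box tighten the result.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[960, 540]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[int(np.argmax(scores))]  # (H, W) rough mask
Image.fromarray((best * 255).astype("uint8")).save("rough_matte_0101.png")
```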

Derivatives By Default#

A rough key, despill pass, normals, and depth are generated by default for every plate.

Multi-Shot Workflow#

Context switching is a big waste of artist time. Handling similar shots together in a single multi-shot workflow, rather than setting each one up from scratch, keeps artists in the same context and tooling.

AI Light Transfer#

At the time of our production, Runway's Video-to-Video (V2V) tool was essentially the only game in town for AI-powered video transformation. While other AI tools existed for image generation and editing, Runway's V2V was unique in its ability to process entire video sequences while maintaining temporal consistency. This made it an essential tool in our pipeline, despite its limitations and the significant computational resources it required.

The technology works by taking a source video and a reference image or video, then applying the visual style and characteristics of the reference to the source while preserving the original motion and composition. This process involves complex neural networks that understand both spatial and temporal relationships, allowing for coherent transformations across multiple frames.

Normals Based Relighting#

Tools such as Beeble (https://beeble.ai/) estimate per-pixel surface normals from footage, letting plates be relit with virtual lights in post.

Other Enhancements#

  • Worked at HD and upscaled to 4K: perceptually comparable to working at 4K natively, without the heavy data and processing lift.
  • Used OCIO and OIIO extensively to normalize everything around the ACEScg working space, which is especially important when working with web-based AI inference providers (see the sketch below).
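A minimal OpenImageIO sketch of that round trip (the colour-space names are assumptions that depend on the active OCIO/ACES config):

```python
import OpenImageIO as oiio

# Bring a web-provider output (typically an sRGB PNG) back into the ACEScg working space.
src = oiio.ImageBuf("ai_output_srgb.png")  # hypothetical inference result
acescg = oiio.ImageBufAlgo.colorconvert(src, "Utility - sRGB - Texture", "ACES - ACEScg")
acescg.write("ai_output_acescg.exr")

# And the reverse trip before sending a plate to an inference provider.
plate = oiio.ImageBuf("plate_acescg.exr")
srgb = oiio.ImageBufAlgo.colorconvert(plate, "ACES - ACEScg", "Output - sRGB")
srgb.write("plate_for_inference.png")
```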

Creative#

Decision Front-Loading#

The challenge for many creators is getting alignment on creative decisions from stakeholders. Often this means passing a shot through various stages only to have to start again from scratch. Generative tools open up the possibility of front-loading those decisions with fast, cheap previews before the expensive work begins.

Creative Lookahead | Comp Lookahead#

One of the most powerful workflows for generative AI is image-to-image, where we retain the input structure and let the diffuser resolve it into something statistically photographic in a matter of seconds. I used this technique on Next Stop Paris, taking a view from Google Earth and using it, along with some text tokens, to roughly influence a sunset view over Paris.

[Image comparison: Google Earth reference vs. Adobe Firefly output]
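The production image above was generated with Adobe Firefly; as an open-source stand-in for the same image-to-image idea, a diffusers sketch (reference path, prompt, and strength are illustrative) might look like this:

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Image-to-image: keep the rough structure of the reference, let the model resolve detail.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

reference = load_image("google_earth_paris_view.png")  # hypothetical screengrab used for structure
lookahead = pipe(
    prompt="sunset over the rooftops of Paris, golden light, photographic, wide establishing shot",
    image=reference,
    strength=0.55,   # lower values stay closer to the reference structure
    guidance_scale=6.0,
).images[0]
lookahead.save("paris_sunset_lookahead.png")
```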

Conclusion#

For VFX creators looking to embrace these new tools, the path forward is surprisingly accessible. Start by experimenting with open-source AI models and integrating them into existing workflows. Focus on areas where speed and iteration matter more than final quality—concept development, pre-visualization, and rapid prototyping. Build small, focused tools that solve specific problems rather than trying to replace entire pipelines overnight.

The most important step is to approach AI as a creative collaborator rather than a replacement for human artistry. Learn to guide and direct AI generation, understand its strengths and limitations, and develop workflows that leverage both human creativity and machine efficiency. The future belongs to artists who can effectively collaborate with AI tools while maintaining their unique creative vision and technical expertise.

I want to extend my deepest gratitude to TCL for having the vision and courage to take the risk of developing AI technology when it was still an uncertain frontier. Their willingness to invest in emerging technology and push the boundaries of what's possible has opened new creative pathways for the entire VFX industry. Without their pioneering spirit and commitment to innovation, many of the workflows and techniques described in this document would not exist today.