The New VFX Pipeline: AI-Powered Tools Reshaping Visual Effects#

A deep dive into the new tools and techniques transforming how visual effects are created

Introduction#

I recently had the chance to spearhead innovation in AI filmmaking at TCL and develop novel techniques for integrating AI into traditional visual effects and filmmaking workflows. The use cases for AI are a lot different than simply prompting shots into existence. AI isn't a radical sea change for the industry so much as a collection of new efficiencies and workflows.


In this article I'll go through a number of VFX and filmmaking steps/departments and how each was impacted by AI. The ethos at TCL was to use AI anywhere and everywhere possible, since the film Next Stop Paris was funded as a research project.

Environments + Assets#

Gaussian Splats#

Gaussian splats represent a 3D scene as a collection of oriented 3D Gaussians; each splat acts as a tiny, flexible brush stroke that can be rendered from any viewpoint in real time.
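For intuition, here is a minimal sketch of the per-splat attributes behind that "brush stroke" idea. The field names and the covariance helper are illustrative assumptions; real implementations store these as packed tensors rather than per-splat objects.

```python
# A minimal sketch of the per-splat attributes used in 3D Gaussian splatting.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    position: np.ndarray   # (3,) world-space center of the Gaussian
    rotation: np.ndarray   # (4,) unit quaternion orienting the ellipsoid
    scale: np.ndarray      # (3,) per-axis standard deviations ("brush size")
    opacity: float         # alpha used when splats are blended front-to-back
    sh_coeffs: np.ndarray  # spherical-harmonic coefficients for view-dependent colour

def covariance(splat: GaussianSplat) -> np.ndarray:
    """3x3 covariance defining the splat's footprint: R * S * S^T * R^T."""
    w, x, y, z = splat.rotation
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(splat.scale)
    return R @ S @ S.T @ R.T
```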

In the flower market scene, we were able to capture a variety of real-world assets, then duplicate and repurpose them to populate the virtual environment. Some of the fine, soft detail would have been prohibitively expensive to recreate manually.

For the Gaussians, footage, and even CG, the goal was not to bake in any lighting or creative decisions early on, and to let AI do the heavy lifting of making a cohesive shot through a new technique called rediffusion. More on that later.

Retargeting Stock Footage#

For the exterior of the train, the team used footage I shot on my iPhone of the Canadian countryside. Using Runway's v2v feature, we were able to keep the essence of the motion and retarget it creatively towards the French countryside or the outskirts of Paris.

DMP#

A lot of the pixels we see on screen for many types of shots are "beyond parallax", and a projected 2D solution is the right choice. For generative VFX it's one of the most obvious use cases. The key is to use it as another tool and not simply resign oneself to the algorithm. When it comes to static images, an artist has complete control over the outputs in ComfyUI/Adobe Firefly through in-painting and out-painting, without having to get into the weeds of wrangling individual pixels.

[Image comparison: DMP | In-paint reference A for i2v]

In Next Stop Paris the same train appears in two separate shots. ComfyUI IP-Adapter and in-painting were used to align the two shots and keep a consistent train asset across both. The environments were worked up separately from the train for crowd control, and later stitched together using SAM2.
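As a rough illustration of that IP-Adapter + in-painting idea, here is a hedged sketch using the open-source diffusers library. The production work was done in ComfyUI, so this doesn't map 1:1; the model IDs and file names below are assumptions.

```python
# Hedged sketch: repaint the train region of shot B while an IP-Adapter
# reference image from shot A drives its appearance.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.7)  # how strongly the reference train drives the result

plate = load_image("shot_b_plate.png")          # hypothetical file names
mask = load_image("shot_b_train_region.png")    # white = region to repaint
reference = load_image("shot_a_train_ref.png")  # the train as it appears in shot A

result = pipe(
    prompt="vintage European passenger train at a station platform",
    image=plate, mask_image=mask, ip_adapter_image=reference,
).images[0]
result.save("shot_b_train_matched.png")
```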


Crowds#

One of the great use cases for hybrid VFX is creating animated DMPs using an i2v workflow. For the Gare du Nord scene, one of our compositors was able to batch-generate hundreds of possible static environments in ComfyUI. From there I picked from a shortlist, in-painted and out-painted the still for the initial crowd population, and finally ran i2v on Runway, making sure to include the keyword "static camera". Voila, a crowd sim and environment essentially for free.
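The batch step looks roughly like the sketch below. The production version was a ComfyUI graph; this diffusers loop is only an approximation, and the prompt, model ID, and file names are assumptions.

```python
# Hedged sketch: generate many candidate station environments by varying the
# seed, then shortlist by eye before in/out-painting and i2v.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = ("interior of Gare du Nord train station, wide shot, "
          "empty platforms, natural daylight, photoreal")

for seed in range(200):  # hundreds of candidates to shortlist from
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"gare_du_nord_candidate_{seed:03d}.png")

# The shortlisted still is then in/out-painted and sent to an i2v model
# (Runway) with a "static camera" instruction to animate the crowd.
```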

Layout#

An early research video made it clear that AI video models have an inherent Euclidean understanding of the world. Their outputs are trackable, which opens up workflows where crossing between live action, CG, and AI as needed is entirely possible.

That said, layout in VFX requires nuanced real-world understanding that only artists can provide, as they intuitively align camera angles, asset placement, and scene composition with real-world photography. For Next Stop Paris there was a proper layout and match-move where applicable for every shot to ensure shot and scene consistency. Until AI prompts can incorporate a projection matrix, projecting AI-generated content onto geometry or within nodal pans, while accommodating dynamic camera movements, remains a key limitation to consider.
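To make the projection-matrix point concrete, here is a small pinhole-camera sketch: with known intrinsics and extrinsics, any 3D point maps to an exact pixel, which is what lets match-moved CG and projected DMPs stay locked to the plate. The camera values are placeholders; today's AI video models expose no equivalent matrix.

```python
# Pinhole projection: x = K [R|t] X, returned as pixel coordinates.
import numpy as np

def project(point_world, K, R, t):
    X_cam = R @ point_world + t          # world -> camera space
    x = K @ X_cam                        # camera space -> image plane
    return x[:2] / x[2]                  # perspective divide -> pixels

K = np.array([[1500.0, 0, 960],          # focal length / principal point (example values)
              [0, 1500.0, 540],
              [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
print(project(np.array([0.5, 0.2, 0.0]), K, R, t))  # where that point lands in frame
```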

Lighting and Rendering#

Traditional ray tracing is deterministic: given the same inputs (geometry, materials, lighting), it always produces identical results. Using an AI diffusion model as a renderer is also deterministic given the same tokens and seed, but there is no scene description; the image resolves around token semantics and a statistical approximation of the difference between random noise and a cohesive image. For visual effects we can wrangle this randomness using ControlNets to combine the benefits of a traditional scene description (geo + camera) with the speed and creative flexibility of generative models.

AI + ControlNets + IP Adapter (Statistical Rendering)#

One of the most powerful aspects of working with AI models is that the diffusion process goes through a U-Net: the noise passes through different hidden layers as it is converted from noise into an image that aligns with the input tokens. Interestingly, some layers are deputized to serve a specific function; SDXL, for example, has layers for style and image structure. IP-Adapter taps into these and allows artists to control them independently.
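Put together, a "statistical render" might look like the hedged diffusers sketch below: structure comes in as a ControlNet conditioning image (here a depth render from layout), the look comes in through IP-Adapter, and a fixed seed keeps the result repeatable. The model IDs, prompt, and file names are assumptions; the production graphs lived in ComfyUI.

```python
# Hedged sketch of statistical rendering: scene description via ControlNet,
# look via IP-Adapter, repeatability via a fixed seed.
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)             # style/identity influence

depth = load_image("layout_depth.png")      # structure: geo + camera from layout
style = load_image("lookdev_reference.png") # look: approved concept frame
generator = torch.Generator("cuda").manual_seed(42)  # same seed -> same image

frame = pipe("paris train station at dusk, cinematic lighting",
             image=depth, ip_adapter_image=style,
             controlnet_conditioning_scale=0.8,
             generator=generator).images[0]
frame.save("statistical_render.png")
```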

Traditional Raytracing (Deterministic Rendering)#

For ray-traced scenes, our 3D team used Arnold with NVIDIA's OptiX Denoiser, an AI trained on ray-traced images to accurately reduce noise with fewer samples, speeding up rendering while preserving quality.

Quality vs Speed Trade-offs#

While AI rendering can't match the physical accuracy of ray tracing for final shots, it excels at speed and flexibility. For pre-visualization, concept art, and rapid prototyping, AI generation provides results in seconds rather than hours. The trade-off is control: we sacrifice deterministic precision for statistical approximation, but gain unprecedented speed and creative exploration capabilities. For the Next Stop Paris sign shot, the approach was a best of both worlds: rough render, i2v, and project.

Compositing#

ViTMatte#

ViTMatte-for-Nuke analyzes spatial relationships, texture patterns, and semantic information within a trimap to determine what belongs to the foreground versus the background. Unlike traditional matting approaches that rely on colour differences, ViTMatte can handle complex scenarios like hair, transparent objects, and fine details that typically require manual rotoscoping.
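Outside of Nuke, the same model can be run directly via Hugging Face transformers. A minimal sketch, assuming the public checkpoint and placeholder file names:

```python
# Hedged sketch of trimap-based matting with ViTMatte.
import torch
from PIL import Image
from transformers import VitMatteImageProcessor, VitMatteForImageMatting

processor = VitMatteImageProcessor.from_pretrained("hustvl/vitmatte-small-composition-1k")
model = VitMatteForImageMatting.from_pretrained("hustvl/vitmatte-small-composition-1k")

image = Image.open("plate_frame.png").convert("RGB")
trimap = Image.open("plate_trimap.png").convert("L")  # black=bg, white=fg, grey=unknown

inputs = processor(images=image, trimaps=trimap, return_tensors="pt")
with torch.no_grad():
    alphas = model(**inputs).alphas     # soft alpha that resolves hair and fine edges

Image.fromarray((alphas[0, 0].numpy() * 255).astype("uint8")).save("alpha.png")
```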

Segment Anything#

Segment Anything Model (SAM2) has significant limitations that require careful consideration in production workflows. The most critical constraint is that SAM2-generated mattes are hard mattes and don't consider soft edges. SAM2 was developed at Meta as a data annotation tool for AI training, not with VFX in mind. Next Stop Paris was shot with a 240-degree shutter so that SAM2 mattes would be more usable.

To work around these limitations, I developed a hybrid approach (sketched in code after the list below):

  • Use SAM2 for initial object identification and garbage/core mattes
  • Combine with VitMatte for edge refinement and detail enhancement
  • Reserve manual roto only for areas that can't be masked with a key or ViTMatte
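A sketch of the glue between those steps: turn SAM2's hard matte into a trimap by eroding and dilating, then hand it to ViTMatte (previous sketch) to resolve the soft edge. The kernel size is a per-shot judgment call, not a fixed production value.

```python
# Hedged sketch: SAM2 hard matte -> trimap for ViTMatte edge refinement.
import cv2
import numpy as np

def hard_mask_to_trimap(mask: np.ndarray, band: int = 15) -> np.ndarray:
    """mask: uint8 0/255 hard matte from SAM2 -> trimap (0=bg, 128=unknown, 255=fg)."""
    kernel = np.ones((band, band), np.uint8)
    sure_fg = cv2.erode(mask, kernel)      # shrink: definitely foreground
    sure_bg = cv2.dilate(mask, kernel)     # grow: everything outside is background
    trimap = np.full_like(mask, 128)       # start as "unknown" band
    trimap[sure_bg == 0] = 0
    trimap[sure_fg == 255] = 255
    return trimap

sam2_mask = cv2.imread("sam2_hard_matte.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("trimap_for_vitmatte.png", hard_mask_to_trimap(sam2_mask))
```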

Rediffusion / AI Light Transfer#

A new technique I developed was to work a shot up as essentially a slap comp and then let Runway v2v reimagine the shot as a cohesive image, including subtle interactive lighting, grit, grime, and dirty windows. That information is then extracted and selectively reapplied in a controlled fashion to various elements via cryptomatte, normals, etc. I coined this "rediffusion".

At the time of our production, Runway's Video-to-Video (V2V) tool was the only available AI v2v solution. While other AI tools existed for image generation and editing, Runway's V2V was unique in its ability to process video sequences while preserving structure and restyling according to a text prompt. Using various comp techniques it's possible to extract and transfer the results of v2v back onto a source plate or CG render to get interactive lighting or integration for free.
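In numpy terms, the extraction and selective reapplication reduce to something like the sketch below. In production this lived in Nuke with cryptomattes and normals; the clamp value and matte choices were shot-specific.

```python
# Simplified rediffusion transfer: treat the v2v output as "slap comp x unknown
# light", extract that light as a per-pixel ratio, and reapply it only where a
# matte (e.g. a cryptomatte channel) says to.
import numpy as np

def transfer_light(slap_comp, v2v_result, matte, max_gain=4.0):
    """All inputs float32, linear light. slap_comp/v2v_result: HxWx3, matte: HxWx1 in 0..1."""
    eps = 1e-4
    gain = np.clip(v2v_result / (slap_comp + eps), 0.0, max_gain)  # extracted "lighting"
    gain = 1.0 + matte * (gain - 1.0)       # only apply where the matte is on
    return slap_comp * gain                 # relit element, structure untouched
```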


Normals Based Relighting#

Rediffusion works great when you don't necessarily need shot-to-shot consistency in your relighting. For consistent plate-based relighting, Beeble was used to generate normal maps from the keyed plate. This was an essential part of the workflow: in Nuke, artists were able to shape the otherwise flat lighting to better fit the directionality of the hybrid AI/CG environment.
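The underlying math is a simple per-pixel Lambert term. A minimal sketch, with light direction and intensity as the knobs an artist would dial in (values here are placeholders):

```python
# Normals-based relighting: shape flat plate lighting with a directional term.
import numpy as np

def relight(plate, normals, light_dir, intensity=1.0, ambient=0.3):
    """plate: HxWx3 linear, normals: HxWx3 in [-1, 1], light_dir: 3-vector."""
    L = np.asarray(light_dir, dtype=np.float32)
    L /= np.linalg.norm(L)
    lambert = np.clip((normals * L).sum(axis=-1, keepdims=True), 0.0, 1.0)
    return plate * (ambient + intensity * lambert)   # shaped, directional lighting
```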

Other Enhancements#

AI Upscale#

To save on render time, storage, processing, and artist time, I set the working resolution to 1920x1080 HD. Using an AI upscale in finish is effectively indistinguishable from working at native 4K.
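The finish upscale itself used commercial tooling; as a hedged stand-in, here is the same idea with OpenCV's dnn_superres module and a pretrained ESPCN model, taking the HD working res to UHD. The model file path is an assumption.

```python
# Stand-in for the finish upscale: learned 2x super-resolution on a comp frame.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x2.pb")        # pretrained 2x super-resolution weights
sr.setModel("espcn", 2)            # HD -> UHD is a 2x scale

frame = cv2.imread("final_comp_1080.png")
cv2.imwrite("final_comp_uhd.png", sr.upsample(frame))
```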

Color Management#

As of writing, none of the commercially available web-based inference providers support exporting anything other than H.264 Rec.709 mp4 files or PNG/JPG sRGB stills. There is a long way to go to fill the gap on professional workflows. For Next Stop Paris, I implemented OCIO/ACES and OIIO at the core of any shared Nuke scripts and pipeline publishing tools.
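Ingest then boils down to converting those sRGB deliverables into the working colour space. A minimal sketch with OpenImageIO, assuming an ACES OCIO config is active; the exact colour-space names depend on the config, and the file names are placeholders.

```python
# Convert a provider's sRGB PNG to an ACEScg half-float EXR for comp.
import OpenImageIO as oiio

src = oiio.ImageBuf("runway_output.png")   # 8-bit sRGB still from the provider
acescg = oiio.ImageBufAlgo.colorconvert(src, "sRGB - Texture", "ACEScg")
acescg.write("runway_output_acescg.exr", "half")
```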

Pipeline#

I'm writing a separate blog post about this: "Building an AI-First VFX Studio from Scratch". Claude was essential for quickly writing and deploying all the necessary tools for artists. I used PyInstaller + UV extensively to package and deploy code.


Creative#

Decision Front-Loading#

The challenge for many artists is getting creative decisions from stakeholders. Oftentimes this means passing a shot through various stages only to have to start from scratch. Generative tools open the possibility of giving decision makers a huge parameter space of possible outcomes to comment on before committing resources towards implementation. For Next Stop Paris we used Miro as a creative cork board for the director to ideate and comment, working directly with the generative and art teams to solve creative problems and freeing up the VFX team to cook on implementation and technical details.

Creative Lookahead | Comp Lookahead#

Another great technique is what I call comp lookahead. Using image-to-image, artists can quickly take a peek into the future at a more refined version of their comp, needing only a slap comp to feed into i2i to preview and get ideas, inspiration, or feedback on possible outcomes.
In this example, an i2i workflow using Adobe Firefly was used to preview what a shot might look like based on a screen grab from Google Earth.
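The production lookahead used Adobe Firefly; as a hedged open-source equivalent, a diffusers image-to-image pass over the slap comp (or a Google Earth grab) gives the same kind of peek. The model ID, prompt, and file name are assumptions; strength controls how far the preview drifts from the input.

```python
# Hedged comp-lookahead sketch: i2i over a slap comp to preview a finished look.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

slap_comp = load_image("slap_comp_v001.png")   # or a Google Earth screen grab
preview = pipe("aerial view of Paris rooftops at golden hour, photoreal",
               image=slap_comp, strength=0.45).images[0]
preview.save("comp_lookahead.png")
```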

[Image comparison: Google Earth screen grab | Adobe Firefly i2i result]

Conclusion#

For VFX creators looking to embrace these new tools, the path forward is surprisingly accessible. Start by experimenting with open-source AI models in ComfyUI or with web-based SaaS products like Krea or Runway. Focus on areas where speed and iteration matter more than final quality: concept development, pre-visualization, and rapid prototyping.

The most important step is to approach AI as a creative collaborator and accelerator rather than a replacement for human artistry. Learn to guide and direct AI generation, understand its strengths and limitations, and develop workflows that leverage both human creativity and machine efficiency. The future belongs to artists who can effectively collaborate with AI tools while maintaining their unique creative vision and technical expertise.

"Dare to prompt, lest you be prompted!"

I want to extend my deepest gratitude to TCL for having the vision and courage to take the risk of developing AI technology for film when it was still an uncertain frontier, and to the team of extremely talented artists from all around the world that brought their skills and spirit of innovation.

Questions or queries? Reach out at derekvfx.ca/contact