LatentSwap3D: Semantic Edits on 3D Image GANs

¹ETH Zürich - Data Analytics Lab, ²Technical University of Munich, ³Google Switzerland
*Conducted this research as part of studies at Technical University of Munich.

ICCV AI3DCC 2023


MVCGAN Smiling Attribute Editing
EG3D Wearing Eyeglasses Attribute Editing
pi-GAN Smiling Attribute Editing

Abstract

3D GANs have the ability to generate latent codes for entire 3D volumes rather than only 2D images. These models offer desirable features like high-quality geometry and multi-view consistency, but, unlike their 2D counterparts, complex semantic image editing tasks for 3D GANs have only been partially explored. To address this problem, we propose LatentSwap3D, a semantic edit approach based on latent space discovery that can be used with any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on identifying the latent code dimensions corresponding to specific attributes by feature ranking using a random forest classifier. It then performs the edit by swapping the selected dimensions of the image being edited with the ones from an automatically selected reference image. Compared to other latent space control-based edit methods, which were mainly designed for 2D GANs, our method on 3D GANs provides remarkably consistent semantic edits in a disentangled manner and outperforms others both qualitatively and quantitatively. We show results on seven 3D GANs (pi-GAN, GIRAFFE, StyleSDF, MVCGAN, EG3D, StyleNeRF, and VolumeGAN) and on five datasets (FFHQ, AFHQ, Cats, MetFaces, and CompCars).

Figure 1: Given an image, we invert it into the latent space of an MVCGAN pre-trained on FFHQ, which enables novel view synthesis. We then use LatentSwap3D to perform attribute editing. Rows two and three compare LatentSwap3D and StyleFlow on this task.

LatentSwap3D Framework

We aim to build a model-agnostic method that works with any 3D-aware image generator. Our method, LatentSwap3D, consists of two main components. The first uses a random forest to identify the latent-space dimensions of a 3D GAN that control the desired attribute. The second manipulates the target attribute in an identity-preserving manner by swapping those dimensions.
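The first component can be illustrated with a minimal sketch, assuming scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_` as the ranking signal. The function name `rank_latent_dimensions`, the hyperparameters, and the synthetic latent codes and labels are hypothetical stand-ins, not the authors' implementation; in practice the labels would come from an attribute classifier applied to generated images.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_latent_dimensions(latents, labels, n_trees=100, seed=0):
    """Return latent dimension indices sorted from most to least important.

    latents: (N, D) array of latent codes (hypothetical stand-in for s_i).
    labels:  (N,) array of 0/1 attribute labels.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(latents, labels)
    # feature_importances_ holds one impurity-based score per input dimension.
    return np.argsort(forest.feature_importances_)[::-1]


# Synthetic data (assumption): the attribute depends only on dimensions 5 and 17.
rng = np.random.default_rng(0)
latents = rng.normal(size=(2000, 32))
labels = (latents[:, 5] + 0.5 * latents[:, 17] > 0).astype(int)

ranking = rank_latent_dimensions(latents, labels)
top_k = ranking[:2]  # the K dimensions that would later be swapped
```

On this toy data the two informative dimensions should rank first; with real latent codes, the choice of K is a tuning decision this sketch does not address.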

Figure 2: (a) We propose to train a random forest regressor that takes latent codes \( s_i \) as input and predicts the presence or absence of a desired attribute. We use the trained forest to rank the dimensions of \( s_i \) by their importance for that attribute. (b) Given the latent code \( s \) of an image, we first find the closest latent code in the support set that exhibits the desired attribute (e.g., \( s^+ \) to increase blondness). We then swap the top \( K \) dimensions related to the attribute to generate an edited latent code \( \hat{s} \), which can be decoded into an edited image.
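Step (b) of the figure can be sketched as follows. This is a minimal illustration under stated assumptions: the reference is chosen by Euclidean distance (the caption does not specify the metric), and the function name `latent_swap`, the toy support set, and the toy ranking are hypothetical.

```python
import numpy as np


def latent_swap(s, support, ranking, k):
    """Swap the top-k ranked dimensions of s with those of its nearest support code.

    s:       (D,) latent code of the image being edited.
    support: (M, D) latent codes that exhibit the desired attribute.
    ranking: (D,) dimension indices, most important first.
    """
    # Nearest reference by Euclidean distance (the metric is an assumption).
    distances = np.linalg.norm(support - s, axis=1)
    reference = support[np.argmin(distances)]

    s_hat = s.copy()          # leave the input code untouched
    top_k = ranking[:k]
    s_hat[top_k] = reference[top_k]
    return s_hat


# Toy example: the second support code is closest to s.
s = np.zeros(6)
support = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.2, 0.1, 0.3, 0.1, 0.2, 0.1],
])
ranking = np.array([4, 2, 0, 1, 3, 5])

s_hat = latent_swap(s, support, ranking, k=2)
# s_hat == [0, 0, 0.3, 0, 0.2, 0]: only dimensions 4 and 2 were replaced.
```

Because only K dimensions change and the reference is the nearest attribute-positive code, the remaining dimensions of \( s \) are left as they were, which is what keeps the edit identity-preserving.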

LatentSwap3D on Other 3D-aware Generators

GIRAFFE

GIRAFFE combines a NeRF with a 2D GAN. The NeRF part outputs features describing 3D shape and texture, while the 2D GAN part outputs the final image. Figure 3 shows smiling and eyeglasses edits from LatentSwap3D on the GIRAFFE - FFHQ model, following the same protocol used for the other FFHQ-trained generators.

Figure 3: LatentSwap3D on GIRAFFE - FFHQ.

To test how well LatentSwap3D generalizes to different datasets, we extended the experiment to CompCars using the pre-trained GIRAFFE generator. Because no classifiers for car attributes were available, as a proof of concept we trained a ResNet-50 from scratch on the Myauto.ge Cars Dataset to classify car color. As Fig. 4 shows, our approach can successfully edit the color of the cars using this classifier.

Figure 4: LatentSwap3D on GIRAFFE - CompCars.

StyleNeRF

StyleNeRF is another high-resolution 3D-aware generative model that integrates a NeRF into a 2D style-based generator. It can generate high-resolution, 3D-consistent images and shapes from unstructured 2D images. Figure 5 shows our attribute edits on StyleNeRF - FFHQ, e.g., smiling, removing bangs, and changing the hair color to blond.

Figure 5: LatentSwap3D on StyleNeRF - FFHQ.

VolumeGAN

VolumeGAN is a high-quality, NeRF-based 3D-aware generative model that is explicitly trained to learn structural and textural representations. Fig. 6 shows the results of our approach on VolumeGAN - FFHQ. Our approach applies the desired edits, e.g., removing eyeglasses, changing the hair color, and reducing facial hair, in the latent space of VolumeGAN without changing the identity of the input face.

Figure 6: LatentSwap3D on VolumeGAN - FFHQ.

LatentSwap3D on StyleGAN2

LatentSwap3D is not limited to 3D-aware GANs; it also works on image-based GANs like StyleGAN2 (see Fig. 7). First, following the procedure in Fig. 2(a), we identify the dimensions of StyleGAN2's style space that are most important for the desired attribute. Then, we swap these dimensions to generate the desired edits, as explained in Fig. 2(b).

Figure 7: LatentSwap3D on StyleGAN2 - FFHQ, MetFaces, AFHQ Cats and AFHQ Dogs.

BibTeX

@InProceedings{Simsar_2023_ICCV,
        author    = {Simsar, Enis and Tonioni, Alessio and Ornek, Evin Pinar and Tombari, Federico},
        title     = {LatentSwap3D: Semantic Edits on 3D Image GANs},
        booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
        month     = {October},
        year      = {2023},
        pages     = {2899-2909}
}

Acknowledgments

We are grateful to the Google University Relationship GCP Credit Program for supporting this work by providing computational resources.