
PhotoApp: Photorealistic Appearance Editing of Head Portraits

 - By Mallikarjun B R and Ayush Tewari (MPI for Informatics, SIC, Germany) and 9 others


Paper Link

Abstract



Photorealistic editing of head portraits is a challenging task, as humans are very sensitive to inconsistencies in faces. The paper presents an approach for high-quality, intuitive editing of the camera viewpoint and scene illumination (parameterised with an environment map) in a portrait image. This requires the method to capture and control the full reflectance field of the person in the image. Most editing approaches rely on supervised learning using training data captured with setups such as light and camera stages. Such datasets are expensive to acquire, are not readily available, and do not capture all the rich variations of in-the-wild portrait images. In addition, most supervised approaches focus only on relighting and do not allow camera viewpoint editing.

The paper presents a method that learns from limited supervised training data. The training images only include people in a fixed neutral expression with eyes closed, without much hair or background variation. Each person is captured under 150 one-light-at-a-time conditions and under 8 camera poses. Instead of training directly in image space, the authors design a supervised problem that learns transformations in the latent space of StyleGAN. This combines the best of supervised learning and generative adversarial modelling. They show that the StyleGAN prior allows for generalisation to different expressions, hairstyles and backgrounds. This produces high-quality photorealistic results for in-the-wild images and significantly outperforms existing methods. The approach can edit the illumination and pose simultaneously and runs at interactive rates.

Paper Contributions

  • We combine the strengths of supervised learning and generative adversarial modelling in a new way to develop a technique for high-quality editing of scene illumination and camera pose in portrait images. Both properties can be edited simultaneously.
  • Our novel formulation allows for generalisation to in-the-wild images with significantly higher-quality results than related methods. It also allows for training with a limited amount of supervision.


The method allows for editing the scene illumination 𝐸𝑡 and camera pose 𝜔𝑡 in an input source image 𝐼𝑠. It learns to map the StyleGAN latent code 𝐿𝑠 of the source image, estimated using pSpNet, to the latent code 𝐿𝑡 of the output image. StyleGAN is then used to synthesise the final output 𝐼𝑡. The method is trained in a supervised manner on a light-stage dataset with multiple cameras and light sources, using a latent loss and a perceptual loss defined via a pretrained network 𝜙. Supervised learning in the latent space of StyleGAN allows for high-quality editing that generalises to in-the-wild images.
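To make this concrete, below is a minimal PyTorch sketch of what such a latent-space mapping network and its supervised losses could look like. The architecture, layer sizes and names (LatentMapper, training_step, and the stylegan and phi callables) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LatentMapper(nn.Module):
    """Maps a source StyleGAN latent plus target conditions (environment-map
    embedding, camera pose) to a target latent. Sizes are assumptions; pSp
    latents in W+ space could be flattened before being fed in."""
    def __init__(self, latent_dim=512, env_dim=512, pose_dim=6, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + env_dim + pose_dim, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, hidden),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, L_s, E_t, w_t):
        # Condition the source latent on the target illumination and pose.
        return self.net(torch.cat([L_s, E_t, w_t], dim=-1))

def training_step(mapper, stylegan, phi, L_s, E_t, w_t, L_gt, I_gt):
    """One supervised step with the two losses described above: a latent
    loss against the ground-truth latent L_gt, and a perceptual loss on the
    synthesised image using a pretrained feature extractor phi (e.g. VGG)."""
    L_t = mapper(L_s, E_t, w_t)
    latent_loss = torch.mean((L_t - L_gt) ** 2)
    I_t = stylegan(L_t)  # frozen StyleGAN generator synthesises the edit
    perceptual_loss = torch.mean((phi(I_t) - phi(I_gt)) ** 2)
    return latent_loss + perceptual_loss
```

Supervising both in latent space and through the rendered image pushes the mapper to stay on the StyleGAN manifold while matching the ground-truth appearance.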





The method takes as input an in-the-wild portrait image, a target illumination and a target camera pose. The output is a portrait image of the same identity, synthesised from the target camera viewpoint and lit by the target illumination. Given a light-stage dataset of multiple independent illumination sources and viewpoints, the naive approach would be to learn the transformations directly in image space. Instead, the mapping is learned in the latent space of StyleGAN. The authors show that learning in this latent representation helps generalisation to in-the-wild images with high photorealism. StyleGAN2 is used in the implementation; it is referred to as StyleGAN throughout for simplicity.
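At inference time the pipeline reduces to three steps: embed, map, synthesise. A sketch under the same assumptions as above (psp_encoder, mapper and stylegan are placeholder names for the pretrained components):

```python
import torch

@torch.no_grad()
def edit_portrait(I_s, E_t, w_t, psp_encoder, mapper, stylegan):
    """Edit illumination and pose in a single pass (illustrative sketch).
    I_s: input portrait tensor; E_t: target environment-map embedding;
    w_t: target camera pose."""
    L_s = psp_encoder(I_s)       # project the image into StyleGAN latent space
    L_t = mapper(L_s, E_t, w_t)  # edit in latent space
    return stylegan(L_t)         # synthesise the photorealistic output I_t
```

Because the edit is a single feed-forward pass through comparatively small networks, this structure is what lets the method run at interactive rates.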


Data Preparation: We evaluate our approach on portrait images captured in the wild. All data in our work (including the training data) are cropped and preprocessed as described in Karras et al., and the images are resized to a resolution of 1024×1024. Since we need ground-truth images for quantitative evaluation, we use the test portion of our light-stage dataset, composed of images of 41 identities unseen during training. We create two test sets: Set1 has input and ground-truth pairs captured from the same viewpoint, while Set2 includes pairs captured from different viewpoints. HDR environment maps, randomly sampled from the Laval Outdoor and Laval Indoor datasets, are used to synthesise the pairs with natural illumination conditions. Viewpoints are randomly sampled from the 8 cameras of the light-stage setup. In Set2, the input and ground-truth images are computed using the same environment map, so that viewpoint editing can be evaluated.
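This pair synthesis builds on a standard image-based relighting recipe: light transport is linear, so an image under any environment map is a weighted sum of the one-light-at-a-time (OLAT) captures, with per-light RGB weights obtained by integrating the environment map over each light's solid angle. A hedged NumPy sketch (the exact weighting used in the paper is an assumption):

```python
import numpy as np

def relight_from_olat(olat_images, light_weights):
    """Synthesise a naturally lit image from OLAT captures.
    olat_images:   (N, H, W, 3) array, one image per light source
    light_weights: (N, 3) RGB weights from projecting the HDR environment
                   map onto the N light directions (assumed preprocessing).
    By linearity of light transport, the result is sum_i w_i * I_i."""
    return np.einsum('ic,ihwc->hwc', light_weights, olat_images)
```

With 150 OLAT images per camera, each sampled environment map then yields one input/ground-truth pair per viewpoint.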

High-Fidelity Appearance Editing: Fig. 5 shows simultaneous viewpoint and illumination editing results of the method for various subjects, alongside the StyleGAN projections of the input images estimated with the encoder of Richardson et al. The approach produces high-quality photorealistic results and synthesises the full portrait, including hair, eyes, mouth, torso and background, while preserving the identity, expression and other properties (such as facial hair). The results also show that the method preserves a variety of reflectance properties, giving rise to effects such as specularities and subsurface scattering; note the view-dependent specularities on the nose and forehead. The method can synthesise results even under high-frequency lighting conditions that cast shadows, even though the StyleGAN network is trained on a dataset of natural images. As the more detailed editing results in Fig. 5 show, the relighting preserves the input pose and identity, and the method can also change the viewpoint under a fixed environment map (third row for each subject).



Comparisons to Related Methods 
We compare our method with several state-of-the-art portrait editing approaches, evaluating qualitatively on in-the-wild data as well as quantitatively on the test set of the light-stage data. We compare with the following approaches:
• The relighting approach of Sun et al., a data-driven technique trained on a light-stage dataset. It can only edit the scene illumination.
• The relighting approach of Zhou et al., which is trained on synthetic data. It, too, can only edit the scene illumination.
• PIE, a method which computes a StyleGAN embedding used to edit the image. It can edit the head pose and scene illumination sequentially (unlike ours, which performs the edits simultaneously). It is trained without supervised image pairs.
• StyleFlow, which, like PIE, edits images by projecting them onto the StyleGAN latent space. It is also trained without supervised image pairs. Note that this work is concurrent with ours (so not counted as prior art); we nevertheless provide comparisons for completeness.

Conclusion

We presented PhotoApp, a method for editing the scene illumination and camera pose in head portraits. Our method exploits the advantages of both supervised learning and generative adversarial modelling. By designing a supervised learning problem in the latent space of StyleGAN, we achieve high-quality editing results which generalise to in-the-wild images with significantly more diversity than the training data. Through extensive evaluations, we demonstrated that our method outperforms all related techniques, both in terms of realism and editing accuracy. We further demonstrated that our method can learn from very limited supervised data, achieving high-quality results when trained with as few as 3 identities captured in a single expression. While several limitations still exist, we hope that our contributions inspire future work on using generative representations for synthesis applications.
