By Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, and Alexei A. Efros
UC Berkeley, Adobe Research
Abstract
Most malicious photo manipulations are created using standard image editing tools, such as Adobe Photoshop. We present a method for detecting one very popular Photoshop manipulation, image warping applied to human faces, using a model trained entirely on fake images that were automatically generated by scripting Photoshop itself. We show that our model outperforms humans at the task of recognizing manipulated images, can predict the specific locations of edits, and in some cases can be used to “undo” a manipulation to reconstruct the original, unedited image. We demonstrate that the system can be successfully applied to real, artist-created image manipulations.
Introduction
In an era when digitally edited visual content is ubiquitous, the public is justifiably eager to know whether the images they see on TV, in glossy magazines, and on the Internet are, in fact, real. While the popular press has mostly focused on “DeepFakes” and other GAN-based methods that may one day be able to convincingly simulate a real person’s appearance, movements, and facial expressions, for now, such methods are prone to degeneracies and exhibit visible artifacts. Rather, it is the more subtle image manipulations, performed with classic image processing techniques, typically in Adobe Photoshop, that have been the largest contributors to the proliferation of manipulated visual content. While such editing operations have helped enable creative expression, if done without the viewer’s knowledge, they can have serious negative implications, ranging from body image issues set by unrealistic standards to the consequences of “fake news” in politics.
Face manipulation
Researchers have proposed forensics methods to detect a variety of face manipulations. Zhou et al. and Roessler et al. propose neural network models to detect face swapping and face reenactment, manipulations in which one face is wholly replaced with another (perhaps taken from the same subject) after splicing, color matching, and blending. Other work investigates detecting morphed (interpolated) faces and inconsistencies in lighting from specular highlights on the eye.
Learning photo forensics
The difficulty in obtaining labeled training data has led researchers to propose a variety of “self-supervised” image forensics approaches that are trained on automatically generated fake images. Chen et al. use a convolutional network to detect median filtering. Zhou et al. propose an object detection model that uses steganalysis features to reduce the influence of semantics; the model is pretrained on synthetic fakes created automatically from object segmentations, and subsequently fine-tuned on actual fake images. While we also generate fakes automatically, we use the tools that a typical editor would use, allowing us to detect these manipulations more accurately. A complementary approach explores unsupervised forensics models that learn only from real images, without explicitly modeling the fake image creation process. For example, several models have been proposed to detect spliced images by identifying patches that come from different camera models, by using EXIF metadata, or by identifying physical inconsistencies. These approaches, however, are designed to detect instances of the image splicing problem, while we address a more subtle manipulation: facial structure warping.
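To illustrate this self-supervised setup, the sketch below produces a (fake, flow) training pair by applying a smooth random warp to a real image; the warp field itself becomes free supervision. The paper scripts Photoshop's warping tools to generate its data; the scipy-based warp and the function name `random_local_warp` here are hypothetical stand-ins, not the paper's pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_local_warp(img, strength=8.0, sigma=12.0, seed=0):
    """Create a (fake, flow) training pair from a real image.

    img: H x W x C float array. Returns the warped image and the smooth
    random flow field (H x W x 2) used to produce it, which can serve as
    ground truth for a flow-prediction model.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    # Smooth per-pixel displacements: white noise blurred by a Gaussian.
    flow = np.stack([
        gaussian_filter(rng.standard_normal((h, w)), sigma) * strength
        for _ in range(2)
    ])
    # Backward warp: sample the original at p + flow(p).
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([ys + flow[0], xs + flow[1]])
    warped = np.stack([
        map_coordinates(img[..., c], coords, order=1, mode='nearest')
        for c in range(img.shape[-1])
    ], axis=-1)
    return warped, flow.transpose(1, 2, 0)
```

Running this over a corpus of real face photos yields arbitrarily many labeled fakes without human annotation.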
Hand-defined manipulation cues
Other image forensics work has proposed to detect fake images using hand-defined cues. Early work detected resampling artifacts by finding periodic correlations between nearby pixels. There has also been work that detects inconsistent quantization, double-JPEG artifacts, and geometric inconsistencies. However, the operations performed by interactive image editing tools are often complex and can be difficult to model. Our approach, by contrast, learns features appropriate for its task from a large dataset of manipulated images.
Real-or-fake classification
We first address the question “has this image been manipulated?” by training a binary classifier using a Dilated Residual Network variant. We investigate the effect of resolution by training low- and high-resolution models. High-resolution models preserve low-level details that are potentially useful for identifying fakes, such as resampling artifacts. On the other hand, a lower-resolution model may still contain sufficient detail to identify fakes and can be trained more efficiently. In our low- and high-resolution models, the shorter side of the image is resized to 400 and 700 pixels, respectively. During training, the images are randomly left-right flipped and cropped to 384 and 640 pixels, respectively. While we control the post-processing pipeline in our test setup, real-world use cases may involve unexpected post-processing, and forensics algorithms are often sensitive to such operations. To increase robustness, we apply more aggressive data augmentation, including different resizing methods (bicubic and bilinear), JPEG compression, and brightness, contrast, and saturation jitter. We experimentally find that this increases robustness to perturbations at test time, even perturbations that are not in the augmentation set.
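A minimal sketch of such a robustness-oriented augmentation pipeline, using PIL: random interpolation method, a random-quality JPEG round trip, and photometric jitter. The function name `robust_augment` and the specific parameter ranges are illustrative assumptions, not the paper's exact settings.

```python
import io
import random
from PIL import Image, ImageEnhance

RESAMPLERS = [Image.BILINEAR, Image.BICUBIC]

def robust_augment(img, seed=None):
    """Randomized post-processing augmentation for a PIL RGB image."""
    rng = random.Random(seed)
    # Resize with a randomly chosen interpolation method.
    w, h = img.size
    scale = rng.uniform(0.8, 1.2)
    img = img.resize((int(w * scale), int(h * scale)),
                     rng.choice(RESAMPLERS))
    # Random-quality JPEG round trip to simulate re-compression.
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=rng.randint(30, 95))
    img = Image.open(buf).convert('RGB')
    # Brightness, contrast, and saturation jitter.
    for enhancer in (ImageEnhance.Brightness, ImageEnhance.Contrast,
                     ImageEnhance.Color):
        img = enhancer(img).enhance(rng.uniform(0.8, 1.2))
    return img
```

Applying a fresh random draw of these operations to every training image exposes the classifier to the kinds of benign post-processing it may encounter in the wild.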
Predicting what moved where
Upon detecting that a face has been modified, a natural question for a viewer is how the image was edited: which parts of the image were warped, and what did the image look like prior to manipulation? To answer this, we predict an optical flow field $\hat{U} \in \mathbb{R}^{H \times W \times 2}$ from the original image $X_{\text{orig}} \in \mathbb{R}^{H \times W \times 3}$ to the warped image $X$, which we then use to try to “reverse” the manipulation and recover the original image.
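The “reverse” step can be sketched as backward resampling with the predicted flow. The function below is a hypothetical first-order approximation of applying the inverse flow, not the paper's exact inversion procedure; it assumes the convention that the manipulated image was produced by sampling the original at p + U(p).

```python
import numpy as np
from scipy.ndimage import map_coordinates

def unwarp(manipulated, flow):
    """Approximately undo a warp given a predicted flow field.

    manipulated: H x W x C array. flow: H x W x 2 array of (dy, dx)
    displacements, where manipulated(p) = original(p + flow(p)).
    For smooth flow, original(p) ~= manipulated(p - flow(p)), so we
    resample the manipulated image at p - flow(p).
    """
    h, w = manipulated.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([ys - flow[..., 0], xs - flow[..., 1]])
    return np.stack([
        map_coordinates(manipulated[..., c], coords, order=1,
                        mode='nearest')
        for c in range(manipulated.shape[-1])
    ], axis=-1)
```

This first-order approximation degrades where the flow varies quickly; a more careful inversion would solve for the exact inverse mapping.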
Conclusion
We have presented the first method designed to detect facial warping manipulations, and did so by training a forensics model entirely on images automatically generated with an image editing tool. We showed that our model can outperform human judgments in determining whether images are manipulated, and in many cases is able to predict the local deformation field used to generate the warped images. We see facial warp detection as an important step toward forensics methods for analyzing images of the human body; extending these approaches to body manipulations and to photometric edits such as skin smoothing are interesting avenues for future work. Moreover, we see our work as a step toward forensics tools that learn without labeled data and that incorporate interactive editing tools into the training process.