The Deutsche Welle Image Forensics Dataset
- Markos Zampoglou
- On November 11, 2015
Journalists increasingly use social media to find breaking news or background information for their stories. They also like to use images because, as the saying goes, a picture is worth a thousand words. But what if the images found on social networks have been manipulated? Applications like Photoshop make it easy to alter images, whether for propaganda or just for fun.
One of the questions we ask ourselves in the REVEAL project is: can algorithms tell whether an image has been manipulated? And if so, can they also tell us to what extent the image has been tampered with? To support this research in the REVEAL project, the Deutsche Welle Innovation team has created a dataset of manipulated images.
The Deutsche Welle Image Forensics Dataset is a small but highly structured dataset of forged images. It contains six base images, all created by Ruben Bouwmeester: two were captured with a semi-professional DSLR camera, two were captured with a smartphone, and two were downloaded from the DW Innovation Flickr account.
For each image, seven operations were performed: JPEG resave, copy-move forgery, a second copy-move forgery on top of the first one, cropping, brightness/contrast adjustment, filtering, and modification of the image metadata. Except for the second copy-move attack, every operation was applied to the original source image. The dataset also contains a version of each image that was uploaded to Deutsche Welle using Bambuser and subsequently re-downloaded. However, this file is entirely identical to the original, so no further analysis can be conducted on it.
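To make the copy-move operation and its ground-truth mask concrete, here is a minimal sketch on a tiny grid of pixel values. The function names, the coordinates, and the choice to mark only the pasted (destination) region in the mask are assumptions made for illustration; the dataset documentation defines the actual parameters and mask convention.

```python
def copy_move(image, src_top, src_left, height, width, dst_top, dst_left):
    """Copy a rectangle of `image` onto another location of the same image.

    Returns (forged, mask). `mask` holds 1 where pixels were overwritten.
    Hypothetical helper: marking only the destination is an assumption.
    """
    forged = [row[:] for row in image]            # leave the input untouched
    mask = [[0] * len(image[0]) for _ in image]
    for dy in range(height):
        for dx in range(width):
            forged[dst_top + dy][dst_left + dx] = image[src_top + dy][src_left + dx]
            mask[dst_top + dy][dst_left + dx] = 1
    return forged, mask


def union_masks(mask_a, mask_b):
    """Pixel-wise OR of two binary masks of equal size."""
    return [[a | b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


# A 4x4 stand-in "image" whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]

# First forgery: applied to the original image.
first_forgery, first_mask = copy_move(image, 0, 0, 2, 2, 2, 2)

# Second forgery: applied on top of the first one, not on the original.
second_forgery, second_mask = copy_move(first_forgery, 0, 2, 1, 2, 3, 0)
combined_mask = union_masks(first_mask, second_mask)
```

Because the second forgery starts from the output of the first, its mask has to be combined with the first mask to describe every tampered pixel.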
How to handle high quality?
Besides the detailed documentation, which specifies the exact parameters of each operation, the dataset has a second characteristic of interest: the high quality of the images. In particular, the ones captured with the semi-professional camera have a resolution of 15MP and are compressed at JPEG quality 100. These parameters differ considerably from what many forensic algorithms need in order to operate (for example, Double JPEG Quantization splicing localization may not work, since effectively no quantization takes place at quality 100). Another interesting aspect of the dataset is that it contains multiple different modifications of the same source images, allowing a direct comparison of the effect of each operation on the images.
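The remark about quality 100 can be illustrated with the quality-scaling rule used by the IJG libjpeg encoder. The sketch below assumes that convention and the example luminance table from the JPEG standard; cameras and image editors often ship their own tables, so this is not a description of how the dataset images were actually encoded. The function name is hypothetical.

```python
# Example luminance quantization table from the JPEG standard (Annex K).
BASE_LUMINANCE = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_quant_table(quality, base=BASE_LUMINANCE):
    """Scale a base table for a quality in 1..100, libjpeg style (baseline)."""
    quality = max(1, min(100, quality))
    # Percentage scale factor: large for low quality, 0 at quality 100.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Each step is rounded and clamped to the baseline range 1..255.
    return [max(1, min(255, (value * scale + 50) // 100)) for value in base]


table_75 = scaled_quant_table(75)
table_100 = scaled_quant_table(100)
all_steps_are_one = all(step == 1 for step in table_100)
```

Under this convention every quantization step at quality 100 is 1, so the DCT coefficients are only rounded to integers. Methods that rely on the periodic histogram artifacts left by two different quantization steps then have very little signal to work with.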
Sharing is caring
The dataset and documentation are free to download for research purposes (169 MB). The .zip file is organized by case (six cases) and includes ground-truth binary masks for the “manip” and “manip+” cases, showing where the actual tampering took place. We have also added a folder where you, as researchers, can put any binary maps produced by your algorithms, and we have provided two evaluation scripts (for MATLAB) that compare these maps to the ground truth and compute a success metric. A README file gives an overview of the evaluation framework. Note: all images are copyrighted by Ruben Bouwmeester. Any use of the images should acknowledge the creator.
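For readers without MATLAB, comparing a detection map to a ground-truth mask can be sketched in a few lines of Python. This is not a port of the provided evaluation scripts, and the metrics chosen here (precision, recall, F1, and intersection over union) are common choices assumed for illustration rather than the metric those scripts report. The function name and the toy masks are hypothetical.

```python
def mask_scores(predicted, ground_truth):
    """Pixel-wise comparison of two binary masks of equal size.

    Both arguments are lists of rows; non-zero means "tampered".
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, ground_truth):
        for pred, truth in zip(pred_row, truth_row):
            if pred and truth:
                tp += 1          # tampered pixel correctly flagged
            elif pred:
                fp += 1          # untouched pixel wrongly flagged
            elif truth:
                fn += 1          # tampered pixel missed
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Toy 4x4 example: the forged region is the bottom-right 2x2 block.
ground_truth = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
predicted = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
scores = mask_scores(predicted, ground_truth)
```

In the toy example the detector finds two of the four tampered pixels and raises one false alarm, so precision and recall diverge. Reporting both alongside a combined score makes it easier to see whether an algorithm over-flags or under-flags.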
We are looking forward to hearing about your findings, so please share them with us! For any research and evaluation related questions, please send me an email at firstname.lastname@example.org. For any other business, please contact us via Twitter: @RevealEU.