[Pkg-privacy-commits] [Git][pkg-privacy-team/mat2][master] 3 commits: d/mat2.docs: Install docs about implementation and threat model
Georg Faerber
gitlab at salsa.debian.org
Tue Oct 23 18:46:49 BST 2018
Georg Faerber pushed to branch master at Privacy Maintainers / mat2
Commits:
13aae8fb by Georg Faerber at 2018-10-23T16:02:15Z
d/mat2.docs: Install docs about implementation and threat model
- - - - -
86df3b37 by Georg Faerber at 2018-10-23T17:46:00Z
New upstream version 0.5.0
- - - - -
6b351399 by Georg Faerber at 2018-10-23T17:46:08Z
Update upstream source from tag 'upstream/0.5.0'
Update to upstream version '0.5.0'
with Debian dir 73ddf12bc33dc84fdaab912c4cfe1717582a6b20
- - - - -
29 changed files:
- .gitlab-ci.yml
- .pylintrc
- CHANGELOG.md
- CONTRIBUTING.md
- README.md
- data/mat2.png
- data/mat2.svg
- + debian/mat2.docs
- doc/mat2.1
- libmat2/__init__.py
- libmat2/abstract.py
- libmat2/archive.py
- libmat2/audio.py
- + libmat2/exiftool.py
- libmat2/harmless.py
- libmat2/images.py
- libmat2/office.py
- libmat2/parser_factory.py
- libmat2/pdf.py
- libmat2/torrent.py
- + libmat2/video.py
- mat2
- setup.py
- + tests/data/dirty.avi
- tests/data/dirty.flac
- tests/test_climat2.py
- tests/test_corrupted_files.py
- tests/test_libmat2.py
- + tests/test_lightweigh_cleaning.py
Changes:
=====================================
.gitlab-ci.yml
=====================================
@@ -9,7 +9,7 @@ bandit:
script: # TODO: remove B405 and B314
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-bandit
- - bandit ./mat2 --format txt
+ - bandit ./mat2 --format txt --skip B101
- bandit -r ./nautilus/ --format txt --skip B101
- bandit -r ./libmat2 --format txt --skip B101,B404,B603,B405,B314
@@ -42,9 +42,9 @@ tests:debian:
stage: test
script:
- apt-get -qqy update
- - apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage
+ - apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg
- python3-coverage run --branch -m unittest discover -s tests/
- - python3-coverage report -m --include 'libmat2/*'
+ - python3-coverage report --fail-under=100 -m --include 'libmat2/*'
tests:fedora:
image: fedora
@@ -62,5 +62,5 @@ tests:archlinux:
tags:
- whitewhale
script:
- - pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap
+ - pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap ffmpeg
- python3 setup.py test
=====================================
.pylintrc
=====================================
@@ -6,11 +6,12 @@ max-locals=20
disable=
fixme,
invalid-name,
+ duplicate-code,
missing-docstring,
protected-access,
- abstract-method,
- wrong-import-position,
- catching-non-exception,
- cell-var-from-loop,
- locally-disabled,
- invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation
+ abstract-method,
+ wrong-import-position,
+ catching-non-exception,
+ cell-var-from-loop,
+ locally-disabled,
+ invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation
=====================================
CHANGELOG.md
=====================================
@@ -1,3 +1,16 @@
+# 0.5.0 - 2018-10-23
+
+- Video (.avi files for now) support, via FFmpeg, optionally
+- Lightweight cleaning for png and tiff files
+- Processing files starting with a dash is now quicker
+- Metadata are now displayed sorted
+- Recursive metadata support for FLAC files
+- Unsupported extensions aren't displayed in `/.mat -l` anymore
+- Improve the display when no metadata are found
+- Update the logo according to the GNOME guidelines
+- The testsuite is now runnable on the installed version of mat2
+- Various internal cleanup/improvements
+
# 0.4.0 - 2018-10-03
- There is now a policy, for advanced users, to deal with unknown embedded fileformats
=====================================
CONTRIBUTING.md
=====================================
@@ -32,5 +32,6 @@ Since MAT2 is written in Python3, please conform as much as possible to the
9. Create the signed tarball with `git archive --format=tar.xz --prefix=mat-$VERSION/ $VERSION > mat-$VERSION.tar.xz`
10. Sign the tarball with `gpg --armor --detach-sign mat-$VERSION.tar.xz`
11. Upload the result on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
-12. Tell the [downstreams](https://0xacab.org/jvoisin/mat2/blob/master/INSTALL.md) about it
-13. Do the secret release dance
+12. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
+13. Tell the [downstreams](https://0xacab.org/jvoisin/mat2/blob/master/INSTALL.md) about it
+14. Do the secret release dance
=====================================
README.md
=====================================
@@ -30,10 +30,11 @@ metadata.
- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for images support
+- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
Please note that MAT2 requires at least Python3.5, meaning that it
-doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
+doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3).
# Running the test suite
=====================================
data/mat2.png
=====================================
Binary files a/data/mat2.png and b/data/mat2.png differ
=====================================
data/mat2.svg
=====================================
The diff for this file was not included because it is too large.
=====================================
debian/mat2.docs
=====================================
@@ -0,0 +1,2 @@
+doc/implementation_notes.md
+doc/threat_model.md
=====================================
doc/mat2.1
=====================================
@@ -1,4 +1,4 @@
-.TH MAT2 "1" "October 2018" "MAT2 0.4.0" "User Commands"
+.TH MAT2 "1" "October 2018" "MAT2 0.5.0" "User Commands"
.SH NAME
mat2 \- the metadata anonymisation toolkit 2
=====================================
libmat2/__init__.py
=====================================
@@ -1,13 +1,15 @@
-#!/bin/env python3
+#!/usr/bin/env python3
-import os
import collections
import enum
import importlib
from typing import Dict, Optional
+from . import exiftool, video
+
# make pyflakes happy
assert Dict
+assert Optional
# A set of extension that aren't supported, despite matching a supported mimetype
UNSUPPORTED_EXTENSIONS = {
@@ -36,24 +38,13 @@ DEPENDENCIES = {
'mutagen': 'Mutagen',
}
-def _get_exiftool_path() -> Optional[str]: # pragma: no cover
- exiftool_path = '/usr/bin/exiftool'
- if os.path.isfile(exiftool_path):
- if os.access(exiftool_path, os.X_OK):
- return exiftool_path
-
- # ArchLinux
- exiftool_path = '/usr/bin/vendor_perl/exiftool'
- if os.path.isfile(exiftool_path):
- if os.access(exiftool_path, os.X_OK):
- return exiftool_path
- return None
-def check_dependencies() -> dict:
+def check_dependencies() -> Dict[str, bool]:
ret = collections.defaultdict(bool) # type: Dict[str, bool]
- ret['Exiftool'] = True if _get_exiftool_path() else False
+ ret['Exiftool'] = True if exiftool._get_exiftool_path() else False
+ ret['Ffmpeg'] = True if video._get_ffmpeg_path() else False
for key, value in DEPENDENCIES.items():
ret[value] = True
=====================================
libmat2/abstract.py
=====================================
@@ -1,13 +1,15 @@
import abc
import os
-from typing import Set, Dict
+import re
+from typing import Set, Dict, Union
assert Set # make pyflakes happy
class AbstractParser(abc.ABC):
""" This is the base class of every parser.
- It might yield `ValueError` on instantiation on invalid files.
+ It might yield `ValueError` on instantiation on invalid files,
+ and `RuntimeError` when something went wrong in `remove_all`.
"""
meta_list = set() # type: Set[str]
mimetypes = set() # type: Set[str]
@@ -16,21 +18,23 @@ class AbstractParser(abc.ABC):
"""
:raises ValueError: Raised upon an invalid file
"""
+ if re.search('^[a-z0-9./]', filename) is None:
+ # Some parsers are calling external binaries,
+ # this prevents shell command injections
+ filename = os.path.join('.', filename)
+
self.filename = filename
fname, extension = os.path.splitext(filename)
self.output_filename = fname + '.cleaned' + extension
+ self.lightweight_cleaning = False
@abc.abstractmethod
- def get_meta(self) -> Dict[str, str]:
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
pass # pragma: no cover
@abc.abstractmethod
def remove_all(self) -> bool:
- pass # pragma: no cover
-
- def remove_all_lightweight(self) -> bool:
- """ This method removes _SOME_ metadata.
- It might be useful to implement it for fileformats that do
- not support non-destructive cleaning.
"""
- return self.remove_all()
+ :raises RuntimeError: Raised if the cleaning process went wrong.
+ """
+ pass # pragma: no cover
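
The filename check added to `AbstractParser.__init__` in the hunk above can be tried in isolation. This is a minimal sketch, not part of the commit: `safe_filename` and `output_name` are hypothetical helper names, and only the regular expression, the `os.path.join('.', ...)` prefix and the `.cleaned` suffix are taken from the diff. Since the parsers hand an argument list to `subprocess`, the practical effect is presumably that a name such as `-rf.png` can no longer be read as an option by the external tool.

```python
import os
import re


def safe_filename(filename: str) -> str:
    """Hypothetical helper mirroring the check added to AbstractParser.__init__."""
    # Anything that does not start with a lowercase letter, a digit, '.' or '/'
    # gets a './' prefix, so a leading dash is never parsed as an option.
    if re.search('^[a-z0-9./]', filename) is None:
        return os.path.join('.', filename)
    return filename


def output_name(filename: str) -> str:
    """Hypothetical helper: derive the '.cleaned' output name as in the diff."""
    fname, extension = os.path.splitext(safe_filename(filename))
    return fname + '.cleaned' + extension
```

Names starting with an uppercase letter also receive the prefix under this regular expression, which is harmless because `./Photo.png` and `Photo.png` refer to the same file.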
=====================================
libmat2/archive.py
=====================================
@@ -4,13 +4,14 @@ import tempfile
import os
import logging
import shutil
-from typing import Dict, Set, Pattern
+from typing import Dict, Set, Pattern, Union
from . import abstract, UnknownMemberPolicy, parser_factory
# Make pyflakes happy
assert Set
assert Pattern
+assert Union
class ArchiveBasedAbstractParser(abstract.AbstractParser):
=====================================
libmat2/audio.py
=====================================
@@ -1,8 +1,12 @@
+import mimetypes
+import os
import shutil
+import tempfile
+from typing import Dict, Union
import mutagen
-from . import abstract
+from . import abstract, parser_factory
class MutagenParser(abstract.AbstractParser):
@@ -13,13 +17,13 @@ class MutagenParser(abstract.AbstractParser):
except mutagen.MutagenError:
raise ValueError
- def get_meta(self):
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
f = mutagen.File(self.filename)
if f.tags:
return {k:', '.join(v) for k, v in f.tags.items()}
return {}
- def remove_all(self):
+ def remove_all(self) -> bool:
shutil.copy(self.filename, self.output_filename)
f = mutagen.File(self.output_filename)
f.delete()
@@ -30,8 +34,8 @@ class MutagenParser(abstract.AbstractParser):
class MP3Parser(MutagenParser):
mimetypes = {'audio/mpeg', }
- def get_meta(self):
- metadata = {}
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
+ metadata = {} # type: Dict[str, Union[str, dict]]
meta = mutagen.File(self.filename).tags
for key in meta:
metadata[key.rstrip(' \t\r\n\0')] = ', '.join(map(str, meta[key].text))
@@ -44,3 +48,30 @@ class OGGParser(MutagenParser):
class FLACParser(MutagenParser):
mimetypes = {'audio/flac', 'audio/x-flac'}
+
+ def remove_all(self) -> bool:
+ shutil.copy(self.filename, self.output_filename)
+ f = mutagen.File(self.output_filename)
+ f.clear_pictures()
+ f.delete()
+ f.save(deleteid3=True)
+ return True
+
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
+ meta = super().get_meta()
+ for num, picture in enumerate(mutagen.File(self.filename).pictures):
+ name = picture.desc if picture.desc else 'Cover %d' % num
+ extension = mimetypes.guess_extension(picture.mime)
+ if extension is None: # pragma: no cover
+ meta[name] = 'harmful data'
+ continue
+
+ _, fname = tempfile.mkstemp()
+ fname = fname + extension
+ with open(fname, 'wb') as f:
+ f.write(picture.data)
+ p, _ = parser_factory.get_parser(fname) # type: ignore
+ # Mypy chokes on ternaries :/
+ meta[name] = p.get_meta() if p else 'harmful data' # type: ignore
+ os.remove(fname)
+ return meta
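
The new `FLACParser.get_meta` writes each embedded picture to a temporary file so that another parser can inspect it. The sketch below illustrates that step only, under stated assumptions: `dump_embedded` is a hypothetical helper, not code from the commit, and unlike the diff it passes the extension as `suffix=` to `tempfile.mkstemp` and closes the returned file descriptor. A caller would hand the returned path to `parser_factory.get_parser` and remove the file afterwards.

```python
import mimetypes
import os
import tempfile
from typing import Optional


def dump_embedded(data: bytes, mime: str) -> Optional[str]:
    """Hypothetical helper: write embedded data to a temporary file whose
    extension matches its MIME type, or return None if the type is unknown."""
    extension = mimetypes.guess_extension(mime)
    if extension is None:
        # The diff reports such pictures as 'harmful data'.
        return None
    # mkstemp returns an open file descriptor; wrap it so it gets closed.
    fd, fname = tempfile.mkstemp(suffix=extension)
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return fname
```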
=====================================
libmat2/exiftool.py
=====================================
@@ -0,0 +1,67 @@
+import json
+import logging
+import os
+import subprocess
+from typing import Dict, Union, Set
+
+from . import abstract
+
+# Make pyflakes happy
+assert Set
+
+
+class ExiftoolParser(abstract.AbstractParser):
+    """ Exiftool is often the easiest way to get all the metadata
+    from a import file, hence why several parsers are re-using its `get_meta`
+    method.
+    """
+    meta_whitelist = set()  # type: Set[str]
+
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
+        out = subprocess.check_output([_get_exiftool_path(), '-json', self.filename])
+        meta = json.loads(out.decode('utf-8'))[0]
+        for key in self.meta_whitelist:
+            meta.pop(key, None)
+        return meta
+
+    def _lightweight_cleanup(self) -> bool:
+        if os.path.exists(self.output_filename):
+            try:
+                # exiftool can't force output to existing files
+                os.remove(self.output_filename)
+            except OSError as e:  # pragma: no cover
+                logging.error("The output file %s is already existing and \
+                               can't be overwritten: %s.", self.filename, e)
+                return False
+
+        # Note: '-All=' must be followed by a known exiftool option.
+        # Also, '-CommonIFD0' is needed for .tiff files
+        cmd = [_get_exiftool_path(),
+               '-all=',         # remove metadata
+               '-adobe=',       # remove adobe-specific metadata
+               '-exif:all=',    # remove all exif metadata
+               '-Time:All=',    # remove all timestamps
+               '-quiet',        # don't show useless logs
+               '-CommonIFD0=',  # remove IFD0 metadata
+               '-o', self.output_filename,
+               self.filename]
+        try:
+            subprocess.check_call(cmd)
+        except subprocess.CalledProcessError as e:  # pragma: no cover
+            logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
+            return False
+        return True
+
+def _get_exiftool_path() -> str:  # pragma: no cover
+    exiftool_path = '/usr/bin/exiftool'
+    if os.path.isfile(exiftool_path):
+        if os.access(exiftool_path, os.X_OK):
+            return exiftool_path
+
+    # ArchLinux
+    exiftool_path = '/usr/bin/vendor_perl/exiftool'
+    if os.path.isfile(exiftool_path):
+        if os.access(exiftool_path, os.X_OK):
+            return exiftool_path
+
+    raise RuntimeError("Unable to find exiftool")
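
Both `_get_exiftool_path` and the new `_get_ffmpeg_path` follow the same pattern: probe fixed locations and raise `RuntimeError` when none is usable, which `clean_meta` then reports to the user. A generalised sketch of that lookup, assuming a hypothetical helper named `first_executable` that is not in the commit, might look like this:

```python
import os
from typing import List


def first_executable(candidates: List[str]) -> str:
    """Hypothetical helper: return the first candidate path that is an
    executable regular file, or raise RuntimeError as the diff's helpers do."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    raise RuntimeError("Unable to find any of: %s" % ', '.join(candidates))
```

With such a helper, the exiftool lookup would reduce to `first_executable(['/usr/bin/exiftool', '/usr/bin/vendor_perl/exiftool'])`; the two paths are the ones hard-coded in the diff.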
=====================================
libmat2/harmless.py
=====================================
@@ -1,5 +1,5 @@
import shutil
-from typing import Dict
+from typing import Dict, Union
from . import abstract
@@ -7,7 +7,7 @@ class HarmlessParser(abstract.AbstractParser):
""" This is the parser for filetypes that can not contain metadata. """
mimetypes = {'text/plain', 'image/x-ms-bmp'}
- def get_meta(self) -> Dict[str, str]:
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
return dict()
def remove_all(self) -> bool:
=====================================
libmat2/images.py
=====================================
@@ -1,10 +1,5 @@
-import subprocess
import imghdr
-import json
import os
-import shutil
-import tempfile
-import re
from typing import Set
import cairo
@@ -13,44 +8,12 @@ import gi
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import GdkPixbuf
-from . import abstract, _get_exiftool_path
+from . import exiftool
# Make pyflakes happy
assert Set
-class _ImageParser(abstract.AbstractParser):
- """ Since we use `exiftool` to get metadata from
- all images fileformat, `get_meta` is implemented in this class,
- and all the image-handling ones are inheriting from it."""
- meta_whitelist = set() # type: Set[str]
-
- @staticmethod
- def __handle_problematic_filename(filename: str, callback) -> str:
- """ This method takes a filename with a problematic name,
- and safely applies it a `callback`."""
- tmpdirname = tempfile.mkdtemp()
- fname = os.path.join(tmpdirname, "temp_file")
- shutil.copy(filename, fname)
- out = callback(fname)
- shutil.rmtree(tmpdirname)
- return out
-
- def get_meta(self):
- """ There is no way to escape the leading(s) dash(es) of the current
- self.filename to prevent parameter injections, so we need to take care
- of this.
- """
- fun = lambda f: subprocess.check_output([_get_exiftool_path(), '-json', f])
- if re.search('^[a-z0-9/]', self.filename) is None:
- out = self.__handle_problematic_filename(self.filename, fun)
- else:
- out = fun(self.filename)
- meta = json.loads(out.decode('utf-8'))[0]
- for key in self.meta_whitelist:
- meta.pop(key, None)
- return meta
-
-class PNGParser(_ImageParser):
+class PNGParser(exiftool.ExiftoolParser):
mimetypes = {'image/png', }
meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName',
'Directory', 'FileSize', 'FileModifyDate',
@@ -71,19 +34,26 @@ class PNGParser(_ImageParser):
except MemoryError: # pragma: no cover
raise ValueError
- def remove_all(self):
+ def remove_all(self) -> bool:
+ if self.lightweight_cleaning:
+ return self._lightweight_cleanup()
surface = cairo.ImageSurface.create_from_png(self.filename)
surface.write_to_png(self.output_filename)
return True
-class GdkPixbufAbstractParser(_ImageParser):
+class GdkPixbufAbstractParser(exiftool.ExiftoolParser):
""" GdkPixbuf can handle a lot of surfaces, so we're rending images on it,
this has the side-effect of completely removing metadata.
"""
_type = ''
- def remove_all(self):
+ def __init__(self, filename):
+ super().__init__(filename)
+ if imghdr.what(filename) != self._type: # better safe than sorry
+ raise ValueError
+
+ def remove_all(self) -> bool:
_, extension = os.path.splitext(self.filename)
pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
if extension.lower() == '.jpg':
@@ -91,11 +61,6 @@ class GdkPixbufAbstractParser(_ImageParser):
pixbuf.savev(self.output_filename, extension[1:], [], [])
return True
- def __init__(self, filename):
- super().__init__(filename)
- if imghdr.what(filename) != self._type: # better safe than sorry
- raise ValueError
-
class JPGParser(GdkPixbufAbstractParser):
_type = 'jpeg'
=====================================
libmat2/office.py
=====================================
@@ -2,7 +2,7 @@ import logging
import os
import re
import zipfile
-from typing import Dict, Set, Pattern
+from typing import Dict, Set, Pattern, Tuple, Union
import xml.etree.ElementTree as ET # type: ignore
@@ -14,9 +14,8 @@ from .archive import ArchiveBasedAbstractParser
assert Set
assert Pattern
-def _parse_xml(full_path: str):
+def _parse_xml(full_path: str) -> Tuple[ET.ElementTree, Dict[str, str]]:
""" This function parses XML, with namespace support. """
-
namespace_map = dict()
for _, (key, value) in ET.iterparse(full_path, ("start-ns", )):
# The ns[0-9]+ namespaces are reserved for internal usage, so
@@ -88,6 +87,7 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
r'^docProps/custom\.xml$',
r'^word/printerSettings/',
r'^word/theme',
+ r'^word/people\.xml$',
# we have a whitelist in self.files_to_keep,
# so we can trash everything else
@@ -182,20 +182,20 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
parent_map = {c:p for p in tree.iter() for c in p}
- elements = list()
+ elements_del = list()
for element in tree.iterfind('.//w:del', namespace):
- elements.append(element)
- for element in elements:
+ elements_del.append(element)
+ for element in elements_del:
parent_map[element].remove(element)
- elements = list()
+ elements_ins = list()
for element in tree.iterfind('.//w:ins', namespace):
for position, item in enumerate(tree.iter()): # pragma: no cover
if item == element:
for children in element.iterfind('./*'):
- elements.append((element, position, children))
+ elements_ins.append((element, position, children))
break
- for (element, position, children) in elements:
+ for (element, position, children) in elements_ins:
parent_map[element].insert(position, children)
parent_map[element].remove(element)
@@ -296,7 +296,7 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
return True
- def get_meta(self) -> Dict[str, str]:
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
@@ -381,7 +381,7 @@ class LibreOfficeParser(ArchiveBasedAbstractParser):
return False
return True
- def get_meta(self) -> Dict[str, str]:
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
=====================================
libmat2/parser_factory.py
=====================================
@@ -18,6 +18,8 @@ def __load_all_parsers():
continue
elif fname.endswith('__init__.py'):
continue
+ elif fname.endswith('exiftool.py'):
+ continue
basename = os.path.basename(fname)
name, _ = os.path.splitext(basename)
importlib.import_module('.' + name, package='libmat2')
@@ -33,6 +35,7 @@ def _get_parsers() -> List[T]:
def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
+ """ Return the appropriate parser for a giver filename. """
mtype, _ = mimetypes.guess_type(filename)
_, extension = os.path.splitext(filename)
=====================================
libmat2/pdf.py
=====================================
@@ -7,6 +7,7 @@ import re
import logging
import tempfile
import io
+from typing import Dict, Union
from distutils.version import LooseVersion
import cairo
@@ -37,7 +38,12 @@ class PDFParser(abstract.AbstractParser):
except GLib.GError: # Invalid PDF
raise ValueError
- def remove_all_lightweight(self):
+ def remove_all(self) -> bool:
+ if self.lightweight_cleaning is True:
+ return self.__remove_all_lightweight()
+ return self.__remove_all_thorough()
+
+ def __remove_all_lightweight(self) -> bool:
"""
Load the document into Poppler, render pages on a new PDFSurface.
"""
@@ -64,7 +70,7 @@ class PDFParser(abstract.AbstractParser):
return True
- def remove_all(self):
+ def __remove_all_thorough(self) -> bool:
"""
Load the document into Poppler, render pages on PNG,
and shove those PNG into a new PDF.
@@ -119,13 +125,13 @@ class PDFParser(abstract.AbstractParser):
return True
@staticmethod
- def __parse_metadata_field(data: str) -> dict:
+ def __parse_metadata_field(data: str) -> Dict[str, str]:
metadata = {}
for (_, key, value) in re.findall(r"<(xmp|pdfx|pdf|xmpMM):(.+)>(.+)</\1:\2>", data, re.I):
metadata[key] = value
return metadata
- def get_meta(self):
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
""" Return a dict with all the meta of the file
"""
metadata = {}
=====================================
libmat2/torrent.py
=====================================
@@ -14,7 +14,7 @@ class TorrentParser(abstract.AbstractParser):
if self.dict_repr is None:
raise ValueError
- def get_meta(self) -> Dict[str, str]:
+ def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {}
for key, value in self.dict_repr.items():
if key not in self.whitelist:
=====================================
libmat2/video.py
=====================================
@@ -0,0 +1,54 @@
+import os
+import subprocess
+import logging
+
+from . import exiftool
+
+
+class AVIParser(exiftool.ExiftoolParser):
+    mimetypes = {'video/x-msvideo', }
+    meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName', 'Directory',
+                      'FileSize', 'FileModifyDate', 'FileAccessDate',
+                      'FileInodeChangeDate', 'FilePermissions', 'FileType',
+                      'FileTypeExtension', 'MIMEType', 'FrameRate', 'MaxDataRate',
+                      'FrameCount', 'StreamCount', 'StreamType', 'VideoCodec',
+                      'VideoFrameRate', 'VideoFrameCount', 'Quality',
+                      'SampleSize', 'BMPVersion', 'ImageWidth', 'ImageHeight',
+                      'Planes', 'BitDepth', 'Compression', 'ImageLength',
+                      'PixelsPerMeterX', 'PixelsPerMeterY', 'NumColors',
+                      'NumImportantColors', 'NumColors', 'NumImportantColors',
+                      'RedMask', 'GreenMask', 'BlueMask', 'AlphaMask',
+                      'ColorSpace', 'AudioCodec', 'AudioCodecRate',
+                      'AudioSampleCount', 'AudioSampleCount',
+                      'AudioSampleRate', 'Encoding', 'NumChannels',
+                      'SampleRate', 'AvgBytesPerSec', 'BitsPerSample',
+                      'Duration', 'ImageSize', 'Megapixels'}
+
+    def remove_all(self) -> bool:
+        cmd = [_get_ffmpeg_path(),
+               '-i', self.filename,      # input file
+               '-y',                     # overwrite existing output file
+               '-loglevel', 'panic',     # Don't show log
+               '-hide_banner',           # hide the banner
+               '-codec', 'copy',         # don't decode anything, just copy (speed!)
+               '-map_metadata', '-1',    # remove supperficial metadata
+               '-map_chapters', '-1',    # remove chapters
+               '-fflags', '+bitexact',   # don't add any metadata
+               '-flags:v', '+bitexact',  # don't add any metadata
+               '-flags:a', '+bitexact',  # don't add any metadata
+               self.output_filename]
+        try:
+            subprocess.check_call(cmd)
+        except subprocess.CalledProcessError as e:
+            logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
+            return False
+        return True
+
+
+def _get_ffmpeg_path() -> str:  # pragma: no cover
+    ffmpeg_path = '/usr/bin/ffmpeg'
+    if os.path.isfile(ffmpeg_path):
+        if os.access(ffmpeg_path, os.X_OK):
+            return ffmpeg_path
+
+    raise RuntimeError("Unable to find ffmpeg")
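
The argument list that `AVIParser.remove_all` passes to FFmpeg can be looked at on its own. The sketch below only rebuilds that list; `build_ffmpeg_cmd` is a hypothetical function name, while every flag is copied from the diff. Keeping construction separate from `subprocess.check_call` is one possible way to inspect the command without FFmpeg installed; it is not how the commit structures the code.

```python
from typing import List


def build_ffmpeg_cmd(ffmpeg_path: str, src: str, dst: str) -> List[str]:
    """Hypothetical helper: the FFmpeg invocation used by the diff, as a list."""
    return [ffmpeg_path,
            '-i', src,                # input file
            '-y',                     # overwrite an existing output file
            '-loglevel', 'panic',     # keep FFmpeg quiet
            '-hide_banner',
            '-codec', 'copy',         # copy the streams without re-encoding
            '-map_metadata', '-1',    # drop global metadata
            '-map_chapters', '-1',    # drop chapters
            '-fflags', '+bitexact',   # avoid writing new metadata
            '-flags:v', '+bitexact',
            '-flags:a', '+bitexact',
            dst]
```

Because the command is a list rather than a shell string, `src` and `dst` are passed to FFmpeg as single arguments whatever characters they contain.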
=====================================
mat2
=====================================
@@ -1,7 +1,7 @@
#!/usr/bin/env python3
import os
-from typing import Tuple
+from typing import Tuple, Generator, List, Union
import sys
import mimetypes
import argparse
@@ -14,7 +14,12 @@ except ValueError as e:
print(e)
sys.exit(1)
-__version__ = '0.4.0'
+__version__ = '0.5.0'
+
+# Make pyflakes happy
+assert Tuple
+assert Union
+
def __check_file(filename: str, mode: int=os.R_OK) -> bool:
if not os.path.exists(filename):
@@ -29,7 +34,7 @@ def __check_file(filename: str, mode: int=os.R_OK) -> bool:
return True
-def create_arg_parser():
+def create_arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description='Metadata anonymisation toolkit 2')
parser.add_argument('files', nargs='*', help='the files to process')
parser.add_argument('-v', '--version', action='version',
@@ -61,16 +66,28 @@ def show_meta(filename: str):
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return
+ __print_meta(filename, p.get_meta())
+
+
+def __print_meta(filename: str, metadata: dict, depth: int=1):
+ padding = " " * depth*2
+ if not metadata:
+ print(padding + "No metadata found")
+ return
- print("[+] Metadata for %s:" % filename)
- for k, v in p.get_meta().items():
+ print("[%s] Metadata for %s:" % ('+'*depth, filename))
+
+ for (k, v) in sorted(metadata.items()):
+ if isinstance(v, dict):
+ __print_meta(k, v, depth+1)
+ continue
try: # FIXME this is ugly.
- print(" %s: %s" % (k, v))
+ print(padding + " %s: %s" % (k, v))
except UnicodeEncodeError:
- print(" %s: harmful content" % k)
+ print(padding + " %s: harmful content" % k)
+
-def clean_meta(params: Tuple[str, bool, UnknownMemberPolicy]) -> bool:
- filename, is_lightweight, unknown_member_policy = params
+def clean_meta(filename: str, is_lightweight: bool, policy: UnknownMemberPolicy) -> bool:
if not __check_file(filename, os.R_OK|os.W_OK):
return False
@@ -78,30 +95,36 @@ def clean_meta(params: Tuple[str, bool, UnknownMemberPolicy]) -> bool:
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return False
- p.unknown_member_policy = unknown_member_policy
- if is_lightweight:
- return p.remove_all_lightweight()
- return p.remove_all()
+ p.unknown_member_policy = policy
+ p.lightweight_cleaning = is_lightweight
+
+ try:
+ return p.remove_all()
+ except RuntimeError as e:
+ print("[-] %s can't be cleaned: %s" % (filename, e))
+ return False
-def show_parsers():
+
+def show_parsers() -> bool:
print('[+] Supported formats:')
- formats = list()
- for parser in parser_factory._get_parsers():
+ formats = set() # Set[str]
+ for parser in parser_factory._get_parsers(): # type: ignore
for mtype in parser.mimetypes:
- extensions = set()
+ extensions = set() # Set[str]
for extension in mimetypes.guess_all_extensions(mtype):
- if extension[1:] not in UNSUPPORTED_EXTENSIONS: # skip the dot
+ if extension not in UNSUPPORTED_EXTENSIONS:
extensions.add(extension)
if not extensions:
# we're not supporting a single extension in the current
# mimetype, so there is not point in showing the mimetype at all
continue
- formats.append(' - %s (%s)' % (mtype, ', '.join(extensions)))
+ formats.add(' - %s (%s)' % (mtype, ', '.join(extensions)))
print('\n'.join(sorted(formats)))
+ return True
-def __get_files_recursively(files):
+def __get_files_recursively(files: List[str]) -> Generator[str, None, None]:
for f in files:
if os.path.isdir(f):
for path, _, _files in os.walk(f):
@@ -112,7 +135,7 @@ def __get_files_recursively(files):
elif __check_file(f):
yield f
-def main():
+def main() -> int:
arg_parser = create_arg_parser()
args = arg_parser.parse_args()
@@ -121,13 +144,13 @@ def main():
if not args.files:
if args.list:
- show_parsers()
+ return show_parsers()
elif args.check_dependencies:
print("Dependencies required for MAT2 %s:" % __version__)
for key, value in sorted(check_dependencies().items()):
print('- %s: %s' % (key, 'yes' if value else 'no'))
else:
- return arg_parser.print_help()
+ arg_parser.print_help()
return 0
elif args.show:
@@ -136,13 +159,13 @@ def main():
return 0
else:
- unknown_member_policy = UnknownMemberPolicy(args.unknown_members)
- if unknown_member_policy == UnknownMemberPolicy.KEEP:
+ policy = UnknownMemberPolicy(args.unknown_members)
+ if policy == UnknownMemberPolicy.KEEP:
logging.warning('Keeping unknown member files may leak metadata in the resulting file!')
no_failure = True
for f in __get_files_recursively(args.files):
- if clean_meta([f, args.lightweight, unknown_member_policy]) is False:
+ if clean_meta(f, args.lightweight, policy) is False:
no_failure = False
return 0 if no_failure is True else -1
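
The recursive display introduced by `__print_meta` (sorted keys, nested dictionaries printed one level deeper, a "No metadata found" message for empty results) can be sketched as a pure function. `format_meta` is a hypothetical name and returns lines instead of printing them; the exact number of spaces before each key is an assumption, since the indentation in this mail has been collapsed.

```python
from typing import Dict, List, Union

Meta = Dict[str, Union[str, dict]]


def format_meta(name: str, metadata: Meta, depth: int = 1) -> List[str]:
    """Hypothetical helper: render metadata as lines, recursing into dicts."""
    padding = " " * depth * 2
    if not metadata:
        return [padding + "No metadata found"]
    lines = ["[%s] Metadata for %s:" % ('+' * depth, name)]
    for key, value in sorted(metadata.items()):
        if isinstance(value, dict):
            # Nested metadata, e.g. a cover picture embedded in a FLAC file.
            lines.extend(format_meta(key, value, depth + 1))
            continue
        lines.append(padding + "  %s: %s" % (key, value))
    return lines
```

For example, `format_meta('song.flac', {'title': 'x', 'Cover 0': {'Comment': 'Created with GIMP'}})` lists the nested `Cover 0` block under a `[++]` header before the top-level `title` entry, because uppercase letters sort before lowercase ones.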
=====================================
setup.py
=====================================
@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
setuptools.setup(
name="mat2",
- version='0.4.0',
+ version='0.5.0',
author="Julien (jvoisin) Voisin",
author_email="julien.voisin+mat2 at dustri.org",
description="A handy tool to trash your metadata",
=====================================
tests/data/dirty.avi
=====================================
Binary files /dev/null and b/tests/data/dirty.avi differ
=====================================
tests/data/dirty.flac
=====================================
Binary files a/tests/data/dirty.flac and b/tests/data/dirty.flac differ
=====================================
tests/test_climat2.py
=====================================
@@ -4,16 +4,24 @@ import subprocess
import unittest
+mat2_binary = ['./mat2']
+
+if 'MAT2_GLOBAL_PATH_TESTSUITE' in os.environ:
+ # Debian runs tests after installing the package
+ # https://0xacab.org/jvoisin/mat2/issues/16#note_153878
+ mat2_binary = ['/usr/bin/env', 'mat2']
+
+
class TestHelp(unittest.TestCase):
def test_help(self):
- proc = subprocess.Popen(['./mat2', '--help'], stdout=subprocess.PIPE)
+ proc = subprocess.Popen(mat2_binary + ['--help'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
def test_no_arg(self):
- proc = subprocess.Popen(['./mat2'], stdout=subprocess.PIPE)
+ proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
@@ -22,29 +30,29 @@ class TestHelp(unittest.TestCase):
class TestVersion(unittest.TestCase):
def test_version(self):
- proc = subprocess.Popen(['./mat2', '--version'], stdout=subprocess.PIPE)
+ proc = subprocess.Popen(mat2_binary + ['--version'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(stdout.startswith(b'MAT2 '))
class TestDependencies(unittest.TestCase):
def test_dependencies(self):
- proc = subprocess.Popen(['./mat2', '--check-dependencies'], stdout=subprocess.PIPE)
+ proc = subprocess.Popen(mat2_binary + ['--check-dependencies'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(b'MAT2' in stdout)
class TestReturnValue(unittest.TestCase):
def test_nonzero(self):
- ret = subprocess.call(['./mat2', './mat2'], stdout=subprocess.DEVNULL)
+ ret = subprocess.call(mat2_binary + ['mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(255, ret)
- ret = subprocess.call(['./mat2', '--whololo'], stderr=subprocess.DEVNULL)
+ ret = subprocess.call(mat2_binary + ['--whololo'], stderr=subprocess.DEVNULL)
self.assertEqual(2, ret)
def test_zero(self):
- ret = subprocess.call(['./mat2'], stdout=subprocess.DEVNULL)
+ ret = subprocess.call(mat2_binary, stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
- ret = subprocess.call(['./mat2', '--show', './mat2'], stdout=subprocess.DEVNULL)
+ ret = subprocess.call(mat2_binary + ['--show', 'mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
@@ -57,22 +65,23 @@ class TestCleanFolder(unittest.TestCase):
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean1.jpg')
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean2.jpg')
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/folder/'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
- proc = subprocess.Popen(['./mat2', './tests/data/folder/'],
+ proc = subprocess.Popen(mat2_binary + ['./tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
os.remove('./tests/data/folder/clean1.jpg')
os.remove('./tests/data/folder/clean2.jpg')
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/folder/'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
+ self.assertIn(b'No metadata found', stdout)
shutil.rmtree('./tests/data/folder/')
@@ -81,16 +90,16 @@ class TestCleanMeta(unittest.TestCase):
def test_jpg(self):
shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.jpg'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
- proc = subprocess.Popen(['./mat2', './tests/data/clean.jpg'],
+ proc = subprocess.Popen(mat2_binary + ['./tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.cleaned.jpg'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.cleaned.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
@@ -100,32 +109,34 @@ class TestCleanMeta(unittest.TestCase):
class TestIsSupported(unittest.TestCase):
def test_pdf(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b"isn't supported", stdout)
class TestGetMeta(unittest.TestCase):
+ maxDiff = None
+
def test_pdf(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'producer: pdfTeX-1.40.14', stdout)
def test_png(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.png'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.png'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: This is a comment, be careful!', stdout)
def test_jpg(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.jpg'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
def test_docx(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.docx'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.docx'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
@@ -133,7 +144,7 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'revision: 1', stdout)
def test_odt(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.odt'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.odt'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
@@ -141,22 +152,22 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'date_time: 2011-07-26 02:40:16', stdout)
def test_mp3(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.mp3'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.mp3'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'TALB: harmfull', stdout)
self.assertIn(b'COMM::: Thank you for using MAT !', stdout)
def test_flac(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.flac'],
- stdout=subprocess.PIPE)
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.flac'],
+ stdout=subprocess.PIPE, bufsize=0)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
self.assertIn(b'genre: Python', stdout)
self.assertIn(b'title: I am so', stdout)
def test_ogg(self):
- proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.ogg'],
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.ogg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
=====================================
tests/test_corrupted_files.py
=====================================
@@ -5,7 +5,8 @@ import shutil
import os
import logging
-from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
+from libmat2 import pdf, images, audio, office, parser_factory, torrent
+from libmat2 import harmless, video
# No need to logging messages, should something go wrong,
# the testsuite _will_ fail.
@@ -192,3 +193,32 @@ class TestCorruptedFiles(unittest.TestCase):
with self.assertRaises(ValueError):
images.JPGParser('./tests/data/clean.jpg')
os.remove('./tests/data/clean.jpg')
+
+ def test_png_lightweight(self):
+ return
+ shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.png')
+ p = images.PNGParser('./tests/data/clean.png')
+ self.assertTrue(p.remove_all())
+ os.remove('./tests/data/clean.png')
+
+ def test_avi(self):
+ try:
+ video._get_ffmpeg_path()
+ except RuntimeError:
+ raise unittest.SkipTest
+
+ shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.avi')
+ p = video.AVIParser('./tests/data/clean.avi')
+ self.assertFalse(p.remove_all())
+ os.remove('./tests/data/clean.avi')
+
+ def test_avi_injection(self):
+ try:
+ video._get_ffmpeg_path()
+ except RuntimeError:
+ raise unittest.SkipTest
+
+ shutil.copy('./tests/data/dirty.torrent', './tests/data/--output.avi')
+ p = video.AVIParser('./tests/data/--output.avi')
+ self.assertFalse(p.remove_all())
+ os.remove('./tests/data/--output.avi')
=====================================
tests/test_libmat2.py
=====================================
@@ -6,12 +6,16 @@ import os
import zipfile
from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
-from libmat2 import check_dependencies
+from libmat2 import check_dependencies, video
class TestCheckDependencies(unittest.TestCase):
def test_deps(self):
- ret = check_dependencies()
+ try:
+ ret = check_dependencies()
+ except RuntimeError:
+ return # this happens if not every dependency is installed
+
for value in ret.values():
self.assertTrue(value)
@@ -33,6 +37,32 @@ class TestParameterInjection(unittest.TestCase):
self.assertEqual(meta['ModifyDate'], "2018:03:20 21:59:25")
os.remove('-ver')
+ def test_ffmpeg_injection(self):
+ try:
+ video._get_ffmpeg_path()
+ except RuntimeError:
+ raise unittest.SkipTest
+
+ shutil.copy('./tests/data/dirty.avi', './--output')
+ p = video.AVIParser('--output')
+ meta = p.get_meta()
+ self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+ os.remove('--output')
+
+ def test_ffmpeg_injection_complete_path(self):
+ try:
+ video._get_ffmpeg_path()
+ except RuntimeError:
+ raise unittest.SkipTest
+
+ shutil.copy('./tests/data/dirty.avi', './tests/data/ --output.avi')
+ p = video.AVIParser('./tests/data/ --output.avi')
+ meta = p.get_meta()
+ self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+ self.assertTrue(p.remove_all())
+ os.remove('./tests/data/ --output.avi')
+ os.remove('./tests/data/ --output.cleaned.avi')
+
class TestUnsupportedEmbeddedFiles(unittest.TestCase):
def test_odt_with_svg(self):
@@ -96,6 +126,7 @@ class TestGetMeta(unittest.TestCase):
p = audio.FLACParser('./tests/data/dirty.flac')
meta = p.get_meta()
self.assertEqual(meta['title'], 'I am so')
+ self.assertEqual(meta['Cover 0'], {'Comment': 'Created with GIMP'})
def test_docx(self):
p = office.MSOfficeParser('./tests/data/dirty.docx')
@@ -181,40 +212,6 @@ class TestRevisionsCleaning(unittest.TestCase):
os.remove('./tests/data/revision_clean.docx')
os.remove('./tests/data/revision_clean.cleaned.docx')
-class TestLightWeightCleaning(unittest.TestCase):
- def test_pdf(self):
- shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
- p = pdf.PDFParser('./tests/data/clean.pdf')
-
- meta = p.get_meta()
- self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
-
- ret = p.remove_all_lightweight()
- self.assertTrue(ret)
-
- p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
- expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
- self.assertEqual(p.get_meta(), expected_meta)
-
- os.remove('./tests/data/clean.pdf')
- os.remove('./tests/data/clean.cleaned.pdf')
-
- def test_png(self):
- shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
- p = images.PNGParser('./tests/data/clean.png')
-
- meta = p.get_meta()
- self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
-
- ret = p.remove_all_lightweight()
- self.assertTrue(ret)
-
- p = images.PNGParser('./tests/data/clean.cleaned.png')
- self.assertEqual(p.get_meta(), {})
-
- os.remove('./tests/data/clean.png')
- os.remove('./tests/data/clean.cleaned.png')
-
class TestCleaning(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
@@ -468,3 +465,26 @@ class TestCleaning(unittest.TestCase):
os.remove('./tests/data/clean.txt')
os.remove('./tests/data/clean.cleaned.txt')
os.remove('./tests/data/clean.cleaned.cleaned.txt')
+
+ def test_avi(self):
+ try:
+ video._get_ffmpeg_path()
+ except RuntimeError:
+ raise unittest.SkipTest
+
+ shutil.copy('./tests/data/dirty.avi', './tests/data/clean.avi')
+ p = video.AVIParser('./tests/data/clean.avi')
+
+ meta = p.get_meta()
+ self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+
+ ret = p.remove_all()
+ self.assertTrue(ret)
+
+ p = video.AVIParser('./tests/data/clean.cleaned.avi')
+ self.assertEqual(p.get_meta(), {})
+ self.assertTrue(p.remove_all())
+
+ os.remove('./tests/data/clean.avi')
+ os.remove('./tests/data/clean.cleaned.avi')
+ os.remove('./tests/data/clean.cleaned.cleaned.avi')
=====================================
tests/test_lightweigh_cleaning.py
=====================================
@@ -0,0 +1,65 @@
+#!/usr/bin/env python3
+
+import unittest
+import shutil
+import os
+
+from libmat2 import pdf, images
+
+class TestLightWeightCleaning(unittest.TestCase):
+ def test_pdf(self):
+ shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
+ p = pdf.PDFParser('./tests/data/clean.pdf')
+
+ meta = p.get_meta()
+ self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
+
+ p.lightweight_cleaning = True
+ ret = p.remove_all()
+ self.assertTrue(ret)
+
+ p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
+ expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
+ self.assertEqual(p.get_meta(), expected_meta)
+
+ os.remove('./tests/data/clean.pdf')
+ os.remove('./tests/data/clean.cleaned.pdf')
+
+ def test_png(self):
+ shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
+ p = images.PNGParser('./tests/data/clean.png')
+
+ meta = p.get_meta()
+ self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
+
+ p.lightweight_cleaning = True
+ ret = p.remove_all()
+ self.assertTrue(ret)
+
+ p = images.PNGParser('./tests/data/clean.cleaned.png')
+ self.assertEqual(p.get_meta(), {})
+
+ p = images.PNGParser('./tests/data/clean.png')
+ p.lightweight_cleaning = True
+ ret = p.remove_all()
+ self.assertTrue(ret)
+
+ os.remove('./tests/data/clean.png')
+ os.remove('./tests/data/clean.cleaned.png')
+
+ def test_jpg(self):
+ shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
+ p = images.JPGParser('./tests/data/clean.jpg')
+
+ meta = p.get_meta()
+ self.assertEqual(meta['Comment'], 'Created with GIMP')
+
+ p.lightweight_cleaning = True
+ ret = p.remove_all()
+ self.assertTrue(ret)
+
+ p = images.JPGParser('./tests/data/clean.cleaned.jpg')
+ self.assertEqual(p.get_meta(), {})
+
+ os.remove('./tests/data/clean.jpg')
+ os.remove('./tests/data/clean.cleaned.jpg')
View it on GitLab: https://salsa.debian.org/pkg-privacy-team/mat2/compare/54f7c9b83dad0826134713ca25d482db55b11287...6b3513999636ea4becf633412c3f6f1847dbb39f