[Pkg-privacy-commits] [Git][pkg-privacy-team/mat2][upstream] New upstream version 0.5.0

Tue Oct 23 18:47:44 BST 2018

Georg Faerber pushed to branch upstream at Privacy Maintainers / mat2


Commits:
86df3b37 by Georg Faerber at 2018-10-23T17:46:00Z
New upstream version 0.5.0
- - - - -


28 changed files:

- .gitlab-ci.yml
- .pylintrc
- CHANGELOG.md
- CONTRIBUTING.md
- README.md
- data/mat2.png
- data/mat2.svg
- doc/mat2.1
- libmat2/__init__.py
- libmat2/abstract.py
- libmat2/archive.py
- libmat2/audio.py
- + libmat2/exiftool.py
- libmat2/harmless.py
- libmat2/images.py
- libmat2/office.py
- libmat2/parser_factory.py
- libmat2/pdf.py
- libmat2/torrent.py
- + libmat2/video.py
- mat2
- setup.py
- + tests/data/dirty.avi
- tests/data/dirty.flac
- tests/test_climat2.py
- tests/test_corrupted_files.py
- tests/test_libmat2.py
- + tests/test_lightweigh_cleaning.py


Changes:

=====================================
.gitlab-ci.yml
=====================================
@@ -9,7 +9,7 @@ bandit:
   script:  # TODO: remove B405 and B314
   - apt-get -qqy update
   - apt-get -qqy install --no-install-recommends python3-bandit
-  - bandit ./mat2 --format txt
+  - bandit ./mat2 --format txt --skip B101
   - bandit -r ./nautilus/ --format txt --skip B101
   - bandit -r ./libmat2 --format txt --skip B101,B404,B603,B405,B314
 
@@ -42,9 +42,9 @@ tests:debian:
   stage: test
   script:
   - apt-get -qqy update
-  - apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage
+  - apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg
   - python3-coverage run --branch -m unittest discover -s tests/
-  - python3-coverage report -m --include 'libmat2/*'
+  - python3-coverage report --fail-under=100 -m --include 'libmat2/*'
 
 tests:fedora:
   image: fedora
@@ -62,5 +62,5 @@ tests:archlinux:
   tags:
     - whitewhale
   script:
-  - pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap
+  - pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap ffmpeg
   - python3 setup.py test


=====================================
.pylintrc
=====================================
@@ -6,11 +6,12 @@ max-locals=20
 disable=
     fixme,
     invalid-name,
+    duplicate-code,
     missing-docstring,
     protected-access,
-		abstract-method,
-		wrong-import-position,
-		catching-non-exception,
-		cell-var-from-loop,
-		locally-disabled,
-		invalid-sequence-index,  # pylint doesn't like things like `Tuple[int, bytes]` in type annotation
+    abstract-method,
+    wrong-import-position,
+    catching-non-exception,
+    cell-var-from-loop,
+    locally-disabled,
+    invalid-sequence-index,  # pylint doesn't like things like `Tuple[int, bytes]` in type annotation


=====================================
CHANGELOG.md
=====================================
@@ -1,3 +1,16 @@
+# 0.5.0 - 2018-10-23
+
+- Video (.avi files for now) support, via FFmpeg, optionally
+- Lightweight cleaning for png and tiff files
+- Processing files starting with a dash is now quicker
+- Metadata are now displayed sorted
+- Recursive metadata support for FLAC files
+- Unsupported extensions aren't displayed in `./mat2 -l` anymore
+- Improve the display when no metadata are found
+- Update the logo according to the GNOME guidelines
+- The testsuite is now runnable on the installed version of mat2
+- Various internal cleanup/improvements
+
 # 0.4.0 - 2018-10-03
 
 - There is now a policy, for advanced users, to deal with unknown embedded fileformats


=====================================
CONTRIBUTING.md
=====================================
@@ -32,5 +32,6 @@ Since MAT2 is written in Python3, please conform as much as possible to the
 9. Create the signed tarball with `git archive --format=tar.xz --prefix=mat-$VERSION/ $VERSION > mat-$VERSION.tar.xz`
 10. Sign the tarball with `gpg --armor --detach-sign mat-$VERSION.tar.xz`
 11. Upload the result on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
-12. Tell the [downstreams](https://0xacab.org/jvoisin/mat2/blob/master/INSTALL.md) about it
-13. Do the secret release dance
+12. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
+13. Tell the [downstreams](https://0xacab.org/jvoisin/mat2/blob/master/INSTALL.md) about it
+14. Do the secret release dance


=====================================
README.md
=====================================
@@ -30,10 +30,11 @@ metadata.
 - `python3-mutagen` for audio support
 - `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
 - `gir1.2-gdkpixbuf-2.0` for images support
+- `FFmpeg`, optionally, for video support 
 - `libimage-exiftool-perl` for everything else
 
 Please note that MAT2 requires at least Python3.5, meaning that it
-doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
+doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3).
 
 # Running the test suite
 


=====================================
data/mat2.png
=====================================
Binary files a/data/mat2.png and b/data/mat2.png differ


=====================================
data/mat2.svg
=====================================
The diff for this file was not included because it is too large.

=====================================
doc/mat2.1
=====================================
@@ -1,4 +1,4 @@
-.TH MAT2 "1" "October 2018" "MAT2 0.4.0" "User Commands"
+.TH MAT2 "1" "October 2018" "MAT2 0.5.0" "User Commands"
 
 .SH NAME
 mat2 \- the metadata anonymisation toolkit 2


=====================================
libmat2/__init__.py
=====================================
@@ -1,13 +1,15 @@
-#!/bin/env python3
+#!/usr/bin/env python3
 
-import os
 import collections
 import enum
 import importlib
 from typing import Dict, Optional
 
+from . import exiftool, video
+
 # make pyflakes happy
 assert Dict
+assert Optional
 
 # A set of extension that aren't supported, despite matching a supported mimetype
 UNSUPPORTED_EXTENSIONS = {
@@ -36,24 +38,13 @@ DEPENDENCIES = {
     'mutagen': 'Mutagen',
     }
 
-def _get_exiftool_path() -> Optional[str]:  # pragma: no cover
-    exiftool_path = '/usr/bin/exiftool'
-    if os.path.isfile(exiftool_path):
-        if os.access(exiftool_path, os.X_OK):
-            return exiftool_path
-
-    # ArchLinux
-    exiftool_path = '/usr/bin/vendor_perl/exiftool'
-    if os.path.isfile(exiftool_path):
-        if os.access(exiftool_path, os.X_OK):
-            return exiftool_path
 
-    return None
 
-def check_dependencies() -> dict:
+def check_dependencies() -> Dict[str, bool]:
     ret = collections.defaultdict(bool)  # type: Dict[str, bool]
 
-    ret['Exiftool'] = True if _get_exiftool_path() else False
+    ret['Exiftool'] = True if exiftool._get_exiftool_path() else False
+    ret['Ffmpeg'] = True if video._get_ffmpeg_path() else False
 
     for key, value in DEPENDENCIES.items():
         ret[value] = True


=====================================
libmat2/abstract.py
=====================================
@@ -1,13 +1,15 @@
 import abc
 import os
-from typing import Set, Dict
+import re
+from typing import Set, Dict, Union
 
 assert Set  # make pyflakes happy
 
 
 class AbstractParser(abc.ABC):
     """ This is the base class of every parser.
-    It might yield `ValueError` on instantiation on invalid files.
+    It might yield `ValueError` on instantiation on invalid files,
+    and `RuntimeError` when something went wrong in `remove_all`.
     """
     meta_list = set()  # type: Set[str]
     mimetypes = set()  # type: Set[str]
@@ -16,21 +18,23 @@ class AbstractParser(abc.ABC):
         """
         :raises ValueError: Raised upon an invalid file
         """
+        if re.search('^[a-z0-9./]', filename) is None:
+            # Some parsers are calling external binaries,
+            # this prevents shell command injections
+            filename = os.path.join('.', filename)
+
         self.filename = filename
         fname, extension = os.path.splitext(filename)
         self.output_filename = fname + '.cleaned' + extension
+        self.lightweight_cleaning = False
 
     @abc.abstractmethod
-    def get_meta(self) -> Dict[str, str]:
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         pass  # pragma: no cover
 
     @abc.abstractmethod
     def remove_all(self) -> bool:
-        pass  # pragma: no cover
-
-    def remove_all_lightweight(self) -> bool:
-        """ This method removes _SOME_ metadata.
-        It might be useful to implement it for fileformats that do
-        not support non-destructive cleaning.
         """
-        return self.remove_all()
+        :raises RuntimeError: Raised if the cleaning process went wrong.
+        """
+        pass  # pragma: no cover
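
Note on the filename check added to `AbstractParser.__init__` above: a name that does not start with a lowercase letter, a digit, a dot or a slash is prefixed with `./`, so that an external binary cannot read a leading dash as an option. A minimal sketch of that behaviour follows; `sanitize_filename` is a hypothetical helper name used only for illustration and is not part of libmat2.

```python
import os
import re


def sanitize_filename(filename: str) -> str:
    """Hypothetical helper mirroring the check in AbstractParser.__init__."""
    if re.search('^[a-z0-9./]', filename) is None:
        # Prefix with the current directory so that the name can no
        # longer be mistaken for a command-line option.
        filename = os.path.join('.', filename)
    return filename


# A leading dash gets the './' prefix, a plain lowercase name is untouched.
print(sanitize_filename('-rf.png'))
print(sanitize_filename('photo.png'))
```

Names starting with an uppercase letter also get the prefix, which is harmless because `./Photo.png` and `Photo.png` refer to the same file.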


=====================================
libmat2/archive.py
=====================================
@@ -4,13 +4,14 @@ import tempfile
 import os
 import logging
 import shutil
-from typing import Dict, Set, Pattern
+from typing import Dict, Set, Pattern, Union
 
 from . import abstract, UnknownMemberPolicy, parser_factory
 
 # Make pyflakes happy
 assert Set
 assert Pattern
+assert Union
 
 
 class ArchiveBasedAbstractParser(abstract.AbstractParser):


=====================================
libmat2/audio.py
=====================================
@@ -1,8 +1,12 @@
+import mimetypes
+import os
 import shutil
+import tempfile
+from typing import Dict, Union
 
 import mutagen
 
-from . import abstract
+from . import abstract, parser_factory
 
 
 class MutagenParser(abstract.AbstractParser):
@@ -13,13 +17,13 @@ class MutagenParser(abstract.AbstractParser):
         except mutagen.MutagenError:
             raise ValueError
 
-    def get_meta(self):
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         f = mutagen.File(self.filename)
         if f.tags:
             return {k:', '.join(v) for k, v in f.tags.items()}
         return {}
 
-    def remove_all(self):
+    def remove_all(self) -> bool:
         shutil.copy(self.filename, self.output_filename)
         f = mutagen.File(self.output_filename)
         f.delete()
@@ -30,8 +34,8 @@ class MutagenParser(abstract.AbstractParser):
 class MP3Parser(MutagenParser):
     mimetypes = {'audio/mpeg', }
 
-    def get_meta(self):
-        metadata = {}
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
+        metadata = {}  # type: Dict[str, Union[str, dict]]
         meta = mutagen.File(self.filename).tags
         for key in meta:
             metadata[key.rstrip(' \t\r\n\0')] = ', '.join(map(str, meta[key].text))
@@ -44,3 +48,30 @@ class OGGParser(MutagenParser):
 
 class FLACParser(MutagenParser):
     mimetypes = {'audio/flac', 'audio/x-flac'}
+
+    def remove_all(self) -> bool:
+        shutil.copy(self.filename, self.output_filename)
+        f = mutagen.File(self.output_filename)
+        f.clear_pictures()
+        f.delete()
+        f.save(deleteid3=True)
+        return True
+
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
+        meta = super().get_meta()
+        for num, picture in enumerate(mutagen.File(self.filename).pictures):
+            name = picture.desc if picture.desc else 'Cover %d' % num
+            extension = mimetypes.guess_extension(picture.mime)
+            if extension is None:  #  pragma: no cover
+                meta[name] = 'harmful data'
+                continue
+
+            _, fname = tempfile.mkstemp()
+            fname = fname + extension
+            with open(fname, 'wb') as f:
+                f.write(picture.data)
+            p, _ = parser_factory.get_parser(fname)  # type: ignore
+            # Mypy chokes on ternaries :/
+            meta[name] = p.get_meta() if p else 'harmful data'  # type: ignore
+            os.remove(fname)
+        return meta


=====================================
libmat2/exiftool.py
=====================================
@@ -0,0 +1,67 @@
+import json
+import logging
+import os
+import subprocess
+from typing import Dict, Union, Set
+
+from . import abstract
+
+# Make pyflakes happy
+assert Set
+
+
+class ExiftoolParser(abstract.AbstractParser):
+    """ Exiftool is often the easiest way to get all the metadata
+    from a file, hence why several parsers are re-using its `get_meta`
+    method.
+    """
+    meta_whitelist = set()  # type: Set[str]
+
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
+        out = subprocess.check_output([_get_exiftool_path(), '-json', self.filename])
+        meta = json.loads(out.decode('utf-8'))[0]
+        for key in self.meta_whitelist:
+            meta.pop(key, None)
+        return meta
+
+    def _lightweight_cleanup(self) -> bool:
+        if os.path.exists(self.output_filename):
+            try:
+                # exiftool can't force output to existing files
+                os.remove(self.output_filename)
+            except OSError as e:  # pragma: no cover
+                logging.error("The output file %s is already existing and \
+                               can't be overwritten: %s.", self.filename, e)
+                return False
+
+        # Note: '-All=' must be followed by a known exiftool option.
+        # Also, '-CommonIFD0' is needed for .tiff files
+        cmd = [_get_exiftool_path(),
+               '-all=',         # remove metadata
+               '-adobe=',       # remove adobe-specific metadata
+               '-exif:all=',    # remove all exif metadata
+               '-Time:All=',    # remove all timestamps
+               '-quiet',        # don't show useless logs
+               '-CommonIFD0=',  # remove IFD0 metadata
+               '-o', self.output_filename,
+               self.filename]
+        try:
+            subprocess.check_call(cmd)
+        except subprocess.CalledProcessError as e:  # pragma: no cover
+            logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
+            return False
+        return True
+
+def _get_exiftool_path() -> str:  # pragma: no cover
+    exiftool_path = '/usr/bin/exiftool'
+    if os.path.isfile(exiftool_path):
+        if os.access(exiftool_path, os.X_OK):
+            return exiftool_path
+
+    # ArchLinux
+    exiftool_path = '/usr/bin/vendor_perl/exiftool'
+    if os.path.isfile(exiftool_path):
+        if os.access(exiftool_path, os.X_OK):
+            return exiftool_path
+
+    raise RuntimeError("Unable to find exiftool")
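
Note on `ExiftoolParser.get_meta` above: the JSON emitted by `exiftool -json` is a list with one object per file, and every key listed in `meta_whitelist` is dropped from the first object before the result is returned. The sketch below reproduces only that filtering step, without calling exiftool; `filter_meta` is a hypothetical name and the sample JSON is made up for illustration.

```python
import json
from typing import Dict, Set, Union


def filter_meta(raw_json: bytes, meta_whitelist: Set[str]) -> Dict[str, Union[str, dict]]:
    """Hypothetical stand-alone version of the filtering done in get_meta."""
    # Keep the first (and, for a single input file, only) JSON object.
    meta = json.loads(raw_json.decode('utf-8'))[0]
    for key in meta_whitelist:
        # Whitelisted keys are considered harmless and are not reported.
        meta.pop(key, None)
    return meta


# Made-up sample output, not real exiftool output.
sample = b'[{"SourceFile": "a.png", "FileName": "a.png", "Comment": "secret"}]'
print(filter_meta(sample, {'SourceFile', 'FileName'}))
```

Whatever remains after filtering is what mat2 reports as metadata for the file.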


=====================================
libmat2/harmless.py
=====================================
@@ -1,5 +1,5 @@
 import shutil
-from typing import Dict
+from typing import Dict, Union
 from . import abstract
 
 
@@ -7,7 +7,7 @@ class HarmlessParser(abstract.AbstractParser):
     """ This is the parser for filetypes that can not contain metadata. """
     mimetypes = {'text/plain', 'image/x-ms-bmp'}
 
-    def get_meta(self) -> Dict[str, str]:
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         return dict()
 
     def remove_all(self) -> bool:


=====================================
libmat2/images.py
=====================================
@@ -1,10 +1,5 @@
-import subprocess
 import imghdr
-import json
 import os
-import shutil
-import tempfile
-import re
 from typing import Set
 
 import cairo
@@ -13,44 +8,12 @@ import gi
 gi.require_version('GdkPixbuf', '2.0')
 from gi.repository import GdkPixbuf
 
-from . import abstract, _get_exiftool_path
+from . import exiftool
 
 # Make pyflakes happy
 assert Set
 
-class _ImageParser(abstract.AbstractParser):
-    """ Since we use `exiftool` to get metadata from
-    all images fileformat, `get_meta` is implemented in this class,
-    and all the image-handling ones are inheriting from it."""
-    meta_whitelist = set()  # type: Set[str]
-
-    @staticmethod
-    def __handle_problematic_filename(filename: str, callback) -> str:
-        """ This method takes a filename with a problematic name,
-        and safely applies it a `callback`."""
-        tmpdirname = tempfile.mkdtemp()
-        fname = os.path.join(tmpdirname, "temp_file")
-        shutil.copy(filename, fname)
-        out = callback(fname)
-        shutil.rmtree(tmpdirname)
-        return out
-
-    def get_meta(self):
-        """ There is no way to escape the leading(s) dash(es) of the current
-        self.filename to prevent parameter injections, so we need to take care
-        of this.
-        """
-        fun = lambda f: subprocess.check_output([_get_exiftool_path(), '-json', f])
-        if re.search('^[a-z0-9/]', self.filename) is None:
-            out = self.__handle_problematic_filename(self.filename, fun)
-        else:
-            out = fun(self.filename)
-        meta = json.loads(out.decode('utf-8'))[0]
-        for key in self.meta_whitelist:
-            meta.pop(key, None)
-        return meta
-
-class PNGParser(_ImageParser):
+class PNGParser(exiftool.ExiftoolParser):
     mimetypes = {'image/png', }
     meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName',
                       'Directory', 'FileSize', 'FileModifyDate',
@@ -71,19 +34,26 @@ class PNGParser(_ImageParser):
         except MemoryError:  # pragma: no cover
             raise ValueError
 
-    def remove_all(self):
+    def remove_all(self) -> bool:
+        if self.lightweight_cleaning:
+            return self._lightweight_cleanup()
         surface = cairo.ImageSurface.create_from_png(self.filename)
         surface.write_to_png(self.output_filename)
         return True
 
 
-class GdkPixbufAbstractParser(_ImageParser):
+class GdkPixbufAbstractParser(exiftool.ExiftoolParser):
     """ GdkPixbuf can handle a lot of surfaces, so we're rending images on it,
         this has the side-effect of completely removing metadata.
     """
     _type = ''
 
-    def remove_all(self):
+    def __init__(self, filename):
+        super().__init__(filename)
+        if imghdr.what(filename) != self._type:  # better safe than sorry
+            raise ValueError
+
+    def remove_all(self) -> bool:
         _, extension = os.path.splitext(self.filename)
         pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
         if extension.lower() == '.jpg':
@@ -91,11 +61,6 @@ class GdkPixbufAbstractParser(_ImageParser):
         pixbuf.savev(self.output_filename, extension[1:], [], [])
         return True
 
-    def __init__(self, filename):
-        super().__init__(filename)
-        if imghdr.what(filename) != self._type:  # better safe than sorry
-            raise ValueError
-
 
 class JPGParser(GdkPixbufAbstractParser):
     _type = 'jpeg'


=====================================
libmat2/office.py
=====================================
@@ -2,7 +2,7 @@ import logging
 import os
 import re
 import zipfile
-from typing import Dict, Set, Pattern
+from typing import Dict, Set, Pattern, Tuple, Union
 
 import xml.etree.ElementTree as ET  # type: ignore
 
@@ -14,9 +14,8 @@ from .archive import ArchiveBasedAbstractParser
 assert Set
 assert Pattern
 
-def _parse_xml(full_path: str):
+def _parse_xml(full_path: str) -> Tuple[ET.ElementTree, Dict[str, str]]:
     """ This function parses XML, with namespace support. """
-
     namespace_map = dict()
     for _, (key, value) in ET.iterparse(full_path, ("start-ns", )):
         # The ns[0-9]+ namespaces are reserved for internal usage, so
@@ -88,6 +87,7 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
             r'^docProps/custom\.xml$',
             r'^word/printerSettings/',
             r'^word/theme',
+            r'^word/people\.xml$',
 
             # we have a whitelist in self.files_to_keep,
             # so we can trash everything else
@@ -182,20 +182,20 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
 
         parent_map = {c:p for p in tree.iter() for c in p}
 
-        elements = list()
+        elements_del = list()
         for element in tree.iterfind('.//w:del', namespace):
-            elements.append(element)
-        for element in elements:
+            elements_del.append(element)
+        for element in elements_del:
             parent_map[element].remove(element)
 
-        elements = list()
+        elements_ins = list()
         for element in tree.iterfind('.//w:ins', namespace):
             for position, item in enumerate(tree.iter()):  # pragma: no cover
                 if item == element:
                     for children in element.iterfind('./*'):
-                        elements.append((element, position, children))
+                        elements_ins.append((element, position, children))
                     break
-        for (element, position, children) in elements:
+        for (element, position, children) in elements_ins:
             parent_map[element].insert(position, children)
             parent_map[element].remove(element)
 
@@ -296,7 +296,7 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
 
         return True
 
-    def get_meta(self) -> Dict[str, str]:
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         """
         Yes, I know that parsing xml with regexp ain't pretty,
         be my guest and fix it if you want.
@@ -381,7 +381,7 @@ class LibreOfficeParser(ArchiveBasedAbstractParser):
                 return False
         return True
 
-    def get_meta(self) -> Dict[str, str]:
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         """
         Yes, I know that parsing xml with regexp ain't pretty,
         be my guest and fix it if you want.


=====================================
libmat2/parser_factory.py
=====================================
@@ -18,6 +18,8 @@ def __load_all_parsers():
             continue
         elif fname.endswith('__init__.py'):
             continue
+        elif fname.endswith('exiftool.py'):
+            continue
         basename = os.path.basename(fname)
         name, _ = os.path.splitext(basename)
         importlib.import_module('.' + name, package='libmat2')
@@ -33,6 +35,7 @@ def _get_parsers() -> List[T]:
 
 
 def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
+    """ Return the appropriate parser for a given filename. """
     mtype, _ = mimetypes.guess_type(filename)
 
     _, extension = os.path.splitext(filename)


=====================================
libmat2/pdf.py
=====================================
@@ -7,6 +7,7 @@ import re
 import logging
 import tempfile
 import io
+from typing import Dict, Union
 from distutils.version import LooseVersion
 
 import cairo
@@ -37,7 +38,12 @@ class PDFParser(abstract.AbstractParser):
         except GLib.GError:  # Invalid PDF
             raise ValueError
 
-    def remove_all_lightweight(self):
+    def remove_all(self) -> bool:
+        if self.lightweight_cleaning is True:
+            return self.__remove_all_lightweight()
+        return self.__remove_all_thorough()
+
+    def __remove_all_lightweight(self) -> bool:
         """
             Load the document into Poppler, render pages on a new PDFSurface.
         """
@@ -64,7 +70,7 @@ class PDFParser(abstract.AbstractParser):
 
         return True
 
-    def remove_all(self):
+    def __remove_all_thorough(self) -> bool:
         """
             Load the document into Poppler, render pages on PNG,
             and shove those PNG into a new PDF.
@@ -119,13 +125,13 @@ class PDFParser(abstract.AbstractParser):
         return True
 
     @staticmethod
-    def __parse_metadata_field(data: str) -> dict:
+    def __parse_metadata_field(data: str) -> Dict[str, str]:
         metadata = {}
         for (_, key, value) in re.findall(r"<(xmp|pdfx|pdf|xmpMM):(.+)>(.+)</\1:\2>", data, re.I):
             metadata[key] = value
         return metadata
 
-    def get_meta(self):
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         """ Return a dict with all the meta of the file
         """
         metadata = {}


=====================================
libmat2/torrent.py
=====================================
@@ -14,7 +14,7 @@ class TorrentParser(abstract.AbstractParser):
         if self.dict_repr is None:
             raise ValueError
 
-    def get_meta(self) -> Dict[str, str]:
+    def get_meta(self) -> Dict[str, Union[str, dict]]:
         metadata = {}
         for key, value in self.dict_repr.items():
             if key not in self.whitelist:


=====================================
libmat2/video.py
=====================================
@@ -0,0 +1,54 @@
+import os
+import subprocess
+import logging
+
+from . import exiftool
+
+
+class AVIParser(exiftool.ExiftoolParser):
+    mimetypes = {'video/x-msvideo', }
+    meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName', 'Directory',
+                      'FileSize', 'FileModifyDate', 'FileAccessDate',
+                      'FileInodeChangeDate', 'FilePermissions', 'FileType',
+                      'FileTypeExtension', 'MIMEType', 'FrameRate', 'MaxDataRate',
+                      'FrameCount', 'StreamCount', 'StreamType', 'VideoCodec',
+                      'VideoFrameRate', 'VideoFrameCount', 'Quality',
+                      'SampleSize', 'BMPVersion', 'ImageWidth', 'ImageHeight',
+                      'Planes', 'BitDepth', 'Compression', 'ImageLength',
+                      'PixelsPerMeterX', 'PixelsPerMeterY', 'NumColors',
+                      'NumImportantColors', 'NumColors', 'NumImportantColors',
+                      'RedMask', 'GreenMask', 'BlueMask', 'AlphaMask',
+                      'ColorSpace', 'AudioCodec', 'AudioCodecRate',
+                      'AudioSampleCount', 'AudioSampleCount',
+                      'AudioSampleRate', 'Encoding', 'NumChannels',
+                      'SampleRate', 'AvgBytesPerSec', 'BitsPerSample',
+                      'Duration', 'ImageSize', 'Megapixels'}
+
+    def remove_all(self) -> bool:
+        cmd = [_get_ffmpeg_path(),
+               '-i', self.filename,      # input file
+               '-y',                     # overwrite existing output file
+               '-loglevel', 'panic',     # Don't show log
+               '-hide_banner',           # hide the banner
+               '-codec', 'copy',         # don't decode anything, just copy (speed!)
+               '-map_metadata', '-1',    # remove superficial metadata
+               '-map_chapters', '-1',    # remove chapters
+               '-fflags', '+bitexact',   # don't add any metadata
+               '-flags:v', '+bitexact',  # don't add any metadata
+               '-flags:a', '+bitexact',  # don't add any metadata
+               self.output_filename]
+        try:
+            subprocess.check_call(cmd)
+        except subprocess.CalledProcessError as e:
+            logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
+            return False
+        return True
+
+
+def _get_ffmpeg_path() -> str:  # pragma: no cover
+    ffmpeg_path = '/usr/bin/ffmpeg'
+    if os.path.isfile(ffmpeg_path):
+        if os.access(ffmpeg_path, os.X_OK):
+            return ffmpeg_path
+
+    raise RuntimeError("Unable to find ffmpeg")
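
Note on the optional FFmpeg dependency: `_get_ffmpeg_path` raises `RuntimeError` when the binary is missing, and the `clean_meta` function in the `mat2` script catches `RuntimeError` around `remove_all`. The sketch below shows how the two fit together under simplifying assumptions; `find_executable` and `clean` are hypothetical names, and the path is passed in as a parameter only to make the example testable.

```python
import os


def find_executable(path: str, name: str) -> str:
    """Hypothetical generalisation of _get_ffmpeg_path / _get_exiftool_path."""
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    raise RuntimeError("Unable to find %s" % name)


def clean(filename: str, binary_path: str) -> bool:
    """Hypothetical stand-in for clean_meta: report the failure, don't crash."""
    try:
        find_executable(binary_path, 'ffmpeg')
    except RuntimeError as e:
        print("[-] %s can't be cleaned: %s" % (filename, e))
        return False
    return True


# With a path that does not exist, the failure is reported and False returned.
print(clean('dirty.avi', '/nonexistent/ffmpeg'))
```

Raising from the path lookup, instead of returning `None` as the old `_get_exiftool_path` did, means a parser does not have to check the path before building its command line.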


=====================================
mat2
=====================================
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 
 import os
-from typing import Tuple
+from typing import Tuple, Generator, List, Union
 import sys
 import mimetypes
 import argparse
@@ -14,7 +14,12 @@ except ValueError as e:
     print(e)
     sys.exit(1)
 
-__version__ = '0.4.0'
+__version__ = '0.5.0'
+
+# Make pyflakes happy
+assert Tuple
+assert Union
+
 
 def __check_file(filename: str, mode: int=os.R_OK) -> bool:
     if not os.path.exists(filename):
@@ -29,7 +34,7 @@ def __check_file(filename: str, mode: int=os.R_OK) -> bool:
     return True
 
 
-def create_arg_parser():
+def create_arg_parser() -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(description='Metadata anonymisation toolkit 2')
     parser.add_argument('files', nargs='*', help='the files to process')
     parser.add_argument('-v', '--version', action='version',
@@ -61,16 +66,28 @@ def show_meta(filename: str):
     if p is None:
         print("[-] %s's format (%s) is not supported" % (filename, mtype))
         return
+    __print_meta(filename, p.get_meta())
+
+
+def __print_meta(filename: str, metadata: dict, depth: int=1):
+    padding = " " * depth*2
+    if not metadata:
+        print(padding + "No metadata found")
+        return
 
-    print("[+] Metadata for %s:" % filename)
-    for k, v in p.get_meta().items():
+    print("[%s] Metadata for %s:" % ('+'*depth, filename))
+
+    for (k, v) in sorted(metadata.items()):
+        if isinstance(v, dict):
+            __print_meta(k, v, depth+1)
+            continue
         try:  # FIXME this is ugly.
-            print("  %s: %s" % (k, v))
+            print(padding + "  %s: %s" % (k, v))
         except UnicodeEncodeError:
-            print("  %s: harmful content" % k)
+            print(padding + "  %s: harmful content" % k)
+
 
-def clean_meta(params: Tuple[str, bool, UnknownMemberPolicy]) -> bool:
-    filename, is_lightweight, unknown_member_policy = params
+def clean_meta(filename: str, is_lightweight: bool, policy: UnknownMemberPolicy) -> bool:
     if not __check_file(filename, os.R_OK|os.W_OK):
         return False
 
@@ -78,30 +95,36 @@ def clean_meta(params: Tuple[str, bool, UnknownMemberPolicy]) -> bool:
     if p is None:
         print("[-] %s's format (%s) is not supported" % (filename, mtype))
         return False
-    p.unknown_member_policy = unknown_member_policy
-    if is_lightweight:
-        return p.remove_all_lightweight()
-    return p.remove_all()
+    p.unknown_member_policy = policy
+    p.lightweight_cleaning = is_lightweight
+
+    try:
+        return p.remove_all()
+    except RuntimeError as e:
+        print("[-] %s can't be cleaned: %s" % (filename, e))
+    return False
 
 
-def show_parsers():
+
+def show_parsers() -> bool:
     print('[+] Supported formats:')
-    formats = list()
-    for parser in parser_factory._get_parsers():
+    formats = set()  # Set[str]
+    for parser in parser_factory._get_parsers():  # type: ignore
         for mtype in parser.mimetypes:
-            extensions = set()
+            extensions = set()  # Set[str]
             for extension in mimetypes.guess_all_extensions(mtype):
-                if extension[1:] not in UNSUPPORTED_EXTENSIONS:  # skip the dot
+                if extension not in UNSUPPORTED_EXTENSIONS:
                     extensions.add(extension)
             if not extensions:
                 # we're not supporting a single extension in the current
                 # mimetype, so there is not point in showing the mimetype at all
                 continue
-            formats.append('  - %s (%s)' % (mtype, ', '.join(extensions)))
+            formats.add('  - %s (%s)' % (mtype, ', '.join(extensions)))
     print('\n'.join(sorted(formats)))
+    return True
 
 
-def __get_files_recursively(files):
+def __get_files_recursively(files: List[str]) -> Generator[str, None, None]:
     for f in files:
         if os.path.isdir(f):
             for path, _, _files in os.walk(f):
@@ -112,7 +135,7 @@ def __get_files_recursively(files):
         elif __check_file(f):
             yield f
 
-def main():
+def main() -> int:
     arg_parser = create_arg_parser()
     args = arg_parser.parse_args()
 
@@ -121,13 +144,13 @@ def main():
 
     if not args.files:
         if args.list:
-            show_parsers()
+            return show_parsers()
         elif args.check_dependencies:
             print("Dependencies required for MAT2 %s:" % __version__)
             for key, value in sorted(check_dependencies().items()):
                 print('- %s: %s' % (key, 'yes' if value else 'no'))
         else:
-            return arg_parser.print_help()
+            arg_parser.print_help()
         return 0
 
     elif args.show:
@@ -136,13 +159,13 @@ def main():
         return 0
 
     else:
-        unknown_member_policy = UnknownMemberPolicy(args.unknown_members)
-        if unknown_member_policy == UnknownMemberPolicy.KEEP:
+        policy = UnknownMemberPolicy(args.unknown_members)
+        if policy == UnknownMemberPolicy.KEEP:
             logging.warning('Keeping unknown member files may leak metadata in the resulting file!')
 
         no_failure = True
         for f in __get_files_recursively(args.files):
-            if clean_meta([f, args.lightweight, unknown_member_policy]) is False:
+            if clean_meta(f, args.lightweight, policy) is False:
                 no_failure = False
         return 0 if no_failure is True else -1
 


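Note on the `-1` returned by `main()` above: the CLI test in tests/test_climat2.py expects `255` for a failed cleaning. On POSIX systems only the low 8 bits of an exit status reach the parent process, so `-1` is observed as `255`. A minimal sketch of that behaviour, assuming a hypothetical helper name `exit_status_of` (not part of mat2):

```python
import os
import subprocess
import sys


def exit_status_of(code: int) -> int:
    """Run a child Python interpreter that exits with `code` and
    return the status observed by the parent (hypothetical helper)."""
    child = 'import sys; sys.exit(%d)' % code
    return subprocess.call([sys.executable, '-c', child])


if __name__ == '__main__':
    print(exit_status_of(0))
    if os.name == 'posix':
        # Only the low 8 bits survive, so -1 is reported as 255.
        print(exit_status_of(-1))
```

On non-POSIX platforms the observed value may differ, which is why the sketch guards the second call.
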
=====================================
setup.py
=====================================
@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
 
 setuptools.setup(
     name="mat2",
-    version='0.4.0',
+    version='0.5.0',
     author="Julien (jvoisin) Voisin",
     author_email="julien.voisin+mat2 at dustri.org",
     description="A handy tool to trash your metadata",


=====================================
tests/data/dirty.avi
=====================================
Binary files /dev/null and b/tests/data/dirty.avi differ


=====================================
tests/data/dirty.flac
=====================================
Binary files a/tests/data/dirty.flac and b/tests/data/dirty.flac differ


=====================================
tests/test_climat2.py
=====================================
@@ -4,16 +4,24 @@ import subprocess
 import unittest
 
 
+mat2_binary = ['./mat2']
+
+if 'MAT2_GLOBAL_PATH_TESTSUITE' in os.environ:
+    # Debian runs tests after installing the package
+    # https://0xacab.org/jvoisin/mat2/issues/16#note_153878
+    mat2_binary = ['/usr/bin/env', 'mat2']
+
+
 class TestHelp(unittest.TestCase):
     def test_help(self):
-        proc = subprocess.Popen(['./mat2', '--help'], stdout=subprocess.PIPE)
+        proc = subprocess.Popen(mat2_binary + ['--help'], stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
                       stdout)
         self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
 
     def test_no_arg(self):
-        proc = subprocess.Popen(['./mat2'], stdout=subprocess.PIPE)
+        proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
                       stdout)
@@ -22,29 +30,29 @@ class TestHelp(unittest.TestCase):
 
 class TestVersion(unittest.TestCase):
     def test_version(self):
-        proc = subprocess.Popen(['./mat2', '--version'], stdout=subprocess.PIPE)
+        proc = subprocess.Popen(mat2_binary + ['--version'], stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertTrue(stdout.startswith(b'MAT2 '))
 
 class TestDependencies(unittest.TestCase):
     def test_dependencies(self):
-        proc = subprocess.Popen(['./mat2', '--check-dependencies'], stdout=subprocess.PIPE)
+        proc = subprocess.Popen(mat2_binary + ['--check-dependencies'], stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertTrue(b'MAT2' in stdout)
 
 class TestReturnValue(unittest.TestCase):
     def test_nonzero(self):
-        ret = subprocess.call(['./mat2', './mat2'], stdout=subprocess.DEVNULL)
+        ret = subprocess.call(mat2_binary + ['mat2'], stdout=subprocess.DEVNULL)
         self.assertEqual(255, ret)
 
-        ret = subprocess.call(['./mat2', '--whololo'], stderr=subprocess.DEVNULL)
+        ret = subprocess.call(mat2_binary + ['--whololo'], stderr=subprocess.DEVNULL)
         self.assertEqual(2, ret)
 
     def test_zero(self):
-        ret = subprocess.call(['./mat2'], stdout=subprocess.DEVNULL)
+        ret = subprocess.call(mat2_binary, stdout=subprocess.DEVNULL)
         self.assertEqual(0, ret)
 
-        ret = subprocess.call(['./mat2', '--show', './mat2'], stdout=subprocess.DEVNULL)
+        ret = subprocess.call(mat2_binary + ['--show', 'mat2'], stdout=subprocess.DEVNULL)
         self.assertEqual(0, ret)
 
 
@@ -57,22 +65,23 @@ class TestCleanFolder(unittest.TestCase):
         shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean1.jpg')
         shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean2.jpg')
 
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/folder/'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'Comment: Created with GIMP', stdout)
 
-        proc = subprocess.Popen(['./mat2', './tests/data/folder/'],
+        proc = subprocess.Popen(mat2_binary + ['./tests/data/folder/'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
 
         os.remove('./tests/data/folder/clean1.jpg')
         os.remove('./tests/data/folder/clean2.jpg')
 
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/folder/'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertNotIn(b'Comment: Created with GIMP', stdout)
+        self.assertIn(b'No metadata found', stdout)
 
         shutil.rmtree('./tests/data/folder/')
 
@@ -81,16 +90,16 @@ class TestCleanMeta(unittest.TestCase):
     def test_jpg(self):
         shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
 
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.jpg'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.jpg'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'Comment: Created with GIMP', stdout)
 
-        proc = subprocess.Popen(['./mat2', './tests/data/clean.jpg'],
+        proc = subprocess.Popen(mat2_binary + ['./tests/data/clean.jpg'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
 
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.cleaned.jpg'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.cleaned.jpg'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertNotIn(b'Comment: Created with GIMP', stdout)
@@ -100,32 +109,34 @@ class TestCleanMeta(unittest.TestCase):
 
 class TestIsSupported(unittest.TestCase):
     def test_pdf(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertNotIn(b"isn't supported", stdout)
 
 class TestGetMeta(unittest.TestCase):
+    maxDiff = None
+
     def test_pdf(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'producer: pdfTeX-1.40.14', stdout)
 
     def test_png(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.png'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.png'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'Comment: This is a comment, be careful!', stdout)
 
     def test_jpg(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.jpg'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.jpg'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'Comment: Created with GIMP', stdout)
 
     def test_docx(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.docx'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.docx'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
@@ -133,7 +144,7 @@ class TestGetMeta(unittest.TestCase):
         self.assertIn(b'revision: 1', stdout)
 
     def test_odt(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.odt'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.odt'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
@@ -141,22 +152,22 @@ class TestGetMeta(unittest.TestCase):
         self.assertIn(b'date_time: 2011-07-26 02:40:16', stdout)
 
     def test_mp3(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.mp3'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.mp3'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'TALB: harmfull', stdout)
         self.assertIn(b'COMM::: Thank you for using MAT !', stdout)
 
     def test_flac(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.flac'],
-                stdout=subprocess.PIPE)
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.flac'],
+                stdout=subprocess.PIPE, bufsize=0)
         stdout, _ = proc.communicate()
         self.assertIn(b'comments: Thank you for using MAT !', stdout)
         self.assertIn(b'genre: Python', stdout)
         self.assertIn(b'title: I am so', stdout)
 
     def test_ogg(self):
-        proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.ogg'],
+        proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.ogg'],
                 stdout=subprocess.PIPE)
         stdout, _ = proc.communicate()
         self.assertIn(b'comments: Thank you for using MAT !', stdout)
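
The `mat2_binary` switch at the top of this file can be read as a small selection function. A minimal sketch, assuming a hypothetical helper name `select_mat2_binary` (the test file itself uses a module-level `if`, not a function):

```python
import os
from typing import List, Mapping


def select_mat2_binary(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the argv prefix used to launch mat2 in the CLI tests.

    Hypothetical helper mirroring the module-level logic of the test file.
    """
    if 'MAT2_GLOBAL_PATH_TESTSUITE' in environ:
        # Installed package: let env(1) resolve mat2 through PATH.
        return ['/usr/bin/env', 'mat2']
    # Source checkout: run the script from the current directory.
    return ['./mat2']


if __name__ == '__main__':
    print(select_mat2_binary({}))
    print(select_mat2_binary({'MAT2_GLOBAL_PATH_TESTSUITE': '1'}))
```

Because the result is a list, extra arguments can be appended with `+`, as in `mat2_binary + ['--help']` in the tests above.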


=====================================
tests/test_corrupted_files.py
=====================================
@@ -5,7 +5,8 @@ import shutil
 import os
 import logging
 
-from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
+from libmat2 import pdf, images, audio, office, parser_factory, torrent
+from libmat2 import harmless, video
 
 # No need to logging messages, should something go wrong,
 # the testsuite _will_ fail.
@@ -192,3 +193,32 @@ class TestCorruptedFiles(unittest.TestCase):
         with self.assertRaises(ValueError):
              images.JPGParser('./tests/data/clean.jpg')
         os.remove('./tests/data/clean.jpg')
+
+    def test_png_lightweight(self):
+        return
+        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.png')
+        p = images.PNGParser('./tests/data/clean.png')
+        self.assertTrue(p.remove_all())
+        os.remove('./tests/data/clean.png')
+
+    def test_avi(self):
+        try:
+            video._get_ffmpeg_path()
+        except RuntimeError:
+            raise unittest.SkipTest
+
+        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.avi')
+        p = video.AVIParser('./tests/data/clean.avi')
+        self.assertFalse(p.remove_all())
+        os.remove('./tests/data/clean.avi')
+
+    def test_avi_injection(self):
+        try:
+            video._get_ffmpeg_path()
+        except RuntimeError:
+            raise unittest.SkipTest
+
+        shutil.copy('./tests/data/dirty.torrent', './tests/data/--output.avi')
+        p = video.AVIParser('./tests/data/--output.avi')
+        self.assertFalse(p.remove_all())
+        os.remove('./tests/data/--output.avi')
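
The AVI tests above repeat the same try/except block to skip when ffmpeg is missing. One possible way to factor that out is a skip decorator. This is a sketch only: `requires_tool` is a hypothetical name, and it uses `shutil.which` as a stand-in for `video._get_ffmpeg_path()`, whose lookup logic is not shown in this part of the diff.

```python
import shutil
import unittest


def requires_tool(name: str):
    """Skip the decorated test unless `name` is found on PATH.

    Hypothetical helper; shutil.which stands in for the project's
    own lookup function.
    """
    return unittest.skipUnless(shutil.which(name),
                               '%s is not installed' % name)


class TestNeedsTool(unittest.TestCase):
    # The tool name is deliberately bogus so the skip path is exercised.
    @requires_tool('a-tool-that-should-not-exist-0xmat2')
    def test_skipped_when_missing(self):
        self.fail('this body only runs when the tool exists')


if __name__ == '__main__':
    unittest.main()
```

A skipped test is reported as skipped, not as passed, which keeps a missing dependency visible in the test output.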


=====================================
tests/test_libmat2.py
=====================================
@@ -6,12 +6,16 @@ import os
 import zipfile
 
 from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
-from libmat2 import check_dependencies
+from libmat2 import check_dependencies, video
 
 
 class TestCheckDependencies(unittest.TestCase):
     def test_deps(self):
-        ret = check_dependencies()
+        try:
+            ret = check_dependencies()
+        except RuntimeError:
+            return   # this happens if not every dependency is installed
+
         for value in ret.values():
             self.assertTrue(value)
 
@@ -33,6 +37,32 @@ class TestParameterInjection(unittest.TestCase):
         self.assertEqual(meta['ModifyDate'], "2018:03:20 21:59:25")
         os.remove('-ver')
 
+    def test_ffmpeg_injection(self):
+        try:
+            video._get_ffmpeg_path()
+        except RuntimeError:
+            raise unittest.SkipTest
+
+        shutil.copy('./tests/data/dirty.avi', './--output')
+        p = video.AVIParser('--output')
+        meta = p.get_meta()
+        self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+        os.remove('--output')
+
+    def test_ffmpeg_injection_complete_path(self):
+        try:
+            video._get_ffmpeg_path()
+        except RuntimeError:
+            raise unittest.SkipTest
+
+        shutil.copy('./tests/data/dirty.avi', './tests/data/ --output.avi')
+        p = video.AVIParser('./tests/data/ --output.avi')
+        meta = p.get_meta()
+        self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+        self.assertTrue(p.remove_all())
+        os.remove('./tests/data/ --output.avi')
+        os.remove('./tests/data/ --output.cleaned.avi')
+
 
 class TestUnsupportedEmbeddedFiles(unittest.TestCase):
     def test_odt_with_svg(self):
@@ -96,6 +126,7 @@ class TestGetMeta(unittest.TestCase):
         p = audio.FLACParser('./tests/data/dirty.flac')
         meta = p.get_meta()
         self.assertEqual(meta['title'], 'I am so')
+        self.assertEqual(meta['Cover 0'], {'Comment': 'Created with GIMP'})
 
     def test_docx(self):
         p = office.MSOfficeParser('./tests/data/dirty.docx')
@@ -181,40 +212,6 @@ class TestRevisionsCleaning(unittest.TestCase):
         os.remove('./tests/data/revision_clean.docx')
         os.remove('./tests/data/revision_clean.cleaned.docx')
 
-class TestLightWeightCleaning(unittest.TestCase):
-    def test_pdf(self):
-        shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
-        p = pdf.PDFParser('./tests/data/clean.pdf')
-
-        meta = p.get_meta()
-        self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
-
-        ret = p.remove_all_lightweight()
-        self.assertTrue(ret)
-
-        p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
-        expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
-        self.assertEqual(p.get_meta(), expected_meta)
-
-        os.remove('./tests/data/clean.pdf')
-        os.remove('./tests/data/clean.cleaned.pdf')
-
-    def test_png(self):
-        shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
-        p = images.PNGParser('./tests/data/clean.png')
-
-        meta = p.get_meta()
-        self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
-
-        ret = p.remove_all_lightweight()
-        self.assertTrue(ret)
-
-        p = images.PNGParser('./tests/data/clean.cleaned.png')
-        self.assertEqual(p.get_meta(), {})
-
-        os.remove('./tests/data/clean.png')
-        os.remove('./tests/data/clean.cleaned.png')
-
 class TestCleaning(unittest.TestCase):
     def test_pdf(self):
         shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
@@ -468,3 +465,26 @@ class TestCleaning(unittest.TestCase):
         os.remove('./tests/data/clean.txt')
         os.remove('./tests/data/clean.cleaned.txt')
         os.remove('./tests/data/clean.cleaned.cleaned.txt')
+
+    def test_avi(self):
+        try:
+            video._get_ffmpeg_path()
+        except RuntimeError:
+            raise unittest.SkipTest
+
+        shutil.copy('./tests/data/dirty.avi', './tests/data/clean.avi')
+        p = video.AVIParser('./tests/data/clean.avi')
+
+        meta = p.get_meta()
+        self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
+
+        ret = p.remove_all()
+        self.assertTrue(ret)
+
+        p = video.AVIParser('./tests/data/clean.cleaned.avi')
+        self.assertEqual(p.get_meta(), {})
+        self.assertTrue(p.remove_all())
+
+        os.remove('./tests/data/clean.avi')
+        os.remove('./tests/data/clean.cleaned.avi')
+        os.remove('./tests/data/clean.cleaned.cleaned.avi')
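
The injection tests in this file use a file literally named `--output`, which an external command could mistake for an option if the name were passed through unchanged. A common defensive technique is to make such names unambiguous by prefixing the current directory. The sketch below illustrates that technique only; `guard_option_like_path` is a hypothetical name, and this is not claimed to be how libmat2 handles the case.

```python
import os


def guard_option_like_path(filename: str) -> str:
    """Return a path that cannot be parsed as a command-line option.

    Hypothetical helper: names starting with '-' get the current
    directory prepended; everything else is returned unchanged.
    """
    if filename.startswith('-'):
        return os.path.join(os.curdir, filename)
    return filename


if __name__ == '__main__':
    print(guard_option_like_path('--output'))
    print(guard_option_like_path('./tests/data/--output.avi'))
```

Paths that already start with a directory component, such as `./tests/data/ --output.avi`, are left as they are because their first character is not a dash.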


=====================================
tests/test_lightweigh_cleaning.py
=====================================
@@ -0,0 +1,65 @@
+#!/usr/bin/env python3
+
+import unittest
+import shutil
+import os
+
+from libmat2 import pdf, images
+
+class TestLightWeightCleaning(unittest.TestCase):
+    def test_pdf(self):
+        shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
+        p = pdf.PDFParser('./tests/data/clean.pdf')
+
+        meta = p.get_meta()
+        self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
+
+        p.lightweight_cleaning = True
+        ret = p.remove_all()
+        self.assertTrue(ret)
+
+        p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
+        expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
+        self.assertEqual(p.get_meta(), expected_meta)
+
+        os.remove('./tests/data/clean.pdf')
+        os.remove('./tests/data/clean.cleaned.pdf')
+
+    def test_png(self):
+        shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
+        p = images.PNGParser('./tests/data/clean.png')
+
+        meta = p.get_meta()
+        self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
+
+        p.lightweight_cleaning = True
+        ret = p.remove_all()
+        self.assertTrue(ret)
+
+        p = images.PNGParser('./tests/data/clean.cleaned.png')
+        self.assertEqual(p.get_meta(), {})
+
+        p = images.PNGParser('./tests/data/clean.png')
+        p.lightweight_cleaning = True
+        ret = p.remove_all()
+        self.assertTrue(ret)
+
+        os.remove('./tests/data/clean.png')
+        os.remove('./tests/data/clean.cleaned.png')
+
+    def test_jpg(self):
+        shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
+        p = images.JPGParser('./tests/data/clean.jpg')
+
+        meta = p.get_meta()
+        self.assertEqual(meta['Comment'], 'Created with GIMP')
+
+        p.lightweight_cleaning = True
+        ret = p.remove_all()
+        self.assertTrue(ret)
+
+        p = images.JPGParser('./tests/data/clean.cleaned.jpg')
+        self.assertEqual(p.get_meta(), {})
+
+        os.remove('./tests/data/clean.jpg')
+        os.remove('./tests/data/clean.cleaned.jpg')
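
The tests in this commit rely on a naming pattern for output files: cleaning `clean.jpg` produces `clean.cleaned.jpg`, and cleaning that again produces `clean.cleaned.cleaned.jpg`. A minimal sketch of the pattern, inferred from the file names in the tests only; `cleaned_name` is a hypothetical helper, not a libmat2 function:

```python
import os


def cleaned_name(path: str) -> str:
    """Insert '.cleaned' before the file extension (hypothetical helper,
    inferred from the file names used in the tests)."""
    root, ext = os.path.splitext(path)
    return root + '.cleaned' + ext


if __name__ == '__main__':
    once = cleaned_name('./tests/data/clean.avi')
    print(once)
    print(cleaned_name(once))
```

This explains why the tests remove two or three files at the end: the original copy plus one output per cleaning pass.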



View it on GitLab: https://salsa.debian.org/pkg-privacy-team/mat2/commit/86df3b3764ad06d70b2051ff985db755f47ddaad
