[Pkg-privacy-commits] [Git][pkg-privacy-team/mat2][upstream] New upstream version 0.12.0

Sat Dec 26 20:21:09 GMT 2020


Georg Faerber pushed to branch upstream at Privacy Maintainers / mat2


Commits:
1d6ab2f4 by Georg Faerber at 2020-12-26T18:24:10+00:00
New upstream version 0.12.0
- - - - -


19 changed files:

- .gitlab-ci.yml
- .pylintrc
- CHANGELOG.md
- README.md
- doc/mat2.1
- dolphin/mat2.desktop
- libmat2/audio.py
- libmat2/bubblewrap.py
- libmat2/images.py
- libmat2/office.py
- libmat2/parser_factory.py
- libmat2/pdf.py
- mat2
- nautilus/README.md
- nautilus/mat2.py
- setup.py
- tests/data/malformed_content_types.docx
- tests/test_climat2.py
- tests/test_corrupted_files.py


Changes:

=====================================
.gitlab-ci.yml
=====================================
@@ -31,9 +31,9 @@ linting:pylint:
   image: $CONTAINER_REGISTRY:linting
   stage: linting
   script:
-    - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
+    - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension,raise-missing-from,unsubscriptable-object --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
     # Once nautilus-python is in Debian, decomment it form the line below
-    - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
+    - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension,raise-missing-from,unsubscriptable-object --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
 
 linting:pyflakes:
   image: $CONTAINER_REGISTRY:linting
@@ -66,7 +66,7 @@ tests:debian_with_bubblewrap:
   <<: *prepare_env
   script:
     - su - mat2 -c "python3-coverage run --branch -m unittest discover -s tests/"
-    - su - mat2 -c "python3-coverage report --fail-under=100 -m --include 'libmat2/*'"
+    - su - mat2 -c "python3-coverage report --fail-under=95 -m --include 'libmat2/*'"
 
 tests:fedora:
   image: $CONTAINER_REGISTRY:fedora


=====================================
.pylintrc
=====================================
@@ -14,4 +14,5 @@ disable=
     catching-non-exception,
     cell-var-from-loop,
     locally-disabled,
+		raise-missing-from,
     invalid-sequence-index,  # pylint doesn't like things like `Tuple[int, bytes]` in type annotation


=====================================
CHANGELOG.md
=====================================
@@ -1,3 +1,13 @@
+# 0.12.0 - 2020-12-18
+
+- Improve significantly MS Office formats support
+- Fix some typos in the Nautilus extension
+- Improve reliability of the mp3, pdf and svg parsers
+- Improve compatibility with ffmpeg when sandboxing is used
+- Improve the dolphin extension usability
+- libmat2 now raises a ValueError on malformed files while trying to 
+  find the right parser, instead of returning None
+
 # 0.11.0 - 2020-03-29
 
 - Improve significantly MS Office formats support


=====================================
README.md
=====================================
@@ -93,6 +93,23 @@ Note that mat2 **will not** clean files in-place, but will produce, for
 example, with a file named "myfile.png" a cleaned version named
 "myfile.cleaned.png".
 
+## Web interface
+
+It's possible to run mat2 as a web service, via
+[mat2-web](https://0xacab.org/jvoisin/mat2-web).
+
+## Desktop GUI
+
+For GNU/Linux desktops, it's possible to use the
+[Metadata Cleaner](https://gitlab.com/rmnvgr/metadata-cleaner) GTK application.
+
+# Supported formats
+
+The following formats are supported: avi, bmp, css, epub/ncx, flac, gif, jpeg,
+m4a/mp2/mp3/…, mp4, odc/odf/odg/odi/odp/ods/odt/…, off/opus/oga/spx/…, pdf,
+png, ppm, pptx/xlsx/docx/…, svg/svgz/…, tar/tar.gz/tar.bz2/tar.xz/…, tiff,
+torrent, wav, wmv, zip, …
+  
 # Notes about detecting metadata
 
 While mat2 is doing its very best to display metadata when the `--show` flag is
@@ -126,11 +143,15 @@ of the guarantee that mat2 won't modify the data of their files, there is the
 # Contact
 
 If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
-or the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
+or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev)
 Should a more private contact be needed (eg. for reporting security issues),
 you can email Julien (jvoisin) Voisin at `julien.voisin+mat2 at dustri.org`,
 using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
 
+# Donations
+
+If you want to donate some money, please give it to [Tails]( https://tails.boum.org/donate/?r=contribute ).
+
 # License
 
 This program is free software: you can redistribute it and/or modify
@@ -163,4 +184,3 @@ mat2 wouldn't exist without:
 - friends
 
 Many thanks to them!
-


=====================================
doc/mat2.1
=====================================
@@ -1,4 +1,4 @@
-.TH mat2 "1" "March 2020" "mat2 0.11.0" "User Commands"
+.TH mat2 "1" "December 2020" "mat2 0.12.0" "User Commands"
 
 .SH NAME
 mat2 \- the metadata anonymisation toolkit 2


=====================================
dolphin/mat2.desktop
=====================================
@@ -1,6 +1,6 @@
 [Desktop Entry]
 X-KDE-ServiceTypes=KonqPopupMenu/Plugin
-MimeType=application/pdf;application/vnd.oasis.opendocument.chart ;application/vnd.oasis.opendocument.formula ;application/vnd.oasis.opendocument.graphics ;application/vnd.oasis.opendocument.image ;application/vnd.oasis.opendocument.presentation ;application/vnd.oasis.opendocument.spreadsheet ;application/vnd.oasis.opendocument.text ;application/vnd.openxmlformats-officedocument.presentationml.presentation ;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ;application/vnd.openxmlformats-officedocument.wordprocessingml.document ;application/x-bittorrent ;application/zip ;audio/flac ;audio/mpeg ;audio/ogg ;audio/x-flac ;image/jpeg ;image/png ;image/tiff ;image/x-ms-bmp ;text/plain ;video/mp4 ;video/x-msvideo;
+MimeType=application/pdf;application/vnd.oasis.opendocument.chart;application/vnd.oasis.opendocument.formula;application/vnd.oasis.opendocument.graphics;application/vnd.oasis.opendocument.image;application/vnd.oasis.opendocument.presentation;application/vnd.oasis.opendocument.spreadsheet;application/vnd.oasis.opendocument.text;application/vnd.openxmlformats-officedocument.presentationml.presentation;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.openxmlformats-officedocument.wordprocessingml.document;application/x-bittorrent;application/zip;audio/flac;audio/mpeg;audio/ogg;audio/x-flac;image/jpeg;image/png;image/tiff;image/x-ms-bmp;text/plain;video/mp4;video/x-msvideo;
 Actions=cleanMetadata;
 Type=Service
 


=====================================
libmat2/audio.py
=====================================
@@ -37,6 +37,8 @@ class MP3Parser(MutagenParser):
     def get_meta(self) -> Dict[str, Union[str, dict]]:
         metadata = {}  # type: Dict[str, Union[str, dict]]
         meta = mutagen.File(self.filename).tags
+        if not meta:
+            return metadata
         for key in meta:
             if not hasattr(meta[key], 'text'):  # pragma: no cover
                 continue


=====================================
libmat2/bubblewrap.py
=====================================
@@ -29,7 +29,6 @@ def _get_bwrap_path() -> str:
     raise RuntimeError("Unable to find bwrap")  # pragma: no cover
 
 
-# pylint: disable=bad-whitespace
 def _get_bwrap_args(tempdir: str,
                     input_filename: str,
                     output_filename: Optional[str] = None) -> List[str]:
@@ -38,7 +37,7 @@ def _get_bwrap_args(tempdir: str,
 
     # XXX: use --ro-bind-try once all supported platforms
     # have a bubblewrap recent enough to support it.
-    ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', cwd]
+    ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', '/etc/alternatives', cwd]
     for bind_dir in ro_bind_dirs:
         if os.path.isdir(bind_dir):  # pragma: no cover
             ro_bind_args.extend(['--ro-bind', bind_dir, bind_dir])
@@ -77,7 +76,6 @@ def _get_bwrap_args(tempdir: str,
     return args
 
 
-# pylint: disable=bad-whitespace
 def run(args: List[str],
         input_filename: str,
         output_filename: Optional[str] = None,


=====================================
libmat2/images.py
=====================================
@@ -41,7 +41,7 @@ class SVGParser(exiftool.ExiftoolParser):
 
         # The namespace is mandatory, but only the …/2000/svg is valid.
         ns = 'http://www.w3.org/2000/svg'
-        if meta.get('Xmlns', ns) == ns:
+        if meta.get('Xmlns') == ns:
             meta.pop('Xmlns')
         return meta
 


=====================================
libmat2/office.py
=====================================
@@ -87,16 +87,21 @@ class MSOfficeParser(ZipParser):
         self.files_to_keep = set(map(re.compile, {  # type: ignore
             r'^\[Content_Types\]\.xml$',
             r'^_rels/\.rels$',
-            r'^(?:word|ppt)/_rels/document\.xml\.rels$',
-            r'^(?:word|ppt)/_rels/footer[0-9]*\.xml\.rels$',
-            r'^(?:word|ppt)/_rels/header[0-9]*\.xml\.rels$',
+            r'^(?:word|ppt|xl)/_rels/document\.xml\.rels$',
+            r'^(?:word|ppt|xl)/_rels/footer[0-9]*\.xml\.rels$',
+            r'^(?:word|ppt|xl)/_rels/header[0-9]*\.xml\.rels$',
+            r'^(?:word|ppt|xl)/styles\.xml$',
+            # TODO: randomize axId ( https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oi29500/089f849f-fcd6-4fa0-a281-35aa6a432a16 )
+            r'^(?:word|ppt|xl)/charts/chart[0-9]*\.xml$',
+            r'^xl/workbook\.xml$',
+            r'^xl/worksheets/sheet[0-9]+\.xml$',
             r'^ppt/slideLayouts/_rels/slideLayout[0-9]+\.xml\.rels$',
             r'^ppt/slideLayouts/slideLayout[0-9]+\.xml$',
-            r'^(?:word|ppt)/tableStyles\.xml$',
+            r'^(?:word|ppt|xl)/tableStyles\.xml$',
             r'^ppt/slides/_rels/slide[0-9]*\.xml\.rels$',
             r'^ppt/slides/slide[0-9]*\.xml$',
             # https://msdn.microsoft.com/en-us/library/dd908153(v=office.12).aspx
-            r'^(?:word|ppt)/stylesWithEffects\.xml$',
+            r'^(?:word|ppt|xl)/stylesWithEffects\.xml$',
             r'^ppt/presentation\.xml$',
             # TODO: check if p:bgRef can be randomized
             r'^ppt/slideMasters/slideMaster[0-9]+\.xml',
@@ -106,20 +111,20 @@ class MSOfficeParser(ZipParser):
             r'^customXml/',
             r'webSettings\.xml$',
             r'^docProps/custom\.xml$',
-            r'^(?:word|ppt)/printerSettings/',
-            r'^(?:word|ppt)/theme',
-            r'^(?:word|ppt)/people\.xml$',
-            r'^(?:word|ppt)/numbering\.xml$',
-            r'^(?:word|ppt)/tags/',
+            r'^(?:word|ppt|xl)/printerSettings/',
+            r'^(?:word|ppt|xl)/theme',
+            r'^(?:word|ppt|xl)/people\.xml$',
+            r'^(?:word|ppt|xl)/numbering\.xml$',
+            r'^(?:word|ppt|xl)/tags/',
             # View properties like view mode, last viewed slide etc
-            r'^(?:word|ppt)/viewProps\.xml$',
+            r'^(?:word|ppt|xl)/viewProps\.xml$',
             # Additional presentation-wide properties like printing properties,
             # presentation show properties etc.
-            r'^(?:word|ppt)/presProps\.xml$',
+            r'^(?:word|ppt|xl)/presProps\.xml$',
 
             # we have an allowlist in self.files_to_keep,
             # so we can trash everything else
-            r'^(?:word|ppt)/_rels/',
+            r'^(?:word|ppt|xl)/_rels/',
         }))
 
         if self.__fill_files_to_keep_via_content_types() is False:
@@ -142,7 +147,7 @@ class MSOfficeParser(ZipParser):
         except ET.ParseError:
             return False
         for c in tree:
-            if 'PartName' not in c.attrib or 'ContentType' not in c.attrib:
+            if 'PartName' not in c.attrib or 'ContentType' not in c.attrib:  # pragma: no cover
                 continue
             elif c.attrib['ContentType'] in self.content_types_to_keep:
                 fname = c.attrib['PartName'][1:]  # remove leading `/`
@@ -265,8 +270,8 @@ class MSOfficeParser(ZipParser):
             logging.error("Unable to parse %s: %s", full_path, e)
             return False
 
-        if len(namespace.items()) != 1:
-            return False  # there should be only one namespace for Types
+        if len(namespace.items()) != 1:  # pragma: no cover
+            logging.debug("Got several namespaces for Types: %s", namespace.items())
 
         removed_fnames = set()
         with zipfile.ZipFile(self.filename) as zin:
@@ -356,7 +361,7 @@ class MSOfficeParser(ZipParser):
         if full_path.endswith('/[Content_Types].xml'):
             # this file contains references to files that we might
             # remove, and MS Office doesn't like dangling references
-            if self.__remove_content_type_members(full_path) is False:
+            if self.__remove_content_type_members(full_path) is False:  # pragma: no cover
                 return False
         elif full_path.endswith('/word/document.xml'):
             # this file contains the revisions


=====================================
libmat2/parser_factory.py
=====================================
@@ -1,4 +1,3 @@
-import logging
 import glob
 import os
 import mimetypes
@@ -40,7 +39,10 @@ def _get_parsers() -> List[T]:
 
 
 def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
-    """ Return the appropriate parser for a given filename. """
+    """ Return the appropriate parser for a given filename.
+
+        :raises ValueError: Raised if the instantiation of the parser went wrong.
+    """
     mtype, _ = mimetypes.guess_type(filename)
 
     _, extension = os.path.splitext(filename)
@@ -53,10 +55,6 @@ def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
 
     for parser_class in _get_parsers():  # type: ignore
         if mtype in parser_class.mimetypes:
-            try:
-                return parser_class(filename), mtype
-            except ValueError as e:
-                logging.info("Got an exception when trying to instantiate "
-                             "%s for %s: %s", parser_class, filename, e)
-                return None, mtype
+            # This instantiation might raise a ValueError on malformed files
+            return parser_class(filename), mtype
     return None, mtype


=====================================
libmat2/pdf.py
=====================================
@@ -84,6 +84,9 @@ class PDFParser(abstract.AbstractParser):
 
         for pagenum in range(pages_count):
             page = document.get_page(pagenum)
+            if page is None:  # pragma: no cover
+                logging.error("Unable to get PDF pages")
+                return False
             page_width, page_height = page.get_size()
             logging.info("Rendering page %d/%d", pagenum + 1, pages_count)
 


=====================================
mat2
=====================================
@@ -17,7 +17,7 @@ except ValueError as e:
     print(e)
     sys.exit(1)
 
-__version__ = '0.11.0'
+__version__ = '0.12.0'
 
 # Make pyflakes happy
 assert Set
@@ -85,7 +85,11 @@ def show_meta(filename: str, sandbox: bool):
     if not __check_file(filename):
         return
 
-    p, mtype = parser_factory.get_parser(filename)  # type: ignore
+    try:
+        p, mtype = parser_factory.get_parser(filename)  # type: ignore
+    except ValueError as e:
+        print("[-] something went wrong when processing %s: %s" % (filename, e))
+        return
     if p is None:
         print("[-] %s's format (%s) is not supported" % (filename, mtype))
         return
@@ -126,7 +130,11 @@ def clean_meta(filename: str, is_lightweight: bool, inplace: bool, sandbox: bool
     if not __check_file(filename, mode):
         return False
 
-    p, mtype = parser_factory.get_parser(filename)  # type: ignore
+    try:
+        p, mtype = parser_factory.get_parser(filename)  # type: ignore
+    except ValueError as e:
+        print("[-] something went wrong when cleaning %s: %s" % (filename, e))
+        return False
     if p is None:
         print("[-] %s's format (%s) is not supported" % (filename, mtype))
         return False


=====================================
nautilus/README.md
=====================================
@@ -9,7 +9,7 @@
 
 Simply copy the `mat2.py` file to `~/.local/share/nautilus-python/extensions`,
 and launch Nautilus; you should now have a "Remove metadata" item in the
-right-clic menu on supported files.
+right-click menu on supported files.
 
 Please note: This is not needed if using a distribution provided package. It
 only applies if installing from source.


=====================================
nautilus/mat2.py
=====================================
@@ -2,7 +2,7 @@
 
 """
 Because writing GUI is non-trivial (cf. https://0xacab.org/jvoisin/mat2/issues/3),
-we decided to write a Nautilus extensions instead
+we decided to write a Nautilus extension instead
 (cf. https://0xacab.org/jvoisin/mat2/issues/2).
 
 The code is a little bit convoluted because Gtk isn't thread-safe,
@@ -36,7 +36,7 @@ def _remove_metadata(fpath) -> Tuple[bool, Optional[str]]:
     return parser.remove_all(), mtype
 
 class Mat2Extension(GObject.GObject, Nautilus.MenuProvider, Nautilus.LocationWidgetProvider):
-    """ This class adds an item to the right-clic menu in Nautilus. """
+    """ This class adds an item to the right-click menu in Nautilus. """
 
     def __init__(self):
         super().__init__()


=====================================
setup.py
=====================================
@@ -5,7 +5,7 @@ with open("README.md", encoding='utf-8') as fh:
 
 setuptools.setup(
     name="mat2",
-    version='0.11.0',
+    version='0.12.0',
     author="Julien (jvoisin) Voisin",
     author_email="julien.voisin+mat2 at dustri.org",
     description="A handy tool to trash your metadata",


=====================================
tests/data/malformed_content_types.docx
=====================================
Binary files a/tests/data/malformed_content_types.docx and b/tests/data/malformed_content_types.docx differ


=====================================
tests/test_climat2.py
=====================================
@@ -29,7 +29,7 @@ class TestHelp(unittest.TestCase):
         self.assertIn(b' [-v] [-l]', stdout)
         self.assertIn(b'[--check-dependencies]', stdout)
         self.assertIn(b'[-L | -s]', stdout)
-        self.assertIn(b'[files [files ...]]', stdout)
+        self.assertIn(b'[files ...]', stdout)
 
     def test_no_arg(self):
         proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
@@ -39,7 +39,7 @@ class TestHelp(unittest.TestCase):
         self.assertIn(b'[--inplace]', stdout)
         self.assertIn(b'[--no-sandbox]', stdout)
         self.assertIn(b' [-v] [-l] [--check-dependencies] [-L | -s]', stdout)
-        self.assertIn(b'[files [files ...]]', stdout)
+        self.assertIn(b'[files ...]', stdout)
 
 
 class TestVersion(unittest.TestCase):


=====================================
tests/test_corrupted_files.py
=====================================
@@ -65,8 +65,10 @@ class TestCorruptedEmbedded(unittest.TestCase):
     def test_docx(self):
         shutil.copy('./tests/data/embedded_corrupted.docx', './tests/data/clean.docx')
         parser, _ = parser_factory.get_parser('./tests/data/clean.docx')
-        self.assertFalse(parser.remove_all())
-        self.assertIsNotNone(parser.get_meta())
+        with self.assertRaises(ValueError):
+            parser.remove_all()
+        with self.assertRaises(ValueError):
+            self.assertIsNotNone(parser.get_meta())
         os.remove('./tests/data/clean.docx')
 
     def test_odt(self):
@@ -89,9 +91,8 @@ class TestExplicitelyUnsupportedFiles(unittest.TestCase):
 class TestWrongContentTypesFileOffice(unittest.TestCase):
     def test_office_incomplete(self):
         shutil.copy('./tests/data/malformed_content_types.docx', './tests/data/clean.docx')
-        p = office.MSOfficeParser('./tests/data/clean.docx')
-        self.assertIsNotNone(p)
-        self.assertFalse(p.remove_all())
+        with self.assertRaises(ValueError):
+            office.MSOfficeParser('./tests/data/clean.docx')
         os.remove('./tests/data/clean.docx')
 
     def test_office_broken(self):
@@ -121,8 +122,8 @@ class TestCorruptedFiles(unittest.TestCase):
 
     def test_png2(self):
         shutil.copy('./tests/test_libmat2.py', './tests/clean.png')
-        parser, _ = parser_factory.get_parser('./tests/clean.png')
-        self.assertIsNone(parser)
+        with self.assertRaises(ValueError):
+            parser_factory.get_parser('./tests/clean.png')
         os.remove('./tests/clean.png')
 
     def test_torrent(self):
@@ -238,10 +239,10 @@ class TestCorruptedFiles(unittest.TestCase):
             zout.write('./tests/data/embedded_corrupted.docx')
         p, mimetype = parser_factory.get_parser('./tests/data/clean.zip')
         self.assertEqual(mimetype, 'application/zip')
-        meta = p.get_meta()
-        self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
-        self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
-        self.assertFalse(p.remove_all())
+        with self.assertRaises(ValueError):
+            p.get_meta()
+        with self.assertRaises(ValueError):
+            self.assertFalse(p.remove_all())
         os.remove('./tests/data/clean.zip')
 
     def test_html(self):
@@ -316,10 +317,10 @@ class TestCorruptedFiles(unittest.TestCase):
                 zout.addfile(tarinfo, f)
         p, mimetype = parser_factory.get_parser('./tests/data/clean.tar')
         self.assertEqual(mimetype, 'application/x-tar')
-        meta = p.get_meta()
-        self.assertEqual(meta['./tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
-        self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
-        self.assertFalse(p.remove_all())
+        with self.assertRaises(ValueError):
+            p.get_meta()
+        with self.assertRaises(ValueError):
+            self.assertFalse(p.remove_all())
         os.remove('./tests/data/clean.tar')
 
         shutil.copy('./tests/data/dirty.png', './tests/data/clean.tar')
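
For readers of this commit: the CHANGELOG entry and the changes to `mat2` above mean that callers of `parser_factory.get_parser` now have to distinguish three outcomes: a `ValueError` (malformed file), a `None` parser (unsupported format), and a usable parser. The following is only a minimal sketch of that calling pattern, not code from mat2. The `classify` helper, the `GetParser` alias and the three stub functions are hypothetical names introduced for illustration; the only things taken from the diff are the `(parser, mimetype)` return shape and the `ValueError` behaviour.

```python
from typing import Callable, Optional, Tuple

# Shape of a get_parser-like callable, as shown in the diff: it returns
# (parser_or_None, mimetype_or_None) and may raise ValueError on malformed files.
GetParser = Callable[[str], Tuple[Optional[object], Optional[str]]]


def classify(filename: str, get_parser: GetParser) -> str:
    """Hypothetical helper: report how a get_parser-like callable reacted."""
    try:
        parser, _mtype = get_parser(filename)
    except ValueError:
        # Since 0.12.0, malformed files surface here instead of as None.
        return "malformed"
    if parser is None:
        # No parser class is registered for this mimetype.
        return "unsupported"
    return "ok"


# Hypothetical stand-ins so the sketch runs without libmat2 installed.
def stub_malformed(filename: str):
    raise ValueError("malformed file: %s" % filename)


def stub_unsupported(filename: str):
    return None, "application/x-unknown"


def stub_supported(filename: str):
    return object(), "image/png"


if __name__ == "__main__":
    for stub in (stub_malformed, stub_unsupported, stub_supported):
        print(stub.__name__, "->", classify("example.png", stub))
```

With libmat2 0.12.0 installed, passing `parser_factory.get_parser` as the second argument should exercise the same three branches, assuming its behaviour matches the diff above.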



View it on GitLab: https://salsa.debian.org/pkg-privacy-team/mat2/-/commit/1d6ab2f4cb8a0c72cf6afa2f9e7eb59539fb38c0


