[Pkg-privacy-commits] [Git][pkg-privacy-team/mat2][upstream] New upstream version 0.12.0
Georg Faerber
georg at debian.org
Sat Dec 26 20:21:09 GMT 2020
Georg Faerber pushed to branch upstream at Privacy Maintainers / mat2
Commits:
1d6ab2f4 by Georg Faerber at 2020-12-26T18:24:10+00:00
New upstream version 0.12.0
- - - - -
19 changed files:
- .gitlab-ci.yml
- .pylintrc
- CHANGELOG.md
- README.md
- doc/mat2.1
- dolphin/mat2.desktop
- libmat2/audio.py
- libmat2/bubblewrap.py
- libmat2/images.py
- libmat2/office.py
- libmat2/parser_factory.py
- libmat2/pdf.py
- mat2
- nautilus/README.md
- nautilus/mat2.py
- setup.py
- tests/data/malformed_content_types.docx
- tests/test_climat2.py
- tests/test_corrupted_files.py
Changes:
=====================================
.gitlab-ci.yml
=====================================
@@ -31,9 +31,9 @@ linting:pylint:
image: $CONTAINER_REGISTRY:linting
stage: linting
script:
- - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
+ - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension,raise-missing-from,unsubscriptable-object --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
# Once nautilus-python is in Debian, decomment it form the line below
- - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
+ - pylint --disable=no-else-return,no-else-raise,no-else-continue,unnecessary-comprehension,raise-missing-from,unsubscriptable-object --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
linting:pyflakes:
image: $CONTAINER_REGISTRY:linting
@@ -66,7 +66,7 @@ tests:debian_with_bubblewrap:
<<: *prepare_env
script:
- su - mat2 -c "python3-coverage run --branch -m unittest discover -s tests/"
- - su - mat2 -c "python3-coverage report --fail-under=100 -m --include 'libmat2/*'"
+ - su - mat2 -c "python3-coverage report --fail-under=95 -m --include 'libmat2/*'"
tests:fedora:
image: $CONTAINER_REGISTRY:fedora
=====================================
.pylintrc
=====================================
@@ -14,4 +14,5 @@ disable=
catching-non-exception,
cell-var-from-loop,
locally-disabled,
+ raise-missing-from,
invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation
=====================================
CHANGELOG.md
=====================================
@@ -1,3 +1,13 @@
+# 0.12.0 - 2020-12-18
+
+- Improve significantly MS Office formats support
+- Fix some typos in the Nautilus extension
+- Improve reliability of the mp3, pdf and svg parsers
+- Improve compatibility with ffmpeg when sandboxing is used
+- Improve the dolphin extension usability
+- libmat2 now raises a ValueError on malformed files while trying to
+ find the right parser, instead of returning None
+
# 0.11.0 - 2020-03-29
- Improve significantly MS Office formats support
=====================================
README.md
=====================================
@@ -93,6 +93,23 @@ Note that mat2 **will not** clean files in-place, but will produce, for
example, with a file named "myfile.png" a cleaned version named
"myfile.cleaned.png".
+## Web interface
+
+It's possible to run mat2 as a web service, via
+[mat2-web](https://0xacab.org/jvoisin/mat2-web).
+
+## Desktop GUI
+
+For GNU/Linux desktops, it's possible to use the
+[Metadata Cleaner](https://gitlab.com/rmnvgr/metadata-cleaner) GTK application.
+
+# Supported formats
+
+The following formats are supported: avi, bmp, css, epub/ncx, flac, gif, jpeg,
+m4a/mp2/mp3/…, mp4, odc/odf/odg/odi/odp/ods/odt/…, off/opus/oga/spx/…, pdf,
+png, ppm, pptx/xlsx/docx/…, svg/svgz/…, tar/tar.gz/tar.bz2/tar.xz/…, tiff,
+torrent, wav, wmv, zip, …
+
# Notes about detecting metadata
While mat2 is doing its very best to display metadata when the `--show` flag is
@@ -126,11 +143,15 @@ of the guarantee that mat2 won't modify the data of their files, there is the
# Contact
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
-or the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
+or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev)
Should a more private contact be needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2 at dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
+# Donations
+
+If you want to donate some money, please give it to [Tails]( https://tails.boum.org/donate/?r=contribute ).
+
# License
This program is free software: you can redistribute it and/or modify
@@ -163,4 +184,3 @@ mat2 wouldn't exist without:
- friends
Many thanks to them!
-
=====================================
doc/mat2.1
=====================================
@@ -1,4 +1,4 @@
-.TH mat2 "1" "March 2020" "mat2 0.11.0" "User Commands"
+.TH mat2 "1" "December 2020" "mat2 0.12.0" "User Commands"
.SH NAME
mat2 \- the metadata anonymisation toolkit 2
=====================================
dolphin/mat2.desktop
=====================================
@@ -1,6 +1,6 @@
[Desktop Entry]
X-KDE-ServiceTypes=KonqPopupMenu/Plugin
-MimeType=application/pdf;application/vnd.oasis.opendocument.chart ;application/vnd.oasis.opendocument.formula ;application/vnd.oasis.opendocument.graphics ;application/vnd.oasis.opendocument.image ;application/vnd.oasis.opendocument.presentation ;application/vnd.oasis.opendocument.spreadsheet ;application/vnd.oasis.opendocument.text ;application/vnd.openxmlformats-officedocument.presentationml.presentation ;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ;application/vnd.openxmlformats-officedocument.wordprocessingml.document ;application/x-bittorrent ;application/zip ;audio/flac ;audio/mpeg ;audio/ogg ;audio/x-flac ;image/jpeg ;image/png ;image/tiff ;image/x-ms-bmp ;text/plain ;video/mp4 ;video/x-msvideo;
+MimeType=application/pdf;application/vnd.oasis.opendocument.chart;application/vnd.oasis.opendocument.formula;application/vnd.oasis.opendocument.graphics;application/vnd.oasis.opendocument.image;application/vnd.oasis.opendocument.presentation;application/vnd.oasis.opendocument.spreadsheet;application/vnd.oasis.opendocument.text;application/vnd.openxmlformats-officedocument.presentationml.presentation;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.openxmlformats-officedocument.wordprocessingml.document;application/x-bittorrent;application/zip;audio/flac;audio/mpeg;audio/ogg;audio/x-flac;image/jpeg;image/png;image/tiff;image/x-ms-bmp;text/plain;video/mp4;video/x-msvideo;
Actions=cleanMetadata;
Type=Service
=====================================
libmat2/audio.py
=====================================
@@ -37,6 +37,8 @@ class MP3Parser(MutagenParser):
def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {} # type: Dict[str, Union[str, dict]]
meta = mutagen.File(self.filename).tags
+ if not meta:
+ return metadata
for key in meta:
if not hasattr(meta[key], 'text'): # pragma: no cover
continue
=====================================
libmat2/bubblewrap.py
=====================================
@@ -29,7 +29,6 @@ def _get_bwrap_path() -> str:
raise RuntimeError("Unable to find bwrap") # pragma: no cover
-# pylint: disable=bad-whitespace
def _get_bwrap_args(tempdir: str,
input_filename: str,
output_filename: Optional[str] = None) -> List[str]:
@@ -38,7 +37,7 @@ def _get_bwrap_args(tempdir: str,
# XXX: use --ro-bind-try once all supported platforms
# have a bubblewrap recent enough to support it.
- ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', cwd]
+ ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', '/etc/alternatives', cwd]
for bind_dir in ro_bind_dirs:
if os.path.isdir(bind_dir): # pragma: no cover
ro_bind_args.extend(['--ro-bind', bind_dir, bind_dir])
@@ -77,7 +76,6 @@ def _get_bwrap_args(tempdir: str,
return args
-# pylint: disable=bad-whitespace
def run(args: List[str],
input_filename: str,
output_filename: Optional[str] = None,
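Editorial note on the `ro_bind_dirs` change: the loop binds a directory read-only only if it exists on the host, so adding `/etc/alternatives` is harmless on systems without it. The likely motivation (an assumption, based on the changelog entry about ffmpeg and sandboxing) is that binaries reached through alternatives-style symlinks must resolve inside the sandbox. A minimal sketch of that loop, using a hypothetical standalone helper `ro_bind_args` rather than the real `_get_bwrap_args`:

```python
import os
from typing import List


def ro_bind_args(dirs: List[str]) -> List[str]:
    """Build bwrap --ro-bind arguments for the directories that exist.

    Hypothetical helper for illustration; it mirrors only the loop
    shown in the hunk above, not the full bubblewrap argument list.
    """
    args = []  # type: List[str]
    for bind_dir in dirs:
        # Missing directories are skipped instead of being passed to bwrap.
        if os.path.isdir(bind_dir):
            args.extend(['--ro-bind', bind_dir, bind_dir])
    return args
```

Calling it with the list from the diff, for example `ro_bind_args(['/usr', '/etc/alternatives'])`, yields one `--ro-bind SRC DEST` triple per directory present on the host.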
=====================================
libmat2/images.py
=====================================
@@ -41,7 +41,7 @@ class SVGParser(exiftool.ExiftoolParser):
# The namespace is mandatory, but only the …/2000/svg is valid.
ns = 'http://www.w3.org/2000/svg'
- if meta.get('Xmlns', ns) == ns:
+ if meta.get('Xmlns') == ns:
meta.pop('Xmlns')
return meta
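Editorial note on the `Xmlns` change: `dict.get` with a default and `dict.pop` without one disagree when the key is absent, which is what the old condition allowed. A minimal sketch of the difference, using two hypothetical functions (`strip_ns_old`, `strip_ns_new`) that stand in for the parser method and are not libmat2 code:

```python
NS = 'http://www.w3.org/2000/svg'


def strip_ns_old(meta: dict) -> dict:
    # Old check: a missing 'Xmlns' falls back to NS, so the test passes...
    if meta.get('Xmlns', NS) == NS:
        meta.pop('Xmlns')  # ...and pop() raises KeyError on the absent key.
    return meta


def strip_ns_new(meta: dict) -> dict:
    # New check: a missing key yields None, which never equals NS.
    if meta.get('Xmlns') == NS:
        meta.pop('Xmlns')
    return meta
```

With the new form, metadata lacking an `Xmlns` entry passes through untouched, while the valid namespace is still dropped when present.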
=====================================
libmat2/office.py
=====================================
@@ -87,16 +87,21 @@ class MSOfficeParser(ZipParser):
self.files_to_keep = set(map(re.compile, { # type: ignore
r'^\[Content_Types\]\.xml$',
r'^_rels/\.rels$',
- r'^(?:word|ppt)/_rels/document\.xml\.rels$',
- r'^(?:word|ppt)/_rels/footer[0-9]*\.xml\.rels$',
- r'^(?:word|ppt)/_rels/header[0-9]*\.xml\.rels$',
+ r'^(?:word|ppt|xl)/_rels/document\.xml\.rels$',
+ r'^(?:word|ppt|xl)/_rels/footer[0-9]*\.xml\.rels$',
+ r'^(?:word|ppt|xl)/_rels/header[0-9]*\.xml\.rels$',
+ r'^(?:word|ppt|xl)/styles\.xml$',
+ # TODO: randomize axId ( https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oi29500/089f849f-fcd6-4fa0-a281-35aa6a432a16 )
+ r'^(?:word|ppt|xl)/charts/chart[0-9]*\.xml$',
+ r'^xl/workbook\.xml$',
+ r'^xl/worksheets/sheet[0-9]+\.xml$',
r'^ppt/slideLayouts/_rels/slideLayout[0-9]+\.xml\.rels$',
r'^ppt/slideLayouts/slideLayout[0-9]+\.xml$',
- r'^(?:word|ppt)/tableStyles\.xml$',
+ r'^(?:word|ppt|xl)/tableStyles\.xml$',
r'^ppt/slides/_rels/slide[0-9]*\.xml\.rels$',
r'^ppt/slides/slide[0-9]*\.xml$',
# https://msdn.microsoft.com/en-us/library/dd908153(v=office.12).aspx
- r'^(?:word|ppt)/stylesWithEffects\.xml$',
+ r'^(?:word|ppt|xl)/stylesWithEffects\.xml$',
r'^ppt/presentation\.xml$',
# TODO: check if p:bgRef can be randomized
r'^ppt/slideMasters/slideMaster[0-9]+\.xml',
@@ -106,20 +111,20 @@ class MSOfficeParser(ZipParser):
r'^customXml/',
r'webSettings\.xml$',
r'^docProps/custom\.xml$',
- r'^(?:word|ppt)/printerSettings/',
- r'^(?:word|ppt)/theme',
- r'^(?:word|ppt)/people\.xml$',
- r'^(?:word|ppt)/numbering\.xml$',
- r'^(?:word|ppt)/tags/',
+ r'^(?:word|ppt|xl)/printerSettings/',
+ r'^(?:word|ppt|xl)/theme',
+ r'^(?:word|ppt|xl)/people\.xml$',
+ r'^(?:word|ppt|xl)/numbering\.xml$',
+ r'^(?:word|ppt|xl)/tags/',
# View properties like view mode, last viewed slide etc
- r'^(?:word|ppt)/viewProps\.xml$',
+ r'^(?:word|ppt|xl)/viewProps\.xml$',
# Additional presentation-wide properties like printing properties,
# presentation show properties etc.
- r'^(?:word|ppt)/presProps\.xml$',
+ r'^(?:word|ppt|xl)/presProps\.xml$',
# we have an allowlist in self.files_to_keep,
# so we can trash everything else
- r'^(?:word|ppt)/_rels/',
+ r'^(?:word|ppt|xl)/_rels/',
}))
if self.__fill_files_to_keep_via_content_types() is False:
@@ -142,7 +147,7 @@ class MSOfficeParser(ZipParser):
except ET.ParseError:
return False
for c in tree:
- if 'PartName' not in c.attrib or 'ContentType' not in c.attrib:
+ if 'PartName' not in c.attrib or 'ContentType' not in c.attrib: # pragma: no cover
continue
elif c.attrib['ContentType'] in self.content_types_to_keep:
fname = c.attrib['PartName'][1:] # remove leading `/`
@@ -265,8 +270,8 @@ class MSOfficeParser(ZipParser):
logging.error("Unable to parse %s: %s", full_path, e)
return False
- if len(namespace.items()) != 1:
- return False # there should be only one namespace for Types
+ if len(namespace.items()) != 1: # pragma: no cover
+ logging.debug("Got several namespaces for Types: %s", namespace.items())
removed_fnames = set()
with zipfile.ZipFile(self.filename) as zin:
@@ -356,7 +361,7 @@ class MSOfficeParser(ZipParser):
if full_path.endswith('/[Content_Types].xml'):
# this file contains references to files that we might
# remove, and MS Office doesn't like dangling references
- if self.__remove_content_type_members(full_path) is False:
+ if self.__remove_content_type_members(full_path) is False: # pragma: no cover
return False
elif full_path.endswith('/word/document.xml'):
# this file contains the revisions
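Editorial note on the allowlist change: the effect of adding `xl` to the alternations, plus the new `xl/` entries, can be checked by matching a few archive member names against the patterns. The sketch below copies four patterns verbatim from the hunk above; the helper `is_kept` is hypothetical, and the use of `search` is an assumption about how libmat2 applies the compiled patterns:

```python
import re

# Four patterns copied from the new files_to_keep set in the diff.
FILES_TO_KEEP = [re.compile(p) for p in (
    r'^(?:word|ppt|xl)/styles\.xml$',
    r'^(?:word|ppt|xl)/charts/chart[0-9]*\.xml$',
    r'^xl/workbook\.xml$',
    r'^xl/worksheets/sheet[0-9]+\.xml$',
)]


def is_kept(member: str) -> bool:
    """Return True if a zip member name matches any allowlist pattern."""
    return any(p.search(member) for p in FILES_TO_KEEP)
```

Note the difference between `[0-9]*` and `[0-9]+`: `xl/charts/chart.xml` matches, while `xl/worksheets/sheet.xml` does not because the sheet pattern requires at least one digit.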
=====================================
libmat2/parser_factory.py
=====================================
@@ -1,4 +1,3 @@
-import logging
import glob
import os
import mimetypes
@@ -40,7 +39,10 @@ def _get_parsers() -> List[T]:
def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
- """ Return the appropriate parser for a given filename. """
+ """ Return the appropriate parser for a given filename.
+
+ :raises ValueError: Raised if the instantiation of the parser went wrong.
+ """
mtype, _ = mimetypes.guess_type(filename)
_, extension = os.path.splitext(filename)
@@ -53,10 +55,6 @@ def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
for parser_class in _get_parsers(): # type: ignore
if mtype in parser_class.mimetypes:
- try:
- return parser_class(filename), mtype
- except ValueError as e:
- logging.info("Got an exception when trying to instantiate "
- "%s for %s: %s", parser_class, filename, e)
- return None, mtype
+ # This instantiation might raise a ValueError on malformed files
+ return parser_class(filename), mtype
return None, mtype
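Editorial note for callers of `get_parser`: after this change there are three outcomes to handle instead of two, because a malformed file now propagates a `ValueError` rather than yielding `None`. A minimal sketch of the calling pattern, mirroring what the `mat2` script does further down in this diff. The function `describe` is hypothetical, and it takes the factory as a parameter so the example runs without libmat2 installed:

```python
from typing import Callable, Optional, Tuple

ParserFactory = Callable[[str], Tuple[Optional[object], Optional[str]]]


def describe(filename: str, get_parser: ParserFactory) -> str:
    """Classify the outcomes a caller of get_parser has to handle."""
    try:
        parser, mtype = get_parser(filename)
    except ValueError as e:
        # New in 0.12.0: malformed files surface as an exception.
        return "malformed: %s" % e
    if parser is None:
        # Unchanged: no parser class claims this mimetype.
        return "unsupported: %s" % mtype
    return "ok: %s" % mtype
```

In real use the second argument would be `libmat2.parser_factory.get_parser`; code written against 0.11.0 that only checks for `None` will otherwise see an uncaught exception on malformed input.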
=====================================
libmat2/pdf.py
=====================================
@@ -84,6 +84,9 @@ class PDFParser(abstract.AbstractParser):
for pagenum in range(pages_count):
page = document.get_page(pagenum)
+ if page is None: # pragma: no cover
+ logging.error("Unable to get PDF pages")
+ return False
page_width, page_height = page.get_size()
logging.info("Rendering page %d/%d", pagenum + 1, pages_count)
=====================================
mat2
=====================================
@@ -17,7 +17,7 @@ except ValueError as e:
print(e)
sys.exit(1)
-__version__ = '0.11.0'
+__version__ = '0.12.0'
# Make pyflakes happy
assert Set
@@ -85,7 +85,11 @@ def show_meta(filename: str, sandbox: bool):
if not __check_file(filename):
return
- p, mtype = parser_factory.get_parser(filename) # type: ignore
+ try:
+ p, mtype = parser_factory.get_parser(filename) # type: ignore
+ except ValueError as e:
+ print("[-] something went wrong when processing %s: %s" % (filename, e))
+ return
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return
@@ -126,7 +130,11 @@ def clean_meta(filename: str, is_lightweight: bool, inplace: bool, sandbox: bool
if not __check_file(filename, mode):
return False
- p, mtype = parser_factory.get_parser(filename) # type: ignore
+ try:
+ p, mtype = parser_factory.get_parser(filename) # type: ignore
+ except ValueError as e:
+ print("[-] something went wrong when cleaning %s: %s" % (filename, e))
+ return False
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return False
=====================================
nautilus/README.md
=====================================
@@ -9,7 +9,7 @@
Simply copy the `mat2.py` file to `~/.local/share/nautilus-python/extensions`,
and launch Nautilus; you should now have a "Remove metadata" item in the
-right-clic menu on supported files.
+right-click menu on supported files.
Please note: This is not needed if using a distribution provided package. It
only applies if installing from source.
=====================================
nautilus/mat2.py
=====================================
@@ -2,7 +2,7 @@
"""
Because writing GUI is non-trivial (cf. https://0xacab.org/jvoisin/mat2/issues/3),
-we decided to write a Nautilus extensions instead
+we decided to write a Nautilus extension instead
(cf. https://0xacab.org/jvoisin/mat2/issues/2).
The code is a little bit convoluted because Gtk isn't thread-safe,
@@ -36,7 +36,7 @@ def _remove_metadata(fpath) -> Tuple[bool, Optional[str]]:
return parser.remove_all(), mtype
class Mat2Extension(GObject.GObject, Nautilus.MenuProvider, Nautilus.LocationWidgetProvider):
- """ This class adds an item to the right-clic menu in Nautilus. """
+ """ This class adds an item to the right-click menu in Nautilus. """
def __init__(self):
super().__init__()
=====================================
setup.py
=====================================
@@ -5,7 +5,7 @@ with open("README.md", encoding='utf-8') as fh:
setuptools.setup(
name="mat2",
- version='0.11.0',
+ version='0.12.0',
author="Julien (jvoisin) Voisin",
author_email="julien.voisin+mat2 at dustri.org",
description="A handy tool to trash your metadata",
=====================================
tests/data/malformed_content_types.docx
=====================================
Binary files a/tests/data/malformed_content_types.docx and b/tests/data/malformed_content_types.docx differ
=====================================
tests/test_climat2.py
=====================================
@@ -29,7 +29,7 @@ class TestHelp(unittest.TestCase):
self.assertIn(b' [-v] [-l]', stdout)
self.assertIn(b'[--check-dependencies]', stdout)
self.assertIn(b'[-L | -s]', stdout)
- self.assertIn(b'[files [files ...]]', stdout)
+ self.assertIn(b'[files ...]', stdout)
def test_no_arg(self):
proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
@@ -39,7 +39,7 @@ class TestHelp(unittest.TestCase):
self.assertIn(b'[--inplace]', stdout)
self.assertIn(b'[--no-sandbox]', stdout)
self.assertIn(b' [-v] [-l] [--check-dependencies] [-L | -s]', stdout)
- self.assertIn(b'[files [files ...]]', stdout)
+ self.assertIn(b'[files ...]', stdout)
class TestVersion(unittest.TestCase):
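Editorial note on the usage-string assertions: the expected text most likely changed because Python 3.9 simplified how `argparse` renders a `nargs='*'` positional, from `[files [files ...]]` to `[files ...]`. That reading, and the assumption that mat2 declares `files` with `nargs='*'`, are not stated in the commit. A minimal sketch that shows what the running interpreter produces:

```python
import argparse

# Assumed declaration; the real mat2 parser defines many more options.
parser = argparse.ArgumentParser(prog='mat2')
parser.add_argument('files', nargs='*')

# format_usage() returns the same usage line that --help prints first.
usage = parser.format_usage()
print(usage)
```

A test that has to pass on both older and newer interpreters could accept either rendering instead of pinning one.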
=====================================
tests/test_corrupted_files.py
=====================================
@@ -65,8 +65,10 @@ class TestCorruptedEmbedded(unittest.TestCase):
def test_docx(self):
shutil.copy('./tests/data/embedded_corrupted.docx', './tests/data/clean.docx')
parser, _ = parser_factory.get_parser('./tests/data/clean.docx')
- self.assertFalse(parser.remove_all())
- self.assertIsNotNone(parser.get_meta())
+ with self.assertRaises(ValueError):
+ parser.remove_all()
+ with self.assertRaises(ValueError):
+ self.assertIsNotNone(parser.get_meta())
os.remove('./tests/data/clean.docx')
def test_odt(self):
@@ -89,9 +91,8 @@ class TestExplicitelyUnsupportedFiles(unittest.TestCase):
class TestWrongContentTypesFileOffice(unittest.TestCase):
def test_office_incomplete(self):
shutil.copy('./tests/data/malformed_content_types.docx', './tests/data/clean.docx')
- p = office.MSOfficeParser('./tests/data/clean.docx')
- self.assertIsNotNone(p)
- self.assertFalse(p.remove_all())
+ with self.assertRaises(ValueError):
+ office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
def test_office_broken(self):
@@ -121,8 +122,8 @@ class TestCorruptedFiles(unittest.TestCase):
def test_png2(self):
shutil.copy('./tests/test_libmat2.py', './tests/clean.png')
- parser, _ = parser_factory.get_parser('./tests/clean.png')
- self.assertIsNone(parser)
+ with self.assertRaises(ValueError):
+ parser_factory.get_parser('./tests/clean.png')
os.remove('./tests/clean.png')
def test_torrent(self):
@@ -238,10 +239,10 @@ class TestCorruptedFiles(unittest.TestCase):
zout.write('./tests/data/embedded_corrupted.docx')
p, mimetype = parser_factory.get_parser('./tests/data/clean.zip')
self.assertEqual(mimetype, 'application/zip')
- meta = p.get_meta()
- self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
- self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
- self.assertFalse(p.remove_all())
+ with self.assertRaises(ValueError):
+ p.get_meta()
+ with self.assertRaises(ValueError):
+ self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.zip')
def test_html(self):
@@ -316,10 +317,10 @@ class TestCorruptedFiles(unittest.TestCase):
zout.addfile(tarinfo, f)
p, mimetype = parser_factory.get_parser('./tests/data/clean.tar')
self.assertEqual(mimetype, 'application/x-tar')
- meta = p.get_meta()
- self.assertEqual(meta['./tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
- self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
- self.assertFalse(p.remove_all())
+ with self.assertRaises(ValueError):
+ p.get_meta()
+ with self.assertRaises(ValueError):
+ self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.tar')
shutil.copy('./tests/data/dirty.png', './tests/data/clean.tar')
View it on GitLab: https://salsa.debian.org/pkg-privacy-team/mat2/-/commit/1d6ab2f4cb8a0c72cf6afa2f9e7eb59539fb38c0