[Pkg-privacy-commits] [Git][pkg-privacy-team/mat2][upstream] New upstream version 0.14.0
Georg Faerber (@georg)
georg at debian.org
Thu Oct 23 14:47:53 BST 2025
Georg Faerber pushed to branch upstream at Privacy Maintainers / mat2
Commits:
f2036480 by Georg Faerber at 2025-10-23T13:23:49+00:00
New upstream version 0.14.0
- - - - -
27 changed files:
- + .github/workflows/builds.yaml
- − .gitlab-ci.yml
- − .pylintrc
- CHANGELOG.md
- CONTRIBUTING.md
- INSTALL.md
- README.md
- doc/comparison_to_others.md
- doc/mat2.1
- libmat2/abstract.py
- libmat2/archive.py
- libmat2/audio.py
- − libmat2/bubblewrap.py
- libmat2/epub.py
- libmat2/exiftool.py
- libmat2/images.py
- libmat2/office.py
- libmat2/pdf.py
- libmat2/video.py
- mat2
- pyproject.toml
- setup.py
- + tests/data/dirty.webp
- tests/fuzz.py
- tests/test_climat2.py
- tests/test_libmat2.py
- tests/test_lightweight_cleaning.py
Changes:
=====================================
.github/workflows/builds.yaml
=====================================
@@ -0,0 +1,47 @@
+name: CI for Python versions
+on:
+ pull_request:
+ push:
+ schedule:
+ - cron: '0 16 * * 5'
+
+jobs:
+ linting:
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v5
+ - uses: actions/setup-python@v5
+ - run: pip install ruff
+ - run: |
+ ruff check .
+ build:
+ needs: linting
+ runs-on: ubuntu-latest
+ strategy:
+ matrix:
+ python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
+ steps:
+ - uses: actions/checkout@v5
+ - name: Setup Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python-version }}
+ - name: Install dependencies
+ run: |
+ sudo apt update && \
+ sudo apt-get install --no-install-recommends --no-install-suggests --yes \
+ ffmpeg \
+ gir1.2-gdkpixbuf-2.0 \
+ gir1.2-poppler-0.18 \
+ gir1.2-rsvg-2.0 \
+ gobject-introspection \
+ libcairo2-dev \
+ libgirepository1.0-dev \
+ libgirepository-2.0-dev \
+ libimage-exiftool-perl \
+ python3-gi-cairo \
+ python3-mutagen \
+ webp-pixbuf-loader
+ pip install .
+ - name: Build and run the testsuite
+ run: python3 -m unittest discover -v
=====================================
.gitlab-ci.yml deleted
=====================================
@@ -1,103 +0,0 @@
-variables:
- CONTAINER_REGISTRY: $CI_REGISTRY/georg/mat2-ci-images
- GIT_DEPTH: "5"
- GIT_STRATEGY: clone
-
-stages:
- - linting
- - test
-
-.prepare_env: &prepare_env
- before_script: # This is needed to not run the testsuite as root
- - useradd --home-dir ${CI_PROJECT_DIR} mat2
- - chown -R mat2 .
-
-linting:ruff:
- image: $CONTAINER_REGISTRY:linting
- stage: linting
- script:
- - apt update
- - apt install -qqy --no-install-recommends python3-venv
- - python3 -m venv venv
- - source venv/bin/activate
- - pip3 install ruff
- - ruff check .
-
-linting:mypy:
- image: $CONTAINER_REGISTRY:linting
- stage: linting
- script:
- - mypy --ignore-missing-imports mat2 libmat2/*.py
-
-tests:archlinux:
- image: $CONTAINER_REGISTRY:archlinux
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:debian:
- image: $CONTAINER_REGISTRY:debian
- stage: test
- <<: *prepare_env
- script:
- - apt-get -qqy purge bubblewrap
- - su - mat2 -c "python3-coverage run --branch -m unittest discover -s tests/"
- - su - mat2 -c "python3-coverage report --fail-under=95 -m --include 'libmat2/*'"
-
-tests:debian_with_bubblewrap:
- image: $CONTAINER_REGISTRY:debian
- stage: test
- allow_failure: true
- <<: *prepare_env
- script:
- - apt-get -qqy install bubblewrap
- - python3 -m unittest discover -v
-
-tests:fedora:
- image: $CONTAINER_REGISTRY:fedora
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:gentoo:
- image: $CONTAINER_REGISTRY:gentoo
- stage: test
- <<: *prepare_env
- script:
- - su - mat2 -c "python3 -m unittest discover -v"
-
-tests:python3.7:
- image: $CONTAINER_REGISTRY:python3.7
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:python3.8:
- image: $CONTAINER_REGISTRY:python3.8
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:python3.9:
- image: $CONTAINER_REGISTRY:python3.9
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:python3.10:
- image: $CONTAINER_REGISTRY:python3.10
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:python3.11:
- image: $CONTAINER_REGISTRY:python3.11
- stage: test
- script:
- - python3 -m unittest discover -v
-
-tests:python3.12:
- image: $CONTAINER_REGISTRY:python3.12
- stage: test
- script:
- - python3 -m unittest discover -v
=====================================
.pylintrc deleted
=====================================
@@ -1,18 +0,0 @@
-[FORMAT]
-good-names=e,f,i,x,s
-max-locals=20
-
-[MESSAGES CONTROL]
-disable=
- fixme,
- invalid-name,
- duplicate-code,
- missing-docstring,
- protected-access,
- abstract-method,
- wrong-import-position,
- catching-non-exception,
- cell-var-from-loop,
- locally-disabled,
- raise-missing-from,
- invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation
=====================================
CHANGELOG.md
=====================================
@@ -1,15 +1,24 @@
+# 0.14.0 - 2025-10-23
+- Add webp support
+- Improve reliability
+- Correctly handle PDF with weird filenames
+- Improve epub support
+- Improve MSOffice documents support
+- Add Python 3.13 and 3.14 support
+- Remove bubblewrap sandboxing
+
# 0.13.5 - 2025-01-09
- Keep orientation metadata on jpeg and tiff files
- Improve cairo-related error/exceptions handling
- Improve the logging
- Improve the sandboxing
-- Improve Python3.12 support
+- Improve Python 3.12 support
- Improve MSOffice documents handling
# 0.13.4 - 2023-08-02
- Add documentation about mat2 on OSX
-- Make use of python3.7 constructs to simplify code
+- Make use of python 3.7 constructs to simplify code
- Use moderner type annotations
- Harden get_meta in archive.py against variants of CVE-2022-35410
- Improve MSOffice document support
@@ -88,7 +97,7 @@
# 0.10.0 - 2019-11-30
-- Make mat2 work on Python3.8
+- Make mat2 work on Python 3.8
- Minor improvement of ppt handling
- Minor improvement of odt handling
- Add an integration KDE's file manager: Dolphin
=====================================
CONTRIBUTING.md
=====================================
@@ -1,15 +1,11 @@
# Contributing to mat2
-The main repository for mat2 is on [0xacab]( https://0xacab.org/jvoisin/mat2 ),
+The main repository for mat2 is on [github]( https://github.com/jvoisin/mat2 ),
but you can send patches to jvoisin by [email](https://dustri.org/) if you prefer.
-Do feel free to pick up [an issue]( https://0xacab.org/jvoisin/mat2/issues )
+Do feel free to pick up [an issue]( https://github.com/jvoisin/mat2/issues )
and to send a pull-request.
-Before sending the pull-request, please do check that everything is fine by
-running the full test suite in GitLab. To do that, after forking mat2 in GitLab,
-you need to go in Settings -> CI/CD -> Runner and there enable shared runners.
-
Mat2 also has unit tests (that are also run in the full test suite). You can run
them with `python3 -m unittest discover -v`.
@@ -27,19 +23,19 @@ Since mat2 is written in Python3, please conform as much as possible to the
# Doing a release
-1. Update the [changelog](https://0xacab.org/jvoisin/mat2/blob/master/CHANGELOG.md)
-2. Update the version in the [mat2](https://0xacab.org/jvoisin/mat2/blob/master/mat2) file
-3. Update the version in the [setup.py](https://0xacab.org/jvoisin/mat2/blob/master/setup.py) file
-4. Update the version in the [pyproject.toml](https://0xacab.org/jvoisin/mat2/blob/master/yproject.toml) file
-5. Update the version and date in the [man page](https://0xacab.org/jvoisin/mat2/blob/master/doc/mat2.1)
+1. Update the [changelog](https://github.com/jvoisin/mat2/blob/master/CHANGELOG.md)
+2. Update the version in the [mat2](https://github.com/jvoisin/mat2/blob/master/mat2) file
+3. Update the version in the [setup.py](https://github.com/jvoisin/mat2/blob/master/setup.py) file
+4. Update the version in the [pyproject.toml](https://github.com/jvoisin/mat2/blob/master/yproject.toml) file
+5. Update the version and date in the [man page](https://github.com/jvoisin/mat2/blob/master/doc/mat2.1)
6. Commit the modified files
7. Create a tag with `git tag -s $VERSION`
8. Push the commit with `git push origin master`
9. Push the tag with `git push --tags`
-10. Download the gitlab archive of the release
+10. Download the github archive of the release
11. Diff it against the local copy
12. If there is no difference, sign the archive with `gpg --armor --detach-sign mat2-$VERSION.tar.xz`
-13. Upload the signature on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
+13. Upload the signature on github [tag page](https://github.com/jvoisin/mat2/tags) and add the changelog there
14. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
15. Sign'n'upload the new version on pypi with `python3 setup.py sdist bdist_wheel` then `twine upload -s dist/*`
16. Do the secret release dance
=====================================
INSTALL.md
=====================================
@@ -11,11 +11,6 @@ pip3 install mat2
# GNU/Linux
-## Optional dependencies
-
-When [bubblewrap](https://github.com/projectatomic/bubblewrap) is
-installed, mat2 uses it to sandbox any external processes it invokes.
-
## Arch Linux
Thanks to [kpcyrd](https://archlinux.org/packages/?maintainer=kpcyrd), there is an package available on
=====================================
README.md
=====================================
@@ -39,7 +39,6 @@ If you prefer a regular graphical user interface, you might be interested in
- `gir1.2-rsvg-2.0` for svg support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
-- `bubblewrap`, optionally, for sandboxing
Please note that mat2 requires at least Python3.5.
@@ -65,7 +64,7 @@ $ python3-coverage report --include -m --include /libmat2/*'
# How to use mat2
```
-usage: mat2 [-h] [-V] [--unknown-members policy] [--inplace] [--no-sandbox]
+usage: mat2 [-h] [-V] [--unknown-members policy] [--inplace]
[-v] [-l] [--check-dependencies] [-L | -s]
[files [files ...]]
@@ -82,7 +81,6 @@ optional arguments:
(policy should be one of: abort, omit, keep) [Default:
abort]
--inplace clean in place, without backup
- --no-sandbox Disable bubblewrap's sandboxing
-v, --version show program's version number and exit
-l, --list list all supported fileformats
--check-dependencies check if mat2 has all the dependencies it needs
@@ -149,7 +147,7 @@ of the guarantee that mat2 won't modify the data of their files, there is the
# Contact
-If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
+If possible, use the [issues system](https://github.com/jvoisin/mat2/issues)
or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev)
Should a more private contact be needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
=====================================
doc/comparison_to_others.md
=====================================
@@ -19,14 +19,14 @@ details.
# jpegoptim, optipng, …
While designed to reduce as much as possible the size of pictures,
-those software can be used to remove metadata. They usually have very good
+those software can be used to remove metadata. They usually have excellent
support for a single picture format, and can be used in place of mat2 for them.
# PDF Redact Tools
[PDF Redact Tools](https://github.com/firstlookmedia/pdf-redact-tools) is
-a software developed by the people from [First Look
+software developed by the people from [First Look
Media](https://firstlook.media/), the entity behind, amongst other things,
[The Intercept](https://theintercept.com/).
@@ -34,13 +34,13 @@ The tool uses roughly the same approach than mat2 to deal with PDF,
which is unfortunately the only fileformat that it does support.
It's interesting to note that it has counter-measures against
[yellow dots](https://en.wikipedia.org/wiki/Machine_Identification_Code),
-a capacity that mat2 [doesn't possess yet](https://0xacab.org/jvoisin/mat2/issues/43).
+a capacity that mat2 doesn't have.
# Exiv2
[Exiv2](https://www.exiv2.org/) was considered for mat2,
-but it currently [misses a lot of metadata](https://0xacab.org/jvoisin/mat2/issues/85)
+but it currently misses a lot of metadata.
# Others non open source software/online service
=====================================
doc/mat2.1
=====================================
@@ -1,4 +1,4 @@
-.TH mat2 "1" "January 2025" "mat2 0.13.5" "User Commands"
+.TH mat2 "1" "October 2025" "mat2 0.14.0" "User Commands"
.SH NAME
mat2 \- the metadata anonymisation toolkit 2
@@ -46,9 +46,6 @@ list harmful metadata detectable by mat2 without removing them
\fB\-L\fR, \fB\-\-lightweight\fR
remove SOME metadata
.TP
-\fB\--no-sandbox\fR
-disable bubblewrap's sandboxing
-.TP
\fB\--inplace\fR
clean in place, without backup
@@ -84,7 +81,7 @@ but keep in mind by doing so, some metadata \fBwon't be cleaned\fR.
While mat2 does its very best to remove every single metadata,
it's still in beta, and \fBsome\fR might remain. Should you encounter
-some issues, check the bugtracker: https://0xacab.org/jvoisin/mat2/issues
+some issues, check the bugtracker: https://github.com/jvoisin/mat2/issues
.PP
Please use accordingly and be careful.
=====================================
libmat2/abstract.py
=====================================
@@ -30,7 +30,6 @@ class AbstractParser(abc.ABC):
self.output_filename = fname + '.cleaned' + extension
self.lightweight_cleaning = False
- self.sandbox = True
@abc.abstractmethod
def get_meta(self) -> Dict[str, Union[str, Dict]]:
=====================================
libmat2/archive.py
=====================================
@@ -152,7 +152,10 @@ class ArchiveBasedAbstractParser(abstract.AbstractParser):
self.filename, member_name, full_path)
break
- zin.extract(member=item, path=temp_folder)
+ try:
+ zin.extract(member=item, path=temp_folder)
+ except OSError as e:
+ logging.error("Unable to extraxt %s from %s: %s", item, self.filename, e)
os.chmod(full_path, stat.S_IRUSR)
@@ -161,7 +164,6 @@ class ArchiveBasedAbstractParser(abstract.AbstractParser):
member_parser, _ = parser_factory.get_parser(full_path) # type: ignore
if member_parser:
- member_parser.sandbox = self.sandbox
local_meta = {**local_meta, **member_parser.get_meta()}
if local_meta:
@@ -249,7 +251,6 @@ class ArchiveBasedAbstractParser(abstract.AbstractParser):
abort = True
continue
else:
- member_parser.sandbox = self.sandbox
if member_parser.remove_all() is False:
logging.warning("In file %s, something went wrong \
with the cleaning of %s \
=====================================
libmat2/audio.py
=====================================
@@ -84,7 +84,6 @@ class FLACParser(MutagenParser):
p, _ = parser_factory.get_parser(fname) # type: ignore
if p is None:
raise ValueError
- p.sandbox = self.sandbox
# Mypy chokes on ternaries :/
meta[name] = p.get_meta() if p else 'harmful data' # type: ignore
os.remove(fname)
=====================================
libmat2/bubblewrap.py deleted
=====================================
@@ -1,113 +0,0 @@
-"""
-Wrapper around a subset of the subprocess module,
-that uses bwrap (bubblewrap) when it is available.
-
-Instead of importing subprocess, other modules should use this as follows:
-
- from . import subprocess
-"""
-
-import os
-import shutil
-import subprocess
-import tempfile
-import functools
-from typing import Optional, List
-
-
-__all__ = ['PIPE', 'run', 'CalledProcessError']
-PIPE = subprocess.PIPE
-CalledProcessError = subprocess.CalledProcessError
-
-# pylint: disable=subprocess-run-check
-
-
-@functools.lru_cache(maxsize=None)
-def _get_bwrap_path() -> str:
- which_path = shutil.which('bwrap')
- if which_path:
- return which_path
-
- raise RuntimeError("Unable to find bwrap") # pragma: no cover
-
-
-def _get_bwrap_args(tempdir: str,
- input_filename: str,
- output_filename: Optional[str] = None) -> List[str]:
- ro_bind_args = []
- cwd = os.getcwd()
-
- # XXX: use --ro-bind-try once all supported platforms
- # have a bubblewrap recent enough to support it.
- ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', '/etc/alternatives', cwd]
- for bind_dir in ro_bind_dirs:
- if os.path.isdir(bind_dir): # pragma: no cover
- ro_bind_args.extend(['--ro-bind', bind_dir, bind_dir])
-
- ro_bind_files = ['/etc/ld.so.cache']
- for bind_file in ro_bind_files:
- if os.path.isfile(bind_file): # pragma: no cover
- ro_bind_args.extend(['--ro-bind', bind_file, bind_file])
-
- args = ro_bind_args + \
- ['--dev', '/dev',
- '--proc', '/proc',
- '--chdir', cwd,
- '--unshare-user-try',
- '--unshare-ipc',
- '--unshare-pid',
- '--unshare-net',
- '--unshare-uts',
- '--unshare-cgroup-try',
- '--new-session',
- '--cap-drop', 'all',
- # XXX: enable --die-with-parent once all supported platforms have
- # a bubblewrap recent enough to support it.
- # '--die-with-parent',
- ]
-
- if output_filename:
- # Mount an empty temporary directory where the sandboxed
- # process will create its output file
- output_dirname = os.path.dirname(os.path.abspath(output_filename))
- args.extend(['--bind', tempdir, output_dirname])
-
- absolute_input_filename = os.path.abspath(input_filename)
- args.extend(['--ro-bind', absolute_input_filename, absolute_input_filename])
-
- return args
-
-
-def run(args: List[str],
- input_filename: str,
- output_filename: Optional[str] = None,
- **kwargs) -> subprocess.CompletedProcess:
- """Wrapper around `subprocess.run`, that uses bwrap (bubblewrap) if it
- is available.
-
- Extra supported keyword arguments:
-
- - `input_filename`, made available read-only in the sandbox
- - `output_filename`, where the file created by the sandboxed process
- is copied upon successful completion; an empty temporary directory
- is made visible as the parent directory of this file in the sandbox.
- Optional: one valid use case is to invoke an external process
- to inspect metadata present in a file.
- """
- try:
- bwrap_path = _get_bwrap_path()
- except RuntimeError: # pragma: no cover
- # bubblewrap is not installed ⇒ short-circuit
- return subprocess.run(args, **kwargs)
-
- with tempfile.TemporaryDirectory() as tempdir:
- prefix_args = [bwrap_path] + \
- _get_bwrap_args(input_filename=input_filename,
- output_filename=output_filename,
- tempdir=tempdir)
- completed_process = subprocess.run(prefix_args + args, **kwargs)
- if output_filename and completed_process.returncode == 0:
- shutil.copy(os.path.join(tempdir, os.path.basename(output_filename)),
- output_filename)
-
- return completed_process
=====================================
libmat2/epub.py
=====================================
@@ -16,6 +16,7 @@ class EPUBParser(archive.ZipParser):
super().__init__(filename)
self.files_to_keep = set(map(re.compile, { # type: ignore
'META-INF/container.xml',
+ 'META-INF/com.apple.ibooks.display-options.xml', # specify is "specified fonts" should be used
'mimetype',
'OEBPS/content.opf',
'content.opf',
=====================================
libmat2/exiftool.py
=====================================
@@ -7,7 +7,6 @@ import subprocess
from typing import Union, Set, Dict
from . import abstract
-from . import bubblewrap
class ExiftoolParser(abstract.AbstractParser):
@@ -19,15 +18,9 @@ class ExiftoolParser(abstract.AbstractParser):
def get_meta(self) -> Dict[str, Union[str, Dict]]:
try:
- if self.sandbox:
- out = bubblewrap.run([_get_exiftool_path(), '-json',
- self.filename],
- input_filename=self.filename,
- check=True, stdout=subprocess.PIPE).stdout
- else:
- out = subprocess.run([_get_exiftool_path(), '-json',
- self.filename],
- check=True, stdout=subprocess.PIPE).stdout
+ out = subprocess.run([_get_exiftool_path(), '-json',
+ self.filename],
+ check=True, stdout=subprocess.PIPE).stdout
except subprocess.CalledProcessError: # pragma: no cover
raise ValueError
meta = json.loads(out.decode('utf-8'))[0]
@@ -56,12 +49,7 @@ class ExiftoolParser(abstract.AbstractParser):
'-o', self.output_filename,
self.filename]
try:
- if self.sandbox:
- bubblewrap.run(cmd, check=True,
- input_filename=self.filename,
- output_filename=self.output_filename)
- else:
- subprocess.run(cmd, check=True)
+ subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e: # pragma: no cover
logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
return False
=====================================
libmat2/images.py
=====================================
@@ -196,3 +196,15 @@ class HEICParser(exiftool.ExiftoolParser):
def remove_all(self) -> bool:
return self._lightweight_cleanup()
+
+class WEBPParser(GdkPixbufAbstractParser):
+ mimetypes = {'image/webp'}
+ meta_allowlist = {'SourceFile', 'ExifToolVersion', 'FileName',
+ 'Directory', 'FileSize', 'FileModifyDate',
+ 'FileAccessDate', "FileInodeChangeDate",
+ 'FilePermissions', 'FileType', 'FileTypeExtension',
+ 'MIMEType', 'ImageWidth', 'ImageSize', 'BitsPerSample',
+ 'ColorComponents', 'EncodingProcess', 'JFIFVersion',
+ 'ResolutionUnit', 'XResolution', 'YCbCrSubSampling',
+ 'YResolution', 'Megapixels', 'ImageHeight', 'Orientation',
+ 'HorizontalScale', 'VerticalScale', 'VP8Version'}
=====================================
libmat2/office.py
=====================================
@@ -135,6 +135,7 @@ class MSOfficeParser(ZipParser):
r'^customXml/',
r'webSettings\.xml$',
r'^docProps/custom\.xml$',
+ r'^docProps/thumbnail.wmf$',
r'^(?:word|ppt|xl)/printerSettings/',
r'^(?:word|ppt|xl)/theme',
r'^(?:word|ppt|xl)/people\.xml$',
=====================================
libmat2/pdf.py
=====================================
@@ -27,7 +27,7 @@ class PDFParser(abstract.AbstractParser):
def __init__(self, filename):
super().__init__(filename)
- self.uri = 'file://' + os.path.abspath(self.filename)
+ self.uri = 'file://' + GLib.Uri.escape_string(os.path.abspath(self.filename), '/', True)
self.__scale = 200 / 72.0 # how much precision do we want for the render
try: # Check now that the file is valid, to avoid surprises later
Poppler.Document.new_from_file(self.uri, None)
@@ -125,11 +125,11 @@ class PDFParser(abstract.AbstractParser):
@staticmethod
def __remove_superficial_meta(in_file: str, out_file: str) -> bool:
- document = Poppler.Document.new_from_file('file://' + in_file)
+ document = Poppler.Document.new_from_file('file://' + GLib.Uri.escape_string(in_file, '/', True))
document.set_producer('')
document.set_creator('')
document.set_creation_date(-1)
- document.save('file://' + os.path.abspath(out_file))
+ document.save('file://' + GLib.Uri.escape_string(os.path.abspath(out_file), '/', True))
# Cairo adds "/Producer" and "/CreationDate", and Poppler sometimes
# fails to remove them, we have to use this terrible regex.
=====================================
libmat2/video.py
=====================================
@@ -6,8 +6,6 @@ import logging
from typing import Union, Dict
from . import exiftool
-from . import bubblewrap
-
class AbstractFFmpegParser(exiftool.ExiftoolParser):
""" Abstract parser for all FFmpeg-based ones, mainly for video. """
@@ -34,14 +32,9 @@ class AbstractFFmpegParser(exiftool.ExiftoolParser):
'-flags:a', '+bitexact', # don't add any metadata
self.output_filename]
try:
- if self.sandbox:
- bubblewrap.run(cmd, check=True,
- input_filename=self.filename,
- output_filename=self.output_filename)
- else:
- subprocess.run(cmd, check=True)
+ subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
- logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
+ logging.error("Something went wrong during the processing of %s: return code %d", self.filename, e.returncode)
return False
return True
=====================================
mat2
=====================================
@@ -9,6 +9,7 @@ import argparse
import logging
import unicodedata
import concurrent.futures
+import warnings
try:
from libmat2 import parser_factory, UNSUPPORTED_EXTENSIONS
@@ -17,7 +18,7 @@ except ValueError as ex:
print(ex)
sys.exit(1)
-__version__ = '0.13.5'
+__version__ = '0.14.0'
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.WARNING)
@@ -57,8 +58,9 @@ def create_arg_parser() -> argparse.ArgumentParser:
', '.join(p.value for p in UnknownMemberPolicy))
parser.add_argument('--inplace', action='store_true',
help='clean in place, without backup')
- parser.add_argument('--no-sandbox', dest='sandbox', action='store_false',
- default=True, help='Disable bubblewrap\'s sandboxing')
+ parser.add_argument('--no-sandbox', dest='sandbox', action='store_true',
+ default=False, help='Disable bubblewrap\'s sandboxing')
+
excl_group = parser.add_mutually_exclusive_group()
excl_group.add_argument('files', nargs='*', help='the files to process',
@@ -82,7 +84,7 @@ def create_arg_parser() -> argparse.ArgumentParser:
return parser
-def show_meta(filename: str, sandbox: bool):
+def show_meta(filename: str):
if not __check_file(filename):
return
@@ -94,7 +96,6 @@ def show_meta(filename: str, sandbox: bool):
if p is None:
__print_without_chars("[-] %s's format (%s) is not supported" % (filename, mtype))
return
- p.sandbox = sandbox
__print_meta(filename, p.get_meta())
@@ -119,7 +120,7 @@ def __print_meta(filename: str, metadata: Dict, depth: int = 1):
pass # for things that aren't iterable
-def clean_meta(filename: str, is_lightweight: bool, inplace: bool, sandbox: bool,
+def clean_meta(filename: str, is_lightweight: bool, inplace: bool,
policy: UnknownMemberPolicy) -> bool:
mode = (os.R_OK | os.W_OK) if inplace else os.R_OK
if not __check_file(filename, mode):
@@ -135,7 +136,6 @@ def clean_meta(filename: str, is_lightweight: bool, inplace: bool, sandbox: bool
return False
p.unknown_member_policy = policy
p.lightweight_cleaning = is_lightweight
- p.sandbox = sandbox
try:
logging.debug('Cleaning %s…', filename)
@@ -185,6 +185,9 @@ def main() -> int:
arg_parser = create_arg_parser()
args = arg_parser.parse_args()
+ if args.sandbox:
+ warnings.warn("sandboxing support has been removed", DeprecationWarning)
+
if args.verbose:
logging.getLogger(__name__).setLevel(logging.DEBUG)
@@ -203,7 +206,7 @@ def main() -> int:
elif args.show:
for f in __get_files_recursively(args.files):
- show_meta(f, args.sandbox)
+ show_meta(f)
return 0
else:
@@ -220,7 +223,7 @@ def main() -> int:
with concurrent.futures.ProcessPoolExecutor() as executor:
for f in files:
future = executor.submit(clean_meta, f, args.lightweight,
- inplace, args.sandbox, policy)
+ inplace, policy)
futures.append(future)
for future in concurrent.futures.as_completed(futures):
no_failure &= future.result()
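
Note on the `--no-sandbox` hunks above: the flag is still accepted, but it no longer changes behaviour and only triggers a `DeprecationWarning`. A minimal sketch of that pattern follows; the `parse` helper, the `mat2-sketch` program name and the choice to hide the flag from `--help` with `argparse.SUPPRESS` are assumptions of this example, not taken from the commit.

```python
import argparse
import warnings
from typing import List, Optional


def create_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='mat2-sketch')
    parser.add_argument('--inplace', action='store_true',
                        help='clean in place, without backup')
    # Retired flag: still parsed so existing invocations keep working.
    # Hiding it from --help is an assumption of this sketch.
    parser.add_argument('--no-sandbox', dest='sandbox', action='store_true',
                        default=False, help=argparse.SUPPRESS)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and warn when the retired flag is passed."""
    args = create_arg_parser().parse_args(argv)
    if args.sandbox:
        warnings.warn("sandboxing support has been removed",
                      DeprecationWarning, stacklevel=2)
    return args
```

Python's default warning filters show a `DeprecationWarning` only when it is attributed to code running in `__main__`, so a library caller of such a helper may need `-W default` to see the message.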
=====================================
pyproject.toml
=====================================
@@ -1,6 +1,6 @@
[project]
name = "mat2"
-version = "0.13.5"
+version = "0.14.0"
description = "mat2 is a metadata removal tool, supporting a wide range of commonly used file formats, written in python3: at its core, it's a library, used by an eponymous command-line interface, as well as several file manager extensions."
readme = "README.md"
license = {file = "LICENSE"}
@@ -11,9 +11,9 @@ dependencies = [
'pycairo',
]
[project.urls]
-Repository = "https://0xacab.org/jvoisin/mat2"
-Issues = "https://0xacab.org/jvoisin/mat2/-/issues"
-Changelog = "https://0xacab.org/jvoisin/mat2/-/blob/master/CHANGELOG.md"
+Repository = "https://github.com/jvoisin/mat2"
+Issues = "https://github.com/jvoisin/mat2/issues"
+Changelog = "https://github.com/jvoisin/mat2/blob/master/CHANGELOG.md"
[tool.ruff]
target-version = "py39"
=====================================
setup.py
=====================================
@@ -5,13 +5,13 @@ with open("README.md", encoding='utf-8') as fh:
setuptools.setup(
name="mat2",
- version='0.13.5',
+ version='0.14.0',
author="Julien (jvoisin) Voisin",
author_email="julien.voisin+mat2@dustri.org",
description="A handy tool to trash your metadata",
long_description=long_description,
long_description_content_type="text/markdown",
- url="https://0xacab.org/jvoisin/mat2",
+ url="https://github.com/jvoisin/mat2",
python_requires = '>=3.5.0',
scripts=['mat2'],
install_requires=[
@@ -31,6 +31,6 @@ setuptools.setup(
"Intended Audience :: End Users/Desktop",
],
project_urls={
- 'bugtacker': 'https://0xacab.org/jvoisin/mat2/issues',
+ 'bugtacker': 'https://github.com/jvoisin/mat2/issues',
},
)
=====================================
tests/data/dirty.webp
=====================================
Binary files /dev/null and b/tests/data/dirty.webp differ
=====================================
tests/fuzz.py
=====================================
@@ -41,7 +41,6 @@ def TestOneInput(data):
try:
p, _ = parser_factory.get_parser(fname)
if p:
- p.sandbox = False
p.get_meta()
p.remove_all()
p, _ = parser_factory.get_parser(fname)
=====================================
tests/test_climat2.py
=====================================
@@ -24,7 +24,6 @@ class TestHelp(unittest.TestCase):
self.assertIn(b'mat2 [-h] [-V]', stdout)
self.assertIn(b'[--unknown-members policy]', stdout)
self.assertIn(b'[--inplace]', stdout)
- self.assertIn(b'[--no-sandbox]', stdout)
self.assertIn(b' [-v] [-l]', stdout)
self.assertIn(b'[--check-dependencies]', stdout)
self.assertIn(b'[-L | -s]', stdout)
@@ -36,8 +35,10 @@ class TestHelp(unittest.TestCase):
self.assertIn(b'mat2 [-h] [-V]', stdout)
self.assertIn(b'[--unknown-members policy]', stdout)
self.assertIn(b'[--inplace]', stdout)
- self.assertIn(b'[--no-sandbox]', stdout)
- self.assertIn(b' [-v] [-l] [--check-dependencies] [-L | -s]', stdout)
+ self.assertIn(b' [-v]', stdout)
+ self.assertIn(b'[-l]', stdout)
+ self.assertIn(b'[--check-dependencies]', stdout)
+ self.assertIn(b'[-L | -s]', stdout)
self.assertIn(b'[files ...]', stdout)
@@ -121,26 +122,6 @@ class TestCleanMeta(unittest.TestCase):
os.remove('./tests/data/clean.jpg')
- def test_jpg_nosandbox(self):
- shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
-
- proc = subprocess.Popen(mat2_binary + ['--show', '--no-sandbox', './tests/data/clean.jpg'],
- stdout=subprocess.PIPE)
- stdout, _ = proc.communicate()
- self.assertIn(b'Comment: Created with GIMP', stdout)
-
- proc = subprocess.Popen(mat2_binary + ['./tests/data/clean.jpg'],
- stdout=subprocess.PIPE)
- stdout, _ = proc.communicate()
-
- proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.cleaned.jpg'],
- stdout=subprocess.PIPE)
- stdout, _ = proc.communicate()
- self.assertNotIn(b'Comment: Created with GIMP', stdout)
-
- os.remove('./tests/data/clean.jpg')
- os.remove('./tests/data/clean.cleaned.jpg')
-
class TestCopyPermissions(unittest.TestCase):
def test_jpg_777(self):
@@ -236,6 +217,11 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'i am a : various comment', stdout)
self.assertIn(b'artist: jvoisin', stdout)
+ def test_webp(self):
+ proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.webp'],
+ stdout=subprocess.PIPE)
+ stdout, _ = proc.communicate()
+ self.assertIn(b'Warning: [minor] Improper EXIF header', stdout)
class TestControlCharInjection(unittest.TestCase):
def test_jpg(self):
=====================================
tests/test_libmat2.py
=====================================
@@ -4,6 +4,7 @@ import unittest
import shutil
import os
import re
+import sys
import tarfile
import tempfile
import zipfile
@@ -113,6 +114,11 @@ class TestGetMeta(unittest.TestCase):
meta = p.get_meta()
self.assertEqual(meta['Comment'], 'Created with GIMP')
+ def test_webp(self):
+ p = images.WEBPParser('./tests/data/dirty.webp')
+ meta = p.get_meta()
+ self.assertEqual(meta['Warning'], '[minor] Improper EXIF header')
+
def test_ppm(self):
p = images.PPMParser('./tests/data/dirty.ppm')
meta = p.get_meta()
@@ -333,6 +339,11 @@ class TestCleaning(unittest.TestCase):
'parser': images.JPGParser,
'meta': {'Comment': 'Created with GIMP'},
'expected_meta': {},
+ #}, {
+ # 'name': 'webp',
+ # 'parser': images.WEBPParser,
+ # 'meta': {'Warning': '[minor] Improper EXIF header'},
+ # 'expected_meta': {},
}, {
'name': 'wav',
'parser': audio.WAVParser,
@@ -526,7 +537,40 @@ class TestCleaning(unittest.TestCase):
'parser': images.HEICParser,
'meta': {},
'expected_meta': {
+ 'BlueMatrixColumn': '0.14305 0.06061 0.71393',
+ 'BlueTRC': '(Binary data 32 bytes, use -b option to extract)',
+ 'CMMFlags': 'Not Embedded, Independent',
+ 'ChromaticAdaptation': '1.04788 0.02292 -0.05022 0.02959 0.99048 -0.01707 -0.00925 0.01508 0.75168',
+ 'ChromaticityChannel1': '0.64 0.33002',
+ 'ChromaticityChannel2': '0.3 0.60001',
+ 'ChromaticityChannel3': '0.15001 0.06',
+ 'ChromaticityChannels': 3,
+ 'ChromaticityColorant': 'Unknown',
+ 'ColorSpaceData': 'RGB ',
+ 'ConnectionSpaceIlluminant': '0.9642 1 0.82491',
+ 'DeviceAttributes': 'Reflective, Glossy, Positive, Color',
+ 'DeviceManufacturer': '',
+ 'DeviceMfgDesc': 'GIMP',
+ 'DeviceModel': '',
+ 'DeviceModelDesc': 'sRGB',
'ExifByteOrder': 'Big-endian (Motorola, MM)',
+ 'GreenMatrixColumn': '0.38512 0.7169 0.09706',
+ 'GreenTRC': '(Binary data 32 bytes, use -b option to extract)',
+ 'MediaWhitePoint': '0.9642 1 0.82491',
+ 'PrimaryPlatform': 'Apple Computer Inc.',
+ 'ProfileCMMType': 'Little CMS',
+ 'ProfileClass': 'Display Device Profile',
+ 'ProfileConnectionSpace': 'XYZ ',
+ 'ProfileCopyright': 'Public Domain',
+ 'ProfileCreator': 'Little CMS',
+ 'ProfileDateTime': '2022:05:15 16:29:22',
+ 'ProfileDescription': 'GIMP built-in sRGB',
+ 'ProfileFileSignature': 'acsp',
+ 'ProfileID': 0,
+ 'ProfileVersion': '4.3.0',
+ 'RedMatrixColumn': '0.43604 0.22249 0.01392',
+ 'RedTRC': '(Binary data 32 bytes, use -b option to extract)',
+ 'RenderingIntent': 'Perceptual',
'Warning': 'Bad IFD0 directory',
},
}
@@ -563,13 +607,13 @@ class TestCleaning(unittest.TestCase):
meta = p2.get_meta()
if meta:
for k, v in p2.get_meta().items():
- self.assertIn(k, case['expected_meta'], '"%s" is not in "%s" (%s)' % (k, case['expected_meta'], case['name']))
+ self.assertIn(k, case['expected_meta'], '"%s" is not in "%s" (%s), with all of them being %s' % (k, case['expected_meta'], case['name'], p2.get_meta().items()))
if str(case['expected_meta'][k]) in str(v):
continue
if 'extra_expected_meta' in case and k in case['extra_expected_meta']:
if str(case['extra_expected_meta'][k]) in str(v):
continue
- self.assertTrue(False, "got a different value (%s) than excepted (%s) for %s" % (str(v), meta, k))
+ self.assertTrue(False, "got a different value (%s) than excepted (%s) for %s, with all of them being %s" % (str(v), meta, k, p2.get_meta().items()))
self.assertTrue(p2.remove_all())
os.remove(target)
@@ -595,14 +639,20 @@ class TestCleaning(unittest.TestCase):
os.remove('./tests/data/clean.cleaned.html')
os.remove('./tests/data/clean.cleaned.cleaned.html')
- with open('./tests/data/clean.html', 'w') as f:
- f.write('<title><title><pouet/><meta/></title></title><test/>')
- p = web.HTMLParser('./tests/data/clean.html')
- self.assertTrue(p.remove_all())
- with open('./tests/data/clean.cleaned.html', 'r') as f:
- self.assertEqual(f.read(), '<title></title><test/>')
+ if sys.version_info >= (3, 13):
+ with open('./tests/data/clean.html', 'w') as f:
+ f.write('<title><title><pouet/><meta/></title></title><test/>')
+ with self.assertRaises(ValueError):
+ p = web.HTMLParser('./tests/data/clean.html')
+ else:
+ with open('./tests/data/clean.html', 'w') as f:
+ f.write('<title><title><pouet/><meta/></title></title><test/>')
+ p = web.HTMLParser('./tests/data/clean.html')
+ self.assertTrue(p.remove_all())
+ with open('./tests/data/clean.cleaned.html', 'r') as f:
+ self.assertEqual(f.read(), '<title></title><test/>')
+ os.remove('./tests/data/clean.cleaned.html')
os.remove('./tests/data/clean.html')
- os.remove('./tests/data/clean.cleaned.html')
with open('./tests/data/clean.html', 'w') as f:
f.write('<test><title>Some<b>metadata</b><br/></title></test>')
@@ -820,45 +870,6 @@ class TestCleaningArchives(unittest.TestCase):
os.remove('./tests/data/dirty.cleaned.tar.xz')
os.remove('./tests/data/dirty.cleaned.cleaned.tar.xz')
-class TestNoSandbox(unittest.TestCase):
- def test_avi_nosandbox(self):
- shutil.copy('./tests/data/dirty.avi', './tests/data/clean.avi')
- p = video.AVIParser('./tests/data/clean.avi')
- p.sandbox = False
-
- meta = p.get_meta()
- self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
-
- ret = p.remove_all()
- self.assertTrue(ret)
-
- p = video.AVIParser('./tests/data/clean.cleaned.avi')
- self.assertEqual(p.get_meta(), {})
- self.assertTrue(p.remove_all())
-
- os.remove('./tests/data/clean.avi')
- os.remove('./tests/data/clean.cleaned.avi')
- os.remove('./tests/data/clean.cleaned.cleaned.avi')
-
- def test_png_nosandbox(self):
- shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
- p = images.PNGParser('./tests/data/clean.png')
- p.sandbox = False
- p.lightweight_cleaning = True
-
- meta = p.get_meta()
- self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
-
- ret = p.remove_all()
- self.assertTrue(ret)
-
- p = images.PNGParser('./tests/data/clean.cleaned.png')
- self.assertEqual(p.get_meta(), {})
- self.assertTrue(p.remove_all())
-
- os.remove('./tests/data/clean.png')
- os.remove('./tests/data/clean.cleaned.png')
- os.remove('./tests/data/clean.cleaned.cleaned.png')
class TestComplexOfficeFiles(unittest.TestCase):
def test_complex_pptx(self):
=====================================
tests/test_lightweight_cleaning.py
=====================================
@@ -23,6 +23,11 @@ class TestLightWeightCleaning(unittest.TestCase):
'parser': images.JPGParser,
'meta': {'Comment': 'Created with GIMP'},
'expected_meta': {},
+ }, {
+ 'name': 'webp',
+ 'parser': images.WEBPParser,
+ 'meta': {'Warning': '[minor] Improper EXIF header'},
+ 'expected_meta': {},
}, {
'name': 'torrent',
'parser': torrent.TorrentParser,
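
Note on the metadata comparison in the tests/test_libmat2.py hunk above: the loop accepts a leftover key only if it is listed in `expected_meta`, and accepts its value if the expected string is a substring of the actual one (with `extra_expected_meta` as a fallback). Below is a minimal standalone sketch of that logic, not mat2 code. The helper name `find_meta_mismatches` and its return shape are hypothetical, chosen only for illustration.

```python
def find_meta_mismatches(actual, expected, extra_expected=None):
    """Return a list of problem descriptions; an empty list means all leftover metadata is accepted.

    Hypothetical helper mirroring the test loop: keys must be listed in
    `expected`, and values are compared by substring match on their str() form.
    """
    extra_expected = extra_expected or {}
    problems = []
    for key, value in actual.items():
        if key not in expected:
            # Unexpected key: report it together with the full metadata for context.
            problems.append('"%s" is not expected; all metadata: %s' % (key, actual))
            continue
        if str(expected[key]) in str(value):
            continue
        if key in extra_expected and str(extra_expected[key]) in str(value):
            continue
        problems.append('got "%s" for "%s"; all metadata: %s' % (value, key, actual))
    return problems


# Values taken from the HEIC case in the diff; the subset is arbitrary.
leftover = {'Warning': 'Bad IFD0 directory', 'ProfileID': 0}
expected = {
    'ExifByteOrder': 'Big-endian (Motorola, MM)',
    'ProfileID': 0,
    'Warning': 'Bad IFD0 directory',
}
print(find_meta_mismatches(leftover, expected))
```

Because the comparison is a substring match, an expected value of `''` (as used for `DeviceManufacturer` and `DeviceModel`) accepts any actual value for that key.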
View it on GitLab: https://salsa.debian.org/pkg-privacy-team/mat2/-/commit/f20364807a7703e8a21dfefa181ba4ec0c265242