Bug#864082: [Fontconfig] Next steps for a reproducible Fontconfig?

Keith Packard keithp at keithp.com
Thu Jan 10 20:41:26 GMT 2019


Alexander Larsson <alexander.larsson at gmail.com> writes:

> I'd like to repeat that this is not really flatpak specific as such.
> The issue can happen in multiple cases like nfs mounts, multi-boot
> systems, docker containers, etc.

Sure, any place where path names are not the same would cause the same
issue. I think your other examples are unlikely to exhibit this in
practice though -- when NFS is used for system-level sharing, the normal
configuration tooling goes to extreme measures to ensure that pathnames
are the same across all systems.

Given that no one complained about this issue until flatpak came along,
I'd like to suggest that flatpak is somewhat unique in its requirements
here. That means we should feel free to find solutions involving changes
within flatpak, instead of attempting to fix this solely within
fontconfig. I'm not yet certain that a fix leaving flatpak untouched is
impossible, but we've got some strong indications that it is.

> Flatpak has an additional weakness here, which is that we don't store
> mtimes (to maximize content sharing abilities the mtime is not part of
> the content addressing). This means the mtime can't be used to detect
> a stale cache, so we use the uuid to detect such changes.

I don't understand this -- the only way UUID could be used to trigger a
stale cache is by being missing. Otherwise, I thought mtimes were still
the only method used to know if the cache file was out of date?

> I think this misrepresents what the UUID is for.  The UUID represents
> a unique identifier for the *location*, not the contents, and the goal
> is to make it independent of how you found the directory. If you add
> fonts to that directory, then you're supposed to keep the UUID because
> you want to regenerate the same cache for the new content.

I may have misrepresented the UUID idea in my email, but please accept
that I do understand that UUIDs are designed to be a path-independent
identifier for each directory.

The problem is not the UUID file itself, but how the key it contains is
generated. To generate a reproducible UUID file, you need a
deterministic algorithm that uses only data in the source files and
toolchain along with the font directory path name. But that means that
/usr/share/fonts will necessarily have the same key on all systems.

So we end up with conflicting requirements -- reproducibility requires
that the font cache key for '/usr/share/fonts' be the same everywhere,
but flatpak requires that the keys differ so that the host
/usr/share/fonts has a different cache key than the flatpak
/usr/share/fonts.
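
A minimal sketch of that conflict, assuming the key is nothing more than
a hash of the directory path name. The function name `cache_key` and the
choice of MD5 are illustrative assumptions here, not a description of
the actual fontconfig code:

```python
import hashlib


def cache_key(font_dir: str) -> str:
    # Hypothetical scheme: the key depends on the path name and nothing else.
    return hashlib.md5(font_dir.encode("utf-8")).hexdigest()


# The same path name, seen from two different filesystems.
host_key = cache_key("/usr/share/fonts")     # computed on the host
sandbox_key = cache_key("/usr/share/fonts")  # computed inside the flatpak

# Deterministic (good for reproducibility), but the two directories
# end up sharing a single cache key even though their contents differ.
keys_collide = host_key == sandbox_key
```

Any key that is a pure function of the path name will behave this way,
whichever hash is used.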

> I realize this is not what the reproducible builds project wants, but
> it is what the UUID was added for.

The UUID files also violate one of the principal design goals I adopted
for fontconfig when cache files were moved to a separate directory --
never touch the font directories.

I also fought with fontconfig for about a week when the release
including them was installed on my machine, as firefox would spin
whenever it found a directory with no fonts. At the time, I felt injured
by this change.

I add this here just to let you know that I am biased against any design
including UUID files and would prefer a solution which eliminates them.

> If the UUID really *was* content addressed, then it would change each
> time some font was added, and old font caches would become stale (and
> reaped via some other way like mtimes). In this case the fact that
> caches between the sandbox and the hosts collide is not even a
> problem, since the cached data is identical and could be shared. The
> problem is rather that the font directory is mutable, and if it
> changes without immediately updating the uuid you run into issues.

Hrm. Could some combination of mtime checking and content addressing
work? Consider a .uuid file generated from a hash of the directory
contents. If the .uuid file is older than the directory, you could
regenerate it reasonably quickly by hashing the directory contents. That
would be faster than re-scanning all of the fonts in the directory at
least.
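
A rough sketch of that idea, under the assumption that "directory
contents" means the sorted list of file names (no font data is read).
The helper names `content_key` and `refresh_uuid` are hypothetical, as
is the use of SHA-256:

```python
import hashlib
import os


def content_key(font_dir: str) -> str:
    """Hash the sorted entry names of a directory, skipping .uuid itself."""
    h = hashlib.sha256()
    for name in sorted(os.listdir(font_dir)):
        if name == ".uuid":
            continue
        h.update(name.encode("utf-8") + b"\0")
    return h.hexdigest()


def refresh_uuid(font_dir: str) -> str:
    """Rewrite .uuid only if it is missing or older than the directory."""
    uuid_path = os.path.join(font_dir, ".uuid")
    try:
        stale = os.stat(uuid_path).st_mtime < os.stat(font_dir).st_mtime
    except FileNotFoundError:
        stale = True
    if stale:
        with open(uuid_path, "w") as f:
            f.write(content_key(font_dir))
    with open(uuid_path) as f:
        return f.read()
```

Listing a directory is cheap compared with opening and parsing every
font in it, which is the only point the sketch is meant to make.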

I think this would assume that font file names were globally unique,
which may not be reasonable.

It also ignores the question of reaping stale cache files; presumably
fc-cache could do that somehow...

> I'm willing to make *some* changes to flatpak, but I'm not sure this
> is the right approach. First of all it just looks at a tiny subset of
> the problem (only flatpak, and only one directory).

Changing flatpak to *always* mount host directories at the same place in
the container should solve the problem for all directories.

> Secondly, it is likely to run into issues having non-standard
> paths. For example, the fedora flatpak runtime is created from the
> standard fedora rpms, so it will have to be tweaked post install, and
> it's possible that some code hard-coded
> /usr/share/fonts/some-specific-font which is in the app but not the
> host.

This is not an entirely theoretical issue -- I've run into the
ruby-prawn-icon package in the last week. That ships a selection of
fonts and accesses them via hard-coded paths. "Fortunately", the default
location for those is not within /usr/share/fonts, but I could be
convinced that these fonts should be installed in /usr/share/fonts to
match distribution policy/conventions.

I know -- 'union mounts' to the rescue! (not a serious suggestion)

> Also, we'll be guaranteeing that caches for /usr/share/fonts and
> /usr/share/fonts-minimal don't conflict, but there is no guarantee
> that different versions of /usr/share/fonts-minimal don't conflict.

I don't understand this -- the cache for /usr/share/fonts-minimal would
live inside the flatpak environment, and should be per-flatpak?

> Here is my proposal:
>
> Make the uuid *generation* optional and manual. Then, when we create
> the flatpak runtime we run fc-cache --make-uuid (or something) to
> generate the uuid files. Then fontconfig would never confuse the
> sandboxed /usr/share/fonts with any other, and since we would get a
> new uuid each time we regenerated the runtime it would correctly pick
> up stale caches when we update the runtime (even with no mtime
> change).

Hrm. This is a tempting solution -- normal users would never see .uuid
files at all.

However, it means that new directories created within the flatpak while
the system is running would not get .uuid files and might then have
cache names which collide with the outer system.

How about making it a font configuration per-'dir' option instead? This
way, uuid files would be automatically added to all 'internal'
directories and never to external ones.

And users could add this when adding references to external font
repositories.

> This would make the default installation of fontconfig reproducible,
> and it would solve the first problem (don't mix up sandboxed and host
> font dirs). It would also let you opt-in to the uuid in other cases
> where it makes sense. For instance, you could have a uuid file on a
> NFS share or USB drive font dir, so that any caches for it will always
> be the same no matter how it happens to be mounted.

It sounds like a good direction for discussion at least.

> We still wouldn't have a way to reuse host caches which were mounted
> in a different way, but if we assume all conflicting directories use
> uuids (like they would in the flatpak case), then we could solve this
> in a pretty simple way by a config file saying "treat all instances of
> /run/host/fonts as /usr/share/fonts", and I could make flatpak
> generate such a file.

I've already got a patch series which solves this problem -- you can map
paths to cache keys on a per-'dir' element basis.
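
To illustrate the general shape of such a mapping (this is not the
patch series itself; the table, the function name and the MD5 hash are
all assumptions made for the example), a directory can be keyed by a
different path than the one it is mounted at:

```python
import hashlib

# Hypothetical per-'dir' mapping: the path as seen inside the sandbox,
# mapped to the path whose cache key should be used instead.
KEY_PATHS = {"/run/host/fonts": "/usr/share/fonts"}


def cache_key(font_dir: str, key_paths=KEY_PATHS) -> str:
    # Fall back to the directory's own path when no mapping is configured.
    key_path = key_paths.get(font_dir, font_dir)
    return hashlib.md5(key_path.encode("utf-8")).hexdigest()


# The host's fonts, mounted at /run/host/fonts inside the sandbox, get
# the key the host already computed for /usr/share/fonts.
mapped_key = cache_key("/run/host/fonts")
host_key = cache_key("/usr/share/fonts", key_paths={})
```

With a mapping like this, existing host cache files can be found from
inside the container without rescanning the fonts.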

Here's an alternative proposal:

        Add a per 'dir' element 'salt' value, which is stirred into the
        path name when generating the cache key. You'd generate this
        randomly when the flatpak was created so that all cache keys
        would not collide with entries using a different (or absent)
        salt value.
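
As a sketch of what "stirred into the path name" could mean, assuming
the salt is simply appended to the path before hashing (the function
name, the MD5 hash and the way the salt is combined are illustrative
assumptions):

```python
import hashlib
import secrets


def cache_key(font_dir: str, salt: str = "") -> str:
    # Hypothetical: an absent salt leaves the plain path-based key unchanged.
    return hashlib.md5((font_dir + salt).encode("utf-8")).hexdigest()


# Generated once, when the flatpak is created, and stored in its config.
flatpak_salt = secrets.token_hex(16)

host_key = cache_key("/usr/share/fonts")                   # no salt
sandbox_key = cache_key("/usr/share/fonts", flatpak_salt)  # salted
```

The unsalted key stays deterministic, so a default installation remains
reproducible, while the salted key is stable for as long as the flatpak
keeps the same salt.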

With this, and my path->key mapping series, we would be able to access
the existing cache files for external fonts (via the mapping mechanism), as
well as avoid collisions between internal and external font paths within
the cache. And we wouldn't have .uuid files (see above).

I still don't understand how UUID files help with the missing mtime
issue though; if you could explain that in a bit more detail, that would
help me, and perhaps expose a weakness with my alternative proposal.

-- 
-keith