Bug#848049: diffoscope: Add detection of order-only differences in plain text formats

Jérémy Bobbio lunar at debian.org
Sun Dec 25 14:28:52 UTC 2016


Hi!

Маша Глухова:
> The reason why I did not use an algorithm like that is that it requires
> reading the files a second time. Right now, all the actual work with the
> content of the files (except for the quick check for has_same_content) is
> delegated to diff, and on big files, it occupies most of the time. Assuming
> that for big files, reading them from drive would be the bottleneck, I
> tried to avoid reading them again, instead working with the result of the
> diff.
> Still, I would be happy to be mistaken. I will implement your version and
> compare the performance.

You would not have to read the file twice as long as you compute the
hash in the difference module, at the point where each line is actually
fed to diff. A similar trick is already used to cope with files that are
too long; see diffoscope.difference.make_feeder_from_raw_reader().
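
To make the idea a bit more concrete, here is a rough sketch of what I
have in mind. It is not diffoscope's actual code: make_hashing_feeder()
and order_only_difference() are hypothetical names, SHA-1 is an
arbitrary choice of hash, and in-memory buffers stand in for the pipes
connected to diff. Each line is hashed at the moment it is written out,
so the input is only read once.

```python
import hashlib
import io
from collections import Counter


def make_hashing_feeder(lines, line_hashes):
    """Return a feeder that writes each line and counts its hash.

    Hypothetical helper for illustration, not part of diffoscope.
    """
    def feeder(out_file):
        for line in lines:
            # Hash the line while it is being handed over to diff, so
            # no second pass over the input is needed.
            line_hashes[hashlib.sha1(line).digest()] += 1
            out_file.write(line)
    return feeder


def order_only_difference(hashes_a, hashes_b):
    """True when both inputs hold the same lines with the same counts.

    The caller is assumed to already know that the files differ.
    """
    return hashes_a == hashes_b


# Usage: BytesIO objects stand in for the pipes that diff reads from.
hashes_a, hashes_b = Counter(), Counter()
sink_a, sink_b = io.BytesIO(), io.BytesIO()
make_hashing_feeder([b"b\n", b"a\n", b"c\n"], hashes_a)(sink_a)
make_hashing_feeder([b"a\n", b"c\n", b"b\n"], hashes_b)(sink_b)
print(order_only_difference(hashes_a, hashes_b))
```

Counting hashes rather than keeping a plain set means that repeated
lines are also accounted for, at the cost of one counter entry per
distinct line.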

I don't know if my suggestion is a good one. It might not be a good
idea at all. Feel free to discuss it with your mentor before spending
too much time on it.

> Thank you again :)

PS: Please call me Lunar. :)

-- 
Lunar                                .''`. 
lunar at debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   