Bug#839538: diffoscope: json: detect order-only differences

Daniel Shahaf danielsh at apache.org
Sat Oct 1 18:06:38 UTC 2016


Control: tags -1 patch

Daniel Shahaf wrote on Sat, Oct 01, 2016 at 17:23:42 +0000:
> It would be better to report "json files are equal up to order of
> elements in an object (= hash, dictionary, associative array)", and to
> print the difference in a more readable way than a hex dump.  (For
> example, a linewise diff of pretty-printed json.)
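
The distinction drawn above comes down to how Python compares parsed JSON objects. Here is a minimal standalone sketch that uses only the standard `json` and `collections` modules, with the two example files inlined as strings. Plain dicts compare equal whatever their key order. `OrderedDict` instances, produced via `object_pairs_hook`, compare order-sensitively against each other.

```python
import json
from collections import OrderedDict

# The two example files, inlined (assumption: same content as 1.json/2.json).
a_text = '{ "hello": 42, "world": 43 }'
b_text = '{ "world": 43, "hello": 42 }'

# Default parsing yields plain dicts, whose equality ignores key order,
# so the two documents compare equal as values.
plain_equal = json.loads(a_text) == json.loads(b_text)

# object_pairs_hook=OrderedDict keeps each object's keys in file order,
# and OrderedDict-to-OrderedDict equality is order-sensitive.
ordered_equal = (json.loads(a_text, object_pairs_hook=OrderedDict) ==
                 json.loads(b_text, object_pairs_hook=OrderedDict))

print(plain_equal, ordered_equal)  # True False
```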

Proposed patch attached.  It behaves as follows:

[[[
% head *.json
==> 1.json <==
{ "hello": 42, "world": 43 }

==> 2.json <==
{ "world": 43, "hello": 42 }
% bin/diffoscope *.json
--- 1.json
+++ 2.json
│   --- 1.json
├── +++ 2.json
│┄ ordering differences only
│ @@ -1,4 +1,4 @@
│  {
│ -    "hello": 42,
│ -    "world": 43
│ +    "world": 43,
│ +    "hello": 42
│  }
╵
]]]
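
The two-pass check behind that output can be sketched outside diffoscope. `describe_json_difference` is a hypothetical standalone helper, not part of diffoscope. It uses `difflib.unified_diff` for the linewise diff instead of diffoscope's `Difference.from_text`. The helper first compares the key-sorted, pretty-printed form. Only if that pass shows nothing does it compare again with the original key order preserved.

```python
import difflib
import json
from collections import OrderedDict


def describe_json_difference(a_text, b_text, a_name="a.json", b_name="b.json"):
    """Hypothetical helper, not diffoscope API.

    Returns (comment, diff_lines): comment is "ordering differences only"
    when the inputs differ only in key order, else None; diff_lines is an
    empty list when the inputs are identical, key order included.
    """
    # Keep each object's keys in file order.
    a = json.loads(a_text, object_pairs_hook=OrderedDict)
    b = json.loads(b_text, object_pairs_hook=OrderedDict)

    def linewise_diff(sort_keys):
        # Pretty-print with indent=4 so each key lands on its own line.
        a_lines = json.dumps(a, indent=4, sort_keys=sort_keys).splitlines()
        b_lines = json.dumps(b, indent=4, sort_keys=sort_keys).splitlines()
        return list(difflib.unified_diff(a_lines, b_lines, a_name, b_name,
                                         lineterm=""))

    # Pass 1: key-sorted (canonical) form, sorted at every nesting level.
    # A diff here means the content itself differs: no special comment.
    diff = linewise_diff(sort_keys=True)
    if diff:
        return None, diff

    # Pass 2: original key order. Pass 1 found nothing, so any diff here
    # can only come from the order of keys within objects.
    diff = linewise_diff(sort_keys=False)
    if diff:
        return "ordering differences only", diff
    return None, []


comment, diff = describe_json_difference('{ "hello": 42, "world": 43 }',
                                         '{ "world": 43, "hello": 42 }',
                                         "1.json", "2.json")
print(comment)
print("\n".join(diff))
```

For the two example files, this prints `ordering differences only` followed by the same `@@ -1,4 +1,4 @@` hunk shown in the session above. Running the sorted pass first means a genuine content change is never mislabelled as an ordering difference.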

It passes the existing test suite, but I haven't yet tried writing
a unit test for this.

Cheers,

Daniel

diff --git a/diffoscope/comparators/json.py b/diffoscope/comparators/json.py
index d16a762..8d0c104 100644
--- a/diffoscope/comparators/json.py
+++ b/diffoscope/comparators/json.py
@@ -17,6 +17,7 @@
 # You should have received a copy of the GNU General Public License
 # along with diffoscope.  If not, see <http://www.gnu.org/licenses/>.
 
+from collections import OrderedDict
 import re
 import json
 
@@ -34,18 +35,26 @@ class JSONFile(File):
 
         with open(file.path) as f:
             try:
-                file.parsed = json.load(f)
+                file.parsed = json.load(f, object_pairs_hook=OrderedDict)
             except json.JSONDecodeError:
                 return False
 
         return True
 
     def compare_details(self, other, source=None):
-        return [Difference.from_text(self.dumps(self), self.dumps(other),
-            self.path, other.path)]
+        difference = Difference.from_text(self.dumps(self), self.dumps(other),
+            self.path, other.path)
+        if difference:
+            return [difference]
+
+        difference = Difference.from_text(self.dumps(self, sort_keys=False),
+                                          self.dumps(other, sort_keys=False),
+                                          self.path, other.path,
+                                          comment="ordering differences only")
+        return [difference]
 
     @staticmethod
-    def dumps(file):
+    def dumps(file, sort_keys=True):
         if not hasattr(file, 'parsed'):
             return ""
-        return json.dumps(file.parsed, indent=4, sort_keys=True)
+        return json.dumps(file.parsed, indent=4, sort_keys=sort_keys)


