data: Replace custom formats with msgpack by fstachura · Pull Request #374 · bootlin/elixir

fstachura · 2024-12-29T21:34:09Z

I have previously attempted to refactor data converters used in data.py in this branch. In the end I wasn't happy with the result, because I believe that writing custom parsers for each database is the wrong approach.

This PR replaces all data.py parsers with msgpack. Plain Python objects can be serialized into the databases and deserialized back.
The main advantage is convenience: values can be manipulated like normal Python objects, and no string parsing is required anywhere in the codebase that interacts with the database.
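
As a rough illustration of that convenience, here is a minimal round-trip sketch. It assumes the `msgpack` package from PyPI is installed; the record layout (a list of entries plus a family string) is made up for the example and is not the schema used in data.py.

```python
import msgpack

# Hypothetical record: a list of [id, type, line] entries plus a family string.
# This layout is an assumption for illustration, not the actual database schema.
record = [[[42, "def", 1337], [43, "ref", 20]], "C"]

# Serialize to bytes, which is what a key-value database stores as a value.
packed = msgpack.packb(record, use_bin_type=True)

# Deserialize back to plain Python lists, ints and strs; no string parsing needed.
restored = msgpack.unpackb(packed, raw=False)

first_line = restored[0][0][2]  # a normal int, ready to use
```

Note that msgpack has no tuple type, so tuples come back as lists after a round trip.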

From what I remember, larger databases were also a bit smaller, mainly because large ints take less space in msgpack than in base-10 representation. But to be fair, there is some storage overhead for other datatypes.
I also wouldn't be surprised if average serialization/deserialization times were a bit shorter, although I don't have numbers on that and I doubt it's a major bottleneck anywhere.
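
The size argument for integers can be checked with a small standard-library sketch. The byte counts below follow the msgpack format specification for non-negative integers; the helper names are hypothetical, and the comparison ignores any separators the old text format may have needed.

```python
def msgpack_uint_size(n: int) -> int:
    """Bytes msgpack uses for a non-negative integer, per the format spec."""
    if n < 0:
        raise ValueError("this sketch covers non-negative integers only")
    if n <= 0x7F:
        return 1  # positive fixint: value stored in the type byte itself
    if n <= 0xFF:
        return 2  # uint 8: type byte + 1 byte
    if n <= 0xFFFF:
        return 3  # uint 16: type byte + 2 bytes
    if n <= 0xFFFFFFFF:
        return 5  # uint 32: type byte + 4 bytes
    if n <= 0xFFFFFFFFFFFFFFFF:
        return 9  # uint 64: type byte + 8 bytes
    raise ValueError("msgpack integers are limited to 64 bits")


def decimal_size(n: int) -> int:
    """Bytes needed to store the same integer as ASCII base-10 digits."""
    return len(str(n))


for n in (7, 300, 1_000_000, 5_000_000_000):
    print(n, decimal_size(n), msgpack_uint_size(n))
```

For small values the two encodings are about the same size; the gap only opens up for larger numbers, which matches the observation that the savings show mostly in larger databases.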

Leaving this as a draft - I tested it only a little bit.

tleb · 2025-02-14T16:04:39Z

elixir/data.py

+            self.families = parsed_data[1]
+        else:
+            self.entries = []
+            self.families = ""


Could we work only on the raw data and parse it when things are requested? The goal is to store only a bytes buffer, so that we don't take up loads of memory if we want to keep many of these objects in memory.
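
One way to read this suggestion, as a sketch only: the object keeps the raw bytes and decodes them each time something is requested. The class name, the pluggable `decode` callable and the `entries` index are assumptions for illustration; `json.loads` stands in for the decoder so the example runs with the standard library alone, whereas in the context of this PR the decoder would presumably be `msgpack.unpackb`.

```python
import json
from typing import Any, Callable


class LazyValue:
    """Holds only the raw bytes buffer; decodes on request (hypothetical class)."""

    __slots__ = ("_raw", "_decode")

    def __init__(self, raw: bytes, decode: Callable[[bytes], Any]):
        self._raw = raw
        self._decode = decode

    def parse(self) -> Any:
        # Decode on every call; no parsed structure is kept alive in the object.
        return self._decode(self._raw)

    @property
    def entries(self) -> Any:
        # Assumption: entries are stored at index 0 of the decoded data.
        return self.parse()[0]

    @property
    def families(self) -> Any:
        # Index 1 matches `parsed_data[1]` in the diff above.
        return self.parse()[1]


# json.loads is a stand-in decoder for this demo.
value = LazyValue(b'[[[42, "def", 1337]], "C"]', json.loads)
families = value.families
entries = value.entries
```

The trade-off is repeated decoding on each access in exchange for holding just one bytes object per value.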

tleb · 2025-02-14T16:07:49Z

elixir/data.py

+            else:
+                return self.ctype(p)
+        else:
+            return None


Return early in the p is None case.
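
A sketch of what that refactor could look like. Only the `else` branches are visible in the diff, so the function names and the inner `isinstance` condition are hypothetical stand-ins, not the code under review.

```python
def convert_nested(p, ctype):
    """Shape suggested by the diff: the None case sits in a trailing else."""
    if p is not None:
        if isinstance(p, ctype):  # hypothetical inner condition
            return p
        else:
            return ctype(p)
    else:
        return None


def convert_early(p, ctype):
    """Same behavior with the None case handled first."""
    if p is None:
        return None
    if isinstance(p, ctype):  # hypothetical inner condition
        return p
    return ctype(p)
```

Both functions return the same results; the second keeps the main path at one indentation level.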

tleb · 2025-02-14T16:08:30Z

elixir/data.py

+        if type(key) is str:
+            key = key.encode()
+        elif type(key) is int:
+            key = msgpack.dumps(key)


Do we want to do this? Isn't it rather an error if someone gives us a string or an int? A key is of type bytes; callers that don't respect that have a bug IMO.
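
The stricter alternative hinted at here could look like the sketch below: instead of converting str or int keys, reject anything that is not bytes. The helper name `check_key` is hypothetical and does not exist in the codebase.

```python
def check_key(key):
    """Return the key unchanged if it is bytes, otherwise fail loudly."""
    if not isinstance(key, bytes):
        raise TypeError(
            f"database key must be bytes, got {type(key).__name__}"
        )
    return key


key = check_key(b"some_identifier")  # passes through untouched
```

With this approach a caller passing `"some_identifier"` or `42` gets a `TypeError` at the call site instead of a silently re-encoded key.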

tleb · 2025-02-14T16:09:22Z

elixir/lib.py

    elif type(arg) is int:
-        arg = str(arg).encode()
+        arg = msgpack.dumps(arg)
    return arg


Same comment: is this used? Shouldn't callers know what they have and do the right thing themselves?

tleb · 2025-02-14T16:09:44Z

static/dynamic-references.js

      previous_type = sd.type;
    }
-    let ln = sd.line.toString().split(',');
+    let ln = [sd.line];


Why is that related to msgpack?

tleb

Nothing major, the change isn't massive, no surprise. Some more cleanup (like random code comments here and there) is required.

data: Replace custom formats with msgpack

7eaa77b

fstachura wants to merge 1 commit into bootlin:master from fstachura:msgpack
