lljson encode and decode functions with vectors, quaternions, and UUIDs.
in progress
SungAli Resident
Right now, vectors and quaternions encode to strings in the "<...>" format and decode back as plain strings rather than their original types. Likewise, UUIDs decode to strings. This is a major nuisance for scripters when decoding tables with embedded vectors, quaternions, or UUIDs, requiring special decoding functions in each and every script that uses them.
The underlying problem is that at present there is no way to differentiate between an encoded vector, quaternion, or UUID and the equivalent string value: vector(1,2,3) and "<1,2,3>" both encode to "<1,2,3>".
I'd like to propose a variation on encode/decode (call them lljson.pack and unpack, or better, perhaps, add an optional EncodingType argument to encode and decode). When a vector or quaternion is encountered, encode it as it is now; UUIDs would be encoded by adding the same <> delimiters around the current UUID string. When a string starting with < and ending with > is encountered, encode it with an extra < and > added at each end, and decode such strings by removing the added < and >. Vectors, quaternions, and UUIDs can then be uniquely identified by their undoubled delimiters and the appropriate internal format, and decoded directly to the appropriate type.
This would allow the encoding and decoding of all SLua types in tables without special intervention by scripters, significantly simplifying such scripting and greatly improving performance (one optimized pass through the data in C, rather than one in C followed by one of random scripter quality in SLua) when passing tables between scripts, or storing and retrieving tables in Linkset Data.
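Roughly, the escaping rule for plain strings could look something like this (just a sketch in plain Luau to illustrate the idea, not the lljson API itself; the real work would happen inside the library's single C pass):
local function escapeString(s)
    -- A genuine string that already looks like "<...>" gets doubled
    -- delimiters so it can't be mistaken for a vector, quaternion, or UUID.
    if string.sub(s, 1, 1) == "<" and string.sub(s, -1) == ">" then
        return "<" .. s .. ">"
    end
    return s
end
local function unescapeString(s)
    -- Undo the doubling on decode; anything left with single "<...>"
    -- delimiters is then known to be a vector, quaternion, or UUID.
    if string.sub(s, 1, 2) == "<<" and string.sub(s, -2) == ">>" then
        return string.sub(s, 2, -2)
    end
    return s
end
-- escapeString("<1,2,3>")     --> "<<1,2,3>>" (a real string, round-trips intact)
-- unescapeString("<<1,2,3>>") --> "<1,2,3>"
-- an undoubled "<1,2,3>" on decode would become vector(1,2,3)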
The only reason for keeping encode and decode as they are now is compatibility with external JSON operations, and even then, if vectors, quaternions, or UUIDs are involved, the proposed operations would likely be superior, since accommodations for these types would have to be made on the remote end anyway.
Harold Linden
marked this post as
in progress
nya Resident
Not sure if it helps, but I've seen this approached like this in, e.g., object stores like Couchbase:
[{ "_type": "vector", "_value": [1, 2, 3] }] -> [<1, 2, 3>]
and double underscore in some .NET environments, though I understand this would bloat the strings a bit
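For illustration, reviving such tagged values after a plain decode could be done like this (a sketch only; the _type/_value keys are just the convention above, not anything lljson supports today):
local function reviveTagged(value)
    if type(value) ~= "table" then
        return value
    end
    if value._type == "vector" and type(value._value) == "table" then
        -- Quaternions and UUIDs would follow the same pattern.
        return vector(value._value[1], value._value[2], value._value[3])
    end
    -- Recurse into ordinary tables and arrays.
    for k, v in pairs(value) do
        value[k] = reviveTagged(v)
    end
    return value
end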
Harold Linden
Merged in a post:
The library lljson could encode sparse arrays as objects instead of throwing an error
SuzannaLinn Resident
For example:
t = {}
t[10] = true
print(lljson.encode(t))
returns:
[null,null,null,null,null,null,null,null,null,true]
But:
t = {}
t[11] = true
print(lljson.encode(t))
throws the error:
Cannot serialise table: excessively sparse array
It could instead be encoded as an object.
Or always encoded as an object when #t == 0 or next(t, #t) ~= nil.
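For example, the check could be a small helper in front of the encoder (looksLikeObject is just a hypothetical name for illustration):
local function looksLikeObject(t)
    -- Not a dense array: either no array part at all,
    -- or there are keys beyond the last array index.
    return #t == 0 or next(t, #t) ~= nil
end
local t1 = { 1, 2, 3 }          -- dense array
local t2 = {}
t2[11] = true                   -- sparse: currently throws on encode
print(looksLikeObject(t1))      --> false (encode as a JSON array)
print(looksLikeObject(t2))      --> true  (encode as a JSON object instead)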
Harold Linden
Now that I think of it, encoding sparse arrays as objects would have issues currently because all object keys in JSON must be strings, so tables would lose their "array-ness" if we encoded them as objects automatically.
Might need to tag the keys as "integer-like" for sparse arrays for this to work seamlessly, and we'll need an SL-specific serialization mode for that. I'll merge these posts.
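To make the round-trip problem concrete (assuming decode maps JSON object keys to Lua string keys, which is where the "array-ness" gets lost):
-- A naive object fallback would have to encode { [11] = true } as {"11": true},
-- and decoding that back gives a table keyed by the string "11", not the number 11:
local t = lljson.decode('{"11": true}')
print(t["11"]) --> true
print(t[11])   --> nil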
Harold Linden
marked this post as
planned
Thanks for bringing this up in the meeting! Some variant of this seems reasonable to me in the short term.
There is still the ambiguity problem of what to do with legitimately string-ized vectors (use <<(whatever)>> as a marker that this is not to be interpreted specially?)
Overloading JSON by stuffing semantic information in a string value and interpreting that specially makes me a bit queasy as a data nut, but it's the solution that's least likely to give everyone a big headache.
It'd also give us a better way to support sparse tables, by specifying that table keys (JSON objects only allow string keys) should actually be treated as numbers.
Harold Linden
I'd initially been thinking about a binary format for space-sensitive applications like storage in linkset data, but zlib-compressed JSON is probably competitive for SLua's workloads. I'll look at exposing zlib (de)compression as well.
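As a sketch of how a script might use that (llzlib here is an entirely hypothetical binding name, shown only to illustrate the intended pattern for linkset data storage):
local bigTable = { pos = vector(1, 2, 3), hits = 42 }
local json = lljson.encode(bigTable)
local packed = llzlib.deflate(json)     -- hypothetical compression binding
-- ...store "packed" in linkset data, then later...
local restored = lljson.decode(llzlib.inflate(packed))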
SuzannaLinn Resident
Harold Linden
Could the current lljson.encode/decode functions remain for standard JSON, while new functions, such as encodell/decodell, are added to handle the extended JSON for SL (LLON)?
Harold Linden
SuzannaLinn Resident Yep, my intention is to make it explicit via a separate function or a second arg with a table of options, rather than handing people a loaded footgun :)
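For example (hypothetical shapes only, nothing committed; encodell is the name floated above and sl_types is a made-up option name):
local t = { pos = vector(1, 2, 3) }
-- Option A: a second argument carrying a table of options
local json = lljson.encode(t, { sl_types = true })
-- Option B: a separate, clearly named entry point
local json2 = lljson.encodell(t)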
Nexii Malthus
Harold Linden Doesn't zstd offer much better de-/compression speeds, and isn't it just as widely supported?
(zstd 1.5.7 -1 at 510 MB/s / 1550 MB/s versus zlib 1.3.1 -1 at 105 MB/s / 390 MB/s)
Harold Linden
Nexii Malthus I'd have to see how that scales for relatively small payload sizes. If setup has a constant cost and that constant is high, that could be a problem, given that the payloads for SLua will be small.
I'm not particular about the compression algo, though. I just want an implementation that's not going to add a ton of code to the software bill of materials, and zlib is pretty small.
Nexii Malthus
Harold Linden it's by facebook for the web and integrated into browsers for faster page loading -- https://caniuse.com/zstd -- and has a special mode for small data: https://facebook.github.io/zstd/#small-data
Harold Linden
Nexii Malthus I'm familiar with zstd and do use it, it's just that the "small data" mode they refer to relies on you having a much larger dictionary elsewhere that's representative of what you're trying to compress. That dictionary also has to be transmitted out-of-band.
That makes a lot of sense when you know for a given use-case you're compressing HTML / JSON of a very regular form, not as much when you're providing a general-purpose compression API.
ETA: and of course, latency is more important here than in most cases; a low constant setup cost is ideal, since we want scripts to be able to complete their work in 0.0001 seconds in many cases.
I'll have to look at benchmarking it with our actual usecase to see what's best for throughput and memory usage.
SungAli Resident
Lachesis, JSON for compatibility with existing structures and human readability, but I would also like, and am considering suggesting, a more compact serialization option for the JSON library that is not particularly human readable. This wouldn't technically be proper JSON, but by including it as an option to the encode and decode functions, scripters could readily switch between the two by changing a single value: using JSON (or my variation on it) while debugging, then switching to the more efficient format for production.
Lachesis Ethereal
Does that really have to be JSON, or is it actually just a serializer/deserializer you want?