git clone 'https://github.com/conspack/cl-conspack.git'
(ql:quickload :cl-conspack)
Recent changes: properties now require WITH-PROPERTIES if used outside an ENCODE or DECODE; see below.
CONSPACK was inspired by MessagePack, and by the general lack of features among prominent serial/wire formats:
JSON isn't terrible, but it can become rather large and is potentially susceptible to READ exploits (as per the recently fixed “10e99999999” bug in SBCL).
BSON (binary JSON) doesn't really solve much; though it encodes numbers, it's not particularly smaller or more featureful than JSON.
MessagePack is small, but lacks significant features; it can essentially encode arrays or maps of numbers, and any interpretation beyond that is up to the receiver.
Protobufs and Thrift are static.
It should be noted that, significantly, none of these support references. Of course, references can be implemented at a higher layer (e.g., JSPON), but this requires implementing an entire additional layer of abstraction and escaping, including rewalking the parsed object hierarchy and looking for specific signatures, which can be error-prone and hurts performance.
Additionally, none of these appear to have much in the way of security, and communicating with an untrusted peer is probably not recommended.
CONSPACK, on the other hand, attempts to be a more robust solution:
Richer set of data types, differentiating between arrays, lists, maps, typed-maps (for encoding classes/structures, etc.), numbers, strings, symbols, and a few more.
Very compact representation that can be smaller than MessagePack.
In-stream references, including optional forward references, which can allow for shared or circular data structures. Additionally, remote references allow the receiver the flexibility to parse and return its own objects without further passes on the output.
Security, including byte-counting for (estimated) maximum output size, and the elimination of circular data structures.
Speed: using fast-io, encoding and decoding can be many times faster than alternatives, even while tracking references (and faster still without!).
See SPEC for complete details on encoding.
cl-conspack is simple to use:
(encode '(1 2 3)) ;; => #(40 4 16 1 16 2 16 3 0)
(decode (encode '(1 2 3))) ;; => (1 2 3)
;; Smaller if the element-type is known:
(encode (fast-io:octets-from '(1 2 3)))
;; => #(36 3 20 1 2 3)
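A quick round-trip check, as a sketch assuming Quicklisp is installed: it re-checks the bytes quoted above for '(1 2 3) and decodes a mixed list (the values are arbitrary) back to an EQUAL list. Evaluate it form by form at the REPL or with LOAD, so the package exists before CONSPACK: symbols are read.

```lisp
(ql:quickload :cl-conspack)

;; The encoding of '(1 2 3) quoted in the text above.
(defparameter *list-bytes* (conspack:encode '(1 2 3)))

;; Arbitrary mixed data: an integer, a string, a keyword and a single-float.
(defparameter *mixed* (list 1 "two" :three 4.5))

;; DECODE of ENCODE should give back a list EQUAL to the original.
(defparameter *round-trip* (conspack:decode (conspack:encode *mixed*)))
```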
Conspack provides the ability to serialize and deserialize objects of any kind.
The easiest way, for the common case:
(conspack:defencoding my-class
  slot-1 slot-2 slot-3)
This expands to the more flexible way, which specializes ENCODE-OBJECT and DECODE-OBJECT:
(defmethod conspack:encode-object ((object my-class) &key &allow-other-keys)
  (conspack:slots-to-alist (object)
    slot-1 slot-2 slot-3 ...))

(defmethod conspack:decode-object ((class (eql 'my-class)) alist
                                   &key &allow-other-keys)
  (conspack:alist-to-slots (alist :class my-class)
    slot-1 slot-2 slot-3))
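A concrete round trip through DEFENCODING, as a sketch: the POINT class and its accessors are hypothetical, made up for this example. The decoded object should be a fresh POINT carrying the same slot values.

```lisp
(ql:quickload :cl-conspack)

;; Hypothetical class used only for this example.
(defclass point ()
  ((x :initarg :x :accessor point-x)
   (y :initarg :y :accessor point-y)))

;; Serialize the X and Y slots, exactly like the DEFENCODING form above.
(conspack:defencoding point
  x y)

;; Encode an instance and decode it back into a new POINT.
(defparameter *point-copy*
  (conspack:decode (conspack:encode (make-instance 'point :x 3 :y 4))))
```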
ENCODE-OBJECT should specialize on the object and return an alist. The alist returned will be checked for circularity if tracking-refs is in use.
DECODE-OBJECT should specialize on (eql 'class-name), and produce an object based on the alist.
As you can see, this does not require that objects be in any particular format, or that you store any particular slots or values. Nor does it specify how you restore an object.
But for the “normal” case, SLOTS-TO-ALIST and ALIST-TO-SLOTS are provided to build and restore from alists, and DEFENCODING can define all of this in one simple form.
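Because ENCODE-OBJECT only has to return an alist, you are free to store something other than the raw slots. In the sketch below, the RECT class and the :W/:H keys are hypothetical; only the dimensions are stored, and a derived slot is recomputed on decode by going through MAKE-INSTANCE rather than ALIST-TO-SLOTS.

```lisp
(ql:quickload :cl-conspack)

;; Hypothetical class: AREA is derived from WIDTH and HEIGHT, so it is never stored.
(defclass rect ()
  ((width :initarg :width :reader rect-width)
   (height :initarg :height :reader rect-height)
   (area :reader rect-area)))

(defmethod initialize-instance :after ((r rect) &key)
  ;; Recompute the derived slot whenever a RECT is created.
  (setf (slot-value r 'area) (* (rect-width r) (rect-height r))))

;; Encode only the two dimensions, under keys of our own choosing.
(defmethod conspack:encode-object ((object rect) &key &allow-other-keys)
  (list (cons :w (rect-width object))
        (cons :h (rect-height object))))

;; Rebuild via MAKE-INSTANCE so the :AFTER method restores AREA.
(defmethod conspack:decode-object ((class (eql 'rect)) alist &key &allow-other-keys)
  (make-instance 'rect
                 :width (cdr (assoc :w alist))
                 :height (cdr (assoc :h alist))))

(defparameter *rect-copy*
  (conspack:decode (conspack:encode (make-instance 'rect :width 2 :height 5))))
```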
Circularity tracking is not on by default; you can enable it for a particular block of encodes or decodes by using tracking-refs:
(tracking-refs ()
  (decode (encode CIRCULAR-OBJECT)))
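A minimal sketch of shared structure under tracking-refs, assuming (per the description of in-stream references above) that an object reached twice is written once and referenced afterwards; the variable names are illustrative. If that holds, both elements of the decoded list should be the very same object, not two equal copies.

```lisp
(ql:quickload :cl-conspack)

;; One list object referenced twice from an outer list.
(defparameter *shared* (list 1 2 3))
(defparameter *outer* (list *shared* *shared*))

;; Encode and decode inside the same TRACKING-REFS block.
(defparameter *decoded-outer*
  (conspack:tracking-refs ()
    (conspack:decode (conspack:encode *outer*))))
```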
“Remote” references are application-level references. You may encode a reference using an arbitrary object as a descriptor:
(encode (r-ref '((:url . "http://..."))))
When decoding, you may provide a function to handle these:
(with-remote-refs (lambda (x) (decode-url x))
  (decode OBJECT))
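A sketch of resolving remote references at decode time. The *USERS* table and the :USER-ID descriptor key are hypothetical application details; the handler function presumably receives each decoded descriptor and returns whatever object should stand in for it.

```lisp
(ql:quickload :cl-conspack)

;; Hypothetical application table that the remote references point into.
(defparameter *users* '((42 . "alice") (7 . "bob")))

;; Encode two remote references whose descriptors are small alists.
(defparameter *ref-bytes*
  (conspack:encode (list (conspack:r-ref '((:user-id . 42)))
                         (conspack:r-ref '((:user-id . 7))))))

;; While decoding, replace each descriptor with the matching user name.
(defparameter *resolved*
  (conspack:with-remote-refs
      (lambda (descriptor)
        (cdr (assoc (cdr (assoc :user-id descriptor)) *users*)))
    (conspack:decode *ref-bytes*)))
```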
If you have a relatively small static set of symbols you will always use for a particular encoding/decoding, you may want to use indexes. These allow symbols to be very tightly packed: for up to 15 symbols, a single byte can encode the symbol! For up to 256, two bytes, and so on.
Trivially:
(cpk:with-index (specifier-1 specifier-2 specifier-3)
  (cpk:encode '(specifier-1 specifier-2 specifier-3)))
;; => #(40 4 176 177 178 0)
;; Contrast this with:
(cpk:encode '(specifier-1 specifier-2 specifier-3))
;; #(40 4 130 64 11 83 80 69 67 73 70 73 69 82 45 49 129 64 16 67 79
;; 77 77 79 78 45 76 73 83 80 45 85 83 69 82 130 64 11 83 80 69 67 73
;; 70 73 69 82 45 50 129 64 16 67 79 77 77 79 78 45 76 73 83 80 45 85
;; 83 69 82 130 64 11 83 80 69 67 73 70 73 69 82 45 51 129 64 16 67 79
;; 77 77 79 78 45 76 73 83 80 45 85 83 69 82 0)
(This is a somewhat excessive example, since long non-keyword symbols are used. Shorter keyword symbols would be relatively shorter, but this is the general case.)
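A sketch showing the index used on both sides: since the indexed bytes carry only positions, the decoder presumably needs the identical symbol list. It also compares sizes; the 6-byte figure comes from the example output above.

```lisp
(ql:quickload :cl-conspack)

(defparameter *symbols* '(specifier-1 specifier-2 specifier-3))

;; Encode and decode under the same index; the symbols should come back intact.
(defparameter *indexed-round-trip*
  (conspack:with-index (specifier-1 specifier-2 specifier-3)
    (conspack:decode (conspack:encode *symbols*))))

;; Encoded sizes with and without the index.
(defparameter *indexed-size*
  (conspack:with-index (specifier-1 specifier-2 specifier-3)
    (length (conspack:encode *symbols*))))
(defparameter *plain-size* (length (conspack:encode *symbols*)))
```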
For more “realistic” use, you may define an index and refer to it:
(define-index index-name
  symbol-1 symbol-2 ...)

(with-named-index 'index-name
  (encode ...))
For instance, you may define multiple indexes for multiple different format versions, read the version, and use the appropriate index:
(define-index version-1 ...)
(define-index version-2 ...)

(let ((version (decode-stream s)))
  (with-named-index version
    ;; Decode the rest of the stream appropriately. You may want to
    ;; do more checking on VERSION if security is required...
    (decode-stream s)))
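A sketch of named indexes for two hypothetical format versions; the VERSION-1/VERSION-2 names and their symbols are made up. To stay self-contained, both sides simply agree on VERSION-2 here rather than reading the version from a stream.

```lisp
(ql:quickload :cl-conspack)

;; Two hypothetical format versions with different symbol sets and orders.
(conspack:define-index version-1
  name email)
(conspack:define-index version-2
  email name phone)

;; Encode with VERSION-2...
(defparameter *v2-bytes*
  (conspack:with-named-index 'version-2
    (conspack:encode '(phone name))))

;; ...and decode with the same named index.
(defparameter *v2-decoded*
  (conspack:with-named-index 'version-2
    (conspack:decode *v2-bytes*)))
```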
Note that using tracking-refs will also help encode symbols efficiently, but not quite as efficiently as an index. However, tracking-refs is a perfectly suitable option, especially if flexibility is desired, since all symbol information is encoded and nothing special is needed for decoding.
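A sketch comparing sizes for one long symbol repeated five times, with and without tracking-refs (the list contents are arbitrary). Per the text above, the tracked encoding should be smaller, and decoding needs no index.

```lisp
(ql:quickload :cl-conspack)

;; The same long, non-keyword symbol repeated five times.
(defparameter *repeated* (make-list 5 :initial-element 'specifier-1))

(defparameter *plain-length* (length (conspack:encode *repeated*)))

(defparameter *tracked-length*
  (conspack:tracking-refs ()
    (length (conspack:encode *repeated*))))

;; No index is involved: the symbol information travels in the stream itself.
(defparameter *tracked-round-trip*
  (conspack:tracking-refs ()
    (conspack:decode (conspack:encode *repeated*))))
```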
(Properties now require a WITH-PROPERTIES block in some circumstances; see below.)
Properties are a way to specify additional information about an object that may be useful at decode time. For instance, while hash tables are supported as maps, there are no bits to specify the :test parameter, so decoding a hash table of strings would produce a useless object. In this case, the :test property is set when encoding and checked when decoding hash tables.
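A sketch of the :test property at work: a hash table keyed by strings (the keys are arbitrary) should come back with an EQUAL test, so lookups with freshly made strings still succeed.

```lisp
(ql:quickload :cl-conspack)

;; A hash table keyed by strings, which is only useful with an EQUAL test.
(defparameter *table* (make-hash-table :test 'equal))
(setf (gethash "apple" *table*) 1
      (gethash "pear" *table*) 2)

;; The :test property travels with the map and is applied on decode.
(defparameter *table-copy* (conspack:decode (conspack:encode *table*)))
```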
You may specify arbitrary properties for arbitrary objects; the only restriction is that the objects must be compared by EQ.
(conspack:with-properties ()
  (let ((object (make-instance ...)))
    (setf (property object :foo) 'bar)
    (property object :foo))) ;; => BAR
This sets the :foo property to the symbol bar, and it is encoded along with the object. Note that this increases the encoded size by the amount required to store a map of symbols to values.
When decoding, you can access properties about an object via *current-properties*:
(defmethod decode-object (...)
  (let ((prop (getf *current-properties* NAME)))
    ...))
You may remove them with remove-property or remove-properties.
Properties are now only available within a WITH-PROPERTIES block. This has a number of benefits, including some thread safety and ensuring that properties don't stick around forever.
ENCODE and DECODE have implicit WITH-PROPERTIES blocks: you don't need to specify WITH-PROPERTIES if you use properties inside ENCODE-OBJECT or DECODE-OBJECT, or if you encode and decode objects that have implicit properties. You only need it if you wish to access properties outside of the encode or decode (e.g., preassigning properties to be encoded).
Conspack provides some level of “security” by approximately limiting the number of bytes allocated when reading objects.
By default, because format sizes are prespecified statically, it's possible to specify extremely large allocations for, e.g., arrays with only a few bytes. Obviously, this is not suitable for untrusted conspack data.
The solution is simply to cap allocations:
(with-conspack-security (:max-bytes 200000)
  (decode ...))
Since actual allocation sizes are rather difficult to get in most Lisps, this approximates the allocation based on how big each object might be, e.g.:
arrays: pointer-size * array-size
strings: string-length
numbers: number-size
Each object header is tallied against the limit just prior to its decoding; if the object would exceed the allowed bytes, decoding halts with an error.
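To make the tallying concrete, here is a hypothetical model of the approximation described above. It is not cl-conspack's internal code: the 8-byte pointer size, the function names and the condition class are all assumptions. Each header is charged against the remaining budget before decoding, and an error is signalled instead of letting the budget go negative.

```lisp
;; Hypothetical illustration only; not cl-conspack's implementation.
(define-condition byte-limit-exceeded (error)
  ((needed :initarg :needed :reader needed)
   (remaining :initarg :remaining :reader remaining)))

(defparameter *pointer-size* 8) ; assumption: 64-bit pointers

(defun estimated-size (kind count)
  "Rough byte estimate for a header of KIND with COUNT elements (or bytes)."
  (ecase kind
    (:array (* *pointer-size* count))
    (:string count)
    (:number count)))

(defun charge (budget kind count)
  "Return BUDGET minus the estimate; signal BYTE-LIMIT-EXCEEDED if it does not fit."
  (let ((needed (estimated-size kind count)))
    (when (> needed budget)
      (error 'byte-limit-exceeded :needed needed :remaining budget))
    (- budget needed)))
```

For example, a 100-element array costs 8 * 100 = 800 bytes, leaving 200 of a 1000-byte budget.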
Further options may be added in the future.
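A sketch of WITH-CONSPACK-SECURITY rejecting oversized input: a 1000-character string should exceed a 100-byte cap but fit easily under 200000. DECODE-WITH-LIMIT is a hypothetical helper, and it catches ERROR broadly because the exact condition class is not named in this document.

```lisp
(ql:quickload :cl-conspack)

;; An encoded 1000-character string.
(defparameter *big-string-bytes*
  (conspack:encode (make-string 1000 :initial-element #\a)))

(defun decode-with-limit (bytes limit)
  "Hypothetical helper: decode BYTES under a byte cap, or return :REJECTED."
  (handler-case
      (conspack:with-conspack-security (:max-bytes limit)
        (conspack:decode bytes))
    (error () :rejected)))
```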
Since conspack is a binary format, it's rather difficult for humans to read just by looking at the stream of bytes. Thus an EXPLAIN feature is provided. This is mostly useful for debugging the format; however, it may be of interest otherwise, and it can certainly be helpful when creating other implementations.
For instance:
(explain (encode '(1 2 3)))
;; =>
((:LIST 4
  ((:NUMBER :INT8 1) (:NUMBER :INT8 2) (:NUMBER :INT8 3) (:BOOLEAN NIL)))
 END-OF-FILE)