Regina 7.3 Calculation Engine
Tight encodings of data

Regina includes support for tight encodings, which are encodings of various data types as short printable strings.

Tight encodings have the following properties:

  • They use only printable ASCII characters (the 94 ASCII values from 33 to 126 inclusive), and do not contain any whitespace. This means (for example) you can use them as whitespace-separated tokens in plain text files. However, they do make use of all of the ASCII punctuation symbols, and so you must take care when (for example) trying to hard-code them as strings in source code, or using them as components of filenames.
  • They aim to be short (typically much shorter than the usual human-readable representations, such as decimal representations of integers or full text representations of polynomials).
  • When reading an encoded object character-by-character, the encoding contains enough information to know when the last character has been read (even without the foreknowledge of whether the input stream contains more characters).
  • Objects with the same inherent value, even if they use different underlying C++ types, will encode to the same string. For example, the integer 7 will have the same encoding regardless of whether it is stored as an int, a long, or a regina::Integer. Note that this guarantee only extends to types that "conceptually" intend to represent the same broad types of objects, possibly with different limitations. So, for example, there is no guarantee that the integer 7, the rational 7/1, and/or the constant polynomial 7 would encode to the same string.
  • Conversely, objects of the same type but with different inherent values will encode to different strings. So, for example, the integers 7 and -7 will have different encodings.

A consequence of the last two points is that, if the type of an object is known in advance, then its value can be recovered from its encoding. However, the encoding does not contain enough information to deduce the type if this is not already known.

Because encodings contain enough information to identify where they end, this means that you can encode a sequence of objects by concatenating the individual encodings with no separators, and (assuming the types of the objects are fixed) this will be enough to guarantee that different sequences likewise have different encodngs.

Tight encodings were originally designed to support perfect hashing (essentially "compressing" data into a short printable string whilst preserving the correctness of equality tests). As a result, they were originally intended to be used only in one direction. However, Regina does provide matching decoding routines if you need to reconstruct objects from their tight encodings.

For native C++ data types where tight encodings and decodings are supported, these are provided in the header utilities/tightencoding.h through overloads of the global functions tightEncoding() and tightDecoding() (which work with strings), and tightEncode() and tightDecode() (which work with input/output streams).

For Regina's own data types where tight encodings and decodings are supported, these are provided through member functions tightEncoding(), tightEncode(), tightDecoding() and tightDecode() within the corresponding classes.


Copyright © 1999-2023, The Regina development team
This software is released under the GNU General Public License, with some additional permissions; see the source code for details.
For further information, or to submit a bug or other problem, please contact Ben Burton (bab@maths.uq.edu.au).