Regina 7.0 Calculation Engine
Tight encodings of data

Regina includes support for tight encodings, which are encodings of various data types as short printable strings.

Tight encodings have the following properties:

  • They use only printable ASCII characters (the 94 ASCII values from 33 to 126 inclusive).
  • They aim to be short (typically much shorter than the usual human-readable representations, such as decimal representations of integers or full text representations of polynomials).
  • When reading an encoded object character-by-character, the encoding contains enough information to know when the last character has been read (even without the foreknowledge of whether the input stream contains more characters).
  • Objects with the same inherent value, even if they use different underlying C++ types, will encode to the same string. For example, the integer 7 will have the same encoding regardless of whether it is stored as an int, a long, or a regina::Integer. Note that this guarantee only extends to types that "conceptually" intend to represent the same broad types of objects, possibly with different limitations. So, for example, there is no guarantee that the integer 7, the rational 7/1, and/or the constant polynomial 7 would encode to the same string.
  • Conversely, objects of the same type but with different inherent values will encode to different strings. So, for example, the integers 7 and -7 will have different encodings.

A consequence of the last two points is that, if the type of an object is known in advance, then its value can in theory be recovered from its encoding. However, the encoding does not contain enough information to deduce the type if this is not already known.

Because encodings contain enough information to identify where they end, this means that you can encode a sequence of objects by concatenating the individual encodings with no separators, and (assuming the types of the objects are fixed) this will be enough to guarantee that different sequences likewise have different encodngs.

Regina does not provide decoding routines, though (as noted above) this should be possible if the underlying types are known. This is because tight encodings were originally designed for applications such as perfect hashing, where the aim is essentially to "compress" the data in a short printable string whilst preserving the correctness of equality tests.

For native C++ data types where tight encodings are supported, these are provided in the header utilities/tightencoding.h through overloads of the global functions tightEncoding() (which returns a string) and tightEncode() (which writes to an output stream).

For Regina's own data types where tight encodings are supported, these are provided through tightEncode() and tightEncoding() member functions of the corresopnding classes.


Copyright © 1999-2021, The Regina development team
This software is released under the GNU General Public License, with some additional permissions; see the source code for details.
For further information, or to submit a bug or other problem, please contact Ben Burton (bab@maths.uq.edu.au).