Thursday, July 8, 2010

Characters and Character Handling

There will be three character representations: VARCHAR, an 8-bit representation; NVARCHAR, a 16-bit representation (with an internal compression used for all-ASCII strings); and NVARCHAR4, a 32-bit (4-byte) representation. Query text will be handled as NVARCHAR4, although all query keywords and identifiers will be C-locale alphanumerics and therefore representable in 7 bits. BINARY will also be available.
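As a rough illustration of what those widths mean for storage, the Python below encodes a string under each representation. It is a sketch only: it assumes VARCHAR uses Latin-1 as its 8-bit code page, NVARCHAR uses 16-bit little-endian code units, NVARCHAR4 uses UTF-32, and that the all-ASCII compression is a hypothetical one-byte flag followed by one byte per character. None of those details are settled above, and `encode` is a made-up helper.

```python
def encode(value: str, kind: str) -> bytes:
    """Hypothetical encoder for the three character representations."""
    if kind == "VARCHAR":
        # Assumption: the 8-bit code page is Latin-1 (raises on wider characters).
        return value.encode("latin-1")
    if kind == "NVARCHAR":
        if value.isascii():
            # Assumed compression: flag byte 0x00, then 1 byte per ASCII character.
            return b"\x00" + value.encode("ascii")
        # Otherwise: flag byte 0x01, then 16-bit little-endian code units.
        return b"\x01" + value.encode("utf-16-le")
    if kind == "NVARCHAR4":
        # 4 bytes per character (UTF-32, no byte-order mark).
        return value.encode("utf-32-le")
    raise ValueError(f"unknown representation: {kind}")


for kind in ("VARCHAR", "NVARCHAR", "NVARCHAR4"):
    print(kind, len(encode("SELECT", kind)), len(encode("café", kind)))
```

Under these assumptions, "SELECT" takes 6, 7, and 24 bytes respectively, while "café" takes 4, 9, and 16, since the é disables the all-ASCII compression in NVARCHAR.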

In addition to the above, there will be a couple of compact representations available: RCHAR4 (4 bits per character for an enumerated set of 16 characters) and RCHAR6 (6 bits per character for an enumerated set of 64 characters). The specific characters in an enumeration will be defined by creating a TYPE, i.e.:

CREATE CHARACTER TYPE HEXSTR USING RCHAR4 VALUES ('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');

CREATE RECORD TYPE T (HEXVAL HEXSTR);

Note that HEXSTR may be a "builtin".
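The physical layout of RCHAR4 and RCHAR6 isn't spelled out here, but one plausible reading is plain bit-packing of each character's position in the enumeration. The Python below is a minimal sketch under that assumption: `RCharType` is a hypothetical stand-in for CREATE CHARACTER TYPE, codes are packed most-significant-bit first, the last byte is zero-padded, and the character count is assumed to be stored separately so the padding isn't decoded as an extra character.

```python
class RCharType:
    """Hypothetical model of CREATE CHARACTER TYPE ... USING RCHAR4/RCHAR6."""

    def __init__(self, name: str, bits: int, values: str):
        if len(values) > (1 << bits) or len(set(values)) != len(values):
            raise ValueError("enumeration too large or has duplicates")
        self.name = name
        self.bits = bits
        self.values = values
        self.codes = {ch: i for i, ch in enumerate(values)}

    def pack(self, text: str) -> bytes:
        out = bytearray()
        acc, nbits = 0, 0
        for ch in text:
            if ch not in self.codes:
                raise ValueError(f"{ch!r} is not a {self.name} character")
            acc = (acc << self.bits) | self.codes[ch]
            nbits += self.bits
            while nbits >= 8:  # emit whole bytes, most significant bits first
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
        if nbits:
            out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
        return bytes(out)

    def unpack(self, data: bytes, length: int) -> str:
        mask = (1 << self.bits) - 1
        chars = []
        acc, nbits = 0, 0
        for byte in data:
            acc = (acc << 8) | byte
            nbits += 8
            while nbits >= self.bits and len(chars) < length:
                nbits -= self.bits
                chars.append(self.values[(acc >> nbits) & mask])
            acc &= (1 << nbits) - 1  # drop bits already decoded
        return "".join(chars)


HEXSTR = RCharType("HEXSTR", 4, "0123456789ABCDEF")
packed = HEXSTR.pack("DEADBEEF")
print(packed.hex(), HEXSTR.unpack(packed, 8))
```

With the hex digits enumerated in order, RCHAR4 packing reproduces the familiar two-digits-per-byte layout, so "DEADBEEF" becomes the four bytes 0xDE 0xAD 0xBE 0xEF, half its VARCHAR size. An RCHAR6 type packs four characters into every three bytes.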

Sorting and Ordering

Ordering for ORDER BY and for ordered indexes will use native byte ordering unless an ORDER BY function is used, either in the query or in the CREATE INDEX definition. (What exactly ORDER BY functions mean will be discussed later, after I think about them a bit more...)
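Since ORDER BY functions are still undefined, the sketch below only illustrates the default: comparing stored bytes directly, memcmp-style, with an optional key function standing in for a hypothetical ORDER BY function. It assumes 32-bit big-endian code units, one layout under which byte order agrees with code-point order (a little-endian layout would not). `native_key` and `order_by` are illustrative names, not part of the design.

```python
from typing import Callable, Iterable, List, Optional


def native_key(value: str) -> bytes:
    # Assumption: 32-bit big-endian code units, so comparing the bytes
    # lexicographically (like memcmp) matches code-point order.
    return value.encode("utf-32-be")


def order_by(values: Iterable[str],
             order_fn: Optional[Callable[[str], object]] = None) -> List[str]:
    """Sort by native byte order unless a (hypothetical) ORDER BY function is given."""
    key = order_fn if order_fn is not None else native_key
    return sorted(values, key=key)


names = ["bob", "Bob", "alice"]
print(order_by(names))                # native byte order: uppercase sorts first
print(order_by(names, str.casefold))  # an ORDER BY function overrides the default
```

An ordered index would use the same rule: without an ORDER BY function in its CREATE INDEX definition, entries would be kept in native byte order, so "Bob" lands before "alice".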

Also note that BINARY cannot be interpreted as character data, so it supports only native single-byte (byte-by-byte) ordering.
