CONNECT BIN Table Type

Overview

A table of type BIN is physically a binary file in which each row is a logical record of fixed length[1]. Within a record, each column field has a fixed offset and length, as in FIX tables. What is specific to BIN tables is that numeric values are encoded in the native platform representation, so no conversion is needed to use them in expressions.
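To illustrate the difference between a numeric value stored as text and one stored in native binary form, here is a small sketch using Python's standard struct module. The helper names are hypothetical. The space-padded text encoding only stands in for a text-based table and is an assumption, not CONNECT's exact FIX format. The binary side uses a 4-byte integer in host byte order ("=" prefix, standard sizes).

```python
import struct

def as_text_field(value: int, width: int) -> bytes:
    """Hypothetical text encoding: right-aligned digits padded with spaces (assumed padding)."""
    return str(value).rjust(width).encode("ascii")

def as_bin_native(value: int) -> bytes:
    """Native encoding: a 4-byte integer in host byte order ('=' uses standard sizes)."""
    return struct.pack("=i", value)

text_field = as_text_field(318, 5)  # b'  318': digits that must be parsed before arithmetic
bin_field = as_bin_native(318)      # 4 raw bytes holding the integer itself

# Reading the binary field back reinterprets the bytes; no text parsing is involved.
(decoded,) = struct.unpack("=i", bin_field)
print(text_field, len(bin_field), decoded)
```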

The lines of a BIN file do not need to be separated by characters such as CR and/or LF, but they can be. In that case, the lrecl option must be specified accordingly, so that the record length includes the line-ending characters.

Note: Unlike with the DOS and FIX types, the width of a field is the length of its internal representation in the file. For instance, for a column declared as:

number int(5) not null,

The field width in the file is 4 bytes, the size of a binary integer, regardless of the (5) in the declaration. This width is used to calculate the offset of the next field when that offset is not specified. Therefore, if the next field actually starts 5 bytes after this one, this declaration is not enough, and the flag option must be used on the next field to specify its offset.
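The offset rule above can be sketched as follows. This is not CONNECT code: field_offsets is a hypothetical helper, the second column (a char(10)) is invented for the illustration, and the explicit mapping merely plays the role the flag option plays for a column.

```python
import struct

def field_offsets(widths, explicit=None):
    """Return each field's offset. A field without an explicit offset starts
    right after the previous one; `explicit` maps field index -> offset
    (standing in for the flag option in this sketch)."""
    explicit = explicit or {}
    offsets, pos = [], 0
    for i, width in enumerate(widths):
        pos = explicit.get(i, pos)
        offsets.append(pos)
        pos += width
    return offsets

# number int(5): the file field is a 4-byte binary integer, not 5 characters.
int_width = struct.calcsize("=i")
widths = [int_width, 10]  # int(5) followed by a hypothetical char(10) column

print(field_offsets(widths))          # derived: next field at offset 4
print(field_offsets(widths, {1: 5}))  # declared: next field at offset 5
```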

Type Conversion in BIN Tables

Here are the default correspondences between column types and field formats in the file:

Column type     File default format
Char(n)         Text of n characters
Date            Integer (4 bytes)
Int(n)          Integer (4 bytes)
Smallint(n)     Short integer (2 bytes)
TinyInt(n)      Char (1 byte)
Bigint(n)       Large integer (8 bytes)
Double(n,d)     Double floating point (8 bytes)
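As a cross-check of the sizes in this table, the default formats can be written as Python struct codes. The "=" prefix gives standard sizes with no padding. Char(10) uses an arbitrary example width. Treating TinyInt as signed is an assumption, and the meaning of the 4-byte Date value is not modeled.

```python
import struct

# Sketch: default BIN field formats expressed as Python struct codes.
# '=' selects host byte order with standard sizes and no alignment padding.
DEFAULT_FORMATS = {
    "Char(10)": "=10s",  # n bytes of text; n = 10 is only an example
    "Date": "=i",        # stored as a 4-byte integer (value encoding not modeled)
    "Int": "=i",
    "Smallint": "=h",
    "TinyInt": "=b",     # one byte; signedness is an assumption here
    "Bigint": "=q",
    "Double": "=d",
}

for column_type, code in DEFAULT_FORMATS.items():
    print(f"{column_type:10} {struct.calcsize(code)} bytes")
```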

However, the column type does not have to match the field format in the table file. In particular, this is the case for field formats corresponding to numeric types that CONNECT does not handle[2]. Indeed, BIN table files may contain float numbers, or binary numbers of any byte length in big-endian or little-endian representation[3]. Also, as with DOS or FIX tables, you may want to handle some character fields as numeric, or vice versa.

This is why, when the field format differs from the default for the column type, it can be specified with the field_format column option in the CREATE TABLE statement. Here are the field formats available for BIN tables:

Field_format          Internal representation
[n]{L or B or H}[n]   n-byte binary number in little-endian, big-endian or host-endian representation
C                     Character string (n bytes)
I                     Integer (4 bytes)
D                     Double float (8 bytes)
S                     Short integer (2 bytes)
T                     Tiny integer (1 byte)
G                     Big integer (8 bytes)
F or R                Real or float (4-byte floating-point number)
X                     Use the default field format for the column type

All field formats except the first one are one-character specifications[4]. 'X' is equivalent to not specifying the field format. For the 'C' format, n is the column width given with the column type. For the one-character numeric formats, the byte sizes shown are those used on most platforms, but they may differ on some. The G, I, S and T formats are deprecated because they correspond to supported data types, and they may not be supported in future versions.
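The L/B/H binary formats can be decoded with Python's int.from_bytes. This is only a sketch of the idea, not CONNECT's parser. The following behaviours are assumptions: the width digits may appear before or after the letter, a missing width falls back to the column type's default width, and values are read as signed two's complement. Both function names are hypothetical.

```python
import re
import sys

# Hypothetical pattern for '[n]{L|B|H}[n]' specifications such as 'L', 'L2' or '2B'.
_SPEC = re.compile(r"^(\d*)([LBH])(\d*)$", re.IGNORECASE)

def parse_binary_format(spec: str, default_width: int) -> tuple:
    """Return (byte order, width in bytes) for an L/B/H field format.
    A missing width falls back to default_width (assumption)."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"not an L/B/H binary format: {spec!r}")
    before, letter, after = m.groups()
    order = {"L": "little", "B": "big", "H": sys.byteorder}[letter.upper()]
    return order, int(before or after or default_width)

def decode_binary_field(raw: bytes, spec: str, default_width: int) -> int:
    """Decode one binary number field as a signed integer (assumed signed)."""
    order, width = parse_binary_format(spec, default_width)
    if len(raw) != width:
        raise ValueError(f"expected {width} bytes, got {len(raw)}")
    return int.from_bytes(raw, byteorder=order, signed=True)

print(decode_binary_field(b"\x3e\x01", "L2", 4))  # little endian
print(decode_binary_field(b"\x01\x3e", "B2", 4))  # big endian
```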

Example

Here is an example of a BIN table. The file record layout is assumed to be:

NNNNCCCCCCCCCCIIIISSFFFFSS

Here N represents numeric characters, C any characters, I integer bytes, S short integer bytes, and F float number bytes. The IIII field contains a date in numeric format.

The table could be created by:

create table testbal (
fig int(4) not null field_format='C',
name char(10) not null,
birth date not null field_format='L',
id char(5) not null field_format='L2',
salary double(9,2) not null default 0.00 field_format='F',
dept int(4) not null field_format='L2')
engine=CONNECT table_type=BIN block_size=5 file_name='Testbal.dat';
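The 26-byte record described by this CREATE TABLE can be mirrored with Python's struct module. In the sketch, "<" means little endian with no padding, matching the 'L' and 'L2' formats, and "f" is the 4-byte float of format 'F'. The space padding of text fields is an assumption. The birth value is passed as an opaque number because the date encoding is not modeled, and the 0 in the usage check is a placeholder. All helper names are hypothetical.

```python
import struct

# testbal record: NNNN CCCCCCCCCC IIII SS FFFF SS (packed, little endian)
FIELDS = ["fig", "name", "birth", "id", "salary", "dept"]
CODES = ["4s", "10s", "i", "h", "f", "h"]
RECORD = struct.Struct("<" + "".join(CODES))

def field_offsets():
    """Offsets obtained by adding up the field widths, as for a packed file."""
    offsets, pos = {}, 0
    for name, code in zip(FIELDS, CODES):
        offsets[name] = pos
        pos += struct.calcsize("<" + code)
    return offsets

def pack_row(fig, name, birth_number, ident, salary, dept):
    """Pack one row. fig is written as 4 characters (format 'C') and ident as a
    2-byte integer (format 'L2'); space padding is an assumption of this sketch."""
    return RECORD.pack(
        str(fig).rjust(4).encode("ascii"),
        name.ljust(10).encode("ascii"),
        birth_number,
        int(ident),
        salary,
        dept,
    )

print(RECORD.size, field_offsets())
```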

Specifying little-endian representation for binary values is not needed on most machines, but it makes both the CREATE TABLE statement and the table file portable to a machine using big-endian representation.

The field offsets and the file record length are calculated from the internal format of each column, possibly modified by its field format. They do not need to be specified for a packed binary file without line endings. If line endings are wanted, either specify the ending option or specify the lrecl option with the ending width added. The table can be filled by:

insert into testbal values
  (5500,'ARCHIBALD','1980-01-25','3789',4380.50,318),
  (123,'OLIVER','1953-08-10','23456',3400.68,2158),
  (3123,'FOO','2002-07-23','888',default,318);

Note that the types of the inserted values must match the column type, not the field format type.

The query:

select * from testbal;

returns:

fig   name       birth       id     salary   dept
5500  ARCHIBALD  1980-01-25  3789   4380.50  318
123   OLIVER     1953-08-10  23456  3400.68  2158
3123  FOO        2002-07-23  888    0.00     318

Numeric fields alignment

In binary files, numeric fields and the record length can be aligned on 4- or 8-byte boundaries to optimize performance on certain processors. This can be changed in the OPTION_LIST with an "align" option; "packed" (align=1) is the default.
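To show how alignment changes a layout, here is a sketch applied to the testbal fields. The alignment rule is an assumption, not CONNECT's documented algorithm: each numeric field starts on a multiple of min(field width, align), and the record length is rounded up to a multiple of align. The function name aligned_layout is hypothetical.

```python
def aligned_layout(widths, numeric, align=1):
    """Return (field offsets, record length) under an assumed alignment rule.
    align=1 reproduces the packed layout."""
    offsets, pos = [], 0
    for width, is_numeric in zip(widths, numeric):
        if is_numeric:
            boundary = min(width, align)
            pos = -(-pos // boundary) * boundary  # round pos up to the boundary
        offsets.append(pos)
        pos += width
    record_length = -(-pos // align) * align      # round the record up as well
    return offsets, record_length

# testbal fields: fig (4 chars), name (10 chars), birth (4), id (2), salary (4), dept (2)
WIDTHS = [4, 10, 4, 2, 4, 2]
NUMERIC = [False, False, True, True, True, True]

print(aligned_layout(WIDTHS, NUMERIC))           # packed layout
print(aligned_layout(WIDTHS, NUMERIC, align=4))  # 4-byte alignment
```

Under this assumed rule, 4-byte alignment moves birth from offset 14 to 16 and grows the record from 26 to 32 bytes.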


  1. Sometimes it can be a physical record if LF or CRLF have been written in the file.
  2. Most of these are obsolete because CONNECT supports all column types except float.
  3. The default endian representation used in the table file can be specified by setting the ENDIAN option to 'L' or 'B' in the option list.
  4. It can be specified with more than one character, but only the first one is significant.
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.

© 2021 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/connect-bin-table-type/