CONNECT DOS and FIX Table Types

Overview

Tables of type DOS and FIX are based on text files (see CONNECT Table Types - Data Files). Within a record, column fields are positioned at a fixed offset from the beginning of the record. Except sometimes for the last field, column fields are also of fixed length. If the last field has varying length, the type of the table is DOS. For instance, having the file dept.dat formatted like:

0318 KINGSTON       70012 SALES       Bank/Insurance
0021 ARMONK         87777 CHQ         Corporate headquarter
0319 HARRISON       40567 SALES       Federal Administration
2452 POUGHKEEPSIE   31416 DEVELOPMENT Research & development

You can define a table based on it with:

create table department (
  number char(4) not null,
  location char(15) not null flag=5,
  director char(5) not null flag=20,
  function char(12) not null flag=26,
  name char(22) not null flag=38)
engine=CONNECT table_type=DOS file_name='dept.dat';

Here the flag column option represents the offset of this column inside the records. If the offset of a column is not specified, it defaults to the end of the previous column and defaults to 0 for the first one. The lrecl parameter that represents the maximum size of a record is calculated by default as the end of the rightmost column and can be unspecified except when some trailing information exists after the rightmost column.

Note: A special case is files having an encoding such as UTF-8 (for instance specifying charset=UTF8) in which some characters may be represented with several bytes. Unlike the type size that MariaDB interprets as a number of characters, the lrecl value is the record size in bytes and the flag value represents the offset of the field in the record in bytes. If the flag and/or the lrecl value are not specified, they will be calculated by the number of characters in the fields multiplied by a value that is the maximum size in bytes of a character for the corresponding charset. For UTF-8 this value is 3 which is often far too much as there are very few characters requiring 3 bytes to be represented. When creating a new file, you are on the safe side by only doubling the maximum number of characters of a field to calculate the offset of the next field. Of course, for already existing files, the offset must be specified according to what it is in it.

Although the field representation is always text in the table file, you can freely choose the corresponding column type, characters, date, integer or floating point according to its contents.

Sometimes, as in the number column of the above department table, you have the choice of the type, numeric or characters. This will modify how the column is internally handled in characters 0021 is different from 21 but not in numeric as well as how it is displayed.

If the last field has fixed length, the table should be referred as having the type FIX. For instance, to create a table on the file boys.txt:

John      Boston      25/01/1986  02/06/2010
Henry     Boston      07/06/1987  01/04/2008
George    San Jose    10/08/1981  02/06/2010
Sam       Chicago     22/11/1979  10/10/2007
James     Dallas      13/05/1992  14/12/2009
Bill      Boston      11/09/1986  10/02/2008

You can for instance use the command:

create table boys (
  name char(12) not null,
  city char(12) not null,
  birth date not null date_format='DD/MM/YYYY',
  hired date not null date_format='DD/MM/YYYY' flag=36)
engine=CONNECT table_type=FIX file_name='boys.txt' lrecl=48;

Here some flag options were not specified because the fields have no intermediate space between them except for the last column. The offsets are calculated by default adding the field length to the offset of the preceding field. However, for formatted date columns, the offset in the file depends on the format and cannot be calculated by default. For fixed files, the lrecl option is the physical length of the record including the line ending character(s). It is calculated by adding to the end of the last field 2 bytes under Windows (CRLF) or 1 byte under UNIX. If the file is imported from another operating system, the ENDING option will have to be specified with the proper value.

For this table, the last offset and the record length must be specified anyway because the date columns have field length coming from their format that is not known by CONNECT. Do not forget to add the line ending length to the total length of the fields.

This table is displayed as:

name city birth hired
John Boston 1986-01-25 2010-06-02
Henry Boston 1987-06-07 2008-04-01
George San Jose 1981-08-10 2010-06-02
Sam Chicago 1979-11-22 2007-10-10
James Dallas 1992-05-13 2009-12-14
Bill Boston 1986-09-11 2008-02-10

Whenever possible, the fixed format should be preferred to the varying one because it is much faster to deal with fixed tables than with variable tables. Sure enough, instead of being read or written record by record, FIX tables are processed by blocks of BLOCK_SIZE records, resulting in far less input/output operations to execute. The block size defaults to 100 if not specified in the Create Table statement.

Note 1: It is not mandatory to declare in the table all the fields existing in the source file. However, if some fields are ignored, the flag option of the following field and/or the lrecl option will have to be specified.

Note 2: Some files have an EOF marker (CTRL+Z 1A) that can prevent the table to be recognized as fixed because the file length is not a multiple of the fixed record size. To indicate this, use in the option list the create option EOF. For instance, if after creating the FIX table xtab on the file foo.dat that you know have fixed record size, you get, when you try to use it, a message such as:

File foo.dat is not fixed length, len=302587 lrecl=141

After checking that the LRECL default or specified specification is correct, you can indicate to ignore that extra EOF character by:

alter table xtab option_list='eof=1';

Of course, you can specify this option directly in the Create statement. All this applies to some other table types, in particular to BIN tables.

Note 3: The width of the fields is the length specified in the column declaration. For instance for a column declared as:

number int(3) not null,

The field width in the file is 3 characters. This is the value used to calculate the offset of the next field if it is not specified. If this length is not specified, it defaults to the MySQL default type length.

Specifying the Field Format

Some files have specific format for their numeric fields. For instance, the decimal point is absent and/or the field should be filled with leading zeros. To deal with such files, as well in reading as in writing, the format can be specified in the CREATE TABLE column definition. The syntax of the field format specification is:

Field_format='[Z][N][d]'

The optional parts of the format are:

Z The field has leading zeros
N No decimal point exist in the file
d The number of decimals, defaults to the column precision

Example

Let us see how it works in the following example. We define a table based on the file xfmt.txt having eight fields of 12 characters:

create table xfmt (
  col1 double(12,3) not null,
  col2 double(12,3) not null field_format='4',
  col3 double(12,2) not null field_format='N3',
  col4 double(12,3) not null field_format='Z',
  col5 double(12,3) not null field_format='Z3',
  col6 double(12,5) not null field_format='ZN5',
  col7 int(12) not null field_format='N3',
  col8 smallint(12) not null field_format='N3')
engine=CONNECT table_type=FIX file_name='xfmt.txt';

insert into xfmt values(4567.056,4567.056,4567.056,4567.056,-23456.8,
    3.14159,4567,4567);
select * from xfmt;

The first row is displayed as:

COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8
4567.056 4567.056 4567.06 4567.056 -23456.800 3.14159 4567 4567

The number of decimals displayed for all float columns is the column precision, the second argument of the column type option. Of course, integer columns have no decimals, although their formats specify some.

More interesting is the file layout. To see it let us define another table based on the same file but whose columns are all characters:

create table cfmt (
  col1 char(12) not null,
  col2 char(12) not null,
  col3 char(12) not null,
  col4 char(12) not null,
  col5 char(12) not null,
  col6 char(12) not null,
  col7 char(12) not null,
  col8 char(12) not null)
engine=CONNECT table_type=FIX file_name='xfmt.txt';
select * from cfmt;

The (transposed) display of the select command shows the file text layout for each field. Below a third column was added in this document to comment this result.

Column Row 1 Comment (all numeric fields are written right justified)
COL1 4567.056 No format, the value was entered as is.
COL2 4567.0560 The format ‘4’ forces to write 4 decimals.
COL3 4567060 N3 → No decimal point. The last 3 digits are decimals. However, the second decimal was rounded because of the column precision.
COL4 00004567.056 Z → Leading zeros, 3 decimals (the column precision)
COL5 -0023456.800 Z3 → (Minus sign) leading zeros, 3 decimals.
COL6 000000314159 ZN5 → Leading zeros, no decimal point, 5 decimals.
COL7 4567000 N3 → No decimal point. The last 3 digits are decimals.
COL8 4567000 Same. Any decimals would be ignored.

Note: For columns internally using double precision floating-point numbers, MariaDB limits the decimal precision of any calculation to the column precision. The declared column precision should be at least the number of decimals of the format to avoid a loss of decimals as it happened for col3 of the above example.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.

© 2021 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/connect-dos-and-fix-table-types/