Internal Storage
As DataBasic variables are dynamically "typed", knowing how the runtime code handles data storage allows programs to be written that make the most efficient use of data space and minimise conversions from one type to another.
Descriptor Table
Variable storage within a DataBasic program is via the descriptor table, with one entry in the table for each variable used within the program. Each element of a dimensioned array is a separate variable in this context.
The descriptor of a simple variable is referenced internally via an offset into the descriptor table. Array elements referenced by a literal subscript or by an EQUATEd name are also referenced in this way. Other array element references use the offset of the base of the array, plus a calculation for the offset of the required element.
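The offset calculation for a non-literal array reference can be pictured with a small model. This is only a sketch in Python, not the runtime's code: the function name `element_offset`, the 1-based subscripts, the row-major layout, and the choice to count offsets in whole descriptor entries are all assumptions made for illustration.

```python
def element_offset(base, subscripts, dims):
    """Return the descriptor-table offset of one array element.

    Assumptions (illustrative only): subscripts are 1-based, elements
    are laid out in row-major order, and offsets count whole entries.
    """
    if len(subscripts) != len(dims):
        raise ValueError("subscript count must match the dimensions")
    offset = 0
    for sub, size in zip(subscripts, dims):
        if not 1 <= sub <= size:
            raise IndexError("subscript out of range")
        # Fold each dimension into a single linear element offset.
        offset = offset * size + (sub - 1)
    return base + offset


# A 4 x 5 array whose first element sits at entry 10 of the table.
first = element_offset(10, (1, 1), (4, 5))
inner = element_offset(10, (2, 3), (4, 5))
```

With a literal subscript the result is a constant, which is consistent with such references being handled like simple variables; with a variable subscript a calculation of this kind has to be repeated at run time.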
The Stack
The operands, intermediate results, and final result of operations and calculations, plus program control information, are all stored on a stack. The size and format of stack entries are identical to those of descriptor table entries, although many types are used only on the stack.
Data Storage
Dynamic variable typing means that each descriptor and stack element has to contain the type as well as the data or data reference. Although there are numerous different types, only a few characteristics are of direct concern to the DataBasic programmer. These are numeric or string, direct or indirect, file descriptors, and binary strings.
Numeric vs String types
A numeric type is generated from a numeric literal, the result of an arithmetic formula, or from a numeric function.
String types are generated from string literals, non-arithmetic formulae, dynamic array operations, string functions, I/O statements including INPUT, the result of format strings, and so on.
Whenever a string type is used where a numeric is required, or a numeric type is used where a string is required, the data must first be converted to the correct type, which incurs a performance overhead. The type of a variable in the descriptor table changes only when an assignment is made to the variable, so a variable that is used repeatedly in operations may be converted many times. The type may not be obvious because:
- A number INPUT from the terminal (or obtained from any other I/O statement or function including OCONV and ICONV) will be a string type and therefore will be converted every time it is used in an arithmetic calculation.
- Numbers stored in a dynamic array and used in arithmetic calculations will be converted to numeric when referenced and converted back to string if stored back into the array.
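The cost of repeated conversion can be illustrated with a toy model. The sketch below is Python, not DataBasic, and the names `Runtime`, `as_number` and `total` are hypothetical; it only counts how often a string-typed value would need converting, on the assumption that every arithmetic use of a string triggers one conversion.

```python
class Runtime:
    """Toy model that counts string-to-numeric conversions."""

    def __init__(self):
        self.conversions = 0

    def as_number(self, value):
        # A string-typed value is converted on every arithmetic use.
        if isinstance(value, str):
            self.conversions += 1
            return int(value)
        return value


def total(runtime, quantity, repeats):
    """Use `quantity` in `repeats` additions and return the sum."""
    accumulator = 0
    for _ in range(repeats):
        accumulator += runtime.as_number(quantity)
    return accumulator


# Case 1: the variable keeps its string type for the whole loop.
slow = Runtime()
slow_sum = total(slow, "25", 1000)

# Case 2: the value is converted once and the numeric result is
# assigned back to the variable before the loop starts.
fast = Runtime()
quantity = fast.as_number("25")
fast_sum = total(fast, quantity, 1000)
```

Both cases produce the same sum, but the first performs one conversion per iteration and the second performs one in total. In DataBasic terms, the second case corresponds to assigning the result of an arithmetic expression to the variable once, so that its descriptor holds a numeric type from then on.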
Note
If a literal contains only numerics but will not be used in numeric calculations, place it in quotes so that it will be stored as a string type.
Direct vs Indirect types
A direct type is one where the data can be stored in the descriptor or stack element itself; an indirect type is one where the descriptor contains a pointer to data stored in the DataBasic free storage space. Processing direct types is many times more efficient than processing indirect types. Direct types are:
- A numeric type that can be stored in a 48-bit signed binary number. This is described further in the topic Numeric Operations.
- A non-binary string of up to 13 characters.
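The two rules above can be expressed as a simple check. The following Python sketch is a model under stated assumptions, not the runtime's logic: it treats numbers as plain integers and ignores any scaling the runtime may apply (see Numeric Operations), it does not model binary strings, and the name `is_direct` is hypothetical.

```python
INT48_MIN = -(2 ** 47)      # smallest 48-bit signed value
INT48_MAX = 2 ** 47 - 1     # largest 48-bit signed value
MAX_DIRECT_STRING = 13      # longest string held in the descriptor


def is_direct(value):
    """Return True if `value` would fit in a descriptor directly.

    Assumptions (illustrative only): integers stand in for numeric
    types and Python str values stand in for non-binary strings.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not modelled")
    if isinstance(value, int):
        return INT48_MIN <= value <= INT48_MAX
    if isinstance(value, str):
        return len(value) <= MAX_DIRECT_STRING
    raise TypeError("only integers and strings are modelled")


short_code = is_direct("INV-000123")             # 10 characters
long_name = is_direct("A MUCH LONGER STRING")    # 20 characters
```

Under this model, short keys and codes stay direct, while anything longer than 13 characters needs a buffer in free storage.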
Free storage space is managed in buffers of 48, 144 and multiples of 256 bytes. Consequently, there is a possible performance gain if strings are kept small enough to fit into one of these buffer sizes. When the buffer allocated to a string is not large enough to store a new string assigned to it, the buffer is released and a new one of the appropriate size obtained.
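As a rough illustration of the buffer sizes listed above, the Python sketch below picks the smallest buffer that would hold a string of a given length. It assumes that the string length alone determines the space needed, with no allowance for a terminator or other overhead; the runtime may differ, and the name `buffer_size` is hypothetical.

```python
def buffer_size(length):
    """Return the smallest buffer size that holds `length` bytes.

    Sizes follow the text: 48, 144, then multiples of 256 bytes.
    Assumption (illustrative only): no per-string overhead is added.
    """
    if length < 0:
        raise ValueError("length cannot be negative")
    if length <= 48:
        return 48
    if length <= 144:
        return 144
    # Round up to the next multiple of 256.
    return -(-length // 256) * 256


sizes = [buffer_size(n) for n in (14, 48, 49, 144, 145, 256, 257)]
```

In this model a 145-byte string occupies a 256-byte buffer, and one more byte beyond 256 moves the string into a 512-byte buffer, which is why keeping strings just under a boundary can save both space and reallocations.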
Note
If a variable will grow to a large size, consider initially assigning a large string to it to allocate an appropriately sized buffer at the outset. However, if a string of 13 or fewer characters is subsequently assigned to the variable, the descriptor will become direct and the buffer released.
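The effect of this note can be sketched with a small model that counts buffer allocations while a string grows. It is written in Python under several assumptions: buffer sizes are chosen from string length alone, an existing buffer is reused whenever the new string fits, and the names `StringVar`, `buffer_size` and `grow` are hypothetical.

```python
DIRECT_LIMIT = 13   # strings up to this length are held directly


def buffer_size(length):
    """Smallest buffer (48, 144, or a multiple of 256) for `length`."""
    if length <= 48:
        return 48
    if length <= 144:
        return 144
    return -(-length // 256) * 256


class StringVar:
    """Toy model of one string variable and its buffer."""

    def __init__(self):
        self.buffer = 0         # 0 means direct: no buffer is held
        self.allocations = 0

    def assign(self, text):
        if len(text) <= DIRECT_LIMIT:
            self.buffer = 0     # descriptor becomes direct, buffer released
        elif len(text) > self.buffer:
            self.buffer = buffer_size(len(text))
            self.allocations += 1
        # Otherwise the existing buffer is reused (an assumption).


def grow(var, step=100, target=2000):
    """Append `step` characters at a time until `target` is reached."""
    text = ""
    while len(text) < target:
        text += "x" * step
        var.assign(text)
    return var.allocations


plain = StringVar()
plain_count = grow(plain)

presized = StringVar()
presized.assign(" " * 2000)     # allocate a large buffer at the outset
presized_count = grow(presized)

reset = StringVar()
reset.assign(" " * 2000)
reset.assign("")                # 13 or fewer characters: buffer released
reset_count = grow(reset)
```

In this model the plain variable needs 9 allocations to reach 2000 characters and the pre-sized one needs only 1. Clearing the variable to an empty string after pre-sizing throws the benefit away: the count rises to 10, because the large buffer is released and the growth starts again from nothing.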
File Descriptors
A file descriptor, corresponding to a file variable in the source code, is a pointer to structures that define an open file. A file descriptor is created when a file is opened, and, apart from being assigned to another variable, may only be used in file access statements. Further details can be found in the topics File Open and Close and File Input And Output.
Binary Strings
Because binary strings may contain a Segment Mark, the normal string terminator in DataBasic, they are stored in a more complex way. Operations involving binary strings may therefore take slightly longer than those involving non-binary strings. Operations that assume a Reality data structure containing Reality system delimiters, such as dynamic array access, should not be performed on binary strings.
Variable Names
Since in the compiled program variables are referenced only by a numeric reference, the actual variable names used in the source do not affect the size of the DataBasic object code. There is no limit on source code size, so descriptive variable names should in principle be chosen to make the source code more readable and therefore more easily maintained. However, variable names in the debugger symbol table remain in items that are loaded into shared memory for execution, so variable names of a sensible length should be used.
Garbage Collect
Initially, free storage is one contiguous block of space. Buffers are allocated from the beginning of the free storage area. When a string that exceeds a variable's current buffer size is assigned to that variable, the buffer is abandoned and a new buffer is allocated from the remaining contiguous portion of free storage.
If there is not enough contiguous space for the new buffer, a procedure called "garbage collection" automatically takes place. Garbage collection collects the abandoned buffer space and forms a single block of contiguous space.
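The sequence described above can be traced with a toy allocator. The Python sketch below is a simplified model, not the actual mechanism: it assumes garbage collection works by packing the buffers still in use at the start of free storage, and the names `FreeStorage`, `allocate` and `collect` are hypothetical.

```python
class FreeStorage:
    """Toy model of free storage with abandon-and-collect behaviour."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0      # start of the remaining contiguous space
        self.live = {}          # variable name -> size of its buffer
        self.collections = 0

    def allocate(self, name, nbytes):
        # Any buffer the variable already holds is abandoned, not reused.
        self.live.pop(name, None)
        if self.next_free + nbytes > self.size:
            self.collect()
        if self.next_free + nbytes > self.size:
            raise MemoryError("free storage exhausted")
        self.next_free += nbytes
        self.live[name] = nbytes

    def collect(self):
        # Assumption: buffers in use are packed together so that all
        # abandoned space rejoins one contiguous block.
        self.next_free = sum(self.live.values())
        self.collections += 1


storage = FreeStorage(1024)
storage.allocate("B", 144)      # contiguous space now starts at 144
storage.allocate("A", 256)      # starts at 400
storage.allocate("A", 512)      # old 256-byte buffer abandoned; 912
storage.allocate("A", 768)      # does not fit: collection runs first
```

The final allocation cannot be satisfied from the 112 bytes left at the end of the area, so one collection runs, the abandoned buffers are reclaimed, and the 768-byte buffer is then placed after the buffer still held by B.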