Selecting the Modulo for a File

There are two ways to calculate the best modulo for a file data section:

Calculating the modulo manually

To calculate a modulo that supports efficient storage and retrieval, follow these steps:

Note

Before determining the modulo for a data section, see File Sizing Considerations.

  1. Use the following formula to calculate the average size (S) of an in-group item part.

    If Adata > (FS / 2), then S = Aid + 16

    Otherwise, S = Aid + 16 + Adata

    where FS is the frame size of the database, Aid is the average size of an item-id, and Adata is the average size of the item data.

  2. Determine the number of items per group (I) from the average item size (S), as follows:

    If S > (FS / 3), then I = 2

    If (FS / 5) < S <= (FS / 3), then I = 3

    Otherwise, I = INT[((FS - 16) - (S * 1.5)) / S], where INT[ ] means 'Integer part of'.

  3. Calculate the modulo, as follows:

    modulo = next-prime[N / I]

    where,

    N = Number of items in file,

    next-prime[ ] means 'Next higher prime number after'.

Note

A modulo of 11 is not recommended.
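
The three steps above can be sketched in Python as follows. This is only an illustrative sketch, not part of Reality: the function names (average_item_size, items_per_group, next_prime, select_modulo) are hypothetical, all sizes are assumed to be in bytes, and next-prime is taken to mean the smallest prime strictly greater than N / I. The sketch makes no attempt to avoid a modulo of 11.

```python
import math


def average_item_size(fs: int, aid: float, adata: float) -> float:
    """Step 1: average size (S) of an in-group item part."""
    # When Adata exceeds half a frame, only the item-id plus 16 bytes count toward S.
    if adata > fs / 2:
        return aid + 16
    return aid + 16 + adata


def items_per_group(fs: int, s: float) -> int:
    """Step 2: number of items per group (I)."""
    if s > fs / 3:
        return 2
    if s > fs / 5:
        return 3
    # INT[ ] in the formula is the integer part, hence int().
    return int(((fs - 16) - (s * 1.5)) / s)


def next_prime(n: float) -> int:
    """Smallest prime strictly greater than n (simple trial division)."""
    candidate = max(2, math.floor(n) + 1)
    while True:
        limit = math.isqrt(candidate)
        if all(candidate % d for d in range(2, limit + 1)):
            return candidate
        candidate += 1


def select_modulo(n_items: int, fs: int, aid: float, adata: float) -> int:
    """Step 3: modulo = next-prime[N / I]."""
    s = average_item_size(fs, aid, adata)
    i = items_per_group(fs, s)
    return next_prime(n_items / i)
```

For instance, select_modulo(3000, 1024, 10, 323) applies the formula to 3000 items in a database with 1024-byte frames, an average item-id size of 10 and an average data size of 323.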

Example 1

FS = 1 KB (1024 bytes); (FS / 3) < S <= (FS / 2)

N = 3000
Aid = 10
Adata = 323
S = 349
I = 2
N/I = 3000/2 = 1500
modulo = next-prime[1500] = 1511

Example 2

FS = 4 KB (4096 bytes); (FS / 5) < S <= (FS / 3)

N = 3000
Aid = 10
Adata = 892
S = 918
I = 3
N/I = 3000/3 = 1000
modulo = next-prime[1000] = 1009

Example 3

FS = 1 KB (1024 bytes); S <= (FS / 5)

N = 3000
Aid = 10
Adata = 23
S = 49
I = INT[((1024 - 16) - (49 * 1.5)) /49]
  = INT[(1008 - 73.5) /49]
  = INT[934.5 /49]
  = INT[19.1]
  = 19
N/I = 3000/19 ≈ 158
modulo = next-prime[158] = 163

File sizing goals

There are two goals in selecting the best modulo:

  1. To optimize access time.
  2. To use disk space most efficiently.

These two goals conflict with each other. If a data section has too little primary space, much of its data is stored in overflow (linked) frames. The more linked frames there are, the more disk reads the system must perform to find an item, which can be very expensive in terms of response time. On the other hand, a file with too much unused disk space is wasteful.
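
The trade-off can be illustrated with a rough estimate. The sketch below is a simplified model, not a description of how Reality allocates space: it assumes items are spread perfectly evenly across groups and that each frame holds FS - 16 bytes of item data (the figure used in the formula for I). The function name frames_per_group is hypothetical.

```python
import math


def frames_per_group(n_items: int, avg_item_size: float,
                     frame_size: int, modulo: int) -> int:
    """Estimate the frames each group needs (first frame plus overflow frames)."""
    usable = frame_size - 16  # assumption: usable bytes per frame
    bytes_per_group = n_items * avg_item_size / modulo
    return max(1, math.ceil(bytes_per_group / usable))


# Figures from Example 3 (N = 3000, S = 49, FS = 1024) with three candidate modulos.
for modulo in (41, 163, 661):
    frames = frames_per_group(3000, 49, 1024, modulo)
    overflow = frames - 1
    print(f"modulo {modulo}: {modulo} primary frames, "
          f"about {overflow} overflow frame(s) per group")
```

Under these assumptions, a modulo of 41 leaves each group with about three overflow frames, so a lookup may need several disk reads. A modulo of 661 avoids overflow but allocates roughly four times as many primary frames as a modulo of 163, which also avoids overflow.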

File sizing considerations

There are two circumstances under which you might consider modifying the way you calculate the modulo of a data section:

File statistics

A file statistics report tells you how a data section's items are distributed into groups. A report is generated each time you:

For a discussion of file statistics, refer to File Statistics Reports.

You can use the ISTAT and HASH-TEST commands to display file hashing statistics and an optional histogram.

Separation parameter

Separation is the number of contiguous frames allocated to a group. For compatibility with proprietary versions of Reality, the CREATE-FILE command accepts an optional separation parameter. On current versions of Reality, this parameter is ignored and the separation is always 1.