Reality V15.1 Online Documentation (MoTW) Revision 7

Selecting the Modulo for a File

There are two ways to calculate the best modulo for a file data section:

Generate file statistics by running the ALL-FILE-STATS command or performing a FILE-SAVE, then use the OPTIMUM-MODULO utility to set the reallocation parameters.
Calculate the modulo manually. You will need to do this if you are creating a new file or data section.

Calculating the Modulo Manually

To calculate a modulo that will facilitate efficient storage and retrieval, you should do the following:

Note: Before determining the modulo for a data section, see File Sizing Considerations.

Calculate the average size of an item-id (A_id).
Calculate the average size of the item data (A_data).
Use the file modulo calculator to calculate an approximate modulo. You must supply the frame size, the average item-id and item data sizes calculated in the previous two steps, and the number of items in the data section.
Pass the result to the SHOW-MODULI TCL command to obtain the next prime number.

Note: A modulo of 11 is not recommended.

The file modulo calculator uses the following algorithm:

Use the following formula to calculate the average size (S) of an in-group item part.

If (A_data) > FS / 2 then S = (A_id + 16) else S = (A_id + 16 + A_data)

where FS is the frame size of the database.
Determine the number of items per group (I). This can be determined from the average item size, as follows:

If S > (FS / 3), then I = 2

If (FS / 5) < S <= (FS / 3), then I = 3

Otherwise, I = INT[((FS - 16) - (S * 1.5)) / S], where INT[ ] means 'Integer part of'.
Calculate the modulo, as follows:

modulo = next-prime[N / I]

where,

N = Number of items in file,

next-prime[ ] means 'Next higher prime number after'.

Example 1

FS = 1 KB (1024 bytes); (FS / 3) < S <= (FS / 2)

N = 3000
A_id = 10
A_data = 323
S = 349
I = 2
N/I = 3000/2 = 1500
modulo = next-prime[1500] = 1511

Example 2

FS = 4 KB (4096 bytes); (FS / 5) < S <= (FS / 3)

N = 3000
A_id = 10
A_data = 892
S = 918
I = 3
N/I = 3000/3 = 1000
modulo = next-prime[1000] = 1009

Example 3

FS = 1 KB (1024 bytes); S <= (FS / 5)

N = 3000
A_id = 10
A_data = 23
S = 49
I = INT[((1024 - 16) - (49 * 1.5)) / 49]
  = INT[(1008 - 73.5) / 49]
  = INT[934.5 / 49]
  = INT[19.1]
  = 19
N/I = 3000/19 = 157.9, rounded to 158
modulo = next-prime[158] = 163
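
The algorithm and the three examples above can be sketched in Python as follows. This is an illustrative sketch only, not the code of the file modulo calculator itself; the function names are hypothetical, and next-prime is assumed to mean the smallest prime strictly greater than its argument.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(x: float) -> int:
    """Smallest prime strictly greater than x (assumed meaning of next-prime[ ])."""
    candidate = math.floor(x) + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def in_group_size(a_id: float, a_data: float, fs: int) -> float:
    """Average size S of an in-group item part."""
    if a_data > fs / 2:
        return a_id + 16
    return a_id + 16 + a_data


def items_per_group(s: float, fs: int) -> int:
    """Number of items per group (I) for average in-group size S."""
    if s > fs / 3:
        return 2
    if s > fs / 5:
        return 3
    return int(((fs - 16) - s * 1.5) / s)


def estimate_modulo(n_items: int, a_id: float, a_data: float, fs: int) -> int:
    """modulo = next-prime[N / I]."""
    s = in_group_size(a_id, a_data, fs)
    i = items_per_group(s, fs)
    return next_prime(n_items / i)


# The three worked examples: (N, A_id, A_data, FS in bytes)
for args in ((3000, 10, 323, 1024), (3000, 10, 892, 4096), (3000, 10, 23, 1024)):
    print(args, "->", estimate_modulo(*args))
```

The sketch does not replace the SHOW-MODULI step, and it does not check for the modulo of 11 that the note above advises against.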

File Sizing Goals

There are two goals in selecting the best modulo:

To optimize access time.
To use disk space most efficiently.

These two goals conflict with each other. If you do not give a data section enough primary space, much of its data is stored in overflow (linked) frames. The more linked frames there are, the more disk reads the system must perform to find items on the disk, which can be very expensive in terms of response time. On the other hand, a file with too much unused disk space is wasteful.
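
The trade-off can be illustrated with a deliberately simplified model. The sketch below is not a description of how Reality lays out frames. It assumes that items are spread perfectly evenly across groups, that every item occupies the same number of bytes, and that each frame offers FS - 16 usable bytes (a figure borrowed from the modulo formula). The function name and the candidate moduli are hypothetical.

```python
import math


def sizing_estimate(n_items: int, item_size: int, frame_size: int, modulo: int):
    """Toy model: returns (overflow frames per group, total frames, unused bytes)."""
    usable = frame_size - 16                      # assumed usable bytes per frame
    bytes_per_group = n_items / modulo * item_size
    frames_per_group = max(1, math.ceil(bytes_per_group / usable))
    overflow_per_group = frames_per_group - 1     # frames beyond the primary frame
    total_frames = modulo * frames_per_group
    unused = total_frames * usable - n_items * item_size
    return overflow_per_group, total_frames, unused


# Figures from Example 3 (N = 3000, S = 49, FS = 1024) with three candidate moduli.
for modulo in (53, 163, 499):
    overflow, frames, unused = sizing_estimate(3000, 49, 1024, modulo)
    print(f"modulo {modulo}: {overflow} overflow frame(s) per group, "
          f"{frames} frames in total, {unused} bytes unused")
```

Under these assumptions, a modulo that is too small puts every group into overflow, while one that is too large avoids overflow but leaves most of the allocated space empty.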

File Sizing Considerations

There are two circumstances under which you might consider a modification to the way you calculate the modulo of a data section:

If the file is a work file, for example a file that you use only when performing a month-end closing. The number of items in a work file tends to vary greatly over time. It may have no items during the month, but then grow to a much greater size at month-end closing. In determining the modulo, you must consider how often the file is used and to what extent the volume of data changes.
If the distribution of items is uneven. An uneven distribution of items can arise when item sizes deviate widely from the average. Try several moduli using the HASH-TEST command to see which one produces the best fit.

A poor choice of item-ids can also cause uneven hashing. The best item-ids are sequential, numeric, and uniform in size. The worst item-ids are ones in which the rightmost portion does not vary - for example, if all part numbers end in *01, the items will hash very unevenly.
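
The kind of comparison made when trying several moduli can be sketched as follows. The hash function here is a hypothetical stand-in (a simple positional hash of the character codes), not Reality's hashing algorithm, which this topic does not describe, so the sketch will not reproduce the behaviour of real files. It only shows how a group-occupancy histogram for sequential, numeric, fixed-length item-ids might be compared across candidate moduli.

```python
from collections import Counter


def stand_in_hash(item_id: str, modulo: int) -> int:
    """Hypothetical positional hash; NOT Reality's documented algorithm."""
    value = 0
    for ch in item_id:
        value = value * 10 + ord(ch)
    return value % modulo


def group_histogram(item_ids, modulo: int) -> dict:
    """Map 'items in a group' to 'number of groups holding that many items'."""
    per_group = Counter(stand_in_hash(i, modulo) for i in item_ids)
    histogram = Counter(per_group.values())
    histogram[0] = modulo - len(per_group)        # groups that received no items
    return dict(sorted(histogram.items()))


# 3000 sequential, numeric, fixed-length item-ids: "1000" .. "3999"
ids = [str(n) for n in range(1000, 4000)]

for modulo in (1499, 1511, 1523):                 # candidate moduli to compare
    print(modulo, group_histogram(ids, modulo))
```

A histogram concentrated on one or two adjacent occupancy values indicates an even spread; a long tail of heavily loaded groups alongside many empty ones indicates uneven hashing. On a real system, use HASH-TEST or ISTAT rather than a stand-in like this.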

File Statistics

A file statistics report tells you how a data section's items are distributed into groups. A report is generated each time you:

Perform an ACCOUNT-SAVE or FILE-SAVE.
Execute the SAVE command with the S option.
Execute the ALL-FILE-STATS, ACCOUNT-FILE-STATS, or LIST-$STAT-FILE* Procs.

For a discussion of file statistics, refer to File Statistics Reports.

You can use the ISTAT and HASH-TEST commands to display file hashing statistics and an optional histogram.

Separation Parameter

Separation is the number of contiguous frames allocated to a group. For compatibility with proprietary versions of Reality, the CREATE-FILE command accepts an optional separation parameter. On current versions of Reality, this parameter is ignored and the separation is always set to 1.