Selecting the Modulo for a File
There are two ways to calculate the best modulo for a file data section:
- Run the ALL-FILE-STATS command or perform a FILE-SAVE to generate file statistics. Then use the OPTIMUM-MODULO utility to set the reallocation parameters.
- Calculate the modulo manually. You will need to do this if you are creating a new file or data section.
Calculating the modulo manually
To calculate a modulo that will facilitate efficient storage and retrieval, you should do the following:
Note
Before determining the modulo for a data section, see File Sizing Considerations.
- Use the following formula to calculate the average size (S) of an in-group item part:
If (Adata) > FS / 2 then S = (Aid + 16) else S = (Aid + 16 + Adata)
where FS is the frame size of the database, Aid is the average size of an item-id, and Adata is the average size of the item data.
- Determine the number of items per group (I). This can be determined from the average item size, as follows:
If S > (FS / 3), then I = 2
If (FS / 5) < S <= (FS / 3), then I = 3
Otherwise, I = INT[((FS - 16) - (S * 1.5)) / S], where INT[ ] means 'Integer part of'.
- Calculate the modulo, as follows:
modulo = next-prime[N / I]
where N is the number of items in the file, and next-prime[ ] means 'Next higher prime number after'.
Note
A modulo of 11 is not recommended.
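The three steps above can be sketched in Python as follows. This is a minimal illustration, not a Reality utility: the function names are hypothetical, the frame size is taken in bytes (1Kb = 1024, as in Example 3), and next-prime is read as the first prime strictly greater than N / I. The sketch does not act on the note about a modulo of 11.

```python
import math


def average_item_size(fs, aid, adata):
    """Step 1: average size S of an in-group item part."""
    if adata > fs / 2:
        return aid + 16
    return aid + 16 + adata


def items_per_group(fs, s):
    """Step 2: number of items per group I, from the average item size."""
    if s > fs / 3:
        return 2
    if s > fs / 5:
        return 3
    # int() truncates, matching INT[ ] for the positive values used here.
    return int(((fs - 16) - (s * 1.5)) / s)


def is_prime(n):
    """Trial-division primality test; adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, math.isqrt(n) + 1, 2))


def next_prime(x):
    """First prime strictly greater than x (assumed reading of next-prime[ ])."""
    candidate = math.floor(x) + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def suggest_modulo(fs, n, aid, adata):
    """Step 3: modulo = next-prime[N / I]."""
    s = average_item_size(fs, aid, adata)
    i = items_per_group(fs, s)
    return next_prime(n / i)


if __name__ == "__main__":
    # Inputs: frame size in bytes, item count, average item-id size,
    # average item data size.
    print(suggest_modulo(1024, 3000, 10, 323))
    print(suggest_modulo(4096, 3000, 10, 892))
    print(suggest_modulo(1024, 3000, 10, 23))
```

With the inputs of Examples 1 to 3 below, `suggest_modulo` returns the same moduli as the worked calculations.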
Example 1
FS = 1Kb; (FS / 3) < S <= (FS / 2)
N = 3000
Aid = 10
Adata = 323
S = 349
I = 2
N/I = 3000/2 = 1500
modulo = next-prime[1500] = 1511
Example 2
FS = 4Kb; (FS / 5) < S <= (FS / 3)
N = 3000
Aid = 10
Adata = 892
S = 918
I = 3
N/I = 3000/3 = 1000
modulo = next-prime[1000] = 1009
Example 3
FS = 1Kb; S <= (FS / 5)
N = 3000
Aid = 10
Adata = 23
S = 49
I = INT[((1024 - 16) - (49 * 1.5)) /49]
= INT[(1008 - 73.5) /49]
= INT[934.5 /49]
= INT[19.1]
= 19
N/I = 3000/19 = 157.9, rounded to 158
modulo = next-prime[158] = 163
File sizing goals
There are two goals in selecting the best modulo:
- To optimize access time.
- To use disk space most efficiently.
These two goals conflict with each other. If you do not give a data section enough primary space, much of its data is stored in overflow (linked) frames. The more linked frames there are, the more disk reads the system must perform to find items, which can be very expensive in terms of response time. On the other hand, a file with too much unused disk space is wasteful.
File sizing considerations
There are two circumstances under which you might consider a modification to the way you calculate the modulo of a data section:
- If the file is a work file; for example, a file that you use only while performing a month-end closing. The number of items in a work file tends to vary greatly over time: it may hold no items during the month, then grow to a much greater size at month-end closing. In determining the modulo, you must consider how often the file is used and to what extent the volume of data changes.
- If the distribution of items is uneven. An uneven distribution of items can arise when there is much deviation from the average item size. Try several modulos using the HASH-TEST command to see which one produces the best fit.
A poor choice of item-ids can also cause uneven hashing. The best item-ids are sequential, numeric, and uniform in size. The worst item-ids are ones in which the rightmost portion does not vary - for example, if all part numbers end in *01, the items will hash very unevenly.
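For the work-file case, one way to see how far the choice of modulo depends on volume is to evaluate step 3 of the manual calculation for several item counts. The sketch below is illustrative only: the function names and the item counts are hypothetical, I = 19 is borrowed from Example 3, and next-prime is read as the first prime strictly greater than N / I.

```python
import math


def is_prime(n):
    """Trial-division primality test; adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % d for d in range(3, math.isqrt(n) + 1, 2))


def next_prime(x):
    """First prime strictly greater than x (assumed reading of next-prime[ ])."""
    candidate = math.floor(x) + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def modulo_by_volume(item_counts, items_per_group):
    """Map each item count N to next-prime[N / I]."""
    return {n: next_prime(n / items_per_group) for n in item_counts}


if __name__ == "__main__":
    # Hypothetical quiet-period, typical, and month-end item counts.
    for n, modulo in modulo_by_volume([500, 3000, 20000], 19).items():
        print(f"N = {n:>6}: modulo = {modulo}")
```

A modulo sized for the peak count leaves primary space unused while the file is nearly empty, and one sized for the quiet period pushes data into overflow frames at peak. How often each state occurs decides which cost matters more.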
File statistics
A file statistics report tells you how a data section's items are distributed into groups. A report is generated each time you:
- Perform an ACCOUNT-SAVE or FILE-SAVE.
- Execute the SAVE command with the S option.
- Execute the ALL-FILE-STATS, ACCOUNT-FILE-STATS, or LIST-$STAT-FILE* Procs.
For a discussion of file statistics, refer to File Statistics Reports.
You can use the ISTAT and HASH-TEST commands to display file hashing statistics and an optional histogram.
Separation parameter
Separation is the number of contiguous frames allocated to a group. For compatibility with proprietary versions of Reality, the CREATE-FILE command accepts an optional separation parameter. On current versions of Reality, this parameter is ignored and the separation is always 1.