Reality V15.0
Selecting the Modulo for a File (File and File Index Management)
There are two ways to calculate the best modulo for a file data section:
To calculate a modulo that will facilitate efficient storage and retrieval, you should do the following:
Note: Before determining the modulo for a data section, see File Sizing Considerations.
Note: A modulo of 11 is not recommended.
The file modulo calculator uses the following algorithm:
Use the following formula to calculate the average size (S) of an in-group item part.
If (Adata) > FS / 2 then S = (Aid + 16) else S = (Aid + 16 + Adata)
where FS is the frame size of the database.
Determine the number of items per group (I). This can be determined from the average item size, as follows:
If S > (FS / 3), then I = 2
If (FS / 5) < S <= (FS / 3), then I = 3
Otherwise, I = INT[((FS - 16) - (S * 1.5)) / S], where INT[ ] means 'Integer part of'.
Calculate the modulo, as follows:
modulo = next-prime[N / I]
where,
N = Number of items in file,
next-prime[ ] means 'Next higher prime number after'.
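The three steps above can be sketched in Python as follows. This is an illustrative reading of the published formula, not the calculator's actual source: the function names are hypothetical, and next-prime[ ] is interpreted here as the smallest prime strictly greater than N / I, with any fractional part of N / I left in place before the search.

```python
import math


def average_item_size(aid, adata, fs):
    """Step 1: average size S of the in-group part of an item."""
    if adata > fs / 2:
        return aid + 16
    return aid + 16 + adata


def items_per_group(s, fs):
    """Step 2: number of items per group (I) for average item size S."""
    if s > fs / 3:
        return 2
    if s > fs / 5:
        return 3
    return int(((fs - 16) - (s * 1.5)) / s)


def is_prime(n):
    """Trial-division primality test; adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(x):
    """Smallest prime strictly greater than x (assumed meaning of next-prime[ ])."""
    n = math.floor(x) + 1
    while not is_prime(n):
        n += 1
    return n


def suggest_modulo(n_items, aid, adata, fs):
    """Step 3: modulo = next-prime[N / I]."""
    s = average_item_size(aid, adata, fs)
    i = items_per_group(s, fs)
    return next_prime(n_items / i)


print(suggest_modulo(3000, 10, 323, 1024))
print(suggest_modulo(3000, 10, 892, 4096))
print(suggest_modulo(3000, 10, 23, 1024))
```

The three calls use the same inputs as the worked examples in this topic (N = 3000, Aid = 10, with Adata and FS varied), so the printed values can be compared directly with the moduli shown there.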
FS = 1Kb; (FS / 3) < S <= (FS / 2)
N = 3000
Aid = 10
Adata = 323
S = 349
I = 2
N/I = 3000/2 = 1500
modulo = next-prime[1500] = 1511
FS = 4Kb; (FS / 5) < S <= (FS / 3)
N = 3000
Aid = 10
Adata = 892
S = 918
I = 3
N/I = 3000/3 = 1000
modulo = next-prime[1000] = 1009
FS = 1Kb; S <= (FS / 5)
N = 3000
Aid = 10
Adata = 23
S = 49
I = INT[((1024 - 16) - (49 * 1.5)) / 49]
  = INT[(1008 - 73.5) / 49]
  = INT[934.5 / 49]
  = INT[19.1]
  = 19
N/I = 3000/19 = 157.9, rounded to 158
modulo = next-prime[158] = 163
There are two goals in selecting the best modulo:
These two factors conflict with each other. If you do not give a data section enough primary space, much of its data is stored in overflow (linked) frames. The more linked frames there are, the more disk reads the system must perform to find items, which can be very expensive in terms of response time. On the other hand, a file with too much unused disk space is wasteful.
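The trade-off can be made concrete with a rough estimate. The sketch below is not a Reality utility; it assumes that items hash perfectly evenly, that each frame (primary or overflow) holds FS - 16 usable bytes (taken from the FS - 16 term in the items-per-group formula), and that a group spills into one linked frame for each further FS - 16 bytes. The function name and parameters are hypothetical.

```python
import math


def estimate_group_usage(n_items, avg_item_size, modulo, frame_size, frame_overhead=16):
    """Return (overflow frames per group, fraction of the primary frame used).

    Assumes a perfectly even distribution of items across groups and a
    fixed per-frame overhead; both are simplifications for illustration.
    """
    usable = frame_size - frame_overhead
    bytes_per_group = n_items * avg_item_size / modulo
    frames_per_group = max(1, math.ceil(bytes_per_group / usable))
    overflow_frames = frames_per_group - 1
    primary_fill = min(1.0, bytes_per_group / usable)
    return overflow_frames, primary_fill


# Same data as the third worked example (N = 3000, S = 49, FS = 1024),
# tried with a modulo that is too small, the suggested one, and one too large.
for modulo in (37, 163, 1009):
    overflow, fill = estimate_group_usage(3000, 49, modulo, 1024)
    print(modulo, overflow, round(fill, 2))
```

Under these assumptions, a modulo of 37 leaves each group with several linked frames, a modulo of 1009 leaves most of each primary frame empty, and the suggested 163 fills most of the primary frame without overflow. Real files deviate from this because item sizes and hashing are never perfectly uniform, which is why the file statistics commands remain the authoritative check.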
There are two circumstances under which you might consider a modification to the way you calculate the modulo of a data section:
If the distribution of items is uneven. An uneven distribution can arise when item sizes deviate widely from the average. Try several modulos with the HASH-TEST command to see which one produces the best fit.
A poor choice of item-ids can also cause uneven hashing. The best item-ids are sequential, numeric, and uniform in size. The worst item-ids are those in which the rightmost portion does not vary. For example, if all part numbers end in *01, the items will hash very unevenly.
A file statistics report tells you how a data section's items are distributed into groups. A report is generated each time you:
For a discussion of file statistics, refer to File Statistics Reports.
You can use the ISTAT and HASH-TEST commands to display file hashing statistics and an optional histogram.
Separation is the number of contiguous frames allocated to a group. For compatibility with proprietary versions of Reality, the CREATE-FILE command accepts an optional separation parameter. On current versions of Reality, the parameter is ignored and the separation is always set to 1.