There are two ways of working out the best modulo for an index:
The method shown below allows you to calculate the modulo for an index. However, the type of data also needs to be considered. If the size of the file is expected to change, you should use estimated values.
The following calculations provide only approximate values and assume that at least 200 items are to be indexed.
Estimate the following:
The average number of values/subvalues on which the index is based (mvsvAttrCount). This depends on the BY clauses used to define the index:
Note: A modulo of 11 is not recommended.
The index modulo calculator uses the following algorithm:
Choose the appropriate overhead value (overhead), depending on the level of complexity of the index:
Calculate the number of index keys from the number of items to be indexed:
numKeys = numItems * mvsvAttrCount
Calculate the average key size by adding together your estimated key size, item-id size and the overhead value selected in 1 above:
averageKeySize = defKeySize + itemidSize + overhead
Calculate the average available node size by taking half the frame size, allowing for the size of an item header and then taking 90% of the result:
averageAvailableNodeSize = INT(((frameSize / 2) - 27) * 0.9)
where INT( ) means 'Integer part of'.
Calculate the number of keys per node from the average available node size and the average key size. There will be a minimum of six keys per node:
keysPerNode = INT(averageAvailableNodeSize
/ averageKeySize)
if keysPerNode < 6 then keysPerNode = 6
Calculate the number of leaf nodes by dividing the number of keys by the number of keys per node. Allow an additional 10% for branch nodes:
nodes = INT(INT(numKeys / keysPerNode) * 1.1)
The modulo depends on whether the nodes are in-group or out-of-group. The nodes will be out of group if the the average key size times number of keys per node is greater than half the frame size. In this case, the size of the in-group part of an out-of-group item is about 20 bytes, so the modulo is the number of 20 byte nodes that will fit in a frame.
Otherwise, the nodes will be in-group we can assume that each group will contain two in-group nodes:
If (averageKeySize
*
keysPerNode) > (frameSize / 2) then
modulo = INT(nodes /
INT(frameSize
/ 20))
else
modulo = INT(nodes / 2)
Add 20% to allow for bad hashing and growth:
modulo = INT(modulo * 1.2)
frameSize = 1Kb = 1024 bytes
numItems = 1000 items
mvsvAttrCount = 1 (attributes are not multivalued)
itemidSize = 10 bytes
defKeySize = 40 bytes
overhead = 0 (attributes are not multivalued)
numKeys = 1000 * 1 = 1000
averageKeySize = 40 + 10 + 0 = 50
availableNodeSize = (1024 / 2) 27 = 512 27 = 485
averageAvailableNodeSize = INT(485 * 0.9) = 436
keysPerNode = INT(436 / 50) = 8
leafNodes = INT(1000 / 8) = 125
nodes = INT(125 * 1.1) = 137
(50 * 8) = 400 < 512 (frameSize / 2)
Therefore modulo = INT(137 / 2) = 68
modulo = next-prime(INT(68 * 1.2)) =
next-prime(81) = 83
frameSize = 4Kb = 4096 bytes
numItems = 1000 items
mvsvAttrCount = 1 (attributes are not multivalued)
itemidSize = 50 bytes
defKeySize = 50 bytes
overhead = 0 (attributes are not multivalued)
numKeys = 1000 * 1 = 1000
averageKeySize = 50 + 50 + 0 = 100
availableNodeSize = (4096 / 2) 27 = 2048 27 = 2021
averageAvailableNodeSize = INT(2021 * 0.9) = 1818
keysPerNode = INT(1818 / 100) = 18
leafNodes = INT(1000 / 18) = 55
nodes = INT(55 * 1.1) = 60
(100 * 18) = 1800 < 2048 (frameSize / 2)
Therefore modulo = INT(60 / 2) = 30
modulo = next-prime(INT(30 * 1.2)) =
next-prime(36) = 37
frameSize = 1Kb = 1024 bytes
numItems = 1000 items
mvsvAttrCount = 1 (attributes are not multivalued)
itemidSize = 50 bytes
defKeySize = 100 bytes
overhead = 0 (attributes are not multivalued)
numKeys = 1000 * 1 = 1000
averageKeySize = 100 + 50 + 0 = 150
availableNodeSize = (1024 / 2) 27 = 512 27 = 485
averageAvailableNodeSize = INT(485 * 0.9) = 436
keysPerNode = INT(436 / 150) = 2
keysPerNode < 6. Therefore keysPerNode = 6
leafNodes = INT(1000 / 6) = 166
nodes = INT(166 * 1.1) = 182
(150 * 6) = 900 > 512 (frameSize / 2)
Therefore modulo = INT(182 / INT(1024 / 20)) =
INT(182 / 51) = 3
modulo = next-prime(INT(3 * 1.2)) =
next-prime(3) = 3
frameSize = 1Kb = 1024 bytes
numItems = 1000 items
mvsvAttrCount = 5 (multivalues per attribute)
itemidSize = 50 bytes
defKeySize = 100 bytes
overhead = 5 (attributes are multivalued)
numKeys = 1000 * 5 = 5000
averageKeySize = 100 + 50 + 5 = 155
availableNodeSize = (1024 / 2) 27 = 512 27 = 485
averageAvailableNodeSize = INT(485 * 0.9) = 436
keysPerNode = INT(436 / 155) = 2
keysPerNode < 6. Therefore keysPerNode = 6
leafNodes = INT(5000 / 6) = 833
nodes = INT(833 * 1.1) = 916
(155 * 6) = 930 > 512 (frameSize / 2)
Therefore modulo = INT(916 / INT(1024 / 20)) =
INT(916 / 51) = 17
modulo = next-prime(INT(17 * 1.2)) =
next-prime(20)
= 23
This section shows the results you might expect when you use ISTATs M command to try different modulos, and gives you some guidelines for choosing a suitable modulo. The examples use a file called EMPLOYEE containing 5000 items, one for each employee, on a database with a 1Kb frame size. Attribute 6 of the file has been used to build a simple index, NAME, which contains the employee's surname. The index contains about 1000 different names.
The following commands were used to obtain a display of the index statistics:
:
ISTAT EMPLOYEE (JIndex name:
NAME
After the initial display, the following command was entered:
Command:
N100
This sets the maximum value of N to 100 (see below).
This example shows the output from ISTAT with the
modulo set to 127 using the ISTAT command M127
.
File='EMPLOYEE' Index='NAME' Modulus=127 Frame-size=1008 Hash-type=2 ¦ Number of groups with :- N ¦ N items N used IG frames N % usage of IG frames 0 - 13 13 13 1 - 44 114 0 2 - 70 0 0 49 - 0 0 24 50 - 0 0 17 53 - 0 0 1 72 - 0 0 1 93 - 0 0 1 97 - 0 0 7 98 - 0 0 29 99 - 0 0 27 100+ - 0 0 4 Total items= 184, Total bytes= 89724, Total frames= 127 IG bytes= 89724, Overflow frames= 0, Wasted bytes= 38292 (29%) OG items= 0, OG frames= 0, Groups used= 114
Refer to the description of the ISTAT command for details of the information shown in this report.
The example above tells you the following:
Column 2:
N items
Out of a total of 127 groups (the modulo equals 127), the file contains 13 empty
groups, 44 groups with 1 item and 70 groups with 2 index node structures. About
10% of the groups are therefore empty, with no groups containing more than 2
node structures.
Column 3:
N Used IG frames
The file contains 13 empty groups, and 114 groups that use a single frame.
No groups have used any overflow frames.
Column 4:
N % usage of IG frames
13 frames are empty (about 10%), 41 (24 + 17) frames (about 32%) are about 50%
full and 64 (1+7+29+27) frames (about 50%) are over 90% full. 4 frames (about
3%) are full and overflow frames might therefore be needed if the file increases
in size.
This example shows the output from ISTAT with the modulo set to 29.
File='EMPLOYEE Index='NAME' Modulus=29 Frame-size=1008 Hash-type=2 ¦ Number of groups with :- N ¦ N items N used IG frames N % usage of IG frames 0 - 0 0 0 3 - 0 19 0 4 - 0 10 0 5 - 2 0 0 6 - 17 0 0 7 - 8 0 0 8 - 2 0 0 82 - 0 0 2 83 - 0 0 1 86 - 0 0 5 87 - 0 0 3 92 - 0 0 2 98 - 0 0 13 99 - 0 0 3 100+ - 0 0 0 Total items= 184, Total bytes= 89724, Total frames= 97 IG bytes= 89724, Overflow frames= 68, Wasted bytes= 8052 (8%) OG items= 0, OG frames= 0, Groups used= 29
This example shows the following:
Column 2:
N items
This shows that all groups contain between 5 and 8 index node structures. The
node structures are therefore fairly evenly distributed across the groups.
Column 3:
N Used IG frames
This shows that 19 groups are using 2 linked overflow frames and 10 are using 3.
Every group in the base allocation has therefore overflowed.
Column 4:
N % usage of IG frames
This shows that all groups are over 82% full, with most over 90% full.
The summary at the bottom of the report shows that the 184 index node structures are badly overflowing the 29 base frames. A total of 97 frames are in use, 68 of which are overflow frames. This is the key indicator for this index, in that the total frame count is over three times the base group allocation.
This example shows the output from ISTAT with the modulo set to 383.
File='EMPLOYEE' Index='NAME' Modulus=383 Frame-size=1008 Hash-type=2 ¦ Number of groups with :- N ¦ N items N used IG frames N % usage of IG frames 0 - 201 201 201 1 - 180 182 0 2 - 2 0 0 22 - 0 0 1 45 - 0 0 1 48 - 0 0 8 49 - 0 0 84 50 - 0 0 86 53 - 0 0 1 76 - 0 0 1 Total items= 184, Total bytes= 89724, Total frames= 383 IG bytes= 89724, Overflow frames= 0, Wasted bytes= 296340 (76%) OG items= 0, OG frames= 0, Groups used= 182
This example shows the following:
Column 2:
N
items
This shows that most frames are empty, so the distribution of index node
structures is very poor.
Column 3:
N
Used IG frames
This shows that about half of the frames are empty, with none overflowed.
Column 4:
N %
usage of IG frames
This also shows that about half of the frames are empty, with most of the
remaining frames about 50% full.
The summary at the bottom of the report shows that the 184 index node structures are spread thinly across the base group allocation, with 201 (52%) of the base group allocation empty.
Size for uniform item distribution.
You should aim to spread the number of items in a file uniformly across all groups. The more even the distribution, the better the compromise between disk space used and performance.
When to upsize.
If the index uses a large number of overflow frames, performance could be affected. You should consider increasing the size of the index when the number of overflow frames used is more than 50% of the base frame allocation. As a guide, increase the modulo by half the number of overflow frames.
When to downsize.
If the majority of in-group frames are empty this is wasting disk space. However, as index node structures are only accessed randomly, performance is unlikely to be affected.
Nevertheless, if more than 25% of the base frame allocation is empty, you should consider reducing the size of the index. As a guide, you should reduce the modulo by about half the number of empty frames.
File statistics listings generated by routine database backup procedures show index sections after the associated data section, separated by colons (:). These statistics allow you to see what index sections exist and how much space they take up.
The LISTFILES command, which lists all files defined via a
D-pointer from the current account (or a named