Database 版 (精华区)
发信人: rhine (风雨无阻), 信区: Database
标 题: Oracle DBA Guide, V8, part 16
发信站: 哈工大紫丁香 (2000年07月08日20:34:54 星期六), 站内信件
发信人: chendu (Oracle Only), 信区: Database
标 题: Oracle DBA Guide, V8, part 16
发信站: BBS 水木清华站 (Sat Nov 13 15:35:21 1999)
Hi, required by several friends, I am posting the
Oracle Database Administrator's Guide version 8
here.
Web Master, if my post it too long and not good for
the bbs, please delete it and I am sorry for it.
Hoping everyone enjoy it!
Good luck!
----------------------------------------------------------------------------
----
16
Managing Hash Clusters
This chapter describes how to manage hash clusters, and includes the followi
ng topics:
Guidelines for Managing Hash Clusters
Altering Hash Clusters
Dropping Hash Clusters
See Also: Before attempting tasks described in this chapter, familiarize you
rself with the concepts in Chapter 10, Guidelines for Managing Schema Object
s.
Guidelines for Managing Hash Clusters
This section describes guidelines to consider before attempting to manage ha
sh clusters, and includes the following topics:
Advantages of Hashing
Disadvantages of Hashing
Estimate Size Required by Hash Clusters and Set Storage Parameters
Storing a table in a hash cluster is an optional way to improve the performa
nce of data retrieval. A hash cluster provides an alternative to a non-clust
ered table with an index or an index cluster. With an indexed table or index
cluster, Oracle locates the rows in a table using key values that Oracle st
ores in a separate index. To use hashing, you create a hash cluster and load
tables into it. Oracle physically stores the rows of a table in a hash clus
ter and retrieves them according to the results of a hash function.
Oracle uses a hash function to generate a distribution of numeric values, ca
lled hash values, which are based on specific cluster key values. The key of
a hash cluster, like the key of an index cluster, can be a single column or
composite key (multiple column key). To find or store a row in a hash clust
o find or store a row in a hash clust
er, Oracle applies the hash function to the row's cluster key value; the res
ulting hash value corresponds to a data block in the cluster, which Oracle t
hen reads or writes on behalf of the issued statement.
To find or store a row in an indexed table or cluster, a minimum of two (the
re are usually more) I/Os must be performed:
one or more I/Os to find or store the key value in the index
another I/O to read or write the row in the table or cluster
In contrast, Oracle uses a hash function to locate a row in a hash cluster;
no I/O is required. As a result, a minimum of one I/O operation is necessary
to read or write a row in a hash cluster.
Advantages of Hashing
If you opt to use indexing rather than hashing, consider whether to store a
table individually or as part of a cluster.
Hashing is most advantageous when you have the following conditions:
Most queries are equality queries on the cluster key:
SELECT . . . WHERE cluster_key = . . . ;
In such cases, the cluster key in the equality condition is hashed, and the
corresponding hash key is usually found with a single read. In comparison, f
or an indexed table the key value must first be found in the index (usually
several reads), and then the row is read from the table (another read).
The tables in the hash cluster are primarily static in size so that you can
determine the number of rows and amount of space required for the tables in
nt of space required for the tables in
the cluster. If tables in a hash cluster require more space than the initial
allocation for the cluster, performance degradation can be substantial beca
use overflow blocks are required.
Disadvantages of Hashing
Hashing is not advantageous in the following situations:
Most queries on the table retrieve rows over a range of cluster key values.
For example, in full table scans or queries like the following, a hash funct
ion cannot be used to determine the location of specific hash keys; instead,
the equivalent of a full table scan must be done to fetch the rows for the
query:
SELECT . . . WHERE cluster_key < . . . ;
With an index, key values are ordered in the index, so cluster key values th
at satisfy the WHERE clause of a query can be found with relatively few I/Os
.
The table is not static and continually growing. If a table grows without li
mit, the space required over the life of the table (its cluster) cannot be p
re-determined.
Applications frequently perform full-table scans on the table and the table
is sparsely populated. A full-table scan in this situation takes longer unde
r hashing.
You cannot afford to pre-allocate the space that the hash cluster will event
ually need.
ually need.
See Also: For more information about creating hash clusters and specifying h
ash functions see the Oracle8 SQL Reference.
For information about hash functions and specifying user-defined hash functi
ons, see Oracle8 Concepts.
Even if you decide to use hashing, a table can still have separate indexes o
n any columns, including the cluster key. See the Oracle8 Application Develo
per's Guide for additional recommendations.
Estimate Size Required by Hash Clusters and Set Storage Parameters
As with index clusters, it is important to estimate the storage required for
the data in a hash cluster.
Oracle guarantees that the initial allocation of space is sufficient to stor
e the hash table according to the settings SIZE and HASHKEYS. If settings fo
r the storage parameters INITIAL, NEXT, and MINEXTENTS do not account for th
e hash table size, incremental (additional) extents are allocated until at l
east SIZE*HASHKEYS is reached. For example, assume that the data block size
is 2K, the available data space per block is approximately 1900 bytes (data
block size minus overhead), and that the STORAGE and HASH parameters are spe
cified in the CREATE CLUSTER command as follows:
STORAGE (INITIAL 100K
NEXT 150K
MINEXTENTS 1
PCTINCREASE 0)
PCTINCREASE 0)
SIZE 1500
HASHKEYS 100
In this example, only one hash key can be assigned per data block. Therefore
, the initial space required for the hash cluster is at least 100*2K or 200K
. The settings for the storage parameters do not account for this requiremen
t. Therefore, an initial extent of 100K and a second extent of 150K are allo
cated to the hash cluster.
Alternatively, assume the HASH parameters are specified as follows:
SIZE 500 HASHKEYS 100
In this case, three hash keys are assigned to each data block. Therefore, th
e initial space required for the hash cluster is at least 34*2K or 68K. The
initial settings for the storage parameters are sufficient for this requirem
ent (an initial extent of 100K is allocated to the hash cluster).
See Also: To estimate the size of a hash cluster, use the procedure given in
"Estimating Space Required by Clusters".
Creating Hash Clusters
After a hash cluster is created, tables can be created in the cluster. A has
h cluster is created using the SQL command CREATE CLUSTER. For example, the
following statement creates a cluster named TRIAL_CLUSTER that stores the TR
IAL table, clustered by the TRIALNO column:
CREATE CLUSTER trial_cluster (trialno NUMBER(5,0))
PCTUSED 80
PCTUSED 80
PCTFREE 5
TABLESPACE users
STORAGE (INITIAL 250K NEXT 50K
MINEXTENTS 1 MAXEXTENTS 3
PCTINCREASE 0)
HASH IS trialno HASHKEYS 150;
CREATE TABLE trial (
trialno NUMBER(5,0) PRIMARY KEY,
...)
CLUSTER trial_cluster (trialno);
The following sections explain setting the parameters of the CREATE CLUSTER
command specific to hash clusters.
See Also: For additional information about creating tables in a cluster, gui
delines for setting other parameters of the CREATE CLUSTER command, and the
privileges required to create a hash cluster, see "Creating Clusters".
Controlling Space Use Within a Hash Cluster
When creating a hash cluster, it is important to choose the cluster key corr
ectly and set the HASH IS, SIZE, and HASHKEYS parameters so that performance
and space use are optimal. The following guidelines describe how to set the
se parameters.
Choosing the Key
Choosing the correct cluster key is dependent on the most common types of qu
eries issued against the clustered tables. For example, consider the EMP tab
le in a hash cluster. If queries often select rows by employee number, the E
MPNO column should be the cluster key; if queries often select rows by depar
tment number, the DEPTNO column should be the cluster key. For hash clusters
that contain a single table, the cluster key is typically the entire primar
y key of the contained table.
The key of a hash cluster, like that of an index cluster, can be a single co
lumn or a composite key (multiple column key). A hash cluster with a composi
te key must use Oracle's internal hash function.
Setting HASH IS
Only specify the HASH IS parameter if the cluster key is a single column of
the NUMBER datatype, and contains uniformly distributed integers. If the abo
ve conditions apply, you can distribute rows in the cluster so that each uni
que cluster key value hashes, with no collisions, to a unique hash value. If
these conditions do not apply, omit this option so that you use the interna
l hash function.
Setting SIZE
SIZE should be set to the average amount of space required to hold all rows
for any given hash key. Therefore, to properly determine SIZE, you must be a
ware of the characteristics of your data:
If the hash cluster is to contain only a single table and the hash key value
s of the rows in that table are unique (one row per value), SIZE can be set
s of the rows in that table are unique (one row per value), SIZE can be set
to the average row size in the cluster.
If the hash cluster is to contain multiple tables, SIZE can be set to the av
erage amount of space required to hold all rows associated with a representa
tive hash value.
See Also: To estimate a preliminary value for SIZE, follow the procedures gi
ven in Appendix A, "Space Estimations for Schema Objects". If the preliminar
y value for SIZE is small (more than four hash keys can be assigned per data
block), you can use this value for SIZE in the CREATE CLUSTER command.
However, if the value of SIZE is large (fewer than five hash keys can be ass
igned per data block), you should also consider the expected frequency of co
llisions and whether performance of data retrieval or efficiency of space us
age is more important to you:
If the hash cluster does not use the internal hash function (if you specifie
d HASH IS) and you expect little or no collisions, you can set SIZE as estim
ated; no collisions occur and space is used as efficiently as possible.
If you expect frequent collisions on inserts, the likelihood of overflow blo
cks being allocated to store rows is high. To reduce the possibility of over
flow blocks and maximize performance when collisions are frequent, you shoul
d increase SIZE according to Table 16-1.
Table 16-1 SIZE Increase Chart
Available Space per Block/Calculated SIZE Setting for SIZE
1
1
Calculated SIZE
2
Calculated SIZE + 15%
3
Calculated SIZE + 12%
4
Calculated SIZE + 8%
>4
Calculated SIZE
Overestimating the value of SIZE increases the amount of unused space in the
cluster. If space efficiency is more important than the performance of data
retrieval, disregard the above adjustments and use the estimated value for
SIZE.
Setting HASHKEYS
For maximum distribution of rows in a hash cluster, HASHKEYS should always b
e a prime number.
For example, suppose you cluster the EMP table by DEPTNO, and there are 100
DEPTNOs, with values 10, 20, . . ., 1000. Assuming you bypass the internal h
ash function and you create a cluster with HASHKEYS of 100, then department
10 will hash to 10, department 20 to 20, . . ., department 110 to 10 (110 mo
d 100), department 120 to 20, and so on. Notice that there are 10 entries fo
r hash values of 10, 20 . . ., but none for 1, 2, . . ., and so on. As a res
ult, there is a lot of wasted space and possibly a lot of overflow blocks be
cause of collisions. Alternatively, if HASHKEYS is set to 101, then each dep
artment number hashes to a unique hash key value.
Controlling Space in Hash Clusters: Examples
The following examples show how to correctly choose the cluster key and set
the HASH IS, SIZE, and HASHKEYS parameters. For all examples, assume that th
e data block size is 2K and that on average, 1950 bytes of each block is ava
ilable data space (block size minus overhead).
Example 1
You decide to load the EMP table into a hash cluster. Most queries retriev
e employee records by their employee number. You estimate that the maximum n
umber of rows in the EMP table at any given time is 10000 and that the avera
ge row size is 55 bytes.
In this case, EMPNO should be the cluster key. Since this column contains
integers that are unique, the internal hash function can be bypassed. SIZE c
an be set to the average row size, 55 bytes; note that 34 hash keys are assi
gned per data block. HASHKEYS can be set to the number of rows in the table,
10000, rounded up to the next highest prime number, 10001:
CREATE CLUSTER emp_cluster (empno
NUMBER)
. . .
SIZE 55
SIZE 55
HASH IS empno HASHKEYS 10001;
Example 2
Conditions similar to the previous example exist. In this case, however, r
ows are usually retrieved by department number. At most, there are 1000 depa
rtments with an average of 10 employees per department. Note that department
numbers increment by 10 (0, 10, 20, 30, . . . ).
In this case, DEPTNO should be the cluster key. Since this column contains
integers that are uniformly distributed, the internal hash function can be
bypassed. A pre-estimated SIZE (the average amount of space required to hold
all rows per department) is 55 bytes * 10, or 550 bytes. Using this value f
or SIZE, only three hash keys can be assigned per data block. If you expect
some collisions and want maximum performance of data retrieval, slightly alt
er your estimated SIZE to prevent collisions from requiring overflow blocks.
By adjusting SIZE by 12%, to 620 bytes (see previous section about setting
SIZE for clarification), only three hash keys are assigned per data block, l
eaving more space for rows from expected collisions.
HASHKEYS can be set to the number of unique department numbers, 1000, roun
ded up to the next highest prime number, 1009:
CREATE CLUSTER emp_cluster (deptno NUMBER)
. . .
SIZE 620
HASH IS deptno HASHKEYS 1009;
Altering Hash Clusters
You can alter a hash cluster with the SQL command ALTER CLUSTER:
ALTER CLUSTER emp_dept . . . ;
The implications for altering a hash cluster are identical to those for alte
ring an index cluster. However, note that the SIZE, HASHKEYS, and HASH IS pa
rameters cannot be specified in an ALTER CLUSTER statement. You must re-crea
te the cluster to change these parameters and then copy the data from the or
iginal cluster.
See Also: For more information about altering an index cluster, see "Alterin
g Clusters".
Dropping Hash Clusters
You can drop a hash cluster using the SQL command DROP CLUSTER:
DROP CLUSTER emp_dept;
A table in a hash cluster is dropped using the SQL command DROP TABLE. The i
mplications of dropping hash clusters and tables in hash clusters are the sa
me for index clusters.
See Also: For more information about dropping clusters, see "Dropping Cluste
rs".
----------------------------------------------------------------------------
----
---------------------------------------------------------------------------
-------------------------
--
※ 来源:·BBS 水木清华站 bbs.net.tsinghua.edu.cn·[FROM: 162.105.41.166]
--
海纳百川,
有容乃大,
壁立千尺,
无欲则刚。
※ 来源:·哈工大紫丁香 bbs.hit.edu.cn·[FROM: dip.hit.edu.cn]
Powered by KBS BBS 2.0 (http://dev.kcn.cn)
页面执行时间:208.012毫秒