Posted by: Lerry (life is waiting...), Board: Algorithm
Subject: Data Mining: Concepts and Techniques (reprint edition)
Posted from: 哈工大紫丁香 (Saturday, November 9, 2002, 14:56:23), on-site post

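数据挖掘——概念与技术 (reprint edition) ISBN 7-04-010041-X/TP.693 P575
　　Data Mining: Concepts and Techniques
　　Jiawei Han, Micheline Kamber; published April 2001; list price 35.00 yuan
This book presents the concepts, methods, and applications of data mining
(often called knowledge discovery in databases). Starting with an emphasis
on data analysis, it introduces database and data mining concepts, defines
data mining as the automated or convenient extraction of patterns
representing knowledge from large databases, data warehouses, and other
large information repositories, and reviews the products currently on the
market through a general framework. Data mining is an interdisciplinary
field that draws on database technology, artificial intelligence, machine
learning, neural networks, statistics, pattern recognition, knowledge-based
systems, knowledge acquisition, information retrieval, high-performance
computing, and data visualization. Written from a database perspective, the
book describes the primitives, architecture, characteristics, and methods
of data mining systems, and focuses on the feasibility, usefulness,
effectiveness, and scalability of pattern discovery in large databases.
Chapter by chapter, it covers the concepts and techniques of
classification, prediction, association, and clustering. Each topic comes
with examples, the best algorithms for each class of problem, and
field-tested rules of thumb on how to apply each technique. This style
makes the book highly readable and lets readers learn the field and follow
the latest developments in industry. The book is suitable for computer
science students, application developers, business professionals, and
researchers in related fields.
　　Contents: 1. Introduction to data mining 2. Data warehouse and OLAP
technology for data mining 3. Data preprocessing 4. Data mining primitives,
languages, and system architectures 5. Concept description:
characterization and comparison 6. Mining association rules in large
databases 7. Classification and prediction 8. Cluster analysis 9. Mining
complex types of data 10. Applications and trends in data mining. Appendix
A: An introduction to Microsoft's OLE DB for Data Mining. Appendix B: An
introduction to DBMiner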

Foreword
by Jim Gray
Microsoft Research
　　We are deluged by data: scientific data, medical data, demographic data,
financial data, and marketing data. People have no time to look at this
data. Human attention has become a precious resource. So, we must find ways
to automatically analyze the data, to automatically classify it, to
automatically summarize it, to automatically discover and characterize
trends in it, and to automatically flag anomalies. This is one of the most
active and exciting areas of the database research community. Researchers
in areas such as statistics, visualization, artificial intelligence, and
machine learning are contributing to this field. The breadth of the field
makes it difficult to grasp its extraordinary progress over the last few
years.
　　Jiawei Han and Micheline Kamber have done a wonderful job of organizing
and presenting data mining in this very readable textbook. They begin by
giving quick introductions to database and data mining concepts with
particular emphasis on data analysis. They review the current product
offerings by presenting a general framework that covers them all. They then
cover in a chapter-by-chapter tour the concepts and techniques that
underlie classification, prediction, association, and clustering. These
topics are presented with examples, a tour of the best algorithms for each
problem class, and pragmatic rules of thumb about when to apply each
technique. I found this presentation style to be very readable, and I
certainly learned a lot from reading the book. Jiawei Han and Micheline
Kamber have been leading contributors to data mining research. This is the
text they use with their students to bring them up to speed on the field.
The field is evolving very rapidly, but this book is a quick way to learn
the basic ideas, and to understand where the field is today. I found it
very informative and stimulating, and I expect you will too.
Contents
Foreword
Preface
Chapter 1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
1.3 Data Mining: On What Kind of Data?
　　1.3.1 Relational Databases
　　1.3.2 Data Warehouses
　　1.3.3 Transactional Databases
　　1.3.4 Advanced Database Systems and Advanced Database Applications
1.4 Data Mining Functionalities—What Kinds of Patterns Can Be Mined?
　　1.4.1 Concept/Class Description: Characterization and Discrimination
　　1.4.2 Association Analysis
　　1.4.3 Classification and Prediction
　　1.4.4 Cluster Analysis
　　1.4.5 Outlier Analysis
　　1.4.6 Evolution Analysis
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Major Issues in Data Mining
1.8 Summary
　　Exercises
　　Bibliographic Notes
Chapter 2 Data Warehouse and OLAP Technology for Data Mining
2.1 What is a Data Warehouse?
　　2.1.1 Differences between Operational Database Systems and Data Warehouses
　　2.1.2 But, Why Have a Separate Data Warehouse?
2.2 A Multidimensional Data Model
　　2.2.1 From Tables and Spreadsheets to Data Cubes
　　2.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
　　2.2.3 Examples for Defining Star, Snowflake, and Fact Constellation Schemas
　　2.2.4 Measures: Their Categorization and Computation
　　2.2.5 Introducing Concept Hierarchies
　　2.2.6 OLAP Operations in the Multidimensional Data Model
　　2.2.7 A Starnet Query Model for Querying Multidimensional Databases
2.3 Data Warehouse Architecture
　　2.3.1 Steps for the Design and Construction of Data Warehouses
　　2.3.2 A Three-Tier Data Warehouse Architecture
　　2.3.3 Types of OLAP Servers: ROLAP versus MOLAP versus HOLAP
2.4 Data Warehouse Implementation
　　2.4.1 Efficient Computation of Data Cubes
　　2.4.2 Indexing OLAP Data
　　2.4.3 Efficient Processing of OLAP Queries
　　2.4.4 Metadata Repository
　　2.4.5 Data Warehouse Back-End Tools and Utilities
2.5 Further Development of Data Cube Technology
　　2.5.1 Discovery-Driven Exploration of Data Cubes
　　2.5.2 Complex Aggregation at Multiple Granularities: Multifeature Cubes
　　2.5.3 Other Developments
2.6 From Data Warehousing to Data Mining
　　2.6.1 Data Warehouse Usage
　　2.6.2 From On-Line Analytical Processing to On-Line Analytical Mining
2.7 Summary
　　Exercises
　　Bibliographic Notes
Chapter 3 Data Preprocessing
3.1 Why Preprocess the Data?
3.2 Data Cleaning
　　3.2.1 Missing Values
　　3.2.2 Noisy Data
　　3.2.3 Inconsistent Data
3.3 Data Integration and Transformation
　　3.3.1 Data Integration
　　3.3.2 Data Transformation
3.4 Data Reduction
　　3.4.1 Data Cube Aggregation
　　3.4.2 Dimensionality Reduction
　　3.4.3 Data Compression
　　3.4.4 Numerosity Reduction
3.5 Discretization and Concept Hierarchy Generation
　　3.5.1 Discretization and Concept Hierarchy Generation for Numeric Data
　　3.5.2 Concept Hierarchy Generation for Categorical Data
3.6 Summary
　　Exercises
　　Bibliographic Notes
Chapter 4 Data Mining Primitives, Languages, and System Architectures
4.1 Data Mining Primitives: What Defines a Data Mining Task?
　　4.1.1 Task-Relevant Data
　　4.1.2 The Kind of Knowledge to be Mined
　　4.1.3 Background Knowledge: Concept Hierarchies
　　4.1.4 Interestingness Measures
　　4.1.5 Presentation and Visualization of Discovered Patterns
4.2 A Data Mining Query Language
　　4.2.1 Syntax for Task-Relevant Data Specification
　　4.2.2 Syntax for Specifying the Kind of Knowledge to be Mined
　　4.2.3 Syntax for Concept Hierarchy Specification
　　4.2.4 Syntax for Interestingness Measure Specification
　　4.2.5 Syntax for Pattern Presentation and Visualization Specification
　　4.2.6 Putting It All Together: An Example of a DMQL Query
　　4.2.7 Other Data Mining Languages and the Standardization of Data Mining Primitives
4.3 Designing Graphical User Interfaces Based on a Data Mining Query Language
4.4 Architectures of Data Mining Systems
4.5 Summary
　　Exercises
　　Bibliographic Notes
Chapter 5 Concept Description: Characterization and Comparison
5.1 What Is Concept Description?
5.2 Data Generalization and Summarization-Based Characterization
　　5.2.1 Attribute-Oriented Induction
　　5.2.2 Efficient Implementation of Attribute-Oriented Induction
　　5.2.3 Presentation of the Derived Generalization
5.3 Analytical Characterization: Analysis of Attribute Relevance
　　5.3.1 Why Perform Attribute Relevance Analysis?
　　5.3.2 Methods of Attribute Relevance Analysis
　　5.3.3 Analytical Characterization: An Example
5.4 Mining Class Comparisons: Discriminating between Different Classes
　　5.4.1 Class Comparison Methods and Implementations
　　5.4.2 Presentation of Class Comparison Descriptions
　　5.4.3 Class Description: Presentation of Both Characterization and Comparison
5.5 Mining Descriptive Statistical Measures in Large Databases
　　5.5.1 Measuring the Central Tendency
　　5.5.2 Measuring the Dispersion of Data
　　5.5.3 Graph Displays of Basic Statistical Class Descriptions
5.6 Discussion
　　5.6.1 Concept Description: A Comparison with Typical Machine Learning Methods
　　5.6.2 Incremental and Parallel Mining of Concept Description
5.7 Summary
　　Exercises
　　Bibliographic Notes
Chapter 6 Mining Association Rules in Large Databases
6.1 Association Rule Mining
　　6.1.1 Market Basket Analysis: A Motivating Example for Association Rule Mining
　　6.1.2 Basic Concepts
　　6.1.3 Association Rule Mining: A Road Map
6.2 Mining Single-Dimensional Boolean Association Rules from Transactional Databases
　　6.2.1 The Apriori Algorithm: Finding Frequent Itemsets Using Candidate Generation
　　6.2.2 Generating Association Rules from Frequent Itemsets
　　6.2.3 Improving the Efficiency of Apriori
　　6.2.4 Mining Frequent Itemsets without Candidate Generation
　　6.2.5 Iceberg Queries
6.3 Mining Multilevel Association Rules from Transaction Databases
　　6.3.1 Multilevel Association Rules
　　6.3.2 Approaches to Mining Multilevel Association Rules
　　6.3.3 Checking for Redundant Multilevel Association Rules
6.4 Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
　　6.4.1 Multidimensional Association Rules
　　6.4.2 Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
　　6.4.3 Mining Quantitative Association Rules
　　6.4.4 Mining Distance-Based Association Rules
6.5 From Association Mining to Correlation Analysis
　　6.5.1 Strong Rules Are Not Necessarily Interesting: An Example
　　6.5.2 From Association Analysis to Correlation Analysis
6.6 Constraint-Based Association Mining
　　6.6.1 Metarule-Guided Mining of Association Rules
　　6.6.2 Mining Guided by Additional Rule Constraints
6.7 Summary
　　Exercises
　　Bibliographic Notes
Chapter 7 Classification and Prediction
7.1 What Is Classification? What Is Prediction?
7.2 Issues Regarding Classification and Prediction
　　7.2.1 Preparing the Data for Classification and Prediction
　　7.2.2 Comparing Classification Methods
7.3 Classification by Decision Tree Induction
　　7.3.1 Decision Tree Induction
　　7.3.2 Tree Pruning
　　7.3.3 Extracting Classification Rules from Decision Trees
　　7.3.4 Enhancements to Basic Decision Tree Induction
　　7.3.5 Scalability and Decision Tree Induction
　　7.3.6 Integrating Data Warehousing Techniques and Decision Tree Induction
7.4 Bayesian Classification
　　7.4.1 Bayes Theorem
　　7.4.2 Naive Bayesian Classification
　　7.4.3 Bayesian Belief Networks
　　7.4.4 Training Bayesian Belief Networks
7.5 Classification by Backpropagation
　　7.5.1 A Multilayer Feed-Forward Neural Network
　　7.5.2 Defining a Network Topology
　　7.5.3 Backpropagation
　　7.5.4 Backpropagation and Interpretability
7.6 Classification Based on Concepts from Association Rule Mining
7.7 Other Classification Methods
　　7.7.1 k-Nearest Neighbor Classifiers
　　7.7.2 Case-Based Reasoning
　　7.7.3 Genetic Algorithms
　　7.7.4 Rough Set Approach
　　7.7.5 Fuzzy Set Approaches
7.8 Prediction
　　7.8.1 Linear and Multiple Regression
　　7.8.2 Nonlinear Regression
　　7.8.3 Other Regression Models
7.9 Classifier Accuracy
　　7.9.1 Estimating Classifier Accuracy
　　7.9.2 Increasing Classifier Accuracy
　　7.9.3 Is Accuracy Enough to Judge a Classifier?
7.10 Summary
　　Exercises
　　Bibliographic Notes
Chapter 8 Cluster Analysis
8.1 What Is Cluster Analysis?
8.2 Types of Data in Cluster Analysis
　　8.2.1 Interval-Scaled Variables
　　8.2.2 Binary Variables
　　8.2.3 Nominal, Ordinal, and Ratio-Scaled Variables
　　8.2.4 Variables of Mixed Types
8.3 A Categorization of Major Clustering Methods
8.4 Partitioning Methods
　　8.4.1 Classical Partitioning Methods: k-Means and k-Medoids
　　8.4.2 Partitioning Methods in Large Databases: From k-Medoids to CLARANS
8.5 Hierarchical Methods
　　8.5.1 Agglomerative and Divisive Hierarchical Clustering
　　8.5.2 BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
　　8.5.3 CURE: Clustering Using REpresentatives
　　8.5.4 Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling
8.6 Density-Based Methods
　　8.6.1 DBSCAN: A Density-Based Clustering Method Based on Connected Regions with Sufficiently High Density
　　8.6.2 OPTICS: Ordering Points To Identify the Clustering Structure
　　8.6.3 DENCLUE: Clustering Based on Density Distribution Functions
8.7 Grid-Based Methods
　　8.7.1 STING: STatistical INformation Grid
　　8.7.2 WaveCluster: Clustering Using Wavelet Transformation
　　8.7.3 CLIQUE: Clustering High-Dimensional Space
8.8 Model-Based Clustering Methods
　　8.8.1 Statistical Approach
　　8.8.2 Neural Network Approach
8.9 Outlier Analysis
　　8.9.1 Statistical-Based Outlier Detection
　　8.9.2 Distance-Based Outlier Detection
　　8.9.3 Deviation-Based Outlier Detection
8.10 Summary
　　Exercises
　　Bibliographic Notes
Chapter 9 Mining Complex Types of Data
9.1 Multidimensional Analysis and Descriptive Mining of Complex Data Objects
　　9.1.1 Generalization of Structured Data
　　9.1.2 Aggregation and Approximation in Spatial and Multimedia Data Generalization
　　9.1.3 Generalization of Object Identifiers and Class/Subclass Hierarchies
　　9.1.4 Generalization of Class Composition Hierarchies
　　9.1.5 Construction and Mining of Object Cubes
　　9.1.6 Generalization-Based Mining of Plan Databases by Divide-and-Conquer
9.2 Mining Spatial Databases
　　9.2.1 Spatial Data Cube Construction and Spatial OLAP
　　9.2.2 Spatial Association Analysis
　　9.2.3 Spatial Clustering Methods
　　9.2.4 Spatial Classification and Spatial Trend Analysis
　　9.2.5 Mining Raster Databases
9.3 Mining Multimedia Databases
　　9.3.1 Similarity Search in Multimedia Data
　　9.3.2 Multidimensional Analysis of Multimedia Data
　　9.3.3 Classification and Prediction Analysis of Multimedia Data
　　9.3.4 Mining Associations in Multimedia Data
9.4 Mining Time-Series and Sequence Data
　　9.4.1 Trend Analysis
　　9.4.2 Similarity Search in Time-Series Analysis
　　9.4.3 Sequential Pattern Mining
　　9.4.4 Periodicity Analysis
9.5 Mining Text Databases
　　9.5.1 Text Data Analysis and Information Retrieval
　　9.5.2 Text Mining: Keyword-Based Association and Document Classification
9.6 Mining the World Wide Web
　　9.6.1 Mining the Web's Link Structures to Identify Authoritative Web Pages
　　9.6.2 Automatic Classification of Web Documents
　　9.6.3 Construction of a Multilayered Web Information Base
　　9.6.4 Web Usage Mining
9.7 Summary
　　Exercises
　　Bibliographic Notes
Chapter 10 Applications and Trends in Data Mining
10.1 Data Mining Applications
　　10.1.1 Data Mining for Biomedical and DNA Data Analysis
　　10.1.2 Data Mining for Financial Data Analysis
　　10.1.3 Data Mining for the Retail Industry
　　10.1.4 Data Mining for the Telecommunication Industry
10.2 Data Mining System Products and Research Prototypes
　　10.2.1 How to Choose a Data Mining System
　　10.2.2 Examples of Commercial Data Mining Systems
10.3 Additional Themes on Data Mining
　　10.3.1 Visual and Audio Data Mining
　　10.3.2 Scientific and Statistical Data Mining
　　10.3.3 Theoretical Foundations of Data Mining
　　10.3.4 Data Mining and Intelligent Query Answering
10.4 Social Impacts of Data Mining
　　10.4.1 Is Data Mining a Hype or a Persistent, Steadily Growing Business?
　　10.4.2 Is Data Mining Merely Managers' Business or Everyone's Business?
　　10.4.3 Is Data Mining a Threat to Privacy and Data Security?
10.5 Trends in Data Mining
10.6 Summary
　　Exercises
　　Bibliographic Notes
Appendix A An Introduction to Microsoft's OLE DB for Data Mining
　　A.1 Creating a DMM Object
　　A.2 Inserting Training Data into the Model and Training the Model
　　A.3 Using the Model
Appendix B An Introduction to DBMiner
　　B.1 System Architecture
　　B.2 Input and Output
　　B.3 Data Mining Tasks Supported by the System
　　B.4 Support for Task and Method Selection
　　B.5 Support of the KDD Process
　　B.6 Main Applications
　　B.7 Current Status
Bibliography
Index
Preface
　　Our capabilities of both generating and collecting data have been
increasing rapidly in the last several decades. Contributing factors
include the widespread use of bar codes for most commercial products, the
computerization of many business, scientific, and government transactions,
and advances in data collection tools ranging from scanned text and image
platforms to satellite remote sensing systems. In addition, popular use of
the World Wide Web as a global information system has flooded us with a
tremendous amount of data and information. This explosive growth in stored
data has generated an urgent need for new techniques and automated tools
that can intelligently assist us in transforming the vast amounts of data
into useful information and knowledge.
　　This book explores the concepts and techniques of data mining, a
promising and flourishing frontier in database systems and new database
applications. Data mining, also popularly referred to as knowledge
discovery in databases (KDD), is the automated or convenient extraction of
patterns representing knowledge implicitly stored in large databases, data
warehouses, and other massive information repositories.
　　Data mining is a multidisciplinary field, drawing work from areas
including database technology, artificial intelligence, machine learning,
neural networks, statistics, pattern recognition, knowledge-based systems,
knowledge acquisition, information retrieval, high-performance computing,
and data visualization. We present the material in this book from a
database perspective. That is, we focus on issues relating to the
feasibility, usefulness, efficiency, and scalability of techniques for the
discovery of patterns hidden in large databases. As a result, this book is
not intended as an introduction to database systems, machine learning,
statistics, or other such areas, although we do provide the background
necessary in these areas in order to facilitate the reader's comprehension
of their respective roles in data mining. Rather, the book is a
comprehensive introduction to data mining, presented with database issues
in focus. It should be useful for computing science students, application
developers, and business professionals, as well as researchers involved in
any of the disciplines listed above.
　　Data mining emerged during the late 1980s, has made great strides during
the 1990s, and is expected to continue to flourish into the new millennium.
This book presents an overall picture of the field from a database
researcher's point of view, introducing interesting data mining techniques
and systems, and discussing applications and research directions. An
important motivation for writing this book was the need to build an
organized framework for the study of data mining, a challenging task owing
to the extensive multidisciplinary nature of this fast-developing field. We
hope that this book will encourage people with different backgrounds and
experiences to exchange their views regarding data mining so as to
contribute toward the further promotion and shaping of this exciting and
dynamic field.
To the Teacher
　　This book is designed to give a broad, yet in-depth overview of the
field of data mining. You will find it useful for teaching a course on data
mining at an advanced undergraduate level or the first-year graduate level.
In addition, individual chapters may be included as material for courses on
selected topics in database systems or in artificial intelligence. We have
tried to make the chapters as self-contained as possible so that you are
not confined to reading each chapter in sequence. For a course taught at
the undergraduate level, you might use Chapters 1 through 8 as the core
course material. Remaining class material may be selected from among the
more advanced topics described in Chapters 9 and 10. For a graduate-level
course, you may choose to cover the entire book in one semester.
　　Each chapter ends with a set of exercises, suitable as assigned
homework. The exercises are either short questions that test basic mastery
of the material covered, or longer questions that require analytical
thinking.
To the Student
　　We hope that this textbook will spark your interest in the fresh, yet
evolving field of data mining. We have attempted to present the material in
a clear manner, with careful explanation of the topics covered. Each
chapter ends with a summary describing the main points. We have included
many figures and illustrations throughout the text in order to make the
book more enjoyable and “reader-friendly”. Although this book was designed
as a textbook, we have tried to organize it so that it will also be useful
to you as a reference book or handbook, should you later decide to pursue a
career in data mining.
　　What do you need to know in order to read this book?
　　·You should have some knowledge of the concepts and terminology
associated with database systems. However, we do try to provide enough
background of the basics in database technology, so that if your memory is
a bit rusty, you will not have trouble following the discussions in the
book. You should have some knowledge of database querying, although
knowledge of any specific query language is not required.
　　·You should have some programming experience. In particular, you should
be able to read pseudocode, and understand simple data structures such as
multidimensional arrays.
　　·It will be helpful to have some preliminary background in statistics,
machine learning, or pattern recognition. However, we will familiarize you
with the basic concepts of these areas that are relevant to data mining
from a database perspective.
To the Professional
　　This book was designed to cover a broad range of topics in the field of
data mining. As a result, it is an excellent handbook on the subject.
Because each chapter is designed to be as stand-alone as possible, you can
focus on the topics that most interest you. Much of the book is suited to
applications programmers or information service managers like yourself who
wish to learn about the key ideas of data mining on their own.
　　The techniques and algorithms presented are of practical utility. Rather
than selecting algorithms that perform well on small “toy” databases, the
algorithms described in the book are geared for the discovery of data
patterns hidden in large, real databases. In Chapter 10, we briefly discuss
data mining systems in commercial use, as well as promising research
prototypes. Each algorithm presented in the book is illustrated in
pseudocode. The pseudocode is similar to the C programming language, yet is
designed so that it should be easy to follow by programmers unfamiliar with
C or C++. If you wish to implement any of the algorithms, you should find
the translation of our pseudocode into the programming language of your
choice to be a fairly straightforward task.
Organization of the Book
　　The book is organized as follows.
　　Chapter 1 provides an introduction to the multidisciplinary field of
data mining. It discusses the evolutionary path of database technology that
has led to the need for data mining, and the importance of its application
potential. The basic architecture of data mining systems is described, and
a brief introduction to the concepts of database systems and data
warehouses is given. A detailed classification of data mining tasks is
presented, based on the different kinds of knowledge to be mined. A
classification of data mining systems is presented, and major challenges in
the field are discussed.
　　Chapter 2 is an introduction to data warehouses and OLAP (On-Line
Analytical Processing). Topics include the concept of data warehouses and
multidimensional databases, the construction of data cubes, the
implementation of on-line analytical processing, and the relationship
between data warehousing and data mining.
　　Chapter 3 describes techniques for preprocessing the data prior to
mining. Methods of data cleaning, data integration and transformation, and
data reduction are discussed, including the use of concept hierarchies for
dynamic and static discretization. The automatic generation of concept
hierarchies is also described.
　　Chapter 4 introduces the primitives of data mining that define the
specification of a data mining task. It describes a data mining query
language (DMQL) and provides examples of data mining queries. Other
languages are also described, as well as the construction of graphical user
interfaces and data mining architectures.
　　Chapter 5 describes techniques for concept description, including
characterization and discrimination. An attribute-oriented generalization
technique is introduced, as well as its different implementations including
a generalized relation technique and a multidimensional data cube
technique. Several forms of knowledge presentation and visualization are
illustrated. Relevance analysis is discussed. Methods for class comparison
at multiple abstraction levels and methods for the extraction of
characteristic rules and discriminant rules with interestingness
measurements are presented. In addition, statistical measures for
descriptive mining are discussed.
　　Chapter 6 presents methods for mining association rules in transaction
databases as well as relational databases and data warehouses. It includes
a classification of association rules, a presentation of the basic Apriori
algorithm and its variations, and techniques for mining multilevel
association rules, multidimensional association rules, quantitative
association rules, and correlation rules. A new technique called frequent
pattern growth is introduced, which mines frequent patterns without
candidate set generation. Strategies for finding interesting rules by
constraint-based mining and the use of interestingness measures to focus
the rule search are also described.
　　Chapter 7 describes methods for data classification and prediction,
including decision tree induction, Bayesian classification, the neural
network technique of backpropagation, k-nearest neighbor classifiers,
case-based reasoning, genetic algorithms, rough set theory, and fuzzy set
approaches. Classification based on concepts from association rule mining
is presented. Methods of regression are introduced, and issues regarding
classifier accuracy are discussed.
　　Chapter 8 describes methods of cluster analysis. It first introduces the
concept of data clustering and then presents several major data clustering
approaches, including partition-based clustering, hierarchical clustering,
and model-based clustering. Methods for clustering continuous data,
discrete data, and data in multidimensional data cubes are presented. The
scalability of clustering algorithms is discussed in detail.
　　Chapter 9 discusses methods for data mining in advanced database
systems. It includes data mining in object-oriented databases, spatial
databases, multimedia databases, time-series databases, text databases, and
the World Wide Web.
　　Finally, in Chapter 10, we summarize the concepts presented in this book
and discuss applications of data mining and some challenging research
issues.
　　Throughout the text, italic is used to emphasize terms that are defined,
while bold is used to highlight main ideas.
Errors
　　It is likely that this book may contain typos, errors, or omissions. If
you notice any errors, have suggestions regarding additional exercises, or
have other constructive criticism, we would be very happy to hear from you.
We welcome and appreciate your suggestions. You can send your comments to
　　Data Mining: Concepts and Techniques
　　Intelligent Database Systems Research Laboratory
　　School of Computing Science
　　Simon Fraser University
　　Burnaby, British Columbia
　　Canada V5A 1S6
　　Fax: (604) 291-3045
Alternatively, you can use electronic mail to submit bug reports, request a
list of known errors, or make constructive suggestions. To receive
instructions, send e-mail to dmbook@cs.sfu.ca with “Subject: help” in the
message header. We regret that we cannot personally respond to all e-mail
messages. The errata of the book and other updated information related to
the book can be found by referencing the Web address
www.cs.sfu.ca/~han/Dm_Book.
Acknowledgments
　　We would like to express our sincere thanks to all those who have worked
or are currently working with us on data mining related research and/or the
DBMiner project, or have provided us with various support in data mining.
These include Rakesh Agrawal, Stella Atkins, Yvan Bedard, Binay
Bhattacharya, Dora (Yandong) Cai, Nick Cercone, Surajit Chaudhuri, Sonny
H.S. Chee, Jianping Chen, MingSyan Chen, Qing Chen, Qiming Chen, Shan
Cheng, David Cheung, Shi Cong, Son Dao, Umeshwar Dayal, James Delgrande,
Guozhu Dong, Carole Edwards, Max Egenhofer, Martin Ester, Usama Fayyad,
Ling Feng, Ada Fu, Yongjian Fu, Daphne Gelbart, Randy Goebel, Jim Gray,
Robert Grossman, Wan Gong, Yike Guo, Eli Hagen, Howard Hamilton, Jing He,
Larry Henschen, Jean Hou, MeiChun Hsu, Kan Hu, Haiming Huang, Yue Huang,
Julia Itskevitch, Wen Jin, Tiko Kameda, Hiroyuki Kawano, Rizwan Kheraj,
Eddie Kim, Krzysztof Koperski, Hans-Peter Kriegel, Vipin Kumar, Laks V.S. L