How to Use This Book
Data Mining is the detection, characterization, and exploitation of actionable patterns in data.
This book is a wide-ranging treatment of the practical aspects of data mining in the “real world”. It presents in a systematic way the analytic principles acquired by the author during his 30+ years as a practicing engineer, data miner, information scientist, and Adjunct Professor of Computer Science.
This book is not intended to be read and then put on the shelf. Rather, it is a working “field manual”, designed to serve as an on-the-job guidebook. It has been written specifically for IT consultants, professional data analysts, and sophisticated data owners who want to establish data mining projects, but are not themselves data mining experts.
Most chapters contain one or more case studies. These are synopses of data mining projects led by the author, and include project descriptions, the data mining methods used, the challenges encountered, and the results obtained. When possible, numerical details are provided, grounding the presentation in specifics.
Also included are checklists that guide the reader through the practical considerations associated with each phase of the data mining process. These are working checklists: material the reader will want to carry into meetings with customers, planning discussions with management, and technical planning meetings with senior scientists. They lay out the questions to ask and the points to make, and they explain the whats and whys: the “lessons learned” that are known to all seasoned experts, but rarely written down.
While the treatment here is systematic, it is not formal: the reader will not encounter eclectic theorems, tables of equations, or detailed descriptions of algorithms. The “bit-level” mechanics of data mining techniques are well covered in the online literature, and “freeware” is available for many of them (refer also to the appendix of vendors for supported applications). The goal of this book is to help the non-expert address practical questions like:
- What is data mining, and what problems does it address?
- How is a quantitative business case for a data mining project developed and assessed?
- What process model should be used to plan and execute a data mining project?
- What skill sets are needed for different types/phases of data mining?
- What data mining techniques exist, and what do they do? How do I decide which are needed/best for MY problem?
- What are the common mistakes made during data mining projects, and how can they be avoided?
- How are data mining projects tracked and evaluated?