The objective of this work is to provide a comprehensive understanding of data mining: defining its concept, tracing its evolution, elucidating its methods and tasks, and describing its typical process.
Table of contents
1 Introduction
1.1 Problem definition
1.2 Aim of the work
2 Methods and tasks
2.1 Clustering
2.2 Association
2.3 Decision trees
3 Data mining process
4 Conclusion
5 Bibliography
1 Introduction
1.1 Problem definition
The amount of information is growing rapidly in today's customer-driven economy. Data is the raw material for business growth. Analysing it with data mining methods yields results that hold great information potential for the companies that use them.
The term was first used in statistics in the 1960s, where it referred to searching data for patterns. At the end of the 1980s, Knowledge Discovery in Databases (KDD) emerged, an interdisciplinary research direction that is now predominantly referred to as data mining. In business management the two terms are used as synonyms.2) "Data mining aims to extract knowledge from data":3) it is a method that discovers knowledge in large data sets on its own.3) The first prototypes were presented at the beginning of the 1990s. They attracted interest in business administration, especially in marketing. In the mid-1990s, more and more companies started to use data mining tools.4)
1.2 Aim of the work
This work defines the term data mining and describes its origin and development. It then clarifies the methods and tasks of data mining and describes its typical process.
2 Methods and tasks
2.1 Clustering
Clustering, or customer segmentation, aims to divide a set of data that is not yet classified into groups. The data are divided in such a way that the similarity within a group is as high as possible and the similarity between groups is as low as possible.
Our example is the customer base of a telephone company. Typical applications group customers into homogeneous segments based on demographic data, such as age, gender and marital status, or based on their buying behaviour.
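The grouping idea can be illustrated with a minimal sketch. It uses k-means, one common clustering algorithm that is chosen here for illustration only; the text does not name a specific algorithm. The customer records (age and monthly bill), the number of groups and the starting centroids are all hypothetical.

```python
from math import dist

# Hypothetical telephone customers: (age in years, monthly bill).
customers = [
    (22, 15.0), (25, 18.0), (27, 20.0),   # younger, low spend
    (45, 60.0), (48, 65.0), (52, 58.0),   # older, high spend
]

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            # Keep the old centroid if a cluster happens to be empty.
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Arbitrary starting point: the first and the last customer.
clusters, centroids = k_means(customers, [customers[0], customers[-1]])

for cluster, centre in zip(clusters, centroids):
    print(f"centre {centre}: {cluster}")
```

With real data the attributes would normally be scaled first, because the raw Euclidean distance lets the attribute with the largest numeric range dominate the grouping.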
2.2 Association
The second method is association. With the help of association rules, one can discover previously unknown dependencies in customer behaviour. The classic question is: which customers who have bought product A are also likely to buy product B? The connections are explained below using the example of fruit and vegetable sales.
We have 6 transactions.
[Table not included in this excerpt]
Table 1: Fruit and vegetable sales
Table 1 shows what was bought in each of the 6 transactions. We now want to formulate a rule that says: if apples are bought, then pears are bought as well. The quality of a rule is determined by its degree of uncertainty, which is characterised by two numbers called support and confidence.
Support is the number of transactions that contain all products from both the if part (the condition) and the then part of the rule, expressed as a percentage of all transactions. In our example, the support is 4/6 = 2/3, i.e. about 67%: apples and pears are bought together in two thirds of the transactions.
The second value is the confidence of the rule. It is the number of transactions that contain both the if part and the then part, divided by the number of transactions that contain the if part. It expresses the strength of the dependency and lies between 0 and 1. A value of 1 expresses a mandatory dependency: whenever apples are bought, pears are bought as well.
[Table not included in this excerpt]
Table 2: Results table
In our example, the confidence is 4/5, or 80%. Table 2 shows the uncertainty levels for four association rules. Using data mining methods, we have identified the following patterns: whoever buys bananas necessarily also buys pears, and whoever buys pears necessarily also buys apples, but not vice versa.
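The two measures can be recomputed with a short sketch. Since Table 1 is not available in this excerpt, the six transactions below are hypothetical; they were merely chosen so that they agree with the figures stated in the text (support 4/6 and confidence 4/5 for the rule "apples, then pears"). The function names are assumptions made for illustration.

```python
# Hypothetical transactions, NOT the original Table 1.
transactions = [
    {"apples", "pears", "bananas"},
    {"apples", "pears", "bananas"},
    {"apples", "pears"},
    {"apples", "pears"},
    {"apples"},
    {"carrots"},
]

def count(items, transactions):
    """Number of transactions that contain every product in `items`."""
    return sum(1 for t in transactions if items <= t)

def support(if_part, then_part, transactions):
    """Share of all transactions containing both parts of the rule."""
    return count(if_part | then_part, transactions) / len(transactions)

def confidence(if_part, then_part, transactions):
    """Transactions with both parts divided by transactions with the if part."""
    return count(if_part | then_part, transactions) / count(if_part, transactions)

rules = [
    ({"apples"}, {"pears"}),
    ({"pears"}, {"apples"}),
    ({"bananas"}, {"pears"}),
    ({"pears"}, {"bananas"}),
]
for if_part, then_part in rules:
    s = support(if_part, then_part, transactions)
    c = confidence(if_part, then_part, transactions)
    print(f"if {sorted(if_part)} then {sorted(then_part)}: "
          f"support {s:.0%}, confidence {c:.0%}")
```

Comparing a rule with its reverse shows why confidence is not symmetric: both directions share the same support, but they are divided by different if-part counts.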
2.3 Decision trees
Decision trees are used to predict customer behaviour. A decision tree is a set of logically linked test questions that lead to the desired outcome. A tree consists of the following components:
- Nodes. In each node, an attribute, i.e. a property, is queried and evaluated. Each node, with the exception of the top node (the root), has exactly one node directly above it.
- Branches represent decisions regarding the evaluation of the respective property value.
- Hierarchy levels are formed by nodes that are equidistant from the root.
- Leaves embody a group of property values; they have no sub-nodes.
In our example, we proceed according to the following criteria:
1. Low and high turnover volume
2. Customers with or without framework agreements
3. Number of monthly representative visits.
[Figure not included in this excerpt]
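The components listed above (nodes, branches, hierarchy levels, leaves) can be sketched as a small data structure. Because the original figure is not available in this excerpt, the shape of the tree, the threshold for representative visits and the outcome labels are all hypothetical; only the three criteria are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    label: str                      # predicted outcome; has no sub-nodes

@dataclass
class Node:
    attribute: str                  # property queried in this node
    branches: Dict[str, Union["Node", Leaf]]  # one branch per property value

# Hypothetical tree built from the three criteria in the text.
tree = Node("turnover", {
    "low": Leaf("unlikely to reorder"),
    "high": Node("framework_agreement", {
        "yes": Leaf("likely to reorder"),
        "no": Node("monthly_visits", {
            "3 or more": Leaf("likely to reorder"),
            "fewer than 3": Leaf("uncertain"),
        }),
    }),
})

def classify(tree, customer):
    """Follow the branch matching the customer's value until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.branches[customer[node.attribute]]
    return node.label

def depth(tree):
    """Number of hierarchy levels below the root (a single leaf has depth 0)."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(child) for child in tree.branches.values())

customer = {"turnover": "high", "framework_agreement": "no",
            "monthly_visits": "fewer than 3"}
print(classify(tree, customer), "| levels below root:", depth(tree))
```

In practice such a tree would be learned from historical customer data rather than written by hand; the hand-built version only illustrates how the test questions are chained.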
[...]
- Cite this work
- Anonymous, 2014, Data Mining Unveiled. Definition, Evolution, and Key Processes, München, GRIN Verlag, https://www.grin.com/document/1361647