As databases see growing use across fields and domains, more and more of them are distributed across networks to overcome the challenges of a centralized data mining environment. The objective of distributed data mining (DDM) is to perform data mining operations based on the type and availability of distributed resources. Choosing a suitable DDM system or model requires understanding the basic differences between them. This paper surveys some of the available DDM systems, focusing mainly on homogeneous DDM models. It discusses methods based on semantic web and grid, multi-agent systems, mobile agents, and i-Analyst, as well as the hybrid method AGrIP. A comparative analysis considers the key issues of DDM, and each method is described in detail through its method or algorithm.

Excerpt

Table of Contents

Introduction
Classification of DDM Systems
- Heterogeneous Vs. Homogeneous
Methods & Architecture
- Extendible Multi Agent Data mining System
- CAKE
- i-Analyst based DDM
- Multi Agent DDM model using AATP
- Mobile Agent in DDM
- DDM based on Semantic Web and Grid
- AGrIP based DDM
COMPARISON & ANALYSIS
- Comparative Analysis
- Different approaches
- Challenges
CONCLUSION

Objectives and Key Themes

This paper presents a survey of Distributed Data Mining (DDM) systems, focusing on homogeneous models. The paper aims to analyze different DDM approaches, emphasizing their strengths and weaknesses in a comparative manner.

- Categorization of DDM systems into heterogeneous and homogeneous models
- Detailed exploration of various homogeneous DDM methods, including those based on data mining agents, grid computing, meta-learning, and semantic web and grid technologies
- Comparative analysis of different DDM approaches based on criteria like openness, platform independence, result quality, and communication cost
- Identification of key challenges faced by existing DDM methods, such as result quality and efficiency
- Discussion of potential future directions for DDM research, including the integration of cloud computing

Chapter Summaries

Introduction: This chapter introduces the concept of distributed data mining (DDM) and its advantages over centralized data mining systems. It highlights key challenges in DDM, such as data inconsistency, communication cost, and knowledge integration. It also presents a general architecture for DDM systems.
Classification of DDM Systems: This chapter presents a classification of existing DDM systems based on the type of data partitioning. It categorizes systems into heterogeneous and homogeneous models, further exploring sub-categories of each type. It provides examples of specific DDM systems within each category.
Methods & Architecture: This chapter provides detailed descriptions of various homogeneous DDM methods, including EMADS, CAKE, i-Analyst, AATP, mobile agent, semantic web and grid, and AGrIP. It discusses the architecture, agents, and key features of each method, highlighting their working principles and advantages.
COMPARISON & ANALYSIS: This chapter provides a comparative analysis of different DDM approaches based on several criteria like openness, platform independence, result quality, and communication cost. It analyzes the strengths and weaknesses of each approach, discussing their specific advantages and limitations.

Keywords

This paper focuses on distributed data mining (DDM), exploring a range of techniques including multi-agent systems, i-Agent models, ontology, semantic web, grid computing, and collective data mining (CDM). The paper analyzes various DDM systems, highlighting their advantages, limitations, and key challenges, particularly regarding result quality and efficiency.

Frequently Asked Questions

What is Distributed Data Mining (DDM)?

DDM is the branch of data mining in which data is stored at different locations across a network and mining operations are performed locally at each site, which reduces communication costs.

What is the difference between homogeneous and heterogeneous DDM?

Homogeneous systems deal with data having the same schema across sites, while heterogeneous systems deal with data having different attributes or schemas.
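
The distinction can be illustrated with a small sketch. It assumes a toy table of dictionaries and two hypothetical helper functions, `horizontal_partition` and `vertical_partition`; none of these names come from the surveyed systems. In the homogeneous case every site holds different rows under the same schema, while in the heterogeneous case every site holds different attributes of the same rows, linked by a shared key.

```python
# Toy records; the attribute names and values are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "city": "Austin", "spend": 120},
    {"id": 2, "age": 51, "city": "Denton", "spend": 80},
    {"id": 3, "age": 27, "city": "Dallas", "spend": 200},
    {"id": 4, "age": 45, "city": "Austin", "spend": 60},
]


def horizontal_partition(rows, n_sites):
    """Homogeneous case: each site keeps the full schema but only some rows."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_partition(rows, attribute_groups, key="id"):
    """Heterogeneous case: each site keeps all rows but only some attributes.

    The shared key is copied to every site so that rows can be matched later.
    """
    return [
        [{key: row[key], **{attr: row[attr] for attr in attrs}} for row in rows]
        for attrs in attribute_groups
    ]


homogeneous_sites = horizontal_partition(records, n_sites=2)
heterogeneous_sites = vertical_partition(records, [["age", "city"], ["spend"]])

# Collect the schema (set of attribute names) seen at each site.
homogeneous_schemas = [set(site[0]) for site in homogeneous_sites]
heterogeneous_schemas = [set(site[0]) for site in heterogeneous_sites]

print(homogeneous_schemas)
print(heterogeneous_schemas)
```

Under these assumptions, both homogeneous sites report the same four attributes, whereas the heterogeneous sites share only the `id` key.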

What are the main challenges in DDM?

Key challenges include maintaining result quality, minimizing communication overhead, and ensuring knowledge integration from various sources.

What role do Multi-Agent Systems play in DDM?

Agents are used to autonomously perform mining tasks at local sites and coordinate with other agents to form a global result.
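
As a rough sketch of this idea, and not the design of any system covered in the survey, the example below uses two hypothetical classes, `MiningAgent` and `CoordinatorAgent`. Each mining agent counts item occurrences in its own site's transactions, and the coordinator merges only those summaries to find globally frequent items, so raw transactions never leave their site. The task (frequent single items), the class names, and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


class MiningAgent:
    """Hypothetical local agent that mines only its own site's transactions."""

    def __init__(self, site_name, transactions):
        self.site_name = site_name
        self.transactions = transactions

    def local_counts(self):
        """Return per-item transaction counts and the number of transactions."""
        counts = Counter()
        for transaction in self.transactions:
            counts.update(set(transaction))  # count each item once per transaction
        return counts, len(self.transactions)


class CoordinatorAgent:
    """Hypothetical coordinator that merges local summaries into a global result."""

    def __init__(self, agents):
        self.agents = agents

    def global_frequent_items(self, min_support):
        total_counts = Counter()
        total_transactions = 0
        for agent in self.agents:
            counts, size = agent.local_counts()  # only summaries are exchanged
            total_counts += counts
            total_transactions += size
        if total_transactions == 0:
            return {}
        return {
            item: count / total_transactions
            for item, count in total_counts.items()
            if count / total_transactions >= min_support
        }


site_a = MiningAgent("A", [["milk", "bread"], ["milk"], ["bread", "eggs"]])
site_b = MiningAgent("B", [["milk", "eggs"], ["milk", "bread"]])
coordinator = CoordinatorAgent([site_a, site_b])

frequent = coordinator.global_frequent_items(min_support=0.5)
print(frequent)
```

With this sample data, "milk" appears in four of five transactions and "bread" in three, so both pass the 0.5 support threshold, while "eggs" (two of five) does not. Exchanging counts instead of transactions is what keeps communication cost low in this sketch.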

How does Grid Computing benefit Data Mining?

Grid computing provides the massive computational power and resource sharing needed to process large-scale distributed datasets efficiently.


Details

Title: Survey on Distributed Data Mining Systems
College: University of North Texas (Department of Computer Science)
Course: Distributed and Parallel Databases
Grade: A
Authors: Swetha Reddy Allam (Author), Kotagiri Santhosh (Author)
Publication Year: 2014
Pages: 5
Catalog Number: V294717
ISBN (eBook): 9783656929604
ISBN (Book): 9783656929611
Language: English
Tags: survey distributed data mining systems
Product Safety: GRIN Publishing GmbH

Quote paper: Swetha Reddy Allam (Author), Kotagiri Santhosh (Author), 2014, Survey on Distributed Data Mining Systems, Munich, GRIN Verlag, https://www.grin.com/document/294717
