In this paper we discuss security issues in the big data Hadoop environment. Big data applications are of great benefit to organizations and businesses in many small- and large-scale industries. Security and privacy issues are magnified by the velocity, variety and volume of big data. Hadoop projects place security at the top of the agenda and classify it as critical. With the growing acceptance of Hadoop, there is an increasing trend toward building extensive security features. Traditional security mechanisms, which are tailored to securing small-scale static data, are inadequate in this setting. The important security issues in Hadoop are authentication, authorization, auditing and encryption within a cluster. In this paper we highlight different security aspects of big data Hadoop.
Table of Contents
- I. INTRODUCTION
- II. TRADITIONAL HADOOP SECURITY
- III. SECURITY ISSUES AND CHALLENGES
- A. Fragmented data
- B. Node to node communication
- C. Distributed computing
- D. Interaction with client
- E. Controlling data access
- IV. SECURITY SOLUTIONS FOR HADOOP
- A. Authentication
- B. Authorization
- C. Encryption
Objectives and Key Themes
This paper aims to analyze the security issues and challenges related to big data processing using the Hadoop framework. It explores the evolution of Hadoop security from its initial lack of a robust model to the development of various solutions addressing authentication, authorization, and encryption.
- Security vulnerabilities in traditional Hadoop implementations.
- Challenges posed by big data's volume, velocity, and variety to security.
- Security solutions for authentication in Hadoop ecosystems.
- Methods for authorization and access control in Hadoop.
- Techniques for data encryption in Hadoop, both at rest and in motion.
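Encryption at rest in HDFS is commonly handled with transparent encryption zones, which use envelope encryption: each file gets its own data encryption key (DEK), and that DEK is stored only in wrapped form (an EDEK), encrypted under an encryption zone key held by a separate key management server. The Python sketch below illustrates that key hierarchy only. It assumes a hypothetical `ToyKMS` class, and because the standard library has no AES, it substitutes a toy SHA-256 counter-mode keystream for the AES-CTR cipher HDFS actually uses. The toy cipher is not secure and should not be reused.

```python
import hashlib
import os


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256(key || nonce || counter).
    Stand-in for AES-CTR so the sketch needs only the standard library; NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Counter mode: encryption and decryption are the same XOR with the keystream."""
    return bytes(d ^ k for d, k in zip(data, toy_keystream(key, nonce, len(data))))


class ToyKMS:
    """Hypothetical key server: holds zone keys and never releases them."""

    def __init__(self) -> None:
        self._zone_keys = {}

    def create_zone_key(self, zone: str) -> None:
        self._zone_keys[zone] = os.urandom(32)

    def generate_edek(self, zone: str) -> bytes:
        """Make a fresh per-file DEK and return only its wrapped form: nonce || E(zone key, DEK)."""
        dek, nonce = os.urandom(32), os.urandom(16)
        return nonce + xor_crypt(self._zone_keys[zone], nonce, dek)

    def decrypt_edek(self, zone: str, edek: bytes) -> bytes:
        """Unwrap an EDEK for a client (authorization is assumed, not checked here)."""
        return xor_crypt(self._zone_keys[zone], edek[:16], edek[16:])


kms = ToyKMS()
kms.create_zone_key("zone1")

# Write path: the EDEK is kept as file metadata; only ciphertext is stored as data.
edek = kms.generate_edek("zone1")
dek = kms.decrypt_edek("zone1", edek)
file_nonce = os.urandom(16)
plaintext = b"patient_id,diagnosis\n42,flu\n"
stored = {"edek": edek, "nonce": file_nonce, "data": xor_crypt(dek, file_nonce, plaintext)}

# Read path: a reader asks the KMS to unwrap the EDEK, then decrypts locally.
reader_dek = kms.decrypt_edek("zone1", stored["edek"])
recovered = xor_crypt(reader_dek, stored["nonce"], stored["data"])
```

In this design a stolen disk exposes only ciphertext and wrapped keys, since the zone key itself stays inside the key server.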
Chapter Summaries
I. INTRODUCTION: This introductory chapter defines big data, highlighting its key characteristics: volume, velocity, and variety. It emphasizes the challenges posed by these characteristics and introduces Hadoop as a solution for processing large datasets in a distributed environment. The chapter describes Hadoop's architecture and components, including HDFS and MapReduce, setting the stage for a discussion of security concerns within this framework. The sheer scale of big data, as exemplified by the exponential growth of data creation and processing, is underscored, highlighting the need for robust security mechanisms.
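The HDFS storage model mentioned above can be sketched briefly: a file is stored as fixed-size blocks (128 MB by default in Hadoop 2.x and later), and each block is replicated on several DataNodes (three by default). The Python sketch below uses a tiny block size and a simplified round-robin placement in place of HDFS's rack-aware policy; the function names and node names are hypothetical.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; only the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct DataNodes using round-robin.
    Simplification: real HDFS uses a rack-aware block placement policy."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    n = len(datanodes)
    return {b: [datanodes[(b + r) % n] for r in range(replication)]
            for b in range(num_blocks)}


data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)      # 4 blocks: 256, 256, 256, 232 bytes
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

# Every node that holds any replica must be secured to protect this one file.
nodes_holding_file = sorted({node for nodes in placement.values() for node in nodes})
```

Even this small file ends up spread over all four nodes, which is why securing a Hadoop cluster means securing every DataNode rather than a single server.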
II. TRADITIONAL HADOOP SECURITY: This chapter discusses the initial security shortcomings of Hadoop. It details the absence of a comprehensive security model in early Hadoop versions, leading to vulnerabilities such as impersonation and lack of granular access control. The limitations of relying on basic mechanisms like Kerberos, firewalls, and HDFS permissions are explained, emphasizing the inadequacy of traditional security approaches in the context of a large-scale distributed system. The chapter lists various categories of security violations and potential threats, highlighting the need for more sophisticated security measures.
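The impersonation and coarse access-control problems can be illustrated with HDFS's classic POSIX-style permission model, in which each file has one owner, one group, and read/write/execute bits for owner, group and others. The Python sketch below models that check. It assumes the superuser is named `hdfs` (in practice it is whichever user started the NameNode), and `simple_auth_identity` is a hypothetical helper standing for the fact that, without Kerberos, Hadoop's "simple" authentication trusts the user name the client reports.

```python
from dataclasses import dataclass
from typing import Set

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class INode:
    owner: str
    group: str
    mode: int  # e.g. 0o640: owner rw-, group r--, others ---


def check_access(user: str, user_groups: Set[str], inode: INode, wanted: int,
                 superuser: str = "hdfs") -> bool:
    """POSIX-style check: exactly one class (owner, group or other) applies."""
    if user == superuser:
        return True                      # the superuser bypasses permission checks
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & wanted) == wanted


def simple_auth_identity(claimed_user: str) -> str:
    """Hypothetical helper: under simple auth the claimed name is not verified."""
    return claimed_user


salary = INode(owner="alice", group="hr", mode=0o640)
bob_reads = check_access("bob", {"eng"}, salary, READ)        # False: "other" bits are ---
carol_reads = check_access("carol", {"hr"}, salary, READ)     # True: group bits are r--
carol_writes = check_access("carol", {"hr"}, salary, WRITE)   # False
# Impersonation: bob simply claims to be alice, and the owner bits now apply.
bob_as_alice = check_access(simple_auth_identity("alice"), set(), salary, WRITE)  # True
```

The permission check itself works as intended; the weakness is that nothing proves the caller really is `alice`. Kerberos authentication closes this hole by requiring the client to prove its identity rather than assert it.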
III. SECURITY ISSUES AND CHALLENGES: This section delves into the specific security challenges presented by the Hadoop architecture. It addresses issues stemming from fragmented data across multiple nodes, insecure node-to-node communication via RPC over TCP/IP, the increased attack surface of distributed computing, vulnerabilities in client-server interactions, and the limitations of existing database security schemas when applied to Hadoop's unique data model. The chapter explores the complexities of securing access to data spread across numerous servers in a constantly changing environment.
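Hadoop can protect RPC traffic through SASL, with the `hadoop.rpc.protection` setting choosing between `authentication`, `integrity` and `privacy`. The Python sketch below does not reproduce that protocol; it uses a plain HMAC-SHA256 tag as a stand-in to show what integrity protection buys on node-to-node messages. It assumes both nodes already share a session key (which SASL with Kerberos would negotiate in practice), and the message fields are hypothetical.

```python
import hashlib
import hmac
import json
import os
from typing import Optional

SESSION_KEY = os.urandom(32)  # assumption: already shared by the two nodes
TAG_LEN = 32                  # SHA-256 output size in bytes


def seal(message: dict, key: bytes) -> bytes:
    """Serialize a message and prefix it with an HMAC-SHA256 tag."""
    body = json.dumps(message, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).digest() + body


def open_sealed(frame: bytes, key: bytes) -> Optional[dict]:
    """Return the message if the tag verifies, or None if the frame was altered."""
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)


frame = seal({"op": "copyBlock", "block": 42, "target": "dn3"}, SESSION_KEY)
received = open_sealed(frame, SESSION_KEY)

# An attacker on the wire redirects the copy to a node it controls.
tampered = frame[:TAG_LEN] + frame[TAG_LEN:].replace(b'"target": "dn3"', b'"target": "evil"')
received_tampered = open_sealed(tampered, SESSION_KEY)
```

An HMAC gives integrity but not confidentiality: the message body still travels in clear text, which is what the `privacy` level adds encryption for.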
IV. SECURITY SOLUTIONS FOR HADOOP: This chapter presents various security solutions designed to address the vulnerabilities discussed in the previous sections. It focuses on authentication, authorization, and encryption techniques employed within the Hadoop ecosystem. Specific technologies, such as Kerberos, SASL, Apache Knox, Apache Sentry, and Project Rhino, are discussed in detail, along with their respective functionalities and contributions to enhancing the overall security of Hadoop implementations. The chapter illustrates how different components of the Hadoop ecosystem require tailored security approaches.
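Kerberos authenticates a user once, but a running job may need to contact the NameNode from many tasks. Hadoop handles this with delegation tokens: after a Kerberos-authenticated request, the server issues a token whose password is an HMAC of the token identifier under a secret master key, so it can later verify the token without contacting the KDC. The Python sketch below shows that pattern in simplified form; the class names, the identifier fields and the `|`-joined encoding are assumptions, not Hadoop's wire format.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TokenIdentifier:
    owner: str
    renewer: str
    expires_at: float   # seconds since the epoch
    sequence: int

    def encode(self) -> bytes:
        # Toy encoding; assumes no field contains "|".
        return f"{self.owner}|{self.renewer}|{self.expires_at}|{self.sequence}".encode()


class TokenIssuer:
    """Hypothetical server-side token manager holding a secret master key."""

    def __init__(self) -> None:
        self._master_key = os.urandom(32)
        self._sequence = 0

    def _password(self, ident: TokenIdentifier) -> bytes:
        return hmac.new(self._master_key, ident.encode(), hashlib.sha256).digest()

    def issue(self, authenticated_user: str, renewer: str,
              lifetime_s: float) -> Tuple[TokenIdentifier, bytes]:
        """Call only after the user has authenticated (e.g. via Kerberos)."""
        self._sequence += 1
        ident = TokenIdentifier(authenticated_user, renewer,
                                time.time() + lifetime_s, self._sequence)
        return ident, self._password(ident)

    def verify(self, ident: TokenIdentifier, password: bytes) -> str:
        """Recompute the HMAC; no KDC round trip is needed."""
        if not hmac.compare_digest(password, self._password(ident)):
            raise PermissionError("invalid token")
        if time.time() > ident.expires_at:
            raise PermissionError("token expired")
        return ident.owner


def is_rejected(issuer: TokenIssuer, ident: TokenIdentifier, password: bytes) -> bool:
    try:
        issuer.verify(ident, password)
        return False
    except PermissionError:
        return True


issuer = TokenIssuer()
ident, password = issuer.issue("alice", renewer="yarn", lifetime_s=3600)
user = issuer.verify(ident, password)

# Forgery: keep alice's password but claim the token belongs to bob.
forged = TokenIdentifier("bob", ident.renewer, ident.expires_at, ident.sequence)
forged_rejected = is_rejected(issuer, forged, password)

# Expiry: a token issued with a negative lifetime is already expired.
old_ident, old_password = issuer.issue("alice", renewer="yarn", lifetime_s=-1)
expired_rejected = is_rejected(issuer, old_ident, old_password)
```

Because only the server knows the master key, a client cannot alter any field of the identifier without invalidating the password.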
Keywords
Big data, Hadoop, HDFS, MapReduce, security, authentication, authorization, encryption, distributed computing, data privacy, Kerberos, Apache Knox, Apache Sentry, Project Rhino, security challenges, big data security solutions.
Frequently Asked Questions: Hadoop Security in Big Data Processing
What is the main topic of this document?
This document provides a comprehensive overview of security in Hadoop, a framework for processing big data. It analyzes traditional Hadoop security weaknesses, explores the challenges posed by big data's characteristics (volume, velocity, variety), and details various security solutions including authentication, authorization, and encryption techniques.
What are the security issues and challenges discussed regarding Hadoop?
The document highlights several key security challenges: fragmented data across multiple nodes, insecure node-to-node communication, the increased attack surface of distributed computing, vulnerabilities in client-server interactions, and difficulties in controlling data access across a distributed system. It emphasizes the inadequacy of traditional security approaches in the context of a large-scale distributed system like Hadoop.
What security solutions for Hadoop are presented?
The document explores various security solutions focusing on authentication, authorization, and encryption. Specific technologies like Kerberos, SASL, Apache Knox, Apache Sentry, and Project Rhino are mentioned, along with their roles in enhancing Hadoop's security. The document emphasizes the need for tailored security approaches for different components of the Hadoop ecosystem.
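Apache Sentry's authorization model is role-based: privileges on objects such as a server, database or table are granted to roles, roles are granted to groups, and a user's groups determine what the user may do; a privilege on a database also covers the tables inside it. The Python sketch below models that chain with in-memory dictionaries. All role, group, user and object names are hypothetical, and it ignores many details of Sentry's real policy engine.

```python
from typing import Dict, Set, Tuple

ObjectPath = Tuple[str, ...]          # e.g. ("server1", "sales", "orders")
Privilege = Tuple[ObjectPath, str]    # (object, action)

ROLE_PRIVILEGES: Dict[str, Set[Privilege]] = {
    "analyst": {(("server1", "sales"), "SELECT")},           # whole database
    "etl": {(("server1", "sales", "orders"), "INSERT")},     # one table
    "admin": {(("server1",), "ALL")},                        # whole server
}
GROUP_ROLES: Dict[str, Set[str]] = {
    "analysts": {"analyst"}, "pipeline": {"etl"}, "dba": {"admin"},
}
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"analysts"}, "etl_svc": {"pipeline"}, "dana": {"dba"},
}


def implies(granted: Privilege, requested: Privilege) -> bool:
    """A grant covers the requested object if it is the same object or an ancestor."""
    g_path, g_action = granted
    r_path, r_action = requested
    covers_object = r_path[:len(g_path)] == g_path
    covers_action = g_action == "ALL" or g_action == r_action
    return covers_object and covers_action


def is_authorized(user: str, path: ObjectPath, action: str) -> bool:
    """Resolve user -> groups -> roles -> privileges, then look for an implying grant."""
    roles = {role for group in USER_GROUPS.get(user, set())
             for role in GROUP_ROLES.get(group, set())}
    return any(implies(priv, (path, action))
               for role in roles for priv in ROLE_PRIVILEGES.get(role, set()))


orders = ("server1", "sales", "orders")
customers = ("server1", "sales", "customers")
```

Because grants attach to roles and roles attach to groups, revoking a user's access is a matter of changing group membership rather than editing privileges on every table.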
What are the chapters of this document and what do they cover?
The document is structured into four chapters: I. Introduction (defines big data, introduces Hadoop and its architecture); II. Traditional Hadoop Security (discusses initial security shortcomings and vulnerabilities); III. Security Issues and Challenges (delves into specific security challenges posed by Hadoop's architecture); and IV. Security Solutions for Hadoop (presents various security solutions including authentication, authorization, and encryption techniques).
What is the overall aim of this document?
The document aims to analyze the security issues and challenges related to big data processing using Hadoop, tracing the evolution of Hadoop security from its initial weaknesses to the development of more robust solutions.
Who is the intended audience for this document?
This document is intended for an academic audience interested in big data security, particularly concerning the Hadoop framework. It is suitable for researchers, students, and professionals seeking a comprehensive understanding of security considerations within large-scale data processing systems.
- Rohit Sharma (Author), 2018, Security Issues of Big Data Hadoop, Munich, GRIN Verlag, https://www.grin.com/document/413453