Tree boosting has empirically proven to be a highly effective and versatile approach to data-driven modelling. The core argument is that tree boosting can adaptively determine the local neighbourhoods of the model, thereby taking the bias-variance trade-off into consideration during model fitting.
Recently, a tree boosting method known as XGBoost has gained popularity by providing higher accuracy. XGBoost also introduces improvements that allow it to handle the bias-variance trade-off even more carefully. In this research work, we demonstrate the use of an adaptive procedure, Learned Loss (LL), which updates the loss function as the boosting proceeds.
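The abstract does not spell out the Learned Loss update rule itself; as a rough sketch of the mechanism such a procedure builds on, the example below shows how a user-defined loss can be plugged into XGBoost through its gradient/Hessian (custom objective) interface. The data and the squared-error objective are placeholders, not the LL procedure proposed in the book.

```python
import numpy as np
import xgboost as xgb

def squared_error_obj(preds, dtrain):
    """User-supplied objective: XGBoost only needs the first- and
    second-order derivatives of the loss w.r.t. the raw predictions."""
    labels = dtrain.get_label()
    grad = preds - labels          # d(loss)/d(pred) for 0.5 * (pred - label)^2
    hess = np.ones_like(preds)     # d^2(loss)/d(pred)^2
    return grad, hess

# Placeholder data; the study itself uses stock-index features.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.random(200)
dtrain = xgb.DMatrix(X, label=y)

# An adaptive "learned" loss would replace or re-weight this objective
# between boosting rounds rather than keeping it fixed.
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=squared_error_obj)
```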
The accuracy of the proposed algorithm, i.e. XGBoost with the Learned Loss boosting function, is evaluated using the train/test split method, K-fold cross-validation, and stratified cross-validation, and compared with state-of-the-art algorithms, viz. XGBoost, AdaBoost, AdaBoost-NN, Linear Regression (LR), Neural Network (NN), Decision Tree (DT), Support Vector Machine (SVM), bagging-DT, bagging-NN, and Random Forest. The metrics evaluated are accuracy, Type 1 error, and Type 2 error (in percentages). The study uses ten years of historical data, from January 2007 to August 2017, for two highly voluminous stock market indices, CNX Nifty and S&P BSE Sensex.
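As a minimal sketch of the evaluation protocol described above (train/test split, K-fold and stratified cross-validation, accuracy, Type 1 and Type 2 error), the example below uses scikit-learn with the XGBoost classifier on placeholder data rather than the Nifty/Sensex series.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     cross_val_score, train_test_split)
from xgboost import XGBClassifier

# Placeholder data; the study derives features from the CNX Nifty
# and S&P BSE Sensex index series (Jan 2007 - Aug 2017).
rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = rng.integers(0, 2, 500)        # e.g. next-day up/down movement

model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)

# 1) Plain train/test split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
type1_error = fp / (fp + tn)       # false-positive rate
type2_error = fn / (fn + tp)       # false-negative rate

# 2) K-fold and stratified K-fold cross-validation (accuracy scoring).
kfold_acc = cross_val_score(model, X, y, cv=KFold(n_splits=10)).mean()
strat_acc = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=10)).mean()

print(f"accuracy={accuracy:.3f}  type1={type1_error:.3f}  type2={type2_error:.3f}")
print(f"10-fold acc={kfold_acc:.3f}  stratified acc={strat_acc:.3f}")
```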
Further, in this research work, we will investigate how XGBoost differs from more traditional ensemble techniques. We will also discuss the regularization techniques these methods offer and the effect they have on the resulting models.
In addition, we will attempt to answer the question of why XGBoost seems to win so many competitions. To do this, we will provide some arguments for why tree boosting, and in particular XGBoost, is such a highly effective and versatile approach to predictive modelling. The core argument is that tree boosting adaptively determines the local neighbourhoods of the model and can thus be seen to take the bias-variance trade-off into consideration during model fitting. XGBoost introduces further improvements that allow it to handle this trade-off even more carefully.
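For concreteness, the regularization levers this discussion refers to are all exposed directly as XGBoost booster parameters; the values below are illustrative examples, not the settings used in the book.

```python
import xgboost as xgb

# Illustrative parameter set highlighting the regularization levers XGBoost
# adds on top of classical gradient boosting; values are examples only.
params = {
    "eta": 0.1,               # learning rate: shrinks each tree's contribution
    "max_depth": 4,           # caps tree complexity (variance control)
    "min_child_weight": 1,    # minimum Hessian sum required in a leaf
    "gamma": 0.5,             # minimum loss reduction required to make a split
    "lambda": 1.0,            # L2 penalty on leaf weights
    "alpha": 0.0,             # L1 penalty on leaf weights
    "subsample": 0.8,         # row subsampling per boosting round
    "colsample_bytree": 0.8,  # column subsampling per tree
    "objective": "binary:logistic",
}
# Usage (dtrain is an xgb.DMatrix as in the earlier sketch):
# booster = xgb.train(params, dtrain, num_boost_round=200)
```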
Table of Contents
- CHAPTER I: Theoretical Foundations
  - Outline
  - AdaBoost
  - Gradient Boosting
  - XGBoost
  - Comparison of Boosting Algorithms
  - Loss Functions in Boosting Algorithms
  - Motivation
  - Problem Statement
  - Scope and Main Objectives
  - Impact to the Society
  - Organization of the Book
- CHAPTER II: Literature Review
  - History
  - XGBoost
  - Random Forest
  - AdaBoost
  - Loss Function
- CHAPTER III: Proposed Work
  - Outline
  - Proposed Approach
  - Objective of XGBoost
  - Parameters
  - Parameters for Tree Booster
  - Learning Task Parameters
  - Training & Parameter Tuning
  - What XGBoost Brings to the Table
  - Square Logistics Loss Function (SqLL)
Objectives and Key Themes
This research investigates the application of XGBoost (Extreme Gradient Boosting) in mining applications, specifically focusing on the use of an adaptive Learned Loss (LL) function to update the loss function during the boosting process. The study aims to demonstrate the effectiveness of XGBoost with LL in comparison to other state-of-the-art algorithms, evaluating accuracy, Type 1 error, and Type 2 error.
- Application of XGBoost with Learned Loss in mining applications
- Comparison of XGBoost with LL to other algorithms
- Evaluation of accuracy, Type 1 error, and Type 2 error
- Analysis of the bias-variance trade-off in XGBoost
- Exploration of XGBoost's advantages over traditional ensemble techniques
Chapter Summaries
Chapter I: Theoretical Foundations provides a foundation for understanding the concepts of boosting algorithms, including AdaBoost, gradient boosting, and XGBoost. It outlines the motivation for this research, the problem statement, the scope and objectives, and the impact of the findings on society. The chapter also discusses the organization of the book.
Chapter II: Literature Review presents a historical overview of boosting algorithms and explores the existing research on XGBoost, Random Forest, AdaBoost, and loss functions.
Chapter III: Proposed Work outlines the proposed approach, including the objective of XGBoost, the parameters used, and the training and parameter tuning process. The chapter also explores the advantages of XGBoost and introduces the Square Logistics Loss Function (SqLL).
Keywords
This research focuses on the application of XGBoost with Learned Loss in mining applications, exploring its effectiveness in comparison to other algorithms, analyzing the bias-variance trade-off, and discussing XGBoost's advantages over traditional ensemble techniques. The key concepts include: XGBoost, Learned Loss, boosting algorithms, ensemble techniques, bias-variance trade-off, accuracy, Type 1 error, Type 2 error, mining applications.
- Text citation
- Nonita Sharma (Author), 2017, XGBoost. The Extreme Gradient Boosting for Mining Applications, Munich, GRIN Verlag, https://www.grin.com/document/415839