This thesis predicts whether a given firm will grow extraordinarily in the next year and possibly become a large company in the medium term. This is crucial information for private investors and fund managers who must decide whether to invest in a particular firm. Companies such as Apple and Amazon have shown that investors who recognized their potential and bought their shares earned substantial returns.
The prediction models described in this thesis can also be used by policymakers to identify companies that are eligible for funding. Because growing companies often hire many employees, selective subsidies that support their development may help to reduce unemployment. Furthermore, the models make it possible to question a financial analyst's forecast when the analyst reaches a different conclusion than the model.
Since annual reports are often publicly available free of charge, it is reasonable to use them for such a prediction. In addition, various information providers maintain huge databases of annual reports. A big data approach promises to improve prediction accuracy further. This thesis introduces methods that extract knowledge from these large data sources in order to identify extraordinarily lucrative firms.
To generate these prediction models, a data mining approach is used that follows the established CRISP-DM process model for data mining. CRISP-DM ensures comparability and adherence to best practices. The prediction models are based on classification trees and forests because these have substantial advantages over other methods such as neural networks, which are frequently used in the literature. For instance, the underlying algorithms do not require a particular distributional assumption, accept both quantitative and qualitative inputs, and are not sensitive to outliers. Two advantages matter most. First, a tree can be easily interpreted by its users, which is important for the stakeholders described above, because it is hard to trust the results of a model that one does not understand; a lack of understanding might therefore impede the practical adoption of such a model. Second, the algorithms can handle missing data, which occur very often in the available dataset. In other analyses, a data record would have been removed even if only one of its values was missing.
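The following is a minimal sketch of how a tree algorithm can choose a split while tolerating missing values. It is not the thesis's implementation: the function names, the toy key figure `equity_ratio`, and the data are assumptions made for illustration. It scores candidate thresholds by Gini impurity, a criterion commonly used for CART-style trees, and simply skips missing entries while scoring. Full CART implementations typically go further and use surrogate splits to route records with missing values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Entries whose value is None (missing) are skipped while scoring, so a
    record is not thrown away just because one key figure is missing.
    If no split improves on the unsplit node, the threshold is None.
    """
    known = [(v, y) for v, y in zip(values, labels) if v is not None]
    distinct = sorted({v for v, _ in known})
    best = (None, gini([y for _, y in known]))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in known if v <= threshold]
        right = [y for v, y in known if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(known)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one key figure per firm (None = not reported)
# and a binary class label (1 = extraordinary growth).
equity_ratio = [0.10, 0.15, None, 0.40, 0.55, 0.60, None, 0.20]
high_growth = [0, 0, 1, 1, 1, 1, 0, 0]

threshold, impurity = best_split(equity_ratio, high_growth)
print(threshold, impurity)
```

In this toy data the two firms with a missing equity ratio do not influence the chosen threshold, but they remain available for splits on other key figures.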
Table of Contents
- INTRODUCTION AND PROBLEM DESCRIPTION
- INTENTION OF THIS THESIS
- PROCEEDING
- INTRODUCTION TO KEY FIGURE ANALYSIS
- THE PRINCIPLE OF KEY FIGURES
- THE CLASSICAL KEY FIGURE ANALYSIS APPROACH
- MODERN KEY FIGURE ANALYSIS APPROACHES
- LIMITATIONS OF ANNUAL REPORT ANALYSIS
- THE AVAILABLE DATASET
- DESCRIPTION OF THE DATASET
- DATA CLEAN-UP
- KEY FIGURE SELECTION
- SIGNIFICANT KEY FIGURE REQUIREMENTS
- THE SELECTED KEY FIGURES OF THIS ANALYSIS
- Selected class variable
- Selected qualitative key figures
- Selected absolute key figures
- Selected relative key figures
- CLASS ANALYSIS
- CLASSIFICATION TREES AND FORESTS
- PRECONSIDERATIONS
- CLASSIFICATION TREES
- A simple example
- Generation of classification trees
- Pruning an existing tree
- Relevant properties of CART trees
- RANDOM FOREST
- Classification process of a random forest
- Generation of random forest
- Relevant properties of random forests
- CLASSIFICATION RESULTS
- CLASSIFICATION TREE RESULTS
- Examination of the most precise tree
- Key indicator importance ranking
- Transfer to data from 2011
- CLASSIFICATION FOREST RESULTS
- Transfer to data from 2011
- Key indicator importance ranking
- CONCLUSION
- CRITICAL ASSESSMENT
- OUTLOOK
Objectives and Key Themes
This thesis explores the potential of employing classification trees and forests to predict highly lucrative companies based on annual statement datasets. The main goal is to develop a reliable model that can identify financially successful companies by analyzing key figures derived from their annual reports. The study focuses on the identification of significant key figures, their analysis, and the implementation of these findings in the development of classification algorithms.
- Identifying key figures that strongly influence a company's profitability.
- Analyzing the effectiveness of classification trees and forests for predicting company success.
- Evaluating the performance of the developed models on real-world datasets.
- Exploring the applicability of the findings to data from different time periods.
- Assessing the potential limitations and opportunities for improvement of the proposed approach.
Chapter Summaries
The introductory chapter lays out the thesis's intention, which is to develop a model capable of predicting highly lucrative companies using classification trees and forests based on data extracted from annual statements. The chapter also discusses the procedural steps involved in the analysis.
Chapter 2 provides a comprehensive overview of key figure analysis, covering its principles, classical and modern approaches, and inherent limitations. The chapter delves into the significance of key figures in understanding a company's financial health and provides a framework for selecting and analyzing relevant figures.
Chapter 3 describes the dataset used in the study, outlining its structure and content. It also details the data cleaning processes undertaken to ensure the accuracy and reliability of the data for analysis.
Chapter 4 dives into the selection of key figures for the study, focusing on criteria for determining their significance. The chapter lists the selected key figures, categorizing them into classes based on their nature (qualitative, absolute, and relative). The chapter concludes with an analysis of the class variable, which represents the target variable for classification.
Chapter 5 introduces the classification algorithms used in the study, namely classification trees and forests. It explains the principles behind these algorithms, their generation process, and their relevant properties. This chapter provides a theoretical foundation for understanding how these algorithms operate and are applied in the analysis.
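The classification step of a random forest can be sketched as a majority vote over its trees. The snippet below is an illustration under stated assumptions, not the thesis's code: the three one-split "trees", the key figure names, and the thresholds are invented, and the generation of the forest (bootstrap samples and random feature subsets) is not shown.

```python
from collections import Counter


def forest_predict(trees, record):
    """Classify a record by majority vote over the individual tree predictions."""
    votes = Counter(tree(record) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-split "trees"; key figure names and thresholds are invented.
trees = [
    lambda r: "lucrative" if r["return_on_equity"] > 0.15 else "ordinary",
    lambda r: "lucrative" if r["sales_growth"] > 0.10 else "ordinary",
    lambda r: "lucrative" if r["equity_ratio"] > 0.30 else "ordinary",
]

firm = {"return_on_equity": 0.22, "sales_growth": 0.04, "equity_ratio": 0.45}
print(forest_predict(trees, firm))
```

With two classes, using an odd number of trees avoids tied votes.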
Chapter 6 presents the results of the classification process, focusing on the performance of both classification trees and forests in predicting company success. It examines the most precise tree and provides a ranking of key indicators in terms of their importance in determining classification outcomes. The chapter also explores the applicability of the model to data from a different time period (2011) and assesses its accuracy in predicting financial success.
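This summary does not state how the thesis computes its importance ranking. As one common option, the sketch below uses permutation importance: the drop in accuracy after one key figure's values are shuffled across the records. The model, the key figure names, and the data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, seed=0):
    """Drop in accuracy after shuffling one feature's values across the rows."""
    rng = random.Random(seed)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    # Copy each row and overwrite only the shuffled feature.
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return accuracy(model, rows, labels) - accuracy(model, permuted, labels)


def toy_model(row):
    """Hypothetical stand-in for a fitted tree: it looks at one key figure only."""
    return 1 if row["sales_growth"] > 0.10 else 0


# Hypothetical records; labels are generated by the toy model itself.
rows = [
    {"sales_growth": g, "firm_age": a}
    for g, a in [(0.02, 5), (0.30, 12), (0.08, 40), (0.25, 3), (0.01, 18), (0.40, 7)]
]
labels = [toy_model(r) for r in rows]

ranking = sorted(
    ["sales_growth", "firm_age"],
    key=lambda f: permutation_importance(toy_model, rows, labels, f),
    reverse=True,
)
print(ranking)
```

The same `accuracy` helper could be used to score a model fitted on one year's records against records from another year, such as 2011.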
Keywords
The key focus of this thesis is on the use of classification trees and forests for predicting highly lucrative companies based on annual statement datasets. The study analyzes key figures, explores data cleaning techniques, and examines the effectiveness of these algorithms in determining financial success. The thesis focuses on the interplay of data analysis, machine learning techniques, and financial performance evaluation in the context of corporate financial reporting. Key terms include: classification trees, random forests, key figure analysis, annual statements, financial reporting, prediction models, and company profitability.
Cite this text
- B. Sc. Jurij Weinblat (Author), 2014, Mining big annual statement datasets to predict highly lucrative companies using classification trees and forests, Munich, GRIN Verlag, https://www.grin.com/document/273792