Table of Contents
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 Problem Definition
1.2 Research Objectives
1.3 Structure of the Thesis
2 Fundamentals
2.1 Claims Process
2.2 Field Observation
2.3 Causes of Failure
3 Data Management
3.1 Terminology
3.2 Data Quality
3.2.1 Data Quality Dimensions
3.2.2 Data Checker
3.2.3 Data Quality Improvement
3.3 Measurement Scales
3.4 Data Collection
3.5 Data Integration and Storage
3.6 Data Exchange
3.7 Data Mining Process
3.8 Data Analysis
4 Current State from Practice
5 Data Processing and Analysis Concept
5.1 Process Flow
5.2 Requirements
5.3 Data Basis
5.4 Benefits
5.5 Sample for the Failure Analysis
5.5.1 Complete and Partial Surveys
5.5.2 Simple Random Sample
5.5.3 Data-driven Sample
5.6 Preventive Failure Detection
6 Data Analysis Methods
6.1 Overview of the Methods
6.1.1 Cluster Analysis
6.1.1.1 Similarity Metrics
6.1.1.2 k-Means Algorithm
6.1.1.3 Advantages and Disadvantages
6.1.2 Classification
6.1.2.1 Artificial Neural Networks
6.1.2.2 Advantages and Disadvantages
6.2 Selection of an Appropriate Method
7 Application of the Selected Method
7.1 Clustering for the Data-driven Sample
7.2 Clustering for the Preventive Failure Detection
8 Conclusion
8.1 Summary
8.2 Outlook
Bibliography
Appendices
Abstract
The contemporary car is a highly complex product which results from the concerted cooperation between the automobile manufacturer and its multiple suppliers. Intensive vehicle use often leads to component failures in the field. In order to identify the causes of failure and generate valuable knowledge about the defective parts, the components are analyzed on a random basis as part of the claims process. Instead of a random sample, however, the selection process should take the attributes and the additional information content of the components into account. A large amount of data is created along the entire product lifecycle and can be used to select the components in a more targeted manner. This thesis investigates the opportunities of the intelligent use and analysis of smart data in order to find data patterns, group the components based on their characteristics, and create data-driven samples for the failure analysis. For this purpose, a data processing and analysis concept is developed that can help to lower the analysis costs, reduce the expenditure of time, and improve the product quality. Additionally, this data analysis tool can also be applied to monitor the current condition of the components which are still in the field and to preventively detect potential failures. Since the effectiveness of the data analysis and its results depends heavily on the provided data, this thesis also describes the requirements for and the quality of the data used. The thesis concludes with an exemplary application of the selected data analysis method, the k-Means clustering algorithm.
Keywords: Digital Transformation; Claims Process; Data Analysis; Forecasting; Automotive Supply Chain; Data-driven Sample; Data Mining; Knowledge Discovery; Field Data; Failure Analysis; Preventive Failure Detection; Product Quality
Zusammenfassung
Today's automobile is a highly complex product which results from the coordinated cooperation between the automobile manufacturer and its numerous suppliers. Intensive vehicle use frequently leads to failures in the field. So that the causes of failure can be identified and a valuable knowledge base about the defective parts can be gained and built up, the components are analyzed on a sample basis as part of a claims process. Instead of a random sample, the selection process should take the attributes and the additional information content of the parts into account. Huge amounts of data are generated along the entire product lifecycle and can be used to select the components in a more targeted manner. This thesis investigates the possibilities of the intelligent use of smart data and its analysis in order to find data patterns, group components based on their characteristics, and create data-driven samples for the failure analysis. For this purpose, a data processing and analysis concept is developed which helps to lower the analysis costs, reduce the expenditure of time, and improve the product quality. Additionally, this data processing and analysis tool can be applied to monitor the current condition of parts in the field and to detect potential failures at an early stage. Since the effectiveness of the data analysis and its results depends heavily on the provided data, the thesis also examines the requirements for the data and its quality. Finally, the application of the selected data analysis method, the k-Means algorithm, is presented by way of example.
List of Figures
Figure 1: Players in the Claims Process
Figure 2: Claims Process
Figure 3: 8D Process
Figure 4: Extended 8D Method Along the Supply Chain
Figure 5: Examination Types Along the Product Lifecycle
Figure 6: Failure Rate λ(t)
Figure 7: DIKW Pyramid
Figure 8: Data, Information, and Knowledge
Figure 9: Analogy Between Product and Data Manufacturing Processes
Figure 10: Transformation of Data into Knowledge
Figure 11: Data, Information, and Knowledge in the Value-added Process
Figure 12: Data Quality Pyramid
Figure 13: SPC Based Monitoring of the Data Quality
Figure 14: Total Data Quality Management Cycle
Figure 15: Central Data Collection and Storage
Figure 16: System Visualization of a Sensor
Figure 17: Data Within and Outside the Warranty Period
Figure 18: Data Exchange Within the Supply Chain
Figure 19: Data Mining Process
Figure 20: Data Analysis Methods
Figure 21: Conflict of Objectives in the Random Sample Strategy
Figure 22: Selection Process of the Random Sample
Figure 23: Data Processing and Analysis Concept
Figure 24: Separate Databases with Interfaces
Figure 25: Integrated Database
Figure 26: Simple Random Sample
Figure 27: Data-driven Sample
Figure 28: From a Random Sample to a Data-driven Sample
Figure 29: Two-step Sampling Procedure
Figure 30: Limit Checking: (a) Absolute Value; (b) Trend
Figure 31: Preventive Failure Detection
Figure 32: Comparison Between Euclidean and Manhattan Distance
Figure 33: Euclidean Distance Based Clustering
Figure 34: Clustering Process
Figure 35: Clustering of Two-dimensional Data
Figure 36: Exemplary Data Set with Ten Attributes
Figure 37: Classification Model
Figure 38: Artificial Neural Networks
Figure 39: Clustering of the Exemplary Data Set
Figure 40: Clusters for the Data-driven Sample
Figure 41: Clusters for the Preventive Failure Detection
List of Tables
Table 1: Simulation Types of the Driving Behavior
Table 2: Data Quality Dimensions by Wang and Strong (1996)
Table 3: Data Quality Dimensions by DAMA (2013)
Table 4: Data Quality Dimensions by Rohweder et al. (2015)
Table 5: Degree of Relevance of Data Quality Dimensions
Table 6: Data Checkers
Table 7: Priorities of Data Quality Improvement Measures
Table 8: Types of Sensors in the Car
Table 9: Distribution Types of a Two-step Sampling
Table 10: Distance Metrics
Table 11: Exemplary Data Set
Table 12: Clustering for the Data-driven Sample
Table 13: Clustering for the Preventive Failure Detection
List of Abbreviations
1 Introduction
During the industrial era of the 19th and 20th centuries, production, investment, and financial capital played a crucial role for businesses and the economy as a whole. With the increasing transformation into an information era, knowledge capital has gained importance and become a decisive factor for sustainable value creation.[1] Knowledge, however, is based on information which is created from data. Since data and information play increasingly important roles and are considered valuable resources, many people speak of the Information Age. More and more companies recognize the great value of the huge amounts of collected data, also known as “big data”, and use them to optimize their businesses.[2] The availability of powerful multiprocessor computers, data collection and storage technologies, and data mining algorithms makes it possible to process and analyze these massive amounts of data.[3]
The term “Industrie 4.0” originates from a project of the German government and describes the use of information technology to automate industrial processes.[4] Originally evolved to improve the value-added processes in the manufacturing industry, digital transformation now plays an increasingly important role in many different fields and covers multiple use cases.[5] Data science and machine learning have become key technologies in a wide range of fields with a multitude of applications. The automotive industry has already started to explore the broad range of potential uses for these technologies.[6] In automotive supply chain management, they can be used to support activities, to solve different kinds of problems along the supply chain, and to enable data-driven decision making.[7]
Today, an automobile manufacturer produces only a relatively small part of the finished product itself. It focuses instead on the design, marketing, assembly, and development, and, in some cases, production of the engine. A large part of the value is created by the suppliers.[8] The value-added share of the suppliers has increased up to 75%, while the automobile manufacturers only produce 25% themselves.[9] The modern automotive supply chain is very complex and involves a large number of suppliers. The globalization of the procurement markets[10] and the increasing complexity and number of both technical innovations and electrical and electronic components make it even more complicated to ensure a high standard of quality.[11] Defects and quality problems then often lead to huge recall campaigns. In 2016, at least 1.3 million vehicles were recalled in Germany alone.[12] Therefore, it is absolutely essential to understand the field behavior of components and systems and their causes of failure in order to reduce field failures in the future.[13] Data collected along the entire product lifecycle can help to predict future breakdowns and analyze their causes. The huge data sets can be managed and analyzed using a variety of data analysis methods.[14] Car-related data generation is expected to increase in the coming years as car-generated data becomes more valuable and may result in a 450-750 billion USD market by 2030.[15]
1.1 Problem Definition
Whenever a component or system of a vehicle fails, it is important to analyze the reasons for its failure in order to gain valuable knowledge and thus avoid these kinds of failures in the future. The causes of failure, which may arise on the vehicle during its usage, become more differentiated and complicated with the increasing use of highly complex components and systems.[16] The defective parts go through a failure analysis. In the case of vendor parts, the car manufacturer returns a percentage of the defective parts to the supplier, which then analyzes the failures. Currently, the parts are either sent back on a random basis or as a full sample.[17] However, the causes of failure are often similar. In practice, the amount of money, time, personnel, and equipment available is limited, and it is neither economical nor feasible to analyze all the parts. In order to reduce the financial and time expenditure of the failure analysis, it is more useful to prioritize and only analyze those parts whose causes of failure differ from previously analyzed components. This can be realized by collecting and analyzing component-specific data during the entire product lifecycle and grouping those parts. The problem is not the availability of data: there are multiple sources which make plenty of data available. The problem is aggregating, structuring, and analyzing the relevant data and thus detecting patterns in it in order to rapidly identify root causes of failures and fix the problems. Today, there is no standard for this process, and contemporary technologies, such as data mining or machine learning, are not used in an efficient way. A similar problem exists in preventive failure detection: these data analysis tools are not yet used for standardized and far-reaching forecasts about future breakdowns.
1.2 Research Objectives
The objective of this Master’s thesis is to develop a data processing and analysis concept that uses different sources of data to generate valuable knowledge about automotive components and systems in order to solve two kinds of problems. First, this concept can support decision making in the claims process when selecting the defective components which should be part of the failure analysis. This new data-driven and target-oriented sample selection process will replace the current random-based procedure. The second purpose of this concept is to use it on a real-time basis to identify critical components and their risks while they are still in the use phase, based on data patterns, and to establish early preventive measures before the actual breakdown occurs. The whole concept is based on data mining methods which help to recognize data patterns. This thesis also investigates the requirements regarding the data and its quality, particularly data collected in the field, that is used as the input. Two of the most common data analysis algorithms will be examined and evaluated concerning their applicability.
1.3 Structure of the Thesis
This thesis is structured into seven parts. It begins with the theoretical basis and explains the technical terms which are used throughout the entire thesis. In this first part, the claims process, the field observation, and the different causes of failure are explained in detail. Data serves as the basis and is discussed in Chapter 3. The sections in this chapter are organized as follows: Sections 3.1 through 3.3 introduce the terminology and discuss data quality and the measurement levels; Sections 3.4 through 3.8 explain how data is collected, stored, distributed, and analyzed. Chapter 4 describes the current state of the sample survey of defective parts for the failure analysis. Next, Chapter 5 presents the data processing and analysis concept and analyzes its requirements and benefits. Section 5.6 explains how this data processing and analysis concept can be extended and also used for preventive failure detection. Chapter 6 gives an overview of the potential data analysis methods and selects an appropriate method based on their advantages and disadvantages. Chapter 7 then applies the selected method to the two problems. In closing, Chapter 8 draws a conclusion and raises further potential research questions.
2 Fundamentals
The quality of automobiles plays a crucial role in assuring long-term customer satisfaction and has improved in the last few years, in spite of the increasing complexity of the products and their shorter development times. Even with optimized development and manufacturing processes that produce robust and high-quality products, problems cannot be completely avoided during vehicle use. Annual warranty costs are estimated at several billion USD;[18] worldwide, they were estimated at 48 billion USD in 2016.[19]
2.1 Claims Process
The possibility of fault occurrence cannot be completely eliminated, even with the use of quality-improving techniques.[20] These product-related faults can trigger customer complaints which cause tremendous costs. However, these complaints also contain data which can be used by the companies and transformed into valuable knowledge. For this purpose, good process management and data-based systems are mandatory.[21] In order to counteract faults and their impact on customer satisfaction, it is crucial to have an efficient and dependable complaint and reclamation management. It helps to process the complaint, deal with legal aspects, take immediate actions, analyze the root causes of the fault, and find the reasons for the customer dissatisfaction.[22] The claims process itself involves the following players: the vehicle manufacturer, also called OEM (Original Equipment Manufacturer), the supplier, the customer, and the workshop or dealer (see Figure 1). There are different types of suppliers. This thesis only focuses on the warranty cases between the OEM and the Tier 1 suppliers.
Figure 1: Players in the Claims Process (Own illustration)
The claims process usually starts with a defect in a component or system of a vehicle being discovered in field use. “Field” describes a situation in which the vehicle has already left the plant and is in the possession of the user or dealer.[23] A claim usually includes warranty repairs and replacements and may end up in legal liability conflicts and lawsuits.[24] After a car fails in the field, it goes to the workshop for further inspection. In 2017, over 47.9 million cars were returned to the workshop in the United States alone.[25] The workshop removes the defective part and returns it to the OEM as a reject. An appropriate number of all the defective parts which are removed from the vehicle and replaced by new parts are then returned as a complete or partial sample to the component manufacturer for a failure analysis. The component manufacturer can either be the OEM or a supplier. The rejected parts, also called field returns, are then analyzed by the component manufacturer in order to determine the root cause and the party who caused the problem.[26] Before sending the components back, the OEM has to determine which parts should be returned. Since each particular part causes costs, such as logistics and analysis costs, it makes sense to only send back those parts for which adequate analysis findings do not yet exist. The failure causes are often similar, so that it is unnecessary to return and analyze components with failure causes similar to those of previously analyzed parts.[27] The failure analysis is usually supported by and documented in the 8D (Eight Disciplines) report.[28] It is crucial to store and manage the acquired knowledge about the failures and their causes in the form of a database in order to avoid similar failures in the future and to take immediate measures.[29] Figure 2 provides a simplified overview of the claims process.
Figure 2: Claims Process (Own illustration)
This thesis focuses on finding an optimal, data-driven sample based on the collected data (step 2), but it also discusses the data collection during the field observation (step 1). Apart from the 8D process, step 3 of the claims process is not discussed in this thesis.
8D method
The 8D (Eight Disciplines) method is a standardized method and part of the quality management used to systematically analyze and resolve recurring problems in the customer-supplier relationship where the cause of the problem is unknown.[30] Apart from the standard method itself, it also describes a problem-solving process and a report which is used for tracking the progress of the task.[31] It was developed by the Ford Motor Company and later published by the Verband der Automobilindustrie (VDA) in its own version for the German automotive industry.[32] The 8D report is used to satisfy customer complaints, solve problems, reduce the overall costs of quality, and improve customer satisfaction.[33] It can be applied to an unlimited number of relations along the supply chain.[34] The 8D process consists of eight steps or disciplines, as shown in Figure 3, and is part of the complaint process in order to provide quality assurance.[35]
Figure 3: 8D Process (Adapted from: Verband der Automobilindustrie e. V. (VDA) 2010, pp. 5 ff.)
Each individual step is documented, which makes the entire process more transparent and traceable. The documentation includes the type of complaint, the responsible persons, and the measures to eliminate the defects. In particular, D4 is very important for identifying the cause of failure, which will be described more precisely in Section 2.3, and for deriving a quality assurance strategy according to the Plan-Do-Check-Act (PDCA) cycle.[36] In the following, each stage of the 8D process is discussed in depth according to VDA Volume 4:[37]
D1: Team: In the first step, a cross-functional team of people with the relevant knowledge, experience, time, readiness to work together, and expertise is determined, and a team champion is appointed as well.
D2: Problem description: The problem is then defined and precisely described in quantifiable terms. Relevant data is collected and analyzed, and details about the problem and the specific failures are recorded and classified.
D3: Immediate containment action(s): Immediate actions are defined, verified, implemented, and documented in order to isolate the effects of the problem from the customer until permanent solutions are found. The actions should be continuously checked.
D4: Root cause(s): All possible causes of the problem are identified. The probable causes are then determined and compared with the description of the problem and other data to figure out whether they are the root causes or whether there are any interactions.
D5: Chosen permanent corrective action(s): The best permanent corrective action to eliminate the cause of the problem is selected. It is necessary to ensure that the chosen permanent corrective action really solves the problem from the customer’s point of view and that it does not cause any undesirable side effects.
D6: Implemented permanent corrective action(s): Continual checks are determined, which ensure that the causes of the problem are truly eliminated. The specific action plans are executed, and their effects and results are observed.
D7: Prevent recurrence: The recurrence of the same or similar failures or problems, which have been eliminated, must be prevented. It can be also useful to introduce a system for recording the history that ensures that similar problems will not occur in the future.
D8: Congratulate your team: Completion of the project and recognition of the team work as well as the individual contributions to the success of the project.
However, the form and the degree of detail in each step of the 8D process can differ between companies, depending on the specific case. The individual 8D reports can also be linked along the entire supply chain into an extended 8D report, as can be seen in Figure 4.[38]
Figure 4: Extended 8D Method Along the Supply Chain (Adapted from: Behrens, Wilde and Hoffmann 2007, p. 93)
2.2 Field Observation
The field observation is part of the driving behavior investigation. Compared with other examination types, the data is collected without manipulating or controlling the process. Since it takes place in the real environment, it is categorized as “Life Simulation”.[39] It takes place under factual circumstances and considers different factors, such as environmental conditions or user behavior.[40] The field data collected during the field observation can therefore be highly representative. Field observations, however, only take place during the warranty period. Consequently, the collected field data is incomplete and limited in time.[41] Figure 5 gives an overview of the different examination types, integrated into the product lifecycle.
Figure 5: Examination Types Along the Product Lifecycle (Adapted from: Käppler 2015, p. 9)
As can be seen from Figure 5, the field observation takes place during the after-sales stage. All the findings from the later lifecycle phases can be useful for the earlier phases; for example, the information gathered during the field observation can be used in the product design of future vehicle models. Table 1 characterizes the different simulation types of the driving behavior.
Table 1: Simulation Types of the Driving Behavior (Adapted from: Käppler 2015, pp. 9, 12)
Among other things, field observations can be helpful for measuring product reliability in the field, for early damage detection, for data-driven decision making, or for forecasting anticipated failures.[42] There are different methods and technologies which are used for diagnosing failures. One common standard for fault detection is based on diagnostic trouble codes (DTCs), which identify dysfunctions during usage and store the data in the fault memory of the vehicle’s electronic control units (ECUs). This data can then be called up in the repair shop. Nowadays, it is also possible to connect with the car and gather data via remote diagnostics, also known as telediagnosis. This helps to optimize the maintenance, prepare for workshop services, and give support during a car breakdown. All this collected car-specific data can be sent to a back-end which is then read by a remote diagnosis service.[43] This back-end function can also be implemented using a cloud server.[44] The most common diagnostic protocol standard used in the automotive industry is called Unified Diagnostic Services (UDS). It consists of several diagnostic services to fulfil different functions in the ECU. Among other things, it can be used for reading and transferring data, conditions, and measured variables from the ECU to the diagnostic testing device.[45]
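To make this diagnostic data flow more tangible, the following minimal sketch decodes the DTC records from a raw UDS ReadDTCInformation positive response (service 0x19, sub-function 0x02, "reportDTCByStatusMask"). The byte layout follows ISO 14229; the example frame is hypothetical and the sketch is deliberately simplified (no transport layer, no negative-response handling).

```python
def parse_dtc_response(payload: bytes) -> list[tuple[str, int]]:
    """Decode a UDS ReadDTCInformation (0x19/0x02) positive response into
    (DTC number, status byte) pairs. Layout per ISO 14229: 0x59, sub-function
    echo, DTC status availability mask, then 4-byte records (3-byte DTC + status).
    """
    if len(payload) < 3 or payload[0] != 0x59:
        raise ValueError("not a positive ReadDTCInformation response")
    records = payload[3:]  # skip SID, sub-function echo, availability mask
    dtcs = []
    for i in range(0, len(records) - len(records) % 4, 4):
        high, mid, low, status = records[i:i + 4]
        dtc_number = (high << 16) | (mid << 8) | low
        dtcs.append((f"0x{dtc_number:06X}", status))
    return dtcs

# Hypothetical response frame: one stored DTC 0x012345 with status byte 0x2F.
example = bytes([0x59, 0x02, 0xFF, 0x01, 0x23, 0x45, 0x2F])
print(parse_dtc_response(example))  # [('0x012345', 47)]
```

In a telediagnosis setting, such frames would arrive at the back-end together with vehicle identifiers and timestamps, so the decoded DTCs can be stored as component-specific field data.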
2.3 Causes of Failure
During the entire lifecycle of any product or system, it can be affected by a fault which can then lead to its failure. A fault can be described as a deviation of at least one characteristic of a system from its standard condition. A failure, on the other hand, is a permanent interruption of a system’s ability to fulfil its function.[46] An example of a fault could be a broken wire in a cable which causes the light to go out as its failure.[47] The cause of failure can already appear during the development and production stages, for example by means of a wrong design or assembly. Additionally, it can also appear during use, for example caused by wrong operation, missing maintenance, ageing, corrosion, or wear during normal operation.[48] In order to investigate the problem systematically, it is important to categorize these causes of failure. Standardized failure cause categories reduce the complexity, helping both the supplier and the customer to achieve a common understanding of the problem and to immediately find solutions which have already been implemented for similar problems by using a database. The supplier is responsible for assigning an appropriate category to each failure cause. Due to the complexity, it is not always possible to find a perfect fit; the supplier has to choose the category which fits the specific failure cause best. There are three selection levels for the failure cause category:[49]
Level 1: On the first level, the failure cause of a product is assigned to a product lifecycle phase, e.g. “development”, “production”, or “logistics processes”. If the failure cause cannot be determined after conducting a failure analysis, the category “failure cause unfamiliar or unknown” has to be selected.
Level 2: The second level contains more details about the selected lifecycle phase, e.g. “specification”, “product concept”, “product development”, “process development”, or “validation” as successive phases of the “development” phase.
Level 3: The third level contains further details about the selected lifecycle phase and the execution of the process.
If it is impossible to find an appropriate failure cause category, “miscellaneous” has to be selected.[50]
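The three-level selection scheme can be pictured as a simple hierarchy. The sketch below encodes an illustrative excerpt of such a failure cause catalogue; the level 1 and level 2 names are those cited above from the VDA volume, while the level 3 entries and the lookup logic are invented for illustration.

```python
# Illustrative excerpt of a three-level failure cause catalogue.
# Level 1 = lifecycle phase, level 2 = sub-phase, level 3 = process detail.
# The level 3 entries are hypothetical placeholders.
FAILURE_CAUSE_CATALOGUE = {
    "development": {
        "specification": ["requirement missing", "requirement ambiguous"],
        "product concept": ["concept not robust"],
        "product development": ["design fault not detected"],
        "process development": ["process window too narrow"],
        "validation": ["load case not covered by tests"],
    },
    "production": {},            # filled in analogously
    "logistics processes": {},   # filled in analogously
    "failure cause unfamiliar or unknown": {},  # analysis was inconclusive
    "miscellaneous": {},         # no category fits
}

def best_fit_category(level1: str, level2: str = "", level3: str = "") -> tuple:
    """The supplier selects the category which fits best; a perfect fit is
    not expected, so unknown phases fall back to 'miscellaneous'."""
    if level1 not in FAILURE_CAUSE_CATALOGUE:
        return ("miscellaneous",)
    return tuple(part for part in (level1, level2, level3) if part)

print(best_fit_category("development", "validation"))
```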
Failure causes can be classified and belong to one or more of the following failure-cause classifications:[51]
1. Design is faulty
2. Defects in material
3. Shortcomings in the processing and/or manufacturing
4. Defects in the assembly or installation
5. Unintentional service conditions
6. Insufficient maintenance
7. Incorrect operation
Faults can be differentiated by their
- temporal behavior (transient, permanent, random, periodical, intermittent),
- cause (systematic, random),
- and effect (global, local).[52]
The failure rate λ(t) often changes over time. It has a characteristic curve which is also known as the bathtub curve. It starts with a high failure rate at the beginning, continues with a low failure rate due to random failures, and ends with wear-out failures, as shown in Figure 6.[53] Mechanical parts often have a temporary failure-free phase, whereas electrical components can malfunction at any time.[54] The three phases have different causes of failure. Early failures are mostly caused by material defects, design blunders, and errors in assembly. Random failures are considered to be cases of stress exceeding strength. Wear-out failures, on the other hand, are caused by fatigue or depletion of materials.[55]
Figure 6: Failure Rate λ(t) as a Function of Time (Adapted from: Schäuffele and Zurawka 2016, p. 100)
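The three phases of the bathtub curve can be reproduced with standard reliability models. As a minimal sketch, the Weibull hazard rate λ(t) = (β/η) · (t/η)^(β−1) yields a decreasing rate for β < 1 (early failures), a constant rate for β = 1 (random failures), and an increasing rate for β > 1 (wear-out failures); superposing the three gives a bathtub-shaped total rate. The parameter values below are purely illustrative.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate lambda(t) = (beta/eta) * (t/eta)**(beta - 1).
    beta < 1: decreasing rate (early failures)
    beta = 1: constant rate (random failures)
    beta > 1: increasing rate (wear-out failures)
    """
    return (beta / eta) * (t / eta) ** (beta - 1)

# Illustrative parameters: superpose the three phases to obtain
# a bathtub-shaped total failure rate over operating time.
for t in (100, 1000, 5000):  # operating hours (hypothetical)
    total = (weibull_hazard(t, beta=0.5, eta=2000)     # early failures
             + weibull_hazard(t, beta=1.0, eta=10000)  # random failures
             + weibull_hazard(t, beta=3.0, eta=8000))  # wear-out failures
    print(f"t = {t:>5} h: lambda(t) = {total:.2e} per hour")
```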
3 Data Management
Data forms the basis for the data processing and analysis concept. Therefore, it is necessary to understand the basics of data science. First of all, the terms which will be used throughout the thesis are explained. The data which serves as the input for the data processing and analysis concept has to be of a certain quality. The different dimensions of data quality and the possibilities for improvement are introduced in Section 3.2. The subsequent sections explain the measurement levels of data and how data can be collected, stored, analyzed, and distributed.
3.1 Terminology
The DAMA (Data Management Association) defines data management as “[...] the business function of planning for, controlling and delivering data and information assets.”[56] This section describes the individual terms of data management that are crucial for the thesis.
Data science
Generally, data science describes a disciplinary field that combines expertise from different disciplines, such as mathematics, statistics, computer science, and behavioral science,[57] to solve different types of problems and predict outcomes.[58]
Predictive analytics
Predictive analytics use these data science tools and can be defined as a technology which can predict the future behavior of individuals by learning from data and thus bring about better decisions.[59] It can be seen as a subarea of data science.[60] Hofmann, Neukart and Bäck (2017) go even one step further and use the terms “optimizing analytics” and “prescriptive analytics” to describe the highest level of data analysis. While predictive analytics tries to answer the question “What will happen?”, optimizing analytics is supposed to answer “What am I supposed to do?”.[61]
Big data
Big data is a term used to describe data of a certain size and volume, as well as its variety and velocity.[62] In general, this data is too big, too fast, or too hard to be processed with traditional tools.[63] However, big data can include many pieces of irrelevant information, which makes the analysis problematic. The amount of data collected worldwide was 8.6 zettabytes in 2015 and is expected to increase to 44 zettabytes by the year 2020.[64]
Smart data
The term smart data is used to describe a smaller but more meaningful amount of data which can be more useful for decision making.[65] It broadens the concept of big data by adding value to the data. Data is only useful if it helps to find a solution, solve a problem, or add value to the company.[66]
Data mining
Data mining can be defined as the process of analyzing large amounts of data, such as data warehouses, to extract relevant information and valuable knowledge hidden within it.[67] It is often also referred to as big data analytics. The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a commonly used data mining process model. The traditional process consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.[68] The data mining process will be explained in more detail in Section 3.7.
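The phase sequence of CRISP-DM can be read as an iterative loop: if the evaluation shows that the model does not yet meet the business goals, the process returns to the earlier phases. The following sketch only mirrors this control flow; the phase functions are schematic placeholders, not an implementation of the standard.

```python
# Schematic sketch of the CRISP-DM loop. The phase functions are
# placeholders that a concrete project would fill in; only the
# control flow (iterate until the evaluation passes) is of interest.
def run_phase(name: str) -> None:
    print(f"running phase: {name}")

def evaluation_passed() -> bool:
    # Placeholder: a real project would compare the model results
    # against the business success criteria defined up front.
    return True

def crisp_dm_cycle(max_iterations: int = 3) -> None:
    for _ in range(max_iterations):
        for phase in ("business understanding", "data understanding",
                      "data preparation", "modeling"):
            run_phase(phase)
        if evaluation_passed():        # evaluation phase
            run_phase("deployment")    # deployment phase
            return
        # otherwise loop back and revisit the earlier phases

crisp_dm_cycle()
```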
Data analytics
Data analytics describes the use of computer systems to analyze large amounts of data and thus support the decision making.[69]
Pattern recognition
Pattern recognition can be defined as a scientific discipline that focuses on finding patterns and regularities in data sets and classifying data objects into categories.[70]
Data
Data can be seen as the hidden digital facts, numbers, or texts that are collected by different monitoring systems. These hidden digital facts are not obvious to the system and must be processed. Data forms the basis for the knowledge that can be gained from it. Some important requirements for data include its format and its type. Regarding the format, data can be presented mathematically or in a two-dimensional table, for example. Data can be either labeled or unlabeled. The digital facts in labeled data are not hidden and can be used for training the machine-learning algorithms. In contrast, the digital facts are hidden in unlabeled data, which can only be used for testing and validation.[71] Data can be stored in databases, which can be described as collections of objects that contain a certain number of attributes providing detailed information about them.[72]
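The distinction between labeled and unlabeled data can be made concrete with a toy example; the attribute names, values, and labels below are invented for illustration.

```python
# Labeled records: the digital fact of interest (the label) is explicit,
# so the records can be used to train a machine-learning algorithm.
labeled = [
    # (mileage in km, operating hours, condition label) -- hypothetical values
    (120_000, 4_100, "worn"),
    (15_000, 600, "ok"),
]

# Unlabeled records: the fact of interest is hidden and must be inferred.
unlabeled = [
    (98_000, 3_300),
    (22_000, 750),
]
```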
Information
The terms data and information are often used as synonyms, even though there is a difference between them. Information can be described as processed data[73] and is provided by the relationships, patterns, and associations among the data.[74] It can also be described as subjectively perceived data used in a particular context.[75] This context includes the meaning, the presentation format, the timeframe, and the relevance of the data elements.[76]
Knowledge
Knowledge can be described as the learned and valuable information gathered from the data. Examples of knowledge include the detection of patterns in the data or the classification of the varieties of patterns in the data.[77]
Figure 7 shows the DIKW (Data, Information, Knowledge, Wisdom) pyramid which represents the relationships between data, information, knowledge, and wisdom.
This chapter, however, focuses on data, information, and knowledge only. Figure 8 shows their relationships in more detail.
Figure 7: DIKW Pyramid (Adapted from: Brelade and Harman 2015, p. 6)
Figure 8: Data, Information, and Knowledge (Adapted from: DAMA International 2009, p. 2)
Data, information, and knowledge are often considered the fourth operational resource aside from the classic resources, such as labor, finance, and assets.[78] Contrary to the traditional resources, data has some specific attributes that make it relatively difficult to handle:[79]
1. Data is abstract and not visible.
2. Data may be infinitely reproduced and distributed.
3. Data cannot be used up, in contrast to material resources. However, its value can change over the lifecycle.
4. Data is not interchangeable.
5. Data is more dynamic than other resources.
6. It is difficult to determine the value and price of data.
There are, however, also some similarities with the traditional resources. Data should generate benefits for its users. Moreover, data is created in a production process and has a lifecycle.[80] In a similar manner to product manufacturing, data processing can be seen as a “data manufacturing” process, as shown in Figure 9. Data processing is the most important step and can be realized by using data analysis methods, which will be discussed in Chapter 6 in more detail.[81]
Figure 9: Analogy Between Product and Data Manufacturing Processes (Adapted from: Hazen et al. 2014, p. 73)
Figure 10 illustrates the transformation of data into knowledge using a function f. A monitoring system consists of three types of operations: physical operations, mathematical operations, and logical operations. Physical operations include the processes of data capture, data storage, data manipulation, and data visualization. These operations are used for creating a suitable data domain so that the machine learning methods can be applied. Mathematical operations are mathematical and statistical techniques and tools which are used for the transformation of data into knowledge. This transformation is expressed as the knowledge function f: D => K, where D is the data domain and K stands for the knowledge or response set. The size of the data does not matter for the execution of the function as long as the data itself is structured. Structure, as part of data quality, will be discussed in Subsection 3.2.1. Logical operations characterize the logical arguments, justifications, and interpretations of the knowledge. They can be used for deriving relevant facts, e.g. interpreting the class types from the data patterns after initially classifying the data domain and providing these patterns using the knowledge function f: D => K.[82]
Figure 10: Transformation of Data into Knowledge (Adapted from: Suthaharan 2016, p. 4)
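As a minimal sketch of such a knowledge function f: D => K, the snippet below maps a small, structured data domain onto a response set of k groups using a deliberately simplified, one-dimensional k-Means iteration, anticipating the method selected later in the thesis. The mileage values are invented.

```python
from statistics import mean

def knowledge_function(data: list[float], k: int = 2, steps: int = 10) -> dict:
    """Toy knowledge function f: D => K that maps a one-dimensional,
    structured data domain onto k groups via a simplified k-Means."""
    data = sorted(data)
    centroids = [data[i * len(data) // k] for i in range(k)]  # crude seeding
    groups: dict[float, list[float]] = {}
    for _ in range(steps):
        groups = {c: [] for c in centroids}
        for x in data:  # assign each value to its nearest centroid
            nearest = min(centroids, key=lambda c: abs(x - c))
            groups[nearest].append(x)
        centroids = [mean(members) if members else c
                     for c, members in groups.items()]
    return groups

# Hypothetical mileages (in 1000 km) of returned parts: two clear groups.
print(knowledge_function([12.0, 15.0, 14.0, 95.0, 102.0, 99.0], k=2))
```

The returned grouping is the "knowledge" in the sense of the text: a pattern (two distinct usage populations) derived from the structured data domain by a mathematical operation.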
Figure 11 shows the relationships between data, information, knowledge, decision making, and action. Data is the basis which can be interpreted by a knowledge carrier and transformed into information. The information can create knowledge which is the basis for decision making. These decisions then lead to value-adding actions. The entire process can be seen as an iterative cycle as both the decisions and the actions can create new data.[83]
Figure 11: Data, Information, and Knowledge in the Value-added Process (Adapted from: Dippold et al. 2005, p. 19)
Data can be classified into different types: master data, transaction data, inventory data,[84] and metadata.
Master data
Master data can be described as the data basis on which the business processes are built and which remains valid over longer periods of time, e.g. the names of customers or products. It represents the most important part of an information system.[85]
Transaction data
As indicated by the name, transaction data is created during specific transactions, e.g. while ordering a new replacement part from the supplier.[86]
Inventory data
Inventory data is data resulting from the master and transaction data.[87]
Metadata
Metadata is structured data which contains information about further data and helps to make other data more traceable. It is often divided into technical and business metadata.[88]
In this chapter, the focus lies only on master data and transaction data since data quality and its problems do not have a direct impact on inventory data.[89]
3.2 Data Quality
Sound decisions of any nature require consistent and related data and are only as good as the data on which they are based.[90] Therefore, it is necessary that the data guarantees a certain level of quality. This helps to avoid serious risks caused by wrong decisions.[91] Data quality is not an absolute measure, but is relative to its purpose and context.[92] Data can be treated as a product with certain quality characteristics. These characteristics determine whether the data consumer uses the data for a specific task.[93] Poor quality data can lead to tangible and intangible losses for businesses, and its costs can be as high as 8-12% of revenues.[94] Data quality can be defined as the totality of all quality characteristic values of a database regarding its ability to meet the defined requirements.[95] Therefore, the quality is high if the data fulfils a certain purpose defined by the user.[96] Since data and information are not the same, there is also a clear difference between data quality and information quality. In many cases, though, both terms are used as synonyms because data quality is only assessable in its context of use.[97] Field data, in particular, is often of poor quality. The data is often incomplete and incorrect, its data elements wrongly assigned, and the reproducibility of the data not ensured. It can also be heavily impacted by operating conditions, such as a frequent driver change or climate conditions.[98] Collected raw data often includes outliers. Outliers are extreme values which differ markedly from the rest of the population and can be of two types: valid outliers, e.g. a CEO (Chief Executive Officer) salary of 1 million USD, and invalid outliers, e.g. a human age of 500 years.[99] In general, data errors can be either random or systematic. The reason for random errors often lies in the data measurement or transmission. Systematic errors can be caused by faulty formulas for the calculation of different values, incorrect calibrations of measuring instruments, or incorrect scaling. Outliers can be caused by both random errors and systematic errors.[100]
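A common way to flag such values is to combine hard domain limits, which catch invalid outliers such as an age of 500 years, with a statistical test such as the z-score, which flags values far from the mean for a validity check. A minimal sketch with invented data:

```python
from statistics import mean, stdev

def flag_outliers(values, z_limit=3.0, domain=(None, None)):
    """Flag values outside hard domain limits (invalid outliers) or more
    than z_limit standard deviations from the mean (suspicious values)."""
    lo, hi = domain
    mu, sigma = mean(values), stdev(values)
    flags = []
    for v in values:
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            flags.append((v, "invalid: outside domain"))
        elif sigma > 0 and abs(v - mu) / sigma > z_limit:
            flags.append((v, "outlier: check validity"))
    return flags

# Hypothetical ages from a customer database:
print(flag_outliers([34, 41, 29, 500, 38], domain=(0, 130)))
```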
3.2.1 Data Quality Dimensions
Data quality, which represents the top of the data quality pyramid (see Figure 12), can be considered the superset of all the data quality dimensions. The data quality dimensions, in turn, require so-called data quality metrics in order to be operationalized and quantified. Data quality metrics represent the objective and operational quality measures.[101]
Figure 12: Data Quality Pyramid (Adapted from: Gebauer and Windheuser 2015, p. 88)
Data quality can be described by data quality dimensions which are important for data users.[102] Each dimension describes a specific success factor of an information system, which is fully operational only if all the dimensions have sufficiently high qualities.[103] There are various hierarchical frameworks which capture the data quality dimensions differently, and they are not universally agreed upon. There is no single prescriptive list of data quality dimensions, and their use highly depends on the specific business and industry requirements. One famous framework approach was developed by Wang and Strong (1996). It is based on a survey of IT users to identify data quality dimensions from the user perspective.[104] Table 2 shows their 15 most important dimensions out of 179 attributes, their definitions, and examples of bad quality. The dimensions are classified into four data quality categories: intrinsic, contextual, representational, and accessibility data quality.[105] Intrinsic data quality describes inherent dimensions of data quality. In contrast, contextual data quality represents dimensions of data quality within a specific context. Both representational and accessibility data quality focus on the data storage in computer systems and its access.[106]
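To illustrate how a dimension is operationalized by a metric, the sketch below computes a simple completeness score over a set of field data records, completeness being one of the contextual dimensions; the record layout and field names are invented.

```python
def completeness(records: list[dict], required: tuple[str, ...]) -> float:
    """Share of required fields that are actually filled, across all records.
    A value of 1.0 means every required field is present and non-empty."""
    filled = sum(
        1
        for rec in records
        for field in required
        if rec.get(field) not in (None, "")
    )
    return filled / (len(records) * len(required))

# Hypothetical field-return records; 'mileage_km' is missing once.
records = [
    {"part_id": "A1", "mileage_km": 52_000, "dtc": "0x012345"},
    {"part_id": "A2", "mileage_km": None,   "dtc": "0x0A0B0C"},
]
score = completeness(records, required=("part_id", "mileage_km", "dtc"))
print(f"completeness = {score:.2f}")  # 5 of 6 fields filled -> 0.83
```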
[...]
[1]Cf. Dippold et al. 2005, p. 2
[2]Cf. DAMA International 2009, pp. 1 f.
[3]Cf. Sumathi and Sivanandam 2006e, p. 402
[4]Cf. Moyne and Iskandar 2017, p. 1
[5]Cf. Kiem 2016, pp. 5 ff.
[6]Cf. Hofmann, Neukart and Bäck 2017, p. 1
[7]Cf. Waller and Fawcett 2013, p. 77
[8]Cf. Borgeest 2014, p. 279
[9]Cf. Pankow 2017
[10]Cf. Verband der Automobilindustrie e. V. (VDA) 2011b, p. 6
[11]Cf. Verband der Automobilindustrie e. V. (VDA) 2009b, p. 10
[12]Cf. Allgemeiner Deutscher Automobil-Club e. V. (ADAC) 2017
[13]Cf. Meyer, Meyna and Pauli 2003, p. 262
[14]Cf. Waller and Fawcett 2013, p. 77
[15]Cf. McKinsey & Company 2016, p. 11
[16] Cf. Alfes et al. 2014, p. 9
[17]Cf. Verband der Automobilindustrie e. V. (VDA) 2009a, pp. 40 f.
[18]Cf. Verband der Automobilindustrie e. V. (VDA) 2009a, p. 8
[19] Cf. Warranty Week 2017
[20]Cf. Behrens, Wilde and Hoffmann 2007, p. 92
[21]Cf. Riesenberger and Sousa 2010, p. 2225
[22]Cf. Behrens, Wilde and Hoffmann 2007, p. 92
[23]Cf. Verband der Automobilindustrie e. V. (VDA) 2009b, p. 96
[24]Cf. Tsarouhas and Liberopoulos 2004, p. 305
[25]Cf. Braasch 2017, p. 18
[26]Cf. Verband der Automobilindustrie e. V. (VDA) 2009a, p. 8
[27]Cf. Verband der Automobilindustrie e. V. (VDA) 2009a, pp. 40 f.
[28]Cf. Kiem 2016, pp. 179 f.
[29]Cf. Kiem 2016, p. 205
[30] Cf. Plinke 2014, p. 38
[31]Cf. Verband der Automobilindustrie e. V. (VDA) 2010, pp. 3 f.
[32]Cf. Behrens, Wilde and Hoffmann 2007, p. 94
[33]Cf. Riesenberger and Sousa 2010, p. 2225
[34]Cf. Behrens, Wilde and Hoffmann 2007, p. 91
[35]Cf. Verband der Automobilindustrie e. V. (VDA) 2017a, p. 7
[36]Cf. Verband der Automobilindustrie e. V. (VDA) 2017a, p. 7
[37]Cf. Verband der Automobilindustrie e. V. (VDA) 2010, pp. 5 ff.
[38]Cf. Behrens, Wilde and Hoffmann 2007, p. 93
[39]Cf. Käppler 2015, p. 10
[40]Cf. Verband der Automobilindustrie e. V. (VDA) 2016, p. 50
[41]Cf. Pauli and Meyna 2000, p. 1104
[42]Cf. Verband der Automobilindustrie e. V. (VDA) 2016, p. 51
[43]Cf. Alfes et al. 2014, p. 67
[44]Cf. Johanning and Mildner 2015, p. 20
[45]Cf. Reif 2014a, p. 419
[46]Cf. Isermann 2006, p. 20
[47]Cf. Isermann 2006, p. 22
[48]Cf. Isermann 2006, p. 63
[49]Cf. Verband der Automobilindustrie e. V. (VDA) 2017a, p. 10
[50]Cf. Verband der Automobilindustrie e. V. (VDA) 2017a, p. 10
[51]Cf. Bloch and Geitner 2012, p. 615
[52] Cf. Reif 2014a, p. 258
[53]Cf. Schäuffele and Zurawka 2016, p. 100
[54]Cf. Verband der Automobilindustrie e. V. (VDA) 2016, p. 74
[55]Cf. Wilkins n.d.
[56]Cf. DAMA International 2009, p. 4
[57] Cf. Hazen et al. 2014, p. 72
[58]Cf. Waller and Fawcett 2013, p. 78
[59] Cf. Siegel 2013, pp. 39 f.
[60]Cf. Waller and Fawcett 2013, p. 79
[61]Cf. Hofmann, Neukart and Bäck 2017, p. 2
[62] Cf. Hazen et al. 2014, p. 72
[63]Cf. Li et al. 2015, p. 669
[64] Cf. Großkopf 2017, p. 80
[65]Cf. Dust, Balschun and Wilde 2016, p. 38
[66] Cf. Wierse and Riedel 2017, p. 14
[67]Cf. Brelade and Harman 2005, p. 9
[68]Cf. Hofmann, Neukart and Bäck 2017, pp. 1 f.
[69]Cf. Runkler 2015, p. 2
[70]Cf. Dutt, Chaudhry and Khan 2012, p. 23
[71]Cf. Suthaharan 2016, p. 3
[72]Cf. Sumathi and Sivanandam 2006a, p. 40
[73]Cf. Pipino, Lee and Wang 2002, p. 212
[74]Cf. Lew and Mauch 2006, p. 9
[75]Cf. Harrach 2010, p. 13
[76]Cf. DAMA International 2009, p. 2
[77]Cf. Suthaharan 2016, p. 4
[78]Cf. Dippold et al. 2005, pp. 3 f.
[79]Cf. Dippold et al. 2005, p. 245
[80]Cf. Dippold et al. 2005, p. 245
[81]Cf. Hazen et al. 2014, p. 73
[82]Cf. Suthaharan 2016, pp. 4 f.
[83]Cf. Dippold et al. 2005, p. 20
[84]Cf. Hildebrand 2006, p. 17
[85]Cf. Hildebrand 2006, p. 17
[86]Cf. Rohweder et al. 2015, p. 27
[87]Cf. Hildebrand 2006, p. 17
[88]Cf. Harrach 2010, p. 21
[89]Cf. Hildebrand 2006, p. 17
[90]Cf. Hazen et al. 2014, p. 72
[91]Cf. Harrach 2010, p. 1
[92]Cf. Dippold et al. 2005, p. 216
[93]Cf. Wang and Strong 1996, p. 8
[94]Cf. Hazen et al. 2014, pp. 72 f.
[95]Cf. Gebauer and Windheuser 2015, p. 88
[96]Cf. Harrach 2010, p. 2
[97]Cf. Gebauer and Windheuser 2015, p. 87
[98]Cf. Verband der Automobilindustrie e. V. (VDA) 2016, pp. 325 ff.
[99]Cf. Baesens 2014, p. 20
[100]Cf. Runkler 2015, p. 23
[101]Cf. Gebauer and Windheuser 2015, p. 88
[102]Cf. Gebauer and Windheuser 2015, p. 89
[103]Cf. Rohweder et al. 2015, p. 29
[104]Cf. Gebauer and Windheuser 2015, pp. 89 f.
[105]Cf. Wang and Strong 1996, pp. 9-32
[106]Cf. Lee et al. 2002, p. 135