In this work the author attempts to examine a small part of artificial intelligence, producing a real-life approximation of what could be a predictive system based on sales funnel information provided by a customer-relationship-management tool like Salesforce. The work focusses on two main aspects, namely the prediction of the sales funnel and a LinkedIn-based enrichment tool which sources company data in bulk to enrich existing sales information. In pursuing these two goals, the thesis comprises the four typical elements of an end-to-end advanced analytics project: identification of the needed data and its sourcing, exploratory analysis of said data, analytical model selection and design, and validation and testing of the results obtained in the previous step.
Artificial Intelligence has matured over the past few years and has become a standard in corporate market and business analyses. Those analyses focus mainly on customer acquisition and retention, as these drive revenue. This work attempts to create a customer retention model, specifically a churn prevention model, that accurately predicts the opportunities that have a high propensity to be lost, helps the salesperson to identify them, and enables a quick reaction.
Contents
List of figures
List of tables
1. BUSINESS CHALLENGE
2. TECHNICAL APPROACH
2.1. Information sources
2.1.1. Company information
2.1.2. Internal company data
2.2. Artificial Intelligence
2.2.1. Machine Learning techniques
2.2.2. Deep Learning techniques
2.3. Tested architectures
2.4. Final architecture
3. CONCLUSION
3.1. Objectives revision and functional requirements
3.2. Future guidelines
Appendix 1: Python code for the exploratory analysis
Appendix 2: Execution logs for the model
Appendix 3: Correlation graphs
To my professor, Miguel Ángel Cordobés, for all his involvement in the project, for accepting it and for giving me the freedom at all times to define the project’s objectives and development.
Abstract
Artificial Intelligence has matured over the past few years and has become a standard in corporate market and business analyses. Those analyses focus mainly on customer acquisition and retention, as these drive revenue. This Bachelor’s thesis attempts to create a customer retention model, specifically a churn prevention model, that accurately predicts the opportunities that have a high propensity to be lost, helps the salesperson to identify them, and enables a quick reaction.
Resumen
La inteligencia artificial ha madurado en los últimos años hasta convertirse en un estándar en el mercado corporativo y en los análisis de negocios. Esos análisis se centran principalmente en la adquisición y retención de clientes, ya que son los factores principales que influencian la fluctuación de ingresos. Este trabajo de final de grado intenta crear un modelo de retención de clientes para ayudar a predecir con precisión las oportunidades que tienen una gran propensión a perderse, ayudar al vendedor a identificarlas y poder reaccionar rápidamente.
List of Figures
1.1. Visualisation of the Salesforce funnel implementation in Argentina
1.2. Illustration of the data enrichment process
1.3. Representation of the project’s milestones
2.1. Main menu of the GUI
2.2. GUI: Company enrichment options
2.3. Representation of the LinkedIn API connector’s workflow
2.4. GUI: Sample execution of a batch query to the LinkedIn REST API
2.5. Data distribution of the training data set
2.6. Correlation matrix
2.7. Final design of the model
2.8. Representation of the Machine Learning Ensemble
2.9. Artificial intelligence components
2.10. Representation of a neural network (ANN)
2.11. Representation of a RNN cell
2.12. Representation of a LSTM cell
2.13. Confusion matrix
2.14. Density chart for two classes
2.15. Scenario 1 architecture
2.16. Scenario 2 architecture
2.17. Multilayer Perceptron-based model architecture
2.18. LSTM-based model architecture
2.19. Final model architecture
2.20. GUI: Ensemble training options
2.21. GUI: Predict tab
List of Tables
2.1. Chosen LinkedIn API information
2.2. Available internal information
2.3. Training data set size dependent on sales funnel length
2.4. Correlation variables
2.5. Results table 1
2.6. Results table 2
2.7. Results table 3
2.8. Error distribution of final model
Introduction
Within the past few years, the term Machine Learning has swept over the world. According to Arthur Samuel [1], the computer scientist who coined the term in the 1950s, machine learning is a subfield of computer science which, through the use of large data sets and training algorithms, aims to “give computers the ability to learn without being explicitly programmed”.
If one were to look at how the popularity of the term Machine Learning has evolved in the past few years, for example in Google Trends [2], there would be no doubt that searches for the term have skyrocketed. So much so that, in a recent survey [3] conducted by PwC, 30% of business leaders forecast AI to be the biggest disruption to their industries within the five years following 2017. Two years later, in the present day, this Machine Learning bubble is slowly beginning to mature, as a recent Crunchbase study suggests [4], and the once startup-based funding is becoming a more corporate one. This shift, in turn, means that bigger companies with more resources are becoming more aware of the capabilities of this once visionary field of artificial intelligence and are now able to apply it to their daily challenges.
One of the most important performance indicators of any business is its revenue, which is the basis on which a company is rated by its shareholders and investors. Therefore, it is in every company’s best interest to maximize sales, to accurately forecast their highs and lows, and to try to prevent the latter. This raises the question of whether any subfield of artificial intelligence can be used to help shed light on the future of a company’s sales and therefore help predict its revenue more accurately.
This document is my Bachelor’s thesis and my attempt to examine a small part of artificial intelligence, producing a real-life approximation of what could be a predictive system based on sales funnel information provided by a customer-relationship-management tool like Salesforce. The following work focusses on two main aspects, namely the aforementioned prediction of the sales funnel and a LinkedIn-based enrichment tool which sources company data in bulk to enrich existing sales information.
In pursuing these two goals, the thesis comprises the four typical elements of an end-to-end advanced analytics project:
1. Identification of the needed data and its sourcing. Taking into account the desired outcome, including only the most relevant data is key.
2. Exploratory analysis of said data. Performing typical analyses, such as selecting the most important variables and calculating the numerical data distribution and the correlation between variables, is an important part of understanding the data we train our model with. The reason is that ML is not about having a lot of samples, but about having the right data. This analysis gives us a clear understanding of the data and lets us select the best features to increase the performance of our model.
3. Analytical model selection and design. To make a relevant prediction, the right model must be chosen with caution. A wrong implementation could lead to skewed results.
4. Validation and testing of the results obtained in the previous step. To create the best prediction, the model has to be tested. The model’s predictions should be in agreement with the actual outcomes.
In order to best fulfil the requirements for my thesis, a real use case from a big company in the telecommunications sector was chosen. The reasons behind this choice are the amount of information available and the opportunity to apply the theoretical approximations to a real-life setting.
Chapter 1
BUSINESS CHALLENGE
Nowadays, as mentioned before, the use of Artificial Intelligence has skyrocketed. This is partly because recent technological advances have greatly reduced the barrier to entry for new users, and therefore for organizations across any sector and development stage. This has been driven by three key factors:
- Cheaper storage and mass production of data: Thanks to the quick rise of cloud computing, more data than ever is produced and processed. This data can also be used for many business-critical applications if the costs are managed correctly [5].
- Greater processing power: Due to the aforementioned cloud-based solutions, renting custom hardware for Machine Learning is cheaper than ever, increasing the sustainability of these applications for a wide range of business needs.
- Open source initiatives: Thanks to many open source Machine Learning libraries like scikit-learn and Google’s TensorFlow [6], new and continually improved algorithms are made available to a wider range of individuals.
AI is mostly used in the following three segments: revenue increase, process optimization and risk reduction [7]. In the banking sector, for example, AI is revolutionizing risk modelling, making many of the current models obsolete [8]. For sales optimization and revenue increase, it has become such a standard that even the biggest CRM providers, like Oracle and Salesforce, have begun integrating AI solutions into their sales tools [9]. Lastly, for process optimization, great advances have been made thanks to improvements like computer vision, which makes it possible to find even the most minuscule of defects in a production batch, or generative design, which lets companies perform an amount of computing never seen before for a fraction of the price.
From a business perspective, however, Artificial Intelligence appears to be used more intensively in the B2C (Business-to-Consumer) segment than in the B2B (Business-to-Business) one. This happens because the B2B segment in telco companies produces a smaller part of the revenue than its B2C counterpart, which leads companies to invest fewer resources in it. On the B2C side, there have been many examples of successful Artificial Intelligence uses, such as user acquisition, customer support, forecasting, security and fraud detection, or people management, to name a few.
Especially interesting on the B2B side is the forecasting segment, in which many big corporations have made considerable efforts to use Artificial Intelligence, as a Forbes article suggests [10]. The conditions for sales forecasting, however, are not the same in the two segments: on the B2B side a customer portfolio can be developed, which enables much more accurate customer tracking via the sales funnel. By the year 2020, over 30% of all B2B-oriented businesses are expected to use AI to further increase sales.
All these developments on the B2C side of the industry raise the question: can a forecasting model for the B2B market be built that positively impacts sales performance today?
In the aforementioned telco, a key element is the CRM system. A CRM (customer relationship management) system is a solution usually oriented towards managing three basic areas: commercial management, marketing, and after-sales or customer service [11]. Basically, it is an application that allows all the interactions between a company and its clients to be centralized in a single database.
CRM software, by definition, allows us to share and maximize the knowledge of a given client and in this way understand and anticipate their needs. It collects all the information on commercial transactions, keeping a detailed history. It also makes it easier to manage customer acquisition and loyalty campaigns. Thanks to the CRM, one can track the set of actions carried out on clients or potential clients and manage the commercial actions from a detailed scorecard.
According to a study by Market Research Future [12], the top players in the CRM industry include Salesforce.com (U.S.), SAP AG (Germany), Oracle Corporation (U.S.), Microsoft Corporation (U.S.), Adobe Systems Inc. (U.S.), Amdocs (U.S.), Convergys Corporation (U.S.), Huawei Technologies Co. Ltd (China) and Infor Global Solutions, Inc. (U.S.). Further research on the internet supports this claim, as the prominent technology magazine PCMag features many of the aforementioned companies in its comparative CRM article [13].
Salesforce is one of the leading players in its industry, if not the leading one, and this tool is currently being used in the aforementioned telco. It was implemented to consolidate all the old CRM systems that had been acquired through the integration of smaller businesses into the company. It was chosen because it offers a few key advantages over the competition, like full mobile integration, easy customization, integrated data analytics and AI, and a very accessible training portfolio for new users [14]. The idea behind it is to facilitate the tracking of opportunities with each new or already approached company worldwide and to gain new insights from it.
In this example, however, the company’s operation business (O.B.) in Argentina is chosen due to the quick integration and adoption of the system’s features.
Since Salesforce encourages customization to suit every business’s needs, this implementation differs from a standard implementation of the software. The user interface and the interfaces to other systems differ in every implementation, but what really influences sales is the sales funnel, which was studied and later deployed. The consensus was to have seven possible states that a given opportunity, and therefore an offer made by the corresponding salesperson, could go through.
The funnel or conversion funnel is a marketing term that tries to define the different steps that a user must take to fulfill a specific sales objective, be it the first contact, a purchase or the generation of a lead.
It serves to determine the percentage of losses in each of the steps that the responsible salesperson performs on a given opportunity to meet the final goal, as well as which points need to be optimized most urgently in order to convert the largest possible number of users [15].
In the case of the company referenced in this Bachelor’s thesis, the sales funnel was defined as the following:
State F6: An offer has not yet been issued to the company and therefore an opportunity has not yet been created.
State F5: Definition of the need.
State F4: Design phase of the required solution.
State F3: Negotiation step.
State F2: Contract management.
State F1: The opportunity was a success. The company reached an agreement and will buy the given product.
State F0: The opportunity has been lost because the company did not reach an agreement and will not buy the product.
Below is a more graphical explanation of these steps:
[Figure not included in this reading sample. Label recovered from the figure: “Offer entry to companies”.]
Figure 1.1: Visualisation of the Salesforce funnel implementation in Argentina.
The figure above shows the different steps an offer takes before it’s finally rejected or successfully accepted by the customer. In every step a few opportunities fall out, thinning the funnel a bit more.
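To make the idea of losses per funnel step more tangible, the following minimal Python sketch computes the share of opportunities that fall out between consecutive states. The counts are hypothetical and serve only as an illustration; they are not taken from the telco’s data, and the names FUNNEL_ORDER and loss_per_step are assumptions of this sketch.

```python
# Order in which an opportunity moves through the funnel towards a win (F1).
# F0 (lost) is the exit state and is therefore not part of this chain.
FUNNEL_ORDER = ["F6", "F5", "F4", "F3", "F2", "F1"]


def loss_per_step(reached):
    """Share of opportunities lost between each state and the following one.

    `reached` maps a state name to the number of opportunities that got
    at least that far in the funnel.
    """
    losses = {}
    for current, following in zip(FUNNEL_ORDER, FUNNEL_ORDER[1:]):
        entered = reached[current]
        # Guard against empty stages to avoid a division by zero.
        if entered == 0:
            losses[f"{current}->{following}"] = 0.0
        else:
            losses[f"{current}->{following}"] = 1 - reached[following] / entered
    return losses


# Hypothetical counts, for illustration only.
example_counts = {"F6": 1000, "F5": 800, "F4": 400, "F3": 200, "F2": 150, "F1": 120}
example_losses = loss_per_step(example_counts)

for step, share in example_losses.items():
    print(f"{step}: {share:.0%} of opportunities lost")
```

With such a table, the steps with the highest loss share are the ones that, in the sense of the funnel definition above, would need to be optimized most urgently.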
In addition to the funnel state of an opportunity, the responsible salesperson also has all the information from the internal data warehouse, which may or may not be fully enriched, displayed and at his/her disposal through Salesforce to help close the opportunity more easily. However, many opportunities are lost without a clear explanation, even when the manager looks at the development of the opportunity through the sales funnel. No real insights can therefore be drawn from it, since similar offers issued to similar companies show a high variation in their outcomes.
The goal of this Bachelor’s thesis is, therefore, to help accurately predict the opportunities that have a high propensity to be lost, help the salesperson to identify them and be able to quickly react.
Companies often face serious data acquisition limitations in the B2B sector due to data protection laws such as the GDPR [16]. The main game changer is that data protection is now governed by a regulation and not a directive, which means that data gathering first has to be approved by the entity the data is extracted from, making many established data gathering and enrichment procedures obsolete. That is why many companies’ Data Warehouses can be subject to serious restrictions on the information they can use in their internal databases.
To help solve that issue, and as part of this thesis’s goal, publicly available data from the business-oriented social network LinkedIn will be extracted to help enrich internal information, further describing the target of every possible opportunity. LinkedIn is GDPR-compliant, meaning all gathered information from that source is available for commercial use.
Once the company’s attributes are available, they can be bundled with the offer’s conditions and the historical data of the funnel. This new information will contribute to the generation of insightful predictions and accurate results.
[Figure not included in this reading sample]
Figure 1.2: Illustration of the data enrichment process.
The prediction will be done by empirically testing three different types of Artificial Intelligence algorithms and their possible ensemble combinations. These algorithms are:
- Regular Machine Learning prediction algorithms like tree-based classifiers, boosting algorithms and the k-nearest neighbors algorithm (KNN).
- Artificial neural networks (ANN) based on MLP (Multilayer Perceptron) structures.
- Recurrent neural networks (RNN) like the LSTM (long short-term memory) architecture.
All in all, the model is constructed with recall maximization in mind, i.e. correctly identifying as many as possible of the opportunities which have a high propensity to be lost in the next month. From a business perspective, the salesperson should pay special attention to these identified opportunities and try to keep them, in the hope that the offer does not reach the funnel state F0 and therefore proves the model wrong.
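Since recall is the guiding metric, a small sketch of how it can be computed may be helpful. Recall is the number of correctly flagged lost opportunities divided by the number of all opportunities that were actually lost, TP / (TP + FN). The encoding used here (1 for an opportunity lost in the next month, 0 otherwise) and the example labels are assumptions made for illustration only.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the class treated as positive (here: lost)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), shown for contrast with recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels: 1 = opportunity lost in the next month, 0 = not lost.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 1, 1, 1]

print("recall:", recall(actual, predicted))
print("precision:", precision(actual, predicted))
```

Optimizing for recall accepts some false alarms (lower precision) in exchange for missing fewer opportunities that are really about to be lost, which fits the business goal described above.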
Figure 1.3 depicts the different phases of the project.
[Figure not included in this reading sample]
Figure 1.3: Representation of the project’s milestones.
Chapter 2
TECHNICAL APPROACH
In this chapter the inner workings of the model will be explored and analyzed. Firstly, the information chosen to train the model will be identified and its sourcing explained. An exploratory analysis of the more important features of the information will be performed afterwards. This will allow the reader to better understand the chosen data and to get to know the relevant information in order to understand the predictive model selection. This is important, as not all models perform equally well on the same data.
After this exploratory part, an empirical study and explanation will be performed in order to explain why and how the model was chosen. In this part, Machine Learning (ML) as well as Deep Learning (DL) techniques will be explored and analyzed.
This then leads the thesis into a final argument about how to fuse these techniques into one predictive model that works on the specific data presented above. Different combinations of these ML and DL techniques, called ensembles, will be attempted in order to minimize algorithm bias and maximize performance.
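As an illustration of what combining several models into an ensemble can mean, the following minimal sketch merges the predictions of three models by hard (majority) voting. Majority voting is an assumption chosen here for its simplicity and is not necessarily the combination rule used in this project; the function name and the example predictions are likewise hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote.

    `predictions_per_model` is a list with one list of labels per model;
    all inner lists must have the same length.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count; on a tie
        # the label that was encountered first among the votes wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions of three models for three opportunities
# (1 = predicted to be lost, 0 = not predicted to be lost).
model_a = [1, 0, 1]
model_b = [1, 1, 0]
model_c = [0, 0, 1]

ensemble_prediction = majority_vote([model_a, model_b, model_c])
print(ensemble_prediction)
```

Because each model errs on different samples, the combined vote can cancel out some of the individual mistakes, which is the intuition behind reducing algorithm bias through ensembles.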
As an addition to the project, a graphical user interface (GUI) has been created to enhance usability and reduce the need to use a terminal emulator for iterative executions. This GUI is fully coded in the programming language Python and links with the data gathering as well as the prediction functionalities this thesis describes. All the examples shown in this document feature this GUI. Functional requirements are described further in section 3.1. The figure below shows the main menu of the user interface:
[Figure not included in this reading sample]
Figure 2.1: Main menu of the GUI
2.1. Information sources
This section focuses on studying the data used for the predictive models.
As explained in the introduction, the data sources chosen for this project were:
- Information about the company itself.
- Internal company data, which includes offer data and historical sales funnel states.
It starts by analyzing the company information, which is sourced using a tailored API connector that retrieves all the data LinkedIn has available for the queried company. The process and constraints of using this method are also explained and justified.
The next section illustrates the data available from the company’s internal Data Warehouse, which is the offer data and the historical sales funnel. This information was sourced from the company in July 2018 and is therefore not the most recent. This is due to legal issues and conditions imposed by the telco. The chapter ends with an exploratory analysis in which the most important variables are picked out and described statistically.
The company information will be described in the next section, 2.1.1, while the rest of the data will be explained in section 2.1.2.
2.1.1. Company information
All the company information will be retrieved exclusively from LinkedIn and not from the available internal databases. The reason is that so little information was available internally that other means of enriching company data had to be found. LinkedIn was chosen as the source for company data because it has a good API and because, up to a certain usage level, it is free of charge. Other business-oriented websites like XING [17] were considered, but since access to their API was not free, they were left out. The Argentinian census was also considered, but it was not up to date and bulk searches were not possible without a usage-dependent fee.
That, amongst other issues like the low penetration of any other business-oriented social network in Latin America, left LinkedIn as the only sensible choice, considering it has a global database of more than 590 million users.
However, at a bit more than 6 million active users in Argentina, even the biggest player in that industry does not have all the desired information, considering that a comparatively small country like Spain has almost double the number of users. Nevertheless, as it was the only viable option, an API connector was developed for enrichment purposes.
LinkedIn REST API
The aforementioned LinkedIn API connector was developed purely in Python and works through the REST API provided by LinkedIn. For ease of use, the authentication part of the script was based on the Python LinkedIn GitHub repository [18].
The LinkedIn REST API works in conjunction with the user’s LinkedIn account: the user signs up on LinkedIn’s developer page and creates an app. This app in turn provides the typical client ID and client secret, which are used to authenticate the user and the app they are using.
Each production app lets the user make up to 100 requests per day, but in this thesis the development version is used, which extends the throttle limit to 500 calls a day. Through automation there are currently 540 apps assigned to the creator’s profile, making up to 270 thousand requests a day possible.
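The daily request budget follows directly from these two figures. The short sketch below reproduces the arithmetic and estimates how many days a batch would take; the assumption that every company costs exactly one request, as well as the function names, are made for illustration only.

```python
import math

CALLS_PER_APP_PER_DAY = 500  # development-version throttle limit stated above
NUMBER_OF_APPS = 540         # apps assigned to the creator's profile


def daily_budget(apps=NUMBER_OF_APPS, calls_per_app=CALLS_PER_APP_PER_DAY):
    """Total number of requests available per day across all apps."""
    return apps * calls_per_app


def days_needed(total_requests, budget_per_day):
    """Number of whole days required to send `total_requests` requests."""
    return math.ceil(total_requests / budget_per_day)


budget = daily_budget()
print(budget)  # 540 * 500 = 270000 requests per day

# Assumption: one request per company. A batch of 700 thousand companies
# would then need ceil(700000 / 270000) days.
print(days_needed(700_000, budget))
```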
This program features:
- Search for companies based on their names in bulk. There is the possibility to load a document in .CSV format with a list of the companies to be searched. This will then create an output document which appends the obtained results to the searched names.
- Thread-based search for better performance through parallelization. The user can select as many threads as the machine running the script supports. The companies to search are then distributed equally among the selected threads.
- Possibility to collect up to the first four results. This yields more accurate results, as in the free version the returned results are not ordered by relevance.
- Automatic token renewal for continuous app usage. When the number of tokens to renew is set to a number greater than zero, the script uses Selenium [19], a framework for testing web applications, which automatically executes certain programmed actions in a Google Chrome browser.
- Automatic creation of new developer apps to extend throttle limits. Using the previously mentioned Selenium framework, the script can also create new developer apps in the LinkedIn portal based on a CSV document previously created by the user.
- Option to filter results by country using postal codes.
- Natural Language Processing (NLP) to further filter the results based on the company name. By using the package FuzzyWuzzy [20], which implements the Levenshtein distance [21], the required similarity between the queried and the returned name can be fine-tuned.
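The thread-based search listed above can be sketched as follows. This is a minimal illustration of distributing company names evenly among worker threads and not the connector’s actual code: lookup_company is a hypothetical stand-in for a single request to the LinkedIn API, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split `items` into `parts` chunks whose sizes differ by at most one."""
    return [items[i::parts] for i in range(parts)]


def lookup_company(name):
    """Hypothetical stand-in for one request to the company search endpoint."""
    return {"query": name, "results": []}


def search_in_bulk(names, n_threads=4):
    """Look up all names, giving each thread an equal share of the work."""
    chunks = split_evenly(names, n_threads)

    def work(chunk):
        return [lookup_company(name) for name in chunk]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() returns the chunk results in the order the chunks were given.
        per_chunk = list(pool.map(work, chunks))

    # Flatten the per-thread lists into one list of responses.
    return [response for chunk in per_chunk for response in chunk]


company_names = ["Company A", "Company B", "Company C", "Company D", "Company E"]
responses = search_in_bulk(company_names, n_threads=2)
print(len(responses))
```

Threads are a reasonable fit for this kind of task because the work is dominated by waiting for network responses rather than by computation.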
Initially, this API retrieval script was developed to enrich the internal databases with information on the German market, enriching several batches of up to 700 thousand companies per batch. Since it worked so well, it was also used in this project for the Argentinian market.
Below is a representation of the graphical user interface developed for it.
[Figure not included in this reading sample]
(a) Initial enrichment screen of the GUI.
(b) After clicking on configure.
Figure 2.2: GUI: Company enrichment options.
The API connector works in 4 steps.
1. Firstly, it reads the names on the CSV file and makes the request to the LinkedIn API, fetching the responses in standard JSON format and saving them.
2. Once all the requests have been sent and all the JSON files have been collected, they are parsed, converted into CSV format and grouped (if the search for more than one result is set) into one CSV file for the queried name.
3. After that, all CSV files are joined into one.
4. The results are then filtered by country, by comparing the ZIP codes obtained from LinkedIn with predefined lists already included in the program, and subsequently filtered again using the Levenshtein distance with the threshold defined in the GUI, to ensure only wanted results are left.
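The filtering in step 4 can be sketched in a few lines of Python. The sketch below implements the Levenshtein distance directly instead of calling FuzzyWuzzy, and the normalization of the distance to a 0 to 100 similarity score is an assumption of this sketch rather than FuzzyWuzzy’s exact scoring. The field names, postal codes and company names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity from 0 to 100; 100 means identical after lower-casing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


def filter_results(rows, valid_postal_codes, threshold):
    """Keep rows located in the target country whose name is similar enough."""
    kept = []
    for row in rows:
        if row["postal_code"] not in valid_postal_codes:
            continue  # wrong country
        score = similarity(row["queried_name"], row["returned_name"])
        if score >= threshold:
            kept.append(dict(row, similarity=score))
    return kept


# Hypothetical rows as they might look after parsing the JSON responses.
example_rows = [
    {"queried_name": "Acme SA", "returned_name": "Acme S.A.", "postal_code": "C1000"},
    {"queried_name": "Acme SA", "returned_name": "Acme GmbH", "postal_code": "80331"},
    {"queried_name": "Acme SA", "returned_name": "Zeta Corp", "postal_code": "C1000"},
]
kept_rows = filter_results(example_rows, valid_postal_codes={"C1000"}, threshold=70)
print(kept_rows)
```

Raising the threshold makes the name matching stricter, while lowering it tolerates more spelling variations at the cost of more false matches.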
The final CSV file is then saved, along with the unfiltered one, which is kept for debugging purposes. Figure 2.3 below depicts the process in a more graphical form:
[Figure not included in this reading sample]
Figure 2.3: Representation of the LinkedIn API connector’s workflow.
The exploration of the results in the final CSV file is also integrated into the GUI through a third button, labelled “Show results”, which appears after the process is completed. Below is a representation of what it looks like in the user interface itself.
[Figure not included in this reading sample]
(a) Logs after execution.
(b) Visualization of returned results.
Figure 2.4: GUI: Sample execution of a batch query to the LinkedIn REST API.
In the figures we see the GUI after finishing a batch of queries. The left-hand picture shows the usual GUI layout, with a new button and logs appearing in the text box. These logs indicate the state the script execution is in. Every log appears with a timestamp so that the user is aware of the current progress.
These logs represent the four steps mentioned above: reading the CSV, fetching the results and parsing them into CSV files, joining the returned results into one CSV file and, lastly, filtering by country via the ZIP code and by name via the Levenshtein distance with the defined threshold.
The right-hand picture shows the actual results. This screen opens in a new window when the corresponding button is pressed. It shows the Levenshtein distance as well as the information returned by the LinkedIn API.
Ultimately, not all the information the LinkedIn API had to provide was chosen. A selection was made in order to make the run times more efficient and gather less data. A full description of all the available fields can be found on the LinkedIn developer website [22].
The final data from the LinkedIn API, together with the fields added by the program, is listed below:
Table 2.1: Chosen LinkedIn API information
[Table not included in this reading sample]
The company data, however, was not included in the prediction data for the model, as it was later found that even the biggest business-oriented network did not have enough data for the predictive model to make sense of it, due to its low 12% penetration in the Argentinian market [23]. The reason is that only the companies that have a premium subscription to LinkedIn’s service obtain exposure through its API. Even a combination of the company’s existing internal databases and the developed API connector was not enough to cover the data at a satisfactory level, so it was discarded.
This tool, however, could be used in Europe, where a very good share of businesses is covered, with coverage being higher for multinational companies (MNC) than for small and medium enterprises (SME).
2.1.2. Internal company data
This section is about the available internal data which is comprised of sales funnel data as well as offer data.
Firstly, a quick overview of the different sets of variables will be given by explaining their definitions. Then an exploratory analysis will be performed, explaining the relationships between the variables.
Below is a list of the available variables. They are comprised of 28 unique variables and up to thirteen funnel state ones:
Table 2.2: Available internal information
[Table not included in this reading sample]
[...]
- Cite this work
- Juan Ruiz de Bustillo Ohngemach (Author), 2019, Predicting sales funnel with a customer-relationship-management tool, Munich, GRIN Verlag, https://www.grin.com/document/503216