This thesis develops and deploys an efficient opinion mining (OM) model for automatically processing Arabizi in the context of public and private customer service providers in Lebanon. Service providers include restaurants, hotels, shopping centers, governmental institutions, and similar establishments. An Arabizi corpus of 2635 text reviews, which is essential for building the OM model, was gathered by crawling the pages of service providers on Facebook, Google, and Zomato between April 4, 2018 and October 30, 2018.
The main aim of this thesis is to give credit to the feelings and thoughts of Arabizi users in Lebanon by extracting sentiment, expressed as positive or negative impressions, from their written texts. In addition, the thesis highlights the challenges this language system poses, for the public and especially for researchers who wish to pursue further studies on it. It also characterizes Arabizi in the Lebanese context specifically, providing a starting point for other research to build on. Furthermore, it tests machine capabilities on sentiment prediction and classification tasks in Lebanese Arabizi. The thesis also builds a dataset of reliable Arabizi reviews that can be used in further research and that other researchers can expand. More generally, classifying the large volume of Arabizi sentences could be of great help to media offices, government centers, research facilities, and start-ups in knowledge-making and prediction tasks.
Because sentiment analysis (SA) tools for automatically processing Arabizi are unavailable, building one is highly important. Such a tool would give Arabizi recognition in the SA field as the number of internet users who write in it, now and in the future, grows. In addition, it would help companies, institutions, and small businesses extract positive and negative sentiment from text reviews written in Arabizi much more efficiently, so that they can reflect on and improve the quality of the services they provide.
Table of Contents
- Chapter I: Introduction
- 1.1 Introduction
- 1.2 Problem Statement
- 1.3 Purpose of the Study
- 1.4 Research Questions
- 1.5 Research Hypotheses
- 1.6 Significance of the Study
- 1.7 Limitations of the Study
- 1.8 Challenges of the study
- 1.9 Research contributions
- 1.10 Key Terms
- 1.10.1 Sentiment Analysis
- 1.10.2 Natural Language Processing (NLP)
- 1.10.3 Arabizi NLP
- 1.10.4 Classifier
- 1.10.5 Big Data
- 1.10.6 Machine Learning Classifier
- 1.10.7 Lexicon-based Classifier
- 1.10.8 Customer Review
- 1.11 Research Outline
- Chapter II: Literature Review
- 2.1 Literature Review
- 2.2 Natural Language Processing (NLP)
- 2.3 Big Data and Sentiment Analysis (SA)
- 2.4 Approaches to SA
- 2.4.1 Lexicon-Based Approach
- 2.4.2 Machine Learning Approach
- 2.4.3 Hybrid Approach
- 2.5 Arabizi and the Lebanese Dialect
- 2.6 Sentiment Analysis and Lebanese Arabizi
- Chapter III: Research Methodology
- 3.1 Research Methodology
- 3.2 Research Design
- 3.3 Research Sample
- 3.3.1 The Challenges of Analyzing Arabizi Texts
- 3.4 Data Preprocessing and Filtering
- 3.4.1 Removal of reviews with "neutral" sentiment
- 3.4.2 Ratings' Encodings
- 3.4.3 Data splitting for training and testing
- 3.4.4 Data Cleaning
- 3.5 Reviews Representation
- 3.5.1 Selected Features
- 3.6 Research Tools
- 3.6.1 Machine Learning Classifier
- 3.6.2 Lexicon-based Classifier
- 3.7 Research Procedure
- Chapter IV: Experimentation and Results
- 4.1 Experiment Preparation
- 4.2 Data Preprocessing
- 4.3 Feature Extraction
- 4.4 Building Classifiers
- 4.4.1 Machine Learning
- 4.4.2 Lexicon-based
- 4.5 Results and Evaluation
- Chapter V: Results and Discussion
- 5.1 Research Result
- 5.2 Machine Learning
- 5.2.1 First phase (Default settings)
- 5.2.2 Second phase (hyperparameters tuning settings)
- 5.2.3 Experiment Summary
- 5.3 Lexicon-based
- 5.3.1 Experiment Summary
- 5.4 Discussion
Objectives and Key Themes
This thesis aims to analyze the sentiment expressed in Lebanese Arabizi customer reviews using both machine learning and lexicon-based approaches. The study investigates the challenges of applying sentiment analysis techniques to this informal, transliterated form of Arabic.
- Sentiment analysis of Lebanese Arabizi text
- Comparison of machine learning and lexicon-based approaches
- Data preprocessing and feature extraction for Arabizi
- Challenges of applying NLP to informal language
- Evaluation of sentiment analysis accuracy in the context of Arabizi
Chapter Summaries
Chapter I: Introduction: This chapter introduces the research topic, focusing on the challenges of sentiment analysis in the context of Lebanese Arabizi, a transliterated form of Arabic. It outlines the study's problem statement, purpose, research questions, hypotheses, significance, limitations, challenges, and contributions. Key terms such as sentiment analysis, NLP, and Arabizi NLP are defined for use in the subsequent chapters. The chapter closes with an outline of the overall structure of the thesis.
Chapter II: Literature Review: This chapter presents a comprehensive review of existing literature related to sentiment analysis, natural language processing (NLP), and big data. It explores various approaches to sentiment analysis, including lexicon-based, machine learning, and hybrid methods. The chapter then delves into the specifics of Arabizi and the Lebanese dialect, examining its unique linguistic characteristics and the challenges it poses for NLP tasks. Finally, it reviews existing research on sentiment analysis within the context of Arabizi and similar informal language varieties, highlighting gaps in the literature that this thesis aims to address.
Chapter III: Research Methodology: This chapter details the research design and methodology of the study. It describes the research sample of Lebanese Arabizi customer reviews and the challenges of analyzing Arabizi texts. It then describes the preprocessing and filtering steps that prepare the data for analysis: removal of reviews with neutral sentiment, encoding of ratings, splitting of the data for training and testing, and data cleaning. The chapter also specifies the features selected to represent the reviews and the research tools used (machine learning and lexicon-based classifiers), and closes with the research procedure, so that the methods can be replicated and scrutinized.
Chapter IV: Experimentation and Results: This chapter describes the experimental setup: preparation of the experiment, data preprocessing, feature extraction, and the building of both machine learning and lexicon-based classifiers. It also presents the results of both experiments and compares the two approaches quantitatively in terms of accuracy and performance.
Chapter V: Results and Discussion: This chapter presents a detailed analysis of the experimental results obtained from both the machine learning and lexicon-based approaches to sentiment analysis. It compares the performance of each approach, identifying the strengths and weaknesses of both. The discussion section critically evaluates the findings in the context of the research questions and hypotheses, providing insightful commentary on the implications of the results for future research in sentiment analysis of similar informal language varieties. It offers a nuanced perspective on the accuracy and limitations of the employed methods.
Keywords
Sentiment analysis, Lebanese Arabizi, Natural Language Processing (NLP), Machine learning, Lexicon-based, Arabic dialects, Customer reviews, Big data, Text classification, Informal language processing.
Frequently Asked Questions: Sentiment Analysis of Lebanese Arabizi Customer Reviews
What is the main topic of this thesis?
This thesis focuses on analyzing sentiment expressed in Lebanese Arabizi customer reviews using both machine learning and lexicon-based approaches. It investigates the challenges of applying sentiment analysis techniques to this informal, transliterated form of Arabic.
What are the key objectives of this research?
The key objectives include analyzing sentiment in Lebanese Arabizi text, comparing machine learning and lexicon-based approaches, addressing data preprocessing and feature extraction challenges specific to Arabizi, exploring the challenges of applying NLP to informal language, and evaluating sentiment analysis accuracy within the Arabizi context.
What methodologies are used in this research?
The research employs both machine learning and lexicon-based approaches to sentiment analysis. The methodology includes data collection, preprocessing (including cleaning and normalization of informal text), feature extraction, classifier building, and performance evaluation. Specific steps are detailed in Chapter III.
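As an illustration of the machine learning side of this pipeline (feature extraction followed by classifier building), here is a minimal sketch in plain Python. It assumes bag-of-words features and a multinomial Naive Bayes classifier with add-one smoothing; this summary does not name the classifiers or features the thesis actually uses, so both choices are assumptions. The example reviews and the names `tokenize` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters and digits. Digits are kept because
    # Arabizi uses them for Arabic letters (for example "7" and "3").
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts (illustrative choice only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus add-one smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, not taken from the thesis corpus.
train_texts = ["ktir 7elo", "akel tayyeb", "service 3atel", "ktir bishe3"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict("akel 7elo")` on this toy data returns `"positive"`, because both words occur only in the positive training reviews.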
What are the key challenges addressed in this research?
The research addresses several key challenges: the inherent informality and transliteration of Lebanese Arabizi, data preprocessing difficulties (cleaning, normalization), selection of appropriate features for classification, and comparing the effectiveness of machine learning and lexicon-based approaches for this specific language variety.
What are the key findings of this research?
The key findings are presented in Chapter V, comparing the performance of machine learning and lexicon-based approaches. The discussion section critically evaluates these results, highlighting strengths and weaknesses of each method in the context of Lebanese Arabizi sentiment analysis. Specific details about the performance of each approach (including metrics) are included in the chapter.
What datasets were used?
The thesis utilizes a dataset of Lebanese Arabizi customer reviews. Chapter III provides details on the data collection process, sample size, and any preprocessing steps (like removal of neutral reviews, ratings encoding, and data splitting for training/testing) applied to the data.
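The preprocessing steps named above (removal of neutral reviews, ratings encoding, and a train/test split) can be sketched as follows. This is a minimal illustration under stated assumptions: a 1 to 5 star scale with 3 treated as neutral, binary labels, and an 80/20 split are all hypothetical values chosen for the example, not figures reported in this summary. The function name `prepare` and the sample reviews are also invented for illustration.

```python
import random


def prepare(reviews, neutral_rating=3, test_fraction=0.2, seed=0):
    """Filter, encode, and split a list of (text, star_rating) pairs."""
    labeled = []
    for text, rating in reviews:
        if rating == neutral_rating:
            continue  # drop reviews with "neutral" sentiment
        # Encode the rating: 1 = positive, 0 = negative (assumed encoding).
        label = 1 if rating > neutral_rating else 0
        labeled.append((text, label))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labeled)

    n_test = int(len(labeled) * test_fraction)
    return labeled[n_test:], labeled[:n_test]  # (training set, held-out set)


# Hypothetical sample, not taken from the thesis corpus.
sample = [
    ("ktir 7elo", 5),
    ("3atel", 1),
    ("3adeh", 3),
    ("tayyeb", 4),
    ("bishe3", 2),
    ("mni7", 5),
]
train_set, held_out = prepare(sample)
```

With the six sample reviews, one neutral review is dropped, leaving four reviews for training and one held out for testing.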
What tools and techniques were used for the analysis?
The research employs both machine learning classifiers and lexicon-based classifiers. Chapter III details the specific tools and techniques utilized in each approach. The chapter also discusses the steps involved in feature extraction for both methods.
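To make the lexicon-based approach concrete, the sketch below scores a review by summing the polarities of the words it contains. The tiny lexicon, the single negator, and the rule that a negator flips only the next word are assumptions made for illustration; the thesis's actual lexicon and scoring rules are not given in this summary.

```python
import re

# Hypothetical mini-lexicon of Lebanese Arabizi words (+1 positive, -1 negative).
LEXICON = {"7elo": 1, "tayyeb": 1, "mni7": 1, "3atel": -1, "bishe3": -1}
NEGATORS = {"mish"}  # assumed negation word ("not")


def lexicon_score(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True  # flip the polarity of the next token only
            continue
        polarity = LEXICON.get(tok, 0)
        score += -polarity if negate else polarity
        negate = False
    return score


def classify(text):
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"  # no sentiment words found, or they cancel out
```

For example, `classify("el akel ktir 7elo")` returns `"positive"` and `classify("mish 7elo")` returns `"negative"`. A design point worth noting is that Arabizi spelling is not standardized, so a practical lexicon would likely need several spelling variants per word.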
How are the results evaluated?
The results are evaluated based on the accuracy and performance of both the machine learning and the lexicon-based approaches. Chapters IV and V present the detailed evaluation, providing metrics and comparisons between the two methodologies. The evaluation considers factors crucial for analyzing sentiment in informal language.
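As a reference for how such an evaluation is commonly computed for a binary classifier, the sketch below derives accuracy, precision, recall, and F1 from predicted and true labels. Only accuracy is named explicitly in this summary; the other three metrics are standard companions and are an assumption about what the thesis reports. The function name `evaluate` and the example labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive review, 0 = negative review.
metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this example three of five predictions are correct, so accuracy is 0.6, while precision and recall are both 2/3.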
What are the limitations of this study?
The limitations of the study are discussed in Chapter I. These limitations likely pertain to the scope of the dataset, the specific methodologies chosen, or potential biases inherent in the data or approach.
What are the contributions of this research?
The research contributes to the understanding of sentiment analysis in informal and transliterated language. It offers insights into the effectiveness of different approaches (machine learning vs. lexicon-based) and highlights challenges and solutions for applying NLP to similar contexts. The contribution is detailed in Chapter I.
What are the key terms defined in this research?
Key terms defined include: Sentiment Analysis, Natural Language Processing (NLP), Arabizi NLP, Classifier, Big Data, Machine Learning Classifier, Lexicon-based Classifier, and Customer Review. These are defined in Chapter I.
- Cite this paper
- Marwan Al Omari (Author), 2019, Feelings and thoughts of Arabizi language users in Lebanon territory. An efficient OM model for automatically processing Arabizi language system, Munich, GRIN Verlag, https://www.grin.com/document/537244