Business Problem
Retail companies receive a large number of customer support requests every day through:
- support portals
- helpdesk systems
- chatbots
- mobile applications
Each request must be routed to the correct department, such as billing, delivery, or technical support.
Goal
The goal of this assignment is to build a machine learning system that automatically classifies customer support tickets into the correct department using Natural Language Processing (NLP).
Dataset
Click here to download the dataset.
Classification Problem
This is a multi-class text classification problem: the ML model must assign each ticket's text to exactly one class (department).
Technologies and Tools
The following tools can be used to build the ML system.
Data Analysis
- Pandas
- NumPy
NLP Libraries
- spaCy
- NLTK
Vectorization Techniques
- TF-IDF using TfidfVectorizer
- Bag-of-Words using CountVectorizer
- spaCy word vectors
Models
- Logistic Regression
- Multinomial Naive Bayes
- Support Vector Machine (SVM)
- Random Forest
Data Processing
Before training the machine learning model, the dataset must be cleaned and processed using NLP techniques.
The data processing stage removes unwanted fields and tokens from the dataset. It may include:
- removing unwanted features
- converting text to lowercase
- removing punctuation and special characters
- removing stop words
- optional: lemmatization or stemming
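The cleaning steps above can be sketched as a single function. This is a minimal illustration using only the standard library; the stop-word set and the sample ticket are assumptions made for the example, and NLTK or spaCy provide fuller stop-word lists and lemmatizers. Dropping unwanted columns is left out because the dataset's column names are not given here.

```python
import re

# Tiny illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "was", "my", "i", "to", "and", "of", "for", "it"}


def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation/special characters, drop stop words."""
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("My ORDER #4521 was not delivered!!"))
# prints: order 4521 not delivered
```

Negation words such as "not" are deliberately kept out of the stop-word set here, since removing them can change the meaning of a ticket.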
Text Vectorization
Machine learning models cannot work with raw text directly, so the text must be converted into numerical feature vectors using a vectorization technique.
ML Training
Split the dataset into training and testing sets, then train the ML models with a range of hyper-parameter values.
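One possible way to combine the split with a hyper-parameter search is scikit-learn's GridSearchCV over a Pipeline. The sketch below is an assumption-laden example: the twelve tickets, the label names, and the parameter grid are invented for illustration, and Logistic Regression stands in for any of the listed models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Made-up tickets and labels, for illustration only.
texts = [
    "i was charged twice for my order", "refund has not reached my account",
    "invoice shows the wrong amount", "my card was declined at checkout",
    "package has not arrived yet", "courier delivered to the wrong address",
    "tracking shows my parcel is delayed", "order arrived with missing items",
    "app crashes when i open the cart", "cannot log in to my account",
    "website shows an error on payment page", "password reset link does not work",
]
labels = ["billing"] * 4 + ["delivery"] * 4 + ["technical"] * 4

# Stratify so every department appears in both the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: vectorizer and classifier settings are searched together.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so no vocabulary information leaks from the held-out fold.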
ML Evaluation
Evaluate model performance using scikit-learn metrics, and save the best model together with the hyper-parameters and vectorization technique that gave the highest accuracy.
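A minimal sketch of this step, assuming two candidate pipelines and a hypothetical output file name best_model.joblib: each candidate is scored with accuracy_score and classification_report, and the highest-scoring pipeline is saved with joblib. The tickets and labels are made up for the example.

```python
import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training and testing data, for illustration only.
train_texts = [
    "i was charged twice for my order", "refund has not reached my account",
    "invoice shows the wrong amount",
    "package has not arrived yet", "courier delivered to the wrong address",
    "tracking shows my parcel is delayed",
    "app crashes when i open the cart", "cannot log in to my account",
    "password reset link does not work",
]
train_labels = ["billing"] * 3 + ["delivery"] * 3 + ["technical"] * 3
test_texts = [
    "my card was charged the wrong amount",
    "parcel delivered to wrong address",
    "app crashes when i log in",
]
test_labels = ["billing", "delivery", "technical"]

# Each candidate bundles a vectorizer with a classifier.
candidates = {
    "bow + naive bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "tfidf + linear svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

scores = {}
for name, model in candidates.items():
    model.fit(train_texts, train_labels)
    predictions = model.predict(test_texts)
    scores[name] = accuracy_score(test_labels, predictions)
    print(name, "accuracy:", scores[name])
    print(classification_report(test_labels, predictions, zero_division=0))

# Keep the pipeline with the highest accuracy (vectorizer and model together).
best_name = max(scores, key=scores.get)
joblib.dump(candidates[best_name], "best_model.joblib")  # hypothetical file name
print("saved:", best_name)
```

Saving the whole pipeline keeps the fitted vectorizer and its settings alongside the classifier, so inference needs only one file. When many candidates are compared, choosing among them with cross-validation or a separate validation split keeps the final test score an honest estimate.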
Model Inference
Build a simple interface using Streamlit that runs inference with the saved model to classify text entered by the user.
Note: Once the ML system is ready and tested, share the GitHub link for validation and comment below on which model performed best.
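A Streamlit front end for this could look roughly like the sketch below. It assumes a fitted scikit-learn pipeline was saved earlier under the hypothetical name best_model.joblib; the helper predict_department, the labels on the widgets, and the file name app.py are all illustrative choices, not requirements of the assignment.

```python
from pathlib import Path

import joblib

MODEL_PATH = Path("best_model.joblib")  # hypothetical file name


def predict_department(model, text: str) -> str:
    """Return the predicted department for one ticket text."""
    return str(model.predict([text])[0])


def main() -> None:
    import streamlit as st  # imported here so the helper above works without Streamlit

    st.title("Support Ticket Classifier")

    if not MODEL_PATH.exists():
        st.error(f"Model file not found: {MODEL_PATH}")
        return

    model = joblib.load(MODEL_PATH)
    text = st.text_area("Enter the ticket text")

    # Classify only when the button is pressed and the box is not empty.
    if st.button("Classify") and text.strip():
        st.write("Predicted department:", predict_department(model, text))


if __name__ == "__main__":
    main()
```

If the code is saved as app.py, it can be started with `streamlit run app.py`.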
