Text Classification & Tokenization Project
Project Overview
This project processes text articles that have been manually labeled by sorting them into category folders, and prepares them for classification.
The workflow includes:
- Loading labeled text data
- Tokenizing and cleaning the text
- Preparing features for future machine learning models
The dataset is stored in a ZIP file, Articles.zip, in which each folder represents one label/category. Unzip Articles.zip before running the code.
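
As a rough illustration of the loading and tokenizing steps, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names (`extract_archive`, `load_labeled_texts`, `tokenize`) are hypothetical, and it assumes the articles are UTF-8 `.txt` files sitting directly inside one folder per category. If the archive wraps everything in an extra top-level folder, point `load_labeled_texts` at that folder instead.

```python
import re
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unpack the ZIP archive into dest_dir and return that directory."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def load_labeled_texts(root_dir):
    """Return (text, label) pairs, using each subfolder name as the label.

    Assumption: every category folder holds plain .txt files.
    """
    samples = []
    for folder in sorted(Path(root_dir).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files next to the category folders
        for path in sorted(folder.glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="ignore")
            samples.append((text, folder.name))
    return samples


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())
```

A typical call sequence would be `root = extract_archive("Articles.zip", "data")`, then `samples = load_labeled_texts(root)`, then `tokenize(text)` for each article. The regular-expression tokenizer is deliberately simple (it drops punctuation and non-ASCII letters); a library tokenizer may be a better fit depending on the language of the articles.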