Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with Logistic Regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Text Classification with Logistic Regression

Learn how to build your first text classifier using Logistic Regression in Python. The challenge is to assign each news article the appropriate category from a set of 31 categories.
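To give a feel for the idea before opening the notebook, here is a minimal sketch of multi-class logistic regression (softmax) over bag-of-words counts, written in plain Python. It is not the notebook's code: the toy headlines, the three category labels, the whitespace tokenizer, and the function names `tokenize`, `predict_proba`, `train`, and `classify` are all illustrative assumptions, and a real project would more likely rely on a machine learning library and the full set of categories.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def predict_proba(counts, weights, bias):
    """Softmax over per-class linear scores for one bag-of-words vector."""
    scores = {
        c: bias[c] + sum(weights[c].get(tok, 0.0) * n for tok, n in counts.items())
        for c in weights
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train(docs, labels, epochs=200, lr=0.1):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    classes = sorted(set(labels))
    weights = {c: {} for c in classes}
    bias = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            counts = Counter(tokenize(doc))
            probs = predict_proba(counts, weights, bias)
            for c in classes:
                # Gradient of the log loss w.r.t. the class score.
                error = probs[c] - (1.0 if c == label else 0.0)
                bias[c] -= lr * error
                for tok, n in counts.items():
                    weights[c][tok] = weights[c].get(tok, 0.0) - lr * error * n
    return weights, bias


def classify(text, weights, bias):
    """Return the most probable category; unseen words contribute nothing."""
    probs = predict_proba(Counter(tokenize(text)), weights, bias)
    return max(probs, key=probs.get)


# Toy data: hypothetical headlines and labels, not the notebook's dataset.
docs = [
    "team wins the championship game",
    "striker scores late goal in the match",
    "stocks fall as markets react to rates",
    "central bank raises interest rates",
    "new phone features a faster chip",
    "software update fixes security bug",
]
labels = ["SPORTS", "SPORTS", "BUSINESS", "BUSINESS", "TECH", "TECH"]

weights, bias = train(docs, labels)
print(classify("bank raises interest rates", weights, bias))
```

The same pattern scales to more categories without code changes, since the set of classes is derived from the labels; what changes in practice is the feature extraction and the amount of training data.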

Running the Notebook

  1. From the command line, clone this repo.
     ```
     git clone <this repo url>
     ```
  2. Switch to the text-classification directory of this repo.
     ```
     cd nlp-in-practice/text-classification
     ```
  3. Start Jupyter Notebook.
     ```
     jupyter notebook
     ```
  4. Go to the notebooks directory, select the notebook you would like to run, and re-run the cells.