Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Usage examples of scikit-learn's TfidfTransformer and TfidfVectorizer, and the differences between the two.
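As a rough illustration of the difference, here is a minimal sketch that is not taken from the notebook. The sample documents and variable names are made up for this example. It assumes default settings, under which TfidfVectorizer behaves like CountVectorizer followed by TfidfTransformer.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the house had a tiny little mouse",
    "the cat saw the mouse",
    "the mouse ran away from the house",
]

# Route 1: compute raw term counts first, then convert them to tf-idf.
# Useful when the counts themselves are also needed elsewhere.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(docs)
tfidf_transformer = TfidfTransformer()
tfidf_from_counts = tfidf_transformer.fit_transform(counts)

# Route 2: go from raw text to tf-idf in a single step.
tfidf_vectorizer = TfidfVectorizer()
tfidf_direct = tfidf_vectorizer.fit_transform(docs)

# With default parameters, both routes learn the same vocabulary
# and produce the same weights.
same_vocab = count_vectorizer.vocabulary_ == tfidf_vectorizer.vocabulary_
same_weights = np.allclose(tfidf_from_counts.toarray(), tfidf_direct.toarray())
print(same_vocab, same_weights)
```

In short, TfidfTransformer expects a matrix of counts as input, while TfidfVectorizer takes raw text. The two-step route is a reasonable choice when the term counts are needed for other purposes as well.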


Running the Notebook

  1. From the command line, clone this repo.
    git clone <this repo url>
  2. Next, switch to the tfidftransformer directory of this repo.
    cd nlp-in-practice/tfidftransformer
  3. Then start Jupyter Notebook.
    jupyter notebook
  4. Select TFIDFTransformer vs. TFIDFVectorizer Notebook.ipynb, then re-run the cells and reuse the code.