nlp-in-practice

NLP, text mining, and machine learning starter code for solving real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, keyword extraction with TF-IDF, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Usage examples of scikit-learn's TfidfTransformer and TfidfVectorizer, and the differences between the two.
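
As a rough illustration of the difference (a minimal sketch, not the notebook's own code; the three sample documents are made up for this example): TfidfTransformer expects a matrix of term counts, so it is paired with CountVectorizer, while TfidfVectorizer takes raw text and does both steps at once. With default parameters the two routes should give the same weights.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical toy corpus, used only for this sketch.
docs = [
    "the house had a tiny little mouse",
    "the cat saw the mouse",
    "the mouse ran away from the house",
]

# Route 1: two steps. Count terms first, then reweight the counts with TF-IDF.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(docs)
tfidf_two_step = TfidfTransformer().fit_transform(counts)

# Route 2: one step. TfidfVectorizer goes from raw text to TF-IDF weights.
tfidf_vectorizer = TfidfVectorizer()
tfidf_one_step = tfidf_vectorizer.fit_transform(docs)

# With default settings on both routes, the resulting matrices should match.
same = np.allclose(tfidf_two_step.toarray(), tfidf_one_step.toarray())
print("Same TF-IDF weights:", same)
```

The two-step route can be handy when the raw counts are needed for something else as well; the one-step route is shorter when only the TF-IDF weights matter.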

Running the Notebook

  1. From the command line, clone this repo.
    git clone <this repo url>
    
  2. Next, switch to the tfidftransformer directory of this repo.
    cd nlp-in-practice/tfidftransformer
    
  3. Then, start Jupyter Notebook.
    jupyter notebook
    
  4. Select TFIDFTransformer vs. TFIDFVectorizer Notebook.ipynb, then re-run the cells and reuse the code.