

Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.


Running the TF-IDF Keyword Extraction Tutorial Notebook

  1. From the command line, clone this repo.
    git clone <this repo url>
  2. Switch to the tf-idf directory of the repo.
    cd nlp-in-practice/tf-idf
  3. Start Jupyter Notebook.
    jupyter notebook
  4. In the browser tab that opens, select Keyword Extraction with TF-IDF and SKlearn.ipynb. You can now re-run the cells and extend the code.
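
For a rough idea of what TF-IDF keyword extraction does before opening the notebook, here is a minimal standard-library sketch. It is not the notebook's code and does not use scikit-learn. The function names (`tokenize`, `top_keywords`), the tiny stop-word set, and the sample documents are all made up for illustration. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1, and no vector normalization is applied.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this sketch only.
STOP_WORDS = {"the", "on", "a", "and"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(docs, doc_index, k=3):
    """Return the k highest-scoring TF-IDF terms for docs[doc_index]."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents each term appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    # Score = relative term frequency * smoothed inverse document frequency.
    scores = {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up sample documents.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

for i in range(len(docs)):
    print(i, top_keywords(docs, i, k=2))
```

Terms that appear in only one document (such as "mat" in the first one) score highest, which is the intuition the notebook builds on with a real dataset.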