nlp-in-practice

Starter code for solving real-world text data problems. Includes: Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.

Usage examples of scikit-learn's `TfidfTransformer` and `TfidfVectorizer`, and the differences between the two.

First, clone this repo from the command line:
```
git clone <this repo url>
```
Next, switch to the repo's `tfidftransformer` directory:
```
cd nlp-in-practice/tfidftransformer
```
Then, launch Jupyter Notebook:
```
jupyter notebook
```
Open `TFIDFTransformer vs. TFIDFVectorizer Notebook.ipynb`, then re-run the cells and reuse the code!