View on GitHub


NLP, Text Mining and Machine Learning starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, keyword extraction with TFIDF, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Word Count and Reading CSV & JSON files with PySpark