Use these NLP, text mining, and machine learning code samples and tools to solve real-world text data problems.
Notebooks / Source
Links in the first column take you to the subfolder/repository with the source code.
| Task | Related Article | Source Type | Description |
|---|---|---|---|
| Large Scale Phrase Extraction | phrase2vec article | Python script | Extract phrases from large amounts of data using PySpark. Annotate text with these phrases or use them in other downstream tasks. |
| Word Cloud for Jupyter Notebook and Python Web Apps | word_cloud article | Python script + notebook | Visualize top keywords using word counts or TF-IDF. |
| Gensim Word2Vec (with dataset) | word2vec article | notebook | How to work correctly with Word2Vec to get the desired results. |
| Reading files and word count with Spark | spark article | Python script | How to read files of different formats using PySpark, with a word count example. |
| Extracting Keywords with TF-IDF and SKLearn (with dataset) | tfidf article | notebook | How to extract interesting keywords from text using TF-IDF and scikit-learn. |
| Text Preprocessing | text preprocessing article | notebook | A few code snippets showing how to perform text preprocessing. Includes stemming, noise removal, lemmatization, and stop word removal. |
| TfidfTransformer vs. TfidfVectorizer | tfidftransformer and tfidfvectorizer usage article | notebook | How to use `TfidfTransformer` and `TfidfVectorizer` correctly, how the two differ, and when to use each. |
| Accessing Pre-trained Word Embeddings with Gensim | Pre-trained word embeddings article | notebook | How to access pre-trained GloVe and Word2Vec embeddings using Gensim, with an example of how these embeddings can be used for text similarity. |
| Text Classification in Python (with news dataset) | Text classification with Logistic Regression article | notebook | Get started with text classification. Learn how to build and evaluate a news text classifier using Logistic Regression. |
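
To give a rough feel for two of the tasks listed above (text preprocessing and TF-IDF keyword extraction), here is a minimal sketch that uses only the Python standard library. It is not the code from the linked notebooks, which rely on scikit-learn. The `STOP_WORDS` set, the `preprocess` and `tfidf_keywords` helpers, and the sample documents are all hypothetical, and the smoothed IDF formula is just one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the list used in the notebooks).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def preprocess(text):
    """Lowercase, replace non-letter noise with spaces, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        scores = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically to break ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results


docs = [
    "The cat sat on the mat with the other cat.",
    "Dogs and cats are popular pets.",
    "Spark is used for large scale data processing.",
]
keywords = tfidf_keywords(docs, top_n=2)
print(keywords)
```

With these sample documents, the first keyword for the first document is `cat`, because it is the only term that occurs twice there. For real data, the notebooks linked in the table are the better starting point.
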
- For more articles, please see this list.
- If you would like to receive articles via email, subscribe to my mailing list.