How to Deal with High Cardinality Categorical Variables

Background and Methods

In machine learning problems, we often encounter categorical features such as gender, address, and zip code. For low-cardinality attributes, which take only a small number of possible values, one-hot encoding (OHE) is widely used. This encoding scheme represents each value of the original categorical …
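As a rough illustration of one-hot encoding in general (not necessarily the approach taken in the full post), the sketch below maps each distinct category to a binary indicator vector using only the standard library. The function name `one_hot_encode` and the sample `colors` list are hypothetical, chosen for this example.

```python
def one_hot_encode(values):
    """Return the sorted distinct categories and one indicator row per input value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)  # one column per distinct category
        row[index[v]] = 1            # mark the column matching this value
        encoded.append(row)
    return categories, encoded


# Hypothetical sample data with three distinct values.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for value, row in zip(colors, encoded):
    print(value, row)
```

Each row has exactly one 1, and the row width equals the number of distinct values, so the encoded width grows with the cardinality of the feature.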

more ...


Creating Word Cloud in Python

In text analysis, a word cloud is a useful way to visualize text data. Words that appear larger and bolder occur more frequently in the corpus. In other words, key words stand out and catch the eye. The colors of the words are generated randomly.
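The idea can be sketched without any plotting library: count word frequencies, scale each count to a font size, and pick a random color per word. This is a minimal sketch, not the method used in the full post. The function `word_cloud_layout`, its linear size scaling, and the `min_size`/`max_size`/`seed` parameters are all assumptions made for illustration, and the code computes sizes and colors only, without rendering an image.

```python
import random
import re
from collections import Counter


def word_cloud_layout(text, min_size=10, max_size=60, seed=0):
    """Map each word to (count, font_size, hex_color); sizes scale linearly with count."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    rng = random.Random(seed)  # seeded so the random colors are repeatable
    layout = {}
    for word, count in counts.items():
        # Assumed linear scaling: rarest word gets min_size, most frequent gets max_size.
        scale = (count - lo) / (hi - lo) if hi > lo else 1.0
        size = round(min_size + scale * (max_size - min_size))
        color = "#{:06x}".format(rng.randrange(0x1000000))
        layout[word] = (count, size, color)
    return layout


# Hypothetical sample text.
sample = "data science data python data python cloud"
layout = word_cloud_layout(sample)
for word, (count, size, color) in sorted(layout.items(), key=lambda kv: -kv[1][0]):
    print(f"{word:8s} count={count} size={size} color={color}")
```

Here the most frequent word gets the largest size and the rarest words get the smallest, which is the visual effect described above. Real word cloud libraries typically apply their own scaling and placement rules.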

It is …

more ...