Identifying Duplicate Quora Question Pairs (Kaggle Competition Bronze Medal Winner)

  • We explored current methods in NLP, including word2vec embeddings (the gensim package in Python), LSTMs (using the Keras neural networks API), tf-idf, the Python nltk package, etc.
  • We built machine learning models that identified duplicate Quora question pairs with high accuracy (log loss ~0.151)
  • We ranked in the top 8% in this Kaggle …
more ...


Creating Word Cloud in Python

In text analysis, creating word clouds is a useful technique for visualizing text data. Words drawn bigger and bolder occur more frequently in the corpus. In other words, key words stand out and catch our eyes. The colors of the text are generated randomly.
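
To make the size-by-frequency idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from the post: the function name `word_sizes`, the font-size range, and the sample text are all assumptions chosen for illustration. It counts how often each word occurs and scales a font size linearly between a minimum and a maximum.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency.

    Hypothetical helper for illustration; min_size and max_size are
    arbitrary point sizes, not values taken from the original post.
    """
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}

    low = min(counts.values())
    top = max(counts.values())
    # Avoid dividing by zero when every word occurs equally often.
    span = (top - low) or 1

    # Linear scaling: rarest words get min_size, most frequent get max_size.
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }


# Made-up sample text, purely for demonstration.
sample = "data science data blog data python blog"
sizes = word_sizes(sample)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

A real word cloud also needs layout and coloring, which a dedicated library such as the `wordcloud` package can handle; the sketch above only covers the frequency-to-size step.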

It is …

more ...

How I Build My First Pelican Blog

After completing several data science projects, I am eager to document them and share them with people. It took me several days to research, set up and write my blog, but I feel it can be much easier and faster to build a Pelican blog, so I am sharing with …

more ...