Building a Word Embeddings Model

Posted on Mon 28 November 2016 in data science • Tagged with python, nlp, word2vec, tensorflow

One of the reasons I find Natural Language Processing interesting is that it gives you ways to turn textual data into a numeric representation, which lets you make comparisons and find associations between words and their context. It's intriguing how some of these representations encode semantic meaning, and how with mathematical operations we can decompose them and gain insight into how we think and communicate.

FML or Just Made My Day?

Posted on Sat 24 September 2016 in data science • Tagged with python, nlp, ml, naive-bayes

In this post I implement a Naive Bayes classifier that can differentiate posts from FML and Just Made My Day, where the former are generally depressing and the latter uplifting. By looking at a sample of these posts we can get some ideas for the model features.
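As a rough illustration of the idea, here is a minimal bag-of-words Naive Bayes sketch in plain Python. It is not the code from the post: the toy posts, the labels `fml` and `jmmd`, and the helper names `tokenize`, `train`, and `predict` are all made up for this example, and the only features are raw word counts with add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_posts):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_posts:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy posts, only for illustration (not real data from either site).
toy_posts = [
    ("today i lost my job and my car broke down", "fml"),
    ("my phone died and i missed my flight", "fml"),
    ("a stranger paid for my coffee today", "jmmd"),
    ("my friend surprised me with a cake", "jmmd"),
]
model = train(toy_posts)
```

Calling `predict(model, "i missed my flight and lost my phone")` on this toy model picks the `fml` label, since most of those words only appear in the gloomy examples. A real classifier would need far more data and better features than raw word counts.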

Topic Modeling on Bukowski's Poems

Posted on Fri 16 September 2016 in data science • Tagged with python, nlp, topic-modeling, bukowski

In the first post about Bukowski's poems we explored the top words and their polarity. From inspection, these groups seemed to be associated with 4 main topics, which also happen to be mentioned on the writer's legacy website. It would be interesting to see if these same topics show up when applying a generative statistical model, such as Latent Dirichlet Allocation (LDA). To do this and visualize the results I'll use the pyLDAvis and scikit-learn packages.

Bukowski's Poems Sentiment Analysis

Posted on Thu 25 August 2016 in data science • Tagged with python, nlp, word2vec, bukowski

As a byproduct of the neural network project that attempts to write a Bukowski poem, I ended up with a pickle file containing a large sample of his poems (1363). I'll use the data to perform basic sentiment analysis on the writings, and see what insights can be extracted from them.