A Gentle Introduction to Curve Fitting

Posted on Mon 06 November 2017 in data science • Tagged with python, optimization, modeling

Suppose you find yourself in nature, recording pairs of data points for some important endeavour. Maybe you are a biologist looking at insects' age vs. height, or an ethnologist comparing mortality rates across municipalities with varying climatological conditions. Once you have collected all your data, you will want to formulate a hypothesis of a general model that describes it, which will help you to:

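The example below is a minimal sketch of the idea. It assumes the simplest possible model (a straight line) and uses made-up age/height measurements. NumPy's `np.polyfit` finds the slope and intercept that minimize the squared error, and the fitted line can then predict values you never measured.

```python
import numpy as np

# Hypothetical measurements: x could be insect age (days), y its height (mm).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a first-degree polynomial y ~ m*x + b by least squares.
# np.polyfit returns coefficients from the highest degree down: [m, b].
m, b = np.polyfit(x, y, deg=1)

# Use the fitted model to predict a value outside the measured range.
predicted = m * 6.0 + b
print(f"slope={m:.3f}, intercept={b:.3f}, prediction at x=6: {predicted:.2f}")
```

A straight line is only one hypothesis. If the data look curved, raising `deg` or switching to a nonlinear model is the natural next step.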

Building a Word Embeddings Model

Posted on Mon 28 November 2016 in data science • Tagged with python, nlp, word2vec, tensorflow

One of the reasons I find Natural Language Processing interesting is that it provides ways to turn textual data into a numeric representation, which allows you to make comparisons and find associations between words and their context. It's very intriguing how some of these representations encode semantic meaning, and how with mathematical operations we can decompose them and gain insights into how we think and communicate.

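The example below shows what "mathematical operations on representations" can look like. It uses toy 3-dimensional vectors invented purely for illustration, not vectors from a trained model. Words are compared by cosine similarity, and the classic analogy is answered by vector arithmetic: king - man + woman lands closest to queen.

```python
import numpy as np

# Toy 3-dimensional "embeddings" invented for illustration; a real model
# (e.g. word2vec) learns vectors with hundreds of dimensions from a corpus.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2]),
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

answer = analogy("man", "king", "woman", embeddings)
print(answer)  # queen
```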

A PCA Classifier for Landscapes

Posted on Wed 12 October 2016 in data science • Tagged with python, ml, images

Neural networks have become very trendy nowadays, in part because of their superior performance on image recognition tasks. Take for instance the Deep Visual-Semantic Alignments network from Stanford, which is able to generate sentence descriptions from images and achieve other impressive results. But this power comes at a cost: most of these models are trained on thousands (or millions) of inputs and require expensive GPU computation. Sometimes, if our image-classification task is simpler, we can get away with much simpler approaches.

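This excerpt does not say exactly how PCA is turned into a classifier. The code below sketches one common approach, which is not necessarily the one used in the post. It fits a separate PCA subspace per class and assigns a new sample to the class whose subspace reconstructs it with the smallest error. The class name, the labels "forest" and "beach", and the 3-feature synthetic data are all hypothetical; real images would be flattened into long pixel vectors.

```python
import numpy as np

class PCAReconstructionClassifier:
    """Hypothetical classifier: pick the class whose PCA subspace reconstructs a sample best."""

    def __init__(self, n_components=2):
        self.n_components = n_components
        self.models = {}  # label -> (class mean, principal axes)

    def fit(self, X, y):
        for label in np.unique(y):
            Xc = X[y == label]
            mean = Xc.mean(axis=0)
            # Rows of Vt are principal axes, ordered by explained variance.
            _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
            self.models[label] = (mean, Vt[: self.n_components])
        return self

    def _reconstruction_error(self, X, mean, axes):
        centered = X - mean
        projected = centered @ axes.T @ axes  # project onto the subspace and back
        return np.linalg.norm(centered - projected, axis=1)

    def predict(self, X):
        labels = list(self.models)
        errors = np.stack(
            [self._reconstruction_error(X, *self.models[l]) for l in labels], axis=1
        )
        return np.array(labels)[errors.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic data: each class is spread along its own direction, plus small noise.
forest = rng.normal(size=(50, 1)) * np.array([1.0, 0.0, 0.0])
beach = rng.normal(size=(50, 1)) * np.array([0.0, 1.0, 0.0]) + np.array([0.0, 0.0, 5.0])
X = np.vstack([forest, beach]) + rng.normal(scale=0.05, size=(100, 3))
y = np.array(["forest"] * 50 + ["beach"] * 50)

clf = PCAReconstructionClassifier(n_components=1).fit(X, y)
pred = clf.predict(np.array([[2.0, 0.0, 0.0], [0.0, -2.0, 5.0]]))
print(pred)
```

The appeal matches the excerpt's point: training is just one SVD per class, with no GPU required.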

FML or Just Made My Day?

Posted on Sat 24 September 2016 in data science • Tagged with python, nlp, ml, naive-bayes

In this post I implement a Naive Bayes classifier that can distinguish posts from FML and from Just Made My Day, where the former are generally depressing and the latter uplifting. Looking at a sample of these posts gives us some ideas for the model features.
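Below is a minimal sketch of a bag-of-words multinomial Naive Bayes with add-one smoothing, written from scratch so each step is visible. The training sentences and the "FML"/"JMMD" labels are made up for illustration. The actual features used in the post may differ.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors, self.word_counts, self.vocab = {}, {}, set()
        for c in self.classes:
            class_docs = [d for d, l in zip(docs, labels) if l == c]
            # Log prior: fraction of training documents in this class.
            self.priors[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            self.word_counts[c] = counts
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            counts = self.word_counts[c]
            total = sum(counts.values())
            score = self.priors[c]
            for w in tokenize(doc):
                # Add-one smoothing keeps unseen words from zeroing out the probability.
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

docs = [
    "today my boss yelled at me and i lost my keys",
    "my car broke down and i missed the interview",
    "a stranger paid for my coffee this morning",
    "my friend surprised me with tickets to the show",
]
labels = ["FML", "FML", "JMMD", "JMMD"]

model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict("my boss yelled and my car broke"))  # FML
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow on longer posts.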

Topic Modeling on Bukowski's Poems

Posted on Fri 16 September 2016 in data science • Tagged with python, nlp, topic-modeling, bukowski

In the first post about Bukowski's poems we explored the top words and their polarity. On inspection, these groups seemed to be associated with 4 main topics, which also happen to be mentioned on the writer's legacy website. It would be interesting to see whether these same topics show up when applying a generative statistical model such as Latent Dirichlet Allocation (LDA). To do this and visualize the results I'll use the pyLDAvis and scikit-learn packages.


Bukowski's Poems Sentiment Analysis

Posted on Thu 25 August 2016 in data science • Tagged with python, nlp, word2vec, bukowski

As a byproduct of the neural network project that attempts to write a Bukowski poem, I ended up with a pickle file containing a large sample of his poems (1363). I'll use this data to perform basic sentiment analysis on the writings and see what insights can be extracted from them.

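The excerpt does not say which sentiment tool the analysis uses. To illustrate the basic idea, the code below sketches a simple lexicon-based scorer. The tiny `LEXICON` and the sample lines are hypothetical. Each text's polarity is the average score of the lexicon words it contains.

```python
import re

# Hypothetical mini-lexicon for illustration only; real analyses typically rely
# on a curated resource with thousands of scored words.
LEXICON = {"love": 1, "beautiful": 1, "laugh": 1, "alone": -1, "death": -1, "pain": -1}

def polarity(text):
    """Average lexicon score of the matched words, in [-1, 1]; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

samples = ["I love the beautiful night", "alone with pain and death", "the dog barked"]
results = [polarity(s) for s in samples]
print(results)  # [1.0, -1.0, 0.0]
```

Averaging over matched words keeps long and short poems on the same scale, at the cost of ignoring negation ("not love") and intensity.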