Significance Magazine Contribution
I am excited to have the opportunity to be a regular contributor to Significance Magazine. Below is the first of what I hope to be many contributions. I hope to see contributions from my talented (and...
Quick Look: Facebook’s Kaggle Competition
Following Friday’s news of yhat’s ggplot port (which I hope they promptly rename to avoid search engine conflation with other variants), I thought it’d be fun to explore the large Stack Overflow...
Mahalanobis Distance and Outliers
I wrote a short article on Absolute Deviation Around the Median a few months ago after having a conversation with Ryon regarding robust parameter estimators. I am excited to see a wet lab scientist...
The Cost Function of K-Means
When exploring a novel dataset, I believe most analysts will run through the familiar steps of generating summary statistics and/or plotting distributions and feature interactions. Clustering and PCA...
A Real World Introduction to Information Entropy
I’ve been using IPython notebook so much that it might finally be time to stand up a Pelican-based site on this server in order to utilize Jake Vanderplas’ IPython integration method. This post might...
Dynamic Time-Series Modeling
Today’s article will showcase a subset of Pandas’ time-series modeling capabilities. I’ll be using financial data to demonstrate the capabilities; however, the functions can be applied to any...
Regularized Logistic Regression Intuition
In this notebook we’ll manually implement regularized logistic regression in order to facilitate intuition about the algorithm’s underlying math and to demonstrate how regularization can address...
The 35-hour Workweek with Python
I was prompted to write this post after reading the NYT’s “In France, New Review of 35-Hour Workweek.” For those not familiar with the 35-hour workweek, France adopted it in February 2000 with the...
PyCon Montreal 2015 and Motivation
I just got back from a fun week in Montreal for PyCon 2015. Due to my work commitments since relocating to Seattle and leaving behind the San Diego Data Science Meetup I organized, I’ve been concerned...
Pure Python Decision Trees
By now we all know what Random Forests is. We know about the great off-the-shelf performance, ease of tuning and parallelization, as well as its importance measures. It’s easy for engineers...
Lending Club Data Analysis Revisited with Python
2.5 years ago I analyzed Lending Club’s issued loans data (yikes! I was using R back then!). It was the most visited blog post on my site in 2013 through 2014. Today it’s still number 5. Reddit picked...
Examining Your Presence on Twitter with Python
My Evil The Following with absoluteBLACK’s direct mount oval ring. The purpose of this post is to show how a sponsorship/marketing manager might track their athletes or brand ambassadors. The code...
A wild dataset has appeared! Now what?
Where do we start when we stumble across a dataset we don’t know much about? Let’s say one where we don’t necessarily understand the underlying generative process for some or all of the variables. Let’s...
Topic Modeling Amazon Reviews
Adapted from Biel 2011. I found Professor Julian McAuley’s work at UCSD when I was searching for academic work identifying the ontology and utility of products on Amazon. Professor McAuley and his...