How to Use scikit-learn Methods with statsmodels Estimators

Sometimes you want to use estimators from one package but methods from another. Maybe, like me, you want to use scikit-learn's grid-search cross-validation with an estimator from statsmodels. The two don't work together out of the box, but with a quick wrapper you can make a statsmodels estimator play nice with scikit-learn. »

State Space Time Series Analysis - Part 1

State space models make up a suite of powerful time series analysis techniques that use the Kalman filter to model the level, trend, and seasonal components of a time series separately. The state space methodology gives the developer considerably more control over how a time series is modeled than most popular techniques, while also seamlessly allowing exogenous variables to be analyzed alongside autoregressive and moving average terms. »

Introduction to Causal Inference - Part 4

Wrapping up the series on causal inference, this final post covers the essential topic of design sensitivity, which allows a statistician to derive actual insights from an observational study by making some necessary adjustments to the standard statistical inference used in randomized experiments. »

Building Decision Trees with the ID3 Algorithm

For a machine learning course, I had to write code to implement the ID3 algorithm to train decision trees from scratch. Writing recursive functions can be challenging and even frustrating, particularly when you are a math/stats master's student just beginning his foray into the world of devops and computer science. Each piece of the unoptimized recursion I wrote is written out in gory detail here for your reading pleasure. »
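The post walks through the recursion in full; as a rough illustration of the algorithm's shape (the names and structure below are my own sketch, not the post's code), an unoptimized ID3 on categorical attributes might look like:

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def id3(rows, labels, attributes):
    """Recursively build a decision tree.
    rows: list of dicts mapping attribute name -> categorical value.
    Returns a class label (leaf) or (attribute, {value: subtree})."""
    # Base cases: pure node, or no attributes left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Pick the attribute with the highest information gain.
    def info_gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=info_gain)
    branches = {}
    for value in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3(sub_rows, sub_labels,
                              [a for a in attributes if a != best])
    return (best, branches)
```

Each recursive call removes the chosen attribute and partitions the rows by its values, bottoming out when a partition is pure or the attributes run out.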

Introduction to Causal Inference - Part 1

This is my master's thesis broken into smaller, more digestible pieces. Causal inference is a fascinating (and relatively emergent) branch of statistics that seeks to establish causal relationships between variables. It turns out that establishing causality is far more demanding than establishing association via traditional statistical inference methods. This post covers the groundwork to get started with causal inference, including essential background about randomized experiments and observational studies. »

The Gradient Descent Algorithm

The gradient descent algorithm turns up nearly everywhere in machine learning. It is enormously popular because it excels at solving certain types of optimization problems. It must be used thoughtfully, however, since it is not guaranteed to converge to a global extremum. Understanding the mathematics of this ubiquitous algorithm is absolutely essential for machine learning engineers. »
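The core update rule is just "step against the gradient until the steps stop shrinking." A minimal sketch (my own illustrative code, with a toy quadratic objective, not the post's derivation):

```python
import numpy as np


def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Basic gradient descent: repeatedly step against the gradient.
    Converges to a local minimum, which need not be the global one."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = learning_rate * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:  # steps have effectively vanished
            break
    return x


# Minimize f(x, y) = (x - 3)^2 + (y + 1)^2; its gradient is 2(v - (3, -1)),
# so the unique minimum sits at (3, -1).
minimum = gradient_descent(lambda v: 2 * (v - np.array([3.0, -1.0])),
                           x0=[0.0, 0.0])
```

On a convex bowl like this the iterates contract geometrically toward the minimum; on non-convex losses the same loop can stall at whatever local minimum or saddle region the starting point leads to, which is exactly why the learning rate and initialization matter.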