Data Analysis: The Hard Parts

(Repost from 2014) I don’t know whether this word exists, but mainstreamification is what’s happening to data analysis right now. Projects like Pandas or scikit-learn are open source and free, and allow anyone with some Python skills to do some serious data analysis. Projects like MLbase or Apache Mahout work to make data analysis scalable such …

Three Things About Data Science You Won’t Find in the Books

(Repost) In case you haven’t heard yet, Data Science is all the rage. Courses, posts, and schools are springing up everywhere. However, every time I take a look at one of those offerings, I see that a lot of emphasis is put on specific learning algorithms. Of course, understanding how logistic regression or deep learning …

How Python Became the Language of Choice for Data Science

Nowadays Python is probably the programming language of choice (besides R) for data scientists for prototyping, visualization, and running data analyses on small and medium-sized data sets. And rightly so, I think, given the large number of available tools (just look at the list at the top of this article). However, it wasn’t always …