This week we welcome Jacob Vanderplas (@jakevdp) as our PyDev of the Week! Jacob is the author of Python Data Science Handbook: Essential Tools for Working with Data and works at the University of Washington as a researcher and teacher. As you might have guessed from the title of his book, Jacob is very much in tune with the scientific programming projects in Python. If you check out his GitHub profile, you will find many interesting highlights, scikit-learn among them. Let’s take some time to get to know him better!
Can you tell us a little about yourself (hobbies, education, etc.)?
I’ve always been drawn to physical activity – I was a swimmer in high school and college, and post-college got into triathlons, culminating with an Ironman a few years back. I’m not as competition-driven these days, but the way I relax is to go out on long hikes, runs, swims, or bike rides. My favorite pastime is to head out on long trail-runs deep into the mountains, though I don’t make it out as much these days with a toddler at home!
I was born and raised in Palo Alto, majored in Physics at Calvin College and did my PhD in Astronomy at University of Washington. In between I lived for a year in Northern Japan, guided mountaineering excursions for two summers in the Sierra Nevada, and taught at a middle school outdoor science program for two years in the redwood forests above Santa Cruz – my experiences during those years and the people with whom I shared my time have had a profound impact on me, and I’m so grateful for the opportunities I’ve had!
Why did you start using Python?
After a few years living and teaching in the outdoors, I decided that I wanted to dig into astronomy and astrophysics so that I could eventually teach them at a higher level. When I started graduate school in 2006, my only real coding experience was a high school C++ course almost a decade earlier. I joined a research group that first quarter of graduate school, and asked my advisor what programming language I should learn. Most people in the department back then were using this proprietary scripting language called IDL—for reasons I don’t fully understand, IDL drove probably 90% of Astronomy research in those days—but the professor I was working with told me with confidence that Python was the future. So I installed Python’s IDLE on my ancient Windows laptop, and got to work. I taught myself the basics of the language over winter break through the exercise of writing a solver and generator of sudoku puzzles, and have been learning ever since. In retrospect, it turns out that advisor was absolutely correct: a large fraction of professional astronomers today are using Python for their research work, and needless to say, following his advice has served me very well!
What other programming languages do you know and which is your favorite?
Aside from Python, I’ve done a fair bit with C, C++, and JavaScript, but to be honest my all-time favorite programming language is Cython. The Cython language is a superset of Python that acts as something of a mutant hybrid of Python and C: it compiles down to C code, aided by optional type declarations sprinkled through otherwise standard Python code. You get all the flexibility, expressiveness, and beauty of Python itself, but when portions of your program need the extra boost of speed that comes from compiled code, C-style syntax is at your fingertips.
What projects are you working on now?
After far too long working on it, I just wrapped up my O’Reilly book, the Python Data Science Handbook. I’m currently working on final edits of all the Jupyter notebooks behind the book, and am making them available on GitHub as they’re ready. Aside from that, I’ve been putting a lot of work recently into Altair, a declarative statistical visualization library built on the Vega-Lite visualization grammar. I’m quite excited about Altair, actually: it has a nice, clean, declarative syntax for visualization, and I believe that will free users to think about relationships within the data, rather than think about axes, ticks, labels, and other minutiae involved in displaying a plot.
Which Python libraries are your favorite (core or 3rd party)?
In Python’s core library, I love the collections module – it contains so many useful data structures, and in my opinion is vastly underused. As for third-party modules, I really love the emcee package: it’s a super clean and fast package for doing Bayesian estimation via Markov chain Monte Carlo. Numba is another favorite. It’s kind of black magic: you add a decorator to a Python function, and Numba will JIT-compile it to machine code via LLVM, leading to huge speedups in many cases. Numba has been steadily improving – I’m hoping to dig back into it again some time soon. Dask is another good one: it’s a relatively new framework for parallelization of scientific Python code, and I’ve been having a lot of fun with it lately. I’m also very excited about the upcoming Jupyter Lab project: it’s still in alpha phase at the moment, but from what I’ve seen I think it’s going to unlock a lot of very interesting possibilities in the scientific Python space.
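For readers who have not explored the collections module mentioned above, here is a minimal sketch of four of its data structures. The word list and the Point type are invented for illustration and do not come from the interview.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Hypothetical sample data, made up for this illustration.
words = ["numpy", "dask", "numba", "numpy", "emcee", "dask", "numpy"]

# Counter tallies hashable items and can report the most common ones.
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

# defaultdict builds a missing value on first access, so grouping
# needs no "if key not in dict" check.
by_initial = defaultdict(list)
for word in sorted(set(words)):
    by_initial[word[0]].append(word)

# A deque with maxlen keeps only the most recent items.
recent = deque(maxlen=3)
for word in words:
    recent.append(word)

# namedtuple gives a lightweight, immutable record with named fields.
Point = namedtuple("Point", ["x", "y"])
origin = Point(x=0.0, y=0.0)
```

Each of these replaces a few lines of hand-written bookkeeping with a single standard-library type.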
Where do you see Python going as a programming language?
I think Python is pretty unique in that it crosses the boundaries of so many technological niches. You have the stats world, where it competes with tools like R, the scientific computing world where it competes with tools like MATLAB and Julia, the web world where it competes with tools like JavaScript and Go… that is a strength, because Python can cross those boundaries pretty seamlessly, but it can also be a weakness because core language features often can’t be tailored to any one of those tasks without making sacrifices in other areas.
I’ve been glad to see over the last few years greater recognition of the needs of scientific users from the Python core team, exemplified by the addition of the array buffer protocol, the matmul operator, codification of type-hinting, and especially the wheel format in packaging. The Python world today is far more friendly to the scientific user than it was a decade ago, and I’m confident that continuing dialog between various branches of the Python user and developer communities will further that progress.
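As a small illustration of two of the language features named above, the matmul operator and type hints, the sketch below defines a toy Matrix class in pure Python. The class and its values are hypothetical and exist only to show the syntax; real scientific code would normally use NumPy arrays, which support the same @ operator.

```python
from typing import List


class Matrix:
    """A toy dense matrix, hypothetical and for illustration only."""

    def __init__(self, rows: List[List[float]]) -> None:
        self.rows = rows

    def __matmul__(self, other: "Matrix") -> "Matrix":
        # The @ operator dispatches to __matmul__.
        if len(self.rows[0]) != len(other.rows):
            raise ValueError("inner dimensions must match")
        # Transpose the right-hand matrix to iterate over its columns.
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )


a = Matrix([[1.0, 2.0], [3.0, 4.0]])
b = Matrix([[0.0, 1.0], [1.0, 0.0]])
product = a @ b  # multiplying by b swaps the columns of a
```

The annotations do not change runtime behavior; they document the expected types and can be checked by external tools.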
Is there anything else you’d like to say?
Thanks for the opportunity to be a part of this series!
Thanks for doing the interview!