This week we welcome Marc Garcia (@datapythonista) as our PyDev of the Week! Marc is a core developer of pandas, a Python data analysis library. If you'd like to know more about Marc, check out his website, which links to talks he has given at PyData conferences in Europe and at EuroPython.
In fact, here is one of his talks on pandas in case you are interested:
You can also see which projects he is part of over on GitHub. Now, let's take some time to get to know Marc!
Can you tell us a little about yourself (hobbies, education, etc):
My background is in computer engineering, with a master’s degree in AI. I wrote my first program when I was 9, and not many years later I learned about free software, and I still think it’s one of the most amazing achievements of humanity.
I've been working professionally with Python for more than 10 years, and this year I became a Python Software Foundation Fellow. I'm a pandas core developer, and I've been involved in the Python community almost since I started coding in Python.
I started as a regular at the Barcelona Python meetup when fewer than 10 people attended the events. I contributed to Django before it reached 1.0. I was one of the founders of PyData Mallorca. I was a NumFOCUS ambassador. I speak regularly at PyCon and PyData conferences. And I organize the London Python Sprints group, where we mentor people who want to contribute to open source Python projects. Most people know me for leading the pandas documentation sprint, a worldwide event last March in which around 500 people worked on improving the pandas documentation. Around 300 pull requests were sent; I still need to review and merge some of them. 🙂
Regarding hobbies, I love hiking, travelling, yoga, playing tennis, dancing forró, playing djembe drums, and watching Bollywood movies. I don't have time to do all of that regularly, but hopefully I will at some point.
Why did you start using Python?
I started using Python in 2006. Back then, one of the main reasons to use Python was that it was "batteries included". Beyond that, using Python made me much more productive than any other language did. That was my main reason.
Another thing I have loved from the very beginning is the indentation. Shortly before discovering Python, I had to work with some legacy PHP code with no indentation at all. I saw enforced indentation as one of Python's main assets at that time.
After using Python for a while, I really enjoyed that Python was attracting top programmers. In 2006, Python was far from being a mainstream language. It wasn't taught at universities and was rarely used by companies, so most Python users at that time were passionate programmers pursuing excellence in their spare time. My feeling is that those origins explain why today the Python community is one of the most advanced, not only technically, but in terms of values, diversity, codes of conduct…
What other programming languages do you know and which is your favorite?
Since I discovered Python, I have rarely used any other language, except some JavaScript (for frontend stuff) or C. Before Python I used a bit of everything: PHP, Java, Visual Basic… But I'm really happy I haven't worked with them for more than 10 years now. 🙂
My favorite language together with Python is surely C. I think its perfection lies in its simplicity.
What projects are you working on now?
At the moment, outside of work I'm mainly working on pandas. My main focus is on the documentation. Not only do I think it's really needed, but it also helps me discover parts of the library beyond the ones I use regularly.
Besides the documentation, I'm also working on making pandas more efficient when constructing from row-based data (like constructing from a generator). I'm also working on letting pandas plot with libraries other than matplotlib, like Bokeh. I also like to work on deprecating and removing pandas legacy features.
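To make "constructing from row-based data" concrete, here is a minimal sketch of building a DataFrame from a generator that yields one row at a time. The `read_rows` function and the column names are hypothetical, purely for illustration; this only shows the public `pd.DataFrame` constructor accepting an iterator of rows, not the internal changes Marc mentions.

```python
import pandas as pd


def read_rows(n):
    """Hypothetical row source: yields one (id, square) tuple per row."""
    for i in range(n):
        yield (i, i * i)


# The DataFrame constructor accepts an iterator of row tuples;
# pandas consumes the generator and builds the columns from it.
df = pd.DataFrame(read_rows(5), columns=["id", "square"])
print(df.shape)  # (5, 2)
```

Because the generator must be consumed to build columnar storage, the efficiency of this row-to-column conversion is what matters for large inputs.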
Which Python libraries are your favorite (core or 3rd party)?
Working as a data scientist, pandas and scikit-learn are the libraries I couldn't live without. Now we take them for granted, but it's not hard to imagine how much lower the productivity of any data science team would be without them. What if every time you wanted to use a different model you needed to research new libraries, learn new APIs, deal with bugs and missing documentation, or just find implementations that are too slow for your data? I often say that the only people doing actual machine learning are the scikit-learn developers (and the developers of the other libraries it relies on in the ecosystem: NumPy, pandas…). The rest of us just do 10% of the work.
As third-party libraries, Bokeh and Datashader are libraries that I love. They have quite an innovative approach towards visualization. I think they take it to the next level.
And although I don't do much web development nowadays, I really love Django (I was quite involved in its development many years ago). I really loved it both as software and as a community.
Is there anything else you’d like to say?
I think we often have the perception that, because the software we use is of really high quality and new releases come out often, there is no need to contribute to it. Personally, I think that's far from true.
My feeling is that even though Python has tens of millions of users, if just 20 or 30 key people stopped making heroic efforts (in their free time) to improve it (and here I mean the different Python implementations and the most popular libraries), the impact on our future productivity would be huge.
I think many more people should be contributing to the different Python projects. And the many companies saving thousands of dollars in software licenses, and boosting their productivity thanks to open source software, should consider funding these projects, and of course let their employees contribute to them as part of their job. We take Python and free software for granted, but I don't think we have found the formula to make them sustainable yet.
Another thing I'd like to say is that I think we would all appreciate funnier lightning talks. I have the feeling that in the past people took lightning talks less seriously, and a lot of crazy stuff was presented in them (or serious and great projects were presented in a crazy way). I don't think we want to leave future generations a world where lightning talks are used to show formulas and code. 😉
Thanks for doing the interview!