This week we welcome Katharine Jarmul (@kjam) as our PyDev of the Week! Katharine is the co-author of Data Wrangling with Python. She is also the co-founder of KIProtect. You can catch up with the projects she works on over on GitHub. Let’s take some time to get to know her better!
Can you tell us a little about yourself (hobbies, education, etc):
Sure! I first started working on computers building fan websites for house music in the 90s with my dial-up shared Windows 95 computer. Since then, I have had a love / hate relationship with computers and what is now called data science. I have some formal education in math, statistics and computer science, but also learned most of what I do on my own and therefore am proud to count myself a member of the primarily self-taught folks. For fun, I like to cook and eat with friends, read news or arXiv papers and rant with like-minded folks on and offline. (I am @kjam on Twitter…)
Why did you start using Python?
I first started using Python in 2007, when I was working at the Washington Post. A mentor (Ryan O’Neil) took a chance on me after seeing a small application I built using JavaScript. He set up a Linux computer and installed the Django application stack along with it — even gave me a commit key! I can’t tell you how many times I broke the server, but 6 months later I launched my first Django app. I was hooked and wanted to build and do more.
What other programming languages do you know and which is your favorite?
I have dabbled in numerous other languages: C++, Java, Go, even Perl, R, PHP and Ruby. I like Python the best, but that’s probably because I know it the best. I am working more regularly in Go now, which is really fun — but also hard for me to do so much typing. Python as my primary language has definitely spoiled me, and for data science and machine learning, there is a reason it has been so widely adopted.
What projects are you working on now?
I recently announced my new company, KIProtect (https://kiprotect.com). We are building solutions for data privacy and security for data science and machine learning. Essentially, we believe data privacy should be a right for everyone, not just those of us lucky enough to live in Europe. For this reason, we want to democratize data privacy — making it easier for data scientists and engineers everywhere to enable secure and private data sharing. Our first offering is a pseudonymization API which is free for limited usage (and paid for larger use). This allows you to send private data and get back properly pseudonymized data via one API call. We will be offering additional tools, solutions and APIs to help increase security and privacy in the coming year.
Which Python libraries are your favorite (core or 3rd party)?
NumPy is pretty much the best thing ever as someone working in machine learning and data science. It is such a useful library and the optimizations the core developers have made to allow for us to do fast, efficient math in Python (ahem, Cython) are fantastic. I am unsure if we would have things like Pandas, Scikit-Learn, even Keras and TensorFlow if it wasn’t for the steady grounding of NumPy to help foster a real data science community within Python.
How did you end up writing a book on Python?
I was approached by my co-author Jacqueline Kazil shortly after I moved to Europe. Ironically, the week before I turned to my partner and said, “you know, I am finally feeling less burnt out. I wonder what I should do next?” The book seemed like a great opportunity to get started with computers again.
What did you learn from that experience?
Writing a book is really hard. I know everyone says it, but it takes quite a lot out of you; and you are likely never fully satisfied with the outcome. That said, I have heard a lot of nice things from folks who used our book as a welcoming introduction to the world of Python and data, and if I convert even one new Pythonista, I can say I have achieved some impact.
Is there anything else you’d like to say?
Don’t take your website offline to comply with GDPR (the new EU privacy regulation). I find it alarming to see the blanket blocks of European IPs and the other ridiculously clueless reactions and takes I have heard from (primarily) US Americans on the regulation.
First off, the regulation is pretty easy to read — so I recommend reading it. If that’s too hard for you, check out our article covering a lot of what you need to know as a data scientist (https://kiprotect.com/blog/
Secondly, think of it first as a user. Wouldn’t you want more say over your data? Don’t you want to know about data breaches? Is it okay for someone to resell your data without telling you? Treat your users how you want to be treated.
Finally, there are tools to help! At KIProtect, we are building several solutions to help make your life easier. There are also many other companies and projects working to help make our software safer for everyone. Don’t treat privacy and security as nice add-ons; treat them as part of your core product. Protect your data; it might be the most valuable thing you create.
Thanks for doing the interview, Katharine!