This week we welcome Hillel Wayne (@Hillelogram) as our PyDev of the Week! Hillel is the author of “Learn TLA+” and is currently writing “Practical TLA+” with Apress. You should check out his website / blog as it is a good place to learn more about him. Hillel also recently spoke at PyCon US on testing. Let’s take a few moments to chat with Hillel!
Can you tell us a little about yourself (hobbies, education, etc):
I was planning on being a physicist for the longest time until my fourth year of college, when I suddenly switched to wanting to be a programmer. I spent a while as a fullstack in the Bay Area and a backend dev in Chicago, where I discovered a deep love of formal methods, or the practice of “mathematically” designing and building software. If you’ve read The Coming Software Apocalypse, that’s a really good introduction to the why and the what. I now do consulting and workshops, helping people with the how; these tools are way too powerful and useful to stay niche.
Beyond formal methods, my interests in tech are as follows:
- Software Safety: where do bugs actually come from, and how we do make them less likely?
- Empirical Software Engineering: what do we actually know is true in software engineering, and what do we just think is true?
- Software History: how did we get where we are, and what can we learn from the past?
- Weird and interesting niche ideas, languages, techniques, etc.
Outside of tech, I do a lot of juggling and cooking. I’m a super avid confectioner and chocolatier. There’s a place in Chicago that sells 10-pound bars of chocolate for 50 bucks and I usually go through about four bars a year.
Why did you start using Python?
Around 2012 or so I suddenly realized that I didn’t want to do physics grad school and had to find some way of making money in the real world. Since the part of physics research I actually enjoyed was programming the hardware, I decided to look for programming jobs. But lab tech was mostly in Matlab and C so I picked up Python to diversify my skills and was instantly hooked. Not having to worry about memory or string problems was just so nice! It quickly became my primary hacking language, especially for learning new programming techniques. For example, I learned TDD by implementing Theseus and the Minotaur in Python, and for a while I used a Django app to help manage my depression.
I’m doing a lot less programming in general these days, but I still reach for Python whenever I need to test a new idea, or communicate an idea with a snippet of code. Even people who don’t know python can usually get a hang of what’s going on in a python snippet.
What other programming languages do you know and which is your favorite?
My last couple of jobs were as a Rails dev, so I know the tool but no so much the Ruby language. Besides that, I tend to pick new programming languages for how much they mess with my head. I’m in a Prolog class right now which I’m enjoying, so shoutouts to Annie Ogborn for running it. I also have done a lot in J and, funnily enough, AutoHotKey, which has been by far the most useful in my day to day life (except Python).
My specialty, though, is specification languages, the notations we use to describe our overall systems, as opposed to implement them. I’ve done a lot in both Alloy and TLA+ and am in the middle of writing a book on the latter. TLA+ is what I’m both the best at and the happiest working in, although it makes me feel like an idiot whenever I break a spec.
What projects are you working on now?
I’m currently writing a book on TLA+ and working on a 3-day workshop for companies. After that, one idea that’s been bouncing around in my head is doing a short piece called “Python for Science Coders”. It would be something aimed at teaching my grad school friends who use Python how to use it better. Things like “this is what an IDE is, here’s how it makes coding easier”, “copy/paste means mistakes, try using functions instead”, “here’s how to check you got the right result and didn’t make a typo”, etc. I think there’s a lot of opportunity to make our science code better and, by extension, our science research safer.
Which Python libraries are your favorite (core or 3rd party)?
Third party’s gotta be hypothesis, which is a “property based testing” library for python. Instead of defining tests as “check this input gives us this output”, you say “outputs should have these properties, now generate 100 inputs and check them all.” It takes a bit more thinking of what the properties of your code are, but it can catch really hairy bugs. I did a recent talk on it, actually, about how it can be used to generate integration tests for you.
Core library is abc. In particular, I’m kind of obsessed with __subclasshook__. In most languages things declare their own types: if Foo inherits from Bar, then Foo is a Bar. But with ABCs you can flip that around and have abstract types declare what other classes count as that type. For example, you can have an ABC say “every class who’s name is a palindrome counts as a TimeFarbler.” I have no idea how this is useful in practice, but the idea just delights me.
I see you are the author of a book on TLA+. Can you tell us what TLA+ is?
There are (very roughly) two kinds of errors in software: implementation errors and design errors. Implementation errors are when you make a mistake translating your design to code, such as writing return l.sort() instead of return sorted(l). Design errors are when your code faithfully follows your design, but the design doesn’t actually do what you thought it did. One common example is missing an edge case: what if someone clicks the “submit order” button twice? What happens if an item in the cart goes out of stock? How much of a discount should I get if I have both a “$5 off” and a “10% off” coupon?
Design errors are often more dangerous and more subtle than implementation errors. They are also much harder to check, and almost all of our current tools only help us with implementation errors. To test our designs we use a specification language like TLA+. We describe our system and the properties it should have, and then we use the model checker to explore all possible evolutions of that system and confirm it does what we expect. For example, if I was creating a message processor, I might write something like
\A m \in Message: m \in Sent ~> m \in Received
To say that every message that is sent is eventually received (for our definitions of “message”, “sent”, and “received”). I could then check that that property actually holds with respect to my existing system.
TLA+ has a strong track record in industry. Amazon used it to fix S3 and Microsoft used it to find bugs in the Xbox. At a less “international megacorp” level, I’ve used it to fix business logic and Murat Demirbas uses it to teach students distributed systems.
What did you learn from writing a book and what would you do differently if you could start over?
Oddly enough, the hardest part for me wasn’t the actual content, it was consistency. If I rename a chapter, did I remember to update all of the references to it? Am I certain I introduced concept A before concept B? Stuff like that. I started writing in Word and switched to markdown with pandoc halfway through, and neither really helped with this problem. I think if I started again I’d spend some time researching what format best enforced this kind of consistency. Probably RST or LaTeX.
(Fun fact: Leslie Lamport created both TLA+ and LaTeX!)
About halfway through I started building an automated toolchain for things like error checking, converting documents, and uploading. It’s a huge time saver. For the next book I’d spend a lot more time building a stronger toolchain from the start.
Last discovery: pair with the technical editor. The book is pretty dense, so it’s very slow to proofread. We started moving a lot faster (and I got much deeper feedback) when we started scheduling time to sit down and go over it together.
Thanks for doing the interview!