I’m a little late in reporting on this topic, but Ruff put out an update in April 2024 that includes a hand-written recursive descent parser. This update is in version 0.4.0 and newer.
Ruff’s new parser is more than 2x faster, which translates to a 20-40% speedup for all linting and formatting invocations. Ruff’s announcement includes benchmark statistics showing the improvements, and they are worth checking out.
What’s This New Parser?
I’ve never tried writing a code parser, so I’ll have to rely on Ruff’s announcement to explain this. Basically, when you do static analysis, you turn the source code into an Abstract Syntax Tree (AST), which you can then analyze. Python has a built-in ast module for this purpose. Ruff is written in Rust, though, so its parser, which produces the AST, is also written in Rust.
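To make the idea of an AST a bit more concrete, here is a small sketch using Python’s built-in ast module. This only illustrates what "turning source code into a tree you can analyze" means. It is not how Ruff does it, since Ruff has its own parser and AST written in Rust. The sample source line and the variable names are made up for the example.

```python
import ast

# A made-up line of Python source to analyze.
source = "total = price * 2 + 1"

# Parse the source text into an AST (a tree of node objects).
tree = ast.parse(source)

# Walk the tree and record the type name of every node an analyzer could inspect.
node_types = [type(node).__name__ for node in ast.walk(tree)]

# A tiny "analysis": collect every variable that is read (Load)
# rather than assigned to (Store).
names_read = [
    node.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
]

# Pretty-print the whole tree (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))
print(names_read)
```

A linter works along the same lines: it parses the file once and then runs its rules against the resulting tree instead of against the raw text.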
The original parser was a generated parser, built with a parser generator called LALRPOP. A parser generator requires the grammar to be defined in a Domain Specific Language (DSL), which the generator then converts into executable code.
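Ruff’s new hand-written parser is a recursive descent parser. In short, each rule in the grammar becomes a function, and those functions call one another (sometimes recursively) as they work through the source from the top down. Follow that link to Wikipedia to learn all the nitty-gritty details.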
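If you’d like to see the general technique in miniature, here is a toy recursive descent parser for arithmetic expressions, written in Python. To be clear, this is not Ruff’s code: Ruff’s parser is written in Rust and handles the full Python grammar. The grammar, the tokenize function, and the Parser class below are all hypothetical and exist only for this sketch.

```python
import re

# Toy grammar (an assumption made for this sketch):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; the methods call each other recursively."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume and return the next token.
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()  # recursion: a factor can contain a whole expr
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


tree = Parser(tokenize("1 + 2 * (3 - 4)")).parse()
print(tree)  # ('+', 1, ('*', 2, ('-', 3, 4)))
```

Because every rule is an ordinary function, it is easy to add a custom error message or a special case at exactly the spot where it is needed. That kind of control is much harder to get from code that a generator writes for you.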
The Ruff team wrote the parser by hand to get more control and flexibility over the parsing process, which makes it easier to handle the many weird edge cases they need to support. The new parser is also meant to make Ruff faster and to provide better error messages and error resilience.
Wrapping Up
Ruff is great and makes linting and formatting your Python code so much faster. You can learn much more about Ruff in my other articles on this topic: