Concurrency is a big topic in programming. The idea behind concurrency is to run multiple pieces of code at once. Python has a couple of different solutions built into its standard library: you can use threads or processes. In this chapter, you will learn about using threads.
When you run your own code, you are using a single thread. If you want to run something else in the background, you can use Python’s threading module.
In this article you will learn the following:
- Pros of Using Threads
- Cons of Using Threads
- Creating Threads
- Subclassing Thread
- Writing Multiple Files with Threads
Note: This chapter is not meant to be comprehensive in its coverage of threads. But you will learn enough to get started using threads in your application.
Let’s get started by going over the pros and cons of using threads!
Pros of Using Threads
Threads are useful in the following ways:
- They have a small memory footprint, which means they are lightweight to use
- Memory is shared between threads – which makes it easy to share state across threads
- They make it easier to build responsive user interfaces
- They are a great option for I/O-bound applications (such as reading and writing files, databases, etc.)
Now let’s look at the cons!
Cons of Using Threads
Threads also have the following drawbacks:
- Poor option for CPU-bound code due to the Global Interpreter Lock (GIL) – see below
- They cannot be interrupted or killed from the outside
- Code with threads is harder to understand and write correctly
- Easy to create race conditions
The Global Interpreter Lock is a mutex that protects Python objects. It prevents multiple threads from executing Python bytecode at the same time, so the threads in a single Python process cannot run Python code on several CPU cores in parallel.
Threads are still great for I/O-heavy applications, and they can help with image processing and NumPy’s number-crunching, because blocking I/O calls and many C extensions release the GIL while they work. If you need to run Python code across multiple CPUs, use the multiprocessing module. You will learn about the multiprocessing module in the next chapter.
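To make the I/O-bound case concrete, here is a minimal sketch that compares running a blocking call five times in a row against running it in five threads. The fake_io(), run_sequentially() and run_threaded() names are made up for this illustration, time.sleep() stands in for real I/O, and join() (which simply waits for a thread to finish) is used so the elapsed time can be measured.

```python
import threading
import time


def fake_io(delay: float) -> None:
    """Stand in for a blocking I/O call; time.sleep() releases the GIL."""
    time.sleep(delay)


def run_sequentially(count: int, delay: float) -> float:
    """Call fake_io() count times, one after another, and return the elapsed time."""
    start = time.perf_counter()
    for _ in range(count):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(count: int, delay: float) -> float:
    """Call fake_io() in count threads at once and return the elapsed time."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_io, args=(delay,))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every thread to finish
    return time.perf_counter() - start


if __name__ == '__main__':
    print(f'Sequential: {run_sequentially(5, 0.2):.2f} seconds')
    print(f'Threaded:   {run_threaded(5, 0.2):.2f} seconds')
```

The threaded version should finish in roughly the time of a single call because the waits overlap. CPU-bound work written in pure Python would not see the same benefit.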
A race condition happens when a program’s correctness depends on events occurring in a particular order or at a particular time. If two threads read and modify the same data without coordination, their steps can interleave in an order you did not plan for, and your application can crash or behave in unexpected ways.
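The following sketch shows one common shape of a race condition: several threads updating a shared counter. The Counter class and run_increments() helper are hypothetical names invented for this example. Locks are not covered in this chapter, so treat the threading.Lock usage as a preview of one possible fix rather than a full treatment.

```python
import threading


class Counter:
    """A shared counter with an unsafe and a lock-protected increment."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # Read and write are separate steps, so another thread can run
        # in between and its update can be lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self) -> None:
        # Only one thread at a time can hold the lock, so the
        # read-modify-write sequence is never interleaved.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_increments(increment, thread_count: int = 4,
                   per_thread: int = 10_000) -> None:
    """Call increment() per_thread times in each of thread_count threads."""
    def work() -> None:
        for _ in range(per_thread):
            increment()

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == '__main__':
    unsafe = Counter()
    run_increments(unsafe.unsafe_increment)
    print(f'Without a lock: {unsafe.value} (may be less than 40000)')

    safe = Counter()
    run_increments(safe.safe_increment)
    print(f'With a lock:    {safe.value}')
```

Whether the unsafe version actually loses updates depends on your Python version and on timing, which is exactly what makes race conditions hard to track down.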
Creating Threads
Threads are confusing if all you do is talk about them. It’s always good to familiarize yourself with how to write actual code. For this chapter, you will be using the threading module, which uses the _thread module underneath.
The full documentation for the threading module can be found here: https://docs.python.org/3/library/threading.html
Let’s write a simple example that shows how to create multiple threads. Put the following code into a file named worker_threads.py:
# worker_threads.py

import random
import threading
import time


def worker(name: str) -> None:
    print(f'Started worker {name}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} worker finished in {worker_time} seconds')

if __name__ == '__main__':
    for i in range(5):
        thread = threading.Thread(
            target=worker,
            args=(f'computer_{i}',),
        )
        thread.start()
The first three imports give you access to the random, threading and time modules. You can use random to generate pseudo-random numbers or choose from a sequence at random. The threading module is what you use to create threads and the time module can be used for many things related to time.
In this code, you use time to wait a random amount of time to simulate your “worker” code working.
Next you create a worker() function that takes in the name of the worker. When this function is called, it will print out which worker has started working. It will then choose a random whole number from 1 to 4 (range(1, 5) stops before 5). You use this number to simulate the amount of time the worker works using time.sleep(). Finally you print out a message that tells you a worker has finished and how long the work took in seconds.
The last block of code creates 5 worker threads. To create a thread, you pass in your worker() function as the target function for the thread to call. The other argument you pass to Thread is args, a tuple of arguments that the thread will pass to the target function. Then you call thread.start() to start running that thread.
When the function stops executing, Python will delete your thread.
Try running the code and you’ll see that the output will look similar to the following:
Started worker computer_0
Started worker computer_1
Started worker computer_2
Started worker computer_3
Started worker computer_4
computer_0 worker finished in 1 seconds
computer_3 worker finished in 1 seconds
computer_4 worker finished in 3 seconds
computer_2 worker finished in 3 seconds
computer_1 worker finished in 4 seconds
Your output will differ from the above because the workers sleep() for random amounts of time. In fact, if you run the code multiple times, each invocation of the script will probably have a different result.
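The script above starts its threads and never looks at them again. If your main code needs to wait for the workers before moving on, you can keep a reference to each thread and call join() on it. The sketch below is one way to do that; run_workers() and the results list are names made up for this illustration, and the sleep is shortened so the example runs quickly.

```python
import threading
import time


def worker(name: str, results: list) -> None:
    """Pretend to work, then record that this worker finished."""
    time.sleep(0.1)
    results.append(name)


def run_workers(count: int):
    """Start count worker threads, wait for all of them, and return the results."""
    results = []
    threads = [
        threading.Thread(target=worker, args=(f'computer_{i}', results))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # blocks until this thread's target has returned
    return results, threads


if __name__ == '__main__':
    results, threads = run_workers(5)
    print(f'Finished workers: {sorted(results)}')
    print(f'Any still running? {any(t.is_alive() for t in threads)}')
```

After join() returns, is_alive() reports False for that thread, so code that follows can safely assume the work is done.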
threading.Thread is a class. Here is its full definition:
threading.Thread(
    group=None,
    target=None,
    name=None,
    args=(),
    kwargs={},
    *,
    daemon=None,
)
You could have given each thread a name through the name parameter when you created it rather than passing a name to the worker() function. The args and kwargs are for the target function. You can also tell Python to make the thread into a daemon. “Daemon threads” have no claim on the Python interpreter, which has two main consequences: 1) if only daemon threads are left, Python will shut down, and 2) when Python shuts down, daemon threads are abruptly stopped with no notification. The group parameter should be left alone as it is reserved for a future extension when a ThreadGroup class is implemented.
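As a rough illustration of those parameters, the sketch below names each thread with name, passes a keyword argument through kwargs, and optionally marks the thread as a daemon. The make_worker_thread() helper and the greeting parameter are invented for this example and are not part of the threading module.

```python
import threading
import time


def worker(seconds: float, greeting: str = 'Started') -> None:
    # current_thread() returns the Thread object this code is running in,
    # so the worker can look up the name it was given.
    name = threading.current_thread().name
    print(f'{greeting} worker {name}')
    time.sleep(seconds)
    print(f'{name} worker finished')


def make_worker_thread(index: int, daemon: bool = False) -> threading.Thread:
    """Build (but do not start) a named worker thread."""
    return threading.Thread(
        target=worker,
        name=f'computer_{index}',
        args=(0.1,),                      # positional arguments for worker()
        kwargs={'greeting': 'Launched'},  # keyword arguments for worker()
        daemon=daemon,
    )


if __name__ == '__main__':
    threads = [make_worker_thread(i) for i in range(3)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

If you pass daemon=True and skip the join() calls, the script may exit before the workers print their final message, which is the abrupt shutdown described above.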
Subclassing Thread
The Thread class from the threading module can also be subclassed. This allows you more fine-grained control over your thread’s creation, execution and eventual deletion. You will encounter subclassed threads often.
Let’s rewrite the previous example using a subclass of Thread. Put the following code into a file named worker_thread_subclass.py:
# worker_thread_subclass.py

import random
import threading
import time

class WorkerThread(threading.Thread):

    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name
        self.id = id(self)

    def run(self):
        """
        Run the thread
        """
        worker(self.name, self.id)

def worker(name: str, instance_id: int) -> None:
    print(f'Started worker {name} - {instance_id}')
    worker_time = random.choice(range(1, 5))
    time.sleep(worker_time)
    print(f'{name} - {instance_id} worker finished in '
          f'{worker_time} seconds')

if __name__ == '__main__':
    for i in range(5):
        thread = WorkerThread(name=f'computer_{i}')
        thread.start()
In this example, you create the WorkerThread class. The constructor of the class, __init__(), accepts a single argument, the name to be given to the thread. This is stored in an instance attribute, self.name. Then you override the run() method.
The run() method is already defined in the Thread class. It controls how the thread will run. By default, it calls the target function that you passed into the class when you created it. When you create your own run() method in your subclass, it is known as overriding the original. This allows you to add custom behavior, such as logging, that the base class’s run() method does not provide.
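One way to add that kind of custom behavior is to wrap the base class’s run() instead of replacing it. The sketch below is an assumption-laden illustration: LoggingThread and its events list are hypothetical names, and the “logging” is just a list of strings plus print() calls.

```python
import threading


class LoggingThread(threading.Thread):
    """A Thread that records when its target starts and finishes."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.events = []

    def run(self) -> None:
        self.events.append('starting')
        print(f'{self.name} is starting')
        try:
            # The base class run() calls target(*args, **kwargs).
            super().run()
        finally:
            self.events.append('finished')
            print(f'{self.name} has finished')


if __name__ == '__main__':
    collected = []
    thread = LoggingThread(target=collected.append, args=('some work',))
    thread.start()
    thread.join()
    print(collected, thread.events)
```

Because run() defers to super().run(), this subclass still works with the usual target, args and kwargs parameters.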
You call the worker() function in the run() method of your WorkerThread. The worker() function itself has a minor change in that it now accepts the instance_id argument which represents the class instance’s unique id. You also need to update the print() functions so that they print out the instance_id.
The other change you need to make is in the __main__ conditional statement, where you call WorkerThread and pass in the name rather than calling threading.Thread() directly as you did in the previous section.
When you call start() in the last line of the code snippet, it arranges for run() to be called for you in the new thread. The start() method is part of the threading.Thread class and you did not override it in your code.
The output when you run this code should be similar to the original version of the code, except that now you are also including the instance id in the output. Give it a try and see for yourself!
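The difference between start() and run() is easy to miss, so here is a small sketch that records which thread executed run(). IdentThread and compare() are hypothetical names for this illustration; threading.get_ident() returns an identifier for the thread that calls it.

```python
import threading


class IdentThread(threading.Thread):
    """Remember the identifier of whichever thread executed run()."""

    def __init__(self) -> None:
        super().__init__()
        self.ran_in = None

    def run(self) -> None:
        self.ran_in = threading.get_ident()


def compare():
    """Return the caller's ident and the idents seen via run() and start()."""
    caller_ident = threading.get_ident()

    direct = IdentThread()
    direct.run()      # an ordinary method call: no new thread is created

    started = IdentThread()
    started.start()   # creates a new thread, which then calls run()
    started.join()

    return caller_ident, direct.ran_in, started.ran_in


if __name__ == '__main__':
    caller_ident, direct_ident, started_ident = compare()
    print(f'run() ran in the calling thread:   {direct_ident == caller_ident}')
    print(f'start() ran in a different thread: {started_ident != caller_ident}')
```

Calling run() yourself executes the code in the current thread, so you almost always want start().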
Writing Multiple Files with Threads
There are several common use cases for using threads. One of those use cases is writing multiple files at once. It’s always nice to see how you would approach a real-world problem, so that’s what you will be doing here.
To get started, you can create a file named writing_thread.py. Then add the following code to your file:
# writing_thread.py

import random
import time
from threading import Thread


class WritingThread(Thread):

    def __init__(self,
                 filename: str,
                 number_of_lines: int,
                 work_time: int = 1) -> None:
        Thread.__init__(self)
        self.filename = filename
        self.number_of_lines = number_of_lines
        self.work_time = work_time

    def run(self) -> None:
        """
        Run the thread
        """
        print(f'Writing {self.number_of_lines} lines of text to '
              f'{self.filename}')
        with open(self.filename, 'w') as f:
            for line in range(self.number_of_lines):
                text = f'This is line {line+1}\n'
                f.write(text)
                time.sleep(self.work_time)
        print(f'Finished writing {self.filename}')

if __name__ == '__main__':
    files = [f'test{x}.txt' for x in range(1, 6)]
    for filename in files:
        work_time = random.choice(range(1, 3))
        number_of_lines = random.choice(range(5, 20))
        thread = WritingThread(filename, number_of_lines, work_time)
        thread.start()
Let’s break this down a little and go over each part of the code individually:
import random
import time
from threading import Thread


class WritingThread(Thread):

    def __init__(self,
                 filename: str,
                 number_of_lines: int,
                 work_time: int = 1) -> None:
        Thread.__init__(self)
        self.filename = filename
        self.number_of_lines = number_of_lines
        self.work_time = work_time
Here you created the WritingThread class. It accepts a filename, a number_of_lines and a work_time. This allows you to create a text file with a specific number of lines. The work_time is for sleeping between writing each line to simulate writing a large or small file.
Let’s look at what goes in run():
    def run(self) -> None:
        """
        Run the thread
        """
        print(f'Writing {self.number_of_lines} lines of text to '
              f'{self.filename}')
        with open(self.filename, 'w') as f:
            for line in range(self.number_of_lines):
                text = f'This is line {line+1}\n'
                f.write(text)
                time.sleep(self.work_time)
        print(f'Finished writing {self.filename}')
This code is where all the magic happens. You print out how many lines of text you will be writing to a file. Then you create the file and write the text to it one line at a time. After each line, you sleep() to add some artificial time to writing the files to disk.
The last piece of code to look at is as follows:
if __name__ == '__main__':
    files = [f'test{x}.txt' for x in range(1, 6)]
    for filename in files:
        work_time = random.choice(range(1, 3))
        number_of_lines = random.choice(range(5, 20))
        thread = WritingThread(filename, number_of_lines, work_time)
        thread.start()
In this final code snippet, you use a list comprehension to create 5 file names. Then you loop over the file names. For each one, you use Python’s random module to choose a random work_time (1 or 2 seconds) and a random number_of_lines (from 5 to 19) to write to the file. Finally you create the WritingThread and start() it.
When you run this code, you will see something like this get output:
Writing 5 lines of text to test1.txt
Writing 18 lines of text to test2.txt
Writing 7 lines of text to test3.txt
Writing 11 lines of text to test4.txt
Writing 11 lines of text to test5.txt
Finished writing test1.txt
Finished writing test3.txt
Finished writing test4.txtFinished writing test5.txt
Finished writing test2.txt
You may notice some odd output like the line a couple of lines from the bottom. This happened because multiple threads happened to write to stdout at once.
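If the jumbled output bothers you, one possible remedy is to let only one thread print at a time. The sketch below assumes a module-level lock shared by all threads; print_lock, safe_print() and report_finished() are names invented for this example, and locks themselves are outside the scope of this chapter.

```python
import threading

# One lock shared by every thread that wants to print.
print_lock = threading.Lock()


def safe_print(message: str) -> None:
    """Print a whole line while holding the lock so lines cannot interleave."""
    with print_lock:
        print(message)


def report_finished(filenames: list) -> None:
    """Print a 'Finished writing' message for each file from its own thread."""
    threads = [
        threading.Thread(target=safe_print,
                         args=(f'Finished writing {name}',))
        for name in filenames
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == '__main__':
    report_finished([f'test{x}.txt' for x in range(1, 6)])
```

The order of the lines can still vary from run to run, but each message should now appear on its own line.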
You can adapt the WritingThread class, along with Python’s urllib.request module, to create an application for downloading files from the Internet. Try that project out on your own.
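If you want a starting point for that project, here is one rough sketch rather than a finished downloader. DownloadThread and download_all() are hypothetical names, there is no error handling or timeout, and the demo “downloads” a local file through a file:// URL so that it runs without a network connection. You would swap in real http(s) URLs of your own.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from threading import Thread


class DownloadThread(Thread):
    """Download a single URL to a file on disk."""

    def __init__(self, url: str, destination: Path) -> None:
        super().__init__()
        self.url = url
        self.destination = destination

    def run(self) -> None:
        print(f'Downloading {self.url}')
        # urlopen() returns a file-like response; copy it to disk in chunks.
        with urllib.request.urlopen(self.url) as response, \
                open(self.destination, 'wb') as f:
            shutil.copyfileobj(response, f)
        print(f'Finished downloading {self.destination}')


def download_all(downloads: dict) -> None:
    """Start one thread per (url, destination) pair and wait for them all."""
    threads = [DownloadThread(url, path) for url, path in downloads.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        # A local stand-in for a remote file, so the demo needs no network.
        source = Path(folder) / 'source.txt'
        source.write_text('This is line 1\n')
        target = Path(folder) / 'copy.txt'
        download_all({source.as_uri(): target})
        print(target.read_text())
```

Downloading is I/O-bound work, so each thread spends most of its time waiting on the network, which is where threads tend to help.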
Wrapping Up
You have learned the basics of threading in Python. In this chapter, you learned about the following:
- Pros of Using Threads
- Cons of Using Threads
- Creating Threads
- Subclassing Thread
- Writing Multiple Files with Threads
There is a lot more to threads and concurrency than what is covered here. You didn’t learn about thread communication, thread pools, or locks, for example. However, you do know the basics of creating threads, which is enough to start using them in your own code. In the next chapter, you will continue to learn about concurrency in Python by discovering how the multiprocessing module works!
Related Articles
- Python 101 – Creating Multiple Processes
- Python 201: A Tutorial on Threads
This article is based on a chapter from Python 101: 2nd Edition. You can purchase Python 101 on Amazon or Leanpub.