The business world uses Microsoft Office. Their spreadsheet software solution, Microsoft Excel, is especially popular. Excel is used to store tabular data, create reports, graph trends, and much more. Before diving into working with Excel with Python, let’s clarify some special terminology:
- Spreadsheet or Workbook – The file itself (.xls or .xlsx).
- Worksheet or Sheet – A single sheet of content within a Workbook. Spreadsheets can contain multiple Worksheets.
- Column – A vertical line of data that is labeled with letters, starting with “A”.
- Row – A horizontal line of data labeled with numbers, starting with 1.
- Cell – A combination of Column and Row, like “A1”.
In this article, you will be using Python to work with Excel Spreadsheets. You will learn about the following:
- Python Excel Packages
- Getting Sheets from a Workbook
- Reading Cell Data
- Iterating Over Rows and Columns
- Writing Excel Spreadsheets
- Adding and Removing Sheets
- Adding and Deleting Rows and Columns
Excel is used by most companies and universities. It can be used in many different ways and enhanced using Visual Basic for Applications (VBA). However, VBA is kind of clunky — which is why it’s good to learn how to use Excel with Python.
Let’s find out how to work with Microsoft Excel spreadsheets using the Python programming language now!
Python Excel Packages
You can use Python to create, read and write Excel spreadsheets. However, Python’s standard library does not have support for working with Excel; to do so, you will need to install a 3rd party package. The most popular one is OpenPyXL. You can read its documentation here:
OpenPyXL is not your only choice. There are several other packages that support Microsoft Excel:
- xlrd – For reading older Excel (.xls) documents
- xlwt – For writing older Excel (.xls) documents
- xlwings – Works with new Excel formats and has macro capabilities
A couple years ago, the first two used to be the most popular libraries to use with Excel documents. However, the author of those packages has stopped supporting them. The xlwings package has lots of promise, but does not work on all platforms and requires that Microsoft Excel is installed.
You will be using OpenPyXL in this article because it is actively developed and supported. OpenPyXL doesn’t require Microsoft Excel to be installed, and it works on all platforms.
You can install OpenPyXL using pip
:
$ python -m pip install openpyxl
After the installation has completed, let’s find out how to use OpenPyXL to read an Excel spreadsheet!
Getting Sheets from a Workbook
The first step is to find an Excel file to use with OpenPyXL. There is a books.xlsx
file that is provided for you in this book’s Github repository. You can download it by going to this URL:
Feel free to use your own file, although the output from your own file won’t match the sample output in this book.
The next step is to write some code to open the spreadsheet. To do that, create a new file named open_workbook.py
and add this code to it:
# open_workbook.py from openpyxl import load_workbook def open_workbook(path): workbook = load_workbook(filename=path) print(f'Worksheet names: {workbook.sheetnames}') sheet = workbook.active print(sheet) print(f'The title of the Worksheet is: {sheet.title}') if __name__ == '__main__': open_workbook('books.xlsx')
In this example, you import load_workbook()
from openpyxl
and then create open_workbook()
which takes in the path to your Excel spreadsheet. Next, you use load_workbook()
to create an openpyxl.workbook.workbook.Workbook
object. This object allows you to access the sheets and cells in your spreadsheet. And yes, it really does have the double workbook
in its name. That’s not a typo!
The rest of the open_workbook()
function demonstrates how to print out all the currently defined sheets in your spreadsheet, get the currently active sheet and print out the title of that sheet.
When you run this code, you will see the following output:
Worksheet names: ['Sheet 1 - Books'] <Worksheet "Sheet 1 - Books"> The title of the Worksheet is: Sheet 1 - Books
Now that you know how to access the sheets in the spreadsheet, you are ready to move on to accessing cell data!
Reading Cell Data
When you are working with Microsoft Excel, the data is stored in cells. You need a way to access those cells from Python to be able to extract that data. OpenPyXL makes this process straight-forward.
Create a new file named workbook_cells.py
and add this code to it:
# workbook_cells.py from openpyxl import load_workbook def get_cell_info(path): workbook = load_workbook(filename=path) sheet = workbook.active print(sheet) print(f'The title of the Worksheet is: {sheet.title}') print(f'The value of {sheet["A2"].value=}') print(f'The value of {sheet["A3"].value=}') cell = sheet['B3'] print(f'{cell.value=}') if __name__ == '__main__': get_cell_info('books.xlsx')
This code will load up the Excel file in an OpenPyXL workbook. You will grab the active sheet and then print out its title
and a couple of different cell values. You can access a cell by using the sheet object followed by square brackets with the column name and row number inside of it. For example, sheet["A2"]
will get you the cell at column “A”, row 2. To get the value of that cell, you use the value
attribute.
Note: This code is using a new feature that was added to f-strings in Python 3.8. If you run this with an earlier version, you will receive an error.
When you run this code, you will get this output:
<Worksheet "Sheet 1 - Books"> The title of the Worksheet is: Sheet 1 - Books The value of sheet["A2"].value='Title' The value of sheet["A3"].value='Python 101' cell.value='Mike Driscoll'
You can get additional information about a cell using some of its other attributes. Add the following function to your file and update the conditional statement at the end to run it:
def get_info_by_coord(path): workbook = load_workbook(filename=path) sheet = workbook.active cell = sheet['A2'] print(f'Row {cell.row}, Col {cell.column} = {cell.value}') print(f'{cell.value=} is at {cell.coordinate=}') if __name__ == '__main__': get_info_by_coord('books.xlsx')
In this example, you use the row
and column
attributes of the cell
object to get the row and column information. Note that column “A” maps to “1”, “B” to “2”, etcetera. If you were to iterate over the Excel document, you could use the coordinate
attribute to get the cell name.
When you run this code, the output will look like this:
Row 2, Col 1 = Title cell.value='Title' is at cell.coordinate='A2'
Speaking of iterating, let’s find out how to do that next!
Iterating Over Rows and Columns
Sometimes you will need to iterate over the entire Excel spreadsheet or portions of the spreadsheet. OpenPyXL allows you to do that in a few different ways. Create a new file named iterating_over_cells.py
and add the following code to it:
# iterating_over_cells.py from openpyxl import load_workbook def iterating_range(path): workbook = load_workbook(filename=path) sheet = workbook.active for cell in sheet['A']: print(cell) if __name__ == '__main__': iterating_range('books.xlsx')
Here you load up the spreadsheet and then loop over all the cells in column “A”. For each cell, you print out the cell
object. You could use some of the cell attributes you learned about in the previous section if you wanted to format the output more granularly.
This what you get from running this code:
<Cell 'Sheet 1 - Books'.A1> <Cell 'Sheet 1 - Books'.A2> <Cell 'Sheet 1 - Books'.A3> <Cell 'Sheet 1 - Books'.A4> <Cell 'Sheet 1 - Books'.A5> <Cell 'Sheet 1 - Books'.A6> <Cell 'Sheet 1 - Books'.A7> <Cell 'Sheet 1 - Books'.A8> <Cell 'Sheet 1 - Books'.A9> <Cell 'Sheet 1 - Books'.A10> # output truncated for brevity
The output is truncated as it will print out quite a few cells by default. OpenPyXL provides other ways to iterate over rows and columns by using the iter_rows()
and iter_cols()
functions. These methods accept several arguments:
min_row
max_row
min_col
max_col
You can also add on a values_only
argument that tells OpenPyXL to return the value of the cell instead of the cell object. Go ahead and create a new file named iterating_over_cell_values.py
and add this code to it:
# iterating_over_cell_values.py from openpyxl import load_workbook def iterating_over_values(path): workbook = load_workbook(filename=path) sheet = workbook.active for value in sheet.iter_rows( min_row=1, max_row=3, min_col=1, max_col=3, values_only=True, ): print(value) if __name__ == '__main__': iterating_over_values('books.xlsx')
This code demonstrates how you can use the iter_rows()
to iterate over the rows in the Excel spreadsheet and print out the values of those rows. When you run this code, you will get the following output:
('Books', None, None) ('Title', 'Author', 'Publisher') ('Python 101', 'Mike Driscoll', 'Mouse vs Python')
The output is a Python tuple that contains the data within each column. At this point you have learned how to open spreadsheets and read data — both from specific cells, as well as through iteration. You are now ready to learn how to use OpenPyXL to create Excel spreadsheets!
Writing Excel Spreadsheets
Creating an Excel spreadsheet using OpenPyXL doesn’t take a lot of code. You can create a spreadsheet by using the Workbook()
class. Go ahead and create a new file named writing_hello.py
and add this code to it:
# writing_hello.py from openpyxl import Workbook def create_workbook(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['A2'] = 'from' sheet['A3'] = 'OpenPyXL' workbook.save(path) if __name__ == '__main__': create_workbook('hello.xlsx')
Here you instantiate Workbook()
and get the active sheet. Then you set the first three rows in column “A” to different strings. Finally, you call save()
and pass it the path
to save the new document to. Congratulations! You have just created an Excel spreadsheet with Python.
Let’s discover how to add and remove sheets in your Workbook next!
Adding and Removing Sheets
Many people like to organize their data across multiple Worksheets within the Workbook. OpenPyXL supports the ability to add new sheets to a Workbook()
object via its create_sheet()
method.
Create a new file named creating_sheets.py
and add this code to it:
# creating_sheets.py import openpyxl def create_worksheets(path): workbook = openpyxl.Workbook() print(workbook.sheetnames) # Add a new worksheet workbook.create_sheet() print(workbook.sheetnames) # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': create_worksheets('sheets.xlsx')
Here you use create_sheet()
twice to add two new Worksheets to the Workbook. The second example shows you how to set the title of a sheet and at which index to insert the sheet. The argument index=1
means that the worksheet will be added after the first existing worksheet, since they are indexed starting at 0
.
When you run this code, you will see the following output:
['Sheet'] ['Sheet', 'Sheet1'] ['Sheet', 'Second sheet', 'Sheet1']
You can see that the new sheets have been added step-by-step to your Workbook. After saving the file, you can verify that there are multiple Worksheets by opening Excel or another Excel-compatible application.
After this automated worksheet-creation process, you’ve suddenly got too many sheets, so let’s get rid of some. There are two ways to remove a sheet. Go ahead and create delete_sheets.py
to see how to use Python’s del
keyword for removing worksheets:
# delete_sheets.py import openpyxl def create_worksheets(path): workbook = openpyxl.Workbook() workbook.create_sheet() # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) del workbook['Second sheet'] print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': create_worksheets('del_sheets.xlsx')
This code will create a new Workbook and then add two new Worksheets to it. Then it uses Python’s del
keyword to delete workbook['Second sheet']
. You can verify that it worked as expected by looking at the print-out of the sheet list before and after the del
command:
['Sheet', 'Second sheet', 'Sheet1'] ['Sheet', 'Sheet1']
The other way to delete a sheet from a Workbook is to use the remove()
method. Create a new file called remove_sheets.py
and enter this code to learn how that works:
# remove_sheets.py import openpyxl def remove_worksheets(path): workbook = openpyxl.Workbook() sheet1 = workbook.create_sheet() # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) workbook.remove(sheet1) print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': remove_worksheets('remove_sheets.xlsx')
This time around, you hold onto a reference to the first Worksheet that you create by assigning the result to sheet1
. Then you remove it later on in the code. Alternatively, you could also remove that sheet by using the same syntax as before, like this:
workbook.remove(workbook['Sheet1'])
No matter which method you choose for removing the Worksheet, the output will be the same:
['Sheet', 'Second sheet', 'Sheet1'] ['Sheet', 'Second sheet']
Now let’s move on and learn how you can add and remove rows and columns.
Adding and Deleting Rows and Columns
OpenPyXL has several useful methods that you can use for adding and removing rows and columns in your spreadsheet. Here is a list of the four methods you will learn about in this section:
.insert_rows()
.delete_rows()
.insert_cols()
.delete_cols()
Each of these methods can take two arguments:
idx
– The index to insert the row or columnamount
– The number of rows or columns to add
To see how this works, create a file named insert_demo.py
and add the following code to it:
# insert_demo.py from openpyxl import Workbook def inserting_cols_rows(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['A2'] = 'from' sheet['A3'] = 'OpenPyXL' # insert a column before A sheet.insert_cols(idx=1) # insert 2 rows starting on the second row sheet.insert_rows(idx=2, amount=2) workbook.save(path) if __name__ == '__main__': inserting_cols_rows('inserting.xlsx')
Here you create a Worksheet and insert a new column before column “A”. Columns are indexed started at 1 while in contrast, worksheets start at 0. This effectively moves all the cells in column A to column B. Then you insert two new rows starting on row 2.
Now that you know how to insert columns and rows, it is time for you to discover how to remove them.
To find out how to remove columns or rows, create a new file named delete_demo.py
and add this code:
# delete_demo.py from openpyxl import Workbook def deleting_cols_rows(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['B1'] = 'from' sheet['C1'] = 'OpenPyXL' sheet['A2'] = 'row 2' sheet['A3'] = 'row 3' sheet['A4'] = 'row 4' # Delete column A sheet.delete_cols(idx=1) # delete 2 rows starting on the second row sheet.delete_rows(idx=2, amount=2) workbook.save(path) if __name__ == '__main__': deleting_cols_rows('deleting.xlsx')
This code creates text in several cells and then removes column A using delete_cols()
. It also removes two rows starting on the 2nd row via delete_rows()
. Being able to add and remove columns and rows can be quite useful when it comes to organizing your data.
Wrapping Up
Due to the widespread use of Excel in many industries, it is an extremely useful skill to be able to interact with Excel files using Python. In this article, you learned about the following:
- Python Excel Packages
- Getting Sheets from a Workbook
- Reading Cell Data
- Iterating Over Rows and Columns
- Writing Excel Spreadsheets
- Adding and Removing Sheets
- Adding and Deleting Rows and Columns
OpenPyXL can do even more than what was covered here. For example, you can add formulas to cells, change fonts and apply other types of styling to cells using OpenPyXL. Read the documentation and try using OpenPyXL on some of your own spreadsheets so that you can discover its full power.
Pingback: Sentiment Analysis, Fourier Transforms, and More Python Data Science – The Real Python Podcast - Business 2020