Originally published at https://www.datacamp.com/community/tutorials/python-excel-tutorial
You will probably already know that Excel is a spreadsheet application developed by Microsoft. You can use this easily accessible tool to organize, analyze and store your data in tables. What’s more, this software is widely used in many different application fields all over the world.
And, whether you like it or not, this applies to data science.
You’ll need to deal with these spreadsheets at some point, but you won’t always want to keep working in them. That’s why Python developers have implemented ways to read, write and manipulate not only these files, but also many other types of files.
Today’s tutorial will give you some insights into how you can work with Excel and Python. It gives you an overview of the packages that you can use to load these spreadsheets and write them to files with the help of Python. You’ll learn how to work with packages such as pandas, openpyxl, xlrd, xlutils and pyexcel.
It might also be interesting for you to take a look at DataCamp’s Importing Data in Python course. If you also want to know more about how to read files into R, consider taking DataCamp’s R Tutorial on Reading and Importing Excel Files into R.

Starting Point: The Data
When you’re starting a data science project, you will often work from data that you have gathered yourself, for example through web scraping, but probably mostly from datasets that you download from other places, such as Kaggle or Quandl.
But more often than not, you’ll also find data on Google or in repositories that are shared by other users. This data might be in an Excel file or saved to a file with a .csv extension; the possibilities can seem endless sometimes. But whenever you have data, your first step should be to make sure that you’re working with quality data.
In the case of a spreadsheet, you should verify its quality on two counts: you want to check not only whether the data can answer the research question that you have in mind, but also whether you can trust the data that the spreadsheet holds.
Quality of Your Excel Spreadsheet
To check the overall quality of your spreadsheet, you can go over the following checklist:
Does the spreadsheet represent static data?
Does your spreadsheet mix data, calculation, and reporting?
Is the data in your spreadsheet complete and consistent?
Does your spreadsheet have a systematic worksheet structure?
Did you check if the live formulas in the spreadsheet are valid?
This list of questions is meant to make sure that your spreadsheet doesn’t ‘sin’ against the best practices that are generally accepted in the industry. Of course, the above list is not exhaustive: there are many more general rules that you can follow to make sure your spreadsheet is not an ugly duckling. However, the questions above are the most relevant ones when you want to judge the quality of a spreadsheet.
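The "complete and consistent" question from the checklist can be partly automated once the sheet’s contents are available as rows. The following is only a minimal sketch: it assumes the data has already been read into a list of lists with the header first, and the name `check_rows` and the sample rows are made up for illustration.

```python
def check_rows(rows):
    """Report rows whose length differs from the header, and count empty cells."""
    header, *body = rows
    # Row numbers follow spreadsheet convention: the header is row 1.
    ragged = [number for number, row in enumerate(body, start=2)
              if len(row) != len(header)]
    empty = sum(1 for row in body for cell in row
                if cell is None or str(cell).strip() == "")
    return {"columns": len(header), "ragged_rows": ragged, "empty_cells": empty}


# Hypothetical sample data: one empty cell and one row that is too short.
rows = [
    ["id", "name", "score"],
    [1, "Ann", 9.5],
    [2, "", 7.0],
    [3, "Cy"],
]
report = check_rows(rows)
print(report)
```

A check like this only covers part of the list; questions about worksheet structure or live formulas still need a look at the spreadsheet itself.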
Quality of Your Data
Before reading your spreadsheet into Python, you also want to consider adjusting your file to meet some basic principles, such as:
The first row of the spreadsheet is usually reserved for the header, while the first column is used to identify the sampling unit;
Avoid names, values or fields with blank spaces. Otherwise, each word will be interpreted as a separate variable, resulting in errors that are related to the number of elements per line in your data set. Consider using underscores, dashes, camel case, or concatenating words;
Short names are preferred over longer names;
Try to avoid using names that contain symbols such as ?, $, %, ^, &, *, (, ), -, #, the comma, <, >, /, |, \, [, ], { and };
Delete any comments that you have made in your file to avoid extra columns or NA’s being added to your file; and
Make sure that any missing values in your data set are indicated with NA.
Next, after you have made the necessary changes or when you have taken a thorough look at your data, make sure that you save your changes if you have made any. By doing this, you can revisit the data later to edit it, to add more data or to change it, while you preserve the formulas that you may have used to calculate the data, etc.
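The naming and missing-value principles above can also be applied programmatically. This is a hedged sketch using only the standard library; the helper names `clean_name` and `clean_cell` and the sample header are hypothetical, and the exact rules (underscores for blanks, ASCII-only names) are just one possible choice.

```python
import re


def clean_name(name):
    """Turn a raw header cell into a symbol-free, underscore-separated name."""
    name = name.strip()
    name = re.sub(r"\s+", "_", name)            # blank spaces -> underscores
    name = re.sub(r"[^0-9A-Za-z_]", "", name)   # keep only ASCII letters, digits, underscores
    return re.sub(r"_+", "_", name).strip("_")  # tidy up leftover underscores


def clean_cell(value, missing="NA"):
    """Mark empty or missing cells with NA."""
    if value is None or str(value).strip() == "":
        return missing
    return value


# Hypothetical header row that breaks several of the principles.
header = ["Customer ID", " Total ($)", "Growth %", "notes?"]
print([clean_name(h) for h in header])
# ['Customer_ID', 'Total', 'Growth', 'notes']

print([clean_cell(v) for v in [1, "", None, "x"]])
# [1, 'NA', 'NA', 'x']
```

Note that this sketch drops non-ASCII letters as well, so adjust the pattern if your column names legitimately contain them.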
If you’re working with Microsoft Excel, you’ll see that there are a considerable number of options to save your file: besides the default extension .xls or .xlsx, you can go to the “File” tab, click on “Save As” and select one of the extensions that are listed as the “Save as Type” options. The most commonly used extensions to save datasets for data science are .csv and .txt (as a tab-delimited text file). Depending on the saving option that you choose, your data set’s fields are separated by tabs or commas, which make up the “field separator characters” of your data set.
Now that you have checked and saved your data, you can start preparing your workspace!
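To make the idea of a field separator concrete, here is a small sketch with Python’s built-in `csv` module. The sample text and the helper name `read_rows` are invented for illustration; with real exports you would open the saved file instead of using an in-memory string.

```python
import csv
import io

# The same hypothetical table, once comma-separated and once tab-separated.
comma_text = "id,name\n1,Ann\n2,Bob\n"
tab_text = "id\tname\n1\tAnn\n2\tBob\n"


def read_rows(text, delimiter):
    """Parse delimited text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = read_rows(comma_text, ",")
tab_rows = read_rows(tab_text, "\t")

print(comma_rows)              # [['id', 'name'], ['1', 'Ann'], ['2', 'Bob']]
print(comma_rows == tab_rows)  # True
```

The only thing that changes between the two formats is the `delimiter` argument; all values come back as strings either way.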

Prepping Your Workspace
Preparing your workspace is one of the first things that you can do to make sure that you start off well. The first step is to check your working directory.
When you’re working in the terminal, you might first navigate to the directory that your file is located in and then start up Python. That also means that you have to make sure that your file is located in the directory that you want to work from!
But perhaps more importantly, if you have already started your Python session and you’ve got no clue of the directory that you’re working in, you should consider executing the following commands:
    # Import `os`
    import os

    # Retrieve current working directory (`cwd`)
    cwd = os.getcwd()

    # Change directory
    os.chdir("/path/to/your/folder")

    # List all files and directories in current directory
    os.listdir('.')

Great, huh?
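Building on the commands above, you may only care about the data files in a folder rather than everything it contains. The sketch below uses the standard `pathlib` module; the function name `find_data_files` and the default list of extensions are assumptions made for this example.

```python
from pathlib import Path


def find_data_files(folder=".", extensions=(".xlsx", ".xls", ".csv", ".txt")):
    """Return the sorted names of files in `folder` with a spreadsheet-like extension."""
    folder = Path(folder)
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Show where Python is currently working and which data files are there.
print(Path.cwd())
print(find_data_files())
```

The extension check is case-insensitive, so a file such as `DATA.CSV` is picked up as well.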
You’ll see that these commands are pretty vital not only for loading your data but also for further analysis. For now, let’s just continue: you have gone through all the checkups, you have saved your data and prepped your workspace.
Can you already start with reading the data in Python?
Unfortunately, you’ll still need to do one last thing.
Even though you don’t have an idea yet of the packages that you’ll need to import your data, you do have to make sure that you have everything ready to install those packages when the time comes.
Pip
That’s why you need to have pip and setuptools installed. If you have Py