Python Excel Tutorial: The Definitive Guide

Originally published at https://www.datacamp.com/community/tutorials/python-excel-tutorial

You probably already know that Excel is a spreadsheet application developed by Microsoft. You can use this easily accessible tool to organize, analyze and store your data in tables. What’s more, this software is widely used in many different fields all over the world.

And, whether you like it or not, this applies to data science.

You’ll need to deal with these spreadsheets at some point, but you won’t always want to keep working in Excel itself. That’s why Python developers have implemented ways to read, write and manipulate not only these files, but also many other types of files.

Today’s tutorial will give you some insights into how you can work with Excel and Python. It will provide you with an overview of packages that you can use to load and write these spreadsheets to files with the help of Python. You’ll learn how to work with packages such as pandas, openpyxl, xlrd, xlutils and pyexcel.

It might also be interesting for you to take a look at DataCamp’s Importing Data in Python course. If you also want to know more about how to read files into R, consider taking DataCamp’s R Tutorial on Reading and Importing Excel Files into R.


Starting Point: The Data

When you’re starting a data science project, you will often work from data that you have gathered yourself, perhaps through web scraping, but most of the time from datasets that you download from other places, such as Kaggle, Quandl, etc.

But more often than not, you’ll also find data on Google or in repositories that are shared by other users. This data might be in an Excel file or saved with a .csv extension, among other formats. The possibilities can seem endless sometimes. But whenever you have data, your first step should be to make sure that you’re working with quality data.

In the case of a spreadsheet, you should verify its quality on two counts: whether the data can answer the research question that you have in mind, and whether you can trust the data that the spreadsheet holds.

Quality of Your Excel Spreadsheet

To check the overall quality of your spreadsheet, you can go over the following checklist:

Does the spreadsheet represent static data?

Does your spreadsheet mix data, calculation, and reporting?

Is the data in your spreadsheet complete and consistent?

Does your spreadsheet have a systematic worksheet structure?

Did you check if the live formulas in the spreadsheet are valid?

This list of questions is to make sure that your spreadsheet doesn’t ‘sin’ against the best practices that are generally accepted in the industry. Of course, the above list is not exhaustive: there are many more general rules that you can follow to make sure your spreadsheet is not an ugly duckling. However, the questions formulated above are the most relevant ones when you want to check whether the spreadsheet is of good quality.

Quality of Your Data

Before reading your spreadsheet into Python, you should also consider adjusting your file to meet some basic principles, such as:

The first row of the spreadsheet is usually reserved for the header, while the first column is used to identify the sampling unit;

Avoid names, values or fields with blank spaces. Otherwise, each word will be interpreted as a separate variable, resulting in errors that are related to the number of elements per line in your data set. Consider using underscores, dashes, camel case, or concatenating words;

Short names are preferred over longer names;

Try to avoid using names that contain symbols such as ?, $, %, ^, &, *, (, ), -, #, the comma, <, >, /, |, \, [, ], {, and };

Delete any comments that you have made in your file to avoid extra columns or NA’s being added to your file; and

Make sure that any missing values in your data set are indicated with NA.
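As a rough illustration of the naming and missing-value principles above, here is a minimal sketch that uses only the standard library. The helper names `clean_column_name` and `normalize_missing`, the sample header, and the list of spellings treated as missing are all assumptions made for this example; they are not part of any package covered in this tutorial.

```python
import re


def clean_column_name(name: str) -> str:
    """Return a header name without blank spaces or symbols (hypothetical helper)."""
    # Collapse every run of characters that is not an ASCII letter, digit or
    # underscore into a single underscore.
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    # Drop underscores left over at the start or end of the name.
    return cleaned.strip("_")


def normalize_missing(value: str, markers=("", "-", "n/a", "null")) -> str:
    """Map a few assumed spellings of a missing value to NA (hypothetical helper)."""
    return "NA" if value.strip().lower() in markers else value


# Made-up header and values, for illustration only.
header = ["Sample ID", "Price ($)", "Growth %", "notes?"]
values = ["12.5", "", "N/A", "7"]

print([clean_column_name(h) for h in header])
print([normalize_missing(v) for v in values])
```

Underscores are used here as the replacement character because the dash appears in the list of symbols to avoid; adjust the pattern if your own naming convention differs.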

Next, once you have made the necessary changes or taken a thorough look at your data, make sure that you save any changes you have made. That way, you can revisit the data later to edit it or add more data, while preserving any formulas that you used to calculate the data.

If you’re working with Microsoft Excel, you’ll see that there are a considerable number of options for saving your file: besides the default extensions .xls and .xlsx, you can go to the “File” tab, click on “Save As” and select one of the extensions that are listed as the “Save as Type” options. The most commonly used extensions for saving datasets for data science are .csv and .txt (as a tab-delimited text file). Depending on the saving option that you choose, your data set’s fields are separated by tabs or commas, which make up the “field separator characters” of your data set.
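To see why the field separator matters, consider the following minimal sketch based on Python’s built-in csv module. The two text snippets are made-up stand-ins for a comma-separated and a tab-delimited export of the same table, and `read_rows` is a hypothetical helper written for this example.

```python
import csv
import io

# Two made-up exports of the same small table; the values are illustrative only.
comma_text = "id,city,price\n1,Ghent,12.5\n2,Leuven,NA\n"
tab_text = "id\tcity\tprice\n1\tGhent\t12.5\n2\tLeuven\tNA\n"


def read_rows(text: str, delimiter: str) -> list:
    """Parse delimited text into a list of rows using the given field separator."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = read_rows(comma_text, ",")
tab_rows = read_rows(tab_text, "\t")

# Reading tab-delimited text with the wrong separator keeps each line as one field.
wrong_rows = read_rows(tab_text, ",")

print(comma_rows == tab_rows)  # same table once the right separator is used
print(len(wrong_rows[0]))      # number of fields found in the header row
```

With the matching separator, both exports yield the same rows; with the wrong one, the header collapses into a single field, which is the kind of error the principles above are meant to prevent.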

Now that you have checked and saved your data, you can start preparing your workspace!

Prepping Your Workspace

Preparing your workspace is one of the first things that you can do to make sure that you start off well. The first step is to check your working directory.

When you’re working in the terminal, you might first navigate to the directory that your file is located in and then start up Python. That also means that you have to make sure that your file is located in the directory that you want to work from!

But perhaps more importantly, if you have already started your Python session and you’ve got no clue of the directory that you’re working in, you should consider executing the following commands:

    # Import `os`
    import os

    # Retrieve current working directory (`cwd`)
    cwd = os.getcwd()

    # Change directory
    os.chdir("/path/to/your/folder")

    # List all files and directories in current directory
    os.listdir('.')

Great, huh?

You’ll see that these commands are pretty vital not only for loading your data but also for further analysis. For now, let’s just continue: you have gone through all the checkups, you have saved your data and prepped your workspace.
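If you prefer to check specifically for data files rather than list everything, the working-directory commands above can be combined into a small helper. This is only a sketch: `find_spreadsheets` is a hypothetical function, and the set of extensions is an assumption based on the file types mentioned earlier in this tutorial.

```python
from pathlib import Path


def find_spreadsheets(folder="."):
    """List files in `folder` that look like spreadsheets or delimited text files."""
    # Extensions mentioned earlier in the tutorial; extend this set as needed.
    extensions = {".xls", ".xlsx", ".csv", ".txt"}
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


# Show where Python is currently working and which data files it can see there.
print(Path.cwd())
print(find_spreadsheets())
```

An empty list is a hint that you are not in the directory you think you are in, or that your file was saved somewhere else.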

Can you already start with reading the data in Python?

Unfortunately, you’ll still need to do one last thing.

Even though you don’t have an idea yet of the packages that you’ll need to import your data, you do have to make sure that you have everything ready to install those packages when the time comes.

Pip

That’s why you need to have pip and setuptools installed. If you have Py
