Parsing fixed-width files in Python

I have some census data in a fixed-width format that I’d like to parse: stuff like “columns 23-26 are the year in 4-digit form”. It’s easy enough to parse this ad hoc with string slicing. But by the time you handle naming, type conversion, stripping padding, validation, and so on, you end up with a fair amount of code. You can parse CSV with just string split too, but anyone sane uses the CSV module. Is there a good fixed-width module?

Not that I could find. I gave up and just did the ad hoc thing.
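For illustration, the ad hoc approach might look something like the sketch below. Only the "columns 23-26 are the year" rule comes from the data described above; the `Field` class, the `state` field, and the sample line are made up for the example. Positions are 1-based and inclusive, the way a data dictionary usually lists them, and you only declare the columns you care about.

```python
from typing import Any, Callable, NamedTuple


class Field(NamedTuple):
    name: str
    start: int  # 1-based first column, as a data dictionary would list it
    end: int    # 1-based last column, inclusive
    convert: Callable[[str], Any] = str


# Hypothetical layout: only the year (columns 23-26) comes from the post.
FIELDS = [
    Field("state", 1, 2),
    Field("year", 23, 26, int),
]


def parse_line(line: str, fields=FIELDS) -> dict:
    """Slice out each declared field, strip padding, and convert its type."""
    record = {}
    for f in fields:
        # Convert 1-based inclusive columns to a 0-based half-open slice.
        raw = line[f.start - 1:f.end].strip()
        if not raw:
            # Blank or missing columns become None instead of failing.
            record[f.name] = None
            continue
        try:
            record[f.name] = f.convert(raw)
        except ValueError as exc:
            raise ValueError(
                f"bad {f.name!r} in columns {f.start}-{f.end}: {raw!r}"
            ) from exc
    return record


# Made-up sample: state in columns 1-2, padding, year in columns 23-26.
sample = "CA" + " " * 20 + "2010" + "  trailing data"
print(parse_line(sample))
```

Columns that are not listed in `FIELDS` are simply ignored, so there is no need to describe the whole record just to pull out one field.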

I thought FixedWidth was a candidate, but after 20 minutes of trying it I gave up. There are packaging problems, the docs are poor, and the tests are incomplete. The API is weird and seems designed more for emitting fixed-width data than for parsing it. The final reason I gave up is that it seems to require a full schema: you can’t parse columns 99-103 unless you’ve said what to do with columns 1-98 first. That was a nuisance.

The other option I found was Pandas read_fwf. I didn’t try it because Pandas is overkill for my project. But I know from CSV work that DataFrame is really nice, and the Pandas CSV support is quite comprehensive. I also know that even after parsing with read_csv you still have to do a lot of work to get the data into a clean DataFrame. I’d definitely look into using this for more serious work.
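For reference, a read_fwf call for this kind of layout might look roughly like the sketch below. It has not been run against the real census files; the two sample rows and the `state` column are invented, and only the year in columns 23-26 matches the data described above. Note that `colspecs` uses 0-based, half-open intervals, so columns 23-26 become `(22, 26)`.

```python
import io

import pandas as pd

# Made-up sample data: state in columns 1-2, year in columns 23-26.
data = io.StringIO(
    "CA" + " " * 20 + "2010\n"
    "NY" + " " * 20 + "2000\n"
)

# Only the listed column ranges are read; everything else is skipped.
df = pd.read_fwf(
    data,
    colspecs=[(0, 2), (22, 26)],  # 0-based, half-open [from, to)
    names=["state", "year"],
    header=None,                  # the file has no header row
)
print(df)
```

Like the slicing approach, this only needs the column ranges you actually want, and Pandas infers the integer type for the year column on its own.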

Related question: are there standard metadata descriptions for fixed-width formats? The census data comes with things called data dictionaries that are clearly meant to be parseable. But they come in at least two formats right on the site. I feel like I’ve seen other government records with similar metadata descriptions.

Further reading: Extract, transform, and load census data with python.
