Memory modules, Photo by Christiaan Colen

Loading Ridiculously Large Excel Files in Python

Yaakov Bressler
5 min readJun 11, 2020

Sometimes you need to work with .xls or .xlx files. In the event that they’re larger than 1M rows, excel will either load the first 1M rows or crash. This article will demonstrate how to load such files in python.

Background Info on Excel File Structure

Before we dive in, here’s some background information on how spreadsheets work on the backend.

I. Data Structure

Spreadsheets store data as row-wise arrays consisting of column keys with cell values.¹

The functionality of spreadsheets deride from their behavior as a directed acyclic graph (DAG) which allow cyclical (arbitrary) relational functionality.² This is best demonstrated with the illustration below which shows how some cell values are derived from their relationships to others.

II. Size Limits

Current size limits for excel are 1,048,576 rows by 16,384 columns — owing to memory resources. To be clear, data can be stored in an excel file which breaks these rules — but will not function in the excel program.

--

--

Yaakov Bressler
Yaakov Bressler

Written by Yaakov Bressler

Data Engineer @ Capital One. Editor in Chief @ Data Engineer Things. More about me at www.yaakovbressler.com

Responses (6)