How to Read Data from Excel in C

Cody Schneider · 8 min read

Need to pull data from an Excel spreadsheet directly into your C program? You're in the right place. While C might not be the go-to language for quick and dirty data scripting, it’s unbeatable in scenarios that demand high performance, low-level memory control, or integration with existing legacy systems. This guide will walk you through a practical method for reading modern .xlsx files, an essential skill for handling business data or scientific datasets in C.

Why Read Excel Files in C?

In a world with Python and R, using C to read an Excel file might seem like choosing a manual screwdriver over a power drill. However, there are compelling reasons why this skill is valuable:

  • Performance-Critical Applications: For data-intensive applications in fields like finance or scientific computing, the speed of C is non-negotiable. Processing large datasets directly without the overhead of an interpreted language can lead to significant performance gains.
  • Legacy System Integration: You might be working on a well-established C or C++ codebase that needs to interact with data exported from modern business tools, which often comes in Excel formats.
  • Embedded Systems: Many embedded systems run on C. If a device needs to load configuration settings or process input from a .xlsx file stored on a memory card, you'll need a C-based solution.
  • Self-Contained Executables: Sometimes you need to ship a single binary with minimal runtime dependencies. C libraries can be linked statically, so no interpreter or language runtime has to be installed on the target machine.

The Challenge: What Exactly is an .xlsx File?

Before writing a single line of code, it’s important to understand that a modern Excel file (.xlsx) isn't a simple grid of values like a CSV. It's actually a compressed ZIP archive containing a very specific structure of XML files and folders. If you rename a report.xlsx file to report.zip, you can open it and see the contents for yourself.

Inside, you'll find an xl/ folder containing files such as:

  • xl/workbook.xml: Contains information about the workbook itself, including the names and order of the sheets.
  • xl/worksheets/sheet1.xml: Contains the actual data for a specific worksheet, organized into rows and cells.
  • xl/sharedStrings.xml: For efficiency, Excel stores each unique string value only once in this file. Cells in the worksheet then reference these strings by index instead of repeating the text.

Parsing this structure from scratch would be an enormous task. You'd need to decompress the archive and then build a robust XML parser to make sense of the tangled web of relationships. Fortunately, we can rely on existing open-source libraries to handle the heavy lifting for us.

The Strategy: Unzip, Parse, and Reconstruct

Our approach will involve two key libraries to tackle the two main challenges:

  1. libzip: A C library for reading, creating, and modifying Zip archives. We will use it to open the .xlsx package and access the individual XML files within it, like sharedStrings.xml and worksheets/sheet1.xml.
  2. libxml2: A powerful and widely used XML parser. We’ll use it to navigate the structure of the XML files we extract, pulling out the row, cell, and value information we need.

This approach gives you maximum flexibility and foundational understanding, because you're working directly with the core components of the .xlsx format.

Step-by-Step: Let's Read an Excel File

Let's build a simple C program that opens an Excel file, reads the contents of the first sheet, and prints it to the console. For this example, assume we have a simple Excel file named sales_data.xlsx with the following data in Sheet1:

(In Excel)

        A          B        C
  1     Product    Region   Sales
  2     Widget A   North    1500
  3     Gadget B   South    2200
  4     Widget A   North    1800

Step 1: Setting Up Your Environment

First, you need to install the development files for libzip and libxml2. The process varies depending on your operating system.

On Debian/Ubuntu:

sudo apt-get install libzip-dev libxml2-dev

On macOS (using Homebrew):

brew install libzip libxml2

On Windows: This is the most complex environment. The easiest path is to use the Windows Subsystem for Linux (WSL) and follow the Ubuntu instructions. Alternatively, you can use a package manager like vcpkg to install the libraries for a native Windows build environment.

Step 2: Scaffolding the Program and Opening the Archive

Let's start by writing C code to open the .xlsx file as a zip archive and confirm we can read the file list. Create a file named excel_reader.c.

#include <stdio.h>
#include <zip.h>

int main(void) {
    const char* filename = "sales_data.xlsx";
    int err = 0;

    // Open the .xlsx package read-only; on failure, err holds a libzip error code.
    zip_t* archive = zip_open(filename, ZIP_RDONLY, &err);
    if (archive == NULL) {
        fprintf(stderr, "Failed to open excel file (libzip error code %d)\n", err);
        return 1;
    }

    printf("Successfully opened '%s'.\n", filename);

    // We'll add our XML parsing logic here later.

    zip_close(archive);
    return 0;
}

To compile and run this, you need to link the libzip library. Use the -lzip flag:

gcc excel_reader.c -o excel_reader -lzip
./excel_reader

If it works, you'll see the message: "Successfully opened 'sales_data.xlsx'." This confirms that libzip is correctly installed and able to recognize your Excel file.

Step 3: Reading the Shared Strings

As mentioned, text values are often stored in xl/sharedStrings.xml. We need to read this file first and store its contents in an array so we can look up values later. This requires using libxml2.

To do this, we first extract the XML file's content into memory with libzip, then parse that memory buffer with libxml2. Concretely, zip_fopen opens the file inside the archive, zip_fread reads its contents into a buffer, and xmlReadMemory from libxml2 parses the buffer. The full implementation is fairly long, so here we'll focus on the logical flow.

Imagine we've parsed sharedStrings.xml and now have an array like this:

char* shared_strings[] = {"Product", "Region", "Sales", "Widget A", "North", "Gadget B", "South"};

This is the data we'll use in the next step.

Step 4: Reading and Parsing the Worksheet

Now for the main event: parsing xl/worksheets/sheet1.xml to get the cell data. A cell in XML looks something like this:

<!-- A cell with a string value (type "s") -->
<c r="A2" t="s">
  <v>3</v>
</c>

<!-- A cell with a numeric value (no type attribute) -->
<c r="C2">
  <v>1500</v>
</c>

Our code needs to find each <row> element, then loop through its child <c> elements. For each cell it checks the t attribute. If t="s", it reads <v> as an index and looks up the string in shared_strings. Otherwise, it reads the value in <v> directly.

Here's a conceptual code snippet:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

// Note: This is illustrative; real code needs more error handling and memory management.
// shared_count is the number of entries in shared_strings, used to bounds-check indices.
void parse_sheet(const char* xml_buffer, char** shared_strings, int shared_count) {
    xmlDoc* doc = xmlReadMemory(xml_buffer, (int)strlen(xml_buffer), "noname.xml", NULL, 0);
    if (doc == NULL) {
        fprintf(stderr, "Failed to parse document\n");
        return;
    }

    // Navigate: worksheet -> sheetData -> row -> c -> v.
    // sheetData is often preceded by other elements, so find it by name.
    xmlNode* root_element = xmlDocGetRootElement(doc);
    xmlNode* sheet_data = root_element ? root_element->children : NULL;
    while (sheet_data != NULL && strcmp((const char*)sheet_data->name, "sheetData") != 0) {
        sheet_data = sheet_data->next;
    }

    for (xmlNode* row_node = sheet_data ? sheet_data->children : NULL; row_node != NULL; row_node = row_node->next) {
        if (strcmp((const char*)row_node->name, "row") != 0) continue;
        for (xmlNode* cell_node = row_node->children; cell_node != NULL; cell_node = cell_node->next) {
            if (strcmp((const char*)cell_node->name, "c") != 0) continue;

            // Find the <v> child; cells without a stored value are skipped.
            xmlNode* value_node = NULL;
            for (xmlNode* v_node = cell_node->children; v_node != NULL; v_node = v_node->next) {
                if (strcmp((const char*)v_node->name, "v") == 0) {
                    value_node = v_node;
                    break;
                }
            }
            if (value_node == NULL) continue;

            xmlChar* cell_type = xmlGetProp(cell_node, (const xmlChar*)"t");
            xmlChar* value = xmlNodeGetContent(value_node);
            const char* text = value ? (const char*)value : "";
            if (cell_type != NULL && strcmp((const char*)cell_type, "s") == 0) {
                // Shared string: <v> holds an index into shared_strings.
                int index = atoi(text);
                if (index >= 0 && index < shared_count) {
                    printf("%s, ", shared_strings[index]);
                }
            } else {
                // Number or other type: print the stored text as-is.
                printf("%s, ", text);
            }
            if (value) xmlFree(value);
            if (cell_type) xmlFree(cell_type);
        }
        printf("\n");
    }

    xmlFreeDoc(doc);
}

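To compile with both libraries, let pkg-config supply the include and linker flags. The libxml2 headers usually live in a libxml2/ subdirectory that the compiler does not search by default, so passing only -lzip -lxml2 often fails at the #include step:

gcc excel_reader.c -o excel_reader $(pkg-config --cflags --libs libzip libxml-2.0)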

Running the program at this stage would output CSV-like data:

Product, Region, Sales, 
Widget A, North, 1500, 
Gadget B, South, 2200, 
Widget A, North, 1800,
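
One detail the output above hides: worksheets omit empty cells entirely, so printing cells in document order only lines up with the spreadsheet columns when every row is fully populated. The r attribute (for example r="C2") gives each cell's real position. The sketch below shows one way to decode it using only the standard library. The function name parse_cell_ref is hypothetical, and the code does not guard against overflow on absurdly long references.

```c
#include <ctype.h>

/* Hypothetical helper: splits an A1-style reference such as "C2" into a
   zero-based column and row. Returns 1 on success, 0 if `ref` is malformed. */
int parse_cell_ref(const char* ref, int* col, int* row) {
    int c = 0;
    int r = 0;
    const char* p = ref;

    /* Column letters form a base-26 number where A=1 ... Z=26. */
    if (!isalpha((unsigned char)*p)) return 0;
    while (isalpha((unsigned char)*p)) {
        c = c * 26 + (toupper((unsigned char)*p) - 'A' + 1);
        p++;
    }

    /* Row digits form an ordinary one-based decimal number. */
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) {
        r = r * 10 + (*p - '0');
        p++;
    }

    /* Reject trailing characters and the invalid row 0. */
    if (*p != '\0' || r == 0) return 0;

    *col = c - 1;
    *row = r - 1;
    return 1;
}
```

With the column index in hand, the printing loop could emit empty fields for any columns skipped between two consecutive cells, which keeps the CSV-like output aligned.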

Common Challenges and Best Practices

  • Full Error Handling: Always check the return values from libzip and libxml2 functions to prevent crashes.
  • Memory Management: Free all allocated resources: xmlFreeDoc, xmlFree, and zip_close.
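  • Handling Dates and Complex Types: Excel stores dates as floating-point serial numbers: the integer part counts days from a 1900-based epoch and the fraction is the time of day. Only the cell's number format marks a value as a date, so you need to detect and convert these values yourself.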
  • Using Streaming Parsers: For very large files, consider libxml2's SAX interface to process XML without loading entire documents into memory.
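
To illustrate the date point above, here is a minimal sketch of converting a serial number into a readable timestamp. It assumes the default 1900 date system (some workbooks use a 1904-based system instead) and a platform where time_t counts seconds since 1970-01-01, as on POSIX systems. The function name excel_serial_to_string is hypothetical. The constant 25569 is the serial number of 1970-01-01, and serials below 61 are rejected because Excel treats 1900 as a leap year, which shifts those early dates.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats an Excel serial date (1900 date system) as
   "YYYY-MM-DD HH:MM:SS" in `out`. Returns 1 on success, 0 on failure. */
int excel_serial_to_string(double serial, char* out, size_t out_size) {
    /* Serials below 61 fall before Excel's fictitious 1900-02-29. */
    if (serial < 61.0) return 0;

    /* Whole days and the time-of-day fraction, rounded to the second. */
    long days = (long)serial;
    long secs = (long)((serial - (double)days) * 86400.0 + 0.5);

    /* Assumes time_t is seconds since 1970-01-01 (serial 25569). */
    time_t t = (time_t)(days - 25569) * 86400 + secs;

    /* gmtime may return NULL, e.g. for pre-1970 dates on some platforms. */
    struct tm* utc = gmtime(&t);
    if (utc == NULL) return 0;

    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", utc) > 0;
}
```

For example, the serial 45292.5 should come out as 2024-01-01 12:00:00. Deciding which cells to run through such a conversion still requires looking up the cell's style in xl/styles.xml, which this sketch leaves out.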

Final Thoughts

Reading Excel files in C is a powerful skill that blends high-level data formats with low-level performance programming. Using libraries like libzip and libxml2 allows you to access and process the data without heavy dependencies or proprietary tools. This method offers granular control, making it ideal for performance-critical applications.

This manual process highlights how much effort it takes to extract and understand data stored in complex files. We built Graphed because we believe insights should be accessible without such low-level data wrangling. With a one-click connection to your Google Sheets, databases, or SaaS platforms, our AI analyst lets you ask questions in plain English—like "compare sales for Widget A in the North regions this quarter"—and instantly generates dashboards and reports.
