How to read Excel file in Java using Apache POI
Excel is the popular file format created by Microsoft and we will learn how to read data from excel files using Apache POI.

Excel spreadsheets are widely used around the world for various tasks that are related to analysis, storage of tabular data, calculation, visualization etc. Although Excel file is not an opened file format, Java applications can still read and write Excel files using the Apache POI - the Java API for Microsoft Documents, because the development team uses reverse-engineering to understand the Excel file format. Hence the name POI stands for Poor Obfuscation Implementation.
Since Excel files are so common, developers often encounter their uses on reading data or to generate a report in excel file format.
In this blog post, we will see how to read data from excel files in Java using Apache POI's library. We will use Mavan for this project.
Getting Apache POI library:
Apache POI is the pure Java API for reading and writing Excel files in both formats XLS (Excel 2003 and earlier) and XLSX (Excel 2007 and later). To use Apache POI in your Java project:
Add the following dependencies to pom.xml file:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>5.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>5.0.0</version>
</dependency>
- The first dependency
poi
is used to work with the old Microsoft's binary file format for excel. These file formats have .xls extension. - The second dependency poi-ooxml is used to work with the newer xml based file format. These file formats have .xlsx extension.
Sample Excel file we will read:
Following screenshot shows a sample excel file that we are going to read in our code. Even though the file format is of the newer xml based (.xlsx), the code would work for both types of file formats (.xls and .xlsx).
Apache POI:
Terminologies: Apache POI library mainly involves following four key interfaces.
- Workbook: Workbook is a high level representation of a Spreadsheet.
- Sheet: A workbook may contain many sheets.
- Row: It represents a row in the spreadsheet.
- Cell: A cell represents a column in the spreadsheet.
Classes:
1. HSSF, XSSF and SXSSF classes:
Apache POI main classes start with either HSSF, XSSF or SXSSF.
- HSSF (Horrible SpreadSheet Format) - is the is the POI Project’s pure Java implementation of the Excel ’97(-2007) file format. e.g.
HSSFWorkbook
,HSSFSheet
. They are used to work with excel files of the older binary file format - .xls. - XSSF (XML SpreadSheet Format) - is the POI Project’s pure Java implementation of the Excel 2007 OOXML (.xlsx) file format. e.g.
XSSFWorkbook
,XSSFSheet
. These implementations are used to work with the newer xml based file format - .xlsx. - SXSSF - (since 3.8-beta3) – is an API-compatible streaming extension of
XSSF
to be used when very large spreadsheets have to be produced, and heap space is limited. e.g.SXSSFWorkbook
,SXSSFSheet
.SXSSF
achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document.
2. Row and Cell:
Apart from above classes, Row
and Cell
are used to interact with a particular row and a particular cell in excel sheet.
3. Style Classes:
A wide range of classes like CellStyle, BuiltinFormats, ComparisonOperator, ConditionalFormattingRule, FontFormatting, IndexedColors, PatternFormatting, SheetConditionalFormatting etc. are used when you have to add formatting in a sheet, mostly based on some rules.
4. FormulaEvaluator:
Another useful class FormulaEvaluator is used to evaluate the formula cells in excel sheet.
Apache POI - Code to read an excel file:
Reading an excel file using POI is simple if we divide this in steps.
- Create workbook instance from excel sheet
- Get to the desired sheet
- Increment row number
- iterate over all cells in a row
- repeat step 3 and 4 until all data is read
The following program shows you how to read an excel file using Apache POI. Since we’re not using any file format specific POI classes, the program will work for both types of file formats - .xls and .xlsx.
The program shows three different ways of iterating over sheets, rows, and columns in the excel file.
import org.apache.poi.ss.usermodel.*;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
public class ExcelReader {
public void excelReader(String file) throws IOException {
// Create a Workbook from the given Excel file
Workbook workbook = WorkbookFactory.create(new File(file));
// Get number of sheets in the Workbook
System.out.println("Workbook has " + workbook.getNumberOfSheets() + " sheet(s).");
/*
* Iterate over all the sheets in the Workbook
*/
// 1. Obtain sheetIterator and iterate over it
Iterator<Sheet> sheetIterator = workbook.sheetIterator();
System.out.println("Sheets:");
while (sheetIterator.hasNext()) {
Sheet sheet = sheetIterator.next();
System.out.println("-> " + sheet.getSheetName());
}
// 2. Using for-each loop
for (Sheet sheet: workbook) {
System.out.println("-> " + sheet.getSheetName());
}
// 3. Using Java 8 forEach loop with lambda
workbook.forEach(sheet -> System.out.println(sheet.getSheetName()));
/*
* Iterate over rows and columns in Sheet
*/
// Get the sheet at required index
Sheet sheet = workbook.getSheetAt(0);
// Create a DataFormatter to format and get each cell's value as String
DataFormatter dataFormatter = new DataFormatter();
// 1. Obtain rowIterator and columnIterator
Iterator<Row> rowIterator = sheet.rowIterator();
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// Iterate over columns of current row
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
String cellValue = dataFormatter.formatCellValue(cell);
System.out.println(cellValue);
}
}
// 2. Use for-each loop
for (Row row: sheet) {
for (Cell cell: row) {
String cellValue = dataFormatter.formatCellValue(cell);
System.out.println(cellValue);
}
}
// 3. Java 8 forEach loop with lambda
sheet.forEach(row -> row.forEach(cell -> {
String cellValue = dataFormatter.formatCellValue(cell);
System.out.println(cellValue);
}));
// Close the Workbook
workbook.close();
}
}
Create object and call the method:
import java.io.IOException;
public class VelocityBytes {
public static void main(String[] args) {
ExcelReader excelReader = new ExcelReader();
try {
excelReader.excelReader("data/velocitybytes.xlsx");
} catch (IOException e) {
e.printStackTrace();
}
}
}
If you have noticed, we are not using the concrete classes like HSSFWorkbook
or XSSFWorkbook
to create an instance of the Workbook
. We are using WorkbookFactory
instead. This makes our program independent of format and works for both file types .xls and .xlsx.
And we have used three different ways of iterating over sheets, rows, and columns. The clean approach I prefere is Java 8 forEach loop with lambda expression and I’ve used a DataFormatter to format and get each cell’s value as String.
Thank you for reading. Until next time!