Bulk and Batch import with Hibernate

18.12.2017

Bulk and Batch import with Hibernate

This article explains how to execute a bulk or batch import with Hibernate so that as few SQL statements as possible are executed. The following properties must be set for Hibernate (hibernate.properties) to group the individual inserts. In this example the batch size is 5.

hibernate.jdbc.batch_size=5
hibernate.order_inserts=true
hibernate.order_updates=true

In addition, the following setting ensures that statistics are printed after the import, so that you can check how many SQL statements and bulk operations have been executed. In the file: persistence.xml the property must be added.

 <property name="hibernate.generate_statistics" value="true" />

In the class Importer the import is executed. Ten Item-Objects are created an persisted with the EntityManger. This operation is done in one transaction.

 package org.hameister.bulk;

import org.hameister.bulk.data.Item;


import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

/**
 * Created by hameister on 03.12.17.
 */
public class Importer {

    public Importer() throws Exception {

        EntityManager entityManager = createEntityManagerFactory().createEntityManager();
        entityManager.getTransaction().begin();

        String date = LocalDateTime.now().format(DateTimeFormatter.ISO_DATE_TIME);

        for (int i = 0; i < 10; i++) {
            Item item = new Item();
            item.setId(UUID.randomUUID().toString());
            item.setLocation("Board");
            item.setDescription("Item " + i + "IT:" + date);
            entityManager.persist(item);

            //Check when the SQL-Statements are written to the console
            if (i == 4) {
                entityManager.flush();
                entityManager.clear();
            }
        }


        entityManager.getTransaction().commit();
        entityManager.close();
        System.out.println("Commit");
    }


    protected EntityManagerFactory createEntityManagerFactory() throws Exception {
        return Persistence.createEntityManagerFactory("org.hibernate.tutorial.jpa");
    }

}

If you want to know when the SQL statements are written to the console and when the import is done, the condition in line 33 can be changed in the if-statement of the loop. Or it can be completely removed.

In the example, 10 item objects are created and imported with a batch size 5.

package org.hameister.bulk.data;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

/**
 * Created by hameister on 03.12.17.
 */
@Entity
@Table(name = "Item")
public class Item {

    @Id
    String id;

    @Column(name = "description")
    private String description;

    @Column(name = "location")
    private String location;

    public Item() {
    }

    // Getter and Setter...
}

Attention: Do not use a generator to create the id in the Item. If you do that Hibernate is not able to create batches, because it cannot prepare the statements if the ids are missing which are created by the database. Summary, one can say that you have to create your PKs (ids) yourself if you want to use batching. (See line 27 of the Importer)

Statistic output without if-statemenent if hibernate.jdbc.batch_size=5.

 25829 nanoseconds spent acquiring 1 JDBC connections;
 0 nanoseconds spent releasing 0 JDBC connections;
 92324 nanoseconds spent preparing 1 JDBC statements;
 0 nanoseconds spent executing 0 JDBC statements;
 9683458 nanoseconds spent executing 2 JDBC batches;
 0 nanoseconds spent performing 0 L2C puts;
 0 nanoseconds spent performing 0 L2C hits;
 0 nanoseconds spent performing 0 L2C misses;
 29629982 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

Statistic output with if-statemenent if hibernate.jdbc.batch_size=5.

22597 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
331605 nanoseconds spent preparing 2 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
10124494 nanoseconds spent executing 2 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
53042885 nanoseconds spent executing 2 flushes (flushing a total of 10 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

Statistic output with if-statemenent (i==5) if hibernate.jdbc.batch_size=5.

22009 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
169733 nanoseconds spent preparing 2 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
4269364 nanoseconds spent executing 3 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
29683214 nanoseconds spent executing 2 flushes (flushing a total of 10 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

If you set the batch_size to 10 and remove the if-statement, then you see that all items are imported in a batch:

19117 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
135859 nanoseconds spent preparing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
4418512 nanoseconds spent executing 1 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
20183053 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

The complete source code can be found on Github HibernateBulkImport as a Maven project.

Further informations and explanations can be found in Vlad Mihalceas Blogpost The best way to do batch processing with JPA and Hibernate.

If you want to use bulk and batch imports ans Spring Boot you find an example here: Bulk and Batch imports with Spring Boot