This post is about inserting large amounts of data into MySQL as fast as possible. MySQL provides a few ways to improve the speed of INSERT statements. However, if you are comfortable with straying from JDBC standard, LOAD DATA INFILE statement is probably the fastest way to insert data into MySQL.

To use LOAD DATA INFILE statement in JDBC, you will need some help from the driver. Fortunately, MySQL driver Connector/J 5.1.3 and later include two additional methods -- setLocalInfileInputStream() and getLocalInfileInputStream() -- to send data to MySQL server for a LOAD DATA INFILE statement. See Connector/J notes that talks about these methods in more detail.

If you are looking for a solution in Java, you might want to see Jeff Rick's post on the same topic.

For this demo, I'll use two tables visit and visit_by_day as described below. To avoid frequent grouping by day, I'd like to collect statistics by day and load that data into visit_by_day table.

mysql> describe visit;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| visitId | int(11)      | NO   | PRI | NULL    | auto_increment |
| when    | timestamp    | YES  |     | NULL    |                |
| url     | varchar(255) | YES  |     | NULL    |                |
| browser | varchar(255) | YES  |     | NULL    |                |
+---------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> describe visits_by_day;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| day   | date    | NO   | PRI | NULL    |       |
| count | int(11) | NO   |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.00 sec)

The intent here is to pass data to MySQL server by invoking setLocalInfileInputStream() on the Statement object, which is of type MySQLStatement. I am using C3PO for connection pooling, which also happens to provide a nice API to invoke non-standard methods reflectively on an underlying Connection. So, here's the complete program.


import groovy.sql.Sql

import org.apache.commons.io.IOUtils

import com.mchange.v2.c3p0.C3P0ProxyStatement
import com.mysql.jdbc.Statement as MySQLStatement

/**
 * Data Service
 *
 * @author $Author:  $
 * @version $Revision:  $
 */
class DataService {

	static transactional = false

	/**
	 * Spring bean datasource is auto-injected
	 */
	def dataSource

	def OUT_STATEMENT = """
		SELECT
    		DATE(`when`) as `day`, COUNT(`visitId`) as `totalVisitors`
		FROM
    		`visit`
		GROUP BY `day`
	"""

	def IN_STATEMENT = """
		LOAD DATA LOCAL INFILE '/home/vkandy/data.tsv' REPLACE INTO TABLE `visits_by_day`;
	"""

	/**
	 * Reads data from `visit` table and loads into `visits_by_day` table.
	 *
	 * @return
	 */
	def loadData() {
		try {
			def sql = new Sql(dataSource)
			def requested = sql.rows(OUT_STATEMENT, [])

			def data = new StringBuilder()
			requested.each { data << "$it.day\t$it.totalVisitors\n" }

			def inStream = IOUtils.toInputStream(data.toString())
			sql.withStatement{ stmt ->
				def method = MySQLStatement.class.getMethod("setLocalInfileInputStream", [InputStream.class] as Class[]);
				stmt.rawStatementOperation(method, C3P0ProxyStatement.RAW_STATEMENT, inStream)
			}
			sql.execute(IN_STATEMENT)

		} catch(all) {
			all.printStackTrace()
		}
	}
}

Further Reading

The default field delimiter is the tab character and the default line separator is a new line character. You can use other delimiters using:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n';
For more options etc., see LOAD DATA INFILE syntax.

For information on improving batch import performance, read Ted Naleid's post on that topic.