Fundamentally, you back up your databases by copying JE log files to a safe storage location. To restore your database from a backup, you copy those files to an appropriate directory on disk and reopen your JE application.
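Here is a minimal sketch of the copy step. The hypothetical `NaiveLogBackup` helper uses only `java.nio.file` to copy every `*.jdb` file from an environment home directory into a backup directory. It assumes the environment is closed while the copy runs and that all log files sit directly in the home directory, so it does not look into log subdirectories. Backing up a live environment requires coordinating with the cleaner thread, and JE's `com.sleepycat.je.util.DbBackup` class exists for that purpose.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies every *.jdb log file from an environment home
// directory into a backup directory. It does NOT coordinate with a running
// JE environment, so only use it while the environment is closed.
class NaiveLogBackup {
    static List<Path> copyLogFiles(Path envHome, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        List<Path> copied = new ArrayList<>();
        // The glob matches names such as 00000000.jdb in envHome itself (no subdirectories).
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(envHome, "*.jdb")) {
            for (Path log : logs) {
                Path target = backupDir.resolve(log.getFileName());
                Files.copy(log, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }
}
```

Restoring with this sketch is the same call with the arguments swapped (backup directory as the source, an empty environment directory as the target), followed by reopening the environment.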
Beyond these simple activities, there are several backup strategies that you may want to consider. These topics are described in this chapter.
Before describing JE backup and restore, it is necessary to describe some of JE's internal workings. In particular, a high-level understanding of JE log files and the in-memory cache is required. You also need to understand a little about how JE uses its internal data structures in order to understand why checkpoints and/or syncs are required.
You can skip this section so long as you understand that:
JE databases are stored in log files contained in your environment directory.
Every time a JE environment is opened, normal recovery is run.
For transactional applications, checkpoints should be run in order to bound normal recovery time. Checkpoints are normally run by the checkpointer thread. Transactional applications and the checkpointer thread are described in the Berkeley DB, Java Edition Getting Started with Transaction Processing guide.
For non-transactional applications, environment syncs must be performed if you want to guarantee the persistence of your database modifications. Environment syncs are manually performed by the application developer. See Data Persistence for details.
Your JE database is stored on-disk in a series of log files. JE uses no-overwrite log files, which is to say that JE only ever appends data to the end of a log file. It will never delete or modify an existing log file record.
JE log files are named NNNNNNNN.jdb, where NNNNNNNN is an 8-digit hexadecimal number that increases by 1 (starting from 00000000) for each log file written to disk.
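The naming rule can be sketched with two hypothetical helpers that convert between a file number and its file name. Formatting assumes lowercase hexadecimal digits; parsing accepts either case.

```java
// Hypothetical helpers illustrating the NNNNNNNN.jdb naming scheme:
// an 8-digit, zero-padded hexadecimal file number followed by ".jdb".
class LogFileNames {
    static String nameFor(long fileNumber) {
        if (fileNumber < 0 || fileNumber > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("file number does not fit in 8 hex digits: " + fileNumber);
        }
        return String.format("%08x.jdb", fileNumber); // lowercase hex is an assumption
    }

    static long numberOf(String fileName) {
        if (!fileName.matches("[0-9a-fA-F]{8}\\.jdb")) {
            throw new IllegalArgumentException("not a JE log file name: " + fileName);
        }
        return Long.parseLong(fileName.substring(0, 8), 16);
    }
}
```

Under this assumption the eleventh file written (number 10) is named 0000000a.jdb.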
JE creates a new log file whenever the current log file reaches a preconfigured size (10000000 bytes by default). This size is controlled by the je.log.fileMax property. See The JE Properties File for information on setting JE properties.
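One way to change the log file size is to place a je.properties file in the environment home directory, which JE reads when the environment is opened. The hypothetical `JePropertiesWriter` helper below writes and reads back that file with `java.util.Properties`; the 20000000-byte value in the test is illustrative only, not a recommendation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper that writes je.log.fileMax into <envHome>/je.properties.
// Note: this overwrites any existing je.properties file.
class JePropertiesWriter {
    static Path writeFileMax(Path envHome, long maxBytes) throws IOException {
        Properties props = new Properties();
        props.setProperty("je.log.fileMax", Long.toString(maxBytes));
        Path file = envHome.resolve("je.properties");
        try (Writer out = Files.newBufferedWriter(file)) {
            props.store(out, "JE configuration");
        }
        return file;
    }

    // Reads the value back; returns null if the property is not set.
    static String readFileMax(Path envHome) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(envHome.resolve("je.properties"))) {
            props.load(in);
        }
        return props.getProperty("je.log.fileMax");
    }
}
```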
By default, log files are placed in the environment home directory. However, you can cause JE to place log files in subdirectories within the environment home directory. For more information, see Multiple Environment Subdirectories.
Because JE uses no-overwrite log files, the logs must be compacted or cleaned so as to conserve disk space.
JE uses the cleaner background thread to perform this task. When it runs, the cleaner thread picks the log file with the smallest number of active records and scans each log record in it. If the record is no longer active in the database tree, the cleaner does nothing. If the record is still active in the tree, then the cleaner copies the record forward to a newer log file.
Once a log file is no longer needed (that is, it no longer contains active records), the cleaner thread deletes the log file for you. Optionally, the cleaner thread can instead rename the discarded log file with a .del suffix.
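The file selection and copy-forward steps can be illustrated with a toy model. `ToyCleaner` is hypothetical and greatly simplified: each log file is reduced to the set of record IDs still active in it, the newest file (the one being appended to) is never chosen, and "copying forward" simply moves IDs into the newest file. Real JE cleaning operates on actual log entries and is considerably more involved.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of one cleaner pass; NOT JE's implementation.
class ToyCleaner {
    // Log file number -> IDs of records in that file that are still active.
    private final TreeMap<Long, Set<String>> activeByFile = new TreeMap<>();

    void addFile(long fileNumber, Set<String> activeRecords) {
        activeByFile.put(fileNumber, new TreeSet<>(activeRecords));
    }

    Set<Long> fileNumbers() {
        return activeByFile.keySet();
    }

    Set<String> activeIn(long fileNumber) {
        return activeByFile.get(fileNumber);
    }

    // Cleans the older file with the fewest active records and returns its number.
    long cleanOnce() {
        long newest = activeByFile.lastKey(); // the file currently being appended to
        Long victim = null;
        for (Map.Entry<Long, Set<String>> e : activeByFile.headMap(newest, false).entrySet()) {
            if (victim == null || e.getValue().size() < activeByFile.get(victim).size()) {
                victim = e.getKey();
            }
        }
        if (victim == null) {
            throw new IllegalStateException("no older log file to clean");
        }
        // Copy the victim's still-active records forward to the newest file;
        // the victim then holds nothing active and can be deleted or renamed.
        activeByFile.get(newest).addAll(activeByFile.remove(victim));
        return victim;
    }
}
```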
JE uses a minimum log utilization property to determine how much cleaning to perform. The log files contain both obsolete and utilized records. Obsolete records are records that are no longer in use, either because they have been modified or because they have been deleted. Utilized records are those records that are currently in use. The je.cleaner.minUtilization property identifies the minimum percentage of log space that must be used by utilized records. If this minimum percentage is not met, then log files are cleaned until it is.
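The utilization rule is simple arithmetic. The hypothetical `UtilizationCheck` class below computes utilization as utilized bytes divided by total log bytes, and estimates how many obsolete bytes must be reclaimed to reach a target percentage, assuming cleaning removes only obsolete bytes. It is an illustration, not JE's code. For example, with 40,000,000 utilized bytes out of 100,000,000 total, utilization is 40%; with a 50% minimum the log must shrink to at most 80,000,000 bytes, so 20,000,000 obsolete bytes must be reclaimed.

```java
// Toy arithmetic for the minimum-utilization rule; NOT JE's implementation.
class UtilizationCheck {
    // Percentage of all log bytes occupied by utilized (still-active) records.
    static double utilizationPercent(long utilizedBytes, long totalBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 100.0 * utilizedBytes / totalBytes;
    }

    // True when utilization is below the configured minimum percentage.
    static boolean needsCleaning(long utilizedBytes, long totalBytes, int minPercent) {
        return utilizationPercent(utilizedBytes, totalBytes) < minPercent;
    }

    // Obsolete bytes to reclaim so that utilized / total >= minPercent / 100,
    // assuming cleaning removes only obsolete bytes (active ones are copied forward).
    static long obsoleteBytesToReclaim(long utilizedBytes, long totalBytes, int minPercent) {
        if (minPercent <= 0) {
            return 0; // no minimum, so nothing has to be reclaimed
        }
        long maxTotal = utilizedBytes * 100 / minPercent; // largest total meeting the target
        return Math.max(0, totalBytes - maxTotal);
    }
}
```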
For information on managing the cleaner thread, see The Cleaner Thread.
JE databases are internally organized as a BTree. In order to operate, JE requires that the complete BTree be available to it.
When database records are created, modified, or deleted, the modifications are represented in the BTree's leaf nodes. Beyond leaf node changes, database record modifications can also cause changes to other BTree nodes and structures.
When a write operation is performed in JE, the modified data is written to a leaf node contained in the in-memory cache. If your JE writes are performed without transactions, then the in-memory cache is the only location guaranteed to receive a database modification without further intervention on the part of the application developer.
For some classes of applications, this lack of a guaranteed write to disk is ideal. By not writing these modifications to the on-disk logs, the application can avoid most of the overhead caused by disk I/O.
However, if the application requires its data to persist at a specific point in time, then the developer must manually sync database modifications to the on-disk log files (again, this is only necessary for non-transactional applications). This is done using Environment.sync().
Note that syncing the cache causes JE to write all modified objects in the cache to disk. This is probably the most expensive operation that you can perform in JE.
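A toy write-back cache makes the cost of a sync concrete: writes touch only memory, and a sync appends every modified entry to the log at once, so its cost grows with the number of unsynced modifications. `ToyWriteBackCache` is hypothetical and is not how JE's cache or `Environment.sync()` is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy write-back cache; NOT JE's implementation. Writes land only in memory
// until sync() appends every modified entry to an append-only "log".
class ToyWriteBackCache {
    private final Map<String, String> cache = new LinkedHashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>(); // unsynced changes
    private final List<String> log = new ArrayList<>();

    void put(String key, String value) {
        cache.put(key, value);
        dirty.put(key, value); // only the latest unsynced value per key is kept
    }

    String get(String key) {
        return cache.get(key);
    }

    // Appends all dirty entries to the log and returns how many were written.
    int sync() {
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            log.add(e.getKey() + "=" + e.getValue());
        }
        int written = dirty.size();
        dirty.clear();
        return written;
    }

    List<String> log() {
        return Collections.unmodifiableList(log);
    }
}
```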
Every time a JE environment is opened, normal recovery is run. Because of the way that JE organizes and manages its BTrees, all it needs are the leaf nodes in order to recreate the rest of the BTree. Essentially, this is what normal recovery does: it recreates any missing parts of the internal BTree from leaf node information stored in the log files.
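The idea that leaf nodes are enough can be shown with a toy replay. `ToyRecovery` is hypothetical: its "log" is a list of leaf-level puts and deletes in write order, and a sorted `TreeMap` stands in for the BTree. JE's actual recovery also uses checkpoint information and rebuilds real internal nodes, so treat this only as an illustration of last-write-wins replay.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of recovery; NOT JE's algorithm. The "log" is a list of leaf-level
// entries in write order, and a sorted map stands in for the rebuilt BTree.
class ToyRecovery {
    static final class LeafEntry {
        final String key;
        final String value; // unused when deleted is true
        final boolean deleted;

        private LeafEntry(String key, String value, boolean deleted) {
            this.key = key;
            this.value = value;
            this.deleted = deleted;
        }

        static LeafEntry put(String key, String value) { return new LeafEntry(key, value, false); }

        static LeafEntry delete(String key) { return new LeafEntry(key, null, true); }
    }

    // Replays entries oldest first, so later entries supersede earlier ones.
    static NavigableMap<String, String> rebuild(List<LeafEntry> log) {
        NavigableMap<String, String> tree = new TreeMap<>();
        for (LeafEntry entry : log) {
            if (entry.deleted) {
                tree.remove(entry.key);
            } else {
                tree.put(entry.key, entry.value);
            }
        }
        return tree;
    }
}
```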
Unlike a traditional database system, JE performs recovery for both transactional and non-transactional operations. The integrity of the BTree is guaranteed by JE in the face of both application and OS crashes.