Restoring Log Files

Reclaiming Log Files
Suspending Writes Due to Disk Thresholds

During normal operations, the nodes in a replication group communicate with one another to ensure that the JE cleaner does not reclaim log files still needed by the group. The tail end of the replication stream may still be needed by a lagging Replica in order to make it current with the Master, and so the replication group tries to make sure the trailing log files needed to bring lagging Replicas up-to-date are not reclaimed.

However, if a node is unavailable for a long enough period of time, then log files needed to bring it up to date might have been reclaimed by the cleaner. For information on how and when log files are reclaimed in a replicated environment, see Reclaiming Log Files.

Once log files have been reclaimed by the cleaner, the Replica can no longer be brought up to date using the normal replication stream. Your application code will know this has happened when the ReplicatedEnvironment constructor throws an InsufficientLogException.
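The following is a minimal sketch of the condition described above, under simplifying assumptions. It treats log "positions" as plain longs. It assumes the oldest data the group tries to keep is the minimum position still needed across Replicas. A returning Replica can use the replication stream only if its next needed position is still in the retained files. The class and method names (StreamCatchUpSketch, oldestNeeded, canCatchUpFromStream) are hypothetical, not JE APIs, and this is not JE's internal bookkeeping.

```java
import java.util.Map;

/**
 * Hypothetical model (not JE's implementation) of when a Replica can still
 * catch up from the replication stream. Positions are plain longs.
 */
final class StreamCatchUpSketch {

    /**
     * Oldest position any Replica still needs: the minimum of their next
     * positions. Returns Long.MAX_VALUE if there are no Replicas.
     */
    static long oldestNeeded(Map<String, Long> replicaNextPositions) {
        long oldest = Long.MAX_VALUE;
        for (long p : replicaNextPositions.values()) {
            oldest = Math.min(oldest, p);
        }
        return oldest;
    }

    /**
     * A Replica can catch up from the stream only if the first record it needs
     * has not been reclaimed. Otherwise it needs a network restore, which is the
     * situation JE reports with InsufficientLogException.
     */
    static boolean canCatchUpFromStream(long replicaNextPosition, long oldestRetainedPosition) {
        return replicaNextPosition >= oldestRetainedPosition;
    }

    public static void main(String[] args) {
        Map<String, Long> replicas = Map.of("node2", 900L, "node3", 400L);
        System.out.println("oldest needed: " + oldestNeeded(replicas));

        // Assume node3 was down long enough that the cleaner reclaimed
        // everything before position 600.
        System.out.println("node3 can use stream: " + canCatchUpFromStream(400L, 600L));
    }
}
```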

When your code catches an InsufficientLogException, you must bring the Replica up to date using a mechanism other than the normal replication stream. You do this using the NetworkRestore class. A call to NetworkRestore.execute() causes the Replica to copy the missing log files from a member of the replication group that has the files and appears to be the least busy. Once the Replica has obtained the log files it requires, it automatically re-establishes its replication stream with the Master so that the Master can finish bringing the Replica up to date.

For example:

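 import com.sleepycat.je.rep.InsufficientLogException;
 import com.sleepycat.je.rep.NetworkRestore;
 import com.sleepycat.je.rep.NetworkRestoreConfig;
 import com.sleepycat.je.rep.ReplicatedEnvironment;
 ...
 ReplicatedEnvironment node;
 try {
     node = new ReplicatedEnvironment(envDir, repConfig, envConfig);
 } catch (InsufficientLogException insufficientLogEx) {

     NetworkRestore restore = new NetworkRestore();
     NetworkRestoreConfig config = new NetworkRestoreConfig();
     config.setRetainLogFiles(false); // delete obsolete log files.

     // Use the members returned by insufficientLogEx.getLogProviders()
     // to select the desired subset of members and pass the resulting
     // list as the argument to config.setLogProviders(), if the
     // default selection of providers is not suitable.

     // Copy the missing log files from a member that has them.
     restore.execute(insufficientLogEx, config);

     // Retry opening the node now that the required log files are present.
     node = new ReplicatedEnvironment(envDir, repConfig, envConfig);
 }
 ...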

Note that the replication group does not maintain information about the log files needed by secondary nodes. Instead, the system retains a set of log files beyond those required for a network restore, based on the NETWORK_RESTORE_OVERHEAD property, which you can set using ReplicationConfig.setConfigParam(). The default value is 10, meaning that the system estimates a network restore sends 10 percent more data over the network than performing replication from the same log files would. With this default, the system retains files containing an additional 10 percent of log data beyond the amount needed for a network restore.
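To make the percentage concrete, here is a small worked sketch of the arithmetic described above. It assumes the overhead is applied as a simple percentage of the amount of log data needed for a network restore. The helper names are hypothetical and this is not JE's actual retention computation. The 50 GiB figure is an arbitrary example value.

```java
/**
 * Hypothetical illustration of the NETWORK_RESTORE_OVERHEAD arithmetic.
 * Not JE's implementation: it only shows how a percentage overhead
 * translates into extra retained log data.
 */
final class RestoreOverheadSketch {

    /** Extra bytes retained for a given overhead percentage (integer math). */
    static long extraBytes(long restoreBytes, int overheadPercent) {
        if (restoreBytes < 0 || overheadPercent < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        return restoreBytes * overheadPercent / 100;
    }

    /** Total log data retained: the restore amount plus the overhead. */
    static long retainedBytes(long restoreBytes, int overheadPercent) {
        return restoreBytes + extraBytes(restoreBytes, overheadPercent);
    }

    public static void main(String[] args) {
        long restoreBytes = 50L * 1024 * 1024 * 1024; // example: 50 GiB needed for a restore
        int overhead = 10;                            // the documented default
        System.out.println("extra:    " + extraBytes(restoreBytes, overhead));
        System.out.println("retained: " + retainedBytes(restoreBytes, overhead));
    }
}
```

With the default of 10, a restore that needs 50 GiB of log data would, under this model, cause roughly 55 GiB to be retained.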

Reclaiming Log Files

Ordinarily, JE's cleaner thread reclaims log files as soon as possible to minimize the amount of disk space used by the database. As records are deleted, the log files that contain them hold less live data; the cleaner compacts those files and then reclaims them.

However, various database activities might cause log files to be temporarily reserved or protected. A reserved file is a file that JE can delete but has not yet deleted. A protected file is a file that should be deleted, but that JE cannot delete because of some database activity, such as a backup.

For replicated environments, JE retains log files as long as possible in case they are needed to bring a Replica up to date. Log files that have been cleaned but are retained because of replication are in a reserved state. All such files are retained until the disk usage thresholds defined by EnvironmentConfig.MAX_DISK and EnvironmentConfig.FREE_DISK are exceeded. At that point, JE deletes reserved log files.
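The sketch below is a hypothetical model of the reserved and protected states described above. It is not JE's cleaner logic. Several details are assumptions made for illustration: a single byte limit stands in for whatever threshold JE derives from MAX_DISK and FREE_DISK, reserved files are deleted oldest first (in list order), and the file names merely mimic JE's naming. Only reserved files are deleted, and only while usage exceeds the limit; active and protected files are never touched.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of reclaiming reserved log files once a disk limit is
 * exceeded. Not JE's implementation; the limit and deletion order are assumptions.
 */
final class ReservedFileSketch {

    enum State { ACTIVE, RESERVED, PROTECTED }

    static final class LogFile {
        final String name;
        final long bytes;
        final State state;

        LogFile(String name, long bytes, State state) {
            this.name = name;
            this.bytes = bytes;
            this.state = state;
        }
    }

    /** Total bytes of all files still on disk. */
    static long usage(List<LogFile> files) {
        long total = 0;
        for (LogFile f : files) {
            total += f.bytes;
        }
        return total;
    }

    /**
     * While usage exceeds the limit, delete RESERVED files in list order
     * (treated as oldest first). ACTIVE and PROTECTED files are kept.
     * Returns the names of the deleted files.
     */
    static List<String> reclaim(List<LogFile> files, long limitBytes) {
        List<String> deleted = new ArrayList<>();
        long used = usage(files);
        int i = 0;
        while (i < files.size() && used > limitBytes) {
            LogFile f = files.get(i);
            if (f.state == State.RESERVED) {
                files.remove(i); // the next file shifts into index i
                used -= f.bytes;
                deleted.add(f.name);
            } else {
                i++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<LogFile> files = new ArrayList<>(List.of(
            new LogFile("00000001.jdb", 100, State.RESERVED),
            new LogFile("00000002.jdb", 100, State.PROTECTED),
            new LogFile("00000003.jdb", 100, State.RESERVED),
            new LogFile("00000004.jdb", 100, State.ACTIVE)));
        // Usage is 400 against a limit of 250, so reserved files are deleted
        // until usage is back within the limit.
        System.out.println(reclaim(files, 250) + ", usage now " + usage(files));
    }
}
```

If every reserved file is gone and usage is still over the limit, this model simply stops. That is the situation the next section describes.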

Suspending Writes Due to Disk Thresholds

In the previous section, we mentioned that JE reserves cleaned log files until the disk threshold limits are reached, at which point reserved log files are reclaimed (deleted).

Be aware that if reclaiming log files does not allow JE to meet its disk usage threshold limits, then writes are disabled for one or more nodes in the replication group.

If the threshold limits cannot be met on the Master, then write operations will throw DiskLimitException just as they would for a non-replicated environment.

If the threshold limits cannot be met on a Replica, writes are disabled only on that Replica. In this case, a write on the Master might throw InsufficientAcksException if your application's durability guarantee cannot be met because the Replica is unable to perform writes.
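The outcomes described in this section can be summarized as a small decision model. In this hypothetical sketch, a single byte limit stands in for the MAX_DISK/FREE_DISK thresholds. The acknowledgment check is simplified to "each Replica that cannot write cannot acknowledge". The enum and method names are illustrative only. In JE itself these outcomes appear as DiskLimitException on the Master and, possibly, InsufficientAcksException for Master writes.

```java
/**
 * Hypothetical decision model for nodes that cannot get back under their
 * disk threshold. Names are illustrative, not JE APIs.
 */
final class DiskLimitSketch {

    enum Role { MASTER, REPLICA }

    enum Outcome {
        WRITES_ALLOWED,
        DISK_LIMIT_EXCEPTION_ON_MASTER, // Master writes throw DiskLimitException
        REPLICA_WRITES_DISABLED         // only this Replica stops writing
    }

    /** Outcome for one node after it has deleted all the reserved files it can. */
    static Outcome evaluate(Role role, long usageBytes, long reclaimableBytes, long limitBytes) {
        long afterReclaim = usageBytes - reclaimableBytes;
        if (afterReclaim <= limitBytes) {
            return Outcome.WRITES_ALLOWED;
        }
        return role == Role.MASTER
            ? Outcome.DISK_LIMIT_EXCEPTION_ON_MASTER
            : Outcome.REPLICA_WRITES_DISABLED;
    }

    /**
     * Simplified check: the durability policy needs requiredAcks acknowledgments,
     * and a Replica with writes disabled cannot provide one. A false result is
     * the case in which JE might throw InsufficientAcksException.
     */
    static boolean acksSatisfiable(int replicas, int replicasDisabled, int requiredAcks) {
        return replicas - replicasDisabled >= requiredAcks;
    }

    public static void main(String[] args) {
        // A Replica using 120 units, only 10 reclaimable, limit 100.
        System.out.println(evaluate(Role.REPLICA, 120, 10, 100));
        // Two Replicas, one with writes disabled, durability needing two acks.
        System.out.println("acks satisfiable: " + acksSatisfiable(2, 1, 2));
    }
}
```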