Chapter 2. Replication API First Steps

Table of Contents

Using Replicated Environments
Configuring Replicated Environments
HA Exceptions
Master-Specific HA Exceptions
Replica-Specific HA Exceptions
Replicated Environment Handle-Specific Exceptions
Opening a Replicated Environment
Managing Write Requests at a Replica
Using the StateChangeListener
Catching ReplicaWriteException
Secondary Nodes
Time Synchronization
Configuring Two-Node Groups

From an API point of view, there are two basic requirements that every replicated application must meet:

  1. It must be a transactional application.

  2. It must use a specific form of the Environment handle, which you get by using the ReplicatedEnvironment class.

Beyond that, your application should also meet some additional exception-handling requirements.

The transactional nature of your replicated application is described in Transaction Management. This chapter describes replicated environments, and the exceptions unique to replicated applications, in detail.

Using Replicated Environments

Every electable or secondary node manages a single replicated JE environment directory. The environment follows the usual rules governing a JE environment; namely, only a single read/write process can access the environment at any given time.

Usually this requirement is met naturally, because each node in a replicated application usually runs on a machine that is independent of all the other nodes. However, some test and development scenarios break this one-node-per-machine rule, so you must make sure that no two processes ever attempt to manage the same environment.
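One way an application could enforce the one-process-per-environment rule on its own side is to hold an exclusive OS file lock for as long as it manages the environment. The following is a minimal sketch under stated assumptions: EnvOwnerGuard and the marker file name app-owner.lock are hypothetical, this is not a JE API, and it does not describe how JE itself protects an environment directory. Whether OS file locks are advisory or mandatory is platform dependent, so the guard only works if every cooperating process uses it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical application-level guard (not a JE API): hold an exclusive
// lock on a marker file in the environment directory while this process
// manages the environment, so a second process started by mistake backs off.
final class EnvOwnerGuard implements AutoCloseable {
    // Illustrative file name, chosen to be distinct from JE's own files.
    private static final String LOCK_FILE = "app-owner.lock";

    private final FileChannel channel;
    private final FileLock lock;

    private EnvOwnerGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a guard if this process now owns envHome, or null if another
    // owner (another process, or another guard in this JVM) already holds it.
    static EnvOwnerGuard tryAcquire(Path envHome) throws IOException {
        Files.createDirectories(envHome);
        FileChannel ch = FileChannel.open(envHome.resolve(LOCK_FILE),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        boolean owned = false;
        try {
            FileLock l = ch.tryLock();   // null: held by another process
            if (l == null) {
                return null;
            }
            owned = true;
            return new EnvOwnerGuard(ch, l);
        } catch (OverlappingFileLockException e) {
            return null;                 // held by another channel in this JVM
        } finally {
            if (!owned) {
                ch.close();              // do not leak the channel on failure
            }
        }
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

In this sketch the application would open its ReplicatedEnvironment only after tryAcquire returns a non-null guard, and close the guard only after closing the environment.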

Note

An application can access a replicated JE environment directory using a read-only Environment handle. The usual semantics of read-only, non-replicated Environment handles apply: through that handle, the application sees a snapshot of the replicated environment as of the time the handle was opened. An application can therefore open a ReplicatedEnvironment handle in one process and concurrently open read-only Environment handles in other processes. Any changes subsequently made to the replicated environment, whether because the node is the Master or because it is a Replica replaying the replication stream, are not visible through the read-only Environment handles until those handles are closed and reopened.

Normally you manage your JE environments using the Environment class. However, to provide the underlying infrastructure needed to implement replication, your JE HA application must instead use the ReplicatedEnvironment class, which is a subclass of Environment. Its constructor accepts the normal environment configuration properties through the EnvironmentConfig class, just as you would normally configure an Environment object. However, the ReplicatedEnvironment constructor also accepts a ReplicationConfig object, which allows you to manage the properties specific to replication.

The following example shows how to instantiate a ReplicatedEnvironment object. How you use it differs slightly depending on whether you are starting a brand-new node or restarting an existing one; we discuss these differences in the next section.

For a general description of environments and environment configuration, see the Getting Started with Berkeley DB Java Edition guide.

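import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.rep.ReplicatedEnvironment;
import com.sleepycat.je.rep.ReplicationConfig;

// envHome is a java.io.File naming this node's environment directory.

// Replicated applications must be transactional.
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);

// Identify the node
ReplicationConfig repConfig = new ReplicationConfig();
repConfig.setGroupName("PlanetaryRepGroup");
repConfig.setNodeName("Mercury");
repConfig.setNodeHostPort("mercury.example.com:5001");

// This is the first node, so its helper is itself
repConfig.setHelperHosts("mercury.example.com:5001");

ReplicatedEnvironment repEnv =
     new ReplicatedEnvironment(envHome, repConfig, envConfig);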

Configuring Replicated Environments

You configure a JE ReplicatedEnvironment handle using two different configuration classes: EnvironmentConfig and ReplicationConfig. Your usage of EnvironmentConfig is no different from that of a non-replicated application, so we do not describe it here. For an introduction to basic environment configuration, see the Getting Started with Berkeley DB Java Edition guide.

The ReplicationConfig class allows you to configure properties that are specific to replicated applications. Some of these properties are important in terms of how your application will behave and how well it will perform. These properties are discussed in detail later in this book.

To an extent, you can get away with ignoring most of the configuration properties until you are ready to tune your application's performance and behavior. However, no matter what, there are four properties you must always configure for a ReplicatedEnvironment before opening it. They are:

  1. Group Name

    The group name is a string that uniquely identifies the group to which the node belongs. Uniqueness matters because it is possible to operate multiple replication groups on the same network. In fact, a single process can even interact with multiple replication groups, so long as it maintains a separate replicated environment for each group in which it participates.

    By using unique group names, the JE replication code can make sure that messages arriving at a given client are actually meant for that client.

    You set the group name by using the ReplicationConfig.setGroupName() method. Note that if you do not set a group name, then the default GROUP_NAME value is used.

  2. Node Name

    This name must be unique within the replication group. Together, the node name and the group name uniquely identify a node in your enterprise.

    You set the node name by using the ReplicationConfig.setNodeName() method.

  3. Host

    The host property identifies the network name and port where this node can be reached. Other nodes in the replication group will use this host/port pair to establish a TCP/IP connection to this node. This connection is used to transfer data between machines, hold elections, and monitor the status of the replication group.

    You provide the host and port information using a string of the form:

    host[:port]

    The port that you provide must be higher than 1023.

    You set the host information by using the ReplicationConfig.setNodeHostPort() method. Note that if you do not set a node host, then the default NODE_HOST_PORT value is used.

  4. Helper Host

    A node uses the helper host or hosts the very first time it starts up, to find the Master. This string should provide one or more host/port pairs for nodes that should know where the Master is.

    One of the nodes that you provide on this string can be the current Master, but that is not required. All that matters is that the hosts identified here can tell a new node where the current Master is.

    If the brand-new node is an electable node and cannot find a Master, it initiates an election. If no other electable nodes are available to the new node, and the current node is specified as the only helper host, then it elects itself as Master. If the current node truly is the very first electable node starting up in the replication group, then electing itself Master is probably what you want it to do.

    However, if the current node is not the very first node starting up in the replication group, then a misconfiguration of this property can leave you with multiple replication groups, each with the same group name. This is an error condition, and one that can be very difficult for people inexperienced with managing replication groups to diagnose. For this reason, it is very important to make sure that the hosts identified on this string do NOT name only the local host, except when creating the first node.

    On every startup after the very first, the node should be able to locate the other participants in the replication group using information stored locally in its own environment. In that case, the information provided on this string is largely ignored unless the node has been down, or otherwise out of communication with the rest of the group, for so long that its locally cached information has grown stale. In that case, the node attempts to use the information provided here to locate the current Master.

    You set the helper host information by using the ReplicationConfig.setHelperHosts() method.
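Some of these values can be sanity-checked before the environment is opened, so that a typo fails fast rather than surfacing later as a connection problem. The following is a minimal, stdlib-only sketch: NodeAddress and requireName are hypothetical helpers, not part of JE. The sketch treats the ":port" suffix as optional, applies the "higher than 1023" rule stated above (plus the TCP maximum of 65535), and does not handle IPv6 literals. Uniqueness of group and node names cannot be checked locally, so it only rejects blank names.

```java
import java.util.OptionalInt;

// Hypothetical pre-flight helper (not part of JE).
final class NodeAddress {
    final String host;
    final OptionalInt port;   // empty when no ":port" suffix is given

    private NodeAddress(String host, OptionalInt port) {
        this.host = host;
        this.port = port;
    }

    // Parses "host" or "host:port"; an explicit port must be 1024..65535.
    static NodeAddress parse(String hostPort) {
        String s = hostPort.trim();
        int colon = s.lastIndexOf(':');
        String host = (colon < 0) ? s : s.substring(0, colon);
        if (host.isEmpty()) {
            throw new IllegalArgumentException("missing host in '" + hostPort + "'");
        }
        if (colon < 0) {
            return new NodeAddress(host, OptionalInt.empty());
        }
        int port;
        try {
            port = Integer.parseInt(s.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad port in '" + hostPort + "'", e);
        }
        if (port <= 1023 || port > 65535) {
            throw new IllegalArgumentException("port must be in 1024-65535, got " + port);
        }
        return new NodeAddress(host, OptionalInt.of(port));
    }

    // Group and node names must at least be present.
    static void requireName(String label, String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(label + " must not be blank");
        }
    }
}
```

For example, NodeAddress.parse("jupiter.example.com:5002") yields host jupiter.example.com and port 5002, while a string ending in ":80" is rejected.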

When configuring and instantiating a ReplicatedEnvironment object, you should usually configure the environment so that a helper host other than the local machine is used:

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
 
// Identify the node
ReplicationConfig repConfig = new ReplicationConfig();
repConfig.setGroupName("PlanetaryRepGroup");
repConfig.setNodeName("Jupiter");
repConfig.setNodeHostPort("jupiter.example.com:5002");
 
// Use the node at mercury.example.com:5001 as a helper to find the rest
// of the group.
repConfig.setHelperHosts("mercury.example.com:5001");
 
ReplicatedEnvironment repEnv =
   new ReplicatedEnvironment(envHome, repConfig, envConfig);  

Note that if you are restarting a node that has already been added to the replication group, then you do not have to supply a helper host at all. This is because the node will already have locally stored host and port information about the other nodes in the group.

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
 
// Identify the node
ReplicationConfig repConfig = 
    new ReplicationConfig("PlanetaryRepGroup", 
                          "Jupiter", 
                          "jupiter.example.com:5002");
 
ReplicatedEnvironment repEnv =
   new ReplicatedEnvironment(envHome, repConfig, envConfig);  

However, if you are starting the very first node in the replication group for the very first time, then there is no other helper host that the node can use to locate a Master. In this case, identify the current node as the helper host; the node then forms a replication group of size 1 with itself as the Master.

Note

Do this ONLY if you are truly starting the very first electable node in a replication group for the very first time.

EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
 
// Identify the node
ReplicationConfig repConfig = 
    new ReplicationConfig("PlanetaryRepGroup", 
                          "Jupiter", 
                          "jupiter.example.com:5002");
 
// This is the first node, so the helper is itself.
repConfig.setHelperHosts("jupiter.example.com:5002");
 
ReplicatedEnvironment repEnv =
   new ReplicatedEnvironment(envHome, repConfig, envConfig);
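The helper-host rule described in this section can also be checked mechanically before the environment is opened. The following is a minimal sketch that compares the helper list against the node's own host/port string and refuses a self-only list unless the caller explicitly states that it is bootstrapping the first node. HelperHostCheck and its bootstrappingFirstNode flag are hypothetical. The comma-separated list format is an assumption of this sketch. The comparison is purely textual (no DNS resolution), so an alias such as localhost is not recognized as the same node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical guard for the helper-host rule (not part of JE).
final class HelperHostCheck {

    // Splits "h1:p1,h2:p2" into trimmed, lower-cased, non-empty entries.
    static List<String> split(String helperHosts) {
        List<String> out = new ArrayList<>();
        for (String part : helperHosts.split(",")) {
            String p = part.trim().toLowerCase(Locale.ROOT);
            if (!p.isEmpty()) {
                out.add(p);
            }
        }
        return out;
    }

    // Throws if a self-only helper list could let a joining node elect
    // itself and start a second group under the same group name.
    static void validate(String nodeHostPort, String helperHosts,
                         boolean bootstrappingFirstNode) {
        String self = nodeHostPort.trim().toLowerCase(Locale.ROOT);
        List<String> helpers = split(helperHosts);
        boolean onlySelf = !helpers.isEmpty() && helpers.stream().allMatch(self::equals);
        if (onlySelf && !bootstrappingFirstNode) {
            throw new IllegalStateException("helper hosts name only this node ("
                    + self + "); this is safe only for the very first node");
        }
        if (helpers.isEmpty() && bootstrappingFirstNode) {
            throw new IllegalStateException("the first node must list itself as helper");
        }
    }
}
```

With the values from the first-node example above, validate("jupiter.example.com:5002", "jupiter.example.com:5002", true) passes, while the same call with false throws. An empty helper list is accepted for a non-first node, because a restarting node can rely on its locally stored group information.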