JE HA requires you to manage more error situations than you would have to if you were writing a non-replicated application. These error situations translate into additional exceptions that your code must handle. Before continuing with our description of how to write a replicated application, it is useful to review the HA-specific exceptions that your application must manage.
There are two exceptions that you can see on a Master node and nowhere else:
InsufficientReplicasException
This exception can be raised on a transaction begin or commit. It means that the Master cannot begin or commit a transaction because it is not in contact with enough Electable Replicas. The number of Electable Replicas required to commit the transaction successfully is a function of the durability policy that you have set for the transaction. See Managing Durability for more information.
If raised on a transaction commit operation, this exception means that the transaction has not been committed. Instead, it has been marked as invalid. In response, your application must at a minimum abort the transaction. It is up to you whether to retry the transaction later, when more Replicas are in contact with the Master.
If raised on a transaction begin operation, this exception means that the transaction has not begun. If the application intended to start a read-only transaction on a Master, it can avoid this exception by configuring the transaction so that it requires no acknowledgements. For information on configuring acknowledgements, see Managing Acknowledgements.
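The relationship between the acknowledgement policy and the number of Replicas the Master must reach can be sketched with a simplified Java model. This is an illustration under stated assumptions, not JE's implementation: the hypothetical `AckPolicy` enum only mirrors the constant names of JE's `Durability.ReplicaAckPolicy` (`ALL`, `SIMPLE_MAJORITY`, `NONE`), the model assumes a simple majority is counted over all electable nodes with the Master included, and it ignores refinements such as Arbiters.

```java
import java.util.Objects;

/** Hypothetical stand-in that mirrors JE's ReplicaAckPolicy names; not the JE type. */
enum AckPolicy { ALL, SIMPLE_MAJORITY, NONE }

final class AckMath {
    private AckMath() {}

    /**
     * Replica acknowledgements the Master needs per commit, assuming
     * electableGroupSize counts every electable node, the Master included.
     */
    static int requiredReplicaAcks(AckPolicy policy, int electableGroupSize) {
        Objects.requireNonNull(policy);
        if (electableGroupSize < 1) {
            throw new IllegalArgumentException("group must contain at least the Master");
        }
        switch (policy) {
            case ALL:
                return electableGroupSize - 1;   // every other electable node
            case SIMPLE_MAJORITY:
                return electableGroupSize / 2;   // majority is n/2 + 1 nodes, minus the Master
            case NONE:
            default:
                return 0;                        // the Master waits for nobody
        }
    }

    /** Models the begin-time check: are enough electable Replicas connected? */
    static boolean canBegin(AckPolicy policy, int electableGroupSize, int connectedReplicas) {
        return connectedReplicas >= requiredReplicaAcks(policy, electableGroupSize);
    }
}
```

In this model a five-node group using `SIMPLE_MAJORITY` needs two Replica acknowledgements per commit, so a Master that can see only one Replica would refuse to begin the transaction, while a read-only transaction configured with `NONE` needs no Replicas at all.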
InsufficientAcksException
This exception can be raised on a transaction commit. It means that the Master has successfully committed the transaction locally, but it has not received enough acknowledgements from its Electable Replicas within the time allotted for them to arrive.
The application should respond to this exception by alerting the administrator that there might be a problem with the health of the network or of the nodes participating in the replication group.
For information on how to manage acknowledgement policies, see Managing Acknowledgements.
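The difference between "committed locally" and "acknowledged" can be modelled with a `java.util.concurrent.CountDownLatch`: the Master applies the commit first and then waits a bounded time for the required acknowledgements. The `AckWait` class and its methods are hypothetical names for illustration; this is a simplified model of the timing described above, not JE's internal code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of one commit waiting for Replica acknowledgements. */
final class AckWait {
    private final CountDownLatch pendingAcks;
    private volatile boolean committedLocally;

    AckWait(int requiredAcks) {
        this.pendingAcks = new CountDownLatch(requiredAcks);
    }

    /** Called once per Replica acknowledgement, e.g. from a network thread. */
    void acknowledge() {
        pendingAcks.countDown();
    }

    /**
     * Commits locally, then waits up to the ack timeout. Returns true if enough
     * acknowledgements arrived; false models "committed, but insufficiently acknowledged".
     */
    boolean commit(long ackTimeout, TimeUnit unit) {
        committedLocally = true;                  // the local commit happens first
        try {
            return pendingAcks.await(ackTimeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return false;
        }
    }

    boolean isCommittedLocally() {
        return committedLocally;
    }
}
```

A `false` result here means the changes are already committed on the Master, which is why the sensible reaction is to report the condition rather than to treat the transaction as lost.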
The exceptions that you can see on a Replica, and nowhere else, are:
ReplicaConsistencyException
Indicates that the Replica was unable to meet the defined consistency requirements within the allotted time.
If this exception is encountered frequently, it indicates that the consistency policy requirements are too strict to be met routinely, given the load placed on the system and the hardware resources available to service that load. The exception may also indicate a network-related issue that is preventing the Replica from communicating with the Master and keeping up with the replication stream.
In response to this exception, your application can either retry the transaction or relax its consistency requirements until the transaction can complete successfully.
For information on managing consistency policies, see Managing Consistency.
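The "retry, or relax" choice can be combined in a helper that attempts a read under the strict policy a bounded number of times and then falls back to a relaxed one. In this sketch both reads are hypothetical `Supplier`s and a plain `IllegalStateException` stands in for `ReplicaConsistencyException` so that the example runs without the JE library; in a JE application each supplier would run its read in a transaction configured with the corresponding consistency policy.

```java
import java.util.function.Supplier;

/** Hypothetical helper: strict reads first, a relaxed read as the fallback. */
final class ConsistencyFallback {
    private ConsistencyFallback() {}

    static <T> T read(Supplier<T> strictRead, Supplier<T> relaxedRead, int strictAttempts) {
        for (int attempt = 1; attempt <= strictAttempts; attempt++) {
            try {
                return strictRead.get();
            } catch (IllegalStateException consistencyTimeout) {
                // Stand-in for ReplicaConsistencyException: the Replica did not
                // catch up in time, so try again under the same strict policy.
            }
        }
        // The strict policy kept failing: accept possibly staler data instead.
        return relaxedRead.get();
    }
}
```

Catching only the stand-in exception keeps unrelated failures visible. The fallback trades freshness for availability: the relaxed read may return data that lags behind the Master.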
ReplicaWriteException
An attempt was made to perform a write operation on a Replica. This exception typically indicates an error in the application logic. In some extremely rare cases, it can result from the node transitioning from Master to Replica while a transaction was in progress.
The application must abort the current transaction and redirect all subsequent update operations to the Master. For example code that performs this action, see Example Run Transaction Class.
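Redirecting updates usually means tracking which node is currently the Master and sending writes there. The router below is a hypothetical, library-free sketch; the node names and the `onNewMaster` hook are assumptions, and in practice the Master's identity might be kept current from state-change notifications.

```java
import java.util.Objects;

/** Hypothetical request router: writes go to the Master, reads stay local. */
final class WriteRouter {
    private final String localNode;
    private volatile String masterNode;   // null while no Master is known

    WriteRouter(String localNode) {
        this.localNode = Objects.requireNonNull(localNode);
    }

    /** Records the current Master, e.g. after a state-change notification. */
    void onNewMaster(String masterName) {
        this.masterNode = masterName;
    }

    /** Returns the node that should service the operation. */
    String targetFor(boolean isWrite) {
        if (!isWrite) {
            return localNode;             // reads may be served by this Replica
        }
        String master = masterNode;
        if (master == null) {
            throw new IllegalStateException("no Master known; retry later");
        }
        return master;                    // updates must go to the Master
    }
}
```

After aborting the failed transaction, the application resubmits the update to the node returned by `targetFor(true)`.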
LockPreemptedException
A read lock currently held by a Replica has been preempted by an HA write operation. In response, the Replica should abort and retry the read operation.
Note that your application should catch the LockConflictException base class rather than this class, because all locking exceptions are handled the same way (abort and retry the transaction).
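Catching the base class means a single handler covers every lock-related failure. The abort-and-retry loop below illustrates this with hypothetical stand-ins: `LockConflict` and `LockPreempted` mirror the relationship between JE's `LockConflictException` and `LockPreemptedException`, and `Txn` is an invented minimal transaction interface, so the sketch runs without the JE library.

```java
/** Hypothetical stand-ins mirroring the JE hierarchy: LockPreempted is a LockConflict. */
class LockConflict extends RuntimeException {
    LockConflict(String msg) { super(msg); }
}

class LockPreempted extends LockConflict {
    LockPreempted(String msg) { super(msg); }
}

/** Invented minimal transaction interface for the sketch. */
interface Txn {
    void commit();
    void abort();
}

final class LockRetry {
    private LockRetry() {}

    interface TxnFactory { Txn begin(); }
    interface Work { void run(Txn txn); }

    /** Runs the work, aborting and retrying on any lock conflict; returns attempts used. */
    static int run(TxnFactory factory, Work work, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Txn txn = factory.begin();
            try {
                work.run(txn);
                txn.commit();
                return attempt;
            } catch (LockConflict e) {          // also catches LockPreempted
                txn.abort();
                if (attempt >= maxAttempts) {
                    throw e;
                }
            } catch (RuntimeException other) {  // anything else: abort and propagate
                txn.abort();
                throw other;
            }
        }
    }
}
```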
DatabasePreemptedException
The database handle on a Replica was forcibly closed due to the replay of an Environment.truncateDatabase(), Environment.removeDatabase() or Environment.renameDatabase() operation in the replication stream.
When this exception occurs, the application must close any open Cursors and abort any open Transactions that are using the database, and then close the Database handle. If the application wishes, it may reopen the database if it still exists.
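The required order (cursors, then transactions, then the database handle) can be captured in a small helper. Here each handle is represented by a hypothetical `Release` action; in a JE application those actions would be `Cursor.close()`, `Transaction.abort()`, and `Database.close()` respectively.

```java
import java.util.List;

/** Hypothetical cleanup helper for a preempted database handle. */
final class PreemptedDbCleanup {
    private PreemptedDbCleanup() {}

    /** An action that releases one handle; stands in for close() or abort(). */
    interface Release { void run(); }

    static void cleanUp(List<Release> openCursors, List<Release> openTxnAborts, Release closeDb) {
        for (Release cursor : openCursors) {
            cursor.run();        // 1. close every cursor opened on the database
        }
        for (Release abort : openTxnAborts) {
            abort.run();         // 2. abort every transaction that used it
        }
        closeDb.run();           // 3. only then close the Database handle itself
    }
}
```

Once `cleanUp` returns, the application may reopen the database if it still exists.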
RollbackException
A new Master has been selected, and this Replica's log is ahead of the current Master, but the Replica was unable to roll back without a recovery. As a consequence, one or more of the most recently committed transactions may need to be rolled back before the Replica can synchronize its state with that of the current Master. This exception can happen if the Electable Replica with the most recent log files was unable to participate in the election of the Master, perhaps because the node had been shut down.
For details on how to handle this exception, see Managing Transaction Rollbacks.
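Why the Replica's most recent commits are at risk can be shown with a toy model in which every commit carries an increasing sequence number. JE tracks replication positions internally with VLSNs, but this sketch does not reproduce its synchronization protocol; the class, the method, and the notion of a single "last common commit" are simplifying assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy model: which of a Replica's commits lie beyond the point shared with the new Master. */
final class RollbackModel {
    private RollbackModel() {}

    /**
     * @param replicaCommits   commit sequence numbers in the Replica's log, ascending
     * @param lastCommonCommit highest sequence number present in both logs
     * @return commits that exist only on the Replica and would have to be undone
     */
    static List<Long> commitsToRollBack(List<Long> replicaCommits, long lastCommonCommit) {
        return replicaCommits.stream()
                .filter(seq -> seq > lastCommonCommit)
                .collect(Collectors.toList());
    }
}
```

A non-empty result corresponds to the situation described above: commits that exist on the Replica but not on the node that won the election.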
InsufficientLogException
Indicates that the log files constituting the Environment are insufficient and cannot be used as the basis for continuing with the replication stream provided by the current Master.
This exception generally means that the node has been down long enough that the Master can no longer bring it up to date. For information on how to respond to this condition, see Restoring Log Files.
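Whether a Replica can resume from the replication stream depends on whether the Master still holds the entries the Replica is missing. The check below is a toy model with hypothetical sequence-number parameters, not JE's log-retention logic.

```java
/** Toy model of the "insufficient log" condition; not JE's retention logic. */
final class LogGapCheck {
    private LogGapCheck() {}

    /**
     * @param replicaLastApplied   last sequence number the Replica has applied
     * @param masterOldestRetained oldest sequence number the Master can still stream
     * @return true if the Replica can resume from the replication stream
     */
    static boolean canCatchUp(long replicaLastApplied, long masterOldestRetained) {
        // The Replica needs every entry after replicaLastApplied; if the Master's
        // oldest retained entry is later than the next one needed, there is a gap.
        return replicaLastApplied + 1 >= masterOldestRetained;
    }
}
```

The longer a node is down, the further `replicaLastApplied` falls behind while the Master may stop retaining older entries, which is why this condition usually follows an extended outage.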
In addition to Master- and Replica-specific exceptions, it is possible for a ReplicatedEnvironment handle to throw an UnknownMasterException. This exception indicates that the operation being tried requires communication with a Master, but the Master is not available.
This exception typically indicates a problem with your physical infrastructure. It might mean that too few electable nodes are available to elect a Master, or that the current node cannot communicate with other nodes, for example because of network problems.
In response to this exception, your application can try any number of corrective actions, from immediately retrying the operation, to logging the problem and then abandoning the operation, to waiting some predetermined period of time before attempting the operation again. Your application can also use the Monitor or the StateChangeListener to be notified when a Master becomes available. For more information see Writing Monitor Nodes or Using the StateChangeListener.
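The three responses listed above (retry immediately, log and abandon, or wait and retry) can be folded into one helper. This is a sketch under assumptions: the `Strategy` enum, the attempt limit, and the fixed delay are hypothetical choices, and `IllegalStateException` stands in for `UnknownMasterException` so that the example runs without the JE library.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical retry helper for operations that need a Master. */
final class MasterRetry {
    private static final Logger LOG = Logger.getLogger(MasterRetry.class.getName());

    /** The three responses described in the text. */
    enum Strategy { RETRY_IMMEDIATELY, LOG_AND_ABANDON, WAIT_AND_RETRY }

    private MasterRetry() {}

    /** Returns the operation's result, or Optional.empty() if it was abandoned. */
    static <T> Optional<T> run(Supplier<T> op, Strategy strategy, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(op.get());
            } catch (IllegalStateException noMaster) {     // stand-in for UnknownMasterException
                if (strategy == Strategy.LOG_AND_ABANDON) {
                    LOG.warning("No Master available, abandoning: " + noMaster.getMessage());
                    return Optional.empty();
                }
                if (strategy == Strategy.WAIT_AND_RETRY && attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);          // give the group time to elect a Master
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve the interrupt and stop
                        return Optional.empty();
                    }
                }
            }
        }
        LOG.warning("No Master after " + maxAttempts + " attempts; giving up");
        return Optional.empty();
    }
}
```

A fixed delay keeps the sketch simple; an application that prefers not to poll can instead wait for a Master through the Monitor or the StateChangeListener mentioned above.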