Among the several database concepts out there, one of the most important ones is replication. Replication involves data copying between multiple systems so each user has the same type of information. This allows for data availability which in turn improves the performance of the application. Likewise, it can also help in backups, for instance when one of the systems is damaged by a cyberattack. In MongoDB, replication is achieved with the help of grouped mongod processes known as a replica set.
What Is a Replica Set?
A replica set is an amalgamation of multiple mongod processes which are combined together. They are the key to offering redundancy in MongoDB along with ensuring strong availability of data. Members in the replica set are classified into three categories.
- Primary member.
- Secondary member.
There can only be one primary member in a single replica set. It is a sort of “leader” among all the other members. The major difference between a primary member and other members is that a primary member gets write operations. This means that when MongoDB has to process writes, then it forwards them to the primary member. These writes are then recorded in the operation log (oplog) of the primary member. Afterward, the secondary members look up in the oplog and use that recorded data to implement changes in their stored data.
In the case of read operations, all the members of the replica set can process them but the read operation is first directed to the primary member due to default settings in the MongoDB.
All members in a replica set acknowledge their availability through a “heartbeat”. If the heartbeat is missing for the specified time limit (usually defined in seconds), then it means that the member is disconnected. In replica sets, there are some cases when the primary member gets unavailable due to multiple issues, for instance, the data center in which the primary member resides is disconnected due to the power outage.
Since replication cannot operate without a primary member, an “election” is held to appoint a primary member from the secondary members.
A secondary member is dependent on the primary member to maintain its data. For replication, it uses the oplog of the primary member asynchronously. Unlike primary members, a replica set can have multiple secondary members—the more the merrier. However, as explained before, write operations are only conducted by the primary member and thus, a secondary member is not allowed to process them.
If a secondary member is allowed to vote, then it is eligible to trigger an election and take part as a candidate to become the new primary member.
A secondary member can also be customized to achieve the following.
- Made ineligible to become the primary member. This is done so it can be utilized in a secondary data center where it is valuable as a standby option.
- Block the contact of applications from the secondary member such that they cannot “read” through it. This can ensure that they work with special applications which necessitate detachment from the standard traffic.
- You can modify a secondary member to act as a snapshot for historical purposes. This is a backup strategy which can aid as a contingency plan. For example, when a user deletes a database by mistake, then the snapshot can be utilized.
An arbiter is distinct due to the fact that it does not store its own data copy. Likewise, it is ineligible for the primary member position. So, why exactly are they used? Arbiters provide efficiency in the elections; they can vote once. They are added so they can vote and break the possibility of scenarios where it is a “tie” between members.
For example, there are 5 members in a replica set. If the primary member gets unavailable, then an election is held where two of the secondary members bag two votes each, so it is not possible for either of them to become the primary member. Therefore, to break such standoffs, arbiters are added so their vote can complete the election and select a primary member.
It is also possible to add a secondary member in the replica set to improve the election but a new secondary member is heavy as it stores and maintains data. Since arbiters are not constrained by such overheads; therefore they are the most cost-effective solution.
It is recommended, that arbiters should never reside in sites which are responsible to host both the primary and secondary members.
If the configuration option, authorization, is enabled in the arbiter, then it can also swap credentials with members of the replica set where authentication is implemented using the “keyfiles”. In such scenarios, MongoDB applies encryption on the entire authentication procedure.
The use of cryptographic techniques ensures that this authentication is secure from any exploitation. However, since arbiters are unable to store or maintain data, this means that the internal table—which has information of users/roles in authentication—can not be owned by the arbiters. Therefore, the localhost exception is used with the arbiter for authentication purposes.
Do note that in the recent MongoDB versions (from MongoDB 3.6 onwards), if you upgrade the replica set from an older version, then its priority value is changed from 1 to 0.
What Is the Ideal Replica Set?
There is a limit on the number of members in the replica set. At most, a replica set can contain 50 members. However, the number of voting members must not cross the 7-member limit. To configure voting and allow a member to vote, the members[n]. votes setting of a replica set must be 1.
Ideally, a replica set should have at least three members which can bear data: 1 primary member and 2 secondary members. It is also possible to use a three-member replica set where with one primary, secondary, and arbiter but at the expense of redundancy.