Deep dive || Aerospike Fundamentals

aditya goel
11 min readFeb 27, 2024

Question → What is a Node in Aerospike setup ?

Answer → Nodes in Aerospike are instances of the Aerospike database running on individual machines or VMs.

  • Each node is typically responsible for storing a portion of the data and processing client requests.
  • Nodes communicate with each other to maintain cluster integrity, share cluster state information, and replicate data.

Question → What is a Cluster in Aerospike setup ?

Answer → A cluster in Aerospike consists of multiple nodes working together to form a distributed database system.

  • Nodes in the cluster cooperate to store and manage data, handle client requests, and maintain high availability and fault tolerance.
  • Aerospike clusters are designed to scale horizontally, allowing for easy expansion by adding more nodes to accommodate growing data and workload demands.

Question → What is a Namespace in Aeropsike setup ?

Answer → A namespace in Aerospike is a logical container for data, similar to a database in relational databases.

  • Namespaces isolate data from each other and provide a way to organize and manage data within the cluster.
  • Each namespace has its own configuration settings, including replication factor, eviction policy, storage engine, and data retention policy.

Question → Can a namespace in Aerospike be present across multiple nodes OR Is it always present at a single node ?

Answer → In Aerospike, a namespace can be present across multiple nodes in a cluster. Aerospike distributes data across nodes in a cluster using partitions, and each namespace can span multiple nodes based on how partitions are distributed.

1.) Namespace and Partitions → Aerospike partitions the dataset within each namespace into multiple partitions. These partitions are distributed across the nodes in the cluster, allowing the data in a namespace to be stored on multiple nodes.

2.) Partition Ownership → Each node in the cluster is responsible for hosting a subset of partitions for each namespace. Aerospike uses a partition map to determine which node is responsible for each partition within a namespace.

Question → What is a Partition in Aerospike ?

Answer → In Aerospike, a partition is a logical unit of data distribution and management within the database cluster. Partitions are used to divide the dataset into smaller, manageable units that can be distributed across the nodes in the cluster for efficient storage, retrieval, and processing.

Question → What is a Set in Aeropsike setup ?

Answer → A set in Aerospike is a subset of data within a namespace, akin to a table in relational databases.

  • Sets are used to group related records together based on a common attribute or key.
  • Sets provide a way to organize and query data more efficiently and can have their own configuration settings, such as replication factor and eviction policy.

Question → What is a Record in Aeropsike setup ?

Answer → A record in Aerospike is a single unit of data stored in the database, identified by a unique key within its namespace and set.

  • Records can contain multiple bins (fields), each storing a different piece of data.
  • Records are read, written, and queried by client applications using the Aerospike API.

Question → How do Nodes communicate with each other in Aerospike ?

Answer → In Aerospike, nodes communicate with each other primarily for cluster coordination, data distribution, and replication. This communication happens through a peer-to-peer communication protocol called the “Mesh Heartbeat Protocol”. Here’s how nodes communicate with each other in Aerospike:

  • Making own state available to other nodes in cluster → Each node regularly sends heartbeat messages to all other nodes in the cluster to announce its presence and state, so that all nodes are aware of each other’s presence and status.
  • Contents of Heartbeat message → Heartbeat messages contain information about the sending node’s status, such as its health, load, and cluster membership information.
  • Purpose of heartbeat → Heartbeat messages are used to detect node failures, network partitions, and other cluster events, allowing the cluster to adapt and respond accordingly.
  • Partition Responsibility → Heartbeat messages help ensure that each node is aware of the data partitions it is responsible for and can route client requests accordingly.
  • Failover detection → When a node fails to send heartbeat messages within a certain timeout period, other nodes in the cluster detect the failure and take appropriate actions, such as initiating failover procedures or triggering data rebalancing. Therefore, Heartbeat messages facilitate dynamic cluster membership management, allowing nodes to join or leave the cluster seamlessly without manual intervention.

Question → How does a write happens in Aerospike cluster, provided that replication factor is set to 1 ?

Answer → If the replication factor is set to 1 in Aerospike, it means that each record is stored on only one node in the cluster.

1.) Master Nodes →

  • Each node in the cluster acts as the master for the data it stores.
  • When a client writes data to the cluster, the data is stored on the node responsible for the partition containing the data.
  • Each node maintains the primary copy of the data it hosts and serves read and write requests for that data.

2.) No Replicas →

  • Since the replication factor is set to 1, there are no replica copies of the data stored on other nodes in the cluster.
  • Data redundancy and fault tolerance are not provided by replication in this configuration. If a node fails, the data stored on that node becomes unavailable until the node is recovered or replaced.

Question → Is there a concept of leader in Aerospike ?

Answer → No, Aerospike does not have a concept of a “leader” node.

  • Each Node is a master for subset of data → In Aerospike’s architecture, each node operates independently and can serve read and write requests without requiring coordination from a central leader node.
  • No Centralised Leader → In Aerospike, each node is responsible for a subset of the data and can independently handle client requests. In Aerospike, all nodes in the cluster are equal peers, and there is no centralized leader responsible for coordinating cluster operations.
  • Shared-Nothing-Architecture → Aerospike employs a shared-nothing architecture where each node in the cluster operates independently and collaborates with other nodes to achieve consensus and maintain data consistency.
  • Peer-to-Peer Communication → Nodes in the cluster communicate with each other using a peer-to-peer protocol. There is no central coordinator or leader node responsible for directing cluster operations.
  • Quorum-based Consensus → Aerospike uses a quorum-based approach for achieving consensus on cluster operations, such as reads, writes, and cluster membership changes. Operations require acknowledgment from a configurable number of nodes (a quorum) to be considered successful.

Question → How does quorum based consensus works in Aerospike Write operations ?

Answer → Here is my understanding about the usage of quorum based consensus algos in Aerospike ?

1.) Consensus refers to the process of reaching an agreement among a group of nodes in a distributed system on a particular decision or operation.

2.) Quorum → A quorum is a minimum number of acknowledgments required from nodes in the cluster to consider an operation successful. Quorums are configurable and can be set based on factors such as cluster size, replication factor, and consistency requirements.

3.) Quorum-based Consistency → Aerospike uses quorum-based consistency to determine the level of acknowledgment required for read and write operations to be considered successful. For write operations, a quorum of nodes must acknowledge the write before it is considered successful. This ensures that data is replicated to a sufficient number of nodes for durability and fault tolerance.

Question → Can you show an example of code for Quorum-based Consistency with Write operations ?

Answer → Below is how, we can configure Aerospike client in Java, to require a quorum of nodes to acknowledge write operations before considering them successful. In this example, we’ll use the Java client for Aerospike :-

Note here that, consistencyLevelspecifies the consistency level for the write operation. We use CONSISTENCY_ALL to require acknowledgment from all nodes (quorum) before considering the write successful.

Question → Are all the nodes of any given Aerospike Cluster are active at any given point of time ?

Answer → Yes, in an Aerospike cluster, all nodes are typically active and participate in serving client requests at any given point in time. Aerospike employs a shared-nothing architecture, where each node operates independently and is responsible for hosting a subset of the data partitions.

Question → What happens when a particular node in aerospike cluster goes down ?

Answer → When a particular node in an Aerospike cluster goes down, Aerospike employs employs “Automatic Failover mechanism” to ensure continuous availability, fault tolerance, and data integrity.

  • Step #1.) Aerospike detects the failure of a node through heartbeats and cluster health monitoring. Upon detecting a node failure, Aerospike initiates an automatic failover process to maintain cluster availability.
  • Step #2.) Aerospike promotes the prole (replica) partitions hosted by the failed node to master status, ensuring that data remains accessible even in the absence of the primary node.

Question → Say, we have got 4 nodes (N1, N2, N3, N4) in cluster of Aerospike and we have replication factor of 2. Also imagine that, we have record R1. Now, one nodes goes down. Now what will happen ?

Answer → Initially, let’s assume that record R1 is stored on nodes N1 and N2 as primary-copy & replica-copy respectively.

1.) Node Failure (N1) → If node N1 goes down, the data on node N1, including its replicas, becomes inaccessible. However, since R1 is replicated with a factor of 2, the replica of R1 on node N2 remains available.

2.) Data Access → Clients can continue to read and write data to the operational nodes (N2, N3, N4). The data stored on node N1 is still accessible through its replicas on the other nodes.

3.) Quorum Adjustment → With a replication factor of 2, the quorum requirement is typically set to 2. Even with one node down (N1), the quorum requirement can still be met with the remaining operational nodes (N2, N3, N4).

4.) Replication Management → Aerospike detects the node failure and initiates the process of replicating the data stored on node N1 to the remaining nodes in the cluster (N2, N3, N4) to maintain the desired replication factor of 2.

5.) Rebalancing → Aerospike may initiate cluster rebalancing to redistribute the data across the remaining nodes in the cluster (N2, N3, N4) to ensure balanced data distribution and optimal performance.

Notes :-

  • Overall, even with one node (N1) down, record R1 remains available and accessible through its replica on node N2. Aerospike ensures data availability and consistency by replicating the data across multiple nodes and adjusting the cluster configuration as needed.
  • In the event of a node failure (such as N1), Aerospike will initiate the process of replicating the data stored on the failed node to the remaining nodes in the cluster (N2, N3, N4) to maintain the desired replication factor of 2.
  • Since the replication factor is 2, and one node (N1) has failed, Aerospike will ensure that another copy of the data, including record R1, is created on one of the remaining operational nodes (N2, N3, N4) to maintain redundancy and data availability. This ensures that there are always two copies of the data available in the cluster, even in the presence of node failures.
  • In practice, the time taken for data replication can range from a few seconds to several minutes, depending on the specific circumstances of the cluster and the workload it is handling. Aerospike’s replication mechanisms are designed to prioritize data availability and consistency while minimizing downtime, and the system is optimized to replicate data as quickly and efficiently as possible.

Question → To which node of a cluster does a Java-Client connects to usually in Aerospike cluster ?

Answer → In Aerospike, clients typically connect to any node in the cluster to perform database operations.

  • Aerospike employs a shared-nothing architecture, where each node in the cluster can independently handle client requests.
  • Clients can connect to any reachable node in the cluster, and the cluster will internally route requests to the appropriate node hosting the requested data.

Question → Say we have 4 nodes in aerospike cluster and If we Configure clients with all the 4 seed nodes (i.e., IP addresses of nodes in the cluster), then in this case, to which node would client connect to ?

Answer → When clients are configured with all the seed nodes (IP addresses of nodes) in the Aerospike cluster, they typically use one of the seed nodes to establish the initial connection.

  • Once the client connects to one of the seed nodes, it receives back the cluster information, and the client becomes aware of the entire cluster topology, including all nodes and their roles (master or replica).
  • After the initial connection, the client may use any of the nodes in the cluster to perform database operations. Aerospike employs a mechanism called “smart routing,” where client requests are directed to the appropriate node hosting the requested data.
  • The specific node to which the client initially connects may depend on factors such as network latency, load balancing algorithms, or the order in which the seed nodes are specified in the client configuration. However, once connected, the client can communicate with any node in the cluster, providing fault tolerance and load balancing capabilities.

Question → When a client writes data to the Aerospike cluster, how the responsible node shall be decided for storing the partition containing the data ?

Answer → Once some client writes some data to cluster as shown below :-

Behind the scenes, here is what happens :-

Step #1.) Unique Identifier Generation → When the client writes data to Aerospike, it generates a unique record identifier. This identifier is typically derived from the record’s primary key using a hashing algorithms like CRC32 OR Murmur.

Step #2.) Partition Calculation → Aerospike calculates a partition ID for the record based on its identifier. The partition ID is a numeric value that determines the partition to which the record belongs.

Step #3.) Partition Mapping → Aerospike maintains a partition map that maps each partition ID to a specific node in the cluster. This mapping is distributed across all nodes and is used to route client requests to the appropriate node.

Step #4.) Node Assignment → Each node in the cluster is responsible for a subset of partitions, and the partition map is used to route write operations to the appropriate node. Based on the calculated partition ID in Step 2 above, Aerospike determines which node in the cluster shall be responsible for hosting the partition containing the record.

Step #5.) Record Storage → Once the responsible node is identified, Aerospike stores the record on that node. The record is typically stored in memory (RAM) for fast access, and optionally persisted to disk for durability.

Question → Say we have few nodes in cluster and there happens partitioning, then what happens in Aerospike ?

Answer → In Aerospike, when a network partition occurs and nodes are isolated from each other, the cluster may split into multiple subclusters, each containing a subset of nodes.

1.) Split Brain → Aforementioned situation is known as split-brain scenario, where different parts of the cluster operate independently and may cause data inconsistencies.

2.) Split-Brain Prevention → To prevent split-brain scenarios, Aerospike employs strategies such as quorum-based decision making to ensure that only one set of nodes continues to operate as the authoritative cluster while the other set halts operations or transitions to a read-only state.

  • In the context of a split-brain scenario, quorum-based decision making ensures that only one set of nodes can form a majority and continue to operate as the authoritative cluster.
  • Nodes in minority subclusters are unable to achieve quorum and must either halt operations or transition to a read-only state to prevent data inconsistencies.

That’s all in this blog. We shall see you in next blog.

References :-

--

--

aditya goel

Software Engineer for Big Data distributed systems