Apache Cassandra Interview Questions

1. What is Apache Cassandra?

Cassandra is a free, open-source, distributed NoSQL database management system designed to handle large amounts of data. It provides high availability with no single point of failure.

Cassandra is written in Java. It was originally developed at Facebook, supports flexible schemas, and scales well for big data workloads.

Cassandra has its own Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, as an alternative to the traditional Structured Query Language (SQL).

2. Key features of Cassandra.

  • Open-source availability.
  • Distributed footprint.
  • Scalability.
  • Cassandra Query Language.
  • Fault tolerance.
  • Flexible schema.
  • Tunable consistency.
  • Fast writes.
  • Peer-to-peer architecture.
3. Compare Cassandra Vs Relational Databases.

Cassandra | RDBMS
Data may be unstructured. | Only structured data.
Flexible schema. | Fixed schema.
Data is written in many locations. | Data is written in mostly one location.
A table is a list of "nested key-value pairs" (Row x Column Key x Column Value). | A table is an array of arrays (Row x Column).
Keyspace is the outermost container, which contains data corresponding to an application. | Database is the outermost container, which contains data corresponding to an application.
4. How does Cassandra store data?

Every write in Cassandra is first appended to the commit log and then applied to the memtable, where the data is held temporarily in memory. Memtables are periodically flushed to disk and written as SSTables. The write path has four steps:

  • Logging data in the commit log,
  • Writing data to the memtable,
  • Flushing data from the memtable,
  • Storing data on disk in SSTables.
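The four steps above can be sketched as a toy model in Python. This is a minimal illustration, not Cassandra's implementation: the `Node` class, its `flush_threshold` parameter, and the use of a plain dict and list are all assumptions made for the example. Real Cassandra flushes based on memory usage and recycles commit log segments rather than clearing one list.

```python
class Node:
    """Toy model of one node's write path (hypothetical names, not Cassandra's API)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every mutation
        self.memtable = {}     # in-memory write buffer: key -> value
        self.sstables = []     # immutable "on-disk" files, modeled as tuples
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: log the mutation
        self.memtable[key] = value            # step 2: apply it to the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Steps 3 and 4: freeze the memtable into a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log


node = Node(flush_threshold=2)
node.write("user:1", "Ada")
node.write("user:2", "Grace")   # reaches the threshold and triggers a flush
node.write("user:3", "Edsger")  # stays in the new memtable and commit log
```

After these three writes, the first two rows live in one SSTable and the third exists only in the memtable and the commit log.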
5. What are SSTables in Cassandra?

SSTables are the immutable data files that Cassandra uses for persisting data on disk. As SSTables are flushed to disk from memtables or are streamed from other nodes, Cassandra triggers compactions which combine multiple SSTables into one. Once the new SSTable has been written, the old SSTables can be removed.
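Compaction can be illustrated with a small Python sketch. It assumes each SSTable is a dict mapping a key to a `(timestamp, value)` pair and that the newest timestamp wins. The function name `compact` is hypothetical, and the sketch ignores tombstones and the different compaction strategies.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each key.

    Each SSTable is modeled as a dict: key -> (timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Build a new table sorted by key; the input tables are left unmodified.
    return dict(sorted(merged.items()))


old_a = {"k1": (1, "a"), "k2": (2, "b")}
old_b = {"k2": (5, "b2"), "k3": (3, "c")}
new_table = compact([old_a, old_b])
```

Because the inputs are never modified, the old tables can simply be dropped once the new one is complete, which mirrors the immutability described above.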

6. What is CommitLog in Cassandra?

The commit log is an append-only log of all mutations local to a Cassandra node. Any data written to Cassandra is first written to the commit log before being written to a memtable. This provides durability in the case of an unexpected shutdown. On startup, any mutations in the commit log are replayed into memtables.
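Replay on startup can be sketched in a few lines of Python. This assumes the commit log is an ordered list of `(key, value)` mutations. The `replay` function is a hypothetical name for illustration, not part of any Cassandra API.

```python
def replay(commit_log):
    """Rebuild a memtable by re-applying logged mutations in order."""
    memtable = {}
    for key, value in commit_log:
        memtable[key] = value  # later mutations overwrite earlier ones
    return memtable


# Mutations that were logged but not yet flushed before a crash.
commit_log = [("k1", "v1"), ("k2", "v2"), ("k1", "v1-updated")]
recovered = replay(commit_log)
```

Applying the mutations in their original order is what makes the recovered memtable match the state before the shutdown.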

7. What are Memtables in Cassandra?

Memtables are in-memory structures where Cassandra buffers writes. In general, there is one active memtable per table. Eventually, memtables are flushed onto disk and become immutable SSTables.

8. What is the NoSQL database?

NoSQL, also referred to as "not only SQL" or "non-SQL", is an approach to database design that enables the storage and querying of data outside the traditional structures found in relational databases. It can still hold the kind of data found in a relational database management system (RDBMS), but it stores that data differently. The decision to use a relational or a non-relational database is largely contextual and depends on the use case.

Instead of the typical tabular structure of a relational database, NoSQL databases store data in other structures, such as JSON documents, key-value pairs, wide columns, or graphs.

9. Advantages of NoSQL Databases.

  • Handle large volumes of data at high speed with a scale-out architecture.
  • Store unstructured, semi-structured, or structured data.
  • Enable easy updates to schemas and fields.
  • Be developer-friendly.
  • Take full advantage of the cloud to deliver zero downtime.
10. What is CQL?

CQL (Cassandra Query Language) is Cassandra's query interface. It is intentionally similar to SQL, giving users who are comfortable with relational databases a familiar language and lowering the barrier to entry for Apache Cassandra.

11. What are the main components of Cassandra?

The components of Cassandra are:

  • Node
  • Data center
  • Commit log
  • Cluster
  • Memtable
  • SSTable
  • Bloom filter
12. What is a Node in Cassandra?

A node is a single instance of Cassandra. Nodes communicate with one another through gossip, a peer-to-peer protocol in which they exchange state information about themselves and other nodes. Since it is a distributed database, Cassandra can (and usually does) have multiple nodes.

A node is where the data is stored.

13. What is the Data Center and Cluster in Cassandra?

A Cassandra datacenter is a group of related nodes configured together within a cluster for replication purposes. It is a logical set of racks and should contain at least one rack.

A cluster is a component that contains one or more datacenters.

14. What is meant by Cassandra rack?

A rack is a collection of servers. A Cassandra rack is a logical grouping of nodes within the ring.

15. Difference between Memtable and SSTable.

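A memtable does not persist data. It is an in-memory structure that temporarily accumulates writes. An SSTable is the on-disk file that a memtable's contents are flushed to. Data in an SSTable is persistent, and the file is immutable: it is never modified in place, only replaced by a new SSTable during compaction.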

16. Explain the concept of Bloom Filter in Cassandra.

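Each SSTable has an associated Bloom filter, an off-heap data structure (held in native memory rather than on the Java heap) that is checked before any disk I/O is performed. It is probabilistic: it can report that a key is definitely not in the SSTable or that it may be present, so Cassandra can skip SSTables that cannot contain the requested data.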
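The idea can be sketched with a minimal Bloom filter in Python. This is an illustration only: the class name, the bit-array size, and the use of salted SHA-256 hashes are assumptions for the example, not what Cassandra uses internally.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (illustrative, not Cassandra's implementation)."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ("user:1", "user:2"):
    bf.add(key)
```

A `False` result from `might_contain` lets the read skip that SSTable entirely, while a `True` result still requires reading the SSTable to confirm, since false positives are possible.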

17. What is Cqlsh in Cassandra?

cqlsh (Cassandra Query Language Shell) is the interactive command-line shell for CQL. It is Python-based, runs on Linux or Windows, and supports commands such as ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, and many others. With cqlsh, users can define a schema, insert data, and execute queries.

18. What is source command in Cassandra?

The SOURCE command is used in cqlsh to execute a file containing CQL statements.

SOURCE '~/data/insert_data.cql'

19. What is the purpose of using thrift in Cassandra?

Thrift is a legacy RPC framework combined with a code generation tool. Cassandra's original client API was built on Thrift so that the database could be accessed from many programming languages. It has since been superseded by CQL, and Thrift support was removed in Cassandra 4.0.

20. What is replication factor in Cassandra?

Replication factor (RF) is the number of nodes in the cluster that store a copy of the same data. It is set per keyspace. For example, with RF=3, three nodes in the ring hold copies of the same data.
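A rough Python sketch of replica placement follows. It assumes the simplest scheme, in which replicas go on consecutive nodes walking clockwise around the ring, and it ignores racks and datacenters. The function `replicas_for` and the node names are hypothetical.

```python
def replicas_for(owner_index, ring, replication_factor):
    """Return the nodes that hold a partition.

    owner_index is the ring position of the node that owns the partition's
    token; the remaining replicas are the next nodes clockwise.
    """
    rf = min(replication_factor, len(ring))  # cannot exceed the node count
    return [ring[(owner_index + i) % len(ring)] for i in range(rf)]


ring = ["node-a", "node-b", "node-c", "node-d", "node-e"]
replicas = replicas_for(3, ring, replication_factor=3)
# replicas is ["node-d", "node-e", "node-a"]: the walk wraps around the ring.
```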

21. Explain Cassandra Data Model.

Cassandra data model consists of four main components:

  • Cluster: Made up of multiple nodes and keyspaces.
  • Keyspace: A namespace that groups multiple column families, typically one per application.
  • Column: Consisting of a column name, value, and timestamp.
  • Column Family: Multiple columns with the row key reference.
22. What is Super Column in Cassandra?

A super column is a special kind of column. Like a regular column, it is a key-value pair, but its value is a map of sub-columns.

Column families are generally stored on disk in individual files. To optimize performance, it is therefore important to keep columns that you are likely to query together in the same column family, and a super column can help group them.
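The nesting can be sketched with plain Python dicts. This is only an illustration of the shape described above: the `column` helper, the `address` example, and its field values are hypothetical.

```python
import time


def column(name, value):
    # A column is a name, a value, and a write timestamp.
    return {"name": name, "value": value, "timestamp": time.time()}


# A super column: a name whose value is a map of sub-columns.
address = {
    "name": "address",
    "columns": {
        "street": column("street", "1 Main St"),
        "city": column("city", "Springfield"),
    },
}
```

Reading `address["columns"]["city"]["value"]` fetches one sub-column, while the super column keeps related sub-columns grouped under a single name.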
