DataStructures / System Design

Difference between horizontal and vertical scaling.

Horizontal scaling, also referred to as "scale-out", is the addition of more machines, typically by setting up a cluster or distributed environment for your software system. This usually requires a load balancer, which is a middleware component in the standard 3-tier client-server architectural model.

Vertical scaling, also referred to as "scale-up", is an attempt to increase the capacity of a single machine by adding more processing power (CPU) or more memory (RAM).

What is a Load Balancer?

A load balancer is responsible for distributing user requests (load) among the various back-end systems/nodes in the cluster. Each of these back-end machines runs a copy of your software and is hence capable of servicing requests.

Another common responsibility is "health-check" where the load balancer uses the "ping-echo" protocol or exchanges heartbeat messages with all the servers to ensure they are up and running fine.

Explain a few load balancing algorithms that you know.
  • Round Robin, also called "Next in Loop": requests are handed to the servers in turn.
  • Weighted Round Robin: similar to Round Robin, but some servers get a larger share of the overall traffic.
  • Random: each request goes to a randomly chosen server.
  • Source IP hash: connections are distributed to back-end servers based on a hash of the source IP address. As long as all servers are running, a given client IP address will always go to the same web server. If a web node fails and is taken out of service, the distribution changes.
  • Least connections: the load balancer monitors the number of open connections for each server and sends the request to the least busy server.
  • Least traffic: the load balancer monitors the bitrate from each server and sends the request to the server that has the least outgoing traffic.
  • Least latency: the load balancer makes a quick HTTP OPTIONS request to the back-end servers and sends the request to the first server to answer.
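
As a rough illustration of four of the strategies listed above, here is a minimal Python sketch. The server names and function names are made up for the example, and a real load balancer would track connection counts and health itself rather than receive them as arguments.

```python
import hashlib
import itertools


def round_robin(servers):
    """Yield servers in a repeating loop ("next in loop")."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in a loop, repeating each one `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def source_ip_hash(client_ip, servers):
    """Map a client IP to a server; stable while the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def least_connections(open_connections):
    """Pick the server with the fewest open connections."""
    return min(open_connections, key=open_connections.get)


servers = ["web-1", "web-2", "web-3"]  # hypothetical back-end names
rr = round_robin(servers)
first_four = [next(rr) for _ in range(4)]  # web-1, web-2, web-3, web-1
```

Note that `source_ip_hash` depends on the length of the server list, which is why removing a failed node changes the distribution.
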
Explain CAP theorem.

The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

Consistency: Every read receives the most recent write or an error.

Availability: Every request receives a (non-error) response – without guarantee that it contains the most recent write.

Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

Explain the BASE property of the database.

Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.

Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.

Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.

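Explain ACID properties of database transactions.

Atomicity: A transaction is all-or-nothing; either every operation in it takes effect or none does.

Consistency: A transaction takes the database from one valid state to another, preserving all defined rules and constraints.

Isolation: Concurrent transactions do not interfere with each other; each behaves as if it were running alone.

Durability: Once a transaction is committed, its changes survive crashes and power loss.
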
What is database sharding?

A database shard is a horizontal partition of data in a database. Each individual partition is referred to as a database shard. Each shard is held on a separate database server instance to spread the load. Some data within a database remains present in all shards, but some appears only in a single shard. Each shard (or server) acts as the single source for this subset of data.
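
One simple way to decide which shard owns a row is to hash its key and take the result modulo the number of shards. The sketch below assumes four shards and models each one as an in-memory dict; `shard_for`, `put`, and `get` are illustrative names, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for illustration


def shard_for(key, num_shards=NUM_SHARDS):
    """Return the index of the shard that owns `key` (hash-mod routing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


# Each dict stands in for a separate database server instance.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


put("user:42", {"name": "Ada"})
```

With this scheme, changing the number of shards remaps most keys, which is one reason production systems often use consistent hashing or a lookup table instead.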

Difference between database sharding and partitioning.

Partitioning is a general term used to describe the act of breaking up your logical data elements into multiple entities for the purpose of performance, availability, or maintainability.

Sharding is the equivalent of "horizontal partitioning".

"Vertical partitioning" is the act of splitting up the data stored in one entity into multiple entities for space and performance reasons.

Difference between eventual and strong consistency in Distributed Databases.

Eventual consistency guarantees that the data on each node of the database becomes consistent eventually. The time the nodes take to become consistent may or may not be defined.

In strong consistency, a write is propagated to all the replicas as soon as a write request reaches one of them. While the replicas are being updated with the new data, responses to subsequent read/write requests are delayed, because the replicas are busy keeping each other consistent.

How to choose between SQL and No-SQL Database?

A SQL database is a better choice for any business whose data has a pre-defined structure and a set schema. Applications that involve multi-row transactions, such as accounting systems, warehousing, and payment systems, benefit from a SQL database.

A NoSQL database is a good choice for businesses that have rapid growth or databases with no clear schema definitions. If you cannot define a schema for your database, or if your schema continues to change, as in mobile apps, real-time analytics, or content management systems, it is the better choice.

What is TLS?

Transport Layer Security (TLS) is a cryptographic protocol that provides communications security over a computer network. Its primary aim is to provide privacy and data integrity between two communicating computer applications.

Explain Request Throttling.

Throttling is a process used to control consumers' usage of APIs during a given period. You can define throttling at the application level and at the API level. At the API level, the throttling limit is considered cumulative.
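
A minimal sketch of one common throttling scheme, a fixed-window counter, is shown below. The class name, the per-consumer keying, and the injectable `clock` parameter are assumptions made for the example; real API gateways typically offer this as configuration rather than code.

```python
import time


class FixedWindowThrottle:
    """Allow at most `limit` requests per consumer in each `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self._counters = {}  # consumer -> (window_start, request_count)

    def allow(self, consumer):
        now = self.clock()
        start, count = self._counters.get(consumer, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window expired
        if count >= self.limit:
            self._counters[consumer] = (start, count)
            return False  # over the limit: reject (e.g. HTTP 429)
        self._counters[consumer] = (start, count + 1)
        return True
```

For example, `FixedWindowThrottle(limit=100, window=60)` would let each consumer make up to 100 calls per minute.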

Difference between hard and soft real-time systems.

A hard real-time system requires every task to meet its deadline. Hard real-time systems are relatively few and are used in fields such as medicine and defense.

Soft real-time systems, sometimes also called firm real-time systems, tolerate some missed deadlines. An occasional miss is considered a normal scenario, although too many misses are not tolerated.

What is NAT-T (NAT Traversal)?

NAT Traversal, also known as UDP encapsulation, allows traffic to get to the specified destination when a device does not have a public address. This is usually the case if your ISP is doing NAT, or if the external interface of your firewall is connected to a device that has NAT enabled.

What does HLS stand for?

HLS stands for HTTP Live Streaming. HLS is a media streaming protocol for delivering visual and audio media to viewers over the internet.

Its adaptive bitrate video delivery is a combination of server and client software that detects a client's bandwidth capacity and adjusts the quality of the video stream between multiple bitrates and/or resolutions.
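
The client-side half of adaptive bitrate selection can be sketched roughly as follows. The variant list, the 0.8 safety factor, and the function name are invented for illustration; actual HLS players use more elaborate heuristics.

```python
# (bandwidth in bits per second, label) for each available rendition;
# the values are hypothetical.
VARIANTS = [
    (800_000, "360p"),
    (2_500_000, "720p"),
    (5_000_000, "1080p"),
]


def pick_variant(measured_bps, variants=VARIANTS, safety=0.8):
    """Pick the highest-bitrate variant that fits within the measured bandwidth.

    `safety` leaves headroom so small bandwidth dips do not stall playback.
    Falls back to the lowest-bitrate variant when nothing fits.
    """
    budget = measured_bps * safety
    affordable = [v for v in variants if v[0] <= budget]
    return max(affordable) if affordable else min(variants)
```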

Security measures to follow when you are developing your projects.

Perform security tests in CI/CD: CI/CD processes and tools are great places to include security tools and security unit-test cases. Generally, developers are amenable to fixing flagged vulnerabilities on merges but more resistant to addressing large security coding problems prior to shipping a product/service.

Understand your software supply chain: Include automation in the CI/CD process to create a list of third-party components. OWASP Dependency-Check can help you identify your components, versions, and known vulnerabilities associated with each library version.

Upgrade your libraries: To benefit from vulnerability remediation in your software supply chain, you must upgrade your third-party libraries.

Use popular third-party libraries: Use only well-maintained third-party libraries. Libraries that are not well-maintained will impact your security agility and will leave you vulnerable longer to well-known issues.

Design for easy upgrades: Establish development standards for code compliance. For example, you may be compiling with Java 11, but you can mandate code compliance to Java 9. The benefit of separating your build compliance from software code compliance is that you have more flexibility to downgrade or upgrade.

Avoid poor configuration: Avoid hardcoded or clear-text passwords in configuration. Use information from sites like "Security/Server Side TLS" at Mozilla to generate configuration and obtain cipher suite recommendations.

Protect against MITM attacks: Employ encryption to defend against man-in-the-middle (MitM) attacks.

Protect against replay attacks: You can use a cryptographically secure nonce to defend against a replay attack.

Apply security controls on the server: Designers often apply validation to JavaScript front-ends for web apps. It's acceptable to include validation on the client for performance reasons, saving a round-trip to the server. However, client security can't be applied in lieu of server security. All security controls must be deployed on the server, since attackers can bypass browser security controls by calling your web service interfaces directly.

Protect against credential leakage: Ensure all servers communicate securely through encrypted connections. Don't store credentials in the clear. Store passwords hashed, and salt the hashes on a per-user basis.
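
As one possible way to store passwords hashed with a per-user salt, the sketch below uses PBKDF2 from the Python standard library. The iteration count and function names are assumptions for the example; choose the algorithm and work factor according to current guidance for your platform.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password):
    """Return (salt, digest) using a fresh random salt for this user."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and the digest are stored; because each user gets a different salt, two users with the same password end up with different digests.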

Explain the Equifax 2017 security incident.

In July 2017, Equifax suffered a breach that exposed roughly 150 million customer records. The exploit was due to a known vulnerability in the Apache Struts 2 library. Failure to patch quickly placed Equifax and its customers at risk.

From the operational perspective, patching can destabilize production systems and lead to outages. From the security perspective, however, "patch often" is the motto. Security patches remediate known vulnerabilities that provide an easy means for attackers to exploit products and services.

Different Injection defects.
  • Cross-site scripting (XSS),
  • SQL injection,
  • Command injection,
  • Insecure redirects,
  • Insecure file upload/download,
  • and Buffer overflow.
Different Authentication & access control defects.
  • Insufficient authentication,
  • Insufficient authorization,
  • Parameter tampering,
  • and Cross-Site request forgery (CSRF).
Causes of Data protection defects.
  • Insecure cryptographic algorithm,
  • Insecure password management,
  • Insecure session management,
  • and information exposure.
What is Cross-site scripting?

Cross-site scripting (XSS) occurs when malicious code is included in an HTTP response and alters the way the page is rendered. The malicious data is interpreted as script and executed in the client's browser.

There are 2 types of XSS.

Reflected: Data from the incoming HTTP request is returned in the outgoing HTTP response. This type of XSS targets a specific user by exploiting a defect in the application that results in the application returning the malicious data back to the user.

Persisted: Data stored on the server is included in the outgoing HTTP response. This type usually targets one or more users.

XSS can result in unavailability, defacement, unauthorized access, session hijacking, identity theft, account harvesting, or full compromise of the system.

To mitigate XSS risk:

  • Apply appropriate output encoding to untrusted data before it is included in HTTP responses.
  • Utilize HTTP security headers, such as Content-Security-Policy, and use safe APIs such as textContent instead of innerHTML.
  • Consider using client-side templating libraries.
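
The output-encoding advice above can be sketched on the server side with Python's standard `html.escape`. The `render_comment` function and the header value are illustrative assumptions, not a complete defence.

```python
import html


def render_comment(untrusted_text):
    """Encode untrusted data so the browser treats it as text, not markup."""
    return "<p>" + html.escape(untrusted_text, quote=True) + "</p>"


# Example of a restrictive Content-Security-Policy header (assumed policy).
SECURITY_HEADERS = {"Content-Security-Policy": "default-src 'self'"}

payload = "<script>alert(1)</script>"
rendered = render_comment(payload)
# rendered == "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```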

What is SQL injection?

SQL injection is among the highest application security concerns because it's well known, easy to perform, and operates directly on the database server.

SQL injection occurs when:

  • Malicious data is used to construct SQL statements via string concatenation, thus commingling executable code and data.
  • The database server parsing the query interprets the "data" as executable code/query. This allows the data to alter the meaning/result of the SQL query.

A single attack can leak an entire database, alter or destroy a database, or even lead to a compromise of the server where the database is deployed.

To mitigate the SQL injection risk:

  • Use parameterized queries with bind variables to ensure that data added to the statement cannot alter the intention of the statement.
  • Validate untrusted data; data submitted by end users.

SQL Injection example:

In the following code, $user_name and $password represent variables for user input, which are then used to create a SQL statement.

SELECT * FROM BankAccounts WHERE username='$user_name' AND password='$password';

A malicious user may provide input that bypasses the intended functionality of the query and grants unlimited access, simply by commenting out the "AND" portion of the SQL statement. For example, entering admin'-- as the username produces:

SELECT * FROM BankAccounts WHERE username='admin'--' AND password='password123';

Notice the comment indicator (--) after the username. The database will interpret the rest of the query as a comment, ignoring the password verification. The effective statement that gets executed is:

SELECT * FROM BankAccounts WHERE username='admin'

The attacker who submitted that login attempt would be granted access to resources owned by the "admin" account.
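
To make the contrast concrete, the sketch below reproduces the attack against an in-memory SQLite database and then shows the parameterized alternative. The table contents and function names are made up for the example, and the plaintext password column only mirrors the statement above; it is not a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BankAccounts (username TEXT, password TEXT)")
conn.execute("INSERT INTO BankAccounts VALUES (?, ?)", ("admin", "password123"))


def login_unsafe(user_name, password):
    # Vulnerable: user input is concatenated into the statement text.
    query = (
        "SELECT * FROM BankAccounts WHERE username='" + user_name
        + "' AND password='" + password + "'"
    )
    return conn.execute(query).fetchall()


def login_safe(user_name, password):
    # Bind variables keep the data separate from the SQL code.
    query = "SELECT * FROM BankAccounts WHERE username=? AND password=?"
    return conn.execute(query, (user_name, password)).fetchall()


attack = "admin'--"
bypassed = login_unsafe(attack, "anything")  # returns the admin row
blocked = login_safe(attack, "anything")     # returns no rows
```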

Explain Command injection.

Command injection attacks exploit application functionality that makes system calls or runs commands using untrusted data.

Attacks become possible when an application passes unsafe user-supplied data, such as form fields, cookies, and HTTP headers, to the system as part of a shell command. This type of security lapse occurs due to poor security architecture. While input validation can help prevent successful attacks, the failure to keep data isolated from code is the source of the risk.

When such attacks occur on an application server, they may compromise the server or result in data exposure.

To mitigate command injection risk:

  • Do not pass untrusted data to system calls or commands.
  • Validate untrusted data against a whitelist and encode data to protect against problematic characters.
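
When a command really must include user-supplied data, a cautious pattern is to validate the value against a whitelist pattern and pass arguments as a list so that no shell parses them. The sketch below assumes a Unix-style `ping -c 1 <host>` command; the regular expression and function names are illustrative.

```python
import re
import subprocess

# Whitelist: letters, digits, dots, and hyphens only (assumed hostname rule).
HOSTNAME_RE = re.compile(r"[A-Za-z0-9.-]{1,253}")


def build_ping_command(host):
    """Validate `host` and return an argument list, never a shell string."""
    if not HOSTNAME_RE.fullmatch(host) or host.startswith("-"):
        raise ValueError("invalid host")
    return ["ping", "-c", "1", host]


def ping(host):
    # With a list and the default shell=False, the arguments go straight to
    # the program; characters such as ";" or "|" have no special meaning.
    return subprocess.run(
        build_ping_command(host), capture_output=True, text=True, check=False
    )
```
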
What is meant by Insecure redirects?

This type of injection defect occurs when untrusted data is used to redirect users to faulty or malicious sites.

Redirects allow a web application to direct users to different pages within the same application or to an external site. An insecure redirect sends the user to an untrusted or malicious site.

To prevent insecure redirects:

  • Treat all data received from a client as untrusted.
  • Define redirect URLs within the application using trusted, whitelisted data.
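
One way to follow both points is to let the client send only a short key and look the actual URL up in a server-side table. The keys, URLs, and function name below are hypothetical.

```python
# Redirect targets are defined inside the application, not by the client.
ALLOWED_REDIRECTS = {
    "home": "/",
    "profile": "/account/profile",
    "help": "https://support.example.com/",  # hypothetical external site
}
DEFAULT_REDIRECT = "/"


def resolve_redirect(target_key):
    """Map a client-supplied key to a trusted URL, or fall back to a default."""
    return ALLOWED_REDIRECTS.get(target_key, DEFAULT_REDIRECT)
```
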
Explain about "insecure upload/download" injection defect.

Uploading/downloading files in an insecure manner is a broad type of risk that covers path manipulation, data caching, file handling, malware and anti-virus, access control, and bandwidth concerns.

Path manipulation is a major concern. For this type of injection defect, untrusted data is used to construct the path to the resource. For example:

  • Path manipulation for a download may allow unauthorized access to a resource.
  • Path manipulation for an upload may allow a file to be placed in an unauthorized location, overwriting a legitimate resource.
  • Path manipulation for an upload may allow an executable file to open up "back-doors".

Mitigation strategies:

  • Do not use untrusted data in file paths or names; instead, use server-generated file paths and names.
  • Restrict the application's access to its home directory and subdirectories, thereby leveraging the operating system's access controls.
  • Normalize the path prior to validation and authorization checks when using untrusted data as part of a file path. Validate the path against a whitelist.
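
The normalize-then-validate step can be sketched as follows. The upload directory and function name are assumptions for the example; the check resolves the path first and only then confirms it is still inside the allowed root.

```python
import os

UPLOAD_ROOT = "/srv/app/uploads"  # hypothetical application directory


def safe_path(untrusted_name, root=UPLOAD_ROOT):
    """Resolve `untrusted_name` under `root`; reject anything that escapes it."""
    root = os.path.realpath(root)
    # realpath collapses "..", "." and symlinks before the check is made.
    candidate = os.path.realpath(os.path.join(root, untrusted_name))
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError("path escapes the upload directory")
    return candidate
```
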
What is the "Buffer overflow" attack?

Buffer overflow occurs when an application writes more data into an area of memory, called a buffer, than was intended.

Buffers are created to contain a finite amount of data. When the data is longer than expected, data will overflow into one or more adjacent memory locations (buffers) replacing the original data. This results in:

  • Erratic program behaviour.
  • Data exposure to unauthorized parties.
  • Processor tricked into running arbitrary code.

Mitigation strategies:

  • Check the length of data and limit it to the expected size.
  • Never assume that code will safely handle untrusted data.
  • Use libraries explicitly created to perform string and other memory operations in a secure fashion.
Differentiate Authentication and Authorization.

Authentication is the act of proving one's identity. Authorization is the act of verifying one's access privileges.

What is parameter tampering?

Parameter tampering, also known as insecure direct object reference, occurs when attackers manipulate parameters exchanged between client and server to gain unauthorized access to data.

Examples of parameter values that are frequently manipulated include:

  • Cookies.
  • URL parameters.
  • Drop-down lists, radio buttons, and checkboxes.
  • Database primary keys stored in hidden fields.

Mitigation strategies:

  • Perform resource entitlement checks on every data access request.
  • Do not rely on client-provided information for authorization, other than the sessionID. Map sessionIDs to primary keys and other fields as a server-side operation.
  • Implement tokenization, where the database primary keys are indirectly referenced.
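
A resource entitlement check can be as small as the sketch below: the server derives the user from the session and confirms ownership before returning anything. The in-memory tables and names are hypothetical stand-ins for a session store and a database.

```python
SESSIONS = {"sess-abc": "alice"}             # session ID -> user (server side)
ACCOUNT_OWNERS = {101: "alice", 202: "bob"}  # account primary key -> owner


def get_account(session_id, account_id):
    """Return the account only if the session's user is entitled to it."""
    user = SESSIONS.get(session_id)
    if user is None:
        raise PermissionError("not authenticated")
    # The account_id comes from the client, so it is checked on every request.
    if ACCOUNT_OWNERS.get(account_id) != user:
        raise PermissionError("not entitled to this account")
    return {"account_id": account_id, "owner": user}
```
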
What is Cross-site request forgery (CSRF)?

Cross-site request forgery (CSRF) occurs when a malicious website, email, blog, instant message, or program causes a user's web browser to perform an unwanted action on a trusted site where the user is currently authenticated.

These attacks make use of the target system's normal functions, such as transferring funds or changing passwords, through the target user's browser and without the user's knowledge.

Mitigation strategies:

  • Do not rely solely on the presence of a valid sessionID or a cookie.
  • Include a unique, single-use value (token) in every response sent to the browser, and validate that token when a request is submitted.
  • Require users to re-authenticate for high-risk transactions.
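
The single-use token idea might look like the following sketch, which keeps issued tokens in a server-side dict keyed by session ID. The storage and function names are assumptions; web frameworks usually provide this mechanism built in.

```python
import hmac
import secrets

_csrf_tokens = {}  # session ID -> token most recently issued to that session


def issue_csrf_token(session_id):
    """Create an unpredictable token to embed in the response (e.g. a form)."""
    token = secrets.token_urlsafe(32)
    _csrf_tokens[session_id] = token
    return token


def validate_csrf_token(session_id, submitted):
    """Accept the submitted token once; later reuse of the same token fails."""
    expected = _csrf_tokens.pop(session_id, None)  # pop makes it single-use
    if expected is None or submitted is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```
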
What is SACM (Service Asset and Configuration management)?

SACM is a primary information technology-business process that is foundational and required to mitigate system vulnerabilities and risk of cyberattacks against any organization.

It is a collection of processes that achieve operational control, systematic onboarding, validation, updates, maintenance, and disposal of technology assets as well as management of configuration items.

What is digital accessibility?

Digital accessibility is about making digital products and services accessible to those with disabilities. A website, application or document is accessible when a person with diverse abilities can use it to perform the task or access the service for which it is intended without reliance on the assistance of other people.
