Software Architecture and Designing Large-Scale Systems

5 min read · Dec 10, 2023

What is Software Architecture?

Software architecture is a high-level description of a system’s structure that also defines the interconnection and behavior of the sub-components that make up the system, in order to fulfill its functional requirements. It also abstracts away the complex implementation details behind those components.

Are programming languages part of the Software Architecture Design?

No, programming languages are part of the implementation.

Advantages of designing and implementing a Software Architecture

Handle, process and store large amounts of data
Serve thousands of users every day

Examples

Social media
Video-on-demand
Online video games
Banking systems

The first step in designing a Software Architecture is to gather the requirements

Features of the system

Functional Requirements: “Describe what the system must do”. These requirements do not determine the system architecture.

Quality Attributes

Non-Functional Requirements: Scalability, High Availability, Reliability, Security, Performance. These properties dictate the system architecture.

System Constraints

Limitations and boundaries: strict deadlines, financial limitations (limited budget)

Examples of Requirements

Functional Requirements

“When a user clicks on a button, a modal will be loaded displaying all the available products”

Quality Attributes

“When a user clicks on a button, a modal will be loaded displaying all the available products within at most 100 milliseconds”

Important!

No single architecture can provide all the quality attributes. Some attributes contradict each other, so there will always be trade-offs.

Take, for example, a login page that handles a username and password. Suppose the quality attribute is that login completes in less than 1 second. Login pages should always use SSL/TLS encryption, which adds handshake and encryption overhead and makes the login slower. So we trade some performance for security.

Unrealistic Quality Attributes

Extremely low-latency in areas where bandwidth is low
100% Availability, System can never fail
Full protection against hackers

System Constraints

System constraints are essentially decisions that were made fully or partially for us, restricting our degrees of freedom.

They are considered pillars of software design
They provide a starting point: the system is designed around them

Types of System Constraints

Technical Constraints

Example: Being locked to a particular hardware vendor

Business Constraints

Example: Strict deadlines or limited budget

Regulatory Constraints

Example: GDPR (General Data Protection Regulation) rules on collecting, storing and sharing users’ data

Constraints should not be accepted at face value. Internal constraints in particular should be negotiated, whereas external constraints, such as legal obligations and regulations, are not negotiable.

Example: If we are limited to a particular database vendor, we should make sure our design is not tightly coupled to that technology. Switching to a different technology in the future should require minimal changes.

In general, different components of the system should be loosely coupled so they can be replaced easily in the future.
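One common way to keep a design loosely coupled to a database vendor is to make the application code depend on a small interface rather than on the vendor’s client library. The sketch below illustrates the idea under that assumption; the names UserStore, InMemoryUserStore and register_user are hypothetical and do not come from any particular library.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """Hypothetical storage interface that the application code depends on."""

    def save(self, user_id: str, name: str) -> None: ...

    def find(self, user_id: str) -> Optional[str]: ...


class InMemoryUserStore:
    """Stand-in implementation; a vendor-specific class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name

    def find(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


def register_user(store: UserStore, user_id: str, name: str) -> bool:
    """Business logic sees only the interface, never the vendor."""
    if store.find(user_id) is not None:
        return False  # the user already exists
    store.save(user_id, name)
    return True


store = InMemoryUserStore()
print(register_user(store, "u1", "Alice"))  # True
print(register_user(store, "u1", "Alice"))  # False
```

Replacing the database then means writing one new class with the same two methods, while register_user stays unchanged.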

Most Important Quality Attributes

Performance

Time between a client’s request and receiving a response

Response Time = Processing Time + Waiting Time

Scalability

Vertical Scalability: Replacing the existing hardware/server with a newer, better one, or upgrading the existing hardware’s CPU, RAM, etc.

Pros

No code changes needed
All applications can benefit from it
Migration to different machines is easy

Cons

We are locked into a centralized system, which cannot provide fault tolerance or high availability

Horizontal Scalability: Adding more hardware/servers to process client requests

Pros

Virtually no limit on scalability: capacity grows with the number of nodes added
Easy to add and remove machines
Fault tolerance and High Availability

Cons

Code changes may be required
Increased complexity

Availability

The fraction of time/probability a system is operationally functional and accessible to the user.

Uptime = The time the system is operational and accessible to the user

Downtime = The time the system is unavailable to the user

Availability = Uptime / (Uptime + Downtime)
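As a worked example of the formula above, the short sketch below computes availability from uptime and downtime, and also turns an availability target into a yearly downtime budget. The function names and the 8,760-hour year (leap years ignored) are assumptions made for illustration.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


def allowed_downtime_hours(target: float, period_hours: float) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - target) * period_hours


HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# A system that was down for 8.76 hours over a year:
print(f"{availability(8751.24, 8.76):.3%}")

# How much downtime a 99.9% target allows per year:
print(f"{allowed_downtime_hours(0.999, HOURS_PER_YEAR):.2f} hours/year")
```

Under these assumptions, a 99.9% target leaves a budget of roughly 8.76 hours of downtime per year.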

Architectural building blocks of large-scale systems

Load Balancing

Distributing a set of requests over a set of resources. Usually, a load balancer sits between the users and the server group.
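Round-robin is one of the simplest distribution strategies a load balancer can use: each incoming request goes to the next server in the list, wrapping around at the end. The sketch below assumes that strategy; the class name RoundRobinBalancer and the server names are hypothetical, and real load balancers typically add health checks and weighting.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers: Iterator[str] = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
for request_id in range(5):
    print(request_id, "->", balancer.pick())
```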

Message Brokers

Used for event sourcing and asynchronous requests, and to promote loose coupling between services. By buffering messages, a broker also absorbs bursts of traffic that would otherwise overwhelm a service.
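The buffering role of a broker can be illustrated with an in-process queue: the producer enqueues a burst of events and returns immediately, while a consumer drains them at its own pace. This is only a stand-in sketch using Python’s standard queue module; a real broker is a separate networked service, and the function name run_demo is hypothetical.

```python
import queue
import threading
from typing import List


def run_demo(burst_size: int = 100) -> int:
    """Push a burst of events through a queue and return how many were processed."""
    broker = queue.Queue()  # in-process stand-in for a message broker
    processed: List[int] = []

    def consumer() -> None:
        while True:
            event = broker.get()
            if event is None:  # sentinel: no more events will arrive
                break
            processed.append(event)  # a real service would do work here

    worker = threading.Thread(target=consumer)
    worker.start()

    for event_id in range(burst_size):
        broker.put(event_id)  # the producer does not wait for processing
    broker.put(None)

    worker.join()
    return len(processed)


print(run_demo(100))
```

Because the producer only talks to the queue, the consumer can be slowed down, restarted or replaced without the producer noticing, which is the loose coupling described above.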

API Gateway

Follows an architecture pattern called API composition. Abstracts the API implementation from the consumer application.

Advantages of using an API Gateway

Seamless internal modifications
Consolidate security, authentication and authorization all in one place
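A minimal sketch of the two advantages above: the gateway performs the authentication check once, in one place, and then routes the request to an internal service by path prefix, so consumers never see how the services are organized. The service functions, the ROUTES table and the token set are hypothetical placeholders, not a real gateway product’s API.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], str]


def posts_service(path: str) -> str:
    return f"posts-service handled {path}"


def users_service(path: str) -> str:
    return f"users-service handled {path}"


# Hypothetical routing table: path prefix -> internal service.
ROUTES: Dict[str, Handler] = {"/posts": posts_service, "/users": users_service}

# Hypothetical token store; a real gateway would validate tokens properly.
VALID_TOKENS = {"secret-token"}


def gateway(path: str, token: str) -> Tuple[int, str]:
    """Authenticate once, then delegate to the matching internal service."""
    if token not in VALID_TOKENS:
        return 401, "unauthorized"
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return 200, handler(path)
    return 404, "no route"


print(gateway("/posts/42", "secret-token"))
print(gateway("/posts/42", "wrong-token"))
```

Moving a route to a different internal service only changes the ROUTES table; the consumer keeps calling the same path.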

Content Delivery Network (CDN)

A network of interconnected servers that caches content close to the user. It allows quick transfer of content such as JavaScript files, HTML pages, etc.

Case Scenario

Let’s design a social media platform where users can post messages, comment and react to other users’ posts. This system needs to scale up to serve millions of users every day.

Functional Requirements

Users must have an account to post or view other users’ posts
Posts can contain text, images, links or videos
Users can comment on other users’ posts, and delete or update their comments
Users can add a reaction to other users’ posts and comments, and delete or update the reaction
Posts and Comments are ordered chronologically by datetime
Posts and Comments could be also ordered by number of reactions and popularity

What are the Non-Functional Requirements of this system?

Scalability — millions of users
Performance — < 400ms response time
Fault Tolerance/Availability — 99.9%
Availability / Partition Tolerance — Availability over Consistency

Let’s design the REST APIs

Users

GET /users

GET /users/{userid}

POST /users/signup

POST /users/login

Posts

GET /posts?limit=20&offset=0 [use API pagination]

GET /posts/{postid}

POST /posts/create

PUT /posts/{postid}/update

DELETE /posts/{postid}

Comments

GET /posts/{postid}/comments?limit=20&offset=0 [use API pagination]

GET /posts/{postid}/comments/{commentid}

POST /posts/{postid}/comments/create

DELETE /posts/{postid}/comments/{commentid}

Reactions

GET /posts/{postid}/reactions

GET /posts/{postid}/comments/{commentid}/reactions/

POST /posts/{postid}/reactions/create

POST /posts/{postid}/comments/{commentid}/reactions/create

PUT /posts/{postid}/reactions/{reactionid}

PUT /posts/{postid}/comments/{commentid}/reactions/{reactionid}

DELETE /posts/{postid}/reactions/{reactionid}

DELETE /posts/{postid}/comments/{commentid}/reactions/{reactionid}
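The limit and offset query parameters on the list endpoints above can be sketched as a slice over an ordered collection. The in-memory POSTS list, the list_posts function and the max_limit cap are assumptions made for illustration; a real service would apply the same limit and offset in its database query.

```python
from typing import Dict, List

# Hypothetical in-memory data standing in for the posts database.
POSTS: List[Dict[str, object]] = [
    {"id": i, "text": f"post {i}"} for i in range(1, 51)
]


def list_posts(limit: int = 20, offset: int = 0, max_limit: int = 100) -> Dict[str, object]:
    """Return one page of posts plus the offset of the next page, if any."""
    limit = max(1, min(limit, max_limit))  # clamp so a client cannot request everything
    offset = max(0, offset)
    page = POSTS[offset:offset + limit]
    has_more = offset + limit < len(POSTS)
    return {"items": page, "next_offset": offset + limit if has_more else None}


first = list_posts(limit=20, offset=0)
print(len(first["items"]), first["next_offset"])

last = list_posts(limit=20, offset=40)
print(len(last["items"]), last["next_offset"])
```

Returning next_offset lets the client request the following page without computing it, and a value of None signals the final page.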

The API gateway delegates API calls to the appropriate micro-service. In addition, to avoid processing duplicate requests, we introduce an API cache that stores responses and serves them when needed.
The backend is split into multiple micro-services. Each micro-service is replicated and sits behind a load balancer, so it can handle many concurrent requests and tolerate failures.
Each micro-service is connected to its own database. The Users service uses a SQL relational database; the rest of the micro-services are assigned a NoSQL document database, to store massive amounts of data and process it quickly.
The Posts service database is sharded in order to handle a massive number of posts from millions of users simultaneously. Sharding here means splitting the database by primary-key range. For example, shard 1 holds records 0–10000, shard 2 holds 10001–20000 and shard 3 holds 20001–30000.
A message broker is placed between the API gateway and the Reactions service, in order to buffer millions of reactions from millions of users and relieve pressure on the service.
The Ranking service extracts analytics from the Reactions and Posts services, using batch processing for big data.
A distributed CDN with cloud storage is also used to store static content.
The Web App service is used to store the images uploaded with users’ posts and comments.
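The range-based sharding described for the Posts database can be sketched as a lookup from primary key to shard number. The sketch below uses the example ranges given above (0–10000, 10001–20000, 20001–30000); the names SHARD_UPPER_BOUNDS and shard_for are hypothetical.

```python
from bisect import bisect_left
from typing import List

# Inclusive upper bound of each shard's key range, taken from the example ranges.
SHARD_UPPER_BOUNDS: List[int] = [10_000, 20_000, 30_000]


def shard_for(primary_key: int) -> int:
    """Return the 1-based shard number whose range contains the key."""
    if primary_key < 0 or primary_key > SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"key {primary_key} is outside every shard range")
    # bisect_left finds the first upper bound that is >= the key.
    return bisect_left(SHARD_UPPER_BOUNDS, primary_key) + 1


for key in (0, 10_000, 10_001, 25_000):
    print(key, "-> shard", shard_for(key))
```

Range sharding keeps neighbouring keys together, which suits range queries, but if new posts always receive the highest keys, the last shard takes most of the writes; hash-based sharding is a common alternative when that becomes a problem.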

Summary

Did we satisfy the non-functional requirements?

Let’s summarize the points added to the architecture diagram

Scalability — millions of users

Load Balancing

Database sharding

API gateway

Micro-services replication

Performance — < 400ms response time

CDN cloud storage

API gateway Cache

Message broker