Software Architecture and Designing Large-Scale Systems
What is Software Architecture?
Software architecture is a high-level description of a system's structure. It also defines how the sub-components that make up the system are interconnected and how they behave in order to fulfill its functional requirements. Moreover, it abstracts away the complex implementation details behind them.
Are programming languages part of the Software Architecture Design?
No, programming languages are part of the implementation.
Advantages of designing and implementing Software Architecture
- Handle, process and store large amounts of data
- Thousands of users can be served every day
Examples
- Social media
- Video-on-demand
- Online video games
- Banking systems
The first step in designing a Software Architecture is to gather the Requirements
Features of the system
- Functional Requirements : “Describe what the system must do”. These requirements do not determine the system architecture
Quality Attributes
- Non-Functional Requirements : Scalability, High-Availability, Reliability, Security, Performance. These properties dictate the system architecture
System Constraints
- Limitations and boundaries: strict deadlines, financial limitations (limited budget)
Examples of Requirements
- Functional Requirements
“When a user clicks on a button, a modal will be loaded displaying all the available products”
- Quality Attributes
“When a user clicks on a button, a modal will be loaded displaying all the available products within at most 100 milliseconds”
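A quality attribute like the one above is only useful if it can be measured. The following is a minimal sketch of checking a latency budget, assuming a hypothetical `load_products` function standing in for the real product lookup. The 100 ms figure comes from the example; everything else is illustrative.

```python
import time

LATENCY_BUDGET_MS = 100  # taken from the quality attribute above


def load_products():
    # Hypothetical stand-in for the real product lookup.
    return ["keyboard", "mouse", "monitor"]


def timed_call(func):
    """Run func once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


products, elapsed_ms = timed_call(load_products)
within_budget = elapsed_ms <= LATENCY_BUDGET_MS
print(f"loaded {len(products)} products in {elapsed_ms:.3f} ms, within budget: {within_budget}")
```

A single measurement like this says little about production behavior. In practice the budget would usually be checked against a percentile over many requests.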
Important!
No single architecture can provide all the quality attributes. Some attributes contradict each other, so there will always be trade-offs.
Take, for example, a login page that handles a username and password. Suppose the quality attribute is to log in within 1 second. A login page should always use SSL/TLS encryption, which adds handshake and encryption overhead and makes the login slower. So we trade some performance for security.
Unrealistic Quality Attributes
- Extremely low latency in areas where bandwidth is low
- 100% availability, meaning the system can never fail
- Full protection against hackers
System Constraints
System constraints are essentially decisions that were fully or partially made for us, restricting our degrees of freedom.
- These are considered pillars of software design
- They provide a starting point, and the system is designed around them
Types of System Constraints
- Technical Constraints
Example: Being locked to a particular hardware vendor
- Business Constraints
Example: Strict deadlines or limited budget
- Regulatory Constraints
Example: GDPR (General Data Protection Regulation) rules on collecting, storing and sharing users' data
Constraints should not be taken lightly, nor accepted without question. Internal constraints in particular should be negotiated. External constraints, such as legal obligations and regulations, are not negotiable.
Example: If we are limited to a particular database vendor, we should make sure our design is not tightly coupled to that technology. Switching to a different technology in the future should require minimal changes.
In general, different components of the system should be loosely coupled so they can be replaced easily in the future.
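One common way to keep a design loosely coupled to a database vendor is to hide storage behind a small interface. The sketch below assumes a hypothetical `UserStore` interface and an in-memory implementation. Neither name comes from a real library, and a vendor-specific class would implement the same two methods.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """The only storage interface the rest of the system depends on."""

    def save(self, user_id: str, name: str) -> None: ...

    def find(self, user_id: str) -> Optional[str]: ...


class InMemoryUserStore:
    """Hypothetical stand-in for a vendor-specific implementation."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name

    def find(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


def register_user(store: UserStore, user_id: str, name: str) -> str:
    # Business logic only knows about the UserStore interface.
    store.save(user_id, name)
    return f"registered {name}"


store = InMemoryUserStore()
print(register_user(store, "u1", "Alice"))  # registered Alice
print(store.find("u1"))  # Alice
```

Replacing the database then means writing one new class with the same methods, while `register_user` stays unchanged.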
Most Important Quality Attributes
Performance
Time between a client’s request and receiving a response
Response Time = Processing Time + Waiting Time
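As a small worked example of the formula above, with made-up numbers: a request that spends 30 ms being processed and 70 ms waiting (for example in the network or in queues) has a response time of 100 ms.

```python
def response_time_ms(processing_ms: float, waiting_ms: float) -> float:
    """Response Time = Processing Time + Waiting Time."""
    return processing_ms + waiting_ms


# Illustrative numbers only, not measurements.
total = response_time_ms(processing_ms=30, waiting_ms=70)
print(total)  # 100
```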
Scalability
Vertical Scalability: Replacing the existing hardware/server with a newer, better one, or upgrading the existing hardware's CPU, RAM, etc.
Pros
- No code changes needed
- All applications can benefit from it
- Migration to different machines is easy
Cons
- We are locked to a centralized system, which cannot provide Fault tolerance and High Availability
Horizontal Scalability: Adding more hardware/servers to process client requests
Pros
- Virtually no limit on the number of nodes that can be added
- Easy to add or remove machines
- Fault tolerance and High Availability
Cons
- Code changes may be required
- Increased complexity
Availability
The fraction of time/probability a system is operationally functional and accessible to the user.
Uptime = The time the system is operational and accessible to the user
Downtime = The time the system is unavailable to the user
Availability = Uptime / (Uptime + Downtime)
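To make the formula concrete, here is a short sketch that computes availability from uptime and downtime, and also turns an availability target into a yearly downtime budget. The function names and the sample numbers are illustrative assumptions.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


def allowed_downtime_hours(target: float, period_hours: float) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1 - target) * period_hours


HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

print(availability(uptime_hours=999, downtime_hours=1))  # 0.999
print(round(allowed_downtime_hours(0.999, HOURS_PER_YEAR), 2))  # 8.76
```

So a 99.9% target leaves roughly 8.76 hours of downtime per year.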
Architectural building blocks of large-scale systems
Load Balancing
Distributing a set of requests over a set of resources. Usually, a load balancer sits between the user and the server group.
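There are many balancing strategies. Round robin is one of the simplest, and the sketch below uses it only as an illustration. The server names are placeholders, and a real load balancer would also track server health and load.

```python
from itertools import cycle
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers = cycle(list(servers))

    def pick(self) -> str:
        # Each call returns the next server in the rotation.
        return next(self._servers)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments: List[str] = [balancer.pick() for _ in range(5)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b']
```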
Message Brokers
Used for event sourcing and asynchronous requests, and to promote loose coupling between services. By buffering messages, a broker also relieves the backpressure that would otherwise build up on a service.
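The buffering idea can be sketched with Python's standard `queue.Queue` acting as a stand-in for a real broker. This is only an in-process illustration: the producer hands off events and moves on, while the consumer drains them at its own pace. The event names are made up.

```python
import queue
import threading

broker = queue.Queue()  # in-process stand-in for a real message broker
processed = []


def consumer() -> None:
    while True:
        event = broker.get()
        if event is None:  # sentinel value: no more events
            broker.task_done()
            break
        processed.append(event.upper())  # pretend this is slow work
        broker.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# The producer does not wait for each event to be processed.
for event in ["like", "love", "laugh"]:
    broker.put(event)
broker.put(None)

broker.join()   # wait until every queued item has been handled
worker.join()
print(processed)  # ['LIKE', 'LOVE', 'LAUGH']
```

The producer and consumer never call each other directly, which is the loose coupling the broker is meant to provide.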
API Gateway
Follows an architecture pattern called API composition. Abstracts the API implementation from the consumer application.
Advantages of using an API Gateway
- Seamless internal modifications
- Consolidate security, authentication and authorization all in one place
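A toy version of these two advantages is sketched below: one function checks credentials in a single place and then forwards the call to whichever internal service owns the path. The service functions, the route table and the token set are all hypothetical. A real gateway would verify signed tokens and forward requests over the network.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], str]


def posts_service(path: str) -> str:
    return f"posts-service handled {path}"


def users_service(path: str) -> str:
    return f"users-service handled {path}"


# Internal routing can change without the consumer noticing.
ROUTES: Dict[str, Handler] = {"/posts": posts_service, "/users": users_service}
VALID_TOKENS = {"secret-token"}  # hypothetical credential store


def gateway(path: str, token: str) -> Tuple[int, str]:
    # Authentication happens once, here, for every service.
    if token not in VALID_TOKENS:
        return 401, "unauthorized"
    for prefix, handler in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return 200, handler(path)
    return 404, "no such route"


print(gateway("/posts/42", "secret-token"))  # (200, 'posts-service handled /posts/42')
print(gateway("/posts/42", "wrong"))  # (401, 'unauthorized')
```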
Content Delivery Network (CDN)
A network of interconnected servers that cache content close to the user. Allows quick transfer of content such as JavaScript files, HTML pages, etc.
Case Scenario
Let's design a social media platform where users can post messages, and comment on and react to other users' posts. This system needs to scale up to serve millions of users every day.
Functional Requirements
- Users must have an account to post or view other posts
- Posts can contain text, images, links or videos
- Users can comment on other users' posts, and delete or update their comments
- Users can add a reaction to other users' comments, and delete or update the reaction
- Posts and Comments are ordered chronologically by datetime
- Posts and Comments can also be ordered by number of reactions and popularity
What are the Non-Functional Requirements of this system?
- Scalability — millions of users
- Performance — < 400ms response time
- Fault Tolerance/Availability — 99.9%
- Availability / Partition Tolerance — Availability over Consistency
Let's design the REST APIs
- Users
GET /users
GET /users/{userid}
POST /users/signup
POST /users/login
- Posts
GET /posts?limit=20&offset=0 [use API pagination]
GET /posts/{postid}
POST /posts/create
PUT /posts/{postid}/update
DELETE /posts/{postid}
- Comments
GET /posts/{postid}/comments?limit=20&offset=0 [use API pagination]
GET /posts/{postid}/comments/{commentid}
POST /posts/{postid}/comments/create
PUT /posts/{postid}/comments/{commentid}
DELETE /posts/{postid}/comments/{commentid}
- Reactions
GET /posts/{postid}/reactions
GET /posts/{postid}/comments/{commentid}/reactions
POST /posts/{postid}/reactions/create
POST /posts/{postid}/comments/{commentid}/reactions/create
PUT /posts/{postid}/reactions/{reactionid}
PUT /posts/{postid}/comments/{commentid}/reactions/{reactionid}
DELETE /posts/{postid}/reactions/{reactionid}
DELETE /posts/{postid}/comments/{commentid}/reactions/{reactionid}
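The `limit` and `offset` parameters on the list endpoints above can be handled roughly as follows. This is a sketch over an in-memory list of 45 fake posts. The response shape (`items`, `total`, `next_offset`) and the page-size cap of 100 are assumptions, not part of the API definition.

```python
from typing import Any, Dict, List

# Fake data standing in for the Posts database.
POSTS: List[Dict[str, Any]] = [{"id": i, "text": f"post {i}"} for i in range(1, 46)]

MAX_LIMIT = 100  # assumed upper bound on page size


def list_posts(limit: int = 20, offset: int = 0) -> Dict[str, Any]:
    """Return one page of posts for GET /posts?limit=..&offset=.."""
    limit = max(1, min(limit, MAX_LIMIT))  # clamp to a sane page size
    offset = max(0, offset)
    items = POSTS[offset : offset + limit]
    # None signals to the client that there are no further pages.
    next_offset = offset + limit if offset + limit < len(POSTS) else None
    return {"items": items, "total": len(POSTS), "next_offset": next_offset}


page = list_posts(limit=20, offset=40)
print([p["id"] for p in page["items"]])  # [41, 42, 43, 44, 45]
print(page["next_offset"])  # None
```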
- The API gateway delegates the API calls to the appropriate micro-service. In addition, to eliminate duplicate request processing, we introduce an API cache that stores responses and serves them when needed.
- The backend is split into multiple micro-services. Each micro-service is replicated, and load balancing is used so that the replicas can handle multiple requests and remain fault-tolerant.
- Each micro-service is connected to its own database. The Users service communicates with a relational SQL database. The rest of the micro-services are each assigned a NoSQL document database, which can store massive amounts of data and process it quickly.
- The Posts service database is sharded in order to handle a massive number of posts from millions of users simultaneously. Sharding here means splitting the database into ranges of the primary key. For example, shard 1 holds records 0–10000, shard 2 holds 10001–20000 and shard 3 holds 20001–30000.
- A message broker is placed between the API gateway and the Reactions service in order to relieve backpressure and buffer millions of reactions from millions of users.
- The Ranking service extracts analytics from the Reactions and Posts services, making use of batch processing for big data.
- Distributed CDN cloud storage is also used to store static content.
- The Web App service is used to store the images uploaded in conjunction with users' posts and comments.
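The range-based sharding described for the Posts service database can be sketched as a lookup over the upper bound of each shard. The boundaries below are the ones from the example (0–10000, 10001–20000, 20001–30000). The function name and the error handling are illustrative assumptions.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's primary-key range.
SHARD_UPPER_BOUNDS = [10_000, 20_000, 30_000]


def shard_for(record_id: int) -> int:
    """Return the 1-based shard number that holds the given primary key."""
    if record_id < 0 or record_id > SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"no shard covers record id {record_id}")
    # bisect_left finds the first bound that is >= record_id.
    return bisect_left(SHARD_UPPER_BOUNDS, record_id) + 1


print(shard_for(0))      # 1
print(shard_for(10000))  # 1
print(shard_for(10001))  # 2
print(shard_for(25000))  # 3
```

Adding a shard then only requires appending a new upper bound, although range-based schemes can produce uneven load if most new records land in the last range.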
Summary
Did we satisfy the Non-Functional Requirements?
Let's summarize the points added to the Architecture Diagram
- Scalability — millions of users
Load Balancing
Database sharding
API gateway
Micro-services replication
- Performance — < 400ms response time
CDN cloud storage
API gateway Cache
Message broker
Micro-services replication
- Fault Tolerance/Availability — 99.9%
- Availability / Partition Tolerance — Availability over Consistency
Micro-services replication and Database replication