High-performance and high concurrency Web architecture

Bai Yang 2015-05

baiy.cn

 

Typical Web Use Cases

The following figure shows a typical example of high-load Web applications.

The above figure shows a typical high-performance Web application with three-layer architecture. This is a proven architecture that has been widely deployed in many large-scale Web applications including Google, Yahoo, Facebook, Twitter, and Wikipedia.

 

Reverse Proxy

The reverse proxy server, which is in the outside layer of the architecture, accepts connection requests from users. In real use cases, the proxy server will also need to complete at least some the tasks listed below:
  • Connection management: maintains the connection pools on the client side and application server side, manages Keep-Alive connections, and terminates them after time out.
     
  • Attack detection and isolation: all requests associated with business logic will be sent to and processed by the back-end application server, because the reverse proxy service does not handle any dynamic content generation tasks. Thus, the reverse proxy service will almost not be affected by program or back-end service vulnerabilities. The reliability and security of reverse proxy service only depends on the product itself. Deploying a reverse proxy server at the front-end of the application server can effectively set up a reliable isolation and attack detection mechanism between the back-end applications and remote users.

    When higher security is needed, users can add additional network isolation device like hardware firewall at boundary positions of external network, reverse proxy, back-end applications and database.
     
  • Load balance: use Round Robin or the "Least Connections First" service policy to achieve load balance based on user requests, or utilize SSI technology to divide a user request into several parallel parts and submit them to several application servers separately.
     
  • Distributed cache acceleration: Deploy reverse proxy servers in groups at network boundaries that are geographically close to hot areas, and accelerate network applications by providing cache service at locations close to clients. This has established a CDN network.
     
  • Static file server: when a static file request is received, the server directly returns the file without submitting the request to the back-end application server.
     
  • Dynamic response cache: caches the dynamically generated responses that will not change for a period, to prevent the background server from frequently executing repeated query and calculation.
     
  • Data compression: enables GZIP/ZLIB compression algorithms for returned data in order to save bandwidth.
     
  • Data encryption (SSL Offloading): enables SSL/TLS encryption for communications with clients.
     
  • Fault detection and Fault tolerance: tracks the health status of back-end application servers, to avoid sending requests to a faulty server.
     
  • User authentication: completes tasks including user login and session establishment.
     
  • URL alias: establishes a uniform URL alias in order to hide the real location.
     
  • Applications mixture: mixes different Web applications together using SSI and URL mapping technology
     
  • Protocol conversion: provides protocol conversion service for back-end applications that use protocols like SCGI and FastCGI.

The popular reverse proxy services include Apache httpd+mod_proxy, IIS+ARR, Squid, Apache Traffic Server, Nginx, Cherokee, Lighttpd, HAProxy, Varnish, and etc.

 

Application Service

The application service layer is located between the back-end service layer (e.g., database) and the reverse proxy layer. It receives connection requests forwarded by the reverse proxy, and downwards accesses structured storage and data query services provided by the database.

This layer has implemented all business logic associated with Web applications, and usually needs to complete a lot of calculation and dynamic data generation tasks. The nodes within the application layer may not be fully equivalent, and may be separated into different service clusters with SOA or μSOA architecture. Working in combination with the asynchronous Web framework provided by libutilitis, it is realistic to use C/C++ to implement Web applications that leave its rivals far behind in terms of functionality and effectiveness.

Above figure shows a typical working model with high concurrency and high performance. Each Web application node (represented by boxes labelled as "App" in above figure) usually works on its own server (physical server or VPS), and several nodes can work in parallel in order to easily achieve horizontal scaling (scale-out).

In the above example, a Web application node comprises three key parts: I/O callback threads pool, Web requests queue, and back-end worker threads pool. The workflow is as follows:

  1. When a Web request arrives, the operating system informs AIO callback thread to process this arrived Web request, through the I/O completion (or I/O ready) callback mechanisms which are closed related to the platform such as IOCP, epoll, kqueue, event ports, real time signal (posix aio), /dev/poll, and pollset.
     
  2. When a worker thread in the AIO callback pool receives an arrived Web request, it attempts to pre-process the request. During pre-processing, local high-speed cache will be used to avoid data query which requires relatively higher cost. If local cache is matched, it will directly return the result (still using asynchronous method) to the client and will complete this request.
     
  3. If the queried data is not matched in local cache, or the Web request needs writing to the database, the AIO callback thread will put this request into the specified queue. The request will wait for an idle thread in the worker threads pool to further process it.
     
  4. Each thread in the back-end worker threads pool maintains two Keep-Alive connections: one is connected to the bottom layer database service, and the other is connected to the distributed caching (memcached) system. The worker threads pool has implemented a connection pool mechanism for both the database and distributed cache, through the method that each worker thread maintains its own Keep-Alive connections. Keep-Alive connection has substantially improved application processing efficiency and network utilization by repeated use of a single network connection for different requests.
     
  5. Back-end worker threads wait for new requests to arrive in the Web requests queue. Once getting a new request from the queue, the thread will first attempt to match the data being queried by the request with distributed cache, if there is no match or this request needs further processing such as database writing, this Web will be directly completed through database operations.
     
  6. After a Web request is fully processed, the worker thread will return the result as a Web response to the specified client using asynchronous I/O method.
     

The above procedures are intended to give you a general understanding about how a typical Web application node works. It is worth noting that different Web applications may have very different working model and architecture because of different design concept and functions.

Note that the edge-triggered AIO event notification mechanisms like Windows IOCP and POSIX AIO Realtime Signal are different with level-triggered notification mechanisms like epoll, kqueue and event ports. In order to prevent the I/O completed events queue from being too long or overflow, causing the memory buffer being locked in the nonpaged pool for a long time, the above mentioned AIO callback mechanism is composed of two separate thread pools and one AIO completed events queue. One thread pool is responsible for continuously listening for events arrived at the AIO completed events queue, and then submit the events to an internal AIO completed events queue (this queue works under user mode and will never lock memory; the queue length is user-customizable.); and simultaneously, the other thread pool is waiting on this internal AIO queue, and processes AIO completed events that arrives at the queue. This type of design can reduce workload for the operating system, and can avoid message loss, memory leak and memory exhaustion that may occur in extreme situations. Also, it can help the operating system to better manage and utilize its nonpaged pool.

As a typical use case, most of Google Web applications like search engine and Gmail are implemented using C/C++. Thanks to the high efficiency and powerfulness of C/C++ languages, Google provides global Internet users with the best Web experience, and also has achieved completing a Web search among its millions of distributed servers around the world at total consumption of 0.0003 kW·h only. For further discussion on Google Web application architecture and hardware scaling, refer to http://en.wikipedia.org/wiki/Google and http://en.wikipedia.org/wiki/Google_search.

 

Database and memcached Services

Database service offers relational or structured data storage and query service for upper layer Web applications. Depending on specific use case, Web applications can provide access to different database services using plugin mechanisms like database connector. Under this architecture, users can flexibly choose or change to a database product which is most suitable for their needs. For example, users can use embedded engine like SQLite for quick deployment and functions verification at POC stage, and can switch to MySQL database solution which is cheaper at the preliminary stage. And when business needs increase and database workload becomes heavy, users can migrate to a more expensive and complex solution such as Clustrix, MongoDB, Cassandra, MySQL Cluster and Oracle.

Memcached is a distributed data objects caching service fully based on memory and <Key, Value> pair. It offers unbelievable performance and has a large distributed architecture which eliminates the need for inter-server communication. For high-load Web applications, memcached is an important service often used to speed up database access. It is not a mandatory component, so users can wait to deploy it till the time when performance bottleneck shows up in their database service. It is worth noting that though memcached is not a mandatory component, its deployments in large-scale Web applications (e.g., YouTube, Wikipedia, Amazon.com, SourceForge, Facebook, and Twitter) has proved that memcached not only can keep performing stably under high-load environments, but also can dramatically improve the overall performance of data query. For further discussion on memcached, refer to http://en.wikipedia.org/wiki/Memcached.

However, we should note that distributed caching systems like memcached are intrinsically a compromise solution that improves the average access performance at the cost of consistency. Caching service adds distributed replicas of some records in database. For multiple distributed replicas of the same piece of data, it is impossible to guarantee the strong consistency unless we employ consensus algorithms like Paxos and Raft.

Contradictorily, memory cache itself is meant to improve performance. Thus it is unrealistic to employ the above mentioned expensive consensus algorithms. These algorithms require each access request to simultaneously access the majority replica including master and slave nodes in the background database. Obviously, this will make performance even lower than not using caching service.

Furthermore, the consensus algorithms like Paxos and Raft can only guarantee strong consistency at single record level. That means there is no guarantee for transaction-level consistency.

Distributed caching will add complexity to the program design and will increase access delay in unfavoured circumstances such as RTT delay upon unmatched, delay upon node offline or network communication issues.

Since 20 years ago, the mainstream database products have implemented proven multi-layer (e.g., disk block, data page and query result set) caching mechanism with high match rate. Now that distributed caching mechanisms have so many drawbacks while database products have excellent built-in caching mechanisms, why the former have become an important foundation for modern high-load Web App?

The intrinsic reason is, in the technology environment ten years ago, the RDBMS (SQL) system with poor scale-out capability had become the bottleneck for network applications like Web App to expand. Thus, NoSQL database products represented by Google BigTable, Facebook Cassandra, MongoDB and SequoiaDB, and distributed caching systems represented by memcached and redis emerged in succession, all playing an important role.

Compared with "traditional" SQL database products like MySQL, ORACLE, DB2, MS SQL Sever, and PostgreSQL, both NoSQL database and distributed caching systems has sacrificed strong consistency to get higher scale-out capability.

This kind of sacrifice was a painful choice under the technology conditions at that time. Systems have become complex: traditional RDBMS is used for places where ACID transaction and strong consistency are required and data volume is small; distributed caching systems are preferred for places where there is "more read and less write" but there is still some room for compromising consistency; NoSQL is used for big data with even lower requirement for consistency; if the data volume is large and there is strict requirement for consistency, sharding of RDBMS could be a solution, which requires various middleware to be developed for implementing complex operations such as request distribution and result set merging for the underlayer databases. There are many different cases which are mingled together making the systems even more complex.

In retrospect, that is an age when old rules were broken but new rules were still not established yet. The old RDBMS is poor in scale-out capability so it cannot satisfy the emerging requirements for big data processing. However, there was not a structured data management solution that can replace the old systems and can satisfy most of user requirements.

That is an age when requirements were not satisfied. Products like BigTable, Cassandra, and memcached are self-rescue results made by Google, Facebook and LiveJournal respectively. There is no doubt these products aimed at "satisfying business requirements at the lowest cost" are poor with generality.

In 2015, finally we are moving out of the predicament. As many of NewSQL solutions (e.g., Google F1, MySQL Cluster (NDB), Clustrix, VoltDB, MemSQL, NuoDB and MyCat) are getting mature and the technology is improving, horizontal scaling capability is no longer a bottleneck for RDBMS. Nowadays architectures can guarantee enough horizontal scaling capability for the system, and simultaneously can achieve strong consistency for distributed transactions (XA).

As shown in the above figure, there is no longer a keen need for distributed caching systems or NoSQL products after NewSQL is equipped with good scale-out capabilities. This has made design and development of the architecture back to simplicity and clarity. Object Storage service offers the support for storing and accessing unstructured BLOB data like audios, videos, graphics and files.

This kind of simple, clear and plain architecture makes everything seemly reverted back to years ago. Object Storage service looks like disk file systems such as FAT, NTFS and Ext3, and NewSQL service looks like the old single-machine database such as MySQL and SQL Server. However, everything is different. Business logic, database and file storage have evolved to be high-performance and high-availability clusters that support scale-out capabilities. Performance, capacity, reliability and flexibility have grown with leaps and bounds. Human beings have always evolved in a spiralling course. Every change that looks like a return represents intrinsic development.

As the distributed file systems (e.g., GlusterFS, Ceph and Lustre) that are mountable and support Native File API are becoming more mature and complete, it is expected to replace existing object storage services for most use cases in a phased manner. This is a major milestone in the evolution of the Web App architecture, of which a real revolution will come when we can implement a high-efficiency and high-availability general Single System Image system. Once such system happens, writing a distributed application will be nothing different from writing a standalone multi-thread application nowadays. It will be nature that processes are distributed and highly available.

 

Scalability of the Three-tier architecture

The three-tier Web application architecture has demonstrated incredible scalability. It can be scaled down for deployment within a single physical server or VPS, and also can be scaled up for deployment in Google's distributed application which comprises millions of physical servers around the world.

Specifically, during project verification and application deployment and at the early stage of service operation, users can deploy the three-layer service component into a single physical server or VPS. Simultaneously, by cancelling the memcached service and by using embedded database products that consume less resource and are easier to deploy, users can further reduce both the difficulty level for deployment and the overall system overhead.

As business expands and system workload keeps increasing, the single-server solution and simple scale-up will no longer be able to satisfy the operation needs. Users can achieve a scale-out solution by distributing components to run on several servers.

For example, a reverse proxy can achieve distributed load balancing by using DNS CNAME records or some layer-3/layer-4 relay mechanisms (such as LVS and HAProxy). It can also use Round Robin or the "Least Load First" strategy to make distributed load balancing for application services. Additionally, a server cluster solution based on shared virtual IP can also implement load balancing and the fault tolerance mechanism.

Similarly, both memcached and database products have their own distributed computing, load balancing and fault tolerance mechanisms. Furthermore, the performance bottleneck of database access can be resolved by changing to NoSQL/NewSQL database product or by using methods such as master-slave replication. Query performance of the traditional SQL database can be dramatically improved by deploying memcached or similar services.

 

Note: This article is excerpted from "3.3.2 Typical Web Use Cases" section in "BaiY Application Platform White Paper"。

Note: This paper describes the overall architecture and how to build a Web App which can permit tens of millions of concurrent connections in a single node. For the distributed system architecture that supported scaling out, anti split brain multiple active IDC, and strong consistency, please refer here.