Scalability Patterns

by JC 10. February 2009 02:56

I have been living in the mission critical, Line of Business (LOB) application world for quite some time, and system performance and scalability is an important aspect of my life.  I have a keen interest in anything performance or scalability related, hence the need to share some of my experiences around this in a blog.  At this point the aim is to blog  about my experiences with some scalability architecture patterns, and to spread this over 2 to 3 blogs.

Scalability is the ability to take full advantage of the resources available in order to match the size of the processing required.   Scalability can be achieved by either “Scaling Up” (adding resources to a node) or by “Scaling Out” (adding more nodes).

“Scaling Up” is generally the easiest to apply, as it normally does not require architectural changes to the application. It also hits the “Scalability Ceiling” quite easily however, as you can only add so much resources to a single node.

“Scaling Out” requires that the application be architected to accommodate this method of scaling, and is normally not something which can be retrofitted to an application. If the correct architecture is used, there is no “Scalability Ceiling”

As a reference system, let’s use a typical N-Tier LOB application (Data Tier, Business Tier, Application Tier, Interface Tier).  This is normally depicted as follows

Tiers

 

In terms of scalability, a prefer to depict the above architecture as the layers of an onion (shown below)

image

At the core you have your Data Tier, then follows your Domain Tier your Application Tier and lastly the Interface Tier.

The typical situation you are faced with is that your Data Tier cannot be scaled out –The core data is the most valuable part of a business's infrastructure.  It is the heart and soul of the organization, containing information that can not be compromised under any circumstances. It can also not be duplicated or distributed easily without creating additional costs and complexity.  Due to the fact that it is a centralized resource, it is normally the primary point of bottlenecks. When it needs to scale, “Scaling Up” is used. As mentioned earlier however, this has a scalability ceiling. 

The Domain Tier is normally the primary means of scaling the solution, and at this Tier your aim should be to “Scale Out”.  Your Application and Domain tier actually should form a Scalability Partnership, where the combination allows you to scale on the Domain and Application Tier.

So, ideally you will end up with a single database server/cluster serving 10 business servers, which in turn services hundreds of nodes on the application tier, which in turn services thousands of clients on the interface tier. (all numbers are a thumb-suck, except for the 1 db server ;-) )

Data Tier

This is a very expensive resource in terms of scalability costs and should be handled as such.  Review any task on this level with the goal to move it to a Tier where you have the ability to Scale Out. Only in exceptional cases should you consider executing tasks at this level.

So, what can you move out to other Tiers.  Firstly, any calculations or processing on this level are prime candidates for moving to other layers. In addition you can also cache data on other tiers to remove the burden on this Tier.  The following are prime candidates:

* Static Data: Static data can be moved to other database servers.  Synchronisation of these are normally an issue, and I normally try to cache them rather than to do database synchronisation.  The .NET HTTPRuntime cache object has also made my life much easier.  Doing a lookup for a simple field does not place a huge load on the system, but multiply this load by a thousand, and it starts sounding significant

* Regularly Accessed Data: As mentioned above, I have started using the HTTPRuntime cache in .NET  for this purpose – note that you do not need to be in a web application to use this.  Calculations which regularly need transactional information from last months runs can populate this cache on a lazy load fashion.  Care has to be taken however to ensure that it is really “read-only” data you are loading, and that your refresh schedule will ensure that your cache does not go out of synch.

As an example: I have used the HTTP Runtime caching in a .NET project to bring down the load on the database server from 80% utilization to 20% utilization, and in the end it involved about 20 lines of code.  The result was a batch run which could now run in 2 hours instead of 6 (more threads could be added to the batch run)

 

Unit Of Work

“Unit Of Work” cannot easily be retrofitted to an application.  Unit of work requires that you program against the object model, not against the database.  This sounds simple, but you have to take care when implementing this.  When you get it right however, it supplies you with great performance as well as scalability gains.  Here is a link to Martin Fowler’ssite with some more information on Unit of Work.  When I first implemented this, I also found Jimmy’s book quite useful (.NET Enterprise Design).

Unit of work removes the following work from the database server

* PK – FK relationships.  A large number of database roundtrips are needed to setup your unique primary keys.  Imagine you are creating entities in memory with different Primary – Foreign key relationships between them.  Having the ability to create the unique primary keys in memory and setup the relationships without burdening the database server removes an enormous load from the database server.  It also speeds up processing, as all the database roundtrips are removed.

* Lots of roundtrips.  Instead of having multiple small roundtrips, you end up with one big trip to the database.

* Minimise Locking.  The entire entity model can be created in memory, and once the work is done, all SQL statements are sent to the database in one roundtrip.  This cuts down severely on the time it takes to execute: I have worked with examples where a function takes about 10 seconds to execute.  Originally it started the transaction early and a locking scenario of some sorts would exist for about 6 seconds.  This severely hampered the ability to run multiple threads.   After implementing unit of work, a huge batch of SQL statements would arrive at the database server, and 300 ms later they have done their bit and locks are released.  Theoretically the database server could handle roughly 2 threads previously, and now it could handle 33 threads.

 

Downside of Unit of work

When you are working with calculations which are dependant on transactions which are generated by other threads / processes, this might not be the easiest thing to implement, but if you apply your mind, you should be able to make it work. 

 

Tags: , , ,

Architecture

About Me

 

 

 

 

 

 

My name is JC Oberholzer and I have been working in the IT industry since 1990.  I majored in Mathematics and Computer Science and started off working with a variety of technologies, using Cobol, Fortran, C etc. on VMS, Windows, Linux.  In 1996 I joined SDT, a company doing Microsoft based development for the financial industry.  Since then the focus has been mainly on creating N-tier applications for financial services on Microsoft platforms, although I have had some exposure to Java on Linux during this time.  Since 1999 I functioned as the chief system architect of SDT. Areas of Expertise are:

 

  • Product Families
  • Long Lived Software
  • Rules Base LOB Software

 


Hosting provided by:
Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2012 JC Oberholzer