Event Driven Architecture

For the last decade, I have been involved in the design and development of applications in the financial services industry using an architecture which we refer to as an event driven architecture (EDA). However, searching for EDA on the Net shows that the moniker Event Driven Architecture has been taken.

With EDA taken, I started searching for an unambiguous way to describe the business architecture used. And that turned out to be not that easy.

So the basic principles of the architecture (or processing mechanism) are:

    • An action triggers an event on an object.

    • The object checks whether it can currently react to the event

    • The object then chooses how to react to the event. This reaction can be

        ○ a state change

        ○ an action

        ○ ignoring the event


The above contains all the elements of a state transition diagram. With some plagiarism on my side, the simple notation used for state diagrams is pasted below:

State. A condition during the life of an object in which it satisfies some condition, performs some action, or waits for some event.

Event. An occurrence that may trigger a state transition. Event types include an explicit signal from outside the system, an invocation from inside the system, the passage of a designated period of time, or a designated condition becoming true.

Guard. A boolean expression which, if true, enables an event to cause a transition.

Transition. The change of state within an object.

Action. One or more actions taken by an object in response to a state change.

 

One deviation from the state transition diagram, however, is that a large number of events do not actually cause a state transition, but normally do cause one or more actions. The action is thus triggered by the event rather than by a state change.

So it turns out that the architecture we use is actually an “event driven state transition architecture”.

What we add into this is the ability to configure the following on a per event basis:

    • What validation (guards) to perform, if any

    • What state changes to do, if any

    • What actions to perform, if any


And therefore we end up with a “configurable event driven state transition architecture”

Then we also add the capability to execute custom algorithms for

    • Validations (guards)

    • State changes

    • Actions

 

Which makes it, I guess, an “extendable configurable event driven state transition architecture”

Quite a mouthful, so if you will excuse me, I will stick to event driven architecture (EDA) for now.
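To make the layers described above a bit more concrete, here is a minimal sketch in Python. It is not the engine we actually use; the names (Policy, EventConfig, handle, the "activate" and "change_address" events) are all hypothetical and chosen only for illustration. Each event is configured with optional guards, optional state changes and optional actions, and each of those is just a plug-in function, which is where the "extendable" part comes in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical plug-in signatures: both receive the object and the event payload.
Guard = Callable[["Policy", dict], bool]
Action = Callable[["Policy", dict], None]


@dataclass
class EventConfig:
    """Per-event configuration: every part is optional."""
    guards: List[Guard] = field(default_factory=list)
    transitions: Dict[str, str] = field(default_factory=dict)  # from-state -> to-state
    actions: List[Action] = field(default_factory=list)


@dataclass
class Policy:
    """Illustrative business object that reacts to events."""
    state: str = "quoted"
    log: List[str] = field(default_factory=list)


def premium_received(policy: Policy, payload: dict) -> bool:
    # Custom guard: only allow the event when money has arrived.
    return payload.get("amount", 0) > 0


def send_welcome_letter(policy: Policy, payload: dict) -> None:
    policy.log.append("welcome letter sent")


def record_address(policy: Policy, payload: dict) -> None:
    policy.log.append(f"address set to {payload['address']}")


CONFIG: Dict[str, EventConfig] = {
    # Guard + state change + action.
    "activate": EventConfig(
        guards=[premium_received],
        transitions={"quoted": "active"},
        actions=[send_welcome_letter],
    ),
    # No state change at all: the action is triggered by the event itself.
    "change_address": EventConfig(actions=[record_address]),
}


def handle(obj: Policy, event: str, payload: dict,
           config: Dict[str, EventConfig] = CONFIG) -> str:
    cfg = config.get(event)
    if cfg is None:
        return "ignored"                      # unknown event
    if cfg.transitions and obj.state not in cfg.transitions:
        return "ignored"                      # object cannot react in this state
    if not all(guard(obj, payload) for guard in cfg.guards):
        return "rejected"                     # a guard said no
    if cfg.transitions:
        obj.state = cfg.transitions[obj.state]
    for action in cfg.actions:
        action(obj, payload)
    return "handled"
```

In this sketch, adding a new event or changing how an existing one behaves means editing the CONFIG table rather than the handle function, which is the property the long name is trying to capture.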

 

But what makes this so special that I actually want to blog about this?

The answer is easy: an engine or model built with this architecture provides an extremely powerful processing capability. So powerful, in fact, that I plan to do another blog (or two) on how this provides the power. Then I probably also need to do 10 blogs on how to test it ;-), but for next time, I’m going to focus on the power!

Update on Integration and Jealousy

As an update on a previous blog re the “Professional Jealousy” ;-): In April I had the good fortune of attending a workshop with Gregor in lovely Italy. Apart from enjoying Tuscany and Venice, I also asked Gregor the big question: was my experience in this big integration project really different to his, and if so, why?

Well, it turns out that Gregor’s experience was quite different to mine: lots of nice technical stuff to focus on, with a huge number of architectural principles to go and work with. Exactly the reason for my “Jealousy”.

And the reason for this is pretty simple. The project was extremely technical in nature, and not a line of business replacement project such as mine. His clients were technical people; mine were system analysts, business analysts and end users.

As a side note: this project has now gone live, and I have picked up extremely valuable experience, both on the technical side and on the management and softer side of things.

 

Professional Jealousy

I recently re-read parts of Gregor Hohpe’s book on integration patterns, and felt a level of professional jealousy. I have been involved in an enterprise level integration effort for a couple of months now. The project entails replacement of a couple of line of business applications with a single package. Complexities involved in this process include:

•Existing applications are tightly integrated into the existing environment. An example of the level of tight coupling that exists is that some integration relies heavily on point to point integration on the mainframe using Cobol copybook calls. This all needs to be replaced with a loosely coupled Wintel based application

•More than 60 integration points need to be handled.

•Integration interface types include the whole range: Point to point, Message based (synchronous and asynchronous), Online SOAP (synchronous and asynchronous) and good old batch with CSV file exchanges

•Different middleware is used for batch, online and message based integration

•Due to the sheer volume of work, a phased approach is required. Sounds great, but what this really means is that we need to gradually phase out the tight integrations in favour of the loosely coupled integrations, with periods where both interfaces will be active

So why the professional jealousy towards the work of Gregor?

His book, with the impressive set of integration patterns, creates the impression that Gregor had many more technical challenges and far fewer business and process related challenges. At this point, my enterprise level integration project is definitely 80% about sorting out business process, project management and contingency related issues rather than the stuff techies dream about. So I thought I would pop Gregor a mail and see what his opinion on this is: is it all technical fun and games, or is my experience what you should expect with a large scale integration project? I will report back on that at a later stage.

Despite the jealousy over technical complexity, I have to admit to enjoying this project. Here is why:

•This is a really big site, with a commitment to slowly migrate into the SOA world. It is not text book SOA (yet), as the real world and “business as usual” needs require a pragmatic approach

•Enterprise Architecture and the TOGAF model of managing enterprise architecture came in really useful in understanding the real “Big Picture” and managing each one of the integration points in the larger context

•Although I’m joking a bit re the technical component of the project, in truth there are more than enough technical complexities to this project.

So I’m learning a lot and enjoying it. In a nutshell, the big lessons I have learned:

•Doing an integration project without understanding the architecture of the enterprise is doomed to fail. A framework such as TOGAF’s is a good tool to model and understand the intricacies of introducing change into the enterprise architecture

•Web services are not as technology neutral in real life as promised.

•Adding proprietary security or conversation rules increases coupling

•Slapping a web service interface over old technology by using middleware has its limitations. You can only teach an old dog a limited number of new tricks. Refer to the tight coupling comment above ;-)

•Project managing a project of this nature requires a lot of technical input on a daily basis. A project manager with a technical background and up to date knowledge of technology is required; else get a tech lead who can spend a lot of time on project management

•A solution architect with the ability to communicate ideas and concepts between the project stakeholders is critical to project success

SOA Governance

Service Oriented Architecture (SOA) is one of those technologies that is a catalyst for change, and not only change in the IT environment, but also in the business environment. SOA encourages a world of shared services, reuse and agility. It provides a mechanism to quickly assemble highly distributed systems, built from modular, self-contained services.

This nirvana of quickly assembling new distributed applications is great in terms of rapidly responding to business change. Management of these services however requires strict governance processes if you want to avoid your environment turning into a SOA “Wild, Wild West.”

Governance in an enterprise SOA environment presents a unique set of challenges. Once a new service is deployed into your enterprise, you don’t necessarily know where, how or by whom the service will be used. The following need to be taken into account when managing these services:

•Security: It is relatively easy for a developer to start using a service, as the schema provides information used by most development tools to create the infrastructure needed to start using the service. Services security needs to be addressed to ensure that the right level of authorization is required in order to make use of a service

•Usage Registration: In order to do capacity planning, planning of new deployments or decommissioning, you need to know who the users of a service are, and what their access patterns look like

•Usage limitations: The fact that your service can perform task B does not mean it is the best way to perform task B a million times a day. Providing usage guidelines and limitations is the first step; the ability to monitor and enforce these is the goal

•Deployment, Testing and Upgrade procedures: In a highly distributed environment, quality glitches in a service can have serious implications for applications which the service developer might not even be aware of. Versioning of services, adhering to service contracts and deployment of updates need to happen in a controlled fashion

•Decommissioning procedures: As with the previous topic, decommissioning of a service in an enterprise SOA environment needs to happen in a controlled fashion

While technology helps us to solve the business problems, we need to make sure that it doesn’t create other problems down the line. A successful SOA environment requires a well-thought-out governance plan.

Integration / Perspiration

In the last year, my focus has primarily been on system integrations. Now, my usual domain is the development of long lived packaged software, and it is quite a different thinking hat which you need to wear when doing integrations. And the interesting thing regarding my new “Integration Thinking Hat”, is the unexpected complexity.

But why is that so unexpected, you might ask? After all, integrating Java and .NET and SOAP and SQL and Oracle and MQ and all of those 3 letter things is supposed to be complex.

And the answer to the above is that the source of the complexity is not (as I expected) technological, but of human origin.

When we integrate a mainframe based system with a Wintel system, traditionally it used to be a complex affair. Web services and SOA principles have pretty much brought this from a dark art form to the stage where it is a relatively easy and repeatable process. You publish a service definition which describes inputs, outputs and actions. In simple terms you know what to expect and, very importantly, also what not to expect.

Contrast that with the human communication element: you need quite a bit of human interaction in order to start understanding what both sides of the equation can expect from the planned integration. In theory it is possible to come to a 100% understanding (in classroom based exercises anyway), but you will probably get to a 70% mutual understanding, and from there onwards pure ignorance will provide the rest. Where the horrors start haunting you is in the area where all parties involved need to understand what NOT to expect.

So, we need to come up with a WSDL type human communication around integrations. ;-) Which we all know will not work, as we are more intelligent than our silicon based friends (machines), and therefore we make our own interpretations. (Which is our advantage over the machines, but in this case, also somewhat of a disadvantage.) We are in need of a ubiquitous language which all parties involved will understand to the point where we all interpret every single word, sentence and model in exactly the same way.

So, while evolution helps us out with the above (probably by all system integrators having a war and leaving only a nucleus group alive), I have found the TOGAF enterprise architecture framework to be a great help in providing the ability for the different parties and stakeholders to reach a reasonable level of shared understanding. The framework provides a relatively simple way to connect business activities to the data and technology. In summary, the framework provides a way to connect the following levels:

•Business Processes

•Information

•Data

•Applications

•Technology Infrastructure

This helps you understand the impact of the intricate interrelationships here. In other words: what will happen to the enterprise if the data field “Address” of a client now needs to have 600 characters instead of 100? Which processes, information flows, applications and infrastructure components will be involved?
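The “Address” question can be pictured as a walk over a graph that links the levels. The sketch below is only an illustration in Python, not part of TOGAF itself; the node names and the LINKS table are made up, and a real model would come from your own architecture repository.

```python
from collections import deque

# Hypothetical links between levels: "a change here touches these items".
LINKS = {
    "data:Client.Address": ["information:Client contact details"],
    "information:Client contact details": [
        "application:Policy admin",
        "application:Billing",
    ],
    "application:Policy admin": ["process:Onboard client", "technology:Mainframe"],
    "application:Billing": ["process:Send statement", "technology:Batch middleware"],
}


def impacted(start, links=LINKS):
    """Return everything reachable from `start`, i.e. everything a change may touch."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


if __name__ == "__main__":
    for item in impacted("data:Client.Address"):
        print(item)
```

Even a toy model like this shows how one field change can fan out across applications, processes and infrastructure.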

It is a framework, and hence you can adopt and adapt it to suit your needs, and an entire Enterprise Architecture exercise is not needed before you can start with your integration exercise. Check it out at The Open Group.

Tech Ed Europe 2009

I’m currently at Tech Ed in Berlin, and enjoying it very much. I have been a regular attendee to Tech Ed Europe for the last decade (Lots of thanks to SDT for seeing the continued value in this), and have always come away inspired. This year is no exception.

 

Long gone are the days when I attended conferences such as these to have a peek at new technical features or gizmos. Of course this is still part of it, but my focus nowadays is to zoom in on the thought patterns and genius of industry thought leaders. From these guys I get some ideas and inspiration – and in the rare circumstances where my thought processes seem to have been synchronised with these guys, I also get a sense of “internal pride”. Unfortunately not something you can share with the big wide world without looking smug (or idiotic), but at least I can be proud of myself ;-)

 

Stepping away from the sadness of geek-pride, who were the geeks I enjoyed listening to this year?

 

Reflecting a bit: in the past it was guys like Don Box (the bathtub session in Spain!!) and Steve McConnell (though not at Tech Ed); last year Rocky Lhotka gave me a lot of that geek-pride feeling ;-), and Kalen Delaney always excites my SQL optimisation sensors.

 

And 2009 then? Well, an all-time favourite of mine is Rafal Lukawiecki. Even though he sometimes goes over my head, any talk by him is an inspiration, even if you just go to hear him deliver his talk in his typically passionate fashion. This year I enjoyed his talk on Predictive Programming – for me the highlight of Tech-Ed 2009. Another guy that I really enjoyed was Jim Wilt. Unfortunately I missed Udi Dahan’s session, but could catch up with him later with a beer in hand. Always a good thing in Germany, I can assure you.

Growing New Architects

So this is one of my pet topics: I have been in the IT industry for nearly two decades. I started out working with some assembler, wrote some low level transport layers on networks, worked with a variety of operating systems and at some point did some open source stuff on Linux, then worked with database systems etc.

 

At some point I entered the Microsoft world: I became an MCSD and MCSE around 1996, and at that point I actually knew all applications/software offered by Microsoft at an expert level. The combination of all of this knowledge and experience enabled me to architect enterprise level, high performance, and high scalability applications.

 

Today (13 years later) it is not possible to be an expert on all Microsoft software anymore (and my guess is that goes for other vendors as well). In fact, in my opinion it is not even possible to be an expert on a single tool such as SQL Server anymore. You need to choose a field within SQL Server, and become an expert on that. SQL Server is just an example; this is true for most areas of the total IT industry. An exponential increase in technology, complexity, options etc. is making it near impossible to keep up.

 

A concern I have is: How do new guys/graduates catch up? Is it still possible for a graduate to start as a junior developer and work his way up to become a good systems architect? It is pretty much impossible to keep up as well as catch up on the detailed knowledge about all the layers of abstraction that senior resources have accumulated over years and years. The only workable solution seems to be:

•You need to have a base knowledge about the abstraction layers you are working with

•You need to trust these abstractions

•This will enable you to focus on keeping up

 

This sounds workable, but again my critical mind is not happy. Sure, I will entrust my “Hello World” application to someone who has this level of knowledge (and I will probably get a much better user experience than anything I can ever develop or even imagine ;-) ). Will I trust this guy when it comes to mission critical, high performance, highly scalable applications? I don’t think so!

 

The interesting thing about this is: I recently attended a session presented by Rockford Lhotka at Tech-Ed in Barcelona, where Rocky was speaking about “How to Manage Technology And Not have it Manage You”. The interesting thing about the audience was that we were mostly “seasoned IT staff” ;-). The above concern seemed to be shared by the whole group. I take this as a sign that I’m not paranoid, and that the above really is an issue.

 

Are there any solutions/remedies to this situation? No silver bullet available I’m afraid. But I suggest the following:

•Junior resources should be encouraged to pick up a broad level of experience before starting to specialise in a certain technology/ architecture/ type of application.

•Junior guys that want to move on to become system architects must make it their mission to understand the mind of the guys who designed the systems they work with. In all applications there are numerous trade-offs and design decisions – make an effort to understand the thought processes and motivations behind these decisions. Work with stuff which turned out great, as well as things which turned out to be not so great.

•Senior guys – take care when creating your API set/ abstraction/reusable module – note that someone will at some point trust that whatever you created is doing the best job possible.

Performance and Caching

In the previous post I touched on an example of enabling scaling out using .NET caching (System.Web.HttpRuntime.Cache). Well, a while ago I had the opportunity to use caching again – the result was that with roughly 10 lines of code, the scalability and performance characteristics were greatly enhanced. The component in question has a long history:

 

The component is a complex rate lookup system which searches for a scenario that satisfies a certain set of criteria. When originally designed (7 years ago), the assumption was that the engine would work on a matrix of roughly 30 variables with a possible scenario list of about 500 to 1000 scenarios. Because of the size of the matrix and some variables which are bound to live data, this was implemented as a SQL stored procedure. Initial performance was OK, and no need existed for scaling out. (We could handle about 10 requests per second, more than enough at that point)

 

Two years on, one of the clients loaded a rate table with 1.6 million scenarios, and this just about killed the SQL machine. I managed to enhance the performance of the stored procedure by building some intelligence into the selection criteria used for the scenarios – instead of evaluating every scenario in the table, we were back to about 500 scenarios again. (Performance had degraded to about 1 request in 2 minutes; after the change we could handle about 20 requests per second.)

 

Another 2 years later, the volumes picked up – to the point where requests started to kill the SQL server. Scaling up was done, but architecturally we needed to enable scaling out as well. At this point we enabled scaling out by adding a layer between the requesting component and the database. In this layer we could now implement the ability to cache request/response pairs. When a new request comes in, a bit of logic is used to compare this with the request/response pairs in memory, trying to find a similar request and response in memory before firing a request to the database. This lowered the load on the SQL machine, and the combination of scaling up and scaling out gave the required performance characteristics. (We could handle about 200 requests per second)
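A rough sketch of such a layer is shown below. The original was written in .NET; this Python version is only an illustration, and the names (RateLookupCache, relevant_fields, query_database) are hypothetical. The assumption made here is that “similar” means two requests agree on every field that actually influences the rate, so those fields are used as the cache key.

```python
from typing import Callable, Dict, Tuple


class RateLookupCache:
    """Sits between the requesting component and the database."""

    def __init__(self, query_database: Callable[[dict], float],
                 relevant_fields: Tuple[str, ...]):
        self._query = query_database          # the expensive call to the SQL machine
        self._fields = relevant_fields        # fields that influence the answer
        self._pairs: Dict[tuple, float] = {}  # request key -> cached response
        self.hits = 0
        self.misses = 0

    def _key(self, request: dict) -> tuple:
        # Two requests count as "similar" when all relevant fields match.
        return tuple(request.get(name) for name in self._fields)

    def lookup(self, request: dict) -> float:
        key = self._key(request)
        if key in self._pairs:
            self.hits += 1
            return self._pairs[key]
        self.misses += 1
        response = self._query(request)       # only now go to the database
        self._pairs[key] = response
        return response


if __name__ == "__main__":
    def slow_database_lookup(request: dict) -> float:
        # Stand-in for the stored procedure call.
        return 0.05 if request["product"] == "A" else 0.07

    cache = RateLookupCache(slow_database_lookup, ("product", "age_band"))
    cache.lookup({"product": "A", "age_band": 3, "request_id": 1})
    cache.lookup({"product": "A", "age_band": 3, "request_id": 2})
    print(cache.hits, cache.misses)
```

Note that this sketch never evicts or refreshes entries, which is only acceptable when the underlying rate data is relatively static.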

 

Another 3 years later volume dictated that we look at the performance again. The caching implemented in the previous step was local to each requesting process, and it relieved the pressure on the SQL box, but we still saw a lot of the same requests coming from different threads. With 500 processes requesting values constantly, we needed a central caching ability rather than one which is local to the process. At this point I implemented the .NET caching on a per server basis. Due to the structure of the application, the code changes in total resulted in less than 10 lines of code. The processing is slightly slower than the previous solution (there is an overhead as the cache is not local to the threads/processes doing the request). It however enhanced our ability to scale out dramatically – the central SQL machine is still the bottleneck, but the ability to share the cache, as well as the ability to have a long lived cache moves a huge amount of work off the SQL machine, and this more than compensates for the slight overhead of communicating over process boundaries. (After running the solution for a while (and thus populating the cache), we have seen more than 2000 requests per second, with the central machine still doing fine)

 

The result of all this was a 100-fold increase in the ability to handle requests, and all it required was a 10 line code change. How great is that!

 

What will the future hold for this way of using caching? Currently there are a couple of products on the market providing caching which can be scaled across server boundaries. My prediction is that we will get a version of this built into the .NET framework pretty soon. Hopefully before I have my next big bottleneck in the above-mentioned application.

 

As a side note: for technical reasons related to the original application, I did not have the option of scaling out the SQL machines. Even though the rate table data is relatively static, I did not have this level of freedom.

Scalability Patterns

I have been living in the mission critical, Line of Business (LOB) application world for quite some time, and system performance and scalability are an important aspect of my life. I have a keen interest in anything performance or scalability related, hence the need to share some of my experiences around this in a blog. At this point the aim is to blog about my experiences with some scalability architecture patterns, and to spread this over 2 to 3 blogs.

 

Scalability is the ability to take full advantage of the resources available in order to match the size of the processing required. Scalability can be achieved by either “Scaling Up” (adding resources to a node) or by “Scaling Out” (adding more nodes).

 

“Scaling Up” is generally the easiest to apply, as it normally does not require architectural changes to the application. However, it also hits the “Scalability Ceiling” quite easily, as you can only add so many resources to a single node.

 

“Scaling Out” requires that the application be architected to accommodate this method of scaling, and is normally not something which can be retrofitted to an application. If the correct architecture is used, there is no “Scalability Ceiling”.

 

As a reference system, let’s use a typical N-Tier LOB application (Data Tier, Business Tier, Application Tier, Interface Tier). This is normally depicted as follows

 

In terms of scalability, I prefer to depict the above architecture as the layers of an onion. At the core you have your Data Tier, followed by your Domain Tier (the Business Tier above), your Application Tier and lastly the Interface Tier.

 

The typical situation you are faced with is that your Data Tier cannot be scaled out. The core data is the most valuable part of a business’s infrastructure. It is the heart and soul of the organization, containing information that cannot be compromised under any circumstances. It can also not be duplicated or distributed easily without creating additional costs and complexity. Due to the fact that it is a centralized resource, it is normally the primary point of bottlenecks. When it needs to scale, “Scaling Up” is used. As mentioned earlier however, this has a scalability ceiling.

 

The Domain Tier is normally the primary means of scaling the solution, and at this Tier your aim should be to “Scale Out”. Your Application and Domain tier actually should form a Scalability Partnership, where the combination allows you to scale on the Domain and Application Tier.

 

So, ideally you will end up with a single database server/cluster serving 10 business servers, which in turn service hundreds of nodes on the application tier, which in turn service thousands of clients on the interface tier. (All numbers are a thumb-suck, except for the 1 db server ;-) )

 

Data Tier

 

This is a very expensive resource in terms of scalability costs and should be handled as such. Review any task on this level with the goal to move it to a Tier where you have the ability to Scale Out. Only in exceptional cases should you consider executing tasks at this level.

 

So, what can you move out to other Tiers? Firstly, any calculations or processing on this level are prime candidates for moving to other layers. In addition you can also cache data on other tiers to remove the burden on this Tier. The following are prime candidates:

 

* Static Data: Static data can be moved to other database servers. Synchronisation of these is normally an issue, and I normally try to cache them rather than do database synchronisation. The .NET HttpRuntime cache object has also made my life much easier. Doing a lookup for a simple field does not place a huge load on the system, but multiply this load by a thousand, and it starts sounding significant.

* Regularly Accessed Data: As mentioned above, I have started using the HttpRuntime cache in .NET for this purpose – note that you do not need to be in a web application to use this. Calculations which regularly need transactional information from last month’s runs can populate this cache in a lazy-load fashion. Care has to be taken however to ensure that it is really “read-only” data you are loading, and that your refresh schedule will ensure that your cache does not go out of sync.
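The lazy-load-plus-refresh idea from the two candidates above can be sketched as follows. This is a plain Python stand-in, not the .NET HttpRuntime cache API; the class name LazyTtlCache and the loader and ttl_seconds parameters are assumptions made for the example. An entry is loaded from the database only when first asked for, and reloaded once it is older than the refresh interval.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LazyTtlCache:
    """Loads values on first use and reloads them after ttl_seconds."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader      # e.g. a function that queries the database
        self._ttl = ttl_seconds    # the refresh schedule
        self._clock = clock        # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]                       # still fresh: no database work
        value = self._loader(key)                 # lazy load or refresh
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def load_currency_name(code):
        # Stand-in for a lookup against a static data table.
        return {"ZAR": "South African Rand", "EUR": "Euro"}[code]

    names = LazyTtlCache(load_currency_name, ttl_seconds=3600)
    print(names.get("ZAR"))
    print(names.get("ZAR"))  # served from the cache
```

The sketch is not thread safe and keeps everything in one process; a shared, per-server cache as described in the caching post would need locking or a ready-made cache component.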

 

As an example: I have used the HttpRuntime cache in a .NET project to bring down the load on the database server from 80% utilization to 20% utilization, and in the end it involved about 20 lines of code. The result was a batch run which could now run in 2 hours instead of 6 (more threads could be added to the batch run).

 

 

Unit Of Work

 

“Unit Of Work” cannot easily be retrofitted to an application. Unit of work requires that you program against the object model, not against the database. This sounds simple, but you have to take care when implementing this. When you get it right however, it supplies you with great performance as well as scalability gains. Here is a link to Martin Fowler’s site with some more information on Unit of Work. When I first implemented this, I also found Jimmy’s book quite useful (.NET Enterprise Design).

 

Unit of work removes the following work from the database server:

* PK – FK relationships. A large number of database roundtrips are needed to set up your unique primary keys. Imagine you are creating entities in memory with different Primary – Foreign key relationships between them. Having the ability to create the unique primary keys in memory and set up the relationships without burdening the database server removes an enormous load from the database server. It also speeds up processing, as all the database roundtrips are removed.

* Lots of roundtrips. Instead of having multiple small roundtrips, you end up with one big trip to the database.

* Minimise Locking. The entire entity model can be created in memory, and once the work is done, all SQL statements are sent to the database in one roundtrip. This cuts down severely on the time locks are held: I have worked with examples where a function takes about 10 seconds to execute. Originally it started the transaction early and a locking scenario of some sort would exist for about 6 seconds. This severely hampered the ability to run multiple threads. After implementing unit of work, a huge batch of SQL statements would arrive at the database server, and 300 ms later they have done their bit and locks are released. Theoretically the database server could handle roughly 2 threads previously (10 seconds of work divided by 6 seconds of locking), and now it could handle 33 threads (10 seconds divided by 0.3 seconds).
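The three points above can be sketched in a few lines. This is a bare-bones illustration in Python using the standard sqlite3 module, not the implementation described in this post; the UnitOfWork class, the build_order helper and the table names are hypothetical, and generating keys as GUIDs in memory is an assumption (the post does not say how the keys were produced).

```python
import sqlite3
import uuid


class UnitOfWork:
    """Collects SQL in memory and sends it to the database in one short transaction."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection
        self._pending = []  # (sql, params) pairs, in the order they must run

    def new_key(self) -> str:
        # Primary keys are created in memory, so no roundtrip is needed for them.
        return str(uuid.uuid4())

    def register_insert(self, sql: str, params: tuple) -> None:
        self._pending.append((sql, params))

    def commit(self) -> None:
        # The transaction (and its locks) only lives for the duration of this block.
        with self._conn:  # commits on success, rolls back on an exception
            for sql, params in self._pending:
                self._conn.execute(sql, params)
        self._pending.clear()


def build_order(uow: UnitOfWork, customer_name: str, items: list) -> str:
    """Builds a parent row and its child rows entirely in memory."""
    customer_id = uow.new_key()
    uow.register_insert(
        "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, customer_name)
    )
    for item in items:
        uow.register_insert(
            "INSERT INTO order_line (id, customer_id, item) VALUES (?, ?, ?)",
            (uow.new_key(), customer_id, item),  # FK set up without asking the database
        )
    return customer_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE order_line (id TEXT PRIMARY KEY, "
        "customer_id TEXT REFERENCES customer(id), item TEXT)"
    )
    uow = UnitOfWork(conn)
    build_order(uow, "Acme", ["widget", "gadget"])
    uow.commit()  # the only point where the database is touched
    print(conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0])
```

All the slow business logic runs before commit is called, so in this sketch the database only sees a short burst of statements, which is what keeps the locking window small.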

 

 

 

Downside of Unit of work

 

When you are working with calculations which are dependent on transactions which are generated by other threads / processes, this might not be the easiest thing to implement, but if you apply your mind, you should be able to make it work.
