Friday, October 7, 2016

Exception Management in SOA – Loosely Coupled vs Tightly Coupled

The Background:

Convention dictates that exception handling in a typical object-oriented system should be centralised and that all errors should be translated into human-readable form. This usually makes sense because the user then gets the opportunity to correct the data and recover from the error. It requires developers to catch all possible exceptions, convert each technical error into a human-readable message, and pass it consistently through the layers until the user sees it in the view layer.


This post, however, analyses the error handling principles that apply in the SOA world and leaves the choice of best practice to the reader.

The Importance of Loose Coupling:

Services in the SOA world are meant to be atomic: each performs a single task of the underlying business, which promotes reusability. At the same time, we need to remember that these services can be accessed or triggered in any of the following ways:

  • By a batch process (at a regular interval)
  • By another web service (used by another application, SOAP or REST based)
  • By an orchestration service (a service that calls two or more services in a definite sequence)
  • By a business event (a special criterion formed from two or more pieces of data)
  • By a rule engine (a special service that takes a series of decisions based on service calls)
  • By a messaging system (a service call triggered by an incoming event or message placed in a queue)

This is the key reason for designing services with a loosely coupled architecture. Look at the actors involved in the invocations above: each could be a system, a user, or an event. Do we expect these actors to correct the error and call the service again? No. It is rarely practical for a system to correct every error by altering the original message and re-invoking the same service until it succeeds. The major focus in SOA is service enablement, so that a service can be used by any service, with any technology, any number of times.

How do we achieve this loose coupling and still have the best exception handling strategy? Let us analyse our options.

Introduce Consistency in Error Handling

Because services can be invoked from any boundary (inside or outside the enterprise) and triggered by any actor (system or event), it is important to indicate errors consistently.

Consistency can be achieved through the following key principles:
  • Use of common error codes for all services
  • Optional error descriptions
  • Uniform (canonical) message structure

Consider the following sample of such a ‘Fault’ message:
                <error>
                   <code>10000</code>
                   <type>Network Communication</type>
                   <description>Service or Network Access Error</description>
                </error>
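
A minimal Python sketch of how a service might emit this canonical structure is given here. The function name `build_fault` is an assumption made for illustration; only the element names and sample values come from the message above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_fault(code: int, error_type: str, description: Optional[str] = None) -> str:
    """Serialize an error into the canonical <error> structure (hypothetical helper)."""
    error = ET.Element("error")
    ET.SubElement(error, "code").text = str(code)
    ET.SubElement(error, "type").text = error_type
    if description is not None:  # the description is optional
        ET.SubElement(error, "description").text = description
    return ET.tostring(error, encoding="unicode")


fault = build_fault(10000, "Network Communication", "Service or Network Access Error")
print(fault)
```

Because every service produces the same structure, a caller only needs one parser for faults, whichever service raised them.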


Centralize the Error Handling

Centralization in SOA means keeping the error identification logic within the individual services while moving the decision making about each error into isolated, centralized logic exposed through a dedicated service or set of services. This ensures that exception handling code is not duplicated across services. Wherever logic is duplicated, the overhead of maintenance, defect fixing, and testing grows, and with it the chance of further errors.
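
The split between local identification and central decision making could be sketched as follows. Everything here is hypothetical: the `ErrorHandlingService` class, its policy table, the `order_service` function, and the error code 20000 are assumptions used only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    code: int
    type: str
    service: str


class ErrorHandlingService:
    """Central place where decisions about faults are made (hypothetical)."""

    # Assumed policy table: fault type -> action to take.
    POLICIES = {
        "Network Communication": "retry",
        "Validation": "store-for-correction",
    }

    def decide(self, fault: Fault) -> str:
        # Unknown fault types fall back to alerting an operator.
        return self.POLICIES.get(fault.type, "alert")


handler = ErrorHandlingService()


def order_service(payload: dict) -> str:
    """Identifies the error locally, then delegates the decision."""
    if "customer_id" not in payload:
        fault = Fault(20000, "Validation", "order_service")
        return handler.decide(fault)
    return "processed"
```

The service knows only how to recognise its own error. What to do about it lives in one place, so a policy change does not touch every service.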


Connect the Business to Error Handling:

It is important for every enterprise to analyse the errors that occurred over a specific period of time, for the following reasons:
  • To identify the pattern of the errors.
  • To identify the frequency of the errors.
  • To identify the services causing the errors.


This analysis helps the organization evolve its SOA through continuous improvement of the different systems, which have themselves evolved over time. It also helps to build a strategy around the results, find root causes, and thereby take effective decisions and cut costs by reducing maintenance.
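
As a rough illustration of this kind of analysis, the sketch below counts errors by type, by code, and by service. The log entries, service names, and the code 20000 are made up for the example; a real system would read them from wherever the errors are persisted.

```python
from collections import Counter

# Hypothetical error log entries: (service, error code, error type).
error_log = [
    ("billing", 10000, "Network Communication"),
    ("orders", 10000, "Network Communication"),
    ("orders", 20000, "Validation"),
    ("orders", 10000, "Network Communication"),
]

by_type = Counter(err_type for _, _, err_type in error_log)   # pattern of errors
by_code = Counter(code for _, code, _ in error_log)           # frequency per code
by_service = Counter(service for service, _, _ in error_log)  # services causing errors

most_common_type = by_type.most_common(1)[0]
noisiest_service = by_service.most_common(1)[0]
print(most_common_type, noisiest_service)
```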


Build Effective Error Handling Strategy

Error handling needs a definite strategy, with a reason behind every action. For example, it should be built around the following basic principles or policies:

  • Support for retrying the failed messages that are valid.
  • Support for Notification and Alerts.
  • Deactivating the Service events during failure of dependent services.
  • Persisting/Storing the Failed Messages for future Manual Correction.
  • Logging the Error and the Related Events.
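
Three of the policies above (retrying, storing failed messages, and logging) can be combined in a small sketch like the one below. The `TransientError` class, the in-memory `failed_messages` list, and the retry limit of three are assumptions for illustration. A real system would persist failed messages in a durable store.

```python
import logging

logger = logging.getLogger("error_handling")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a network outage."""


failed_messages = []  # stand-in for a persistent store of failed messages


def process_with_retry(message, service, max_attempts=3):
    """Retry transient failures, then park the message for manual correction."""
    for attempt in range(1, max_attempts + 1):
        try:
            return service(message)
        except TransientError as exc:
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
        except Exception as exc:
            logger.error("non-retryable failure: %s", exc)
            break
    failed_messages.append(message)
    return None


calls = {"count": 0}


def flaky_service(message):
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("network outage")
    return f"processed {message}"


result = process_with_retry("order-1", flaky_service)
```

Only errors marked as transient are retried. Anything else goes straight to the store, since retrying a message that can never succeed only adds load.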

Build High Performance Architecture

The power of SOA lies in being able to scale performance when needed. The reason for centralizing error handling and making it consistent is to keep the system from being affected by the number of errors.

The error handling strategy should focus on performance rather than on helping the user correct the error. An asynchronous messaging pattern can optionally be introduced so that errors are handled without affecting the performance of the actual service. Asynchronous error handling means building the services so that errors are processed offline, without making the calling services wait for the error handling logic to complete.
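
One way to picture the asynchronous pattern is a queue between the service and the error handling logic, as in this sketch. It uses an in-process `queue.Queue` and a worker thread as a stand-in for a real messaging system. The names `call_service`, `error_worker`, and `handled` are hypothetical.

```python
import queue
import threading

error_queue = queue.Queue()
handled = []  # stand-in for logging, alerting, or persisting the fault


def error_worker():
    """Consumes faults offline, independently of the calling service."""
    while True:
        fault = error_queue.get()
        if fault is None:  # sentinel value: stop the worker
            error_queue.task_done()
            break
        handled.append(fault)
        error_queue.task_done()


def call_service(payload):
    """Does its work and hands any fault to the queue without waiting."""
    try:
        return 100 / payload["quantity"]
    except (KeyError, ZeroDivisionError) as exc:
        error_queue.put({"service": "call_service", "detail": repr(exc)})
        return None


worker = threading.Thread(target=error_worker, daemon=True)
worker.start()

call_service({"quantity": 0})  # fails: division by zero
call_service({})               # fails: missing key
call_service({"quantity": 4})  # succeeds

error_queue.put(None)  # ask the worker to stop
error_queue.join()     # wait until every queued fault has been handled
```

The service returns as soon as the fault is queued. How long the handling takes has no effect on the caller.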

Build Monitoring Process around Error Handling Strategy

Now that we have a strategy in place to gather and process all the details about the errors that occurred, it is important to build a monitoring process to gain business benefits from it.

For example, the notification service and event service can be used to report a significant loss of business caused by a communication error or another unexpected event. A loss of sales above $5,000 can be notified to the CxO officials, as can a business event worth $10,000 that failed because of an unexpected error such as a network outage or password expiry. Building such a design to recover selected errors often works in favour of the business, which would otherwise need dedicated infrastructure such as BHM or complex event processing.
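
The threshold rule in this example could be sketched as below. The `FailedBusinessEvent` class and the `events_to_escalate` function are hypothetical. Only the $5,000 threshold and the example causes come from the text above.

```python
from dataclasses import dataclass

LOSS_THRESHOLD = 5000  # dollars, taken from the example above


@dataclass
class FailedBusinessEvent:
    service: str
    cause: str
    value: float  # monetary value of the business that failed


def events_to_escalate(events, threshold=LOSS_THRESHOLD):
    """Return the failed events whose value exceeds the notification threshold."""
    return [event for event in events if event.value > threshold]


events = [
    FailedBusinessEvent("orders", "network outage", 10000),
    FailedBusinessEvent("orders", "password expiry", 1200),
]
escalated = events_to_escalate(events)
```

In practice the escalated events would be passed to the notification service. Here they are only collected in a list.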

Conclusion

The purpose of this blog is to introduce these frequently followed principles of error handling in SOA. The right combination of the suggested strategies, however, depends on factors within the enterprise such as the available SOA infrastructure, cost and time, the roadmap, and the development plan.

So, success lies in choosing the right design aspects at the right time, so that the enterprise is able to scale in future.
