Monday, May 19, 2008

Interesting Design Problem

One of the products I am working on generates Microsoft Excel reports on demand, using a database and template-driven scheme.  The report is a summary of a participant's results for a 360° assessment which measures project management competencies.  If you want to read about the product, go here.  See the screen shot below.


The reports are generated each time a participant completes an assessment or when one of the participant's invitees (360 is a multi-rater assessment methodology) completes their assessment of the project manager.  As a result, the report contents build over time.

The report generator is implemented as a collection of Report Services, which in turn are implemented as Windows services.  The system scales horizontally in that as load increases, you can add additional Report Services to the mix to distribute the load.  Each Report Service maintains a pool of Report Generators, which implement report generation.  Each Report Generator executes on its own thread.

At the core of the system is a shared data structure; a queue, implemented in SQL Server.   The queue methods are implemented as stored procedures and provide locking mechanisms such that a single report generation job is handled by just one Report Service\Report Generator instance.  There are some tricks involved to get this done, mainly to do with locking hints. 

The problem is this:  How to let the Report Service population know that reports need generating?  The first implementation used the grotesque, but easily maintained, hamster wheel pattern (or antipattern), which loops about, querying the queue for new work items and then executing them.   A poll interval is specified through configuration so a pause happens between queries.  This is pretty ugly, but in low-loads the net performance cost is marginal.

I contracted an outside firm to do some performance testing.  While the numbers looked pretty good, I did get a MOMs alert from RackSpace indicating the report server was operating under high CPU usage.

So, I started thinking about an alternative.  I came up with one that I am experimenting with, all though I don't know if it will ship (as one of my colleagues mentioned today, we aren't writing a streaming video service with real-time requirements or anything sophisticated).

The current scheme by which work items are populated in the queue is trigger-based.  When an invitation is satisfied, that is the assessment is completed, some identifying information is placed in the queue.  The next Report Service that queries for new work items locks the queued items and performs the generation.  Again, lock hints and other mechanisms are used to ensure that only one Report Service processes the new work item.

I have used CLR triggers in SQL 2005 before and found them interesting, but I have hesitated to use them in production.  These things just take time to get used to. It did, however, occur to me that with a CLR trigger, I might have the opportunity to create a broadcast mechanism that would indicate to the waiting Report Services that new work items exist in the queue.  I implemented a very simple UDP client/server that broadcasts a "new work item exists" message to the waiting Report Services.

A trigger should execute very, very quickly.  To prevent the trigger from blocking, I create a thread which handles message broadcasting.  To enable this, a database must be marked as TRUSTWORTHY.  This has some side effects that must be considered and there are other ways to do it but for the purposes of the experiment, this is fine.  When you deploy your CLR trigger, you must give it full access in order to create threads and use the networking stack.

So, when a new work item is in the queue, a message is broadcast to the Report Services.  The first one that reads it wins the lock on the queued item.  I don't have the source in front of me, so I will post it tomorrow.

The Report Servers contain a thread per Report Generator, which go into a wait state after their work is performed.  These threads are waiting in a call to Receive(), which doesn't return until data is received (this can be done asynchronously but I already have a few threads and synchronization objects in play so I kept it simple).  Now, this is a problem if the Windows service is asked to quit.  Rather than rudely terminate the Report Generator thread, I would like the thread to return from the Receive() method and perform the cleanup it needs to prior to termination.  How to do this?  Well, the Report Service implements that same message broadcast scheme that the trigger does except that it broadcasts the message to itself.  This works because the parent thread (where the broadcast is sent) of the Report Service is running in a monitor mode that checks the health and load of the Report Generators and provides parental guidance in shutting down gracefully.   It is implemented using synchronization events.

The final analysis is not in yet, but the experiment worked and seems interesting.  I need to research a little more the potential security and reliability factors, but it was fun to do.



No comments:


Content on this site is provided "AS IS" with no warranties and confers no rights. Additionally, all content on this site is my own personal opinion and does not represent my employer's view in any way.