Charles Young

October Rules Fest 2009, Dallas, TX

I've largely finished my presentation for the October Rules Fest 2009 conference in Dallas at the end of the month.   I'm speaking on complex event processing (CEP).   My plan is to provide a broad survey of CEP technologies, chiefly concentrating on the similarities and differences between event stream and rules processing.  There has been a lot of interest and activity around event processing in the rules community in recent years, and not a little controversy about the best approaches and, indeed, the role, if any, of Rete rules engines in detection of complex events.   Constructing the presentation has been something of a journey for me, and hopefully it will prove of interest to those attending the conference.

This is a rather last-minute plug for ORF 2009. It is the second year of this 'alternative' rules conference, which is differentiated by a clear and unashamed focus on technology (there are other well-established rules conferences which focus on application at the business level), and by a wide-ranging interest in several related areas of science and research. For me, some of last year's highlights included hearing Gary Riley explain how he managed to squeeze so much performance out of the CLIPS engine and listening to Dan Levine's talk on rule-based mechanisms in the human brain. It's that kind of event.

The major rule-processing vendors (ILOG (now IBM), FICO, TIBCO, etc.) are well represented at the event, together with the JBoss team. Charles Forgy, who came up with the Rete algorithm three decades ago, is a star speaker (a fascinating talk is promised on how to maximise the benefits of parallelisation in rules engines). I'm particularly looking forward to hearing Andrew Waterman's talk on the use of rules processing in game-playing software used to promote sustainability and development of natural resources in Mexico. I've been aware of this project for some time. Greg Barton will be reporting on his experiences at Southwest Airlines. There are interesting sessions on rule modelling and aspects of rule languages and DSLs, plenty on CEP, and various talks on constraint programming, rule verification and other topics. And, to remind us all that technology for technology's sake is never a good idea, John Zachman will be there to talk about the role of rules in Enterprise Architecture.

ORF 2009, only in its second year, offers an incredibly varied diet for the rule technologist.   Together with the boot camps and introductory sessions at the beginning of the programme, it offers practical hands-on experience, a chance to learn about rules processing in depth, a showcase for the wide-ranging application of rules in many different areas of IT and an insight into many areas of research.

Places are still available, I understand.   The cost is kept as low as possible by the conference organisers, so visit http://www.octoberrulesfest.org for more information and book in while you can.

Posted on Wednesday, October 7, 2009 10:19 AM


Comments on this post: October Rules Fest 2009. See you in Dallas

# re: October Rules Fest 2009. See you in Dallas
The most interesting talk will be Dr. Forgy's talk on his new Tech algorithm. Did you get a chance to look at Arpeggio? Based on the numbers I see, a stream RETE engine should be able to beat engines like Esper :)
Left by Peter Lin on Oct 07, 2009 12:17 PM

# re: October Rules Fest 2009. See you in Dallas
I've seen the paper that Dr. Forgy is submitting, and it is all about approaches to optimising Rete for parallelisation. However, James Owen did say a while back that he would be talking about Tech. It may be that these are the same subjects, or they may be different. We will see.

I haven't really looked at Arpeggio in depth yet. I understand that it is significantly different to Rete. In my ORF presentation, I'm taking the line that the biggest barriers to achieving ultra-low latency and high throughput in traditional Rete stem from the amount of synchronisation required within the rete (and, overall, between the rete and the agenda), and from the 'holistic' nature of a rete (the fact that it represents a transformation of an entire rule set, not just individual rules). Coupled with these synchronisation requirements, that holism places significant limits on the benefit you can get from parallelisation and multi-threading.

Interestingly, the (slightly counter-intuitive) implication of this is that there isn't any obvious reason why a rete that represents a single continuous query would necessarily perform any more poorly than some other type of dataflow that represents a single query. I'm not even convinced that memory usage would be any higher in rete than it would be in many stream-based engines. Many stream-based engines are heavily dependent on the use of materialised views within their dataflows. I've yet to see the fundamental difference, in terms of data retention, between a 'materialised view' and a Rete memory.

Some (very few, it seems) engines are heavily stream-orientated throughout, and use selection & consumption models that avoid materialised views and naturally keep memory usage to a minimum. One of the points I will be raising is that Rete has both set-based and stream-orientated characteristics (including a basic form of selection & consumption), and that there may be more that could be done to research the use of selection & consumption approaches in Rete. I wonder if this is what Arpeggio is effectively doing?
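To make the comparison concrete, here is a minimal, purely illustrative Java sketch (the class and its structure are invented for this discussion, not taken from any engine mentioned here) of why a beta node's memory is, structurally, an incrementally maintained materialised view of a join:

```java
import java.util.ArrayList;
import java.util.List;

// Toy beta join node: its memory of partial matches is structurally an
// incrementally maintained materialised view of the join seen so far.
public class BetaJoinSketch {
    // A partial match: a pair of facts that passed the join test.
    public record Match(String left, String right) {}

    private final List<String> leftMemory = new ArrayList<>();
    private final List<String> rightMemory = new ArrayList<>();
    private final List<Match> joinMemory = new ArrayList<>(); // the "view"

    // Join test: facts match when they share the same key prefix ("A:...").
    private boolean test(String l, String r) {
        return l.split(":")[0].equals(r.split(":")[0]);
    }

    // A fact arriving on the left is joined against the right memory.
    public void assertLeft(String fact) {
        leftMemory.add(fact);
        for (String r : rightMemory)
            if (test(fact, r)) joinMemory.add(new Match(fact, r));
    }

    // And symmetrically for the right input.
    public void assertRight(String fact) {
        rightMemory.add(fact);
        for (String l : leftMemory)
            if (test(l, fact)) joinMemory.add(new Match(l, fact));
    }

    public int matchCount() { return joinMemory.size(); }
}
```

Each assert on either input extends the join memory incrementally, which is exactly what incremental maintenance of a view over a join does.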
Left by Charles Young on Oct 07, 2009 12:56 PM

# re: October Rules Fest 2009. See you in Dallas
The RETE graph it generates is exactly the same as Jamocha's. In Arpeggio, the agenda has been removed completely, and alpha nodes do not have memory. The only join nodes it will have when it's done are temporal nodes. Those nodes will manage the memory locally and ensure memory consumption remains constant or near constant. I have several benchmarks on my blog which show Arpeggio can handle 1.5-2 million facts per second for rules that do not have joins. The memory consumption is constant. The same thing in jamocha would be 10x slower at best. So far, my results show that a stream RETE engine is faster than NFA-based approaches like Esper. Unlike other CEP engines, my goal is to figure out a practical way of supporting temporal logic, not just sliding windows. A few weeks back I ran a 5-hour test with arpeggio with some simple rules that showed constant memory usage. The same can't be said of jamocha. During the run, arpeggio was able to handle over 400 million facts.
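The scheme described here - memory-less alpha nodes plus temporal nodes that manage their own memory - might be sketched roughly as follows (an invented illustration, not Arpeggio's actual code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Invented sketch: alpha nodes hold no memory (they only filter and
// forward), while a temporal node manages its memory locally, evicting
// facts outside the window so memory stays near constant under load.
public class TemporalNodeSketch {
    public record TimedFact(String value, long timestampMs) {}

    private final long windowMs;
    private final Deque<TimedFact> window = new ArrayDeque<>();

    public TemporalNodeSketch(long windowMs) { this.windowMs = windowMs; }

    // Memory-less alpha test: a pure filter, nothing is stored here.
    public static boolean alphaTest(TimedFact f) {
        return f.value().startsWith("order:");
    }

    // Temporal node: admit a matching fact, then expire anything too old.
    public void propagate(TimedFact f, long nowMs) {
        if (!alphaTest(f)) return;
        window.addLast(f);
        while (!window.isEmpty()
                && nowMs - window.peekFirst().timestampMs() > windowMs)
            window.removeFirst();
    }

    public int liveFacts() { return window.size(); }
}
```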
Left by Peter Lin on Oct 07, 2009 3:06 PM

# re: October Rules Fest 2009. See you in Dallas
OK, I get it. Arpeggio is a Rete engine rigged purely for continuous query, and instead of classic beta memories you have garbage-collected memories that (when fully implemented) will support sliding windows and other approaches. The issue of removing the agenda is interesting: you are doing away with conflict resolution. I would guess that implies that Arpeggio strictly implements event semantics for all facts and ensures event immutability? Is that so? Presumably you still support Assert and Retract actions in your productions, but not Modify? Do you support other actions, like method calls on custom Java objects, or anything like that, and if you do, what happens if the events currently in Arpeggio match multiple rules and the actions exhibit side effects?
Left by Charles Young on Oct 07, 2009 3:46 PM

# re: October Rules Fest 2009. See you in Dallas
Arpeggio doesn't implement the type of semantics that other CEP engines support. I don't have the concept of events in Arpeggio. Instead, all facts are temporal and there isn't a built-in "event object". All facts are immutable. I removed retract and modify, so arpeggio assumes all facts are read-only. The same design principles of using functions still apply in arpeggio. The other big difference between arpeggio and jamocha is that asserts are not idempotent.

For arpeggio, I implemented macros for java objects, which avoids using reflection to extract values from a property. If a fact matches multiple rules, those rules will fire. For example, I could have a rule that calculates the aggregate for sector and industryGroup. Another enhancement in arpeggio is that I optimized the node hashing for the ObjectTypeNode. So unlike NFA-based approaches that pass all events to every query, Arpeggio provides near-constant scalability with respect to rule count.
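The ObjectTypeNode hashing idea can be illustrated with a trivial sketch (names and structure are invented here, not Arpeggio's): facts are dispatched by type through a hash map, so a fact only reaches the rules that mention its type, rather than being offered to every query as in a naive one-automaton-per-pattern design:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Invented sketch of type-based dispatch: the type-discrimination layer is
// a hash map, so an incoming fact only reaches rules that mention its type.
// Lookup cost per fact is independent of the total rule count.
public class TypeDispatchSketch {
    private final Map<String, List<String>> rulesByType = new HashMap<>();

    public void addRule(String ruleName, String factType) {
        rulesByType.computeIfAbsent(factType, k -> new ArrayList<>()).add(ruleName);
    }

    // Near-constant in rule count: one hash lookup per fact.
    public List<String> candidateRules(String factType) {
        return rulesByType.getOrDefault(factType, List.of());
    }
}
```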
Left by Peter Lin on Oct 07, 2009 4:04 PM

# re: October Rules Fest 2009. See you in Dallas
OK, so Arpeggio is dedicated to processing 'temporal facts' only. Is there not a case for supporting Retract? Couldn't you have a scenario where detection of one pattern means that certain temporal facts that are still live are no longer required and should be retracted to reduce match cost?

The hashing approach is a great example of where some of the ESP/CEP guys really need to talk to the Rete guys! It is easy to miss these types of optimisation if they are not pointed out to you.

One of the motivations for adding ESP/CEP features to classical Rete is to support what I will call 'contextual' event detection. A text-book example is processing of RFID tag read events, where the event simply signifies that a tag responded to the signal emitted by a reader. Without contextual facts, you can't interpret what the event really signifies (e.g., goods-in or goods-out?). Of course, you could represent contextual state as temporal facts, but they would often have a very different lifetime to other temporal facts. Does Arpeggio handle lifetime on individual facts - i.e., if a fact has a lifetime of, say, two hours, does it remain available to, say, a one-minute sliding window for those full two hours? Also, if you don't support Retract, how do you handle a situation where the lifetime of an event is non-deterministic? Taking the RFID example, say that the context is set by punching a button on a 'traffic lights' system. Every so often an operator changes the lights from 'in' to 'out'. You don't know when they are going to do that. How would that be handled in Arpeggio?
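One conceivable way an engine could handle this scenario - a short event window joined against long-lived context, with the non-deterministic lifetime modelled by superseding the context fact rather than retracting it - is sketched below. This is entirely hypothetical; nothing here reflects Arpeggio's actual design.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the RFID scenario: tag-read events live in a short
// sliding window, while the long-lived 'traffic lights' context is state
// that is superseded (not retracted) when the operator punches the button.
public class RfidContextSketch {
    public record TagRead(String tag, long atMs) {}

    private String direction = "in";            // current lights state
    private final long windowMs;
    private final Deque<TagRead> window = new ArrayDeque<>();

    public RfidContextSketch(long windowMs) { this.windowMs = windowMs; }

    // Operator changes the lights: old context is superseded in place.
    public void setDirection(String dir) { this.direction = dir; }

    // Interpret a tag read against whatever context is live right now.
    public String observe(TagRead r) {
        window.addLast(r);
        while (!window.isEmpty() && r.atMs() - window.peekFirst().atMs() > windowMs)
            window.removeFirst();
        return r.tag() + ":" + direction;       // e.g. goods-in vs goods-out
    }

    public int windowSize() { return window.size(); }
}
```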
Left by Charles Young on Oct 07, 2009 4:31 PM

# re: October Rules Fest 2009. See you in Dallas
I had an email conversation with one of the authors of that paper earlier this year. One of the things I intend to look at in my ORF presentation is several different models for combining event stream processing with production systems. There are several ‘base’ models, including using separate agents within an EPN, building hybrid models like the one described in the paper or, indeed, the rather different hybrid approach implemented in JBoss Rules Fusion.

The specific model described in the paper is really a way of doing ECA rules on top of Rete, but keeping the ‘E’ bit completely separate from the ‘C’ bit. There are some peculiarities. One is that the approach subtly reverses the normal semantics of ECA. They are really ‘CEA’ rules, in that Rete can control event detection nodes, only switching them on when there is a condition to match, but the event detection part (really an implementation of SnoopIB) can’t control the rete in the same way. Put another way, Rete doesn’t do lazy evaluation, and the single rete represents an entire rule set, so you can’t short-circuit condition evaluation. Instead, the proposed hybrid engine short-circuits event detection. This is not a bad thing, though, and should help to boost performance. However, it is limited. If any nodes are shared in the event detection dataflow, they, and their parents, cannot be switched off. A related issue is that when detecting complex events, a condition must continue to evaluate to true from the moment the ‘initiator’ event is detected to the moment the ‘terminator’ event is detected. If it is untrue at any time in that period, the complex event is discarded.

I have another specific concern about this particular model. Keeping event detection separate from condition evaluation is all very well, but it means that the engine now supports two different pattern-matching dataflows, each with rather different semantics. Representing that sensibly in a high-level EPL/ECA rule language could be very tricky.
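The sharing constraint described above - a shared event detector cannot be switched off while any dependent condition still needs it - amounts to reference counting, which can be sketched as follows (illustrative only, not the paper's implementation):

```java
// Illustrative only: a shared event detector is reference-counted by the
// conditions that need it, so it cannot be switched off while any
// dependent rule still requires its events.
public class CeaSketch {
    public static class Detector {
        private int refCount = 0;  // conditions currently needing this detector
        private int detected = 0;

        public void enable()  { refCount++; }
        public void disable() { refCount = Math.max(0, refCount - 1); }
        public boolean active() { return refCount > 0; }

        // Events are only examined while some condition needs them.
        public void offer(String event) { if (active()) detected++; }
        public int detections() { return detected; }
    }
}
```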

So, back to Arpeggio. I’ll take this up in a separate email thread, as you suggest.
Left by Charles Young on Oct 07, 2009 5:17 PM

# re: stream Rete
Peter - "a stream RETE engine should be able to beat engines like Esper"... and of course, some Rete-based CEP vendors already have continuous query languages exploiting Rete for joins...

Charles - see you in Dallas!

Cheers
Left by Paul Vincent on Oct 09, 2009 8:16 AM

# re: October Rules Fest 2009. See you in Dallas
Thanks Paul. See you later this month.
Left by Charles Young on Oct 09, 2009 12:56 PM

# re: October Rules Fest 2009. See you in Dallas
Paul - Yes, it's nice to see Tibco adapt RETE for stream processing. Does Tibco have any plans to release benchmarks so that developers can learn about it?

peter
Left by Peter Lin on Oct 09, 2009 2:35 PM

# re: October Rules Fest 2009. See you in Dallas
Some interesting work has been done at Purdue on a system called EventJava. The current version is built on top of Jess. The authors show that it outperforms Cayuga at 37,500 event correlation patterns, though Cayuga does scale better with larger numbers of patterns.

When I get the time, I would like to apply these benchmarks to TECH. (BTW, TECH and the parallel engine work are two separate things at this time, though I certainly plan to create a parallel TECH engine soon.)
Left by Charles Forgy on Oct 23, 2009 11:28 AM

# re: October Rules Fest 2009. See you in Dallas
Thanks Charles. I hadn't come across EventJava before, and will be reading the various papers on it with interest.

The closest equivalent I know of in the .NET world is the Rx Framework, which is currently a work in progress. It is based on Microsoft's new composable (monadic) IObserver/IObservable model and allows LINQ queries directly over an async 'push' event model. LINQ is a functional approach to language-independent query, originally based on a glorified list-comprehension monad over the IEnumerable (think 'iterable') interface. The various .NET language compilers (C#, VB.NET and F#) all support syntactic sugar for LINQ, and provide declarative query definition as part of the programming language.

One of the Haskell guys working for Microsoft did the algebra to show that it is functionally possible to map IEnumerable and IEnumerator directly onto IObservable and IObserver in a totally equivalent (and essentially symmetric) fashion. Hence, LINQ providers can be implemented to provide monads over IObservable. This is a really powerful idea that could revolutionise the way events are handled in .NET. Currently, .NET offers a specialised 'event' type that is like a very basic version of the equivalent type in EventJava without support for query and without the customisable context idea. It is based on multicast delegates, and is therefore similar in that respect. By contrast, Rx is a LINQ technology and so is fully integrated with Microsoft's in-line declarative query features, and doesn't require 'exotic' types.
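The pull/push duality mentioned above can be sketched with plain-Java stand-ins (these types are invented for illustration and are not the .NET IObservable/IObserver interfaces): an Iterable is a pull sequence, a push source is its dual, and a query operator written over the push side composes just like one written over the pull side.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Plain-Java stand-ins for the pull/push duality: an Iterable is a pull
// sequence; PushSource is its push-based dual. A query operator (here a
// simple filter) can be composed over the push side.
public class PushPullSketch {
    public interface PushSource<T> {
        void subscribe(Consumer<T> onNext, Runnable onDone);
    }

    // Turn a pull sequence into a push sequence: enumerate and call back.
    public static <T> PushSource<T> toPush(Iterable<T> pull) {
        return (onNext, onDone) -> {
            for (T item : pull) onNext.accept(item);
            onDone.run();
        };
    }

    // A trivial 'query operator' (filter) composed over the push side.
    public static <T> PushSource<T> where(PushSource<T> src, Predicate<T> p) {
        return (onNext, onDone) ->
            src.subscribe(t -> { if (p.test(t)) onNext.accept(t); }, onDone);
    }
}
```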

As I understand it, the first release of Rx will probably be quite constrained, and won't support pattern matching over multiple events. This is just a 'first release' limitation, and the intention is to extend the technology to support full pattern matching. What is not clear is how the pattern-matching facility will be implemented behind the scenes. You say that EventJava is using Jess. It will be interesting to see how Microsoft tackles implementation of pattern matching in future versions of Rx.

StreamInsight (Microsoft's forthcoming event stream processing engine) took a very similar line to Rx. It uses LINQ over CEPStream objects, and also supports a variation of IObserver/IObservable. However, it was developed separately to Rx (it is not clear if the two teams knew of each other's existence). In the case of StreamInsight, it provides an industrial-strength pattern-matching engine. Hopefully, Microsoft will bring Rx and StreamInsight closer together (if we are lucky, before StreamInsight ships, though I have no idea of their plans in that respect). StreamInsight provides a form of EventJava-like context for timestamps and event lifetime, but this is not customisable. However, this is not necessarily a problem. Thanks to the composability of LINQ operators and the use of lambdas, extension methods and the like, you can easily construct your own approach to customised context and exploit it declaratively within a LINQ query in a very similar way to EventJava.
Left by Charles Young on Oct 25, 2009 4:29 PM



Copyright © Charles Young | Powered by: GeeksWithBlogs.net