Monday, August 23, 2010

TimesTen -- an Oracle Database Saviour for Performance-Critical Applications?

Performance-critical applications such as those used in trading and telecoms often need high throughput and latency in the sub-millisecond or even microsecond range.
If such an application uses a disk-based database for persistence (or as the system of record), it will probably miss its latency target, because disk accesses cost milliseconds and sometimes seconds.

To achieve low latency and high throughput, at least three requirements must be met:
  1. Use memory-based access. Access latency for commodity DRAM is well under a microsecond;
  2. Collocate your application logic with your data. Traditional multi-tiered approaches incur network I/O latency;
  3. Use a fast and predictable JIT compiler if you use Java. Oracle's JRockit is a good choice.
I know of two products that can meet these requirements and are worth a try.
One is Gigaspaces, with its space-based programming paradigm.
The other is an in-memory database (IMDB) such as Oracle's TimesTen, used alongside your traditional database such as Oracle.

I will focus on TimesTen here and compare it with Gigaspaces.
As mentioned above, using the Oracle database alone in a performance-critical application is not practical, so Oracle is less popular in such applications than elsewhere.
Fortunately, Oracle paired with its cache companion, TimesTen, is a very attractive option.

TimesTen is an in-memory relational database from Oracle. Because it keeps all data in memory and serves requests without disk access, TimesTen can achieve breakthrough performance such as 5-microsecond reads and 15-microsecond updates, according to the white paper "Using Oracle In-Memory Database Cache to Accelerate the Oracle Database".
TimesTen is designed to run in your application tier, close to your applications and optionally in process with them. A TimesTen database may be used as the database of record, as a cache for an Oracle database, or both.

As a cache for an Oracle database, TimesTen can be embedded into your application tier and cache only the performance-critical subset of your Oracle database, leaving the Oracle database itself to serve non-performance-critical applications. This architecture is shown in Figure 3 of the above white paper.
In that architecture, you cache only the subset of an Oracle database that needs real-time processing, such as stock trading data. The cached data can be read-only, read-mostly or read-write. When data is modified, the "Cache Agent" takes care of synchronizing the data between TimesTen and Oracle.
You can also scale out your application tier horizontally by deploying several TimesTen instances together to form a so-called cache grid.

Here is how TimesTen compares to Gigaspaces.
Common to both:
  • cache data alongside your application logic;
  • support multiple topologies such as partitioned, replicated or both;
  • support the ACID properties usually provided by traditional relational databases;
  • support asynchronous operations for better performance, such as transactional write-behind and asynchronous replication;
  • besides acting as a cache, both can also be used as a message bus.
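
To make the write-behind idea in the list above concrete, here is a minimal Java sketch. It is not the TimesTen or Gigaspaces API: the class WriteBehindCache, its methods, and the BiConsumer standing in for the backing database are hypothetical names chosen for illustration. A write is applied to memory immediately and queued; a later flush pushes the queued writes to the slower store in their original order. A real product would drain the queue from a background thread; the sketch makes flush() an explicit call so that its behaviour stays easy to follow.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

// Hypothetical illustration of write-behind; not a vendor API.
final class WriteBehindCache<K, V> {
    // Fast in-memory copy that serves all reads.
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    // Writes waiting to be applied to the backing store, in arrival order.
    private final BlockingQueue<Map.Entry<K, V>> pending = new LinkedBlockingQueue<>();
    // Stand-in for the slow disk-based database (an assumption of this sketch).
    private final BiConsumer<K, V> backingStore;

    WriteBehindCache(BiConsumer<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    // Returns as soon as memory is updated; the backing store is not touched here.
    void put(K key, V value) {
        memory.put(key, value);
        pending.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    V get(K key) {
        return memory.get(key);
    }

    // Applies all queued writes to the backing store and returns how many were applied.
    int flush() {
        int applied = 0;
        Map.Entry<K, V> entry;
        while ((entry = pending.poll()) != null) {
            backingStore.accept(entry.getKey(), entry.getValue());
            applied++;
        }
        return applied;
    }
}
```

The trade-off is the usual one: callers see memory-speed writes, but anything still in the queue is lost if the process dies before a flush, which is why real products combine write-behind with replication or logging.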
Pros and cons of TimesTen and Gigaspaces:
  • Gigaspaces can cache all your data in a computing cloud such as Amazon's EC2, while in practice TimesTen caches only a subset of your enterprise data;
  • Gigaspaces has more flexible deployment topologies than TimesTen. For example, Gigaspaces can adjust your deployment based on your SLA, while a TimesTen deployment tends to be static;
  • Gigaspaces supports application-level optimistic locking, while TimesTen provides only two traditional database isolation levels: read committed and serializable;
  • Gigaspaces intrinsically supports parallel processing such as the map-reduce paradigm through its master-worker pattern and thread executor framework, while with TimesTen you have to build such functions yourself. For example, to search a TimesTen cache grid whose data is partitioned across members, you have to write the code that submits tasks to the grid members and then aggregates the individual search results;
  • It is easier to adopt TimesTen than Gigaspaces because TimesTen natively supports the popular JDBC API, while Gigaspaces' native JavaSpaces API is still not that popular;
  • TimesTen is budget and deployment friendly because it only needs to cache a subset of your enterprise data. Most often a large part of your enterprise data doesn't need real-time access, so caching it may waste valuable resources;
  • TimesTen has closer integration with Oracle than Gigaspaces does. For example, TimesTen supports materialized views and the same flexible locks as Oracle;
  • TimesTen seems to have somewhat better performance. For example, TimesTen can achieve 5-microsecond reads and 15-microsecond updates, while Gigaspaces achieves sub-millisecond latency at best. This may be because TimesTen is implemented in C/C++ and Gigaspaces in Java, where the JVM can pause for garbage collection.
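
The hand-coded partitioned search mentioned in the list above might look roughly like the following Java sketch. It uses only java.util.concurrent from the standard library. The class ScatterGatherSearch is a hypothetical name, and the in-memory lists merely stand in for grid members; in a real cache grid each task would presumably run a JDBC query against one member instead of filtering a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper: each inner list stands in for the rows held by one grid member.
final class ScatterGatherSearch {

    // Runs the same filter on every partition in parallel, then merges the partial results.
    static <T> List<T> search(List<List<T>> partitions, Predicate<T> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            // Scatter: build one search task per partition.
            List<Callable<List<T>>> tasks = new ArrayList<>();
            for (List<T> partition : partitions) {
                tasks.add(() -> {
                    List<T> hits = new ArrayList<>();
                    for (T row : partition) {
                        if (filter.test(row)) {
                            hits.add(row);
                        }
                    }
                    return hits;
                });
            }
            // Gather: invokeAll returns futures in task order, so the merge order is stable.
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> partial : pool.invokeAll(tasks)) {
                merged.addAll(partial.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("partition search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, calling search on two partitions of ticker symbols with the filter s -> s.startsWith("I") returns the matches from the first partition followed by those from the second. This is the plumbing that a master-worker or executor framework provides for you.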
    

13 comments:

  1. Gigaspaces does have a JDBC driver, but I can't comment on the completeness of its SQL support since I've only used it in limited scenarios.

  2. Well, it's worth pointing out that Gigaspaces has a LOT of API options: JMS, JDBC (it uses a subset of SQL-92, although it's useful for most cases), memcached, javaspaces, jcache, imap, as well as providing an option for writing a REST interface through the embedded web app container.

    What you usually find is that the SQL model isn't extraordinarily useful for scalability; SQL support by scalability-oriented vendors is meant to provide more of a shallow on-ramp rather than the final destination.

  3. Even though Gigaspaces supports JDBC, it is a wrapper over its intrinsic JavaSpaces API. For low latency, I don't like wrappers.

    My understanding of Jottinger's comment is that scalability-oriented vendors don't really scale your disk-based SQL very well.

    However, after you load your data into the cache in your application tier, how will you access it?
    Obviously a bespoke API is the fastest, but that is impossible unless you load the data into your own data structures.
    So standard JDBC access may still be the best option.

  4. Yongjun Jiao,

    Thank you for a great post.

    Here are a few corrections/clarifications:
    - Since TimesTen is C/C++ based, it can't run close to the application. Most of the applications out there (even mission-critical, low-latency ones today) are Java based. This means the GigaSpaces multi-level cache actually runs within the application process's memory address space, providing the ability to perform millions of reads/sec. No network overhead or serialization is involved, whatever the data size or structure. TimesTen must perform a network call for every data retrieval to send the data back to the application.
    - Most of the applications out there use some sort of object-oriented language. This means they need their data in the form of objects. GigaSpaces provides an object-oriented In-Memory Data Grid (IMDG). There is no need to map tables to objects as with TimesTen.
    - The GigaSpaces native API is Object/SQL. The JDBC/SQL engine is not exactly a wrapper, but actually part of the space engine.
    - The fastest way to access data with GigaSpaces is via a key (by ID). TimesTen doesn't provide such a capability.
    - GigaSpaces is a cloud-enabled technology that lets you run a distributed In-Memory Data Grid across multiple machines acting as one large supercomputer for data storage and parallel data processing. Not sure how TimesTen can do that.
    - The GigaSpaces development and deployment environment is user friendly and supports all developer standards such as Spring, Maven, .NET, J2EE, etc. TimesTen, as a C/C++ based product, does not support any of these.
    - Using the Oracle (:-) JVM with the CMS or i-CMS garbage collector would result in ZERO pauses. We have customers running very large JVMs without any issues or pauses. Just tune the JVM correctly.
    - You can have, for example, 10 blade machines (UCS boxes, for example), each with 400 GB of RAM, to run the GigaSpaces IMDG. This gives you a 4 TB data space in memory. TimesTen does not support such a distributed environment out of the box.
    - GigaSpaces is an application server with an object-oriented In-Memory Data Grid. If you use TimesTen you will also have to use WebLogic/Tomcat and a few other products to have a complete solution. This means a TimesTen/Oracle Coherence/Fusion mix is by far more expensive than GigaSpaces.

    Shay Hassidim
    Deputy CTO
    GigaSpaces

  7. I concur with many of Mr. Shay's comments, especially on Gigaspaces' unique SBA model, object-style IMDG and cloud-oriented deployment.

    However, there is a problem I think Gigaspaces needs to address.
    When I talked to some of my peers and principals in my company about using Gigaspaces to persist data transactionally and durably, they didn't quite believe it, because they were still thinking in terms of how traditional disk-based databases work.

    Actually, several months ago my company invited a Gigaspaces professional to give a seminar, but not many people showed up.

  8. We have several teams from SunGard working with us right now. They seem to "believe us"... :-)
    Please contact me directly for more details. shay@gigaspaces.com. I'm located in NY.
    Tnx
    Shay

  9. Well, you can scale JDBC - but JDBC doesn't really lend itself to scaling well, because it never really shards well. Thus: NoSQL. I've seen some really good results from Gigaspaces' adding transactional write-through capabilities to memcached - normally a read-mostly cache, memcached can now serve as a persistent key/value store. It's really nice.

    Other vendors do the same sort of thing, but IMO, Gigaspaces' memcached is the best I've worked with. The ability to distribute data without having to have an explicit list of server IP/Port combinations is really nice.

  10. TimesTen would only require network access if it's running remotely; however if it's co-located with the application, that application (including Java-based ones) would attach to the TimesTen shared-memory segment and have latency-free access.

    The Cache Grid functionality in TimesTen 11g allows data to be partitioned transparently across multiple TimesTen nodes such that each node has a subset of the cached data for local access; however access to non-local data would require a network round-trip. However for the case where the entire hot data can fit into a single TimesTen memory space, you don't need Cache Grid.

    That said, Oracle Coherence is probably a better point of comparison to Gigaspaces and similar JSR-107 type products.

  11. I'd also add that if you are deploying on WebLogic EE, Coherence is bundled with it. In conjunction with JRockit Real-Time Edition, it would provide pretty good response times.

    However, in my experience, if you only have a single (fairly modest) box, TimesTen is going to have the best write performance. A lot of customers use TimesTen deeply embedded.

  12. Can you provide some performance numbers for Coherence and TimesTen for write and read operations with Java applications? From our experience, Tangosol has good read performance (well, it is a fine cache product) but does not have great write performance. TimesTen, on the other hand, has good write performance but not that great read performance, since it needs to convert Java structures to C structures (and vice versa) and execute a network call for every operation.

    With GigaSpaces, both writes and reads are optimized and provide good numbers, with write-behind designed to utilize the DB in an optimal manner while maintaining consistency and transaction boundaries.

  13. Gigaspaces is also working with SunGard in places other than the US. We are currently rebuilding an entire application using Gigaspaces for SunGard, so I do believe that they, as well as many other companies, are looking into it. My experience as a new Gigaspaces user is that it just blows customers away. Maybe they have a hard time believing it at first, but after trying it and understanding the architecture they just go "ahh, that's how to do it". Another comment I have heard is "Well, we really didn't do any programming to achieve our goals". As a Java developer for 15 years, I take that as a good sign: an unintrusive framework allows me to change at any time. And as everybody knows, it is not the choice of technology that makes your application live for a long time, it is a good architecture, and Gigaspaces embraces that with its nice Spring-based programming support.
