Monday, October 25, 2010

Is the "ORDER BY" in HQL and EJB-QL based on lexicography or dictionary?

This topic is inspired by Dominik Dunz's comment on my Hibernate tuning article "Revving up Your Hibernate Engine" on InfoQ.

In Java, you sort strings lexicographically (based on the underlying character's encoding values) using class String's compareTo() method.
You can also sort strings based on a locale's dictionary using class Collator's compare() method.

So now you should ask which sorting the "order by" I wrote in HQL or EJB-QL supports?
Currently there is no any QL syntax for you specify either a lexicographic or dictionary order. So both HQL and EJB-QL just literally pass the "order by" clause to the back-end database. It is your database session that decides the sorting.

In case of Oracle, it also supports sorting lexicographically (binary in Oracle's term) or based on dictionary(linguistic in Oracle's term).
Your Oracle session decides the sorting. Specifically if the session's NLS_COMP is "binary" it will sort lexicographically(based on the string's underlying encoding values).
If the session's NLS_COMP is "linguistic", it will sort based on the dictionary of the locale that you specified in NLS_SORT.

If you use Oracle's JDBC thin driver in an application server, the application server's JVM decides the values of NLS_COMP and NLS_SORT.

You can always do sorting in your application tier based on your business logic instead of relying on your database. But your application tier sort probably will be slower than your DB sorting.
However there are many complications if you want to use your back-end database sorting to implement your business logic sorting.
  1. Your database may only support lexicographical sorting;
  2. Even lexicographical sorting is much simpler than dictionary sorting, your database session's character encoding may not be Java string's UNICODE. However it may not be a big deal to change your DB's charactor encoding to Java string's UNICODE or be a subset of Java string's UNICODE.
    You also need to make sure that your DB's lexicographical sorting is the same as your Java's. In case of Oracle, it basically has the same lexicographical sorting as Java String's compareTo() method.
  3. Java's linguistic sorting may not be the same as your DB's. You need to carefully exam documents from both Java and your DB.
    You can find Oracle's linguistic sorting logic from this link.
  4. It is becoming more complicated if you have a there-tier architecture where the front-end UI (either Swing or a browser) decides the sorting logic because the same database session in the back-end can be shared by different front-end user sessions.
    You have to change your DB session's sorting whenever your fron-end UI has changed.

1 comment: