J2EE Web application performance - SQL query optimization

For J2EE projects, performance issue is a common problem detected mostly towards end of the project lifecycle. In many projects fixing performance problems takes substantial percentage of testing time.  Sometimes performance problems remain undetected till the Web application is released to production system which causes sleepless nights for the project teams!

J2EE performance problems are typically detected when the page response time is more than specified by technical requirements. In Web applications, response times needed range from 2 - 10 seconds. One of the major reasons why Web applications don’t meet response time requirements is the bad design of database queries. This article looks at what can cause query performance bottleneck and ways to resolve them.

Database queries and J2EE application performance

Following are some of the common types of SQL query problems I have seen during code reviews. Some of these are trivial which can be easily fixed. But problems due to complex queries or lack of indexes can be a difficult performance problem to tackle. Only an SQL expert can resolve such issues. Sometimes you will have to look for database specific solutions (materialized views for example!).

Usage of nested queries - This is probably the biggest reason why many applications perform poorly. Java programmers who are inexperienced in SQL programming writes nested queries instead of using table joins. Let me give a simplified example,

Consider a scenario which needs to get data from CUSTOMER table to display a list of customers meeting a criteria. Assume that we also need to display the customer’s user id from USER table. Here is the bad algorithm which can cause performance problems,

  1. Select required customer from CUSTOMER table.
  2. For each customer selected, query USER table for user id.

In the above algorithm if we are selecting 10 customers from the table, a total of 11 queries will be fired. If each query takes 0.1 seconds, the total time taken for this simple processing alone will be over 1 second!

Now the correct algorithm is,

  1. Join CUSTOMER and USER tables to get all required data in single query.

The query may take a little bit more time usually. Even assuming a performance penalty factor of 2, the time taken will be 0.2 seconds!

Of course the above example is a trivial one. Yet in real code you find very similar approach(nested queries) causing a lot performance problems.

Lack of “static lookup data” caching - There is another variation of the above problem involving lookup data. Typically this data is maintained in a single table which contains code and corresponding description. STATUS is an obvious example for which value codes are usually O, C, P etc. with corresponding descriptions as Open, Closed and Pending. The master tables which refer to the lookup data table contains only the value codes. For example, CUSTOMER table possibly can have a column called STATUS which contains status codes (O,C, P etc.). Now fetching data from CUSTOMER table will involve joining with lookup table similar to the following,

SELECT C.CUSTNAME, S.DESCRIPTION FROM CUSTOMER C, LOOKUP S 
WHERE C.STATUS = S.CODE AND S.TYPE='customerstatus';

In this case this may not have a big performance penalty, but if there are more columns in CUSTOMER which requires a join with LOOKUP, queries will eventually become complex and slow.

Now if you look at data in LOOKUP table you realize that most of the time they are static. In a production system, customer statuses are a predefined set of values. So performance can be drastically improved if this data can be cached in the application.

In the above example we could load the entire status codes in a Hash Table with CODE as the key. Rewrite the above query as follows,

SELECT C.CUSTNAME, C.STATUS FROM CUSTOMER C;

Now in the Java code where you are fetching values from table and assigning it to object, do a Hash Table (our lookup data cache) lookup for the CODE as given below,

objCust.setName(resultSet.getString("CUSTNAME"));
objCust.setCustomerStatus((String)statusMap.get(resultSet.getString("STATUS")));

An obvious place to load lookup data is during application startup. You can configure a servlet for this purpose which is auto loaded from web.xml. Use JNDI for the cache (you can also use a simple static variable if there is no plan for application clustering).

Sometimes lookup data can change from application. In such cases you need a mechanism to refresh the global lookup data cache.

Lack of indexing - Proper indexing of table columns can dramatically improve query performance. Indexing becomes more and more important as the table data increases. Sometimes it is necessary to create composite indexes. The indexing decision can only be made after analyzing majority of application queries and also looking at the execution speed requirements of the queries.  Always review SQL execution plan of queries before and after the creation of the index.

If the indexed column contains a small set of unique values, you will need to use a bitmap index (Oracle).

Inefficient query design - Sometimes you find that queries are still slow after creating indexes and after removing nested queries. This most probably means that there is something wrong with the query design. Such problems are very common when an ORM solution such as Hibernate is used. In such cases your query may be accessing more tables than is necessary. Turn on the query logging facility of the ORM tool (show_sql attribute in Hibernate for example) to validate the ORM auto generated queries. You can then change the ORM mapping configuration or replace ORM with a native SQL query as needed to improve performance.

Complex queries - Sometimes you are stuck with a small number of slow queries which is almost impossible to optimize. Whatever you do (indexing, query rewriting etc.) the query is slow complex that it takes a lot of time. For example, reporting queries can be very time consuming for large data sets. There are two ways to handle such queries.

One way is to change such reports to a scheduled report which is not generated in real time. User can schedule a  report and once it is complete he is informed through email or as an application alert. In such cases you will need temporary tables which can hold the report result.

Another interesting way to optimize complex queries is to use materialized views. A materialized view is a virtual table which holds a query result. It is very similar to a table view except that in materialized view the data is already queried and a snapshot is stored. This means that queries to a materialized view is extremely fast. But it also means that you may be dealing with stale data. So in order for this approach to work, you need to find out sub queries in the complex query which rarely changes or even a change doesn’t affect results/reports much. Such sub query results can be made into a materialized views and can then be used instead of the sub query. You will also require a strategy to refresh the materialized view which depends on application requirements.

Summary

  1. Minimize number of queries.
  2. Optimize queries by query analysis.
  3. Identify and create required indexes.
  4. Use application caching for static data.
  5. Use materialized views for complex queries.
  6. Convert an online query to scheduled query if none of the above works!
May 28, 2008 | Posted in Programming

Leave a Comment