| Sign In/My Account | View Cart |
Thus far, we have concentrated on simple, single-column indexes and simple queries that can be tuned in this way. Single-column indexes created on owner and time_create columns will help us with the queries we'll use to filter or sort their values. Even the following query:
SELECT * FROM tbl WHERE owner = 'dejan'
AND time_create > '2006-01-01 00:00:00'
ORDER BY time_create
will have acceptable performances. But if you try to execute this query:
SELECT * FROM tbl WHERE owner = 'dejan' ORDER BY time_create
you will get a very long execution time. This is because of the extra sorting step that the database needs to perform in order to sort data.
The solution for these types of queries is to create an index that will cover both the owner and time_create columns. We can achieve this by executing the following query:
CREATE INDEX tbl_owner_time_create ON tbl(owner, time_create)
With this index in use, the query performance will dramatically improve. Now, notice the following lines in the analyzer log:
Index Scan ResultSet for TBL using index TBL_OWNER_TIME_CREATE
at read committed isolation level using share row locking
chosen by the optimizer
We have helped the database by letting it use a handy index to quickly find already sorted data.
The important thing to notice in this example is that column order in the CREATE INDEX statement is very important. Multiple-column indexes are optimizable by the first column defined during index creation. So, if we had created the following index:
CREATE INDEX tbl_time_create_owner ON tbl(time_create, owner)
instead of one we used previously, we wouldn't see any performance benefits. That is because the Derby optimizer could not consider this index as the best execution path and it would simply be ignored.
Indexes can help us improve performance when data selection is in question. But they slow down database insert, and delete and possibly update operations. Since we not only have table structure, but various index structures, it takes longer for the database to maintain all these structures when data changes.
For example, when we are inserting a row in a table, the database must update all indexes related to columns of that table. That means that it has to insert an indexed column value in the right place in the appropriate index, and that takes time. The same thing happens when you delete a certain row, because the index must be kept ordered. Update actions affect indexes only when you update indexed columns, since the database must relocate those entries in order to keep indexes sorted.
So, the point is to optimize database and application design according to your needs. Don't index every column; you might not use those indexes, and you might need to optimize your database for fast inserting of data. Measure your performance early and identify bottlenecks; only then should you try to implement some of the techniques provided in this article.
In this article we have focused on just a small subset of performance-related issues you can find in everyday development tasks. Most of the principles shown here could be used (with some modifications) to any relational database system available. There are many other techniques that can help you improve the performance of your application. Caching is certainly one of the most effective and widely used approaches. There are many caching solutions for Java developers (some of them, such as OSCache or EHCache, have open source licenses) that could serve as a buffer between the application and database and thus improve overall application performance. Also, many object-relation frameworks used in Java projects (such as Hibernate) have built-in caching capabilities, so you should consider those solutions as well, but that's the material for another discussion.
Dejan Bosanac is a software developer, technology consultant and author. He is focused on the integration and interoperability of different technologies, especially the ones related to Java and the Web.
Return to ONJava.com.
Showing messages 1 through 3 of 3.
The information in the article is very and very intoductory. Which is good and generic. not specific to what a new project can offer. It's not new at all. That is the very very basics of databases. I found out that Derby was preinstalled with Netbeans 5.5 I've downloaded the other day and I said. Ok, that's good. I've got something under the hood that I can use during development. It's an Apache project that's also cool.
But next year will you be using Derby for your super duper web 2.0 site? Guess not. Remember the Apache mail server project?
So my question is why develop a new very generic multi purpose database engine? Please don't get me wrong. There are invaluable Apache projects like lucene, geronimo, jakarta commons, axis. But not a mail server or a database I guess. The play field is pretty much occupied.