Archive for June, 2008



The Sin of Band-Aids

Friday, June 20th, 2008

Let’s take a trip to the past…we weren’t actually there, but we can pretend. Let’s look into a conversation between a DBA, developers, and management at some time ‘way back when.’

Management: Our system has been performing poorly for months now.
Developers: We didn’t change anything.
DBA: I didn’t change anything.
Management: We need a fix.
Developers: We don’t have time to go through all our code and tune it.
DBA: *frantically types: alter system set optimizer_index_caching = 35; alter system set optimizer_index_cost_adj = 25;* Fixed!

Fast forward…

Queries are not performing as expected. Sometimes a query will run fine, sometimes it runs horribly. Sometimes a combination of queries and users results in massive buffer busy waits or ‘latch: cache buffers chains’ waits. Nothing is working and so finally, a consultant is called.

You’ve probably guessed that I was the consultant. Move to the head of the class! In particular, there were three or four main offenders; queries that would sometimes perform well, then all of a sudden ‘turn horrible.’ The core of the problem was truly unstable execution plans.

Using ASH I was able to go through different times of day and compare reports from times where the system performed well with times the system performed horribly. Doing so allowed me to compare SQL_IDs between different times of day to see which queries seemed to be going through dramatic changes. Once a SQL_ID was found that experienced such a dramatic change, I used DBMS_XPLAN.DISPLAY_AWR to find all the different iterations of the query.

In case you’ve never used it, DBMS_XPLAN.DISPLAY_AWR is very useful. It allows you to input only a SQL_ID, and Oracle will show you the explain plans for that SQL_ID as recorded in the AWR. For instance:

select * from TABLE(dbms_xplan.display_awr(’95t7cmj84u4jz’));

The next step was to look through the results to find any inconsistencies large enough to result in huge differences in performance. In this situation, I found that ‘good’ explain plans were using decent indexes, NESTED LOOPs, and low overall bytes. The ‘bad’ explain plans were using index joins, where an index fast full scan would join with an index range scan into a view (e.g. index$_join$_003). These plans resulted in many more block touches and much more CPU used.

Using Grid Control, I looked at the timeline for each of the hash values of the SQL_ID and saw that the poorly performing plans lined up with their bad performance periods.

Of course, now the burning question: what can be done about this?

One solution is to set the _index_join_enabled parameter to FALSE, disabling the index joins completely. Based on query analysis it appears that for right now, the pros will far outweigh the cons. But what about queries where index joins perform more optimally? What if Oracle replaces the index joins with something even worse?

The better solution is to negate the need for OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING to be set to such extreme values (or to be set at all). Though these parameters can be very useful in the hands of a skilled DBA, they can also be very dangerous if improperly used. In this case, the CBO never would have chosen to use index joins if the OPTIMIZER_INDEX_COST_ADJ weren’t set so extremely low.

For testing purposes I took one of the problem queries and ran it with OPTIMIZER_INDEX_COST_ADJ = 100 and OPTIMIZER_INDEX_CACHING = 0. The query performed no index joins, but it did perform two full table scans and overall the results were less than satisfactory. Without those two parameters, the queries remained ‘unfixed’ as far as runtime is concerned.

So in the end, I have struck a compromise with the client. If another period of poor performance occurs, we can attempt setting “_index_join_enabled” to false to try to band-aid the band-aid of their performance problems. However (and that’s a big however), they need to take the time to work on a more permanent solution. For instance, using development to test and tune queries with OPTIMIZER_INDEX_CACHING = 0 and OPTIMIZER_INDEX_COST_ADJ = 100 (or at least a less extreme value).

Otherwise it’s a band-aid…then another…then another…until the true performance issue is so obfuscated it may never be found.

I’ve worked with another client that does somewhat the same thing. If they use a resource manager plan, their system will not become overloaded and things seem to work very well. Turn off the plan and the runqueue goes through the roof (in the hundreds) and cache buffers chains latches abound. Unfortunately whenever a problem happens, all one sees taking DB time is ‘resmgr: cpu quantum’. For them, throttling is a fix. I’d prefer a full tuning expedition that fixes the actual cache buffers chains issue.

Conclusions

  • There is nothing wrong with using OPTIMIZER_INDEX_CACHING and OPTIMIZER_INDEX_COST_ADJ. However it is important to find a happy medium in your settings, and to know exactly what you are doing.
  • There is nothing wrong with using a resource manager plan. But are you doing it for balance or because your system is horrible without it?
  • _index_join_enabled might be able to negate one of the negative effects of an extremely low OPTIMIZER_INDEX_COST_ADJ setting
  • DBMS_XPLAN.DISPLAY_AWR helps find situations where the same query might result in drastically different explain plans
  • If you can get it approved, it is always best to follow up a quick fix with a more comprehensive solution (such as tuning the actual queries directly).
  • Good luck getting it approved. Instant gratification is hard for a company with poor performance to turn down. Broad scale query tuning and redesigns can be hard to get approved depending on the budget
  • Anything in extremes can be bad

Ladies and Gentlemen, Meet Hellraiser

Tuesday, June 17th, 2008

Sometimes you just need to take a moment and appreciate the things you have in life. And while I always appreciate my lovely wife and my darling kids, this time my appreciation is for something a little more material.

Hellraiser
Hellraiser

Okay, so Hellraiser is just a computer, but what a fine computer it is.

    The CyberpowerPC Gamer Infinity 9310

  • Quad Core 2.4GHz 64-bit processor
  • 8GB DDR2 800 RAM
  • 6 SATA drive bays, RAID configurable
  • Dual NVIDIA GeForce 8800GT 512MB 16X PCI Express Video Cards
  • and all the usual fixins

If you’re ever in the market for a machine that will act something like a server when you run Oracle, this one does the trick. Not only that, but it comes with a price tag of only $1,719.99 from NewEgg.com (note that price includes only 4GB RAM and one 500GB 7200RPM SATA drive).

It is extremely quiet, lightning fast, very powerful, and highly recommended. The graphics are overkill for Oracle, but great for whatever graphic intensive software you might also be running. However, I will say that it is HUGE. The system in box weighs 72 pounds, and stands a little higher than my knee. But with that comes the thrill of knowing that your home computer can probably run Oracle better than many of your clients’ servers!

Comments Fixed

Sunday, June 15th, 2008

To anyone who has tried to post comments to my blog entries, I greatly appreciate it and I am sorry that the comments did not go through! I recently updated the mod_security parameters on my server, and it appears that the new rules break some WordPress functionality. I have approved all comments that did not previously go through.

An Oracle Schema Filesystem

Sunday, June 15th, 2008

I’m one of those types that always needs a project. If I run out of projects using the things I know, I take on a project that involves something I don’t know whatsoever.

For instance, building a filesystem in C that connects to Oracle and uses queries in order to create inode structures.

It’s not done by a long shot; hell, it’s not even presentable, but I’m still kind of proud of it since:

  • I didn’t know C when I started it
  • It’s a neat concept
  • If I can remember how I did it, it could be an interesting piece of software

I won’t pretend that I know how to do complex filesystem work. I did have some help.

FUSE – Filesystems In Userspace

The main component (other than C itself and the OCI) is a software package called FUSE. Basically it is an interface to the kernel that allows user created pass-through filesystems. Instead of creating a filesystem that maps only to disk, you can create directory structures, files, etc. that map to anything you can imagine. I have seen a Wikipedia Filesystem, a Zip Filesystem, and several others that are very interesting. One in particular struck me as a good idea: DBToy from thesaguaros.com. DBToy is a database browser for use with MySQL or Postgres. Using that tool as a guide, I was able to hack out a version for Oracle.

Example Usage

[root@615349 orafs]# ls -ltr
total 148
-rw-r--r--  1 oracle dba   1542 Sep 23  2007 orafs.h
drwxr-xr-x  2 root   root  4096 Sep 23  2007 old
-rw-r--r--  1 oracle dba    804 Jun 15 00:13 makeit
-rw-r--r--  1 oracle dba   5824 Jun 15 00:17 orafs.c
drwxr-xr-x  2 root   root  4096 Jun 15 00:19 mount
drwxr-xr-x  5 root   root  4096 Jun 15 01:21 backup
-rw-r--r--  1 oracle dba  14070 Jun 15 01:21 ora_driver.c
-rw-r--r--  1 root   root 15084 Jun 15 01:21 ora_driver.o
-rw-r--r--  1 root   root 13972 Jun 15 01:21 orafs.o
-rwxr-xr-x  1 root   root 29612 Jun 15 01:21 orafs
[root@615349 orafs]# ls -ltr mountdir
total 0
[root@615349 orafs]# ./orafs mountdir
[root@615349 orafs]# cd mountdir
[root@615349 mountdir]# ls -ltr
total 0
drwxr-xr-x  2 root root 0 Dec 31  1969 TSMSYS
drwxr-xr-x  2 root root 0 Dec 31  1969 SCOTT
drwxr-xr-x  2 root root 0 Dec 31  1969 OUTLN
drwxr-xr-x  2 root root 0 Dec 31  1969 HR
drwxr-xr-x  2 root root 0 Dec 31  1969 FLOWS_FILES
drwxr-xr-x  2 root root 0 Dec 31  1969 FLOWS_020100
drwxr-xr-x  2 root root 0 Dec 31  1969 DIP
drwxr-xr-x  2 root root 0 Dec 31  1969 DBSNMP
drwxr-xr-x  2 root root 0 Dec 31  1969 CTXSYS
drwxr-xr-x  2 root root 0 Dec 31  1969 ANONYMOUS
[root@615349 mountdir]# cd HR
[root@615349 HR]# ls -ltr
total 0
drwxr-xr-x  2 root root 0 Dec 31  1969 REGIONS
drwxr-xr-x  2 root root 0 Dec 31  1969 LOCATIONS
drwxr-xr-x  2 root root 0 Dec 31  1969 JOBS
drwxr-xr-x  2 root root 0 Dec 31  1969 JOB_HISTORY
drwxr-xr-x  2 root root 0 Dec 31  1969 EMPLOYEES
drwxr-xr-x  2 root root 0 Dec 31  1969 DEPARTMENTS
drwxr-xr-x  2 root root 0 Dec 31  1969 COUNTRIES
drwxr-xr-x  2 root root 0 Dec 31  1969 CDEMO81_EMP
drwxr-xr-x  2 root root 0 Dec 31  1969 CDEMO81_DEPT
drwxr-xr-x  2 root root 0 Dec 31  1969 CAR
[root@615349 HR]# cd EMPLOYEES
[root@615349 EMPLOYEES]# ls -ltr
total 0
-r--r--r--  1 root root 0 Dec 31  1969 data.csv
[root@615349 EMPLOYEES]# cd ../COUNTRIES
[root@615349 COUNTRIES]# ls -ltr
total 0
-r--r--r--  1 root root 0 Dec 31  1969 data.csv
[root@615349 COUNTRIES]# cd ../../..
[root@615349 orafs]# umount mountdir

 

But What Did It Do?

When I ran the orafs program, two things happened:

  1. A connection to Oracle was made as SYSTEM
  2. A FUSE filesystem was mounted on the mountdir directory using API calls

This allowed me to cd to the mountdir directory and use standard OS commands like ls -ltr. The first level directory is schema names. In each schema directory are directories for each table. In each table directory is a file called data.csv. It doesn’t actually contain anything yet, but hey, it’s a start. Just being able to navigate through Oracle’s data dictionary at the OS level is neat enough.

Also notice that to remove it, I had to use the umount command. This isn’t a fake mounting of any kind, it truly did get mounted at the OS level!

Nothing On Disk

One of the coolest features of FUSE is that nothing had to be written at all to disk. The data from the data dictionary was not written to disk at any time. The orafs program used no flat files or anything of the sort. When ls -ltr is called, it actually did a query on the fly. For instance, select table_name from dba_tables where owner = ?. My code passes results to FUSE, and FUSE makes it understandable to the OS.

Great, But Why?

Yeah, that’s the big question. The reason I even did it in the first place was that I thought it would be a neat concept. Then it was a challenge. Now it’s something for which I have absolutely no time!

But if I had things my way, here’s what I’d love to do with it:

  • Have 3 files in each table subdirectory: data.xml, data.csv, and ddl.sql. The data.xml and data.csv files would have all of the rows from the table arranged in the proper format. This table would not actually be queried until you opened the file in some way. Imagine, simply doing cp data.csv /home/oracle/table.csv to get a comma delimited dump of your data! ddl.sql would, of course, contain the DDL to regenerate the object.
  • Create symbolic links to child tables based on constraints
  • Use the creation date from DBA_OBJECTS and DBA_USERS to set the dates when browsing the directory
  • Allow command line args for login/password instead of hardcoding. I had this at one point but got rid of it due to coding issues
  • Figure out how to use malloc() and calloc() properly so I don’t waste memory

It’s a big list that I’ll probably never get through; however, it doesn’t hurt to have goals.

In fact, imagine some other uses for FUSE with Oracle (maybe I’ll tackle them one day):

  • Mount an ASM diskgroup, allowing ASMCMD-like commands for browsing, but also allowing cp commands to a normal filesystem using RMAN commands in the code
  • Mount a table that contains a BLOB and a column containing the file name of each row. You could actually see the BLOBs on disk with the proper filename, all dynamic from the DB without duplicating space usage

I may actually try to learn some of the FUSE bindings I’ve seen. FUSE-J for instance is a Java binding, which would probably make the coding MUCH easier. However, the thought of rewriting all the code I’ve done so far for orafs makes my head hurt.

Anyone good with C and OCI that would like to see the code? I’d be happy to share, maybe someone else can do something with it. After you fix all my horrible code, of course.

Hey Guys, Does Size Matter?

Friday, June 13th, 2008

Last night I was called in on an unfortunate problem. A client has an update that they must run (unfortunately) which updates ~830,000 rows, setting one column equal to the other (two column table). On their development environment this was taking roughly twenty seconds to perform. However, on their soon-to-be production environment it was taking roughly 45 minutes.

Explain plans were checked, trace files examined, and not much popped up except that the production machine was attempting larger I/Os during the update and was consequently taking much longer. Comparing the initialization parameters between production and development showed the exact same parameters, except that the upcoming production box was using a 16k block size and development was using a 4k block size.

There was one other difference, which was that the upcoming production server was 64-bit and the development server was 32-bit. In order to make sure everything was the same, I created a fresh instance on the 64-bit production server with a block size of 4k, exported the objects from the instance with the 16k block size, and imported them.

The final result: When the update was run against the 16k blocksize DB, it took 45 minutes. Against the 4k blocksize DB on the same box with the same parameters and the same FS mounts, it took 2 minutes 20 seconds.

I even took it a step further to see if we could squeeze any more performance out of it. Setting FILESYSTEMIO_OPTIONS=’SETALL’ (instead of none) I was able to get the update down to 1.5 minutes. Turning off DB_BLOCK_CHECKING (not recommended) I was able to get it down to 10 seconds.

By going from a 16k blocksize to a 4k blocksize with all other things being equal, we experienced roughly a twenty times improvement. We shaved off even more time setting FILESYSTEMIO_OPTIONS = SETALL. And then we changed DB_BLOCK_CHECKING, a parameter Oracle documentation says only adds a 1 to 10% overhead depending on concurrency of DML, which made the update 6 times faster alone.

The final result was a 270 times improvement over the original.

To be fair, I also tried setting the FILESYSTEMIO_OPTIONS and DB_BLOCK_CHECKING the same on the 16k blocksize instance, which resulted in the update taking 30 minutes as opposed to 45. The results were better, but the 4k blocksize database still won by 180 times.

What’s more, all queries both large and small performed the same or better than in production, and a test insert of 100,000 rows went from 20 seconds on the 16k blocksize to 3 seconds on the 4k.

Stats

Oracle 9.2.0.8 on RHEL4 64-bit, 8CPU, 2+2 RAID10 for Data with a 128k segment size, 2+2 RAID10 for Redo, Hardware diagnostics report no issues

Conclusion

Honestly, there are no final conclusions yet. Like a weight loss commercial that shows a 400 pound man slimming up to 150 pounds, ‘results may not be typical’. While it definitely shows that changing only blocksize can have a profound effect, more thorough analysis will help get to the core of why it had such an effect. Whatever the case, it definitely shows us to keep an open mind in tuning. Leave no stone unturned!

NOTE: There is one interesting outcome that I’ve observed: on the 16k blocksize database, the UPDATE statement performs full table scans against FILE$. On the 4k blocksize database, it performs range scans using the FILE_I2 index. This only makes a difference of about 1 consistent get per execution, but it is worth noting.

NOTE: It is also worth reiterating, this is and Oracle 9.2.0.8 database.

Undone by Undo

Thursday, June 12th, 2008

It has been quite a long time since I posted here. However, all I can say is that I’ve been very busy, and I’m happy that work is so plentiful that I barely have time for blogging!

Yesterday I got a call from a client that was having some trouble. Their runqueue was through the roof with no explanation as to why. Inserts, especially, were causing them grief, hanging for quite a long time.

Identifying the Problem

The system is 10.2.0.4 64-bit on Solaris 10, 16 CPU cores, 32GB RAM. In order to locate what the problem was, the first step was to do a little fact finding. What was going on when it started going bad? What kind of things happened in the past 24 hours that I should know about? Those questions along with some basic queries showed the issue.

It turns out that the night previous a huge update was run in a loop by mistake. This update ran for a while and was finally killed. The following morning things seemed to run fine, but massive waits started piling up while inserting/deleting from a large table used for tracking purposes (and therefore used constantly). I used the following query to find out the bottlenecks currently contributing to the issue:

select event, count(*) from v$session_wait group by event;

This simple query is extremely powerful. For anyone who has not used the v$session_wait table, let me give you an introduction. This table contains each SID (session logged into the instance), a wait event, and supporting information in the p1, p2, and p3 columns (you can look up the meaning of these columns in v$event_name on 10g).

The results of this query showed a great deal of waits. Library cache pins, cache buffers chains latches, and a few others. Because of experience on this system I knew that these waits were usually brought up as symptoms to a real issue, so I looked for any waits that were not as common and found these: enq: US – contention, wait for stopper event to be increased, and wait for a undo record.

Those waits, along with the knowledge of the killed updates from the night previous, and the fact that mostly DML was suffering was enough to put together the assumption that undo was the cause of these issues. A quick check of ‘top’ showed that the SMON process was consuming a fair amount of CPU on the system along with some PX processes, which definitely looked like fast start parallel recovery. I did a quick check on Metalink and found Note 464246.1 – DATABASE HANG DUE TO PARALLEL TRANSACTION RECOVERY. Assumption confirmed, for the most part.

Parallel Transaction Recovery

When you perform DML in Oracle your changes will go to UNDO (among other places). Specifically your changes are placed into an UNDO segment inside your UNDO tablespace. Please note that on RAC, each node has its own active UNDO tablespace while on single instance systems only one UNDO tablespace may be active at a time (unless you’re switching to a new one). But what does UNDO actually accomplish?

  • Transaction rollback. UNDO is the mechanism by which you can undo your changes before a commit with the ROLLBACK command. Also, if a process is abnormally terminated UNDO will be used to clean up any uncommitted transactions.
  • Flashback. Oracle 10g is sometimes called the Flashback Database. Flashback Query and Flashback Table (but not dropped table) utilize UNDO.
  • Read consistency. If you start a query at 9AM and it takes 3 hours, are the results you see from 9AM or noon? The answer is 9AM, the moment you pressed the enter key. Oracle accomplishes this through read consistency, which pulls the data you require from current data blocks or consistent read blocks, which is UNDO. If you do not have enough UNDO to satisfy your long running queries, you may receive an ‘ORA-01555: snapshot too old, rollback segment too small’ error.
  • Recovery. If your instance is abnormally terminated (kill -9 pmon, shutdown abort, janitor trips over cable), SMON will handle recovery on instance startup. This involves four phases: mount the database, roll forward all redo since the last checkpoint to the datafiles, open database, and rollback all non-committed transactions.

Undo Recovery

That last bullet point is the reason things went bad for my client. When Oracle rolls back all uncommitted transactions during an instance startup, SMON will spawn parallel processes to do the work. In addition, the server process of connected users will also try to rollback uncommitted transactions if they stumble upon any while doing their own work. During this time if any processes are using excessive amounts of UNDO things will get worse and worse. For instance, inserts to a table by many connected users while jobs try to keep the data clean with deletes on the backend could cause a lot of UNDO contention.

The FAST_START_PARALLEL_ROLLBACK parameter controls how much juice SMON will take. The three possible parameters are: FALSE, LOW, and HIGH. A setting of FALSE will tell SMON to do the recovery on its own without parallel assistance. This could work to allow your app to remain online while SMON does its job, but could still cause contention issues. In the end, it is better to let SMON finish the work while other processes try to lay off UNDO as much as possible. A setting of LOW will tell SMON to use 2 * CPU_COUNT parallel processes to do the work, and a setting of high will use 4 * CPU_COUNT. In order to allow SMON to finish its work as quickly as possible, we turned off their app and set the FAST_START_PARALLEL_ROLLBACK parameter to HIGH, allowing SMON all the resources it needed to finish. To determine how long it would take to finish, we consulted the V$FAST_START_TRANSACTIONS view (hey, a V$ view that’s pluralized!).

select state,
undoblocksdone,
undoblockstotal,
undoblocksdone / undoblockstotal * 100
from v$fast_start_transactions;

STATE            UNDOBLOCKSDONE UNDOBLOCKSTOTAL UNDOBLOCKSDONE/UNDOBLOCKSTOTAL
---------------- -------------- --------------- ------------------------------
RECOVERING                94160           95432                     98.6671138

The result you see there was near the end of course, with only 1,272 blocks to go to completion. Once the work completed, we were able to bring the app back up.

Undone by Undo

But then, confusion struck. Runqueue started creeping up again, and all the while the question was asked, “Why didn’t this problem affect us until the afternoon?” Soon contention was back up on the system with the ‘enq: US – contention’ wait in the forefront. Even though UNDO fast start parallel rollback was finished, contention was still high.

A small amount of further investigation provided the answer: background jobs overlapping while deleting from high concurrency tables. This meant that multiple jobs were attempting to do the same deletes over and over again while the app was still trying to insert into those tables, causing locks and UNDO contention. That also provided the answer to the question of delayed issues. The system was able to live with the amount of UNDO being generated during the less busy morning hours; however, once things ramped up, and with fast start parallel rollback taking so much horsepower, the structure of their system started breaking down at all levels. Previous to calling me the client also did a shutdown abort and a Dataguard failover, which unfortunately made the problem worse (more UNDO to recover). The result was a catastrophic UNDO pile up.

The Solution

In the end though, we can’t blame UNDO for the problem. It was only doing its job to preserve the integrity of the system. However, we can take away two things from this problem:

  1. In the case of excessive CPU usage due to UNDO following an instance failure, SMON can be throttled with the FAST_START_PARALLEL_ROLLBACK parameter. You can monitor the progress using the V$FAST_START_TRANSACTIONS view.
  2. Take care of your apps! Any scripts that perform DML should include a portion at the top that checks to see if it is already running (ps -ef | grep [p]urge.sh for instance). This will take care of overlapping process issues that are bound to cause contention. In addition, avoid unnecessary DML. Do you really need to insert/update/delete as much as you are? Is there another way that perhaps uses memory, or even flat files to avoid a constant barrage on the database layer?

In the end, the big conclusion is this: anything in excess can be problematic, especially when there are waits involved.