Archive for the 'Oracle' Category



Hey Guys, Does Size Matter?

Friday, June 13th, 2008

Last night I was called in on an unfortunate problem. A client has an update that they must run (unfortunately) which updates ~830,000 rows, setting one column equal to the other (two column table). On their development environment this was taking roughly twenty seconds to perform. However, on their soon-to-be production environment it was taking roughly 45 minutes.

Explain plans were checked, trace files examined, and not much popped up except that the production machine was attempting larger I/Os during the update and was consequently taking much longer. Comparing the initialization parameters between production and development showed the exact same parameters, except that the upcoming production box was using a 16k block size and development was using a 4k block size.

There was one other difference, which was that the upcoming production server was 64-bit and the development server was 32-bit. In order to make sure everything was the same, I created a fresh instance on the 64-bit production server with a block size of 4k, exported the objects from the instance with the 16k block size, and imported them.

The final result: When the update was run against the 16k blocksize DB, it took 45 minutes. Against the 4k blocksize DB on the same box with the same parameters and the same FS mounts, it took 2 minutes 20 seconds.

I even took it a step further to see if we could squeeze any more performance out of it. Setting FILESYSTEMIO_OPTIONS=’SETALL’ (instead of none) I was able to get the update down to 1.5 minutes. Turning off DB_BLOCK_CHECKING (not recommended) I was able to get it down to 10 seconds.

By going from a 16k blocksize to a 4k blocksize with all other things being equal, we experienced roughly a twenty times improvement. We shaved off even more time setting FILESYSTEMIO_OPTIONS = SETALL. And then we changed DB_BLOCK_CHECKING, a parameter Oracle documentation says only adds a 1 to 10% overhead depending on concurrency of DML, which made the update 6 times faster alone.

The final result was a 270 times improvement over the original.

To be fair, I also tried setting the FILESYSTEMIO_OPTIONS and DB_BLOCK_CHECKING the same on the 16k blocksize instance, which resulted in the update taking 30 minutes as opposed to 45. The results were better, but the 4k blocksize database still won by 180 times.

What’s more, all queries both large and small performed the same or better than in production, and a test insert of 100,000 rows went from 20 seconds on the 16k blocksize to 3 seconds on the 4k.

Stats

Oracle 9.2.0.8 on RHEL4 64-bit, 8CPU, 2+2 RAID10 for Data with a 128k segment size, 2+2 RAID10 for Redo, Hardware diagnostics report no issues

Conclusion

Honestly, there are no final conclusions yet. Like a weight loss commercial that shows a 400 pound man slimming up to 150 pounds, ‘results may not be typical’. While it definitely shows that changing only blocksize can have a profound effect, more thorough analysis will help get to the core of why it had such an effect. Whatever the case, it definitely shows us to keep an open mind in tuning. Leave no stone unturned!

NOTE: There is one interesting outcome that I’ve observed: on the 16k blocksize database, the UPDATE statement performs full table scans against FILE$. On the 4k blocksize database, it performs range scans using the FILE_I2 index. This only makes a difference of about 1 consistent get per execution, but it is worth noting.

NOTE: It is also worth reiterating, this is and Oracle 9.2.0.8 database.

Undone by Undo

Thursday, June 12th, 2008

It has been quite a long time since I posted here. However, all I can say is that I’ve been very busy, and I’m happy that work is so plentiful that I barely have time for blogging!

Yesterday I got a call from a client that was having some trouble. Their runqueue was through the roof with no explanation as to why. Inserts, especially, were causing them grief, hanging for quite a long time.

Identifying the Problem

The system is 10.2.0.4 64-bit on Solaris 10, 16 CPU cores, 32GB RAM. In order to locate what the problem was, the first step was to do a little fact finding. What was going on when it started going bad? What kind of things happened in the past 24 hours that I should know about? Those questions along with some basic queries showed the issue.

It turns out that the night previous a huge update was run in a loop by mistake. This update ran for a while and was finally killed. The following morning things seemed to run fine, but massive waits started piling up while inserting/deleting from a large table used for tracking purposes (and therefore used constantly). I used the following query to find out the bottlenecks currently contributing to the issue:

select event, count(*) from v$session_wait group by event;

This simple query is extremely powerful. For anyone who has not used the v$session_wait table, let me give you an introduction. This table contains each SID (session logged into the instance), a wait event, and supporting information in the p1, p2, and p3 columns (you can look up the meaning of these columns in v$event_name on 10g).

The results of this query showed a great deal of waits. Library cache pins, cache buffers chains latches, and a few others. Because of experience on this system I knew that these waits were usually brought up as symptoms to a real issue, so I looked for any waits that were not as common and found these: enq: US - contention, wait for stopper event to be increased, and wait for a undo record.

Those waits, along with the knowledge of the killed updates from the night previous, and the fact that mostly DML was suffering was enough to put together the assumption that undo was the cause of these issues. A quick check of ‘top’ showed that the SMON process was consuming a fair amount of CPU on the system along with some PX processes, which definitely looked like fast start parallel recovery. I did a quick check on Metalink and found Note 464246.1 - DATABASE HANG DUE TO PARALLEL TRANSACTION RECOVERY. Assumption confirmed, for the most part.

Parallel Transaction Recovery

When you perform DML in Oracle your changes will go to UNDO (among other places). Specifically your changes are placed into an UNDO segment inside your UNDO tablespace. Please note that on RAC, each node has its own active UNDO tablespace while on single instance systems only one UNDO tablespace may be active at a time (unless you’re switching to a new one). But what does UNDO actually accomplish?

  • Transaction rollback. UNDO is the mechanism by which you can undo your changes before a commit with the ROLLBACK command. Also, if a process is abnormally terminated UNDO will be used to clean up any uncommitted transactions.
  • Flashback. Oracle 10g is sometimes called the Flashback Database. Flashback Query and Flashback Table (but not dropped table) utilize UNDO.
  • Read consistency. If you start a query at 9AM and it takes 3 hours, are the results you see from 9AM or noon? The answer is 9AM, the moment you pressed the enter key. Oracle accomplishes this through read consistency, which pulls the data you require from current data blocks or consistent read blocks, which is UNDO. If you do not have enough UNDO to satisfy your long running queries, you may receive an ‘ORA-01555: snapshot too old, rollback segment too small’ error.
  • Recovery. If your instance is abnormally terminated (kill -9 pmon, shutdown abort, janitor trips over cable), SMON will handle recovery on instance startup. This involves four phases: mount the database, roll forward all redo since the last checkpoint to the datafiles, open database, and rollback all non-committed transactions.

Undo Recovery

That last bullet point is the reason things went bad for my client. When Oracle rolls back all uncommitted transactions during an instance startup, SMON will spawn parallel processes to do the work. In addition, the server process of connected users will also try to rollback uncommitted transactions if they stumble upon any while doing their own work. During this time if any processes are using excessive amounts of UNDO things will get worse and worse. For instance, inserts to a table by many connected users while jobs try to keep the data clean with deletes on the backend could cause a lot of UNDO contention.

The FAST_START_PARALLEL_ROLLBACK parameter controls how much juice SMON will take. The three possible parameters are: FALSE, LOW, and HIGH. A setting of FALSE will tell SMON to do the recovery on its own without parallel assistance. This could work to allow your app to remain online while SMON does its job, but could still cause contention issues. In the end, it is better to let SMON finish the work while other processes try to lay off UNDO as much as possible. A setting of LOW will tell SMON to use 2 * CPU_COUNT parallel processes to do the work, and a setting of high will use 4 * CPU_COUNT. In order to allow SMON to finish its work as quickly as possible, we turned off their app and set the FAST_START_PARALLEL_ROLLBACK parameter to HIGH, allowing SMON all the resources it needed to finish. To determine how long it would take to finish, we consulted the V$FAST_START_TRANSACTIONS view (hey, a V$ view that’s pluralized!).

select state,
undoblocksdone,
undoblockstotal,
undoblocksdone / undoblockstotal * 100
from v$fast_start_transactions;

STATE            UNDOBLOCKSDONE UNDOBLOCKSTOTAL UNDOBLOCKSDONE/UNDOBLOCKSTOTAL
---------------- -------------- --------------- ------------------------------
RECOVERING                94160           95432                     98.6671138

The result you see there was near the end of course, with only 1,272 blocks to go to completion. Once the work completed, we were able to bring the app back up.

Undone by Undo

But then, confusion struck. Runqueue started creeping up again, and all the while the question was asked, “Why didn’t this problem affect us until the afternoon?” Soon contention was back up on the system with the ‘enq: US - contention’ wait in the forefront. Even though UNDO fast start parallel rollback was finished, contention was still high.

A small amount of further investigation provided the answer: background jobs overlapping while deleting from high concurrency tables. This meant that multiple jobs were attempting to do the same deletes over and over again while the app was still trying to insert into those tables, causing locks and UNDO contention. That also provided the answer to the question of delayed issues. The system was able to live with the amount of UNDO being generated during the less busy morning hours; however, once things ramped up, and with fast start parallel rollback taking so much horsepower, the structure of their system started breaking down at all levels. Previous to calling me the client also did a shutdown abort and a Dataguard failover, which unfortunately made the problem worse (more UNDO to recover). The result was a catastrophic UNDO pile up.

The Solution

In the end though, we can’t blame UNDO for the problem. It was only doing its job to preserve the integrity of the system. However, we can take away two things from this problem:

  1. In the case of excessive CPU usage due to UNDO following an instance failure, SMON can be throttled with the FAST_START_PARALLEL_ROLLBACK parameter. You can monitor the progress using the V$FAST_START_TRANSACTIONS view.
  2. Take care of your apps! Any scripts that perform DML should include a portion at the top that checks to see if it is already running (ps -ef | grep [p]urge.sh for instance). This will take care of overlapping process issues that are bound to cause contention. In addition, avoid unnecessary DML. Do you really need to insert/update/delete as much as you are? Is there another way that perhaps uses memory, or even flat files to avoid a constant barrage on the database layer?

In the end, the big conclusion is this: anything in excess can be problematic, especially when there are waits involved.

Easy Stored Procedure Output

Tuesday, January 15th, 2008

I answered a question on the DBA Forum today and I thought it was a common enough question to warrant a blog posting.

You can find the entire thread here. For my blog, I’ll post a condensed version of the question.

Question: I am new to the wonderful world of Oracle. I want to be able to view the results of a stored procedure in an output window, say out of Oracle SQL developer. Unfortunately it appears I need to write some more code to actually view the data.

On a more generic note, can anyone explain to me why Oracle has chosen to make PL/SQL inordinately more complicated than say MS SQL/Servers tSQL? I mean in tSQL I would just write:

CREATE OR REPLACE PROCEDURE TESTSPROC2
AS
select * from test_table order by id_no;
GO

and viola, a nice result set spits out in Query Analyzer (or a .net application).

Answer:

Before I go on, let me say I agree that PL/SQL is more powerful (read the original post to see the debate). That being said, here are your options.

1. Test it with REFCURSOR using a FUNCTION and selecting from dual:

SQL> create or replace function testfunc return sys_refcursor
  2  as
  3    c_test sys_refcursor;
  4  begin
  5    open c_test for select first_name, last_name, email from employees where rownum < 10;
  6    return c_test;
  7  end;
  8  /

Function created.

SQL> select testfunc() from dual;

TESTFUNC()
--------------------
CURSOR STATEMENT : 1

CURSOR STATEMENT : 1

FIRST_NAME           LAST_NAME                 EMAIL
-------------------- ------------------------- -------------------------
Steven               King                      SKING
Neena                Kochhar                   NKOCHHAR
Lex                  De Haan                   LDEHAAN
Alexander            Hunold                    AHUNOLD
Bruce                Ernst                     BERNST
David                Austin                    DAUSTIN
Valli                Pataballa                 VPATABAL
Diana                Lorentz                   DLORENTZ
Nancy                Greenberg                 NGREENBE

9 rows selected.

2. Use the same function and return it into a variable:

SQL> variable rc refcursor
SQL> exec :rc := testfunc()

PL/SQL procedure successfully completed.

SQL> print rc

FIRST_NAME           LAST_NAME                 EMAIL
-------------------- ------------------------- -------------------------
Steven               King                      SKING
Neena                Kochhar                   NKOCHHAR
Lex                  De Haan                   LDEHAAN
Alexander            Hunold                    AHUNOLD
Bruce                Ernst                     BERNST
David                Austin                    DAUSTIN
Valli                Pataballa                 VPATABAL
Diana                Lorentz                   DLORENTZ
Nancy                Greenberg                 NGREENBE

9 rows selected.

3. Use your procedure with a variable:

SQL> create or replace procedure testproc(c_test out sys_refcursor) is
  2  begin
  3    open c_test for select first_name, last_name, email from employees where rownum < 10;
  4  end;
  5  /

Procedure created.

SQL> variable rc2 refcursor
SQL> exec testproc(:rc2);

PL/SQL procedure successfully completed.

SQL> print rc2

FIRST_NAME           LAST_NAME                 EMAIL
-------------------- ------------------------- -------------------------
Steven               King                      SKING
Neena                Kochhar                   NKOCHHAR
Lex                  De Haan                   LDEHAAN
Alexander            Hunold                    AHUNOLD
Bruce                Ernst                     BERNST
David                Austin                    DAUSTIN
Valli                Pataballa                 VPATABAL
Diana                Lorentz                   DLORENTZ
Nancy                Greenberg                 NGREENBE

9 rows selected.

#3 is more in-line with your original needs. Personally I’m a fan of #1 and #2 because of the capabilities of returning a refcursor as a function, like passing it into DBMS_XMLGEN.GETXML.

When errors tell you absolutely nothing!

Thursday, December 27th, 2007

The task was simple…

Oh yes, hello by the way. It has been a while since I’ve posted!

Anyways, like I said, the task was simple. Migrate an Oracle 10.1.0.3 32-bit database on one server to 10.2.0.3 64-bit on another server, and do it with only 1 hour of downtime. Migrations like this are daunting sometimes to company-employed DBAs since the number of migrations per database release are usually kept to a minimum. However, in the consulting world, these sort of tasks are old hat, and we’ve got lots of tricks up our sleeves to get a database from one box to another without resorting to the old, tired, and painfully slow exp/imp.

So in this case, I set myself up a task list that would accomplish the necessary goal:

  1. Back up the production database (10.1.0.3 32-bit) via RMAN to a NAS array (we’ll just call it /nas)
  2. Create an instance on the new server (64-bit 10.2.0.3) and restore the controlfile from /nas
  3. Restore the database to the new server
  4. Start the database on the new server in mount mode
  5. Copy archive logs from the old server periodically, catalog them on the new server and recover database over and over until you’re ready to migrate
  6. Shut down production, start it in restricted mode, archive log current, and shut it down again
  7. Apply the final archive log to the new server and “alter database open resetlogs upgrade”
  8. Run @?/rdbms/admin/catupgrd.sql (remember, it’s not catpatch.sql anymore)
  9. Shutdown, start up, and run utlirp.sql and utlrp.sql to change packages from 32-bit to 64-bit
  10. All finished!

10 points to anyone who can figure out the missing step there without reading any further. One missing step caused #8 (running catupgrd.sql) to fail miserably.

Everything went very smoothly until step #8. When running that step, I received the following error after about 5 minutes:

ORA-20000: Insufficient privileges to analyze an object in Dictionary Schema
ORA-06512: at "SYS.DBMS_STATS", line 13323
ORA-06512: at "SYS.DBMS_STATS", line 13649
ORA-06512: at "SYS.DBMS_STATS", line 15985
ORA-06512: at "SYS.DBMS_STATS", line 16027
ORA-06512: at line 2

That was an odd error, considering I was logged in as SYS. Just to be overly safe and perform a step I never should have had to do, I explicitly granted ANALYZE ANY and ANALYZE ANY DICTIONARY to SYS, just in case. I ran it once more, and received the same error.

Further investigation showed that the error occurred while running cmpdbmig.sql, during this command:

execute dbms_registry.gather_stats(null); 

To diagnose the error, I decided to see if dbms_stats had the same issue.

BEGIN dbms_stats.gather_table_stats('SYS', 'SOURCE$'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "SYS"."SOURCE$", insufficient privileges or
does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13046
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1

Sure enough, I couldn’t gather statistics on the SOURCE$ table even though I was logged in as SYS. This error was a little different, saying that I had insufficient privileges OR the object did not exist.

SQL> show user
USER is "SYS"

SQL> desc source$
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJ#                                      NOT NULL NUMBER
 LINE                                      NOT NULL NUMBER
 SOURCE                                             VARCHAR2(4000)

As expected, the object exists and I’m SYS. I tried everything I could think of, but under no circumstances could I make the DBMS_STATS package work against SYS objects. I thought that perhaps I had gotten a little too clever with my upgrade strategy.

Finally, it dawned on me that there was one step I had missed before running the catupgrd.sql script: I did not add a tempfile to the TEMP tablespace! Remember always, if you restore from backup, there WILL NOT BE a temporary tablespace anymore. It is not included in your backups!

SQL> alter tablespace temp add tempfile '/u01/app/oracle/oradata/temp01.dbf' size 1000M;

Tablespace altered.

SQL> exec dbms_stats.gather_table_stats('SYS', 'SOURCE$');

PL/SQL procedure successfully completed.

As if by magic, it suddenly works! All the “insufficient privilege” errors put to rest with the creation of a temporary tablespace file. I reran catupgrd.sql, did the final steps, and the migration went off without a hitch!

As such, I hereby submit that the error should be rewritten to:

ORA-20000: Insufficient privileges to analyze an object in Dictionary Schema or something like that, or perhaps not like that whatsoever. Please contact the internet.

The Oracle 11g Result Cache

Saturday, October 6th, 2007

I just wanted to take a minute to point any of my readers (hears crickets chirping) to a great article on the DBA-Oracle site regarding the Oracle 11g Result Cache.

Take the time to have a read, it’s filled with great information, comparisons, and commentary. As always, make sure to note the caveats (near the bottom of the article) and always test new features on a development environment before you use them!

Say Goodbye to LIKE, and Hello to Oracle Text

Wednesday, September 19th, 2007

Don Burleson recently posted a great article on using Oracle Text indexes to make up for the horrible performance issues caused by ‘%LIKE%’ queries on large blocks of text.

I have personally been enjoying keyword searching CONTEXT indexes for quite some time through its various incarnations as the Context Option, Intermedia Option, and now Oracle Text.

So let’s expand a little bit on Don’s article and talk about some more features; namely, the XML searching capabilities of Oracle Text.

Making It Happen

First, let’s create a table to hold some car data, and insert three rows.


SQL> create table car (id number, make varchar2(20), model varchar2(30), color varchar2(10), description varchar2(255));

Table created.

SQL> insert into car values (1, 'Honda', 'Civic', 'Blue', 'A fun car to drive');

1 row created.

SQL> insert into car values (2, 'Jeep', 'Wrangler', 'Green', 'Better than a Honda');

1 row created.

SQL> insert into car values (3, 'Chevrolet', 'Camaro', 'Black', 'Good when you''re feeling blue');

1 row created.

SQL> commit;

Commit complete.

Note that Car 1 has the color “blue,” but Car 3 has “blue” in the description. Also note that Car 1 has a MAKE of “Honda” but Car 2 has “Honda” in the description.

Now let’s create a table from this data to hold XML for each row:


SQL> create table car_xml as select id, dbms_xmlgen.getxml('select * from car where id = ' || id) xml from car;

Table created.

SQL> desc car_xml
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 XML                                                CLOB

SQL> select * from car_xml where rownum < 2;

        ID
----------
XML
--------------------------------------------------------------------------------
         1
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <ID>1</ID>
  <MAKE>Honda</MAKE>
  <MODEL>Civic</MODEL>
  <COLOR>Blue</COLOR>
  <DESCRIPTION>A fun car to drive</DESCRIPTION>
 </ROW>
</ROWSET>

As you can see, Oracle has assigned column tags for each column of the row.

Lastly, we create the Oracle Text index:


SQL> create index car_xml_text_idx on car_xml(xml) indextype is ctxsys.context parameters ('section group ctxsys.auto_section_group');

Index created.

Note that we have added a parameter to our context index. This parameter is SECTION GROUP, and we have assigned the AUTO_SECTION_GROUP. This section group will take our data from the column and actually parse it internally based on standard HTML style tags.

Querying XML


SQL> select id from car_xml where contains(xml, 'blue') > 0;

        ID
----------
         1
         3

Notice that when we searched for the word “blue,” Oracle returned rows 1 and 3. Row 1 has the color blue, while row 3 mentions blue in the description. How can we keep searching all columns when we want to, but narrow our search as needed? Search WITHIN.


SQL> select id from car_xml where contains(xml, 'blue within color') > 0;

        ID
----------
         1

Notice that now when we use the CONTAINS clause, we specify that “blue” must appear “within color.” This instructs Oracle to ONLY return rows where the word “blue” can be found between begin and end COLOR tags.

To illustrate this example once more, we’ll try “honda.”


SQL> select id from car_xml where contains(xml, 'honda') > 0;

        ID
----------
         1
         2

SQL> select id from car_xml where contains(xml, 'honda within make') > 0;

        ID
----------
         1

Once again, we have narrowed our search with a WITHIN cause. Sometimes, it is very handy to be able to search an entire table’s contents for a keyword, while other times drilling down is much more appropriate.

You may have also noticed that the Oracle Text index is case insenstive. Though I searched for ‘honda’ in lowercase, it picked up ‘Honda’ in uppercase in both columns.

Better still, there are many more features included with Oracle Text such as synonym searching, preferred terms, fuzzy searches, highlighting, and more, but we will save those for another post.

Log Buffer #60: a Carnival of the Vanities for DBAs

Friday, August 31st, 2007

It has been a long time since I have posted on here (work and all, sorry about that), and so when I was asked to do another Log Buffer, it sounded like a great way to get back into the swing of things.

Not to say work is light; on the contrary, I couldn’t be busier! The sometimes wild world of database technology has literally been exploding. This brings us, of course, to the news.

The top news on the minds of most Oracle DBAs is the release of Oracle 11g for Linux. This action packed SQL to Oracle 10g (get it?) includes some amazing tools, such as the Query Result Cache. The Query Result Cache is a new area of the shared pool that speeds up often-run queries tremendously by caching their results. However, all good things come at a price, as Alex Fatkulin points out. It will be interesting to see how this feature pans out once more people get their hands on it.

Continuing on with the Oracle 11g theme, Laurent Schneider blogs about an outstanding feature in Oracle 11g that logs the errors produced by scripts that you have run. Be sure to check it out, even if you think your code is perfect. I couldn’t begin to cound the times this would have come in handy.

Lastly on 11g (for now), Jeremy Schneider has a great post that gives an overview on many of the new features included in the latest release. Even if you have been working with 11g for a while now, this is a great recap from the past week of the new functionality at your fingertips.

Eddie Awad reminds us of an important rule that applies no matter which database we use or what language we program: self documenting code is not enough!

Interested in keeping your MySQL database safe in the case of a disaster (I sure hope so)? Have a look at this post from a MySQL DBA that was once an Oracle DBA.

When does revoking a permission grant a permission? When it’s MySQL! This post, also by the MySQL DBA that was once an Oracle DBA, talks about how permissions work in MySQL, and what to watch out for when you plan your security structure.

Are you considering being a MySQL DBA that was once an Oracle DBA yourself? It can’t hurt to understand a wide variety of database architectures. An article by Paul Vallee explains why Oracle and SQL Server DBAs probably want to learn MySQL.

If you are interested in using BLOB objects in multiple engines in MySQL, have a look at this article by MooCow Productions, which shows some testing on performance depending on the engine you choose.

Having problems with variables in your scripted SQL Server tasks? The SSIS Junkie clues us in to a possible bug received after installing several patches.

I think that last time I did a log buffer, I posted something about dates and times in SQL Server. Once again another great article on the same topic has crossed my desk (browser), and this time it is an excellent article on using the DATETIME datatype for more robust functionality from Jeff Smith.

If you are on PostgreSQL waiting on your posts, you’ll be happy to see some tips and tricks by depesz that will help you bring back your queries in the order you are hoping for, which is not always the order that you get!

For those of you that think there should be more PostgreSQL posts in this Log Buffer, I invite you to remember that PostgreSQL is not a democracy, as noted in this article by Robert Treat.

And in the interest of multi-database posting, let’s not forget DB2! In fact, if you are interested in DB2 you can read this article and join up with other DB2 users on Facebook, as suggested by Chris Eaton.

But I can’t resist going back to Oracle. Tanel Poder has posted an outstanding article on advanced Oracle troubleshooting when the wait interface is not enough. Consequently, this is his first post on the new blog, and I hope to see more great articles like this in the future.

Sitting in your cube staring at top, wondering what to do next? Take a break and listen to three geeks talk about Oracle 11g.

Some people think that tuning is all about adding indexes on your WHERE clause columns. Think again! The MySQL Performance Blog posts about cases where an index might not be such a hot idea. This applies to Oracle as well of course, where sometimes a b-tree index just won’t cut it.

Kevin Closson has written an article that delves into 11g Automatic Memory Management (AMM) and Linux Hugepages support. This is a good read and sheds a great deal of light on both features, and how to determine the best course of action for your 11g implementation.

And to wrap all this up with some highly political debate, Howard Rodgers posts about a topic that may be of interest to some in the Oracle Community: is the Oracle ACE program devalued? Does being an Oracle ACE really prove anything about you as a DBA? Reading through this controversial entry and its comments, it would seem that some people feel it is nothing more than a title. As a note, I am proud to call myself an Oracle ACE and would hope that the ACE community continues to be held in high regard.

That wraps up this edition of Log Buffer! May your queries be tuned and your memory be plentiful!

Oracle Parameters You’d Love to See

Tuesday, May 1st, 2007

Have you ever just WISHED that a parameter existed in Oracle? I know I have, and old coworkers of mine have long joked about the need for things like the “fix_database.sh” script.

So I ask you, what parameters would you just LOVE to see show up by Oracle 20rr (Really Robust)? I’ll start:

  1. _allow_all_corruption = TRUE|FALSE
  2. make_cbo_work = TRUE|FALSE
  3. ignore_bad_queries = TRUE|FALSE
  4. ignore_user_list = (comma delimited list of users)
  5. auto_create_sr_on_error = TRUE|FALSE

Oh, and also two SQL*Plus commands I’d like:

  1. set alias for select = sleect
  2. set tab complete on

Any other ideas?

SCD: Slowly Changing Dimensions or Schrödinger’s Cat Dilemma?

Friday, April 6th, 2007

In the world of theoretical physics there is a theoretical cat owned by Erwin Schrödinger. This famously fickle feline posed a problem: he could not decide if he was alive or dead.

In the Schrödinger’s Cat thought experiment, a living cat is placed in a box with a nefarious device. This device consists of a Geiger counter, an extremely small radioactive substance, and some acid. The radioactive substance has a 50/50 chance of decaying. If it does, the flask of acid will burst and the cat will die. If it does not, the flask will remain intact and the cat will live. But because the box is closed (and soundproof), we cannot conclude the final state of the cat, and the formulas of quantum mechanics would tell us that the cat is both living and dead.

The original translated text (Erwin Shrödinger, “The Present Situation in Quantum Mechanics,” Naturwissenschaften, 1935) is as follows:

One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer which shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The ψ-function of the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.

There are three popular interpretations of this experiment (don’t worry; we’ll see what this has to do with Oracle soon):

The Copenhagen Interpretation

In this interpretation, the results of the test are not complete until an observation takes place. Measurement, or observation (whether by a conscious source such as a physicist, or a non-conscious source such as a mechanical device) will eliminate one of the two parts of the superposition (|decayed atom, cat dies> + |non-decayed atom, cat lives>), and the wave containing one possibility or the other will collapse, resulting in a single post-observation outcome. Whether the cat is alive or dead inside the box prior to observation is irrelevant; the fact remains that when you look into the box, the cat will either go on to chase theoretical mice or will be given a proper theoretical burial.

The Many Worlds Interpretation

The many worlds interpretation says that at the time an observation occurs, it does not collapse waveforms but decoheres the possible outcomes. Because the physicist (or device, no special relevance is given to conscious beings) has become entangled with the quantum states of the cat, observer-states are formed for both possibilities and multiple universes are spawned of the possible outcomes. That is to say that a universe will exist where the cat is alive and a universe will also exist where the cat is dead. These universes could theoretically spawn from every true-or-false outcome in existence, resulting in an unthinkable multiverse in which all binary possibilities have been played out in one form or another. A true-or-false decision (|alive> or |dead>) would result in two universes, each with the opportunity to spawn later true-or-false decisions and therefore more opportunities to create additional universes. I can’t help but think of Douglas Adams’ Infinite Improbability Drive from Hitchhiker’s Guide to the Galaxy when I think about this interpretation!

The Many Histories Interpretation

This interpretation is something of a mix of the previous two. It agrees with the many worlds interpretation that waveforms do not completely collapse, but disagrees that this will go on to form multiple futures. When an observation/measurement is made, the observer becomes entangled with the state of the cat and decoherence takes place, just as with the many worlds interpretation. However, the state that did not “really” happen is ignored. The waveforms associated with this possibility still exist, but are seen as inaccessible as only one true future can come of any true-or-false situation once an observation is made.

What the hell does this have to do with Oracle?

Oracle is only a computer program, and therefore does exactly what it is coded to do (most of the time). Quantum mechanics really do not seem to have any relevance in the Oracle world because it is coded with specific outcomes to every true-or-false question. Logic gates are applied to bits, and the programmed outcome takes place. Non-programmed outcomes result in errors (ORA-600, ORA-7445).

However, things get more complex when we are modeling new environments because we end up dealing with conceptual outcomes with multiple possibilities. This is especially true in dimensional modeling, because dimensions are supposed to hold factual lookup data. How can we record a fact when important data regarding that fact may change?

For example, imagine a tax company that keeps track of deductions found in their various offices and wishes to form a star schema that will allow past and present analysis. Its (incomplete) star schema may resemble the picture shown here:

Tax Deduction Star Schema

Let CLIENT_DIM record = E. Schrödinger, 3 dependents
Let TIME_DIM record = 2006 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will come together in our fact table to show that he had $25,000 in deductions for this combination of dimensional data. This “fact” is currently safe. During the next year, E. Schrödinger and his wife have a lovely baby boy (at least in this universe!).

Let CLIENT_DIM record = E. Schrödinger, 4 dependents
Let TIME_DIM record = 2007 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will once again come together in our fact table, this time reporting $28,000 in deductions (I’m not a CPA, don’t get me on tax code) due to the extra dependent. This “fact” is currently safe, just as the last one was.

However, we now have a paradox in our data. In 2007, we report $28,000 in deductions, which was based on the fact that E. Schrödinger had four dependents. In 2006, we reported $25,000 in deductions due to E. Schrödinger’s 3 dependents. When we run our analytic reports for 2007, we will get great results; we will be able to break down the deductions and the number of dependents will play a proper role in these calculations. But when we run our reports against 2006, the deduction calculations will not compute properly. The CLIENT_DIM record will show 4 dependents, but the deduction amount for the 2006 year will have been based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over time.

This is known as a slowly changing dimension (or SCD). Though we may not realize it, almost every dimension is in fact slowly changing; stores may move, clients may die (especially if they hang out with sadistic quantum physicists with ready supplies of hydrocyanic acid), and even our human definition of time can change over time (consider the changes to daylight savings this year). However, we don’t have to worry about all of these possible changes; we only have to worry about the ones pertaining to the facts on which we are attempting to report. Our business needs, in the end, determine which dimensions must be slowly changing.

There are three types of slowly changing dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the designer of the star schema eliminate paradox from their dimensional model (just as the three interpretations of the Schrödinger’s Cat thought experiment tries to eliminate the paradox of the living dead).

  • Type 1: Overwrite the old value with the new value and call it a day. This is very useful when dealing with issues such as typos on the client’s name. We don’t care about the history in this case because it was incorrect anyways.
  • Type 2: Create a new record in the dimension with a new primary key. In the example we’ve given, there would be two records in CLIENT_DIM for E. Schrödinger, one in which he has 3 dependents and one in which he has 4. Though he is one person from the business point of view, he is two people from a dimensional point of view.
  • Type 3: Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change. This type of SCD resolution would be beneficial if there is a change that can happen once and only once (such as death).

These three types of SCD resolution usually help in resolving changes to “factual” lookup data. However, we can see clear correlations between these three types of resolutions and the three interpretations of the Schrödinger’s Cat thought experiment!

Bringing it all together

SCD Management Type 1 clearly matches up with the Copenhagen Interpretation of the Schrödinger’s Cat thought experiment. In the Copenhagen Interpretation, the state of the cat changes and all other states are discarded as the waveform collapses. Criticism to this interpretation applies as well to Type 1 SCD resolution. The Copenhagen Interpretation ignores the possibility of reconstruction; in quantum mechanics, it must be possible to return to any original state before measurement was taken place. In our star schema, it also ignores the possibility of reconstruction; we will not be able to return to the original state or even acknowledge that a previous state existed for the purpose of analytics.

SCD Management Type 2 matches with the Many Worlds Interpretation of the Schrödinger’s Cat thought experiment. Instead of completely destroying the other possible waveform, we simply maintain that the two possibilities decohere and form their own universes that will no longer share any correlation. This interpretation shares a similar problem with Type 2 SCD resolution. By spawning new records (universes) as the outcome of a changing event, we create multiple possibilities that no longer share any correlation. For instance, if E. Schrödinger has a new dependent, there will be two resulting rows: one in which has 3 dependents and one in which he has 4. If he then legally changes his last name to Schroedinger, we will have to record that change since it is important for tax records. Because of this, we will have three total records for this one client (E. Schrödinger with 3 dependents, E. Schrödinger with 4 dependents, and E. Schroedinger with 4 dependents). These three records will not have any correlation unless we create some sort of superkey that properly identifies a single person and their many instances. This will be important if we will be doing mining that incorporates multiple times, clients, and locations in our analysis.

SCD Management Type 3 matches with the Many Histories Interpretation of the Schrödinger’s Cat thought experiment. When an observed outcome occurs (like the birth of a new child), the old record is changed to reflect the new “real” outcome; in this case, the addition of a single dependent. However, the change is noted by the addition of a column such as an effective date, to show that this is not the only outcome that has ever existed, but it is THE outcome that does exist at this time. In effect, the old outcome (3 dependents) still exists, but is discarded, ignored, and irretrievable now that the outcomes are decoherent. In our star schema, this type of resolution will only provide us with a confusing result of “This is the case now, but it was not always so. It changed on ….” For some situations, as with Type 1 resolution, this will suffice, such as the death of a client. We only need to record that death once. However, if any new data enters our model after the death of the client (post-mortem taxes?) or if the status changes again (miraculous recovery!), we may have unreliable report output.

Conclusion

Just as a note, I understand that the Schrödinger’s Cat thought experiment and Slowly Changing Dimensions are not a true match. In quantum mechanics, we are calculating multiple events that happen at the same exact point in time (or over a period of unknown time); whereas in a data warehouse we are dealing with history and fully acknowledge the changes time may have on our data. However, one cannot help but notice the correlation between the two forms of paradox and their resolutions. In fact, in data warehousing the so-called Schrödinger’s Cat paradox becomes even more problematic because we are forced to not only predict future outcomes based on dimensional data, but to also report on past/present information based on the same data. Physicists attempting to provide interpretations of this thought experiment only have to worry about future conditions; the past and present are unmeasured, and therefore have no relevance on the problem except that they are in a quantum state. They seek to explain the future of the cat once observation/decoherence have taken place. If Schrödinger’s Cat were in a data warehouse, we would have to analyze the entire life of the cat, the cat inside that horrible box of doom, and try to figure out whether the cat will want dry food, wet food, or a good burial place after the experiment is finished.

Socket Based Licensing for Standard Edition

Thursday, March 29th, 2007

I ran across this article today on Oracle’s new strategy for Standard Edition and Standard Edition One licensing.

As a consultant, there are times when I have to tackle licensing questions, though I usually try to stay away from it and let Oracle’s salespeople do that job.

However, I found this news quite interesting. In order to compete with Microsoft and other database vendors in the small and midsize business market, Oracle has changed their licensing model on Standard Edition and Standard Edition One. Instead of a per-core licensing scheme, licensing for these products is now on a per-socket basis. Not only does this make pricing cheaper on multi-core systems, it also makes things much easier for the accounting department!

Of course, it’s still the same for Enterprise Edition (the article calls it tortuous), but perhaps that will change soon as well. Hey, at least we’re no longer using Power Units!