Take the time to have a read, it’s filled with great information, comparisons, and commentary. As always, make sure to note the caveats (near the bottom of the article) and always test new features on a development environment before you use them!
I have personally been enjoying keyword searching CONTEXT indexes for quite some time through its various incarnations as the Context Option, Intermedia Option, and now Oracle Text.
So let’s expand a little bit on Don’s article and talk about some more features; namely, the XML searching capabilities of Oracle Text.
Making It Happen
First, let’s create a table to hold some car data, and insert three rows.
SQL> create table car (id number, make varchar2(20), model varchar2(30), color varchar2(10), description varchar2(255));
Table created.
SQL> insert into car values (1, 'Honda', 'Civic', 'Blue', 'A fun car to drive');
1 row created.
SQL> insert into car values (2, 'Jeep', 'Wrangler', 'Green', 'Better than a Honda');
1 row created.
SQL> insert into car values (3, 'Chevrolet', 'Camaro', 'Black', 'Good when you''re feeling blue');
1 row created.
SQL> commit;
Commit complete.
Note that Car 1 has the color “blue,” but Car 3 has “blue” in the description. Also note that Car 1 has a MAKE of “Honda” but Car 2 has “Honda” in the description.
Now let’s create a table from this data to hold XML for each row:
SQL> create table car_xml as select id, dbms_xmlgen.getxml('select * from car where id = ' || id) xml from car;
Table created.
SQL> desc car_xml
Name Null? Type
----------------------------------------- -------- ----------------------------
ID NUMBER
XML CLOB
SQL> select * from car_xml where rownum < 2;
ID
----------
XML
--------------------------------------------------------------------------------
1
<?xml version="1.0"?>
<ROWSET>
<ROW>
<ID>1</ID>
<MAKE>Honda</MAKE>
<MODEL>Civic</MODEL>
<COLOR>Blue</COLOR>
<DESCRIPTION>A fun car to drive</DESCRIPTION>
</ROW>
</ROWSET>
As you can see, Oracle has assigned column tags for each column of the row.
Lastly, we create the Oracle Text index:
SQL> create index car_xml_text_idx on car_xml(xml) indextype is ctxsys.context parameters ('section group ctxsys.auto_section_group');
Index created.
Note that we have added a parameter to our context index. This parameter is SECTION GROUP, and we have assigned the AUTO_SECTION_GROUP. This section group will take our data from the column and actually parse it internally based on standard HTML style tags.
Querying XML
SQL> select id from car_xml where contains(xml, 'blue') > 0;
ID
----------
1
3
Notice that when we searched for the word "blue," Oracle returned rows 1 and 3. Row 1 has the color blue, while row 3 mentions blue in the description. How can we keep searching all columns when we want to, but narrow our search as needed? Search WITHIN.
SQL> select id from car_xml where contains(xml, 'blue within color') > 0;
ID
----------
1
Notice that now when we use the CONTAINS clause, we specify that "blue" must appear "within color." This instructs Oracle to ONLY return rows where the word "blue" can be found between begin and end COLOR tags.
To illustrate this example once more, we'll try "honda."
SQL> select id from car_xml where contains(xml, 'honda') > 0;
ID
----------
1
2
SQL> select id from car_xml where contains(xml, 'honda within make') > 0;
ID
----------
1
Once again, we have narrowed our search with a WITHIN cause. Sometimes, it is very handy to be able to search an entire table's contents for a keyword, while other times drilling down is much more appropriate.
You may have also noticed that the Oracle Text index is case insenstive. Though I searched for 'honda' in lowercase, it picked up 'Honda' in uppercase in both columns.
Better still, there are many more features included with Oracle Text such as synonym searching, preferred terms, fuzzy searches, highlighting, and more, but we will save those for another post.
It has been a long time since I have posted on here (work and all, sorry about that), and so when I was asked to do another Log Buffer, it sounded like a great way to get back into the swing of things.
Not to say work is light; on the contrary, I couldn’t be busier! The sometimes wild world of database technology has literally been exploding. This brings us, of course, to the news.
The top news on the minds of most Oracle DBAs is the release of Oracle 11g for Linux. This action packed SQL to Oracle 10g (get it?) includes some amazing tools, such as the Query Result Cache. The Query Result Cache is a new area of the shared pool that speeds up often-run queries tremendously by caching their results. However, all good things come at a price, as Alex Fatkulin points out. It will be interesting to see how this feature pans out once more people get their hands on it.
Continuing on with the Oracle 11g theme, Laurent Schneider blogs about an outstanding feature in Oracle 11g that logs the errors produced by scripts that you have run. Be sure to check it out, even if you think your code is perfect. I couldn’t begin to cound the times this would have come in handy.
Lastly on 11g (for now), Jeremy Schneider has a great post that gives an overview on many of the new features included in the latest release. Even if you have been working with 11g for a while now, this is a great recap from the past week of the new functionality at your fingertips.
When does revoking a permission grant a permission? When it’s MySQL! This post, also by the MySQL DBA that was once an Oracle DBA, talks about how permissions work in MySQL, and what to watch out for when you plan your security structure.
If you are interested in using BLOB objects in multiple engines in MySQL, have a look at this article by MooCow Productions, which shows some testing on performance depending on the engine you choose.
I think that last time I did a log buffer, I posted something about dates and times in SQL Server. Once again another great article on the same topic has crossed my desk (browser), and this time it is an excellent article on using the DATETIME datatype for more robust functionality from Jeff Smith.
If you are on PostgreSQL waiting on your posts, you’ll be happy to see some tips and tricks by depesz that will help you bring back your queries in the order you are hoping for, which is not always the order that you get!
And in the interest of multi-database posting, let’s not forget DB2! In fact, if you are interested in DB2 you can read this article and join up with other DB2 users on Facebook, as suggested by Chris Eaton.
And to wrap all this up with some highly political debate, Howard Rodgers posts about a topic that may be of interest to some in the Oracle Community: is the Oracle ACE program devalued? Does being an Oracle ACE really prove anything about you as a DBA? Reading through this controversial entry and its comments, it would seem that some people feel it is nothing more than a title. As a note, I am proud to call myself an Oracle ACE and would hope that the ACE community continues to be held in high regard.
That wraps up this edition of Log Buffer! May your queries be tuned and your memory be plentiful!
Have you ever just WISHED that a parameter existed in Oracle? I know I have, and old coworkers of mine have long joked about the need for things like the “fix_database.sh” script.
So I ask you, what parameters would you just LOVE to see show up by Oracle 20rr (Really Robust)? I’ll start:
_allow_all_corruption = TRUE|FALSE
make_cbo_work = TRUE|FALSE
ignore_bad_queries = TRUE|FALSE
ignore_user_list = (comma delimited list of users)
In the world of theoretical physics there is a theoretical cat owned by Erwin Schrödinger. This famously fickle feline posed a problem: he could not decide if he was alive or dead.
In the Schrödinger’s Cat thought experiment, a living cat is placed in a box with a nefarious device. This device consists of a Geiger counter, an extremely small radioactive substance, and some acid. The radioactive substance has a 50/50 chance of decaying. If it does, the flask of acid will burst and the cat will die. If it does not, the flask will remain intact and the cat will live. But because the box is closed (and soundproof), we cannot conclude the final state of the cat, and the formulas of quantum mechanics would tell us that the cat is both living and dead.
One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer which shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The ψ-function of the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.
There are three popular interpretations of this experiment (don’t worry; we’ll see what this has to do with Oracle soon):
The Copenhagen Interpretation
In this interpretation, the results of the test are not complete until an observation takes place. Measurement, or observation (whether by a conscious source such as a physicist, or a non-conscious source such as a mechanical device) will eliminate one of the two parts of the superposition (|decayed atom, cat dies> + |non-decayed atom, cat lives>), and the wave containing one possibility or the other will collapse, resulting in a single post-observation outcome. Whether the cat is alive or dead inside the box prior to observation is irrelevant; the fact remains that when you look into the box, the cat will either go on to chase theoretical mice or will be given a proper theoretical burial.
The Many Worlds Interpretation
The many worlds interpretation says that at the time an observation occurs, it does not collapse waveforms but decoheres the possible outcomes. Because the physicist (or device, no special relevance is given to conscious beings) has become entangled with the quantum states of the cat, observer-states are formed for both possibilities and multiple universes are spawned of the possible outcomes. That is to say that a universe will exist where the cat is alive and a universe will also exist where the cat is dead. These universes could theoretically spawn from every true-or-false outcome in existence, resulting in an unthinkable multiverse in which all binary possibilities have been played out in one form or another. A true-or-false decision (|alive> or |dead>) would result in two universes, each with the opportunity to spawn later true-or-false decisions and therefore more opportunities to create additional universes. I can’t help but think of Douglas Adams’ Infinite Improbability Drive from Hitchhiker’s Guide to the Galaxy when I think about this interpretation!
The Many Histories Interpretation
This interpretation is something of a mix of the previous two. It agrees with the many worlds interpretation that waveforms do not completely collapse, but disagrees that this will go on to form multiple futures. When an observation/measurement is made, the observer becomes entangled with the state of the cat and decoherence takes place, just as with the many worlds interpretation. However, the state that did not “really” happen is ignored. The waveforms associated with this possibility still exist, but are seen as inaccessible as only one true future can come of any true-or-false situation once an observation is made.
What the hell does this have to do with Oracle?
Oracle is only a computer program, and therefore does exactly what it is coded to do (most of the time). Quantum mechanics really do not seem to have any relevance in the Oracle world because it is coded with specific outcomes to every true-or-false question. Logic gates are applied to bits, and the programmed outcome takes place. Non-programmed outcomes result in errors (ORA-600, ORA-7445).
However, things get more complex when we are modeling new environments because we end up dealing with conceptual outcomes with multiple possibilities. This is especially true in dimensional modeling, because dimensions are supposed to hold factual lookup data. How can we record a fact when important data regarding that fact may change?
For example, imagine a tax company that keeps track of deductions found in their various offices and wishes to form a star schema that will allow past and present analysis. Its (incomplete) star schema may resemble the picture shown here:
Let CLIENT_DIM record = E. Schrödinger, 3 dependents
Let TIME_DIM record = 2006 tax year
Let LOCATION_DIM record = Virginia Beach, VA
The data will come together in our fact table to show that he had $25,000 in deductions for this combination of dimensional data. This “fact” is currently safe. During the next year, E. Schrödinger and his wife have a lovely baby boy (at least in this universe!).
Let CLIENT_DIM record = E. Schrödinger, 4 dependents
Let TIME_DIM record = 2007 tax year
Let LOCATION_DIM record = Virginia Beach, VA
The data will once again come together in our fact table, this time reporting $28,000 in deductions (I’m not a CPA, don’t get me on tax code) due to the extra dependent. This “fact” is currently safe, just as the last one was.
However, we now have a paradox in our data. In 2007, we report $28,000 in deductions, which was based on the fact that E. Schrödinger had four dependents. In 2006, we reported $25,000 in deductions due to E. Schrödinger’s 3 dependents. When we run our analytic reports for 2007, we will get great results; we will be able to break down the deductions and the number of dependents will play a proper role in these calculations. But when we run our reports against 2006, the deduction calculations will not compute properly. The CLIENT_DIM record will show 4 dependents, but the deduction amount for the 2006 year will have been based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over time.
This is known as a slowly changing dimension (or SCD). Though we may not realize it, almost every dimension is in fact slowly changing; stores may move, clients may die (especially if they hang out with sadistic quantum physicists with ready supplies of hydrocyanic acid), and even our human definition of time can change over time (consider the changes to daylight savings this year). However, we don’t have to worry about all of these possible changes; we only have to worry about the ones pertaining to the facts on which we are attempting to report. Our business needs, in the end, determine which dimensions must be slowly changing.
There are three types of slowly changing dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the designer of the star schema eliminate paradox from their dimensional model (just as the three interpretations of the Schrödinger’s Cat thought experiment tries to eliminate the paradox of the living dead).
Type 1: Overwrite the old value with the new value and call it a day. This is very useful when dealing with issues such as typos on the client’s name. We don’t care about the history in this case because it was incorrect anyways.
Type 2: Create a new record in the dimension with a new primary key. In the example we’ve given, there would be two records in CLIENT_DIM for E. Schrödinger, one in which he has 3 dependents and one in which he has 4. Though he is one person from the business point of view, he is two people from a dimensional point of view.
Type 3: Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change. This type of SCD resolution would be beneficial if there is a change that can happen once and only once (such as death).
These three types of SCD resolution usually help in resolving changes to “factual” lookup data. However, we can see clear correlations between these three types of resolutions and the three interpretations of the Schrödinger’s Cat thought experiment!
Bringing it all together
SCD Management Type 1 clearly matches up with the Copenhagen Interpretation of the Schrödinger’s Cat thought experiment. In the Copenhagen Interpretation, the state of the cat changes and all other states are discarded as the waveform collapses. Criticism to this interpretation applies as well to Type 1 SCD resolution. The Copenhagen Interpretation ignores the possibility of reconstruction; in quantum mechanics, it must be possible to return to any original state before measurement was taken place. In our star schema, it also ignores the possibility of reconstruction; we will not be able to return to the original state or even acknowledge that a previous state existed for the purpose of analytics.
SCD Management Type 2 matches with the Many Worlds Interpretation of the Schrödinger’s Cat thought experiment. Instead of completely destroying the other possible waveform, we simply maintain that the two possibilities decohere and form their own universes that will no longer share any correlation. This interpretation shares a similar problem with Type 2 SCD resolution. By spawning new records (universes) as the outcome of a changing event, we create multiple possibilities that no longer share any correlation. For instance, if E. Schrödinger has a new dependent, there will be two resulting rows: one in which has 3 dependents and one in which he has 4. If he then legally changes his last name to Schroedinger, we will have to record that change since it is important for tax records. Because of this, we will have three total records for this one client (E. Schrödinger with 3 dependents, E. Schrödinger with 4 dependents, and E. Schroedinger with 4 dependents). These three records will not have any correlation unless we create some sort of superkey that properly identifies a single person and their many instances. This will be important if we will be doing mining that incorporates multiple times, clients, and locations in our analysis.
SCD Management Type 3 matches with the Many Histories Interpretation of the Schrödinger’s Cat thought experiment. When an observed outcome occurs (like the birth of a new child), the old record is changed to reflect the new “real” outcome; in this case, the addition of a single dependent. However, the change is noted by the addition of a column such as an effective date, to show that this is not the only outcome that has ever existed, but it is THE outcome that does exist at this time. In effect, the old outcome (3 dependents) still exists, but is discarded, ignored, and irretrievable now that the outcomes are decoherent. In our star schema, this type of resolution will only provide us with a confusing result of “This is the case now, but it was not always so. It changed on ….” For some situations, as with Type 1 resolution, this will suffice, such as the death of a client. We only need to record that death once. However, if any new data enters our model after the death of the client (post-mortem taxes?) or if the status changes again (miraculous recovery!), we may have unreliable report output.
Conclusion
Just as a note, I understand that the Schrödinger’s Cat thought experiment and Slowly Changing Dimensions are not a true match. In quantum mechanics, we are calculating multiple events that happen at the same exact point in time (or over a period of unknown time); whereas in a data warehouse we are dealing with history and fully acknowledge the changes time may have on our data. However, one cannot help but notice the correlation between the two forms of paradox and their resolutions. In fact, in data warehousing the so-called Schrödinger’s Cat paradox becomes even more problematic because we are forced to not only predict future outcomes based on dimensional data, but to also report on past/present information based on the same data. Physicists attempting to provide interpretations of this thought experiment only have to worry about future conditions; the past and present are unmeasured, and therefore have no relevance on the problem except that they are in a quantum state. They seek to explain the future of the cat once observation/decoherence have taken place. If Schrödinger’s Cat were in a data warehouse, we would have to analyze the entire life of the cat, the cat inside that horrible box of doom, and try to figure out whether the cat will want dry food, wet food, or a good burial place after the experiment is finished.
I ran across this article today on Oracle’s new strategy for Standard Edition and Standard Edition One licensing.
As a consultant, there are times when I have to tackle licensing questions, though I usually try to stay away from it and let Oracle’s salespeople do that job.
However, I found this news quite interesting. In order to compete with Microsoft and other database vendors in the small and midsize business market, Oracle has changed their licensing model on Standard Edition and Standard Edition One. Instead of a per-core licensing scheme, licensing for these products is now on a per-socket basis. Not only does this make pricing cheaper on multi-core systems, it also makes things much easier for the accounting department!
Of course, it’s still the same for Enterprise Edition (the article calls it tortuous), but perhaps that will change soon as well. Hey, at least we’re no longer using Power Units!
I’m not a big fan of ignoring errors with OUI (we all have our horror stories), and so when I ran across this one I was loathe to ignore or cancel:
Error in invoking target 'all_no_orcl ipc_g' of makefile
Now, an experienced Oracle installer/patcher/troubleshooter/slave knows full well that most of these problems are compiler related, and this one was no different. I was patching a 10.2.0.1.0 RAC installation to 10.2.0.3.0.
To fix the problem, simply go to Oracle’s Open Source Downloads and download the latest binutils release. Install it with the following command:
Once installed, click “retry” on your OUI screen and everything should be taken care of! However, if this is a RAC system, don’t forget to install binutils on both nodes!
If you DID ignore the error and kept going with the patch, you’ll probably find that ASM does not work. To fix this, you will have to install binutils, relink all, and find a good working copy of $ORACLE_HOME/lib/libskgxp10.so from another Oracle installation to replace the one on your patched system.
The moral of the story is, of course, to fix problems before continuing whenever possible.
Errors During CATUPGRD.SQL
While I’m at it, another problem I found was during the execution of the CATUPGRD.SQL script that must be run after installing the Oracle software. I received the following error:
SELECT version_script AS file_name FROM DUAL
*
ERROR at line 1:
ORA-06575: Package or function VERSION_SCRIPT is in an invalid state
Thankfully, this was an easy enough problem to fix. Simply change some initialization parameters:
alter system set plsql_code_type string = INTERPRETED
alter system set plsql_compiler_flags string = INTERPRETED, NON_DEBUG
Rerun ?/rdbms/admin/catupgrd.sql
Or, if you really want to use native PL/SQL compilation, just set your plsql_native_library_dir and then rerun the script.
One Last Note
Make SURE you set your JAVA_POOL_SIZE and SHARED_POOL_SIZE to at least 150M before running catupgrd.sql! If you do not have a large enough shared pool, you will get errors during the install. You may be able to workaround these by flushing the shared pool (alter system flush shared_pool), but you’re better off doing it right the first time.
If you’ve ever seen the movie The Shining, you have to see this trailer!
The story, so I hear, is that a film student did this for a contest called Trailer Park. The goal is to take a movie and create a trailer for that movie that makes it seem like a completely different genre. Whatever the case, it is absolutely excellent!
Hello everyone…I know I haven’t posted in quite a while, and I’m not about to start now. Things have been extremely busy lately, starting with my 10g RAC course aboard the Carnival Glory on a 7-day Caribbean cruise (more on that later).
So I will be posting again soon, and hope to have some feedback on RMOUG (where I will be presenting three topics this week) and other things.
Until then, read the Tao of Oracle by Roby Sherman. It’s probably the single most hilarious piece of Oracle literature I’ve ever read!
I was speaking at a small User Group meeting yesterday. A crowd of about 20 was sitting at tables in a “U” configuration. About 70% of the way through the presentation, I asked the fateful question.
Me: “How many of you are DBAs?”
To my surprise, the right side of the room raised their hands. Roughly 10 people, without a single gap in the line.
Me: “How many of you are Developers?”
And the left side raised their hands, roughly 10 people, no gaps, all developers.
How the hell did the DBAs and Developers sit apart like that without any planning? They didn’t ALL know each other, they just happened to sit in such a way that they were 1) apart from each other, and 2) facing each other with the battlefield in between.
We all had a good laugh about it, but it still made me think. Is there some fundamental subconscious difference, some sort of projected vibe, between DBAs and Developers? We fight so often, but that could be easily attributed to the fact that we rely on each other to do a good job. I even did a humorous presentation/discussion on it once, an epic fairy tale explaining the peril DBAs and Developers put a company in with their bickering. You can download it here if you like.
But if it’s just the fact we work so closely yet compete, like siblings, how the heck did they all manage to sit part like that? It reminds me of the “group mind” concept from various science fiction and fantasy books (like the Borg). We are the IT Conglomerate, Developers to the left, DBAs to the right, no further communication is required, thank you.