SCD: Slowly Changing Dimensions or Schrödinger’s Cat Dilemma?

April 6th, 2007 by The Oracle Alchemist

In the world of theoretical physics there is a theoretical cat owned by Erwin Schrödinger. This famously fickle feline posed a problem: he could not decide if he was alive or dead.

In the Schrödinger’s Cat thought experiment, a living cat is placed in a box with a nefarious device. This device consists of a Geiger counter, an extremely small radioactive substance, and some acid. The radioactive substance has a 50/50 chance of decaying. If it does, the flask of acid will burst and the cat will die. If it does not, the flask will remain intact and the cat will live. But because the box is closed (and soundproof), we cannot conclude the final state of the cat, and the formulas of quantum mechanics would tell us that the cat is both living and dead.

The original translated text (Erwin Shrödinger, “The Present Situation in Quantum Mechanics,” Naturwissenschaften, 1935) is as follows:

One can even set up quite ridiculous cases. A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer which shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The ψ-function of the entire system would express this by having in it the living and dead cat (pardon the expression) mixed or smeared out in equal parts.

There are three popular interpretations of this experiment (don’t worry; we’ll see what this has to do with Oracle soon):

The Copenhagen Interpretation

In this interpretation, the results of the test are not complete until an observation takes place. Measurement, or observation (whether by a conscious source such as a physicist, or a non-conscious source such as a mechanical device) will eliminate one of the two parts of the superposition (|decayed atom, cat dies> + |non-decayed atom, cat lives>), and the wave containing one possibility or the other will collapse, resulting in a single post-observation outcome. Whether the cat is alive or dead inside the box prior to observation is irrelevant; the fact remains that when you look into the box, the cat will either go on to chase theoretical mice or will be given a proper theoretical burial.

The Many Worlds Interpretation

The many worlds interpretation says that at the time an observation occurs, it does not collapse waveforms but decoheres the possible outcomes. Because the physicist (or device, no special relevance is given to conscious beings) has become entangled with the quantum states of the cat, observer-states are formed for both possibilities and multiple universes are spawned of the possible outcomes. That is to say that a universe will exist where the cat is alive and a universe will also exist where the cat is dead. These universes could theoretically spawn from every true-or-false outcome in existence, resulting in an unthinkable multiverse in which all binary possibilities have been played out in one form or another. A true-or-false decision (|alive> or |dead>) would result in two universes, each with the opportunity to spawn later true-or-false decisions and therefore more opportunities to create additional universes. I can’t help but think of Douglas Adams’ Infinite Improbability Drive from Hitchhiker’s Guide to the Galaxy when I think about this interpretation!

The Many Histories Interpretation

This interpretation is something of a mix of the previous two. It agrees with the many worlds interpretation that waveforms do not completely collapse, but disagrees that this will go on to form multiple futures. When an observation/measurement is made, the observer becomes entangled with the state of the cat and decoherence takes place, just as with the many worlds interpretation. However, the state that did not “really” happen is ignored. The waveforms associated with this possibility still exist, but are seen as inaccessible as only one true future can come of any true-or-false situation once an observation is made.

What the hell does this have to do with Oracle?

Oracle is only a computer program, and therefore does exactly what it is coded to do (most of the time). Quantum mechanics really do not seem to have any relevance in the Oracle world because it is coded with specific outcomes to every true-or-false question. Logic gates are applied to bits, and the programmed outcome takes place. Non-programmed outcomes result in errors (ORA-600, ORA-7445).

However, things get more complex when we are modeling new environments because we end up dealing with conceptual outcomes with multiple possibilities. This is especially true in dimensional modeling, because dimensions are supposed to hold factual lookup data. How can we record a fact when important data regarding that fact may change?

For example, imagine a tax company that keeps track of deductions found in their various offices and wishes to form a star schema that will allow past and present analysis. Its (incomplete) star schema may resemble the picture shown here:

Tax Deduction Star Schema

Let CLIENT_DIM record = E. Schrödinger, 3 dependents
Let TIME_DIM record = 2006 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will come together in our fact table to show that he had $25,000 in deductions for this combination of dimensional data. This “fact” is currently safe. During the next year, E. Schrödinger and his wife have a lovely baby boy (at least in this universe!).

Let CLIENT_DIM record = E. Schrödinger, 4 dependents
Let TIME_DIM record = 2007 tax year
Let LOCATION_DIM record = Virginia Beach, VA

The data will once again come together in our fact table, this time reporting $28,000 in deductions (I’m not a CPA, don’t get me on tax code) due to the extra dependent. This “fact” is currently safe, just as the last one was.

However, we now have a paradox in our data. In 2007, we report $28,000 in deductions, which was based on the fact that E. Schrödinger had four dependents. In 2006, we reported $25,000 in deductions due to E. Schrödinger’s 3 dependents. When we run our analytic reports for 2007, we will get great results; we will be able to break down the deductions and the number of dependents will play a proper role in these calculations. But when we run our reports against 2006, the deduction calculations will not compute properly. The CLIENT_DIM record will show 4 dependents, but the deduction amount for the 2006 year will have been based upon 3 dependents. Our dimensional data (CLIENT_DIM record) changed over time.

This is known as a slowly changing dimension (or SCD). Though we may not realize it, almost every dimension is in fact slowly changing; stores may move, clients may die (especially if they hang out with sadistic quantum physicists with ready supplies of hydrocyanic acid), and even our human definition of time can change over time (consider the changes to daylight savings this year). However, we don’t have to worry about all of these possible changes; we only have to worry about the ones pertaining to the facts on which we are attempting to report. Our business needs, in the end, determine which dimensions must be slowly changing.

There are three types of slowly changing dimensions: Type 1, Type 2, and Type 3. Each of these types tries to help the designer of the star schema eliminate paradox from their dimensional model (just as the three interpretations of the Schrödinger’s Cat thought experiment tries to eliminate the paradox of the living dead).

  • Type 1: Overwrite the old value with the new value and call it a day. This is very useful when dealing with issues such as typos on the client’s name. We don’t care about the history in this case because it was incorrect anyways.
  • Type 2: Create a new record in the dimension with a new primary key. In the example we’ve given, there would be two records in CLIENT_DIM for E. Schrödinger, one in which he has 3 dependents and one in which he has 4. Though he is one person from the business point of view, he is two people from a dimensional point of view.
  • Type 3: Overwrite the old value with the new value, and add additional data to the table such as the effective date of the change. This type of SCD resolution would be beneficial if there is a change that can happen once and only once (such as death).

These three types of SCD resolution usually help in resolving changes to “factual” lookup data. However, we can see clear correlations between these three types of resolutions and the three interpretations of the Schrödinger’s Cat thought experiment!

Bringing it all together

SCD Management Type 1 clearly matches up with the Copenhagen Interpretation of the Schrödinger’s Cat thought experiment. In the Copenhagen Interpretation, the state of the cat changes and all other states are discarded as the waveform collapses. Criticism to this interpretation applies as well to Type 1 SCD resolution. The Copenhagen Interpretation ignores the possibility of reconstruction; in quantum mechanics, it must be possible to return to any original state before measurement was taken place. In our star schema, it also ignores the possibility of reconstruction; we will not be able to return to the original state or even acknowledge that a previous state existed for the purpose of analytics.

SCD Management Type 2 matches with the Many Worlds Interpretation of the Schrödinger’s Cat thought experiment. Instead of completely destroying the other possible waveform, we simply maintain that the two possibilities decohere and form their own universes that will no longer share any correlation. This interpretation shares a similar problem with Type 2 SCD resolution. By spawning new records (universes) as the outcome of a changing event, we create multiple possibilities that no longer share any correlation. For instance, if E. Schrödinger has a new dependent, there will be two resulting rows: one in which has 3 dependents and one in which he has 4. If he then legally changes his last name to Schroedinger, we will have to record that change since it is important for tax records. Because of this, we will have three total records for this one client (E. Schrödinger with 3 dependents, E. Schrödinger with 4 dependents, and E. Schroedinger with 4 dependents). These three records will not have any correlation unless we create some sort of superkey that properly identifies a single person and their many instances. This will be important if we will be doing mining that incorporates multiple times, clients, and locations in our analysis.

SCD Management Type 3 matches with the Many Histories Interpretation of the Schrödinger’s Cat thought experiment. When an observed outcome occurs (like the birth of a new child), the old record is changed to reflect the new “real” outcome; in this case, the addition of a single dependent. However, the change is noted by the addition of a column such as an effective date, to show that this is not the only outcome that has ever existed, but it is THE outcome that does exist at this time. In effect, the old outcome (3 dependents) still exists, but is discarded, ignored, and irretrievable now that the outcomes are decoherent. In our star schema, this type of resolution will only provide us with a confusing result of “This is the case now, but it was not always so. It changed on ….” For some situations, as with Type 1 resolution, this will suffice, such as the death of a client. We only need to record that death once. However, if any new data enters our model after the death of the client (post-mortem taxes?) or if the status changes again (miraculous recovery!), we may have unreliable report output.

Conclusion

Just as a note, I understand that the Schrödinger’s Cat thought experiment and Slowly Changing Dimensions are not a true match. In quantum mechanics, we are calculating multiple events that happen at the same exact point in time (or over a period of unknown time); whereas in a data warehouse we are dealing with history and fully acknowledge the changes time may have on our data. However, one cannot help but notice the correlation between the two forms of paradox and their resolutions. In fact, in data warehousing the so-called Schrödinger’s Cat paradox becomes even more problematic because we are forced to not only predict future outcomes based on dimensional data, but to also report on past/present information based on the same data. Physicists attempting to provide interpretations of this thought experiment only have to worry about future conditions; the past and present are unmeasured, and therefore have no relevance on the problem except that they are in a quantum state. They seek to explain the future of the cat once observation/decoherence have taken place. If Schrödinger’s Cat were in a data warehouse, we would have to analyze the entire life of the cat, the cat inside that horrible box of doom, and try to figure out whether the cat will want dry food, wet food, or a good burial place after the experiment is finished.

5 Responses to “SCD: Slowly Changing Dimensions or Schrödinger’s Cat Dilemma?”

  1. John Flack Says:

    You’ve simply modeled this wrong. Number of dependents is not properly part of the Client dimension, it is a fact. As a fact, it has relationships with both the client dimension and the time dimension. Thus no paradox - client has 3 dependents associated with the 2006 tax year and 4 associated with the 2007 tax year.

  2. The Oracle Alchemist Says:

    Okay, granted. Perhaps I shouldn’t have used a numerical amount pertaining to the results as an example; however, the fact (no pun intended) still remains that slowly changing dimensions do exist, and require some resolution. Instead, I could have used the client’s last name or the location’s address as a changeable dimension value, as those could also cause problems with report output.

    Had I filled in the model more (hey, I said it was incomplete) I might have had other examples that I could have used. ;) I understand your point on the number of deductions, but SCD resolution is still a necessity for the reasons I mention in my article.

    I actually had a nagging thought about the same thing you mention after looking at the post a couple times, but I wasn’t sure if I should change it once I had posted it. I appreciate you reading and commenting!

  3. Pythian Group Blog » Log Buffer #40: a Carnival of the Vanities for DBAs Says:

    [...] Karam, the Oracle Alchemist, let the cat out of the bag (or out of the steel chamber perhaps) about indeterminacy in your Oracle databases. The crux of this excellent article is the question, “How can we record a fact when important [...]

  4. TERRY FRASER Says:

    Schrodingers cat scientificly speaking,cat tunneling with branes,

  5. Vijay Says:

    Nice article. Clearly justifies the purpose of the article.
    Moreover learnt something about quantum physics as well

Leave a Reply



Related Posts


Ouch! 65 Oracle Security Vulnerabilities?
According to a report by C|NET, that latest Oracle patch released today fixed 65 bugs, 27 of which could be exploited remotely by anonymous attackers. BE SURE TO APPLY THIS PATCH. Let ...
Hot Backups, Redo, and Fractured Blocks
I received a question from a reader that I figured I'd address here. Question: The oracle documentation tells us that when we put a tablespace in backup mode, the first DML ...
Hey Guys, Does Size Matter?
Last night I was called in on an unfortunate problem. A client has an update that they must run (unfortunately) which updates ~830,000 rows, setting one column equal to ...