Markov Models for Simple Systems (Up/Down) and Interpretation

Now, consider the previous problem, approached as a two-state Markov Chain (References 4, 5, 6, and 7). Here we monitor the "status" of the system at time T, denoted X(T), instead of its "availability" A(T). Denote State 0 (Down) and State 1 (Up), and assess the status X(T) of your system S every hour (T = 0,1, ...). Hence, X(T) = 0 means that system S was Down at time T and X(T) = 1, that system S was Up at time T. We are interested in studying how the System S develops (transitions) over time. That is, we want to know what is the probability q (or p) that system S is Up (or Down) at time T, given that it was Down (or Up) an hour earlier (at time T - 1). We represent this problem using the Markov Chain state diagram shown in Figure 2.

Figure 2. State Diagram for System S

The transition probabilities are:

p01 = P{X(T) = 1 | X(T - 1) = 0} = q
p10 = P{X(T) = 0 | X(T - 1) = 1} = p
p00 = 1 - q
p11 = 1 - p

For example, let system S be Down at time T - 1. Then, either the system is Up at time T, with probability q, or it is Down, with probability 1 - q. If the system was Up at T - 1, then it is either Down at T, with probability p, or Up at T, with probability 1 - p. This is because there are only two possibilities (Up or Down) for S at any given time T, so the two state probabilities must sum to unity. Let's analyze this situation further.

Each time unit (hour) through which system S transitions can be considered an independent trial, and the probability pij of moving from state i to the other state j, as the probability of "success". For example, let system S be in state Up. Then, moving to state Down in one step with probability p10 = p = 0.002 yields a Geometric distribution with Mean μ = 1/p = 500 hours. If, instead, system S is in state Down, then moving to state Up in one step (hour), with probability p01 = q = 0.033, yields a Geometric distribution with Mean μ = 1/q = 30 hours. These are the same parameters as in the "realistic" example of the previous section.
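As an illustration, the Geometric means above can be checked both analytically and by simulating the hourly Bernoulli trials. This is a minimal Python sketch (the variable names and the simulation setup are ours, not from the original):

```python
import random

p = 0.002  # hourly P(Up -> Down) from the example
q = 0.033  # hourly P(Down -> Up) from the example

# Analytic means of the Geometric distributions (trials until first "success")
mean_up_time = 1 / p    # expected hours spent Up before going Down (500)
mean_down_time = 1 / q  # expected hours spent Down before repair (about 30)

def hours_until_transition(prob, rng):
    """Count hourly Bernoulli trials until the first state change."""
    hours = 1
    while rng.random() >= prob:
        hours += 1
    return hours

# Monte Carlo check of the Up -> Down mean
rng = random.Random(1)
samples = [hours_until_transition(p, rng) for _ in range(20000)]
avg = sum(samples) / len(samples)
print(mean_up_time, round(mean_down_time, 1), round(avg))
```

The simulated average should fall close to the analytic 500-hour mean, up to sampling noise.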

The Geometric distribution is the discrete counterpart of the continuous Exponential and, as the units of time T become smaller (hours to minutes, seconds, etc.), the two distributions converge. Therefore, this numerical example is equivalent to the one given in the previous section, which used similar time parameters, and will serve as a vehicle for comparison and contrast.
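To see this convergence numerically, the following sketch (using the example's p = 0.002) compares the Geometric survival probability (1 - p)^t with its Exponential counterpart e^(-pt); for small per-step probabilities the two are nearly indistinguishable:

```python
import math

p = 0.002  # hourly Up -> Down transition probability from the example

# Survival probabilities after t hours: Geometric (discrete hourly trials)
# versus Exponential with rate p per hour; they nearly coincide for small p.
for t in (10, 100, 500):
    geometric = (1 - p) ** t        # P(no transition in t hourly trials)
    exponential = math.exp(-p * t)  # continuous-time counterpart
    print(t, round(geometric, 4), round(exponential, 4))
```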

One important property of Markov Chains is their "lack of Memory." This means that only the system status at the immediately preceding time has any bearing on the status at the current time; all earlier history goes into oblivion. In addition, these Markov Chains are time homogeneous (the transition probabilities pij do not change over time). Hence, it is enough to know the "one-step state transition probabilities", i.e., the probabilities pij of going from any state "i" to any other state "j" in one step, to solve these problems.

Markov Chains can be represented by a "Transition Probability Matrix" P, where rows represent every state the system can be in at time T, and columns represent every state it can move to in one step (i.e., where it will be at T + 1). The entries pij of matrix P correspond to the Markov Chain's one-step transition probabilities and must sum to unity on every matrix row. For our numerical example, the Transition Probability matrix P is:

States      0        1               States      0        1
  0     ( 1 - q      q   )    =        0     ( 0.967   0.033 )
  1     (   p      1 - p )             1     ( 0.002   0.998 )
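For readers who want to experiment, matrix P can be built, for example, with NumPy (our choice of tool, not the original's); the row-sum check below enforces the "sum to unity" requirement:

```python
import numpy as np

p, q = 0.002, 0.033  # transition probabilities from the example

# One-step transition probability matrix P
# (row/column 0 = Down, row/column 1 = Up)
P = np.array([[1 - q, q],
              [p, 1 - p]])

# Every row must sum to unity, since the system must land in some state
assert np.allclose(P.sum(axis=1), 1.0)
print(P)
```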

If we need the probabilities of moving from one state to any other in two steps, we raise matrix P to the second power. For example, moving from state Up to Down in two steps entails either moving from Up to Down in the first step and remaining Down for another step, or first remaining Up for one step and then moving from Up to Down in the second step. In matrix language, this is the (Up, Down) entry of P2: p10(2) = p10 p00 + p11 p10.

In our example, the p10(2) result provides the probability that system S is Down, if it was initially Up, after operating for two hours (T = 2): p10(2) = p(1 - q) + (1 - p)p ≈ 0.004. The p11(2) result (the probability that S is Up after 2 hours, given that it started Up) can be obtained as one minus the probability that S is Down after 2 hours: p11(2) = 1 - p10(2). We can then interpret p11(2) = A(T) ≈ 0.996 as the system Availability, after T = 2 hours of operation, if it started in state Up (at T = 0).

To obtain the probability of moving from one state to another in "n" steps, we raise the matrix P to the "nth" power (Pn). For example, the probability that S is Down after T = 10 hours (steps), if it was initially Up, is p10(10) = 0.017 (this accounts for the possibility that S went Down and was restored, perhaps more than once, during those T = 10 hours).
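A quick way to reproduce these numbers is to raise P to the required power, e.g. with NumPy's matrix_power (a sketch under the same state labeling, 0 = Down and 1 = Up):

```python
import numpy as np

p, q = 0.002, 0.033  # transition probabilities from the example
P = np.array([[1 - q, q],      # state 0 = Down
              [p, 1 - p]])     # state 1 = Up

# Two-step and ten-step transition matrices
P2 = np.linalg.matrix_power(P, 2)
P10 = np.linalg.matrix_power(P, 10)

# p10(2): probability of Up -> Down in two steps, i.e. p(1 - q) + (1 - p)p
print(round(P2[1, 0], 4))   # ≈ 0.0039
# p10(10): probability of being Down after 10 hours, starting Up
print(round(P10[1, 0], 3))  # ≈ 0.017
```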

For a sufficiently large "n", matrix Pn yields nearly identical rows. These results are interpreted as "long run averages" or limiting probabilities "pi" of S being in the state corresponding to column "i". They are similar to the ones obtained using the Expected Availability and Unavailability and the variable transformations approaches. To obtain these limiting probabilities (i.e., to calculate Pn) we need a practical result. For any two-state (e.g., Up, Down) system, such as the one described above, Pn can be decomposed as:

Pn = [1/(p + q)] × ( p   q )  +  [(1 - p - q)^n/(p + q)] × (  q   -q )
                   ( p   q )                               ( -p    p )

Then, for a sufficiently large "n", the second term goes to zero (because |1 - p - q| < 1) and the matrix Pn reduces to:

Pn ≈ [1/(p + q)] × ( p   q )
                   ( p   q )

that is, every row becomes (p/(p + q), q/(p + q)).

Verify, for our numerical example, that the probability of being in state Up at any arbitrary (large) time T is q / (p + q) = 0.943, and the probability of being in state Down is p / (p + q) = 0.057. These two "state occupancy rates", E(X) and E(Y), can also be interpreted as the percentage of time that system S will spend in states Up and Down, respectively.
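These limiting probabilities can be verified numerically: for large n, every row of Pn collapses to (p/(p + q), q/(p + q)). A short sketch:

```python
import numpy as np

p, q = 0.002, 0.033
P = np.array([[1 - q, q],      # state 0 = Down
              [p, 1 - p]])     # state 1 = Up

pi_down = p / (p + q)  # long-run fraction of time Down
pi_up = q / (p + q)    # long-run fraction of time Up

# For large n, both rows of P^n approach (pi_down, pi_up)
Pn = np.linalg.matrix_power(P, 1000)
print(round(pi_up, 3), round(pi_down, 3))  # 0.943 0.057
assert np.allclose(Pn, [[pi_down, pi_up], [pi_down, pi_up]], atol=1e-9)
```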