# Deriving the Borel distribution

So as we developed the power_ssj2008 benchmark, it came to our attention that when we targeted 100% mean load, using a Poisson process, the queue exploded! Actually the queue was simulated, but if you were to calculate the queue length at the end of the 4 minute measurement, it seemed ridiculous! There must have been something wrong! Did our targeting algorithm have an issue? Were we exposing a flaw in the random number generator? What could it be?

I really struggled to write the following, some in the derivation logic, but also in the representation. You can't just type a fraction, and it will suddenly look pretty. At least, that's what I thought at first... Now, I've found out that WordPress.com supports \( \LaTeX \) natively! That, and an editor to clue me in on \( \LaTeX \) syntax \( \implies \) I can talk math on a blog! Yippee!! OK, now jumping back on topic.

## Poisson is not enough — some derivation is required!

Basically, what I have puzzled on previously is a Poisson queuing of tasks of a fixed size, or an M/D/1 queue which is to say Markov (Poisson/Exponential) input, Deterministic (constant) service time, and 1 servers.

However, I like to do things the hard way, so rather than pouring through literature for an example derivation of M/D/1 queuing properties, I'll forge my own path to understanding.
It would be easy if I could just say that I have a Poisson process, so the Poisson distribution describes the behavior.
__Not that simple!__
Adding that queue into the process is a monkey wrench to my dreams of perfect simplicity with Poisson.

I should take a step back from Poisson queuing for a moment — it feels too large to tackle, so I will start with something smaller. This Poisson queue is made up of alternating periods of free time and work time for my worker handling the queue. I can derive a discrete distribution for the busy periods. Each interval of work is an integral multiple of \( t \) (the time to complete just one task). The free intervals could be of any duration. Note that in any actual application of the problem, I need to stick to a consistent unit measure of time, such as seconds.

Define \( \text{Borel}(x) \) as the event of a work period lasting exactly \( x \) tasks in length. What is the distribution of \( \text{Borel}(x) \)?

## Deriving the Borel distribution **\( P(\text{Borel}(x)) \)**

Breaking down a work period, I can formulate a natural partitioning of sub-intervals. Define \( 0 \) as the start of the work period. Then the first "natural" sub-interval would be \( \left [ 0,t \right ] \), because the work will continue that long at least. During the first sub-interval, some number of tasks will be queued (zero or more). Call this number \( K_{1} \).

If \( K_{1} > 0 \), then I will have a second sub-interval of work, namely \( \left [ t, t + K_{1} t \right ] \) and some number of additional tasks queued while working this sub-interval, namely \( K_{2} \).

If \( K_{2} > 0 \), then I can express \( K_{3} \) as the number of tasks queued in \( \left [ t+K_{1}t,t+K_{1}t+K_{2}t \right ] \).

...

Now I have a sequence, but I should make sure my sequence has a beginning and an end.
\( K_{i-1} \) is the *number of tasks worked* in the \( i^{th} \) sub-interval, so I can extend my sequence to include \( K_{0} = 1 \).
Of course, I also defined \( K_{i} \) as the number of queuing events, and so for a finite work period with \( n \) sub-intervals, \( K_{n} = 0 \), (the work period ends when no work is queued in the last sub-interval).

I now have an ascribed definition and meaning for \( K_{i} \) as a sequence of integers. The set of all possible such sequences represents a partition of possible events for my maddening distribution to predict work period sizes.

## Finally I can use Poisson!

Per the Poisson mass distribution, the probability of \( m \) events occurring over duration \( w \) where \( \frac{1}{\lambda} \) is the mean inter-arrival time is given by

Because I have a Poisson process, I know that the number of queuing events in any adjacent, disjoint intervals are independent.
Thankfully, this makes it so I can use a simple product for joining Poisson events to define the likelihood of one of my sequences occurring.
However, there is a logical (non-causal) dependency in the sequence flow —
the size of the interval.
The more queuing events that occur in a sub-interval, the more time I have to queue events for the next sub-interval.
Mathematically, this can be seen when I ~~right~~ write down the probability of \( K_{i+1} \) events queued in the \( i^{th} \) sub-interval as

By independence, the probability of a sequence is given by the product of above over the entire sequence. Given \( n \) sub-intervals, then

## Circling back to \( \text{Borel}(x) \):

\( \text{Borel}(x) \) corresponds to all \( K \) such that

Putting it all together, I sum the probability masses of all the sequences that result in exactly x tasks in a work period, as follows:

## Simplifying:

\( P(\text{Borel}(x)) = e^{-\lambda t x} (\lambda t)^{x-1} \kappa_{x} \) where

Continued, with some abbreviated terms —

...

Stated without proof, it appears I have the following generalization:

OK, now I have a complex summation for the distribution of \( \text{Borel}(x) \), which may be a close cousin to my original problem.

But, how complex is it? Substituting \( \kappa_{x} \) back in, I have

This new equation is shockingly similar to, yet clearly not a Poisson distribution.