So as we developed the power_ssj2008 benchmark, it came to our attention that when we targeted 100% mean load using a Poisson process, the queue exploded! Actually, the queue was simulated, but if you calculated the queue length at the end of the 4-minute measurement, it looked ridiculous! Something must have been wrong! Did our targeting algorithm have an issue? Were we exposing a flaw in the random number generator? What could it be?
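If you want to see the effect for yourself, here is a minimal sketch in Python (not the actual benchmark code; the 1 ms service time is just a stand-in I picked for illustration):

```python
import random

random.seed(42)
t = 0.001          # service time per task: 1 ms (arbitrary choice)
lam = 1.0 / t      # arrival rate tuned so mean load is 100%
duration = 240.0   # the 4-minute measurement window, in seconds

# Lindley recursion: unfinished work just after each Poisson arrival
w, clock = 0.0, 0.0
while clock < duration:
    gap = random.expovariate(lam)  # exponential inter-arrival time
    clock += gap
    w = max(0.0, w - gap) + t      # drain during the gap, then add one task

print(f"backlog at the end: about {w / t:.0f} tasks")
```

At 100% load the backlog behaves like a reflected random walk, so runs of this sketch typically end hundreds of tasks deep. Exactly the kind of "ridiculous" number we were seeing.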

I really struggled to write the following, partly in the derivation logic, but also in the presentation. You can't just type a fraction and have it suddenly look pretty. At least, that's what I thought at first... Now I've found out that WordPress.com supports $ LaTeX $ natively! That, and an editor to clue me in on LaTeX syntax, $ \implies $ I can talk math on a blog! Yippee!! OK, now jumping back on topic.

Poisson is not enough — some derivation is required!

Basically, what I have puzzled over previously is Poisson queuing of tasks of a fixed size, or an M/D/1 queue, which is to say Markov (Poisson/exponential) input, Deterministic (constant) service time, and 1 server.

However, I like to do things the hard way, so rather than poring through the literature for an example derivation of M/D/1 queuing properties, I'll forge my own path to understanding. It would be easy if I could just say that I have a Poisson process, so the Poisson distribution describes the behavior. It's not that simple! Adding that queue into the process throws a monkey wrench into my dreams of perfect simplicity with Poisson.

I should take a step back from Poisson queuing for a moment; it feels too large to tackle, so I will start with something smaller. Monkeys are small, right? :) Enter the monkey distribution (this may already have a name, but I just don't know it). This Poisson queue is made up of alternating periods of free time and work time for my worker handling the queue. Each interval of work lasts an integral multiple of $ t $ (the time to complete just one task). The free intervals could be of any duration. Note that in any actual application of the problem, I need to stick to a consistent unit of time, such as seconds.

Define $ M(x) $ as the event of a work period exactly $ x $ tasks in length. What is the distribution of $ M(x) $?
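Before deriving anything, here is a little sketch of how I picture sampling $ M(x) $: feed Poisson arrivals through a single server with a fixed service time $ t $ and record how many tasks each work period serves. (The function name and parameters are mine, just for illustration.)

```python
import random

def work_periods(lam, t, horizon, seed=0):
    """Simulate Poisson arrivals into one server with fixed service
    time t; return each work period's length in tasks (samples of M(x))."""
    rng = random.Random(seed)
    clock, busy_until, count, periods = 0.0, 0.0, 0, []
    while clock < horizon:
        clock += rng.expovariate(lam)    # next Poisson arrival
        if clock >= busy_until:          # server went idle: period over
            if count:
                periods.append(count)
            count, busy_until = 0, clock
        busy_until += t                  # one more task's worth of work
        count += 1
    return periods                       # last unfinished period is dropped

print(work_periods(lam=0.5, t=1.0, horizon=100.0))
```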

Deriving the Monkey distribution $ P(M(x)) $

Breaking down a work period, I can formulate a natural partitioning of sub-intervals. Define $ 0 $ as the start of the work period. Then the first "natural" sub-interval would be $ \left [ 0,t \right ] $, because the work will continue at least that long. During the first sub-interval, some number of tasks will be queued (zero or more). Call this number $ K_{1} $.

If $ K_{1} > 0 $, then I will have a second sub-interval of work, namely $ \left [ t, t + K_{1} t \right ] $ and some number of additional tasks queued while working this sub-interval, namely $ K_{2} $.
If $ K_{2} > 0 $, then I can express $ K_{3} $ as the number of tasks queued in $ \left [ t+K_{1}t,t+K_{1}t+K_{2}t \right ] $.
...

Now I have a sequence, but I should make sure my sequence has a beginning and an end. $ K_{i-1} $ is the number of tasks worked in the $ i^{th} $ sub-interval, so I can extend my sequence to include $ K_{0} = 1 $. Of course, I also defined $ K_{i} $ as the number of tasks queued during the $ i^{th} $ sub-interval, and so for a finite work period with $ n $ sub-intervals, $ K_{n} = 0 $ (the work period ends when no work is queued in the last sub-interval).

I now have a definition and meaning ascribed to $ K_{i} $ as a sequence of integers. The set of all possible such sequences partitions the possible events for my maddening distribution to predict work period sizes.
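To make the partition concrete, here is a quick sketch that enumerates every valid sequence for a given $ x $ (the names are mine):

```python
def sequences(x):
    """All K = (K_0, ..., K_n) with K_0 = 1, every middle K_i > 0
    (otherwise the period would have ended early), K_n = 0, and
    K_0 + ... + K_{n-1} == x."""
    def extend(partial, remaining):
        if remaining == 0:
            yield partial + (0,)           # close the sequence with K_n = 0
            return
        for k in range(1, remaining + 1):  # next K_i must be positive
            yield from extend(partial + (k,), remaining - k)
    yield from extend((1,), x - 1)

print(list(sequences(4)))
# [(1, 1, 1, 1, 0), (1, 1, 2, 0), (1, 2, 1, 0), (1, 3, 0)]
```

For $ x = 4 $ this produces exactly the four sequences behind the four terms of $ \kappa_{4} $ that show up later.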

Finally I can use Poisson! 

Per the Poisson probability mass function, the probability of $ m $ events occurring over a duration $ w $, where $ \frac{1}{\lambda} $ is the mean inter-arrival time, is given by

$$ Poisson(m, \lambda w) = \frac{e^{-\lambda w}(\lambda w)^{m}}{m!} $$
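As a quick sanity check of how this will get used: the probability that zero tasks arrive during a single service time is

$$ Poisson(0, \lambda t) = e^{-\lambda t} $$

which ought to be (and, at the end of the derivation, will be) exactly $ P(M(1)) $: a work period of one lonely task.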

Because I have a Poisson process, I know that the numbers of queuing events in disjoint intervals (adjacent or not) are independent. Thankfully, this lets me use a simple product to join Poisson events and define the likelihood of one of my sequences occurring. However, there is a logical (non-causal) dependency in the sequence flow: the size of the interval. The more queuing events that occur in a sub-interval, the more time I have to queue events for the next sub-interval. Mathematically, this can be seen when I write down the probability of $ K_{i+1} $ events queued in the $ (i+1)^{th} $ sub-interval (which lasts $ K_{i} t $) as

$$ Poisson(K_{i+1}, \lambda K_{i} t) = \frac{e^{-\lambda K_{i} t}(\lambda K_{i} t)^{K_{i+1}}}{K_{i+1}!} $$

By independence, the probability of a sequence is the product of the above over the entire sequence. With $ n $ sub-intervals, that product is

$$ \prod_{i=0}^{n-1}\frac{e^{-\lambda K_{i} t}(\lambda K_{i} t)^{K_{i+1}}}{K_{i+1}!}= e^{-\lambda t \sum_{i=0}^{n-1}K_{i}} (\lambda t)^{\sum_{i=0}^{n-1}K_{i+1}} \prod_{i=0}^{n-1}\frac{(K_{i})^{K_{i+1}}}{K_{i+1}!} $$
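To see the rearrangement in action on one small case, take the $ x = 3 $ sequence $ K = (1, 2, 0) $: the left-hand product and the right-hand form both give

$$ \frac{e^{-\lambda t}(\lambda t)^{2}}{2!} \cdot \frac{e^{-2\lambda t}(2\lambda t)^{0}}{0!} = e^{-3\lambda t} (\lambda t)^{2} \cdot \frac{1^{2}}{2!} \cdot \frac{2^{0}}{0!} = \frac{e^{-3\lambda t}(\lambda t)^{2}}{2} $$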

Circling back to $ M(x) $:

$ M(x) $ corresponds to all $ K $ such that

$$ x = \sum_{i=0}^{n-1} K_{i} $$

Note also that $ \sum_{i=0}^{n-1} K_{i+1} = x - K_{0} = x - 1 $, since $ K_{0} = 1 $ and $ K_{n} = 0 $. Putting it all together, I sum the probability masses of all the sequences that result in exactly $ x $ tasks in a work period, as follows:

$$ P(M(x)) = \sum_{ K : \sum K_{i} = x } \left ( e^{-\lambda t x}(\lambda t)^{x-1} \prod_{i=0}^{n-1}\frac{(K_{i})^{K_{i+1}}}{K_{i+1}!} \right ) $$

Simplifying:

$ P(M(x)) = e^{-\lambda t x} (\lambda t)^{x-1} \kappa_{x} $ where

$$ \kappa_{x} = \sum_{ K : \sum K_{i} = x } \left (\prod_{i=0}^{n-1}\frac{(K_{i})^{K_{i+1}}}{K_{i+1}!} \right ) $$
$$ \kappa_{1} = 1 = \frac{1^{0}}{0!} $$
$$ \kappa_{2} = 1 = \frac{1^{1}1^{0}}{1!0!} $$
$$ \kappa_{3} = \frac{3}{2} = \frac{1^{1}1^{1}1^{0}}{1!1!0!} + \frac{1^{2}2^{0}}{2!0!} $$
$$ \kappa_{4} = \frac{16}{6} = \frac{1^{1}1^{1}1^{1}1^{0}}{1!1!1!0!} + \frac{1^{1}1^{2}2^{0}}{1!2!0!} + \frac{1^{2}2^{1}1^{0}}{2!1!0!} + \frac{1^{3}3^{0}}{3!0!} $$

Continued, with some abbreviated terms:

$$ \kappa_{5} = \frac{125}{24} = \frac{1^{1}1^{1}1^{1}}{1!1!1!1!} + \frac{1^{1}1^{2}}{1!1!2!} + \frac{1^{2}2^{1}}{1!2!1!} + \frac{1^{3}}{1!3!} + \frac{2^{1}1^{1}}{2!1!1!} + \frac{2^{2}}{2!2!} + \frac{3^{1}}{3!1!} + \frac{1}{4!} $$
$$ \kappa_{6} = \frac{1296}{120} = \frac{1^{1}1^{1}1^{1}1^{1}}{1!1!1!1!1!} + \frac{1^{1}1^{1}1^{2}}{1!1!1!2!} + \frac{1^{1}1^{2}2^{1}}{1!1!2!1!} + \frac{1^{1}1^{3}}{1!1!3!} + \frac{1^{2}2^{1}1^{1}}{1!2!1!1!} + \frac{1^{2}2^{2}}{1!2!2!} + \frac{1^{3}3^{1}}{1!3!1!} + \frac{1^{4}}{1!4!} + \frac{2^{1}1^{1}1^{1}}{2!1!1!1!} + \frac{2^{1}1^{2}}{2!1!2!} + \frac{2^{2}2^{1}}{2!2!1!} + \frac{2^{3}}{2!3!} + \frac{3^{1}1^{1}}{3!1!1!} + \frac{3^{2}}{3!2!} + \frac{4^{1}}{4!1!} + \frac{1}{5!} $$

...

$$ \kappa_{x} = \left \{ 1, 1, \frac{3}{2}, \frac{16}{6}, \frac{125}{24}, \frac{1296}{120}, \frac{16807}{720}, \ldots \right \} $$

Stated without proof, it appears I have the following generalization:

$$ \kappa_{x} = \frac{x^{x-2}}{(x-1)!} = \frac{x^{x-1}}{x!} $$
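I haven't proven it, but a brute-force check is easy enough. Here is a sketch that grinds out $ \kappa_{x} $ with exact fractions and compares it against the conjectured closed form (the enumeration is the same one sketched earlier):

```python
from fractions import Fraction
from math import factorial

def sequences(x):
    """All K = (K_0, ..., K_n): K_0 = 1, middle terms positive, K_n = 0."""
    def extend(partial, remaining):
        if remaining == 0:
            yield partial + (0,)
            return
        for k in range(1, remaining + 1):
            yield from extend(partial + (k,), remaining - k)
    yield from extend((1,), x - 1)

def kappa(x):
    """kappa_x by brute force: sum over sequences of
    prod K_i^{K_{i+1}} / K_{i+1}!"""
    total = Fraction(0)
    for K in sequences(x):
        term = Fraction(1)
        for a, b in zip(K, K[1:]):
            term *= Fraction(a ** b, factorial(b))
        total += term
    return total

for x in range(1, 10):
    assert kappa(x) == Fraction(x ** (x - 1), factorial(x))
print("conjecture holds through x = 9")
```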

OK, now I have a complex summation for the distribution of $ M(x) $, which may be a close cousin to my original problem.

But, how complex is it? Substituting $ \kappa_{x} $ back in, I have

$$ P(M(x)) = \frac{ e^{-\lambda t x} (\lambda t x)^{x-1}}{x!} $$

This new equation is shockingly similar to, yet clearly not, a Poisson distribution.
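And as one last monkey check, here is a sketch comparing the closed form against simulated work periods (it reuses the simulation idea from earlier; $ \lambda t = 0.5 $ is an arbitrary choice of mine, safely below 100% load):

```python
import random
from collections import Counter
from math import exp, factorial

def pmf_M(x, u):
    """The closed form above, with u standing in for lambda * t."""
    return exp(-u * x) * (u * x) ** (x - 1) / factorial(x)

def work_periods(lam, t, horizon, seed=7):
    """Poisson arrivals, fixed service time; work period lengths in tasks."""
    rng = random.Random(seed)
    clock, busy_until, count, periods = 0.0, 0.0, 0, []
    while clock < horizon:
        clock += rng.expovariate(lam)
        if clock >= busy_until:          # idle gap: the period ended
            if count:
                periods.append(count)
            count, busy_until = 0, clock
        busy_until += t
        count += 1
    return periods

u = 0.5                                  # lambda * t
tally = Counter(work_periods(lam=u, t=1.0, horizon=500_000.0))
n = sum(tally.values())
for x in range(1, 6):
    print(x, round(tally[x] / n, 4), round(pmf_M(x, u), 4))
```

The simulated frequencies line up with the formula to within sampling noise, which is about as much reassurance as a brute-force monkey can ask for.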