What Is Queuing Theory

What does Wikipedia say about Queueing Theory

According to Wikipedia Queueing Theory can be defined as, “The mathematical study of waiting lines, or queues. In queueing theory a model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service. Queueing theory has its origins in research by Agner Krarup Erlang when he created models to describe the Copenhagen telephone exchange. The ideas have since seen applications including telecommunication, traffic engineering, computing and the design of factories, shops, offices and hospitals.

Agner Krarup Erlang, a Danish engineer who worked for the Copenhagen Telephone Exchange, published the first paper on what would now be called queueing theory in 1909. He modeled the number of telephone calls arriving at an exchange by a Poisson process and solved the M/D/1 queue in 1917 and M/D/k queueing model in 1920. In Kendall’s notation:

M stands for Markov or memoryless and means arrivals occur according to a Poisson process
D stands for deterministic and means jobs arriving at the queue require a fixed amount of service
k describes the number of servers at the queueing node (k = 1, 2,…). If there are more jobs at the node than there are servers then jobs will queue and wait for service“

You can read more about the basics of Queueing Theory at Wikipedia Link.

Why Do We Need Queueing Theory

As any system gets congested or loaded a queue of sorts starts to build up. This build up of congestion could apply to components within the system and also the system as a whole with queueing building up externally. A good understanding of the service times, the queueing times and the relationship of workload (increase and decrease) is essential to understanding the principles underpinning Queueing Theory. Queueing is part of our every day life i.e. from queueing at ATMs to withdraw money to waiting to be served at the counter when purchasing your cup of freshly brewed coffee every morning. We don’t generally realize how life around us plays out as a system of systems with Queueing Theory applicable to most of them.

Lets now focus on compute infrastructure which is a very common application of Queueing Theory for Systems Performance Engineers. There are various reasons why you need to understand the basics of Queueing Theory here.

Like every system any information technology system has limited resources
Every system provides certain functionality and does so with a given set of service times
As the workload processed by the system goes up the utilization levels of the system go up and queueing at the various resources gradually starts to build up
An increase in queueing does not change the service time for the system but rather increases the amount of wait time at a given resource
The increase in workload causes building up of queues outside the system and potentially at various key resource blocks across the system (depending on how the system is designed)
A perceived increase in response time is noted. This includes a constant “Service Time” and a variable “Wait Time”

So What Do We Do With All This Queueing Models

Thus as you have noted above an understanding of Queueing Theory helps Systems Performance Engineers understand how an increase in user workload within a system can cause an increase in customer wait times and hence overall perceived response times. Armed with this information the System Performance Engineer can do a few things. He or she can:

Tune the system bottlenecks so as to increase transactional throughput and hence delay the impact of queues
Impact the system configuration by increasing the amount of compute resources available for processing. This again will change the behavior of the system
Impact the way the users access the system and prioritize user requests. This allows for processing of higher prioritized user requests first followed by the lower priority ones.
Provide a pool of resources for the system to consume so that an increase in workload triggers an increase in compute resources and avoids building up of queues

There are various ways a Performance Engineer could impact the design of a system with a solid understanding of queues.

Kendalls Notation for Queueing Theory

Before we look into the various Queueing Theory equations let us go over the notation system designed by Kendall to describe a Queueing system.

Let us denote a system by A / B / m / K / n/ D, where

A: distribution function of the inter-arrival times,
B: distribution function of the service times,
m: number of servers,
K: capacity of the system, the maximum number of customers in the system including the one being serviced,
n: population size, number of sources of customers,
D: service discipline.

Exponentially distributed random variables are notated by M, meaning Markovain or memoryless. Furthermore, if the population size and the capacity is infinite, the service discipline is FIFO (First In First Out), then they are omitted.

Hence M/M/1 denotes a system with Poisson arrivals, exponentially distributed service times and a single server. M/G/m denotes an m-server system with Poisson arrivals and generally distributed service times. M/M/r/K/n stands for a system where the customers arrive from a finite-source with n elements where they stay for an exponentially distributed time, the service times are exponentially distributed, the service is carried out according to the requests arrival by r severs, and the system capacity is K.

Kendall’s equations are generally used to describe Queueing Systems making it easy to convey the system configuration using a single equation.

Queueing Theory Equations

Here is a summary of the various key Queueing Theory equations:

U = (S_t * X )/M …[U = System Utilization, S_t = Service Time, X = Throughput, M = No Of Servers/CPU’s]
Q_l = L Q … [Q = Queueing Time, Q_l= Queue Length, L = Arrival Rate]
R_t-cpu(For CPU Systems) = S_t / [1-U^m] …[R_t-cpu=CPU Response Time, S_t=Service Time, U = System Utilization, M = No Of Servers/CPU’s]
R_t-io(For IO Systems) = S_t / [1-U] …[R_t-io=IO Response Time, S_t=Service Time, U = System Utilization]
R_t = S_t + Q_{t …}[ Rt = Response Time, St=Service Time]
L_q = L_sys / Q_n … [ L_q = Arrival Rate Per Queue, L_sys = Total Arrival Rate for the System, Q_n = Number of Queues ]
Ec = ErlangC (m, St, L_q) = [ (M S_t_L_q)^m/ M! ] / [ (1 – m S_t) Sigma_K=0^M=1 [ ((m S_tL_q)^K / K! ) + ] ((m S_tL_q)^M / M! ) ]

The above equations have probably made your head spin a bit and I wouldn’t blame you for that. Not being a mathematician myself it took me a while to piece together the whole picture and work out how to best use the above equations to model performance of the system for increases in workload. For those who want to avoid creating complex models using error prone excel see the option provided below.

Additional Resources

Hope you’ve enjoyed the content in this section at Practical Performance Analyst and have learnt something new. Please help us grow the community by taking a moment and sharing this content with rest of community using your preferred Social Media Platform (links provided below). We are looking for the bright spark and if you think you have what it takes to build and grow this community reach out to me by Sending us an email. Practical Performance Analyst

Trevor Warren is passionate about challenging the status-quo and finding reasons to innovate. Over the past 16 years he has been delivering complex systems, has worked with very large clients across the world and constantly is looking for opportunities to bring about change. Trevor constantly strives to combine his passion for delivering outcomes with his ability to build long lasting professional relationships. You can learn more about the work he does at LinkedIn. You can download a copy of his CV at VisualCV. Visit the Github page for details of the projects he’s been hacking with.