Load Series: Throttling vs Loadshedding

In the previous post in this series we looked at how to reason about load and overload in a system.

The way to think about building systems hardened against overload is to place an upper bound on the amount of load you allow into your system at a given point in time. In the graph below the yellow line indicates the load threshold above which our system availability degrades. The blue line represents a safe upper bound we want to introduce – we want to start rejecting work if the load in our system reaches this threshold.

The naive recommendation we made was to use token bucket throttling to restrict our inbound tps to 38 tps. This allowed us to draw the red line shown in the graph below.

Screen Shot 2019-05-11 at 8.12.50 PM

This would protect our service from exceeding the safe upper bound of load in our system, but it is also very conservative, because it catered for the pathologically worst scenario we could encounter. In reality, the percentile distribution of cost per unit of work will vary so the average cost per unit of inbound work will fall somewhere between the steepest curve and flattest curve. If we configured our system to start rejecting work when we encounter the red line, we will lose out on utilising the headroom we have to fulfill cheaper requests. This unutilised capacity is indicated in the red and blue triangles in the graph above.

What we actually want to do is slide that vertical red line to the right to create a safe upper bound for tps, and introduce a separate horizontal red line to restrict overload.

Screen Shot 2019-05-11 at 8.12.50 PM copy

Intuitively this makes sense, but what does that mean in practice?

Throttling vs Loadshedding

Throttling or Rate Limiting refers to controlling the rate of traffic you allow into your system. The intent is to control either the transactions per second entering your system or the number of transactions concurrently in flight within your system at a given point in time.

Loadshedding refers to controlling how much load you allow in your system at a given point in time. The intent is to prevent overload.

Remember in the initial post of the series I told you about a failure mode where we saw a massive spike in load in the system while maintaining constant tps? In that scenario only throttling inbound requests would not have sufficiently protected us against overload. We would have needed additional loadshedding strategies to recover from overload.


There are a few ways we can introduce throttling, and a few reasons why we would want to do so.

One of the most popular throttling mechanisms is the token bucket algorithm. See here for a nice example on how Stripe.com uses Redis to implement a shared token bucket in a distributed environment. In summary, we control rate of access to our system by handing out tokens from a shared bucket which is replenished with new tokens at a constant rate. The rate at which we can hand out these tokens is dictated by the rate of replenishment. It is designed to accommodate small bursts of traffic, but slow down larger, more sustained spikes in load.
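To make the mechanism concrete, here is a minimal single-process sketch of a token bucket. It is not the Redis-backed shared bucket mentioned above; the class name, the parameters and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Illustrative single-process token bucket (not a distributed one)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens replenished per second
        self.capacity = capacity        # largest burst the bucket can absorb
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so the sketch is testable
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; False means 'throttle this request'."""
        now = self.clock()
        # Replenish at a constant rate, never beyond the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity is what lets a small burst through, while the replenishment rate is what caps sustained traffic.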

Overall inbound traffic throttling

We know that there is a definitive upper bound to the request tps our service can handle. That vertical red line we moved to the far right in the graph above can be implemented by configuring a maxConns limit per host on your load balancer, or (preferably AND) using token bucket throttling to rate limit inbound HTTP request tps. (Note, there is a distinction between maxConns and max HTTP requests. More to follow in a future post.) This can help protect you against a DDoS attack, for example.

Concurrent Throttling

It is good practice to reason about how many concurrent units of work can be in flight in your system at a given point in time. Subtleties that can inform this are thread count and db connection pool size. It doesn’t make sense to accept more requests into your system than you have worker threads to execute. All you will be achieving is creating bottlenecks in your system where in flight work starts backing up and waiting for resources, leading to higher request latency.

Imagine our supermarket deli counter. If we only have three people who can hand out salami, it is inevitable that a queue will start forming. The queue might not move as slowly as our checkout counter queue, but the latency that customers spend in the shop will increase.

You can implement this by putting an interceptor in your HTTP Request Handler stack which has a local ‘bucket counter’. Every time an HTTP request enters your system, it decreases the counter. Every time an HTTP request exits the system it increases the counter. If the counter is zero it returns a 503 error.
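A rough sketch of that interceptor could look like the following. The `ConcurrencyLimiter` class and the `handle` wrapper are hypothetical names, and the handler is assumed to be a plain function rather than any particular web framework's API.

```python
import threading


class ConcurrencyLimiter:
    """Local 'bucket counter' of free slots for in flight requests."""

    def __init__(self, max_in_flight):
        self._free = max_in_flight
        self._lock = threading.Lock()

    def try_enter(self):
        # A request entering the system decreases the counter.
        with self._lock:
            if self._free == 0:
                return False
            self._free -= 1
            return True

    def leave(self):
        # A request exiting the system increases the counter again.
        with self._lock:
            self._free += 1


def handle(limiter, handler, request):
    """Hypothetical interceptor: returns a (status, body) tuple."""
    if not limiter.try_enter():
        return 503, "Service Unavailable"
    try:
        return 200, handler(request)
    finally:
        # Always release the slot, even if the handler raises.
        limiter.leave()
```

Releasing the slot in a `finally` block matters: a handler that fails without giving its slot back would slowly shrink the capacity of the service.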

Per customer throttling

Many customers will write code which calls our banking APIs. It is very easy for a bug in one of these systems to become a bad actor and start calling your banking service too aggressively. If they create a malformed request which returns an HTTP error response, on which they retry without exponential backoff, they can create retry traffic storms. It is a good idea to restrict the number of concurrent TCP connections against your service, as well as the number of HTTP requests a customer can create in a given period. The intent here is isolation to protect other users from one bad actor.
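One simple way to get this isolation is a request counter per customer per time window. The sketch below uses a fixed window, which is only one of several possible choices; the class name, the limits and the customer ids are assumptions for illustration.

```python
import time
from collections import defaultdict


class PerCustomerLimiter:
    """Fixed-window request counter kept separately for each customer."""

    def __init__(self, max_requests, window_sec, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.clock = clock
        # customer id -> [window index, requests seen in that window]
        self._windows = defaultdict(lambda: [0, 0])

    def allow(self, customer_id):
        window = int(self.clock() // self.window_sec)
        state = self._windows[customer_id]
        if state[0] != window:
            # A new window has started for this customer: reset the count.
            state[0], state[1] = window, 0
        if state[1] >= self.max_requests:
            return False
        state[1] += 1
        return True
```

Because the state is keyed by customer, one customer exhausting their budget has no effect on what other customers are allowed to do.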

Per customer resource throttling

We saw that the number of accounts a customer has can have an impact on the cost per request for certain operations. Also, we often build systems where customers are billed per resource. It is good practice to restrict the rate at which customers can create new resources to protect both load in your system as well as the customer against themselves.


The point of a loadshedder is to measure the concurrent load in your system and to start rejecting work when the load increases above a healthy threshold.

[Interesting point – the word threshold comes from the wooden barrier across the bottom of a door entry which used to be built to stand up from the ground, creating a barrier. In medieval times people used to throw thresh (or straw) on the ground in the house for insulation and protection against damp. The threshold had to prevent the straw from dribbling out the door.]

Fleet usage loadshedding

This type of loadshedding reserves a certain percentage of system availability for specific traffic. This loadshedder will start rejecting unreserved traffic once it hits a percentage of use threshold even if the system is not degraded in order to protect headroom for potential mission critical traffic. This is a nice pattern to protect health check traffic between a load balancer and host as an example.
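As a small sketch of the idea, the admission check below keeps a slice of capacity back for critical traffic. The function name, the 20% default and the notion of a boolean `is_critical` flag are assumptions for illustration.

```python
def admit(in_flight, capacity, is_critical, reserved_percent=20):
    """Decide whether to admit a request given current usage.

    Critical traffic (e.g. health checks) may use the full capacity;
    everything else is rejected once the unreserved share is in use.
    """
    if in_flight >= capacity:
        return False
    if is_critical:
        return True
    # Integer arithmetic keeps the unreserved limit exact.
    unreserved_limit = capacity * (100 - reserved_percent) // 100
    return in_flight < unreserved_limit
```

With a capacity of 10 and 20% reserved, ordinary traffic is rejected from the ninth concurrent request onwards while health checks still get through.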

Prioritised loadshedding

Screen Shot 2019-05-12 at 4.30.34 PM

If your system hits a load threshold, you have to start rejecting work to protect availability. Prioritised loadshedding allows you to decide which requests to reject based on its priority. In the example above you can see how we started favouring high priority traffic bands over low priority bands to reduce load in the system until it has recovered from overload. We then slowly start introducing traffic back.
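One possible shape for this decision is sketched below. It assumes priority bands numbered from 0 (most important) upwards, a load measure on an arbitrary scale, and a linear mapping from overload to the number of bands shed. None of these choices come from the graph above; they are illustrative only.

```python
def should_admit(priority, load, threshold, max_load, num_bands=4):
    """Shed the least important bands first as load rises past the threshold.

    priority: 0 is the most important band, num_bands - 1 the least.
    Band 0 is always admitted in this sketch.
    """
    if load <= threshold:
        return True
    # How far past the threshold we are, as a fraction clamped to 1.0.
    overload = min(1.0, (load - threshold) / (max_load - threshold))
    # Shed at least one band once over the threshold, more as overload grows.
    bands_to_shed = min(num_bands - 1, 1 + int(overload * (num_bands - 1)))
    return priority < num_bands - bands_to_shed
```

Reintroducing traffic then falls out naturally: as the measured load drops, fewer bands are shed until everything is admitted again.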

Detecting Load

This is an interesting topic which actually deserves its own post, and is very much informed by the shape of your system. I can list a few mechanisms here.

Response Latency

In most systems there exists a direct relationship between load in system and response latency. If the load in your system increases your response latency will start degrading and you can start backing off. Response latency becomes a very useful proxy for load if the cost of work per request in your system is uniform.

In our banking application using response latency as an indication for degradation in the system would work for DescribeAccount, because you have an O(1) cost of work percentile distribution. It could also work for an O(n) percentile distribution if you have insight into n – this would require you to calculate a normalised latency cost, i.e. load measure = response latency/number of accounts. This would not work easily for DescribeLargestAccount() because you can’t derive n by inspecting either the request or the response.

You can easily implement a load shedder like this by adding an interceptor in your HTTP request handler chain which maintains a history of the last n response latencies seen. Every time you see a new response, you update your historic sliding window. This allows you to detect a velocity of increase/decrease of load in the system based on which you can adjust your loadshedding behaviour.
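A minimal sketch of such a sliding window follows. The class name, the use of a simple average, and the "newer half minus older half" definition of velocity are assumptions; a real implementation might prefer percentiles.

```python
from collections import deque


class LatencyWindow:
    """Sliding window over the last n response latencies."""

    def __init__(self, size, max_avg_latency_ms):
        self.samples = deque(maxlen=size)   # oldest sample drops off first
        self.max_avg_latency_ms = max_avg_latency_ms

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def trend(self):
        """Average of the newer half minus the older half; > 0 means rising."""
        items = list(self.samples)
        half = len(items) // 2
        if half == 0:
            return 0.0
        older, newer = items[:half], items[half:]
        return sum(newer) / len(newer) - sum(older) / len(older)

    def should_shed(self):
        return self.average() > self.max_avg_latency_ms
```

The interceptor would call `record` on every response and consult `should_shed` (and possibly `trend`) before admitting the next request.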

Concurrent In-flight Requests

You can use in flight requests as a proxy for load as well. This is fairly coarse, and works better for uniform cost operations where the number of in flight requests is directly related to load in the system. If our service degrades, work stops exiting the system at the rate it enters, so the number of in flight requests goes up. If you keep a sliding window history of in flight requests, you can measure the velocity at which load is increasing or decreasing in your system. This is a bad measure of absolute load in the system though, and it will not work if you have high variance in cost per unit of work.

CPU/Memory Utilisation

Your service can use CPU or memory utilisation as a proxy for load in your system. It is easy to query your OS to get utilisation metrics to inform your loadshedding behaviour.

Back Pressure

I really like this one for systems which are composed of multiple components or layers – engines, databases etc.  – each with its own load threshold and failure modes.

You want to place your loadshedder as high up in your system as possible, because the deeper into your system work penetrates, the more load on your system you add.

Sometimes, however, it is difficult to anticipate the cost of units of work by inspecting requests and responses at the edge. A good example is the failure mode I discussed in my initial post. The cost of work was determined deep within the system depending on whether we needed to do a write to the database or whether we treated a write as a no-op. There is no way for an HTTP interceptor to inspect requests and determine the cost of doing the work in the request. Two identical requests could trigger very different costs in the system, depending on the existing state in the database.

A common pattern for HTTP services looks like the diagram below. The entry point to the service is an HTTP port receiving requests. In this example we have two types of operations, each with its own request handler.

We want to implement a Loadshedding Interceptor right at the outside border of the service, taking a look at each inbound request and each outbound response. This guy needs to reason about the load in the service to decide which requests to reject or allow.


The problem here is that different components of this service have different load thresholds and failure modes. The loadshedder does not necessarily have all the information to reason about load in the system, because it is far away from the blue and green layers.

There are a few ways we can try to solve this.

Approach 1

Defer the loadshedding decision to the database integration layer. Let the database integration layer have its own token bucket to allow access to a db connection in the pool.

There are a few problems with this.

Firstly, one request handler can make multiple independent calls down into the database integration layer. If even one of these operations gets throttled by the database integration layer it will retry – the request latency goes up and the request stays in the system for longer. Work takes longer to exit the system and we have introduced a bottleneck. We want to loadshed full requests, not sub-components of requests.

Secondly, the database integration layer is request agnostic and is fairly far down in the system. Work has to enter far down into the system before we make a loadshedding decision. This means a lot of wasted work and a lot of occupied threads that could have been dedicated to more relevant work.

Approach 2

Let each component bubble up metadata about its load upstream to the loadshedder.

There are two approaches here – add load metadata to the response (not ideal, now we are mixing system internals with business logic in the contract between components) or add extra operations on components so that the loadshedder can query component health continuously.

It can work but it is clunky and adds bulk to the system.

Approach 3

Each component measures its own internal load and throws an error when it isn’t able to accommodate the cost of an operation. These errors bubble up to the loadshedder which can see that the system failed to complete work. It doesn’t need to know why the system failed to complete the work, just that it wasn’t successful. The loadshedder maintains a sliding window of the past n responses, grouping them by success and error. It can calculate the velocity of change in the ratio of successes vs failures and use this as a proxy for reasoning about the availability of the system.
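The bookkeeping this needs is small. The sketch below tracks outcomes only; the class name and the way velocity is defined (error rate of the newer half of the window minus that of the older half) are assumptions for illustration.

```python
from collections import deque


class ErrorRateWindow:
    """Sliding window of the last n outcomes (True = success, False = error)."""

    def __init__(self, size):
        self.outcomes = deque(maxlen=size)

    def record(self, success):
        self.outcomes.append(bool(success))

    @staticmethod
    def _error_rate(outcomes):
        return outcomes.count(False) / len(outcomes) if outcomes else 0.0

    def error_rate(self):
        return self._error_rate(list(self.outcomes))

    def velocity(self):
        """Positive when errors are becoming more frequent, negative when recovering."""
        items = list(self.outcomes)
        half = len(items) // 2
        return self._error_rate(items[half:]) - self._error_rate(items[:half])
```

Note that the loadshedder never inspects why a request failed, which keeps it decoupled from the internals of the components below it.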

This pattern extends nicely to integrate with other definitions of load. You can easily build a loadshedder that creates a composite definition of load by inspecting CPU and memory utilisation, velocity of change in error rates and response latency. It is in the right position to reject or accept requests into the system. It is also in the right place to inspect the request and classify its priority, allowing us to give priority to more important requests.
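How you combine the signals is a design decision. The sketch below takes the worst of the normalised signals, on the assumption that any single saturated resource is enough to call the system loaded; a weighted sum would be an equally valid choice. The function name and parameters are hypothetical.

```python
def composite_load(cpu, memory, error_rate, latency_ms, latency_budget_ms):
    """Combine several signals into one load score between 0 and 1.

    cpu, memory and error_rate are expected as fractions in [0, 1].
    Latency is normalised against a budget and clamped to 1.0.
    """
    latency = min(1.0, latency_ms / latency_budget_ms)
    # Assumption: the most stressed signal determines the overall score.
    return max(cpu, memory, error_rate, latency)
```

For example, `composite_load(0.3, 0.5, 0.1, 60, 120)` is driven by memory and latency at 0.5, while doubling the budgeted latency pushes the score to 1.0 regardless of the other signals.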

Health Monitoring System

You can build an external monitoring system that observes service metrics like CPU and memory utilisation, response latency and error rate. This out of band system then calculates a composite load score that a loadshedder can use to make loadshedding decisions.

The benefit of this approach is that load monitoring and compute happens out of band. Even if your system becomes unavailable, this guy can still be online.

The problem with this approach is that an external system does not have the detailed insight into component load that each internal component has. It also comes at the cost of building and maintaining another service.

Load Series: How to reason about load within a system

Remember our banking service in the previous post in the load series? Let’s use this as an example to illustrate how to reason about load in a system.

Measuring work – latency

Remember Little’s Law?

Under steady state conditions, the average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time that an item spends in the system

In our banking system, this translates to “the average number of requests in flight in the system equals the rate of arrival of requests (transactions per second, or tps) multiplied by the average time that a request spends in the system (request latency)”.

For us to reason about the cost of a unit of work, we can use request latency as a proxy representing that cost. That allows us to use Little’s Law to reason about throughput in our system. This makes sense – the faster our system can process requests, the more work it can soak up in a specific timeframe.

Cost of Work

Let’s start with a basic model – the service is backed by a relational database storing banking data and the service has one operation – DescribeAccount(account). When a user calls DescribeAccount(account) it returns a response with the metadata describing that single bank account. Regardless of which customer you are, the work to perform this operation is fairly constant – you need to make a round trip to the database, retrieve the account details, transform it into a nice response shape, and send it back. The percentile distribution of our response latency for this operation is O(1).

Let’s add another operation – DescribeAccounts(). This operation returns a response describing all a customer’s accounts. Your service now has to retrieve n records from the database and transform each one into a response. Let’s say your percentile distribution of latency now becomes O(n).

Finally, let’s add a third operation DescribeLargestAccount(). Now you have to retrieve all a customer’s accounts, sort them, and describe the largest account. Your system isn’t really good at calculating this, so your percentile distribution of latency now becomes O(n^2).

Screen Shot 2019-05-11 at 6.15.25 PM

Throughput and Load

Throughput refers to the rate at which our system can perform work. Load refers to the amount of in flight work in our system at a given point in time. Remember our checkout counter example? A customer took on average six minutes to complete a purchase. That till point’s throughput capacity was 10 customers an hour, and its load was one customer at a time. We were able to increase our throughput by adding two more till points. Our throughput became 30 customers an hour and our load was three customers at a time. (Obviously assuming there was a queue of customers waiting, so we were fully utilised).

If we use Little’s Law, we can derive a measure of Load (in flight work) as follows:

Load = Ave request latency * tps

Let’s see how this applies, given our latency measures in the graph above:

Screen Shot 2019-05-11 at 7.21.35 PM

What does this graph mean?

  • The blue line shows that load grows linearly with tps in a system where your average latency is constant. One second at 5 tps, with each request taking 50ms, carries five times the load of one second at 1 tps.
  • The orange and grey lines show how your load is sensitive to increases in average latency. This means that, even if your tps stays constant, if more of the calls are DescribeLargestAccount() calls made by a customer with a large number of accounts, the load on your system will increase.
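Both observations can be checked with a few lines of arithmetic. The latencies and traffic mixes below are made-up numbers for illustration and are not read off the graph.

```python
def load(avg_latency_sec, tps):
    """Little's Law: average in flight work = average latency * arrival rate."""
    return avg_latency_sec * tps


def average_latency(mix):
    """mix: list of (share_of_traffic, latency_sec) pairs; shares sum to 1."""
    return sum(share * latency for share, latency in mix)


# Constant latency: load scales linearly with tps.
load_at_1_tps = load(0.050, 1)
load_at_5_tps = load(0.050, 5)

# Constant tps (40), but the share of expensive calls grows from 10% to 50%.
mostly_cheap = load(average_latency([(0.9, 0.020), (0.1, 0.120)]), 40)
half_expensive = load(average_latency([(0.5, 0.020), (0.5, 0.120)]), 40)
```

In the second pair the tps never changes, yet the load more than doubles purely because the mix of calls shifted towards the expensive operation.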


In the graph above, you can use load and throughput interchangeably – they intuitively appear to have a linear relation. If the amount of work in your system goes up, so does the number of units of work, right? The more the merrier.

In real life it isn’t really that simple. All systems have maximum load thresholds, beyond which the system availability degrades. You start running out of memory, you start thrashing or work starts backing up in the system because you have too few threads available to comfortably parallelise work. As soon as you hit this threshold, your system load starts increasing dramatically, because work is not exiting the system. At the same time, your throughput starts dropping, because you aren’t successfully completing work at the same rate as before. It goes something like this:

Screen Shot 2019-05-11 at 7.40.25 PM

Oh dear, what now?

This finally brings us to the beginning of the interesting topic – how to build systems hardened against overload.

Once again, remember our formula for load using Little’s Law?

Load = Ave request latency * tps

What we want to do is place an upper bound on the load capacity of our system. The two axes informing load are request latency and tps.

Remember our graph from earlier? Let’s add a max load threshold here:

Screen Shot 2019-05-11 at 8.12.50 PM

We can clearly see that we can accommodate a higher tps rate of cheaper calls than more expensive calls. We can also see that the worst case scenario in the graph above is hitting that max load threshold at 40 tps/120ms request latency. Finally, we can see how the ratio of cheap vs. expensive calls affects load in the system given a constant tps – this is something tricky to manage, which I will discuss in a future post.

Naively, we can do two things immediately to introduce the blue line safe upper bound:

  1. Add a max request timeout of 110ms to the service
  2. Use token based throttling on incoming traffic to only allow 38 tps into the service
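To see how much margin these two settings leave, we can plug the numbers quoted above into the load formula. This treats 38 as a rate in requests per second, which is an assumption based on it being enforced by a token based throttle.

```python
def little_load(avg_latency_sec, tps):
    """Load = average request latency * tps (Little's Law)."""
    return avg_latency_sec * tps


# Worst case from the graph: 40 tps at 120ms request latency.
max_load = little_load(0.120, 40)

# Proposed safe bound: 110ms timeout and a 38 tps throttle.
safe_load = little_load(0.110, 38)

# Fraction of the max load threshold kept in reserve by the safe bound.
margin = 1 - safe_load / max_load
```

Under these assumptions the threshold works out at roughly 4.8 requests in flight and the safe bound at roughly 4.18, leaving a margin of about 13%.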

Unfortunately we are hardening against the p100 load situation, which we don’t expect to see very often. This means that we are conservative with our system throughput, and we expect to be under-utilising our resources most of the time. For most systems this is an acceptable tradeoff, but for some systems we really want to optimise our throughput to fully utilise our system resources, or the relationship between latency and tps is not that simple to manage. I will be covering this in a future post.

How do we go about determining these numbers for our system? You can and should determine these values for your system using load testing. I will be covering this in a future post as well.

Load Series: Basic concepts

When thinking about throughput and load in systems, it is important to understand some foundational concepts. In this post I provide a basic explanation of some of the terminology I will use in later posts.

Little’s Law

L= λW


L = average number of items in the queuing system

λ = average number of items arriving per unit time

W = average waiting time in the system for an item

Little’s Law says that, under steady state conditions, the average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time that an item spends in the system

Phew, that is a lot of words. Let’s take a look at what that means. Imagine a supermarket checkout. In South Africa we have different chains with different checkout strategies – some allow a queue to form per checkout counter, and some form one checkout queue where the person at the head of the queue goes to the next open counter. We are working with the latter model here for simplicity.

Little’s law states that we can calculate the number of people in the queue considering the average rate at which new people join the queue multiplied by the average time that a customer spends at a checkout counter.

Let’s say you run a store. You measure your checkout counter latency and realise the average customer takes six minutes (0.1 hours) to complete their checkout, and an average of 40 people enter your checkout queue per hour. How can you use Little’s Law?

Scenario 1

Management can state that we want no more than three customers waiting in the checkout queue at a single point in time.

L = λW

L = 40 x 0.1

L = 4 customers in the queue on average

There are a few things you can now do to drop the value of L. You can make your checkout counters process customers faster by adding staff to help pack groceries.

Solve for W:

W = L/λ

W = 3/40 = 0.075 hours

W = 4.5 minutes

You need to improve your checkout counter processing time to 4.5 minutes per customer to achieve an average queue length of three customers with the current customer arrival rate.
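The same arithmetic as a small sketch, with hypothetical function names, in case you want to play with the numbers:

```python
def queue_length(arrival_rate_per_hour, time_in_system_hours):
    """Little's Law: L = lambda * W."""
    return arrival_rate_per_hour * time_in_system_hours


def required_time_in_system(target_queue_length, arrival_rate_per_hour):
    """Little's Law rearranged: W = L / lambda (result in hours)."""
    return target_queue_length / arrival_rate_per_hour


# Scenario 1: 40 customers per hour, six minutes (0.1 hours) each.
current_queue = queue_length(40, 0.1)

# Time per customer needed for a queue of three, converted to minutes.
target_minutes = required_time_in_system(3, 40) * 60
```

Here `current_queue` comes to about 4 customers and `target_minutes` to about 4.5 minutes, matching the worked example.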

Scenario 2

Month end means people get paid, so they tend to do their monthly groceries. This means that people fill their trolleys with more things and more people enter your store. Management needs to accommodate fire hazard regulations, so they can’t accommodate more than a certain number of customers in the store at a given time. They want to hire in more staff to help around this time, so ask you how many staff they need to bring in to help, and how many checkout counters they need to open to accommodate the additional load. From previous months you know that your arrival rate of customers entering the checkout queue jumps up to sixty customers per hour. You also know that full trolleys mean that the average processing time per customer is now nine minutes (0.15 hours).

Solve for L:

L = λW

L = 60 x 0.15

L = 9 customers

If you want to maintain your 3 customer queue from scenario 1, you now need 3 checkout counters open because your queue scaled 3-fold.
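A small sketch of that sizing step follows. It uses the same simplification as the text, dividing the total number of customers by the queue length you are willing to accept per counter; the function name is hypothetical.

```python
import math


def counters_needed(arrival_rate_per_hour, service_time_minutes, max_queue_per_counter):
    """Counters required to keep each queue at or below the target length."""
    # Little's Law, with the service time converted from minutes to hours.
    customers_in_system = arrival_rate_per_hour * service_time_minutes / 60
    # Round up: you cannot open a fraction of a checkout counter.
    return math.ceil(customers_in_system / max_queue_per_counter)


# Scenario 2: 60 customers per hour, nine minutes each, three per queue.
month_end_counters = counters_needed(60, 9, 3)
```

Rounding up matters when the numbers do not divide evenly: 10 customers with a target of 3 per queue would need 4 counters, not 3.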

[Update: Today is voting day in South Africa. I just queued about 30 minutes to make my mark. These people know how to make a queue move. Two things that could influence the throughput of voting queues today – this seems like a contentious year, so the W might increase as people might take longer to decide who to vote for. Also Wimpy offered free coffees to anyone who has the vote mark on their thumb, which might lead to an increase in L :)]


As a kid I vividly remember playing with balloons by inflating them over and over again. At some point the balloon suffers material fatigue, and every time you touch it it shrivels in at that area. When you inflate it again it inflates inconsistently, stretching more in some areas than others.


Systems consist of different subsystems, and each subsystem responds uniquely to overload. In our example in the first post, our HTTP service was able to scale better horizontally than our database write head, leading to a strange failure mode. Our system scaled like the balloon above – it constrained faster in some areas than others, resulting in a really lopsided whole.

When you build systems, consider how each component behaves under load, what overload looks like, what headroom it requires, and how it is able to scale.

It is important to understand how the system as a whole responds to overload, given the behaviour of each sub-system. It can quickly become too complex to reason about. Use load testing to discover your system’s failure modes and plan accordingly.

I will dedicate a post in this series to load testing, as well as a post to failure modes of different shaped systems.

Unit of Work

In an HTTP service, it is easier to reason about in flight work, throughput and load if all the requests of a certain type have uniform cost.

Scenario 1

Imagine a banking service where customers can make a request to describe their account details. The request latency for DescribeAccounts(params) can easily be O(n) where n is number of accounts. This means that DescribeAccounts request latency will be much higher for customers with many accounts than for customers with few accounts.

Scenario 2

Imagine the same banking service, but an operation where customers can make a request to describe a monthly transactional summary. The request might be the same for all customers, the response body size might be O(1) across all requests, but the request latency is O(n) where n is number of transactions per month.

Applying Little’s Law in such circumstances becomes difficult because average latency, given a constant tps, is now sensitive to which customer is making requests – something you can’t control.

I will be dedicating a post in this series to strategies to go from O(n) to O(1) unit of work cost for different scenarios.


Horizontal vs Vertical Scaling

There are two ways to scale a system – horizontal and vertical. Imagine our banking service in the previous example. You can scale your HTTP service by giving it a larger machine to run on. This is called vertical scaling. Alternatively, you can put more identical machines in your fleet behind a load balancer. This is called horizontal scaling.

Vertical scaling strategies in a relational database can include using indices to improve query latency, increasing your heap allocation and SQL partitioning. Horizontal scaling strategies can include master-slave replication trees and partitioning your data across multiple databases.


Headroom refers to how much buffer you maintain to absorb increases in load in your system.

Load headroom

Load headroom refers to the percentage increase in throughput your system is able to absorb before degrading in availability. In our banking app, this could refer to how many accounts a customer can have before seeing degraded availability in the DescribeAccounts operation. It could also refer to how much increase in tps your system can support before degrading.

Temporal headroom

In most systems, if you are doing things right you will be seeing a steady increase in usage of your system over time. Systems that were able to support load a year ago might not be able to support the increased load that became normal a year later. Systems either degrade gradually, e.g. with a steady increase in p50 latency, or they degrade in a stepwise manner, where they reach a tipping point and suddenly degrade dramatically. We call this a scaling cliff. It is your responsibility as an engineer to anticipate approaching scaling cliffs and know how much temporal headroom you have to invest in scaling your system.

I will be dedicating a future post in this series to headroom.

Load Series: Hardening distributed systems against overload

This is the introductory post of a series that has been brewing in the back of my mind for the last six months – a mental framework guiding hardening distributed systems against overload.

Six months ago I saw a very interesting failure mode in a system my team was building. Our service was running on a distributed fleet of hosts, each with its own HTTP request endpoint, but backed by a shared database. Our read traffic far exceeded our write traffic, and we also controlled the writing client and had invested a lot of work in normalising the write request throughput into the service. All good. The gritty bit came in a nuanced optimisation we had made in our service. A lot of the write requests were redundant and would not effect a state change in our database, i.e. they were no-ops. Like the responsible engineers we were, we did a cheap read operation against the database to determine whether a write would be a no-op or not before actually executing a write [Note – this does not always make sense, for our db platform it was an improvement].

Very intermittently a scenario would play out where “everything in the world changed”. Our write request tps stayed constant, but the ratio of noops to mutations changed dramatically, leading to a dramatic spike in write throughput through the write head of the database. Our database tried like a little champ to do as much of the work thrown at it, but it became slower and slower – read latency degraded, leading to an increase in retries on both the write and read paths, translating into even more load on the database. Everything came to a grinding halt. Oh dear.

Protecting the service against overload wasn’t simple:

  1. Simple tps based throttling would not have been sufficient – our request tps stayed constant, it was the ratio of expensive vs cheap requests that changed
  2. We would need to do a read against the db to determine whether a write was cheap or expensive, meaning that we had to penetrate pretty deep into our service in order to gather information to determine the cost of a write request
  3. Because of the lack of isolation between the read and write traffic, reasoning about the resulting load in the system was hard

This situation resulted in a lot of reading and thinking for me. It triggered a lot of ideas around how to think about overload, throughput and hardening of distributed systems, which I am going to try to explore in this series of posts.

I will be posting shortcuts to the whole series here:

Post 1: Basic Concepts

Post 2: Reasoning about load in a system

Post 3: Throttling vs Loadshedding

How to get groups to make difficult decisions effectively

I mentioned a difficult learning curve I recently went through in my previous post. This new role requires me to facilitate conversations between small groups of people across the organisation to make high impact decisions, and, man, what a learning curve that was. Driving consensus within a group of people to make data-driven, unbiased decisions is a skill that everyone needs to learn sooner rather than later in their careers. I caught up with a friend recently and we spoke about mechanisms that are useful in these conversations.

I would like to try to summarise some of the tools in my toolkit, and also provide examples to illustrate how they can be used.

Don’t make the decision yourself

There are two ways I have seen leaders facilitate decision making conversations. The first, more common, involves a leader/person with authority asking questions of the group, thinking, making the decision themselves, and (if they remember), informing the group of their decision.

An alternative way of approaching the conversation is for the leader to ask questions of all members of the group in such a way that the input to the decision is framed in a clear manner everyone can follow. The person facilitating the conversation can then use summaries to frame what the group has said in such a way that the outcome decision becomes apparent to everyone. Finally, you ask everyone in the room for their take, allowing room for people to explain why they made the decision. If you use this mechanism well, you can drive the right decision without voicing an opinion yourself, just by asking the right questions and making the group talk.

A benefit of the latter strategy is that you remove single person bias. A decision is much more robust if eight people hashed it out and converged on the same outcome, than if one person made a high-judgement decision themselves. You also get stronger buy-in from the group to commit to the decision using the latter, because people have a stronger sense of ownership if they were part of the final decision making process. Finally, people are part of the journey, allowing them to understand why a decision was made. It makes it easier for them to commit to the decision even if they had an opposing view initially.

Separate framing how to make the decision from the outcome

Imagine a team who needs to decide what database stack they need to use to build a new product. They have to choose between an open source relational data store, or they can get a license for a proprietary relational data store. There are pros and cons to both – with the proprietary platform you get guaranteed support from the vendor, however, you do lock yourself in for a few years, upgrading is harder and you have to consider the licensing cost. With the open source option you need to either support it yourself, or you need to engage with a company that provides support on that open source platform, which also comes at a cost. This is an ambiguous problem with no clear right answer.

You can frame the considerations in your questions and summary, without making the decision yourself. Imagine gathering a room full of people, and asking the following questions:

“Paul, what are the technical risks to us supporting the open source platform ourselves in house?”

“John, which features do we require from our platform, and do either of the two platforms lack any of these features?”

“Grace, if we need additional support for the open source option, which companies provide this as a service, and how much does it cost?”

“Aamer, how much does the proprietary platform license cost, and how long do we have to commit to this product?”

“Paul, what additional costs are involved if we need to do a version upgrade on the proprietary platform?”

“Grace, we have never used this open source platform before. What knowledge would we need to develop in house if we were to adopt this solution?”

At no point is a decision being made, yet the questions are targeted to gather specific information about the choice. In choosing these questions you are framing the dimensions which inform the decision – cost, agility, knowledge, support. You are just framing the benefits and risks of both platforms and allowing all the participants to contribute. If cost is of greater concern than knowledge (say you have a strong engineering team adept at learning new technologies fast, but a limited budget), you can bias towards this by choosing questions that reflect it. You can also lean more heavily on those trade-offs in your summary, while still allowing the group to make the decision.

Ask questions that force people off the fence

People hate climbing off the fence. People also like fishing for information by asking open-ended questions. Consider our database platform decision above.

A few examples of open-ended questions:

“Paul, tell me about the open source platform?”

“Grace, you like the proprietary platform?”

“Aamer, what do you think?”

You are going to hang around for a while having this conversation without driving consensus. What is missing? First, you are opening the door for people to waffle on without direction. You are also not introducing the dimensions we mentioned earlier, along which we need to frame the decision. Finally, you are not forcing the person to tie their information back to a benefit or risk that informs the decision to be made.

Compare these examples against the examples of questions above. Let’s ask the following question:

“Paul, what are the technical risks for us supporting the open source platform ourselves in house?”

The answer will allow you to summarise the outcome as follows:

“Can I summarise that there is some knowledge we don’t have about this product, but the product has a widely supported user base and well-frequented forum that we can rely on. The risk of introducing a new product into the team can be mitigated by allocating additional engineering effort towards building labs and generating support playbooks?”

Immediately the group can slot this data point into their mental model framing the decision and the ambiguity here has been resolved.

Ask targeted questions and get people to climb off that fence.

Use those summaries

Miller’s Law states that there is a cognitive limitation to the number of data points a person can retain in short term memory at a point in time. People really suck at recalling a long list of data points during a conversation. If you don’t leave breadcrumbs for people to recall everything that was discussed, their recency bias will kick in really early, and this will skew the decision they lean towards in a conversation like we are describing. To manage this, you need to do two things.

First, after every item of discussion, summarise the take-away in short, concise terms, making it clear how the data point relates to our decision. Put a pin in it, and ensure the summary leaves no ambiguity hanging.

Secondly, make room during the conversation to summarise the big picture so far, framing where the conversation stands.

“To summarise so far, the open source platform can save us xx a year in licensing fees if we are equipped to support it in house. To mitigate the risk of introducing a new technology into the team, we will have to invest in building our knowledge base by allocating x engineer hours, as well as expand the support team by two headcount. This will come at a cost of yy, and will require us to hire n people. Let’s discuss the version upgrade feasibility to see how this affects our decision?”

Remove group dynamic bias – make everyone heard

I was discussing this with my friend a few weeks ago, and he asked me how to facilitate a conversation where you are brokering a decision between a junior who has a lot of knowledge about the problem, and a senior who has a lot of authority to make the decision. Ha!

Let’s talk about a few things that introduce bias in group conversations:

Some people are more vocal in group discussions than others 

  • Make room for everyone to talk by asking targeted questions to specific individuals
  • Learn how to cut ramblers short without being rude e.g. you can interrupt, apologise, and immediately ask a yes/no question to get them to the point and move on
  • Write a short document/email framing the context of the decision to be made before the discussion, allowing people to frame their opinions beforehand
  • Allow people to submit written feedback/input before the discussion. Build this into your process

In a room with varying seniority, the senior people’s opinion carries more weight

  • Engage participants in the conversation from most junior to most senior. This also eliminates the bias of juniors parroting senior opinions
  • Intentionally ask juniors questions that invite them to advocate for the opposite position. That way you open a door for them to disagree without being perceived as insubordinate

One person just won’t budge

  • If one person gets stuck on a single data point and won’t budge, ask them to explain why they feel so strongly, acknowledge their point of view, re-summarise the whole discussion so far, and move on. Trust the democracy of the group to de-bias the final decision

Reduce group think

Highly cohesive groups often align early during an ambiguous decision making process – sometimes too early. This can happen in groups with high trust, when that trust drowns out the willingness to rigorously explore alternatives.

Intentionally introduce questions asking people to argue for the opposing view – that way the conversation requires them to explore alternatives without compromising their allegiance to the group.


The past few months I have been fortunate enough to have been training for a very specific and challenging role in our organisation, and the journey has been tough. The role requires you to maintain and protect consistent standards and facilitate important decision making involving groups of people from all over the organisation. I am extremely grateful to have this opportunity though, because, apart from being very rewarding, I also learned an incredible amount about coaching and feedback. I’m going to dump a summary of my musings on feedback here both from the perspective of the coach and the coachee.

On being coached

I think it is part of the human condition to need validation of your strengths as well as your mental model of how the world works and your function in it. We seek out affirmation and recognition from our peers, who reward us with affirmative messages.

This mental model, though, needs to change if you are to grow as a person, and shifting or invalidating your mental model and your function in it is inherently a very uncomfortable experience. Receiving critical feedback challenges your mental model and your self-belief – it requires emotional work on your part to facilitate a shift (if necessary, of course) and to define what the resulting shift should look like.

My first year as a software developer, for example, was a massive learning curve – adjusting to a profession and learning how to navigate teams and function within an organisation.

Going through this most recent training was tough, because it involves being shadowed by people already in that role and receiving detailed feedback on everything you did and could improve on. On top of that, the role is extremely qualitative – it isn’t as simple as learning a process or improving code quality. It requires you to learn how to facilitate a group of people, drive a complex conversation, and ensure the right outcome for the organisation. I don’t think anyone in the world can teach you how to do that by just explaining the steps to you. The only way you can really teach someone how to do this is by exposing them to the situation, letting them practice and providing feedback with every iteration.

The more you grow in your career, the more qualitative and intuitive the skills that dictate your impact become. As a junior engineer your value lies in producing good quality code at healthy volume. The more senior you become, the larger and more ambiguous the problems become. Your impact on team and organisational culture becomes more important when you start delivering through others and driving the direction your team takes. The blast radius of getting it wrong also increases. Coaching at this level becomes proportionally harder, because it becomes more qualitative – these are attributes and behaviour you can’t just explain to someone over a coffee.

Being coached at this level is not easy, but I have devised a framework for myself in which to process feedback:

Have respect for how hard it is for others to coach you

Coaching is hard. Remember, you might be going through the learning curve of mastering a difficult skill – the person coaching you is going through the learning curve of learning how to coach others in that skill. You might just be one of the eggs they need to make their omelet.

Have empathy with yourself – allow yourself to iterate on it and fail

I think this is something especially women struggle with – I know I do. I started doing ballet a few years back and initially I was extremely frustrated. It took a few months for me to realise that the practice of ballet means constant pursuit of mastery through failure. The only bloody way to learn how to do a pirouette is by falling over hundreds of times until you get that first turn. You also have to realise that the immediate next turn after that first successful one is going to be a faceplant again. Mastery is not linear and not binary. It takes repeated practice.

Seek out diverse opinions

The first thing I learned going through this training was that you are going to get conflicting feedback. Two conflicting pieces of advice can both be right – ask the people giving them why, and find the underlying value of their advice. Everyone needs to develop their own flavour of doing something, and you need to define your own way as well.

Own it

This is your learning curve. You are not uncomfortable for someone else’s benefit, but for your own mastery. You are doing this because you want to grow. This means that you carry the responsibility for getting the feedback required and actioning that feedback to facilitate your growth. Your development is not the responsibility of anyone else. If you receive feedback you don’t understand or you don’t know how to action, carry it with you and have conversations with the people coaching you until you do.

On Coaching

Giving critical feedback is one of the hardest things to do as a human, often to the extent where people shy away and walk away from the conflict instead. I believe that learning how to give critical feedback is a critical skill everyone needs to learn, no matter how uncomfortable it is. I am very grateful to have been on the receiving end of coaching by a few masters of the trade recently, allowing me to observe how they did it and the way I received it. Similarly I have been the egg in the omelet of a few less successful coaching attempts, highlighting to me a few things to avoid. Here are my few cents’ worth.

Distinguish between punishment and feedback

Never, ever try to punish someone under the guise of ‘giving feedback’. If you resent the person and do not engage with the emotional intent of improving their experience, walk away and rather go throw around weights in the gym.

Allow them to bitch about it

Receiving important feedback is an extremely uncomfortable experience. Allow the person on the receiving end to express this to you. Sit there and listen, nod and smile and acknowledge that you empathise with how they feel, regardless of whether you agree or not. Successful coaching requires a willing recipient, and you need to create a high trust environment to coach successfully.

If necessary, let them run off and lick their wounds before catching up again.

Don’t overload the person with too much in one go

I recall having read somewhere that the human brain can only retain up to seven ideas in short term memory before needing to offload them to long term memory. I am pretty sure that this number drops to two or three in stressful situations. Anticipate that a feedback session is a high stress situation for the recipient, so rather give diverse feedback points over multiple shorter sessions, allowing the person to process one idea at a time. Overloading a person with a grocery list of areas of improvement will overwhelm them. They will then need to do so much emotional work having to overcome a sense of failure that they will have little capacity left to actually respond to the feedback.

Recognise growing pains

Understand that people only have a limited capacity for working at personal growth. If you see a person regress or stagnate, maybe back off on introducing other areas for growth, allowing them to complete the learning curve they are currently in.

Make it concrete

You are not kind when trying to soften the blow by saying “sometimes you can possibly do something like…”. Rather stick to “On this day, in that situation, you did…”. That gives the person context on where and when behaviour took place.

Explain why

If you need someone to change their behaviour, you need to explain to them why. Explain what the risk is of not changing, or the advantages gained by changing.

Make sure they recognise it

I recently received feedback that I tend to ‘give too much of my own voice to a conversation’. I couldn’t understand it, because I was intentionally trying not to do that – until someone told me that they thought the feedback referred to my verbal summary in a conversation. I would repeat what I heard someone say in summary, trying to validate that I understood their intent, and in doing so, I could become florid and inject my own interpretation into the message. As soon as they said it, I understood what they meant, and I could change my style.

Make it actionable

Work with the person to identify ways in which they can try to address the feedback. Help them identify concrete outcomes to measure their success in doing so.

Make sure they receive it

Giving feedback via someone else – a manager or a mentor – is easy, right? What if that person decides not to pass on the message? You start resenting that person because you don’t see any attempts to respond to the feedback, and they become frustrated because they don’t know why you are suddenly frustrated with them. Make sure feedback reaches someone – preferably give it yourself.

Limit the hops (broken telephones suck)

The more hops there are between you and the recipient, the easier it becomes to give critical feedback. You email a manager who has a chat with another manager who asks that person’s mentor to deliver the message. By the time that message reaches the poor recipient, if it even makes it that far, the feedback has been so diluted that it loses its impact, context and actionability. This is extremely frustrating to the recipient, because now they know 1. someone out there thought they didn’t do a good job, 2. their whole management chain (chain of trust) knows about it, so, well, that is awkward, and 3. they have no idea what to do about it other than go home and feel bad about it.

Close the loop – reward recipients for mastering the bloody thing

Close the loop on the feedback by celebrating with a person when they get it. Tell them when you see an improvement, thank and congratulate them for working at it.


Kibibytes and the missing bits

Right, I just had a noob moment in the office. I was reading about MemInfo metrics, and came across this sentence:

MemTotal — Total amount of usable RAM, in kibibytes, which is physical RAM minus a number of reserved bits and the kernel binary code.

Wait, a what?!

Sarah – “What is a kibibyte?”

Ian – “It is 1024 bytes.”

How did I not know this? It is a thing!
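For the record, a minimal sketch of the conversion. The sample line and its value are made up, and the `meminfo_kib_to_bytes` helper is hypothetical; it assumes the usual `Name: value kB` layout of a MemInfo line, where the value is counted in units of 1024 bytes.

```python
KIBIBYTE = 1024  # bytes in one kibibyte (KiB)


def meminfo_kib_to_bytes(line: str) -> int:
    """Convert one MemInfo-style line, e.g. 'MemTotal: 16384 kB', to bytes."""
    _name, value, _unit = line.split()
    return int(value) * KIBIBYTE


sample = "MemTotal:       16384 kB"  # hypothetical value, not from a real host
total_bytes = meminfo_kib_to_bytes(sample)
print(total_bytes)
```

The sketch only handles the three-field form shown here; lines without a unit would need separate handling.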

Pretty JSON formatting in Sublime Text on Mac

I like using Sublime Text for Ruby/JSON/ad hoc stuff on my Mac. It strikes a good balance: IDE-like text editing, but still lightweight enough for you to muck around.

JSON formatting options are a dime a dozen, but this one was quick to install and is very usable – SublimePrettyJSON GitHub page

To install:
cd ~/Library/Application\ Support/Sublime\ Text\ 2/Packages

git clone https://github.com/dzhibas/SublimePrettyJson.git

To use:

  1. Select Text
  2. cmd+ctrl+j

Hard conversations – Burn the ships

I want to tell you a story.

Paul* started as a junior engineer at a sexy tech company shortly after completing his BSc CS Masters. He worked hard, wrote good code, and got promoted. He developed a reputation very quickly as a rising star and soon got put onto teams doing tough projects.

Let’s move to introducing John*. John moved to Paul’s company after learning the ropes at another very good development shop, and came with a good reputation. John and Paul are now two mid-level engineers with a great track record, and are given a big ‘prove-yourself’ project and a handful of young engineers to lead. Here things fall apart. The team is morbidly unhappy, sprint goals are missed week after week, the team starts slipping on delivery dates like a toddler in a skating rink. Internal conflicts start picking up and people are leaving not only the team, but the company.

I have encountered this team in different levels of dysfunction in various environments, either as a team member or an observer, and it is a pattern I have started to recognise in our industry. Let’s unpack some symptoms I have observed in the past:

  • Code reviews are scathing and personal. “Are you THIS stupid?!”, “????!!!!!” or paragraphs of passive aggressive criticism are the norm.
  • A senior engineer holds the big model in his head and hands out little chunks of micro-managed work to other engineers. When questioned, he answers with a ‘well, why do you need to know everything, I know what we are working towards, you just do your bit’. No one knows how the project is tracking or how to even find out how the project is tracking.
  • When someone throws a white-board across the room, you can just as well disband the team and go on holiday.
  • Newer team members are not allowed to contribute to a conversation, and are shut down with the “well, once you have been here as long as we have, you will learn how we do things around here.”
  • The one who talks the loudest, wins.

Let’s head back to Paul and John and see how they are doing. Unfortunately, the constant praise they are used to has now degenerated into increasing pressure from senior management to get their team and project in order. In the past, Paul and John got where they are because they were smarter than everyone else AND they had the grit to push harder to get the work done. They have been rewarded for this repeatedly throughout their careers. It is just a case of doing this once again, right?

Unfortunately the strengths required from Paul and John right now are completely orthogonal to being the smartest and most hard-working engineer in the room. What no one told them was that they now suddenly have to empower other engineers to be the smartest and grittiest workers in the room. In order to do this, Paul and John suddenly have a steep learning curve ahead – they need to create a high conflict, high trust space in which younger engineers feel safe enough to deliver stretch work despite high ambiguity and self-doubt, probably while experiencing all of this themselves. This is a very steep learning curve and I have seen engineers fail in this space either by getting themselves fired (the whiteboard incident was anecdotal) or leaving out of frustration.

I have good news and bad news.

The bad news is that most engineers are a John or Paul at some point in their career, and for some this is a wall they can’t overcome.

The good news is that I believe that this is fixable.

A legitimate strategy can be to isolate technical experts and buffer a team from their emotional toxicity by separating a technical visionary role from a generalist team lead role. I am personally not a fan of this, because this means you have very senior brittle engineers and team success relies on interpersonal collaboration between people known for EQ as a weakness.

Another strategy is to foster a culture in which Paul and John are introduced to this learning curve earlier in their careers and are given mentors who are successful at facilitating team cultures like this. I can name people I regard as these role models I have had in my career – Malcolm Hall, Mylo Mannya, Anthony Robinson, James Greenfield and, most recently, Ian Davies.

I have, to date, chosen my career opportunities based on the presence of such people in a space – this is very important to me.

I don’t like throwing more and more critical feedback at John and Paul – this makes them more anxious about their inability to succeed and probably leads to worse leadership behaviour and more general angst. If we don’t empower John and Paul to succeed in this learning curve, we risk losing great engineers.

This brings me to burning the ships. Captain Hernán Cortés arrived at Veracruz in 1519, and had to capture new territory. He instructed his troops to burn the ships – he knew the only way to instill enough motivation to succeed was if there was no option to turn back. I have walked away from professional spaces with problems in the past, as many people do. We are fortunate (or unfortunate) enough to be in an industry where development skills are in high demand, so we always have the ‘ah, I have options open’ in the back of our minds. The accessible back door is a psychological safety tool that all people tend to leverage. The problem with this is that we don’t tend to take ownership for the problems in the space we are in now.

We need to burn the ships psychologically and commit wholly to taking ownership of the culture of the space we find ourselves in right now. The great working environment we all want won’t exist unless someone intentionally builds it.

How am I taking responsibility for building the kind of working environment I would like to work in? By opening up the tough conversations, while making sure people around me are emotionally safe. By rewarding the quiet rock stars vocally. By advocating for and sponsoring engineers who might find themselves in tough conflict situations. By talking about problematic behaviour I observe and building allies through conversation. Reward the good, don’t attack the bad. Ride out the tough times to stay long enough to see the good emerge.

Go burn your ships.


Keep your call path primed

I recall asking my Mom as a kid on winter mornings why she always turned the key before starting the engine, and her response was “I need to prime the engine, because it is cold.”

My team is working on something that is common in a service-based environment – writing a replacement for an existing service that we intend to hot-swap without taking an outage. If something goes haywire (initially) we retain the ability to flip traffic back to the legacy call path as a backup, right?

My concern is this: As soon as you reroute traffic from the legacy call path, you will have to assume that that service’s ability to serve that call volume immediately starts to degrade. The team that owns it becomes disincentivised to maintain that service to accommodate the call volume capacity it is carrying today. Since they don’t serve production call volumes anymore, they won’t notice if their service degrades, and in our ecosystem, that happens in a matter of days.

I recommend a new migration design pattern – keep your call path primed by routing traffic through ALL call paths until you are comfortable retiring a call path completely. You can do this by routing production traffic through shadow call paths, generating mock service calls that hit extraordinary paths frequently, or relying on integration test/black box test coverage. This also includes validating backup restores and tasks you rely on in operational crises.
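As a rough illustration of the shadow call path idea, here is a minimal sketch. The `PrimedRouter` class, the handler names and the request format are all hypothetical; a real setup would probably mirror traffic asynchronously and would need care around requests with side effects.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


class PrimedRouter:
    """Serve from one call path while mirroring every request to the others."""

    def __init__(self, paths: Dict[str, Handler], active: str) -> None:
        self.paths = paths
        self.active = active
        self.shadow_errors: List[str] = []

    def handle(self, request: str) -> str:
        # The caller only ever sees the active path's response.
        response = self.paths[self.active](request)
        for name, handler in self.paths.items():
            if name == self.active:
                continue
            try:
                handler(request)  # response discarded; keeps the path exercised
            except Exception as exc:
                # A shadow failure must not affect the caller, only be recorded.
                self.shadow_errors.append(f"{name}: {exc}")
        return response


calls = {"legacy": 0, "replacement": 0}


def legacy(request: str) -> str:
    calls["legacy"] += 1
    return f"legacy:{request}"


def replacement(request: str) -> str:
    calls["replacement"] += 1
    return f"replacement:{request}"


router = PrimedRouter({"legacy": legacy, "replacement": replacement},
                      active="replacement")
first = router.handle("order-1")

router.active = "legacy"  # flipping back to a path that is still warm
second = router.handle("order-2")
```

The point of the sketch is that the legacy path keeps receiving every request after the cut-over, so a degradation shows up in `shadow_errors` (or in that path's own metrics) before you need to flip back to it.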