YOW! CTO Summit 2019 Melbourne - Your Team as a Distributed System

2019-12-14 Meetups and Conferences

# Your Team as a Distributed System - Andrew Harvey

The presentation by Andrew Harvey, the CTO in Residence at Microsoft for Startups, focussed on the people aspect of tech.

A number of the topics that Andrew covered are close to my heart, especially the Peter Principle, which basically states that people are promoted to their level of incompetence. Andrew talked about the chance of promotion increasing as a developer’s technical ability increases; this ultimately leads to them moving into a management position, and as many people will say, technical proficiency has no bearing on management ability. With this in mind, organisations need to ensure the people they move into management roles have the right skills to be a manager. This needs to be supported by training and mentoring to ensure they are able to lead and manage. I really appreciated Andrew converting this problem into technical terms by saying “people don’t throw stack traces; instead they silently fail, segfault and then uninstall themselves”.

Andrew then described how a team can be considered a distributed system. A distributed system has multiple processes (in this case team members), there are inter-process communications (team members talk), there is a disjoint address space (each team member is maintaining a different state in their mind) and there is a collective goal (hopefully the team has this). With the knowledge that a team is a distributed system, we can apply the Fallacies of Distributed Computing to understand, in a technical sense, some of the challenges of leading and managing people.

# Fallacy 1: The network is reliable

The first fallacy is easy to prove in a team environment. We just need to remember the last time we accidentally missed an email, forgot to respond to a Slack message, or forgot something that was said in a meeting. Once we accept that our network is unreliable (if it were a computer network, we’d fire the network admin because it’s so bad) we can start to recognise that we need to use different communication protocols for different types of information and for different recipients. If the wrong protocol is used, then the communication will likely be lost. Again, in networking terminology, we need to use TCP not UDP; if we use UDP we have no idea if our communication was received. With TCP we have some additional overhead but get confirmation that the message was received.
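The difference between the two protocols can be sketched in a few lines of Python. This is only an illustration of the analogy, not real networking code: `make_flaky_channel`, `send_fire_and_forget` and `send_with_ack` are hypothetical names, and the "channel" is a stand-in for a colleague who misses the first few messages.

```python
def make_flaky_channel(drops):
    """Hypothetical channel that loses the first `drops` messages it is given."""
    inbox = []
    remaining = drops

    def channel(message):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return False  # message lost, so no acknowledgement comes back
        inbox.append(message)
        return True  # message received and acknowledged

    return channel, inbox


def send_fire_and_forget(message, channel):
    """UDP-style: send once and never find out whether it arrived."""
    channel(message)  # the result is deliberately ignored


def send_with_ack(message, channel, max_attempts=5):
    """TCP-style: resend until acknowledged, or give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if channel(message):
            return attempt  # how many sends it took to get through
    return None  # never acknowledged, so the sender knows to escalate


channel, inbox = make_flaky_channel(drops=2)
send_fire_and_forget("Release moved to Friday", channel)
print(inbox)  # the message was lost and the sender has no idea
print(send_with_ack("Release moved to Friday", channel))
print(inbox)
```

The acknowledged version costs more (it may send the same message several times), but the sender always ends up knowing whether the message landed or whether they need to try another route.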

# Fallacy 2: Latency is zero

If you need proof that latency isn’t zero, simply send an email to someone and don’t do anything else until you get a response. You will quickly discover there can be significant latency in our human network. The members of our team have to prioritise their tasks, so replies are rarely immediate. The latency caused by a communication delay can trigger team members (or teams) to perform conflicting work, it can create deadlocks where people are unable to proceed due to unfulfilled dependencies, and it can become quite wasteful without appropriate management.

# Fallacy 3: Bandwidth is infinite

Unfortunately, as humans, we do not have unlimited bandwidth. We have a limited ability to express ourselves, our ideas and our knowledge. Our communication mediums are also limited; when we talk face-to-face we have more bandwidth available as we can get a better idea of body language, in video calls we lose the body language but still have access to facial expressions, on a phone call we can still hear vocal inflections even though we can’t see the person’s face, and when we get down to text communications we often have to rely on emojis. As we receive information, we fill in the blanks to make up for lost bandwidth. This can lead to miscommunication and misunderstandings.

# Fallacy 4: The network is secure

In most situations we can assume that our human network is relatively secure and no one is actively sabotaging it, although sometimes people are looking out for their own interests, which can be a form of unintentional sabotage.

What is more likely is that our network is prone to corruption. Every time information is communicated it changes to some degree; the more points of relay, the less likely it is that the information being conveyed will be accurate. To reduce the chance of corruption, it is important to verify what we heard with the person who told us.
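To put a rough number on this, here is a small sketch. It assumes, purely for illustration, that each relay independently passes the message on intact with the same probability; the function name `intact_probability` and the 90% figure are hypothetical and are not from Andrew’s talk.

```python
def intact_probability(per_hop_fidelity, hops):
    """Chance a message survives `hops` relays unchanged.

    Assumes (hypothetically) that each relay is independent and keeps the
    message intact with probability `per_hop_fidelity`.
    """
    return per_hop_fidelity ** hops


# Show how quickly accuracy decays as more people relay the message.
for hops in (1, 3, 5):
    print(hops, round(intact_probability(0.9, hops), 3))
```

Under these assumptions, even a relay that is right 90% of the time leaves only about a 59% chance that a message arrives intact after five hops, which is why checking back with the original source is worth the effort.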

# Fallacy 5: Topology doesn’t change

The topology of our human network changes frequently. This could be driven by corporate restructures, it could be the scaling (either up or down) of a business or team, or it could be due to staff turnover. I’ve never seen a team remain stable. There will always be changes in the topology of our human network, and we need to ensure this is accounted for in all actions of the team. This could mean documenting processes, removing single points of failure, and having easy onboarding processes to ensure a change in team topology has the minimum possible impact.

# Fallacy 6: Only one administrator

Many businesses are structured to have a single administrator, but in practice there are always multiple people controlling the flow of information. The board or CEO may set a direction for the business, but as the message is communicated down through the layers of the hierarchy, each layer adds or removes information to better align with its own agenda. Even at the individual contributor level, each person will focus on what’s important to them. Because each of us individualises every requirement in this way, we need to ensure that consensus is reached and that all parties are working toward the same goal.

# Fallacy 7: Transport is free

Many of us are acutely aware of the cost of communications, from the time spent in a meeting, to the context switching and loss of productivity from interruptions. It is important to balance the need for communication against the required productivity. With too little communication everyone is working toward a different goal; with too much communication no one can achieve anything.

# Fallacy 8: Network is homogeneous

The final fallacy is that the network is homogeneous. Every team and every team member has a different driver or motivator. We need to learn what each of these is and optimise the interactions as early as possible. The closer we can get to a homogeneous environment, the easier the communications become. We don’t have to align the drivers of every team member, but helping them to understand each other will make communications more effective.

# Conclusions I’ve Drawn from the Fallacies

Recognising that our team or human network is a distributed system, and is subject to the fallacies of a distributed system, leads me to the belief that our primary role as leaders and managers is to coordinate the communications between the components of our system (the people). If our team is not delivering in a coordinated fashion at the expected rate of delivery, we are failing as managers to optimise our network.

Our primary role as a leader or manager is to manage and facilitate communication. To achieve this we need to ensure that messages are delivered clearly and via appropriate means, that delays are kept to a minimum, that messaging is understood and acknowledged, that we can cope with change, that we don’t under- or over-communicate, and that we align as much as possible.

If we treat our team as a distributed system and account for the deficiencies of these systems in all that we do, then we can be successful as a leader and our team can be successful. If we fail to allow for the deficiencies, or if we are unable to negate their negative impacts, then we have failed as a leader and manager.

Andrew talked about his thoughts on scaling teams, the need to monitor team health, and the need to resolve issues quickly whilst ensuring that we know the how, the what and the why of our work, and where we are going. He reminded us that conflict resolution is a required skill for a leader, and that if conflict is left unattended it will only get worse.

Especially when managing managers, it is important to have skip-level one-on-ones. Although I have talked about one-on-ones in “One-on-Ones Don't Exist in the Scrum Guide - Why do we do them?”, I didn’t cover the skip-level one-on-one. A skip-level one-on-one is important as it enables team members to tell you the information they want you to hear when an intermediate manager may be acting as a filter. As a senior leader, it can help you to understand the real pain points of individuals, and may help to seed ideas for improvements within the organisation. It may also show up a deficiency in an intermediate manager’s ability.

Another aspect Andrew talked about was ensuring there is no single point of failure. This is something I’ve been passionate about throughout my career, and it likely stems from when I ran my own business. If a single point of failure exists (be it a core system, a team member with a large amount of knowledge, or a CTO who needs to approve every decision), a small issue can have a massive effect. If we look at a team member with a large amount of undocumented knowledge, what happens if they are sick or they resign? If a CTO has to approve every decision then that CTO can’t take a holiday, or if they do then the entire team will cease working because the decision maker is unavailable. As leaders we have to ensure we empower people to make independent decisions; to do this we need to ensure they have enough context to make the decision, and that all the required information is easily available to them.

Coupled with this is the need to ensure there is a clarity of roles. Who is leading the team? Who is driving the team? Are they the same person? What are the goals and drivers for each team member and how can they be coordinated for the maximum benefit and minimum friction of the team? We need to ensure that collective goals are communicated clearly and that actions align with these goals. There is no point communicating that the focus should be on the customer if all the actions being taken are focussed on profitability with little regard given to the customer.

As Andrew’s presentation drew to a close, he talked about what culture is. It’s almost impossible to miss the focus on culture, especially within IT organisations, but very rarely does anyone define culture and what it does. If there is one thing you take from this article about culture, it should be the following paragraph:

Culture eats strategy for breakfast! I know I’ve heard this before, although I cannot recall where, but it is something that needs to be remembered. It doesn’t matter what strategies you put in place; if they don’t align with the culture they will fail. Remember that as a company scales, culture will set the scene: a culture of laziness will breed more laziness, and a culture that allows unethical behaviour will generate more unethical behaviour. Once a culture has been created it is hard to change; it is built through decisions, and it will take time for decisions to influence and change an ingrained culture. When building a team, remember that culture is created by decisions such as who we hire, who we fire (and why), the actions that we reward, but also the actions that we accept.