Becoming a CTO

A former co-worker reached out to me recently. They are a director of engineering at a midsize startup and just got their first headhunter inquiry for a CTO role. Having never been in the role before, they wanted to know what the position was like and how to prepare for the interviews.

I realized that while there are some books on technology leadership careers, there aren’t many resources explaining the most senior levels. My goal is to provide some insight and advice for those interested in someday becoming a CTO.

I’ve been a CTO for five and a half years

I’ve worked at a hundred-thousand-person company, seed-stage startups, and many of the variants in-between. I started as a developer and followed a traditional path of moving up to more senior levels on the development track and then moving to lead, engineering manager, director, VP, and now chief technology officer. I’ve been the CTO at three different companies in two countries and three parts of the technology industry. I’m part of a few networks where I meet and talk with CTOs of all sizes and stages of companies.

I’ve learned that one reason there isn’t a good reference for the role of the CTO is that the size of the company and the expectations of the CEO define the job. Some of my role expectations and responsibilities are like those of many of my peers at similar-size companies. However, there are also significant differences in our expectations from our executive peers and boards.

Because of the variability of the role, I will broadly share my direct experiences, joined with an understanding of the expectations of other CTOs that I know.

The early-stage company CTO is often the developer-in-chief

At earlier stage companies, the CTO is often the technical co-founder. They are likely the developer who built many of the earlier versions of the software and helped hire the original development team. Their responsibilities are primarily technical: driving architecture, doing advanced development tasks, and creating technical vision.

Frequently, the first CTO of the company is hired for their ability to code and not their ability to grow or manage a team. Depending on the person, they may also lead the development team. Still, often the team’s management will eventually move to another person, an experienced manager, who may report to the CTO or be a peer to them.

The early-stage CTO is the leading technical voice for the company externally, especially if they are a co-founder. They talk to investors and potential partners and meet with potential vendors. If they also manage the development team, they will solely represent engineering in the senior leadership team. As a result, they will have responsibility for the decisions made by the engineering team. Nevertheless, if they do not manage the team directly, they might not be involved in the decisions around the day-to-day operations.

A mistake that inexperienced founding CTOs often make is that they don’t understand their role beyond coder-in-chief. They focus solely on the technology and are not active participants in the company’s leadership. As a result, they do not work cross-functionally. CTOs fixated on the how without the why or what will not be in the role very long once the company grows.

If they have no experience leading an engineering team or organization, the early-stage CTO will be challenged to grow with the company. If they cannot scale, eventually they will end up in a subordinate role reporting to a more experienced CTO hired to replace them.

The midsize company CTO is responsible for leading the organization, corporate strategy, and making technical decisions

Once a company reaches a size at which it needs new processes and structures, the scrappy leaders who helped get the company off the ground are often replaced with more experienced leaders knowledgeable in taking companies through the next growth stage. If the CTO hasn’t grown into the larger role, they will be part of that replaced group.

The midsize company CTO is a full-fledged executive team member working cross-functionally and meeting with partners, investors, and customers. Frequently, the midsize company CTO will also manage the engineering organization. The CTO is responsible for setting technical direction, making sure good architectural decisions are being made, and establishing best practices and working methods. They are still expected to have good technical depth, but don’t often actively contribute to shipping code. A red flag for me personally is seeing a CTO role description where the expectation is to lead a 50-plus-person organization while also actively coding on the product. It means the executive team does not have appropriate expectations for the role.

A midsize company CTO spends significant time establishing culture and practices for the teams they are responsible for; they are also very directly accountable for the organization’s decisions and its track record of delivery. The CTO meets internally with members of the other functions, such as sales, marketing, HR, and finance, to share direction for the organization and get feedback. The CTO is responsible for the administration of the teams, including the budget.

The CTO is also responsible for hiring, performance management, and team structure and may be very active in their teams’ recruitment and interview processes, especially in a scale-up type of company.

A CTO leading a more extensive development organization must be a generalist, understanding different roles and responsibilities. Their remit may include Corporate IT and Technical Support. In some companies, they may also manage the business analytics, security, product, and UX teams. A CTO who is too focused on the areas closest to their background or does not respect non-coding functions will not succeed.

As a midsize company CTO, you will often spend as much time with your peers and their teams as you spend with your own. As a result, you will need to learn about their functions and how your teams can work together. CTOs who “stay in their lane” will not be seen as an equal member of the senior leadership team and may lose their say in decisions that affect the organization.

It is very unusual for someone to move into a midsize company CTO role without having some experience leading a multilevel-development organization and working with other business functions.

Growing (or moving) into the CTO role

If you are a manager or a manager of managers with the goal of being a CTO, there are a few things you can start to focus on that will help you on your path.

Learn about the business your company is in

Offer to sit in on sales calls, on user research interviews. Try to understand the company’s financials when the CFO presents them. If you can’t, make a friend in the finance team and ask them to explain them to you. Understand the KPIs not only for your team, but also for the teams around you.

Learn about the other functions

Get recommendations of reading or conference talks from your peers in the product, UX, and marketing teams. Think about how their work influences yours, and yours influences theirs.

Respect and learn other technology areas aside from your own

If you lead an area you don’t have personal experience in, approach the people in that function with respect and a genuine desire to understand their work. They want to help you know what they do and how they do it.

Hone your craft

Hopefully, you are already working on deepening your skill as an engineering manager or director, but are you trying to understand the bigger picture? Read other companies’ (public) handbooks, engineering blog posts, and conference presentations about their ways of working. What practices are interesting? Which can you try in your team? How do you think they will scale, or what issues do you think they may have?

Ask your CTO if there are tasks they can delegate to you

The best way to learn the job is to do the job. Even better is having someone who is already doing the job explain to you how they perform it so you can help them.

Start thinking in terms of strategy

The main difference between the expectations of line managers and senior managers is the emphasis on strategic thinking. Executives contribute to the company’s strategic planning and use their understanding of the company’s goals and the current situation to make sure that their teams are setting up the conditions for the company’s success. Strategic thinking is a learnable skill, but it takes practice.

The rewards of being a CTO

Being a CTO was not what I imagined it to be when I first decided it was my career goal. It is a lot of work, carries much stress, has fewer perks than you might think, and can be somewhat lonely. However, it is also the most personally rewarding job I have ever had. With the challenges, there is also incredible responsibility, tons to learn, the ability to influence the company’s direction, and the chance to affect the lives of dozens or hundreds of people on your team. I have yet to regret my choice to pursue this role.


Thanks to Laura Blackwell for editing assistance

The p-word

I have never heard the word “politics” used in a positive light when describing a work situation. On the contrary, the words “corporate politics” evoke memories of cynical executives in ’90s movies quoting The Art of War to their reports while figuring out how to undermine their peers. One of the Merriam-Webster dictionary’s definitions of the word is “political activities characterized by artful and often dishonest practices.”

I propose that we reconsider the word “politics,” especially when used in a work context. The word, and the techniques ascribed to it, are inherently neither good nor bad. You can use politics for ill intent or good. Good intention aims for a win-win solution, whereas bad intent aims for a win-lose solution. Instead, let’s use this Merriam-Webster definition for the word “politics”: “the total complex of relations between people living in society.” For our purposes, let’s call the company we work in our society. A company is a society in that it can have a panoply of personal relationships, group dynamics, shared goals, and systems of governance.

If we can get over our initial reaction to being political, how can we use some of these techniques for win-win solutions to problems?

Thinking politically means thinking ahead (being strategic), understanding the motivations of the people you need to convince (having empathy), and understanding the interactions of the systems you are trying to influence (systems thinking). You are working to make something happen-something you cannot do on your own. You may be working to overcome resistance to a new idea in a conservative institution. You could be trying to persuade another group to help your team with a project that will be good for the company but might make that group miss their quarterly goals.

Thinking politically will help you gain support for your ideas, soften resistance to change, and focus people on the bigger picture. If you improve everyone’s situation, your peers will appreciate you, and you will find the way forward easier in the future. Conversely, done poorly, where you or your team move ahead at the expense of others, you will find it increasingly hard to gather support in the future.

Playing politics so everybody wins: A personal example

In the early days of the public cloud, I worked in a company that already had established data centers worldwide. Getting a new server racked meant requisitioning a server from the central IT organization, following all their guidelines around the machine’s configuration and which technologies could be used on it, and giving them access to maintain and manage it. The process to get a single server going with a public-facing interface could take months.

I was leading a new team trying to incubate a new product. We had adopted a Lean Startup approach, moving to get to market in under six months. As this was a brand-new area for the company, we couldn’t be sure how quickly the public would adopt the product, and we wanted the ability to add capacity quickly if needed or shut the project down if it wasn’t getting traction. The company’s lead time for servers was not going to work for us. So, we decided to leverage Amazon’s young AWS offering. I knew that this would be a controversial decision and might incur opposition from other teams, especially IT. I could have chosen to “ask forgiveness, not permission” and hope that my small team could fly under the radar long enough to launch, but that was very risky. Our actions could be interpreted as a deliberate avoidance of company security and budget policies, which could prevent us from launching if we were found out.

I spoke to my peers in other teams to understand their prior experience working with the centralized IT team. I learned that if I approached the IT team directly for permission to use the public cloud, I would get an immediate “no.” That would put me in a position of having to get their decision overruled, which would take a lot of time and energy. I decided to go a different route.

I put together a presentation on our plan for our product. I included our quick path to market to mitigate risk for the company, our plan to leverage the public cloud (to scale quickly and manage cost-effectively), and how we would address any corporate security concerns. The goal of the presentation was to build trust that my team was thinking about the business and not just playing with new technology, and to show that we had answers to the issues I expected other groups to raise. I portrayed our product plan as an innovative experiment in new product development; a low-risk approach to moving faster as a company. I wanted to get some protection for my team at a level that would short-circuit other teams worried about this new way of doing things.

After working with my boss to ensure I had their unequivocal support, we got time on my SVP’s calendar to discuss the plan. I prepared for any argument against the plan, but I also left room for input from the SVP to help them feel invested so they would help protect the project. We left the meeting with approval for our approach and moved forward quickly enough to launch the product within our six-month window.

The product was more successful and grew more quickly than we had planned. Our public cloud adoption made it far easier for us to scale as the number of our customers did. Our success also increased our visibility within the company, however. The teams invested in managing and growing our worldwide data center infrastructure now started to see us as a threat. I began to have many increasingly tense meetings with them to discuss moving into the corporate infrastructure. I could have used my product’s success to force the other team to back off, but that would have created even more enmity, setting up our teams for friction forever.

Instead of using my team’s success as a wedge to ignore the IT team’s demands, I worked with them to understand why we had to make the decision we had. I also identified what they could do to make switching to the internal infrastructure an easy decision for us and for other teams

considering following in our footsteps. I committed my team to switching to the company’s infrastructure as soon as it could support us.

Reading through that experience, you can see several political maneuvers I used to get my team the space we needed to ship our product.

  • Talking to my peers to understand what their experiences and anticipate challenges. Consulting my peers on the problem got them enlisted as allies. If I were successful, they would have a better chance of success themselves in the future. (systems thinking/having empathy)
  • Building a strategy to prevent the central IT organization from stopping my team’s plan. (being strategic)
  • Enlisting my manager meant I had an ally in my effort to convince other senior managers, someone who understood the SVP’s motivations and concerns. (having empathy)
  • Preparing my argument to the SVP not just to convince but also to engage. Making the executive not just an approver of the plan but a participant with a stake in its success. (being strategic/systems thinking/having empathy)

If I had stopped at this point or pressed my new advantage over the IT team, that would have been the type of corporate politics that people despise. I would have created a win-lose situation (and some very angry co-workers who would have justifiably felt I’d wronged them).

Instead, I took the following steps to help the group I felt I had to work around and improve the situation for everyone at the company:

  • I used the lessons we learned from AWS to help the centralized IT team understand groups like ours with less predictable or forecastable needs. (having empathy/systems thinking)
  • I committed to them that if they could support our needs (with our help), we would switch to the “official” infrastructure. (having empathy/being strategic)

Making sure we helped the IT team was more work for my team, but it was better for them and the company at large. It made the solution a win-win.

An outside observer, especially a jaded one, could look at each of my actions in a very different light. That observer would say that I schemed to isolate the IT team, skirted appropriate behavior, and cheated by going over their heads. If I hadn’t then gone back to help lift the IT team, I might agree.

One might say that the best thing to do would have been to work with the centralized IT team to convince them that they should allow and support my plan. I would agree with that sentiment if it were possible to gain the IT team’s support and ship my product on schedule. However, the experience of my peers told me otherwise. Those who have worked at large corporations with siloed functions working against different goals understand how intractable those other groups can be.

Don’t be a player of the p-word

When faced with a challenge at work, try to understand the motivations of the people you work with and the systems they operate within. From there, build a strategy to achieve your goal. If you can achieve your goal while helping others move towards theirs in the long term, you will be an innovator and a team player within the society that is your company. On the other hand, if you achieve your goal at the expense of others, you will be nothing more than a player of the unspeakable p-word.


Thanks to Laura Blackwell, Hannah Davis, and Mandy Mowers for editing help.

Addressing the challenges of partially distributed engineering teams

As companies begin planning their approach to post-pandemic life, patterns are starting to emerge.

Some companies are planning to return to their offices and carry on as they did before. Others are adopting a partially distributed structure, reopening some of their offices but not assigning desks or requiring employees to work there. Some companies have even switched to being fully distributed.

In a fully co-located team, every member is in the same office near each other, nearly every day. A fully distributed team has every member in a different physical location almost every day. Everything else is partially co-located/partially distributed. Even if you all work in the same physical office, if people are there on different days, you are only partially co-located.

The tech industry has consistently demonstrated creativity and innovation in how work is structured. Companies like Automattic, GitLab, and others have long shared lessons on how fully distributed teams can be effective; decades of business books have addressed the challenges of leadership in co-located teams, but there isn’t as much published wisdom on leading partially distributed teams beyond the suggestion of treating them as fully distributed.

I have led fully co-located teams, fully distributed teams, and partially distributed teams. I have always said that the latter is the hardest to do well.

The problems with being partially distributed

Inconsistent communication speeds

The most significant problem with being partially co-located is that you quickly fall into asymmetric communication patterns. Folks in the same physical locations have very high bandwidth conversations, but everyone else communicates at a fraction of the efficiency.

The bandwidth issue extends beyond group meetings where some are in a conference room and others are on video, and permeates spontaneous meetings, too: people talking at the coffee machine, at lunch, or bumping into each other in the hall. People are more inclined to call out to a colleague they can physically see rather than trying to reach one over chat or video.

These spontaneous conversations might inspire solutions to team challenges that don’t involve the larger group, creating friction. The team can splinter as the people who follow similar schedules end up working together more frequently. Distrust builds within the team, especially if some are in other cities or have other commitments preventing them from being present when their peers are. Parts of the group may feel isolated or left out of the decision-making process.

This inconsistent communication bandwidth can significantly impact the design of the team’s deliverables due to the effects of Conway’s Law: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.”

Unequal visibility of work

People naturally have a recency bias, tending to prefer recent events over historical ones. They also have more of a personal connection with those they interact with in person over those on a video call (a familiarity bias). Both biases work against those who are physically present less often. The lack of direct visibility of you or your team relative to other groups can be a significant disadvantage for you, the team, or individuals when it is time for performance reviews or recognition. Even if the team members have worked hard to achieve a goal, if they were not literally visible and other individuals were, you will have to overcome those biases to justify raises or promotions.

Inconsistent working hours

If your team is spread over many time zones, you get the classic problem of people having to wait on others when they need assistance, clarification, or to hand-off work. This problem can also happen with teams in near time zones if part of the team works from home to better incorporate their outside-of-work responsibilities or have a more flexible schedule. Unless addressed through restructuring the way work flows or introducing other constraints, work stalls in the team and people get frustrated.

Addressing the challenges

Create consistent expectations of availability

If most of the team plans to work two or three days in the office each week, have the team agree on which days that will be. So that when the group is in the office, most of the people are there.

If people need to accommodate their time zone or outside-of-work commitments, mutually agree on shared working hours. During those times, people are expected to be online and available for discussions or questions. Those hours do not need to be contiguous!

Ensure that people on the team are proactively communicating their availability so that others know when they can reach out or expect faster responses. It may be helpful to have a shared team calendar where every member puts their expected working hours for the week and adjusts that calendar if something comes up.

If possible, avoid being partially distributed

If nearly everyone on your team plans to return to the office, it might make sense for the people who plan to work off-site to switch to a mainly distributed team and vice-versa. It may seem extreme to suggest that people switch to another group, but it will make things easier for them and for their teams to be fully distributed or fully co-located.

Make sure that your team is visible

The adage, “out of sight, out of mind” speaks to a truth about human nature. Suppose your organization’s leadership is working from the office, and you or members of your team are primarily working from other locations. In that case, it is vital to make sure that the individuals’ and teams’ work is visible.

Invite your manager to your virtual team demos or ask them to stop by your team meeting or standup. Proactively tell your boss when individuals exceed your expectations. Invite your manager to set up occasional skip-level meetings with the distributed people in the group. If your boss has office hours, encourage members of your team to attend from time to time. These strategies to increase visibility are helpful even for entirely co-located organizations, but they are crucial for partially distributed teams.

Take opportunities to build empathy

Video meetings can become very transactional, especially when it seems like the workday is full of them. Without the spontaneous conversations and connections that arise from chance meetings in an office, you can start to forget that the people on your team are actual humans and not just pixels on a screen. Take time in your meetings for small talk, and don’t feel like you have to force people immediately back to the agenda if the team digresses into talking about their favorite TV shows or places they want to visit. These human details remind everyone that their co-workers are people with lives and motivations. It encourages empathy.

When the whole team can travel safely and without concern, bring them together on a regular cadence (the frequency determined by the travel budget and people’s freedom to travel). Spending time with each other will encourage much more profound empathy between the people on the team.

Move communication offline as much as possible

Co-located teams avoid many meetings by stopping by each other’s desks for a five-minute chat. When you can’t see the person, a very natural thing to do is to book a 30-minute meeting for that conversation instead and invite several others who might have input while you are at it, thus guaranteeing that it will use all 30 minutes (and may run over).

Fully distributed teams have long favored written communication as the primary tool to document decision-making and be more inclusive of colleagues spread across time zones. This model was also pioneered and perfected by large open source projects where wide geographic distribution is the norm.

Enforcing an offline-first communication and decision-making process within your team helps ameliorate the challenges of non-overlapping work schedules and brings increased symmetry to communication speeds.

Take a remote-first approach to team meetings

As I mentioned above, a common suggestion for partially co-located teams is to have everyone dial into group meetings even if they are sitting near each other; treating all as if they are distributed helps put everyone on an equal footing in the discussion. Still, it is not always possible if there are no adequate facilities in the office to support that. Alternatives would be to have whoever leads the meeting always dial in from another location or have the bulk of team meetings on days the group has decided to work from home.

We’re not done changing; this is an opportunity!

Just over a year ago, most companies were comfortable with their ways of working or were evolving them slowly; but the pandemic forced all companies to adjust to new ways of operating immediately. Finding a “new normal” will be a much more gradual process, with much uncertainty.

As an engineering leader, there is a massive opportunity to find new ways for your team to be effective in this new world of work. You and the group should approach the challenge with flexibility and a willingness to experiment.

Share the ideas you learn with your peers and the larger organization. If companies struggle to make this new flexibility in work successful, they will quickly move back to their old ways of doing things.

You can help your company be a leader in providing flexibility and freedom to its employees while also being effective at delivering value to your customers. When you figure things out, please share them with the rest of us; we’re on this “new normal” journey together.

[Originally posted at https://leaddev.com/managing-distributed-teams/addressing-challenges-partially-distributed-engineering-teams]

Own your calendar

Every six months, I take a day to review and reflect on how things have been going and the changes that I want to make moving forward. This day is my personal strategy offsite.

As part of the process, I think about the things I want to do more of and the things I want to do less of, and how much time I should allocate each week towards my professional goals. I then create a sample of what a perfect day would look like and a mockup of what an ideal week would look like apportioning my time in alignment with my goals.

With my review and planning done, I go to my work calendar and clean it up to make it look like my ideal week. I delete or stop attending meetings that are not useful. I block out time for focused work on my goals. Then, to give some flexibility for the things that arise, I make sure that I leave some gaps or mark some of my project-work time as “free,” allowing others to schedule it if needed.

Each week has unique challenges: unforeseen work appears, a critical customer meeting dominates, or a work emergency takes over my calendar.

At the end of the week, I look back at the calendar and figure out how much of my time spent maps to my planned time allocation.

Often, I find that new things are creeping in if I am not attentive. As my time starts to diverge from my ideal allocation, I must decide if I change my plan based on my new reality (and possibly adjust my goals) or if I re-assert my plan and delegate or drop the new constraints on my time.

I track each week’s time allocations in a spreadsheet. It helps me understand where I am spending my time over the year. In addition, it makes it very clear if I am spending too much time on low-value work. The spreadsheet also shows if I am unrealistic about how I allocate my time in a week which is helpful for when the next six-month planning comes.

This process may seem very rigid, and in many ways, it is. However, I’ve come to it over the years through iteration after finding myself feeling very busy but not making meaningful progress towards my personal or professional goals.

As we grow in our roles, new opportunities and responsibilities appear. Our peers, team, and others want our input and time. This activity gives us the impression that we are doing necessary, valuable work. At the end of the week, though, we may look at our full calendars and wonder what we accomplished. If this situation feels familiar to you, it may be worth adding some rigor to understand how you want to spend your time and how you actually spend your time.

When, why, and how to stop coding as your day job

By letting go of writing code, you open yourself up to excelling as a manager.

I am a computer programmer.

I was one of those people who started coding at a young age – in my case, on a TRS-80 Model 1 in my school’s library. I loved the feeling of teaching the computer to do something and then getting to enjoy the results of interacting with what I built. Since I didn’t own a computer, I would fill spiral-bound notebooks with programs that I would write at home. As soon as I could get time on the computer, I would type it in line-by-line. When I learned that I could write software as a job, I couldn’t imagine anything else that I would want to do.

After university, I got my dream job writing 3D graphics code. I was a software engineer! I defined a successful day by the amount of code I wrote, the compiler issues I resolved, and the bugs I closed. There were obvious, objective metrics that I could use to measure my work. Those metrics and my job defined me.

Today, I am a Chief Technology Officer, leading software development organizations. If I am writing code on the product, it is probably a bad thing. I now have to define my success by much fuzzier metrics: building good teams, hiring and training good people, setting multi-year technical strategy and vision for the company, collaborating with other departments, and setting and managing a budget. I may have a good day or a bad day, but I have to measure my success based on quarters or years.

My achievements are now always tied to the successes of others. Getting to this point wasn’t easy, but I wouldn’t have it any other way. It was a journey that took years, and the first challenge was understanding that coding was no longer my job.

Why is it hard to stop coding as our day-to-day work?

When I speak to engineering leads or managers working to grow into more senior engineering leadership levels, the question of ‘How much do you code?’ is very often raised. We usually have a hard time imagining that we can still be useful if we don’t code for a significant part of our time. Why is that?

We’ve been traditionally bad at hiring managers in the software engineering industry

Usually, companies choose development leads because they are the best, technically, on the team. I would guess that the reasoning behind this is that it’s assumed that the best developers are the right people to supervise their peers. This practice creates the impression that managing others is a promotion for a skilled developer when, in actuality, it is a career change away from what made them successful in the first place.

The worst managers I’ve had were very talented developers who hated having to spend time doing the boring stuff that wasn’t coding. They resented the time spent away from the keyboard and weren’t always good at hiding that fact.

Many companies now feature dual career tracks for technologists, giving them a choice to advance as an individual contributor or move into management. This choice of career is an excellent thing. It means that if you want to spend your days coding, you can do that without sacrificing your career. It also means that if you desire to find joy in leading teams and growing others’ development and skills, you can do that.

We fear becoming ‘non-technical’

We joined the technology industry to be close to technology. We fear that by moving away from coding, we will morph into the classic ‘pointy-haired boss’ – ridiculed by the people on our team and unable to understand what the developers are discussing. I won’t say this can’t happen, but it won’t happen on its own. It will only happen if you choose to avoid technology once you move into the management role.

As you take on broader leadership responsibilities, you will need to learn and understand new technologies. Moving beyond the specifics of your expertise is necessary for you to move up in management. I have managed developers coding in at least a dozen languages on the backend, frontend, mobile, operating systems, and native applications. I have also managed testers, data scientists, data engineers, DevOps, Security, designers, data analysts, program managers, product managers, corporate IT teams, and some other roles I don’t even remember anymore. It isn’t possible to be an expert in all those fields. I need to take the lessons from my time as a developer and use them to inform my understanding, help me learn new areas, and give me empathy for the people who work for me.

It isn’t that you will become non-technical. It is that you will become less narrowly technical.

As a new manager, we are often expected to continue coding

It is common to move from being a developer on a team to managing that team. As the new manager, this means you are still responsible for part of the codebase. Unless you immediately start leading a large group, your new role still requires that you spend a significant portion of your time coding. This expectation makes the transition to the new role more comfortable – but it can also be an anchor that holds you back from embracing your new role as your management responsibilities grow.

We still see ourselves as a resource that can ‘save’ a deliverable

As a manager, you are accountable for the results of your team. If the group is struggling to make a deadline, it might be tempting to jump into the weeds to try and help the team finish the project on time. While this is sometimes the right decision, it can also make the problems worse because the team loses the person who looks at the more significant issues and coordinates with other teams to get more help or prepare them for the delay.

Why do we need to stop coding eventually?

We don’t need to stop coding, ever. However, once you move into engineering leadership, it will need to become a smaller and smaller part of your job if you are working to lead larger teams or broaden your responsibilities scope.

I had led teams before I was a manager at Adobe, and I had always spent a significant part of my work week contributing code as part of the groups I was in. At Adobe, though, my team had grown to be fourteen people, with another four dotted-lined to me.

I had been the primary developer for a part of the project, and I took pride that I was still contributing important features to every release. However, my management responsibilities were starting to fill my work weeks. Between 1:1s, sync meetings with other teams, and other manager work, my feature development time was increasingly moving into my evenings and weekends. My features were often the last to be merged and usually late.

The company had two mandatory shut-down weeks. To work during this time, you needed the prior approval of a Vice President. The team was preparing for a release, and my features were still in the to-do column; I met with my VP to get his permission to work over the shut-down week. He asked me, “Who is the worst developer on your team?” I hemmed and hawed – I didn’t want to call out anyone on my team, and I hadn’t even really considered the question. Seeing my uncertainty, he answered for me. “You are! You’re always late with your features. The rest of the team is always waiting on you. If you were a developer instead of the manager, you would be on a performance improvement plan.” He was right. My insistence on coding was hurting the team, not helping it.

Taking on the lead role doesn’t mean you should stop coding immediately, but it does mean that your coding responsibilities should now be secondary to your leadership ones. There are other developers on your team, but there aren’t other leads. If you aren’t doing your lead job, no one else will. Similarly, your professional development’s primary focus should now be on your leadership skills, not your coding skills. You are moving into a new career, and if you don’t work to get better at it, you will find yourself stuck.

As your leadership responsibilities increase, you should transition your development responsibilities to other people on the team. This transition is good practice because delegation is an essential part of leadership.

How do you stay ‘technical’ when coding isn’t your job anymore?

As I said earlier, staying technical is a choice that you need to make. Hopefully, one of the primary reasons you chose to make a career in the technology industry was that you were interested in it, so this shouldn’t be a problem.

As I also said earlier, as you develop as a technology leader, your focus broadens as your scope widens. 

The best way that I have found to remain a credible technologist for my teams is to be interested in them and their work. To do this, talk to the people on your team and take a genuine interest in the things they are working on. If a technology comes up in a meeting or 1:1 that you don’t know, add it to a list of things to research later. Then, dedicate time in your week to go through that list and learn about the technologies well enough to have your own opinions about them. This practice allows you to have further discussions with whoever mentioned the technology to you.

If you get interested in what you learn about the new technology, you may want to keep trying to understand it better; you may read more or embark on a personal project using it to gain more practical knowledge. As I said, it isn’t that you have to stop coding, it is that, eventually, it shouldn’t be your day job anymore.

By taking an interest in the technologies your team uses in their work, you deepen your empathy for them and expand your own knowledge. You’ll be able to discuss the work, ask reasonable questions, and make connections to other things happening in the organization and your own experience. This way, the people on your team know that while you may not be able to step in for them, you understand their work and care about it.

Success is defined differently when you lead people

The feeling of accomplishment that comes from completing a cool user story, deploying a new service, or fixing a difficult bug is significant. It is a dopamine hit, and just like other dopamine-inducing behaviors, it can be hard to stop.

Having a great 1:1 or leading a productive team meeting can also feel good but in a more esoteric way. As a team leader, you need to learn to perceive the success of making others successful. Success takes longer, but the feeling is more profound and more rewarding.

Having a release resonate with your customers, being able to easily justify the promotion of a developer that you have mentored, and having someone accept a job offer for your team, are all fantastic feelings. In the day-to-day, watching stories get completed, helping resolve the issues when they aren’t, and seeing people get excited about the direction you’re setting for the team can leave you feeling satisfied at the end of the day.

Being a technical leader doesn’t mean writing code every day

As you grow in your new leadership career, you will need to devote your time to mentoring, developing, and leading your team. As you spend less time in your code editor, you will find new challenges in strategy, clearing roadblocks, fixing broken processes, and new tools like HR information systems, slides, and spreadsheets (it isn’t as bad as it sounds). You will spend less time learning all the intricacies of a specific language or toolchain and instead learn about how systems interact, understand when to build vs. buy, and learn about entirely new areas of technology. And you can still code, but make sure that you aren’t the developer holding your team back.

[This was originally posted at https://leaddev.com/skills-new-managers/when-why-and-how-stop-coding-your-day-job]

Twenty Questions for your 1:1s

You sit down for a chat with someone on your team. You get through the pleasantries, the small-talk, the status, and you run out of things to say. It happens to even the most curious and high-EQ people. It happens to me, and I have probably had 10,000 one-on-one meetings in my career.

When you hit that moment where neither of you has anything to say, it is very tempting to just say, “Ok then! Let’s talk next week.” giving you both time back into your day. If this happens occasionally, it is not a severe problem. However, if you find it happening more often than you would like, it is good to keep some prompts handy to move into deeper topics that might prompt a valuable conversation.

Here are twenty questions that you might find helpful if you get stuck at the surface level or run out of things to discuss in your one-on-ones:

  1. What is more challenging in your day-to-day work than it should be?
  2. What is the most fulfilling thing that has happened this week?
  3. Who on the team has been really impressing you lately?
  4. What are you looking forward to in the next six months? Why?
  5. What haven’t you told me that I probably should know?
  6. What is one thing that you miss from your last job/team?
  7. Who on the team are you most worried about?
  8. If there is one thing I could change about your role today, what would it be?
  9. Who on another team do you most enjoy working with?
  10. What do you wonder about?
  11. How would you change the company’s strategy?
  12. What is the product or feature that we should build next?
  13. When do you feel the most satisfied at work?
  14. What is your least favorite part of your job?
  15. What percentage of your work time do you think you are in a flow state?
  16. What is the one meeting that you would add to your calendar?
  17. What is the one meeting that you would remove from your calendar?
  18. What do you wish that I did differently?
  19. What should I keep doing?
  20. When am I the most helpful?

If you don’t like any of these questions, you can create your own list in a few minutes. You are looking for a question that gets the other person to think a bit, share a bit more with you, and hopefully give you an avenue for deeper discussion.

Have other questions that you like to use? Please add them in the comments!

Building a vernacular with your engineering team

Teams consist of people. People communicate via a common language. The base unit of most languages is words.

The impact of language

Whether written or spoken, words are essential – both the general terms we use and those specific to our work.

The terms and phrases that are specific to our jobs or our companies create a vernacular. The definition of vernacular is ‘the mode of expression of a group or class.’ Our vernacular separates software developers from lawyers, Amazon employees from Microsoft employees, and your team from the other teams in your company.

The words and phrases that we use in team discussions give us a shorthand. They save time. Instead of saying, ‘Ok, deploy this to production, let the support team know that it is going live and then let marketing know once it has gone to 50% of active users.’ Your team may just say, ‘let’s deploy-ify it.’ The larger context is defined and understood in the vernacular of your team.

One of the challenges of joining a new company or a new team is learning the vernacular. One of the significant struggles of team forming or cross-team communication is different definitions for the same words.

Consider the word ‘Agile’. To you, it may mean ‘Scrum’ because your only experience working in Agile teams was working with the Scrum framework. For me, it may mean Kanban or a set of principles not tied to any specific framework. If we are on the same team and I say that I think we should work in an Agile way, we could have very different interpretations of what that means, which may inadvertently create tension in the team.

‘Done’ is another word that often leads to problems – both for a development lead and between teams. A developer on your team says that their feature is ‘done’. Do they mean that they finished the code? That they tested the code? That they deployed it to the staging environment? That the code is running in production? That the A/B tests for the code have completed?

Having clarity on the meaning of words is critical. Companies will often create glossaries of the terms and phrases in everyday use to help onboard new employees. You should do the same for your team for the words and phrases your team uses day-to-day. As a leader, you should also deliberately cultivate your team vernacular.

Creating a team vernacular

Creating a team vernacular is a simple way to drive team unity, identity, and alignment around best practices.

A simple way to start building a team vernacular is to use a group meeting to identify and define the words and phrases used in the team. You can get the discussion started by spending a few weeks taking note of words or phrases that come up often in team discussions. Terms such as done, tested, shipped, agile, stuck, autonomy, microservice, or waiting, may have different definitions from different people on a team.

Ask the team what they think each of these words means. If there is a general agreement, add it to your team glossary. Don’t stress over creating a perfect definition for each word. You can reference a dictionary definition or definitive blog post if you want, but your goal is team consensus around the meaning, nothing more.

Once the team establishes the primary vernacular, update it as necessary. Clarify the definitions of the new terms introduced. If someone uses a word in a new way, ask, ‘What does that word mean to you?’ If you don’t recognize a term that others are using, ask the team for the definition. Add these new words and phrases to your glossary. Over time, the meanings of words will change as they understand new subtleties or gain new skills. When this happens, append or replace the existing definitions.

Using your vernacular to train the team

Creating consistency in the words you use, or introducing new words, is also a valuable way to train your team or introduce new concepts.

You may find that there is debate within the team about the constraints for a system to be called a ‘microservice’. This debate is an opportunity to find blog posts or a book for the team to read together and discuss to create the definition for the team glossary.

If you want to understand secure coding practices better, you could watch a conference talk as a team and then discuss what words and techniques you could introduce into your vernacular.

As you build your glossary, include the phrase and what it means to your team and the references your team used to arrive at that meaning. Your dictionary can then become an onboarding tool, a training tool, and a reference to share with other groups.

Vernaculars happen

Groups of friends, co-workers, teams, and families all create unique vernaculars over time. The in-jokes you have with friends, the shorthand you have with your partner, the CEO’s catchphrase. Be aware of this, be deliberate about it within your team, and use this naturally occurring phenomenon to your advantage!

[posted originally at https://leaddev.com/communication-relationships/building-vernacular-your-engineering-team]

Fail Safe, Fail Smart, Succeed! Part Five: Putting it into Practice

Fail Safe, Fail Smart, Succeed!

Putting this into practice at Avvo

If you think you would like to use these ideas at your company, but you are unsure where to start, I can describe what we did at Avvo. I joined when the company was already nine years old. It had a mostly monolithic architecture running in a single data center with minimal redundancy.

There were some things that we did quickly to move to a more fail-safe world.

Moving from planning around objectives to planning around priorities

First, we worked to build a supportive culture that could handle the inevitable failures better. We moved from planning around specific deliverable commitments to organizing our work around priorities.

Suppose specific achievements, my output, measure my performance. This way of measuring performance often creates problems.

Suppose I need to coordinate with another person, and their commitments do not align to mine. That situation will create tension. If the company’s needs change, but my obligations do not, there is little incentive to reorient my work. To achieve my commitments, I can be thwarted by dependencies or hamper the priorities of the company.

People in leadership like quarterly goals or Managing By Objectives because they create strict accountability. If I commit to doing something and it is not complete when I say it will be, I have failed.

Suppose you think instead about aligning around priorities. In that case, those priorities may change from time to time. Still, if everyone is working against the same set of priorities, you can be sure that they are broadly doing the right things for the company. Aligning to priorities sets an expectation of outcome, not output.

Talk about failure with an eye to future improvement instead of blame

The senior leadership team must be aligned with these approaches. The rest of the organization may not be initially. When leaders talk about failure, they must do it with a learning message rather than blame or punishment. People should know that the expectation is that they may fail. If they are avoiding failure, then they probably aren’t thinking big enough. It is a message that “we want to see you fail, small, and we want to make sure we learn from that failure.”

I created our slack channel to share the lessons from our failures. I sent a message to my organization, making it clear that I don’t expect perfection. I shared my vision that we become a learning organization in town halls and one-on-ones.

Fail-safe architecture

Monoliths are natural when building a new company or when you have a small team. Monoliths are simple to make and more straightforward to deploy when you don’t have multiple teams building together. As the codebase and organization grow, microservices become a better model.

It is critical to recognize the point where a monolith is becoming a challenge instead of an enabler. Microservices require a lot more infrastructure to support them. The effort to transition from one architecture to another is significant, so it is best to prepare before the need becomes urgent.

Avvo had already started moving to a microservices architecture, but lack of investment stalled the transition. I increased investment in the infrastructure team. The team built tools that simplified the effort of creating, testing, monitoring, and deploying services. We then made rapid progress.

We also redesigned our organization to leverage the reverse Conway Maneuver, further accelerating the new architecture.

You can build a fail-safe / fail-smart team

In every company, I use the lessons that I have shared in this article to build a culture where teams can innovate and learn from their users. It manifests differently with each group, but every team that has adopted these ideas has improved both business outcomes and employee satisfaction. Work with your peers to adopt some of these ideas. Start small and grow. The process of adopting these concepts mirrors the product development process you are working to build.

If you decide that it isn’t a good fit for your company, you will have failed smart by failing small.

I will leave you with a final thought from Henry Ford.

Fail Safe, Fail Smart, Succeed!

Fail Safe, Fail Smart, Succeed! Part Four: My Biggest Failure

Fail Safe, Fail Smart, Succeed!

My Biggest Failure

If you are a long-time Spotify user, you probably won’t recognize the interface shown in the photo below. In May of 2015, though, Spotify was very interested in telling the whole world about it. It was a new set of features in the product called “Spotify Now.”

I lead the engineering effort at Spotify on the Spotify Now set of features. It was the most extensive concerted effort that Spotify had done at the time, involving hundreds of employees across the world.

Spotify Now was a set of features built around bringing the right music for you at any moment in time. The perfect, personalized music for every user for every moment of the day. This effort included adding video, podcasts, the Running feature, a massive collection of new editorial and machine learning generated playlists, and a brand new, simplified user interface for accessing music. It was audacious for a reason. We knew that Apple would launch its Apple Music streaming product soon. We wanted to make a public statement that we were the most innovative platform. Our goal was to take the wind out of Apple’s sails (and sales!)

Given that this was Spotify and many of the things I’ve shared come from Spotify, we understood how to fail smart.

As we launched the project, I reviewed the project retrospective repository. I wanted to see what had and had not worked in large projects before. I was now prepared to make all new mistakes instead of repeating ones from the past.

We had a tight timeline, but some of the features were already in development. I felt confident. However, as we moved forward and the new features started to take shape in the product’s employee releases, there was a growing concern. We worried the new features weren’t going to be as compelling as the vision we had for them. We knew that we, as employees, were not the target users for the features. We were not representative of our users. To truly understand how the functionality would perform, we wanted to follow our product development methods and get the features in front of users to validate our hypotheses.

Publicly releasing the features to a narrow audience was a challenge at that time. The press, also aware of Apple’s impending launch, was watching every Spotify release exceptionally closely. They knew that we tested features, and they were looking for hints of what we would do to counter Apple.

Our marketing team wanted a big launch. This release was a statement, so we wanted a massive spike in Spotify’s coverage extolling our innovation. The press response would be muted if our features leaked in advance of the event.

There was pressure from marketing not to test the features and pressure from product engineering to follow our standard processes. Eventually, we found a compromise. We released early versions of the Spotify Now features to a relatively small cohort of New Zealand users. Satisfied that we were now testing these features in the market, we went back to building Spotify Now and preparing for the launch while waiting for the test results to come back.

After a few weeks, we got fantastic news. For our cohort, retention was 6% higher than the rest of our customer base.

For a subscription-based product like Spotify, customer retention is the most critical metric. It determines the Lifetime Value of the customer. The longer you stay using a subscription product, the more money the company will make from you.

With a company of the scale of Spotify, it was tough to move a core metric like retention significantly. A whole point move was rare and something to celebrate. With Spotify Now, we had a 6% increase! It was massive.

Now, all of our doubt was gone. We knew we were working on something exceptional. We’d validated it in the market! With real people!

On the launch day, Daniel Ek, Spotify’s CEO and founder, Gustav Söderstrom, the Chief Product Officer, and Rochelle King, the head of Spotify’s design organization, shared a stage in New York with famous musicians and television personalities. They walked through everything we had built. It was a lovely event. I shared a stage in the company’s headquarters in Stockholm with Shiva Rajaraman and Dan Sormaz, my product and design peers. We watched the event with our team, celebrating.

As soon as the event concluded, we started the rollout of the new features by releasing them to 1% of our customers in our four largest markets. We’d begun our Ship It phase! We drank champagne and ate prinsesstÃ¥rta.

I couldn’t wait to see how the features were doing in the market. After so much work, I wanted to start the progressive roll out to 100%. Daily, I would stop by the desk of the data scientist who was watching the numbers. For the first couple of days, he would send me away with a comment of “it is too early still. We’re not even close to statistical significance.” Then one day, instead, he said, “It is still too early to be sure, but we’re starting to see the trend take shape, and it doesn’t look like it will be as high as we’d hoped.” Every day after, his expression became dourer. Finally, it was official. Instead of the 6% increase we’d seen in testing, the new features produced a 1% decrease in retention. It was a seven-point difference between what we had tested and what we had launched.

Not only were our new features not enticing customers to stay longer on our platform, but we were driving them away! To say that this was a problem was an understatement. It was a colossal failure.

Now we had a big quandary. We had failed big instead of small. We had released several things together, so it was challenging to narrow down the problem. Additionally, we’d just had a major press event where we talked about all these features. There was coverage all over the internet. The world was now waiting for what we had promised, but we would lose customers if we rolled them out further.

Those results began one of the most challenging summers of our lives. We had to narrow down what was killing our retention in these new features. We started generating hypotheses and running tests within our cohort to find what had gone wrong.

The challenge was that the cohort was too small to run tests quickly (and it was shrinking every day as we lost customers). Eventually, we had to do the math to figure out how much money the company would lose if we expanded the cohort so our tests would run faster. The cost was determined to be justified, and so we grew the cohort to 5% of users in our top four markets.

Gradually, we figured out what in Spotify Now was causing users to quit the product. We removed those features and were able to roll out to the rest of the world with a more modest retention gain.

In the many retrospectives that followed to understand what mistakes we’d made (and what we had done correctly), we found failures in our perceptions of our customers, failures in our teams, and other areas.

It turns out that one of our biggest problems was a process failure. We had a bug in our A/B testing framework. That bug meant that we had accidentally rolled out our test to a cohort participating in a very different trial. A trial to establish a floor on what having no advertising in the free product would do for retention.

To Spotify’s immense credit, rather than punish me, my peers, and the team, instead, we were rewarded for how we handled the failure. The lessons we learned from the mistakes of Spotify Now were immensely beneficial to the company. Those lessons produced some of the company’s triumphs in the years that have followed, including Spotify’s most popular curated playlists, Discover Weekly, Release Radar, Daily Mixes, and podcasts.

Part Five: Putting it into Practice

Fail Safe, Fail Smart, Succeed! Part Three: Making Failure Safer

Fail Safe, Fail Smart, Succeed!

Making Failure Safer

How do we reduce the fuel-air bomb failure into an internal combustion failure? How can we fail safely?

Minimizing the cost of failure

If you fail quickly, you are reducing the cost in time, equipment, and expenses. At Spotify, we had a framework, rooted in Lean Startup, that we used to reduce the cost of our failures. We named the framework “Think it, Build it, Ship it, Tweak it.

This graph shows investment into a feature over time through the different phases of the framework. Investment here signifies people’s time, material costs, equipment, opportunity cost, whichever.

Think It

Imagine this scenario: you are coming back from lunch with some people you work with, and you have an idea for a new feature. You discuss it with your product owner, and they like the idea. You decide to explore if it would be a useful feature for the product. You have now entered the “Think It” phase. During this phase, you may work with the Product Owner and potentially a designer. This phase represents a part-time effort by a small subset of the team–a small investment.

You might create some paper prototypes to test out the idea with the team and with customers. You may develop some lightweight code prototypes. You may even ship a very early version of the feature to some users. The goal is to test as quickly and cheaply as possible and gather some real data on the feature’s viability.

You build a hypothesis on how the feature can positively impact the product, tied to real product metrics. This hypothesis is what you will validate against at each stage of the framework.

If the early data shows that the feature isn’t needed or wanted by customers, your hypothesis is incorrect. You have two choices. You may iterate and try a different permutation of the concept, staying in the Think It phase and keeping the investment low. You may decide that it wasn’t as good an idea as you hoped and end the effort before investing further.

If you decide to end during the Think It phase, congratulations! You’ve saved the company time and money building something that wasn’t necessary. Collect the lessons in a retrospective and share them so that everyone else can learn.

Build It

The initial tests look promising. The hypothesis isn’t validated, but the indicators warrant further investment. You have some direction from your tests for the first version of the feature.

Now is the time to build the feature for real. The investment increases substantially as the rest of the team gets involved.

How can you reduce the cost of failure in the Build It phase? You don’t build the fully realized conception of the feature. You develop the smallest version that will validate your initial hypothesis, the MVP. Your goal is validation with the broader customer set.

The Build It phase is where many companies I speak to get stuck. If you have the complete product vision in your head, finding the minimal representation seems like a weak concept. Folks in love with their ideas have a hard time finding the core element that validates the whole. Suppose the initial data that comes back for the MVP puts the hypothesis into question. In that case, it is easier to question the validity of the MVP than to examine the hypothesis’s validity. This issue of MVP is usually the most significant source of contention in the process.

It takes practice to figure out how to formulate a good MVP, but the effort is worth it. Imagine if the Clippy team had been able to ship an MVP. Better early feedback could have saved many person-years and millions of dollars. In my career, I have spent years (literally) building a product without shipping it. Our team’s leadership shifted product directions several times without ever validating or invalidating any of their hypotheses in the market. We learned nothing about the product opportunity, but the development team learned a lot about refactoring and building modular code.

Even during the Build It phase, there are opportunities to test the hypothesis: early internal releases, beta tests, user tests, and limited A/B tests can all be used to provide direction and information.

Ship It

Your MVP is ready to release to your customers! The validation with the limited release pools and the user testing shows that your hypothesis may be valid–time to ship.

In many, if not most, companies shipping a software release is still a binary thing. No users have it, and now all users have it. This approach robs you of an opportunity to fail cheaply! Your testing in Think It and Build It may have shown validation for your hypothesis. It may have also provided incorrect information, or you may have misinterpreted it. On the technical side, whatever you have done to this point will not have validated that your software performs correctly at scale.

Instead of shipping instantly to one hundred percent of your users, do a progressive rollout. At Spotify, we had the benefit of a fairly massive scale. This scale allowed us to ship to 1%, 5%, 10%, 25%, 50%, and then 99% of our users (we usually held back 1% of our users as a control group for some time). We could do this rollout relatively quickly while maintaining statistical significance due to our size.

If you have a smaller user base, you can still do this with fewer steps and get much of the value.

At each stage of the rollout, we’d use the product analytics to see if we were validating our assumptions. Remember that we always tied the hypothesis back to product metrics. We’d also watch our systems to make sure that they were handling the load appropriately and didn’t have any other technical issues or bugs arising.

If the analytics showed that we weren’t improving the product, we had two decisions again. Should we iterate and try different permutations of the idea, or should we stop and remove the feature?

Usually, if we reached this point, we would iterate, keeping to the same percentage of users. If this feature MVP wasn’t adding to the product, it took away from it, so rolling out further would be a bad idea. This rollout process was another way to reduce the cost of failure. It reduced the percentage of users seeing a change that may negatively affect product metrics. Sometimes, iterating and testing with a subset of users would give us the necessary direction to move forward with a better version of the MVP. Occasionally, we would realize that the hypothesis was invalid. We would then remove the feature (which is just as hard to do as you imagine, but it was more comfortable with the data validating the decision).

If we removed the feature during the Ship It phase, we would have wasted time and money. We still would have wasted a lot less than if we’d released a lousy feature to our entire customer base.

Tweak It

The shaded area under this graph shows the investment to get a feature to customers. You earn nothing against the investment until the feature’s release to all your customers. Until that point, you are just spending. The Think It/Ship It/Build It/Tweak It framework aims to reduce that shaded area; to reduce the amount of investment before you start seeing a return.

You have now released the MVP for the feature to all your customers. The product metrics validate the hypothesis that it is improving the product. You are now ready for the next and final phase, Tweak It.

The MVP does not realize the full product vision, and the metrics may be positive but not to the level of your hypothesis. There is a lot more opportunity here!

The result of the Ship It phase represents a new baseline for the product and the feature. The real-world usage data, customer support, reviews, forums, and user research can now inform your next steps.

The Tweak It phase represents a series of smaller Think It/Build It/Ship It/Tweak It efforts. From now, your team iteratively improves the shipped version of the feature and establishes new, better baselines. These efforts will involve less and less of the team over time, and the investment will decrease correspondingly.

When iterating, occasionally, you reach a local maximum. Your tweaks will result in smaller and smaller improvements to the product. Once again, you have two choices: move on to the next feature or look for another substantial opportunity with the current feature.

The difficulty is recognizing that there may be a much bigger opportunity nearby. When you reach this decision point, it can be beneficial to try a big experiment. You may also choose to take a step back and look for an opportunity that might be orthogonal to the original vision but could provide a significant improvement.

You notice in the graph that the investment never reaches zero. This gap reveals the secret, hidden, fifth step of the framework.

Maintain It

Even if there is no active development on a feature, it doesn’t mean that there isn’t any investment into it. The feature still takes up space in the product. It consumes valuable real estate in the UI. Its code makes adding other features harder. Library or system updates break it. Users find bugs. Writers have to maintain documentation about the functionality.

The investment cost means that it is critical not to add features to a product that do not demonstrably improve it. There is no such thing as a zero-cost feature. Suppose new functionality adds nothing to the product in terms of incremental value to users. In that case, the company must invest in maintaining it. Features that bring slight improvements to core metrics may not be worth preserving, given the additional complexity they add.

Expect failure all the time

When you talk about failure in the context of software development from the year 2000 to now, there is a substantial difference. Back then, you worked hard to write robust software, but the hardware was expected to be reasonably reliable. When there was a hardware failure, the software’s fault tolerance was of incidental importance. You didn’t want to cause errors yourself, but if the platform was unstable, there wasn’t much you were expected to do about it.

Today we live in a world with public clouds and mobile platforms where the environment is entirely beyond our control. AWS taught us a lot about how to handle failure in systems. This blog post from Netflix about their move to AWS was pivotal to the industry’s adapting to the new world.

Netflix’s approach to system design has been so beneficial to the industry. We assume that everything can be on fire all the time. You could write perfect software, and the scheduler is going to come and kill it on mobile. AWS will kill your process, and your service will be moved from one pod to another with no warning. We now write our software expecting failure to happen at any time.

We’ve learned that writing big systems makes handling failure complicated, so micro-service architectures have become more prevalent. Why? Because they are significantly more fault-tolerant, and when they fail, they fail small. Products like Amazon, Netflix, or Spotify all have large numbers of services running. A customer doesn’t notice if one or more instances of the services fail. When a service fails in those environments, the service is responsible for a small part of the experience; the other systems assume that it can fail. There are things like caching to compensate for a system disappearing.

Netflix has its famous chaos monkey testing, which randomly kills services or even entire availability zones. These tests make sure that their systems fail well.

Having an architecture composed of smaller services that are assumed to fail means that there is near zero user impact when there is a problem. Failing well is critical for these services and their user experience.

Smaller services also make it possible to use progressive rollout, feature flags, dark loading, blue-green deploys, and canary instances, making it easier to build in a fail-safe way.

Part Four: My Biggest Failure