Becoming a CTO

A former co-worker reached out to me recently. They are a director of engineering at a midsize startup and just got their first headhunter inquiry for a CTO role. Having never been in the role before, they wanted to know what the position was like and how to prepare for the interviews.

I realized that while there are some books on technology leadership careers, there aren’t many resources explaining the most senior levels. My goal is to provide some insight and advice for those interested in someday becoming a CTO.

I’ve been a CTO for five and a half years

I’ve worked at a hundred-thousand-person company, seed-stage startups, and many of the variants in between. I started as a developer and followed a traditional path of moving up to more senior levels on the development track and then moving to lead, engineering manager, director, VP, and now chief technology officer. I’ve been the CTO at three different companies in two countries and three parts of the technology industry. I’m part of a few networks where I meet and talk with CTOs from companies of all sizes and stages.

I’ve learned that one reason there isn’t a good reference for the role of the CTO is that the size of the company and the expectations of the CEO define the job. Some of my role expectations and responsibilities are like those of many of my peers at similar-size companies. However, there are also significant differences in our expectations from our executive peers and boards.

Because of the variability of the role, I will broadly share my direct experiences, combined with my understanding of what is expected of other CTOs that I know.

The early-stage company CTO is often the developer-in-chief

At earlier stage companies, the CTO is often the technical co-founder. They are likely the developer who built many of the earlier versions of the software and helped hire the original development team. Their responsibilities are primarily technical: driving architecture, doing advanced development tasks, and creating technical vision.

Frequently, the first CTO of the company is hired for their ability to code and not their ability to grow or manage a team. Depending on the person, they may also lead the development team. Still, often the team’s management will eventually move to another person, an experienced manager, who may report to the CTO or be a peer to them.

The early-stage CTO is the leading technical voice for the company externally, especially if they are a co-founder. They talk to investors and potential partners and meet with potential vendors. If they also manage the development team, they will solely represent engineering in the senior leadership team. As a result, they will have responsibility for the decisions made by the engineering team. Nevertheless, if they do not manage the team directly, they might not be involved in the decisions around the day-to-day operations.

A mistake that inexperienced founding CTOs often make is that they don’t understand their role beyond coder-in-chief. They focus solely on the technology and are not active participants in the company’s leadership. As a result, they do not work cross-functionally. CTOs fixated on the how without the why or what will not be in the role very long once the company grows.

If they have no experience leading an engineering team or organization, the early-stage CTO will be challenged to grow with the company. If they cannot scale, eventually they will end up in a subordinate role reporting to a more experienced CTO hired to replace them.

The midsize company CTO is responsible for leading the organization, corporate strategy, and making technical decisions

Once a company reaches a size at which it needs new processes and structures, the scrappy leaders who helped get the company off the ground are often replaced with more experienced leaders knowledgeable in taking companies through the next growth stage. If the CTO hasn’t grown into the larger role, they will be part of that replaced group.

The midsize company CTO is a full-fledged executive team member working cross-functionally and meeting with partners, investors, and customers. Frequently, the midsize company CTO will also manage the engineering organization. The CTO is responsible for setting technical direction, making sure good architectural decisions are being made, and establishing best practices and working methods. They are still expected to have good technical depth, but don’t often actively contribute to shipping code. A red flag for me personally is seeing a CTO role description where the expectation is to lead a 50-plus-person organization while also actively coding on the product. It means the executive team does not have appropriate expectations for the role.

A midsize company CTO spends significant time establishing culture and practices for the teams they are responsible for; they are also very directly accountable for the organization’s decisions and its track record of delivery. The CTO meets internally with members of the other functions, such as sales, marketing, HR, and finance, to share direction for the organization and get feedback. The CTO is responsible for the administration of the teams, including the budget.

The CTO is also responsible for hiring, performance management, and team structure and may be very active in their teams’ recruitment and interview processes, especially in a scale-up type of company.

A CTO leading a more extensive development organization must be a generalist, understanding different roles and responsibilities. Their remit may include Corporate IT and Technical Support. In some companies, they may also manage the business analytics, security, product, and UX teams. A CTO who is too focused on the areas closest to their background or does not respect non-coding functions will not succeed.

As a midsize company CTO, you will often spend as much time with your peers and their teams as you spend with your own. As a result, you will need to learn about their functions and how your teams can work together. CTOs who “stay in their lane” will not be seen as an equal member of the senior leadership team and may lose their say in decisions that affect the organization.

It is very unusual for someone to move into a midsize company CTO role without having some experience leading a multilevel development organization and working with other business functions.

Growing (or moving) into the CTO role

If you are a manager or a manager of managers with the goal of being a CTO, there are a few things you can start to focus on that will help you on your path.

Learn about the business your company is in

Offer to sit in on sales calls and user research interviews. Try to understand the company’s financials when the CFO presents them. If you can’t, make a friend on the finance team and ask them to explain the numbers to you. Understand the KPIs not only for your team, but also for the teams around you.

Learn about the other functions

Get recommendations of reading or conference talks from your peers in the product, UX, and marketing teams. Think about how their work influences yours, and yours influences theirs.

Respect and learn other technology areas aside from your own

If you lead an area you don’t have personal experience in, approach the people in that function with respect and a genuine desire to understand their work. They want to help you know what they do and how they do it.

Hone your craft

Hopefully, you are already working on deepening your skill as an engineering manager or director, but are you trying to understand the bigger picture? Read other companies’ (public) handbooks, engineering blog posts, and conference presentations about their ways of working. What practices are interesting? Which can you try in your team? How do you think they will scale, or what issues do you think they may have?

Ask your CTO if there are tasks they can delegate to you

The best way to learn the job is to do the job. Even better is having someone who is already doing the job explain to you how they perform it so you can help them.

Start thinking in terms of strategy

The main difference between the expectations of line managers and senior managers is the emphasis on strategic thinking. Executives contribute to the company’s strategic planning and use their understanding of the company’s goals and the current situation to make sure that their teams are setting up the conditions for the company’s success. Strategic thinking is a learnable skill, but it takes practice.

The rewards of being a CTO

Being a CTO was not what I imagined it to be when I first decided it was my career goal. It is a lot of work, carries much stress, has fewer perks than you might think, and can be somewhat lonely. However, it is also the most personally rewarding job I have ever had. With the challenges, there is also incredible responsibility, tons to learn, the ability to influence the company’s direction, and the chance to affect the lives of dozens or hundreds of people on your team. I have yet to regret my choice to pursue this role.


Thanks to Laura Blackwell for editing assistance

When, why, and how to stop coding as your day job

By letting go of writing code, you open yourself up to excelling as a manager.

I am a computer programmer.

I was one of those people who started coding at a young age – in my case, on a TRS-80 Model 1 in my school’s library. I loved the feeling of teaching the computer to do something and then getting to enjoy the results of interacting with what I built. Since I didn’t own a computer, I would fill spiral-bound notebooks with programs that I would write at home. As soon as I could get time on the computer, I would type them in line by line. When I learned that I could write software as a job, I couldn’t imagine anything else that I would want to do.

After university, I got my dream job writing 3D graphics code. I was a software engineer! I defined a successful day by the amount of code I wrote, the compiler issues I resolved, and the bugs I closed. There were obvious, objective metrics that I could use to measure my work. Those metrics and my job defined me.

Today, I am a Chief Technology Officer, leading software development organizations. If I am writing code on the product, it is probably a bad thing. I now have to define my success by much fuzzier metrics: building good teams, hiring and training good people, setting multi-year technical strategy and vision for the company, collaborating with other departments, and setting and managing a budget. I may have a good day or a bad day, but I have to measure my success based on quarters or years.

My achievements are now always tied to the successes of others. Getting to this point wasn’t easy, but I wouldn’t have it any other way. It was a journey that took years, and the first challenge was understanding that coding was no longer my job.

Why is it hard to stop coding as our day-to-day work?

When I speak to engineering leads or managers working to grow into more senior engineering leadership levels, the question of ‘How much do you code?’ is very often raised. We usually have a hard time imagining that we can still be useful if we don’t code for a significant part of our time. Why is that?

We’ve been traditionally bad at hiring managers in the software engineering industry

Usually, companies choose development leads because they are the best, technically, on the team. I would guess the reasoning is that the best developers are assumed to be the right people to supervise their peers. This practice creates the impression that managing others is a promotion for a skilled developer when, in actuality, it is a career change away from what made them successful in the first place.

The worst managers I’ve had were very talented developers who hated having to spend time doing the boring stuff that wasn’t coding. They resented the time spent away from the keyboard and weren’t always good at hiding that fact.

Many companies now feature dual career tracks for technologists, giving them a choice to advance as an individual contributor or move into management. This choice of career is an excellent thing. It means that if you want to spend your days coding, you can do that without sacrificing your career. It also means that if you find joy in leading teams and growing others’ development and skills, you can do that.

We fear becoming ‘non-technical’

We joined the technology industry to be close to technology. We fear that by moving away from coding, we will morph into the classic ‘pointy-haired boss’ – ridiculed by the people on our team and unable to understand what the developers are discussing. I won’t say this can’t happen, but it won’t happen on its own. It will only happen if you choose to avoid technology once you move into the management role.

As you take on broader leadership responsibilities, you will need to learn and understand new technologies. Moving beyond the specifics of your expertise is necessary for you to move up in management. I have managed developers coding in at least a dozen languages on the backend, frontend, mobile, operating systems, and native applications. I have also managed testers, data scientists, data engineers, DevOps, Security, designers, data analysts, program managers, product managers, corporate IT teams, and some other roles I don’t even remember anymore. It isn’t possible to be an expert in all those fields. I need to take the lessons from my time as a developer and use them to inform my understanding, help me learn new areas, and give me empathy for the people who work for me.

It isn’t that you will become non-technical. It is that you will become less narrowly technical.

As new managers, we are often expected to continue coding

It is common to move from being a developer on a team to managing that team. As the new manager, this means you are still responsible for part of the codebase. Unless you immediately start leading a large group, your new role still requires that you spend a significant portion of your time coding. This expectation makes the transition to the new role more comfortable – but it can also be an anchor that holds you back from embracing your new role as your management responsibilities grow.

We still see ourselves as a resource that can ‘save’ a deliverable

As a manager, you are accountable for the results of your team. If the group is struggling to make a deadline, it might be tempting to jump into the weeds to try and help the team finish the project on time. While this is sometimes the right decision, it can also make the problems worse because the team loses the person who looks at the more significant issues and coordinates with other teams to get more help or prepare them for the delay.

Why do we need to stop coding eventually?

We don’t need to stop coding, ever. However, once you move into engineering leadership, it will need to become a smaller and smaller part of your job if you are working to lead larger teams or broaden the scope of your responsibilities.

I had led teams before I was a manager at Adobe, and I had always spent a significant part of my work week contributing code as part of the groups I was in. At Adobe, though, my team had grown to be fourteen people, with another four dotted-lined to me.

I had been the primary developer for a part of the project, and I took pride that I was still contributing important features to every release. However, my management responsibilities were starting to fill my work weeks. Between 1:1s, sync meetings with other teams, and other manager work, my feature development time was increasingly moving into my evenings and weekends. My features were often the last to be merged and usually late.

The company had two mandatory shut-down weeks. To work during this time, you needed the prior approval of a Vice President. The team was preparing for a release, and my features were still in the to-do column; I met with my VP to get his permission to work over the shut-down week. He asked me, “Who is the worst developer on your team?” I hemmed and hawed – I didn’t want to call out anyone on my team, and I hadn’t even really considered the question. Seeing my uncertainty, he answered for me. “You are! You’re always late with your features. The rest of the team is always waiting on you. If you were a developer instead of the manager, you would be on a performance improvement plan.” He was right. My insistence on coding was hurting the team, not helping it.

Taking on the lead role doesn’t mean you should stop coding immediately, but it does mean that your coding responsibilities should now be secondary to your leadership ones. There are other developers on your team, but there aren’t other leads. If you aren’t doing your lead job, no one else will. Similarly, your professional development’s primary focus should now be on your leadership skills, not your coding skills. You are moving into a new career, and if you don’t work to get better at it, you will find yourself stuck.

As your leadership responsibilities increase, you should transition your development responsibilities to other people on the team. This transition is good practice because delegation is an essential part of leadership.

How do you stay ‘technical’ when coding isn’t your job anymore?

As I said earlier, staying technical is a choice that you need to make. Hopefully, one of the primary reasons you chose to make a career in the technology industry was that you were interested in it, so this shouldn’t be a problem.

As I also said earlier, as you develop as a technology leader, your focus broadens as your scope widens. 

The best way that I have found to remain a credible technologist for my teams is to be interested in them and their work. To do this, talk to the people on your team and take a genuine interest in the things they are working on. If a technology comes up in a meeting or 1:1 that you don’t know, add it to a list of things to research later. Then, dedicate time in your week to go through that list and learn about the technologies well enough to have your own opinions about them. This practice allows you to have further discussions with whoever mentioned the technology to you.

If you get interested in what you learn about the new technology, you may want to keep trying to understand it better; you may read more or embark on a personal project using it to gain more practical knowledge. As I said, it isn’t that you have to stop coding, it is that, eventually, it shouldn’t be your day job anymore.

By taking an interest in the technologies your team uses in their work, you deepen your empathy for them and expand your own knowledge. You’ll be able to discuss the work, ask reasonable questions, and make connections to other things happening in the organization and your own experience. This way, the people on your team know that while you may not be able to step in for them, you understand their work and care about it.

Success is defined differently when you lead people

The feeling of accomplishment that comes from completing a cool user story, deploying a new service, or fixing a difficult bug is significant. It is a dopamine hit, and just like other dopamine-inducing behaviors, it can be hard to stop.

Having a great 1:1 or leading a productive team meeting can also feel good but in a more esoteric way. As a team leader, you need to learn to perceive the success of making others successful. Success takes longer, but the feeling is more profound and more rewarding.

Having a release resonate with your customers, being able to easily justify the promotion of a developer that you have mentored, and having someone accept a job offer for your team, are all fantastic feelings. In the day-to-day, watching stories get completed, helping resolve the issues when they aren’t, and seeing people get excited about the direction you’re setting for the team can leave you feeling satisfied at the end of the day.

Being a technical leader doesn’t mean writing code every day

As you grow in your new leadership career, you will need to devote your time to mentoring, developing, and leading your team. As you spend less time in your code editor, you will find new challenges in strategy, clearing roadblocks, fixing broken processes, and new tools like HR information systems, slides, and spreadsheets (it isn’t as bad as it sounds). You will spend less time learning all the intricacies of a specific language or toolchain and instead learn about how systems interact, understand when to build vs. buy, and learn about entirely new areas of technology. And you can still code, but make sure that you aren’t the developer holding your team back.

[This was originally posted at https://leaddev.com/skills-new-managers/when-why-and-how-stop-coding-your-day-job]

Fail Safe, Fail Smart, Succeed! Part Five: Putting it into Practice


Putting this into practice at Avvo

If you think you would like to use these ideas at your company, but you are unsure where to start, I can describe what we did at Avvo. I joined when the company was already nine years old. It had a mostly monolithic architecture running in a single data center with minimal redundancy.

There were some things that we did quickly to move to a more fail-safe world.

Moving from planning around objectives to planning around priorities

First, we worked to build a supportive culture that could handle the inevitable failures better. We moved from planning around specific deliverable commitments to organizing our work around priorities.

Suppose my performance is measured by specific achievements, that is, by my output. This way of measuring performance often creates problems.

Suppose I need to coordinate with another person, and their commitments do not align with mine. That situation will create tension. If the company’s needs change, but my obligations do not, there is little incentive to reorient my work. In trying to meet my commitments, I may be thwarted by dependencies or end up hampering the priorities of the company.

People in leadership like quarterly goals or Management by Objectives because they create strict accountability. If I commit to doing something and it is not complete when I say it will be, I have failed.

Suppose you think instead about aligning around priorities. In that case, those priorities may change from time to time. Still, if everyone is working against the same set of priorities, you can be sure that they are broadly doing the right things for the company. Aligning to priorities sets an expectation of outcome, not output.

Talk about failure with an eye to future improvement instead of blame

The senior leadership team must be aligned with these approaches. The rest of the organization may not be initially. When leaders talk about failure, they must do it with a learning message rather than blame or punishment. People should know that the expectation is that they may fail. If they are avoiding failure, then they probably aren’t thinking big enough. It is a message that “we want to see you fail, small, and we want to make sure we learn from that failure.”

I created a Slack channel for sharing the lessons from our failures. I sent a message to my organization, making it clear that I don’t expect perfection. In town halls and one-on-ones, I shared my vision that we become a learning organization.

Fail-safe architecture

Monoliths are natural when building a new company or when you have a small team. Monoliths are simple to make and more straightforward to deploy when you don’t have multiple teams building together. As the codebase and organization grow, microservices become a better model.

It is critical to recognize the point where a monolith is becoming a challenge instead of an enabler. Microservices require a lot more infrastructure to support them. The effort to transition from one architecture to another is significant, so it is best to prepare before the need becomes urgent.

Avvo had already started moving to a microservices architecture, but lack of investment stalled the transition. I increased investment in the infrastructure team. The team built tools that simplified the effort of creating, testing, monitoring, and deploying services. We then made rapid progress.

We also redesigned our organization to leverage the Inverse Conway Maneuver, further accelerating the move to the new architecture.

You can build a fail-safe / fail-smart team

In every company, I use the lessons that I have shared in this article to build a culture where teams can innovate and learn from their users. It manifests differently with each group, but every team that has adopted these ideas has improved both business outcomes and employee satisfaction. Work with your peers to adopt some of these ideas. Start small and grow. The process of adopting these concepts mirrors the product development process you are working to build.

If you decide that it isn’t a good fit for your company, you will have failed smart by failing small.

I will leave you with a final thought from Henry Ford.


Fail Safe, Fail Smart, Succeed! Part Four: My Biggest Failure


My Biggest Failure

If you are a long-time Spotify user, you probably won’t recognize the interface shown in the photo below. In May of 2015, though, Spotify was very interested in telling the whole world about it. It was a new set of features in the product called “Spotify Now.”

I led the engineering effort at Spotify on the Spotify Now set of features. It was the most extensive concerted effort that Spotify had undertaken at the time, involving hundreds of employees across the world.

Spotify Now was a set of features built around bringing the right music to you at any moment in time: the perfect, personalized music for every user for every moment of the day. This effort included adding video, podcasts, the Running feature, a massive collection of new editorial and machine-learning-generated playlists, and a brand new, simplified user interface for accessing music. It was audacious for a reason. We knew that Apple would launch its Apple Music streaming product soon. We wanted to make a public statement that we were the most innovative platform. Our goal was to take the wind out of Apple’s sails (and sales!).

Given that this was Spotify and many of the things I’ve shared come from Spotify, we understood how to fail smart.

As we launched the project, I reviewed the project retrospective repository. I wanted to see what had and had not worked in large projects before. I was now prepared to make all new mistakes instead of repeating ones from the past.

We had a tight timeline, but some of the features were already in development. I felt confident. However, as we moved forward and the new features started to take shape in the product’s employee releases, there was a growing concern. We worried the new features weren’t going to be as compelling as the vision we had for them. We knew that we, as employees, were not the target users for the features. We were not representative of our users. To truly understand how the functionality would perform, we wanted to follow our product development methods and get the features in front of users to validate our hypotheses.

Publicly releasing the features to a narrow audience was a challenge at that time. The press, also aware of Apple’s impending launch, was watching every Spotify release exceptionally closely. They knew that we tested features, and they were looking for hints of what we would do to counter Apple.

Our marketing team wanted a big launch. This release was a statement, so we wanted a massive spike in Spotify’s coverage extolling our innovation. The press response would be muted if our features leaked in advance of the event.

There was pressure from marketing not to test the features and pressure from product engineering to follow our standard processes. Eventually, we found a compromise. We released early versions of the Spotify Now features to a relatively small cohort of New Zealand users. Satisfied that we were now testing these features in the market, we went back to building Spotify Now and preparing for the launch while waiting for the test results to come back.

After a few weeks, we got fantastic news. For our cohort, retention was 6% higher than the rest of our customer base.

For a subscription-based product like Spotify, customer retention is the most critical metric. It determines the Lifetime Value of the customer. The longer you stay using a subscription product, the more money the company will make from you.
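The link between retention and Lifetime Value can be made concrete with a little arithmetic. The sketch below is a deliberately simple model, not how Spotify computed anything: it assumes a constant monthly retention rate and a flat monthly revenue per customer, and every number in the usage note is hypothetical.

```python
def expected_lifetime_months(monthly_retention: float) -> float:
    """Expected number of paid months under a constant retention rate.

    Assumption: a customer pays for month one, then stays for each further
    month with probability `monthly_retention`. The expected lifetime is the
    geometric series 1 + r + r^2 + ... = 1 / (1 - r).
    """
    if not 0.0 <= monthly_retention < 1.0:
        raise ValueError("monthly_retention must be in [0, 1)")
    return 1.0 / (1.0 - monthly_retention)


def lifetime_value(monthly_revenue: float, monthly_retention: float) -> float:
    """Lifetime Value as monthly revenue times expected lifetime in months."""
    return monthly_revenue * expected_lifetime_months(monthly_retention)
```

In this model, raising monthly retention from 0.90 to 0.95 roughly doubles the expected lifetime (from about 10 months to about 20), which is why even a small move in retention matters so much for a subscription business.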

At a company of Spotify’s scale, it was tough to move a core metric like retention significantly. A whole-point move was rare and something to celebrate. With Spotify Now, we had a 6% increase! It was massive.

Now, all of our doubt was gone. We knew we were working on something exceptional. We’d validated it in the market! With real people!

On the launch day, Daniel Ek, Spotify’s CEO and founder, Gustav Söderström, the Chief Product Officer, and Rochelle King, the head of Spotify’s design organization, shared a stage in New York with famous musicians and television personalities. They walked through everything we had built. It was a lovely event. I shared a stage in the company’s headquarters in Stockholm with Shiva Rajaraman and Dan Sormaz, my product and design peers. We watched the event with our team, celebrating.

As soon as the event concluded, we started the rollout of the new features by releasing them to 1% of our customers in our four largest markets. We’d begun our Ship It phase! We drank champagne and ate prinsesstårta.

I couldn’t wait to see how the features were doing in the market. After so much work, I wanted to start the progressive rollout to 100%. Daily, I would stop by the desk of the data scientist who was watching the numbers. For the first couple of days, he would send me away with a comment of “it is too early still. We’re not even close to statistical significance.” Then one day, instead, he said, “It is still too early to be sure, but we’re starting to see the trend take shape, and it doesn’t look like it will be as high as we’d hoped.” Every day after, his expression became more dour. Finally, it was official. Instead of the 6% increase we’d seen in testing, the new features produced a 1% decrease in retention. It was a seven-point difference between what we had tested and what we had launched.
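To illustrate why “it is too early still” is a reasonable answer in the first days of a rollout, here is a minimal sketch of a two-proportion z-test on retention. This is a textbook test chosen for illustration, not a description of the analysis Spotify’s data scientists ran; the function name and all counts are hypothetical.

```python
import math


def two_proportion_z_test(retained_a: int, n_a: int, retained_b: int, n_b: int):
    """Compare retention in control (a) and treatment (b).

    Returns (z, two_sided_p_value) using the pooled-proportion z-test.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)
    if variance == 0.0:
        raise ValueError("retention is 0% or 100% in both groups; test undefined")
    z = (p_b - p_a) / math.sqrt(variance)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With hypothetical retention of 80% in control and 79% in treatment, the same one-point gap is indistinguishable from noise at 1,000 users per group but becomes clear at 100,000 per group. A small cohort needs far more time to reach a conclusion, which is the same pressure that later pushed us to expand the cohort.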

Not only were our new features not enticing customers to stay longer on our platform, but we were driving them away! To say that this was a problem was an understatement. It was a colossal failure.

Now we had a big quandary. We had failed big instead of small. We had released several things together, so it was challenging to narrow down the problem. Additionally, we’d just had a major press event where we talked about all these features. There was coverage all over the internet. The world was now waiting for what we had promised, but we would lose customers if we rolled them out further.

Those results began one of the most challenging summers of our lives. We had to narrow down what was killing our retention in these new features. We started generating hypotheses and running tests within our cohort to find what had gone wrong.

The challenge was that the cohort was too small to run tests quickly (and it was shrinking every day as we lost customers). Eventually, we had to do the math to figure out how much money the company would lose if we expanded the cohort so our tests would run faster. The cost was determined to be justified, and so we grew the cohort to 5% of users in our top four markets.

Gradually, we figured out what in Spotify Now was causing users to quit the product. We removed those features and were able to roll out to the rest of the world with a more modest retention gain.

In the many retrospectives that followed to understand what mistakes we’d made (and what we had done correctly), we found failures in our perceptions of our customers, failures in our teams, and other areas.

It turns out that one of our biggest problems was a process failure. We had a bug in our A/B testing framework. That bug meant that we had accidentally rolled out our test to a cohort participating in a very different trial: one meant to establish a floor on what having no advertising in the free product would do for retention.
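A guard against this class of mistake can be small. The following Python sketch is not how Spotify’s A/B testing framework worked; it only illustrates the idea of checking a new test cohort against the cohorts of experiments that are already running. The function name and the experiment names in the usage note are hypothetical.

```python
from typing import Dict, Set


def find_cohort_conflicts(
    new_cohort: Set[str],
    active_experiments: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per active experiment, the users who would be in both cohorts."""
    conflicts: Dict[str, Set[str]] = {}
    for name, cohort in active_experiments.items():
        overlap = new_cohort & cohort  # users enrolled in both experiments
        if overlap:
            conflicts[name] = overlap
    return conflicts
```

If the call returns anything other than an empty dictionary, for example for a hypothetical `"ad-free-floor"` experiment, the new test should not start until the overlap is resolved, because results from the shared users cannot be attributed to either change.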

To Spotify’s immense credit, rather than punishing me, my peers, and the team, the company rewarded us for how we handled the failure. The lessons we learned from the mistakes of Spotify Now were immensely beneficial to the company. Those lessons produced some of the company’s triumphs in the years that have followed, including Spotify’s most popular curated playlists, Discover Weekly, Release Radar, Daily Mixes, and podcasts.

Fail Safe, Fail Smart, Succeed! Part Three: Making Failure Safer

Making Failure Safer

How do we reduce the fuel-air bomb failure into an internal combustion failure? How can we fail safely?

Minimizing the cost of failure

If you fail quickly, you are reducing the cost in time, equipment, and expenses. At Spotify, we had a framework, rooted in Lean Startup, that we used to reduce the cost of our failures. We named the framework “Think it, Build it, Ship it, Tweak it.”

This graph shows investment into a feature over time through the different phases of the framework. Investment here signifies people’s time, material costs, equipment, opportunity cost, or whatever else applies.

Think It

Imagine this scenario: you are coming back from lunch with some people you work with, and you have an idea for a new feature. You discuss it with your product owner, and they like the idea. You decide to explore if it would be a useful feature for the product. You have now entered the “Think It” phase. During this phase, you may work with the Product Owner and potentially a designer. This phase represents a part-time effort by a small subset of the team–a small investment.

You might create some paper prototypes to test out the idea with the team and with customers. You may develop some lightweight code prototypes. You may even ship a very early version of the feature to some users. The goal is to test as quickly and cheaply as possible and gather some real data on the feature’s viability.

You build a hypothesis on how the feature can positively impact the product, tied to real product metrics. This hypothesis is what you will validate against at each stage of the framework.

If the early data shows that the feature isn’t needed or wanted by customers, your hypothesis is incorrect. You have two choices. You may iterate and try a different permutation of the concept, staying in the Think It phase and keeping the investment low. Or you may decide that it wasn’t as good an idea as you hoped and end the effort before investing further.

If you decide to end during the Think It phase, congratulations! You’ve saved the company time and money building something that wasn’t necessary. Collect the lessons in a retrospective and share them so that everyone else can learn.

Build It

The initial tests look promising. The hypothesis isn’t validated, but the indicators warrant further investment. You have some direction from your tests for the first version of the feature.

Now is the time to build the feature for real. The investment increases substantially as the rest of the team gets involved.

How can you reduce the cost of failure in the Build It phase? You don’t build the fully realized conception of the feature. You develop the smallest version that will validate your initial hypothesis, the MVP. Your goal is validation with the broader customer set.

The Build It phase is where many companies I speak to get stuck. If you have the complete product vision in your head, finding the minimal representation seems like a weak concept. Folks in love with their ideas have a hard time finding the core element that validates the whole. If the initial data that comes back for the MVP puts the hypothesis into question, it is easier to question the validity of the MVP than to examine the hypothesis’s validity. This question of what belongs in the MVP is usually the most significant source of contention in the process.

It takes practice to figure out how to formulate a good MVP, but the effort is worth it. Imagine if the Clippy team had been able to ship an MVP. Better early feedback could have saved many person-years and millions of dollars. In my career, I have spent years (literally) building a product without shipping it. Our team’s leadership shifted product directions several times without ever validating or invalidating any of their hypotheses in the market. We learned nothing about the product opportunity, but the development team learned a lot about refactoring and building modular code.

Even during the Build It phase, there are opportunities to test the hypothesis: early internal releases, beta tests, user tests, and limited A/B tests can all be used to provide direction and information.

Ship It

Your MVP is ready to release to your customers! The validation with the limited release pools and the user testing shows that your hypothesis may be valid–time to ship.

In many, if not most, companies shipping a software release is still a binary thing. No users have it, and now all users have it. This approach robs you of an opportunity to fail cheaply! Your testing in Think It and Build It may have shown validation for your hypothesis. It may have also provided incorrect information, or you may have misinterpreted it. On the technical side, whatever you have done to this point will not have validated that your software performs correctly at scale.

Instead of shipping instantly to one hundred percent of your users, do a progressive rollout. At Spotify, we had the benefit of a fairly massive scale. This scale allowed us to ship to 1%, 5%, 10%, 25%, 50%, and then 99% of our users (we usually held back 1% of our users as a control group for some time). We could do this rollout relatively quickly while maintaining statistical significance due to our size.

If you have a smaller user base, you can still do this with fewer steps and get much of the value.
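One common way to implement a progressive rollout is to hash each user into a stable bucket and enable the feature for buckets below the current percentage. The Python sketch below shows that general technique under my own assumptions. It is not Spotify’s implementation, and the function names, the bucket count, and the feature name in the usage note are hypothetical. The stage percentages and the 1% holdback come from the description above.

```python
import hashlib

# Rollout steps quoted in the text, in percent of users.
ROLLOUT_STAGES = [1, 5, 10, 25, 50, 99]
HOLDBACK_PERCENT = 1  # users kept without the feature as a control group
BUCKETS = 10_000      # hypothetical granularity: each bucket is 0.01% of users


def bucket_for(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def in_holdback(user_id: str, feature: str) -> bool:
    """The highest buckets are reserved as the control group."""
    return bucket_for(user_id, feature) >= BUCKETS - HOLDBACK_PERCENT * BUCKETS // 100


def has_feature(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if the user sees the feature at the current rollout percentage."""
    if in_holdback(user_id, feature):
        return False
    return bucket_for(user_id, feature) < rollout_percent * BUCKETS // 100
```

Because a user’s bucket never changes, someone who gets the feature at 1% (for example, `has_feature("user-42", "new-home", 1)`) keeps it at 5% and beyond, and the holdback users never receive it. Including the feature name in the hash means different features get different, independent groups of early users.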

At each stage of the rollout, we’d use the product analytics to see if we were validating our assumptions. Remember that we always tied the hypothesis back to product metrics. We’d also watch our systems to make sure that they were handling the load appropriately and didn’t have any other technical issues or bugs arising.

If the analytics showed that we weren’t improving the product, we had two decisions again. Should we iterate and try different permutations of the idea, or should we stop and remove the feature?

Usually, if we reached this point, we would iterate, keeping to the same percentage of users. If this feature MVP wasn’t adding to the product, it took away from it, so rolling out further would be a bad idea. This rollout process was another way to reduce the cost of failure. It reduced the percentage of users seeing a change that may negatively affect product metrics. Sometimes, iterating and testing with a subset of users would give us the necessary direction to move forward with a better version of the MVP. Occasionally, we would realize that the hypothesis was invalid. We would then remove the feature (which is just as hard to do as you imagine, but it was more comfortable with the data validating the decision).
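The decision at each rollout stage can be sketched as a comparison between the users who have the feature and the control group. This Python sketch uses a standard two-proportion z-test as a stand-in for real product analytics; the function names, the 0.05 threshold, and the decision labels are my assumptions, not a description of the tooling we used.

```python
from math import sqrt
from statistics import NormalDist


def compare_retention(control_retained, control_total,
                      treatment_retained, treatment_total):
    """Return (absolute lift, two-sided p-value) for treatment vs. control."""
    p_control = control_retained / control_total
    p_treatment = treatment_retained / treatment_total
    pooled = (control_retained + treatment_retained) / (control_total + treatment_total)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / treatment_total))
    lift = p_treatment - p_control
    if std_err == 0:
        return lift, 1.0  # no variation at all, so nothing can be concluded
    z = lift / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, p_value


def stage_decision(control_retained, control_total,
                   treatment_retained, treatment_total, alpha=0.05):
    """Suggest the next step for the current rollout stage."""
    lift, p_value = compare_retention(
        control_retained, control_total, treatment_retained, treatment_total
    )
    if p_value >= alpha:
        return "keep collecting data"
    return "expand rollout" if lift > 0 else "hold at this percentage and iterate"
```

A significant negative result maps to the choice described above: stay at the same percentage of users and iterate, or remove the feature, rather than expose more customers to a change that hurts the metric.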

If we removed the feature during the Ship It phase, we would have wasted time and money. We still would have wasted a lot less than if we’d released a lousy feature to our entire customer base.

Tweak It

The shaded area under this graph shows the investment to get a feature to customers. You earn nothing against the investment until the feature’s release to all your customers. Until that point, you are just spending. The Think It/Build It/Ship It/Tweak It framework aims to reduce that shaded area: to reduce the amount of investment before you start seeing a return.

You have now released the MVP for the feature to all your customers. The product metrics validate the hypothesis that it is improving the product. You are now ready for the next and final phase, Tweak It.

The MVP does not realize the full product vision, and the metrics may be positive but not to the level of your hypothesis. There is a lot more opportunity here!

The result of the Ship It phase represents a new baseline for the product and the feature. The real-world usage data, customer support, reviews, forums, and user research can now inform your next steps.

The Tweak It phase represents a series of smaller Think It/Build It/Ship It/Tweak It efforts. From now, your team iteratively improves the shipped version of the feature and establishes new, better baselines. These efforts will involve less and less of the team over time, and the investment will decrease correspondingly.

When iterating, occasionally, you reach a local maximum. Your tweaks will result in smaller and smaller improvements to the product. Once again, you have two choices: move on to the next feature or look for another substantial opportunity with the current feature.

The difficulty is recognizing that there may be a much bigger opportunity nearby. When you reach this decision point, it can be beneficial to try a big experiment. You may also choose to take a step back and look for an opportunity that might be orthogonal to the original vision but could provide a significant improvement.

You notice in the graph that the investment never reaches zero. This gap reveals the secret, hidden, fifth step of the framework.

Maintain It

Even if there is no active development on a feature, it doesn’t mean that there isn’t any investment into it. The feature still takes up space in the product. It consumes valuable real estate in the UI. Its code makes adding other features harder. Library or system updates break it. Users find bugs. Writers have to maintain documentation about the functionality.

The investment cost means that it is critical not to add features to a product that do not demonstrably improve it. There is no such thing as a zero-cost feature. Even if new functionality adds nothing to the product in terms of incremental value to users, the company must still invest in maintaining it. Features that bring slight improvements to core metrics may not be worth preserving, given the additional complexity they add.

Expect failure all the time

When you talk about failure in the context of software development from the year 2000 to now, there is a substantial difference. Back then, you worked hard to write robust software, but the hardware was expected to be reasonably reliable. When there was a hardware failure, the software’s fault tolerance was of incidental importance. You didn’t want to cause errors yourself, but if the platform was unstable, there wasn’t much you were expected to do about it.

Today we live in a world with public clouds and mobile platforms where the environment is entirely beyond our control. AWS taught us a lot about how to handle failure in systems. This blog post from Netflix about their move to AWS was pivotal to the industry’s adapting to the new world.

Netflix’s approach to system design has been so beneficial to the industry. We assume that everything can be on fire all the time. You could write perfect software, and the scheduler is going to come and kill it on mobile. AWS will kill your process, and your service will be moved from one pod to another with no warning. We now write our software expecting failure to happen at any time.

We’ve learned that writing big systems makes handling failure complicated, so micro-service architectures have become more prevalent. Why? Because they are significantly more fault-tolerant, and when they fail, they fail small. Products like Amazon, Netflix, or Spotify all have large numbers of services running. A customer doesn’t notice if one or more instances of the services fail. When a service fails in those environments, the service is responsible for a small part of the experience; the other systems assume that it can fail. There are things like caching to compensate for a system disappearing.
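As a small illustration of compensating for a system that disappears, a caller can remember the last good response from a service and fall back to it when the service fails. This is a minimal Python sketch of that general pattern, not code from any of the companies mentioned; the class name, the staleness limit, and the default value are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CachedFallback:
    """Wrap a call to another service and serve the last good value if it fails."""

    def __init__(self, fetch: Callable[[str], str], max_stale_seconds: float = 300.0):
        self._fetch = fetch
        self._max_stale = max_stale_seconds
        self._cache: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str, default: str = "") -> str:
        try:
            value = self._fetch(key)
        except Exception:
            # The dependency failed: prefer a recent cached value, else a default.
            cached = self._cache.get(key)
            if cached is not None and time.monotonic() - cached[0] <= self._max_stale:
                return cached[1]
            return default
        self._cache[key] = (time.monotonic(), value)
        return value
```

The caller assumes the dependency can fail at any time and decides in advance what a degraded but acceptable answer looks like, so the failure stays small.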

Netflix has its famous chaos monkey testing, which randomly kills services or even entire availability zones. These tests make sure that their systems fail well.

Having an architecture composed of smaller services that are assumed to fail means that there is near zero user impact when there is a problem. Failing well is critical for these services and their user experience.

Smaller services also make it possible to use progressive rollout, feature flags, dark loading, blue-green deploys, and canary instances, making it easier to build in a fail-safe way.

Fail Safe, Fail Smart, Succeed! Part Two: Building a fail-safe culture

Building a fail-safe culture

If innovation requires failure, then how your culture handles the inevitable failures is key to building an innovative product or company. That is what creating a fail-safe environment means.

Many companies still punish projects or features that do not succeed. The same companies then wonder why their employees are so risk-averse. Punishing failure can take many forms, both obvious and subtle. Punishment can mean firing the team or leader who created an unsuccessful release or project. Sanctions are often more subtle:

  • Moving resources away from innovative efforts that don’t yield immediate successes.
  • Allowing people to ridicule failed efforts.
  • Continuing to invest in the slow, steady, growth projects instead of the more innovative but risky efforts. The innovator’s dilemma is just the most well-known aspect of this.

Breeding innovation out

I spent several years working at a company whose leadership was constantly exhorting the employees to be more innovative and take more risks. It created ever-new processes to encourage new products to come from within the organization. It was also a company that had always grown through acquisition. Every year, it would acquire new companies. At the start of the next year’s budget process, there would inevitably be the realization that the company had now grown too large. Nearly every year, there would be a layoff.

If you are a senior leader and need to trim ten percent of your organization, where would you look? In previous years, you likely had already eliminated your lowest performers. Should you reduce the funding of the products that bring in your revenue or kill the new products that are struggling to make their first profit? The answer is clear if your bonus and salary are dependent on hitting revenue targets.

Through its culture, the company communicated that taking risks was detrimental to a career. So the company lost its most entrepreneurial employees through either voluntary or involuntary attrition. Because it could not innovate within, innovation could only happen through acquisitions, perpetuating the cycle.

If failure is punished, and failure is necessary for innovation, then punishing failure, either overtly or subtly, means that you are disincentivizing innovation.

Don’t punish failure. Punish not learning from failure. Punish failing big when you could have failed small first. Better yet, don’t punish at all. Reward the failures that produce essential lessons for the company and that the team handles well. Reward risk-taking if you want to encourage innovation.

If you worry about employees taking risks without accountability, give them participation in the revenue that they bring in.

Each failure allows you to learn many things. Take the time to learn those lessons.

Learning from failure

It can be hard to learn the lessons from failure. When you fail, your instinct is to move on, to sweep it under the rug. You don’t want to wallow in your mistakes. However, if you move on too quickly, you miss the chance to gather all the lessons, which will lead to more failure instead of the success you’re seeking.

Lessons from failure: Your process

Sometimes the failure was in your process. The following exchange is fictional, but I’ve heard something very much like it more than once in my career.

“What happened with this release? Customers are complaining that it is incredibly buggy.”

“Well, the test team was working on a different project, so they jumped into this one late. We didn’t want to delay the release, so we cut the time for testing short and didn’t catch those issues. We had test automation, and it caught the issue, but there have been a lot of false positives, so no one was watching the results.”

“Did we do a beta test for this release? An employee release?”

“No.”

The above conversation indicates a problem with the software development process (and, for this specific example, a bit of a culture-of-quality problem). If you’ve ever had an exchange like the one above, what did you do to solve the underlying issues? If the answer is “not much,” you didn’t learn enough from the failure, and you likely had similar problems afterward.

Lessons from failure: your team

Sometimes your team is a significant factor in a failure. I don’t mean that the members of the group aren’t good at their jobs. Your team may be missing a skillset or have personality conflicts. Trust may be an issue within the team, and so people aren’t open with each other.

“The app is performing incredibly slowly. What is going on?”

“Well, we inherited this component that uses this data store, and no one on the team understands it. We’re learning it as we’re doing it, and it has become a performance problem.”

If the above exchange happened in your team, you might make sure that the next time you decide to use (or inherit) a technology, someone on the team knows it well, even if that means adding someone to the team.

Lessons from failure: your perception of your customers

A vein of failure, and a significant one in the lesson of Clippy, is having an incorrect mental model for your customer.

We all have myths about who our customers are. Why do I call them “myths”? The reason is that you can’t precisely read the minds of every one of your customers. At the beginning of a product’s life cycle, you may know each of your customers well when there are few of them. That condition, hopefully, will not last very long.

How do you build a model of your user? You do user research, talk to your customer service team, beta test, and read app reviews and tweets about your product. You read your product forums. You instrument your app and analyze user behavior.

We have many different ways of interacting with the subsets of our customers. Those interactions give us the feeling that we know what they want or who they are.

These interactions provide insights into your customers as an aggregate. They also fuel myths of who our customers are because they are a sampling of the whole. We can’t know all our customers, so we create personas in our minds or collectively for our team.

Suppose you have a great user research team, and you are very rigorous in your effort to understand your customers. You may be able to have in-depth knowledge about your users and their needs for your product. However, that knowledge and understanding will only be for a moment in time. Your product continues to evolve and change and hopefully add new users often. Your new customers come to your product because of the unique problems they can solve. Those problems are different from those of the existing users. Your perception of your customers ages quickly. You are now building for who they were, not who they are.

Lessons from failure: your understanding of your product

You may think you understand your product; after all, you are the one who is building it! However, the product that your customers are using may be different from the product you are making.

You build your product to solve a problem. In your effort to solve that problem, you may also solve other problems for your customers that you didn’t anticipate. Your customers are delighted that they can solve this problem with your product. In their minds, this was a deliberate choice on your part.

Now you make a change that improves the original problem’s solution but breaks the unintended use case. Your customers are angry because you ruined their product!

Lessons from failure: yourself

Failure gives you a chance to learn more about yourself. Is there something you could do differently next time? Was there an external factor that is obvious in hindsight but could have been caught earlier if you approached things differently?

Our own failures tend to be the hardest to dwell on. Our natural inclination is to find fault externally to console ourselves. It is worth taking some time to reflect on your performance. You will always find something that you can do that will help you the next time.

Collecting the lessons: Project Retrospectives

The best way that I have learned to extract the lessons is to do a project retrospective.

A project retrospective aims to understand what happened in the project from its inception to its conclusion. You are looking to understand each critical decision, what informed the decision, and its outcome.

In a project retrospective, you are looking for the things that went wrong, the things that went well, and the things that went well, but you could do better the next time. The output of the retrospective is neutral. It is not for establishing blame or awarding kudos. It exists to make sure you learn. For this reason, it is useful for both unsuccessful and highly successful projects.

A good practice for creating a great culture around failure is to make it the general custom to have a retrospective at the end of every project in your company. Having retrospectives only for the unsuccessful projects perpetuates a blame culture.

For an example of a project retrospective process, see this post from Henrik Kniberg.

The project retrospective repository

Since the project retrospectives are blameless, it is good to share them within your company. Create a project retrospective repository and publicize it.

The repository becomes a precious resource for everyone in your company. It shows what has worked and what has been challenging in your environment. It allows your teams to avoid making the mistakes of the past. We always want to be making new mistakes, not old ones!

The repository is also handy for new employees to teach them about how projects work in your company. Finally, it is also a resource for documenting product decisions.

The retrospective repository is a valuable place to capture the history of your products and your process.

Spotify’s failure-safe culture

I learned a lot about creating a failure-safe culture when I worked at Spotify. Some of the great examples of this culture were:

One of the squads created a “Fail Wall” to capture the things they were learning. The squad didn’t hide the wall. It was on a whiteboard facing the hallway where everyone could see it.

This document is a report from one of the project retrospectives. You don’t need any special software for the record. For us, it was just a collection of Google docs in a shared folder.

One of the agile coaches created a Slack channel for teams to share the lessons learned from failures with the whole company.

Spotify’s CTO posted an article encouraging everyone to celebrate the lessons that they learned from failure, which inspired other posts like this:

If you look at the Spotify engineering blog, there are probably more posts about mistakes that we made than cool things we did in the years I worked there (2013-2016).

These kinds of posts are also valuable to the community. Often, when you are searching for something, it is because you are having a problem. We might have had the same issue. These posts are also very public expressions of the company culture.

Failure as a competitive advantage

We’re all going to fail. If my company can fail smart and fast, learning from our mistakes, while your company ignores the lessons from failure, my company will have a competitive advantage.

Fail Safe, Fail Smart, Succeed! Part One: Why Focus on Failure?

This article is about failure and everything I’ve learned from 28 years of failing (and succeeding) in the technology industry. Its basis is my talk of the same name that I first gave in 2015.

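I’ve broken it into five parts to make it easier to read and share:

  • Part One: Why Focus on Failure?
  • Part Two: Building a Fail-Safe Culture
  • Part Three: Making Failure Safer
  • Part Four: My Biggest Failure
  • Part Five: Putting it into Practice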

The importance of failure in software development

How we approach failure is critical in any industry, but it is especially crucial in building software.

Why?

The answer is simple: invention requires failure.

We don’t acknowledge that fact enough as an industry. Not broadly. It is something we should recognize and understand more. As technologists, we are continually looking for ways to transform existing businesses or build new products. We are an industry that grows on innovation and invention.

Real innovation is creating something uniquely new. If you can create something genuinely novel without failing a few times along the way, it probably isn’t very innovative. Albert Einstein expressed this as “Anyone who has never made a mistake has never tried anything new.”

In his own words, Thomas Edison says that he created three thousand different theories before he found the right materials for his electric light. To invent his battery, the laboratory performed over ten thousand experiments.

Filmmaker Kevin Smith says, “failure is success training.” I like that sentiment. It frames failure as leading to success.

Failure teaches you the things you need to know to succeed. Stated more strongly: failure is a requirement for success.

Creating a fail-safe environment

To achieve success, what’s important isn’t how to avoid failure; it’s how to handle failure when it comes. The handling of failure makes the difference between eventual success and never succeeding. Creating conditions conducive to learning from failure means creating a fail-safe environment.

In the software industry, we define a fail-safe environment as setting up processes to avoid failure. Instead, we should ensure that when the inevitable failure happens, we handle it well and reduce its impact. We want to fail smart.

When I was at Spotify, a company that worked hard to create a fail-smart environment, we described this as “minimizing the blast radius.” This quote from Mikael Krantz, the head architect at Spotify during that time, sums up the idea nicely: “we want to be an internal combustion engine, not a fuel-air bomb. Many small, controlled explosions, propelling us in a generally ok direction, not a huge blast leveling half the city.”

So, let us plan for failure. Let’s embrace the mistakes that are going to come in the smartest way possible. We can use those failures to move us forward and make sure that they are small enough not to take out the company. I like the combustion engine analogy because it embraces that failure, well-handled, pushes us in the right direction. If we anticipate, we can course correct and continue to move forward.

One way you can create these small, controlled explosions is to fail fast. Find the fastest, most straightforward path to learning. Can you validate your idea quickly? Can you reduce the concept down so that you can get it in front of real people immediately and get feedback before investing in a bunch of work? Failing fast is one of the critical elements of the Lean Startup methodology.

A side benefit of small failures is that they are easier to understand. You can identify what happened and learn from it. With a big failure, you must unpack and dig in to know where things went wrong.

The Lesson of Clippy

Even if you’ve never used the Office Assistant feature of Microsoft Office, you are likely aware of it. It was a software product flop so massive that it became a part of pop culture.

I worked at Microsoft when the company created Office Assistant. Although I didn’t work on that team, I knew a few people who did.

It is easy to think that the Office Assistant was a horrible idea created by a group of poor-performing developers and product people, but that couldn’t be farther from the truth. Extremely talented developers, product leads, researchers with fantastic track records, and PhDs from top-tier universities built Clippy. These were people who thought they understood the market and their users. These world-class people were working on one of (if not THE) most successful software products of all time at the apex of its popularity. Microsoft spent millions of dollars and multiple person-years on the development of Clippy.

So, what happened?

What happened is that those brilliant people were wrong. Very wrong, as all of us are from time to time. How could they have found their mistake before releasing widely? It wasn’t easy at the time to test product assumptions. It was much harder to validate hypotheses about users and their needs.

How we used to release software

Way back before we could assume high-bandwidth internet connections, we wrote and shipped software in a very different way.

Software products were manufactured, transcribed onto plastic and foil discs. For a release like Microsoft Office, those discs were manufactured in countries worldwide, put into boxes, then put onto trucks and trains and shipped to warehouses, like TV sets. From there, trucks would take them to stores where people would purchase them in person, take them home and spend an afternoon swapping the discs in and out of their computers, installing the software.

With a release like Office, Microsoft would need massive disc pressing capability. It required dozens of CD/DVD plants across the world to work simultaneously. That capability had to be booked years in advance. Microsoft would pay massive sums of money to essentially take over the entire CD/DVD pressing industry. This monopolization of disc manufacturing required a fixed duration. Moving or growing that window was monstrously expensive.

It was challenging to validate a new feature in that atmosphere, particularly if that feature was a significant part of a release that you didn’t want to leak to the press.

That was then; this is now.

Today, the world is very different. There is no excuse for not validating your ideas.

You can now deploy your website every time you hit save in your editor. You can ship your mobile app multiple times per week. You can try ideas almost as fast as you can think of them. You can try and fail and learn from the failure and make your product better continuously.

Thomas J. Watson, the CEO of IBM from 1914 until 1956, said, “If you want to increase your success rate, double your failure rate.” If it takes you years and millions of dollars to fail and you want to double that, your company will not survive to see the eventual success. Failing fast minimizes the impact of your failure by reducing the cost and delay in learning.

I worked at an IBM research lab a long time ago. I was a developer on a project building early versions of synchronized streaming media. After over a year of effort, we arranged to publish our work. As we prepared, we learned there were two other labs at IBM working on the same problems. We were done; it was too late to collaborate. At the time, it seemed to me like big-company stupidity, not realizing that three different teams were working on the same thing. Later I realized that this was a deliberate choice. It was how IBM failed fast. Since it took too long to fail serially, IBM had become good at failing in parallel.

Part Two: Building a Fail-Safe Culture

Things We Learned Creating Technology Career Steps

This is a repost from: https://labs.spotify.com/2016/02/22/things-we-learned-creating-technology-career-steps/

This is part three of a three-part series on how we created a career path framework for the individual contributors at Spotify. Part one discussed the process we used to formulate the framework. Part two contained version 1.0 of our framework. In this segment, I’ll talk about the lessons we learned rolling out the framework to the technology organization. If you haven’t read the first two parts, I suggest that you read those first before proceeding with this one.

The Launch to the Organization

We launched the request for comments (RFC) version of Career Steps at a special town hall for Spotify’s entire technology department in December 2014. Leading up to the town hall we’d done several reviews of earlier iterations with increasingly larger groups from the organization, and we’d given trainings in Steps and conversations around Steps for every manager in Technology. In the presentation, we left plenty of time for questions, which was good. We prompted people to read the document and to ask questions in the document, via a mailing list or a Slack channel that we set up, or through their manager. In retrospect, we should have followed up with some more opportunities to talk to our working group face-to-face. Most of the organization would have been reading the document for the first time after the town hall, and some had only read earlier drafts.

The Relationship between Steps and Compensation

There was one area that was still incomplete when we launched Steps, and this turned out to be an issue that challenged us for a long time. This was the connection between your step and your salary. We had asserted that there was a connection, but at the point that we launched the framework, our Compensation and Benefits team had not finished their review of how this should work. We knew the generalities, but not the details. This left us in a very challenging situation, since we had given Steps a critical connection to the individuals in the organization, but we couldn’t yet explain it. We knew that there needed to be a connection between Steps and compensation. If we were saying that Steps embodied what we valued from the members of our organization, but the salary of those members was determined in a completely separate way, we were essentially contradicting ourselves and undercutting the effectiveness of the framework.

The lack of clarity around compensation created some tension in the adoption of Steps with some individuals in the organization. Unfortunately, the connection between salary and step wasn’t resolved until nearly a year later. This meant that the first salary review after launching Steps wasn’t able to use the framework. On one level this was a failure, but in some ways this was also positive. Since Steps were very new, having people get a step and then immediately have it affect their salary might have led to a lot of bad feelings. By delaying the tie to compensation, people had time to adjust to the system and potentially change their step before their first salary review.

Our inability to talk concretely about Steps and salary also created misperceptions around how this would work. We had to spend a lot of effort working through these only to have them re-appear in mail threads and discussions again and again. The most persistent misperception was that the only way to change your salary was by changing your step. This naturally caused much concern that was difficult to dispel until the Compensation and Benefits team had completed their work. Our C&B team did a stellar job on creating a fair and very progressive system that allowed good overlap in salary ranges between the steps and a lot of headroom. Since that has been completed, we hopefully have put many of the concerns to rest, especially as we are now using Steps as part of our current salary review.

Behavior Versus Achievements

We made an explicit decision around Career Steps that career growth was characterized by your behaviors and not by your achievements. This is counter to the way career advancement works at many other companies. It was something that we not only felt strongly about; it was something that we were proud of. In a culture that encourages innovation, failure (and learning from failure) is a natural occurrence. If we only rewarded success, then we were consequently punishing failure and discouraging risk-taking. We also wanted to encourage real personal growth, and not a culture of checking off achievements on a list and expecting a promotion. This naturally created some ambiguity around the requirements for each step. The working group embraced this ambiguity as a way of giving managers some room to make decisions and also to encourage discussion between the individual and their manager.

In retrospect, our approach was a bit naive in a few ways. While we as an organization are very comfortable with ambiguity in many areas, this was something that was personal, and had personal consequences. Some individuals were very uncomfortable with this ambiguity, and would have preferred more concrete requirements around advancement. There was another group that thought even the examples given were too much of a checklist. This is still something that we are working through.

We are looking for ways to support those in the organization who would like more clarity without being too prescriptive. We have discussed creating lists of (anonymized) examples of the behaviors. The concern is that this would lead to people treating the examples as achievements to be checked off instead.

This has been a particularly thorny path to traverse with some very vocal minorities, but the majority of the organization seems to have understood the concept.

Having Greater Impact Means Not Working on a Team?

There was one aspect of the Steps Framework that we hadn’t anticipated being controversial, and that didn’t come up often in our earlier reviews of the document. This was the aspect of Steps being a reflection of your sphere of influence. The core idea around this was that as you increase your professional maturity, you naturally will become a resource for ever larger parts of the organization, either through your technical leadership, your deep technical skills, or your general problem solving ability. This increasing sphere of influence brings new, broader responsibilities with it. This was well aligned with our own personal experiences in the working group at Spotify and other companies. For some individuals, the idea that they couldn’t move to a Tribe/Guild step without working outside their squads was a concern.

At Spotify, we point to the squad as the central place that work gets done; it is the top of our inverted servant-leadership hierarchy. The question we heard repeatedly was: why couldn’t someone just be better at what they do and get “promoted” for doing the exact same role if they were adding value to their team? I think that this was a case where we could have presented the reasoning behind this decision in a better way in the document.

Not Enough Chances For Advancement?

When deciding on the number of steps to create, we had decided to keep the number relatively small. The thought being that it was easier to add more than it was to remove some. Four steps is not many changes in a potentially forty-plus year career. The expectation was that people would potentially stay at a step for a very long time. This turned out to be very discouraging for some people who felt that they would never be “promoted” in their Spotify career.

Here we missed the understanding that a segment of our organization had been wanting the opportunity for recognition on increasing advancement. We had not been doing this before in any respect, so we hadn’t considered that it was something that a group really felt the absence of. It turns out that we were wrong about that and with Steps we weren’t giving that group enough opportunity to get it. This was a big oversight since we knew that part of what had been encouraging people to make career changes to management or product was this lack of recognition. This is an active area of discussion for next iterations of the framework.

Assigning Steps

After the town hall presentation introducing Steps, we gave people some time to respond to the RFC version of the document and give feedback to the working group. In January, we started the process of making sure every individual contributor in the technology organization had a step. Initially, we left the process up to each tribe. This was a definite mistake. The tribes needed some more guidance on a good way to make this process work. Luckily, at Spotify, we are quite good at figuring these things out and sharing good ideas. The Infrastructure and Operations Tribe came up with the suggestion: each individual should use the Steps document to make a self-assessment of which step they should be on, then have a discussion with their manager about where they agreed and disagreed. This was quickly adopted as a good process across most of the organization.

Once these discussions were completed in each tribe, the tribe would meet to make sure that each manager was assigning steps to their employees in a similar way. Additionally, functional managers in different parts of the organization also met to do a similar exercise. These synchronizations helped make sure that we were being fair and consistent across the entire organization. Before the steps for individuals were finalized, the Tribe Leads and CTO met to go over all the recommendations for the Tribe/Guild and Tech/Company steps. This ended up serving two very valuable purposes: it assured consistency across the entire organization for the individuals who were on these steps, and it made the senior technical leadership aware of who these individuals were and what their manager thought they contributed to the organization.

The process of assigning steps to individuals was completed in March, but here we also had a problem. The effort of adding steps into our HR systems ended up being a bigger challenge than our HRIS team had anticipated. So, the steps were assigned in the spring, but weren’t easily visible to employees, their managers or HR until the fall. This also meant we had difficulties tracking which teams were behind in submitting steps for their employees. We also couldn’t easily run reports to see how steps were distributed.

Once this was finally worked out, we found that we were missing a lot of data. Some employees had never been assigned a step. Some employees had transitioned between teams or roles and their step hadn’t been communicated. Steps hadn’t been integrated into the onboarding process, so many new employees had never gotten their step. There were a bunch of other issues as well. All of these had to be rectified before we could do our salary review. The lack of visibility also contributed to an “out of sight, out of mind” issue where some people or managers didn’t do much with the framework after the initial discussion.

In retrospect, the working group should have taken ownership of this until the issues with the HR system were handled. Unfortunately, there was some poor communication around the timeline for the fixes and we were always thinking that the issue was nearly solved.

Positive Results

Once the Steps Framework was launched, we started getting strong positive feedback from the organization in addition to the concerns noted above. Many of our line managers (Chapter Leads) were previously individual contributors; they especially appreciated having a structure to help frame personal development discussions. Also many individuals told us how it was good to have some structure and understanding about how to grow at Spotify. The working group collected feedback in multiple ways including doing interviews with employees in different offices.

We had wanted to have some real data in addition to our anecdotal evidence to have greater confidence we were actually adding value. Some strong supporting data came from our yearly Great Place to Work survey, where in the Technology Organization, the “Management makes its expectations clear” measurement increased by 4 percentage points year over year, and the “I am offered training or development to further myself professionally” measurement increased by 6 percentage points. There were several things that could have impacted these measurements, but we had reasonable belief that Steps had a part in these increases.

While we had started our effort with the desire of being data-driven, as we went to measure our impact it was clear that we were not as rigorous as we should have been. We should have started the entire effort with some baseline numbers, gathered through a survey or poll, on how the organization felt about the support they had for personal development. This would have given us better metrics to measure ourselves against, and would help us to guide future iterations as well. This is something that the working group is actively pursuing.

Loss of Focus

Once the steps had been assigned, the working group started moving into a less active phase. There was a strong sense of accomplishment, but also weariness. From meeting twice a week and working on the document in between the meetings, meeting with individuals and teams to discuss steps, training the managers, launching the effort and supporting the initial rollout, there had been a tremendous amount of work done. However, this wasn’t anyone’s main job; we were all moonlighting to do this. Also during this time many of us were involved in a large project that also was demanding our focus. It felt like we had completed our mission and that it was time to move onto new things. Unfortunately, we were wrong.

This post-launch phase is where we made some of our biggest mistakes. We’d launched the framework, but our support for it was largely put on hold. We hadn’t put the necessary structures in place to make sure that new employees were being trained in Steps. We stopped sending updates and communicating about what was happening around Steps. This led to a lot of questions and confusion in the organization around what was going on with the effort.

Eventually, we realized our error. At that point we had to do some cleanup and rebuilding work to get the program back on track. Luckily during this time, the HRIS team had done their thing, as had Comp and Benefits. We also had our first Steps-enabled salary review to do. We were also very fortunate that parts of the organization had really started to embrace Steps and were using it in several ways to support individual growth through structured trainings and formal or informal mentoring. Having these efforts emerge helped keep Steps from completely being forgotten in the technology team.

The working group had been inactive too long at this point. Most of the members had new commitments. For a time we continued moving the effort forward with the participation we had, but it was clear that we needed to move into a new phase, which would require a new group.

Current Status

Currently, the Working Group is reforming with new members. The current members are engaged with our Human Resources team on training in Steps for Managers and Employees. We’re also starting work on the next iteration of Steps, addressing some of the lessons learned and feedback now that Technology has been living with the framework for a while. The new group will have a few members from the first version, which will give some continuity to the effort and provide some history, but the rest of the group will be new, which should inject some new ideas and approaches to the work.

The leadership of the technology organization has been doing a lot to support the framework. We are actively giving more responsibility and opportunities to our Tribe/Guild step employees and we are also looking to grow our Squad/Chapter step employees to the next step if that is what they are interested in.

We’ve had several promotions between steps during the year, which is a serious measure of validation. If people weren’t moving between steps, that would be a major issue.

Additionally, in my own organization, two managers who were also extremely skilled technologists moved back to being individual contributors. They were excited by the possibility of being able to be technical leaders without being people managers.

Many tribes are now starting to consider their Tribe/Guild step people as incremental members of their squads so that when they are helping outside their squad they don’t adversely affect the ability for the squad to get its work done.

Lessons

We learned many lessons in the creation of Steps. If there was one thing that I think was paramount, it was that this ended up being a lot more like Culture Change than we expected it would be. Our view was that we were creating something that would support and reinforce our company culture. In fact, we were specifying something where there had been only people’s own conceptions before. So, while we feel that we did create something generally well aligned and supportive of the Spotify culture, it was still a change for many individuals in the organization. If we had realized that, we would have treated the rollout much more like a Culture Change effort, and used more of those techniques in our rollout plans.

Another major lesson that I didn’t capture above was that we didn’t really have our long term support plan in place. We did put together a plan around supporting the effort, but that plan was created after the launch of Steps. As I mentioned above, the working group was pretty tired from the effort of getting the framework to that point. It would have been much better to think through the long term support needs of the work closer to the beginning of the effort and then adjust them later as we learned more.

I think that the process of using increasingly large groups to give feedback was quite good. It definitely resulted in a much better result than if we had just generated the framework in a smaller, more isolated, group. We primarily used the managers and leadership of the organization to collect feedback though. We would have benefited from including groups of individual contributors as well, especially if we could include the same people in multiple reviews of the document, just as we did with the coaches and leadership.

There was something that I neglected to mention in the first post that I thought was quite valuable. Once the Steps Framework was starting to solidify, we did a “simulation” to see what the distribution of steps in the organization might be once we rolled it out. The committee had what we thought was an ideal distribution based on Spotify’s hiring strategy and what would make sense for an organization of our size. We asked each manager to estimate (in a non-binding way) which step each of their employees was on. We then combined these to get a master histogram of what the distribution might be. Luckily for us, we got a distribution fairly close to what we were hoping for. So, it was a good validation of the Framework at that point. This also had the effect of making each manager have to apply the Framework in a concrete way, which also generated more good feedback.

In addition to the lessons in the rest of this document, I want to reinforce the issues we encountered by underestimating the efforts required from Compensation and Benefits as well as our back-office systems to support this work. While HR was involved from the outset, we didn’t really take into account the timelines for the structural support to make the whole effort work. We should have involved those groups from the beginning, which would have avoided difficulties later.

Conclusions

In the process of leading the effort around career pathing in the technology organization at Spotify, I learned a tremendous amount. While I obviously spent more time thinking about how to motivate and incentivize employees than I ever had before, I also learned a lot more about my company and my coworkers than I had expected to. I was also reminded clearly how different my path had been from that of any of my fellow members of Technology. Things that I had learned experientially, I had to put reasoning and explanation behind, and that was incredibly valuable. Many of our employees had never been at a company with career path support at all before. Those with experience at other companies that had career path frameworks had not seen anything like what the working group had created. As I talked with people individually and in small groups, I found better and better ways to articulate the reasoning behind the things that had just seemed obvious to us as we wrote the document. This helped us improve the document immeasurably and hopefully also has helped clarify the things that I’ve written about in these posts.

The effort of creating this framework also raised some issues in the organization that we’ll need to address as a group. The culture of Spotify is characterized by the Swedish word “lagom.” We try to see each other always as equals. We encourage people to challenge their leaders if they don’t think they are right. As is reflected in the Steps document, we put higher value on strong teams than on strong individuals. We also imbue our teams and our individuals with autonomy in how they do their work. We tried to incorporate all of these ideas in our Career Steps. When people in the organization wish for more recognition and status to be reflected in our career pathing, is this counter to our culture, or is this indicative of the culture changing? What is the role of career pathing in enforcing a desired culture rather than supporting it? These are questions we’ll continue to look at as we evolve Career Steps.

Spotify is a unique company, with a singular culture. The specifics of what we created and the lessons that we learned may or may not apply to your company. In aggregate, hopefully you will find our experiences and learnings valuable as you think through how you want to do similar programs at your own company.

The video of my talk from QCon Shanghai is live

Had a great time at QCon, met a ton of interesting folks, and the organizers were awesome. The video and slides are now live on their site: http://www.infoq.com/cn/presentations/building-strong-engineer-culture

There are a couple issues to be aware of. They did the slides separately, so they don’t always line up with what I’m saying. Also, since I was being simultaneously translated, I talk more slowly and pause more often than I would normally to let the translator catch up.

I think that this will be the last time I give this talk publicly, now that it is recorded fairly well, and available world-wide.

The Myth of the Startup in a Large Company

I was reading this post from John Gruber, which had this paragraph in an ad for the iOS development team at Google:

My thanks to Google — that’s right, Google (kind of awesome, right?) — for sponsoring this week’s DF RSS feed. They’re hiring developers and designers for their iOS app teams, which operate like a start-up within the walls of Google.

and I thought about the number of times I’d heard that line: “Operates like a startup within [insert large company name here]” or “Operates like a startup, but without the risk.” I’ve heard that line so many times from recruiters, from friends… To be honest, I’ve even said it myself a few times, trying to sell a prospective candidate who I was trying to woo away from a startup.

That notion, of working like you are in a startup, but being part of a much larger organization, is a myth. Anyone who says it is naive, disingenuous or just plain wrong. Large companies that try to build those kinds of teams, be it “innovation lab”, “startup experiment” or “corporate startup incubator”, usually fail to achieve the innovation or energy they sought. The result is usually a whole bunch of wasted money and angry employees who felt like they were sold a bill of goods.

Stewart Butterfield, discussing his experience selling Flickr to Yahoo:

They sold out to Yahoo assuming that they’d be backstroking in rivers of money and terabytes of memory. Instead they had to fight for everything: servers, people, time.

This is talking about the inverse problem, but it comes down to the central crux of the issue at large companies: resource contention. This is beyond the innovator’s dilemma.

In a startup, all of your attention is spent on finding the right product/market fit, finding customers, finding a flow of income, and/or finding investment. You will make the trade-offs you need to get your product off the ground. Often this may mean choosing poor technologies in the short term to help you get going more quickly. Your resources are limited, so you need to make do to get going. Maybe you will take some short-cuts in other areas just to get the product launched. You are fighting for your life and your income, and you will do whatever it takes to get there. Why do you do it? Because you love the energy, or because you are looking for the fiscal payoff. No risk, no reward.

In a big company, you don’t have to make those trade-offs. There is probably a very mature infrastructure to build on; there is a brand to build off of; there is the promise of a paycheck no matter what the outcome. It is these conditions that destroy the innovation.

There is a mature infrastructure, but maybe it is not a great fit for what you are trying to build. Maybe you just need some small tweaks, but the infrastructure team is primarily focused on serving the existing teams that bring in the revenue; it will be hard to get your needs prioritized. Maybe you can even prototype or launch with your own skunkworks infrastructure; that won’t last for long. The corporate infrastructure is vetted, financed, and regulatory compliant, and the team that runs it owns its turf and doesn’t appreciate someone jury-rigging something else.

There is a brand, but that brand is well known and highly controlled. You can’t launch just anything using that brand. It needs to be vetted. This means that instead of focusing on building your product, you are instead focusing on getting internal support. Maybe you launch under some new secret brand. This may work for a little while, but if you are successful, there will be increasing pressure to join the fold. And in any case, launching under a secret brand basically kills the benefit of being part of the parent company.

The lack of risk is its own deterrent. Knowing that you get the paycheck is nice, but it also means understanding that you have very little ownership in the outcome. It isn’t “your” product, it is your corporation’s product; you are just one of the people on the team. While you may still put in startup hours for the joy of it, eventually you will realize that you aren’t getting the startup reward for all your hard work, and that is pretty demoralizing.

The general problem is that even if you have the deep pockets of a large corporation backing you, you don’t have the ability to do what it takes to survive. From the minute the project is launched, you are on a clock. Because you are part of a larger (presumably already profitable) parent, you will be restricted from certain business models: it’s hard to justify spending a few years taking substantial losses to scale your business quickly when you are seen as a drain on the profits of your parent company. Amazon, with their don’t-ask-us-about-profits model could never have been created as a division of Microsoft. The shareholders would have rebelled.

If you don’t succeed quickly, your team’s resources will be coveted by the teams around you. They are like vultures waiting for you to fail, and they will rush to declare you a failure as early as possible if they think they can benefit from it.

If you are successful and start to grow, you have the same problem. Teams will attempt to co-opt your mission, or take over your team, or switch you onto the “official” technology stack, or just flood you with resources trying to get some of your “startup” energy.

If you are successful by startup standards, that may not be seen as much of a success in a larger parent company where there is an established business. Being slightly profitable is a huge win for a startup; being slightly profitable is a major loss for an established corporation.

So, how can you create a startup in a large company? I think the university model is an interesting approach. Say you are Company X, a large multi-national technology company, and you are constantly challenged by your inability to move at startup speed or innovate. Instead of creating a startup team inside some division, create an actual startup.

Solicit pitches from your entrepreneurial employees. Pick one or more, and fund them as independent companies. Give the founders an equity stake in the new venture, but Company X will own a significant stake as well, plus some non-exclusive licenses on the IP. Allow them to recruit from your company, but they will no longer be employees of Company X. If the venture fails, they may be able to interview to rejoin Company X, and they may get some of their benefits back, but there is no guarantee of employment. Also, get them out of your building. Allow them to raise outside investment if they need it.

By sponsoring your own employees, they are likely to build in a compatible way with the way your company works, given that it is their training. They will know your industry. They won’t be complacent because they can’t afford to be. They will also be invested because they will directly benefit from their success. In the end, you will get the innovation you want, and probably cheaper than if you tried to fund it from within your own cost structure with all of its overhead.

[This post was updated on August 23, 2014. Nothing was removed, but I added some more thoughts about business model limitations and startup success levels not matching the expectations of a more established company]