A response to the IEEE article: Yahoo’s Engineers Move to Coding Without a Net

http://dilbert.com/strip/1996-07-01

First, a disclaimer

Let me start by acknowledging that the article, as written, is filtered through the writer’s own biases and background. I have certainly been misquoted, or had an author interpret my words radically differently than I meant them.

This is a response to the article as written. I would be happy for someone from Yahoo to let me know if they feel the article was an accurate depiction or not.

Yahoo’s new approach to quality is something many other companies already do

What Yahoo learned is something many other companies learned a long time ago: segmenting test and development is a bad idea. It leads to developers and testers living in their own silos and playing a blame game with each other. Agile methodologies challenged this idea 16 years ago. Having separate test and development teams is a pretty big red flag for waterfall development.

Having QA wasn’t the problem

Taking away QA can improve quality up to a point, because it makes quality everyone’s job, but it won’t get you a high-quality product. You can make quality everyone’s job just as well by creating a culture of quality, where everyone feels responsible for it, while keeping the QA engineers on the teams. You can’t find every bug in the product through test automation. Developers tend to be poor testers of their own code, because they expect it to work and they have a hard time using the product like a non-developer. Having someone on the team with expertise in quality thinking and a user mindset can do real wonders for product quality.

What Yahoo should try next in their quality evolution

Yahoo could also have increased quality and velocity substantially by keeping QA, integrating developers and testers into the same teams, and creating a culture where everyone feels responsible for the quality of the product. I contend that, as their approach to quality evolves, they might yet find another multiplicative gain in both by doing exactly that.

Nice deck from Dan McKinley of Stripe: Choose Boring Technology

As you grow as a developer (and development leader) and work with more and more technologies across different projects, you start to realize how easy it is for a team to become more focused on challenging technical problems than on the actual product issues. Ignoring the product issues will kill the product (and possibly the company). Since attention is limited (he describes it as a small budget of “innovation tokens”), it is best to spend your effort on innovations that differentiate your product. All too often, teams get more focused on the next cool technology, and, as the old saying goes, when all you have is a hammer, everything looks like a nail.

Dan McKinley does a great job explaining this in his talk.

Video of my talk “Apportioning Monoliths”

This was my talk at the Daho.am conference. Listening back to it now, I am struck by how often I said “many, many.” And I cursed! I usually try not to do that. So it’s a somewhat looser take on this presentation. Luckily the audience had beer (this was in Bavaria, after all), so all were fine with it. I had flown in from Stockholm that morning, so I might have been a bit more tired than I thought…

I was really impressed by the lineup of speakers and the content of the presentations. A really good day. The Stylight engineering and event teams did a great job.

The Spotify model: how to create, dissolve, and remix teams to be more dynamic and more innovative

This post was originally written by request and posted on popforms.com. Special thanks to Kate Stull for requesting the article, and helping me with editing.

One of the most challenging parts of managing a traditional, hierarchical organization is being responsive to new opportunities, especially those that require skillsets outside your own team. At Spotify, our organizational model allows us to create, dissolve, and remix teams with minimal disruption to individuals or managers. This gives us a tremendous ability to address both temporary and long-term opportunities.

How it used to be

As a manager at Microsoft and Adobe, I was always challenged when a problem or opportunity required repurposing a team or adding scope to an existing team.

This kind of thing comes up all the time: a business development opportunity, or integration with another product. Often, this would require small efforts from multiple specialized teams.

This caused disruption, as those teams had to change their current plans and coordinate around a new challenge while still making progress toward their existing goals. Because people and resources were managed within each team, and managers were still responsible for delivering their existing commitments, it was often hard to motivate them to support the new effort.

Creating a new “tiger” team is often the answer in these situations, but it isn’t always adequate for long-term or permanent projects, since it essentially punishes the managers of the existing teams and requires finding a new temporary manager for the new team.

Another problem in existing organizations is figuring out what to do with a team whose project has been cancelled.

If it is a high-performing team, you may try to point it at a new problem, which may or may not be a good fit for its members’ skills and experience. You may instead dissolve the team, assigning the members to new teams based on the needs of those teams rather than the preferences of the individuals. Or you may leave it up to the individuals to find new roles in the company, or face layoffs if they are unsuccessful.

These solutions end up punishing both the individuals on the teams and their managers, often for reasons beyond their control. In an organization seeking to innovate (which requires some amount of failure), this contradicts the message that experimenting and taking chances are encouraged.

How we remix teams at Spotify

At Spotify, we wanted to create an organization that allowed us to be dynamic around our staffing, and adaptable in our teams.

We embrace failure as being important to learning and innovation, so we didn’t want dissolving a team to be a punishment. We put this new organizational model into effect over two years ago and have been working with it since. In that time, the technical organization has grown from 250 to over 600 people. We went from having three engineering offices to five, and from having 30 teams to over 70.

We focused on building full-stack, autonomous teams, built around a single, clear mission. The expectation is that once a team’s mission has been fulfilled, the team will dissolve.

To this end, new teams are constantly being created and old teams dissolving, with their members building new teams or moving into existing teams if they need additional staffing. Rather than create a formal manager role for these teams, we decided instead to make the teams collectively responsible for fulfilling their mission.

With this model, changing teams does not mean changing your manager, and dissolving a team doesn’t leave a manager looking for a new role.

We do have a strong belief in the role of the manager as a mentor to their reports, so we have a strong managerial culture; it is just manifested in a matrix rather than a hierarchical model.

Why Chapter Leads work better than traditional managers

Our technical managers are called Chapter Leads. They are usually responsible for managing a narrow range of developer disciplines within their larger organizations, for example, mobile developers or backend developers. A Chapter Lead usually has direct reports in multiple teams in the organization.

For an individual, it is common to change teams, but it is less common to change managers. As each team is responsible for their full stack and all platforms, a team may include members from several chapters.

An example is the search team in my organization. Its members come from five different chapters: the backend chapter, the mobile chapter, the keyboard and mouse (desktop and web) chapter, the agile coach chapter, and the test chapter. Additionally, there is a product owner and a UX designer, both of whom are part of the product organization (which is organized more traditionally).

The Chapter Leads are not responsible for deliverables directly. Instead, the Chapter Leads are responsible for staffing the teams appropriately; for working with the individuals in the team to help them grow; and for working with the Product Owner and the Agile Coach to make sure that the team is performing well together.

Since the Chapter Lead has visibility into multiple teams, they can often identify short- or long-term skill set needs and are empowered to resolve them.

Sometimes this means temporarily swapping two developers between two teams to meet a skill set need. Sometimes it means moving a developer into a different team to address a short-term staffing need. And if there is a new mission to be addressed, the Chapter Leads can work together to staff a new team for that mission out of the existing teams in the organization.

A benefit of this model for an individual is that there are many opportunities to work on new projects or develop new skillsets, since new projects spin up at a regular clip.

When and how we remix and dissolve teams

This remixing is not uniform throughout the technology organization. We do have several very long-lived teams that are focused on features in the product, but even those teams will shift people between each other based on short- or long-term needs. In some parts of the organization, specifically infrastructure, teams tend to focus on short-term projects, so new teams are created more often. Those teams dissolve when they have completed their project.

We will also dissolve teams if we believe that their mission is no longer necessary. Usually this is the result of the team invalidating their mission themselves. We celebrate these conclusions just as much as the successful completion of the project, since we value the lessons from a “failed” project. Celebrating your failures as valuable lessons encourages risk taking, experimentation and innovation.

By striving towards a model that gives the individual consistency (their manager, and their Chapter) while still giving the organization fluidity and adaptability, we’ve found a happy balance that lets us extend our agile-first values beyond the work that a team performs to the organization as a whole. This has allowed us to focus on innovation and leverage opportunities that slower-moving organizations would have difficulty addressing.

Several companies have attempted to adapt our model but there is something critical to understand. Our organization model itself is fluid and continues to change and evolve to support the needs of the organization. The specifics of our implementation are less important than the underlying values and ideals that created it.

If you want the benefits of a dynamic organization, you will need to build something that is suited to the values of your own organization. I would argue that a central requirement is endowing teams with autonomy and decision-making authority. If you cannot support this, then you should instead look to adapt your existing model to remove impediments and bottlenecks.

Thoughts on emulating Spotify’s matrix organization in other companies

I was in San Francisco in December for a conference. While I was there, I ended up connecting with a couple of different companies that have been inspired by Henrik Kniberg’s whitepaper on Scaling Agile at Spotify, and that have been trying to implement some of those ideas in their own companies.

I think Henrik’s paper does an excellent job of describing the what and the how, but it seems that the “why” and some of the critical ideas can get lost when others read it.

If you haven’t read Henrik’s white paper, I’d suggest reading it before the rest of this blog post. I will give a quick recap here, though.

Spotify’s engineering and product organization (now over 600 people) is split into several large groups called Tribes. Each Tribe is responsible for a set of related features or engineering functions. For example, our largest tribe is the Infrastructure and Operations Tribe, whose name is pretty self-explanatory. I am the Tribe Lead of the Music Player Tribe. We handle importing audio from our label and distribution partners; storing and streaming the music; search, collection, and playlists; artist pages; and music metadata and the music knowledge graph that supports not only the features above but also ads, discover, radio, and the like.

While the whole company works on the same product, Spotify, each tribe is set up so that it can work as independently as possible. As you will see below, a critical aspect of our organizational model is to give autonomy at every level. This helps remove decision-making bottlenecks and unnecessary dependencies, which improves velocity.

Each tribe is composed of squads. A squad is a team that is responsible for a single feature or component. For example, there is a squad that is responsible for search, a squad responsible for the A/B test infrastructure, etc. As each tribe is set up to be as autonomous as possible, each squad is also set up to be autonomous. In the context of a feature development team, this means that each team is a full-stack team. A full-stack team is responsible for both the backend implementation and the user interface implementation, on all platforms.

A typical feature squad would have web service engineers, iOS, Android, web and desktop engineers as well as testers, an agile coach, a product owner and UX designer. With this staffing, the squad has everything they need to implement anything related to their feature. They don’t have to wait on another team to implement the pieces they need. They also have autonomy and local decision-making ability, so there are few impediments on their speed of execution.

With only Tribes and Squads described so far, Spotify may seem like a traditional, hierarchical engineering organization, but this is where the similarity ends. Unlike in a traditional organization, a squad does not have a single engineering leader whom everyone on the team reports to. In fact, there is no single leader for the squad at all. The Product Owner and UX Designer work with the engineers and testers collectively to make decisions about their features.

Spotify is not a “no manager” culture though. We feel strongly that managers have a role in supporting the people who work for them. Managers have an important role to play as technical and career mentors and organizational communication conduits. Rather than have management hierarchies follow organizational ones (creating a de facto command-and-control structure), we instead have first level managers responsible for technical functional areas across multiple squads.

We call these reporting and functional groupings “chapters.” Again, as an example, the people reporting to me, the tribe lead, are Chapter Leads. In my tribe, there are currently three backend (services) development chapters, two front-end development chapters (including all the UI developers), a core library chapter, and a test chapter. These seven chapters span eight different squads. Almost every Chapter Lead has reports in two squads, and a few have reports in three. Almost all Chapter Leads also work within a squad in some capacity, either as a developer or a technical lead, and not necessarily within a squad that has members of their chapter.

This chapters/squads matrix organization is critical to our organizational agility. It allows the squads and the tribe to be more fluid. We can spin up a new squad to take advantage of an opportunity or handle an issue without worrying about changing reporting structures. If a squad completes its goals and has no reason to exist anymore, we can dissolve it without punishing a manager. This is a very important difference from a traditional hierarchy, because it gives us a lot of flexibility and helps us avoid the old political issues around empire building and resource contention.

In addition to our Tribes, Squads and Chapters, we also have virtual organizations called Guilds. Guilds are cross-tribe organizations centered on different technical or interest areas, and their membership is voluntary. The guilds serve as ways to promote cross-tribe collaboration and communication, especially around things like best practices. For example, we have guilds for Web Development, Agile Practices, Leadership, Test Automation, etc. The guilds foster developer-to-developer communication, which is one of the ways that we keep all these autonomous teams from going off in completely different directions.

From Henrik’s paper, this diagram illustrates the organizational structure I discuss above:

[Diagram: Spotify’s tribes, squads, chapters, and guilds, from Henrik’s paper]

I’d like to give some more background on why we have implemented this organizational model at Spotify, elaborate on our goals for implementing it, and discuss the aspects of our culture that have been critical to its success. It is really great that other companies have been inspired by the way we work, but I think if you implement only parts of the model, or try to impose it on a very different corporate culture, you will have a difficult time achieving the same level of success with it that we have had.

If you are considering using the Spotify organizational model within your company, there are a few things that will be critical to your success:

Our organization model assumes that engineering is doing development with agile methodologies. Our goals for autonomy mean that we do not prescribe any particular development framework our squads must follow. However, all of our squads use agile development methodologies. While we do our best to minimize dependencies between squads and tribes, there will always be some, since we are all working on the same product. Any individual squad choosing to build using a traditional waterfall or other non-agile process would not be able to keep up with the rapidly changing teams around them. If they tried to impose some sort of process on other teams so that they could follow a longer-term development plan, they would start slowing down the rest of the organization.

A critical requirement in making our organization model work well is that the entire company works with and understands agile practices and processes. While our legal team isn’t doing scrum or kanban, they are used to working with engineering teams that use agile processes. Having the entire corporation understand and agree with agile means that no line of business becomes an impediment to the speed of implementation. Think of this in terms of Amdahl’s law, applied to a development organization. If your development teams are working quickly in parallel, but marketing or legal is not supportive of an agile approach, they will become a bottleneck that will slow down the overall speed of the company.
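To make the Amdahl’s law analogy concrete, here is a minimal sketch in Python. Amdahl’s law says that if a fraction p of the work is sped up by a factor s and the rest is not, the overall speedup is 1 / ((1 - p) + p / s). The numbers in the example (80% of the time to ship a feature is engineering work, 20% is sign-off from slower business areas) are hypothetical, chosen only for illustration, and are not measurements from Spotify.

```python
def amdahl_speedup(parallel_fraction: float, speedup_of_parallel_part: float) -> float:
    """Overall speedup when only `parallel_fraction` of the work gets faster.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    p, s = parallel_fraction, speedup_of_parallel_part
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if s <= 0:
        raise ValueError("speedup_of_parallel_part must be positive")
    return 1.0 / ((1.0 - p) + p / s)


if __name__ == "__main__":
    # Hypothetical split: 80% of time-to-ship is engineering work that agile
    # squads speed up; 20% is sign-off (e.g. legal, marketing) that stays slow.
    for s in (2, 10, 1000):
        print(f"engineering {s:>4}x faster -> overall {amdahl_speedup(0.8, s):.2f}x")
    # The ceiling: even infinitely fast engineering caps out at 1 / 0.2 = 5x.
```

However fast the squads get, the overall speedup can never exceed 1 / (1 - p), so beyond a point the only way to go faster is to shrink the slow, non-agile fraction.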

Similarly, implementing this with just one team as a test in a larger engineering organization will be prone to issues. A traditional engineering organization is not usually set up for autonomy. Adding a single autonomous team within that web of dependencies is likely to hamper and frustrate the team and skew the results of the experiment.

While I’ve mentioned autonomy in several places already, I cannot overstate its importance. Each squad must be empowered to make their own decisions, not only on features, but also on development model, infrastructure, and implementation. Every decision that has to be approved outside the team means a delay that slows development. Each dictated implementation or infrastructure decision means adopting a technology that doesn’t fit the way the team works, or something new that must be learned before the team can build. This is a challenge to coordination, but in practice it isn’t as bad as it might seem. Best practices and technologies do spread from team to team through avenues like guilds. Teams adopt these practices and technologies on their own schedule, or pioneer new ways of working if that makes it easier for them to deliver value to our customers, and then spread their learnings to the other teams.

Trying to layer the tribe and squads model over a traditional reporting hierarchy would be very problematic. While we have many long-lived squads at Spotify, we are constantly creating and disbanding squads as new needs arise or missions are fulfilled. Squad membership will also ebb and flow as required by the needs of a squad’s mission. Traditional hierarchical organizations are self-perpetuating and restructuring them is very disruptive both to the management chains as well as the individual team members. You would gain some of the benefits of the Spotify model by building full-stack teams in a traditional organizational hierarchy, but you would lose a lot of the overall speed benefits that we leverage with our matrix organization.

In conclusion, if you are looking to improve the speed of your development and are inspired by Spotify’s organizational model, there are a few things that you need to understand. Our model works because it is layered on top of our corporate culture. Our culture values autonomy, agile processes, democratic teams, and servant leadership, amongst other things. You can certainly take some of the ideas from the way we work and apply them in your organization, but without the cultural underpinnings you may not get the same returns.

Some other references worth checking out are Henrik Kniberg’s keynote at the Paris Scrum Gathering and my keynote at the { develop: BBC } conference.