Leading What You’ve Never Done Before

What happens when your role outgrows your technical background

Tokyo, Japan – Photo by Kevin Goldsmith – January, 2026

One of the guarantees of career progression in technology leadership is that, eventually, you will outrun your own resume.

Most technology leaders start as specialists. Frontend, backend, SRE, QA, mobile, data. Your first leadership role is often managing a team you used to be on, doing work you understand deeply. That familiarity is comforting. It’s also temporary.

As your scope grows, you start owning work you’ve never done yourself. A backend lead suddenly finds themselves managing frontend teams. An application developer finds themselves responsible for mobile, infrastructure, data, or security. Eventually, the gap widens enough to feel genuinely uncomfortable.

That discomfort is normal. It’s also where many leaders get stuck.

Early in my management career, I felt that pressure acutely. I had spent years as a C++ application developer. That expertise was my credibility. When I found myself responsible for iOS clients, Ruby backends, and systems I didn’t know well, I worried that I was no longer qualified to make decisions or lead effectively.

My instinct was to compensate by trying to learn everything. I wanted to stay technically credible in every domain I owned.

That instinct turned out to be the wrong one.

The trap of technical credibility

As leaders, we often confuse credibility with expertise. We assume that to lead an area, we need to be the smartest person in the room about it. That might work when you’re leading a single team in a domain you know well. It breaks down quickly as your scope expands.

If you try to become an expert in every system your organization owns, you slow everything down. Worse, you undermine the people you hired to be experts. You become a bottleneck without realizing it.

The opposite mistake is just as damaging. Some leaders disengage entirely from unfamiliar domains. “I don’t know that area, so I’ll just trust the team and focus on what I know.” That signals disinterest, even if you don’t intend it to. Teams notice. Motivation drops. Accountability weakens. And you’re still responsible when things go wrong.

Neither extreme works.

The real challenge isn’t a lack of knowledge. It’s understanding how your role changes as your scope grows.

How leadership actually shifts

There’s a progression most leaders have to make, whether they realize it or not:

You start by doing the work.
Then you review the work.
Then you design how the work gets done.
Eventually, you design the system that produces good decisions.

When you’re leading areas you’ve never worked in, your role moves firmly into those last two stages. You are no longer there to solve problems directly. You’re there to make sure the right problems are being solved, in the right way, by the right people.

That shift is uncomfortable if your identity is still tied to being a strong individual contributor. But it’s unavoidable if you want to lead at scale.

Failures at senior levels rarely stem from a lack of subject-matter expertise. They come from unclear ownership, misaligned incentives, poor decision-making systems, and leaders who stop being curious about the parts of the organization they don’t personally understand.

What to focus on instead

You don’t need deep expertise in every domain you own. You do need a solid mental model.

That means understanding the vocabulary, incentives, constraints, and common failure modes. It means being able to follow the conversation, ask good questions, and recognize when something doesn’t quite add up.

The questions that matter at scale are remarkably consistent, regardless of domain:

What does “good” look like here, and who decides?
What happens if this fails?
What risks are we knowingly accepting?
Are we relying on heroics?
How do problems surface early?

Those questions don’t require you to know the implementation details. They require judgment, context, and experience. That’s the transferable part of your background.

Your job is to let specialists own the “how,” while you own the “why” and the “so what.” When you provide clear context and intent, good teams make better decisions on their own.

Things you have to stop doing

Leading unfamiliar domains also requires unlearning some habits.

You have to stop pretending to be the smartest person in the room. Your job is to hire people who are better than you at their craft and give them room to do it well.

You have to stop asking detailed technical questions just to mask uncertainty. People can tell when you’re doing it, and it erodes trust rather than building it.

You have to stop measuring success only through artifacts and dashboards. As you move up, outcomes matter more than outputs. Metrics are signals, not substitutes for understanding.

Trust without abdication

Trusting domain experts doesn’t mean stepping away. It means staying engaged without micromanaging.

You are still accountable. You still ask questions. You still challenge assumptions. But you don’t make decisions for teams in areas where they clearly know more than you do.

You also have to create space for people to disagree with you. Role power makes that harder than most leaders realize. When you suggest something, people often hear a decision. If you don’t actively invite dissent, you lose valuable signal.

That matters even more when you’re not the expert.

The long view

Over time, leading unfamiliar domains gets easier. Not because you become an expert in everything, but because you get better at learning just enough, asking better questions, and building trust with the people who know the details.

Progression doesn’t mean abandoning your technical background. It means extracting the lessons from it and applying them at a different level.

If you’re feeling exposed because you’re leading something you’ve never done before, that’s not a sign you’re failing. It’s usually a sign that your role has changed faster than your self-image has caught up.

That gap is uncomfortable. It’s also where real leadership growth tends to happen.

To hear an extended discussion of this topic, please listen to my podcast episode: Leading What You’ve Never Done Before.

Originally published in the It Depends: Lessons in Technology Leadership newsletter at https://kevingoldsmith.substack.com/p/leading-what-youve-never-done-before

Process Is a Tool, Not a Virtue

Tokyo, Japan – Photo by Kevin Goldsmith – December 2025

Process is one of those topics that reliably creates tension in growing engineering organizations. Too little of it and everything feels chaotic. Too much of it and nothing moves. Most teams I work with aren’t confused about whether process matters. They’re struggling with when to add it, how much to add, and how to tell when it’s gone too far.

Process is almost always introduced as a solution to a problem. What’s less obvious is that process also creates its own problems. Every new rule, gate, or ceremony adds coordination overhead. Sometimes that overhead is worth it. Sometimes it quietly becomes the thing slowing you down.

The hard part isn’t deciding whether you need a new process. It’s understanding what problem you’re actually trying to solve.

Why Process Exists (and Why It Backfires)

At its core, process exists to solve coordination problems.

When there’s too little process, you see the same patterns repeat:

Teams duplicate work.
Ownership is unclear.
Quality varies wildly.
Decisions get revisited because there’s no shared understanding.
Production systems break in ways no one anticipated.

In small teams, this can be survivable. People talk. They improvise. They find and fix problems quickly. But as the organization grows, informal coordination stops scaling. You can’t rely on hallway conversations once dozens of teams are working in parallel.

That’s where process starts to help.

The failure mode comes when a process is copied wholesale from larger organizations or added reflexively in response to a single incident. What speeds up a 10,000-person company will slow a 500-person company to a crawl, and it can completely stall a 100-person startup.

A process that doesn’t match your size, culture, or problems doesn’t just fail to help; it actually harms. It actively makes things worse.

The Real Question Leaders Should Ask

The question isn’t “Do we need more process?”

The real question is: What coordination problem are we trying to solve, and what is the lightest-weight way to solve it?

Leaders tend to get this wrong in two predictable ways.

Some avoid adding process entirely, trying to preserve a sense of scrappiness long after the organization has outgrown it. Things that used to work stop working, but no one wants to acknowledge that the context has changed.

Others swing too far in the opposite direction, importing big-company processes because “that’s how it’s done at scale.” The result is unnecessary gates, approvals, and bottlenecks that strip teams of autonomy without actually improving outcomes.

Both paths lead to frustration, just for different reasons.

Process Is a Choice About Autonomy

Every process decision is a decision about autonomy.

Good processes increase autonomy by clarifying boundaries, reducing surprises, and preventing rework. Bad processes reduce autonomy by adding gatekeepers, slowing decision-making, and centralizing control.

A simple test helps here: Does this process let teams move faster, or does it make them wait on someone else?

Code review is a good example. Lightweight reviews prevent production issues and enable teams to move faster with confidence. Mandatory review meetings that require everyone to wait for a calendar slot are a very different thing. Same intent. Very different outcome.

A Simple Process Evolution Cycle

Over time, I’ve found it helpful to think about process in three phases.

First: chaos signals.

You don’t add process because it feels right. You add it because you’re seeing repeated friction—across teams, not just in one place. Rework, unclear ownership, inconsistent quality, inter-team conflict, and recurring bottlenecks are all signals worth paying attention to.

Be careful here. A disagreement between two people is not a process problem. Solving a people issue with an organization-wide process is one of the fastest ways to create unnecessary drag. Similarly, repeated friction between the same two teams may be more indicative of an organizational structure issue rather than a lack of process.

Second: deliberate addition.

Add the minimum process required to solve the specific coordination problem you’ve identified. Design it to enable autonomy, not replace it. And treat it as an experiment, not a permanent fixture.

Every bit of process takes away some autonomy by design. The goal is to take away as little as possible while still solving the problem.

Feel free to learn from other companies’ processes, but ensure the components you implement are scaled to your organization’s size and the problem you are trying to address, and aligned with your culture.

Third: monitoring and pruning.

Process ages. What helped at 50 people often slows you down at 200. What worked at 200 creates friction at 500.

If you aren’t actively removing processes, you’re accumulating them. Regular retrospectives, not just within teams, but across projects and the organization, are one of the best ways to surface what processes are no longer pulling their weight.

If a process no longer serves you, remove it. Don’t keep it out of habit.

One Practical Example

One lightweight process I’ve used successfully in multiple organizations addresses a common bottleneck: cross-team changes.

Team A is working on a feature or fix that requires a change in code that Team B is responsible for. Instead of Team A filing a ticket and waiting weeks for Team B to prioritize and make the change, Team A makes the change itself. Team A notifies Team B of their intent, in case there are important gotchas they should be aware of. Team A submits a pull request, and Team B reviews it. Once merged, ownership of the new code transfers to Team B.

The result is fewer queues, faster delivery, and greater transparency in accountability, with very little added overhead. Same coordination problem. A very different solution from a backlog and a waiting line. This straightforward process helped one of my organizations avoid a new heavyweight quarterly planning process (so many meetings!), which was proposed to address increasing cross-team dependencies

How Process Goes Wrong

Most process failures fall into a few buckets:

Copying another company’s process wholesale.
Using process to compensate for poor communication or a lack of trust.
Adding controls to feel safer after an incident, without considering the cost.
Creating a process for the whole organization to address friction between two individuals or teams,
Never revisiting what you’ve added.

In the incident example, leaders sometimes add new processes in response to unexpected fragility, which leaves them fearful about what else they do not know. Process as fear-driven control kills speed and erodes trust. Guardrails like automation, testing, and monitoring usually achieve better outcomes with less friction.

Bringing People Along

People tolerate a new process when they understand the problem it solves.

Be explicit about the why. Involve teams in the design. Ask for the lightest-weight solution. And commit publicly to revisiting it. Most teams are willing to try something if they know they won’t keep living with it if it doesn’t work.

The fastest way to create resistance is to impose a decree from above without context.

The Point Isn’t Less Process

Early in my career, I thought that process was the enemy of speed. Experience taught me something more nuanced.

Process isn’t good or bad. It’s a tool. And like any tool, it has to fit the problem.

Clear ownership boundaries can increase freedom. Lightweight coordination can make teams faster. Thoughtful pruning can restore momentum that’s been quietly lost.

The goal isn’t less process. The goal is fit-for-purpose processes, deliberately added, designed to protect autonomy, and removed when they stop helping.

That’s how you scale without losing what made your team effective in the first place.

To hear an extended discussion of this topic, please listen to my podcast episode: The Right Amount of Process: Finding the Balance Between Chaos and Bureaucracy

Originally published in my newsletter at https://kevingoldsmith.substack.com/p/process-is-a-tool-not-a-virtue

Making Technology Choices That Last

Every few years, a new wave of technology arrives, and people in our industry start to declare that everything is about to change. Right now, that wave is generative AI. A few years ago, it was Web3. Before that, mobile and public cloud. Before those, the web itself. Some of these shifts turned out to be genuinely transformative. Others didn’t.

Having lived through enough of these cycles, I have learned that the excitement and fear are always the same. The hard part is never knowing in the moment which changes will last. The other hard part is that, as technology leaders, we have to make decisions about them anyway. Should we adopt this new thing? Wait it out? Ignore it? Every one of those choices comes with a cost.

AI is the current case study, but the real question is broader. How do we make technology choices at all? How do we separate what is genuinely important from what is just noise?

Curiosity with Discipline

Being a technical leader involves staying curious. You have to pay attention to what is happening in the industry. That means reading, listening, and talking to peers. You cannot afford to ignore change because sometimes it is not a fad at all. Sometimes it is the thing that allows the next generation of companies to surpass your company.

Hopefully, you don’t see this curiosity as a burden. You chose a career in technology, after all!

But curiosity alone is not enough. There has to be discipline behind it. I have seen teams that refactor constantly, switching frameworks every few months or adopting tools that promise faster builds or cleaner syntax. They expend a great deal of energy and end up standing in the same place. It is easy to confuse motion with progress.

The goal is not to be early. The goal is to be ready when it matters.

Knowing When to Pay Attention

When something new appears, the first question is not “should we use it,” but “should we even pay attention to it?” Most technologies take a while to prove themselves. They start with hobbyists and early adopters. Then you begin to hear about them in conference talks and blog posts. If you start to see teams you respect using it, that is a signal to look more closely.

There are exceptions. Sometimes, a new tool or framework solves a problem that nothing else does. If it unlocks something meaningful for your business or team, consider taking the risk of early adoption. But most of the time, patience pays off. You want to see that a technology has a community around it, good documentation, and some stability. You want to know that if the creator moves on, the thing won’t go away.

I have seen open-source projects from large companies become popular overnight, only to discover that they were designed to address those companies’ unique constraints. They required more effort to maintain than smaller teams could afford. The code was free, but there was a hidden cost in maintenance. Before adopting the new tool, ensure you are familiar with its operation and can use it efficiently.

Use the Curiosity Around You

You don’t have to do all the exploration yourself. The best engineering teams already have people who watch the industry and bring new ideas. You can turn that energy into a system.

Give people permission to explore. Encourage them to read, share, and experiment safely. Create a channel or regular forum where developers can showcase their work, whether it is a side project or a prototype. It shows them that curiosity is valued, not something to hide until after work hours.

You can tell a lot about your team’s culture by how it reacts to experimentation. If people are afraid to try things, they stop learning. If you reward them for exploring ideas responsibly, you build a team that continues to grow even when the company is stable.

Let the eager ones go first, but set boundaries. They should know what kinds of experiments are helpful in the business. Guide their curiosity. The best developers don’t need permission to learn, but they do appreciate direction.

Evaluating What You Find

When something looks promising, that is when leadership really matters. You have to evaluate it in a way that balances technical interest with business sense.

Start with questions:

What problem does this solve for us right now?
Is it mature enough for the scale and reliability we need?
What are the maintenance implications?
Who else is using it successfully, and what can we learn from them?

Build a small proof of concept. Keep it outside production at first. Compare it with what you already have. Measure performance, stability, and effort. Run it in parallel with your current solution so you can see real data before committing (shadow testing).

This part takes patience. It is easy to fall in love with a clean new abstraction or a faster benchmark. But the actual cost of new technology is rarely visible at the start. The hardest lesson I have learned is that the adoption work is always more expensive than it looks. The second-hardest is that we almost always underestimate the maintenance cost.

If possible, discuss with peers at similar companies. Ask them what happened when they tried it. Did it actually deliver value? Did it hold up? You can save months of effort just by learning from other teams’ experiments.

Managing the People Side

Every new technology comes with human change as well. Some team members will be excited. Others will resist it. Both reactions are normal.

The eager developers will show up ready to rebuild everything in the new tool. The skeptics will say there’s no need to change what already works. Both have a point.

Your job is to balance them. Channel the energy of the early adopters toward solving real problems. Give them structure and goals. At the same time, acknowledge the fear of the skeptics. For many engineers, expertise is tied to identity. When you replace a system they know deeply, it can feel like erasing years of experience.

Pair the enthusiasts with the skeptics. Let them learn from each other. Measure the results and share them openly. When the experiment works, celebrate the outcome. When it doesn’t, celebrate the learning. What matters is not whether every experiment succeeds, but whether the organization learns at a faster rate.

You cannot let your early adopters run wild, but you also cannot let the cautious ones stop progress. Most of the time, the right path sits somewhere between.

How to Adopt Deliberately

When you decide to move forward with something new, make it an experiment, not a crusade. Define a hypothesis about what you expect to gain. You may be looking for better scalability, lower costs, or a faster development cycle. Write that down. Then define how you will measure it.

Expect the transition to follow a change curve. Things will feel worse before they get better. People will struggle with the new approach and question why you made the switch. That is normal. Keep communicating what you are seeing and why you are continuing. When the metrics begin to show improvement, make that visible.

If it doesn’t deliver the results you hoped for, be willing to stop. End the experiment, share what you learned, and move on. That kind of clarity builds trust.

When it does work, document it. Recognize the people who led the effort. Build on that experience the next time. Adoption is not just a technical process. It is a cultural one.

Building a System for Technology Choices

The most successful organizations I have worked in treat technology decisions as part of their system, not a one-time event. They create space for discovery, experimentation, and evaluation before anything reaches production. They involve the team in technical direction, not just management.

If you are a CTO, you own the technology strategy; however, the best strategies often emerge from shared ownership. The people doing the work often see opportunities or risks before you do. Involve them. Encourage them to propose ideas, test them safely, and share what they learn.

After every adoption or rejection, hold a retrospective. Talk about what worked, what didn’t, and what to do differently next time. That habit turns random trial and error into institutional learning.

You can build a culture that views new technology not as a threat or distraction, but as an integral part of how the team evolves. Curiosity and discernment can coexist.

Closing Thought

Technology waves will keep coming. Some will reshape how we work; most will not. The challenge of leadership is recognizing the difference.

Good technology choices rarely come from being first. They stem from being thoughtful, from understanding your context, and from fostering a culture that learns quickly and acts deliberately.

The best technology decisions are not about what is new. They are about what is right for you, right now.

Originally published on my newsletter on November 02, 2025

Values -> Culture -> Everything

A practical guide to recognizing, protecting, and finding the right organizational culture

In this post, I’m revisiting a keynote I gave in 2013 at the BBC Developer Conference, titled “Building a Strong Engineering Culture.” Twelve years later, my thinking has both stayed the same and evolved. I want to discuss culture, not just for engineering teams, but for you as an individual, as a manager, and as someone seeking your next role.

What Actually Is Culture?

Culture’s a heavy word. We use it in many contexts, including company culture, engineering culture, and team culture. But what does it really mean?

Henrik Kniberg, whom I had the immense pleasure of working with at Spotify, distilled it beautifully:

“Culture is the stuff people do without noticing it.”

My definition is more formal: Culture is the manifestation of the shared values of the organization as represented by the actions of its members.

The key words here are values and actions. It’s not what you say, it’s what you do. And to Henrik’s point, it’s what you do without even thinking about it.

Real Values vs. Aspirational Values

Every company I’ve worked at has had its values spelled out somewhere. The problem? Many companies have publicly stated values that aren’t really their values. These are what I call ‘aspirational values’ – they are the values the company wishes to have or believes it should have, but they may not necessarily reflect the actual behaviors and beliefs of the employees.

How do you know what your current team or company’s actual culture is? Scott Berkun created a great test:

“Can an employee say no to a decision from a superior on the grounds that it violates a core value?”

If your company has a core value of honesty to customers, and your boss tells you to lie to a customer, could you say, “I’m sorry, I can’t do that. That doesn’t align with our core values”? And would your boss fire you, give you a poor review, or say, “I don’t care, do it anyway”? If so, it’s not actually a core value.

Think about what the real values of your company are. Not the stated values, the real ones. In companies where the stated and real values matched, those values would become shorthand. Someone would say “That wouldn’t be aligned to [value]” in a meeting, and it would just end the discussion. No disagreement, no argument.

Another way to think about this: if your team shares an office, where’s the thermostat set? Has everyone determined the temperature that they agree on? A new person enters, tries to change it, and somebody says, ‘Whoa, no. This is the temperature we work at.’ That’s a shared value in action. It’s not an important one, but it’s the same principle for how you approach coding or organize your work. This is a simple example of how shared values can manifest in everyday actions, from the way you dress to the way you communicate.

The Flow: Values -> Culture -> Everything Else

Your values create the basis of the culture. The culture then influences everything else:

Processes: How work gets done, or how you account for vacation time, and expenses.
Artifacts: Physical things like signs, swag, and how offices are decorated. At Microsoft, receiving the “Ship It” award upon shipping your product was a significant achievement. It reinforced the core value of delivering value to customers.
Rituals: Company and team meetings, how you celebrate, or how you bring teams together.
Beliefs: What you believe about the industry, about product development, and building successful companies.

Why Culture Matters

You’ve probably heard the quote attributed to Peter Drucker: “Culture eats strategy for breakfast.” I agree with the general sentiment, but I’ve seen plenty of companies with great cultures that struggled as businesses, as well as successful companies with punishing cultures. Patty McCord, former head of HR at Netflix, said it better: “Culture enables success, but it does not cause success.” A great culture helps you go faster, happier, and healthier. While an amazing culture alone won’t guarantee success, it will make things a lot harder without it.

Protecting and Reinforcing Your Culture

If you have a good culture, you need to protect it. Your culture and values must inform every process and framework that guides the company’s operations. Otherwise, you’re creating conflict within the organization between what you say and what you do. If you reward something other than the culture, the culture will shift to the behaviors and values that you reward.

Start With Your Career Ladder

I get particularly frustrated when companies adopt another company’s career ladder. You’re two different companies with two different cultures! Hopefully, that other company designed its ladder to support its values. If your values aren’t aligned with theirs (they aren’t), you’ll start promoting, hiring, and rewarding based on another company’s values.

Build your own career ladder based on your values. It’s a tremendous first artifact because it informs everything else, from how many levels you have (which affects promotion frequency) to the expectations at each level.

Hiring Is Critical

Who you hire either supports or hurts the culture you have. I don’t suggest requiring “perfect” alignment or becoming a monoculture; you need diversity of thought. However, if you have a core value of collaboration and you hire someone who prefers to work alone, dislikes collaboration, and produces good work independently, you have problems.

One: they’ll be unhappy because people keep wanting to collaborate. Two: if they stay and receive raises or promotions, it sends a strong signal that collaboration isn’t actually a core value. Three: if they end up in interview loops, they will be looking for people like them due to similarity bias, which will accentuate the problem.

Onboarding Matters

You can’t simply throw new hires into a team and expect them to pick up the culture. At Spotify, when someone joined, they would spend a sprint with all the people who joined on the same day, plus a few experienced Spotify folks: an agile coach, a development lead, and a product manager. They would build features together and ship them. They would learn why the company did things in a specific way.

Because those new joiners would end up in different parts of the organization, they’d reinforce and refresh that cultural understanding. If a team started to drift, they’d help steer it back.

This intro sprint was also an excellent opportunity to identify if someone wasn’t aligned with the company values. Better to know in the first sprint than months later.

Performance Reviews and Firing

When deciding an employee’s performance as a manager, your reference isn’t other employees; it’s the rubric, the career ladder. What are they supposed to be doing? What is the expectation at this career step? You’re comparing them against the rubric because your culture and values inform the rubric.

If you don’t use that as the yardstick, you start promoting or giving raises based on something else. People notice. I’ve heard “My friend in another team got promoted and they’re way worse than me, so why am I not getting promoted?” more times than I can count. That usually means management is not being consistent.

Can you make hiring mistakes? Absolutely. When it becomes apparent that someone isn’t aligned, even if they’re fun to be around and do adequate work, they will erode your culture. You have to make a decision. It’s better to move them along where, honestly, they’ll be happier. If you aren’t aligned with your company’s culture, it’s not a happy place for you.

Team Culture vs. Company Culture

I used to think Microsoft had a broken culture. I spent eight years there. I was happy for about one or two of those. But the problem was that I wasn’t well aligned with Microsoft’s culture. Nothing was wrong with Microsoft; I wasn’t a good fit for the company culture.

I tried to make my team work the way I wanted the company to work. I convinced my management to let me build a team and use extreme programming to deliver a feature. The project was incredibly successful. When I went back and said, “Look, it worked, can we do more?” my boss said, “You’re absolutely right, it worked better than we expected. However, no, we’re not going to do that anymore because that’s not the way the company works.”

He was right. That wasn’t who Microsoft was. I wanted to turn the company into the place I wanted to be, but that wasn’t what the company wanted.

If you’re hoping your team can change the dominant culture, good luck. It’s unlikely, especially in larger organizations. It isn’t impossible, but it’s very, very difficult.

I had a different experience at Adobe. Adobe was open to change, not fixed in its mindset. When I worked on a lean, startup-like project there and succeeded, Adobe rewarded that success rather than saying, “Wow, that was great, we’re never doing it again.” Adobe had a core value (stated or not) of being open to change. That was much in line with who I am, and I was very happy there.

When Culture Shifts

Culture isn’t fixed. It will evolve with the company and its employees, sometimes slowly, sometimes quickly.

Slow, Organic Change

Organic change happens naturally. Companies grow, expand into new markets, and hire new employees. Over time, the culture will change to incorporate the new shared values and new processes required to support the larger entity. That’s okay if you’ve been careful in your hiring to ensure that new hires align with the company’s core values.

If the company’s core values are genuine, they will endure as the company doubles in size, becomes public, or changes its funding models. The culture may shift slightly, but what makes the company the company — the culture — stays consistent.

Fast, Disruptive Change

Fast cultural change occurs when a company is acquired or when the board brings in a new CEO. An acquiring company may have no interest in your culture; they’re buying you for financial or business reasons. Your processes will change to align with them, you’ll inherit their career ladder (and thus their values).

Or a new CEO comes in and says, “This is crazy, we can’t run a company this way,” and starts making changes based on their values. This change is often a deliberate choice from the board to “shake up” a company, or because the board is unaligned with the company’s values.

You often see this in startups. They hire a C-level person from Meta or Amazon, and that person starts implementing things from their prior company because that’s what they know works. An Amazon person says, “We need six-page memos for all meetings because that worked well at Amazon.” It does work at Amazon. Will it always work at your company? No.

You can hire talented individuals from these companies; they have great people there. You need those who are open-minded, who understand their experience was for that environment, and who ask, “How do I take what I learned and apply it in this new context?”

When rapid cultural change occurs, you’re either going to be pleased about it or not. If you’re unhappy, complaining or fighting the change is not a good use of your time. If you’re open-minded and don’t actively hate it, try it out. You might learn new skills or approaches. But if you’re unhappy and unaligned, why are you staying?

There’s a quote from Shanley Kane: “Broken cultures break people.”

Finding a Culturally Aligned Company

If you’re looking for a new job, it’s essential to determine if a company aligns with your values. First, you have to know your own values. What’s important to you? What are your must-haves versus nice-to-haves?

Then develop questions you can ask your interviewers. Ask about what happens when the company is under pressure. What happens when revenue is short for a quarter? What happens when a product is late? If you’re in B2B, what happens when a customer is about to churn? When a company or its leadership is under pressure, that’s when its true values are revealed.

I once joined a company where the CEO told me great stories about fixing broken cultures and the great culture he wanted to build. The team was terrific and shared many of my values, as well as the stated values of the CEO. But when the startup hit a rough patch, we started violating those stated values one by one. When I said, “I’m not going to do what you’re asking because that’s not who we say we are,” and the response was, “I need you to do it anyway,” I knew it wasn’t the right place for me after all.

So, ask for examples of stressful situations and how they affected the company’s operations and processes. If the person you are speaking to can provide concrete examples of how they reacted under stress in ways that align with their stated values, that’s a sign that they are genuine core values.

Final Thoughts

I loved working at Spotify. It’s the best job I’ve ever had. Why? The stated values were the actual values. They weren’t aspirational. That’s one reason I took the risk of moving my family to another continent to work there.

My experience with Adobe was very similar. The core values and culture of both companies were incredibly aligned with my personal values. Not only did that make me a happy employee, but it also made me a successful one, as I naturally worked in a way that was aligned with my peers and management.

Am I happy at my current company? Yes, I am. Because my values align closely with the company’s actual values. Will I be happy forever? It will depend on how the culture evolves.

I hope your company’s values align with yours. If they aren’t, I hope you’re in a position to influence the culture or find a place that’s better aligned with your values. It’s worth paying attention to. It’s worth being aware of. It’s essential to consider this, especially if you’re aiming for professional growth. If the company isn’t aligned with your values, you might learn the wrong things.

To hear an extended discussion of this topic, please listen to my most recent podcast episode: https://itdependspod.com/episodes/values-culture-everything-why-company-culture-actually-matters/

Originally published at https://kevingoldsmith.substack.com/p/values-culture-everything

Building a vernacular with your engineering team

Teams consist of people. People communicate via a common language. The base unit of most languages is words.

The impact of language

Whether written or spoken, words are essential â€“ both the general terms we use and those specific to our work.

The terms and phrases that are specific to our jobs or our companies create a vernacular. The definition of vernacular is â€˜the mode of expression of a group or class.â€™ Our vernacular separates software developers from lawyers, Amazon employees from Microsoft employees, and your team from the other teams in your company.

The words and phrases that we use in team discussions give us a shorthand. They save time. Instead of saying, â€˜Ok, deploy this to production, let the support team know that it is going live and then let marketing know once it has gone to 50% of active users.â€™ Your team may just say, â€˜letâ€™s deploy-ify it.â€™ The larger context is defined and understood in the vernacular of your team.

One of the challenges of joining a new company or a new team is learning the vernacular. One of the significant struggles of team forming or cross-team communication is different definitions for the same words.

Consider the word â€˜Agileâ€™. To you, it may mean â€˜Scrumâ€™ because your only experience working in Agile teams was working with the Scrum framework. For me, it may mean Kanban or a set of principles not tied to any specific framework. If we are on the same team and I say that I think we should work in an Agile way, we could have very different interpretations of what that means, which may inadvertently create tension in the team.

â€˜Doneâ€™ is another word that often leads to problems â€“ both for a development lead and between teams. A developer on your team says that their feature is â€˜doneâ€™. Do they mean that they finished the code? That they tested the code? That they deployed it to the staging environment? That the code is running in production? That the A/B tests for the code have completed?

Having clarity on the meaning of words is critical. Companies will often create glossaries of the terms and phrases in everyday use to help onboard new employees. You should do the same for your team for the words and phrases your team uses day-to-day. As a leader, you should also deliberately cultivate your team vernacular.

Creating a team vernacular

Creating a team vernacular is a simple way to drive team unity, identity, and alignment around best practices.

A simple way to start building a team vernacular is to use a group meeting to identify and define the words and phrases used in the team. You can get the discussion started by spending a few weeks taking note of words or phrases that come up often in team discussions. Terms such as done, tested, shipped, agile, stuck, autonomy, microservice, or waiting, may have different definitions from different people on a team.

Ask the team what they think each of these words means. If there is a general agreement, add it to your team glossary. Donâ€™t stress over creating a perfect definition for each word. You can reference a dictionary definition or definitive blog post if you want, but your goal is team consensus around the meaning, nothing more.

Once the team establishes the primary vernacular, update it as necessary. Clarify the definitions of the new terms introduced. If someone uses a word in a new way, ask, â€˜What does that word mean to you?â€™ If you donâ€™t recognize a term that others are using, ask the team for the definition. Add these new words and phrases to your glossary. Over time, the meanings of words will change as they understand new subtleties or gain new skills. When this happens, append or replace the existing definitions.

Using your vernacular to train the team

Creating consistency in the words you use, or introducing new words, is also a valuable way to train your team or introduce new concepts.

You may find that there is debate within the team about the constraints for a system to be called a â€˜microserviceâ€™. This debate is an opportunity to find blog posts or a book for the team to read together and discuss to create the definition for the team glossary.

If you want to understand secure coding practices better, you could watch a conference talk as a team and then discuss what words and techniques you could introduce into your vernacular.

As you build your glossary, include the phrase and what it means to your team and the references your team used to arrive at that meaning. Your dictionary can then become an onboarding tool, a training tool, and a reference to share with other groups.

Vernaculars happen

Groups of friends, co-workers, teams, and families all create unique vernaculars over time. The in-jokes you have with friends, the shorthand you have with your partner, the CEOâ€™s catchphrase. Be aware of this, be deliberate about it within your team, and use this naturally occurring phenomenon to your advantage!

[posted originally at https://leaddev.com/communication-relationships/building-vernacular-your-engineering-team]

Fail Safe, Fail Smart, Succeed! Part Five: Putting it into Practice

Fail Safe, Fail Smart, Succeed!

Putting this into practice at Avvo

If you think you would like to use these ideas at your company, but you are unsure where to start, I can describe what we did at Avvo. I joined when the company was already nine years old. It had a mostly monolithic architecture running in a single data center with minimal redundancy.

There were some things that we did quickly to move to a more fail-safe world.

Moving from planning around objectives to planning around priorities

First, we worked to build a supportive culture that could handle the inevitable failures better. We moved from planning around specific deliverable commitments to organizing our work around priorities.

Suppose specific achievements, my output, measure my performance. This way of measuring performance often creates problems.

Suppose I need to coordinate with another person, and their commitments do not align to mine. That situation will create tension. If the company’s needs change, but my obligations do not, there is little incentive to reorient my work. To achieve my commitments, I can be thwarted by dependencies or hamper the priorities of the company.

People in leadership like quarterly goals or Managing By Objectives because they create strict accountability. If I commit to doing something and it is not complete when I say it will be, I have failed.

Suppose you think instead about aligning around priorities. In that case, those priorities may change from time to time. Still, if everyone is working against the same set of priorities, you can be sure that they are broadly doing the right things for the company. Aligning to priorities sets an expectation of outcome, not output.

Talk about failure with an eye to future improvement instead of blame

The senior leadership team must be aligned with these approaches. The rest of the organization may not be initially. When leaders talk about failure, they must do it with a learning message rather than blame or punishment. People should know that the expectation is that they may fail. If they are avoiding failure, then they probably aren’t thinking big enough. It is a message that “we want to see you fail, small, and we want to make sure we learn from that failure.”

I created our slack channel to share the lessons from our failures. I sent a message to my organization, making it clear that I don’t expect perfection. I shared my vision that we become a learning organization in town halls and one-on-ones.

Fail-safe architecture

Monoliths are natural when building a new company or when you have a small team. Monoliths are simple to make and more straightforward to deploy when you don’t have multiple teams building together. As the codebase and organization grow, microservices become a better model.

It is critical to recognize the point where a monolith is becoming a challenge instead of an enabler. Microservices require a lot more infrastructure to support them. The effort to transition from one architecture to another is significant, so it is best to prepare before the need becomes urgent.

Avvo had already started moving to a microservices architecture, but lack of investment stalled the transition. I increased investment in the infrastructure team. The team built tools that simplified the effort of creating, testing, monitoring, and deploying services. We then made rapid progress.

We also redesigned our organization to leverage the reverse Conway Maneuver, further accelerating the new architecture.

You can build a fail-safe / fail-smart team

In every company, I use the lessons that I have shared in this article to build a culture where teams can innovate and learn from their users. It manifests differently with each group, but every team that has adopted these ideas has improved both business outcomes and employee satisfaction. Work with your peers to adopt some of these ideas. Start small and grow. The process of adopting these concepts mirrors the product development process you are working to build.

If you decide that it isn’t a good fit for your company, you will have failed smart by failing small.

I will leave you with a final thought from Henry Ford.

Fail Safe, Fail Smart, Succeed!

Fail Safe, Fail Smart, Succeed! Part Four: My Biggest Failure

Fail Safe, Fail Smart, Succeed!

My Biggest Failure

If you are a long-time Spotify user, you probably won’t recognize the interface shown in the photo below. In May of 2015, though, Spotify was very interested in telling the whole world about it. It was a new set of features in the product called “Spotify Now.”

I lead the engineering effort at Spotify on the Spotify Now set of features. It was the most extensive concerted effort that Spotify had done at the time, involving hundreds of employees across the world.

Spotify Now was a set of features built around bringing the right music for you at any moment in time. The perfect, personalized music for every user for every moment of the day. This effort included adding video, podcasts, the Running feature, a massive collection of new editorial and machine learning generated playlists, and a brand new, simplified user interface for accessing music. It was audacious for a reason. We knew that Apple would launch its Apple Music streaming product soon. We wanted to make a public statement that we were the most innovative platform. Our goal was to take the wind out of Apple’s sails (and sales!)

Given that this was Spotify and many of the things I’ve shared come from Spotify, we understood how to fail smart.

As we launched the project, I reviewed the project retrospective repository. I wanted to see what had and had not worked in large projects before. I was now prepared to make all new mistakes instead of repeating ones from the past.

We had a tight timeline, but some of the features were already in development. I felt confident. However, as we moved forward and the new features started to take shape in the product’s employee releases, there was a growing concern. We worried the new features weren’t going to be as compelling as the vision we had for them. We knew that we, as employees, were not the target users for the features. We were not representative of our users. To truly understand how the functionality would perform, we wanted to follow our product development methods and get the features in front of users to validate our hypotheses.

Publicly releasing the features to a narrow audience was a challenge at that time. The press, also aware of Apple’s impending launch, was watching every Spotify release exceptionally closely. They knew that we tested features, and they were looking for hints of what we would do to counter Apple.

Our marketing team wanted a big launch. This release was a statement, so we wanted a massive spike in Spotify’s coverage extolling our innovation. The press response would be muted if our features leaked in advance of the event.

There was pressure from marketing not to test the features and pressure from product engineering to follow our standard processes. Eventually, we found a compromise. We released early versions of the Spotify Now features to a relatively small cohort of New Zealand users. Satisfied that we were now testing these features in the market, we went back to building Spotify Now and preparing for the launch while waiting for the test results to come back.

After a few weeks, we got fantastic news. For our cohort, retention was 6% higher than the rest of our customer base.

For a subscription-based product like Spotify, customer retention is the most critical metric. It determines the Lifetime Value of the customer. The longer you stay using a subscription product, the more money the company will make from you.

With a company of the scale of Spotify, it was tough to move a core metric like retention significantly. A whole point move was rare and something to celebrate. With Spotify Now, we had a 6% increase! It was massive.

Now, all of our doubt was gone. We knew we were working on something exceptional. We’d validated it in the market! With real people!

On the launch day, Daniel Ek, Spotify’s CEO and founder, Gustav SÃ¶derstrom, the Chief Product Officer, and Rochelle King, the head of Spotify’s design organization, shared a stage in New York with famous musicians and television personalities. They walked through everything we had built. It was a lovely event. I shared a stage in the company’s headquarters in Stockholm with Shiva Rajaraman and Dan Sormaz, my product and design peers. We watched the event with our team, celebrating.

As soon as the event concluded, we started the rollout of the new features by releasing them to 1% of our customers in our four largest markets. We’d begun our Ship It phase! We drank champagne and ate prinsesstÃ¥rta.

I couldn’t wait to see how the features were doing in the market. After so much work, I wanted to start the progressive roll out to 100%. Daily, I would stop by the desk of the data scientist who was watching the numbers. For the first couple of days, he would send me away with a comment of “it is too early still. We’re not even close to statistical significance.” Then one day, instead, he said, “It is still too early to be sure, but we’re starting to see the trend take shape, and it doesn’t look like it will be as high as we’d hoped.” Every day after, his expression became dourer. Finally, it was official. Instead of the 6% increase we’d seen in testing, the new features produced a 1% decrease in retention. It was a seven-point difference between what we had tested and what we had launched.

Not only were our new features not enticing customers to stay longer on our platform, but we were driving them away! To say that this was a problem was an understatement. It was a colossal failure.

Now we had a big quandary. We had failed big instead of small. We had released several things together, so it was challenging to narrow down the problem. Additionally, we’d just had a major press event where we talked about all these features. There was coverage all over the internet. The world was now waiting for what we had promised, but we would lose customers if we rolled them out further.

Those results began one of the most challenging summers of our lives. We had to narrow down what was killing our retention in these new features. We started generating hypotheses and running tests within our cohort to find what had gone wrong.

The challenge was that the cohort was too small to run tests quickly (and it was shrinking every day as we lost customers). Eventually, we had to do the math to figure out how much money the company would lose if we expanded the cohort so our tests would run faster. The cost was determined to be justified, and so we grew the cohort to 5% of users in our top four markets.

Gradually, we figured out what in Spotify Now was causing users to quit the product. We removed those features and were able to roll out to the rest of the world with a more modest retention gain.

In the many retrospectives that followed to understand what mistakes we’d made (and what we had done correctly), we found failures in our perceptions of our customers, failures in our teams, and other areas.

It turns out that one of our biggest problems was a process failure. We had a bug in our A/B testing framework. That bug meant that we had accidentally rolled out our test to a cohort participating in a very different trial. A trial to establish a floor on what having no advertising in the free product would do for retention.

To Spotify’s immense credit, rather than punish me, my peers, and the team, instead, we were rewarded for how we handled the failure. The lessons we learned from the mistakes of Spotify Now were immensely beneficial to the company. Those lessons produced some of the company’s triumphs in the years that have followed, including Spotify’s most popular curated playlists, Discover Weekly, Release Radar, Daily Mixes, and podcasts.

Part Five: Putting it into Practice

Fail Safe, Fail Smart, Succeed! Part Three: Making Failure Safer

Fail Safe, Fail Smart, Succeed!

Making Failure Safer

How do we reduce the fuel-air bomb failure into an internal combustion failure? How can we fail safely?

Minimizing the cost of failure

If you fail quickly, you are reducing the cost in time, equipment, and expenses. At Spotify, we had a framework, rooted in Lean Startup, that we used to reduce the cost of our failures. We named the framework “Think it, Build it, Ship it, Tweak it.“

This graph shows investment into a feature over time through the different phases of the framework. Investment here signifies people’s time, material costs, equipment, opportunity cost, whichever.

Think It

Imagine this scenario: you are coming back from lunch with some people you work with, and you have an idea for a new feature. You discuss it with your product owner, and they like the idea. You decide to explore if it would be a useful feature for the product. You have now entered the “Think It” phase. During this phase, you may work with the Product Owner and potentially a designer. This phase represents a part-time effort by a small subset of the teamâ€“a small investment.

You might create some paper prototypes to test out the idea with the team and with customers. You may develop some lightweight code prototypes. You may even ship a very early version of the feature to some users. The goal is to test as quickly and cheaply as possible and gather some real data on the feature’s viability.

You build a hypothesis on how the feature can positively impact the product, tied to real product metrics. This hypothesis is what you will validate against at each stage of the framework.

If the early data shows that the feature isn’t needed or wanted by customers, your hypothesis is incorrect. You have two choices. You may iterate and try a different permutation of the concept, staying in the Think It phase and keeping the investment low. You may decide that it wasn’t as good an idea as you hoped and end the effort before investing further.

If you decide to end during the Think It phase, congratulations! You’ve saved the company time and money building something that wasn’t necessary. Collect the lessons in a retrospective and share them so that everyone else can learn.

Build It

The initial tests look promising. The hypothesis isn’t validated, but the indicators warrant further investment. You have some direction from your tests for the first version of the feature.

Now is the time to build the feature for real. The investment increases substantially as the rest of the team gets involved.

How can you reduce the cost of failure in the Build It phase? You don’t build the fully realized conception of the feature. You develop the smallest version that will validate your initial hypothesis, the MVP. Your goal is validation with the broader customer set.

The Build It phase is where many companies I speak to get stuck. If you have the complete product vision in your head, finding the minimal representation seems like a weak concept. Folks in love with their ideas have a hard time finding the core element that validates the whole. Suppose the initial data that comes back for the MVP puts the hypothesis into question. In that case, it is easier to question the validity of the MVP than to examine the hypothesis’s validity. This issue of MVP is usually the most significant source of contention in the process.

It takes practice to figure out how to formulate a good MVP, but the effort is worth it. Imagine if the Clippy team had been able to ship an MVP. Better early feedback could have saved many person-years and millions of dollars. In my career, I have spent years (literally) building a product without shipping it. Our team’s leadership shifted product directions several times without ever validating or invalidating any of their hypotheses in the market. We learned nothing about the product opportunity, but the development team learned a lot about refactoring and building modular code.

Even during the Build It phase, there are opportunities to test the hypothesis: early internal releases, beta tests, user tests, and limited A/B tests can all be used to provide direction and information.

Ship It

Your MVP is ready to release to your customers! The validation with the limited release pools and the user testing shows that your hypothesis may be validâ€“time to ship.

In many, if not most, companies shipping a software release is still a binary thing. No users have it, and now all users have it. This approach robs you of an opportunity to fail cheaply! Your testing in Think It and Build It may have shown validation for your hypothesis. It may have also provided incorrect information, or you may have misinterpreted it. On the technical side, whatever you have done to this point will not have validated that your software performs correctly at scale.

Instead of shipping instantly to one hundred percent of your users, do a progressive rollout. At Spotify, we had the benefit of a fairly massive scale. This scale allowed us to ship to 1%, 5%, 10%, 25%, 50%, and then 99% of our users (we usually held back 1% of our users as a control group for some time). We could do this rollout relatively quickly while maintaining statistical significance due to our size.

If you have a smaller user base, you can still do this with fewer steps and get much of the value.

At each stage of the rollout, we’d use the product analytics to see if we were validating our assumptions. Remember that we always tied the hypothesis back to product metrics. We’d also watch our systems to make sure that they were handling the load appropriately and didn’t have any other technical issues or bugs arising.

If the analytics showed that we weren’t improving the product, we had two decisions again. Should we iterate and try different permutations of the idea, or should we stop and remove the feature?

Usually, if we reached this point, we would iterate, keeping to the same percentage of users. If this feature MVP wasn’t adding to the product, it took away from it, so rolling out further would be a bad idea. This rollout process was another way to reduce the cost of failure. It reduced the percentage of users seeing a change that may negatively affect product metrics. Sometimes, iterating and testing with a subset of users would give us the necessary direction to move forward with a better version of the MVP. Occasionally, we would realize that the hypothesis was invalid. We would then remove the feature (which is just as hard to do as you imagine, but it was more comfortable with the data validating the decision).

If we removed the feature during the Ship It phase, we would have wasted time and money. We still would have wasted a lot less than if we’d released a lousy feature to our entire customer base.

Tweak It

The shaded area under this graph shows the investment to get a feature to customers. You earn nothing against the investment until the feature’s release to all your customers. Until that point, you are just spending. The Think It/Ship It/Build It/Tweak It framework aims to reduce that shaded area; to reduce the amount of investment before you start seeing a return.

You have now released the MVP for the feature to all your customers. The product metrics validate the hypothesis that it is improving the product. You are now ready for the next and final phase, Tweak It.

The MVP does not realize the full product vision, and the metrics may be positive but not to the level of your hypothesis. There is a lot more opportunity here!

The result of the Ship It phase represents a new baseline for the product and the feature. The real-world usage data, customer support, reviews, forums, and user research can now inform your next steps.

The Tweak It phase represents a series of smaller Think It/Build It/Ship It/Tweak It efforts. From now, your team iteratively improves the shipped version of the feature and establishes new, better baselines. These efforts will involve less and less of the team over time, and the investment will decrease correspondingly.

When iterating, occasionally, you reach a local maximum. Your tweaks will result in smaller and smaller improvements to the product. Once again, you have two choices: move on to the next feature or look for another substantial opportunity with the current feature.

The difficulty is recognizing that there may be a much bigger opportunity nearby. When you reach this decision point, it can be beneficial to try a big experiment. You may also choose to take a step back and look for an opportunity that might be orthogonal to the original vision but could provide a significant improvement.

You notice in the graph that the investment never reaches zero. This gap reveals the secret, hidden, fifth step of the framework.

Maintain It

Even if there is no active development on a feature, it doesn’t mean that there isn’t any investment into it. The feature still takes up space in the product. It consumes valuable real estate in the UI. Its code makes adding other features harder. Library or system updates break it. Users find bugs. Writers have to maintain documentation about the functionality.

The investment cost means that it is critical not to add features to a product that do not demonstrably improve it. There is no such thing as a zero-cost feature. Suppose new functionality adds nothing to the product in terms of incremental value to users. In that case, the company must invest in maintaining it. Features that bring slight improvements to core metrics may not be worth preserving, given the additional complexity they add.

Expect failure all the time

When you talk about failure in the context of software development from the year 2000 to now, there is a substantial difference. Back then, you worked hard to write robust software, but the hardware was expected to be reasonably reliable. When there was a hardware failure, the software’s fault tolerance was of incidental importance. You didn’t want to cause errors yourself, but if the platform was unstable, there wasn’t much you were expected to do about it.

Today we live in a world with public clouds and mobile platforms where the environment is entirely beyond our control. AWS taught us a lot about how to handle failure in systems. This blog post from Netflix about their move to AWS was pivotal to the industryâ€™s adapting to the new world.

Netflixâ€™s approach to system design has been so beneficial to the industry. We assume that everything can be on fire all the time. You could write perfect software, and the scheduler is going to come and kill it on mobile. AWS will kill your process, and your service will be moved from one pod to another with no warning. We now write our software expecting failure to happen at any time.

We’ve learned that writing big systems makes handling failure complicated, so micro-service architectures have become more prevalent. Why? Because they are significantly more fault-tolerant, and when they fail, they fail small. Products like Amazon, Netflix, or Spotify all have large numbers of services running. A customer doesn’t notice if one or more instances of the services fail. When a service fails in those environments, the service is responsible for a small part of the experience; the other systems assume that it can fail. There are things like caching to compensate for a system disappearing.

Netflix has its famous chaos monkey testing, which randomly kills services or even entire availability zones. These tests make sure that their systems fail well.

Having an architecture composed of smaller services that are assumed to fail means that there is near zero user impact when there is a problem. Failing well is critical for these services and their user experience.

Smaller services also make it possible to use progressive rollout, feature flags, dark loading, blue-green deploys, and canary instances, making it easier to build in a fail-safe way.

Part Four: My Biggest Failure

Fail Safe, Fail Smart, Succeed! Part Two: Building a fail-safe culture

Fail Safe, Fail Smart, Succeed!

Building a fail-safe culture

If innovation requires failure, to build an innovative product or company, how your culture handles the inevitable failures is key to creating a fail-safe environment.

Many companies still punish projects or features that do not succeed. The same companies then wonder why their employees are so risk-averse. Punishing failure can take many forms, both obvious and subtle. Punishment can mean firing the team or leader who created an unsuccessful release or project. Sanctions are often more subtle:

Moving resources away from innovative efforts that don’t yield immediate successes.
Allowing people to ridicule failed efforts.
Continuing to invest in the slow, steady, growth projects instead of the more innovative but risky efforts. Innovator’s dilemma is just the most well-known aspect of this.

Breeding innovation out

I spend several years working at a company whose leadership was constantly extorting the employees to be more innovative and take more risks. It created ever-new processes to encourage new products to come from within the organization. It was also a company that had always grown through acquisition. Every year, it would acquire new companies. At the start of the next year’s budget process, there would inevitably be the realization that the company had now grown too large. Nearly every year, there would be a layoff.

If you are a senior leader and need to trim ten percent of your organization, where would you look? In previous years, you likely had already eliminated your lowest performers. Should you reduce the funding of the products that bring in your revenue or kill the new products that are struggling to make their first profit? The answer is clear if your bonus and salary are dependent on hitting revenue targets.

Through the culture of the company, it communicated that taking risks was detrimental to a career. So the company lost its most entrepreneurial employees either through voluntary or involuntary attrition. Because it could not innovate within, innovation could only happen through acquisitions, perpetuating the cycle.

If failure is punished, and failure is necessary for innovation, then punishing failure, either overtly or subtly, means that you are dis-incentivizing innovation.

Don’t punish failure. Punish not learning from failure. Punish failing big when you could have failed small first. Better yet, don’t punish at all. Reward the failures that produce essential lessons for the company and that the team handles well. Reward risk-taking if you want to encourage innovation.

If you worry about employees taking risks without accountability, give them participation in the revenue that they bring in

Each failure allows you to learn many things. Take the time to learn those lessons

Learning from failure

It can be hard to learn the lessons from failure. When you fail, your instinct is to move on, to sweep it under the rug. You don’t want to wallow in your mistakes. However, if you move on too quickly, you miss the chance to gather all the lessons, which will lead to more failure instead of the success you’re seeking.

Lessons from failure: Your process

Sometimes the failure was in your process. The following exchange is fictional, but I’ve heard something very much like it more than once in my career.

“What happened with this release? Customers are complaining that it is incredibly buggy.”
“Well, the test team was working on a different project, so they jumped into this one late. We didn’t want to delay the release, so we cut the time for testing short and didn’t catch those issues. We had test automation, and it caught the issue, but there have been a lot of false positives, so no one was watching the results.”
“Did we do a beta test for this release? An employee release?”
“No.”

The above conversation indicates a problem with the software development process (and, for this specific example, a bit of a culture-of-quality problem). If you’ve ever had an exchange like the one above, what did you do to solve the underlying issues? If the answer is “not much,” you didn’t learn enough from the failure, and you likely had similar problems afterward.

Lessons from failure: your team

Sometimes your team is a significant factor in a failure. I don’t mean that the members of the group aren’t good at their jobs. Your team may be missing a skillset or have personality conflicts. Trust may be an issue within the team, and so people aren’t open with each other.

“The app is performing incredibly slowly. What is going on?”
“Well, we inherited this component that uses this data store, and no one on the team understands it. We’re learning it as we’re doing it, and it has become a performance problem.”

Suppose the above exchange happened in your team. In that case, you might make sure that the next time you decide to use (or inherit) a technology, you make sure that someone on the team knows it well, even if that means adding someone to the team.

Lessons from failure: your perception of your customers

A vein of failure, and a significant one in the lesson of Clippy, is having an incorrect mental model for your customer.

We all have myths about who our customers are. Why do I call them “myths”? The reason is that you can’t precisely read the minds of every one of your customers. At the beginning of a product’s life cycle, you may know each of your customers well when there are few of them. That condition, hopefully, will not last very long.

How do you build a model of your user? You do user research, talk to your customer service team, beta test, and read app reviews and tweets about your product. You read your product forums. You instrument your app and analyze user behavior.

We have many different ways of interacting with the subsets of our customers. Those interactions give us the feeling that we know what they want or who they are.

These interactions provide insights into your customers as an aggregate. They also fuel myths of who our customers are because they are a sampling of the whole. We can’t know all our customers, so we create personas in our minds or collectively for our team.

Suppose you have a great user research team, and you are very rigorous in your effort to understand your customers. You may be able to have in-depth knowledge about your users and their needs for your product. However, that knowledge and understanding will only be for a moment in time. Your product continues to evolve and change and hopefully add new users often. Your new customers come to your product because of the unique problems they can solve. Those problems are different from the existing usersâ€”your perception of your customers ages quickly. You are now building for who they were, not who they are.

Lessons from failure: your understanding of your product

You may think you understand your product; after all, you are the one who is building it! However, the product that your customers are using may be different from the product you are making.

You build your product to solve a problem. In your effort to solve that problem, you may also solve other problems for your customers that you didn’t anticipate. Your customers are delighted that they can solve this problem with your product. In their minds, this was a deliberate choice on your part.

Now you make a change that improves the original problem’s solution but breaks the unintended use case. Your customers are angry because you ruined their product!

Lessons from failure: yourself

Failure gives you a chance to learn more about yourself. Is there something you could do differently next time? Was there an external factor that is obvious in hindsight but could have been caught earlier if you approached things differently?

Our failures tend to be the hardest to dwell on. Our natural inclination is to find fault externally to console ourselves. It is worth taking some time to reflect on your performance. You will always find something that you can do that will help you the next time.

Collecting the lessons: Project Retrospectives

The best way that I have learned to extract the lessons is to do a project retrospective.

A project retrospective aims to understand what happened in the project from its inception to its conclusion. You are looking to understand each critical decision, what informed the decision, and its outcome.

In a project retrospective, you are looking for the things that went wrong, the things that went well, and the things that went well, but you could do better the next time. The output of the retrospective is neutral. It is not for establishing blame or awarding kudos. It exists to make sure you learn. For this reason, it is useful for both unsuccessful and highly successful projects.

A good practice for creating a great culture around failure is to make it the general custom to have a retrospective at the end of every project in your company. Having retrospectives only for the unsuccessful projects perpetuates a blame culture.

For an example of project retrospectives processes, see this post from Henrik Kniberg.

The project retrospective repository

Since the project retrospectives are blameless, it is good to share them within your company. Create a project retrospective repository and publicize it.

The repository becomes a precious resource for everyone in your company. It shows what has worked and what has been challenging in your environment. It allows your teams to avoid making the mistakes of the past. We always want to be making new mistakes, not old ones!

The repository is also handy for new employees to teach them about how projects work in your company. Finally, it is also a resource for documenting product decisions.

The retrospective repository is a valuable place to capture the history of your products and your process.

Spotify’s failure-safe culture

I learned a lot about creating a failure safe culture when I worked at Spotify. Some of the great examples of this culture were:

One of the squads created a “Fail Wall” to capture the things they were learning. The squad didn’t hide the wall. It was on a whiteboard facing the hallway where everyone could see it.

This document is a report from one of the project retrospectives. You don’t need any special software for the record. For us, it was just a collection of Google docs in a shared folder.

One of the agile coaches created a slack channel for teams to share the lessons learned from failures with the whole company.

Spotify’s CTO posted an article encouraging everyone to celebrate the lessons that they learned from failure. Which inspired other posts like this:

If you look at the Spotify engineering blog, there are probably more posts about mistakes that we made than cool things we did in the years I worked there (2013-2016).

These kinds of posts are also valuable to the community. Often, when you are searching for something, it is because you are having a problem. We might have had the same issue. These posts are also very public expressions of the company culture.

Failure as a competitive advantage

We’re all going to fail. If my company can fail smart and fast, learning from our mistakes; while your company ignores the lessons from failure, my company will have a competitive advantage.

Part Three: Making Failure Safer

Fail Safe, Fail Smart, Succeed! Part One: Why Focus on Failure?

This article is about failure and everything I’ve learned from 28 years of failing (and succeeding) in the technology industry. Its basis is my talk of the same name that I first gave in 2015.

I’ve broken it into five parts to make it easier to read and share:

The importance of failure in software development

How we approach failure is critical in any industry, but it is especially crucial in building software.

Why?

The answer is simple: invention requires failure.

We don’t acknowledge that fact enough as an industry. Not broadly. It is something we should recognize and understand more. As technologists, we are continually looking for ways to transform existing businesses or build new products. We are an industry that grows on innovation and invention.

Real innovation is creating something uniquely new. If you can create something genuinely novel without failing a few times along the way, it probably isn’t very innovative. Albert Einstein expressed this as “Anyone who has never made a mistake has never tried anything new.”

In his own words, Thomas Edison says that he created three thousand different theories before he found the right materials for his electric light. To invent his battery, the laboratory performed over ten thousand experiments.

Filmmaker Kevin Smith says, “failure is success training.” I like that sentiment. It frames failure as leading to success.

Failure teaches you the things you need to know to succeed. Stated more strongly: failure is a requirement for success.

Creating a fail-safe environment

To achieve success, what’s important isn’t how to avoid failure; it’s how to handle failure when it comes. The handling of failure makes the difference between eventual success and never succeeding. Creating conditions conducive to learning from failure means creating a fail-safe environment.

In the software industry, we define a fail-safe environment as setting up processes to avoid failure. Instead, we should ensure that when the inevitable failure happens, we handle it well and reduce its impact. We want to fail smart.

When I was at Spotify, a company that worked hard to create a fail-smart environment, we described this as “minimizing the blast radius.” This quote from Mikael Krantz, the head architect at Spotify during that time, sums up the idea nicely: “we want to be an internal combustion engine, not a fuel-air bomb. Many small, controlled explosions, propelling us in a generally ok direction, not a huge blast leveling half the city.”

So, let us plan for failure. Let’s embrace the mistakes that are going to come in the smartest way possible. We can use those failures to move us forward and make sure that they are small enough not to take out the company. I like the combustion engine analogy because it embraces that failure, well-handled, pushes us in the right direction. If we anticipate, we can course correct and continue to move forward.

One way you can create these small, controlled explosions is to fail fast. Find the fastest, most straightforward path to learning. Can you validate your idea quickly? Can you reduce the concept down so that you can get it in front of real people immediately and get feedback before investing in a bunch of work? Failing fast is one of the critical elements of the Lean Startup methodology.

A side benefit of small failures is that they are easier to understand. You can identify what happened and learn from it. With a big failure, you must unpack and dig in to know where things went wrong.

The Lesson of Clippy

Even if you’ve never used the Office Assistant feature of Microsoft Office, you are likely aware of it. It was a software product flop so massive that it became a part of pop culture.

I worked at Microsoft when the company created Office Assistant. Although I didn’t work on that team, I knew a few people who did.

It is easy to think that the Office Assistant was a horrible idea created by a group of poor-performing developers and product people, but that couldn’t be farther from the truth. Extremely talented developers, product leads, researchers with fantastic track records, and PhDs from top-tier universities built Clippy. People who thought they understood the market and their users. These world-class people were working on one of (if not THE) most successful software products of all-time at the apex of its popularity. Microsoft spent millions of dollars and multiple person-years on the development of Clippy.

So, what happened?

What happened is that those brilliant people were wrong. Very wrong, as all of us are from time to time. How could they have found their mistake before releasing widely? It wasn’t easy at the time to test product assumptions. It was much harder to validate hypotheses about users and their needs.

How we used to release software

Way back before we could assume high-bandwidth internet connections, we wrote and shipped software in a very different way.

Software products were manufactured, transcribed onto plastic and foil discs. For a release like Microsoft Office, those discs were manufactured in countries worldwide, put into boxes, then put onto trucks and trains and shipped to warehouses, like TV sets. From there, trucks would take them to stores where people would purchase them in person, take them home and spend an afternoon swapping the discs in and out of their computers, installing the software.

With a release like Office, Microsoft would need massive disc pressing capability. It required dozens of CD/DVD plants across the world to work simultaneously. That capability had to be booked years in advance. Microsoft would pay massive sums of money to take over the entire CD/DVD pressing industry essentially. This monopolization of disc manufacturing required a fixed duration. Moving or growing that window was monstrously expensive.

It was challenging to validate a new feature in that atmosphere, peculiarly if that feature was a significant part of a release that you didn’t want to leak to the press.

That was then; this is now.

Today, the world is very different. There is no excuse for not validating your ideas.

You can now deploy your website every time you hit save in your editor. You can ship your mobile app multiple times per week. You can try ideas almost as fast as you can think of them. You can try and fail and learn from the failure and make your product better continuously.

Thomas J Watson, the CEO of IBM from 1914 until 1956, said, “If you want to increase your success rate, double your failure rate.” If it takes you years and millions of dollars to fail and you want to double that, your company will not survive to see the eventual success. Failing Fast minimizes the impact of your failure by reducing the cost and delay in learning.

I worked at an IBM research lab a long time ago. I was a developer on a project building early versions of synchronized streaming media. After over a year of effort, we arranged to publish our work. As we prepared, we learned there were two other labs at IBM working on the same problems. We were done, it was too late to collaborate. At the time, it seemed to me like big-company stupidity, not realizing that three different teams were working on the same thing. Later I realized that this was a deliberate choice. It was how IBM failed fast. Since it took too long to fail serially, IBM had become good at failing in parallel.

Part Two: Building a Fail-Safe Culture