The API Versioning Trap
At Toggl, we recently concluded a decade-long journey to retire our V8 API and fully transition to the latest V9. The process was far more complex and time-consuming than we ever expected, and it highlighted the challenges of API versioning for a product company. This post explores the complexities of API versioning, the pitfalls we encountered, and how best practices can differ depending on whether you're building a B2C or B2B SaaS product.
Capital Sin
Our V8 API represented a major step for our backend infrastructure: it marked the transition from our former Ruby on Rails backend to the Go codebase that dominates today. In practice, a team of Ruby engineers embraced a new language that was still quite young back in 2012. Moving from a dynamically typed, interpreted language to a statically typed, compiled one can be challenging, especially when both languages are as opinionated as Ruby and Go. We went from one extreme to the other: from Ruby, probably one of the most implicit languages, which can look like magic to newcomers, to Go, one of the strongest advocates of explicitness.
Go delivered on its promises and has allowed us to scale our infrastructure to this day. Performance is not the only aspect to consider when dealing with a growing codebase, though. In terms of structure and code design, the V8 API was… arguably sub-optimal, to the point that the team started implementing an improved version 9 just a couple of years later.
Over the following ten years, V8 and V9 coexisted for a number of reasons, making the deprecation of the old version harder and harder. V8 became deeply integrated not only with our internal systems but also with numerous external applications used by our customers.
Every new feature and change we worked on during this period required updates to both the V8 and V9 endpoints, while keeping the V8 interface unchanged. Such a persistent dependency is a massive burden for engineers: it affects not only velocity but also how happy they are to work on changes.
Complexity of Dependencies
Successfully deprecating an old API version involves much more than technical work. It requires ongoing communication and collaboration with both internal stakeholders and external customers.
Within Toggl, our product and engineering teams had to identify all the internal services relying on the V8 API. This was no small feat, given the API's extensive use in various features and backend processes. We had to ensure that all these services were compatible with V9 before deprecating V8, which required significant refactoring and rigorous testing.
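One way to find out who still depends on an old API is to count requests to it per client. The following is a minimal sketch, not our actual implementation: `UsageCounter`, `TrackLegacy`, and the choice of the `User-Agent` header as the client identifier are all assumptions made for illustration.

```go
package main

import (
	"net/http"
	"sort"
	"strings"
	"sync"
)

// UsageCounter counts requests to legacy endpoints per client.
// All names in this sketch are hypothetical.
type UsageCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

// NewUsageCounter returns an empty, ready-to-use counter.
func NewUsageCounter() *UsageCounter {
	return &UsageCounter{counts: make(map[string]int)}
}

// Record increments the counter for one client identifier.
func (c *UsageCounter) Record(client string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[client]++
}

// Count returns how many legacy requests a client has made so far.
func (c *UsageCounter) Count(client string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[client]
}

// Clients returns every client identifier seen so far, sorted alphabetically.
func (c *UsageCounter) Clients() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]string, 0, len(c.counts))
	for name := range c.counts {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// TrackLegacy wraps a handler and records the caller of every request
// whose path starts with legacyPrefix. Using the User-Agent header as the
// client identifier is an assumption; an API token owner would also work.
func TrackLegacy(c *UsageCounter, legacyPrefix string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, legacyPrefix) {
			client := r.UserAgent()
			if client == "" {
				client = "unknown"
			}
			c.Record(client)
		}
		next.ServeHTTP(w, r)
	})
}
```

Wrapping the router with `TrackLegacy` and reviewing `Clients()` periodically would surface forgotten internal clients before a deprecation deadline is set, rather than after.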
The external dependencies were even more challenging. Many of our customers had built their own tools and workflows around the V8 API. These integrations were critical to their operations, meaning that any disruption could have serious consequences for their businesses. We needed to communicate effectively with these customers, provide ample notice of the impending changes, and offer support to help them migrate to the new API.
From the moment we seriously committed to it, fully deprecating V8 took more than 3 years.
We issued multiple deprecation announcements to keep our customers informed about the timeline for retiring the API. These announcements were critical in ensuring that everyone had sufficient time to plan and execute their migrations. Despite our best efforts, we still encountered situations where customers were caught off guard, underscoring the importance of clear and repeated communication. We pushed the deprecation deadline so many times that I lost count. This was true not only of the public deadlines: even after we managed to restrict API usage to a small number of premium customers, we still had to iterate over and over before we could drop it completely.
Our support and customer success teams played a crucial role in this transition. They worked closely with customers to understand their specific use cases, troubleshoot issues, and provide guidance on migrating to V9. This back-and-forth communication was essential in addressing concerns and ensuring a smooth transition.
You can imagine how time-consuming and expensive it is to involve all these stakeholders in deprecating such a piece of infrastructure. From a business perspective, it is really difficult to sustain such an effort for a long time, and the risk is that an outdated piece of infrastructure keeps running for years, as happened in our case. Seemingly simple choices, like creating a new API version, can haunt you for years. We strongly suggest you watch your step and, hopefully, learn from our mistakes.
Probably one of the biggest mistakes we made during these 10 years was not declaring V8 deprecated as soon as V9 was ready. As mentioned, there were a number of reasons why this didn’t happen, including V9 exhibiting worse performance, but ultimately the unclear versioning strategy led more systems to integrate with and rely on the stable V8. Communication was unclear both internally and externally. We had multiple internal clients still using V8 when we set the deprecation goal, including our browser extension. Discovering the browser extension dependency was really painful: we stumbled on it when we thought that most of the internal clients, at least, had already been migrated. It also pushed the ETA back significantly, because dropping the old installed versions of the extension was far from trivial. As for external communication, until three years ago our public docs only listed V8, which is another example of really bad communication and explains why the network of dependencies kept growing.
Theory and Best Practices
API versioning is a widely discussed topic. Searching the web, you will find hundreds of posts describing benefits and best practices.
Major companies like Stripe and Shopify have written about API versioning strategies as well.
API versioning is a double-edged sword. On one hand, it allows for evolution and improvement of services. On the other, it can lead to a proliferation of versions that become increasingly difficult to maintain. The burden of supporting multiple versions simultaneously is commonly referred to as version sprawl.
All too often, resources fail to mention that the approach to API versioning can differ significantly depending on whether you're building for B2C or B2B markets. A B2C product calls for fast iteration cycles, a generally low impact of breaking changes, and an emphasis on user experience and feature delivery. A B2B product, on the other hand, comes with longer deprecation cycles, a higher impact of breaking changes due to deep integrations, and a greater need for backward compatibility. These are very different sets of requirements, and they must shape your versioning strategy.
We consider ourselves a product-led company, fully devoted to user feedback, experimentation, and fast iteration. Technical debt is the number one enemy of these goals. Indeed, the inevitable growth of technical debt in a scaling company slows down delivery cycles and the entire machine, making experiments and improvements harder to deliver. API versioning can very easily turn into technical debt if your versioning strategy is not clear from the beginning.
All of this is also a side effect of the product culture in a B2C company, which adds friction when an engineering team tries to prioritize maintenance and toil such as public documentation. But there is no way around it. Companies like Stripe have a clear incentive to maintain clear and up-to-date documentation, because integrations are their main source of revenue. In a company like Toggl, which has historically never monetized API usage and yet has an active ecosystem of third-party integrations, prioritizing these topics is mostly left to engineering maintenance and personal initiative.
The recipe for these situations is a mix of strategic planning and automation. First of all, abandon the idea of getting rid of technical debt all at once. It takes a number of iterations and a lot of smoothing to reach the point where you can actually get rid of it. This was a key lesson for us and the only way we eventually managed to deprecate the old API. As for automation: anything that depends on a manual process to stay maintained, when that process is not directly aligned with business goals, will end up outdated. For this reason, our public documentation is now generated automatically on each code release. The amount of manual work required for maintenance is very limited, and over the past three years we have seldom suffered any kind of inconsistency. Unfortunately, this doesn’t mean our communication is now perfect: we still lack a consistent changelog to inform third parties about API changes, but we are working on it.
Another way day-to-day reality conflicts with best practices is that, however hard you try to avoid breaking changes, they are inevitable. As part of the infinite game, your company will evolve over time, and sooner or later that evolution will require a breaking change. The general approach to avoid being blocked by breaking changes is to branch. In the case of API versioning, you can create either a new version or new endpoint(s), migrate dependencies to the new branch, and eventually deprecate the old one. While you do this, you are temporarily maintaining multiple functioning versions of the same system. The transition period is inevitable, but at least it allows you to deliver the new behavior and meet users’ expectations.
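While an old branch is still being served, it can announce its own retirement in every response. The sketch below is one possible way to do that in Go, using the `Sunset` header and the `sunset` link relation from RFC 8594. The function names, the timestamp, and the documentation URL are assumptions for illustration, not a description of our codebase.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// sunsetValue formats a Unix timestamp as the HTTP-date expected by the
// Sunset response header (RFC 8594).
func sunsetValue(unixSeconds int64) string {
	return time.Unix(unixSeconds, 0).UTC().Format(http.TimeFormat)
}

// sunsetLink builds a Link header value that points clients at the
// migration documentation. The URL is supplied by the caller.
func sunsetLink(docsURL string) string {
	return fmt.Sprintf("<%s>; rel=\"sunset\"", docsURL)
}

// WithSunset wraps a deprecated handler so that every response carries the
// planned removal date and a pointer to the migration guide. The wrapped
// handler keeps behaving exactly as before.
func WithSunset(sunsetUnix int64, docsURL string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Sunset", sunsetValue(sunsetUnix))
		w.Header().Set("Link", sunsetLink(docsURL))
		next.ServeHTTP(w, r)
	})
}
```

Headers do not replace announcements, since many clients never inspect them, but they give integrators a machine-readable signal that can be logged or alerted on.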
When branching becomes systematic, though, rather than a temporary measure to deliver new functionality, you may end up maintaining multiple API versions for long periods of time, which can easily turn into a nightmare. The main problem arises when your underlying data model, such as your database schema, changes between versions. In that case, the options are either to maintain different running schema versions, which introduces serious complexity and maintenance overhead your team probably doesn’t want to deal with, or to handle backward compatibility in business logic. The latter is definitely easier and more manageable, but over time it becomes more and more difficult to ensure consistency with older versions.
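Handling backward compatibility in business logic usually means keeping a single data model and translating it into the shape the old version promised. Here is a minimal sketch of that idea; the `Project` type, its fields, and the legacy `wid` key are hypothetical and only illustrate the pattern.

```go
package main

import "encoding/json"

// Project is the current internal model (hypothetical fields).
type Project struct {
	ID             int64
	Name           string
	WorkspaceID    int64
	OrganizationID int64 // concept added after the legacy API was designed
}

// legacyProject is the response shape the old API version exposed.
// It has no notion of organizations.
type legacyProject struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
	WID  int64  `json:"wid"`
}

// toLegacy maps the current model onto the legacy shape. Fields the old
// version never knew about are dropped here, in one place.
func toLegacy(p Project) legacyProject {
	return legacyProject{ID: p.ID, Name: p.Name, WID: p.WorkspaceID}
}

// legacyProjectJSON renders a project the way the old endpoint would.
func legacyProjectJSON(p Project) (string, error) {
	b, err := json.Marshal(toLegacy(p))
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Every field added to the current model forces a decision in `toLegacy`: ignore it, emulate it, or reject the request. That is where the consistency problems described above accumulate.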
In our case, the V9 API grew and introduced major changes to the data model. For instance, the Organization layer was introduced later, as well as a number of features and user roles. These changes made maintaining V8 a growing burden. Eventually, the old version ends up lacking compatibility or, worse, becomes a dangerous entry point: security holes can arise when the old version has access to domains that are not covered by the new version. In the best-case scenario, the mismatch is a source of inconsistencies.
When V10?
If you think the question is a joke, it was actually discussed a number of times within the backend team. Up to a certain point, we were fairly certain that we would start implementing a new V10 API sooner or later.
In recent years, the opportunity was seriously discussed on two occasions in particular. The first was a major refactoring of the V9 API to standardize the code architecture across all our domain packages. We won’t share the details here, because that would require another blog post that may come at some point; for context, it is enough to say that we spent almost 2 years refactoring our API code. Writing a new API version from scratch might have been a reasonable approach, but we decided that the effort would be just as massive and took the safe bet. In retrospect, it is lucky that we didn’t pick the new-version option just for the sake of building a shiny new thing: today we might still be struggling to deprecate V9, or even both V8 and V9, because deprecating two versions would have required even more effort, which we would never have prioritized.
The second occasion came after the refactoring, when we asked ourselves what our versioning strategy actually was, if we had one at all. The conclusion from all these discussions was that, as of today, we consider our API version-less. We keep v9 in the path for backward compatibility, but we don’t plan a new version any time soon. We decided to embrace flexibility, and we think a version-less approach is the most efficient way to minimize maintenance and deliver quickly on user feedback and requirements.
We embraced breaking changes as well, although we try to minimize them in every possible way. A number of our projects required breaking changes to implement new functionality or systems. Branching, keeping the old behavior running for some time for backward compatibility, and publishing a deprecation notice proved to be effective. For instance, as part of the Shared Auth release we talked about in our previous blog post, we announced the deprecation of a number of endpoints, kept them running for some time for backward compatibility, and eventually removed them a few months later. In that case we also clearly communicated brownout periods, to force customers relying on the service to react to interruptions. Of course, communication went beyond a single post: we actively monitored usage and informed the affected organizations as the deadlines approached.
Hopefully we have learned our lesson. Combining a clear, structured communication strategy with not being blocked from releasing new things is a balance we think will let us maintain our speed of delivery and innovation going forward.
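A brownout can be implemented as a thin layer in front of the deprecated endpoints that rejects requests during pre-announced windows. The following is a sketch under stated assumptions: the `Window` type, the use of Unix timestamps, and the choice of `410 Gone` as the response status are illustrative, not a description of what we run in production.

```go
package main

import (
	"net/http"
	"time"
)

// Window is one announced brownout period, as Unix timestamps.
// The start is inclusive and the end is exclusive.
type Window struct {
	StartUnix int64
	EndUnix   int64
}

// inBrownout reports whether nowUnix falls inside any announced window.
func inBrownout(nowUnix int64, windows []Window) bool {
	for _, w := range windows {
		if nowUnix >= w.StartUnix && nowUnix < w.EndUnix {
			return true
		}
	}
	return false
}

// WithBrownout wraps a deprecated handler and rejects requests during the
// announced windows. Outside the windows the endpoint works as usual.
// Responding with 410 Gone is an assumption; pick whatever status your
// deprecation notice promised.
func WithBrownout(windows []Window, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inBrownout(time.Now().Unix(), windows) {
			http.Error(w, "this endpoint is deprecated and temporarily disabled; see the deprecation notice", http.StatusGone)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Keeping the window check in a pure function makes the schedule easy to test and to publish: the same list of windows can feed both the middleware and the public announcement.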
Key Lessons
We have discussed a number of problems and challenges we faced with our API versioning. To summarize the lessons (and rants), here are some quick tips to avoid the pitfalls we encountered:
- Understand your business needs: take your target customers into account and invest in your infrastructure accordingly. If your main source of revenue is third-party integrations, you will have to invest more in your infrastructure than this post suggests. Otherwise, our experience could save you a lot of time.
- Plan for deprecation from the start: be aware of the lifetime of your system and build deprecation strategies into your design from day one. If you treat deprecation as a problem for your older self, you might regret it for longer than your younger self expected.
- Communicate in advance: schedule a deprecation period and communicate it clearly to customers through more than one channel, such as a blog post and direct email.
- Define brownout periods: customers often will not answer, but they will eventually react when the service becomes unstable. Make sure the brownout windows are clearly communicated and long enough to give customers time to react.
- Offer migration support: provide tools and documentation to ease the transition. For instance, sharing a public Swagger spec for our V9 helped customers build their clients.
- Consider version-less APIs: explore strategies that allow for evolution without explicit versioning. Don’t feel like a bad engineer if you don’t version your API. Long-term maintainability and evolutionary architectures are the most important aspects for companies, and no AI agent will be able to replace you in making these decisions anytime soon.
Our experience with the transition has been a valuable learning opportunity. As we move forward, we're committed to implementing these best practices to ensure smoother API evolution in the future, as well as to maintaining and improving our communication with third-party developers.