After ~6.5 years of working at Spotify, I decided that it was time to try something else. I couldn’t have asked to work at a better company with such great colleagues. To be part of a company that grew from around 450 employees when I started to over 4,000, along with 300+ million MAU, has been quite the adventure.
Over these years, I had the opportunity to work on and contribute to many interesting projects…
- Getting bare-metal server provisioning down to a few minutes with a single API call.
- Migrating from Debian squeeze to Ubuntu trusty and bionic (yay systemd).
- Migrating all of Spotify from data centers to Google Cloud.
- Performing manyyyy DR tests across regions and providers, and starting to automate the failover process and run the tests on a regular cadence.
- Rolling out distributed tracing to all of Spotify.
- Re-architecting metric ingestion for Heroic, Spotify’s OSS time-series database, from Kafka to Google Pub/Sub.
- Being a core member of the Incident Manager On-Call (IMOC) rotation, responsible for incident coordination and communication across the company. I was involved in more than 150 incidents over those 6.5 years :).
- Starting and leading a new team that eventually grew to 7 engineers and focused on creating “reliability” tools and frameworks, e.g. SLO tracking, blackbox monitoring, automated regional failover, and reliability insights.
I’ve also learned an enormous amount (“customer” here refers to internal engineers)…
- Keep learning and never stop.
- Take the time to really listen and understand before acting.
- Keep documenting our internal platform and how to use it. Aim to keep this simple. Also, diagrams/pictures can go a long way in explaining something words cannot.
- Continue having customer empathy. Not everything they bring up will be the right thing to do, but it’s important to understand the “what” and see if you can spot patterns that multiple customers are complaining about.
- Keep learning from incidents and prioritizing the resulting remediations. Raise the flag if things are getting worse and prioritize together.
- With any tech change, think about how to reduce the migration pain and whether the current approach is worth it.
- Continue working with teams across the company to build a uniform product (figure out how to do even more of this). Networking with others is really important. These networks can really help spread knowledge locally.
- Continue working with teams outside of the core-infra department to build infrastructure together.
- Take time to go deep in tech areas you want to explore more. This could mean, for example, doing embeds outside the department.
- Remember you can say no to things for many reasons. There isn’t ever going to be enough time or resources to work on everything.
- Use email more. Especially for deeper discussions.
- Keep having fun!
- Margaritas always.