5 Books That Can Help You Become a Modern Solution Architect

Nicolas Vogt
May 22, 2021


The way to modern Architecture

Photo by Maarten Deckers on Unsplash

I used to think that expertise was the major component of being a successful IT Architect. You generally have to demonstrate significant expertise in one or several domains before being hired for the position, yet once on the job you do not necessarily draw on that hard-won experience.

I was a System and Storage Engineer for more than ten years, and during those years I had to solve many problems very quickly, some of them in the middle of the night, when your brain is only half working. This job shaped the way I view complex systems, and especially the interconnections between their inner working parts. This is precisely why I thought I would be a good architect and decided to move into the job.

I have to say that I am now a bit disappointed. IT Architecture is not really what I thought it was. It is a constant source of frustration, because every decision you make involves a trade-off. On top of that, the industry is evolving so fast that it is nearly impossible to keep up with the pace of change. The frustration turns into excitement when you see the horizon of possibilities, but the amount of energy that even a small change in a company requires is what you have to cope with every day.

The truth is that you need more interpersonal skills than knowledge to do the job, because you will have to deal with many different personalities. The other aspect of the job is that you have to process an awful lot of information, since the domain you have to cover is very wide. To thrive, you have to read a lot and learn at lightning speed in order to master complex concepts quickly. Luckily, I am a voracious reader, and I have selected here a few books that will help you throughout your journey.

Accelerate

Building and Scaling High-Performing Technology Organizations

by Nicole Forsgren, Jez Humble and Gene Kim

The Reference — cover from goodreads.com

The authors reject the dichotomy between speed and stability and explain that the two notions are intertwined. Rapid change often brings good practices that give you a new form of stability.

Their evidence refutes the bimodal IT notion that you have to choose between speed and stability — instead, speed depends on stability, so good IT practices give you both.

Continuous delivery comes with a whole set of tools to secure and accelerate the process of delivering new software, eliminating toil and reducing the blast radius of bugs. Most of it is based on the Agile Manifesto and the DevOps philosophy, but the book offers a broader perspective and describes examples found at the major players in the industry.

The other important aspect of this book is that its point of view is very liberal: it argues against adding unnecessary control in your organization.

Knowledge is power, and you should give power to those who have the knowledge.

This sentence sums up the idea of the book: the top-down pyramidal organization is not described as the most effective one for embracing change, and loosening control and giving more responsibility to key employees is a way to adapt more quickly to the market.

If you want to read more books like this one, I recommend:

  • Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies, by Reid Hoffman and Chris Yeh
  • No Rules Rules: Netflix and the Culture of Reinvention, by Reed Hastings and Erin Meyer

Chaos Engineering

System Resiliency in Practice

by Casey Rosenthal, Nora Jones, Nathan Aschbacher

The more sulfurous — cover from goodreads.com

Following the impact that Netflix’s Chaos Monkey has had on the tech industry, this book describes what Chaos Engineering is and what it is not.

We hear Chaos Engineering is described as “breaking things in production”. While this might sound cool, it doesn’t appeal to enterprises running at scale and other complex system operators who can most benefit from the practice. A better characterization of Chaos Engineering would be: fixing stuff in production.

The concept is quite simple to state, and the name is very appealing, but the intricacies behind it are more complex.

“Breaking stuff” is easy; the difficult parts are around mitigating blast radius, thinking critically about safety, determining if something is worth fixing, deciding whether you should invest in experimenting on it, and the list goes on.

The underlying assumption is that microservices have brought complexity to a level that is no longer manageable. We are dealing with non-linear systems (outputs are not proportional to inputs), and you can only control a system with another system of at least the same complexity. This trend inevitably leads to unpredictable behaviors.
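One way to picture what "not proportional" means is a single server with a queue in front of it. The sketch below is not taken from the book: it uses the textbook M/M/1 formula for the mean time a request spends in the system, W = 1 / (μ − λ), and the service and arrival rates are invented for the illustration.

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate < 0 or arrival_rate >= service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for a stable queue")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 100.0  # requests per second the server can handle (assumed value)

# Increase the load step by step and watch the latency grow much faster.
for load in (50.0, 90.0, 99.0):
    latency_ms = mean_time_in_system(load, SERVICE_RATE) * 1000
    print(f"{load:>5.0f} req/s -> {latency_ms:7.1f} ms")
```

Under these assumptions the load does not even double between the first and the last step, yet the mean latency goes from about 20 ms to about 1,000 ms. That is the kind of behavior that makes intuition unreliable in such systems.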

Chaos Engineering is the only major discipline in software that focuses solely on proactively improving safety in complex systems.

In this book you will find examples of Chaos Engineering, some of them in unexpected industries like banking. I work in the finance industry, where regulation makes things a little painful, so it is surprising to see how they managed to bring this kind of practice into such rigid environments.

This is the latest book I have read, so I do not have any related recommendation yet. What I can do, though, is share the two books I have put on my reading list to dive deeper into the domain:

  • Security Chaos Engineering, by Aaron Rinehart and Kelly Shortridge
  • Nonlinear Dynamics and Chaos: with applications to Physics, Biology, Chemistry and Engineering, by Steven H. Strogatz

Monolith to Microservices

Sustaining Productivity While Detangling the System

by Sam Newman

The hard truth about the hard stuff — cover from goodreads.com

The software industry is currently heavily oriented toward microservices. This kind of architecture is well suited to containerization and continuous delivery and is Cloud-ready, which makes everybody crazy about it.

Sam Newman is pragmatic about this and states that not every piece of software should be built that way. The architecture has to be aligned with your company’s objectives.

I can’t define the goals you may have for your company. You know better the aspirations of your company and the challenges you are facing. What I can outline are the reasons that often get stated as to why microservices are being adopted by companies all over the world.

This book is very practical and helps you through your journey to untangle legacy systems. It presents handy use cases and recipes, both organizational and technical, that are technology-agnostic and guide you toward the transformation of a monolithic application. Personally, I found the part on decomposing the database especially useful.

Splitting a database apart is far from a simple endeavour, however. We need to consider issues of data synchronization during the transition, logical versus physical schema decomposition, transactional integrity, joins, latency, and more.

I also warmly recommend these two books if you are interested in software architecture:

  • Fundamentals of Software Architecture: An Engineering Approach, by Mark Richards and Neal Ford
  • Software Engineering at Google: Lessons Learned from Programming Over Time, by Titus Winters, Tom Manshreck and Hyrum Wright

The Site Reliability Workbook

Practical Ways to Implement SRE

by Betsy Beyer, Niall Richard Murphy, David R. Rensin, Kent Kawahara and Stephen Thorne

A book about the book — cover from goodreads.com

This book is more than a companion to Google’s Site Reliability Engineering book: it is a practical, hands-on guide for your SRE journey. It gives you real use cases to help you understand the principles behind SRE practices, and it is not limited to the way things are done at Google but also covers other companies such as Spotify, Evernote and The Home Depot.

SRE is widely associated with DevOps. It is a multi-disciplinary role that navigates between software and infrastructure, and its mission is:

  • to eliminate single points of failure
  • to manage Service Level Objectives
  • to minimize toil
  • to reduce the Mean Time to Repair

One thing that struck me in this book is the wisdom behind it. It is the result of 12 years of practice at a massive scale, which gives the authors enough hindsight about what works and what doesn’t. I still remember the Murphy-Beyer effect:

There is an unintuitive and interesting interaction between this benchmark and how it plays out when we think about automation and toil. Over time, an SRE team winds up automating all that it can for a service, leaving behind things that can’t be automated. Other things being equal, this comes to dominate what an SRE team does unless other actions are taken.

Unintuitively but undeniably, the more you automate, the larger the share of non-automatable tasks in what remains. This is why Google has set a hard limit of 50% on the time a team member can spend on toil.
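As a rough illustration of that cap, the sketch below checks whether a team member's logged hours stay under the 50% toil limit. The function names, the hour figures and the idea of logging toil in hours are assumptions made for the example; they are not taken from the book.

```python
TOIL_CAP = 0.50  # the 50% limit mentioned above


def toil_ratio(toil_hours: float, total_hours: float) -> float:
    """Return the share of working time spent on toil."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


def exceeds_toil_cap(toil_hours: float, total_hours: float, cap: float = TOIL_CAP) -> bool:
    """True when toil takes a larger share of time than the cap allows."""
    return toil_ratio(toil_hours, total_hours) > cap


# Hypothetical week: 40 hours worked, 23 of them on manual, repetitive work.
print(exceeds_toil_cap(23, 40))  # True: 57.5% is above the cap
```

A check like this only tells you that the limit has been crossed. Deciding what to automate or hand back is still a team discussion.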

The other aspect I liked in this book is the simplicity of the concepts it presents. A good example is NALSD, Non-Abstract Large System Design.

NALSD describes a skill critical to SRE: the ability to assess, design, and evaluate large systems. Practically, NALSD combines elements of capacity planning, component isolation, and graceful system degradation that are crucial to highly available production systems.

Basically, it is an iterative approach to designing systems in two phases. The first phase consists of two questions:

  • Is it possible?

Is the design even possible? If we didn’t have to worry about enough RAM, CPU, network bandwidth, and so on, what would we design to satisfy the requirements?

  • Can we do better?

For any such design, we ask “Can we do better?” For example, can we make the system meaningfully faster, smaller, more efficient? If the design solves the problem in O(N) time, can we solve it more quickly — say, O(ln(N))?

The next phase is about scaling up and consists of three questions:

  • Is it feasible?

Is it possible to scale this design, given constraints on money, hardware, and so on? If necessary, what distributed design would satisfy the requirements?

  • Is it resilient?

Can the design fail gracefully? What happens when this component fails? How does the system work when an entire datacenter fails?

  • Can we do better?

Notice that the last question triggers the next iteration and brings continuous improvement to the design, while the “Is it feasible?” question sets the limit of what you can improve. I have to say I admire the simplicity and the efficiency of this model.
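To make the “Is it feasible?” step more concrete, here is a minimal back-of-the-envelope sketch in the spirit of NALSD. It is not the book’s method: the function names, the traffic figure, the per-machine capacity, the number of spares and the budget are all invented for the illustration.

```python
import math


def machines_needed(peak_qps: float, qps_per_machine: float, spare: int = 0) -> int:
    """Machines required to serve peak load, plus spares kept for failures."""
    if peak_qps < 0 or qps_per_machine <= 0:
        raise ValueError("peak_qps must be >= 0 and qps_per_machine > 0")
    return math.ceil(peak_qps / qps_per_machine) + spare


def is_feasible(machines: int, cost_per_machine: float, budget: float) -> bool:
    """'Is it feasible?': does the fleet fit within the money constraint?"""
    return machines * cost_per_machine <= budget


# All numbers below are made up for illustration.
fleet = machines_needed(peak_qps=12_000, qps_per_machine=500, spare=2)
print(fleet)                                                    # 26
print(is_feasible(fleet, cost_per_machine=300, budget=10_000))  # True
```

If the check fails, you go back to “Can we do better?” and look for a design that needs fewer or cheaper machines before running the numbers again.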

To complement this book, I recommend that you read the original SRE book:

  • Site Reliability Engineering: How Google Runs Production Systems, by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy

Team Topologies

Organizing Business and Technology Teams for Fast Flow

by Matthew Skelton and Manuel Pais

Like breeds like — cover from goodreads.com

You are probably asking yourself why an architect has to know about team topologies. The answer is that this book is based on Conway’s law, which states that organizations design systems that mirror their own communication structure. The software a company produces is thus shaped by the way its teams are organized and by how much interaction there is between them.

[Conway’s law] creates an imperative to keep asking: “Is there a better design that is not available to us because of our organization?” (Mel Conway, Toward Simplifying Application Development, in a Dozen Lessons)

You can end up with a wide range of software architectures depending on whether the team is small or big, whether there are many teams, or even whether there is only one DBA in your company. All these parameters influence how the software is constructed. This book helps you identify the patterns and use them as leverage to shape what best suits your needs.

I hope this selection will help you on your journey toward modern architecture. As I said, I read a lot, and I will probably come back with another selection very soon. Stay tuned!


Nicolas Vogt

Curious, most of the time, eager to learn something new when I’m not