This article has also been published in Architecture and Governance Magazine. You can find the published article here.
One of the most important skills an architect possesses is the ability to make and manage good decisions. Managing decisions and facilitating the decision-making process are key to building healthy and sustainable architectures. Within the architecture profession, missing the significance of architecture decisions is a key factor in business technology failures. Sometimes what seemed like a simple decision can later turn out to be more complex than we anticipated and result in unexpected consequences or side effects. In the worst case, this might mean that the architecture is no longer feasible; in the best case, more effort and time are required for rework.
The architect is a leadership role, and one of the main responsibilities of the architect is to guide stakeholders and help teams with business technology decisions. The architect has a unique perspective, understanding the key aspects of both the business and the technology. Of course, the architect is not responsible for all business technology decisions, but rather the “significant” decisions, which profoundly affect the architecture, the business or the stakeholders. Martin Fowler describes these decisions simply as “the stuff that matters”.
Whether working with products or an enterprise, we expect the architect to manage the “significant” decisions. During the development and evolution of an architecture, there may be countless meetings where we choose paths and alternatives which drive the outcomes of the enterprise and our products, meeting the stakeholders’ needs. We make decisions all the time, at all levels within an organization, so how do we know when a decision is “significant”?
One way to help identify significant decisions early and manage them is to adopt a mindset for quickly analyzing decisions. Kind of like a mental checklist which raises an alarm if certain criteria are met.
The diagram above shows some of the aspects of a decision we can consider to quickly assess its significance. We can use the acronym PRILER to make these aspects easy to remember.
Priority and Value
The significance of a decision may be driven by how highly the stakeholders of the architecture value and prioritize the alternatives. It is important not only to consider the technical aspects of a decision but also to consider if there are political consequences. For example, making decisions based only on the best technical expertise may fail if we later find that there was a lack of political backing from the stakeholders.
From this perspective, we may consider a decision as “significant” if it is of high priority or value to the stakeholders in the architecture.
Reversibility
Reversibility indicates how easy it is to roll back or change a decision during or after implementation of the change. This is a really important aspect, as decisions which are difficult to reverse are significant. Many architectures find themselves living with workarounds, add-ons and limitations because it was just too much effort to reverse a decision. Decisions which can be rolled back easily, with few consequences, are often less significant since little effort is required to reverse the change. If a decision can’t be rolled back, however, it is likely to be significant as it represents a “point of no return”.
Impact (Scope/Effect)
The impact of a decision indicates the scope of change, in other words, how widely the ripples of change will be felt within an organization. For example, a decision which affects a single component in a product may have a low impact, while a decision which affects a whole enterprise will have a high impact. The impact can have consequences for the way technology products are developed or the way a business works. It is important to remember that what seem like low-impact technical decisions can sometimes have a high impact on users, the way of working, or the business as a whole. The same is true of business decisions which may appear low-impact, but have a high technical impact. High impact raises the significance of a decision.
Limitations and Constraints
Many decisions are trade-offs, opening one door may mean that certain other doors close. For each decision we need to consider which constraints will be placed on the architecture, perhaps limiting future options or increasing effort on other aspects of the architecture. We want to keep as many options available as long as possible to ensure that the architecture is sustainable. Decisions which incur serious limitations or constraints on the architecture are likely to be significant.
Effort
Effort represents the work, resources, time and costs associated with the decision. If the resources and financial backing are not available or feasible, then it is unlikely the chosen alternative will succeed. Even if the chosen alternative is the best in terms of the architecture, it will be worthless if the means to execute it are not available. It is important to remember that even a decision to maintain the status quo can incur significant effort; even a decision to change nothing costs effort.
Risk
Risk is the possible exposure to negative effects resulting from the decision. Most decisions involve some level of risk. We can consider if a decision entails many risks, the seriousness of those risks and if there are any ways we can mitigate the risks. If a decision depends on external factors, these can represent risks that are out of our control and difficult to mitigate. A decision which incurs a high level of risk is likely to be significant.
Adopting the PRILER mindset to architecture decision-making may be the first step to identifying significant decisions. What happens next is likely a more detailed analysis of the alternatives where the architect will consult a broader field of stakeholders. It may be the case that the consequences of a decision are so significant that the decision is escalated to other stakeholders in the business. In conclusion, every decision is unique and there is no “one size fits all”, but approaching architecture decision-making with the right mindset can help quickly identify “the stuff that matters”, and this contributes to avoiding serious pitfalls, helping to keep our architectures healthy and sustainable.
Some years ago I attended a conference where John Zachman gave a seminar about his enterprise architecture framework. Something which stayed with me from that seminar was his approach to architecture models. When we work with architecture, it is common to meet a great many stakeholders. We gather information from the stakeholders regarding the architecture, and it tends to present itself as a mix of many stakeholder perspectives: processes, flows, sequences, roles, technologies, information, and so on. This is the challenge for the architect: how to untangle this information, communicate the architecture to the stakeholders, and reach an agreement with all stakeholders on the architecture. Stakeholders may come from different roles, for example, executives, business managers, technology experts or developers. If we mix many architecture perspectives in a single view without first understanding each perspective on its own, the view we communicate to these stakeholders may be confusing or difficult to understand. It is also likely to be challenging for stakeholders and the architect to reach an agreement on the architecture.
It is therefore important for stakeholders to understand the fundamental perspectives of the architecture before working with views that combine perspectives. This is where the concept of Primitive and Composite models by Zachman provided me with some inspiration (Enterprise Architecture Defined: Primitives and Composites).
The primitive models of the architecture each provide a single perspective of the architecture. Composite models combine perspectives from the primitive models; elements from the primitive models can be combined to gain insights into the architecture and to consider alternative solutions.
Using this concept, I like to think of it rather like how colours are used when painting. There are the primary colours, which cannot be created by mixing other colours; they are colours in their own right, and they can be mixed to produce other colours. If we think about this in architecture terms, perhaps we can create a set of Primary Views: views in their own right, which we cannot create by mixing lots of different architectural elements. Then, using the elements from the primary views, we can create Composite Views: views that show several perspectives of the architecture and can be used to explore alternative architecture solutions.
Note that I have switched from models to views. The reasoning behind this is that it is the views that provide the communication with stakeholders. We still use a model, or multiple models, to create the views; however, it is the views that carry the communication with stakeholders, which is the essence of forming the architecture and meeting the stakeholders’ needs. The views are the communication method; the models are the architect’s tools.
It is important to note that although Zachman addresses Enterprise Architecture, I think that primary and composite views are just as useful in solution architectures.
Primary Views
Primary views show a single aspect of the architecture. I generally follow the “Five Ws and 1H” when considering aspects of the architecture:
Why – why we need the architecture (motivation)
What – what does the architecture consist of (structure)
Who – who are the stakeholders/actors in the architecture (organization)
When – when are flows/activities performed (timing)
Where – where is the architecture used (location)
How – how are flows/activities performed (behaviour)
Out of these six aspects, I normally focus on views that provide answers to Why, What, Who and How, in that order. I find that it is easier to gain agreement with stakeholders when addressing a single aspect at a time. The following sections show examples of some views I commonly use.
Strategy and vision
Before we start an architecture assignment it is important to know why the architecture is needed, in other words, the motivation behind the architecture. The strategy and vision of a business describe the objectives that are to be achieved. This drives the reasoning for the architecture and is the basis for developing architecture principles and requirements. In its simplest form, this may well be a list of objectives, or cascading objectives. Agreeing on the objectives with the stakeholders is the foundation of any successful architecture.
Capabilities
Capabilities are an interesting aspect of architecture and there are several definitions of what a capability is. In my view, a capability consists of processes, people and technologies, and describes “what” a business does to create value. A capability map can be used to detail the structure of business capabilities and identify which capabilities a business needs in order to achieve its objectives.
So the focus is placed on what the business needs to do to create value, and not so much on how it does it.
Organisation
The organisation is often expressed in actors or roles. This describes “who” engages with the architecture. This may be expressed as an organizational chart with communication relationships which help us understand how people interact, their responsibilities, and perhaps even the skills they possess.
This view can be used to describe the structure of the organization, and ensure that stakeholders have a common understanding of the organization. The focus is placed on who will be engaging with the architecture.
Information
Information is expressed in classes or objects. The following conceptual information view shows the structure of information in the architecture. This is what stakeholders create, read, update, or delete as a central method of communication with other stakeholders.
The conceptual information view shows the relationships between the information objects and contains a description of each object. This provides a common language for architecture stakeholders and a common understanding of the information structures used in communication.
System Landscape
A system landscape describes applications, systems and solutions and their relations. This describes which (or what) systems the stakeholders in the architecture interact with.
The view shows which systems exist in the architecture and the relationships between these systems. It may be enough to detail that there is a relationship between systems without going into details, for example, which information flows between systems, or the nature of dependencies. Each system should have a description which details the purpose of the system, for example, sales support, booking system, or human resources. This allows stakeholders to agree on the technologies that are needed, or that already exist, to support the business.
Process and flow
Processes and flows describe a sequence or pathway through a series of activities. These describe “how” the architecture behaves in a particular scenario. In a primary view we show the control flow of activities, providing a focus on how the activities are performed.
This provides a common view to stakeholders as to how the architecture behaves in certain scenarios. This may be applied to organisational processes, or technical processes.
Composite Views
When we are asked to provide a solution to a specific problem, it is no longer sufficient to consider a single aspect of the architecture; we have to consider how these aspects can be combined to create solutions. Having worked with the primary views, stakeholders together with the architect can start to combine elements from the primary views to create different solution alternatives. Since the stakeholders already have an understanding and agreement on the primary views, it is easier to work with fitting the elements together in more complex views, or what I would call Composite Views. Just as the name suggests, we create views composed of elements from several primary views. These are more complex views, but if we understand the primary views we have a foundation to work from, and by combining elements from the primary views we can create implementable solutions.
Workflow with Information
Imagine we are taking the primary views from the previous section and want to consider creating a service for taking customer orders. In the first instance, we might want to consider the workflow for the service, and the information that is needed at each stage.
The diagram above is an example of a composite view, showing the workflow and the information required for a Customer Order Service. Notice that the concern here is with how the activities in the service will execute, what information will be used in each activity, and perhaps even when each activity occurs.
Workflow and Technology
Another aspect of a Customer Order Service might be which systems we expect to support the workflow as orders are processed. We may even consider the roles or actors which require access to the systems in order to perform the work.
The diagram above is an example of a composite view, showing the workflow, systems and actors required for a Customer Order Service. Notice that the concern here is how the service will work, which systems will be used and who uses the system.
Other composite views
The example composite views show one possible implementation of the Customer Order Service, but there are many combinations and alternatives we can create by mixing the elements from the primary views. For example, perhaps we want the Mechanic and the Chief Mechanic to work with workshop planning, perhaps the customer order information should also be available in the workshop planning, or perhaps the customer can interact directly with the Sales and Order System.
The value of Primary and Composite Views
Working with stakeholders on primary views allows the architect and stakeholders to agree on the fundamental structures and behaviours of the architecture. It provides a platform to stand on for solving more complex architecture problems, but these views do not provide solutions on their own.
Composite views provide a way of exploring many alternative solutions for a given assignment. Since the stakeholders are already familiar and have reached an agreement on the primary views, it is easier to work with the more complex composite views.
This is not a sequential way of working with architecture but an iterative method. After working with composite views we will find ourselves returning to the primary views and refining them or adding new pieces of knowledge. This in turn may provide us with new composite views. This is how we evolve an architecture, and at the same time build models to help us anticipate the consequences of change together with the architecture stakeholders.
As the architecture evolves we gain a greater understanding of the primary views. We can then use these views to shape the architecture into composite views providing many alternative solutions, just like an artist mixing the colours from a palette.
I have been working with Agile development for 10+ years, often in a variety of roles, such as Enterprise Architect, Solution Architect and Product Owner. Since I work mainly as a consultant, I have had the privilege to work in several different sectors, companies and products. If there is one thing that all my assignments have in common, it is that each assignment is unique. Each assignment and organisation comes with its own culture, different people, different technologies and approaches to architecture. What works on one assignment does not necessarily work on another.
This is why it was refreshing to see Disciplined Agile taking a different approach to agility. Many prescriptive Agile methods, for example SAFe or Scrum, don’t really factor business context into the equation. A framework is presented and the organisation has to try and bend itself into the framework. This can cause friction in the organization, problems with culture clash, and can lead to disillusion as results are not achieved as quickly as expected. Rather than a framework, Disciplined Agile provides a toolkit that can be used to help an organization develop its way of working and improve agility.
Disciplined Agile provides several principles, promises and guidelines to help foster an agile mindset. In this article, I would like to emphasise three of the principles which I think can help in developing scaled agile practices.
Context Counts
“Every person, every team, every organization is unique. We face unique situations that evolve over time. The implication is that we must choose our way of working (WoW) to reflect the context that we face, and then evolve our WoW as the situation evolves.” – PMI, Disciplined Agile® (DA™).
The “Context Counts” principle is really important. If your business is a start-up company, adopting Agile practices is in many ways easier. There is no baggage from other methods, and the culture of the organisation can be developed from scratch.
Organisations that have established cultures, processes and methods have another challenge; for them, this transformation is perhaps more of a long-term journey than something achieved in a short space of time. Organisations that operate in heavily regulated environments also have challenges with agility, for example, the nuclear, pharmaceutical or medical industries. Heavily regulated environments often require time-consuming validation processes, since the risk of errors can lead to life-threatening consequences. From a business perspective, non-compliance poses significant business risks with major financial implications, and substantial up-front planning may be required to mitigate these risks. In many of these types of assignment, documentation is as important as a working system; non-compliant documentation may mean the product will not make it to market on time.
This is why context does count; there is no point trying to fit your organization into a way of working that is not working. Instead, select a way of working which works for the context of the organisation and continually improve the flow as you go.
Be Pragmatic
“Our aim isn’t to be agile, it’s to be as effective as we can be and to improve from there. To do this we need to be pragmatic and adopt agile, lean, or even traditional strategies when they make the most sense for our context. In the past, we called this principle “Pragmatism.” ” – PMI, Disciplined Agile® (DA™).
This is a common-sense approach to developing a way of working, if it works well keep it, if it doesn’t try something else, but don’t restrict yourself to a specific framework or strategy. While organisations running with prescriptive frameworks may gain substantial benefits, they may feel that the framework is inflexible and that they cannot adopt a way of working that doesn’t quite ring true with the adopted framework.
Being pragmatic means that we should be open to adopting and combining ways of working that best suit our business context. As indicated, the aim of the business is not to be the best at following a particular agile practice, it is to be the best at delivering value to our customers.
Enterprise Awareness
“Disciplined agilists look beyond the needs of their team to take the long-term needs of their organization into account. They adopt, and sometimes tailor, organizational guidance. They follow, and provide feedback to, organizational roadmaps. They leverage, and sometimes enhance, existing organizational assets. In short, they do what’s best for the organization and not just what’s convenient for them.” – PMI, Disciplined Agile® (DA™).
This principle is perhaps what brings scale into the arena. As an organization becomes larger and more complex, agile teams need to be aware of what is happening outside their team, as well as inside it. Sharing assets across the enterprise provides a basis for making the organisation as effective as possible. If someone has already invented the wheel, we don’t need to invent it again, right?
Organisational guidance can support faster decision-making, and help avoid significant business risks. Common guidance, roadmaps and enterprise strategies provide a way to help the many different teams pull in the same organizational direction. This significantly contributes to making the organization effective as a whole.
Disciplined Agile and Architecture
In the same way that a way of working is unique to a particular business context, we can conclude the same is often true of architecture. Every solution or enterprise has its challenges, its requirements and its constraints. This affects how we practice architecture within an organization. Since architecture often requires up-front planning, in some cases quite a lot of it, this can be a point of contention with some agile methods. This is not about architecting every detail of a solution before we start development, but rather about making sure we cover enough of the fundamental architecture decisions to provide a stable platform for development and to reduce the significant business risks. With the significant design decisions in place, the architecture and design can then evolve as the solution develops through its lifecycle.
Disciplined Agile accounts for the business context, pragmatism and enterprise awareness, which aligns well with the practice of architecture. This flexibility means that architecture and agility fit together and complement each other, even as the architecture and agile practices scale with the organisation.
So “Hooray for Disciplined Agile”, a better fit for working with architecture, and perhaps just a better fit overall.
Gathering substantial Technical Debt can weigh your business down, affecting your business agility and value delivery. To keep Technical Debt in check we have to be able to measure it. In addition, it is important to separate Technical Debt from Quality Issues: the difference is that Technical Debt is intentional and Quality Issues are not. This distinction is important from a business perspective, since Technical Debt buys a trade-off in delivery time, which is of value to the business. A Quality Issue gives no value to the business; it is only a burden. Read more about this in my article “Is Technical Debt Good?”.
Since Technical Debt is intentional, we are aware it exists and we can record it. At the point of taking on the debt we probably know the cost of the temporary implementation, and we also probably have some idea of the cost to fix it at a later date.
Several articles suggest using a Technical Debt Ratio (TDR) to measure Technical Debt. This creates a ratio between the total cost of product development (Development Cost) and the cost of the fix (Remediation Cost), expressed as a percentage.
For example:
Development Cost = 9,000 hours
Remediation Cost = 150 hours
TDR = (150/9000) * 100 ≈ 1.7%
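As a quick sanity check, the TDR formula above can be expressed as a small helper (a sketch of my own; the function name and rounding are not from any standard tool):

```python
def technical_debt_ratio(remediation_hours: float, development_hours: float) -> float:
    """Technical Debt Ratio: Remediation Cost relative to Development Cost, as a percentage."""
    return (remediation_hours / development_hours) * 100

# The worked example above: 150 hours of remediation against 9,000 hours of development.
print(round(technical_debt_ratio(150, 9000), 1))  # 1.7
```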
The reasoning behind using this ratio is that we can relate the debt according to how large the product is, instead of using absolute values such as hours. For example, 100 hours may be a large debt for a small product, but acceptable for a large product. TDR provides a metric for measuring Technical Debt, but its accuracy and reliability as a cost indicator lies in how we calculate the Development Cost and the Remediation Cost.
The business should understand and quantify the total cost of remediation. While TDR may be useful, it is the Remediation Cost that we use to calculate the resources or calendar time required to fix the debt. So, we need to keep tabs on a ratio to give us a relative measure, and on the Remediation Cost to provide an absolute measure. We will want to apply this not only to Technical Debt but also to Quality Issues. This allows us to assess the cost of Quality Issues and debt, and to gain a complete set of metrics for technical quality.
Finally, to produce good metrics, we need a way to manage the Technical Debt and Quality Issues so we can track, prioritize and resolve issues.
Development Cost and Remediation Cost
Before we move into how we manage Technical Debt and Quality Issues, we should consider how we define Development Cost and Remediation Cost.
In many articles, I notice that lines of code are used as a way of calculating Development Cost and Remediation Cost: the number of lines is multiplied by a time value to gain an idea of cost. In my opinion, this does not provide a good indicator of the cost of development. Firstly, development is not just about code; it’s about the time taken to design, collaborate, test, review, document, and so on. It is very difficult to turn lines of code into time or cost, because the process of developing code often means that the developer has several attempts at the code before settling on a given implementation. So, the lines of code represent the final result, not the time or cost spent to achieve the result. We also need to account for many of the soft factors of development, for example, competencies, motivation or work situation, which also affect cost. These are difficult to assess from lines of code.
Note that while lines of code and other metrics (such as complexity, coupling and inheritance) are valuable indicators of the quality of the code, and may be able to provide a ratio in terms of quality, I’m not sure they say much about cost. They just give a measure of the current state of quality, which is difficult to turn into a real-term cost.
TDR uses Development Cost to create a ratio for Technical Debt, where Development Cost is the total cost of developing the product (although different articles appear to have different views on this). The thinking is that we can relate Technical Debt to the cost of developing the product, to create a relative value. For example, a product with a Development Cost of 20 000 hours and Technical Debt of 2000 hours, provides a TDR of 10%, and so does a product with a Development Cost of 1000 hours and Technical Debt of 100 hours.
To provide an accurate ratio, TDR relies on the accurate calculation of the Development Cost, but how do we work out that cost? The finished code in quality and quantity doesn’t say much about the cost which was incurred to create the code. Consider any product, a car for example, we may be able to inspect a car and see that the car is top quality but we cannot know the effort it took to make the car without knowing a lot more about the process and resources used to produce the car. If we cannot estimate the Development Cost accurately, what does that say about the meaning given to the ratio?
The Technical Quality Backlog
Since we want to manage issues primarily concerned with quality, it is useful to create a specific backlog for Technical Debt and Quality Issues. In the backlog, we categorize items as Technical Debt or Quality Issues. This can form part of the product backlog if you are working with agile practices, so long as we categorize the items.
When we take on debt or discover a Quality Issue, we create an issue in the backlog with the Remediation Cost. We can, for example, use hours as our measurement of cost. If you are working with agile practices this will be no different from estimating the work required for other items in the backlog. From the backlog, we will be able to calculate the total number of hours required to remove our Technical Debt and our Quality Issues.
At regular intervals, a Technical Quality Review (TQR) is held, for example, every two weeks, or at the end of an agile sprint. The TQR allows the team to review each item in the backlog, update the remediation times as required, and prioritize items for work. The result of the review is an up-to-date Remediation Cost for all items in the backlog. An important aspect of analyzing Technical Debt and Quality Issues is to keep a record of the Remediation Cost over time since this data can be used later to plot trends and analyze technical quality.
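A minimal sketch of such a backlog, assuming hours as the unit of Remediation Cost (the item names, categories and structure here are illustrative, not prescribed by any tool):

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    TECHNICAL_DEBT = "Technical Debt"
    QUALITY_ISSUE = "Quality Issue"

@dataclass
class BacklogItem:
    title: str
    category: Category
    remediation_hours: float  # re-estimated at each Technical Quality Review (TQR)

def total_remediation(backlog: list, category: Category) -> float:
    """Sum the Remediation Cost (in hours) for one category of backlog item."""
    return sum(item.remediation_hours for item in backlog if item.category is category)

# Hypothetical backlog contents for illustration.
backlog = [
    BacklogItem("Temporary order-import workaround", Category.TECHNICAL_DEBT, 80),
    BacklogItem("Hard-coded tax rates", Category.TECHNICAL_DEBT, 40),
    BacklogItem("Intermittent checkout failure", Category.QUALITY_ISSUE, 30),
]

print(total_remediation(backlog, Category.TECHNICAL_DEBT))  # 120
print(total_remediation(backlog, Category.QUALITY_ISSUE))   # 30
```

Recording these totals at every TQR gives the time series of Remediation Cost that the later trend analysis depends on.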
Use Investment instead of Development Cost
From a business perspective, we know that Technical Debt and Quality Issues harm product development, and we want to measure how this affects value delivery. Sometimes we want to focus on delivery at the cost of quality, but we don’t want to end up in an impossible situation where the quality of the product is so poor that it seriously affects delivery. Instead of solely looking at the code base, perhaps we can gain more insight into costs if we consider comparing Quality Issues and Technical Debt against investment. We normally have an idea of the current investment placed in the maintenance and development of the product, and should even have access to historical data on investment. It is also reasonable to assume that there is an investment plan or budget for the product.

So, if we know how much investment we have available, it is useful for the business to know how much of the investment is required to resolve Quality Issues and Technical Debt. This will also enlighten the business as to how much of the investment goes to value delivery. Using the information in the Technical Quality Backlog we can start to plot the Quality Issues and Technical Debt against investment. For example, say we have a yearly development budget of 60,000 hours. This would mean we can perform a TQR every month against 5,000 hours of investment.
The above graph shows how we can track Quality Issues and Technical Debt remediation costs in hours. This puts the cost in real terms so we can work out how many resources are required to resolve technical quality issues, and how many hours can be allocated to producing value-added features. While this is useful as a graph and analysis of cost, this does not perhaps give a ratio that can be used as a relative indicator. However, we can apply the same formula as TDR but use Investment instead of Development Cost, to gain a Payback Ratio.
This can be calculated for both Quality Issues and Technical Debt; together these provide a ratio for the cost of maintaining technical quality.
The above graph looks much the same as the graph showing Remediation Cost, with the exception that it can be compared with other products to assess the Payback Ratio. If the technical debt or issues are not resolved, they will continue to eat into the budget and affect value delivery. Of course, we can increase investment to reduce our Payback Ratio, but that requires a decision to inject investment. This should raise the question of why the debt has not been paid off within the current budget, and prompt analysis of what is causing the issues.
It is also important to stress that a ratio and a cost are complementary measures, and both should be tracked.
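Assuming the Payback Ratio is simply the Remediation Cost divided by the investment available for the period (the TDR formula with Investment substituted for Development Cost, as described above), the calculation is a one-liner; the worked numbers below use the 60,000-hour yearly budget example:

```python
def payback_ratio(remediation_hours: float, investment_hours: float) -> float:
    """Share of the period's investment needed to resolve debt or issues."""
    return remediation_hours / investment_hours

# Yearly development budget of 60,000 hours, reviewed monthly
# -> each TQR is assessed against 5,000 hours of investment.
monthly_investment = 60_000 / 12

debt_ratio = payback_ratio(1_250, monthly_investment)   # 0.25: a quarter of the
                                                        # month's budget is owed
```

The hour figures are hypothetical; the point is that the ratio is relative, so it can be compared across products with different budgets, while the raw hour cost cannot.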
Tracking Payback
It is perhaps also useful to track how much debt and issues are being resolved. If we see that we are using the investment to resolve debt and issues, but continue to see a rising Remediation Cost, this would be an indicator that serious action is needed to address quality issues in product development.
In the graph above we can see the resolved debt and issues represented as a percentage of the actual investment used to resolve technical quality. We can also see that although an effort is being made to reduce debt and issues, it is not having the desired effect on our Remediation Costs.
Resolved debt and issues can be calculated by logging the working time in the Technical Quality Backlog. This, however, does require discipline from those carrying out the work, so that the actual time used to resolve the issues is logged.
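The warning signal described above (effort is being logged against remediation, yet the total Remediation Cost keeps rising) can be expressed as a simple check. This function is my own illustrative sketch, not part of the method as published:

```python
def payback_warning(resolved_hours: list[float],
                    remediation_cost: list[float]) -> bool:
    """True if remediation effort is being spent across the recorded periods
    while the total Remediation Cost still trends upward - the signal that
    quality problems in product development need serious attention."""
    spending_effort = sum(resolved_hours) > 0
    cost_rising = remediation_cost[-1] > remediation_cost[0]
    return spending_effort and cost_rising
```

Both lists would come straight from the Technical Quality Backlog: the hours logged against remediation items per TQR, and the total Remediation Cost snapshot per TQR.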
Debt Creep
Debt Creep is what can get businesses into trouble. For example, we might take on what seems like a small debt, and three months later the Remediation Cost for that debt is three times the original value. This is Debt Creep, and we need to understand how the cost of our debt changes over time; the same applies to Quality Issues. Estimating Debt Creep is challenging, as it is difficult to determine just how future development will affect a particular debt. This is why a regular TQR is used to continually assess items, update remediation times, and prioritize items for work.
The following are examples of activities that affect Debt Creep:
Further modification to the Technical Debt (further workarounds)
Creating dependencies to the Technical Debt (increases the scope of change)
Changes in the team regarding skills or competence (degradation of knowledge)
Changes in the designs and implementations surrounding the debt (the rest of the product progresses, but the Technical Debt remains unchanged)
In an attempt to estimate Debt Creep we can use the historical trend we have recorded from the Technical Quality Backlog, and then apply that trend forward in time. As can be seen in the graph we also apply this to Quality Issues.
We can estimate the future debt and quality issue cost by taking an average deviation of the historical results (for example, the last five results) and plotting it cumulatively at several intervals into the future.
This gives an indication of the Technical Debt/Quality Issue trend based on current information. The weakness of this method is that it does not account for the future state of product development, for example if development enters an intensive period, if more investment becomes available to the project, or if there are plans to replace parts of the product. However, as an indicator it is useful for drawing attention to what can happen if we do not gain control over our debt and issues.
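A minimal sketch of this projection, assuming "average deviation" means the mean period-on-period change across the last few TQR results (the window size and function name are my own choices):

```python
def forecast_creep(history: list[float], periods: int, window: int = 5) -> list[float]:
    """Project Remediation Cost forward by accumulating the average
    period-on-period change observed over the last `window` TQR results."""
    recent = history[-window:]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    avg_delta = sum(deltas) / len(deltas)
    projection, current = [], history[-1]
    for _ in range(periods):
        current += avg_delta
        projection.append(current)
    return projection

# Five monthly TQR snapshots rising by 10 hours per month, projected
# three months ahead:
forecast_creep([100, 110, 120, 130, 140], periods=3)   # [150.0, 160.0, 170.0]
```

Run against both the debt and the quality issue series from the backlog history, this produces the two forward-trend lines described above.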
No Silver Bullet
While it would be nice to have a simple solution for estimating the costs of Technical Debt and Quality Issues, an accurate Remediation Cost requires estimating many different factors. Quality metrics and automated tools may help in assessing the quality of a technical product and provide valuable input to an estimate, but these metrics alone are difficult to translate into real costs.
We can measure Technical Debt and Quality Issues by maintaining a backlog, in the same way we maintain backlogs for other items in software development. Keeping track of Quality Issues and Technical Debt separately provides the business with an important indicator. Technical Debt is intentional, and the business gains value through an early-release time trade-off, while Quality Issues are unintentional and provide no value to the business. The causes of Quality Issues are very different from those of Technical Debt; for example, we may have to resolve problems with processes, organization or skills. Keeping a Technical Quality Backlog up to date with the Remediation Costs for both Technical Debt and Quality Issues provides a basis for estimating the Remediation Cost. This provides a cost for maintaining technical quality based on the ability and judgement of the team managing the product. We can use investment and Remediation Costs from the Technical Quality Backlog to calculate a Payback Ratio and an absolute cost. This provides the business with an indicator of how much investment is being used to deliver value and how much is being used to maintain technical quality.
The term technical debt seems to cover all manner of sins these days. It is sometimes used, in my opinion incorrectly, as a motivation to adopt new technologies or features. It is also often used to describe any kind of technical quality issue with a product.
Ward Cunningham, the original author of the term, indicates that the debt metaphor is about making a gain in terms of earlier delivery time: rush the software out to delivery, accepting that the design and implementation are below expectations, then quickly refactor the software with the experience gained in order to pay back the debt. Interestingly, he notes that the quality of the code must be good in order to make refactoring manageable.
I recently read an article where technical debt was characterized as being deliberate or inadvertent. This made me reflect on what we mean by technical debt, and how it fits with QA (quality assurance). The difference is that technical debt can be a good thing: it helps you deliver a product earlier, at the cost of correcting the debt in the future. A QA issue, for example poor code or poor design, has no gain whatsoever; it provides only a constraint on future development.
I would define technical debt as having three characteristics:
There is a decision to accept the debt, debt doesn’t just appear.
There is a clear value gain from the debt, for example in terms of delivery time.
There is a specific intention to pay the debt back in the short term.
The decision to accept debt means that we know how we are expected to deliver something, but we intentionally choose a sub-optimal implementation in order to gain a specific value, often quicker delivery. If it is technical debt, we are aware that it exists and we intend to fix it in the short term. That suggests we have a plan for managing the debt, for example tasks or stories in the backlog.
Technical debt differs from QA issues. If the product is suffering from quality issues, such as poor code, missing designs or a weak architecture, this is not technical debt in my view. This is just poor quality: there is no gain in delivery time, there is no trade-off. While technical debt is controlled, quality issues are inadvertent and indicate that action is required in the QA processes to ensure better quality. This is not the case with technical debt. The decision to take on technical debt is made knowingly, and it is not a QA issue unless it doesn’t get fixed as promised.
It is also worth remembering that quality and debt are outcomes. What is really interesting is the cause of poor quality or rising technical debt. In the case of technical debt, we might find that continuous pressure on delivery causes the debt to grow. There is no time for the development team to pay back the debt, so the debt grows exponentially. If the debt is left too long, it will eventually reach a point where it is no longer debt but a constraint: it will simply be too expensive to fix, and the best that can be done is damage limitation.
Quality issues, by contrast, are identified and resolved by employing good QA practices, for example tests, mentoring or reviews. They appear unexpectedly when the QA practices are not sufficient to identify problems early in the development process, for example badly written code or poor design. Unlike technical debt, there is no value gain; in fact, delivery of the product would probably have been faster if quality had been maintained in the first place. However, just like technical debt, if these issues are not addressed in the short term, they will reach a point where they are simply too expensive to fix, and this will place a constraint on the product. I think the important takeaway here is that technical debt is intentional and is a good thing if used wisely. As long as you look after your debt and pay it back in a timely fashion, the trade-off for quicker delivery can be very advantageous. Quality issues, however, are never a good thing; they only lead to problems with the product and give no value gain. They should be minimized by continually improving QA processes and fixing issues at the earliest opportunity.
When an organisation makes a significant strategic change, this can result in the need for a large-scale digital transformation. This may feel like you have a mountain to climb, or perhaps several mountains. The organisation may need to modify or gain capabilities in order to meet market demands. This will likely have a profound effect on the architecture of the organisation, for example, decommissioning of legacy systems, launch of new systems, changes in work processes or re-structuring the organisation. These types of changes are often long-term engagements since they affect so many aspects of the organisation’s business, and since the scope of change is often large, they represent a challenge in managing the transformation.
In such large-scale transformations the architect can aid a smooth transformation by providing a solid plan describing how the architecture will change over time. This delivers the following benefits:
Provides a common plan for all stakeholders
Helps make the transformation manageable
Ensures the right activities are done at the right time
Helps to mitigate business risk
Provides tangible deliverables and goals for the transformation
I often use a transition architecture approach when working with these large changes. This approach is inspired by methods described in TOGAF ADM (in particular Phase E: Opportunities and Solutions) and provides an iterative approach to the transformation.
When starting a digital transformation, we should know at least two things before executing the strategy:
The current state of the architecture (sometimes referred to as “as-is”)
How we want the architecture to be when the transformation is delivered (sometimes referred to as “to-be”)
The current state of the architecture is the Baseline Architecture. This is the architecture of the organisation which is deployed and operational today. We can describe the Baseline Architecture by analyzing the current operations within the organisation.
How we want the architecture to be, is largely formed by strategic objectives, which are aligned with the goals for the digital transformation. The objectives provide the architect with a foundation on which to construct the future architecture. This is the Target Architecture.
Since the Target Architecture is often planned for the long term, it is prone to change. So, we can think of the Target Architecture as a guiding light. It is important to get the balance of detail right in the Target Architecture: just enough detail to guide the transformation. Putting too much detail into the Target Architecture will make it difficult and time-consuming to maintain as changes occur over time.
Working with just the Baseline Architecture and the Target Architecture presents a problem. The gap between the two architectures is often substantial, and this presents a challenge in managing the transformation in a single jump. We also want to deliver value to the organisation and its customers as quickly as possible, so it is an advantage to deliver parts of the transformation as we go. This is where we can use Transition Architectures to make management of the transformation easier and deliver value incrementally.
The Transition Architectures are like milestones on the way to the Target Architecture. These contain the detailed architecture descriptions and the closer the transition is to delivery, the more detailed the architecture description.
Starting from the Baseline Architecture we can consider how we can change the current architecture in a series of deliverables which will eventually result in the Target Architecture. A good principle is to think about the minimal changes that deliver value in each stage, and the business priorities of the organisation. For example, in the first transition we may focus on a specific functionality, capability or organisational unit which delivers high value.
In large transformations, it is common that each transition will require changes to a number of aspects in the organisation, for example, several systems may change, work processes change, training for personnel is required or infrastructure is modified. These can be viewed as the deliverables of the transition, and may well be executed as separate projects.
Since each digital transformation is unique, the viewpoints used to describe the architecture depend on the type of transformation required. The way the architect describes the architecture is specific to the transformation scenario and relies on the skill and experience of the architect to choose the right viewpoints.
There are many viewpoints which can be used to describe a transition architecture. The diagram above shows an example of the logical system landscape viewpoint, which I often use to show how the system landscape will change. This viewpoint shows all the systems in the landscape and their information flows, where each information flow can be regarded as a dependency. This is often a useful way to communicate a transition architecture with stakeholders, as it shows existing systems in the system landscape and systems which will be launched when the transition is complete (boxes with dotted lines). Variants of this viewpoint can be used to show other aspects of the system landscape, such as systems which are decommissioned, systems which are modified, or systems which are unchanged. This viewpoint can be used with many types of stakeholders, and deep technical knowledge is not required to understand the system landscape.
Understanding the information flow between systems in a transformation is really useful, as it provides all stakeholders with an understanding of complexity and highlights the need for collaboration between the various product/system owners. For example, perhaps we cannot decommission a system until some other systems are in place, or we cannot change a process until the supporting systems are launched.
A digital transformation which requires several transitions would, in this case, have a logical system landscape view for each transition, showing the progression towards the Target Architecture.
As mentioned earlier, we are often working with long timescales (often years) and should expect the Target Architecture to change during the execution of the transformation. It is therefore important to continually review the Target Architecture and adjust the Transition Architectures to make sure they are aligned. I find that working with a large digital transformation in this way increases manageability, and provides stakeholders with a common view of deliverables over time. Using this method facilitates delivering value as quickly as possible without having to fully implement the Target Architecture, which aids agility. At the same time, the Target Architecture allows stakeholders to adjust their strategy and future state, providing a long-term guiding light for the transformation.
In general, I would say that most people in the IT world would agree that a good architecture is of fundamental importance to any successful technology development. How would the Golden Gate Bridge have looked, if not for the architectural talents of Joseph Strauss? IT architecture is a profession, and the people who practice architecture require a specific set of skills and experience in order to develop good architectures. Recently, the concept of Agile Architecture has come to consider architecture from two perspectives: Intentional Architecture and Emergent Design. While I have no argument with an evolutionary approach to architecture, I think it is relevant to question how much of Emergent Design is architecture.
As a starting point I always like to go back to the famous quote by Grady Booch.
“All architecture is design but not all design is architecture. Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change.”, Grady Booch
What I like about this statement is that it gives us some leading principles for both what an architecture is and what the role of an architect should be.
Firstly, architecture is design. That means we make some kind of plan up front in order to help someone build something tangible. If you are building without a design, then you are just building ad hoc, generally not recommended for any complex problem, software or otherwise.
Grady indicates that architecture is about “significant design decisions”, and the real emphasis here is on the word significant. When designing a large system or enterprise, a great number of design decisions will be made, but not all of these are architectural decisions. Many designs and decisions are made without having an effect on the architecture. Drawing the line between architecture decisions and software design decisions is perhaps the cause of friction between some architects and development teams, and it is a delicate balance. As architects, we need to avoid micro-managing design, but at the same time make the significant architecture decisions, without which a weak architecture is the result.
So where do we draw the line between architecture and other forms of design? Grady defines significant in terms of the “cost of change”, where that cost can come in many different forms. I consider this to be not just the cost of making the changes to the technology, but the cost to the business. Examples of the cost of change include effort from business roles other than developers, testing, cost due to delayed projects, or the cost of sub-optimization when an optimal design can no longer be achieved. The significant “cost of change” is the cost to the business, not just the effort from the development team.
So, which significant decisions in a technology development drive costs?
These are often decisions which are difficult to reverse. Changing such a decision may result in weeks or months of work. This is usually because other architectural or software design decisions will be made based on these significant decisions. Thus, changing such significant decisions will change a number of designs and technologies, requiring greater effort and time.
This is perhaps what we would call the Intentional Architecture in agile terms. These architecture decisions are required up-front and usually require a good deal of analysis before moving to execution. The reason being that the cost of change to the business has a significant effect on value delivery.
The following are some examples of decisions which fully or partly fall into this category:
Style and Patterns – the choice of architectural style (for example service-oriented, tiers, cloud-based), or the choice of architectural patterns (for example MVC, MVVM, domain-driven). These patterns provide a foundation, structure and behavior for functional design, this is difficult to reverse, as it changes the very foundations upon which components are designed.
Critical or major functional requirements – decisions regarding the functional design of a solution, which have a serious effect if they fail. For example, if failure results in serious financial loss, damage to business reputation, or injury/loss of life.
Quality Attributes (a form of non-functional requirement) – design decisions regarding quality attributes such as security, performance, scalability, safety etc., are often critical to a successful technology delivery. Functional designs for the technology are built upon the decisions made to meet Quality Attributes. This makes these decisions hard to reverse, since they will impact much of the functional design.
Broad scope of change – decisions which have a broad impact on the business. Such a change may be broad in the terms that it affects many systems, business processes or many organizations within the business. A small technology change can drive a major change in the business.
So where does Emergent Design come into architecture? Well, to start with we have to remember that all design is not architecture, the architecture designs are of a significant nature. This is important for the architect and other roles which perform design, for example, lead devs, developers and UX. There needs to be room to take decisions close to the development team in order to facilitate agility. The architecture should provide a guide and a framework which helps the team to meet the architectural requirements, and feel confident about making their own design decisions.
Emergent Design is often design which is much closer to implementation and may well arise from prototyping in the source code. The question we have to ask ourselves here, is if we are happy to let these designs emerge so close to implementation, how significant are these decisions? Perhaps they are not even part of the architecture, but in fact just part of the solution design. Although, it may well be the case that a design pattern emerges during the solution design and may be adopted as part of the architecture.
While I don’t doubt the merits of working with Emergent Design, the value of an architecture lies in the significant and intentional design decisions. These are the design decisions which help keep a product sustainable and actually provide the platform for agility. The risk with focusing on Emergent Design as a process for constructing architecture is if you leave the significant architecture decisions until late in the development process, it might be too late. This may result in a refactoring hell, having to perform major re-designs, or in the worst case, explaining to your sponsor that the product is no longer sustainable from a cost perspective.
A number of months ago I wrote an article for IASA on Velocity for the ITaBoK. You can find the article here. It really got me thinking about what velocity really is. A high velocity is certainly a desirable outcome in any organization, and a good IT-architecture should facilitate velocity.
So how do we measure velocity? I notice that many organizations using agile practices measure velocity by the rate at which they get things done. However, this only measures the speed at which the team can work, essentially measuring throughput.
“Speed is the time rate at which an object is moving along a path, while velocity is the rate and direction of an object’s movement.”, Britannica, What’s the difference between speed and velocity?
The significant factor with velocity is that it is both rate and direction. In terms of software development, velocity becomes a much more interesting measurement if we consider the travel towards value delivery as the direction.
A problem with measuring speed, or the rate at which you get things done, is that there is no guarantee that the work which is being completed adds any value. For example, if we run several sprints with lots of refactoring, re-design or bug fixes, this can give the impression that the team is working really well doing a lot of work, but in fact there is little value being delivered.
If we instead measure only the completed work which adds value, we can see the rate of value delivered by the team. This can help identify problems early, before teams start speeding off fast in the wrong direction for whatever reason: re-design issues, technical debt, major refactoring or architectural issues. This type of velocity gives a completely different perspective for the business.
Of course, the business has to define what it considers as value. Then it is possible to compare velocity with throughput, which will give an idea of how much work is non-value. This is not to say that the non-value work doesn’t need to be done; it is often required to recover from bad decisions, technical debt, design problems, etc. In fact, it may well be intentional, since in some cases technical debt is accepted in order to meet a deadline, with the understanding that velocity will be sacrificed at a later date when the debt needs fixing. Using velocity in this context provides a great indicator of value delivery. It can help businesses identify potential problem areas, act early to fix problems and reduce waste.
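The distinction between throughput and value-based velocity can be sketched like this, assuming the business has tagged each completed item as value-adding or not (the data shape and function names are my own illustration):

```python
def throughput(completed_points: list[float]) -> float:
    """Rate of getting things done: all completed work counts."""
    return sum(completed_points)

def velocity(completed: list[tuple[float, bool]]) -> float:
    """Rate of value delivery: only items the business classed as
    value-adding count; refactoring, re-work and bug fixes do not."""
    return sum(points for points, adds_value in completed if adds_value)

# A hypothetical sprint: (story points, adds business value?)
sprint = [(5, True), (3, False), (8, True), (2, False)]

throughput([p for p, _ in sprint])   # 18 points of work completed
velocity(sprint)                     # 13 points of value delivered
```

The gap between the two numbers (here, 5 points) is the non-value work for the sprint, which is exactly the indicator the comparison above is meant to surface.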