Contact ITInvolve

Archive for April, 2014

Improving Configuration Management: Getting Control Over Drift

Monday, April 28th, 2014

Configuration Drift poses a number of challenges to your IT organization and your business; for example the risk of non-compliance with security policies, performance and availability issues, and the failed deployment of new application releases.

To address drift most IT organizations have now employed some combination of scripts (or automation tools), a configuration management database (CMDB), and have defined a software configuration management approval process.  Despite these efforts, we find that configuration drift still occurs a lot in large enterprises.

Why is this the case?

First, if you are like most IT organizations, you probably follow the 80/20 rule with your administrators focusing 80% of their time on the configuration elements they consider most important to their roles and that leaves quite a gap where drift can still occur.  What’s more, if you are using scripts and automation tools to enforce configurations, it’s important to keep in mind these approaches rely on an explicit formula – meaning you have to specify exactly which configuration settings to enforce and when.  This leaves things pretty much wide open that settings you haven’t gotten around to specifying can be changed and additional software installed that might cause problems.

For example, let’s say that your security policy states that a certain range of TCP/IP ports should not be open on a certain class of servers.  You might reinforce this policy with an automation script that routinely verifies the port status and closes any ports in the range that may have been opened through some other means.  Sounds like you’ve got things covered, right?  Well, what if that port was opened as part of a change process to deploy a new application to one of those servers, and what if those working on the project knew nothing about the TCP/IP port enforcement script.  They deploy the new application, test it to make sure all is working well, and then send out the email to the user community letting them know the new application has been launched – a great day for IT enabling the business! Then, overnight (or the next time the script is scheduled to run), the port is closed.  Users come into work the next day and are unable to access the new application, calls start coming into your service desk, an all hands on deck meeting is hastily assembled, and, after some period of time, the port closure is identified as the issue and the port is reopened – only to have it closed again the next time the script runs – until finally someone realizes this script is the underlying cause (because probably the person who wrote it is no longer there and they didn’t document it other via a notation in an audit report that a script was the enforcement mechanism selected.)

Consider another example, where we have an application that has very low utilization most days except for spikes of activity at the end of each month (such as an application that accepts orders from a dealer network).  Let’s say an engineer is looking for some available equipment to install a new application on and identifies the same server running the dealer order system as a good candidate because of its strong specs and low average utilization.  He installs the new app and everything is working great until the end of the month when the dealer network comes alive with hundreds of orders every hour.  Now because we have two applications vying for the same physical components, we start to see performance issues and scramble move the new application to other hardware, taking it offline in the process, and putting it on an available server with lesser specs causing it to run slower than before irritating the user community even further.  In this scenario, your automation scripts would have done nothing to prevent this drift from the expected configuration (i.e. the dealer order system is the only application running on this box), because they would have no awareness that the new application even existed.  What’s more, automation could have actually made things worse if you had employed a strategy to periodically wipe and rebuild your machines (these are referred to as “phoenix servers” and it’s another strategy some have tried to reduce drift) – because, in this case, if you had followed such an approach your new app would have been erased from your data center entirely at the new rebuild.

So how can you get control over drift and avoid these sorts of issues?

First, the scripts and automations you have running need to be documented including what they do, when they run, and who is responsible for them.  With this information, you can make people proactively aware of any script and configuration conflicts as part of your change and release management process.  This will help you avoid the first example where the TCP/IP port was unexpectedly closed, because your team is more aware of and can account for the fact that there needs to be an exception to your TCP/IP port range – not only updating the script to reflect this but also documenting the exception proactively for your auditors.

Second, with accurate documentation about how your environment and key applications are configured, you can better understand why that dealer order system was running on equipment all by itself (because the tribal knowledge about the end of month peak loads was documented), and you can then also compare the current state against the expected state to identify drift issues and take action to address them as appropriate.  For example, you might trigger an incident and assign ownership to the relevant administrators who own the automations for that equipment and/or applications.

ITinvolve’s Drift Manager can help you implement both capabilities and more.  Drift Manager helps you document scripts and automations as well as “gold standard” configuration settings leveraging information you already have (via importing or federation) while also capturing the undocumented tribal knowledge and validating it through social collaboration methods and peer review.  Drift Manager also helps you compare the current vs. expected state in real-time and then facilitates raising drift incidents when required.  What’s more, ITinvolve helps you “broadcast” upcoming configuration changes so all relevant experts are included in your configuration management process and can fully assess the risk and implications to avoid the kinds of issues discussed above.  Finally, it ensures your teams are aware of the policies that govern your resources so that, as configuration changes are being considered, the potential policy impacts are considered at the same time.

No matter your approach, configuration drift will happen.  The question is, do you know about it when it happens and can you get the right experts quickly engaged to address it without causing other issues?

Matt Selheimer
SVP, Marketing

Merging Creation With Operations

Monday, April 21st, 2014

How facilitated collaboration enables continuous delivery for successful DevOps

by John Balena

So much has been written lately about the challenge of improving IT agility in the enterprise. The best sources of insight on why this challenge is so difficult are the CIOs, application owners, ecommerce and release engineering executives, and VPs of I&O, grappling to change their organizations right now.

At a conference I attended recently, I met two Fortune 100 IT executives from the same company: one the head of development and the other operations. Their story is emblematic of just how hard this is in the real world. As interesting background, both the development and operations leaders were childhood best friends, participated in each others’ weddings, and spend time together socially on an almost weekly basis – but by their own admission, even they couldn’t get effective collaboration and communication to work between their two organizations.

The lesson learned from this example is that the DevOps collaboration and communication challenge cannot be solved by sheer will, desire or executive fiat. Instead, you must breakdown the barriers that inhibit collaborative behavior and facilitate new ways of communicating and working together. The old standbys of email, instant messaging, SharePoint sites, and conference calls don’t cut it.

The challenge of two opposing forces: Dev and Ops

Imagine yourself helping your children put together a new jigsaw puzzle. Each time you turn your attention to a specific piece, the kids reorganize what you have already completed and they add new pieces, but in the wrong places. For sure, three pairs of hands can be better than one, but they can also create chaos, confusion and significantly elongate the completion of the puzzle.

The collaboration challenge in the DevOps movement is grounded in this analogy. How do you get multiple people working together across teams, locations, and time zones to build and get things deployed faster without chaos, confusion, and delay? How do you get these teams to speak the same language and collaborate together with a singular purpose when their priorities and motivations are so different?

Faced with this challenge, it’s easy to see why many organizations have stayed in their comfort zone of ‘waterfall’ releases and keep the number of releases per year small. The issue is this method isn’t meeting the demands of the business, the market and competition. As a result, more and more business leaders are going around their IT organizations. Options like public cloud, SaaS, open source tools, skunk works IT and outsourcing are making it easier for them to control IT decisions and implementations within the business unit or department itself.

So let’s dive deeper to understand the two forces at the heart of the issue: development (focused on the creation or modification of applications to support a business need) and operations (delivering defined services with stability and quality). It appears these forces are working in opposition, but both groups are focused on doing what leadership asks of them.

Developers tend to think their job is done once the application is rapidly created, but it’s not because no actual value has been delivered to the business until the application is operational and in production. Operations is severely disciplined when services experience performance and availability issues and they have come to learn that uncontrolled change is the biggest cause of these issues.  As a result, operations teams often believe their number one job is to minimize change to better control impact on performance and availability. This causes operations to be a barrier to the rapid change required to give the business the speed and agility it needs.

Critical to enabling DevOps is an explicit recognition of this situation and the ability to link discrete phases of the application development and operations lifecycle to enabling ‘fast, continuous flow’ – from defining requirements, to architecting the design, to building the functionality, to testing the application, to deploying the application to both pre-production and production environments and to managing all the underlying infrastructure change required for the application to operate efficiently and effectively in all environments.

Why current approaches don’t work

There are several challenges in achieving this ideal.

  1. Developers hate to document (can you blame them?), and, when they do, their communication is in a context they understand, not necessarily empathetic with the language that operations speaks. The view from operations is that the documentation they receive is incomplete, confusing, and/or misleading. With the rapid pace of development, this challenge is getting worse with documentation becoming more and more transient as developers “reconfigure the puzzle” on the fly.
  2.  Today’s operations teams typically take responsibility for production environments and their stability. That means there is usually a group wedged in between the two – the quality assurance (QA) team. QA’s job is to validate the application works as expected and often they require multiple environments for each release. This group is typically juggling multiple releases and is, in essence, working on and reconfiguring multiple puzzles at the same time. The challenge of keeping QA environments in sync with both in-process releases and production can be maddening (just talk to any QA leader and they’ll tell you first-hand). The documentation coming from development is inadequate, and the documentation coming from production is often no better, since most operations teams store much of their most current information about configurations in personal files or simply in their brains.
  3. The ad hoc handoffs from development to operations and QA take time away from development’s primary mission: creating applications that advance the business. Some suggest developers should operate and support what they code in order to reduce handoffs and the risk of information distortion or loss. A fundamental risk with this approach is opportunity cost. Does a developer really understand the latest and greatest technology available for infrastructure and how to flex and scale those technologies to fit the organization need? Do you even want them to or would you rather they be coding instead?
  4. Others have suggested that Operations move upstream and own all environments from dev to QA to production, and treat configuration and deployment scripts as code just like a developer would. This may sound like a good option, but it can create a constraint on your operations team and cause valuable intelligence to become hidden in scripts. A particular application deployment could have one or more software packages required and potentially hundreds of different configuration settings. If all that information is embedded in a script, how will other team members know this if they go to make a change to the underlying infrastructure to apply a security patch, upgrade an OS version, or any of the other changes made in IT every day?

Real DevOps transformation doesn’t mean that you give everyone new jobs, instead, it’s about creating an environment where teams can collaborate together with a common language and where information is immediately available at the point of execution and in a context unique to each team.

A better way forward?

In The Phoenix Project, written by DevOps thought leaders Kevin Behr, Gene Kim and George Spafford, the authors promote the need for optimizing the entire end-to-end flow of the discrete processes required for application delivery – applying principles as they achieved agility in the discrete manufacturing process.

Manufacturing in the 1980s resembled IT operations today, employing rigid silos of people and automation for efficiency and low cost, but this became a huge barrier to the agility, stability and quality the market demanded. They learned if you optimize each workstation in a plant, you don’t optimize for the end-to-end process. They also learned that if quality and compliance processes were ancillary to the manufacturing process, it slowed things down, drove up costs and actually decreased quality and compliance.

Successful manufacturers brought a broader view and optimized end-to-end flow rather than operate in a particular silo. They also brought quality and compliance processes inline with the manufacturing process. By addressing quality and compliance early in the cycle and at the moment that an issue occurred, cycle times decreased significantly, costs plummeted and quality and compliance increased dramatically.

These same principles can be applied to IT resulting in:

  • faster time to market;
  • greater ability to react to competitive pressures;
  • deployments with fewer errors
  • continuous compliance with policies; and
  • improved productivity.

DevOps can best be realized when IT operates in a social, collaborative environment that ensures all groups are working with a visual model in their context with the necessary information from downstream and upstream teams, as well as in collaborating with relevant experts at the moment when clarifications are needed or issues arise.

To merge creation with operation, the core idea behind DevOps, requires a cultural change and new methods in which cross-functional teams are in a state of continuous collaboration as they deliver their piece of the puzzle at the right time, in the right way, in-context with the other teams in other silos. Operating something that never existed before requires documentation so that operations teams have the information they need to manage change with stability and quality.

With more modern collaboration methods, self-documenting capabilities are now possible as development, release and operations do their respective jobs, including visualization of documentation with analytics and with the perspective and context each team needs to effectively do their job downstream. These types of capabilities will transform organizational culture and break down barriers to collaboration that impede agility, stability and quality.

Is this simply nirvana, and unachievable in the real world?  No. Manufacturing achieved the same results by applying these principles; the fundamental point being made in The Phoenix Project.

The goal is not to write code or to keep infrastructure up and running, or to release new applications or to maintain quality and compliance. Instead, the goal is for IT to take the discrete silos of people, tools, information and automation, and create a continuous delivery environment through facilitated collaboration and communication. This will drive the cultural and operational transformation necessary to enable IT to respond to business needs with agility while ensuring operational stability and quality.

John Balena is senior vice president of worldwide sales and services at Houston-based ITinvolve. He formerly served as the line of business leader for the DevOps Line of Business at one of the “Big 4” IT management software vendors.