In my previous post, I described what ITIL® is, and from that you can see that it is not something you can just do. Now, I am hardly an expert on ITIL, but I will try to use my understanding of the framework to see how it can be applied to smaller shops to improve their IT services.
I believe there are four main areas that can be targeted for implementing ITIL in a smaller IT organization. The four areas are best summarized by the questions they strive to answer - How are we doing? Where does the business want us to be? How do we get there? How do we keep improving?
Step 1 - How are we doing?
What does this require? First, you need to be capturing all incidents in your help desk software. This includes the issues that go directly to your Inbox or the call received on your cell phone on the way in to the office. Additionally, these incidents should be correctly categorized by the faulting component (server, software, hardware, etc.) before the incident is closed. That component can then be linked to the service(s) provided to the business.
This process can serve IT by giving you a complete picture of which components have frequent issues and which services are affected by those components. I believe this is a good first step because most organizations already have help desk software in use, but maybe it is not being used to its fullest potential.
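As a rough illustration of the idea, here is a minimal Python sketch that tallies closed incidents by faulting component and then by affected service. The component names, service names and ticket data are all made up for the example; in practice they would come from an export of whatever help desk software you use.

```python
from collections import Counter

# Hypothetical mapping from each component to the business services it supports.
COMPONENT_SERVICES = {
    "mail-server": ["Email"],
    "file-server": ["File Shares", "Accounting"],
    "accounting-app": ["Accounting"],
}

# Hypothetical closed incidents as (ticket id, faulting component).
INCIDENTS = [
    (101, "mail-server"),
    (102, "file-server"),
    (103, "file-server"),
    (104, "accounting-app"),
]


def incidents_by_component(incidents):
    """Count closed incidents per faulting component."""
    return Counter(component for _, component in incidents)


def incidents_by_service(incidents, component_services):
    """Count incidents per affected service, using the component mapping."""
    counts = Counter()
    for _, component in incidents:
        # A component with no mapping contributes to no service.
        for service in component_services.get(component, []):
            counts[service] += 1
    return counts


if __name__ == "__main__":
    print(incidents_by_component(INCIDENTS).most_common())
    print(incidents_by_service(INCIDENTS, COMPONENT_SERVICES).most_common())
```

Even a small report like this only works if every incident is logged and categorized before it is closed, which is the real point of this step.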
Step 2 - Where does the business want us to be?
In ITIL Service Design, there is a key process called Service Level Management. The output of this process is a written Service Level Agreement (SLA) where the business and IT organization come to agreement on key service targets and responsibilities. In this context, the SLA is not a contract with performance penalties like the ones we are used to seeing with our suppliers. Instead, the SLA is where expectations are set between the business and IT. It is really used to open up a dialog which gives IT a set of boundaries and priorities to work in, and gives the business or customer a certain expected level of service.
For smaller companies, a multi-level SLA is probably most appropriate. In a multi-level SLA, you may set an umbrella SLA that covers all services and defines basic uptime and availability, response times from your service desk, and recovery time and recovery point objectives for disaster recovery planning. Additional SLAs can be set up for key high-priority services where different availability levels and response times may be required.
Remember, though, that defining an SLA is a two-way street and is not meant for the business customer to make unreasonable and costly demands of IT. You cannot have an agreement without both sides meeting in the middle.
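To make the multi-level SLA idea a little more concrete, here is a small Python sketch, assuming an umbrella set of targets plus per-service overrides. The field names, the numbers and the "Email" service are purely illustrative; real targets have to come out of the conversation with the business.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlaTargets:
    availability_pct: float  # e.g. 99.0 means 99% uptime
    response_minutes: int    # service desk response time
    rto_hours: int           # recovery time objective
    rpo_hours: int           # recovery point objective


# Hypothetical umbrella SLA that applies to every service by default.
UMBRELLA = SlaTargets(
    availability_pct=99.0,
    response_minutes=240,
    rto_hours=48,
    rpo_hours=24,
)

# Hypothetical service-specific SLAs; only the fields that differ are listed.
OVERRIDES = {
    "Email": {"availability_pct": 99.9, "response_minutes": 60},
}


def targets_for(service):
    """Return the umbrella targets with any service-specific overrides applied."""
    return replace(UMBRELLA, **OVERRIDES.get(service, {}))


if __name__ == "__main__":
    print(targets_for("Email"))
    print(targets_for("Printing"))
```

Keeping only the differences in the service-level entries mirrors how the written documents tend to work: the umbrella SLA carries the defaults, and the high-priority services list only what is different.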
Step 3 - How do we get there?
With the information provided from step 1 and the expectation levels set in step 2, my guess is that you will have some work to do to close the performance gap on those SLAs. Now, you could spend a lot of time creating measurements to figure out which processes are generating the issues seen by end users; or you could just look in the mirror. The issues are probably caused by you, me, and everyone else that touches and changes our IT systems. Unless you are going to stop making changes to your systems, it is time to start managing those changes in a more controlled manner.
ITIL Service Transition describes the processes for change management and configuration management. This is a huge topic that cannot be covered in a few paragraphs. However, to start, I believe you first need to classify the changes made to your systems and weed out the riskiest of the bunch. ITIL defines three types of changes - Standard, Normal and Emergency.
Normal changes are going to be the most meaningful changes to your services. This includes developing a patch to an internal application, deploying new servers or replacing existing services. This process is where most errors are introduced and should be more tightly managed. In general, this means having a Change Advisory Board that includes customer input, systems to test changes and perform user acceptance testing, and established rollback procedures in case the change results in failures. How many of us don't do enough testing or don't solicit enough feedback from the business?
Standard changes are those that are part of a well-known process with minimal risk and are usually performed in an automated process. This could include patching your Windows servers with WSUS on a monthly basis or making file permission changes at the request of a manager. The key is that the process is well defined and infrequently causes failures. These Standard changes do not need to go through the full change management process involved with a Normal change and may just require an administrator logging the change in some system.
Emergency changes are changes that need to be deployed quickly to resolve an outage or service degradation issue. Obviously full testing and user acceptance may not be performed here. Additionally, the change may be approved in an expedited manner by management and not involve a full change advisory board.
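The three change types above can be sketched as a simple triage function. This is only an illustration under my own assumptions: the catalog of pre-approved procedures and the list of steps per type are simplified from the descriptions above, not taken from ITIL itself, and the procedure names are hypothetical.

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


# Hypothetical catalog of well-defined, low-risk procedures.
STANDARD_CATALOG = {"monthly-wsus-patching", "file-permission-change"}

# Simplified handling per change type, based on the descriptions above.
REQUIRED_STEPS = {
    ChangeType.STANDARD: ["log the change"],
    ChangeType.NORMAL: [
        "Change Advisory Board review",
        "testing",
        "user acceptance testing",
        "rollback procedure",
    ],
    ChangeType.EMERGENCY: ["expedited management approval"],
}


def classify_change(procedure, resolves_outage=False):
    """Classify a change request as Standard, Normal or Emergency."""
    if resolves_outage:
        # Needed quickly to fix an outage or service degradation.
        return ChangeType.EMERGENCY
    if procedure in STANDARD_CATALOG:
        return ChangeType.STANDARD
    # Anything not pre-approved goes through the full process.
    return ChangeType.NORMAL


if __name__ == "__main__":
    for name in ("monthly-wsus-patching", "deploy-new-server"):
        change_type = classify_change(name)
        print(name, change_type.value, REQUIRED_STEPS[change_type])
```

Defaulting anything unknown to Normal is a deliberate choice in this sketch: a change only earns the lighter Standard treatment once its process is well defined and has a track record.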
Step 4 - How do we keep improving?
OK, so you have your help desk in order, have an SLA in place with your business units and are now managing your riskiest changes through a defined change management process. Now what? How do we keep improving our services and reducing costs? How do you know your SLA is being met or exceeded? Through Continual Service Improvement (CSI) you can address this.
In CSI, you start to define measurements and metrics for your various services, processes and technologies. This should be done in a top-down manner, first considering the business vision and the value provided by IT towards that vision. From there, identify which services provide that value and which components (hardware, software, etc.) or processes support those services. This means proactively monitoring something like disk usage on a DB server and being able to tie that monitoring data to a service or services which are providing value to the business. It also means keeping statistics on your change management process to see which changes cause failures and why. CSI will allow you to apply resources and money where they will be most beneficial.
Implementing ITIL is definitely not an overnight process and can take some organizations seven or more years to complete. In addition, for smaller organizations, implementing many parts of the ITIL Service Lifecycle is just not feasible with limited resources and the many hats that most IT staff members already wear. However, as outlined above, I do believe there are parts of ITIL that can bring benefits to even the smallest of IT shops.
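Two of the simplest measurements to start with are service availability against the SLA target and the failure rate of changes by type. The Python sketch below shows one way to compute both; the function names, the 30-day reporting period and the sample data are my own assumptions for illustration.

```python
from collections import defaultdict


def availability_pct(period_minutes, downtime_minutes):
    """Percentage of the reporting period the service was available."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def meets_target(period_minutes, downtime_minutes, target_pct):
    """True if measured availability meets or exceeds the SLA target."""
    return availability_pct(period_minutes, downtime_minutes) >= target_pct


def change_failure_rate(changes):
    """Fraction of failed changes per change type.

    `changes` is a list of (change_type, failed) tuples.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for change_type, failed in changes:
        totals[change_type] += 1
        if failed:
            failures[change_type] += 1
    return {t: failures[t] / totals[t] for t in totals}


if __name__ == "__main__":
    month = 30 * 24 * 60  # minutes in a 30-day reporting period
    print(round(availability_pct(month, 90), 2))
    print(meets_target(month, 90, 99.9))

    # Hypothetical change log for the same period.
    log = [
        ("normal", True),
        ("normal", False),
        ("standard", False),
        ("standard", False),
        ("emergency", True),
    ]
    print(change_failure_rate(log))
```

Numbers like these are only useful if they are tied back to a service the business cares about, which is why the incident categorization from step 1 matters so much.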
ITIL® is a Registered Trade Mark of the Office of Government Commerce in the United Kingdom and other countries.