System Cutover Planning
The team has worked like Trojans and has built a system
that satisfies all the business requirements, tested it and verified that it
meets all the quality standards set for it, and they’ve delivered it on time.
Now you can sit back and congratulate yourself on what a great job you did
managing this project. Well, not quite. Your job isn’t done until the new
system is in production and responsibility for support has been turned over to
operations. To get there you need to plan the perfect cutover.
Is the System Ready?
Before planning the cutover there are some questions that
need to be answered, such as “How will the system perform in the
production environment?”, “Do we have all the data we need for production?”, and
“Can the new system support all the users who need to use it?” These
questions should all be answered before you are ready for a production
cutover. Let’s tackle the questions in order. The answer to the first question,
obviously, should be “it will perform just as well as it did in the test
environment.” You can answer the question this way because a complete round of
testing was performed in a test environment which duplicates the production
environment in every way except the presence of the users. This environment is
sometimes called the staging environment. The server which runs the system
should be a clone of the production server. If the system runs in a distributed
environment, both host and client should be cloned on a network that duplicates
the production network. Frequently our new or updated systems must run in an
environment with a standard operating system and additional “off the shelf”
software. Operating systems and OTS software in the staging environment should
be the exact duplicate of what they will be in the production environment and
all software versions should be the same. Ensure that any patches that will be
applied to the production system are also applied to the staging environment.
Should your new system require an updated operating system
and/or updated ancillary software, ensure that you will be installing the same
operating system, ancillary software and patches on the production environment
as on the staging environment. Testing your new or updated system on the same
hardware and software it will run on in the production environment is a
critical part of testing. Software frequently behaves differently depending
on the hardware/OS combination it runs on. The system may work on the new
combination yet still behave differently from the way it did in earlier test
environments, so the full suite of tests should be run in the staging
environment.
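One way to check that staging and production really match is to compare an
inventory of installed versions from each environment. The sketch below is
only an illustration: the function name find_version_drift, the component
names, and the version numbers are all hypothetical, and it assumes you can
already export each inventory as a simple name-to-version mapping.

```python
def find_version_drift(staging, production):
    """Return {component: (staging_version, production_version)} for every
    component whose version differs or which is missing on one side."""
    drift = {}
    for name in sorted(set(staging) | set(production)):
        staging_version = staging.get(name)
        production_version = production.get(name)
        if staging_version != production_version:
            drift[name] = (staging_version, production_version)
    return drift


# Hypothetical inventories; real ones would come from your own tooling.
staging = {"os": "11.4", "database": "14.2", "app-server": "9.0.1"}
production = {"os": "11.4", "database": "14.1", "app-server": "9.0.1",
              "report-tool": "3.2"}

drift = find_version_drift(staging, production)
for name, (staging_version, production_version) in drift.items():
    print(f"{name}: staging={staging_version} production={production_version}")
```

An empty result means the two inventories agree. Anything else should be
resolved before test results from staging are trusted.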
Most systems nowadays require some information to be stored
and retrieved. This may be minimal, such as a set of userids and passwords for
user login, or it may be extensive, requiring a large relational database. Data
that must be propagated from the previous production environment must be
identified and a plan crafted for capturing it and installing it in the new
system during cutover. In the meantime, testing in the staging environment
should be done with a data set that mimics production data as closely as
possible in volume and distribution. In systems with a large relational
database this is usually accomplished by taking a snapshot of the production
data, translating it to the new data format, and loading it into the staging
environment. This translation process is key to the cutover: the process used
to translate and load data into the staging environment must be repeated
during the cutover to port the data to the new production environment. Not
only does the process need to be repeatable, it needs to be streamlined so
that download, translation, and load occur quickly.
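A repeatable process is easier to achieve when download, translation, and load
are scripted as separate steps and each step is timed. The following is a
minimal sketch under stated assumptions: run_timed and the three step
functions are hypothetical stand-ins, and the in-memory rows take the place of
a real database snapshot.

```python
import time


def run_timed(steps):
    """Run (name, function) steps in order, feeding each result to the next.
    Returns the final result and the elapsed seconds per step."""
    timings = {}
    data = None
    for name, step in steps:
        start = time.perf_counter()
        data = step(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical steps; real ones would talk to the old and new systems.
def download(_):
    return [{"USER_ID": "1", "NAME": " Ada "}, {"USER_ID": "2", "NAME": "Grace"}]


def translate(rows):
    # Convert the old format (string ids, padded names) to the new one.
    return [{"id": int(r["USER_ID"]), "name": r["NAME"].strip()} for r in rows]


def load(rows):
    # Stand-in for inserting rows into the new database.
    return {r["id"]: r for r in rows}


result, timings = run_timed(
    [("download", download), ("translate", translate), ("load", load)]
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

Timings recorded in staging give you the durations to put in the cutover plan,
and running the same script at cutover keeps the procedure identical.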
Your project may or may not have delivered performance
improvements. If it did, these improvements need to be verified in the staging
environment. If no performance improvements were required, the new system
should perform at least as well as the old one. Testing should include
measuring the performance of frequently used functions under load. For
example, if the maximum number of users the system must support is 1,000, how
quickly is the 1,000th user logged in? Benchmarks should be specified in the
areas of performance, load,
and stress testing and testing against these benchmarks should be performed in
the staging environment. Only when all the benchmarks have been met or exceeded
are you ready for cutover.
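The "all benchmarks met or exceeded" rule can be expressed as a simple gate.
This is a sketch only: the Benchmark class, the benchmark names, and every
number in it are invented for illustration, and it assumes each benchmark is
a time in seconds where lower is better.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    name: str
    limit_seconds: float      # worst acceptable result
    measured_seconds: float   # result measured in staging


def failed_benchmarks(benchmarks):
    """Return the names of benchmarks whose measured time exceeds the limit."""
    return [b.name for b in benchmarks if b.measured_seconds > b.limit_seconds]


# Hypothetical staging results.
results = [
    Benchmark("login of 1,000th user", limit_seconds=3.0, measured_seconds=2.4),
    Benchmark("order search under load", limit_seconds=1.5, measured_seconds=1.9),
]

failures = failed_benchmarks(results)
ready_for_cutover = not failures
print("ready" if ready_for_cutover else f"not ready: {failures}")
```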
Are the Users Ready?
The system may be ready for the users but are the users
ready for the system? New systems generally deliver something the business
community needs: new functionality to meet new market demands, reduced effort,
performance improvements, etc. Users must be readied so that they can take
advantage of the new system as soon as it is activated. New functionality
generally means training the user community, but beyond this, users must be
told when the new system will be implemented. Cutting
over without notifying the user community, even a trained user community, will
result in a deluge of calls to support.
Cutovers require a window during which neither new nor old
systems are available. This is especially true of systems which use large
volumes of data. Data must be frozen so that it can be downloaded from the old
system, translated, and uploaded to the new system. Users will not have access
to the data during this time, so they should be notified in advance and given
the chance to plan for the period of inactivity. Cutting the new system into
service during normal working hours without notifying the user community will
certainly trigger a deluge of calls to support and may cause further damage if
a deadline is missed as a result. Your cutover will include a bulletin before
the shutdown of the old system
but your user community should be educated in the cutover process well in
advance of the bulletin.
The Cutover Plan
Work that can be done without disrupting the production
environment should be done in advance of the cutover. Tasks such as hardware
installations, database installations, OS installations, and software
installations should all be completed before the actual cutover. The cutover
plan needs to identify and schedule all the activities that must occur at the
time of cutover. New systems which replace existing ones will typically have
to be cut over during off-peak hours, and your plan should identify the time.
Zero hour marks the start of your cutover activities. For each task, the plan
should state the activity to be performed, the responsible prime, and the
amount of time allotted. Identifying a backup to the prime
will give you an extra layer of security.
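A plan with these fields translates naturally into a small table from which
start and end times can be derived. The sketch below assumes the tasks run one
after another from zero hour; the Task class, the people, the date, and the
durations are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    activity: str
    prime: str
    backup: str
    minutes: int  # time allotted to the task


def schedule(zero_hour, tasks):
    """Return (start, end, task) tuples for tasks performed back to back."""
    rows = []
    start = zero_hour
    for task in tasks:
        end = start + timedelta(minutes=task.minutes)
        rows.append((start, end, task))
        start = end
    return rows


# Hypothetical zero hour and plan.
zero_hour = datetime(2024, 6, 1, 23, 0)
plan = [
    Task("freeze and download data", prime="Priya", backup="Tom", minutes=30),
    Task("translate data", prime="Priya", backup="Tom", minutes=45),
    Task("load data", prime="Luis", backup="Mei", minutes=60),
    Task("smoke test", prime="Mei", backup="Luis", minutes=15),
]

timeline = schedule(zero_hour, plan)
for start, end, task in timeline:
    print(f"{start:%H:%M}-{end:%H:%M}  {task.activity}  "
          f"(prime: {task.prime}, backup: {task.backup})")
```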
The first task will be the bulletin notifying the user
community that the shutdown will occur at zero hour. You may want to issue
several bulletins to ensure that all users receive the notification (for
example, users who log on 30 minutes before shutdown won’t receive the
bulletin you sent 1 hour before shutdown). The next task is the download of
any data from the
production environment. The download, translation, and loading of data should
follow the procedure defined during testing.
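The point about late logins can be checked with a little date arithmetic. In
this sketch the function names, the zero hour, and the bulletin offsets are
hypothetical, and it assumes a user only sees bulletins sent at or after the
moment they log on.

```python
from datetime import datetime, timedelta


def bulletin_times(zero_hour, minutes_before):
    """Return send times, earliest first, for bulletins sent N minutes
    before zero hour."""
    return [zero_hour - timedelta(minutes=m)
            for m in sorted(minutes_before, reverse=True)]


def bulletins_seen(login_time, send_times):
    """Return the bulletins sent at or after the user's login time."""
    return [t for t in send_times if t >= login_time]


zero_hour = datetime(2024, 6, 1, 23, 0)                # hypothetical
send_times = bulletin_times(zero_hour, [60, 15, 5])    # hypothetical offsets

late_login = zero_hour - timedelta(minutes=30)
seen = bulletins_seen(late_login, send_times)
print(f"user logging on at {late_login:%H:%M} sees {len(seen)} bulletin(s)")
```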
There are several ways of approaching the cutover: provide a
new production environment and retire the old one, reuse the existing
production environment, or swap the staging and production environments. Which
approach you take will determine your next steps and will be influenced by how
much money the organization has to spend on the system and how mission critical
the system is. Providing a new production environment and retiring the old or
swapping the staging and production environments will allow you to perform
activities like hardware installation, database installations, software
upgrades, etc. before the cutover. Reusing the existing production environment
will likely require you to perform these activities during cutover. Each
activity should be proven on the staging environment and timed so that a
reasonable duration can be estimated for your cutover plan. Since the goal here
is to limit the amount of down time, try to schedule as many activities in
parallel as possible without risking failure. Once the production environment
has been upgraded you can load the translated data.
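To see how much down time parallel scheduling saves, you can compute when each
activity would finish if it started as soon as its prerequisites were done.
The sketch below is illustrative only: the activities, durations, and
dependencies are invented, it assumes enough people to run independent
activities at the same time, and it assumes the dependencies contain no cycles.

```python
def earliest_finish(durations, depends_on):
    """Return minutes after zero hour at which each activity finishes,
    assuming it starts as soon as all of its prerequisites are complete."""
    finish = {}

    def resolve(name):
        if name not in finish:
            start = max((resolve(dep) for dep in depends_on.get(name, [])),
                        default=0)
            finish[name] = start + durations[name]
        return finish[name]

    for name in durations:
        resolve(name)
    return finish


# Hypothetical durations (minutes), as timed on the staging environment.
durations = {
    "os upgrade": 40,
    "database upgrade": 60,
    "translate data": 45,
    "application install": 20,
    "load data": 50,
}
depends_on = {
    "database upgrade": ["os upgrade"],
    "application install": ["os upgrade"],
    "load data": ["database upgrade", "translate data"],
}

finish = earliest_finish(durations, depends_on)
parallel_total = max(finish.values())
serial_total = sum(durations.values())
print(f"serial: {serial_total} min, parallel: {parallel_total} min")
```

The longest chain of dependent activities sets the minimum down time, so that
chain is where rehearsal and streamlining pay off most.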
The next step should be a “smoke test” of the new production
system. The smoke test should be thorough enough to ensure that no cutover
steps have been missed but streamlined enough that it can be performed
quickly. User logins are always a good candidate for smoke tests. Any work
that users perform frequently is another.
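A smoke test can be as simple as a short list of named checks that are run in
order, with failures collected for the cutover team. This is a sketch, not a
test framework: run_smoke_tests and both checks are hypothetical stand-ins for
calls against the real system.

```python
def run_smoke_tests(checks):
    """Run each (name, callable) check. A check passes if it returns a truthy
    value without raising. Returns a list of failure descriptions."""
    failures = []
    for name, check in checks:
        try:
            passed = check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
            continue
        if not passed:
            failures.append(f"{name}: check returned a falsy result")
    return failures


# Hypothetical checks standing in for real calls to the new system.
users = {"ada": "s3cret"}


def login_works():
    return users.get("ada") == "s3cret"


def daily_report_runs():
    raise TimeoutError("report did not finish")


failures = run_smoke_tests([
    ("user login", login_works),
    ("daily report", daily_report_runs),
])
print("smoke test passed" if not failures else failures)
```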
The last step will be to perform any OS updates necessary to
point users to the new system and to notify them that the system is ready. This
bulletin is also an opportunity to inform users of any changes in the system,
such as version numbers, new features, etc. Make the new version number obvious
and point users to any documentation describing the upgrade in your bulletin.
Arrange to have the new system monitored for a time after cutover. Since most
cutovers occur during non-peak usage, the monitoring should last at least until
the system experiences a period of peak usage, for example a Monday morning.
Your plan should always be tested on the staging environment
to ensure that it is complete (i.e. all necessary activities are identified)
and that durations are reasonable. If the team has trouble completing a task at
10:00 am on a Tuesday when no one is breathing down their necks, they will fail
when they attempt it at midnight on Saturday with the VP Operations looking on.
Rollback Strategy
Remember I mentioned that a smoke test and monitoring are
essential parts of the plan? What happens if the smoke test fails or monitoring
reveals an unacceptable degree of system degradation during peak usage? The
answer is a rollback. The rollback restores the previous system and data to the
production environment and is like another cutover. The rollback strategy will
depend on the cutover approach used. Does the production environment need to be
altered to roll back or is it simply a matter of installing the old database
and data in the staging environment and pointing users to it? The rollback
strategy should be tested with the cutover plan to ensure that it works. This
may seem like a lot of work, especially since you will likely have to roll the
staging environment forward again before cutover, but it is well worth the
effort, especially on mission critical systems.
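It helps to agree on the rollback trigger before the cutover rather than argue
about it at midnight. The sketch below shows one possible rule: the function
name, the 20% tolerated slowdown, and the sample response times are
assumptions for illustration, and your own criteria for "unacceptable
degradation" may be quite different.

```python
def should_roll_back(smoke_failures, peak_response_seconds,
                     baseline_seconds, tolerated_slowdown=1.2):
    """Return True if the smoke test failed or if response time at peak usage
    exceeds the old system's baseline by more than the tolerated factor."""
    if smoke_failures:
        return True
    return peak_response_seconds > baseline_seconds * tolerated_slowdown


# Hypothetical measurements against a 2.0 second baseline.
print(should_roll_back([], peak_response_seconds=2.1, baseline_seconds=2.0))
print(should_roll_back([], peak_response_seconds=3.1, baseline_seconds=2.0))
print(should_roll_back(["user login failed"], 1.0, 2.0))
```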
And Finally
The cutover plan, including the rollback strategy, should be
reviewed with SMEs from previous cutovers and the support group. SMEs from
previous cutovers are a valuable source of information about what works well in
your organization and what doesn’t work well. Lessons Learned are another
valuable source of information but the authors of the lessons are even more
valuable. The support group will bear the brunt of any slip-ups in planning or
executing the cutover, so they should feel comfortable that the plan hasn’t
missed any steps and that execution will deliver them a supportable system
when the handoff happens post cutover. The cutover plan will be a key
deliverable at the Gate Review meeting, or pre-Gate meeting, held to determine
cutover readiness. The cutover plan won’t seem like
an important deliverable during the rush to complete the project by the deadline
but paying attention to the details of that plan well in advance of the cutover
will be well worth your effort. Failure to pay attention to this critical
deliverable will spoil all the hard work the team has expended to get you to
this point. No matter how well the team performed during the rest of the
project, a disaster at cutover time will be remembered long after the good work
is forgotten. A bad cutover is like a fumble on the goal line at the Super
Bowl. The running back may have gained 150 yards during the game but will be
remembered for costing his team the game with his fumble.