In IT, a single point of failure is any part of the business infrastructure, network, server, or other business system that would bring the entire company or system to a halt if it malfunctioned or became compromised. Single Points of Failure need to be minimized in any system that requires a high level of reliability or availability.
You can make an IT system more reliable by adding redundancy to each potential Single Point of Failure. For instance, a small business whose network depends on one router loses the whole network if that router fails, so it should keep a spare router on hand. Other examples would be:
- Have one or more backup servers in place beside your main servers, ready to bring online should your main servers go down
- Have a backup Internet connection in case your primary goes offline
- Have a power generator in place to power your systems in case of a power outage
Some systems, though, are more difficult to make redundant, especially for companies with limited resources. These include complex applications and security systems. For an application, you can keep a secondary copy on a backup server, with its own database synchronized with the main application so it's always up to date. For security, you can likewise keep a second, offline security system ready to bring online if required.
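As a rough illustration of the primary-plus-standby idea, the sketch below picks the first healthy endpoint from an ordered list. The endpoint names and the `is_healthy` callback are hypothetical placeholders; a real setup would need its own health check and its own way of redirecting traffic.

```python
def pick_active(endpoints, is_healthy):
    """Return the first healthy endpoint in priority order, or None.

    `endpoints` is ordered from most to least preferred (primary first).
    `is_healthy` is a caller-supplied check; here it is only an assumption.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical status table standing in for real health checks.
status = {"primary-app": False, "standby-app": True}
active = pick_active(["primary-app", "standby-app"], lambda name: status.get(name, False))
print(active)
```

With the primary marked unhealthy, the standby is chosen. If nothing is healthy the function returns `None`, which is the case redundancy is meant to make rare.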
Identifying potential Single Points of Failure involves assessing your IT systems and locating the critical components of each particular sub-system within your IT infrastructure. Ask yourself, if this component fails, will it:
- Disrupt your users, or prevent them from using the tools to do their job
- Prevent one or more departments from being functional
- Bring the entire network down
If any of these are true for any particular sub-system, then you’ve identified a Single Point of Failure.
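The checklist above can be turned into a small inventory audit. This is a minimal sketch, assuming you record each component by hand: the `Component` fields, the `has_redundancy` flag, and the sample inventory are all illustrative, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    disrupts_users: bool = False      # would users lose the tools to do their job?
    halts_department: bool = False    # would one or more departments stop functioning?
    downs_network: bool = False       # would the entire network go down?
    has_redundancy: bool = False      # is a spare or backup already in place?


def is_single_point_of_failure(component):
    """A component is flagged if any checklist answer is yes and nothing backs it up."""
    critical = (
        component.disrupts_users
        or component.halts_department
        or component.downs_network
    )
    return critical and not component.has_redundancy


def find_spofs(components):
    """Return the names of all flagged components, in inventory order."""
    return [c.name for c in components if is_single_point_of_failure(c)]


# Hypothetical inventory for a small office.
inventory = [
    Component("edge router", downs_network=True),
    Component("file server", disrupts_users=True, has_redundancy=True),
    Component("lobby display"),
]
print(find_spofs(inventory))
```

Here only the router is flagged: the file server is critical but already has a backup, and the lobby display fails none of the three questions.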
In a multiple-server environment, such as a major support center or a data center, it's quite possible that every server has components or software that are Single Points of Failure. In this case, for full redundancy and high availability, you may need to replicate the entire server cluster and add a load-balancing component to keep service levels high.
And of course, in all situations, you should make sure that your backup systems are working correctly and that you have up-to-date, or at least recent, backups of all your data and system configurations in case the worst should happen.
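To show what a load-balancing component does in principle, here is a minimal round-robin sketch that skips servers marked as down. The class, its method names, and the server names are assumptions made for illustration; production load balancers add health probing, weighting, and session handling that this leaves out.

```python
class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._index = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call so a fully failed pool cannot loop forever.
        for _ in range(len(self.servers)):
            candidate = self.servers[self._index]
            self._index = (self._index + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical three-server pool with one server out of service.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
balancer.mark_down("app-2")
picks = [balancer.next_server() for _ in range(4)]
print(picks)
```

Requests keep flowing to `app-1` and `app-3` while `app-2` is down. Note that the balancer itself can become a Single Point of Failure unless it is also made redundant.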
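One simple way to keep an eye on backup freshness is to compare each backup's last completion time against a maximum allowed age. The sketch below assumes you already collect those timestamps somewhere; the backup names, dates, and the 24-hour default are illustrative only.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times, now, max_age=timedelta(hours=24)):
    """Return the sorted names of backups older than `max_age` as of `now`."""
    return sorted(
        name
        for name, finished_at in last_backup_times.items()
        if now - finished_at > max_age
    )


# Hypothetical timestamps; in practice these would come from your backup tool's logs.
now = datetime(2024, 1, 10, 12, 0)
backups = {
    "database": datetime(2024, 1, 10, 2, 0),
    "router-config": datetime(2024, 1, 3, 2, 0),
}
print(stale_backups(backups, now))
```

A check like this only tells you that a backup is recent, not that it can be restored, so periodic test restores are still worthwhile.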
Contact TechPoint for more information on Single Points of Failure, backup strategies, redundant systems, and redundancy.
