Key Server Performance Metrics For Actionable Monitoring
On the face of it, providing a web hosting service seems like a fairly straightforward job. A hosting provider owns a collection of servers (essentially, very powerful computers), which it rents to website owners who store their sites on them.
The servers are never switched off, and the websites are accessible 24/7. From then on, all the hosting provider needs to do is cover the electricity bill and ensure that the websites it hosts don’t break the rules.
Of course, there is a lot more to it than that. A server has to provide a stable and secure environment for the websites hosted on it. It needs to be configured and maintained in a way that ensures all the applications perform well and are not inhibited in any way. This is an enormous challenge that requires careful planning, coordination, and a lot of know-how.
The list of metrics that need to be kept in check if the server is to operate correctly is virtually endless, and the hosting provider must make sure that whenever the server’s well-being is under threat, its team of technical experts knows about it immediately and can react before it’s too late. Today, we’re going to cover some of the most crucial aspects that server administrators monitor constantly to ensure a reliable service.
Uptime
This is what interests customers the most. One of the main advantages of doing business online is that unlike a physical office or shop, the website is available 24/7. It’s the job of the hosting provider to ensure that the servers are accessible for as much of the time as possible.
This isn’t as easy as keeping everything plugged in. Your website’s hosting environment depends on an extremely complex ecosystem of hardware and software, and inevitably, things do go wrong every now and then. Keeping the outages to a minimum is essential, though.
To ensure that they can take the appropriate actions in a timely manner, server administrators keep a close eye on how long the service has been down for and, ideally, meticulously eliminate the reasons for every single outage in order to minimize the risk of future service interruptions. Generally speaking, an uptime percentage of less than 99% is considered something that should be looked into, and if it drops below 95%, then there’s definitely cause for concern.
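To make the arithmetic concrete, here is a minimal sketch of how an uptime percentage can be derived from recorded outage windows. The uptime_percentage helper and the outage format are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

def uptime_percentage(outages, period_start, period_end):
    """Uptime as a percentage of a monitoring period.

    `outages` is a list of (start, end) datetime pairs recorded by the
    monitoring system (a format assumed for this sketch).
    """
    period = (period_end - period_start).total_seconds()
    downtime = sum(
        (min(end, period_end) - max(start, period_start)).total_seconds()
        for start, end in outages
    )
    return 100.0 * (period - downtime) / period

# A 30-day window with a single 90-minute outage:
start = datetime(2024, 6, 1)
end = start + timedelta(days=30)
outages = [(datetime(2024, 6, 10, 2, 0), datetime(2024, 6, 10, 3, 30))]
print(f"{uptime_percentage(outages, start, end):.3f}%")  # 99.792%
```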
Concurrent users and Requests Per Second (RPS)
For many, the best way to check how well the website behaves is to determine how many users it can support simultaneously. Indeed, the number of visitors is the ultimate measurement of how popular a website is, and it should play a key role when evaluating the project’s needs.
During stress tests, administrators simulate a high number of simultaneous sessions in order to get a rough estimate of how many visitors the server can handle at once. They should also keep a close eye on real-world user statistics because if a website becomes too popular, it could hamper performance or even bring the entire server down.
The number of concurrent users isn’t directly related to the load on the server, though. For example, a user who clicks lots of links and spends no more than a couple of seconds on each page will put more strain on the server than the one who reads a lengthy article and doesn’t interact with the website in the meantime.
Every single click a user makes generates multiple different requests that need to be processed by the server. If the website is popular, we could be talking about thousands of requests every second. This is the real load the server must be able to deal with.
Too many simultaneous requests could slow the server down or even bring it down completely. That’s why the server administrator must figure out what the maximum number of requests per second is, and they must then monitor this metric closely to ensure that if the real-world load gets anywhere near that limit, they can take the appropriate actions.
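As a rough illustration of how this kind of monitoring might work, the sketch below tallies access-log entries per second and compares the busiest second against a ceiling established during stress testing. The log format, the peak_rps helper, and the MAX_RPS value are all assumptions made for this example.

```python
from collections import Counter

MAX_RPS = 500  # hypothetical ceiling found during stress testing

def peak_rps(log_lines):
    """Return the busiest one-second window in a batch of log lines.

    Each line is assumed to start with a second-resolution timestamp,
    e.g. '2024-06-10T14:03:27 GET /index.html 200'.
    """
    per_second = Counter(line.split()[0] for line in log_lines if line.strip())
    return per_second.most_common(1)[0]

lines = [
    "2024-06-10T14:03:27 GET / 200",
    "2024-06-10T14:03:27 GET /style.css 200",
    "2024-06-10T14:03:28 GET /about 200",
]
second, count = peak_rps(lines)
if count > 0.8 * MAX_RPS:
    print(f"Warning: {count} req/s at {second}, near the {MAX_RPS} req/s limit")
else:
    print(f"Peak load: {count} req/s at {second}")
```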
Error rates
The greater the load on the server, the greater the chance of users getting an error message. The occasional failure to process a request isn’t really a major cause for concern, but the number of errors the server generates should nevertheless be monitored closely.
More specifically, administrators should be looking at the error count in relation to the overall number of requests. An increasing percentage of errors could signify a serious problem, and the reason for it should be investigated thoroughly. Server errors generate 5XX status codes, and there are mechanisms that can alert administrators whenever an unusually high number of errors is registered.
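Here is a minimal sketch of that calculation. The error_rate helper and the 1% alert threshold are illustrative assumptions, not industry standards.

```python
ERROR_RATE_THRESHOLD = 1.0  # illustrative alert threshold, in percent

def error_rate(status_codes):
    """Percentage of server errors (5XX) in a batch of HTTP status codes."""
    errors = sum(1 for code in status_codes if 500 <= code < 600)
    return 100.0 * errors / len(status_codes)

codes = [200, 200, 304, 500, 200, 502, 200, 200, 200, 200]
rate = error_rate(codes)
if rate > ERROR_RATE_THRESHOLD:
    print(f"Alert: 5XX error rate at {rate:.1f}%")  # prints 20.0% here
```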
Thread count
The percentage of errors that users see could be directly connected to the number of threads that the server needs to process at any given time. During the configuration phase, administrators usually set limits on the number of threads each process can generate, and if that limit is surpassed, requests could be put on hold. If they stay on hold for too long, they will eventually time out, and the users will get an error message.
Keeping an eye on the number of active threads is an essential part of assessing how much of the server’s capacity is being utilized at any given time, and it could tell a lot about the requirements of the projects currently hosted on it. This can help administrators figure out what sort of changes to the hardware or software configuration they need to make in order to optimize the performance.
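As a rough example of such a check, the sketch below compares each worker process’s thread count against a configured limit using the third-party psutil library. The process name and the limit are assumptions made for illustration.

```python
import psutil  # third-party library: pip install psutil

THREAD_LIMIT = 64  # hypothetical per-process limit from the server config

# Walk the process list and flag workers approaching the thread limit.
for proc in psutil.process_iter(attrs=["name", "num_threads"]):
    if proc.info["name"] == "httpd":  # assumed worker process name
        if proc.info["num_threads"] > 0.9 * THREAD_LIMIT:
            print(f"PID {proc.pid}: {proc.info['num_threads']} threads, "
                  f"approaching the limit of {THREAD_LIMIT}")
```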
System-level performance metrics – CPU, RAM, and disk usage
We mustn’t forget that a server is, essentially, a large computer. It has an operating system, and processes run on it and utilize the underlying hardware. Monitoring how much of the resources are in use should always be high on system administrators’ priority list.
High CPU or RAM usage can slow websites down significantly, and if the server runs out of storage space, it won’t be able to record new information, which could hinder certain tasks and lead to a lot of frustration for end users.
Most hosting providers will give you easy-to-use tools that can help you monitor these metrics closely. Utilizing them as much as possible is of utmost importance because they can give you information that could be vital for reducing downtime and limiting the impact of a problem that might not yet be visible to everyone. An increased load on the processor and RAM, for example, could mean that one of the projects hosted on the server is taking up too many resources, but it could also indicate a potential issue with the hardware component itself.
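For illustration, a basic resource check along these lines could be written with the third-party psutil library. The 90% alert threshold below is an assumption, not a recommended value.

```python
import psutil  # third-party library: pip install psutil

# Sample the three resources discussed above.
cpu = psutil.cpu_percent(interval=1)      # CPU usage measured over 1 second
ram = psutil.virtual_memory().percent     # share of RAM in use
disk = psutil.disk_usage("/").percent     # space used on the root volume

for name, value in (("CPU", cpu), ("RAM", ram), ("Disk", disk)):
    status = "ALERT" if value > 90 else "ok"  # 90% is an illustrative threshold
    print(f"{name}: {value:.1f}% ({status})")
```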
Average Response Time (ART) and Peak Response Time (PRT)
You could argue that from a user’s perspective, these are the most important metrics of them all. Whenever you visit a website, you send requests, which the server must respond to. The time between the sending of a request and the server’s response is, in effect, the website’s load time.
Every single interaction with the website generates multiple requests (for the HTML document, the CSS sheets, the images, the JavaScript files, etc.). Some requests take longer to process than others, and when they’re testing a server, one of the main data points administrators look for is the Average Response Time (ART).
It’s calculated by dividing the total time taken to respond to all requests by the number of requests. It’s a good indicator of how well the server performs under load, and if it’s too high, it could mean that there’s a problem.
A decent ART doesn’t necessarily mean that everything’s fine, though. Administrators also record the Peak Response Time (PRT) when they test their servers’ performance with the idea of singling out requests that are taking longer to process. That way, they can identify potential issues more easily.
For example, let’s imagine that you have a server that is seemingly running well and, after being bombarded with hundreds of requests per second, shows a relatively low ART. A close look at the statistics, however, could reveal that some of the database queries are taking longer, therefore creating a high PRT. Even if the overall performance is good, a high PRT could indicate a problem and should be looked into.
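Here is a small sketch of both calculations using made-up response times. Note how a single slow request barely moves the average but dominates the peak.

```python
# Made-up response times, in milliseconds, for ten requests.
response_times = [42, 38, 55, 47, 40, 44, 1280, 51, 39, 46]

art = sum(response_times) / len(response_times)  # Average Response Time
prt = max(response_times)                        # Peak Response Time

# The single 1280 ms outlier (a slow database query, say) barely
# registers in the ART but dominates the PRT.
print(f"ART: {art:.0f} ms, PRT: {prt} ms")  # ART: 168 ms, PRT: 1280 ms
```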
Security-related metrics
Customers tend to be more focused on uptime and speed, and they often forget that one of the biggest challenges associated with running a website nowadays is securing it against hackers. Server administrators shouldn’t make the same mistake.
All the work that has gone into optimizing the website and server for the best possible performance and uptime could be undone by a Distributed Denial of Service (DDoS) attack. Server owners must have measures and strict protocols in place that can effectively mitigate any potential attacks before they cause significant downtime.
Sadly, DDoS is far from the only security issue. Dozens of processes run at the same time on a production server, which often means that detecting malicious activity could be difficult. In addition to ensuring that all security patches have been applied, server administrators must have mechanisms in place that can track and log activities related to file modifications and configuration changes. Prevention and early detection are essential in keeping people’s websites safe.
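One common way to track file modifications is to hash a set of watched files and compare the digests against a stored baseline. The sketch below illustrates the idea; the watched paths are assumptions, and a real deployment would persist the baseline and cover far more files.

```python
import hashlib
from pathlib import Path

# Paths assumed for illustration; a real setup would cover many more.
WATCHED = [Path("/etc/nginx/nginx.conf"), Path("/etc/passwd")]

def fingerprint(path):
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Record a baseline once, then re-check periodically (e.g. from cron).
baseline = {path: fingerprint(path) for path in WATCHED}

for path, digest in baseline.items():
    if fingerprint(path) != digest:
        print(f"ALERT: {path} was modified")
```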
Additional metrics
You’d think that keeping all the metrics we’ve already mentioned in check can guarantee perfect performance, but you’d be wrong. Sometimes, the problems aren’t rooted in the physical machine or its configuration.
Outdated or buggy applications, themes, and plugins can also slow a website down immensely, and there are tools available that can effectively pinpoint the problem. Application performance monitoring is a major part of maintaining both the server and the websites hosted on it in normal working order.
Chances are, the applications installed and running on the server use SQL databases of some sort. Optimizing the connection between the apps and the databases can not only improve the website’s performance significantly, but it can also reduce CPU usage and lower the overall load on the server.
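On the database side, a sensible first step is often to identify queries that take unusually long. The sketch below illustrates the idea by timing each query and logging the slow ones; production setups would normally rely on the database server’s own slow-query log, and sqlite3 appears here only because it ships with Python.

```python
import sqlite3
import time

SLOW_MS = 100  # illustrative threshold for flagging a query as slow

def timed_query(conn, sql, params=()):
    """Run a query and log it if it exceeds the threshold."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > SLOW_MS:
        print(f"SLOW ({elapsed_ms:.0f} ms): {sql}")
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
timed_query(conn, "SELECT * FROM posts")
```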
Pretty much the same goes for the web server. Regardless of whether you’re using Apache or one of its competitors, it’s important to ensure that the piece of software responsible for processing and responding to all the requests, along with all of its components, is optimized and running smoothly.
Final takes
As you can see, creating a stable hosting environment is far more difficult than setting up a server and making sure that no one powers it off. It’s an incredibly complex continuous process. The technology evolves all the time, and with it, website owners’ requirements change as well. Staying on top of all the shifts and movements in the industry is one of the biggest challenges hosting companies face.
FAQ
Why is monitoring so important?
Web hosting is about a lot more than just renting storage space for some website files and databases. Website owners pay for maximum uptime and optimal performance, and the hosting provider optimizes its servers so that they can offer just that.
This isn’t a set-and-forget process, though. Servers are incredibly complex systems that rely on the correct functioning of many different hardware and software components, and the dynamic online landscape means that the challenges hosting companies face every day are endless. Keeping a close eye on the servers at all times is the only real way of ensuring that they meet the customers’ demands.
Who is responsible for a website’s performance?
A hosting company’s job is to provide a reliable and secure environment for customers’ projects. Keeping servers in perfect working order and ensuring that they are properly configured is the most basic step towards achieving this.
Even the best hosting setup, however, can’t guarantee solid performance if the website itself isn’t configured properly or if it has outgrown the capacity of its hosting plan. Both the hosting company and the website owner must keep a close eye on a number of different metrics if the project is to perform well.
How can actionable monitoring affect a website’s popularity?
Merely monitoring the server is far from enough. Hosting providers must also ensure that the data they gather is correctly assessed and that proper actions are taken based on it. On the one hand, this improves the website’s overall performance, which is considered an important SEO metric and can help with the project’s Google ranking. On the other, a faster, problem-free website will also result in happier users and can lead to some good old-fashioned word-of-mouth marketing.
Can active monitoring help you optimize your websites?
By monitoring their servers 24/7, hosting providers can collect vital information not only about the machine itself but also about the projects hosted on it. If they see a problem, system administrators can inform the website owner and help them fix the issues. This will improve the website’s performance and could help reduce the overall load on the server.