Chapter 17

Plan for Hosting

Summary

A website needs to be deployed in a public environment to function correctly. You need to determine the technical, organizational, and financial parameters of this hosting.

Does This Apply?

To be honest, there’s a high chance this doesn’t apply to you. If you have an IT staff that manages all your infrastructure, then they’ll manage most of this. An external development team that’s building your website will often also manage it, so they might handle all this as well.

However, even if this chapter doesn’t apply, you might find it interesting as it covers some of the underlying technologies and architecture of how the internet works. You will be affected by hosting at some level, so you may as well know how it works.

Also, this might apply to your budget. Even if you don’t have to deal with hosting at a tactical level, your organization might want to take it out of your budget, so it might be worth a read.

Narrative

Back in the days when stock markets were in-person, a trader needed a “seat” on a market. These were the days when traders would gather in crowds around pricing boards and yell out dollar amounts to try to find someone to trade with.

Seats on the New York Stock Exchange were closely guarded and very expensive. When the practice ended in 2006, a seat cost about $3.5 million. Traders would risk everything just to scrape together enough money to effectively buy a place to work. Just having the seat, of course, wasn’t enough – that was just the price of entry to work.

And that matters. To do anything, you have to have a seat at the table. You could be the greatest in the world at something, but if no one knows about it, that’s not going to do much good.

The same is true of websites. Your website has to be somewhere in order to be available to the world. It could be the greatest website in the world, but sitting on your local hard drive, it’s not going to do you much good.

At their base level, websites are just a collection of files. Those files have to be copied to a computer somewhere, in a format the computer can understand, and that computer needs to be hooked to the internet and configured in such a way so when a visitor types your domain name into their browser, the mighty power of internet routing directs them to your website on that computer in such a way that it can match up their request to some resource and return the correct response.

That’s the long version.

Here’s the short version: your website needs to be hosted.

“Hosting” is a word that represents all the stuff it takes to make your website less theoretical and more actual. It’s the processes and infrastructure to make your website available to visitors.

Depending on your staffing, organizational, and contractual scenario, you might be involved in this a great deal, not at all, or any gradient in between. You might depend on an internal data center within your organization. Or, you may depend on a subscription service in which you never need to worry about a thing. Regardless of whether you handle it, or someone else handles it, or some company in Seattle, Washington handles it, your site will be hosted.

In this chapter, we’re going to talk about hosting in general, and a lot of hosting-related topics. You might never come into contact with any of this stuff after you finish reading this. Or you might.

The key is: you need to find out.

To go back to the original point: your website has to be somewhere. So you need to get some questions answered before you have any chance of actually unleashing your magnificent creation onto the world.

Episode 17: Plan for Hosting (w/ Elias Lundmark)

Corey asks Deane a brutally honest question: as non-developers, why should we care about hosting at all? Then, Elias Lundmark, product manager for cloud hosting at Optimizely, joins us to talk about website hosting in common terms — cloud versus on-premises, the reality (and politics) of “five 9s,” and the things you need to understand before choosing a hosting provider or vendor offering.

The Hosting Account

We’re going to roll up a lot of functionality and discussion into the idea of a hosting account. This is a service you purchase from some provider that allows you to run a website and expose it to the internet.

Hosting accounts have varying capabilities. At the bare minimum, a hosting account is going to provide the ability to store a set of files and the ability to make those files available at specific URLs.

Beyond that, let’s talk about some of the other things you might need to know.

Programming Language

Briefly, let’s cover two terms:

Request: data your web browser sends to a server
Response: data the server sends back to your browser in response to your request

When your web browser – perhaps your trusty copy of Firefox or Chrome – requests an image file, for example, the server responds with a set of bytes that represent that image, along with some other hidden data such as what type of image it is. This is a request-response.

The simplest request-response is:

Browser: “Give me this file.”

Server: “Here are the contents of that file.”

But not everything is that simple. Sometime right after the web was born, someone got the idea that we might do things to those files before we send them back. So instead of sending requests to actual files, some decided to send them to programming scripts that would execute on the server and return the results.

Now we have:

Browser: “Execute a script based on this URL.”

Server: “I have executed that script, and here are the results.”

(In reality, when your browser makes the request, it just has a URL which is a series of characters – /products/, for example. It has no idea if that represents a file or a script. It’s the server’s job to figure this out.)

Flash forward twenty years, and this is largely how the web works. URLs point to scripts just as often as they map to files. Sometimes those scripts are an actual file (like /news.php), and sometimes they’re just some designation that a larger process reads in and uses as input (like /news/topics/africa).
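To make the file-versus-script distinction concrete, here is a minimal sketch in Python of how a server might decide what a URL means. Everything in it is hypothetical: the `STATIC_FILES` table, the `news_script` function, and the `handle_request` name are made up for illustration, and a real web server does far more than this.

```python
# Hypothetical static files, standing in for real files on a server's disk.
STATIC_FILES = {"/logo.png": b"<image bytes>"}


def news_script(topic: str) -> str:
    """A stand-in for a script that builds a page on the fly."""
    return f"<h1>News about {topic}</h1>"


def handle_request(url: str):
    """Return (status, body) for a URL: a file if one matches, else a script."""
    # Case 1: the URL maps directly to a stored file.
    if url in STATIC_FILES:
        return 200, STATIC_FILES[url]
    # Case 2: the URL is just input that a larger process reads and acts on.
    if url.startswith("/news/topics/"):
        topic = url.rsplit("/", 1)[-1]
        return 200, news_script(topic)
    # Nothing matched, so the server reports that the resource is missing.
    return 404, "Not found"
```

The browser sends the same kind of request in both cases. Only the server knows that `/logo.png` is a file while `/news/topics/africa` is input to a script.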

Most every hosting account is going to be able to execute some language. Different accounts will service different languages. Common languages include PHP, .NET (pronounced, “dot net”; also commonly referred to as “C#” or “C-Sharp” for reasons that aren’t important), Java, Ruby, or Python.

Whatever CMS you select will be based on some language, and it will need a hosting account capable of understanding and executing that language.

Databases

Databases are places to store and retrieve information, and they function as very complicated and multi-layered spreadsheets. If you have 10,000 news articles, for example, you could put those in a database for safekeeping, and then query the database to return certain articles based on criteria. We might ask questions like this:

“Give me all the articles.”
“Give me the last twenty articles that were written, ordered by date, newest first.”
“Give me all the articles containing the word ‘economics.’”

Refer back to the programming language discussion above – any one of those scripts, when executing, might contact a database, get some data back, and then incorporate that data into its response. Common databases include MySQL (pronounced “my sequel” or “my ess que elle” – no one really agrees), SQL Server (see a trend there? SQL is a common database query language), and Oracle.
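If you’re curious what those three questions look like in practice, here is a small sketch using SQLite, a lightweight database that ships with Python. It stands in for the larger systems named above so the example can run anywhere. The table name, column names, and sample articles are all invented for illustration.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, title TEXT, body TEXT, written_on TEXT)"
)
conn.executemany(
    "INSERT INTO articles (title, body, written_on) VALUES (?, ?, ?)",
    [
        ("Markets rally", "A story about economics and trade.", "2024-03-01"),
        ("Local team wins", "A story about sports.", "2024-03-02"),
        ("Rates hold steady", "More on economics.", "2024-03-03"),
    ],
)

# "Give me all the articles."
all_articles = conn.execute("SELECT title FROM articles").fetchall()

# "Give me the last twenty articles that were written, newest first."
latest = conn.execute(
    "SELECT title FROM articles ORDER BY written_on DESC LIMIT 20"
).fetchall()

# "Give me all the articles containing the word 'economics.'"
economics = conn.execute(
    "SELECT title FROM articles WHERE body LIKE ?", ("%economics%",)
).fetchall()
```

Each plain-English question turns into one line of SQL. A script answering a web request would run a query like these and fold the results into its response.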

Every CMS is based on a database of some type. In fact, a CMS might be considered a super database that wraps a raw database in another level of functionality for people who want to manage content.

The “Technology Stack”

The programming language of the hosting account, combined with the database it supports, comprises what’s known as the “technology stack.”

It’s called this because every CMS runs on an underlying “stack” of technologies. At the lowest level, there’s the operating system of the computer itself. Then the web server that runs on it. Then the programming language that executes. Then the database that’s included. And then the CMS sits on top of all of these.

Here’s an example for Optimizely:

Programming Framework: .NET MVC
Programming Language: C#
Web Server: Internet Information Services (IIS)
Database: SQL Server
Operating System: Microsoft Windows

Each piece builds on the one “below” it. Optimizely requires all the pieces in this stack, from top to bottom.

Almost every CMS requires a specific technology stack, and this tends to be a rigid requirement, which means you’ll need to acquire a hosting account that supports the corresponding technology stack.

Shared Hosting vs. Owned Hosting

The hosting accounts we’ve been talking about are quite simple. They’re what’s called “shared hosting,” where you buy an account on a server that hosts a bunch of other websites, not just yours. You’re buying an apartment inside a bigger apartment building with other tenants.

However, some systems require more than that. Some systems need to “own” the entire computing environment. They have hooks and tendrils that need to dig deeper into the hosting system than just a small, shared hosting account will provide. The technology stack for these systems requires you to have administrative control of an entire server.

Acquiring a Hosting Account

Before you worry about how to get a hosting account, you should find out if you even need to do it. Two reasons why you might not need to care at all:

Many CMS platforms come with hosting built-in (they’re “in the cloud”)
If someone is building your website for you, they can sometimes host it when it’s launched

Those two scenarios probably represent the majority of situations. It’s uncommon for someone to build a website with absolutely no guidance on where it’s going to live in the long-term.

If you’re working for a large enough organization, there’s also a chance that your IT department wants to host the website on their own servers in their own data center. This is becoming less common, as IT staff like the idea of the website being someone else’s problem, but you still see this occasionally in healthcare and finance where there might be privacy and security concerns.

If none of those apply, you will need to find a place to host your website. Thankfully, compatible hosting accounts are easily purchasable from many vendors, unattended, with nothing but a credit card.

At the higher end of the spectrum, if you have a massive site serving lots and lots of traffic, you’ll probably tend towards a large enterprise cloud company, such as Amazon Web Services (AWS) or Microsoft Azure (“az-sure”, like “assure” but with a soft “z” sound in the middle; also, some people emphasize the two syllables, while some run it all together).

These two companies provide immense computing platforms that can scale easily and quickly.

For example, if you run a TV commercial during the Super Bowl, you could expand your website capacity by a factor of 10 in just minutes, then reduce it later when the traffic has died down. This ability to quickly increase capacity is known as burst scaling.

Unfortunately, this can also be quite expensive and complicated. If you have a website that needs this level of hosting, then you likely need to find a qualified infrastructure architect who can put together an environment for you.

If your website isn’t quite so large, then thousands of hosting vendors can provide you with hosting accounts on various technology stacks. And, there are a handful of companies that have platforms specifically designed for the CMS you want to use — typically open-source, such as Pantheon for Drupal or WP Engine for WordPress. We’re not endorsing anyone, but just be aware that many hosting vendors claim to have special expertise in particular platforms.

Hosting vendors will price their plans on several different axes, all supposedly based on how much load you’re going to put on their servers.

Number of inbound domain names
Whether you need Secure Sockets Layer (SSL)
Number of databases
Amount of traffic in and out, or necessary bandwidth
Number of user accounts to transfer files
How much storage space you need for files

They usually sell these things in packages. Something like “basic” gives you certain amounts of each; “business” gives you a little more; “enterprise” even more, etc. (They all love three package levels, for some reason.)

When you create a hosting account, you’ll be offered multiple ways to actually get your files out there. These might not mean anything to you, but they matter to your development team:

Manually push using File Transfer Protocol (FTP or SFTP)
Deploy directly from public source control systems like GitHub
Synchronize against file storage systems like Dropbox or Amazon S3
Publish directly from code environments via plugins to those tools

We’ll talk a little more about deployment in Chapter 20: Implement the Back-End Functionality.

Once your files are in your hosting account, and the domain name is attached to the account (we’ll discuss this soon), then you technically have a functioning website available on the internet. Congratulations.

Uptime, Capacity, and Reliability

Your website needs to stay available, and hosting issues can severely affect that availability. Think of this in terms of errors: if you make a mistake on one part of your site’s code, that problem might take down a page or a set of pages. With hosting, however, those mistakes or errors are absolute – the website just goes away completely.

Reliability in hosting is known as uptime – literally, how often is the server “up” and available for connections. The opposite is downtime. Hosting providers don’t really split hairs about uptime or downtime. The server is either up or down, and they don’t really do any shades of gray.

Sometimes, they might have a “slowdown” based on some external factor. For instance, maybe their upstream connection to the broader internet is congested with traffic. They might announce this as a service degradation, but more often they simply report service quality in terms of uptime and downtime.

Uptime is reported in percentages, representing how much time a server was available during the year.

How often should your server be up? Well, 100% of the time, of course. But you usually can’t get to 100% on a single server. Servers have to be rebooted occasionally. They need maintenance. Hard drives might fail. There are lots of things that could go wrong.

Here are some uptime percentages and the amount of downtime they would represent in a single year.

99% means the server would be down for almost four full days per year
99.9% is about nine hours of downtime
99.99% is just under one hour
99.999% is about five minutes
99.9999% is just 30 seconds downtime per year

In the hosting industry, these different levels of reliability are known as “nines,” as in “how many nines of uptime do you offer?” Clearly, the more nines, the less downtime.
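The downtime figures for each level of “nines” come from simple arithmetic: take the hours in a year and multiply by the fraction of time the server is allowed to be down. Here is a short sketch, assuming a 365-day year. The function name is ours, not an industry standard.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a server may be down at the given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows {hours:.4f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

Each extra nine divides the allowed downtime by ten, which is why 99% works out to about 87.6 hours while 99.9% is about 8.76 hours.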

What’s acceptable? It depends on how important your site is and when the downtime happens.

If you have a small campaign microsite, targeted towards North American business users who are usually active only during business hours, and the downtime can be scheduled overnight, then perhaps a total of nine hours of downtime spread throughout the year is okay for you, and three nines (99.9%) is acceptable.

But if you’re Amazon, this isn’t gonna work. Amazon pulls in tens of millions of dollars in revenue from all over the world every hour (every minute, even). Additionally, their website isn’t an optional part of their business – no website, no money. Amazon can never be down.

Is it possible to achieve perfect, constant uptime? That depends. How much money do you have?

Each “nine” you add to your minimum requirements will add costs. Remember that servers will always need to go down for occasional maintenance, so if you need “high availability,” your website has to be spread across multiple servers and synchronized. This allows individual servers to go up or down without the website being affected. Something called a load balancer sits in front of all these servers (called a server farm), and it knows which servers are up or down, and it routes requests only to the functional servers. A server could go down, and the load balancer would just stop sending it traffic until it came back up.
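Here is a toy version of that idea in Python. It assumes the simplest possible strategy, taking turns among the servers (often called round-robin), and the class and method names are invented for this sketch. Real load balancers detect failures themselves and weigh many more factors.

```python
from itertools import cycle


class LoadBalancer:
    """Hands out servers in turn, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # Stop sending traffic to this server.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Resume sending traffic once the server is back.
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next functional server for an incoming request."""
        if not self.healthy:
            raise RuntimeError("no servers available")
        # One full pass through the rotation is enough to find a healthy one.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no servers available")
```

With three servers and one marked down, requests alternate between the remaining two. Visitors never notice, which is the whole point.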

Additionally, hosting companies need to guard against entire data centers going down, which means they will synchronize across geographic regions, so that if the East Coast of the United States somehow goes offline, you can route traffic to the West Coast.

The difference between four nines (99.99%) and five nines (99.999%) is usually where costs begin to creep upwards markedly. Six nines (99.9999%) is where things start to get very expensive. From that point forward, the annual hosting costs might start to overshadow all the other costs associated with the project.

Bells and Whistles

What we’ve explained above is the bare minimum you need to have a website connected to the internet. But there are some extra things to consider with hosting:

Caching

When you ask for a specific page, the server typically has to query the database and populate a template, and that takes work. It consumes CPU cycles. It takes time, sometimes enough to make a website feel sluggish. So it’s beneficial to save a response that’s already been generated: rather than building that page a second or third (or hundredth) time, the system can hold onto it for the future.

These saved responses are called a cache (sounds like “cash”) and the action of doing this is called caching.

Caching is performed at all levels in computing – the CPU inside your computer even caches certain data to make it run faster. For a website, caching could take place at multiple points.

The CMS might hold onto content that it has retrieved from its own repository.
The web server might hold onto responses it has generated.
Some infrastructure in front of the web server might also hold onto responses; requests are sent through these dedicated caching servers first and might be handled at that level, and never even touch the actual server.
The visitor’s browser might hold onto data it has received and not request a new version for a period of time.

The downside to caching is that a cache can “get stale,” which means the underlying content has changed, but since an older version was cached, visitors are still seeing that instead. To prevent this, systems have methods to invalidate their cache, which is a fancy way of saying they’ll delete the cached data so that it has to be requested from scratch (and then they’ll re-cache the fresh data).
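A stale cache is easier to understand once you see one. The sketch below is a deliberately tiny cache in Python. The `PageCache` class, the `render` function, and the `/about` page are all made up for illustration, and real caches add expiry times and size limits on top of this.

```python
class PageCache:
    """Remembers rendered pages so the slow work is only done once."""

    def __init__(self, render):
        self._render = render  # the slow function that builds a page
        self._store = {}       # url -> previously rendered page

    def get(self, url):
        # Only do the expensive rendering if nothing is saved for this URL.
        if url not in self._store:
            self._store[url] = self._render(url)
        return self._store[url]

    def invalidate(self, url):
        # Delete the saved copy so the next request starts from scratch.
        self._store.pop(url, None)


content = {"/about": "Version 1"}  # stand-in for the CMS repository
render_calls = []                  # records each time real work happens


def render(url):
    render_calls.append(url)
    return content[url]


cache = PageCache(render)
first = cache.get("/about")        # rendered, then cached
content["/about"] = "Version 2"    # an editor updates the page
stale = cache.get("/about")        # the cache still serves the old copy
cache.invalidate("/about")
fresh = cache.get("/about")        # rendered again with the new content
```

Between the edit and the invalidation, visitors keep getting the old version. That window is exactly what “stale” means.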

Caching can make a website very fast, but sometimes stale cache can cause strange issues where two people see different things.

Monitoring

Let’s face it: most of the time, you’re just trusting that your website is running. You don’t know this for sure unless you go look. If your website mysteriously goes offline every night at 11 p.m. and comes back online at 4 a.m., and if you’re not awake and looking at the website at those times, how would you ever know?

This is where a monitoring service comes in. There are subscription-based automated services that will check your website multiple times a day – every hour, every minute, whatever – to make sure it’s functioning.

Monitoring systems can check lots of different things. At the highest level, they can simply make sure the website responds, which would verify the server and network connection are up. Some systems go deeper and actually replicate user testing: they can request specific pages, virtually click on various navigation options, and submit search forms, all to verify that the site is running smoothly.
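To give a feel for what such a check involves, here is a rough sketch in Python. It assumes you hand it a `fetch` function that requests a URL and returns a status code and the page text. That function, and the names `check_site` and `checks`, are hypothetical. A real service would make actual HTTP requests on a schedule and alert someone when the list of failures isn’t empty.

```python
def check_site(fetch, checks):
    """Run each (url, expected_text) check and return any failure messages."""
    failures = []
    for url, expected_text in checks:
        try:
            status, body = fetch(url)
        except Exception as error:
            # No response at all: the server or network may be down.
            failures.append(f"{url}: no response ({error})")
            continue
        if status != 200:
            # The server answered, but with an error.
            failures.append(f"{url}: status {status}")
        elif expected_text not in body:
            # The page loaded but doesn't contain what it should.
            failures.append(f"{url}: missing text {expected_text!r}")
    return failures
```

The first kind of failure corresponds to the simple “is it up?” check. The last one is the deeper kind, where the site responds but something on the page is wrong.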

In QA and testing parlance, how much of the site these checks exercise is called test coverage. Ideally, you would create automated tests to give you maximum coverage – to automatically explore every nook and cranny of your website on a timed schedule. The corresponding downside is the expense and the effort, both to create the tests and to change them when the underlying functionality or content changes in a way that makes a test invalid.

Location and Recovery

Hosting can be spread across multiple servers or even multiple locations.

Content Delivery Networks: On a high-volume site, sometimes it’s helpful to serve rarely-changing static assets from another server. By doing this, you relieve your main server from the workload of something it doesn’t really need to do, improving site performance.
Data Residency and Sovereignty: The location of data servers might also come into play for legal reasons. Occasionally, you need to be concerned with where the actual servers storing your data are physically located, especially in the case of collecting personal information. Different laws might apply to data stored in different places.
Disaster Recovery (DR): If a disaster strikes and your entire hosting infrastructure goes offline, how quickly can you recover?

Disaster recovery is the process in which your entire website is constantly replicated to another environment (called, appropriately, a DR environment or DR server), which means you always have a duplicate version of your website on standby, ready for traffic to be routed to it at any time.

Domain Names and DNS

Earlier, I congratulated you for getting your files out to your hosting account and “getting on the internet.” But I skipped an important part: your domain name. There’s a step where you have to give the world your web address. Specifically, you need to point your domain name – www.myawesomewebsite.com – to the website you built.

We’re going to cover this at a pretty high level.

Computers on the internet are actually identified by numeric labels called IP addresses – something like 12.34.56.78. Given that address, the magic of the internet can send your request to the computer to which it’s assigned.

But humans can’t remember those, so someone came up with the idea of using easier-to-remember text labels as stand-ins for IP addresses. These text labels are called domain names.

So, your domain name of www.myawesomewebsite.com is actually mapped to an IP address. It’s resolved to this address by way of the Domain Name System (DNS). When you type your domain name into your browser, it contacts the global DNS system to find the IP address to which that domain name is assigned, then sends your request to that computer.

It isn’t a one-to-one match. A server might have a single IP address, but be serving 10,000 websites at the same time. This still works because requests for all of those 10,000 websites will come into that server bearing the domain name they want. That server can evaluate that domain name, then send the request to the files of the hosting account configured for that specific domain name.
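Here is a small sketch in Python of how one server might sort requests for many websites. The `SITES` table, the folder paths, and the second domain are invented for illustration. The one real-world detail is that each web request carries the domain name it wants in a field called the Host header.

```python
# Hypothetical mapping from domain name to a hosting account's folder.
SITES = {
    "www.myawesomewebsite.com": "/accounts/awesome/public",
    "www.example.org": "/accounts/example/public",
}


def document_root_for(host_header: str):
    """Pick the hosting account's files based on the requested domain name."""
    # Drop any port number (as in "example.org:8080") and normalize the case.
    host = host_header.split(":")[0].strip().lower()
    # Returns None when no hosting account is configured for that domain.
    return SITES.get(host)
```

Both domains could resolve to the same IP address. The server tells them apart purely by the name that arrives with each request.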

So the process for making this all happen takes all these steps:

You need to acquire a hosting account and determine what its IP address is.
You need to acquire a domain name by purchasing it from any number of vendors who sell them.
You need to configure a DNS record that tells the global DNS system what IP address that domain name should map to.
You need to configure your hosting account – which exists at the IP address from the first step – to handle inbound requests for that domain name.

As I said before, this is wildly simplified. Hosting vendors often combine the domain purchasing and hosting process. You can get a domain name and hosting account in one transaction, and the vendor will handle all the mapping for you automatically.

What this also means is that if you’re rebuilding your website, launching it often just means changing where your domain name points. You might build your new website on a new hosting account, serviced by an alternate domain name, like new.myawesomewebsite.com. When it’s time to launch, you just change where the DNS record for www.myawesomewebsite.com points, then shut down the old hosting account (which shouldn’t be receiving traffic any longer).

DNS takes some time to change. Since the DNS system is contacted a lot, DNS servers will cache the mapping between domain names and IP addresses. They might check for changes once an hour, or once every twenty-four hours. So when you change where a domain name points, it sometimes takes a while for everyone to see the new website, and you can have situations where someone in one part of the country is seeing the new site, and someone somewhere else is still seeing the old site.
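The delay is easier to picture with a simulation. The sketch below is a pretend DNS server in Python that remembers answers for a fixed period. The `CachingResolver` class is invented for this example, time is passed in as a plain number of seconds rather than read from a clock, and the new IP address is a placeholder from a range reserved for documentation.

```python
class CachingResolver:
    """A pretend DNS server that remembers answers for a fixed time."""

    def __init__(self, records, ttl_seconds):
        self._records = records  # the "real" mapping: domain -> IP address
        self._ttl = ttl_seconds  # how long a remembered answer stays valid
        self._cache = {}         # domain -> (ip, time the answer expires)

    def resolve(self, domain, now):
        cached = self._cache.get(domain)
        if cached and now < cached[1]:
            # Still within the caching period: reuse the old answer.
            return cached[0]
        # Otherwise look up the current record and remember it.
        ip = self._records[domain]
        self._cache[domain] = (ip, now + self._ttl)
        return ip


records = {"www.myawesomewebsite.com": "12.34.56.78"}
resolver = CachingResolver(records, ttl_seconds=3600)

before_launch = resolver.resolve("www.myawesomewebsite.com", now=0)
records["www.myawesomewebsite.com"] = "203.0.113.10"  # launch: repoint the domain
half_hour_later = resolver.resolve("www.myawesomewebsite.com", now=1800)
one_hour_later = resolver.resolve("www.myawesomewebsite.com", now=3600)
```

Half an hour after the change, this resolver still hands out the old address. Only after its hour is up does it notice the new one, and every DNS server along the way runs on its own clock.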

Even if none of this interests you, there’s something you need to be aware of: for you to launch your new website, someone in your organization will likely need to change a DNS record. If this is the case, find out who this person is.

There’s nothing more frustrating than getting ready for a big launch … only to find out that no one knows who has access to re-configure the domain name, or that you didn’t allow enough time for the DNS cache to update. The person who can do this – likely a system administrator of some kind – needs to be acutely aware of your launch schedule and be on standby to make the required change.

Inputs and Outputs

Before you discuss hosting, you need to know what technology stack your website will run on. This means that you need to select a CMS. Additionally, you need to determine if hosting is even your problem. It might not be, for reasons discussed in this chapter. If hosting is something you need to manage, then the output of this process is an acquired hosting account, ready to receive the finished website.

The Big Picture

You’ll need to have a development environment – a place to build – figured out before you can start building your website. And clearly, you’ll need to have a production environment – a place to store the finished site – set before you can launch.

Staffing

If you have them, you probably need to involve your IT staff in these discussions. There’s a lot of stuff here that might be affected by your organization’s IT policies. Ideally, you can hand this problem off completely to someone on that side.

Resources

Articles

“Load Balancing,” samwho, by Sam Rose
“An Unofficial Guide to Whatever-as-a-Service,” Gadgetopia, by Deane Barker