Chapter 14

Know Your Integrations

Summary

A website will often need to communicate with some external system. These situations can be fraught with peril. The key is to know what they are early, and to plan for the unknown.

Does This Apply?

If all of the content and functionality for your site will be delivered from inside the walls of your CMS, then you don’t need to worry about integrating with anything. And this can be true of smaller, content-based sites. However, as your site grows, you’ll likely run into an integration need, sooner or later.

Narrative

If there’s one thing that is universally reviled in all of academia, it’s the group project.

Usually, throughout your time in school, you’re at your limits, just barely managing to get your own stuff done. But then, inevitably, your professor assigns a project to be done with a partner, or – even worse – in teams of three or four. Your heart sinks. Why? Because:

We have to coordinate with other people, which leads to communication issues.
We’re limited by the worst-performing member. Our grade is affected by anyone who doesn’t step up.
The final product runs the risk of being a mish-mash of styles, instead of an integrated whole.

In working as a group, we’re now at the mercy of group dynamics, and we start to lose control of things. This is not something we’re keen to concede, unless we’re the weak link ourselves.

But if you think you hated group projects in school, just wait until you deal with them in a development project. Just wait until you have to invite a third party into your code via integration.

Episode 14: Know Your Integrations (w/ Greg Dunlap)

Corey and Deane discuss the four major parts of a content model. Then, Greg Dunlap, Director of Strategy at Lullabot, joins us to define a web integration, discuss the finer details of development risks and runtime risks — as well as real-time vs scheduled data — and praise the efficiency of using Google Docs as a workflow tool. Corey and Greg give Deane a music lesson, too.


What Is an Integration?

An “integration” is when two systems – meaning, two different pieces of software – have to talk to each other in some way. The two systems have to be … wait for it … integrated.

When building a website, one of those systems will be your content management system (CMS). The other system could be anything.

This is done because some content or functionality is not provided from your CMS. Often, content in your CMS needs to be augmented with content from some other source, or some functionality is provided by some other system and you’d like to make it appear as though that other system works as an integrated (ahem) part of your website. Examples of this include a university course management system, in which course information is pulled from a central source, or “single sign-on,” in which users’ login credentials are connected to a centralized authentication system.

In each of these cases, you’re gluing together two systems to provide a consolidated experience. You’re reaching beyond the CMS and communicating with an external system, which requires making sense of how that system responds and incorporating its response into the flow of your website.

This is where the fun starts.

The Challenges of Integration

Much like your group project at school, integrations can be fraught with peril. Many organizations have problems keeping one system running, and now we’re asking them to keep more than one system in good working order, and to keep those systems talking to each other.

There are two types of risk:

Development risk: Risk incurred during the development and launch of the project. There is a risk of the two systems being incompatible, or of the work to connect them together taking too long and consuming too many resources. This risk, thankfully, is one-time.
Runtime risk: The ongoing risk of keeping the two systems working together over the long term. Both need to stay running, communicating without unreasonable delay. This risk is often independent of development risk: you can get the systems working together, only to have the partnership break down in production when they lose connection with each other. If you’re really unlucky, this might happen randomly, multiple times a week.

Thankfully, some integrations are so common as to be productized. You’re not the first organization to attempt to link, say, a shopping interface with order processing software like Square, so the communication between them is known and already developed. In fact, some products, such as digital asset management (DAM) systems, are specifically designed to be connected to a CMS.

However, in other cases, the integration you need might be the only time anyone has ever tried to glue together those two systems, either because the system you want to integrate is relatively rare, or it’s a one-off custom system that your organization built internally.

For example, a university might have a system that shows the status of washing machines and dryers in the laundry area of a residence hall. The university might want to display this information on the page for each residence hall so students know when there are free machines available.

This integration is likely quite rare. It’s not likely to be productized for even one CMS, much less multiple.

Not to overdo the group project metaphor, but if you have to do a group project, clearly you want to do it with people you know. If you already have established friendships and communication patterns, it makes things easier. The person you have to work with is already in the contacts in your phone, you know their schedule, you know how they relate to other people, and you may have known them long enough that your relationship has already had its ups and downs, so you’ve fought, talked it out, and made up multiple times.

Other times, you’ve never met this person. In these cases, you cross your fingers and hope for the best.

Real Time vs. Scheduled

Let’s talk quickly about the nature of content, and how it’s treated by a technical system like a CMS. A large percentage of integrations are intended to combine content from two sources – usually your CMS and some external system. When doing this, a key question is: how close to real time does this connection have to be?

Content is naturally categorized using the term WORM – Write Once, Read Many. We might publish an article one time, and it is read 10,000 times. Once published, the content doesn’t change. Since it hasn’t changed, there’s ultimately no reason to retrieve it again.

But our content isn’t always this static. We often change content – data is updated, course schedules change, or product prices change. So when we talk about keeping content synced between integrations, we’re juggling three factors of timing:

Communication: can the systems speak to each other?
Velocity: how fast does the content change?
Latency: how quickly do we need those changes reflected on our side?

The question becomes whether you require real-time changes or whether you can settle for less frequent scheduled changes. Either one works, but know that all things come with a cost. A good rule of thumb: the more volatile the external content is, the more it will cost to integrate, in terms of schedule, budget, and risk.

How often does the external content change? And how quickly do you need those changes reflected in the content you expose to the public?

Let’s return to our university example from above. Say we have a page for each faculty member. That page has some very subjective, marketing-ish copy like their biography and credentials. This is maintained by the marketing staff, directly inside the CMS.

However, also displayed on that page is more fundamental information like the professor’s email, phone number, and name. This information does not need to be maintained in the CMS, because it’s stored elsewhere. Duplicating the phone number in the CMS would mean maintaining this information in yet another place (the “double entry” problem).

This is a classic case for an integration. Ideally, the professor’s page on the website would be a hybrid of content coming from both the CMS and the internal course management system.

Figure 14.1: The delivery of a faculty page to the requesting visitor is an internal mixing of content from the CMS repository and an external system.
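To make the mixing in Figure 14.1 concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the field names, the `build_faculty_page` function, and the sample records are invented for illustration, and a real CMS would do this inside its own templating or content API.

```python
# Fields owned by the external system in this sketch (an assumption based on
# the example in the text: name, email, and phone live outside the CMS).
EXTERNAL_FIELDS = ("name", "email", "phone")


def build_faculty_page(cms_content: dict, external_record: dict) -> dict:
    """Combine CMS-managed copy with fields owned by the external system."""
    page = dict(cms_content)  # work on a copy; leave the inputs untouched
    for field in EXTERNAL_FIELDS:
        # The external system wins for the fields it owns, so nobody has to
        # maintain a second copy of them in the CMS.
        page[field] = external_record.get(field, "")
    return page


# Hypothetical sample data.
cms_content = {
    "biography": "Dr. Example studies soil chemistry.",
    "credentials": "PhD",
}
external_record = {
    "name": "Pat Example",
    "email": "pat@example.edu",
    "phone": "555-0100",
}
page = build_faculty_page(cms_content, external_record)
```

The point of the sketch is ownership: the marketing copy comes from the CMS, the contact details come from the other system, and the page is the only place the two meet.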

To connect these two systems, let’s consider a combination of technical and logical questions. For any integration, you’ll go through a similar exercise:

1. Can the CMS and the course management system communicate with each other?
2. How fast and stable is this line of communication?
3. How often does the information in the course management system change?
4. How quickly do we need those changes reflected on the website?
5. What is the risk of this content becoming outdated?

For the purposes of this example, let’s assume that yes, the web server and course management system are located on the same network and can communicate with each other. But, for fun, let’s say this connection is unstable. The course management system is on an old server, and gets overloaded from time to time when students are registering for classes. The server admin has said – quite gruffly, unfortunately – that you new-fangled kids cannot have on-demand access to this server.

Okay, that’s questions #1 and #2 answered.

For question #3, let’s consider how often this information changes. Professors are normally hired during the summer – not many faculty members start at other times of the year. Also, once a faculty member starts at the university, their phone number and email address rarely change. The courses they teach do change, but maybe once a year, at most. This content has a low velocity.

Highly related is question #4. What are our requirements for immediacy? If a professor did change their email address, how quickly do we need that change reflected on the website? Right away? In a perfect world, sure. But in the real world, a 24-hour delay isn’t going to kill anyone. We can tolerate high latency in our content changes.

Finally, question #5: what is the risk of incorrect content? Of course, we want our content to be up-to-date all the time, but at the end of the day, is anyone going to die if an email address is temporarily wrong? Probably not. There are likely multiple other ways to contact this professor, and the sender would get a returned email to notify them that their email wasn’t received and they should use other means. Clearly, our risk is low.

In this case, we could likely use an “import and update” pattern. We can configure our CMS to import content from the course management system, then keep it updated on a schedule. Once every twenty-four hours, let’s say, in the wee hours of the morning when load is low, a scheduled job would check for new or changed information and update the content in the CMS.

If a professor changed their email address sometime in the afternoon on a Tuesday, it would be wrong for the rest of that day, but correct itself overnight.
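The nightly job described above could look something like the following Python sketch. It is an illustration under assumptions, not how any particular CMS does it: both systems are stood in for by plain dictionaries keyed by a record ID, and the names `import_and_update` and `SyncResult` are made up. How the job gets triggered (cron, a CMS scheduler, or something else) is left out.

```python
from dataclasses import dataclass


@dataclass
class SyncResult:
    created: int = 0
    updated: int = 0
    unchanged: int = 0


def import_and_update(external_records: dict, cms_store: dict) -> SyncResult:
    """Copy new or changed external records into the CMS-side store.

    external_records: record ID -> fields, as exported by the external system.
    cms_store: record ID -> fields, the imported copies held by the CMS
               (modified in place).
    """
    result = SyncResult()
    for record_id, fields in external_records.items():
        current = cms_store.get(record_id)
        if current is None:
            cms_store[record_id] = dict(fields)  # first time we see this record
            result.created += 1
        elif current != fields:
            cms_store[record_id] = dict(fields)  # something changed since last run
            result.updated += 1
        else:
            result.unchanged += 1  # nothing to do
    return result


# Hypothetical data: one professor changed their email since the last run,
# and one professor is new.
cms_store = {"p1": {"email": "old@example.edu"}}
external = {
    "p1": {"email": "new@example.edu"},
    "p2": {"email": "second@example.edu"},
}
summary = import_and_update(external, cms_store)
```

Because the website only ever reads from the CMS-side copy, visitors are never waiting on the external server. If that server is down when the job runs, the site keeps showing the previous night's data.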

Our university example might seem contrived, but this set of circumstances is fairly common. External content sources often move slowly, and the content is low-risk.

However, there are exceptions.

Consider if you provide stock quotes on your website.

This information is highly volatile – it has a high velocity.
We need to see changes immediately – we require low latency.
People will be making financial decisions based on this information – it has high risk.

In this case, you can’t do an import and update. Doing that even once per minute would likely be more latency and therefore more risk than people are willing to tolerate. This type of content simply demands a real-time connection between systems.
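The two examples can be boiled down to a rough rule of thumb, sketched here in Python. The `ContentProfile` type, the low/high levels, and the decision rule itself are assumptions made for illustration. A real decision would also weigh budget, schedule, and the stability of the connection.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class ContentProfile:
    velocity: Level           # how fast the external content changes
    latency_tolerance: Level  # HIGH means a long delay (say, a day) is fine
    risk: Level               # cost of showing outdated content


def suggest_pattern(profile: ContentProfile) -> str:
    """Return a rough suggestion: 'real-time' or 'import-and-update'."""
    # If changes must show up immediately, a scheduled job cannot keep up.
    if profile.latency_tolerance is Level.LOW:
        return "real-time"
    # Fast-changing content that is costly to get wrong also argues against
    # serving copies that may be hours old.
    if profile.velocity is Level.HIGH and profile.risk is Level.HIGH:
        return "real-time"
    return "import-and-update"


# The two examples from the text, expressed as profiles.
faculty_contact_info = ContentProfile(
    velocity=Level.LOW, latency_tolerance=Level.HIGH, risk=Level.LOW
)
stock_quotes = ContentProfile(
    velocity=Level.HIGH, latency_tolerance=Level.LOW, risk=Level.HIGH
)
```

With these profiles, `suggest_pattern(faculty_contact_info)` returns `"import-and-update"` and `suggest_pattern(stock_quotes)` returns `"real-time"`, matching the conclusions reached above.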

Determining if Integration Is Necessary

With all of this talk of latency and risk, let’s ask an important question: are we sure that integration is even the right choice? Because there are times when it’s not.

In our university example, an integration is probably the right way to go. The content we want – phone numbers, emails, etc. – is locked away in our course management system without a public viewing option, so we have to get that content into our CMS where we can display it.

However, consider the stock quote example. If we want to display stock quotes on our website, we open up a Pandora’s Box of issues. An argument could be made that it’s more trouble than it’s worth.

Removing a Link in the Chain

One option would be to not display stock quotes, of course. We could decide there are too many issues and just throw in the towel on the idea entirely. In this case, we could simply link users to another site to get the quote.

However, another option would be to remove a link in the chain by moving the integration to the client, rather than the server.

What we mean here is moving integration from the website — the situation we’ve largely focused on up until this point, in which a user asks for data from the site and the site then turns around and grabs that data from another service — and taking a link out of that chain by patching the other service directly to the user. We can do this a couple of ways:

An inline frame (an IFRAME) could contain a window into a source’s website that would display the stock quote. Many services provide URLs specifically for this.
Some JavaScript could execute in the user’s browser and request content directly from the source.

Technical details aside, both of these methods have done the same thing: they’ve removed one link in the chain. Now the user is bypassing our website and getting content from the source, which is always going to be more direct and involve less risk.
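As a rough illustration of the IFRAME option, the server-side work can shrink to emitting a bit of markup. In this Python sketch, the `render_quote_embed` function and the embed URL are hypothetical, and a real quote provider would document its own embed URLs and terms.

```python
from html import escape


def render_quote_embed(embed_url: str, title: str = "Stock quote") -> str:
    """Return IFRAME markup that points the visitor's browser at the source.

    Our server never requests the quote itself; it only emits this markup,
    and the browser fetches the content directly from the provider.
    """
    # Escape attribute values so the URL and title cannot break the markup.
    safe_url = escape(embed_url, quote=True)
    safe_title = escape(title, quote=True)
    return f'<iframe src="{safe_url}" title="{safe_title}"></iframe>'


# Hypothetical embed URL for illustration only.
markup = render_quote_embed("https://quotes.example.com/embed?symbol=ACME&range=1d")
```

Notice what is missing: there is no call from our website to the quote service, no caching, and no schedule. That is the link that was removed from the chain.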

Showing the Seam

On the other hand, sometimes the system you want to integrate with is just too feature-rich and constantly evolving to “proxy” all its functionality into your website.

For example, there are services to which banks subscribe that manage the process of applying for a mortgage. These systems allow users to fill out a lengthy, multi-step application, upload documents, authorize credit checks, attest to facts, etc. They’re so complicated that an ecosystem of vendors has sprung up to build these systems. Banks don’t build them internally because they’re complex and other people have solved the problems.

While it might seem noble and ambitious to build this into your site, it’s not a realistic approach. Initial development would be staggering and costly – remember, the reason this external vendor stays in business is that they’ve built something that’s difficult to replicate, and they evolve it over time with new releases and features.

Sometimes, it’s better to just show the seam. If the customer wants to apply for a mortgage, let them know you use an external service for this, and hand them off. These services will often let you “skin” their tools with your logo and colors, so it has some branding connection.

Would it be perfect if this was on-brand and on-site? Sure. But it’s often not worth the trouble.

The Costs of Integration

Given the volatile nature of integrations, they can be hard to scope. Often, you’ll be trying to integrate one system with another in a combination that you or your implementation partner has never done before.

Thus, the answer to the inevitable question of “How much will it cost?” might very well be “… we have no idea.”

Integrations can often be a whirlwind trip into the unknown. Even if it seems straightforward, these things can turn on the smallest of issues – some minor thing pops up that makes attaining the goal impossible, due to a quirk of how that factor intersects with your particular requirements.

If you’re using an internal development group, this can mess with your timeline. If you’re using an external development partner, both your timeline and your budget are at risk. Problems with unknown integrations have run wild, soaking up tens of thousands of dollars that were intended for something else.

When an external partner is asked to budget for one of these, they’ll generally do one of two things:

Leave the budget open-ended – they might provide a good-faith estimate, but allow for overages.
If a firm, fixed number is demanded, they’ll pad that number like a bad living room sofa from the 70s.

Or, in taking a long view of your problem, you may find one, single integration that’s problematic and unknown, while the rest are completely mainstream. In these cases, you may consider a third option — a two-stage budget: a fixed price on everything but the unknown integration, which is kept open-ended.

Practical vs. Perfection

Remember, everything is a trade-off, and until a CMS is created that does everything you could ever want to do, we’re going to be stuck integrating with other systems. The trick is to be wise about when to do this, and when to just show the seam and have your users get their content or functionality directly from other systems.

Every answer to an integration question is a combination of budgetary, schedule, and technical factors. There’s no blanket approach that works in all cases, so be prepared to consider all the factors when making a decision.

Inputs and Outputs

The input to this phase is an understanding of where everything on your website is coming from. Consider every hope and dream you want for your website – are you going to create all that content (or have it created)? If not, then where is it coming from? You need to be able to answer that question in every case, and the answers to those questions are the output of this phase. You need to be able to articulate those data-sharing requirements to whoever is scoping and developing your website.

The Big Picture

This phase can sort of be lumped into scoping and budget, but it needs to happen just before that – you need to go into that process with an understanding of where all your content is coming from. To scope anything, a developer needs to know the technical factors they’ll be juggling. So, the project can’t be scoped for schedule or budget until this is complete.

Staffing

The logical component of this can be done by a content strategist. As part of the planning process for the website, whoever is in charge of content should naturally know where this content is coming from, whether internally to the CMS, or externally from somewhere else. The scoping component of this is technical by nature, and needs to be done by a developer responsible for figuring out the development schedule and budget.

Resources

Articles

“The ‘Import and Update’ Pattern,” Gadgetopia, by Deane Barker
“Using Proxy Content Objects for Non-CMS Content,” Gadgetopia, by Deane Barker

Books

Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf

Presentations

“The Future Might Be Distributed” by Deane Barker