Websites have a shelf life of about five years, give or take. Once a site gets stale, it’s time to update. You may be moving from one CMS to another, e.g., WordPress to Drupal, or upgrading from Drupal 6 to Drupal 8. Perhaps the legacy site was handcrafted, or it may have been built on Squarespace or Wix.
Content is the lifeblood of a site. A developer may be able to automate the migration, but in many cases, content migration from an older site is a manual process. Indeed, developing a custom tool to automate a migration can take weeks and end up being far costlier than a manual effort.
Before setting out, determine if the process is best accomplished manually or automatically. Let’s look at the most common concerns for developers charged with migrating content from old to new.
1. It’s All About Data Quality
Old data might be loosely structured, or not structured at all. A common trouble spot is taking something that was handcrafted and unstructured and turning it into a structured system. A case in point is an event system managed through HTML dumped into pages.
Such a system holds tabular data, dates, and sessions: structured items that represent days, times, and the presenters who took part. There could also be assets like video, audio, the slides from the presentation, and an accompanying paper.
What if all that data is in handcrafted HTML in one big blob with links? If the HTML was created using a template, you might be able to parse it, figure out which fields represent what, and synthesize structured data from it. If not, and every page uses a slightly different format that's almost impossible to parse reliably, the migration just has to be done manually.
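To make the templated case concrete, here is a minimal sketch using only Python's standard library. It assumes every session sits in a `div` with class `session` and that each field is a `span` whose class names the field. Those class names and the sample markup are hypothetical; a real legacy template will have its own, and nested `div` elements inside a session would need extra handling.

```python
from html.parser import HTMLParser

# Hypothetical field classes; adjust to whatever the legacy template used.
FIELDS = {"title", "date", "presenter"}


class SessionParser(HTMLParser):
    """Collects one dict per <div class="session"> block."""

    def __init__(self):
        super().__init__()
        self.sessions = []
        self._current = None  # the session being built, if any
        self._field = None    # the field whose text we are inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "session" in classes:
            self._current = {}
        elif self._current is not None:
            self._field = next((c for c in classes if c in FIELDS), None)

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        # Assumes sessions contain no nested <div> elements.
        if tag == "div" and self._current is not None:
            self.sessions.append(self._current)
            self._current = None
        self._field = None


def parse_sessions(html):
    parser = SessionParser()
    parser.feed(html)
    parser.close()
    return parser.sessions


LEGACY_HTML = """
<div class="session">
  <span class="title">Intro to Migrations</span>
  <span class="date">2019-03-04</span>
  <span class="presenter">A. Example</span>
</div>
<div class="session">
  <span class="title">Structured Content</span>
  <span class="date">2019-03-05</span>
  <span class="presenter">B. Example</span>
</div>
"""

sessions = parse_sessions(LEGACY_HTML)
```

If the pages do not share markup this regular, a parser like this starts accumulating special cases quickly, which is usually the sign that manual migration is the cheaper route.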
2. Secret Data Relationships
Another big concern is a system that doesn't expose how data is related.
You could be working on a system that seems to manage data in a reasonable way, but it's very hard to figure out what’s going on behind the scenes. Data may be broken into components, but then the system does something confusing with them.
A previous developer may have used a system that's structured, but used a page builder tool that inserted a text blob in the top right corner and other content in the bottom left corner. In that scenario, you can't even fetch a single record that has all the information in it because it's split up, and those pieces might not semantically describe what they are.
3. Bad Architecture
Another top concern is a poorly architected database.
A site can be deceptive because it appears to have structured, self-describing data. The system can find each element as it is requested, but it is really hard to list all of the elements and load their data in a coordinated way.
It's just a matter of your architecture. It’s important to have a clearly structured, normalized database with descriptively named columns. And you need consistency, with all the required fields actually in all the records.
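A quick consistency check before migrating can show how far the old data is from that ideal. The sketch below assumes the records have already been exported as a list of dictionaries holding text values; the field names are hypothetical stand-ins for whatever the new system requires.

```python
# Hypothetical required fields for the target content type.
REQUIRED_FIELDS = ("title", "date", "presenter")


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Return {record index: [missing field names]} for incomplete records.

    A field counts as missing if it is absent, None, or blank text.
    """
    problems = {}
    for index, record in enumerate(records):
        missing = [
            field for field in required
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            problems[index] = missing
    return problems


records = [
    {"title": "Intro to Migrations", "date": "2019-03-04", "presenter": "A. Example"},
    {"title": "Structured Content", "date": "", "presenter": None},
]

problems = find_incomplete(records)
```

A long list of problems here suggests that an automated migration will need a lot of special handling, or that the gaps should be filled in by hand first.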
4. Automated Vs. Manual Data Migration
Your migration needs to make some assumptions about what data it’s going to find and how it can use that to connect to other data.
Whether there are 6 or 600,000 records of 6 different varieties, it's roughly the same amount of effort to automate a migration. So how do you know if you should be automating, or just cutting and pasting?
Use a benchmark. Migrate five pieces of content and time how long that takes. Multiply the average time per piece by the number of pieces of content in the entire project to get a baseline for doing it manually. Then estimate the effort to migrate in an automated fashion, and double that estimate. Go with the number that’s lower.
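The benchmark is simple arithmetic, sketched here as a small helper. The function names and the sample numbers are illustrative assumptions, not measurements from a real project.

```python
def manual_estimate_hours(sample_minutes, sample_size, total_items):
    """Scale the timed sample up to the whole project, in hours."""
    minutes_per_item = sample_minutes / sample_size
    return minutes_per_item * total_items / 60


def choose_approach(sample_minutes, sample_size, total_items, automated_hours):
    """Compare the manual baseline with the doubled automation estimate."""
    manual = manual_estimate_hours(sample_minutes, sample_size, total_items)
    padded_automated = automated_hours * 2  # "then double it"
    if manual <= padded_automated:
        return ("manual", manual)
    return ("automated", padded_automated)


# Five items took 50 minutes; automation is estimated at 40 hours.
large_site = choose_approach(50, 5, 600, 40)  # 600 items to migrate
small_site = choose_approach(50, 5, 60, 40)   # 60 items to migrate
```

With these numbers, the 600-item site works out to 100 hours by hand against 80 padded hours of automation, so automation wins. The 60-item site needs only 10 hours by hand, so cutting and pasting is the cheaper choice.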
One of the reasons to pick a system like Drupal is that the data is yours. It's an open platform. You can read the code and look at the database. You can easily extract all of the data and take it wherever you want.
If you’re with a hosted platform, that may not be the case. It's not in the hosted platform’s best interest to give you a really easy way to extract everything so you can migrate it somewhere else.
If you're not careful and you pick something because it seems like an easy choice now, you run the risk of getting locked in. That can be really painful because the only way to get everything out is to cut and paste. It’s still technically a migration. It's just not an automated one.