Auditing Drupal Sites Through The Layers of the Stack


This is Part 3 of a series on How to do a Site Audit. You may want to start with Part 1.

In Part 2 of this series I explained why you'd want to organize a site audit report by business value or ‘concern.’ For your own process as an auditor, though, it makes sense to work through each layer of the site, going potentially as far ‘down’ as the server’s operating system and perhaps as far ‘up’ as the tone of the site’s messaging. Along the way you’ll pass through server-side technology, including the web server and database, into server-side code, then up to client-side code, the site’s administrative backend, and the end user experience. Let’s go over what to check on some key layers: the server, database, core and contributed code, custom code, backend and frontend.

The Server

Many large Drupal sites these days are hosted on standard platforms like Acquia or Pantheon. When this is the case, you don’t need to do your own investigation of the security, performance or recovery strategy for the hosting. The more non-standard the hosting is, the more likely it is to have issues.

Common site audit findings on the server layer:

  • Insecure file permissions. The Security review module will help identify these if you are able to run it on the live environment.

  • Lack of an opcode cache impacting performance. As newer versions of PHP include OPCache, this is becoming less common.

  • Lack of an HTTP proxy for caching. Unless the site gets mainly authenticated user requests, it should have Varnish or other caching of requests.

  • Lack of version control on the codebase (or changes on the codebase that are not in version control).

  • Poor security practices for SSH. Use of passwords instead of SSH keys, use of insecure passwords, and the fact that they just sent you the SSH credentials via email are all common.

  • Unsecured services. For example they might be using Apache Solr as a search index but have not configured their firewall to protect that service from the public.

  • Lack of a recovery strategy. There should be automated backups with backup rotation, stored on a separate server in a separate location (and no, a repeating Outlook reminder to go do a backup is not an automated backup system).

  • The OS is behind on security updates, or the OS version no longer receives security updates at all.
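As a small illustration of the file permissions finding, here is a minimal Python sketch that lists world-writable files and directories under a given path. It is not a substitute for the Security review module, and the function name `find_world_writable` is made up for this example. It assumes a POSIX file system.

```python
import os
import stat
from pathlib import Path


def find_world_writable(root):
    """Return sorted paths under root that any system user can write to."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            try:
                mode = path.lstat().st_mode
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISLNK(mode):
                continue  # permission bits on symlinks are not meaningful
            if mode & stat.S_IWOTH:  # "other" write bit is set
                flagged.append(str(path))
    return sorted(flagged)
```

You would point it at the site's docroot, whatever that path is on the server you are auditing. Some of what it reports may be deliberate (a files directory on certain hosting setups, for instance), so treat the output as a list of things to ask about rather than a list of defects.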

What About The Database?

The Drupal application mostly manages the database, which is why it’s rare to have dedicated DBAs involved in Drupal projects. Still, you’ll want to poke around a copy of the database just to make sure you don’t find anything odd.

Common findings in MySQL:

  • Huge sessions table. This may be caused by an incorrect Apache configuration.

  • Tables that correspond to modules that are no longer present.

  • Huge variables table, possibly also including settings from modules that are no longer present.

  • Tables which are MyISAM instead of InnoDB.
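Two of these findings (table engine and table size) can be read straight out of MySQL's `information_schema.TABLES`. The sketch below assumes you run the query yourself against a copy of the database and pass the resulting rows in. The function name `flag_tables`, the 500 MB threshold, and the watched table names `sessions` and `variable` (unprefixed Drupal 7 names) are assumptions for illustration, not fixed rules.

```python
# Query to run against a COPY of the database; columns come from
# MySQL's information_schema.TABLES.
TABLE_QUERY = """
SELECT TABLE_NAME, ENGINE, DATA_LENGTH + INDEX_LENGTH AS bytes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
"""


def flag_tables(rows, size_limit_bytes=500 * 1024 * 1024,
                watch=("sessions", "variable")):
    """Flag MyISAM tables and oversized watched tables.

    rows: iterable of (table_name, engine, size_in_bytes) tuples.
    The size limit and watched names are illustrative assumptions.
    """
    findings = []
    for name, engine, size in rows:
        if engine and engine.lower() == "myisam":
            findings.append((name, "MyISAM engine"))
        if name in watch and size is not None and size > size_limit_bytes:
            findings.append((name, "larger than size limit"))
    return findings
```

Orphaned tables from removed modules are harder to detect mechanically, since that means comparing table names against the schema of every installed module, so this sketch leaves them to manual review.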

All Your Codebase Are Belong to Audit

People are module-crazy in Drupal. The most common problem you’ll find in the code is that there is an ungodly amount of it. I used to say over 100 directories inside of sites/all/modules/contrib was a decent cutoff, but now that seems to be the norm in Drupal 7 and 150 is a better estimate.

Other common codebase issues:

  • Core or contributed modules behind in security updates.

  • Hacked core or contributed modules without patch tracking. Use the Hacked! module to find these, but check the results to avoid false positives.

  • Modules in the codebase that are not in use.
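The directory count mentioned above, and a first pass at spotting unused modules, can be sketched in a few lines of Python. The function names are hypothetical, and the list of enabled modules is assumed to come from somewhere else (for example, copied from the site's Modules admin page).

```python
from pathlib import Path


def contrib_directories(contrib_path):
    """Return sorted names of the directories directly inside contrib_path."""
    root = Path(contrib_path)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and not p.name.startswith(".")
    )


def unused_candidates(directories, enabled):
    """Return directory names that do not appear in the enabled module list."""
    enabled = set(enabled)
    return [name for name in directories if name not in enabled]
```

Calling `len(contrib_directories("sites/all/modules/contrib"))` gives the number to compare against the rough cutoff of 150. The second function only produces candidates: one project directory can bundle several modules whose names differ from the directory name, so confirm each result by hand.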

Custom Code

The amount of custom code (custom modules and themes) varies greatly between sites. An experienced Drupal developer will start to have a sense of what a reasonable amount of custom code is depending on the site’s complexity. Too much custom code often comes from developers who are unfamiliar with the Drupal ecosystem, while too little can mean that the site was assembled by non-coders.

You don’t necessarily need to look at every line of custom code (depending on what you promised for this audit), but you want to get a broad overview and then take some detailed samples. Reading the code often gives me an idea of the mentality of the site’s developers and their process: can you tell there were multiple developers with different standards and approaches? Is their code elegant, does it show pride in their craft? Or is it hacked together as a bunch of copy/paste from Stack Overflow?

Security, maintainability, and (if it’s especially bad) performance are concerns to look for in the custom code.

Baby’s Got Back

Drupal doesn't really have a concept of a back end and front end, but to better jibe with most people’s mental model let’s refer to all pages with an admin/* path and the content editing pages as Drupal’s ‘back end.’

Jon Peck's Site Audit module is a really great place to start for the backend audit, as most of its checks relate to the Drupal site on that level. But I also like to get OCD: I log in as the superuser and literally look at every administrative configuration page on the site. Not only does this help you find problems but it's a great way to get to know the architecture of the site. The most common problem will be a 'too-many-ism'. Too many (including unused) Views, Content types, Roles, Vocabularies, Blocks, Menus, Input formats, Image styles, etc. It’s easier to add than to remove, so Drupal sites tend to bloat with growing complexity.

To really see the experience of staff that's using the site, you have to log in, see what they're seeing and try adding content. Click around as an administrator, an editor, and so on. What's it like to add content for the first time? How does the WYSIWYG function? Is it clear what all the fields are for on the content types? If you click the wrong thing do you end up with a broken-looking homepage? If it's hard to manage content, then an effective CMS has not been built.

You Frontin’

The front end is what the end user sees on the site. Typically this is the anonymous user experience, but for an intranet it would be authenticated users. Common issues you might find would include:

  • A non-responsive site, or a site which is somewhat broken at some screen widths. If you test the site across the full range of widths, you might find content that gets cut off or things that aren't usable in some ranges.
  • If the audience is using old browser versions (for example, an intranet where the organization is still using IE 8 on all machines), check for cross-browser issues.
  • SEO issues mainly fall under the front end. How does a crawler see the site? You might be shocked at what you see (or don't see) when you turn off JavaScript in your browser.
  • Design is not about pretty colors. Do design problems distract you from finding important issues? How is the readability of the content? Try reading some lengthy articles and see for yourself.
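For the crawler question in particular, one rough way to approximate "JavaScript turned off" is to look only at the raw HTML the server returns. The Python sketch below uses the standard library's `html.parser` to pull out the title, meta description, `h1` headings, and a count of visible text characters. The class and function names are invented for this example, and real crawlers are considerably more sophisticated, so take it as a quick smoke test only.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collect what is present in the raw HTML before any script runs."""

    TRACKED = ("title", "h1", "script", "style")

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.headings = []
        self.text_chars = 0
        self._stack = []  # currently open tracked tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        if tag in self.TRACKED:
            self._stack.append(tag)
            if tag == "h1":
                self.headings.append("")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current in ("script", "style"):
            return  # code and CSS are not visible content
        text = data.strip()
        if current == "title":
            self.title += text
            return
        if current == "h1":
            self.headings[-1] += text
        self.text_chars += len(text)


def summarize(html):
    """Parse an HTML string and return the populated CrawlerView."""
    view = CrawlerView()
    view.feed(html)
    view.close()
    return view
```

Feed it the page source saved from "view source" (not from the browser's element inspector, which shows the DOM after scripts have run). A page with a title but almost no text characters and no headings is a hint that the content only appears once JavaScript executes.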

Tune back in later for tips on writing up the site audit report and what to do after delivery.
