Feed the Feeds (importing referenced RSS data)

Laurence Liss

When a client asks for a way to pull content onto a site through RSS, the obvious choice is Drupal's Feeds module. I've never really been in love with this module, but it does the job well. We recently had an interesting case that required extending the normal functionality of Feeds to interact with custom content types. The scenario was as follows:

  • The site lists many organizations and the events of those organizations.
  • Site users connect with and follow events and news from the various organizations.
  • Each organization may run several websites, each with its own set of RSS feeds.
  • The organizations wanted their content added to their pages dynamically and without effort.
  • The user would visit the organization's page and see a list of news and articles from one or more of that organization's feeds.

The idea is fairly straightforward, but Feeds does not support this kind of association by default. If you're not familiar with Feeds, here's a brief rundown of how it works:

  1. Feeds allows you to designate a content type to be the source of a feed, or it will create a feed content type for you.
  2. You then create new nodes of this content type, adding the URL of the feed to be imported to each node.
  3. At designated times, the feed importer fetches information from the designated RSS URL. The data is added to Drupal as nodes. You can choose what kind of node you would like the imported data to be.

This was a Drupal 7 site, and our idea was to use reference fields to link each feed importer to the organization in question. For example, the feed importers for developer.apple.com and news.apple.com would both reference the Apple organization node, while the feed importers for developer.microsoft.com and news.microsoft.com would reference the Microsoft organization node. Make sense so far?

We then created a new content type for partners' news called Partner News. To this we added the normal title, body, date, and ID fields, plus another reference field for the organization. What we really wanted was for the Partner News nodes to automatically inherit the reference field from the importer that created them. So, playing off the example above, we wanted each news item added to Drupal from the feed at news.apple.com to inherit the reference to the Apple organization node, so that we could later create a view on the Apple organization node that displayed all the imported news associated with Apple.

The trick is that this functionality didn't exist. Luckily for us, the Feeds module provides some useful hooks to extend its base functionality. The custom module we created is fewer than 50 lines if you ignore the comments. The following screenshots show a typical Feeds importer setup.

In this case I am attaching my feed importer to a content type called importer. This means that when I create a new importer node, I will see that Feeds has added a new field to the content type, giving me a place to add the URL of the feed to be imported.

Under settings I designate that imported nodes should be Partner News nodes. That means that each item in an imported RSS feed will become its own Partner News node. I also set Feeds to update nodes, rather than replacing them or creating new ones, if it finds duplicate data.

Finally, we designate the mapping. The mapping tells Feeds which elements of an imported RSS item should be added to which part of the new node. Some of these are pretty obvious. We map title to title, date to date, description to body, and GUID to GUID. This last one (GUID) provides a unique identifier for updating feed data and is required if you want the nodes to update rather than duplicate.

But what we want doesn't exist. We want to see a source element that says something like “Feed Importer's Organization Reference” and a target that says something like “Organization Reference”, so that we can map from one to the other.

To do this, start a custom module in the standard fashion (http://drupal.org/node/1074360). I'll call my module feedmapper. In feedmapper.module, add the following function:

<?php
/**
 * Implements hook_feeds_parser_sources_alter().
 */
function feedmapper_feeds_parser_sources_alter(&$sources, $content_type) {
  // Register a new mapping source, keyed by the reference field's name.
  $sources['field_importer_reference'] = array(
    'name' => t('Organization\'s NID'),
    'description' => t('The node ID of the partner.'),
    'callback' => 'feedmapper_get_organization_nid',
  );
}

 

This adds a new source to the dropdown on the feed importer configuration.

The callback names the function that will actually handle the data processing. You should prefix it with the name of your module, but otherwise it can be named anything that makes sense. I haven't written this function yet, but we'll get to that shortly.

Next I'll add a function that specifies a new target.

/**
 * Implements hook_feeds_processor_targets_alter().
 */
function feedmapper_feeds_processor_targets_alter(&$targets, $entity_type, $bundle_name) {
  $targets['field_importer_reference'] = array(
    'name' => t('Organization Reference'),
    'description' => t('The node reference for the partner.'),
    // The real target field on the node. This is on the content type.
    'real_target' => 'field_importer_reference',
    'callback' => 'feedmapper_feeds_set_target',
  );
}


Note that the source and the target both reference the same field, field_importer_reference. This is because I am reusing the field across content types. If you had different field names for each content type, you would need to make the target and source point to the specific field names you created.
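If the two content types did not share a field, the two hooks might look something like the sketch below. The field names field_importer_org and field_news_org and the bundle name partner_news are hypothetical, and the t() stub is there only so the snippet runs outside Drupal.

```php
<?php
// Stand-in for Drupal's t() so this sketch runs outside Drupal.
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Implements hook_feeds_parser_sources_alter().
 */
function feedmapper_feeds_parser_sources_alter(&$sources, $content_type) {
  // Hypothetical reference field on the importer content type.
  $sources['field_importer_org'] = array(
    'name' => t('Organization\'s NID'),
    'description' => t('The node ID of the partner.'),
    'callback' => 'feedmapper_get_organization_nid',
  );
}

/**
 * Implements hook_feeds_processor_targets_alter().
 */
function feedmapper_feeds_processor_targets_alter(&$targets, $entity_type, $bundle_name) {
  // Hypothetical reference field on the Partner News content type.
  $targets['field_news_org'] = array(
    'name' => t('Organization Reference'),
    'description' => t('The node reference for the partner.'),
    'real_target' => 'field_news_org',
    'callback' => 'feedmapper_feeds_set_target',
  );
}

// Outside Drupal, the hooks can be exercised with plain arrays.
$sources = array();
$targets = array();
feedmapper_feeds_parser_sources_alter($sources, 'importer');
feedmapper_feeds_processor_targets_alter($targets, 'node', 'partner_news');
```

With names like these, the source callback would also have to read field_importer_org from the importer node rather than the shared field.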

Now you can assign this mapping. Of course it doesn't do anything yet because we haven't written the appropriate callbacks.

The set method is actually pretty easy, because Feeds handles that for us provided we pass it the correct data in the first place. We need to focus on retrieving the correct node ID from the feed importer. To do this, we access a property of the feed source object. This property is feed_nid which, as you can guess, holds the node ID of the feed node. Once we have the nid, retrieving another field's data is fairly trivial. We just need to make sure we're dealing with the correct type of node, so we check the node type and then get the field in question:

/**
 * Find the node id of the feed source and use it to find the associated organization.
 */
function feedmapper_get_organization_nid(FeedsSource $source) {
  // feed_nid is the node ID of the importer node this source is attached to.
  $feed = node_load($source->feed_nid);
  // node_load() returns FALSE when no node is found. The type checked here
  // needs to be the name of the importer content type.
  if ($feed && $feed->type == 'importer') {
    return $feed->field_importer_reference;
  }
  return NULL;
}
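
The target definition names a callback, feedmapper_feeds_set_target, that is not listed in this article. As a rough idea of what such a callback could look like, here is a minimal sketch. It assumes Feeds calls it with the source, the entity being built, the target name, and the mapped value, and that the value is the whole field array read from the importer node. The stdClass object and the sample nid of 42 are stand-ins so the snippet runs outside Drupal; they are not taken from the attached module.

```php
<?php
/**
 * Hypothetical target callback: copies the reference field onto the new node.
 *
 * Assumed arguments: the feed source, the entity being built, the target
 * field name, and the value returned by the source callback.
 */
function feedmapper_feeds_set_target($source, $entity, $target, $value) {
  // Nothing to copy when the importer node has no organization reference.
  if (empty($value)) {
    return;
  }
  // $value is assumed to be a complete field array, for example
  // array('und' => array(array('nid' => 42))), so it can be assigned as is.
  $entity->{$target} = $value;
}

// Stand-in objects so the sketch can be run on its own.
$news = new stdClass();
$reference = array('und' => array(array('nid' => 42)));
feedmapper_feeds_set_target(NULL, $news, 'field_importer_reference', $reference);

$untouched = new stdClass();
feedmapper_feeds_set_target(NULL, $untouched, 'field_importer_reference', NULL);
```

Copying the array wholesale works here only because the source and target are the same kind of reference field. A different target field type would need the value reshaped first.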

 

And that's it. Now each feed importer is tied to an organization, and every time news is imported via Feeds, the incoming news item is automatically linked to its parent organization. I've attached a working version of the module outlined here along with a Feature that should get you started. Good luck.
