Add Feeds to Your Site With MagpieRSS

If content is king, then fresh, ever-changing, dynamic content must be the Emperor. If what you’re looking for is new, daily content on your site, why not take it from someone else? Goodness knows there are a million RSS feeds out there just begging for aggregation.

Integrating someone else’s news feed is relatively quick and painless when compared to the dull ache of writer’s block. While there are any number of RSS aggregation tools out there for you to use, many prefer MagpieRSS. It’s a light, flexible, open source PHP script that’s practically bomb-proof.

Though it’s currently the fourth most-popular programming language out there, some programmers are loath to work in PHP. Possibly, they’re leery of the server-side load stress which, yes, in certain circumstances, can become an issue. However, for simple tasks like RSS aggregation, a PHP solution is far less taxing than a JavaScript equivalent. Or maybe it’s just the fear that having to use *.php file name extensions is going to break all of the file references in a site. Who knows? It is time to get over these fears.

Contents

  1. Overview
  2. Gathering a Feed with MagpieRSS
  3. Helpful Tips
  4. Caching In
  5. Convert to PHP Without The Pain

Overview

This tutorial will start with a few working examples of MagpieRSS in action. We’re going to fetch an RSS news feed from Wired.com and load it onto our page. Then, we’ll discuss caching, which is the single most important thing you’ll need to understand if you want to start using MagpieRSS. Lastly, for the folks whose web pages still end with *.html, we’ll go over a few approaches to integrating the PHP scripts into your present configuration without having to rename all your files.

Gathering a Feed with MagpieRSS

If you want to start incorporating RSS feeds onto your site, the first thing you’ll need to do is download the current MagpieRSS source files. For basic use, you’ll only need five of the files: the four *.inc files and the extlib directory. You can upload these files pretty much anywhere, but I like to keep my sites neat and tidy, so let’s put them all inside a new directory we’ll call “magpie.”

Next, create a web page called news.php and put it in the magpie directory. Copy and paste the following code into news.php:


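<html><head><title>MagpieRSS Test</title></head><body>
<?php
// Load MagpieRSS, which lives in the same directory as this file.
include('rss_fetch.inc');

$url = "http://feeds.wired.com/wired/index";

// Fetch and parse the feed.
$feed = fetch_rss($url);

// Link to the newest item; backslashes escape the attribute quotes.
echo "<p><a href=\"" .$feed->items[0]['link']. "\">"
.$feed->items[0]['title']. "</a></p>";
?>
</body></html>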

When you load news.php in your browser, you should see a single link to the most current Wired News article. If you don’t see anything, double check that PHP is installed and running on your server by loading the classic <?php phpinfo(); ?> test file.

The beauty of MagpieRSS is that the fetch_rss() function doesn’t particularly care whether or not the feed you’re reading is properly validated XML. It just takes all the feed tags and all their corresponding blobs of information and plunks them into a bunch of different arrays.

It’s worth noting that items is defined by MagpieRSS, not by the feed: it is a property of the object that fetch_rss() returns. The arrays contained within items are keyed by the element tags particular to the feed’s format (RSS 0.91, Atom, et cetera).

While you can expect a certain amount of consistency between different feeds (‘title’, ‘link’ and ‘description’ being pretty much universal), the best way to see a feed’s specific variables is to display the whole thing with these lines:


 echo "<pre>";
 print_r($feed);
 echo "</pre>";
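
To make the shape of that output a little more concrete, here is a minimal sketch that lists the field names of a feed's first item. The $feed object below is a hand-built stand-in (an assumption for illustration, not real Magpie output) so the snippet runs without a network connection. With MagpieRSS installed, you would pass in the object returned by fetch_rss() instead. The first_item_fields() helper is likewise made up for this example.

```php
<?php
// Stand-in for the object fetch_rss() returns: a hypothetical feed
// with an "items" array of associative arrays, one per article.
$feed = new stdClass();
$feed->items = array(
    array(
        'title'       => 'Example headline',
        'link'        => 'http://example.com/story',
        'description' => 'A short summary.',
    ),
);

// Return the field names available on the first item of a feed.
function first_item_fields($feed) {
    if (empty($feed->items)) {
        return array();
    }
    return array_keys($feed->items[0]);
}

// prints: title, link, description
echo implode(', ', first_item_fields($feed)), "\n";
```

Any name in that list can then be used as a key, the same way ‘title’ and ‘link’ are used in news.php.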


Helpful Tips

If you’ve never worked with PHP at all, you’ll want to check out Webmonkey’s PHP Tutorial for Beginners, or maybe skim through some of the PHP manual pages. An understanding of echo, string and array is about all you’ll need. However, if you’re afraid of the geeks over at php.net (I know I am), probably the best thing to understand is that anything inside the echo string will appear in your HTML source code exactly as if you’d typed it there yourself. View Source after you load a PHP page if you don’t believe me. This is why, as with most programming languages, you’ll need to use a backslash to escape any double quotes you’re using for HTML element attributes, lest you mangle everything with malformed PHP syntax.
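
Here is a small sketch of that escaping rule in action. The $link and $title values are hypothetical placeholders for one feed item. It builds the same HTML two ways: once inside a double-quoted PHP string with backslash escapes, and once inside a single-quoted PHP string, where double quotes need no escaping at all.

```php
<?php
// Hypothetical values standing in for one feed item.
$link  = 'http://example.com/story';
$title = 'Example headline';

// Double-quoted PHP string: escape the attribute quotes with backslashes.
$escaped = "<p><a href=\"" . $link . "\">" . $title . "</a></p>";

// Single-quoted PHP string: double quotes need no escaping.
$single = '<p><a href="' . $link . '">' . $title . '</a></p>';

// Both forms produce identical HTML.
echo $escaped, "\n";
echo ($escaped === $single) ? "same\n" : "different\n";
```

Which style you pick is a matter of taste; the examples in this tutorial stick with double-quoted strings and backslashes.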

MagpieRSS returns a few custom errors in addition to reporting basic PHP errors. If you’d rather set it up so your users won’t see these custom errors, you’ll want to turn off their display by including error_reporting(0); at the beginning of your script. It’s good practice to return the error reporting to its original state at the end of a script if you changed it at the beginning, so include @ini_restore('error_reporting'); at the end of your script.


Putting this all together with an array_slice() function, here’s some code that will return the five most recent news articles from Wired:


<?php
include('./rss_fetch.inc');

// Hide MagpieRSS and PHP error messages from visitors.
error_reporting(0);

$url = "http://feeds.wired.com/wired/index";
$rss = fetch_rss($url);

if ($rss) {
	// Keep only the five most recent items.
	$items = array_slice($rss->items, 0, 5);
	foreach ($items as $item) {
		echo "<p><a href=\"" .$item['link']. "\">"
		.$item['title']. "</a></p>";
	}
}

// Put error reporting back the way it was.
@ini_restore('error_reporting');
?>


Nice and tidy, oui?
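
One thing the listing takes on trust is the feed text itself. Titles and links come from someone else’s server, so a stray angle bracket or quote could break your markup. Below is a minimal sketch of one way to guard against that, using PHP’s built-in htmlspecialchars(). The render_items() helper and the sample data are hypothetical, standing in for the loop above and for $rss->items.

```php
<?php
// Build link paragraphs for up to $limit items, escaping feed text
// so markup characters in a title or URL cannot break the page.
function render_items($items, $limit = 5) {
    $html = '';
    foreach (array_slice($items, 0, $limit) as $item) {
        $link  = htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8');
        $title = htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8');
        $html .= "<p><a href=\"" . $link . "\">" . $title . "</a></p>\n";
    }
    return $html;
}

// Hypothetical sample data in the same shape as $rss->items.
$items = array(
    array('link' => 'http://example.com/a?x=1&y=2', 'title' => 'Cats & "Dogs"'),
);
echo render_items($items);
```

If a feed already delivers HTML-encoded titles, this would encode them a second time, so check what the feed actually contains before adopting it.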


Caching In

If you’re going to dump an RSS feed onto your site, caching that feed is important. Very, very important.

Caching will speed up your load times and save you from serving up error messages should the remote RSS server have trouble. On another level, however, it’s about Being Responsible. Running MagpieRSS without a functioning cache folder will cause your site to query the feed source every single time someone loads one of your pages containing the script. Depending on your traffic, this might mean you’re hitting the remote site hundreds or thousands of times per day. If the remote site’s administrator isn’t quite on the ball, they’ll be sad and frustrated by their site’s mysterious slowness. If they know what they’re doing, they’re going to ban your access faster than you can say 403.

According to the MagpieRSS FAQ, “By default, Magpie will attempt to create a directory named ‘cache’ in the working directory of the PHP script which invoked it.” If you’re running your own server, this will most likely work for you. Load any page using the fetch_rss() script, then look for a newly created “cache” directory and then check to see that it contains a file with a long weird name (e.g., “25cd55bbc2766c84b57a3302daa8ba2e”). If you find the new directory and the strange file, you’re golden.
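
Rather than hunting through the file system by hand, you could have PHP report on the cache directory for you. The sketch below uses only built-in functions (is_dir(), is_writable() and scandir()). The cache_status() helper name is made up for this example, and './cache' is the default location quoted above.

```php
<?php
// Report whether a cache directory exists, is writable by PHP,
// and how many entries (cache files) it currently holds.
function cache_status($dir) {
    if (!is_dir($dir)) {
        return "missing";
    }
    if (!is_writable($dir)) {
        return "not writable";
    }
    // scandir() includes "." and "..", so remove them before counting.
    $files = array_diff(scandir($dir), array('.', '..'));
    return "ok, " . count($files) . " file(s)";
}

echo cache_status('./cache'), "\n";
```

Drop it into a scratch page next to news.php, load news.php once, then load the scratch page. Anything other than “ok” with at least one file means the cache is not doing its job yet.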

If you’re paying a hosting provider to host your site, there’s a high probability that your server permissions are (wisely) too secure to allow a PHP script to automatically create a cache directory. If this is the case, you’ll need to troubleshoot by hand.

One method, the success of which will vary from reader to reader, is to create a directory called “cache” yourself and change its ownership to the user and group your web server runs as.

Here’s another hacky little solution to the problem. To give MagpieRSS the permissions it needs to automatically create the cache folder, you’ll need to temporarily change the permissions of the parent directory containing your PHP file to be world-writable. Please note that this method is terribly insecure and, generally speaking, a Very Bad Idea, but it only takes about three seconds, so you can probably risk it.

Let’s assume that your news.php file is within a directory called “magpie”. From the command line, you’d type chmod 0777 magpie/. Next, load news.php by calling it up in a browser. Magpie will have the necessary permissions to create the cache directory, so it should do so as soon as you call news.php. Double check that the cache directory was created, then peek inside it for one of those awkwardly-named cache files. Now, immediately change the magpie folder permissions back to “0755”! Post haste!

If the file you’re using to run the magpie scripts resides in your site’s top level directory, this little hack may not work. However, you can just go through the motions in a subdirectory, then edit your files to explicitly define wherever it is you’ve finally managed to create the cache directory. You can either edit the default location, defined on line 20 of rss_cache.inc (var $BASE_CACHE = './cache';) or you can include this line at the top of any localized script, before MagpieRSS is used:

define('MAGPIE_CACHE_DIR', './some_directory/cache');

Now that you’ve got your cache folder created, it’s time to customize the age at which your feed expires. If your source feed has infrequent updates – most blogs tend to post a mere handful of entries per day at best – you should probably edit line 21 of the rss_cache.inc file.

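Find var $MAX_AGE = 3600; and change the default 3600 seconds (one Earth hour) to a larger number. If you can’t be bothered to do the math yourself, feel free to take advantage of the amazing mathematical power of computers and try something like var $MAX_AGE = (3600*12);. Be aware that PHP versions older than 5.6 reject an expression in a property declaration, so if that line triggers a parse error, write the product out as 43200 instead. Either way, the cache now refreshes at a reasonable rate of twice a day.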
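
If you would rather not edit Magpie’s own files, the MAGPIE_CACHE_DIR constant shown earlier appears to have a sibling. As far as I recall from the MagpieRSS documentation, a MAGPIE_CACHE_AGE constant (in seconds) sets the expiry the same way. Treat that name as an assumption and confirm it against your copy of rss_fetch.inc. A minimal sketch, with the directory path purely illustrative:

```php
<?php
// Define these before calling fetch_rss() so Magpie picks them up.
// MAGPIE_CACHE_AGE is assumed from the MagpieRSS docs; verify that
// it appears in your copy of rss_fetch.inc before relying on it.
if (!defined('MAGPIE_CACHE_DIR')) {
    define('MAGPIE_CACHE_DIR', './some_directory/cache');
}
if (!defined('MAGPIE_CACHE_AGE')) {
    define('MAGPIE_CACHE_AGE', 60 * 60 * 12); // 12 hours = 43200 seconds
}

echo MAGPIE_CACHE_AGE, "\n";
```

The advantage of this route is that your settings live in your own script, so they survive an upgrade of the Magpie files.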


Convert to PHP Without The Pain

If you’re not yet a bona fide PHP junky, tutorials like this will probably make you mad because they so blithely assume that you’re willing to rename all your *.html files with *.php extensions when, of course, you aren’t.

Assuming that many of you feel the same way about renaming all your pages and navigation links and serving your 404 page to external links, here are two suggestions for how you might integrate the MagpieRSS scripts into your existing site with a minimal amount of havoc.

If you only need the news feed to appear on a single page, and your site is hosted on an Apache server, your extension troubles are solved with a simple 301 Redirect. A 301 redirect is a permanent redirect that grabs users looking for some specific file and redirects them toward whatever new file you want them to see instead. The real beauty of the 301 Redirect is that, unlike a meta refresh tag, it speaks to search spiders in the magical language of HTTP. A 301 Redirect will let spiders know that they should permanently replace their existing cache with the new URL.

You’ll need to edit the .htaccess file in your site’s top level directory to include a line like this:

Redirect 301 /index.html http://www.your_domain.com/index.php

If you don’t have an .htaccess file in your site’s top level directory, feel free to create one and drop that line in there.

The second suggestion is infinitely sneakier, and, depending how good your JavaScript skills are, a little more complex. The trick here is that PHP can basically masquerade as any type of content you want as long as you include the proper content header. So, one workaround for including PHP output in documents without having to alter their *.html extensions is to send the code via JavaScript.

There isn’t really enough space in this tutorial to give you a full-blown example using proper DOM syntax, but we do want to leave you with some proof that this JavaScript-as-PHP approach does in fact work. Point being, please don’t send grumpy letters about how we copped out with the document.write example, OK?

Choose an HTML file one level above the magpie directory. Find the spot where you want to output your news feed and put in something like this:


<script type="text/javascript" src="./magpie/jsnews.php"></script>



Create a new file called “jsnews.php” and put it in the magpie directory. The basic content of jsnews.php could look like this:


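<?php
// Tell the browser this response is JavaScript, not HTML.
header("Content-type: text/javascript");
include('./rss_fetch.inc');
error_reporting(0);

$url = "http://feeds.wired.com/wired/index";
$rss = fetch_rss($url);

// Start with an empty string so the echo below is safe even if the fetch fails.
$news_string = "";
if ($rss) {
	$items = array_slice($rss->items, 0, 5);
	foreach ($items as $item) {
		// Single quotes around the attribute keep the JavaScript string intact.
		$news_string = $news_string. "<p><a href='"
		.$item['link']. "'>" .$item['title'].
		"</a></p>";
	}
}

// Backslashes escape the double quotes that delimit the JavaScript string.
echo "document.write(\"$news_string\")";
@ini_restore('error_reporting');
?>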


What this does is define your $news_string variable using the MagpieRSS script as in prior examples. However, instead of outputting a clean echo to write the HTML on the fly, you’re sending the whole output to a string variable. That string variable gets passed to an echo in order to return a one line bit of document.write() JavaScript. Instead of using echo to output successive lines within a foreach array-parsing loop, you’ll need to concatenate the string output to a single variable. And mind those quotes! You won’t be able to pass any double quotes to the JavaScript echo, so all your HTML attributes will need to be enclosed within single quotes.
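
If juggling quote styles feels fragile, PHP’s built-in json_encode() offers another route: it turns a string into a valid JavaScript string literal, escaping double quotes, backslashes and newlines for you. The following is a sketch of that alternative, not what the listing above does, and js_document_write() is a hypothetical helper name.

```php
<?php
// Wrap an HTML fragment in a document.write() call. json_encode()
// emits a quoted, fully escaped JavaScript string literal.
function js_document_write($html) {
    $literal = json_encode($html);
    if ($literal === false) {
        // json_encode() fails on invalid UTF-8; write nothing instead.
        $literal = '""';
    }
    return "document.write(" . $literal . ");";
}

// Double-quoted attributes are now safe to use.
echo js_document_write('<p><a href="http://example.com/">Hi</a></p>'), "\n";
```

With this in place, a headline that happens to contain a double quote no longer breaks the generated script.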

You can do so many great things with PHP that balking at the file extension issue hardly seems worth the bother. While the JavaScript hack does work, it’s kind of, well, silly. For starters, because PHP runs on the server, its output is accessible regardless of platform. JavaScript, as you know, is not universally accessible, and users without JavaScript won’t see a thing.

Another reason PHP is better without JavaScript is that it’s the only way to get the search engine optimization boost fresh, ever-changing content gives your site. Although visitors will see fresh content regardless of whether you use PHP or JavaScript, most search engines won’t bother to fully parse your JavaScript, so all they really see is the unchanging, static script tag.