Building an Asynchronous Portal using Lightweight HTML Injection

ZOIS Technical Note TN-2011-07-01.

Author and Audience

The aggregating nature of some web pages makes for 'Portals', which are often unsatisfactory beasts in that they can appear complex and are slow to load. This Technical Note presents one solution to this problem using some asynchronous Javascript techniques. Importantly, it also presents a fall-back solution for when Javascript is not available. It is anticipated that the audience will be familiar with how the web works, Javascript, PHP and programming in general. Written by Martin Sullivan[au], ZOIS Limited, Cockermouth.

Abstract

The underlying mechanisms behind the RSS Asynchronous Portal Example are discussed. It uses Asynchronous Javascript technology, but has a fallback mechanism using a 'refresh hack' for non-Javascript use. It was inevitable, given feedback from the rest of the site, that the example would feature Cockermouth and the Jobcentre Database Mirror.

Introduction

The author has at various times been involved in 'Portal' projects. They have used various technologies to provide a unified central page which contained links to diverse content, usually held at some geographically remote web-site. The content for such pages then comes from a number of locations and servers both inside and outside the location. The aesthetics of these Portal pages tend to be rather 'busy', with a predominantly boxy look-and-feel. They load slowly, as data has to be retrieved from a number of databases over sometimes slow links. Traditionally, the assembled page cannot be rendered until all of the components have been acquired.

While these kinds of systems have been studied intensively, resulting in a number of standardised mechanisms[jp], the author was intrigued by the somewhat lighter JSON-like approaches found around the web. Some systems have been built as experiments and demonstrators. These can currently be found on the Home site-server[cr].

Materials and Platform

The original, more modest, plan was to demonstrate how a Really Simple Syndication (RSS)[rs] feed could be integrated into a localised home page, with the aim of increasing the use of the RSS feeds provided by the Jobcentre Plus Mirror[nj] system. RSS feeds seem to be increasingly popular and they provide a good mechanism to allow third-parties to produce localised pages of which the latest postings at the local Jobcentre could be a part. The system takes XML that follows the RSS schema; the necessary manipulations are performed in PHP[ph] and the pages are displayed with CSS[cs] stylings borrowed from the existing ZOIS site. The critical asynchronous component is provided by the Javascript[js] XMLHttpRequest[hr] object.

Although laudably modest at outset, another aim imposed itself. The technology used is Javascript, which, as downloaded executable code, presents a not inconsiderable security risk. Such are the risks that many users tend to switch it off, or use a selective blocking tool such as NoScript. While the original designers of Javascript no doubt intended it to be used as a useful adjunct, it sadly now seems to be core to many web-sites. Such web-sites offer a poor experience when Javascript is turned off. They will often fail completely or offer only a rather rude suggestion that one should switch on Javascript if one is to enter their on-line shop or read their important diatribe. It was decided that the demonstrator should not display such behaviour, but should instead degrade to a non-Javascript mechanism that emulates, to some useful degree, the asynchronous behaviour produced by using Javascript.

Method

Much of this methodology is a kind of AJAX-light. 'Proper' AJAX[ax] manipulates the Document Object Model (DOM), for which there is a Javascript Application Programming Interface. The DOM is a kind of internal database of the document that is rendered through the HTML layout engine, but its manipulation is, in the author's opinion, complicated. The techniques described in this note use a simpler mechanism: direct injection of HTML into an existing page. These techniques form the core of the "Asynchronous HTML and HTTP" mechanism, known as AHAH[ah]. While AHAH is not without its critics, mainly on the grounds of code-presentation separation and performance, it is thought to be the best technique to use in this case.

The Injection Function

The functionality of the site is achieved by 'injecting' HTML asynchronously into an existing HTML element, having obtained it using Javascript's XMLHttpRequest. Since the initial data is encoded as an RSS feed, in XML, it is necessary to obtain it and convert it to an 'inject-able' bit of HTML.

function inject ($url, $items, $thats_it = YES) {

    $opts = array (
        'cookiesession' => YES,
        'redirect' => 3,
        'timeout' => 180,
        'useragent' => "ZOIS RSS " .
            "http://" .  $_SERVER['HTTP_HOST'] .  "/" .
            $_SERVER['PHP_SELF'] .  " PECL::HTTP (PHP)");

    // http_get and http_parse_message come from the PECL HTTP extension
    $c = http_parse_message (http_get ($url, $opts))->body;
                   // cache this stuff in a future release

    $t = get_thing ('title', $c);
    echo "<div id=\"rssinject\"><h3>$t</h3>\n";

    $t = get_thing ('description', $c);

    echo "<p>$t\n";

    // $matches is filled in by reference; no call-time '&' is needed
    $count = preg_match_all ('/<item>(.*?)<\/item>/si', $c,
        $matches);

    $number = $items < $count ? $items : $count;

    echo "<p>\n";

    for ($ix = 0; $ix < $number; $ix++)
        display_entry ($matches[1][$ix]);

    echo "</div><!-- rssinject -->\n";

    if ($thats_it)  // it's recursive HTML injection, so exit
        exit;       // promptly, so they don't get all the other
                    // guff.
} // inject

As will be elucidated, this function is designed to be used in two slightly different ways. In the first, the page requests itself again (the 'recursive' case) and the function emits only the HTML that Javascript will inject dynamically; in the second, it populates a rather more conventional static page. In the first instance $thats_it is true, so the script exits promptly; otherwise the function returns to the caller for further processing.

This function uses get_thing and display_entry. These functions extract text from the XML and display it as HTML respectively. Here's the code for get_thing:

function get_thing ($thing, $entry) {
	if (!preg_match ('/<' . $thing . '>(.*?)<\/' . $thing . '>/si',
		    $entry,
		    $matches))	// filled in by reference
		return ('');	// element absent, nothing to extract

	$r = preg_replace ('/<!\[CDATA\[(.*?)\]\]>/s', '$1', $matches[1]);
				// remove CDATA protection
	return (preg_replace ('/<script[^>]*>.*?<\/script>/si', '', $r));
				// But defang any XSS
} // get_thing

This is fairly self-explanatory, but note that we take care to defang anything that may appear dubious. The sites from which RSS is obtained for the demonstration are largely trusted, but it is wise to be cautious.
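
The stripping shown above removes only script elements. As a hedged illustration of a more conservative option, and not the method used on the demonstration site, the following Javascript sketch escapes the HTML special characters in text lifted from a feed before it is injected. The function name escapeHtml is hypothetical, and escaping everything would also flatten any legitimate markup that a feed carries in its description.

```javascript
// Hypothetical helper, not part of the original code: escape the five
// characters that have special meaning in HTML text and attributes.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")   // first, so later entities are not re-escaped
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// An event-handler attribute survives script-tag stripping, but is
// rendered inert once the markup is escaped.
var dubious = '<img src="x" onerror="alert(1)">';
console.log(escapeHtml(dubious));
```

Whether to strip or to escape is a design choice: stripping keeps the feed's own formatting, while escaping trades that away for a simpler guarantee.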

This is the code for display_entry:

function display_entry ($entry = null) {
	
	if ($entry == null) 
		return;

	$title = get_thing ('title', $entry);
	$date = get_thing ('pubdate', $entry);
	$description = get_thing ('description', $entry);
	$link = get_thing ('link', $entry);

	echo "<h4><a href=\"$link\">$title</a></h4>\n";
	echo "<p>$description\n";
	echo "<em>$date</em>";

} // display_entry

This extracts various chunks of text from the RSS stream and re-writes them as HTML. These are further styled by Cascading Style Sheets (CSS).

Getting the HTML and Injecting It

The asynchronous methodology uses Javascript. The guts of this is the function multi_ahah. It is examined in detail in this section.

var req = {};	// outstanding requests, keyed by target element id

function multi_ahah(url, target_id, announce) {
	if (document.getElementById(target_id).innerHTML != 
		"Loading ... " + announce) {
		return;
	} // if

The HTML fragment has an initial 'Loading ...' text. This serves two purposes: notifying the user that something is about to happen, and acting as a placeholder for the injected HTML. Should the initial HTML fragment not be present, it is assumed that the desired HTML has already been injected.

	if (window.XMLHttpRequest) {
		req[target_id] = new XMLHttpRequest();
		req[target_id].onreadystatechange = function() {
			multi_ahahDone(target_id);
		};
		req[target_id].open("GET", url, true);
		req[target_id].send(null);
	} // if
} // multi_ahah

A request is generated and a 'call-back' registered, in this case, multi_ahahDone.

function multi_ahahDone(target_id) {
	if (req[target_id].readyState == 4) {	// only if req is "loaded"
		if (req[target_id].status == 200 || 
			req[target_id].status == 304) { // only if "OK"
			var results = req[target_id].responseText;
			document.getElementById(target_id).innerHTML = 
				results;
		} else {
			document.getElementById(target_id).innerHTML = 
			    "<h4>Ahah error:\n" + 
			    req[target_id].statusText + "</h4>";
		} // else
	} // if
} // multi_ahahDone

Readers of other Notes on this site may find this code familiar[qd], and indeed they should. It is a direct evolution of the Javascript that powers the Bubble of Further Information effect on part of the Office Detail pages of the Jobcentre Database Mirror[nj]. That code is documented elsewhere, but the observant will also note that this version has been adapted to track multiple requests. Since the HTML injection is straightforward, directly into the page, no fancy animation is required. The code that invokes this runs as soon as the page is displayed, with the 'Loading ...' text.
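
The handling of several simultaneous requests can be illustrated without a browser. The following is a minimal sketch and not the site's code: it holds each request in a closure rather than in the req registry, and it takes the target element and a request factory as parameters so that it can be exercised with stand-in objects. The names ahahInto, createRequest and fakeRequest are assumptions made for the illustration.

```javascript
// Sketch only: ahahInto and createRequest are hypothetical names.
// element       - object with an innerHTML property (a DOM element in a browser)
// createRequest - function returning an XMLHttpRequest-like object
function ahahInto(url, element, announce, createRequest) {
    if (element.innerHTML !== "Loading ... " + announce) {
        return null;                    // placeholder gone: already injected
    }
    var request = createRequest();      // each call gets its own request
    request.onreadystatechange = function () {
        if (request.readyState !== 4) {
            return;                     // not yet "loaded"
        }
        if (request.status === 200 || request.status === 304) {
            element.innerHTML = request.responseText;
        } else {
            element.innerHTML = "<h4>Ahah error:\n" +
                request.statusText + "</h4>";
        }
    };
    request.open("GET", url, true);     // true: asynchronous
    request.send(null);
    return request;
}

// Stand-in request object so that the sketch runs outside a browser.
function fakeRequest() {
    return {
        readyState: 0, status: 0, statusText: "", responseText: "",
        open: function () {}, send: function () {},
        finish: function (status, body) {   // simulate the server answering
            this.readyState = 4;
            this.status = status;
            this.statusText = status === 200 ? "OK" : "Not Found";
            this.responseText = body;
            this.onreadystatechange();
        }
    };
}

var first = { innerHTML: "Loading ... Feed A" };
var second = { innerHTML: "Loading ... Feed B" };
var reqA = ahahInto("/a", first, "Feed A", fakeRequest);
var reqB = ahahInto("/b", second, "Feed B", fakeRequest);

reqB.finish(200, "<h3>B</h3>");         // responses may arrive out of order
reqA.finish(404, "");
console.log(first.innerHTML);
console.log(second.innerHTML);
```

In a browser the factory would simply be function () { return new XMLHttpRequest(); } and the element would come from document.getElementById.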

<script type="text/javascript">
document.write ('<div id="feed-0">Loading ... Jobcentre Plus latest for Cockermouth</div>');
multi_ahah ('/crsse.php?url=http%3A%2F%2Fhome.zois.co.uk%2Fjcprss.php&items=5', 'feed-0', 'Jobcentre Plus latest for Cockermouth');
</script>

Just to complicate matters even further, the above fragment is automatically generated by a PHP script, so the div id is unique and the URL and item count can be retrieved from a fairly central place, in this case a PHP array. The Javascript's URL fragment is self-referential, and when the PHP script receives the appropriate arguments it knows to invoke the inject function, discussed above.
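
The generation step can be sketched as follows. This is an illustration in Javascript rather than the PHP that the site actually uses; the layout of the feeds table and the function name scriptFragment are assumptions, and only the /crsse.php path and the feed details come from the example above.

```javascript
// Hypothetical central table: [feed URL, item count, announcement text].
var feeds = [
    ["http://home.zois.co.uk/jcprss.php", 5,
     "Jobcentre Plus latest for Cockermouth"]
];

// Build the placeholder-plus-call fragment for feed number `index`.
// `self` is the path of the page that also serves the injected HTML.
function scriptFragment(self, index, feed) {
    var id = "feed-" + index;                       // unique per feed
    var query = self + "?url=" + encodeURIComponent(feed[0]) +
        "&items=" + feed[1];
    return '<script type="text/javascript">\n' +
        "document.write ('<div id=\"" + id + "\">Loading ... " +
        feed[2] + "</div>');\n" +
        "multi_ahah ('" + query + "', '" + id + "', '" + feed[2] + "');\n" +
        "</" + "script>";
}

console.log(scriptFragment("/crsse.php", 0, feeds[0]));
```

The sketch assumes that the announcement text contains no apostrophes; a real generator would need to quote it for both the HTML and the Javascript contexts.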

$url = array_key_exists ("url", $_GET) ? $_GET["url"] : NULL;
$items = array_key_exists ("items", $_GET) ? $_GET["items"] : 0;

// Is it a recursive injection for AJAXy stuff?
if (isset ($url) && $items > 0)
	inject ($url, $items);	// produce the 'inner HTML', doesn't
				// return

Simulation in the Absence of Javascript

Much of the rest of the code is concerned with generating an appropriately pretty and informative container for this Javascript based example. The question then arose, what to do in the absence of Javascript, if it is turned off or not available in the browser.

The mechanism chosen was the automated update. In this technique a 'holding page' is displayed while an outstanding request to the back-end server is run; when the requested page is complete, it is displayed in the holding page's stead. The technique works best if the holding page and the completed page are largely similar with respect to static text, pictures and decoration. While the technique can be applied to a graded multiple-update approach, in this instance only one final update was used.

Firstly, the PHP code needs to realise that the browser does not support Javascript, or that it has been switched off.

	echo "<noscript>\n";
	echo "<meta http-equiv=\"Refresh\" content=\"0;";
	echo $_SERVER['PHP_SELF'];
	echo "?noscript=1";
	echo "\">\n";
	echo "</noscript>\n";

This code causes the page to refresh immediately, with the same calling URL, but with "noscript=1" appended to it. The noscript value indicates to the back-end that we've displayed a place-holder and now the real page should be constructed. When this page is ready it will replace the currently displayed one.

The above code needs to be bracketed, to stop the page being constantly refreshed.

if (!$noscript) {
	echo "<noscript>\n";
	echo "<meta http-equiv=\"Refresh\" content=\"0;";
	echo $_SERVER['PHP_SELF'];
	echo "?noscript=1";
	echo "\">\n";
	echo "</noscript>\n";
} // noscript

The $noscript variable is set thus:

if (!isset ($noscript))		// if not already set
	$noscript = array_key_exists ("noscript", $_GET) ? 
		$_GET["noscript"] : NO;
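
The guard logic can be summarised in a short sketch. It is written in Javascript purely for illustration (the site does this in PHP), and the function name refreshMarkup is an assumption.

```javascript
// Sketch: emit the refresh markup only on the first (holding) page.
// refreshMarkup is a hypothetical name; self is the page's own path.
function refreshMarkup(self, noscript) {
    if (noscript) {
        return "";      // second visit: no refresh, or the page would loop
    }
    return "<noscript>\n" +
        '<meta http-equiv="Refresh" content="0;' + self + '?noscript=1">\n' +
        "</noscript>\n";
}

console.log(refreshMarkup("/crsse.php", false));
console.log(refreshMarkup("/crsse.php", true) === "");
```

A browser with Javascript enabled ignores the noscript element altogether, so only non-Javascript browsers ever follow the refresh.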

Elsewhere in the code, the inject function can now be used to acquire the RSS XML, convert it to HTML and display it normally.

	if ($noscript)		// called again with noscript option
		inject ($feed[0], $feed[1], NO);	// inject echoes the HTML itself
	// The original page will have a placeholder at this point ...
	else { 			// something to look at, while we wait
		echo "<noscript>Loading ... $feed[2]</noscript>\n";

Specialised Responses to Unusual Browsers

This all works well, with browsers tested including Opera, Firefox, Internet Explorer, Safari and Chrome; with and without Javascript enabled. The only browser that appears to have difficulties with this refresh technique is Lynx, which treats refresh URLs specially and asks, in this instance, supercilious questions of them. A small fragment of code deals with this:

if (preg_match ('/Lynx/', $_SERVER['HTTP_USER_AGENT']))
	$noscript = YES;	// Lynx is special. Keep the Faith.

The casual reader should note that the author normally disapproves of adjusting web-server behaviour based on User-Agent strings. He is also noted for the use of YES and NO in Boolean situations, as may be observed in various code fragments found in this Note. It is felt that this is easier to read, but requires this:

define ("YES",  true);
define ("NO",  false);

A full working example of code using these techniques is available from the author.

Presentation using CSS

The HTML derived from the RSS XML is presented using the Cascading Style Sheet 'navigation' rule. Normally this is used to provide navigation side-bars, such as may be found with this Technical Note, but when such elements are placed in isolation, after a <br clear=all>, they should appear as a series of narrow columns that arrange themselves side-by-side depending on the page width. Such multi-column formatting seems traditional on news sites, even if it is not achieved in this way. The CSS code looks like this:

/* Navigation is a bunch of ancillary text that should float to the right
   as a separate column if there's space, and stay at the bottom (or
   wherever it's put) if not.   */
#navigation {
    float: left;
    margin-top: 15px;
    margin-right: 2%;
    margin-left: 2%;
    max-width: 245px;
    padding: 10px;
    background: ivory;
    border-style: dotted;
    border-color: grey;
    border-width: 1px
} /* #navigation */

#navigation img {
    float: left;
    padding: 2px
} /* img */

Discussion

The mechanisms behind the Cockermouth RSS[cr] and related Generalised RSS Example pages have been discussed. This approach can equally well be used in more conventional Portal based sites. A typical example would involve the selection of data for a customer from a variety of databases. It could then be displayed asynchronously as it was presented by the remote servers. Such systems are likely to be closed and thus Javascript can be trusted. In such trusted systems there would be no need to provide a non-Javascript alternative.

Updates

As with other Technical Notes, feedback is actively solicited. The author may be contacted via the e-mail address found on his public biography page[au]. Should something require changing or enhancing then the fact will be acknowledged with attribution in this Update section.

References

References found in this section, and in particular the HTML links, were correct at time of writing (2011-07-01).

[au] Martin Sullivan:
http://www.zois.co.uk/people/martin_sullivan
[jp] Java Portlet Specification - JSR 168:
http://developers.sun.com/portalserver/reference/techart/jsr168
[nj] The Unofficial National Jobcentre Plus Mirror:
http://home.zois.co.uk/jcpnational.html
[rs] Really Simple Syndication:
http://www.techxtra.ac.uk/rss_primer
[ph] PHP Hypertext Preprocessor:
http://www.php.net
[cs] Cascading Style Sheets:
http://www.w3.org/Style/CSS
[hr] XMLHttpRequest:
http://www.w3.org/TR/XMLHttpRequest
[js] Javascript:
http://www.ecmascript.org
[qd] TN-2009-11-15 Quick and Dirty Ajax:
http://www.zois.co.uk/tn/tn-2009-11-15.html
[cr] Cockermouth RSS Asynchronous Portal Example:
http://home.zois.co.uk/crsse.php
[ax] Ajax:
http://en.wikipedia.org/wiki/Ajax_(programming)
[ah] AHAH - Asynchronous HTML and HTTP:
http://microformats.org/wiki/rest/ahah

Date: 2011-07-01

