
The Jobcentre Plus Mirror Find Interface

ZOIS Technical Note TN-2010-11-11.

Author and Audience

An attempt has been made to produce a generalised web-interface to the Jobcentre Plus Mirror (JCPM) database[jp]. This TN describes that interface, which attempts to interpret a user's query with respect to skill and location. Written by Martin Sullivan[au], ZOIS Limited, Cockermouth.

Abstract

This TN notes a new interface to the Jobcentre Plus Mirror database that allows a user to search on key-words found in a vacancy description and on location. The interface is found here: http://home.zois.co.uk/jcpfind.php.

Introduction

The Jobcentre Plus Mirror[jp] is an indexed database held at ZOIS. It is the result of scraping the Jobcentre Plus and latterly the Jobs Direct web-sites. This data is presented as an FTP-able file in Comma Separated Form on a nightly basis[ft]. To augment and enhance this for users, a web-based front-end has been developed which attempts to provide a generalised skill and location interface. By its nature this interface should be intuitive to use and thus not require a great deal of description. This TN therefore concerns itself with the underlying technology and code.

Materials and Platform

The interface uses a PHP-backed web-service hosted on Apache at ZOIS's Home server. The database engine is provided by PostgreSQL. The interface is designed to be used by any browser, including text-based ones such as Lynx, and narrow low-bandwidth ones such as are found on Mobile Phones.

The Find interface has been added to the Navigation stylings of this TN.

The 'JCPM Find' system currently presents a single box as an invitation to enter a query. The query consists of a series of key-words separated by logical operators. So for example 'painter OR plumber' would match vacancies which had either of those terms, while 'painter AND plumber' would match only vacancies that had both terms together in the description. There is provision to construct elaborate queries using a 'not' operator and parentheses. Should a query simply contain key-words then the words are silently 'AND-ed' together. In effect 'pastry chef' is 'pastry AND chef'. White space can be quoted using single quotes so that matches for 'pastry chef' and only 'pastry chef' can be made.
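The handling of operators and silent 'AND-ing' described above can be sketched as follows. This is a minimal illustration in Python, not the PHP code behind JCPM Find: the function name normalise_query, the token pattern and the choice of '&', '|' and '!' as output symbols (borrowed from PostgreSQL's text-search query notation) are all assumptions made for the example. Quoted phrases are simply passed through as one operand.

```python
import re

# Operators recognised in a query, mapped to tsquery-style symbols
# (an assumption; the real interface may use another internal form).
OPERATORS = {"and": "&", "or": "|", "not": "!"}

# A token is a parenthesis, a single-quoted phrase, or a bare word.
TOKEN_RE = re.compile(r"\(|\)|'[^']*'|[^\s()']+")


def normalise_query(query):
    """Return the query with explicit operators, inserting '&' wherever
    two operands stand next to each other without an operator."""
    out = []
    prev_is_operand = False  # True after a word, a phrase or ')'
    for token in TOKEN_RE.findall(query):
        lowered = token.lower()
        if lowered in ("and", "or"):
            out.append(OPERATORS[lowered])
            prev_is_operand = False
        elif token == ")":
            out.append(token)
            prev_is_operand = True
        else:
            # 'not', '(', a word or a phrase each begin a new operand,
            # so an implicit AND is needed if an operand came just before.
            if prev_is_operand:
                out.append("&")
            if lowered == "not":
                out.append(OPERATORS[lowered])
                prev_is_operand = False
            elif token == "(":
                out.append(token)
                prev_is_operand = False
            else:
                out.append(lowered)
                prev_is_operand = True
    return " ".join(out)
```

With this sketch, normalise_query("pastry chef") gives "pastry & chef", while normalise_query("'pastry chef'") leaves the quoted phrase as a single operand.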

The query is localised by entering a constraining location after the key-word 'in'. So 'painter or decorator in leeds' would attempt to match vacancies only in and around Leeds. Jobcentre Offices 'owning' vacancies are examined and those with vacancies of any type mentioning the location are assumed to be valid for further search. This makes the location term a little fuzzy with respect to physical geography, which is desired. To illustrate, while 'Chester' may be the desired location, the term 'Chester' may appear in vacancies owned by other Local Offices, for example Wrexham, and thus in these instances Wrexham's vacancies would be searched too.
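The two steps just described, splitting off the location and then choosing which Offices to search, might look something like the following. It is a sketch under stated assumptions: the function names, the decision to split at the first 'in', and the sample data are all invented for illustration, and the real system works against the Mirror database rather than an in-memory list.

```python
import re

# Made-up sample data: (owning office, location text of one vacancy).
SAMPLE_VACANCIES = [
    ("Chester", "Chester city centre"),
    ("Wrexham", "Between Wrexham and Chester"),
    ("Wrexham", "Wrexham"),
    ("Leeds", "Leeds"),
    ("Manchester", "Manchester"),
]


def split_location(query):
    """Split 'skills in place' into (skills, place).

    The split is made at the first stand-alone 'in' (an assumption),
    so a place such as 'barrow in furness' stays in one piece.
    place is None when the query has no location part.
    """
    parts = re.split(r"\s+in\s+", query.strip(), maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 2:
        return parts[0], parts[1]
    return parts[0], None


def offices_mentioning(place, vacancies):
    """Return every office owning at least one vacancy whose location
    text mentions the place as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(place.lower()) + r"\b")
    return {office for office, location in vacancies
            if pattern.search(location.lower())}
```

Here offices_mentioning("chester", SAMPLE_VACANCIES) selects both Chester and Wrexham, because one Wrexham vacancy mentions Chester; all vacancies owned by both Offices would then be searched for the skill terms. Manchester is not selected, since the match is on whole words.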

Should the user enter a query that does not produce any hits then it is assumed that they have misspelt part or all of the query. The various parts of the query are then examined and matches made using the Soundex algorithm, as described by Knuth[sx]. Each reformulated query is then silently re-examined and, should it produce hits, it is offered as a suggestion to the user. They can then pick the suggestion closest to their actual intention, and a table of results is provided in the normal way.
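For readers unfamiliar with Soundex, the idea can be illustrated as below. This is a sketch of the commonly published form of the algorithm together with a hypothetical suggest helper; it is not the code used by JCPM Find, and the soundex function supplied with PostgreSQL may differ in detail (for instance in how it treats 'h' and 'w').

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the four-character Soundex code of word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def suggest(word, vocabulary):
    """Return the known words that sound like word (hypothetical helper)."""
    target = soundex(word)
    return sorted(w for w in vocabulary if soundex(w) == target)
```

For example, 'paintar' and 'painter' both code to P536, so suggest("paintar", ["painter", "plumber", "decorator"]) offers 'painter'. The coding is coarse: words that differ in a late consonant can collide, and words that differ in an early one will not match at all.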

Under the covers, key-words are tokenised and matched using PostgreSQL's Tsearch2 mechanism[t2], which is comprehensively indexed using the GIN[gn] mechanism. Both Tsearch2 and GIN are now part of the standard PostgreSQL distribution.
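To give a flavour of what this means at the database level, the sketch below builds the kind of SQL statements involved. The functions to_tsvector and to_tsquery, the '@@' match operator and 'USING gin' are standard PostgreSQL; everything else is assumed for illustration, namely the table name vacancy, the column name description, the 'english' text-search configuration and the '%s' placeholder style of a Python database driver. The actual Mirror schema is not described in this TN.

```python
# Hypothetical table and column names; the real schema is not shown here.
TABLE = "vacancy"
COLUMN = "description"

# A GIN index over the parsed description text, so that a key-word
# match need not scan every row.
CREATE_INDEX_SQL = (
    f"CREATE INDEX {TABLE}_{COLUMN}_fts ON {TABLE} "
    f"USING gin (to_tsvector('english', {COLUMN}))"
)


def search_statement(tsquery_text):
    """Return (sql, params) for a key-word search.

    The query text is passed as a bound parameter rather than being
    pasted into the SQL, so user input cannot alter the statement.
    """
    sql = (
        f"SELECT * FROM {TABLE} "
        f"WHERE to_tsvector('english', {COLUMN}) @@ to_tsquery('english', %s)"
    )
    return sql, (tsquery_text,)
```

The expression in the WHERE clause is written to match the indexed expression exactly, which is what allows the planner to consider the GIN index for the search.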

The Soundex codes are produced by the soundex function from fuzzystrmatch, a contribution ('contrib') package for PostgreSQL, which stood at version 8.4 at time of writing. Some manual intervention by the PostgreSQL master-user is required to install the new functions, like soundex, on the database in question.

The JCPM Find system uses the words found in a Vacancy Description and Location only, and they are treated separately.

Discussion

The JCPM Find interface is one of a number of such interfaces that aim to either improve on or aggregate services based on the Jobcentre Plus FTP feed. It has the advantage of being 'home-grown' and thus has direct access to the Mirror database itself. It is heavily indexed to allow complex searches to run on quite modest hardware, and hopefully people will find it useful in searching for a job.

There are some limitations to searching. The key-words are searched naively, without regard to meaning. Thus generalised terms may trigger false-positives and unnecessary hits. This is particularly noticeable with regard to Location, when a number of places incorporate a similar word (for example 'Newcastle').

Updates

As with other Technical Notes, feedback is actively solicited. The author may be contacted via the e-mail address found on his public biography page[au]. Should something require changing or enhancing then the fact will be acknowledged with attribution in an Update section.

Added Find Interface
A JCPM Find interface has been added to the Navigation stylings of this Technical Note. Find it either on the left or the bottom of the page, depending upon the page width. 2011-06-15

References

References found in this section, and in particular the HTML links, were correct at time of writing (2010-11-11).

[au]. Martin Sullivan:
http://www.zois.co.uk/people/martin_sullivan
[jp]. The Unofficial National Jobcentre Plus Mirror:
http://home.zois.co.uk/jcpnational.html
[ft]. The Jobcentre Mirror FTP site:
ftp://ftp.zois.co.uk/pub/jcp
[sx]. The Soundex Algorithm, Described in:
Knuth DE (1973) The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, Reading, Massachusetts
[t2]. Tsearch2:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2
[gn]. GIN Index Types:
http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html

~Z~


Date: 2011-06-15

