Jobcentre Plus Mirror Data Definitions
ZOIS Technical Note TN-2010-12-01.
Author and Audience
An Application Programming Interface has been presented
that will allow web-based programs to access the Jobcentre Plus Mirror
database[jp]. This TN concerns itself with describing
the various data fields used by that API. The audience should be familiar
with programming and database techniques and wish to add value to
the data provided. Written by Martin Sullivan[au],
ZOIS Limited, Cockermouth.
Abstract
A series of possibly vague definitions is given for the data items
found within the Jobcentre Plus Mirror Database. This database is an
unofficially collected set of vacancy details found at the UK
government-sponsored Jobseekers Direct web-site.
Introduction
The Jobcentre Plus Mirror[jp] is an indexed database
held at ZOIS. It is the result of scraping the Jobcentre
Plus and latterly the Jobseekers Direct web-sites. This data is presented as
an FTP-able file in Comma Separated Value form on a nightly basis[ft]. This work has now been augmented by a series of
interfaces presented with the intention of being called as a Value Added
Service from other web-sites.
Materials and Platform
The underlying database is PostgreSQL[pg]. The field types therefore reflect the types used by PostgreSQL. These are, in summary:
- text
- Variable length unlimited character strings.
- char, char(n)
- Single character or fixed-length string, blank padded.
- date
- A date type with a resolution of one day. Dates will generally be transferred in ISO-8601 format (1957-01-30, for example). The calendar is the one used in the UK. As the underlying system is PostgreSQL, a number of other input values can be parsed into a meaningful internal date, for example 'today' and 'yesterday' as well as '30 January 1957'. Care must be taken lest an invalid syntax error occur, so ISO-8601 is preferred.
- timestamp
- A date type with a resolution of one second (although the value is held at higher precision internally). ISO-8601 format is once again preferred, so midnight on the above date is 1957-01-30 00:00:00.
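A client receiving these values might parse them as in the following minimal sketch. It assumes the values arrive in the preferred ISO-8601 forms shown above; the function names parse_date and parse_timestamp are illustrative only and are not part of the API.

```python
from datetime import date, datetime


def parse_date(value: str) -> date:
    """Parse an ISO-8601 calendar date such as '1957-01-30'."""
    return date.fromisoformat(value.strip())


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '1957-01-30 00:00:00'.

    Assumes the ISO-8601 style with a space between date and time.
    """
    return datetime.fromisoformat(value.strip())


if __name__ == "__main__":
    print(parse_date("1957-01-30"))
    print(parse_timestamp("1957-01-30 00:00:00"))
```

Inputs such as 'today' or '30 January 1957' are understood by PostgreSQL itself but not by this sketch, which is one more reason to keep to ISO-8601 when exchanging values.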
Method
Much of this data is designed to be human-parsable. It is thus quite vague and contains little standardised, coded information.
- title
- Summary of vacancy text
- reference
- LMS reference, expected to be unique. The reference is provided by the Labour Market System, an underlying database of vacancies held by the Department for Work and Pensions (or subcontractors on its behalf). This database forms the core of the Jobseekers Direct web-site and, in turn, of our scraping efforts. The reference consists of three letters identifying an 'owning' Jobcentre Plus Office, a slash ('/') and a serial number. It is held to be unique and is treated as such by the JCPM. text
- location
- Short description of location, can contain postcode data. text
- hours
- Working hours description, particularly for part-time and split-shifts. text
- wage
- Short description of payment. No easily parsable text. text
- work_pattern
- Largely blank, but contains part-time details. text
- employer
- Sometimes obfuscated employer name. text
- employer_ref
- A largely numeric reference to employer. Limited value. text
- pension
- Appears to be either blank or "Pension available". text
- duration
- Permanent (P) or Temporary (T), where noted. The official policy seems to be that any contract of employment with a term greater than six months is Permanent. char, enumerated (P|T)
- closing_date
- Noted closing date, if any. date
- description
- Detailed description of the vacancy. text
- apply
- Application details. text
- added
- Officially provided posting date; if none is provided, the current date is used. date
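The fields listed above could be carried in a typed record on the client side, as in the sketch below. It assumes each vacancy arrives as a mapping keyed by the field names used in this note (for instance a row read by csv.DictReader); whether the nightly file or the API actually uses these keys is an assumption, and the names Vacancy and vacancy_from_row are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping, Optional

# Free-text fields; blank values are kept as empty strings.
TEXT_FIELDS = (
    "title", "reference", "location", "hours", "wage", "work_pattern",
    "employer", "employer_ref", "pension", "description", "apply",
)


@dataclass
class Vacancy:
    title: str
    reference: str
    location: str
    hours: str
    wage: str
    work_pattern: str
    employer: str
    employer_ref: str
    pension: str
    duration: Optional[str]        # 'P', 'T', or None where not noted
    closing_date: Optional[date]   # None when no closing date is noted
    description: str
    apply: str
    added: date


def _parse_optional_date(value: Optional[str]) -> Optional[date]:
    """Return a date for an ISO-8601 string, or None for a blank value."""
    value = (value or "").strip()
    return date.fromisoformat(value) if value else None


def vacancy_from_row(row: Mapping[str, str]) -> Vacancy:
    """Build a Vacancy from a mapping keyed by field name (assumed keys)."""
    duration = (row.get("duration") or "").strip() or None
    if duration not in (None, "P", "T"):
        raise ValueError(f"unexpected duration: {duration!r}")
    text = {name: row.get(name) or "" for name in TEXT_FIELDS}
    return Vacancy(
        duration=duration,
        closing_date=_parse_optional_date(row.get("closing_date")),
        added=date.fromisoformat(row["added"].strip()),
        **text,
    )
```

Only duration, closing_date and added are interpreted here; the remaining fields are free text and are passed through unchanged.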
In addition, there are a number of fields derived from those above.
- office_code
- Three letter office code, derived from reference. char(3).
- summary
- A shortened variant of the description, with standardised boilerplate text (describing Local Enterprise Partnerships and so forth) removed. The result is then truncated to 200 characters and is designed for use in extended search results and the like. text
- noted
- The time the vacancy was noted by the JCPM system. timestamp
- pattern
- A PostgreSQL full-text search[t2] pattern comprising
a series of key words, optionally interspersed with logical
controls. For example:
plumber
plumber AND 'central heating'
plumber OR carpenter
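Two of the derived fields can be approximated on the client side, as sketched below. The sketch assumes a reference of the form described earlier (three letters, a slash, then a serial number); the exact character set of the serial number is not specified, so any run of non-whitespace characters is accepted. The boilerplate removal applied by the JCPM is not reproduced, only the 200-character truncation. The function names and the sample reference 'ABC/12345' are hypothetical.

```python
import re
from typing import Optional

# Three letters, a slash, then a serial number (character set assumed).
REFERENCE_RE = re.compile(r"^([A-Za-z]{3})/(\S+)$")


def office_code(reference: str) -> Optional[str]:
    """Return the three-letter office code, or None if the shape is unexpected."""
    match = REFERENCE_RE.match(reference.strip())
    return match.group(1) if match else None


def truncate_summary(description: str, limit: int = 200) -> str:
    """Truncate a description to at most `limit` characters.

    The boilerplate removal performed by the JCPM is NOT reproduced here.
    """
    return description[:limit]


if __name__ == "__main__":
    print(office_code("ABC/12345"))
    print(len(truncate_summary("x" * 500)))
```

Returning None for an unexpected reference, rather than raising, lets a caller skip or log odd records without halting a nightly import.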
Discussion
This data description is designed to be used in conjunction with the Jobcentre Plus Mirror Application Programming Interface[ap]. This system is documented both in its initial 'index' page and elsewhere in other Technical Notes.
As with other Technical Notes, feedback is actively solicited. The
author may be contacted via the e-mail address found on his public
biography page[au]. Should something require changing
or enhancing then the fact will be acknowledged with attribution in an
Update section.
References
References found in this section, and in particular the HTML links, were correct at the time of writing (2010-11-30).
- [au]. Martin Sullivan:
- http://www.zois.co.uk/people/martin_sullivan
- [jp]. Jobcentre Plus Mirror Database:
- http://home.zois.co.uk/jcpnational.html
- [ft]. Jobcentre Plus Mirror FTP site:
- ftp://ftp.zois.co.uk/pub/jcp
- [pg]. PostgreSQL
- http://www.postgresql.org
- [t2]. Tsearch2:
- http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2
- [ap]. The JSON Application Programming Interface to the Jobcentre Plus Mirror:
- http://home.zois.co.uk:591
~Z~