A technical article by Tim Miles-Board (tmb at ecs.soton.ac.uk), EPrints Services developer
Carry out a search in EPrints and it is simple to export the results into a number of different formats.
What you might not know is that each export has a special URL which contains all the information needed to carry out the search again. Whenever you access this URL, EPrints re-runs the search and exports the new list of results, so you effectively have a live feed of search results in your chosen format. This can be a useful technique for feeding data to (for example) a Web portal: just copy and paste the URL from EPrints into your CMS.
A customer took this a step further by creating a saved search for each data feed that would be used in their CMS – this gave them a useful way to label and manage each feed.
In 3.2, it is possible to export everything matched by a saved search – this gives you a URL like this:
http://myrepository.com/cgi/saved_search/export_repo_XMLforCMS.xml?savedsearchid=123&_output=XMLforCMS&_action_export=1
This works in exactly the same way as a search export – EPrints loads the appropriate saved search, re-runs it and then exports the results in the chosen format (in this case some specially crafted XML). So for our customer it was then a simple case of copying and pasting these saved search export URLs into their CMS.
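Because each feed differs only in which saved search it points at, the export URLs can be derived from one another. The following is a minimal sketch, assuming the example URL above is used as a template and only its savedsearchid parameter changes; the helper name feed_url, the feed labels and the ID 456 are hypothetical and not part of EPrints.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example export URL copied from the article; used here as a template.
TEMPLATE_URL = (
    "http://myrepository.com/cgi/saved_search/export_repo_XMLforCMS.xml"
    "?savedsearchid=123&_output=XMLforCMS&_action_export=1"
)


def feed_url(template: str, saved_search_id: int) -> str:
    """Return a copy of `template` pointing at a different saved search.

    Hypothetical helper: only the `savedsearchid` query parameter is
    replaced; the path and the other parameters are kept untouched.
    """
    parts = urlsplit(template)
    query = [
        (key, str(saved_search_id) if key == "savedsearchid" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical feed labels and saved search IDs, for illustration only.
    for label, search_id in {"news": 123, "theses": 456}.items():
        print(label, feed_url(TEMPLATE_URL, search_id))
```

Keeping the rest of the URL untouched avoids guessing how EPrints builds the export filename in the path.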
However, when the customer upgraded to version 3.3, they found that these URLs were no longer supported: instead of the expected XML, they were redirected to the normal (HTML) search results page you would see in a browser.
We considered two approaches to resolving this:
1) Migrate to CRUD API
EPrints 3.3 introduced a CRUD (Create, Read, Update, Delete) API which allows data and files to be manipulated directly without using the normal Web UI. For example, the following will grab metadata about a saved search (note: not the items it matches!) in Atom XML:
$ curl -X GET -i -H 'Accept: application/atom+xml' http://myrepository.com/id/saved_search/123
There is currently no way to export the list of items matched by a saved search through the CRUD API, so we proposed extending CRUD like this:
$ curl -X GET -i -H 'Accept: application/atom+xml' http://myrepository.com/id/saved_search/123/contents
The “contents” modifier is already used to access files attached to a given item, or items owned by a given user, so it seems to make sense to extend it to mean “list of items matched” in the context of a saved search (find out more on GitHub). This would make the URLs used in the CMS much simpler. However, the existing URLs would all need to be changed, and the CMS would need to be able to add a custom HTTP Accept header to each request to specify the desired format.
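To illustrate what the CMS side would have to do under this approach, here is a minimal sketch of a client that sets the Accept header itself. It assumes the placeholder host from the examples above. The "/contents" path for saved searches is the proposed extension, not a feature of a released EPrints version, and the function names are hypothetical.

```python
from urllib.request import Request, urlopen

BASE_URL = "http://myrepository.com"  # placeholder host from the article


def saved_search_request(search_id: int, contents: bool = False,
                         accept: str = "application/atom+xml") -> Request:
    """Build a GET request for a saved search via the CRUD-style URL.

    With contents=True the request targets the *proposed* "/contents"
    modifier (list of matched items), which is an assumption here.
    """
    url = f"{BASE_URL}/id/saved_search/{search_id}"
    if contents:
        url += "/contents"  # proposed extension, not in a release version
    # The Accept header selects the desired output format.
    return Request(url, headers={"Accept": accept}, method="GET")


def fetch(request: Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    req = saved_search_request(123, contents=True)
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

A CMS that can only store a plain URL per feed has no equivalent of the headers argument, which is the main practical obstacle to this approach.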
2) Support legacy URLs
We coded a simple extension to /cgi/saved_search which adds support for legacy export URLs in 3.3:
# intercept legacy export URL
if( defined $repo->param( "_action_export" ) && $repo->param( "_action_export" ) eq "1" && defined $repo->param( "_output" ) )
{
	my $format = $repo->param( "_output" );
	my $plugin = $repo->plugin( "Export::$format" );
	# security - check plugin is publicly visible and saved search is allowed to be viewed
	if( defined $plugin && $plugin->param( "visible" ) eq "all" && $saved_search->permit( "saved_search/view" ) )
	{
		$repo->send_http_header( "content_type"=>$plugin->param( "mimetype" ) );
		# export items matching saved search
		print $saved_search->make_searchexp->perform_search->export( $format );
		exit;
	}
}
This would require no changes to the existing CMS setup.
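One way to confirm that a legacy URL is being served as an export again, rather than redirected to the HTML results page, is to look at the Content-Type of the response. The sketch below assumes the export plugin declares an XML MIME type; the actual value depends on the plugin's "mimetype" setting, and both function names are hypothetical.

```python
def media_type(content_type_header: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_export_response(content_type_header: str,
                       expected=("application/xml", "text/xml")) -> bool:
    """True if the response looks like the export, not the HTML page.

    `expected` is an assumption: adjust it to whatever MIME type the
    export plugin in use actually declares.
    """
    return media_type(content_type_header) in expected


if __name__ == "__main__":
    print(is_export_response("text/xml; charset=utf-8"))
    print(is_export_response("text/html; charset=utf-8"))
```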
Conclusion
We decided that the best approach for our customer was to add support for legacy URLs in the short term, and plan to migrate to the CRUD API in the longer term, once support for exporting items matching a saved search is available in a release version of EPrints.