Jena Geography Dataset
A dataset of about 200 geography services that have been gathered from web sites like seekda.com, xmethods.com, webservicelist.com, programmableweb.com, and geonames.org. The dataset was built for evaluating (semantic) web service retrieval technology.
The Jena Geography Dataset (JGD) is a collection of about 200 geography services that have been gathered from web sites like seekda.com, xmethods.com, webservicelist.com, programmableweb.com, and geonames.org. It is available via the OPOSSum Portal. For convenience, the smaller subcollections Jena Geography Dataset 150, Jena Geography Dataset 100 and Jena Geography Dataset 50 containing only 150, 100 and 50 services respectively are also available.
This collection has been explicitly developed within OPOSSum. We gathered over 200 real service operations from public sources like seekda.com, xmethods.com, webservicelist.com, programmableweb.com, and geonames.org. All services belong to the domain of geography and geocoding. They all come with links to their implementations and/or provider websites. A significant portion of the services are commercial, although the majority of them are free. The services have been manually added to OPOSSum. The services are described using natural language. Furthermore, the input and output parameter types have been manually linked to WordNet sense keys. The natural language (NL) documentation of the services was retrieved from the WSDL documentation elements (if available) and the websites of the service providers. In some cases, the services have been invoked and the (sometimes sparse) documentation from the provider websites has been extended based on the insights gained. Note that the NL documentation of the services is divided into documentation at the service level and documentation of the input and output parameters (the latter is visible only on the service details page). Information about required passwords, logins, or license keys has been copied in some cases, but not in all.
Semantic descriptions for the JGD50 subset of the dataset have been created by different groups within the context of the JGD evaluation (3rd track at the 2009 S3 Contest on Semantic Service Selection).
All services are tagged with tags linked to WordNet sense keys. The tags currently being used are: address lookup, addresses, airport, altitude, articles, bearing, cash machine, city, congressional district number, converter, country, currency, demography, destination point, directions, distance, elevation, geocoding, geographic area, geographic information, ip address, iso code, map, mid point, postal code, public buildings, reverse geocoding, saltwater, search, sunrise, sunset, time zone, weather.
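As a rough illustration of what linking a tag or parameter type to a WordNet sense key involves, the following Python sketch splits a sense key into its fields, following the standard WordNet layout `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The names `SenseKey` and `parse_sense_key` are hypothetical helpers, and the key used in the usage note is illustrative only; it is not taken from the dataset.

```python
from typing import NamedTuple

# Synset type codes as defined by the WordNet sense key format.
SS_TYPES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


class SenseKey(NamedTuple):
    """Hypothetical container for the fields of a WordNet sense key."""
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str  # only filled for adjective satellites
    head_id: str    # only filled for adjective satellites


def parse_sense_key(key: str) -> SenseKey:
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, sep, lex_sense = key.partition("%")
    parts = lex_sense.split(":")
    if not lemma or not sep or len(parts) != 5:
        raise ValueError(f"not a well-formed sense key: {key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return SenseKey(
        lemma=lemma,
        pos=SS_TYPES[int(ss_type)],
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,
        head_id=head_id,
    )
```

For example, `parse_sense_key("city%1:15:00::")` yields a record with lemma `city` and part of speech `noun`. Multi-word tags such as "time zone" would appear with an underscore in the lemma field.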
119 of the services are WSDL-based; the others are REST-based. For the WSDL-based services, the WSDLs are attached to the service entries. Note that if a WSDL contained several operations, we added those operations that represent a cohesive functionality as a single service to OPOSSum. Thus, a WSDL attached to a service may describe several more operations in addition to the operation that represents the service it is attached to. Also, different OPOSSum services that represent different functionality but result from one WSDL will all have the same (original) WSDL attached.
For better usability, we created derived versions of the original WSDLs by removing the additional operations, bindings, messages and types (e.g., for this service). Furthermore, we created fictitious WSDL descriptions for the REST-based services that did not originally have WSDL descriptions (e.g., for this service). Both the stripped-down derived WSDLs and those created from scratch for the REST-based services are marked by corresponding comments.
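The stripping step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used to produce the derived WSDLs: it assumes WSDL 1.1, keeps a single named operation, and prunes only operations and messages (unused types and whole bindings are left in place). The function name `strip_wsdl` and the text of the marker comment are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def _tag(name: str) -> str:
    """Return the namespace-qualified ElementTree tag for a WSDL element."""
    return f"{{{WSDL_NS}}}{name}"


def strip_wsdl(wsdl_text: str, keep_operation: str) -> ET.Element:
    """Return the parsed WSDL reduced to one operation and its messages."""
    root = ET.fromstring(wsdl_text)
    kept_messages = set()

    # Drop all other operations from every portType and remember which
    # messages the kept operation refers to.
    for port_type in root.findall(_tag("portType")):
        for op in list(port_type.findall(_tag("operation"))):
            if op.get("name") != keep_operation:
                port_type.remove(op)
                continue
            for child in op:  # input, output, fault, documentation
                message = child.get("message")
                if message:
                    # Strip the namespace prefix of the QName ("tns:Foo").
                    kept_messages.add(message.split(":")[-1])

    # Drop the matching operation entries from every binding.
    for binding in root.findall(_tag("binding")):
        for op in list(binding.findall(_tag("operation"))):
            if op.get("name") != keep_operation:
                binding.remove(op)

    # Drop messages that the kept operation no longer references.
    for message in list(root.findall(_tag("message"))):
        if message.get("name") not in kept_messages:
            root.remove(message)

    # Mark the result as derived (hypothetical comment text).
    root.insert(0, ET.Comment(" derived WSDL: reduced to one operation "))
    return root
```

Note that `xml.etree.ElementTree` does not keep namespace prefixes that occur only inside attribute values (such as `tns:` in `message="tns:..."`) when serializing, so writing the result back to a valid WSDL file would need extra handling of those declarations.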
The relevant WSDL descriptions for the JGD services can be directly accessed via OPOSSum by searching for WSDL descriptions belonging to the Jena Geography Dataset, e.g. for the full dataset, as well as JGD150, JGD100 and JGD50.
The Geography collection contains several fictitious requests, but these are currently not available via OPOSSum. We also created fully redundant relevance judgments for the requests and the 201 services, made independently by four human assessors according to different (binary, graded, and multidimensional) definitions of relevance. These are not available yet either, since we plan to perform a retrieval evaluation experiment with this dataset and want to keep the judgments from becoming available before the evaluation is run. If you are interested in the relevance judgments, please contact us.
A domain ontology has been created based upon the PROTON ontologies. This ontology may be helpful when working with the dataset: