BrowserID versus DDOS
This article is the third in a three-part series on implementing the RDFa Test Suite. The first article discussed the use of Sinatra, Backbone.js and Bootstrap.js in creating the test harness. The second article discussed the use of JSON-LD. In this article, we focus on how we used BrowserID to respond to a Distributed Denial of Service (DDOS) attack.
RDFa Test Suite
Working on the updated RDFa Test Suite has really been a lot of fun. It was a great opportunity to explore new Web application technologies, such as Bootstrap.js and Backbone.js. The test suite is a single-page Web application that uses a Sinatra-based service to run individual test cases.
The site was becoming stable, and we were starting to flesh out more test cases for odd corner cases, when the site stopped responding. Manu Sporny, whose company Digital Bazaar is kindly donating hosting for the site, noticed that a number of Ruby processes were consuming all available Ruby workers, causing new requests to block. The service is fairly resource intensive, as for each test it must invoke an external processor and run a SPARQL query over the results. It seemed as if the site was being hammered by a large number of overzealous search crawlers! Naturally, we put a robots.txt in place, expecting that conforming search engines would detect the site’s crawl preferences and back off, but that didn’t happen. Upon further examination of the server logs, we noticed requests streaming in from all over the world! Clearly, we were under attack. (Who might wish ill of the RDFa development effort? Who knows; most likely this was just an anonymous attack, not a specifically malicious one.)
My first thought was to use a secret API token, configured into both the server and the Web app, but that didn’t do the trick either: it seems that modern-day malware simply executes the page’s JavaScript, so it picks up the API key naturally!
BrowserID to the Rescue!
Okay, how about authentication? It’s typically a pain, and we were reluctant to put up barriers in front of people who might want to test their own processors or see how listed processors perform. The two current contenders are WebID and BrowserID.
WebID has the laudable goal of combining personally maintained profile
information with SSL certificates (it was previously known as FOAF+SSL).
Basically, it’s a mechanism that allows users to use a profile page as
their identity. This could come from their blog, Facebook, Twitter or
another social networking site. When the user configures an SSL certificate
in their browser that points to their profile page, a service can determine
that the profile page actually belongs to the user. (There’s much more to
it; you can read more in the WebID Spec.) A key advantage here is that the
service now has access to all of the self-asserted information the user
wants to provide about themselves in their profile page, such as foaf:name,
foaf:knows, and so forth. The chief downside is that the major existing
sources of user identities haven’t bought into this, and there’s a competing
solution that offers similar benefits.
BrowserID is a Mozilla initiative that enables people with e-mail addresses to use those addresses to log in to websites, kind of like OpenID, only more secure. Basically, as I understand it, a service wanting to support this includes the BrowserID JavaScript client code in the Web application, along with a simple Sign In button that invokes this code. That sends a request off to the identity provider (IDP) to authenticate the user; this has probably already happened in the past and been maintained in a cookie. The IDP then sends a response, which invokes a callback. The client then calls back to the service to complete the login, passing along whether or not the login was successful, as well as the e-mail address that logged in.
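The verification half of that exchange can be sketched in Ruby as below. It assumes the remote-verification protocol BrowserID offered at the time: the service POSTs the `assertion` handed to the page's callback, together with its own `audience` (scheme, host and port), to a verifier, and gets back JSON whose `status` is `"okay"` and whose `email` names the user on success. The verifier URL and the `verify_assertion` helper are illustrative assumptions, not sinatra-browserid's internals, and the HTTP call is injectable so the logic can be exercised without a network.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed verifier endpoint from the BrowserID era; it is no longer in service.
VERIFIER_URL = URI('https://browserid.org/verify')

# Default transport: a real form-encoded POST. Tests can inject a stub instead.
DEFAULT_POSTER = ->(uri, form) { Net::HTTP.post_form(uri, form).body }

# Hypothetical helper: returns the verified e-mail address, or nil on failure.
# `audience` is the origin (scheme://host:port) the assertion was issued for.
def verify_assertion(assertion, audience, poster: DEFAULT_POSTER)
  body = poster.call(VERIFIER_URL, { 'assertion' => assertion, 'audience' => audience })
  reply = JSON.parse(body)
  # Accept only a successful reply that was issued for this site.
  return nil unless reply['status'] == 'okay' && reply['audience'] == audience
  reply['email']
rescue JSON::ParserError
  # A malformed verifier reply is treated as a failed login.
  nil
end
```

Checking `audience` as well as `status` guards against an assertion minted for some other site being replayed against ours.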
The beauty is that, using a tool such as the sinatra-browserid Ruby gem,
this becomes dirt simple! Basically, on the API side, put in a call to
authorized? to determine if the user is authorized. If not, either
direct them to a login screen or, in the case of the RDFa Test Suite,
display an informational message telling them why we need them to log in,
and point out the BrowserID button at the top of the page.
The principal entry point to the test suite on the service side is
/test-suite/check-test/:version/:suite/:num. The only real change to
this route was to check for authorization before performing the test.
# Run a test
get '/test-suite/check-test/:version/:suite/:num' do |version, suite, num|
  # Refuse to run tests for visitors who haven't logged in with BrowserID
  halt 403, "Unauthorized access is not allowed" unless authorized?
  # Get the SPARQL query
  source = File.read(File.expand_path("../tests/#{num}.sparql", __FILE__))
  # Do host-language specific modifications of the SPARQL query.
  query = SPARQL.parse(source)
  # Invoke the processor and retrieve results, parsed into an RDF graph
  # (test_path and format are defined elsewhere in the app, not shown)
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))
  # Run the ASK query; the result is true or false
  result = query.execute(graph)
  # Return results as JSON
  content_type :json
  {:status => result}.to_json
end
In the banner, we add a little bit of Haml:
...
%div.navbar-text.pull-right
  - if email
    %p.email
      Logged in as
      %span.email
        = email
      %a{:href => '/test-suite/logout'}
        (logout)
  - else
    = render_login_button
When the page is returned, the email variable is set if the user is
authorized, so they’ll see their e-mail address if they’ve authenticated,
and a login button otherwise. The render_login_button helper is provided
entirely by sinatra-browserid; no muss, no fuss!
The only other thing to do is to hide the test cases in the test suite
unless the user has authenticated, which we can tell because the page will
then contain a span.email element, so $("span.email") won’t be empty. In
our application.js, we use this to either show the tests or an explanation:
// If logged in, create primary test collection and view
if ($("span.email").length > 0) {
  this.testList = new TestCollection([], {version: this.version});
  this.testList.fetch();
  this.testListView = new TestListView({model: this.testList});
} else {
  this.unauthorizedView = new UnauthorizedView();
}
That’s pretty much all there is to it. The only complication I faced was that, when developing with shotgun, the session secret changed with each request (shotgun reloads the application every time), so the login wasn’t remembered. Fixing the session secret made this problem go away. Total time from discovery of the problem to deployed solution: about 1 hour. Not too bad.
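Why a changing secret loses the login: cookie-based sessions are signed with the session secret, so a cookie issued before a reload no longer verifies afterwards. The toy scheme below uses HMAC-SHA256 from Ruby's openssl library and is not Rack's actual cookie format; it only shows the effect, and the helper names are hypothetical. In Sinatra the fix amounts to configuring a stable value, for example `set :session_secret, ENV['SESSION_SECRET']`, where the environment variable name is just an example.

```ruby
require 'openssl'
require 'securerandom'

# Toy signed-cookie scheme, NOT Rack's real cookie format: the cookie value is
# "data--signature", where signature is an HMAC-SHA256 of data keyed by the secret.
def sign_cookie(data, secret)
  "#{data}--#{OpenSSL::HMAC.hexdigest('SHA256', secret, data)}"
end

# Constant-time comparison, so response timing does not leak the signature.
def same_bytes?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Returns the cookie data if it was signed with this secret, otherwise nil.
def verify_cookie(cookie, secret)
  data, separator, signature = cookie.rpartition('--')
  return nil if separator.empty?
  expected = OpenSSL::HMAC.hexdigest('SHA256', secret, data)
  same_bytes?(signature, expected) ? data : nil
end

stable_secret = 'a-long-random-value-kept-in-configuration'
cookie = sign_cookie('email=alice@example.org', stable_secret)
p verify_cookie(cookie, stable_secret)         # prints "email=alice@example.org"
p verify_cookie(cookie, SecureRandom.hex(32))  # prints nil: a regenerated secret rejects it
```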
It’s important to note that the RDFa Test Suite is stateless, and we don’t really need any personal information; we don’t collect information anywhere, not even in our logs. BrowserID basically becomes a gatekeeper to help ward off abuse. It imposes a very low barrier to entry, so it doesn’t interfere with people using the site any way they choose.
I do miss other user-asserted information, such as the user’s name and
so forth. OpenID, another single sign-on initiative that has lost
momentum lately, provides a Simple Registration Extension add-on that
allows users to assert simple information such as nickname, e-mail,
full name and so forth. IMO, the right way to do this is with something
like FOAF or the schema.org Person class. Perhaps BrowserID will provide
something like this in the future.
The Use of JSON-LD in the RDFa Test Harness
This article is the second in a three-part series on implementing the RDFa Test Suite. The first article discussed the use of Sinatra, Backbone.js and Bootstrap.js in creating the test harness. In this article, we focus on JSON-LD, a Linked Data technology that complements RDFa in creating modern Web applications.
Test Manifest
The RDFa test manifest is a Turtle document used to specify the tests that apply to different versions and host languages in RDFa. Turtle is a great language for representing information in a reasonably human-understandable way. Most people authoring RDF by hand stick to Turtle because of its ease of use and its concise way of expressing Linked Data graphs. For example, to specify a test entry, we could write some Turtle as follows:
<test-cases/0001> a test:TestCase;
dc:title "Predicate establishment with @property";
rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
test:classification test:required;
test:informationResourceInput <test-cases/0001.html>;
test:informationResourceResults <test-cases/0001.sparql> .
Basically, this defines a (relative) URL identifying the test case, gives it a title, describes the relevant RDFa versions and host languages, says it’s required, and identifies the files used to provide input and to test the results. The problem is that this is not a convenient form to use programmatically. Modern Web applications use JSON for representing data, partly because JSON maps directly onto native JavaScript objects, but also because it has a convenient representation in Ruby and other languages.
Let’s look at the equivalent test representation in JSON-LD:
{
  "@context": "http://rdfa.info/contexts/rdfa-test.jsonld",
  "@graph": [
    {
      "@id": "http://rdfa.info/test-suite/test-cases/0001",
      "@type": "test:TestCase",
      "num": "0001",
      "classification": "test:required",
      "description": "Predicate establishment with @property",
      "input": "http://rdfa.info/test-suite/test-cases/0001.html",
      "results": "http://rdfa.info/test-suite/test-cases/0001.sparql",
      "expectedResults": true,
      "hostLanguages": ["html4","html5","xhtml1","xhtml5","xml"],
      "versions": ["rdfa1.0","rdfa1.1"]
    }
  ]
}
Other than the encapsulating elements, this looks pretty similar to the Turtle representation. There are a couple
of differences, though: instead of dc:title, we use the term description; instead of rdfatest:hostLanguage, we
use hostLanguages. How are these related? The key is the @context value. Looking at
http://rdfa.info/contexts/rdfa-test.jsonld, we see the following:
{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfatest": "http://rdfa.info/vocabs/rdfa-test#",
    "test": "http://www.w3.org/2006/03/test-description#",
    "classification": {"@id": "test:classification"},
    "contributor": {"@id": "dc:contributor"},
    "description": {"@id": "dc:title"},
    "expectedResults": {"@id": "test:expectedResults",
                        "@type": "xsd:boolean"},
    "hostLanguages": {"@id": "rdfatest:hostLanguage",
                      "@container": "@set"},
    "input": {"@id": "test:informationResourceInput",
              "@type": "@id"},
    "num": {"@id": "rdfatest:num"},
    "purpose": {"@id": "test:purpose"},
    "versions": {"@id": "rdfatest:rdfaVersion",
                 "@container": "@set"},
    "reference": {"@id": "test:specificationReference"},
    "results": {"@id": "test:informationResourceResults",
                "@type": "@id"}
  }
}
The context does exactly what its name suggests: it provides a context for interpreting JSON data. Note the definition
of hostLanguages: this is a term definition, meaning that the term is replaced with its @id value, in this case
rdfatest:hostLanguage, the same as used in Turtle. Both of these expand to the same IRI,
http://rdfa.info/vocabs/rdfa-test#hostLanguage.
In RDF, and in Linked Data in general, everything is described in terms of IRIs, Literals and Blank Nodes (basically
variables representing something we don’t know or don’t want to identify). The "@container": "@set" bit just says
to expect that the value of hostLanguages will always be an array, which makes processing more convenient.
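To make the term mechanics concrete, here is a deliberately simplified Ruby sketch of what the context does for hostLanguages. It inlines a subset of the context shown above, looks up a term's @id, and expands a prefix:suffix compact IRI; it ignores most of the real JSON-LD expansion algorithm (vocabulary mapping, base IRIs, remote and nested contexts, keywords), and the helper names are hypothetical. The `as_set` helper mirrors the @set behavior.

```ruby
require 'json'

# A subset of the rdfa-test context, inlined instead of fetched over HTTP.
CONTEXT = JSON.parse(<<~CTX)['@context']
  {
    "@context": {
      "dc": "http://purl.org/dc/terms/",
      "rdfatest": "http://rdfa.info/vocabs/rdfa-test#",
      "description": {"@id": "dc:title"},
      "hostLanguages": {"@id": "rdfatest:hostLanguage", "@container": "@set"}
    }
  }
CTX

# Much-simplified term expansion (not the full JSON-LD algorithm): use the
# term's @id if it has a definition, then expand a "prefix:suffix" compact IRI.
def expand_term(term, context)
  definition = context[term]
  iri = definition.is_a?(Hash) ? definition['@id'] : (definition || term)
  prefix, suffix = iri.split(':', 2)
  namespace = context[prefix]
  # Absolute IRIs such as "http://..." are left alone.
  if suffix && namespace.is_a?(String) && !suffix.start_with?('//')
    namespace + suffix
  else
    iri
  end
end

# With "@container": "@set", a lone value is still handled as an array.
def as_set(value)
  value.is_a?(Array) ? value : [value]
end

puts expand_term('hostLanguages', CONTEXT)          # prints http://rdfa.info/vocabs/rdfa-test#hostLanguage
puts expand_term('rdfatest:hostLanguage', CONTEXT)  # prints the same IRI
p as_set('html5')                                   # prints ["html5"]
```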
Because we use terms in JSON object key positions, access from JavaScript can be quite convenient. Taking a look at the test suite’s Test model, we can download the manifest with an Ajax request and access elements using ‘.’ notation, as in the following:
var filteredTests = _.filter(this.loadedData, function(data) {
  return _.include(data.versions, version) &&
         _.include(data.hostLanguages, hostLanguage);
});
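Because the manifest is plain JSON, the same selection is just as direct on the Ruby side. A minimal sketch, assuming the tests sit in the manifest’s @graph array as in the entry shown earlier; `filter_tests` is a hypothetical name, not part of the test suite code.

```ruby
require 'json'

# Select the tests that apply to one RDFa version and host language,
# mirroring the JavaScript filter. `manifest` is the parsed JSON-LD manifest.
def filter_tests(manifest, version, host_language)
  manifest.fetch('@graph', []).select do |test|
    # Array() tolerates a missing (nil) or single-string value.
    Array(test['versions']).include?(version) &&
      Array(test['hostLanguages']).include?(host_language)
  end
end

manifest = JSON.parse('{"@graph": [{"num": "0001", "versions": ["rdfa1.0", "rdfa1.1"], "hostLanguages": ["html4", "html5"]}]}')
p filter_tests(manifest, 'rdfa1.1', 'html5').map { |t| t['num'] }  # prints ["0001"]
```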
Another advantage in using JSON is that the parse time is negligible. The manifest has about 3000 triples, which can actually take a while to parse as Turtle, but opening and parsing the JSON document is substantially faster.
As with many modern Web applications, the RDFa Test Suite is a single-page application that uses Ajax calls to communicate with the server. The first call retrieves the JSON manifest. Subsequent calls retrieve test results, also expressed as JSON. The manifest is used to populate a Backbone.js Collection. When a specific version and host language are selected, this collection is filtered to show only the relevant tests, as shown in the previous example. The Collection then drives a view element, which instantiates a view for each model to be tested.
Collating Test Results
The second area where JSON-LD is used within the RDFa Test Suite is collating test results. After running a series of tests, a user can generate EARL test results. Since this is an RDFa test suite, the report is naturally expressed in RDFa. Here the Backbone.js view technology comes into play, since it is easy to use an HTML template to generate individual results, with the RDFa markup baked into the template.
The basic EARL template looks like the following:
<script id='earl-item-template' type='text/template'>
<h4>
[
<span property='rdfatest:rdfaVersion'><%= version %></span>
<span property='rdfatest:hostLanguage'><%= hostLanguage %></span>
]
Test <%= num %>:
<span property='dc:title'><%= description %></span>
<span property='earl:mode' resource='earl:automatic' />
</h4>
<p property='dc:description'><%= purpose %></p>
<div class='property processorURL resource detailsURL'
typeof='earl:Assertion'>
<span property='earl:assertedBy' resource='' />
<span class='resource processorURL' property='earl:subject' />
<span class='resource docURL' rel='earl:test' />
<p property='earl:result' typeof='earl:TestResult'>
Result:
<strong class='resource outcome'
property='earl:outcome'
resource=''><%= result %></strong>
</p>
</div>
</script>
The EARL view uses this template to generate a report for an individual test entry, and fills in attribute or content values from within the view:
var EarlItemView = Backbone.View.extend({
  template: _.template($('#earl-item-template').html()),

  render: function () {
    // Named "data" rather than "JSON" to avoid shadowing the global JSON object
    var data = this.model.toJSON();
    data.processorURL = this.options.processorURL;
    this.$el.html(this.template(data));
    this.$el.attr("resource", this.model.docURL());
    this.$(".property.processorURL")
      .attr("property", data.processorURL);
    this.$(".resource.processorURL")
      .attr("resource", data.processorURL);
    this.$(".resource.detailsURL")
      .attr("resource", this.model.detailsURL());
    this.$(".resource.docURL")
      .attr("resource", this.model.docURL());
    this.$(".resource.outcome")
      .attr("resource", 'earl:' +
            this.model.get('result').toLowerCase());
    return this;
  }
});
The result is a test result for a specific processor with a specific RDFa version and host-language. You can see an example report here.
However, this is not the end of it; to exit the W3C Candidate Recommendation phase, it’s necessary to have at least two interoperable implementations. What is needed, then, is a collated report that combines the output from several different processors into a single report. Because each individual report is an information resource representing a specific RDF graph, we can parse all of these documents into a single graph. But to generate an HTML result, it would be convenient to have all of the data available in a form that is easy to use from Ruby Haml.
This is where using JSON-LD in languages like Ruby comes into play. Ruby has great libraries for working with JSON, which basically transform the JSON into a combination of native Ruby Hash, Array, String, numeric and boolean values. A JSON-LD representation of a test assertion entry looks like the following:
{
  "@id": "http://rdfa.info/test-suite/test-details/rdfa1.1/...",
  "@type": "earl:Assertion",
  "assertedBy": "http://rdfa.info/test-suite/",
  "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
  "subject": "http://rubygems.org/gems/rdf-rdfa",
  "result": {
    "@type": "earl:TestResult",
    "outcome": "earl:pass"
  }
}
Transforming this to Ruby gives essentially the same representation, so we can iterate over it using Ruby Haml. The natural next step is to see how we can represent EARL test results through a hierarchical test structure.
As it happens, the EARL representation is not actually ideal for this. Each assertion is listed with a subject that indicates the specifics of the processor, test, version and host language. It records that it is asserted by the test suite, along with the test being run, the processor being tested, and the result of the test. However, I’d like to show the results in tabular form, with the test suite at the top, followed by sections for each version and host language, each containing a table with a row for each generic test and a column for each processor. A typical result looks like the following:
| Test | clj-rdfa | librdfa | pyRdfa | RDF::RDFa |
|---|---|---|---|---|
| 0001 Predicate establishment with @property | PASS | PASS | PASS | PASS |
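For comparison, here is roughly the pivot that we would otherwise have to write by hand: a plain-Ruby sketch that groups parsed assertions (shaped like the JSON-LD assertion above) by test and by processor. The `pivot_assertions` helper, and keying rows by the test IRI, are illustrative assumptions; framing lets the data arrive already in this shape.

```ruby
require 'json'

# Hypothetical helper: pivot flat EARL assertions into rows keyed by test,
# with one column per processor (the assertion's "subject").
def pivot_assertions(assertions)
  processors = assertions.map { |a| a['subject'] }.uniq.sort
  rows = Hash.new { |hash, test| hash[test] = {} }
  assertions.each do |a|
    # "earl:pass" becomes "PASS", matching the table above.
    outcome = a.dig('result', 'outcome').to_s.sub(/\Aearl:/, '').upcase
    rows[a['test']][a['subject']] = outcome
  end
  # Each row lists outcomes in processor order; nil marks a missing result.
  table = rows.sort_by { |test, _| test }.map do |test, by_processor|
    [test, processors.map { |p| by_processor[p] }]
  end
  [processors, table]
end
```

Each call returns the column headers (processor IRIs) and the rows, which a template can render directly.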
To take advantage of JSON-LD chaining, we really want a data structure that we can easily iterate over. By adding some extra markup to the report, we can do this using JSON-LD Framing, basically a query language for JSON-LD that allows us to reshape the data into the form we want to use. The frame document allows us to specify how we’d like our output to look. An abbreviated example is the following:
{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@type": "earl:Software",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [{"@type": "earl:TestCase"}]
  }
}
This says to show items of type earl:Software with a property (named after the version) referencing
an object of type rdfatest:Version, which has a property for each host language referencing
a list of earl:TestCase items. This gives us a JSON-LD snippet such as the following:
{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@id": "http://rdfa.info/test-suite/",
  "@type": [
    "earl:Software",
    "doap:Project"
  ],
  "homepage": "http://rdfa.info/",
  "name": "RDFa Test Suite",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [
      {
        "@id": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
        "@type": "earl:TestCase",
        "num": "0001",
        "title": "Predicate establishment with @property",
        "description": "Tests @property ...",
        "mode": "earl:automatic",
        "http://rubygems.org/gems/rdf-rdfa": {
          "@id": "http://rdfa.info/test-suite/...",
          "@type": "earl:Assertion",
          "assertedBy": "http://rdfa.info/test-suite/",
          "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
          "subject": "http://rubygems.org/gems/rdf-rdfa",
          "result": {
            "@type": "earl:TestResult",
            "outcome": "earl:pass"
          }
        },
        "http://www.w3.org/2012/pyRdfa": { "@type": "earl:Assertion", ... },
        "https://github.com/niklasl/clj-rdfa": { "@type": "earl:Assertion", ... },
        "https://github.com/rdfa/librdfa": { "@type": "earl:Assertion", ... }
      }
    ]
  }
}
We’ve basically wrapped each individual test case in a structure that inverts the information contained within the test case. Now we can use this within a Haml template to create the HTML we’re interested in.
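With the framed structure, the template only has to walk nested hashes and arrays. A minimal Ruby sketch of that walk, assuming the framed shape shown above for rdf-rdfa (processor IRIs as keys on each test case, each holding an assertion); `framed_rows` is a hypothetical helper, not the actual report template.

```ruby
require 'json'

# Hypothetical walk over the framed report: for one version and host language,
# return [num, title, outcomes] for each test case, one outcome per processor.
def framed_rows(framed, version, host_language, processors)
  cases = framed.fetch(version, {}).fetch(host_language, [])
  cases.map do |test_case|
    outcomes = processors.map do |processor|
      assertion = test_case[processor]
      # A processor with no assertion for this test yields nil.
      assertion.is_a?(Hash) ? assertion.dig('result', 'outcome') : nil
    end
    [test_case['num'], test_case['title'], outcomes]
  end
end
```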
To see the complete EARL report, look here.
Conclusions
JSON-LD is the right technology for dealing with RDF and Linked Data in Web applications. It has a convenient representation for working from within various programming languages, such as JavaScript and Ruby. Its use in implementing the RDFa Test Suite proves its worth as a complementary technology to RDFa for working with Linked Data on the Web.
Next up, we talk about the Distributed Denial of Service attack against the test suite and how we solved this very easily and quickly using BrowserID.
A new RDFa Test Harness
Implementing the RDFa Test Suite as a modern Web application using Sinatra, Backbone.js and Bootstrap.js.
Recently, RDFa entered the Candidate Recommendation phase for releasing RDFa Core 1.1, RDFa 1.1 Lite, and XHTML+RDFa 1.1 as W3C Standards. I’ve been using RDFa for a couple of years, originally as part of the Connected Media Experience, and lately because I’ve become passionate about the Semantic Web. For the last 10 months, or so, this has extended to my becoming an Invited Expert in the W3C, where I’ve worked on RDFa, HTML microdata and JSON-LD.
This is an introductory blog post on the creation of a new RDFa Test Suite. Here we discuss the use of Sinatra, Backbone.js and Bootstrap.js to run the test suite. Later will come articles on the usefulness of JSON-LD as a means of driving a test harness, generating test reports, and the use of BrowserID to deal with Distributed Denial of Service attacks that cropped up overnight (now available here).
RDFa Test Suite
Along with other RDF parsers and serializers, I maintain an RDFa parser and serializer. In implementing the parser, and while working on new features for RDFa 1.1, the RDFa Test Suite has been an invaluable resource. In my testing, I used the test manifest, which describes each test’s input document and its expected output in the form of a SPARQL ASK query.
A basic RDFa test is a small amount of markup intended to test a single feature.
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/elements/1.1/">
<head>
<title>Test 0001</title>
</head>
<body>
<p>
This photo was taken by
<span class="author"
about="photo1.jpg"
property="dc:creator">Mark Birbeck</span>.</p>
</body>
</html>
In this example, we’re testing that the @about attribute sets the subject, @property sets the property, and the
text content sets the object of a single RDF statement. Rendered as Turtle, it would look like the following:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<photo1.jpg> dc:creator "Mark Birbeck" .
A query to test this looks like the following:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
ASK WHERE {
<http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/photo1.jpg>
dc:creator "Mark Birbeck" .
}
Note that the relative IRI in the @about is expanded relative to the document location, as is tested in the SPARQL query.
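The expansion of photo1.jpg is ordinary relative-reference resolution, which Ruby’s standard URI library can illustrate. The sketch assumes the test document is served at the rdfa1.1/html5/0001.html URL that appears in the test suite’s EARL assertions; an RDFa processor may also honor a base element, which this ignores, and `resolve_against` is a hypothetical name.

```ruby
require 'uri'

# Resolve a relative reference (such as @about="photo1.jpg") against the URL
# of the document it appears in, using standard URI reference resolution.
def resolve_against(document_url, relative)
  URI.join(document_url, relative).to_s
end

test_doc = 'http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html'
puts resolve_against(test_doc, 'photo1.jpg')
# prints http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/photo1.jpg
```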
Using the test suite requires a publicly available endpoint, for which I released the RDF Distiller to test my implementation. The test suite works with a provided URL, which invokes the processor with a test document. Basically, it does the following:
- The Web application performs a GET on the /test-suite/check-test/:version/:suite/:num service URL, with the processor endpoint as a query parameter.
- The service invokes the processor endpoint, passing the URL of the test document.
- The processor parses that document and returns the result in a different RDF format (for example Turtle or RDF/XML).
- The service parses the returned RDF document into a graph, and performs a SPARQL query against that graph.
- The result is a true or false value, which determines whether the test passes.
- The result is formatted as JSON and returned to the Web application.
- The Web application updates the test status in the UI.
- If running all tests, the completion event triggers the next test to run.
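The service side of these steps can be sketched as a small Ruby function. It is only an outline: the processor request and the "parse the results and run the SPARQL ASK query" step are passed in as lambdas standing in for the real HTTP call and the RDF and SPARQL gems, and `check_test` is a hypothetical name. Like the check-test route in the Sinatra code, it builds the processor request by appending the test document URL to the endpoint, so endpoints are assumed to end in a query parameter such as ?uri=.

```ruby
require 'json'

# Hypothetical outline of the service side of the steps above.
#   fetch: takes the processor request URL, returns the RDF it produced
#   ask:   takes that RDF, parses it, and runs the SPARQL ASK query
def check_test(processor_endpoint, test_document_url, fetch:, ask:)
  # The service invokes the processor with the test document URL.
  rdf = fetch.call(processor_endpoint + test_document_url)
  # Parsing plus the ASK query reduce to a single true/false value.
  passed = ask.call(rdf) ? true : false
  # The result goes back to the Web application as JSON.
  JSON.generate(status: passed)
end
```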
Sinatra
Sinatra is a great lightweight framework for deploying simple Ruby applications on the web. The needs of this application, while requiring a lot of different libraries, were really fairly simple. Basically, return a page listing the various tests, respond to requests for test case source documents, activate a test with a specified processor endpoint and return the results.
The basic setup of the app is fairly straightforward:
# Return the test suite driver page
get '/test-suite/' do
  haml :test_suite
end

# Return a particular test, or SPARQL query
get '/test-suite/test-cases/:version/:host_language/:num' do |version, host_language, num|
  source = File.read(File.expand_path("../tests/#{num}.html", __FILE__))
  case host_language
  when 'xhtml'
    # do XHTML-specific formatting of the test
  when 'html'
    # do HTML-specific formatting of the test
  when 'xml', 'svg'
    # (XML- and SVG-specific formatting not shown)
  end
  # Return the (formatted) test document
  source
end

# Run a test
get '/test-suite/check-test/:version/:suite/:num' do |version, suite, num|
  # Get the SPARQL query
  source = File.read(File.expand_path("../tests/#{num}.sparql", __FILE__))
  # Do host-language specific modifications of the SPARQL query.
  query = SPARQL.parse(source)
  # Invoke the processor and retrieve results, parsed into an RDF graph
  # (test_path and format are defined elsewhere in the app, not shown)
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))
  # Run the ASK query; the result is true or false
  result = query.execute(graph)
  # Return results as JSON
  content_type :json
  {:status => result}.to_json
end
Backing up the Sinatra application are a number of Ruby Gems for working with Linked Data and SPARQL. In addition to reading and writing RDFa, there are gems for managing RDF graphs, reading other formats, such as Turtle and RDF/XML, and running the SPARQL queries.
Driving the test suite is a Web application built using Backbone.js and Bootstrap.js.
Backbone.js
Backbone is a JavaScript model-view-controller framework for building responsive applications. It encourages building modular applications split into multiple classes with weak interdependencies. Models and Collections are used to maintain application state and reflect information from a server. The RDFa test suite has two main models and a collection.
The Version model keeps track of information about what is being run. This includes the RDFa version and host language being tested along with the current processor endpoint. It looks something like the following:
window.Version = Backbone.Model.extend({
  defaults: {
    processorURL: "http://www.w3.org/2012/pyRdfa/extract?uri=",
    processorName: "pyRdfa",
    processorDOAP: "http://www.w3.org/2012/pyRdfa",
    // List of processors
    processors: {}
  },

  // Appropriate host languages for the current version
  hostLanguages: function() {
    return {
      "rdfa1.0": ["SVG", "XHTML1"],
      "rdfa1.1": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"],
      "rdfa1.1-vocab": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"]
    }[this.get("version")];
  }
});
The test manifest is used to instantiate a number of Test model instances, held in a Test Collection. Changing information in the Version model causes different tests to be enabled or disabled, as appropriate for the given RDFa version and host language. It also affects URL generation for retrieving and running different tests. In addition to instantiating tests, the Test Collection also allows the complete sequence of tests to be run: it listens for the completion event from each test and then initiates the next one.
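The run-all behavior amounts to chaining completion callbacks: each test reports when it finishes, and that report starts the next test. The Ruby sketch below shows the pattern in isolation; the real application does this with Backbone events in JavaScript, and the `SequentialRunner` class and its block protocol are hypothetical.

```ruby
# Hypothetical runner: the block passed to new runs one test and must call the
# supplied completion callback with that test's result when it is done.
class SequentialRunner
  def initialize(tests, &run_one)
    @tests = tests
    @run_one = run_one
    @results = {}
  end

  # Starts the chain; on_finished receives all results after the last test.
  def run_all(&on_finished)
    run_at(0, on_finished)
  end

  private

  def run_at(index, on_finished)
    return on_finished&.call(@results) if index >= @tests.size
    test = @tests[index]
    # The completion callback records the result, then starts the next test.
    @run_one.call(test, ->(result) {
      @results[test] = result
      run_at(index + 1, on_finished)
    })
  end
end
```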
Styling the User Interface
I’m no designer, but I like a good looking and efficient user interface. Fortunately, the people at Twitter do too, and they released Bootstrap.js as a means of tackling common problems. I won’t go into detail here, but check out their example page to get an idea of the things you can do with Bootstrap. What I immediately noticed about it is that I didn’t really need to worry about layout. Note that you can even run the Test Suite from an iPhone!
Data Driven Tests
Of course, returning the test suite HTML is just part of the problem; we also need to get details about each test to the page, so that it can respond to requests to run specific tests. The tests are managed through a test manifest, which is kept in Turtle format to make it easy to add tests. A typical entry looks like the following:
<test-cases/0001> a test:TestCase;
dc:title "Predicate establishment with @property";
rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
test:classification test:required;
test:informationResourceInput <test-cases/0001.html>;
test:informationResourceResults <test-cases/0001.sparql> .
This basically describes an IRI for the test, in this case test-cases/0001 relative to the location of the test suite,
the title of the test, the RDFa versions and host languages it applies to, and references to the input and result documents.
RDFa has over 200 such tests defined. This is all well and good, but requiring yet another data format is an added complication.
Better to have the tests defined in a format more appropriate for use within a Web application, such as JSON. As it happens,
JSON-LD is another specification that is still underway, but it is proving to be quite flexible and useful for our needs. For a
peek at the JSON-LD version of the RDFa test suite manifest, look here.
More on using JSON-LD, and why it’s such a good match for RDFa in the next post.
RDF.rb 0.3.5 and SPARQL 0.1.0
I added some minor updates to RDF.rb and re-issued version 0.3.5 of the rdf and linkeddata gems. These updates are mostly to better support HTTP content negotiation and to find appropriate readers and writers based on file extension, MIME type, and content sniffing. There are also some minor fixes to aid JRuby and Ruby 1.9.3 support.
More notably, I’ve released 0.1.0 of the SPARQL gem. The logical behavior is unchanged from the previous release, but it now includes
Rack and Sinatra support to easily create middleware for a SPARQL endpoint. When used with the Linked Data gem, this
includes a range of RDF serializations for the results of DESCRIBE and CONSTRUCT queries. It also adds HTTP Accept headers for
RDF/XML and Turtle to the outgoing requests made for FROM and FROM NAMED.
For example, the Sinatra example in the README runs a simple query against a small repository:
#!/usr/bin/env ruby -rubygems
require 'sinatra'
require 'sinatra/sparql'

repository = RDF::Repository.new do |graph|
  graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
end

get '/sparql' do
  SPARQL.execute("SELECT * WHERE { ?s ?p ?o }", repository)
end
A minimal SPARQL endpoint can be described as follows:
# Sinatra example
#
# Call as http://localhost:4567/?query=uri,
# where `uri` is the URI of a SPARQL query, or
# a URI-escaped SPARQL query, for example:
# http://localhost:4567/?query=SELECT%20?s%20?p%20?o%20WHERE%20%7B?s%20?p%20?o%7D
require 'sinatra'
require 'sinatra/sparql'

get '/' do
  settings.sparql_options.merge!(:standard_prefixes => true)
  repository = RDF::Repository.new do |graph|
    graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
  end
  if params["query"]
    # Rack has already URI-decoded the parameter. If it looks like a URI
    # (a scheme followed by ':'), fetch the query text from that location.
    q = params["query"].to_s
    query = q =~ /\A\w+:/ ? RDF::Util::File.open_file(q) {|f| f.read} : q
    SPARQL.execute(query, repository)
  else
    service_description(:repo => repository)
  end
end
This can be run using ruby -rubygems example.rb, or with shotgun as shotgun example.rb.
To load a complete dataset into the query repository, including multiple contexts, load the repository as follows:
repository = RDF::Repository.load("http://path-to-repo")
This will incur a large startup time for each request, but you can also use a persistent store such as rdf-mongo:
repository = RDF::Mongo::Repository.new()
This will instantiate a persistent MongoDB store, which can be initialized one time using RDF::Mongo::Repository.load. Subsequent instantiations will use the persistent storage, and have better query performance for larger datasets.
For a more complete implementation, see the RDF Distiller running at http://rdf.greggkellogg.net/sparql and freely available to download and modify for your own purposes.
Follow up questions to public-rdf-ruby.
Palau January 2012
Just got back from my second trip to Palau, a wonderful place to dive. You can check out the photo album, or see the video slideshow along with some video clips.
Ruby and the Semantic Web
This evening, I gave a talk on using Ruby RDF.rb and assorted gems at the Lotico San Francisco Semantic Meetup. I’ve uploaded slides to Slide Share.
I also showed a simple demo using the GitHub API to create FOAF and DOAP records for accounts and repositories, and to do some simple navigation. The demo is running at http://greggkellogg.net/github-lod, and source is (of course) available on GitHub.
The demo is not intended to be a complete application, but it shows some basic capabilities of Ruby LinkedData (http://rubygems.org/gems/linkeddata) for generating RDF in a variety of formats from Active Record models (which cache the GitHub API calls). The Web pages are, of course, marked up with RDFa, and you can use content negotiation, or append an appropriate extension to the URLs, to retrieve the data in alternative RDF formats.
Sea of Cortez
I just returned from a week on the Rocio del Mar diving the northern Sea of Cortez (aka the Gulf of California). We dove along the Midriff Islands and had great encounters with Sperm Whales and Whale Sharks, not to mention the often over-exuberant Sea Lions.
Check out the photos.
SPARQL 1.0 for Ruby
I’ve just released version 0.0.2 of the Ruby sparql gem. This version is based on earlier work by Pius and Arto and incorporates SPARQL Grammar and SPARQL Algebra. Further documentation is available here.
This gem integrates with RDF.rb and uses rdf-xsd to provide additional literal semantics.
Why release SPARQL for Ruby? Probably not because of the killer performance, at least right now. However, I believe it’s important that Ruby have a complete tool chain for manipulating Linked Data (including RDF and SPARQL), and this was the remaining piece.
In spite of the 0.0.2 release number, it is a fully functioning implementation of SPARQL 1.0 semantics and passes all the DAWG data-r2 test cases. The gem uses RDF::Query to perform basic graph pattern (BGP) operations on RDF::Queryable objects (such as RDF::Repository). The gem has some support for query optimization, but this remains largely unimplemented and will be addressed in future releases. I’d also like to support SPARQL 1.1 queries and updates at some point.
This is a pure Ruby implementation and does not directly rely on any native libraries (although some RDF readers, such as RDFa and RDF/XML, presently do).
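To make BGP evaluation a little more concrete, here is a minimal sketch in plain Ruby. It assumes triples are simple [subject, predicate, object] arrays of strings and that variables are Ruby Symbols; match_triple and evaluate_bgp are hypothetical helpers, not part of RDF::Query. Each triple pattern is matched against the data in turn, and solutions are joined on shared variables.

```ruby
# Minimal sketch of basic graph pattern (BGP) matching. Assumptions: triples
# are [subject, predicate, object] arrays of Strings and variables are Symbols.
# Illustrative only; this is not how RDF::Query is implemented.

# Extend +binding+ so that +pattern+ matches +triple+, or return nil.
def match_triple(pattern, triple, binding)
  result = binding.dup
  pattern.zip(triple).each do |term, value|
    if term.is_a?(Symbol)
      # Variable: must agree with any earlier binding, otherwise bind it.
      if result.key?(term)
        return nil unless result[term] == value
      else
        result[term] = value
      end
    elsif term != value
      # Constant: must match the triple exactly.
      return nil
    end
  end
  result
end

# Match each pattern in turn, joining solutions on shared variables.
def evaluate_bgp(patterns, triples)
  patterns.reduce([{}]) do |solutions, pattern|
    solutions.flat_map do |binding|
      triples.map { |triple| match_triple(pattern, triple, binding) }.compact
    end
  end
end

triples = [
  ["ex:alice", "foaf:knows", "ex:bob"],
  ["ex:bob",   "foaf:name",  "Bob"],
  ["ex:alice", "foaf:name",  "Alice"]
]
patterns = [[:who, "foaf:knows", :friend], [:friend, "foaf:name", :name]]
solutions = evaluate_bgp(patterns, triples)
puts solutions.map { |s| s[:name] }.inspect  # prints ["Bob"]
```

This naive version scans every triple for every pattern; a real engine would use indexes and reorder patterns, which is the kind of query optimization described above as still largely unimplemented.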
The basic strategy is to parse SPARQL and transform it into an S-Expression-based algebra, pretty close to that used by Jena ARQ (SPARQL S-Expressions, or SSE). This allows SSE to be used directly for performing queries, or to parse SPARQL grammar to SSE.
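Part of the appeal of SSE is that it is just nested lists, so it is easy to consume directly. As a rough illustration, the following hypothetical reader turns an SSE string into nested Ruby arrays, mapping ?variables to Symbols. It is not the sparql gem’s parser, and it handles only parentheses, bare atoms and double-quoted strings without escapes.

```ruby
# Hypothetical SSE reader: turns "(bgp (triple ?s ?p ?o))" style text into
# nested Ruby arrays. Not the sparql gem's parser; handles only parentheses,
# bare atoms and double-quoted strings without escape sequences.

def tokenize_sse(text)
  text.scan(/\(|\)|"[^"]*"|[^\s()"]+/)
end

def parse_sse(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?

  case token
  when "("
    list = []
    list << parse_sse(tokens) until tokens.empty? || tokens.first == ")"
    raise ArgumentError, "missing )" if tokens.shift.nil?
    list
  when ")"
    raise ArgumentError, "unexpected )"
  when /\A"(.*)"\z/
    $1                     # string literal with the quotes stripped
  when /\A\?(.+)\z/
    $1.to_sym              # ?var becomes a Symbol
  else
    token                  # operator, IRI or prefixed name stays a String
  end
end

sse  = '(project (?name) (bgp (triple ?s <http://xmlns.com/foaf/0.1/name> ?name)))'
tree = parse_sse(tokenize_sse(sse))
puts tree.first  # prints project
```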
The linkeddata gem has also been updated to have a soft reference to SPARQL, in addition to new processors for RDF::Turtle, JSON::LD, and RDF::Microdata.
The gem is tested on Ruby 1.8.7, 1.9.2 and JRuby. (JRuby has some spec issues, probably due to Nokogiri differences.)
Many thanks to Pius Uzamere for helping to make this release happen, and to Arto Bendiken for the work on RDF.rb, SPARQL::Algebra and SPARQL::Grammar that preceded this.
RDF.rb 0.3.4 released
After several months of gathering updates for RDF.rb, we’ve released version 0.3.4 with several new features:
- Update to the BGP query model to support SPARQL semantics
- Expandable Literal support, to allow further implementation of XSD datatypes outside of RDF.rb (see the RDF::XSD gem)
- More advanced content type detection, to allow better selection of the appropriate reader from those available on the client (including selecting among HTML types, such as Microdata and RDFa)
- Improved CLI, with the rdf executable providing access to all loaded readers and writers for cross-language serialization and deserialization
As an example of format detection, consider the following:
require 'linkeddata'
RDF::Graph.load("http://greggkellogg.net/foaf.ttl")
loads the document using the Turtle or N3 reader, if one is installed. Detection becomes more important for ambiguous file types, such as HTML, which could contain either RDFa or Microdata, and application/xml, which could be TriX, RDF/XML or even RDFa.
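The sketch below illustrates the general idea of choosing a reader from the content type and, for ambiguous types, sniffing the content. The detect_format helper, its format symbols and its heuristics are assumptions made for illustration; RDF.rb’s actual detection logic is not reproduced here.

```ruby
# Hypothetical sketch of reader selection: map unambiguous content types
# directly, and sniff the content for ambiguous ones. Format symbols and
# heuristics are assumptions, not RDF.rb's actual detection rules.

UNAMBIGUOUS_TYPES = {
  "text/turtle"         => :turtle,
  "text/n3"             => :n3,
  "application/ld+json" => :jsonld
}.freeze

def detect_format(content_type, sample)
  # Drop parameters such as "; charset=utf-8".
  mime = content_type.to_s.split(";").first.to_s.strip.downcase
  return UNAMBIGUOUS_TYPES[mime] if UNAMBIGUOUS_TYPES.key?(mime)

  case mime
  when "text/html", "application/xhtml+xml"
    # HTML may carry Microdata or RDFa; look for Microdata attributes.
    sample =~ /\bitem(scope|prop)\b/ ? :microdata : :rdfa
  when "application/xml", "text/xml"
    if sample.include?("<TriX")
      :trix
    elsif sample.include?("<rdf:RDF")
      :rdfxml
    else
      :rdfa # generic XML: fall back to RDFa
    end
  end # nil for anything else
end

puts detect_format("text/html; charset=utf-8", '<div itemscope itemprop="name">')  # prints microdata
```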
See documentation for more specifics on this version of RDF.rb. Note that I’ve attempted to incorporate suggestions for improving the documentation.
Most of the reader/writer gems have been updated to match this release, in particular JSON::LD, RDF::Microdata, RDF::N3, RDF::RDFa, RDF::RDFXML, and RDF::Turtle.
A future update to the linkeddata gem should reference the latest versions of each, but a simple gem update will work too.
There is a slight semantic change for repositories to support SPARQL: a context of false should not match a variable context. This is straight out of SPARQL semantics. Repository implementors who have provided custom implementations of #query_pattern should check behavior against rdf-spec version 0.3.4 to verify correct operation.
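To illustrate the rule, here is a sketch of context matching under SPARQL-style semantics. It assumes that a statement in the default graph has a nil context, and that in a pattern nil is a wildcard, false means the default graph only, and a Symbol is a variable that binds only to named graphs. These conventions and the context_matches? helper are assumptions for the example, not a description of rdf-spec’s tests.

```ruby
# Hypothetical sketch of context (graph name) matching with SPARQL-style
# semantics. Assumptions: a statement in the default graph has a nil
# context; in a pattern, nil is a wildcard, false means "default graph
# only", a Symbol is a variable that binds only to named graphs, and any
# other value must match exactly.

Quad = Struct.new(:subject, :predicate, :object, :context)

def context_matches?(pattern_context, statement_context)
  case pattern_context
  when nil    then true                      # any graph
  when false  then statement_context.nil?    # default graph only
  when Symbol then !statement_context.nil?   # variable: named graphs only
  else             pattern_context == statement_context
  end
end

quads = [
  Quad.new("ex:a", "ex:p", "1", nil),          # default graph
  Quad.new("ex:a", "ex:p", "2", "ex:graph1")   # named graph
]
named = quads.select { |q| context_matches?(:g, q.context) }
puts named.map(&:object).inspect  # prints ["2"]
```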
Next up is a release of SPARQL implemented in pure Ruby. This gem provides full support for SPARQL 1.0 queries.
RDF::RDFa update with vocabulary expansion, RDF collections and more
I’ve updated RDF::RDFa to reflect recent changes to RDFa Core:
- Deprecate explicit use of @profile
- Add rdfa:hasVocabulary when encountering @vocab
- Implemented Reader#expand to perform vocabulary expansion using RDFS rules 5, 7, 9 and 11.
Additionally, experimental support for RDF Collections (lists) has been added, based on notes from the RDF Web Applications Working Group wiki.
Remove RDFa Profiles
RDFa Profiles were a mechanism added to allow groups of terms and prefixes to be defined in an external resource and loaded to affect the processing of an RDFa document. This introduced a problem for some implementations needing to perform a cross-origin GET in order to retrieve the profiles. The working group elected to drop support for user-defined RDFa Profiles (the default profiles defined by RDFa Core and host languages still apply) and replace it with an inference regime using vocabularies. Parsing of @profile has been removed from this version.
Vocabulary Expansion
One of the issues with @vocab is that it discourages re-use of existing vocabularies: only one default vocabulary can be in effect at a time, which is awkward when terms from several vocabularies are used together. As it is common (and encouraged) for RDF vocabularies to declare sub-class and/or sub-property relationships with well-known vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.
As an optional part of RDFa processing, an RDFa processor will perform limited RDFS entailment, specifically rules rdfs5, 7, 9 and 11. This causes the super-classes and super-properties of the type and property IRIs used in the document to be added to the output graph.
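As a rough illustration of what rules rdfs5, 7, 9 and 11 produce, the following sketch forward-chains over plain [subject, predicate, object] arrays until no new triples appear. Prefixed names are used as plain strings, and the ex: terms are hypothetical; this is not RDF::RDFa::Reader#expand, just the same four rules applied naively.

```ruby
# Naive forward-chaining sketch of RDFS rules rdfs5, rdfs7, rdfs9 and rdfs11.
# Illustrative only; not RDF::RDFa's implementation.
require 'set'

RDF_TYPE        = "rdf:type"
SUB_CLASS_OF    = "rdfs:subClassOf"
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

def rdfs_expand(triples)
  graph = Set.new(triples)
  loop do
    inferred = Set.new
    graph.each do |s, p, o|
      graph.each do |s2, p2, o2|
        next unless o == s2
        # rdfs5: subPropertyOf is transitive
        inferred << [s, SUB_PROPERTY_OF, o2] if p == SUB_PROPERTY_OF && p2 == SUB_PROPERTY_OF
        # rdfs11: subClassOf is transitive
        inferred << [s, SUB_CLASS_OF, o2] if p == SUB_CLASS_OF && p2 == SUB_CLASS_OF
        # rdfs9: an instance of a class is an instance of its super-classes
        inferred << [s, RDF_TYPE, o2] if p == RDF_TYPE && p2 == SUB_CLASS_OF
      end
      # rdfs7: a statement using a sub-property also holds for its super-property
      graph.each do |p2, rel, q|
        inferred << [s, q, o] if p2 == p && rel == SUB_PROPERTY_OF
      end
    end
    new_triples = inferred - graph
    break if new_triples.empty?   # fixpoint reached
    graph.merge(new_triples)
  end
  graph
end

input = [
  ["ex:post", RDF_TYPE, "ex:BlogPosting"],
  ["ex:BlogPosting", SUB_CLASS_OF, "schema:Article"],
  ["schema:Article", SUB_CLASS_OF, "schema:CreativeWork"],
  ["ex:post", "ex:headline", "Hello"],
  ["ex:headline", SUB_PROPERTY_OF, "schema:name"]
]
expanded = rdfs_expand(input)
puts expanded.include?(["ex:post", RDF_TYPE, "schema:CreativeWork"])  # prints true
```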
RDF::RDFa::Reader implements this using the #expand method, which looks for rdfa:hasVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
RDF Collections (lists)
One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this using the properties rdf:first and rdf:rest and the resource rdf:nil, and other RDF serializations have first-class syntax for this concept. For example, in Turtle, a list can be defined as follows:
@prefix schema: <http://schema.org/> .

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama"; schema:byArtist "Lynyrd Skynyrd"]
    [ a schema:MusicRecording; schema:name "You Shook Me All Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man"; schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll"; schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good"; schema:byArtist "John Cougar"]
  )
] .
This defines a playlist with an ordered set of tracks. The proposed RDFa @member attribute identifies values (objects or literals) that are to be placed in a list. The same playlist might be defined in RDFa as follows:
<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>
  <div rel="tracks" member="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynyrd Skynyrd</span>
    </div>
    <div typeof="MusicRecording">
      2.<span property="name">You Shook Me All Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>
    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>
    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span> -
      <span property="byArtist">Bob Seger</span>
    </div>
    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span> -
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>
This produces essentially the same result, placing each track in an rdf:List in document order.
You can try these and other RDF gems at the distiller.
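Under the hood, an RDF list is a chain of blank nodes linked by rdf:first and rdf:rest and terminated by rdf:nil. The following sketch, using hypothetical helpers list_triples and read_list and generated blank node labels such as _:l0, shows how an ordered array maps onto those triples and back. Real writers choose their own blank node labels.

```ruby
# Sketch of encoding an ordered list with rdf:first/rdf:rest, ending in
# rdf:nil. Triples are [subject, predicate, object] string arrays; the
# blank node labels (_:l0, _:l1, ...) are generated for illustration.

def list_triples(head_subject, property, items)
  return [[head_subject, property, "rdf:nil"]] if items.empty?

  nodes = items.each_index.map { |i| "_:l#{i}" }
  triples = [[head_subject, property, nodes.first]]
  items.each_with_index do |item, i|
    rest = i + 1 < items.length ? nodes[i + 1] : "rdf:nil"
    triples << [nodes[i], "rdf:first", item]
    triples << [nodes[i], "rdf:rest", rest]
  end
  triples
end

# Walk the rdf:rest chain back into an ordered Ruby array.
def read_list(triples, head_subject, property)
  lookup = ->(s, p) { triples.find { |t| t[0] == s && t[1] == p }&.last }
  node = lookup.call(head_subject, property)
  items = []
  until node.nil? || node == "rdf:nil"
    items << lookup.call(node, "rdf:first")
    node = lookup.call(node, "rdf:rest")
  end
  items
end

tracks  = ["_:sweetHome", "_:shookMe", "_:sharpDressed"]
triples = list_triples("_:playlist", "schema:tracks", tracks)
puts read_list(triples, "_:playlist", "schema:tracks").inspect
# prints ["_:sweetHome", "_:shookMe", "_:sharpDressed"]
```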