Gregg Kellogg

Gregg's personal blog

Semantic Web

RDF.rb and SPARQL gem updates

Version 1.0.6 of RDF.rb and version 1.0.7 of the SPARQL gem have been released.

I recently pushed updates to RDF.rb and SPARQL gems. These updates contain some useful new features:

RDF.rb

The main RDF gem has many updates to bring it closer to the coming update to the RDF 1.1 specifications (RDF Concepts, RDF Semantics, Turtle, TriG, N-Triples, N-Quads, and JSON-LD). Notable changes since the 1.0 release are:

  • Distinguish between plain and simple literals: simple literals have no language, while plain literals may also have a language. In preparation for RDF 1.1, literals with a datatype of xsd:string are also treated as simple literals, and language-tagged literals may have a datatype of rdf:langString.
  • Improved support for queries using hash patterns (thanks to @markborkum).
  • Update N-Triples (and N-Quads) readers and writers to support the RDF 1.1 version, including new escape sequences and support for UTF-8, in addition to ASCII. When writing, characters are escaped depending on the specified encoding or that inferred from the output file.
  • Term and Statement comparison is dramatically faster, which improves statement insertion and query performance.
  • Other literal changes required to support SPARQL 1.1.
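To make the encoding-dependent escaping concrete, here is a minimal Ruby sketch of how a writer might escape a literal's lexical form under the RDF 1.1 N-Triples grammar: ECHAR escapes (such as \n and \") for quotes, backslashes and line breaks, and UCHAR escapes (\uXXXX or \UXXXXXXXX) for other control characters and, when the target encoding is ASCII, for every non-ASCII character. The method name and its encoding: keyword are illustrative assumptions, not RDF.rb's writer API.

```ruby
# Illustrative sketch, not RDF.rb's writer: escape the lexical form of a
# literal for N-Triples output. The method name and the encoding: keyword
# are assumptions made for this example.
def escape_ntriples_string(str, encoding: Encoding::UTF_8)
  ascii_only = (encoding == Encoding::US_ASCII)
  str.each_char.map do |c|
    case c
    when "\\" then "\\\\"   # backslash -> \\
    when '"'  then '\\"'    # double quote -> \"
    when "\n" then "\\n"    # line feed -> \n
    when "\r" then "\\r"    # carriage return -> \r
    when "\t" then "\\t"    # tab -> \t
    else
      cp = c.ord
      if cp < 0x20 || cp == 0x7F || (ascii_only && cp > 0x7E)
        # UCHAR: 4 hex digits inside the BMP, 8 hex digits beyond it
        cp <= 0xFFFF ? format("\\u%04X", cp) : format("\\U%08X", cp)
      else
        c # emitted as-is (UTF-8 output, or printable ASCII)
      end
    end
  end.join
end

puts escape_ntriples_string("caf\u00e9 \"bar\"")                               # keeps the accented character
puts escape_ntriples_string("caf\u00e9 \"bar\"", encoding: Encoding::US_ASCII)  # uses \u00E9 instead
```

Escaping only what the grammar requires keeps UTF-8 output readable; forcing ASCII trades readability for safety with consumers that only accept 7-bit input.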

On the 1.1 branch:

  • RDF::URI has been rewritten so it no longer requires Addressable. This improves the general performance of working with URIs by about 50% (it depends on Ruby 1.9+ features, so it is not included in the 1.0 branch).
  • Support for Ruby versions less than 1.9.2 is dropped.

SPARQL

The SPARQL gem is now based on the SPARQL 1.1 grammar, and includes some features from SPARQL 1.1, including all functions and builtins, and variable bindings. Look for new features to be added incrementally; once a critical mass is reached, I'll update the gem version to 1.1 to reflect that this is essentially a SPARQL 1.1-compatible implementation. Eventually this will include SPARQL Update and Federated Queries as well.

Other gems

  • JSON::LD is released as version 1.0.0 and is fully compatible with the last-call version of the JSON-LD specifications, including support for framing.
  • RDF::Turtle is fully compatible with the RDF 1.1 version.
    • Also includes a Freebase-specific reader for the fastest performance when reading Freebase dumps.
  • RDF::RDFa is compatible with RDFa Core 1.1 and HTML+RDFa 1.1
  • RDF::TriG is released as a 1.0.0 version, based on the RDF 1.1 note
  • RDF::Raptor now uses Raptor 2.0, and is fully compatible with the latest version of Redland/Raptor.

Published on Sat, 11 May 2013 23:12:00 GMT.

RDF.rb 1.0 Release

I'm happy to announce the 1.0 release of RDF.rb and related Ruby gems. This release has been a long time coming, and the library has actually been quite stable for some time.

RDF.rb is a Ruby Gem implementing core RDF concepts, such as Graph, Repository, Statement, and Query. It also supports core readers and writers (parsers and serializers) for N-Triples and N-Quads.

Through other gems, more readers and writers are implemented, including:

Additional readers and writers are available through Redland Raptor bindings using the RDF::Raptor gem.

In addition to native support for basic graph pattern (BGP) queries, there is a fully conformant SPARQL 1.0 implementation (the SPARQL gem), and a SPARQL 1.1 client gem (SPARQL::Client).
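A basic graph pattern is a set of triple patterns whose variables must bind consistently across every pattern. The toy matcher below, written in plain Ruby so it runs without RDF.rb, illustrates the idea: it evaluates each pattern in turn and joins on shared variables. It is a sketch of the technique, not RDF.rb's query engine; terms are plain strings, variables are Ruby symbols, and the ex:/foaf: data is made up.

```ruby
# Toy BGP matcher over an in-memory array of triples (not RDF.rb's engine).
# Terms are strings; variables in patterns are Symbols such as :x.
Triple = Struct.new(:s, :p, :o)

# Try to extend `binding` so that `pattern` matches `triple`; nil on conflict.
def match_triple(pattern, triple, binding)
  result = binding.dup
  [:s, :p, :o].each do |pos|
    term  = pattern[pos]
    value = triple[pos]
    if term.is_a?(Symbol)
      return nil if result.key?(term) && result[term] != value
      result[term] = value
    elsif term != value
      return nil
    end
  end
  result
end

# Evaluate patterns left to right, keeping only solutions that stay consistent.
def bgp_query(triples, patterns)
  patterns.reduce([{}]) do |solutions, pattern|
    solutions.flat_map do |binding|
      triples.filter_map { |t| match_triple(pattern, t, binding) }
    end
  end
end

data = [
  Triple.new("ex:alice", "foaf:knows", "ex:bob"),
  Triple.new("ex:bob",   "foaf:name",  "Bob"),
  Triple.new("ex:alice", "foaf:name",  "Alice")
]
# Who does ex:alice know, and what is that person's name?
solutions = bgp_query(data, [Triple.new(:x, "foaf:knows", :y),
                             Triple.new(:y, "foaf:name", :name)])
solutions.each { |s| puts "#{s[:x]} knows #{s[:y]} (#{s[:name]})" }
# prints: ex:alice knows ex:bob (Bob)
```

This naive nested loop scans every triple for every partial solution; real engines reorder patterns by selectivity and use indexes.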

All of these gems can be packaged together using the Linkeddata, Rack::LinkedData, and Sinatra::LinkedData gems.

There are also a number of storage adaptors for popular backends, including the following:

Background

I first became involved with RDF when working on the Connected Media Experience (CME) design (see blog entry). Having designed a proprietary metadata standard for Gracenote and Warner Music Group (later for CME), through review with Lucas Gonze, I was introduced to the Music Ontology, which had done an extremely thorough modeling of the music domain. This caused significant debate in the CME community, which led to an updated design based on RDF and many of the ideas from the Music Ontology.

During the same review, I was also introduced to RDFa as a mechanism of embedding music metadata within a web page. Since CME is endeavoring to create a standard for enhanced digital media packages, HTML5 and RDFa are natural technologies to utilize.

My methodology for developing architectures and specifications is based on parallel prototyping, both to validate the details of a design and sometimes to serve as the basis for an implementation. For several years (starting in 2007) I had been using Ruby on Rails and was quite invested in Ruby. My first attempts to integrate RDFa into the implementation made use of the Raptor parser, which unfortunately was not up to date with the RDFa 1.0 specification current at the time. Additionally, I found that the Ruby bindings suffered from memory leaks, so I went looking for a native Ruby implementation. This led me to Tom Morris' Reddy gem, a port of the Python rdflib package to Ruby, which was going in the right direction but had fallen into disuse. I created my own fork, later released as RdfContext, which had complete implementations for RDF/XML, RDFa 1.0, and Notation3 (N3-rdf level).

In the meantime, Arto Bendiken and Ben Lavender had been working on RDF.rb, taking a different approach closer to Sesame. The design of RDF.rb is quite elegant, making effective use of natural Ruby idioms, and taking an approach based heavily on using Ruby module extensions. After a prod by Nick Humfrey, who had started a port of my RDF/XML parser, I jumped in and ported the bulk of my RDF parsers and serializers, eventually adding several more, to make the RDF.rb platform one of the most complete in terms of standards support across all major (and minor) serialization formats. In 2010, I was asked to join the RDF.rb core development team, and soon became the primary maintainer after Arto and Ben became fully committed to Dydra.

Arto and I finally got together recently to move all of the gems' primary repositories to the Ruby RDF organization on GitHub, and release them as 1.0. The next significant release should be 1.1, to coincide with the release of the RDF 1.1 specs from the W3C.

Published on Fri, 25 Jan 2013 20:40:00 GMT.

JSON-LD and MongoDB

For the last several months, I've been engaged in an interesting project with Wikia. Wikia hosts hundreds of thousands of special-interest wikis for things as varied as Pokémon, best cellphone rate comparisons, TV shows and Video Games.

For those of you not aware of Wikia, it is an outgrowth of MediaWiki and was founded by Jimmy Wales as a for-profit means of using the MediaWiki platform for exactly such interests.

Recently Wikimedia Deutschland started work on WikiData, an effort to use Semantic Web principles to create a factual knowledge base that can be used within Wikis (typically to replace Infobox information, which can vary between different language versions). This is a somewhat different direction than Semantic MediaWiki, which is more about using Wiki markup to express semantic relationships within a Wiki. As it happens, JSON-LD is being considered as the data representation model for WikiData.

Linked Data at Wikia

As it turns out, Wikia has been quite interested in leveraging these tools. I did mention that Wikia is a for-profit company; one way it makes money is through in-page advertising, but the amount of knowledge curated by its hundreds of thousands of communities is staggering. Unfortunately, native Wiki markup just isn't that semantic. However, much of the information represented is factual (at least within the world-view of the wiki community).

To that end, I've been working on an experiment using JSON-LD and MongoDB to power a parallel structured data representation of much of the information contained in a wiki. The idea is to add a minimal amount of markup (hopefully) to the Wiki text and templates so that information can be represented in the generated HTML using RDFa. This allows the content of the Wiki to be mirrored in a MongoDB-based service using JSON-LD. Once the data has been freed from the context of the limited Wiki markup, it can now be re-purposed outside of the Wiki itself.

Knowledge modeling and data representation

Why use RDFa and not Microdata? The primary driver is the need to use multiple vocabularies to represent information. In my opinion, any new vocabulary needs to take schema.org into consideration; microdata works great with schema.org, and can generate RDF (see Microdata to RDF) as long as you're constrained to a single vocabulary, don't need to keep typed data, and don't need to capture actual HTML markup. Unfortunately, any serious application beyond simple Search Engine Optimization (SEO) does need these features. In our case, much of the interesting data to capture consists of fragments of the Wiki pages themselves. Moreover, the content of any Wiki, especially one with as much specialized meaning as, say, a Video Game wiki, needs to describe relationships that are not natively part of the schema.org vocabulary. Schema.org does provide an extension mechanism partly for this purpose, and recently introduced the ability to tag subjects with an additional type that is not part of the primary vocabulary (presumably schema.org). But once the decision is made to use multiple vocabularies, RDFa has better mechanisms in place anyway.

At Wikia, we define a vocabulary as an extension to schema.org, that is, the classes defined within that vocabulary are sub-classes of schema.org classes, although typically the properties are not sub-properties of schema.org properties (we may revisit this). For example, a wikia:VideoGame is a sub-class of schema:CreativeWork, and a wikia:WikiText is a sub-class of schema:WebPageElement. There are additional class and property definitions to describe the structural detail common to Video Games in describing characters, levels, weapons, and so forth. An RDFa description will assert both the native class (e.g., wikia:VideoGame) and the schema.org extension class (e.g., schema:CreativeWork/VideoGame). This allows search engines to make sense of the structured data without needing to understand an externally defined vocabulary.
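Because wikia:VideoGame is declared a sub-class of schema:CreativeWork, a consumer that does read the vocabulary can infer the schema.org type even from the native class alone, following the RDFS subClassOf rules. A minimal Ruby sketch of that inference follows. The two wikia: mappings come from the text above; the schema.org edges (WebPageElement under CreativeWork, CreativeWork under Thing) reflect the public schema.org hierarchy, and the hash-based representation is an assumption for illustration.

```ruby
# Hypothetical in-memory form of rdfs:subClassOf statements.
SUBCLASS_OF = {
  "wikia:VideoGame"       => ["schema:CreativeWork"],
  "wikia:WikiText"        => ["schema:WebPageElement"],
  "schema:WebPageElement" => ["schema:CreativeWork"],
  "schema:CreativeWork"   => ["schema:Thing"]
}.freeze

# Return the asserted types plus every superclass reachable through
# subClassOf (breadth-first, so nearer classes come first; cycles are safe).
def entailed_types(types, subclass_of = SUBCLASS_OF)
  seen  = []
  queue = types.dup
  until queue.empty?
    type = queue.shift
    next if seen.include?(type)
    seen << type
    queue.concat(subclass_of.fetch(type, []))
  end
  seen
end

p entailed_types(["wikia:WikiText"])
# ["wikia:WikiText", "schema:WebPageElement", "schema:CreativeWork", "schema:Thing"]
```

Asserting both types in the RDFa, as described above, spares search engines from doing this step at all.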

However, for Wikia's purposes, and those of people wanting to work within the Wikia structured-data ecosystem, having a vocabulary that models the information contained within Wikia Wikis can be of great benefit. Key to this is knowing how much to model with classes and properties, and how much to leave to things such as naming conventions and keywords. In fact, there are likely cases where more per-wiki modeling is required, and we are continuing to explore ways to further extend the vocabularies without imposing a large burden on ontology development, while keeping the data reasonably generically useful.

Linked Data API

Although RDFa-structured HTML can be quite useful as an API in itself, modern Single Page Applications are better served by RESTful interfaces with a JSON representation. JSON-LD was developed as a means of expressing Linked Data in JSON. It is fully compatible with RDF. Indeed, many of the concepts used in RDFa can be seen in JSON-LD: Compact IRIs, language-tagged and datatyped values, globally identified properties, and the basic graph data model of RDF.

Furthermore, a JSON-LD-based service allows resource descriptions that may be spread across multiple HTML pages to be consolidated into individual subject definitions. By storing these subject definitions in a JSON-friendly datastore such as MongoDB, the full power of a scalable document store becomes available for data otherwise spread across numerous Wiki pages. And because the JSON-LD can be fully generated from the RDFa contained in the generated Wiki pages, the data will remain synchronized.
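The consolidation step can be sketched as merging node objects that share an @id. The Ruby below is a simplified illustration, not the service's actual code: it assumes each page yields a flat list of node objects with consistently named property keys, and it ignores the @context processing, nested nodes and value normalization that a real JSON-LD flattening step handles. The wikia:character property and the example.org IRIs are hypothetical. Each merged node could then be stored as a single MongoDB document.

```ruby
# Wrap a value in an array without splatting hashes (unlike Kernel#Array).
def as_array(value)
  case value
  when nil   then []
  when Array then value
  else [value]
  end
end

# Merge node objects describing the same subject (same "@id") into one node,
# accumulating the distinct values of each property.
def merge_nodes(nodes)
  merged = {}
  nodes.each do |node|
    id = node.fetch("@id")
    target = (merged[id] ||= { "@id" => id })
    node.each do |key, value|
      next if key == "@id"
      target[key] = (as_array(target[key]) + as_array(value)).uniq
    end
  end
  merged.values
end

# Two pages describing the same (hypothetical) game
from_game_page = { "@id" => "http://example.org/game/1", "@type" => "wikia:VideoGame",
                   "schema:name" => "Example Quest" }
from_character_page = { "@id" => "http://example.org/game/1",
                        "wikia:character" => { "@id" => "http://example.org/char/hero" } }
merged = merge_nodes([from_game_page, from_character_page])
puts merged.length # 1 node carrying @type, schema:name and wikia:character
```

Always storing arrays keeps the merge rule simple; a compaction pass can unwrap single values again before the data is served.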

In the future, with the growth and adoption of systems such as WikiData, we can expect to see much of the factual information currently expressed as Wiki markup moved to outside services. The needs of the Wiki communities remain paramount, as they are at the heart of the data explosion we've seen in the hundreds of thousands of Wikis hosted at Wikia and elsewhere, not to mention Wikipedia and related MediaWiki projects.

As the communities become more comfortable with using knowledge stores, such as WikiData and Wikia's linked data platform, we should see a further explosion in the amount of structured information available on the web in general. The real future, then, lies not only in the efforts of communities to curate their information, but in the ability to use the principles of the Semantic Web and Linked Data to infer connections based on distributed information.

I'll be speaking more about JSON-LD and MongoDB at NoSQL Now! later this week in San Jose. Slides for my talk are available on slideshare.

Published on Tue, 21 Aug 2012 13:50:00 GMT.

BrowserID versus DDOS

How BrowserID saved the RDFa Test Suite from a DDOS (Distributed Denial of Service) attack.

This article is the third in a three-part series on implementing the RDFa Test Suite. The first article discussed the use of Sinatra, Backbone.js and Bootstrap.js in creating the test harness. The second article discussed the use of JSON-LD. In this article, we focus on our use of BrowserID in responding to a Distributed Denial of Service Attack (DDOS).

RDFa Test Suite

Working on the updated RDFa Test Suite has really been a lot of fun. It was a great opportunity to explore new Web application technologies, such as Bootstrap.js and Backbone.js. The test suite is a single-page web application which uses a Sinatra-based service to run individual test cases.

The site was becoming stable, and we were starting to flesh out more test cases for odd corner cases, when the site stopped responding. Manu Sporny, whose company Digital Bazaar is kindly donating hosting for the web site, noticed that a number of Ruby processes were consuming all available Ruby workers, causing new requests to block. The service is fairly resource intensive, as it must invoke an external processor and run a SPARQL query over the results for each test. It seemed as if the site was being hammered by a large number of overzealous search crawlers! Naturally, we put a robots.txt in place, expecting that conforming search engines would detect the site’s crawl preferences and back off, but that didn’t happen. Upon further examination of the server logs, we noted requests were streaming in from all over the world! Clearly, we were under attack. (Who might wish ill of the RDFa development effort? Who knows; most likely this was just an anonymous attack, not a specifically malicious one.)

My first thought was to use a secret API token, configured into the server and the web app, but that didn’t do the trick either; it seems that modern-day malware simply executes the JavaScript, so it picks up the API key naturally!

BrowserID to the Rescue!

Okay, how about authentication? It’s typically a pain, and we were reluctant to put up barriers in front of people who might want to test their own processors or see how listed processors perform. The two current contenders are WebID and BrowserID.

WebID has the laudable goal of combining personally maintained profile information with SSL certificates (it was previously known as FOAF+SSL). Basically, it’s a mechanism to allow users to use a profile page as their identity. This could come from their blog, Facebook, Twitter or another social networking site. By configuring an SSL certificate into the browser and pointing to their profile page, a service can determine that the profile page actually belongs to the user. (There’s much more to it; you can read more in the WebID Spec.) A key advantage here is that the service now has access to all of the self-asserted information the user wants to provide about themselves in their profile page, such as foaf:name, foaf:knows, and so forth. The chief downside is that the major sources of existing user identities haven’t bought into this, and there’s a competing solution that offers similar benefits.

BrowserID is a Mozilla initiative to enable people with e-mail addresses to use those addresses to log in to websites, somewhat like OpenID, only more secure. Basically, as I understand it, a service wanting to support this includes the BrowserID JavaScript client code in the Web application and uses a simple Sign In button that invokes this code. That sends a request off to the identity provider (IDP) to authenticate the user, which has probably already happened in the past and been remembered in a cookie. The IDP then sends a response which invokes a callback. The client then calls back to the service to complete the login, and the service verifies the result to learn whether or not the login was successful, as well as the e-mail address that logged in.

The beauty is, using a tool such as the sinatra-browserid Ruby gem, this becomes dirt simple! Basically, on the API side, put in a call to authorized? to determine if the user is authorized. If not, either direct them to a login screen, or in the case of the RDFa Test Suite, place an informational message telling them why we need them to login, and identify the BrowserID button at the top of the page.

The principal entry point to the test suite on the service side is /test-suite/check-test/:version/:suite/:num. The only real change to this route was to check for authorization before performing the test.

# Run a test
get '/test-suite/check-test/:version/:suite/:num' do |version, suite, num|
  halt 403, "Unauthorized access is not allowed" unless authorized?

  # Read the SPARQL query for this test (path relative to this file)
  source = File.read(File.expand_path("../tests/#{num}.sparql", __FILE__))

  # Host-language-specific modifications of the query would go here (elided)
  query = SPARQL.parse(source)

  # Invoke the processor and retrieve results, parsed into an RDF graph.
  # test_path and format are application helpers not shown in this excerpt.
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))

  # Run the query; for an ASK query the result is true or false
  result = query.execute(graph)

  # Return results as JSON (assumes require 'json' at the top of the app)
  content_type :json
  {:status => result}.to_json
end
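Because the expensive work happens in this one route, the same gate could also be applied earlier, as Rack middleware, so unauthenticated requests never reach the application at all. The sketch below is an alternative to the in-route check above, not what the test suite does. It is plain Ruby following the Rack calling convention (call(env) returning [status, headers, body]), so it runs without any gems; the "browserid.email" session key is a hypothetical placeholder for wherever the login step records the verified address.

```ruby
# Rack-style middleware: reject unauthenticated requests to the test runner.
class RequireLogin
  PROTECTED_PREFIX = "/test-suite/check-test/"

  # session_key is a hypothetical name for where the login step stores the
  # verified e-mail address.
  def initialize(app, session_key: "browserid.email")
    @app = app
    @session_key = session_key
  end

  def call(env)
    path    = env["PATH_INFO"].to_s
    session = env["rack.session"] || {}
    if path.start_with?(PROTECTED_PREFIX) && session[@session_key].to_s.empty?
      return [403, { "content-type" => "text/plain" },
              ["Unauthorized access is not allowed"]]
    end
    @app.call(env) # authenticated, or not a protected path
  end
end

# A stand-in for the Sinatra application
app  = ->(env) { [200, { "content-type" => "application/json" }, ['{"status":true}']] }
gate = RequireLogin.new(app)

anonymous = { "PATH_INFO" => "/test-suite/check-test/rdfa1.1/html5/0001" }
signed_in = anonymous.merge("rack.session" => { "browserid.email" => "someone@example.org" })
puts gate.call(anonymous).first # 403
puts gate.call(signed_in).first # 200
```

In a config.ru, such a class would be installed with use RequireLogin after the session middleware (so rack.session is populated) and before the application.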

In the banner, we add a little bit of Haml:

...
%div.navbar-text.pull-right
  - if email
    %p.email
      Logged in as
      %span.email
        = email
      %a{:href => '/test-suite/logout'}
        (logout)
  - else
    = render_login_button

When the page is returned, the email variable is set if the user is authorized, so they’ll see their e-mail address if they’ve authenticated, and a login button otherwise. The render_login_button helper is handled entirely by sinatra-browserid; no muss, no fuss!

The only other thing to do is to not show the test cases in the test suite, unless the user has authenticated, which we can tell because $("span.email") won’t be empty. In our application.js, we use this to either show the tests, or an explanation:

// If logged in, create primary test collection and view
if ($("span.email").length > 0) {
  this.testList = new TestCollection([], {version: this.version});
  this.testList.fetch();
  this.testListView = new TestListView({model: this.testList});
} else {
  this.unauthorizedView = new UnauthorizedView();
}

That’s pretty much all there is to it. The only complication I faced was that, when developing with shotgun, which reloads the application for every request, a new session secret was generated each time, so the login wasn’t remembered. Setting a fixed session secret made this problem go away. Total time from discovery of the problem to deployed solution: about 1 hour. Not too bad.
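The shotgun problem comes down to how cookie-based sessions work: the session cookie is signed with a secret, and if each reload of the application generates a new random secret, the signature from the previous request no longer verifies. The Ruby sketch below models this with an HMAC-signed cookie; it is a simplified stand-in, not Rack's actual cookie format. In Sinatra the fix is to give the session_secret setting a fixed value, for example one read from an environment variable.

```ruby
require "openssl"
require "securerandom"

# Sign a session payload as "payload--hmac" (simplified; not Rack's format).
def sign_cookie(payload, secret)
  "#{payload}--#{OpenSSL::HMAC.hexdigest('SHA256', secret, payload)}"
end

# Compare two strings without returning early on the first mismatch.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Return the payload if the signature matches this secret, else nil.
def verify_cookie(cookie, secret)
  payload, separator, digest = cookie.rpartition("--")
  return nil if separator.empty?
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  secure_equal?(digest, expected) ? payload : nil
end

# A random secret per process, as when the app is reloaded on every request
cookie = sign_cookie("email=someone@example.org", SecureRandom.hex(64))
p verify_cookie(cookie, SecureRandom.hex(64)) # nil: the login is forgotten

# A fixed secret survives reloads
fixed  = "fixed-secret-for-development" # in a real app, read from ENV
cookie = sign_cookie("email=someone@example.org", fixed)
p verify_cookie(cookie, fixed) # "email=someone@example.org"
```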

It’s important to note that the RDFa Test Suite is stateless, and we don’t really need any personal information; we don’t collect it anywhere, not even in our logs. BrowserID basically becomes a gatekeeper to help ward off abuse. It imposes a very low barrier to entry, so it doesn’t interfere with people using the site any way they choose.

I do miss other user-asserted information, such as the user’s name and so forth. OpenID, another single sign-on initiative that has lost momentum lately, provides a Simple Registration Extension that allows users to assert simple information such as nickname, email, fullname and so forth. IMO, the right way to do this is with something like FOAF or the schema.org Person class. Perhaps BrowserID will provide something like this in the future.

Published on Fri, 30 Mar 2012 15:58:00 GMT.

The Use of JSON-LD in the RDFa Test Harness

How JSON-LD allows an efficient representation of RDF graphs that is convenient for use in Ruby, JavaScript and Haml.

This article is the second in a three-part series on implementing the RDFa Test Suite. The first article discussed the use of Sinatra, Backbone.js and Bootstrap.js in creating the test harness. In this article, we focus on JSON-LD, a Linked Data technology that complements RDFa in creating modern Web applications.

Test Manifest

The RDFa test manifest is a Turtle document used to specify the tests that apply to different versions and host languages in RDFa. Turtle is a great language for representing information in a reasonably human-understandable way. Most people authoring RDF by hand stick to Turtle, because of its ease of use and concise way of expressing Linked Data graphs. For example, to describe a specific test entry, we could write some Turtle as follows:

<test-cases/0001> a test:TestCase;
   dc:title "Predicate establishment with @property";
   rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
   rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
   test:classification test:required;
   test:informationResourceInput <test-cases/0001.html>;
   test:informationResourceResults <test-cases/0001.sparql> .

Basically, this defines a (relative) URL identifying the test case, gives it a title, describes the relevant RDFa versions and host languages, says it’s required, and shows the files used to provide input and to test the results. The problem is, this is not a convenient form to use programmatically. Modern Web applications use JSON for representing data, partly because JSON can be represented natively in JavaScript, but also because it has a convenient representation in Ruby and other languages.

Let’s look at the equivalent test representation in JSON-LD:

{
  "@context": "http://rdfa.info/contexts/rdfa-test.jsonld",
  "@graph": [
    {
      "@id": "http://rdfa.info/test-suite/test-cases/0001",
      "@type": "test:TestCase",
      "num": "0001",
      "classification": "test:required",
      "description": "Predicate establishment with @property",
      "input": "http://rdfa.info/test-suite/test-cases/0001.html",
      "results": "http://rdfa.info/test-suite/test-cases/0001.sparql",
      "expectedResults": true,
      "hostLanguages": ["html4","html5","xhtml1","xhtml5","xml"],
      "versions": ["rdfa1.0","rdfa1.1"]
    }
  ]
}

Other than the encapsulating elements, this looks pretty similar to the Turtle representation. There are a couple of differences, though: instead of dc:title we use the term description, and instead of rdfatest:hostLanguage we use hostLanguages. How are these related? The key is looking at the @context value. Looking at http://rdfa.info/contexts/rdfa-test.jsonld, we see the following:

{
  "@context": {
    "dc":         "http://purl.org/dc/terms/",
    "xsd":        "http://www.w3.org/2001/XMLSchema#",
    "rdfatest":   "http://rdfa.info/vocabs/rdfa-test#",
    "test":       "http://www.w3.org/2006/03/test-description#",

    "classification": {"@id": "test:classification"},
    "contributor":    {"@id": "dc:contributor"},
    "description":    {"@id": "dc:title"},
    "expectedResults":{"@id": "test:expectedResults",
                       "@type": "xsd:boolean"},
    "hostLanguages":  {"@id": "rdfatest:hostLanguage",
                       "@container": "@set"},
    "input":          {"@id": "test:informationResourceInput",
                       "@type": "@id"},
    "num":            {"@id": "rdfatest:num"},
    "purpose":        {"@id": "test:purpose"},
    "versions":       {"@id": "rdfatest:rdfaVersion",
                       "@container": "@set"},
    "reference":      {"@id": "test:specificationReference"},
    "results":        {"@id": "test:informationResourceResults",
                       "@type": "@id"}
  }
}

The context does exactly that: it provides a context for interpreting JSON data. Note the definition of hostLanguages: this indicates that hostLanguages is a term definition, meaning that the term is replaced with the @id value, in this case rdfatest:hostLanguage, the same as used in Turtle. Both of these expand to an equivalent IRI http://rdfa.info/vocabs/rdfa-test#hostLanguage. In RDF, and in Linked Data in general, everything is described as a resource, either an IRI, a Literal or a Blank Node (basically a variable representing something we don’t know or don’t want to identify). The "@container": "@set" bit just says to expect that the value of hostLanguages will always be an array, to make processing more convenient.
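The term-to-IRI step can be illustrated with a deliberately simplified Ruby sketch. It is not the JSON-LD expansion or compaction algorithm (which also handles @vocab, @base, keyword aliases, type coercion and more); it only shows how a term definition's @id and a prefix combine into a full IRI, and how "@container": "@set" keeps a single value in an array when compacting. The CONTEXT hash is an excerpt of the rdfa-test context shown above; the method names are hypothetical.

```ruby
# Excerpt of the rdfa-test context above, as a Ruby hash
CONTEXT = {
  "dc"            => "http://purl.org/dc/terms/",
  "rdfatest"      => "http://rdfa.info/vocabs/rdfa-test#",
  "description"   => { "@id" => "dc:title" },
  "hostLanguages" => { "@id" => "rdfatest:hostLanguage", "@container" => "@set" }
}.freeze

# Term -> its @id (if defined), then prefix:suffix -> full IRI (if the prefix is defined).
def expand_iri(value, context = CONTEXT)
  definition = context[value]
  value = definition.fetch("@id", value) if definition.is_a?(Hash)
  value = definition if definition.is_a?(String)
  prefix, suffix = value.split(":", 2)
  if suffix && !suffix.start_with?("//") && context[prefix].is_a?(String)
    context[prefix] + suffix
  else
    value # already an absolute IRI, or nothing to expand
  end
end

# When compacting, a single value is unwrapped unless the term uses @set.
def compact_values(term, values, context = CONTEXT)
  definition = context[term]
  set = definition.is_a?(Hash) && definition["@container"] == "@set"
  (values.length == 1 && !set) ? values.first : values
end

puts expand_iri("hostLanguages")         # http://rdfa.info/vocabs/rdfa-test#hostLanguage
puts expand_iri("rdfatest:hostLanguage") # the same IRI
p compact_values("hostLanguages", ["html5"]) # stays an array
p compact_values("description", ["Predicate establishment with @property"]) # a bare string
```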

Because we use terms in JSON Object key positions, this means that access from JavaScript can be quite convenient. Taking a look at the test suite Test model description, we can download the Manifest with an Ajax request and access elements using ‘.’ notation, such as the following:

var filteredTests = _.filter(this.loadedData, function(data) {
  return _.include(data.versions, version) &&
         _.include(data.hostLanguages, hostLanguage);
});
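Because JSON maps directly onto native Ruby types, the same filter works on the server side with nothing more than the standard json library. This is a sketch of the equivalent logic, not code from the test suite; the inline manifest is trimmed to the one entry from the example above, and the method name is hypothetical.

```ruby
require "json"

# One entry from the JSON-LD manifest above, trimmed to the fields we filter on
MANIFEST_JSON = <<~MANIFEST
  {
    "@context": "http://rdfa.info/contexts/rdfa-test.jsonld",
    "@graph": [
      {
        "@id": "http://rdfa.info/test-suite/test-cases/0001",
        "num": "0001",
        "hostLanguages": ["html4", "html5", "xhtml1", "xhtml5", "xml"],
        "versions": ["rdfa1.0", "rdfa1.1"]
      }
    ]
  }
MANIFEST

# Select the tests that apply to a given RDFa version and host language.
def filter_tests(manifest, version, host_language)
  manifest.fetch("@graph").select do |test|
    test.fetch("versions", []).include?(version) &&
      test.fetch("hostLanguages", []).include?(host_language)
  end
end

manifest = JSON.parse(MANIFEST_JSON)
p filter_tests(manifest, "rdfa1.1", "html5").map { |t| t["num"] } # ["0001"]
p filter_tests(manifest, "rdfa1.1", "svg").map { |t| t["num"] }   # []
```

The "@container": "@set" declarations are what make this safe: versions and hostLanguages are always arrays, so include? never has to cope with a bare string.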

Another advantage in using JSON is that the parse time is negligible. The manifest has about 3000 triples, which can actually take a while to parse as Turtle, but opening and parsing the JSON document is substantially faster.

As with many modern Web applications, the RDFa Test Suite is a single-page application that uses Ajax calls to communicate with the server. The first call is to retrieve the JSON manifest. Subsequent calls retrieve test results, also expressed as JSON. The manifest is used to populate a Backbone.js Collection. When a specific version and host language are selected, this collection is filtered to show only relevant tests, as described in the previous example. The Collection then drives a view element, which instantiates a view for each model to be tested.

Collating Test Results

The second area where JSON-LD is used within the RDFa Test Suite is for collating test results. After running a series of tests, a test user can generate EARL test results. As this is an RDFa test suite, the report is naturally expressed in RDFa. Here the Backbone.js view technology comes into play, since it is easy to use an HTML template to generate individual results, with the RDFa markup baked into the template.

The basic EARL template looks like the following:

<script id='earl-item-template' type='text/template'>
  <h4>
    [
     <span property='rdfatest:rdfaVersion'><%= version %></span>
     <span property='rdfatest:hostLanguage'><%= hostLanguage %></span>
    ]
    Test <%= num %>:
    <span property='dc:title'><%= description %></span>
    <span property='earl:mode' resource='earl:automatic'></span>
  </h4>
  <p property='dc:description'><%= purpose %></p>
  <div class='property processorURL resource detailsURL'
       typeof='earl:Assertion'>
    <span property='earl:assertedBy' resource=''></span>
    <span class='resource processorURL' property='earl:subject'></span>
    <span class='resource docURL' rel='earl:test'></span>
    <p property='earl:result' typeof='earl:TestResult'>
      Result:
      <strong class='resource outcome'
              property='earl:outcome'
              resource=''><%= result %></strong>
    </p>
  </div>
</script>

The EARL view uses this template to generate a report for an individual test entry and fills in attribute or content values from within the view:

var EarlItemView = Backbone.View.extend({
  template: _.template($('#earl-item-template').html()),

  render: function () {
    // Use a name other than JSON so the global JSON object isn't shadowed
    var data = this.model.toJSON();
    data.processorURL = this.options.processorURL;

    this.$el.html(this.template(data));
    this.$el.attr("resource", this.model.docURL());
    this.$(".property.processorURL")
      .attr("property", data.processorURL);
    this.$(".resource.processorURL")
      .attr("resource", data.processorURL);
    this.$(".resource.detailsURL")
      .attr("resource", this.model.detailsURL());
    this.$(".resource.docURL")
      .attr("resource", this.model.docURL());
    this.$(".resource.outcome")
      .attr("resource", 'earl:' +
                        this.model.get('result').toLowerCase());
    return this;
  }
});

The result is a test result for a specific processor with a specific RDFa version and host-language. You can see an example report here.

However, this is not the end of it; to exit the W3C Candidate Recommendation phase, it’s necessary to have at least two interoperable implementations. What is needed, then, is a collated report that combines the output from several different processors into a single report. Because each individual report is an information resource representing a specific RDF graph, we can parse all of these documents into a single graph. But to generate an HTML result, it would be convenient to have all the data available in a form that is easy to use from Haml in Ruby.

This is where using JSON-LD from languages like Ruby comes into play. Ruby has great libraries for working with JSON, which transform the JSON into a combination of native Ruby Hash, Array, String, Integer, Float, true/false and nil values. A JSON-LD representation of a test assertion entry looks like the following:

{
  "@id": "http://rdfa.info/test-suite/test-details/rdfa1.1/...",
  "@type": "earl:Assertion",
  "assertedBy": "http://rdfa.info/test-suite/",
  "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
  "subject": "http://rubygems.org/gems/rdf-rdfa",
  "result": {
    "@type": "earl:TestResult",
    "outcome": "earl:pass"
  }
}

Transforming this to Ruby gives essentially the same representation, so we can iterate over it using Haml. The natural thing to do is to see how we can represent EARL test results through a hierarchical test structure.

As it happens, the EARL representation is not actually ideal for this. Each assertion is listed with a subject that indicates the specifics of the processor, test, version and host language. It records that it is asserted by the test suite, along with the test being run, the processor being tested, and the result of this test. However, I’d like to show the results in a tabular form, with the test suite at the top, followed by sections for each version and host language, and a table with a row for each generic test and a column for each processor. A typical result looks like the following:

Test                                          clj-rdfa  librdfa  pyRdfa  RDF::RDFa
0001 Predicate establishment with @property   PASS      PASS     PASS    PASS
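For comparison, producing that table by hand from the flat assertions might look like the following Ruby: group by test, then by processor. This is a plain grouping sketch, not JSON-LD framing (the approach the post takes next); the assertion hashes follow the shape of the JSON-LD example above, the two PASS outcomes match the table, and the helper names are hypothetical.

```ruby
# Flat EARL assertions, shaped like the JSON-LD example above
ASSERTIONS = [
  { "@type" => "earl:Assertion",
    "test" => "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
    "subject" => "http://rubygems.org/gems/rdf-rdfa",
    "result" => { "@type" => "earl:TestResult", "outcome" => "earl:pass" } },
  { "@type" => "earl:Assertion",
    "test" => "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
    "subject" => "http://www.w3.org/2012/pyRdfa",
    "result" => { "@type" => "earl:TestResult", "outcome" => "earl:pass" } }
].freeze

# test IRI => { processor IRI => "PASS" / "FAIL" / ... }
def pivot(assertions)
  table = Hash.new { |hash, key| hash[key] = {} }
  assertions.each do |a|
    outcome = a.dig("result", "outcome").to_s.sub(/\Aearl:/, "").upcase
    table[a["test"]][a["subject"]] = outcome
  end
  table
end

# One row per test, one column per processor ("-" where no result exists)
def table_rows(table, processors)
  table.sort.map do |test, results|
    [test, *processors.map { |processor| results.fetch(processor, "-") }]
  end
end

processors = ["http://www.w3.org/2012/pyRdfa", "http://rubygems.org/gems/rdf-rdfa"]
table_rows(pivot(ASSERTIONS), processors).each { |row| puts row.join("  ") }
```

The shape of the output is hard-coded in Ruby here; a frame expresses the same shape declaratively, in JSON.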

To take advantage of JSON-LD chaining, we really want a data structure that we can easily iterate on. By adding some extra markup to the report, we can do this using JSON-LD Framing, basically a query language for JSON-LD that allows us to change the data into a format we want to use. The frame document allows us to specify how we’d like our output. An abbreviated example is the following:

{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@type": "earl:Software",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [{"@type": "earl:TestCase"}]
  }
}

This says: match items of type earl:Software that have a property named for the version (here rdfa1.1) referencing an object of type rdfatest:Version, which in turn has a property for each host language referencing a list of earl:TestCase items. This gives us a JSON-LD snippet such as the following:

{
  "@context": "http://rdfa.info/contexts/rdfa-earl.jsonld",
  "@id": "http://rdfa.info/test-suite/",
  "@type": [
    "earl:Software",
    "doap:Project"
  ],
  "homepage": "http://rdfa.info/",
  "name": "RDFa Test Suite",
  "rdfa1.1": {
    "@type": "rdfatest:Version",
    "html5": [
      {
        "@id": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
        "@type": "earl:TestCase",
        "num": "0001",
        "title": "Predicate establishment with @property",
        "description": "Tests @property ...",
        "mode": "earl:automatic",
        "http://rubygems.org/gems/rdf-rdfa": {
          "@id": "http://rdfa.info/test-suite/...",
          "@type": "earl:Assertion",
          "assertedBy": "http://rdfa.info/test-suite/",
          "test": "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html",
          "subject": "http://rubygems.org/gems/rdf-rdfa",
          "result": {
            "@type": "earl:TestResult",
            "outcome": "earl:pass"
          }
        },
        "http://www.w3.org/2012/pyRdfa": { "@type": "earl:Assertion", ... },
        "https://github.com/niklasl/clj-rdfa": { "@type": "earl:Assertion", ... },
        "https://github.com/rdfa/librdfa": { "@type": "earl:Assertion", ... }
      }
    ]
  }
}

We’ve basically inverted the report: instead of a flat list of assertions, each test case now contains the assertions made about it, keyed by processor. Now we can use this within a Haml template to create the HTML we’re interested in.
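Here is a rough sketch of the template side. To stay dependency-free it substitutes Ruby's standard ERB for Haml and uses a trimmed copy of the framed document (one test case, one processor). It assumes that every key in a test case that is an absolute http(s) IRI names a processor; that is a property of this particular frame, not of JSON-LD framing in general.

```ruby
require 'json'
require 'erb'

# Trimmed, illustrative copy of the framed report: one test case, one processor.
FRAMED = JSON.parse(<<~JSON)
  {
    "name": "RDFa Test Suite",
    "rdfa1.1": {
      "@type": "rdfatest:Version",
      "html5": [
        {
          "@type": "earl:TestCase",
          "num": "0001",
          "title": "Predicate establishment with @property",
          "http://rubygems.org/gems/rdf-rdfa": {
            "@type": "earl:Assertion",
            "result": {"@type": "earl:TestResult", "outcome": "earl:pass"}
          }
        }
      ]
    }
  }
JSON

# Assumption: keys that are absolute http(s) IRIs hold one processor's assertion.
def outcomes(test_case)
  test_case.select { |key, _| key.start_with?("http://", "https://") }
           .transform_values { |a| a.dig("result", "outcome") == "earl:pass" ? "PASS" : "FAIL" }
end

# One row per test case: a label cell followed by one cell per processor.
ROWS = FRAMED["rdfa1.1"]["html5"].map do |tc|
  ["#{tc["num"]} #{tc["title"]}"] + outcomes(tc).values
end

# ERB stands in for Haml here; the single-quoted heredoc keeps #{} for ERB.
TEMPLATE = ERB.new(<<~'HTML', trim_mode: "-")
  <table>
  <%- rows.each do |cells| -%>
    <tr><%= cells.map { |c| "<td>#{ERB::Util.html_escape(c)}</td>" }.join %></tr>
  <%- end -%>
  </table>
HTML

REPORT = TEMPLATE.result_with_hash(rows: ROWS)
puts REPORT
```

Because the frame already nests test cases under version and host language, the template only has to loop; no grouping logic is needed in the view.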

To see the complete EARL report, look here.

Conclusions

JSON-LD is the right technology for dealing with RDF and Linked Data in Web applications. It has a convenient representation for working from within various programming languages, such as JavaScript and Ruby. Its use in implementing the RDFa Test Suite proves its worth as a complementary technology to RDFa for working with Linked Data on the Web.

Next up, we talk about the Distributed Denial of Service attack against the test suite and how we solved this very easily and quickly using BrowserID.

Published on Tue, 20 Mar 2012 22:52:00 GMT.

A new RDFa Test Harness

Implementing the RDFa Test Suite as a modern Web application using Sinatra, Backbone.js and Bootstrap.js.

Recently, RDFa entered the Candidate Recommendation phase for releasing RDFa Core 1.1, RDFa 1.1 Lite, and XHTML+RDFa 1.1 as W3C Standards. I’ve been using RDFa for a couple of years, originally as part of the Connected Media Experience, and lately because I’ve become passionate about the Semantic Web. For the last 10 months, or so, this has extended to my becoming an Invited Expert in the W3C, where I’ve worked on RDFa, HTML microdata and JSON-LD.

This is an introductory blog post on the creation of a new RDFa Test Suite. Here we discuss the use of Sinatra, Backbone.js and Bootstrap.js to run the test suite. Later will come articles on the usefulness of JSON-LD as a means of driving a test harness, generating test reports, and the use of BrowserID to deal with Distributed Denial of Service attacks that cropped up overnight (now available here).

RDFa Test Suite

Along with other RDF parsers and serializers (see sidebar), I have an RDFa parser and serializer. While implementing the parser, and while working on new features for RDFa 1.1, I found the RDFa Test Suite an invaluable resource. In my testing, I would use the test manifest, which describes each test’s input document and its expected output in the form of a SPARQL ASK query.

A basic RDFa test is a small amount of markup intended to test a single feature.

<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/elements/1.1/">
<head>
   <title>Test 0001</title>
</head>
<body>
  <p>
    This photo was taken by
    <span class="author"
          about="photo1.jpg"
          property="dc:creator">Mark Birbeck</span>.</p>
</body>
</html>

In this example, we’re testing that the @about attribute sets the subject, @property sets the property and the text content sets the object of a single RDF statement. Rendered as Turtle, it would look like the following:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<photo1.jpg> dc:creator "Mark Birbeck" .

A query to test this looks like the following:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
ASK WHERE {
    <http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/photo1.jpg>
      dc:creator "Mark Birbeck" .
}

Note that the relative IRI in @about is resolved against the document location, which is why the SPARQL query tests for the full IRI.
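The resolution step itself is ordinary RFC 3986 reference resolution, which Ruby's standard `URI.join` performs. The sketch assumes the test document is served from the test's own IRI as used in the EARL report (`.../rdfa1.1/html5/0001.html`); the helper name `resolve_about` is purely illustrative, and RDFa processors also honor an HTML `<base>` element, which this ignores.

```ruby
require 'uri'

# Assumed location of the test document (the test @id used in the EARL report).
DOCUMENT_URL = "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/0001.html"

# Resolve an @about value against the document location: a bare relative
# path replaces the last segment of the base path.
def resolve_about(about, base = DOCUMENT_URL)
  URI.join(base, about).to_s
end

puts resolve_about("photo1.jpg")
# → http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/photo1.jpg
puts resolve_about("../photo1.jpg")
# → http://rdfa.info/test-suite/test-cases/rdfa1.1/photo1.jpg
```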

Using the test suite requires a publicly accessible processor endpoint; to test my own implementation, I released the RDF Distiller. The test suite is given the endpoint URL and uses it to invoke the processor on each test document. Basically, it does the following:

  1. The Web application performs a GET on the /test-suite/check-test/:version/:suite/:num service URL along with the processor endpoint as a query parameter.
  2. The service invokes the processor endpoint passing the URL of the test document.
  3. The processor then parses that document and returns a result in a different RDF format (for example Turtle or RDF/XML).
  4. The service parses the returned RDF document into a graph, and performs the test’s SPARQL query against that graph.
  5. The result is a true or false value, which determines whether the test passes.
  6. The result is formatted as JSON and returned to the Web application.
  7. The Web application updates the test status in the UI.
  8. If running all tests, the completion event triggers the next test to run.
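To make steps 2 to 6 concrete without a network or the RDF gems, here is a plain-Ruby stand-in. A graph is modelled as a Set of [subject, predicate, object] arrays, the processor is a hypothetical lambda that returns such triples, and the ASK query is reduced to "are all expected triples present?", which only covers ground patterns without variables. The real harness uses RDF.rb and the SPARQL gem for these steps.

```ruby
require 'json'
require 'set'

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
BASE = "http://rdfa.info/test-suite/test-cases/rdfa1.1/html5/"

# Hypothetical stand-in for a processor endpoint: given a test document URL,
# return the triples it extracted (here, always the expected one).
FAKE_PROCESSOR = lambda do |_doc_url|
  [[BASE + "photo1.jpg", DC_CREATOR, "Mark Birbeck"]]
end

# Ground ASK query: true when every expected triple is in the graph.
def ask(graph, expected)
  expected.all? { |triple| graph.include?(triple) }
end

# Steps 2-6: invoke the processor, build a graph, run the check, return JSON.
def check_test(processor, doc_url, expected)
  graph = Set.new(processor.call(doc_url))
  { status: ask(graph, expected) }.to_json
end

EXPECTED = [[BASE + "photo1.jpg", DC_CREATOR, "Mark Birbeck"]].freeze
puts check_test(FAKE_PROCESSOR, BASE + "0001.html", EXPECTED)
# → {"status":true}
```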

Sinatra

Sinatra is a great lightweight framework for deploying simple Ruby applications on the web. The needs of this application, while requiring a lot of different libraries, were really fairly simple. Basically, return a page listing the various tests, respond to requests for test case source documents, activate a test with a specified processor endpoint and return the results.

The basic setup of the app is fairly straightforward:

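require 'sinatra'
require 'json'
require 'linkeddata'   # RDF.rb plus its readers (RDFa, Turtle, RDF/XML, ...)
require 'sparql'

# Return the test suite driver page
get '/test-suite/' do
  haml :test_suite
end

# Return a particular test, or SPARQL query
get '/test-suite/test-cases/:version/:host_language/:num' do
  source = File.read(File.expand_path("../tests/#{params[:num]}.html", __FILE__))
  case params[:host_language]
  when 'xhtml'
    # do XHTML-specific formatting of the test
  when 'html'
    # do HTML-specific formatting of the test
  when 'xml', 'svg'
    # do XML- or SVG-specific formatting of the test
  end
  source
end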

# Run a test
get '/test-suite/check-test/:version/:suite/:num' do
  version, suite, num = params[:version], params[:suite], params[:num]

  # Get the SPARQL query
  source = File.read(File.expand_path("../tests/#{num}.sparql", __FILE__))

  # Do host-language specific modifications of the SPARQL query.
  query = SPARQL.parse(source)

  # Invoke the processor and retrieve results, parsed into an RDF graph
  # (test_path and format are application helpers, not shown here)
  graph = RDF::Graph.load(params['rdfa-extractor'] + test_path(version, suite, num, format))

  # Run the query; an ASK query yields true or false
  result = query.execute(graph)

  # Return results as JSON
  content_type :json
  {:status => result}.to_json
end

Backing up the Sinatra application are a number of Ruby Gems for working with Linked Data and SPARQL. In addition to reading and writing RDFa, there are gems for managing RDF graphs, reading other formats, such as Turtle and RDF/XML, and running the SPARQL queries.

Driving the test suite is a Web application built using Backbone.js and Bootstrap.js.

Backbone.js

Backbone is a JavaScript model-view-controller (MVC) framework for building responsive Web applications. It encourages building modular applications split into multiple classes with weak interdependencies. Models and Collections are used to maintain application state and reflect information from a server. The RDFa test suite has two main models and a collection.

The Version model keeps track of information about what is being run. This includes the RDFa version and host language being tested along with the current processor endpoint. It looks something like the following:

window.Version = Backbone.Model.extend({
  defaults: {
    processorURL: "http://www.w3.org/2012/pyRdfa/extract?uri=",
    processorName: "pyRdfa",
    processorDOAP: "http://www.w3.org/2012/pyRdfa",

    // List of processors
    processors: {}
  },

  // Appropriate host languages for the current version
  hostLanguages: function() {
    return {
      "rdfa1.0": ["SVG", "XHTML1"],
      "rdfa1.1": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"],
      "rdfa1.1-vocab": ["HTML4", "HTML5", "SVG", "XHTML1", "XHTML5", "XML"]
    }[this.get("version")];
  }
});

The Test collection uses the test manifest to instantiate a number of Test model instances. Changing information in the Version model causes different tests to be enabled or disabled, as appropriate for the given RDFa version and host language. It also affects URL generation for retrieving and running different tests. In addition to instantiating tests, the Test collection also allows the complete sequence of tests to be run: it listens for the completion event from each test model and then starts the next test.
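The application does this with Backbone events in JavaScript; the Ruby sketch below only illustrates the chaining pattern. `TestCase` is a hypothetical class that fires a completion callback, and `run_all` starts the next test from that callback instead of looping over all tests eagerly. The stub tests complete synchronously, so the chain is recursive here; in the browser the callback fires later, when the server's JSON response arrives.

```ruby
# Hypothetical stand-in for a Test model: running it reports the result
# to every listener registered for its completion event.
class TestCase
  attr_reader :num, :status

  def initialize(num, &check)
    @num = num
    @check = check
    @listeners = []
  end

  # Register a listener for the completion event.
  def on_complete(&block)
    @listeners << block
  end

  # Run the test and fire the completion event with the result.
  def run
    @status = @check.call ? "PASS" : "FAIL"
    @listeners.each { |listener| listener.call(self) }
  end
end

# Run tests one at a time: each test's completion triggers the next.
def run_all(tests, log = [])
  return log if tests.empty?
  first, *rest = tests
  first.on_complete do |t|
    log << "#{t.num} #{t.status}"
    run_all(rest, log)
  end
  first.run
  log
end

tests = [TestCase.new("0001") { true }, TestCase.new("0002") { false }]
puts run_all(tests)   # prints "0001 PASS", then "0002 FAIL"
```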

Styling the User Interface

I’m no designer, but I like a good looking and efficient user interface. Fortunately, the people at Twitter do too, and they released Bootstrap.js as a means of tackling common problems. I won’t go into detail here, but check out their example page to get an idea of the things you can do with Bootstrap. What I immediately noticed about it is that I didn’t really need to worry about layout. Note that you can even run the Test Suite from an iPhone!

Data Driven Tests

Of course, returning the test suite HTML is just part of the problem; we also need to get details about each test to the page, so that it can respond to requests to run specific tests. The tests are managed through a test manifest, which is kept in Turtle format to make it easy to add tests. A typical entry looks like the following:

<test-cases/0001> a test:TestCase;
   dc:title "Predicate establishment with @property";
   rdfatest:rdfaVersion "rdfa1.0", "rdfa1.1";
   rdfatest:hostLanguage "xml", "xhtml1", "html4", "html5", "xhtml5";
   test:classification test:required;
   test:informationResourceInput <test-cases/0001.html>;
   test:informationResourceResults <test-cases/0001.sparql> .

This basically describes an IRI for the test (in this case test-cases/0001, relative to the location of the test suite), the title of the test, the RDFa versions and host languages it applies to, and references to the input and result documents. RDFa has over 200 such tests defined. This is all well and good, but requiring yet another data format is an added complication. Better to have the tests defined in a format more appropriate for use within a Web application, such as JSON. As it happens, JSON-LD is another specification that is still under development, but it is proving to be quite flexible and useful for our needs. For a peek at the JSON-LD version of the RDFa test suite manifest, look here. More on using JSON-LD, and why it’s such a good match for RDFa, in the next post.
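Once the manifest is JSON, enabling the tests for the current version and host language is a simple filter. The sketch below uses a hypothetical, simplified shape with `num`, `versions` and `hostLanguages` keys modelled on the Turtle entry above; the actual JSON-LD manifest may name these properties differently, and the second entry is invented for contrast.

```ruby
require 'json'

# Simplified, hypothetical manifest entries modelled on the Turtle entry above.
MANIFEST = JSON.parse(<<~JSON)
  [
    {"num": "0001", "title": "Predicate establishment with @property",
     "versions": ["rdfa1.0", "rdfa1.1"],
     "hostLanguages": ["xml", "xhtml1", "html4", "html5", "xhtml5"]},
    {"num": "0002", "title": "Hypothetical SVG-only test",
     "versions": ["rdfa1.1"],
     "hostLanguages": ["svg"]}
  ]
JSON

# Tests enabled for a version and host language; the UI uses upper-case host
# language names, so compare case-insensitively.
def applicable(manifest, version, host_language)
  manifest.select do |t|
    t["versions"].include?(version) && t["hostLanguages"].include?(host_language.downcase)
  end
end

p applicable(MANIFEST, "rdfa1.1", "HTML5").map { |t| t["num"] }
# → ["0001"]
```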

Published on Sun, 18 Mar 2012 00:00:00 GMT.

RDF.rb 0.3.5 and SPARQL 0.1.0

I added some minor updates to RDF.rb and re-issued version 0.3.5 of the rdf and linkeddata gems. These updates mostly improve support for HTTP content negotiation and for finding appropriate readers and writers based on file extension, MIME type, and content sniffing. There are also some minor fixes to aid JRuby and Ruby 1.9.3 support.

More notably, I’ve released 0.1.0 of the SPARQL gem. The logical behavior is unchanged from the previous release, but it now includes Rack and Sinatra support to easily create middleware for a SPARQL endpoint. When used with the linkeddata gem, this includes a range of RDF serializations for DESCRIBE and CONSTRUCT queries. It also adds HTTP Accept headers for RDF/XML and Turtle to the outgoing requests made for FROM and FROM NAMED.

For example, the Sinatra example in the README runs a simple query against a small repository:

#!/usr/bin/env ruby -rubygems
require 'sinatra'
require 'sinatra/sparql'

repository = RDF::Repository.new do |graph|
  graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
end

get '/sparql' do
  SPARQL.execute("SELECT * WHERE { ?s ?p ?o }", repository)
end

A minimal SPARQL endpoint can be described as follows:

# Sinatra example
#
# Call as http://localhost:4567/?query=uri,
# where `uri` is the URI of a SPARQL query, or
# a URI-escaped SPARQL query, for example:
#   http://localhost:4567/?query=SELECT%20?s%20?p%20?o%20WHERE%20%7B?s%20?p%20?o%7D
require 'sinatra'
require 'sinatra/sparql'

get '/' do
  settings.sparql_options.merge!(:standard_prefixes => true)
  repository = RDF::Repository.new do |graph|
    graph << [RDF::Node.new, RDF::DC.title, "Hello, world!"]
  end
  if params["query"]
    # Sinatra has already URI-decoded the parameter. A value that starts with
    # a URI scheme (such as "http:") names a query document to fetch;
    # anything else is the query text itself.
    query = if params["query"] =~ /\A\w+:/
      RDF::Util::File.open_file(params["query"]).read
    else
      params["query"]
    end
    SPARQL.execute(query, repository)
  else
    service_description(:repo => repository)
  end
end

This can be run using ruby -rubygems example.rb, or with rackup or shotgun (for example, shotgun example.rb).

To load a complete dataset into the query repository, including one with multiple contexts (named graphs), load the repository as follows:

repository = RDF::Repository.load("http://path-to-repo")

This will incur a large startup time for each request, but you can also use a persistent store such as rdf-mongo:

repository = RDF::Mongo::Repository.new()

This will instantiate a persistent MongoDB store, which can be initialized one time using RDF::Mongo::Repository.load. Subsequent instantiations will use the persistent storage, and have better query performance for larger datasets.

For a more complete implementation, see the RDF Distiller running at http://rdf.greggkellogg.net/sparql and freely available to download and modify for your own purposes.

Follow up questions to public-rdf-ruby.

Published on Thu, 09 Feb 2012 18:59:00 GMT.

Ruby and the Semantic Web

This evening, I gave a talk on using Ruby, RDF.rb and assorted gems at the Lotico San Francisco Semantic Meetup. I’ve uploaded the slides to SlideShare.

I also showed a simple demo using the GitHub API to create FOAF and DOAP records for accounts and repositories, and to do some simple navigation. The demo is running at http://greggkellogg.net/github-lod, and source is (of course) available on GitHub.

The demo is not intended to be a complete application, but it shows some basic capabilities of Ruby LinkedData (http://rubygems.org/gems/linkeddata) for generating RDF in a variety of formats from Active Record models (which cache the GitHub API calls). The Web pages are, of course, marked up with RDFa, and you can use content negotiation, or append an appropriate extension to the URLs, to retrieve the data in alternative RDF formats.
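As a rough illustration of that choice (not the demo's actual code), the helper below prefers an explicit URL extension, then the highest-q media type from the Accept header that it knows, and otherwise falls back to HTML with RDFa. The extension and media-type tables, the symbols and the sample paths are assumptions covering only a few formats; wildcards such as `*/*` are simply ignored.

```ruby
# Hypothetical tables mapping URL extensions and media types to RDF formats.
EXTENSIONS = { ".ttl" => :turtle, ".nt" => :ntriples, ".rdf" => :rdfxml, ".jsonld" => :jsonld }.freeze
MEDIA_TYPES = {
  "text/turtle" => :turtle, "application/n-triples" => :ntriples,
  "application/rdf+xml" => :rdfxml, "application/ld+json" => :jsonld,
  "text/html" => :rdfa
}.freeze

# Parse an Accept header into [media_type, q] pairs, highest q first;
# ties keep the order in which the client listed them.
def parse_accept(header)
  entries = header.to_s.split(",").map(&:strip).reject(&:empty?)
  entries.each_with_index.map do |part, i|
    type, *params = part.split(";").map(&:strip)
    q = params.find { |param| param.start_with?("q=") }
    [type.downcase, q ? q[2..-1].to_f : 1.0, i]
  end.sort_by { |(_, q, i)| [-q, i] }.map { |(type, q, _)| [type, q] }
end

# Extension wins, then the best acceptable known media type, else RDFa.
def negotiate(path, accept)
  ext = File.extname(path)
  return EXTENSIONS[ext] if EXTENSIONS.key?(ext)
  parse_accept(accept).each do |type, q|
    next if q <= 0
    return MEDIA_TYPES[type] if MEDIA_TYPES.key?(type)
  end
  :rdfa
end

puts negotiate("/hypothetical/resource.ttl", "text/html")                     # → turtle
puts negotiate("/hypothetical/resource", "text/html;q=0.5, text/turtle")      # → turtle
puts negotiate("/hypothetical/resource", "application/rdf+xml;q=0.9, */*;q=0.1") # → rdfxml
```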

Published on Wed, 07 Dec 2011 06:21:00 GMT.

SPARQL 1.0 for Ruby

I’ve just released version 0.0.2 of the Ruby sparql gem. This version is based on earlier work by Pius and Arto and incorporates SPARQL Grammar and SPARQL Algebra. Further documentation is available here.

This gem integrates with RDF.rb and uses rdf-xsd to provide additional literal semantics.

Why release SPARQL for Ruby? Probably not because of the killer performance, at least right now. However, I believe it’s important that Ruby have a complete tool chain for manipulating Linked Data (including RDF and SPARQL), and this was the remaining piece.

In spite of the 0.0.2 release number, it is a fully functioning implementation of SPARQL 1.0 semantics and passes all the DAWG data-r2 test cases. The gem uses RDF::Query to perform basic graph pattern (BGP) matching on RDF::Queryable objects (such as RDF::Repository). The gem has some support for query optimization, but this remains largely unimplemented and will be addressed in future releases. I’d also like to support SPARQL 1.1 queries and updates at some point.
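For readers new to BGP evaluation: a basic graph pattern is a set of triple patterns whose variables must bind consistently across all of them. The toy matcher below is plain Ruby, not RDF::Query's implementation; it represents variables as Symbols and triples as three-element arrays of strings, and joins the solutions pattern by pattern with a naive nested loop.

```ruby
# Toy data: triples as [subject, predicate, object] strings.
GRAPH = [
  ["ex:alice", "foaf:knows", "ex:bob"],
  ["ex:bob",   "foaf:name",  "Bob"],
  ["ex:alice", "foaf:name",  "Alice"]
].freeze

# Extend a solution so that pattern matches triple, or return nil.
def unify(pattern, triple, solution)
  pattern.zip(triple).each_with_object(solution.dup) do |(term, value), sol|
    if term.is_a?(Symbol)                          # variable
      return nil if sol.key?(term) && sol[term] != value
      sol[term] = value
    elsif term != value                            # constant must match exactly
      return nil
    end
  end
end

# Evaluate a BGP: start with one empty solution and join each pattern in turn.
def bgp(patterns, graph)
  patterns.reduce([{}]) do |solutions, pattern|
    solutions.flat_map do |sol|
      graph.map { |triple| unify(pattern, triple, sol) }.compact
    end
  end
end

# Whom does alice know, and what is their name?
results = bgp([["ex:alice", "foaf:knows", :who], [:who, "foaf:name", :name]], GRAPH)
results.each { |s| puts "#{s[:who]} #{s[:name]}" }   # prints "ex:bob Bob"
```

Real engines reorder patterns and use indexes instead of scanning the whole graph for each pattern, which is the kind of optimization mentioned above as still largely unimplemented.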

This is a pure Ruby implementation and does not directly rely on any native libraries (although some RDF readers, such as those for RDFa and RDF/XML, presently do).

The basic strategy is to parse SPARQL and transform it into an S-Expression-based algebra, pretty close to that used by Jena ARQ (SPARQL S-Expressions, or SSE). This allows queries either to be written directly in SSE and executed, or to be written in SPARQL and parsed into SSE first.
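To show what an S-Expression algebra looks like as data, here is a small, self-contained reader (not the SPARQL gem's parser) that turns SSE-style text into nested Ruby arrays. It handles only parentheses, bare atoms and double-quoted strings without escapes, and keeps every atom as a string; the sample expression is written in the style of ARQ's algebra output rather than taken from the gem.

```ruby
# Split SSE text into tokens: parentheses, quoted strings, and bare atoms.
def tokenize(text)
  text.scan(/\(|\)|"[^"]*"|[^\s()"]+/)
end

# Recursively build nested arrays from the token stream.
def read_form(tokens)
  token = tokens.shift
  raise ArgumentError, "unexpected end of input" if token.nil?
  case token
  when "("
    list = []
    list << read_form(tokens) until tokens.first == ")" || tokens.empty?
    raise ArgumentError, "missing )" unless tokens.shift == ")"
    list
  when ")"
    raise ArgumentError, "unexpected )"
  else
    token
  end
end

# Parse exactly one expression; anything left over is an error.
def parse_sse(text)
  tokens = tokenize(text)
  form = read_form(tokens)
  raise ArgumentError, "trailing input" unless tokens.empty?
  form
end

SSE = '(project (?name) (bgp (triple ?s <http://xmlns.com/foaf/0.1/name> ?name)))'
p parse_sse(SSE)
# → ["project", ["?name"], ["bgp", ["triple", "?s", "<http://xmlns.com/foaf/0.1/name>", "?name"]]]
```

The payoff of this representation is that each operator (`project`, `bgp`, and so on) becomes the head of a list, so an evaluator can dispatch on the first element and recurse into the rest.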

The linkeddata gem has also been updated to have a soft reference to SPARQL, in addition to new processors for RDF::Turtle, JSON::LD, and RDF::Microdata.

The gem is tested on Ruby 1.8.7, 1.9.2 and JRuby (JRuby has some spec issues, probably due to Nokogiri differences).

Many thanks to Pius Uzamere for helping to make this release happen, and to Arto Bendiken for the work in RDF.rb, SPARQL::Algebra and SPARQL::Grammar that preceded this.

Published on Mon, 03 Oct 2011 16:34:00 GMT.

RDF.rb 0.3.4 released

After several months of gathering updates for RDF.rb, we’ve released version 0.3.4 with several new features:

  • Update to BGP query model to support SPARQL semantics,
  • Expandable Literal support, to allow further implementation of XSD datatypes outside of RDF.rb (see RDF::XSD gem),
  • More advanced content type detection to allow better selection of the appropriate reader from those available on the client. (Includes selecting among HTML types, such as Microdata and RDFa)
  • Improved CLI with the rdf executable providing access to all loaded readers and writers for cross-language serialization and deserialization.

As an example of format detection, consider the following:

require 'linkeddata'
RDF::Graph.load("http://greggkellogg.net/foaf.ttl")

should load Turtle or N3 readers if installed. This becomes more important for ambiguous file types, such as HTML, which could be either RDFa or Microdata, and application/xml, which could be TriX, RDF/XML or even RDFa.
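The selection can be pictured as a chain of increasingly expensive checks. The sketch below is a hypothetical, much-simplified version of that idea in plain Ruby, not RDF.rb's actual `RDF::Format` API: try the file extension, then the media type, and only then sniff the content, which is where an ambiguous type such as text/html is settled by looking for microdata's `itemscope` or RDFa attributes. The tables and regular expressions are illustrative assumptions.

```ruby
# Hypothetical tables; in RDF.rb each reader gem registers its own.
BY_EXTENSION = { ".ttl" => :turtle, ".n3" => :n3, ".nt" => :ntriples, ".rdf" => :rdfxml }.freeze
BY_MEDIA_TYPE = { "text/turtle" => :turtle, "text/n3" => :n3, "application/rdf+xml" => :rdfxml }.freeze

# Settle ambiguous types (text/html, application/xml) by peeking at the content.
def sniff(content)
  case content
  when /\bitemscope\b/ then :microdata
  when /\b(?:property|typeof|about|vocab)=/ then :rdfa
  when /<rdf:RDF\b/ then :rdfxml
  when /@prefix\s/ then :turtle
  end
end

# Cheapest evidence first: extension, then media type, then content.
def detect_format(path: nil, media_type: nil, content: "")
  ext = path && File.extname(path)
  BY_EXTENSION[ext] || BY_MEDIA_TYPE[media_type] || sniff(content)
end

p detect_format(path: "http://greggkellogg.net/foaf.ttl")                        # → :turtle
p detect_format(media_type: "text/html", content: "<div itemscope>...</div>")    # → :microdata
p detect_format(media_type: "text/html", content: '<span property="dc:creator">') # → :rdfa
```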

See documentation for more specifics on this version of RDF.rb. Note that I’ve attempted to incorporate suggestions for improving the documentation.

Most of the reader/writer gems have been updated to match this release, in particular JSON::LD, RDF::Microdata, RDF::N3, RDF::RDFa, RDF::RDFXML, and RDF::Turtle. A future update to the linkeddata gem should reference the latest versions of each, but a simple gem update will work too.

There is a slight semantic change for repositories to support SPARQL: a context of false should not match a variable context. This is straight out of SPARQL semantics. Repository implementors who have provided custom implementations of #query_pattern should check behavior against rdf-spec version 0.3.4 to verify correct operation.
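A small model of that rule may help when checking a custom implementation. In the plain-Ruby sketch below (illustrative only; rdf-spec remains the authoritative check), statements are quads whose fourth element is false for the default graph, and Symbols in a pattern are variables. In the context position, a variable matches only statements in named graphs, false asks for the default graph explicitly, and nil leaves the context unconstrained; these encodings are assumptions chosen for the sketch.

```ruby
# Quads are [subject, predicate, object, context]; false marks the default graph.
QUADS = [
  ["ex:s", "ex:p", "ex:o1", false],
  ["ex:s", "ex:p", "ex:o2", "ex:g1"]
].freeze

# Match the context position of a pattern against a stored context.
def context_matches?(pattern_ctx, ctx)
  case pattern_ctx
  when nil    then true            # unconstrained: any graph
  when false  then ctx == false    # explicitly the default graph
  when Symbol then ctx != false    # variable: named graphs only
  else             pattern_ctx == ctx
  end
end

# Symbols in subject, predicate and object positions match anything.
def match_quads(quads, s, p, o, ctx)
  quads.select do |qs, qp, qo, qc|
    [[s, qs], [p, qp], [o, qo]].all? { |pat, val| pat.is_a?(Symbol) || pat == val } &&
      context_matches?(ctx, qc)
  end
end

p match_quads(QUADS, :s, :p, :o, :g).map { |q| q[2] }   # → ["ex:o2"]
```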

Next up is a release of SPARQL implemented in pure Ruby. This gem provides full support for SPARQL 1.0 queries.

Published on Thu, 29 Sep 2011 00:00:00 GMT.
