Editor
The collaborative SPARQL Builder Development group, Metadata specification team.
- Norio KOBAYASHI (ACCC, RIKEN)
- Atsuko YAMAGUCHI (DBCLS, ROIS)
- Kouji KOZAKI (Univ. Osaka)
- Kai LENZ (ACCC, RIKEN)
- Hogyan WU (DBCLS, ROIS)
Overview
The SPARQL Builder Metadata Specification defines the schema of an RDF metadata file that describes the structure of the RDF data in a SPARQL endpoint. The metadata file is generated in advance by a software module called the “crawler”, which extracts the metadata from the SPARQL endpoint. It is then used to construct a class graph (classes, properties, and the domains and ranges of those properties) by executing SPARQL queries that impose only a low load, even on SPARQL endpoints that hold large data.
The metadata is also useful when writing SPARQL queries, since it concisely describes the RDF graph structure of the SPARQL endpoint. For this reason the metadata should be written using standardised specifications; however, no standardised vocabulary supports the data that is characteristic of SPARQL Builder, such as relationships between classes. We therefore created a specification for the metadata description by adding an original vocabulary, with the namespace prefix “sbm:”, to the existing specifications SPARQL 1.1 Service Description (http://www.w3.org/TR/sparql11-service-description/) and VoID (http://www.w3.org/TR/void/).
Prefixes
The prefixes used in this document are the following.
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs: http://www.w3.org/2000/01/rdf-schema#
- sd: http://www.w3.org/ns/sparql-service-description#
- void: http://rdfs.org/ns/void#
- sbm: http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#
Metadata Schema
The metadata schema is shown in the following figure.
Using SPARQL 1.1 Service Description, a dataset included in a SPARQL endpoint is described as sd:Dataset. To describe the detailed structure of the RDF dataset, we employ the property partitions and class partitions defined in VoID and, as our extension, introduce statistical indicators in the form of categories for properties, classes and endpoints.
Statistical indicators
Property Category
This category is defined for each user property used on a SPARQL endpoint, that is, every property except those defined in RDF Schema 1.1, such as rdf:type and rdfs:subClassOf. We call a triple class-decidable when the classes of its subject and object are explicitly defined using rdfs:domain and rdfs:range, and/or can be extracted from the classes of its subject and object instances. The property category indicates how comprehensively the triples of each property are class-decidable. Intuitively, the property categories mean the following:
- Property category 1 (Complete): all triples are class-decidable.
- Property category 2 (Complete by inference): all triples are class-decidable, but the domain and range classes of the property are not explicitly declared.
- Property category 3 (Partial): some but not all triples are class-decidable.
- Property category 4 (None): no triples are class-decidable.
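The four property categories can be read as a small decision procedure. The Python sketch below is one possible reading of the list above, not the crawler's actual implementation. The function name `property_category` and its inputs (the total number of triples for the property, the number of class-decidable triples, and a flag saying whether rdfs:domain and rdfs:range are both explicitly declared) are assumptions introduced for illustration.

```python
def property_category(total_triples, decidable_triples, declared):
    """Return the property category (1-4) for one user property.

    Hypothetical inputs, not part of the specification:
      total_triples     -- number of triples using the property (must be > 0)
      decidable_triples -- how many of those triples are class-decidable
      declared          -- True if rdfs:domain and rdfs:range are both
                           explicitly declared (an assumed reading)
    """
    if total_triples <= 0:
        raise ValueError("a user property must occur in at least one triple")
    if not 0 <= decidable_triples <= total_triples:
        raise ValueError("decidable_triples must lie between 0 and total_triples")

    if decidable_triples == total_triples:
        # Complete (1) when declared, Complete by inference (2) otherwise.
        return 1 if declared else 2
    if decidable_triples > 0:
        return 3  # Partial
    return 4      # None


if __name__ == "__main__":
    print(property_category(10, 10, True))
    print(property_category(10, 10, False))
    print(property_category(10, 4, False))
    print(property_category(10, 0, False))
```

Passing counts rather than the triples themselves keeps the sketch independent of how the crawler actually gathers them.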
Class Category
This category is defined for each dataset on a SPARQL endpoint. We define a junk class as a class that is neither declared as a domain or range class nor used as the class of an instance that is the subject or object of a triple having a user property.
- Class category 1 (Complete): no junk classes exist.
- Class category 2 (Partial): some but not all classes are junk classes.
- Class category 3 (None): all classes are junk classes.
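The class category follows directly from the set of junk classes. The following Python sketch illustrates this under stated assumptions: `class_category`, `all_classes` and `used_classes` are hypothetical names, where `used_classes` stands for the classes that are declared as a domain or range or that type a subject or object of a triple with a user property. A dataset without any classes is rejected, because the specification does not say which category applies in that case.

```python
def class_category(all_classes, used_classes):
    """Return the class category (1-3) of a dataset.

    Hypothetical inputs, not part of the specification:
      all_classes  -- every class found in the dataset
      used_classes -- classes that are NOT junk (domain/range classes, or
                      classes of subjects/objects of user-property triples)
    """
    all_classes = set(all_classes)
    if not all_classes:
        raise ValueError("at least one class is required")

    # A junk class is any class that is not used in one of the ways above.
    junk = all_classes - set(used_classes)

    if not junk:
        return 1  # Complete: no junk classes exist
    if junk != all_classes:
        return 2  # Partial: some but not all classes are junk
    return 3      # None: all classes are junk


if __name__ == "__main__":
    print(class_category({"A", "B"}, {"A", "B"}))
    print(class_category({"A", "B"}, {"A"}))
    print(class_category({"A", "B"}, set()))
```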
Endpoint Category
This category describes the coverage of class-decidable triples and non-junk classes on a SPARQL endpoint.
- Endpoint category 1 (Complete): the following two conditions are both satisfied: (1) the property category of every user property is 1 or 2; (2) the class category is 1.
- Endpoint category 3 (None): the class category is 3.
- Endpoint category 2 (Partial): the endpoint category is neither 1 nor 3.
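The endpoint category combines the two indicators above. The Python sketch below is a direct transcription of the three rules; the function name `endpoint_category` and its parameters are assumptions made for illustration, not part of the specification.

```python
def endpoint_category(property_categories, class_cat):
    """Return the endpoint category (1-3).

    Hypothetical inputs, not part of the specification:
      property_categories -- the property category (1-4) of every user property
      class_cat           -- the class category (1-3) of the dataset
    """
    if class_cat == 3:
        return 3  # None: all classes are junk classes
    if class_cat == 1 and all(c in (1, 2) for c in property_categories):
        # Complete: every user property is category 1 or 2 and no junk
        # classes exist. Note that all() is also true for an empty list.
        return 1
    return 2      # Partial: neither 1 nor 3


if __name__ == "__main__":
    print(endpoint_category([1, 2, 1], 1))
    print(endpoint_category([1, 3], 1))
    print(endpoint_category([1, 1], 2))
    print(endpoint_category([4, 4], 3))
```

Categories 1 and 3 cannot both apply because they require different class categories, so the order of the first two checks does not affect the result.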
Inferred property structure as class relationships
One of the major functions of the crawler is to infer the domain and/or range classes of a property by extracting the subject and/or object classes of the triples that have the property. Such a relationship between subject and object classes is here called a class relationship. A class relationship is declared as part of a property partition, together with its subject and object classes, object datatype, number of triples, number of distinct subjects and number of distinct objects.
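To make the idea of a class relationship concrete, the following Python sketch counts subject-class/object-class pairs over an in-memory list of triples. It is only an illustration under simplifying assumptions: the real crawler works through SPARQL queries against an endpoint, literal objects and their datatypes are ignored here, and the names `class_relations`, `triples` and `types` as well as the result keys are hypothetical.

```python
from collections import defaultdict


def class_relations(triples, types):
    """Summarise class relationships for each property.

    Hypothetical inputs, not part of the specification:
      triples -- iterable of (subject, property, object) tuples
      types   -- dict mapping an instance to the set of its classes

    Returns a dict keyed by (property, subject_class, object_class) whose
    values hold the number of triples, distinct subjects and distinct objects.
    """
    counts = defaultdict(int)
    subjects = defaultdict(set)
    objects = defaultdict(set)

    for s, p, o in triples:
        # Every (subject class, object class) combination is one relationship.
        for sc in types.get(s, ()):
            for oc in types.get(o, ()):
                key = (p, sc, oc)
                counts[key] += 1
                subjects[key].add(s)
                objects[key].add(o)

    return {
        key: {
            "triples": counts[key],
            "distinctSubjects": len(subjects[key]),
            "distinctObjects": len(objects[key]),
        }
        for key in counts
    }


if __name__ == "__main__":
    example_triples = [("a", "knows", "b"), ("a", "knows", "c"), ("d", "knows", "b")]
    example_types = {"a": {"Person"}, "b": {"Person"}, "c": {"Person"}, "d": {"Robot"}}
    for key, stats in sorted(class_relations(example_triples, example_types).items()):
        print(key, stats)
```

Triples whose subject or object has no known class contribute to no relationship, which mirrors the triples that are not class-decidable.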
SPARQL Builder Metadata (sbm) vocabulary
Prefixes
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix sbm: <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> .
Classes
<http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> a owl:Ontology ;
    dc:title "The RDF Metadata Schema vocabulary for SPARQL Builder" .
sbm:ClassRelation a rdfs:Class ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "ClassRelation" ;
    rdfs:comment "A relationship between subject and object classes." .
sbm:CrawlLog a rdfs:Class ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "CrawlLog" ;
    rdfs:comment "A log of crawling on the dataset." .
Properties
sbm:endpointCategory a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "endpointCategory" ;
    rdfs:comment "Endpoint category of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:classCategory a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "classCategory" ;
    rdfs:comment "Class category of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:propertyCategory a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "propertyCategory" ;
    rdfs:comment "Property category of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:searchableTriples a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "searchableTriples" ;
    rdfs:comment "Number of searchable triples of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:classRelation a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "classRelation" ;
    rdfs:comment "Describe an instance of ClassRelation." ;
    rdfs:domain sd:Dataset ;
    rdfs:range sbm:ClassRelation .
sbm:datatypes a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "datatypes" ;
    rdfs:comment "Number of datatypes of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:subjectClass a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "subjectClass" ;
    rdfs:comment "The class of a subject instance of ClassRelation." ;
    rdfs:domain sbm:ClassRelation ;
    rdfs:range rdfs:Class .
sbm:objectClass a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "objectClass" ;
    rdfs:comment "The class of an object instance of ClassRelation." ;
    rdfs:domain sbm:ClassRelation ;
    rdfs:range rdfs:Class .
sbm:objectDatatype a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "objectDatatype" ;
    rdfs:comment "The datatype of an object literal of ClassRelation." ;
    rdfs:domain sbm:ClassRelation ;
    rdfs:range rdfs:Datatype .
sbm:sample a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "sample" ;
    rdfs:comment "Description of sample triples of the classRelation." ;
    rdfs:domain sbm:ClassRelation ;
    rdfs:range xsd:string .
sbm:subjectClasses a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "subjectClasses" ;
    rdfs:comment "Number of subject classes of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:objectClasses a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "objectClasses" ;
    rdfs:comment "Number of object classes of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:objectDatatypes a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "objectDatatypes" ;
    rdfs:comment "Number of object datatypes of the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:propertyCategorySubset a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "propertyCategorySubset" ;
    rdfs:comment "Describe sub-datasets of properties associated with a given property category." ;
    rdfs:domain sd:Dataset ;
    rdfs:range sd:Dataset .
sbm:endpointAccesses a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "endpointAccesses" ;
    rdfs:comment "Number of accesses to the endpoint while crawling the dataset." ;
    rdfs:domain sd:Dataset ;
    rdfs:range xsd:long .
sbm:crawlLog a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "crawlLog" ;
    rdfs:comment "Describe an instance of CrawlLog." ;
    rdfs:domain sd:Dataset ;
    rdfs:range sbm:CrawlLog .
sbm:crawlStartTime a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "crawlStartTime" ;
    rdfs:comment "The datetime when the crawling started." ;
    rdfs:domain sbm:CrawlLog ;
    rdfs:range xsd:dateTime .
sbm:crawlEndTime a rdf:Property ;
    rdfs:isDefinedBy <http://www.sparqlbuilder.org/2014/05/rdf-metadata-schema#> ;
    rdfs:label "crawlEndTime" ;
    rdfs:comment "The datetime when the crawling finished." ;
    rdfs:domain sbm:CrawlLog ;
    rdfs:range xsd:dateTime .
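The class of the blank node that is the object of void:propertyPartition is void:Dataset, so it may be better to add "a void:Dataset".
http://vocab.deri.ie/void#propertyPartition
Thank you.
As you pointed out, I have added the void:Dataset declaration.
The class of the blank node that is the object of the sd:namedGraph property is sd:NamedGraph, isn't it?
Also, I think sbm:clawlStartTime / sbm:clawlEndTime under the sbm:crawlLog property should be sbm:crawl….
I have corrected the figure. Thank you.
The information output by yayamamo's tool includes actual example triples. Could you add an sbm:sample predicate so that these can be included?
Its rdfs:domain / rdfs:range would be sbm:ClassRelation / xsd:string, respectively.
Thank you for considering this.
I have added the new property sbm:sample.
It has been added to the figure and to the detailed property descriptions.
Best regards.
Thank you very much for taking care of this.
As I mentioned to Yamaguchi-san, I think it would be good to also include rdfs:label and rdfs:comment for each class and predicate, where they exist.
Something like the following, perhaps: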
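# Labels and comments of classes that have instances
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?l ?c WHERE {
  ?i a ?class .
  { ?class rdfs:label ?l }
  UNION
  { ?class rdfs:comment ?c }
}
# Labels and comments of predicates that are in use
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?l ?c WHERE {
  ?s ?p ?o .
  { ?p rdfs:label ?l }
  UNION
  { ?p rdfs:comment ?c }
}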
The data obtained with yayamamo's TripleDataProfiler can be accessed here at any time.
http://tm.dbcls.jp/tdp/
Note that the endpoint http://tm.dbcls.jp/sbm allows only GET requests, in order to keep it read-only.
I found some typos, so I am reporting them.
objcet -> object
subjcet -> subject
endpolong -> endpoint