How to create a SHACL rule to infer rdf:type from rdfs:subClassOf - shacl

In order to validate my RDF graph against my SHACL validation shapes V, I want to infer some triples to keep my shapes simple. In particular, one of the rules I need to implement is (in pseudo code):
(?s, rdf:type, :X) <-- (?s, rdfs:subClassOf, :Y)
I was trying several implementations, ending up with this triple rule (and its variants):
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.com/ex#> .
:s
    a sh:NodeShape ;
    sh:targetClass rdfs:Resource ;
    sh:rule [
        a sh:TripleRule ;
        sh:subject sh:this ;
        sh:predicate rdf:type ;
        sh:object :X ;
        sh:condition [
            sh:property [
                sh:path rdfs:subClassOf ;
                sh:hasValue :Y
            ]
        ]
    ] .
However, the rule does not infer :A rdf:type :X . for the data graph
:A rdfs:subClassOf :Y .
(executing against https://github.com/TopQuadrant/shacl). It is possible to solve this issue with a SPARQL rule, so my question is whether there is a way to do it with a triple rule as well. Thanks for any hints!
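For reference, the SPARQL-rule workaround mentioned above can be sketched roughly like this (the shape name :sparqlRuleShape is made up; prefixes as in the shapes graph above):

```turtle
# Sketch of the SPARQL-rule alternative; the shape name is hypothetical.
:sparqlRuleShape
    a sh:NodeShape ;
    sh:targetSubjectsOf rdfs:subClassOf ;
    sh:rule [
        a sh:SPARQLRule ;
        sh:construct """
            PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX :     <http://example.com/ex#>
            CONSTRUCT { $this rdf:type :X . }
            WHERE     { $this rdfs:subClassOf :Y . }
            """ ;
    ] .
```

In SHACL-AF, $this binds to each target node (here, each subject of rdfs:subClassOf), so :A rdfs:subClassOf :Y should yield :A rdf:type :X .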

Why not keep the inference rules and the validation separate, as you've noted is possible using SHACL + SPARQL? That would keep things simpler.
You could use pySHACL and put rules into an ontology file since pySHACL can run ontology rules/inference before applying SHACL validators (see the -i and -e options).

Given the "MAY" in the following quote, the advice in the previous answer by @NicholasCar is solid IMO.
The purpose of answering here is just to corroborate and expand with recent experience.
The 2017 W3C SHACL docs say, regarding the relationship between SHACL and RDFS inferencing:
SHACL implementations MAY, but are not required to, support entailment
regimes. If a shapes graph contains any triple with the predicate
sh:entailment and object E and the SHACL processor does not support E
as an entailment regime for the given data graph then the processor
MUST signal a failure.
(AFAICT the phrase "entailment regime" only refers to the SPARQL entailment regimes as a standard)
Looking at the section on Property Paths:
SPARQL Property path: rdf:type/rdfs:subClassOf*
SHACL Property path: (rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ] )
In most of the SHACL implementations I've played with, basic RDFS type entailment works (obviously only if the rdf:type/rdfs:subClassOf* path is visible to the SHACL validator), so (rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ]) isn't needed explicitly.
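To check what the validator actually sees, a plain SPARQL probe over the data graph can help (a sketch; the :Y class and the example.com prefix are taken from the question above):

```sparql
# Is there any instance of :Y, taking the subclass hierarchy into account?
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.com/ex#>

ASK { ?inst rdf:type/rdfs:subClassOf* :Y }
```

If this returns false against the graph the validator is given, no amount of shape tweaking will make the entailment-dependent constraint pass.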
The problem comes when you try to stuff advanced paths into the shapes - e.g. following this example to enforce that the graph contains at least one instance of an abstract type:
sh:path [ sh:inversePath ( rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ] ) ] ;
... isn't working for me in a number of SHACL validation implementations.

Related

converting csv file to rdf using tarql shows empty results

I'm using tarql to convert a CSV file to RDF. The command runs correctly, but I can't find the output (nothing is shown in the Windows cmd line and no file is generated).
I'm using tarql on Windows with the following command:
C:\tarql-master\target\appassembler\bin\tarql.bat --ntriples xx.rq xx.csv
Here is my code:
PREFIX dc: <http://dcontology/a#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?URI owl:class dc:dataset;
dc:identifier ?identifier;
dc:title ?title;
dc:description ?description;
dc:category ?category;
dc:keywords ?keywords;
dc:PublicationDate ?PublicationDate;
dc:UpdateDate ?UpdateDate;
dc:frequencyofupdate ?frequencyofupdate;
dc:Format ?Format;
dc:License ?license
}
FROM <file:Metabase.csv>
WHERE {
BIND (URI(CONCAT('http://dcontology/dataset/', ?identifier)) AS ?URI)
BIND (xsd:integer(?identifier) AS ?identifier)
BIND (xsd:string(?title) AS ?title)
BIND (xsd:string(?description) AS ?description)
BIND (xsd:string(?category) AS ?category)
BIND (xsd:string(?keywords) AS ?keywords)
BIND (xsd:string(?PublicationDate) AS ?PublicationDate)
BIND (xsd:string(?UpdateDate) AS ?UpdateDate)
BIND (xsd:string(?FrequencyOfUpdate) AS ?FrequencyOfUpdate)
BIND (xsd:string(?format) AS ?format)
BIND (xsd:string(?license) AS ?license)
}
and here is the csv file header:
Case sensitivity
The matching of CSV column names and SPARQL variables is case-sensitive.
As you have, for example, "Description" as CSV column name, you need ?Description in the WHERE block of your SPARQL query.
PREFIX dc: <http://dcontology/a#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://owlontology/a#> # see bottom of this answer
CONSTRUCT {
?URI
owl:class dc:dataset ;
dc:identifier ?identifier_int ;
dc:title ?title_str ;
dc:description ?description_str ;
dc:category ?category_str ;
dc:keywords ?keywords_str ;
dc:PublicationDate ?publicationDate_str ;
dc:UpdateDate ?updateDate_str ;
dc:frequencyofupdate ?frequencyOfUpdate_str ;
dc:Format ?format_str ;
dc:License ?license_str .
}
#FROM <file:Metabase.csv>
WHERE {
BIND ( URI(CONCAT( 'http://dcontology/dataset/', ?Identifier )) AS ?URI ) .
BIND ( xsd:integer(?Identifier) AS ?identifier_int ) .
BIND ( xsd:string(?Title) AS ?title_str ) .
BIND ( xsd:string(?Description) AS ?description_str ) .
BIND ( xsd:string(?Category) AS ?category_str ) .
BIND ( xsd:string(?Keywords) AS ?keywords_str ) .
BIND ( xsd:string(?PublicationDate) AS ?publicationDate_str ) .
BIND ( xsd:string(?UpdateDate) AS ?updateDate_str ) .
BIND ( xsd:string(?FrequencyOfUpdate) AS ?frequencyOfUpdate_str ) .
BIND ( xsd:string(?Format) AS ?format_str ) .
BIND ( xsd:string(?License) AS ?license_str ) .
}
Error
If tarql doesn’t find anything to convert (e.g., because the correct column names are not included in the query), it gives no output instead of an error.
Your query, however, should give an error, because you don’t have the owl prefix defined:
Error parsing SPARQL query: Line 5, column 8: Unresolved prefixed name: owl:class
Vocabularies
If you meant to use the OWL ontology, note that there is no owl:class property defined. If you want to say that the ?URI entity belongs to a class, you can use the rdf:type property (or shorthand: a).
dc is the common prefix for DCMI Metadata Terms, which doesn’t define dc:category, dc:keywords, dc:PublicationDate, dc:UpdateDate, or dc:frequencyofupdate.
dc:Format and dc:License only exist as lowercase variants, and dc:dataset only exists as an uppercase variant (by convention, lowercase terms refer to properties, and uppercase terms refer to classes).
In case you don’t use the DCMI Metadata Terms vocabulary, it’s a good practice to use a different prefix than dc, because it’s such a well-known vocabulary/prefix.
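A rough Turtle sketch of that advice (the dataset IRI and literal values are invented for illustration; dcterms and dcmitype are the standard DCMI namespaces):

```turtle
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .

<http://dcontology/dataset/1>
    a dcmitype:Dataset ;               # class membership via rdf:type ("a")
    dcterms:identifier "1" ;
    dcterms:title "Example dataset" ;
    dcterms:format "CSV" ;
    dcterms:license <http://example.org/license> .   # example IRI
```

Here dcterms:identifier, dcterms:title, dcterms:format, and dcterms:license are actual DCMI terms, and the class comes from the DCMI Type vocabulary rather than a made-up owl:class property.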

Simple way to model "inverse cardinality" in SHACL?

We want to transform a UML diagram of an ontology with cardinalities into a SHACL shape to validate if the cardinalities in our data are correct.
Let's say we have Author 1 ---first author---> 1..n Book; the right part is quite easy to model as:
:AuthorShape a sh:NodeShape;
sh:targetClass :Author;
sh:property [sh:path :firstAuthor; sh:minCount 1].
However, now I also want to model the "other end", i.e. that a book cannot have more than one first author:
:FirstAuthorCardinalityOtherEndShape a sh:NodeShape;
sh:targetObjectsOf :firstAuthor;
sh:property [
sh:path [ sh:inversePath :firstAuthor ];
sh:minCount 1;
sh:maxCount 1
];
sh:nodeKind sh:IRI.
However, that looks quite convoluted (8 lines instead of 3) and error-prone (:firstAuthor is mentioned twice). Is there a simpler way to model this?
For example, it could look like this, but sh:inverseMinCount doesn't exist:
:AuthorShape a sh:NodeShape;
sh:targetClass :Author;
sh:property [sh:path :firstAuthor; sh:minCount 1; sh:inverseMinCount 1; sh:inverseMaxCount 1].
The issue that :firstAuthor is mentioned twice can be avoided by attaching the property to Book, e.g.
:BookShape a sh:NodeShape ;
sh:targetClass :Book ;
sh:property [
sh:path [ sh:inversePath :firstAuthor ] ;
sh:maxCount 1 ;
] .
(You already have AuthorShape, so having BookShape would be a perfectly natural thing to do).
In any case you wouldn't need the sh:minCount 1, because sh:targetObjectsOf already implies it, although I can see why you would want it from a UML point of view.
And I don't think the design above is much more complex than the forward direction, assuming you're OK with the sh:inversePath overhead, which is unavoidable.

SPARQL construct/insert query and blank nodes

I'm trying to create a SPARQL query to construct or insert graphs following the BIBFRAME 2.0 model, using a personal database with a lot of data. I want to get a result like this:
Subject a bf:Topic, madsrdf:ComplexSubject ;
rdfs:label "Subject" ;
madsrdf:componentList [ a madsrdf:Topic ;
madsrdf:authoritativeLabel "FirstSubject" ] ;
But I do not know how to do it in SPARQL. I tried this query, but I always get a lot of blank nodes (as many as there are registers with empty "?Subject" fields in my database):
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#>
CONSTRUCT{
?subject a bf:Topic, madsrdf:ComplexSubject ;
rdfs:label ?subject;
madsrdf:componentList [ a madsrdf:Topic ;
madsrdf:authoritativeLabel ?firstsubject ];
} where{ service <http://localhost:.......> {
?registerRow a <urn:Row> ;
OPTIONAL{?registerRow <urn:col:Subject> ?subject ;}
OPTIONAL{?registerRow <urn:col:FirstSubject> ?firstsubject ;}
}
}
@Wences, AKSW answered you, please read more carefully.
You don't use ?registerRow in the CONSTRUCT part, which is why the blank node in the template is generated once for each result row.
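One way to sketch the fix (PREFIX declarations as in the question omitted): make the Subject column mandatory, so the blank node is only built for rows that actually have a subject:

```sparql
CONSTRUCT {
  ?subject a bf:Topic, madsrdf:ComplexSubject ;
      rdfs:label ?subject ;
      madsrdf:componentList [ a madsrdf:Topic ;
          madsrdf:authoritativeLabel ?firstsubject ] .
}
WHERE {
  SERVICE <http://localhost:.......> {
    ?registerRow a <urn:Row> ;
        <urn:col:Subject> ?subject .            # no longer OPTIONAL
    OPTIONAL { ?registerRow <urn:col:FirstSubject> ?firstsubject }
  }
}
```

Rows with an empty Subject column then produce no solution at all, so no stray blank nodes are constructed for them.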

How to express "all members of containers of class C must be of class M" in rdfs?

I have these triples (expressed in turtle):
:C rdfs:subClassOf rdfs:Container.
:M a rdfs:Class.
How do I specify that only instances of :M can be members of :C? I looked through this, but couldn't find the answer.
You can't express this with an RDFS ontology (that is, as an RDF graph interpreted according to the RDFS entailment regime). You can't express this with an OWL DL ontology (that is, an OWL ontology interpreted according to the OWL direct semantics). However, it can be expressed with OWL Full (that is, as an RDF graph interpreted according to the OWL RDF-based semantics). In Turtle:
:C rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty rdfs:member ;
    owl:allValuesFrom :M
] .
If you want to make it compatible with OWL DL, you must not use RDF containers, but you can make your own class of containers:
:Container a owl:Class .
:C rdfs:subClassOf :Container .
:M a owl:Class .
:member a owl:ObjectProperty .
:C rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty :member ;
    owl:allValuesFrom :M
] .

Sparql queries over collection and rdf:containers?

Hi all RDF/SPARQL developers. Here's a question that has been nagging me for a while now, but it seems nobody has answered it accurately since the RDF and SPARQL specifications were released.
To state the case: RDF defines several ways to deal with multi-valued properties for resources, from creating as many triples with the same subject-predicate URIs, to collections or containers. That's all good, since each pattern has its own characteristics.
But from the SPARQL point of view, it seems to me that querying those structures leads to overly complicated queries that (worse) cannot be transcribed into a sensible result set: you cannot use variables to query arbitrary-length structures, and property paths do not preserve "natural" order.
In a naïve way, in many SELECT or ASK queries where I want to query or filter on the container's or list's values, most of the time I won't care what the underlying pattern really is (if any). So for instance:
<rdf:Description rdf:about="urn:1">
<rdfs:label>
<rdf:Alt>
<rdf:li xml:lang="fr">Exemple n°1</rdf:li>
<rdf:li xml:lang="en">Example #1</rdf:li>
</rdf:Alt>
</rdfs:label>
<my:release>
<rdf:Seq>
<rdf:li>10.0</rdf:li>
<rdf:li>2.4</rdf:li>
<rdf:li>1.1.2</rdf:li>
<rdf:li>0.9</rdf:li>
</rdf:Seq>
</my:release>
</rdf:Description>
<rdf:Description rdf:about="urn:2">
<rdfs:label xml:lang="en">Example #2</rdfs:label>
</rdf:Description>
Obviously I would expect both resource to answer the query:
SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }
I would also expect the query :
SELECT ?ver WHERE { <urn:1> my:release ?ver }
to return the rdf:Seq elements (or any rdf:Alt's, for that matter) in original order (for the other patterns it wouldn't matter whether the original order is preserved, so why not keep it anyway?) - unless explicitly specified through an ORDER BY clause.
Of course, it would be necessary to preserve compatibility with the old way, so perhaps a possibility would be to extend the propertyPath syntax with a new operator?
I feel it would simplify a lot the day-to-day SPARQL use-case.
Does it make sense to you?
Moreover, do you see any reason why not to try implementing this?
EDIT: corrected the example's urn:2 rdfs:label value, which was incorrect.
I realize that this question already has an answer, but it's worth taking a look at what you can do here if you use RDF lists as opposed to the other types of RDF containers. First, the data that you've provided (after providing namespace declarations) in Turtle is:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my: <https://stackoverflow.com/q/16223095/1281433/> .
<urn:2> rdfs:label "Example #2"@en .
<urn:1> rdfs:label [ a rdf:Alt ;
rdf:_1 "Exemple n°1"@fr ;
rdf:_2 "Example #1"@en
] ;
my:release [ a rdf:Seq ;
rdf:_1 "10.0" ;
rdf:_2 "2.4" ;
rdf:_3 "1.1.2" ;
rdf:_4 "0.9"
] .
The properties rdf:_n are the difficulty here, since they are the only thing that provides any real order to the elements in the sequence. (The alt doesn't really have an important sequence, although it still uses rdf:_n properties.) You can get all three labels if you use a SPARQL property path that makes the rdf:_n property optional:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?x ?label where {
?x rdfs:label/(rdf:_1|rdf:_2|rdf:_3)* ?label
filter( isLiteral( ?label ))
}
------------------------------
| x | label |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en |
| <urn:2> | "Example #2"@en |
------------------------------
Let's look at what you can do with RDF lists instead. If you use lists, then your data is this:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my: <https://stackoverflow.com/q/16223095/1281433/> .
<urn:2> rdfs:label "Example #2"@en .
<urn:1> rdfs:label ( "Exemple n°1"@fr "Example #1"@en ) ;
my:release ( "10.0" "2.4" "1.1.2" "0.9" ) .
Now you can get the labels relatively easily:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?x ?label where {
?x rdfs:label/(rdf:rest*/rdf:first)* ?label
filter( isLiteral( ?label ))
}
------------------------------
| x | label |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en |
| <urn:2> | "Example #2"@en |
------------------------------
If you want the position of the labels in the list of labels, you can even get that, although it makes the query a bit more complicated:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?x ?label (count(?mid)-1 as ?position) where {
?x rdfs:label ?y .
?y rdf:rest* ?mid . ?mid rdf:rest*/rdf:first? ?label .
filter(isLiteral(?label))
}
group by ?x ?label
-----------------------------------------
| x | label | position |
=========================================
| <urn:1> | "Exemple n°1"@fr | 0 |
| <urn:1> | "Example #1"@en | 1 |
| <urn:2> | "Example #2"@en | 0 |
-----------------------------------------
This uses the technique in Is it possible to get the position of an element in an RDF Collection in SPARQL? to compute the position of each value in the list that is the object of rdfs:label, starting from 0, and assigning 0 to elements that aren't in a list.
RDF defines a vocabulary for collections and containers but they hold no special meaning in terms of how graphs containing them should be interpreted. They aren't intended for and aren't really appropriate for representing multi-valued properties.
In general, saying:
:A :predicate [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .
Is not equivalent to
:A :predicate :B , :C .
Let's say the predicate is owl:sameAs:
:A owl:sameAs [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .
The above says that :A names an individual containing :B and :C, whereas:
:A owl:sameAs :B , :C .
says that :A, :B, and :C are the same individual.
SPARQL is agnostic about containers and collections (aside from the syntactic shorthand for rdf:List). If you want a more convenient way of working with collections, many RDF APIs including Jena and rdflib have first-class representations for them.
Addendum
The way to model multi-valued properties--that is, to model that both "Exemple n°1"@fr and "Example #1"@en are labels for urn:1--is to simply state the two facts:
<rdf:Description rdf:about="urn:1">
<rdfs:label xml:lang="fr">Exemple n°1</rdfs:label>
<rdfs:label xml:lang="en">Example #1</rdfs:label>
...
</rdf:Description>
And the query:
SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }
will match on the English labels for <urn:1> and <urn:2>.
For the my:release property, where you have a multi-valued property and an ordering on its values, it's a little trickier. You could define a new property (e.g.) my:releases whose value is an rdf:List or rdf:Seq. my:release gives the direct relationship, and my:releases an indirect relationship specifying an explicit ordering. With an inferencing store and the appropriate rule, you would only have to provide the latter. Unfortunately this doesn't make it any easier to use the ordering within SPARQL.
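For instance, in Turtle (my:releases is the hypothetical property named above):

```turtle
<urn:1> my:release  "10.0" , "2.4" , "1.1.2" , "0.9" ;   # direct, unordered
        my:releases ( "10.0" "2.4" "1.1.2" "0.9" ) .     # ordered rdf:List
```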
An approach that's easier to work with in SPARQL and non-inferencing stores would be to make the versions themselves objects with properties that define the ordering:
<rdf:Description rdf:about="urn:1">
<rdfs:label xml:lang="fr">Exemple n°1</rdfs:label>
<rdfs:label xml:lang="en">Example #1</rdfs:label>
<my:release>
<my:Release>
<dc:issued rdf:datatype="&xsd;date">2008-10-10</dc:issued>
<my:version>10.0</my:version>
</my:Release>
</my:release>
<my:release>
<my:Release>
<my:version>2.4</my:version>
<dc:issued rdf:datatype="&xsd;date">2007-05-01</dc:issued>
</my:Release>
</my:release>
...
</rdf:Description>
In the above, the date can be used to order the results as there is no explicit sequence anymore. The query is only slightly more complex:
SELECT ?ver
WHERE { <urn:1> my:release [ my:version ?ver ; dc:issued ?date ] }
ORDER BY ?date