Converting a CSV file to RDF using Tarql shows empty results

I'm using Tarql to convert a CSV file to RDF. The command runs correctly, but I can't find the output (nothing is shown in the Windows cmd line and no file is generated).
I'm running Tarql on Windows with the following command:
C:\tarql-master\target\appassembler\bin\tarql.bat --ntriples xx.rq xx.csv
Here is my query:
PREFIX dc: <http://dcontology/a#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?URI owl:class dc:dataset;
dc:identifier ?identifier;
dc:title ?title;
dc:description ?description;
dc:category ?category;
dc:keywords ?keywords;
dc:PublicationDate ?PublicationDate;
dc:UpdateDate ?UpdateDate;
dc:frequencyofupdate ?frequencyofupdate;
dc:Format ?Format;
dc:License ?license
}
FROM <file:Metabase.csv>
WHERE {
BIND (URI(CONCAT('http://dcontology/dataset/', ?identifier)) AS ?URI)
BIND (xsd:integer(?identifier) AS ?identifier)
BIND (xsd:string(?title) AS ?title)
BIND (xsd:string(?description) AS ?description)
BIND (xsd:string(?category) AS ?category)
BIND (xsd:string(?keywords) AS ?keywords)
BIND (xsd:string(?PublicationDate) AS ?PublicationDate)
BIND (xsd:string(?UpdateDate) AS ?UpdateDate)
BIND (xsd:string(?FrequencyOfUpdate) AS ?FrequencyOfUpdate)
BIND (xsd:string(?format) AS ?format)
BIND (xsd:string(?license) AS ?license)
}
and here is the CSV file header (posted as a screenshot; the column names are capitalized, e.g. "Identifier", "Title", "Description").

Case sensitivity
The matching of CSV column names and SPARQL variables is case-sensitive.
Since your CSV has, for example, "Description" as a column name, you need ?Description in the WHERE block of your SPARQL query.
PREFIX dc: <http://dcontology/a#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://owlontology/a#> # see bottom of this answer
CONSTRUCT {
?URI
owl:class dc:dataset ;
dc:identifier ?identifier_int ;
dc:title ?title_str ;
dc:description ?description_str ;
dc:category ?category_str ;
dc:keywords ?keywords_str ;
dc:PublicationDate ?publicationDate_str ;
dc:UpdateDate ?updateDate_str ;
dc:frequencyofupdate ?frequencyOfUpdate_str ;
dc:Format ?format_str ;
dc:License ?license_str .
}
#FROM <file:Metabase.csv>
WHERE {
BIND ( URI(CONCAT( 'http://dcontology/dataset/', ?Identifier )) AS ?URI ) .
BIND ( xsd:integer(?Identifier) AS ?identifier_int ) .
BIND ( xsd:string(?Title) AS ?title_str ) .
BIND ( xsd:string(?Description) AS ?description_str ) .
BIND ( xsd:string(?Category) AS ?category_str ) .
BIND ( xsd:string(?Keywords) AS ?keywords_str ) .
BIND ( xsd:string(?PublicationDate) AS ?publicationDate_str ) .
BIND ( xsd:string(?UpdateDate) AS ?updateDate_str ) .
BIND ( xsd:string(?FrequencyOfUpdate) AS ?frequencyOfUpdate_str ) .
BIND ( xsd:string(?Format) AS ?format_str ) .
BIND ( xsd:string(?License) AS ?license_str ) .
}
Error
If tarql doesn’t find anything to convert (e.g., because the correct column names are not included in the query), it gives no output instead of an error.
Your query, however, should give an error, because you don’t have the owl prefix defined:
Error parsing SPARQL query: Line 5, column 8: Unresolved prefixed name: owl:class
Vocabularies
If you meant to use the OWL ontology, note that there is no owl:class property defined. If you want to say that the ?URI entity belongs to a class, you can use the rdf:type property (or shorthand: a).
dc is the common prefix for DCMI Metadata Terms, which doesn’t define dc:category, dc:keywords, dc:PublicationDate, dc:UpdateDate, or dc:frequencyofupdate.
dc:Format and dc:License exist only as lowercase variants, and dc:dataset exists only as an uppercase variant (by convention, lowercase terms refer to properties and uppercase terms to classes).
In case you don’t use the DCMI Metadata Terms vocabulary, it’s a good practice to use a different prefix than dc, because it’s such a well-known vocabulary/prefix.
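For illustration, a minimal sketch of how the start of the CONSTRUCT template could look with a (the SPARQL/Turtle shorthand for rdf:type) and a project-specific prefix; myvoc: and myvoc:Dataset are made-up placeholders, not terms from any existing vocabulary:
PREFIX myvoc: <http://dcontology/a#>

CONSTRUCT {
  ?URI a myvoc:Dataset ;          # 'a' expands to rdf:type, no owl prefix needed
    myvoc:identifier ?identifier_int .
  # ... remaining properties as in the corrected query above ...
}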

Related

How to create a SHACL rule to infer rdf:type from rdfs:subClassOf

In order to validate my RDF graph against my SHACL validation shapes V, I want to infer some triples to keep my shapes simple. In particular, one of the rules I need to implement is (in pseudocode):
(?s, rdf:type, :X) <-- (?s, rdfs:subClassOf, :Y)
I tried several implementations, ending up with this triple rule (and variants of it):
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://example.com/ex#> .

:s
  a sh:NodeShape ;
  sh:targetClass rdfs:Resource ;
  sh:rule [
    a sh:TripleRule ;
    sh:subject sh:this ;
    sh:predicate rdf:type ;
    sh:object :X ;
    sh:condition [ sh:property [ sh:path rdfs:subClassOf ;
                                 sh:hasValue :Y ] ]
  ] .
However, the rule does not infer :A rdf:type :X . for the data graph
:A rdfs:subClassOf :Y .
(Executing against https://github.com/TopQuadrant/shacl.) It is possible to solve this with a SPARQL rule, so my question is whether there is a way to do it with a triple rule as well. Thanks for any hints!
Why not keep the inference rules and the validation separate, as you've noted is possible using SHACL + SPARQL? That will keep things simpler.
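For reference, a rough, untested sketch of that SHACL-SPARQL variant, following the SHACL-AF sh:SPARQLRule pattern with the questioner's prefixes (some processors additionally expect sh:prefixes declarations instead of inline PREFIX lines):
:s
  a sh:NodeShape ;
  sh:targetClass rdfs:Resource ;
  sh:rule [
    a sh:SPARQLRule ;
    sh:construct """
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX : <http://example.com/ex#>
      CONSTRUCT { $this rdf:type :X . }
      WHERE { $this rdfs:subClassOf :Y . }
    """
  ] .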
You could use pySHACL and put rules into an ontology file since pySHACL can run ontology rules/inference before applying SHACL validators (see the -i and -e options).
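For example, an invocation along these lines (the file names are placeholders):
pyshacl -s shapes.ttl -e ontology-with-rules.ttl -i rdfs data.ttl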
Given the "MAY" in the following quote, the advice in the previous answer by @NicholasCar is solid IMO.
The purpose of answering here is just to corroborate it and expand on it with recent experience.
The 2017 W3C SHACL docs regarding Relationship between SHACL and RDFS inferencing:
SHACL implementations MAY, but are not required to, support entailment
regimes. If a shapes graph contains any triple with the predicate
sh:entailment and object E and the SHACL processor does not support E
as an entailment regime for the given data graph then the processor
MUST signal a failure.
(AFAICT the phrase "entailment regime" only refers to SPARQL entailment regimes as a standard.)
Looking at the section on Property Paths:
SPARQL Property path: rdf:type/rdfs:subClassOf*
SHACL Property path: (rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ] )
In most of the SHACL implementations I've played with, basic RDFS type entailment works (obviously only if the rdf:type/rdfs:subClassOf* path is visible to the SHACL validator), so (rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ]) isn't needed explicitly.
The problem comes when you try to stuff advanced paths into the shapes - e.g. following this example to enforce that the graph contains at least one instance of an abstract type:
sh:path [ sh:inversePath ( rdf:type [ sh:zeroOrMorePath rdfs:subClassOf ] ) ] ;
... isn't working for me in a number of SHACL validation implementations.

LISP: how to properly encode a slash ("/") with cl-json?

I have code that uses the cl-json library to add a line, {"main": "build/electron.js"}, to a package.json file:
(let ((package-json-pathname (merge-pathnames *app-pathname* "package.json")))
  (let ((new-json
          (with-open-file (package-json package-json-pathname :direction :input :if-does-not-exist :error)
            (let ((decoded-package (json:decode-json package-json)))
              (let ((main-entry (assoc :main decoded-package)))
                (if (null main-entry)
                    (push '(:main . "build/electron.js") decoded-package)
                    (setf (cdr main-entry) "build/electron.js"))
                decoded-package)))))
    (with-open-file (package-json package-json-pathname :direction :output :if-exists :supersede)
      (json:encode-json new-json package-json))))
The code works, but the result has an escaped slash:
"main":"build\/electron.js"
I'm sure this is a simple thing, but no matter which inputs I try -- "//", "/", "#//" -- I still get the escaped slash.
How do I just get a normal slash in my output?
Also, I'm not sure if there's a trivial way to get pretty-printed output, or if I need to write a function that does this; right now the entire package.json file is printed on a single line.
Special characters
The JSON spec indicates that "Any character may be escaped.", but only some of them MUST be escaped: "quotation mark, reverse solidus, and the control characters". The linked section is followed by a grammar that shows "solidus" (/) in the list of escapable characters. I don't think this really matters in practice (typically it need not be escaped), but it may explain why the library escapes this character.
How to avoid escaping
cl-json relies on an internal list of escaped characters named +json-lisp-escaped-chars+, namely:
(defparameter +json-lisp-escaped-chars+
  '((#\" . #\")
    (#\\ . #\\)
    (#\/ . #\/)
    (#\b . #\Backspace)
    (#\f . #\Page)
    (#\n . #\Newline)
    (#\r . #\Return)
    (#\t . #\Tab)
    (#\u . (4 . 16)))
  "Mapping between JSON String escape sequences and Lisp chars.")
The symbol is not exported, but you can still refer to it externally with ::. You can dynamically rebind the parameter around the code that needs to use a different list of escaped characters; for example, you can do as follows:
(let ((cl-json::+json-lisp-escaped-chars+
        (remove #\/ cl-json::+json-lisp-escaped-chars+ :key #'car)))
  (cl-json:encode-json-plist '("x" "1/5")))
This prints:
{"x":"1/5"}

SPARQL construct/insert query and blank nodes

I'm trying to create a SPARQL query to construct or insert graphs, following the BIBFRAME 2.0 model, using a personal database with a lot of data. I want to get a result like this:
Subject a bf:Topic, madsrdf:ComplexSubject ;
  rdfs:label "Subject" ;
  madsrdf:componentList [ a madsrdf:Topic ;
    madsrdf:authoritativeLabel "FirstSubject" ] .
But I do not know how to do it in SPARQL. I tried this query, but I always get a lot of blank nodes (as many as there are rows with an empty "?Subject" field in my database):
PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#>
CONSTRUCT {
  ?subject a bf:Topic, madsrdf:ComplexSubject ;
    rdfs:label ?subject ;
    madsrdf:componentList [ a madsrdf:Topic ;
      madsrdf:authoritativeLabel ?firstsubject ]
}
WHERE {
  SERVICE <http://localhost:.......> {
    ?registerRow a <urn:Row> .
    OPTIONAL { ?registerRow <urn:col:Subject> ?subject }
    OPTIONAL { ?registerRow <urn:col:FirstSubject> ?firstsubject }
  }
}
@Wences, AKSW answered you; please read more carefully.
You don't use ?registerRow in the CONSTRUCT template, so the blank node is created once for each result row, including rows where ?subject is unbound.
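A minimal sketch of one way to avoid the floating blank nodes, assuming rows without a Subject value should simply be skipped: make the patterns required instead of OPTIONAL, so solutions with unbound variables never reach the CONSTRUCT template.
CONSTRUCT {
  ?subject a bf:Topic, madsrdf:ComplexSubject ;
    rdfs:label ?subject ;
    madsrdf:componentList [ a madsrdf:Topic ;
      madsrdf:authoritativeLabel ?firstsubject ]
}
WHERE {
  SERVICE <http://localhost:.......> {
    # required patterns: rows with an empty Subject column drop out
    ?registerRow a <urn:Row> ;
      <urn:col:Subject> ?subject ;
      <urn:col:FirstSubject> ?firstsubject .
  }
}
Alternatively, keep the OPTIONALs and add FILTER(bound(?subject)) before the closing brace of the WHERE clause.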

PostgreSQL JSON to columns error: "Character with value must be escaped"

I'm trying to load some data from a table containing JSON rows.
One field can contain special characters such as \t and \r, and I want to keep them as-is in the new table.
Here is my file:
{"text_sample": "this is a\tsimple test", "number_sample": 4}
Here is what I do:
drop table if exists temp_json;
drop table if exists test;
create temporary table temp_json (values text);
copy temp_json from '/path/to/file';
create table test as (
  select
    (values ->> 'text_sample') as text_sample,
    (values ->> 'number_sample') as number_sample
  from (
    select replace(values, '\', '\\')::json as values
    from temp_json
  ) a
);
I keep getting this error:
ERROR: invalid input syntax for type json
DETAIL: Character with value 0x09 must be escaped.
CONTEXT: JSON data, line 1: ...g] Objection to PDDRP Mediation (was Re: Call for...
How should I escape those characters?
Thanks a lot.
Copy the file as csv with a different quoting character and delimiter:
drop table if exists test;
create table test (values jsonb);
\copy test from '/path/to/file.csv' with (format csv, quote '|', delimiter ';');
select values ->> 'text_sample', values ->> 'number_sample'
from test;
       ?column?        | ?column?
-----------------------+----------
 this is a simple test | 4
As mentioned in Andrew Dunstan's PostgreSQL and Technical blog:
In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.
so we have to fall back to CSV format mode.
copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';
The official documentation (sql-copy) lists these parameters:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
where option can be one of:
FORMAT format_name
FREEZE [ boolean ]
DELIMITER 'delimiter_character'
NULL 'null_string'
HEADER [ boolean ]
QUOTE 'quote_character'
ESCAPE 'escape_character'
FORCE_QUOTE { ( column_name [, ...] ) | * }
FORCE_NOT_NULL ( column_name [, ...] )
FORCE_NULL ( column_name [, ...] )
ENCODING 'encoding_name'
FORMAT
Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
DELIMITER
Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
NULL
Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
HEADER
Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.
Cast the json to text, instead of extracting the text value from the json. E.g.:
t=# with j as (
select '{"text_sample": "this is a\tsimple test", "number_sample": 4}'::json v
)
select v->>'text_sample' your, (v->'text_sample')::text better
from j;
          your          |          better
------------------------+--------------------------
 this is a simple test  | "this is a\tsimple test"
(1 row)
And to avoid the 0x09 error, try using
replace(values, chr(9), '\t')
since in your example you replace backslash+t, not the actual chr(9).
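Applied to the original script, that fix would look roughly like this (a sketch; an embedded \r would need the same treatment via chr(13)):
create table test as (
  select
    (values ->> 'text_sample') as text_sample,
    (values ->> 'number_sample') as number_sample
  from (
    -- turn the raw tab back into the two-character JSON escape \t before parsing
    select replace(values, chr(9), '\t')::json as values
    from temp_json
  ) a
);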

How do I get yason:encode-alist to return the encoded string instead of sending it to a stream?

I'm trying to encode a JSON string from an alist using YASON. The problem is, the return value I'm getting is the original alist I fed it. It's printing the JSON string, and according to the documentation, it goes to *STANDARD-OUTPUT*.
Simple example session:
* (ql:quickload :yason)
To load "yason":
Load 1 ASDF system:
yason
; Loading "yason"
(:YASON)
* (defparameter starving-json-eater (yason:encode-alist '(("foo" . "bar") ("baz" . "qux"))))
{"foo":"bar","baz":"qux"}
STARVING-JSON-EATER
* starving-json-eater
(("foo" . "bar") ("baz" . "qux"))
I've tried passing 'starving-json-eater as the stream parameter, but I get an error:
* (setf starving-json-eater (yason:encode-alist '(("foo" . "bar") ("baz" . "qux")) 'starving-json-eater))
debugger invoked on a SIMPLE-ERROR in thread
#<THREAD "main thread" RUNNING {1001E06783}>:
There is no applicable method for the generic function
#<STANDARD-GENERIC-FUNCTION SB-GRAY:STREAM-WRITE-CHAR (1)>
when called with arguments
(STARVING-JSON-EATER #\{).
Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
0: [RETRY] Retry calling the generic function.
1: [ABORT] Exit debugger, returning to top level.
((:METHOD NO-APPLICABLE-METHOD (T)) #<STANDARD-GENERIC-FUNCTION SB-GRAY:STREAM-WRITE-CHAR (1)> STARVING-JSON-EATER #\{) [fast-method]
How can I get {"foo":"bar","baz":"qux"} into starving-json-eater?
You can use WITH-OUTPUT-TO-STRING to temporarily bind a variable to an open stream that writes into a string. You may even bind the special variable *standard-output*, so that you only change the dynamic context of your code without explicitly providing a different stream argument (like when you redirect streams with processes).
(with-output-to-string (*standard-output*)
  (yason:encode-alist '(("a" . "b"))))
Note that binding *standard-output* means that anything that writes to *standard-output* will end up being written in the string during the extent of with-output-to-string. In the above case, the scope is sufficiently limited to avoid unexpectedly capturing output from nested code. You could also use a lexical variable to control precisely who gets to write to the string:
(with-output-to-string (json)
  (yason:encode-alist '(("a" . "b")) json))
The trick is to create a throwaway string output stream to catch the value, and then grab the string from it later:
* (ql:quickload :yason)
To load "yason":
Load 1 ASDF system:
yason
; Loading "yason"
(:YASON)
* (defparameter sated-json-eater (make-string-output-stream))
SATED-JSON-EATER
* (yason:encode-alist '(("foo" . "bar") ("baz" . "qux")) sated-json-eater)
(("foo" . "bar") ("baz" . "qux"))
* (defparameter json-string (get-output-stream-string sated-json-eater))
JSON-STRING
* json-string
"{\"foo\":\"bar\",\"baz\":\"qux\"}"
This can be hidden away in a function:
(defun json-string-encode-alist (alist-to-encode)
  (let ((stream (make-string-output-stream)))
    (yason:encode-alist alist-to-encode stream)
    (get-output-stream-string stream)))
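Calling it at the REPL then returns the encoded string directly:
* (json-string-encode-alist '(("foo" . "bar") ("baz" . "qux")))
"{\"foo\":\"bar\",\"baz\":\"qux\"}"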