Do current CKAN instances in the field support the JSON-LD DCAT format? - json

I'm working on a graduate course project to develop a query client for CKAN and DCAT catalogs. I've read a lot of documentation and specs, yet a lot of things seem to still be proposals so I figured I needed to reach out to ask someone who knows.
The Project Open Data site discusses the DCAT format to be a JSON-LD based format with a particular schema. The schema makes sense but there is a lot of push in my class around targeting US federal government data from data.gov, which runs CKAN (as many of these data sharing systems do according to my research). Everywhere I'm looking, people are suggesting that CKAN supports DCAT, but I'm just not finding that.
For instance, http://catalog.data.gov/api/3/action/package_show?id=national-stock-number-extract shows a completely different JSON format. It appears to have values that could be used to translate to a JSON-LD DCAT object.
The following properties are in the DCAT schema, but most of the document doesn't conform. It just looks like this is something of a translation to JSON-LD DCAT.
{
key: "bureauCode",
value: [
"007:15"
]
},
{
key: "accrualPeriodicity",
value: "R/PT1S"
},
{
key: "spatial",
value: "National and International"
}
Then I came across this page which shows the expected format I'm looking for, but it says that it's a proposal. Is this still accurate? In the case of data.org, I can simply append .rdf to the end of a dataset URI (one of the features the proposal mentions) and it produces an RDF XML document using DCAT vocabulary. But the same data set accessed via the CKAN API doesn't provide the same functionality.
For instance.
http://catalog.data.gov/dataset/housing-affordability-data-system-hads -> page
http://catalog.data.gov/dataset/housing-affordability-data-system-hads.rdf -> rdf xml
http://catalog.data.gov/api/3/action/package_show?id=housing-affordability-data-system-hads -> CKAN's JSON format
http://catalog.data.gov/api/3/action/package_show?id=housing-affordability-data-system-hads.rdf -> NOT FOUND
So what is the deal exactly? I see that the plugin for DCAT is in development, but has it just not been finished and integrated into CKAN for production?

Support for DCAT is not part of CKAN core, there is however the ckanext-dcat extensions. It is currently still "work in progress", so it's not yet finished.
If you have specific needs that are not yet implemented, you might want to fork the repo and add those features.
I know that the Swedish portal Öppnadata.se uses the ckanext-sweden, which customizes ckanext-dcat to some extend.
The specification that you found really seems outdated, but I couldn't find anything better myself. And I guess it's also the basis for the ckanext-dcat extension.
All that said, this is not first-hand information. I will soon start developing a DCAT based catalogue, and actually tried to answer the questions you posed some time ago. My answer above reflects what I found out until now :)

I think you're mixing up a few things. DCAT is an RDF-vocabulary defined by W3C, this means it is standardised way to describe open data using RDF. RDF is a data model, which has different formats: rdf+xml, turtle, n3, json-ld,... This means I can represent the same information in both JSON or XML.
Like Odi mentioned, CKAN does not support DCAT out of the box, it needs to be installed as a plugin.
Coming to your question now. The api link you mentioned is just that, an api for CKAN. It has nothing to do with DCAT. The information revealed by the API is similar to DCAT, because they both describe the information of the datasets. The easiest way to find what is available by the CKAN instance is to look for a link in the html source of a dataset page.
Example taken from the online demo which links to the turtle DCAT feed: <link rel="alternate" type="text/ttl" href="http://demo.ckan.org/dataset/a83cf982-723f-4859-8c1c-0518f9fd1600.ttl"/>
JSON isn't a popular format for exposing DCAT, but you should be able to find RDF libraries that can read the other formats.

Related

What is the difference between MessagePack, Protobuf and JSON ? Can anyone tell me which one to use when

I need to understand the difference between
- message pack
- protocol buffers
- JSON
Without having jumped in deeply into the matter I'd say the following:
All three are data formats that help you serialize information in a structured form so you can easily exchange it between software components (for example client and server).
While I'm not too familiar with the other two, JSON is currently a quasi-standard due to the fact that it is practically built into JavaScript - it's not a coincidence it is called JavaScript Object Notation. The other two seem to require additional libraries on both ends to create the required format.
So when to use which? Use JSON for REST services, for example if you want to publish your API or need different clients to access it. JSON seems to have the broadest acceptance.

Can CouchDB natively convert plain text to json format?

I'm aware that there are python and powershell methods to convert plain text files, csv's etc.... into json format for upload into NoSQL DBs such as CouchDB.
But according to the CouchDB definitive guide, it makes it seems like there is a native built in way of doing this kind of conversion, without the need for a 3rd party tool.
This older thread appears to hint at this:
Filter and update functions in CouchDB?
This part in particular:
There are other design document functions that are being introduced at the >time of this writing, including _update and _filter that we aren’t covering in >depth here. Filter functions are covered in Chapter 20, Change Notifications. >Imagine a web service that POSTs an XML blob at a URL of your choosing when >particular events occur. PayPal’s instant payment notification is one of >these. With an _update handler, you can POST these directly in CouchDB and it >can parse the XML into a JSON document and save it. The same goes for CSV, >multi-part form, or any other format.
But when I dig deeper I don't find anything concrete.
The supporting wiki link is not clear to me (a beginner with json/NoSQL/curl stuff: http://wiki.apache.org/couchdb/Document_Update_Handlers
Hopefully this is a simple yes/no. And any links to help on this that is better than the above link also appreciated, thank you!
CouchDB supports transforming the internal documents/views into many other formats through the use of show and list functions. It's not a "native" transformation, as you define the transformation yourself, it's not magical.
That being said, there is not a similar mechanism for the reverse (ie: converting some arbitrary format into JSON documents) but you're much better off scripting that with a full-featured language/script and using the bulk docs API to do your imports in batch.

What alternatives are there for creating a REST-full web service API based on JSON?

We're creating a web service and we'd like 2 things:
- to be JSON based
- to be REST-full - how much so, we haven't decided
We've already implemented custom APIs but now we'd like to follow some standards, since at some point it gets a little crazy to remember all the rules, all the exceptions, and all the undocumented parts that the creator also forgot.
Are any of you using some standards that you've found useful? Or At least, what are some alternatives?
So far I know of jsonapi and HAL.
These don't seem to be good enough though, since what we'd optimaly like is to be able to:
+ define, expose and update entities and relations between them
+ define, expose and invoke operations
+ small numbers of requests are preferable, at least where it "makes sense" (i'll leave that as a blank check)
[EDIT]
Apparently, there's OData too: http://www.odata.org/
Are any of you using some standards that you've found useful? Or At least, what are some alternatives?
Between your own question and the comments most of the big names have been mentioned. I just like to also add JSON Hyper Schema:
"JSON Schema is a JSON based format for defining the structure of JSON data. This document specifies hyperlink- and hypermedia-related keywords of JSON Schema."
http://json-schema.org/latest/json-schema-hypermedia.html
It's an extension to JSON schema and fulfils a very similar role to the others mentioned above.
I've been using json-hal for a while and like it a lot, but I'm increasingly drawn to the JSON Schema family of schemas which also handle data model definition and validation. These schemas are also the basis of the excellent Swagger REST API standard:
http://swagger.io/specification/
Hope this helps.

What techniques are available for programatically transforming HTML/DOM in an iOS Application?

I'm processing a variety of RSS feeds, which contain summaries, as well as the target page URL content, and trying to use a uniform transformation method.
XSLT was the first thing that occurred to me to try, as it would accomplish what I want, in a standard way, without a lot of fuss aside from adding new XSLT stylesheets to accommodate uniquely formatted sites and feed content.
Problem: XSLT libraries are considered "private" in iOS, and even linking statically against your own copy will get you rejected by the Apple Store analysis tools.
I've looked into the possibility if injecting the stylesheet and data into a UIWebView that wasn't displayed, but this seems like a really roundabout and hackish way to get at the system's underlying XSLT processor in an "approved" fashion.
What alternative techniques/libraries exist which would let me do this in a standard fashion, ie: without rolling my own.
I'm not sure I fully understand your requirements, but one possbility would be to use libxml (which is allowed in iOS) to parse the XML and if necessary manipulate the DOM. If you really need to do XML transformations this is going to be more effort than XSLT, but if you just need to extract data from the XML, that can be done fairly easily with xpath queries.
That said, I have read several people claiming they got XSLT working on iOS and had their apps approved in the app store. In particular, I've seen this stackoverflow answer claimed as a working solution by multiple people. And if that fails, another answer suggested building the libxslt library yourself with renamed symbols to bypass the app store checks. I would only suggest that as a last resort though.
You'll probably want to look into Hpple for something powerful but light weight / native. See the tutorial on getting started here: http://www.raywenderlich.com/14172/how-to-parse-html-on-ios. Good luck!
I'm going to also recommend TFHpple but I'm also going to elaborate on the solution. I've explored an app that navigates a 3rd party (well, I'm the 3rd party, they're the source but that's semantics) website/data source but there are some pitfalls. The biggest pitfall is obvious: if the data source DOM changes you need to change your app and re-release. A creative way around this would be to publish/expose a global copy of the DOM on a public server that way the end user doesn't have to update their app any time the data source changes (as long as the change isn't radical).
For instance, if your expected DOM search in TFHpple is #"//figure[#class='figure']/a" and then a week from now your data source's resource you're looking for is altered to #"//figure1[#class='figure1']/a" you just opened yourself to an App Store release... UNLESS... you publish the expected DOM searches on a web server you control in a data dictionary that your app can consume and serve out to the various DOM search elements within your app. The only problem I foresee here is that if the data source adds or removes a data element you want to consume you either have to release a build or handle the removal ahead of time (respectively).
Lastly if the data source DOM isn't well formed or consistent you may be beating your head against a wall more times than not.

IDL for JSON REST/RPC interface

We are designing a fairly complex REST API, in which most of the I/O are JSON encoded objects with a specific structure. One challenge we have found is to document the API in such a way that makes it easier for clients to post correct input and process output. Because the data of both the input and output requires fairly complex JSON objects, client developers often introduce bugs related to the structure of the I/O objects.
With all of the JSON web API's these days, I would have hoped for a general solution, but I am having a hard time finding one. I looked into json-schema which is a json-validation schema but both the IETF draft and implementations seem to be fairly immature (even though they have been around for a while, which is not a good sign).
A slightly different approach is offered by Protocol Buffers and Apache Avro, where the schema is not used for validation, but actually required for the encoding/decoding of the message. Of these 2, Avro seems to have rather limited documentation and implementations. ProtoBuf seems better, but I am not sure if this is really suitable to use in the browser to call a JSON api?
Now I am starting to doubt if I am looking at this from the right angle. Are there other methods available to make my API a bit more strong-typed'ish? Or is a formal description of a JSON REST/RPC API something that defeats the purpose of using JSON?
Edit: 6 months after this topic we found mongoose, which is very close to what we were lookin for.
Below a reply I received by email from Douglas Crockford.
I am not a believer in schemas as an alternative to input validation.
There are properties that cannot be verified from the syntax. I think
that was one of the ways that XML went wrong.
If your formats are too complex, then I would look at simplifying
them.
Such systems exist and I'm the author of one of them. It is called Piqi-RPC and it does IDL-based validation of the input and output parameters for RPC-style APIs over HTTP.
It supports JSON, XML and Google Protocol Buffers as data representation formats for input and output of HTTP POST requests. Clients can choose to use any of the three formats and specify their choice using the standard Accept and Content-Type HTTP headers.
So, yes, in theory, you are looking in the right direction. However, at the moment, Piqi-RPC supports writing servers only in Erlang and it wouldn't be very useful for you if you use a different stack. I heard that Apache Thrift also supports JSON over HTTP transport, but I haven't checked. Another kind of similar system I know of (also for Erlang) is called UBF. I have heard of libraries for Java that can parse and validate JSON based on Protocol Buffers specification (e.g. http://code.google.com/p/protostuff/).
The idea itself is far from being new, but there aren't many systems that approach it in practice. It is a challenging problem.
Historically, IDLs were used for interface definition and binary data serialization and not so much for validating dynamic data interchange formats (e.g. XML and JSON) which emerged later. Sun-RPC IDL and CORBA IDL fall in the first category. WSDL would be one of few examples covering both areas, but it is a terrible piece of technology and it would be a bad choice for most modern systems. In addition, there are many schema languages (also known as DDLs -- data definition languages), most of which are highly specialized and work with only one representation format, e.g. XML or JSON schemas. Few of those have stable implementations.
The Piqi project and Piqi-RPC, which is based on it, are build around several fairly simple realizations:
DLL doesn't have to be explicitly tied to any particular data representation format or built around it. Instead, such language can be fairly universal and cover wide range of practical use-cases (e.g. cross-language data serialization and data validation) and data formats (e.g. JSON, XML, Protocol Buffers).
IDL for RPC-style communication can be implemented as a thin, mostly syntactic layer on top of the universal DDL.
Such IDL and interface specifications can be transport agnostic.
Speaking of REST-style APIs over HTTP compared to RPC-style APIs over HTTP.
With RPC-style APIs, service developer or an automated system have to validate three things: function name (according to some service naming scheme), input and, if you choose so, output.
In case of REST-style APIs, people get themselves in trouble for no good reason. Now, they have a lot more stuff to validate: arbitrarily complex URL syntax, including dynamic parameters encoded in URL segments (for all HTTP methods) and URL query string (only for HTTP GET method), HTTP method correspondence (whether it should be GET, POST, PUT, DELETE, etc.), HTTP body when some parameters go there (sometimes they do it manually twice for parameters represented in JSON and XML), custom HTTP headers, and separately -- service documentation. Imagine an IDL supporting all that!
XML is better for RESTful services in many ways. It has native linking (<link href=, for all those HATEOAS fans), native language support (lang="en") and a great ecosystem.
It is also better for future proofing and future API refactorings. Converting this:
<profile>
<username>alganet</username>
</profile>
To support more usernames:
<profile>
<username>alganet</username>
<username>alexandre</username>
</profile>
Is much more simpler to do without breaking existing clients using XML. JSON is hard on that.
If you really need JSON, JSON-Schema is the way to go. It's immature, but I don't know anything better on that case. Maybe your consumers could choose between XML and JSON, so they can choose between a small payload (JSON) or RESTful candies (XML) using Content Negotiation.
I'd say the answer to your last question is yes. If you need a way to constrain and document the JSON "schema", why didn't you go with XML in the first place? It is not that much harder to parse, and being able to enforce a schema for it is a great advantage.