TTL file format - I have no idea what this is - json

I have a file which has a structure, but I don't know what format it is, nor how to parse it. The file extension is ttl, but I have never encountered this before.
Some lines from the file looks like this:
<http://data.europa.eu/esco/label/790ff9ed-c43b-435c-b6b3-6a4a6e8e8326>
a skosxl:Label ;
skosxl:literalForm "gérer des opérations d’allègement"#fr .
<http://data.europa.eu/esco/label/98570af6-b237-4cdd-b555-98fe3de26ef8>
a skosxl:Label ;
esco:hasLabelRole <http://data.europa.eu/esco/label-role/neutral> , <http://data.europa.eu/esco/label-role/male> , <http://data.europa.eu/esco/label-role/female> ;
skosxl:literalForm "particleboard machine technician"#en .
<http://data.europa.eu/esco/label/aaac5531-fc8d-40d5-bfb8-fc9ba741ac21>
a skosxl:Label ;
esco:hasLabelRole "http://data.europa.eu/esco/label-role/female" , "http://data.europa.eu/esco/label-role/standard-female" ;
skosxl:literalForm "pracovnice denní péče o děti"#cs .
And it goes on like this for 400 more MB. Additional attributes are added, for some, but not all nodes.
It reminds me of some form of XML, but I don't have much experience working with different formats. It also looks like something that can be modeles as a graph.
Do you have any idea what data format it is, and how I could parse it in python?

Yes, #Phil is correct that is turtle syntax for storing RDF data.
I would suggest you import it into an RDF store of some sort rather than try and parse 400MB+ yourself. You can use GraphDB, Blazegraph, Virtuso and the list goes on. A search for RDF stores should give many other options.
Then you can use SPARQL to query the RDF store (which is like SQL for relational databases) using Python RDFlib. Here is an example from RDFLib.

That looks like turtle - a data description language for the semantic web.
The :has label and :label are specified for two different semantic libraries defined to share data (esco and skosxl there should not be much problem finding these libraries with a search engine, assuming the data is in the semantic web) . :literal form could be thought of as the value in an XML tag.
They represent ontologies in a data structure:
Subject : 10
Predicate : Name
Object : John
As for python, read the data as a file, use the subject as the keys of a dictionary, put the values in a database, its unclear what you want to do with the data.
Semantic data is open, incomplete and could have an unusual, complex structure. The example above is very simple the primer linked above may help.

Related

What's the difference between json data being encoded or not

What's the purpose (not what it becomes) of doing json_encode on this before I am putting into the database
rating: {cleanliness: 3, publicFacility: 1, roomFacility: 2, security: 2}
to become this
rating: "{"cleanliness":3,"publicFacility":1,"roomFacility":2,"security":2}"
I see no point of doing this cause I need to json_decode it again before serving it back... can anybody clear me out?
Do not store json encoded data in the database. You mitigate the whole point of a relational database this way and make searching for values an expensive task. I see in your sample the attributes cleanliness, publicFacility, roomFacility and security. Those should be columns in your database so you can search for something like "all entries with a cleanliness higher than 3".
It works with the JSON column type but it is more expensive than using normal columns.
Edit: Check the use-case for your database entry. If you are sure you never need to search in or order by the encoded attributes you can store data encoded as json string. However, if your database supports the JSON column type, you should use that one because it allows searching in the stored JSON (but is more expensive than searching in normal columns). </Edit>
Second point: The second code snipped (with the quotation marks) looks like invalid syntax for json.

JSON definition - data structure or data format?

What is more correct? Say that JSON is a data structure or a data format?
It's almost certainly ambiguous and depends on your interpration of the words.
The way I see it:
A datastructure, in conventional Comp Sci / Programming i.e. array, queue, binary tree usually has a specific purpose. Json can be pretty much be used for anything, hence why it's a data-format. But both definitions make sense
In my opinion both is correct.
JSON is also a data format (.json) and also a data strucuture which you can use for instance in Java etc.
But more correct is data strucutre.
JSON (canonically pronounced /ˈdʒeɪsən/ jay-sən;[1] sometimes
JavaScript Object Notation) is an open-standard format that uses
human-readable text to transmit data objects consisting of
attribute–value pairs. It is the most common data format used for
asynchronous browser/server communication (AJAJ), largely replacing
XML which is used by AJAX.
Source: Wikipedia

Is it possible to escape & &apos; present in database?

The data retrieved from database has & or &apos;. How do I escape and show as & or ' without using gsub method?
If you can't stop the data from being inserted like that, then there is code here to create a function in MySQL that you can use in your query in order to return the decoded data.
Or from within Ruby, not using a replace strategy, take a look at how-do-i-encode-decode-html-entities-in-ruby.
First of all, an escape-sequence is found in string-analysis only, not in html or XML where you talk of masquerading. You can escape a string for reasons of concatenation for example. Html-Entities are specific entities which are replaced in urns to masquerade a special character. It is absolutely wrong to save strings still containing html-entities in a db-table. The masked string has to be demasked first, after you "reget" it from post :). Otherwise you try to save html-entities in a special table, eg. for programming reasons. A text-file should do better - try dBase 2 - or simply google the web for a page with an entity-listing.
The second point is that XML is - for the realization of better reading of your own code (in general), thought to be a personally defined markup-language. That is why any non-std-tags within that specification, have to be defined by your own. (It was strange to read about regular entities as "XML-entities", like in the case of "&apos(;)", explained on this entity-page: http://www.madore.org/~david/computers/unicode/htmlent.html)
Std-XML-tags (not entities) are mainly important in aspects of finalizing your html-code to better fit to ongoing programming languages later on, but in my opinion the mentioned ones are still html-entities!
This can and should be performed on the view level, ie, the front-end, since its an HTML entity.
assuming you use jquery, you can do this to make &apos; appear as ' on the HTML.
$('<div/>').html(''').text()
You can find respective entity values in the link above

Is there a Go Language equivalent to Perls' Dumper() method in Data::Dumper?

I've looked at the very similarly titled post (Is there a C equivalent to Perls' Dumper() method in Data::Dumper?), regarding a C equivalent to Data::Dumper::Dumper();. I have a similar question for the Go language.
I'm a Perl Zealot by trade, and am a progamming hobbyist, and make use of Data::Dumper and similar offspring literally hundreds of times a day. I've taken up learning Go, because it looks like a fun and interesting language, something that will get me out of the Perl rut I'm in, while opening my eyes to new ways of doing stuffz... One of the things I really want is something like:
fmt.Println(dump.Dumper(decoded_json))
to see the resulting data structure, like Data::Dumper would turn the JSON into an Array of Hashes. Seeing this in Go, will help me to understand how to construct and work with the data. Something like this would be considered a major lightbulb moment in my learning of Go.
Contrary to the statements made in the C counterpart post, I believe we can write this, and since I'll be passing Dumper to Println, after compilation what ever JSON string or XML page I pass in and decode. I should be able to see the result of the decoding, in a Dumper like state... So, does any more know of anything like this that exists? or maybe some pointers to getting something like this done?
Hi and welcome to go I'm former perl hacker myself.
As to your question the encoding/json package is probably the closest you will find to a go data pretty printer. I'm not sure you really need it though. One of the reasons Data::Dumper was awesome in perl is because many times you really didn't know the structure of the data you were consuming without visually inspecting it. With go though everything is a specific type and every specific type has a specific structure. If you want to know what the data will look like then you probably just need to look at it's definition.
Some other tools you should look at include:
fmt.Println("%#v", data) will print the data in go-syntax form.
fmt.Println("%T", data) will print the data's type in go-syntax
form.
More fmt format string options are documented here: http://golang.org/pkg/fmt/
I found a couple packages to help visualize data in Go.
My personal favourite - https://github.com/davecgh/go-spew
There's also - https://github.com/tonnerre/golang-pretty
I'm not familiar with Perl and Dumper, but from what I understand of your post and the related C post (and the very name of the function!), it outputs the content of the data structure.
You can do this using the %v verb of the fmt package. I assume your JSON data is decoded into a struct or a map. Using fmt.Printf("%v", json_obj) will output the values, while %+v will add field names (for a struct - no difference if its a map, %v will output both keys and values), and %#v will output type information too.

Which word do you use to describe a JSON-like object?

I have a naming issue.
If I read an object x from some JSON I can call my variable xJson (or some variation). However sometimes it is possible that the data could have come from a number of different sources amongst which JSON is not special (e.g. XMLRPC, programmatically constructed from Maps,Lists & primitives ... etc).
In this situation what is a good name for the variable? The best I have come up with so far is something like 'DynamicData', which is ok in some situations, but is a bit long and not probably not very clear to people unfamiliar with the convention.
SerializedData?
A hierarchical collection of hashes and lists of data is often referred to as a document no matter what serialization format is used. Another useful description might be payload or body in the sense of a message body for transmission or a value string written to a key/value store.
I tend to call the object hierarchy a "doc" myself, and the serialized format a "document." Thus a RequestDocument is parsed into a RequestDoc, and upon further identification it might become an OrderDoc, or a CustomerUpdateDoc, etc. An InvoiceDoc might become known generically as a ResponseDoc eventually serialized to a ResponseDocument.
The longer form is awkward, but such serialized strings are typically short-lived and localized in the code anyway.
If your data is the model, name it after the model it's representing. e.g., name it after the purpose of the contents, not the format of the contents. So if it's a list of customer information, name it "customers", or "customerModel", or something like that.
If you don't know what the contents are, the name isn't important, unless you want to differentiate the format. "responseData", "jsonResponse", etc...
And "DynamicData" isn't a long name, unless there is absolutely nothing descriptive to be said about the data. "data" might be just fine.