convert/Import specific XML + XSD into a database automatically? - mysql

what is today the easiest, most automated way to import complex XML (external from an API including an .xsd scheme) into a relational or any database? - I understand there should be a (semi)automatic way to import this for every database, i just yet didnt find it? *
This also comes from the question why use XML complification for relational data? Why isnt API data that comes from a relational database and shall end up in one at most users of the API usually also transfered in rows? Table VS xml / json / yaml - table requires less storage if data is any related? more efficient than compression
:)

You're looking for a universal tool. However the concept of XML is fundamentally incompatible with the concept of a (relational) database.
The universal tool you're looking for should at least cover three fundamtal type of operations:
a) The XML is defined as a projection of a table/row/field concept
<xml>
<table name='myTable'>
<row id='1'>
<field name='myField1' type='string'>myValue1</field>
<field name='myField2' type='date'>01-01-1901</field>
<field name='myField3' type='number'>123456</field>
</row>
</table>
</xml>
b) The XML is to be stored in one XML field in one row of a table
Id Name Date XML
-- ---- ---------- -------------------------
1 MyEx 01-01-2001 <myObject>
<myAttribute name='class'>example</myAttribute>
</myObject>
c) The XML is a projection of a parent/child relationship in the database
<xml>
<order number='123'>
<customer id='1001'>myCustomer</customer>
<orderDate>01-01-2001</orderDate>
<address>wherever to go</address>
<orderDetails>
<orderProduct code='P01'>
<name>myProduct</name>
<amount>15</amount>
<listPrice>$14.00</listPrice>
</orderProduct>
</orderDetails>
</order>
</xml>
In each and every case the tool must specify if you are allowed to import one of such objects or more, and the tool must be able to transform the presenation of values to acceptable storage formats.
All of this is not impossible, but important to check which of these functionalities your selected tool will support.

Related

TTL file format - I have no idea what this is

I have a file which has a structure, but I don't know what format it is, nor how to parse it. The file extension is ttl, but I have never encountered this before.
Some lines from the file looks like this:
<http://data.europa.eu/esco/label/790ff9ed-c43b-435c-b6b3-6a4a6e8e8326>
a skosxl:Label ;
skosxl:literalForm "gérer des opérations d’allègement"#fr .
<http://data.europa.eu/esco/label/98570af6-b237-4cdd-b555-98fe3de26ef8>
a skosxl:Label ;
esco:hasLabelRole <http://data.europa.eu/esco/label-role/neutral> , <http://data.europa.eu/esco/label-role/male> , <http://data.europa.eu/esco/label-role/female> ;
skosxl:literalForm "particleboard machine technician"#en .
<http://data.europa.eu/esco/label/aaac5531-fc8d-40d5-bfb8-fc9ba741ac21>
a skosxl:Label ;
esco:hasLabelRole "http://data.europa.eu/esco/label-role/female" , "http://data.europa.eu/esco/label-role/standard-female" ;
skosxl:literalForm "pracovnice denní péče o děti"#cs .
And it goes on like this for 400 more MB. Additional attributes are added, for some, but not all nodes.
It reminds me of some form of XML, but I don't have much experience working with different formats. It also looks like something that can be modeles as a graph.
Do you have any idea what data format it is, and how I could parse it in python?
Yes, #Phil is correct that is turtle syntax for storing RDF data.
I would suggest you import it into an RDF store of some sort rather than try and parse 400MB+ yourself. You can use GraphDB, Blazegraph, Virtuso and the list goes on. A search for RDF stores should give many other options.
Then you can use SPARQL to query the RDF store (which is like SQL for relational databases) using Python RDFlib. Here is an example from RDFLib.
That looks like turtle - a data description language for the semantic web.
The :has label and :label are specified for two different semantic libraries defined to share data (esco and skosxl there should not be much problem finding these libraries with a search engine, assuming the data is in the semantic web) . :literal form could be thought of as the value in an XML tag.
They represent ontologies in a data structure:
Subject : 10
Predicate : Name
Object : John
As for python, read the data as a file, use the subject as the keys of a dictionary, put the values in a database, its unclear what you want to do with the data.
Semantic data is open, incomplete and could have an unusual, complex structure. The example above is very simple the primer linked above may help.

JSON definition - data structure or data format?

What is more correct? Say that JSON is a data structure or a data format?
It's almost certainly ambiguous and depends on your interpration of the words.
The way I see it:
A datastructure, in conventional Comp Sci / Programming i.e. array, queue, binary tree usually has a specific purpose. Json can be pretty much be used for anything, hence why it's a data-format. But both definitions make sense
In my opinion both is correct.
JSON is also a data format (.json) and also a data strucuture which you can use for instance in Java etc.
But more correct is data strucutre.
JSON (canonically pronounced /ˈdʒeɪsən/ jay-sən;[1] sometimes
JavaScript Object Notation) is an open-standard format that uses
human-readable text to transmit data objects consisting of
attribute–value pairs. It is the most common data format used for
asynchronous browser/server communication (AJAJ), largely replacing
XML which is used by AJAX.
Source: Wikipedia

Generate XML from XSD repeatable elements

Whats the tool to generate xml from xsd and the generated xml should contain more than one entries for the repeatable elements? I tried out tools that are available on eclipse and some online tools like xml-generator, but none of these work. They all generate only one entry for the repeatable elements.
Note: I want to convert the generated xml to json, but the xml-json convertor treats the repeatable elements in the xml as an array only if it has more than one entry.
Generating XMLs from XSD can be quite challenging, if only because what people expect to see may not be possible to be captured by an XSD.
QTAssistant (I am associated with it) has quite extensive features when it comes to sample XML creation.
The simplest (and dumbest) one (available by right clicking on the element in the graphical XSD visualizer) is still able to create two elements if the associated maxOccurs is greater than one.
However, the XML may be off: just because one may have named a field dateTime, it doesn't mean the generated text node will be a valid date time value, if the schema defined it as a string. The tool will also only create one of the choices (if your schema uses xsd:choice), etc.
QTAssistant can make use of additional metadata which gives the user ultimate control over the generated samples. It can even create thousands of XMLs by doing combinations captured using metadata items. (You should contact us on the support site if you're interested in these scenarios).
Regarding XML to JSON conversion, QTAssistant can also correctly convert XMLs to JSON for repeating fields. Given this XML:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<fundamo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:common="http://www.myorg.com/xsd/gen_fin">
<response>
<common:code>code1</common:code>
<common:description>description1</common:description>
</response>
<transaction>
<transactionRef>transactionRef1</transactionRef>
<dateTime>dateTime1</dateTime>
<userName>userName1</userName>
</transaction>
</fundamo>
The corresponding JSON is:
{
"response": {
"code": "code1",
"description": "description1"
},
"transaction": [
{
"transactionRef": "transactionRef1",
"dateTime": "dateTime1",
"userName": "userName1"
}
]
}
You may notice that transaction is an array, even if XML has only one of those elements. This conversion works for valid XMLs, as long as you have a defined XSD for all its content. For the past 2+ years we've been calling it "XSD-aware JSON conversion". It is also possible to define casing conversion strategies (e.g. change upper case to lower case since XML elements tend to be upper case, while JSON "people" prefer them lower case).
If you're in for a free tool or to write your own, I am sure you can use the free evaluation as a source of inspiration to address only the specific features you're interested in.

How to combine multiple MySQL databases using D2RQ?

I have four different MySQL databases that I need to convert into Linked Data and then run queries on the aggregated data. I have generated the D2RQ maps separately and then manually copied them together into a single file. I have read up some material on customizing the maps but am finding it hard to do so in my case because:
The ontology classes do not correspond to table names. In fact, most classes are column headers.
When I open the combined mapping in Protege, it generates only 3 classes (ClassMap, Database, and PropertyBridge) and lists all the column headers as instances of these.
If I import this file into my ontology, everything becomes annotation.
Please suggest an efficient way to generate a single graph that is formed by mapping these databases to my ontology.
Here is an example. I am using the EEM ontology to refine the mapping file generated by D2RQ. This is a section from the mapping file:
map:scan_event_scanDate a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:scan_event;
d2rq:property vocab:scan_event_scanDate;
d2rq:propertyDefinitionLabel "scan_event scanDate";
d2rq:column "scan_event.scanDate";
# Manually added
d2rq:datatype xsd:int;
.
map:scan_event_scanTime a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:scan_event;
d2rq:property vocab:scan_event_scanTime;
d2rq:propertyDefinitionLabel "scan_event scanTime";
d2rq:column "scan_event.scanTime";
# Manually added
d2rq:datatype xsd:time;
The ontology I am interested in has the following:
Data property: eventOccurredAt
Domain: EPCISevent
Range: datetime
Now, how should I modify the mapping file so that the date and time are two different relationships?
I think the best way to generate a single graph of your 4 databases is to convert them one by one to a Jena Model using D2RQ, and then use the Union method to create a global model.
For your D2RQ mapping file, you should read carefully The mapping language, it's not normal to have classes corresponding to columns.
If you give an example of your table structure, I can give you an illustration of a mapping file.
Good luck

MFL - Message Format Language

What is MFL? Why is it used? Does it actually convert data from one encoding to another?
According to Oracle:
A Message Format Language (MFL)
document is a specialized XML document
used to describe the layout of binary
data. It is an Oracle proprietary
language used to define rules to
transform formatted binary data into
XML data.