I am on a Red Hat system and I have multiple XML files, generated from various SOAP requests, in a format that is not compatible with MySQL's LOAD XML statement. I need to load the data into MySQL tables. One table will be set up for each type of XML file, depending on the data received via the SOAP XML API.
A sample of one of the files is shown below, but each file will have a different number of columns and different column names. I am trying to find a way to convert them to a compatible format in the most generic way possible, since otherwise I will have to create a customized solution for each API request/response.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
<DataObjects>
<ObjectSelect>
<mdNm>controller-ac</mdNm>
<meNm>WALL-EQPT-A</meNm>
</ObjectSelect>
<DataInstances>
<DataInstance>
<instanceId>DSS1</instanceId>
<Attribute>
<name>Name</name>
<value>DSS1</value>
</Attribute>
<Attribute>
<name>Operational Mode</name>
<value>mode-fast</value>
</Attribute>
<Attribute>
<name>Rate - Down</name>
<value>1099289</value>
</Attribute>
<Attribute>
<name>Rate - Up</name>
<value>1479899</value>
</Attribute>
</DataInstance>
<DataInstance>
<instanceId>DSS2</instanceId>
<Attribute>
<name>Name</name>
<value>DSS2</value>
</Attribute>
<Attribute>
<name>Operational Mode</name>
<value>mode-fast</value>
</Attribute>
<Attribute>
<name>Rate - Down</name>
<value>1299433</value>
</Attribute>
<Attribute>
<name>Rate - Up</name>
<value>1379823</value>
</Attribute>
</DataInstance>
</DataInstances>
</DataObjects>
</dbd:DataRetrievalRequestResponse>
</soap:Body>
</soap:Envelope>
Of course, I want the data to be entered into a MySQL table with column names such as 'id, Name, Group', with one row for each unique instance:
Name  Operational Mode  Rate - Down  Rate - Up
----  ----------------  -----------  ---------
DSS1  mode-fast         1099289      1479899
DSS2  mode-fast         1299433      1379823
Do I need to create an XSLT and preprocess this XML data from the command line before passing it to LOAD XML, to get it into a format that MySQL's LOAD XML statement will accept? This would not be a problem, but I am not familiar with XSLT transformations.
Is there a way to reformat the above XML to straight CSV (preferred), or to another XML format that is compatible, such as the examples given in the MySQL documentation for LOAD XML?
<row>
<field name='column1'>value1</field>
<field name='column2'>value2</field>
</row>
I tried doing LOAD DATA INFILE and using ExtractValue function, but some of the values have spaces in them, and the delimiter for ExtractValue is hard coded to single-space. This makes it unusable as a workaround.
Your question is very general (which is fine!) so my answer is also quite general.
Firstly, it's certainly true that XSLT is an ideal generic tool for problems of this sort. I have absolutely no doubt that every one of your SOAP messages could be coerced into a suitable form, using an XSLT that's customised for each type of message, while still remaining structurally very similar, which is what you'd want if you're new to XSLT.
I'm not sure how familiar you are with XPath, XML, XML namespaces, etc, but I think the task here is simple enough to tackle, and if you do have any tricky XPath expressions to write you can always come back to StackOverflow and ask for help.
From what you've said it sounds like you're confident that each SOAP message can be mapped to a single table. I'm going to suggest an XSLT pattern that would be customisable for each type of SOAP message, where you have an xsl:for-each statement that iterates over each row, and within that you create a row element and populate it with fields.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- indent the output, for ease of reading -->
<xsl:output indent="yes"/>
<!-- process the document -->
<xsl:template match="/">
<!-- create the root element of the output -->
<resultset>
<!-- create each row of the output, by iterating over the
repeating elements in the SOAP message -->
<xsl:for-each
select="//DataInstance">
<row>
<!-- create each field -->
<!-- This field is defined individually, and the value
is produced by evaluating the 'instanceId' xpath
relative to the current DataInstance -->
<field name="id"><xsl:value-of select="instanceId"/></field>
<!-- these fields can be generated with a loop -->
<xsl:for-each select="Attribute">
<field name="{name}"><xsl:value-of select="value"/></field>
</xsl:for-each>
</row>
</xsl:for-each>
</resultset>
</xsl:template>
</xsl:stylesheet>
Result of this, run over your sample SOAP message:
<resultset>
<row>
<field name="id">DSS1</field>
<field name="Name">DSS1</field>
<field name="Operational Mode">mode-fast</field>
<field name="Rate - Down">1099289</field>
<field name="Rate - Up">1479899</field>
</row>
<row>
<field name="id">DSS2</field>
<field name="Name">DSS2</field>
<field name="Operational Mode">mode-fast</field>
<field name="Rate - Down">1299433</field>
<field name="Rate - Up">1379823</field>
</row>
</resultset>
If you can follow this general pattern, you should be able to write a custom XSLT for every kind of SOAP message in your collection. You will just need to modify the various XPath expressions in the stylesheet:
//DataInstance means "every DataInstance element, anywhere in the document".
instanceId means "the instanceId element that's a child of the current ('context') element".
name means "the name element that's a child of the current element".
value means "the value element that's a child of the current element".
In the example SOAP message you gave, the Attribute element maps to a field, so all those elements could be copied generically, with another xsl:for-each, but for your other documents you may have to just define each field element individually, as I did for the id element in my answer.
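Since the question states a preference for CSV, here is a minimal sketch of the same row/field pattern written in Python with only the standard library, rather than in XSLT. It assumes every message follows the DataInstance/Attribute layout of the sample; the function name soap_to_csv and the shortened sample message are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened version of the sample message from the question (illustration only).
SOAP_SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
<DataObjects>
<DataInstances>
<DataInstance>
<instanceId>DSS1</instanceId>
<Attribute><name>Name</name><value>DSS1</value></Attribute>
<Attribute><name>Rate - Down</name><value>1099289</value></Attribute>
</DataInstance>
<DataInstance>
<instanceId>DSS2</instanceId>
<Attribute><name>Name</name><value>DSS2</value></Attribute>
<Attribute><name>Rate - Down</name><value>1299433</value></Attribute>
</DataInstance>
</DataInstances>
</DataObjects>
</dbd:DataRetrievalRequestResponse>
</soap:Body>
</soap:Envelope>"""


def soap_to_csv(xml_text):
    """Turn each DataInstance into one CSV row; hypothetical helper."""
    root = ET.fromstring(xml_text)
    columns = ["id"]  # column order: id first, then attributes as first seen
    rows = []
    for instance in root.iter("DataInstance"):
        row = {"id": instance.findtext("instanceId", default="")}
        for attribute in instance.findall("Attribute"):
            name = attribute.findtext("name")
            row[name] = attribute.findtext("value", default="")
            if name not in columns:
                columns.append(name)
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


csv_text = soap_to_csv(SOAP_SAMPLE)
```

The resulting text could then be written to a file and loaded with LOAD DATA INFILE. Column names are taken from the name elements, so the target table's columns would have to match them, or be mapped in the column list of the LOAD DATA statement.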
I'm working with a server.xml file...
Case 1:
<?xml version="1.0" encoding="UTF-8"?>
<Resource name="${app.name}" />
In catalina.properties I have declared app.name:
app.name=or
Case 2:
<?xml version="1.0" encoding="UTF-8"?>
<Resource name="or" />
The question is: why does case 2 work while case 1 does not?
Why are the XML entities not parsed in case 1?
I.e. the output is:
<Resource name= "or" /> //in case 1
<Resource name= "or" /> //in case 2
Key point: Entity expansion happens during XML parsing.
Case 1
In case 1, during parsing, there are no entities in Resource/@name, just ${app.name}. The program calling the XML parser would presumably go on to substitute the literal text from catalina.properties for the variable only after parsing has finished, so any entity references in that text are never expanded:
<Resource name="or" />
Downstream processing likely doesn't know how to deal with the unexpanded entity text, and you have your "not working" case.
Case 2
In case 2, the entity references exist in the XML file prior to parsing. After parsing, effectively, the program calling the XML parser sees the entities expanded:
<Resource name="or" />
and is able to "work" because it knows what to do when @name is "or".
Note that had catalina.properties been an XML file, the expansion would have occurred when that file was parsed, and you'd be back to your "working" case.
Solution
Options include one of the following:
Hardwire the entities in server.xml rather than in catalina.properties.
Force the property substitution to happen prior to XML parsing of server.xml.
Use Unicode characters directly (not encoded as XML entities) in your catalina.properties file.
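The difference between the two cases can be reproduced outside Tomcat. The sketch below uses Python's standard XML parser and a made-up substitute helper that mimics property substitution. It is not Tomcat's actual code, and the property value "&#111;&#114;" (character references for "or") is an assumed example of an entity-encoded value.

```python
import xml.etree.ElementTree as ET

# Assumed example: a property value written as XML character references.
PROPERTIES = {"app.name": "&#111;&#114;"}


def substitute(text, properties):
    """Replace ${key} placeholders with raw property text (hypothetical helper)."""
    for key, replacement in properties.items():
        text = text.replace("${" + key + "}", replacement)
    return text


# Case 1: parse first, substitute afterwards. The parser never sees the
# character references, so they stay as literal text.
case1 = ET.fromstring('<Resource name="${app.name}" />')
case1_name = substitute(case1.get("name"), PROPERTIES)

# Case 2: the references are in the XML itself, so the parser expands them.
case2 = ET.fromstring('<Resource name="&#111;&#114;" />')
case2_name = case2.get("name")

# Second option from the list: substitute before parsing, so the references
# are expanded by the parser just as in case 2.
early = ET.fromstring(substitute('<Resource name="${app.name}" />', PROPERTIES))
early_name = early.get("name")
```

Here case1_name keeps the raw reference text, while case2_name and early_name both come out as the plain string "or".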
I'm trying to fill the autocomplete field in Orbeon (version 2016.1) with suggestions which I receive as a JSON.
The JSON I get looks like:
{"status":"success","code":200,"data":{"streets":[{"name":"Street One","id":"1"},{"name":"Street Two","id":"2"},{"name":"Street Three","id":"3"}]}}
I know that the Resource URI should point to my web service (could that URI, or the arguments I need to send, be encoded?), but I don't know how the Items, Label and Value fields should be configured in this case (the label would be name from the json and value should point to the code from the json, of course).
I referred to https://doc.orbeon.com/xforms/submission-json.html but haven't quite managed to get what I'm after.
Can someone help?
Thanks in advance.
Masa
In particular, with your specific JSON, the corresponding XML will look as follows. In general, see the section Seeing the converted XML for how you can create a form in Form Builder that allows you to see what the converted XML is for any JSON.
<json type="object">
<status>success</status>
<code type="number">200</code>
<data type="object">
<streets type="array">
<_ type="object">
<name>Street One</name>
<id>1</id>
</_>
<_ type="object">
<name>Street Two</name>
<id>2</id>
</_>
<_ type="object">
<name>Street Three</name>
<id>3</id>
</_>
</streets>
</data>
</json>
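Given that converted XML, the repeated items would plausibly be selected by a path such as /json/data/streets/_, with name as the label and id as the value, both relative to each item. How these expressions map onto the Items, Label and Value settings in Form Builder is an assumption here, not something the converted XML confirms. The sketch below only checks the paths against the converted XML, using Python's standard library.

```python
import xml.etree.ElementTree as ET

# The converted XML from above, shortened to two streets.
CONVERTED = """<json type="object">
  <status>success</status>
  <code type="number">200</code>
  <data type="object">
    <streets type="array">
      <_ type="object"><name>Street One</name><id>1</id></_>
      <_ type="object"><name>Street Two</name><id>2</id></_>
    </streets>
  </data>
</json>"""

root = ET.fromstring(CONVERTED)

# Items: every "_" element under data/streets (relative to the root "json").
items = root.findall("data/streets/_")

# Label and value are then evaluated relative to each item.
suggestions = [(item.findtext("name"), item.findtext("id")) for item in items]
```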
I'm new to Solr and I'm trying to test its functionalities. I come from RDBMS world and was wondering how Solr would perform with my data.
I created a new core:
$ bin/solr create -c test
and successfully loaded a JSON file using:
$ bin/post -c test file.json
The first record of file.json looks like this:
{"attr":"01234"}
but Solr stores it as:
{"attr":1234}
I began defining a Data Import Handler following this tutorial (YouTube video) in order to store my data correctly, and found that JSON can't be processed by DIH. I'm stuck at the definition of data-config.xml: the tutorial handles XML files using the XPathEntityProcessor, but I can't find a JSON or even a CSV processor (I can easily retrieve a CSV version of file.json, so loading a CSV or a JSON is the same for me). The official documentation is a bit of a mess and doesn't provide many useful examples. The only processors that could plausibly handle JSON and CSV documents are LineEntityProcessor and PlainTextEntityProcessor (Official Documentation).
This other link from the Solr Wiki states:
Goals
...
Make it possible to plugin any kind of datasource (ftp,scp etc) and any other format of user choice (JSON,csv etc)
so I guess it is really possible, but HOW?
I found a similar question posted in 2014 that no one answered here, so I was wondering if in 2016, with the newer versions of Solr, there is a well-known solution to this problem.
So the question is: how to import JSON and CSV documents using a specific data schema?
UPDATE
Executing http://localhost:8983/solr/test/dihupdate?command=full-import doesn't trigger any error but doesn't load any document. Here are the various xml files located in the core directory:
solrconfig.xml
...
<schemaFactory class="ClassicIndexSchemaFactory" />
...
<requestHandler name="/dihupdate" class="org.apache.solr.handler.dataimport.DataImportHandler" startup="lazy">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
...
schema.xml
...
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false" />
<field name="attr1" type="string" indexed="true" stored="true" required="true" multiValued="true" />
<field name="date" type="date" indexed="true" stored="false" multiValued="true" />
<field name="attr2" type="string" indexed="true" stored="true" multiValued="true" />
<field name="attr3" type="string" indexed="true" stored="true" multiValued="true" />
<field name="attr4" type="int" indexed="false" stored="true" multiValued="true" />
<uniqueKey>id</uniqueKey>
...
data-config.xml
<dataConfig>
<dataSource type="FileDataSource" />
<document>
<entity name="f" processor="FileListEntityProcessor"
fileName="test.json"
rootEntity="false"
dataSource="null"
recursive="true"
baseDir="/path/to/data/"/>
</document>
</dataConfig>
In the Solr distribution, there is a films example (in example/films) that shows how to index JSON and takes advantage of both exact field definitions and type auto-detect. The instructions (README.txt) include the results you will see if you forget to do one of the steps as well.
I suggest you experiment with that and then apply that knowledge to your own use case.
Defining the schema can be done in two ways. The first is schema.xml in your conf directory: this is the traditional way of setting up the expected format for documents (Defining Fields). If you're using the "Managed Schema" mode, which is the current default, you'll have to switch to using the classic schema factory. You can then define the fields in your schema.xml by following the example schema, or any resource available on the web that describes how the schema.xml file is structured (you define a field type and then fields that use that field type).
The other option is the managed schema - this is the default in the most recent releases, and this schema is manipulated through the API that Solr offers. On startup it reads the initial schema from schema.xml (if present), but after that you'll have to modify it through the API or the Admin interface. This API is described (with examples) at the Schema API page in the Solr guide.
Using a StrField (which is what the field type string uses) to store 01234 would result in Solr storing just the literal value, 01234, without converting it to an integer. That's probably a good place to start.
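For the managed-schema route, a field can be declared as a string before any documents are posted, so the value is not type-guessed as a number. The sketch below builds such a Schema API request (an add-field command POSTed to the core's /schema endpoint) with Python's standard library. The host, the core name test and the field name attr are taken from the question, and the helper name build_add_field_request is made up. The request is only constructed, not sent, since sending it needs a running Solr core in managed-schema mode.

```python
import json
import urllib.request


def build_add_field_request(base_url, core, name, field_type="string"):
    """Build a Schema API 'add-field' request (hypothetical helper)."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,  # "string" is backed by StrField in the default configs
            "stored": True,
            "indexed": True,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_field_request("http://localhost:8983", "test", "attr")

# To actually apply it against a running Solr instance:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

With the classic schema factory, the equivalent step is to add the field definition to schema.xml by hand instead.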
What is currently the easiest, most automated way to import complex XML (received from an external API, with an .xsd schema) into a relational database, or any database? I understand there should be a (semi-)automatic way to do this for every database; I just haven't found it yet.
This also raises the question: why add the complication of XML for relational data? If API data comes from a relational database and, for most users of the API, ends up in one again, why isn't it simply transferred as rows? Table vs. XML / JSON / YAML: doesn't a table require less storage when the data is related, and isn't that more efficient than compression?
:)
You're looking for a universal tool. However the concept of XML is fundamentally incompatible with the concept of a (relational) database.
The universal tool you're looking for should cover at least three fundamental types of operation:
a) The XML is defined as a projection of a table/row/field concept
<xml>
<table name='myTable'>
<row id='1'>
<field name='myField1' type='string'>myValue1</field>
<field name='myField2' type='date'>01-01-1901</field>
<field name='myField3' type='number'>123456</field>
</row>
</table>
</xml>
b) The XML is to be stored in one XML field in one row of a table
Id Name Date XML
-- ---- ---------- -------------------------
1 MyEx 01-01-2001 <myObject>
<myAttribute name='class'>example</myAttribute>
</myObject>
c) The XML is a projection of a parent/child relationship in the database
<xml>
<order number='123'>
<customer id='1001'>myCustomer</customer>
<orderDate>01-01-2001</orderDate>
<address>wherever to go</address>
<orderDetails>
<orderProduct code='P01'>
<name>myProduct</name>
<amount>15</amount>
<listPrice>$14.00</listPrice>
</orderProduct>
</orderDetails>
</order>
</xml>
In each and every case the tool must specify whether you are allowed to import one such object or several, and the tool must be able to transform the presentation of values into acceptable storage formats.
None of this is impossible, but it is important to check which of these functionalities your selected tool supports.
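To make case (c) concrete, here is a minimal hand-written sketch that maps the order document onto a parent table and a child table. It uses Python's standard library with an in-memory SQLite database standing in for whatever database is actually targeted. The table names, column names and the load_orders helper are invented for the example, and a general tool would have to derive such a mapping from the .xsd instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened version of the order document from case (c).
ORDER_XML = """<xml>
  <order number='123'>
    <customer id='1001'>myCustomer</customer>
    <orderDate>01-01-2001</orderDate>
    <orderDetails>
      <orderProduct code='P01'>
        <name>myProduct</name>
        <amount>15</amount>
      </orderProduct>
    </orderDetails>
  </order>
</xml>"""


def load_orders(xml_text, conn):
    """Store each order as a parent row and each orderProduct as a child row."""
    conn.execute(
        "CREATE TABLE orders (number TEXT PRIMARY KEY, customer_id TEXT, order_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE order_lines (order_number TEXT REFERENCES orders(number), "
        "product_code TEXT, name TEXT, amount INTEGER)"
    )
    root = ET.fromstring(xml_text)
    for order in root.findall("order"):
        number = order.get("number")
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (number, order.find("customer").get("id"), order.findtext("orderDate")),
        )
        for product in order.findall("orderDetails/orderProduct"):
            # The value presentation is converted here: the amount text becomes an integer.
            conn.execute(
                "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                (number, product.get("code"), product.findtext("name"),
                 int(product.findtext("amount"))),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_orders(ORDER_XML, conn)
```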
I need to know how to link my XSL transformation to my database. I currently have it set up doing an HTML conversion, but I need to insert the data into a database. The XSLT is a single file used only for the conversion and is run in a PHP script. I saw a thread on this a few days ago but forgot to save it, and now I can't for the life of me find anything on it. The XML is a feed, not a file. This is what the XSL looks like:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<!-- named "id" so it is not shadowed by the per-room "ref" variable below -->
<xsl:variable name="id" select="lr_rates/hoel_ref"/>
<xsl:for-each select="//room">
<xsl:variable name="ref" select="ref"/>
<xsl:variable name="type" select="type_description"/>
<xsl:variable name="descr" select="description"/>
<xsl:variable name="avail" select="rooms_available"/>
<xsl:variable name="rate" select="rack_rate"/>
<xsl:variable name="cur" select="rate/requested_currency"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
then this is what the database request might look like
"DELETE FROM `pt_rooms` WHERE hotel_ref = '$id'";
"INSERT INTO `pt_rooms`(hotel_ref,room_ref,room_type,description,availability,price,currency) VALUES ('$id','$ref','$type','$descr','$avail','$rate','$cur')";
I think I would probably also need a mysql_connect statement, since it's not an application, just a single XSL in a PHP script. If anyone could link to a good explanation, that would help.
XSLT is an XML transformation language, meaning you convert an XML document into XML or any other text-based format you can come up with. It doesn't magically interact with your database; you need some kind of processing code that takes the result and executes the queries. Since you need this and mentioned PHP, you would be better off processing the XML in PHP in the first place and populating the database that way.
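The parse-then-query approach can be sketched as follows. The example is in Python with SQLite so that it is self-contained, but the same shape applies in PHP, for instance with SimpleXML and PDO prepared statements. The feed layout (an lr_rates root with a hotel_ref child and repeated room elements) is guessed from the XPath expressions in the stylesheet, and the sample values and the store_rooms helper are made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed feed layout, inferred from the stylesheet in the question.
FEED = """<lr_rates>
  <hotel_ref>H1</hotel_ref>
  <room>
    <ref>R1</ref>
    <type_description>Double</type_description>
    <description>Sea view</description>
    <rooms_available>3</rooms_available>
    <rack_rate>120.00</rack_rate>
    <rate><requested_currency>EUR</requested_currency></rate>
  </room>
</lr_rates>"""


def store_rooms(feed_xml, conn):
    """Replace all rooms of the feed's hotel with the rooms listed in the feed."""
    root = ET.fromstring(feed_xml)
    hotel_ref = root.findtext("hotel_ref")
    rows = [
        (
            hotel_ref,
            room.findtext("ref"),
            room.findtext("type_description"),
            room.findtext("description"),
            room.findtext("rooms_available"),
            room.findtext("rack_rate"),
            room.findtext("rate/requested_currency"),
        )
        for room in root.iter("room")
    ]
    with conn:  # one transaction: delete the old rows, insert the new ones
        conn.execute("DELETE FROM pt_rooms WHERE hotel_ref = ?", (hotel_ref,))
        conn.executemany(
            "INSERT INTO pt_rooms (hotel_ref, room_ref, room_type, description, "
            "availability, price, currency) VALUES (?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pt_rooms (hotel_ref TEXT, room_ref TEXT, room_type TEXT, "
    "description TEXT, availability TEXT, price TEXT, currency TEXT)"
)
inserted = store_rooms(FEED, conn)
```

Using placeholders instead of interpolating the values into the SQL string, as the queries in the question do, also avoids SQL injection from feed content.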