XML output from MySQL query - mysql

Is there any way to get the output of a MySQL query directly as XML?
I'm referring to something like MS SQL Server's SQL-XML feature, for example:
SELECT * FROM table WHERE 1 FOR XML AUTO
returns text (or, to be precise, the xml data type in MSSQL) containing an XML structure generated
according to the columns in the table.
With SQL-XML there is also an option of explicitly defining the output XML structure like this:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
cust_id AS [customer!2!cust_id],
region AS [customer!2!region]
FROM table
FOR XML EXPLICIT
which generates XML as follows:
<employee emp_id='129'>
<customer cust_id='107' region='Eastern'/>
</employee>
Do you have any clue how to achieve this in MySQL?
Thanks in advance for your answers.

The mysql command-line client can output XML directly, using the --xml option, which is available at least as far back as MySQL 4.1.
However, this doesn't allow you to customize the structure of the XML output. It will output something like this:
<?xml version="1.0"?>
<resultset statement="SELECT * FROM orders" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="emp_id">129</field>
<field name="cust_id">107</field>
<field name="region">Eastern</field>
</row>
</resultset>
And you want:
<?xml version="1.0"?>
<orders>
<employee emp_id="129">
<customer cust_id="107" region="Eastern"/>
</employee>
</orders>
The transformation can be done with XSLT using a script like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="resultset">
<orders>
<xsl:apply-templates/>
</orders>
</xsl:template>
<xsl:template match="row">
<employee emp_id="{field[@name='emp_id']}">
<customer
cust_id="{field[@name='cust_id']}"
region="{field[@name='region']}"/>
</employee>
</xsl:template>
</xsl:stylesheet>
This is obviously way more verbose than the concise MSSQL syntax, but on the other hand it is a lot more powerful and can do all sorts of things that wouldn't be possible in MSSQL.
If you use a command-line XSLT processor such as xsltproc or saxon, you can pipe the output of mysql directly into the XSLT program. For example:
mysql -e 'select * from table' -X database | xsltproc script.xsl -
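If no XSLT processor is at hand, the same reshaping can be sketched in Python with the standard library alone. The resultset string below is a hypothetical sample mirroring the --xml output shown above (minus the xsi namespace declaration, which does not affect element names):

```python
# Sketch: reshape `mysql --xml` resultset output into the custom
# <orders>/<employee>/<customer> structure using only the stdlib.
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the resultset format shown above.
RESULTSET = """<?xml version="1.0"?>
<resultset statement="SELECT * FROM orders">
  <row>
    <field name="emp_id">129</field>
    <field name="cust_id">107</field>
    <field name="region">Eastern</field>
  </row>
</resultset>"""

def transform(xml_text):
    root = ET.fromstring(xml_text)
    orders = ET.Element("orders")
    for row in root.findall("row"):
        # Map field name -> value for this row.
        fields = {f.get("name"): (f.text or "") for f in row.findall("field")}
        emp = ET.SubElement(orders, "employee", emp_id=fields["emp_id"])
        ET.SubElement(emp, "customer",
                      cust_id=fields["cust_id"], region=fields["region"])
    return ET.tostring(orders, encoding="unicode")

print(transform(RESULTSET))
```

Unlike the XSLT, this hard-codes the field names, so it is a per-query script rather than a general tool.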

The article Using XML with MySQL seems to be a good place to start; it covers several different ways to get from a MySQL query to XML.
From the article:
use strict;
use DBI;
use XML::Generator::DBI;
use XML::Handler::YAWriter;

my $dbh = DBI->connect ("DBI:mysql:test", "testuser", "testpass",
                        { RaiseError => 1, PrintError => 0 });
my $out = XML::Handler::YAWriter->new (AsFile => "-");
my $gen = XML::Generator::DBI->new (
    Handler => $out,
    dbh     => $dbh,
);
$gen->execute ("SELECT name, category FROM animal");
$dbh->disconnect ();

Do you have any clue how to achieve this in MySQL?
Yes: go the manual route and build the XML yourself with CONCAT strings. Try
SELECT concat('<orders><employee emp_id="', emp_id, '"><customer cust_id="', cust_id, '" region="', region, '"/></employee></orders>') FROM table
I took this from a 2009 answer, How to convert a MySQL DB to XML?, and it still seems to work. It is not very handy, and if you have large trees per item they will all end up in one concatenated value of the root item, but it works; see this test with dummy values:
SELECT concat('<orders><employee emp_id="', 1, '"><customer cust_id="', 2, '" region="', 3, '"/></employee></orders>') FROM DUAL
gives
<orders><employee emp_id="1"><customer cust_id="2" region="3"/></employee></orders>
With "manual coding" you can get to this structure.
<?xml version="1.0"?>
<orders>
<employee emp_id="1">
<customer cust_id="2" region="3" />
</employee>
</orders>
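Note that the raw CONCAT approach breaks as soon as a value contains &, < or quotes. A minimal Python sketch (my own addition, not part of the original answer) of building the same structure with proper attribute escaping:

```python
# Sketch: build the same <orders> structure with properly escaped
# attribute values. quoteattr adds the surrounding quotes and
# escapes &, <, > and embedded quote characters.
from xml.sax.saxutils import quoteattr

def employee_xml(emp_id, cust_id, region):
    return ('<orders><employee emp_id=%s>'
            '<customer cust_id=%s region=%s/>'
            '</employee></orders>'
            % (quoteattr(str(emp_id)),
               quoteattr(str(cust_id)),
               quoteattr(str(region))))

print(employee_xml(1, 2, "R&S"))  # region becomes "R&amp;S"
```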
I checked this with a larger tree per root item and it worked, but I had to run additional Python code on the output to remove the surplus opening and closing tags that are generated when an XML path contains intermediate nodes. I managed it with a backward-looking list combined with entries in a temporary set: drop the last x items from the list as soon as a new head item is found, plus a few similar tricks for nested branches. An object-oriented approach would be cleaner, but it worked.
I puzzled out a regex that finds the text between tags:
import re
from xml.sax.saxutils import escape

string = " <some tag><another tag>test string<another tag></some tag>"
pattern = r'(?:^\s*)?(?:(?:<[^\/]*?)>)?(.*?)?(?:(?:<\/[^>]*)>)?'
p = re.compile(pattern)
val = ''.join(p.findall(string))
val_escaped = escape(val)
if val_escaped != val:
    string = string.replace(val, val_escaped)
This regex helps you access the text between the tags. If you are allowed to use CDATA, it is easiest to use it everywhere. Just wrap the content in a CDATA section already in MySQL:
<Title><![CDATA[', t.title, ']]></Title>
Then you will not have any issues anymore, except for very odd characters such as U+001A, which you should replace already in MySQL. You no longer need to care about escaping and replacing the rest of the special characters at all. This worked for me on a million-line XML file with heavy use of special characters.
Still, you should validate the file against the required XML schema using Python's xmlschema module. It will alert you if you are not allowed to use the CDATA trick.
If you need fully UTF-8 formatted content without CDATA, which may often be the requirement, you can achieve it even in a million-line file by validating the XML output step by step against the target XSD schema. It is fiddly work, but it can be done with some patience.
Replacements are possible with:
- MySQL, using REPLACE()
- Python, using string.replace()
- Python, using a regex replace (though I did not need it in the end; it would look like re.sub(re.escape(val), 'xyz', i))
- string.encode(encoding='UTF-8', errors='strict')
Mind that encoding as UTF-8 is the most powerful step; it could even stand in for the three replacement methods above. Mind also that it turns the text into bytes: you then need to treat it as binary (b'...') and can therefore write it to a file only in binary mode ('wb').
At the end of it all, you can open the XML output in a normal browser such as Firefox for a final check and watch the XML at work, or check it in VS Code/VSCodium with an XML extension. In my case these checks were not strictly needed, since the xmlschema module reported everything very well. Note that VS Code/VSCodium can handle XML problems quite easily and still show a tree where Firefox cannot, so to see all XML errors you will want a validator as well as a browser.
Quite a large project was completed with this XML-building-in-MySQL approach; in the end it produced a triply nested XML tree with many repeating tags inside parent nodes, all generated from a two-dimensional MySQL result set.

Related

eXistdb JSON serialization

I have created an XQuery against http://www.w3schools.com/xsl/books.xml
xquery version "3.0";
for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
If I evaluate it in eXistdb's eXide, I get reasonable output in the preview pane.
<book title="Everyday Italian">
<author>Giada De Laurentiis</author>
etc.
If I try to "run" it, I get the following error in the web browser:
This page contains the following errors:
error on line 4 at column 1: Extra content at the end of the document
Below is a rendering of the page up to the first error.
Giada De Laurentiis
I thought maybe I should serialize it as JSON. Based on a quick reading of http://exist-db.org/exist/apps/wiki/blogs/eXist/JSONSerializer, I added the following two lines after the xquery version line:
declare namespace json="http://www.json.org";
declare option exist:serialize "method=json media-type=text/javascript";
But I get the same acceptable XML preview result and the same browser error.
How can I get my output in a web browser, either as XML or JSON?
I looked at https://stackoverflow.com/questions/35038779/json-serialization-with-exist-db-rest-api but didn't see how to use that as a starting point.
I'm glad you figured out that the original issue was that the browser expects well-formed XML, whereas eXide is happy to show you arbitrary nodes.
On the topic of JSON serialization, briefly (I'm on my phone), see http://exist-db.org/exist/apps/wiki/blogs/eXist/XQuery31 in the section entitled "Serialization". Make sure you're running eXist 3.0 RC1.
A top-level element and some additional curly braces are required:
xquery version "3.0";
declare namespace json="http://www.json.org";
declare option exist:serialize "method=json media-type=text/javascript";
<result>
{for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
}
</result>
Or, for well-formed XML serialization:
xquery version "3.0";
<result>
{for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
}
</result>
Credit: http://edutechwiki.unige.ch/en/XQuery_tutorial_-_basics

LOAD XML LOCAL INFILE ampersand issue

I want to import XML data which contains ampersands into MySQL.
The import fails to run after a row has been encountered with a raw ampersand (&). Admittedly this is not correct XML but that is what I am working with.
I have tried replacing the raw ampersands with &amp;, but this appears in the database as the literal text &amp; (not an ampersand character).
I have tried replacing the raw ampersands with \& - this stops the import routine from running further.
Can you suggest how I can get the raw ampersand into the database using LOAD XML LOCAL INFILE?
Sample raw XML follows:
<?xml version="1.0" ?>
<REPORT>
<CLA>
<PLOT>R&S</PLOT>
<VAL>100.10</VAL>
</CLA>
<CLA>
<PLOT>G&N</PLOT>
<VAL>200.20</VAL>
</CLA>
</REPORT>
Admittedly this is not correct xml but that is what I am working with.
No, it's not that it's incorrect XML. It is not XML at all, because it is not well-formed.
You have two ways forward:
Fix the data by treating it as text, to turn it into XML (replace the raw & with &amp;).
Load the data into the database using a non-XML data type.
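Since LOAD XML requires a well-formed file, the first option amounts to repairing the bare ampersands before loading. A small Python sketch (my own suggestion, not from the answer) that escapes only ampersands that do not already start an entity reference:

```python
import re

def fix_ampersands(text):
    # Replace any & that is not already the start of an entity
    # reference (&amp; &lt; &gt; &quot; &apos; or numeric &#...;).
    return re.sub(r'&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)',
                  '&amp;', text)

print(fix_ampersands('<PLOT>R&S</PLOT>'))  # <PLOT>R&amp;S</PLOT>
```

Run the file through this once, and LOAD XML LOCAL INFILE will see a well-formed document and store a plain ampersand in the column.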

Is it possible to define optional fields in a Smooks CSV reader

I want to read CSV entries with a Smooks CSV reader.
But my problem is how to define a field as optional.
<?xml version="1.0"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd" xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.2.xsd">
<csv:reader fields="field1,field2,OPTONAL_FIELD3,$ignore$+" />
</smooks-resource-list>
Is there any way to define such a configuration?
The data I want to read looks like this:
123,4,opt1
456,7
If field 3 is declared as a normal field in the configuration, the line must always end with a ',' like this:
456,7,
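If Smooks itself cannot mark trailing fields as optional, one workaround is to pad short rows to the full field count before handing the file to the reader. A sketch in Python (my own suggestion; this is preprocessing, not a Smooks feature):

```python
# Sketch: pad short CSV rows so every row has the full field count,
# making the "optional" third field present (but empty) on each line.
import csv
import io

EXPECTED_FIELDS = 3  # field1, field2, optional field3

def pad_rows(text):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for row in csv.reader(io.StringIO(text)):
        row += [''] * (EXPECTED_FIELDS - len(row))  # pad missing fields
        writer.writerow(row)
    return out.getvalue()

print(pad_rows('123,4,opt1\n456,7\n'))
```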

Easy way to convert xmlElements XML

I'm making an extension for InDesign in Flex. I need to convert xmlElements to XML. When you export a file as XML, you get a nicely structured XML file. But how can I easily convert xmlElements for use in my code?
You can call exportFile on an xmlElement. Once that is done, you should get a nicely structured piece of XML.
Here is a simple test :
app.activeDocument.xmlElements[0].xmlElements[-1].exportFile ( ExportFormat.XML , File ( Folder.desktop + "/test.xml" ) );
Then the output gives me :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Article>dssds</Article>
Loic

Intelligent RegEx in Perl?

Background
Consider the following input:
<Foo
Bar="bar"
Baz="1"
Bax="bax"
>
After processing, I need it to look like the following:
<Foo
Bar="bar"
Baz="1"
Bax="bax"
CustomAttribute="TRUE"
>
Implementation
This is all I need to do for no more than 5 files, so using anything other than a regular expression seems like overkill. Anyway, I came up with the following (Perl) regular expression to accomplish this:
$data =~ s/(<\s*Foo)(.*?)>/$1$2 CustomAttribute="TRUE">/sig;
Problems
This works well, however, there is one obvious problem. This sort of pattern is "dumb" because if CustomAttribute has already been added, the operation outlined above will simply append another CustomAttribute=... blindly.
A simple solution, of course, is to write a secondary expression that will attempt to match for CustomAttribute prior to running the replacement operation.
Questions
Since I'm rather new to the scripting language and regular expression worlds, I'm wondering whether it's possible to solve this problem without introducing any host language constructs (i.e., an if-statement in Perl), and simply use a more "intelligent" version of what I wrote above?
I won't beat you over the head with how you should not use a regex for this. I mean, you shouldn't, but you obviously know that from what you said in your question, so moving on...
Something that will accomplish what you're asking for is called a negative lookahead assertion (usually (?!...)), which basically says that you don't want the match to apply if the pattern inside the assertion is found ahead of this point. In your example, you don't want it to apply if CustomAttribute is already present, so:
$data =~ s/(<\s*Foo)(?![^>]*\bCustomAttribute=)(.*?)>/$1$2 CustomAttribute="TRUE">/sig;
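The same negative-lookahead idea carries over to other regex engines. Here is an illustrative Python version (the sample input is invented for the demo):

```python
import re

snippet = '<Foo Bar="bar" Baz="1">\n<Foo CustomAttribute="TRUE" Baz="2">'

# Add CustomAttribute only to <Foo ...> tags that lack it already;
# the lookahead rejects tags where it appears before the closing >.
result = re.sub(r'(<\s*Foo)(?![^>]*\bCustomAttribute=)([^>]*)>',
                r'\1\2 CustomAttribute="TRUE">',
                snippet)
print(result)
```

The first tag gains the attribute; the second, which already has it, is left alone.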
This sounds like it might be a job for XML::Twig, which can process XML and change parts of it as it runs into them, including adding attributes to tags. I suspect you'd spend as much time getting used to Twig and you would finding a regex solution that only mostly worked. And, at the end you'd know enough Twig to use it on the next project. :)
Time for a lecture I guess ;--)
I am not sure why you think using a full-blown XML processor is overkill. It is actually easier to write the code using the proper tool. A regexp will be more complex and will rely on unwritten assumptions about the data, which is dangerous. Some of those assumptions are likely to be: no '>' in attribute values, no CDATA sections, no non-ascii characters in tag or attribute names, consistent attribute value quoting...
The only thing a regexp will give you is the assurance that the output keeps the original format of the data (in your case, the fact that the attributes are each on a separate line). But if your format is consistent that can be done, and if not it should not matter, unless you keep your XML in a line-oriented revision control system.
Here is an example with XML::Twig. It assumes you have enough memory to keep any entire Foo element in memory, and it works even on the admittedly contrived bit of XML in the DATA section. It would probably be just as easy to do with XML::LibXML (read the XML in memory, select all Foo elements, add attribute to each of them, output, that's 5 easy to understand lines by my count).
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my( $tag, $att, $val)= ( 'Foo', 'CustomAttribute', 'TRUE');
XML::Twig->new( # only process those elements
twig_roots => { $tag => sub {
# add/set attribute
$_->set_att( $att => $val);
# output and free memory
$_->flush;
}
},
twig_print_outside_roots => 1, # output everything else
pretty_print => 'cvs', # seems to be the right format
)
->parse( \*DATA) # use parsefile( $file) if parsing... a file
->flush; # not needed in XML::Twig 3.33
__DATA__
<doc>
<Foo
Bar="bar"
Baz="1"
Bax="bax"
>
here is some text
</Foo>
<Foo CustomAttribute="TRUE"><Foo no_att="1"/></Foo>
<bar><![CDATA[<Foo no_att="1">tricked?</Foo>]]></bar>
<Foo><![CDATA[<Foo no_att="1" CustomAttribute="TRUE">tricked?</Foo>]]></Foo>
<Foo
Bar=">"
Baz="1"
Bax="bax"
></Foo>
<Foo
Bar="
>"
Baz="1"
Bax="bax"
></Foo>
<Foo
Bar=">"
Baz="1"
Bax="bax"
CustomAttribute="TRUE"
></Foo>
<Foo
Bar="
>"
Baz="1"
Bax="b
ax"
CustomAttribute="TR
UE"
></Foo>
</doc>
You can send your matches through a function with the 'e' modifier for more processing.
my $str = qq`
<Foo
Bar="bar"
Baz="1"
Bax="bax"
CustomAttribute="TRUE"
>
<Foo
Bar="bar"
Baz="1"
Bax="bax"
>
`;
sub foo {
my $guts = shift;
$guts .= qq` CustomAttribute="TRUE"` if $guts !~ m/CustomAttribute/;
return $guts;
}
$str =~ s/(<Foo )([^>]*)(>)/$1.foo($2).$3/xsge;
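For comparison, Python's re.sub accepts a callable replacement, which plays the same role as Perl's /e modifier. A rough equivalent of the approach above (sample input invented):

```python
import re

def add_attr(match):
    # Like the Perl foo() sub: append the attribute only if absent.
    guts = match.group(2)
    if 'CustomAttribute' not in guts:
        guts += ' CustomAttribute="TRUE"'
    return match.group(1) + guts + '>'

text = '<Foo Bar="bar" Baz="1">\n<Foo Baz="2" CustomAttribute="TRUE">'
# The callable is invoked once per match, like s///e in Perl.
print(re.sub(r'(<Foo )([^>]*)>', add_attr, text))
```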