eXistdb JSON serialization - json

I have created an XQuery against http://www.w3schools.com/xsl/books.xml
xquery version "3.0";
for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
If I evaluate it in eXistdb's eXide, I get reasonable output in the preview pane.
<book title="Everyday Italian">
<author>Giada De Laurentiis</author>
etc.
If I try to "run" it, I get the following error in the web browser:
This page contains the following errors:
error on line 4 at column 1: Extra content at the end of the document
Below is a rendering of the page up to the first error.
Giada De Laurentiis
I thought maybe I should serialize it as JSON. Based on a quick reading of http://exist-db.org/exist/apps/wiki/blogs/eXist/JSONSerializer, I added the following two lines after the xquery version line:
declare namespace json="http://www.json.org";
declare option exist:serialize "method=json media-type=text/javascript";
But I get the same acceptable XML preview result and the same browser error.
How can I get my output in a web browser, either as XML or JSON?
I looked at https://stackoverflow.com/questions/35038779/json-serialization-with-exist-db-rest-api but didn't see how to use that as a starting point.

I'm glad you figured out that the original issue was that the browser expects well-formed XML, whereas eXide is happy to show you arbitrary nodes.
On the topic of JSON serialization, briefly (I'm on my phone), see http://exist-db.org/exist/apps/wiki/blogs/eXist/XQuery31 in the section entitled "Serialization". Make sure you're running eXist 3.0 RC1.

A top level tag and some additional curly braces are required:
xquery version "3.0";
declare namespace json="http://www.json.org";
declare option exist:serialize "method=json media-type=text/javascript";
<result>
{for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
}
</result>
Or, for well-formed XML serialization:
xquery version "3.0";
<result>
{for $x in collection("/db/books")//book
return
<book title="{$x/title}">
{$x//author}
</book>
}
</result>
Credit: http://edutechwiki.unige.ch/en/XQuery_tutorial_-_basics

Related

Powershell removing escape chars on conversion

I have some automatically generated JSON files I need to modify using PowerShell. However, when I use ConvertFrom-Json I lose characters in some cases.
I tried using
ForEach-Object {
[System.Text.RegularExpressions.Regex]::Unescape($_)
}
to handle the escaped characters, but no luck.
An example of a string getting modified:
<?xml version=\"1.0\" encoding=\"UTF-16\"?><ExchangeRates>
is getting transformed to
<?xml version="1.0" encoding="UTF-16"?><ExchangeRates>
Losing the backslashes.
How would I get around this without transforming unintended parts of the file?
I redid the testing in a clean environment and found that something was enforcing a UTF8 encoding when I loaded the content into a JSON object in PowerShell. In this case that caused the characters to be converted into escape sequences, which in turn were replaced by nothing.
tl;dr: UTF8 encoding when doing the ConvertFrom-Json was causing the problem.
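As a quick illustration of why the backslashes vanish on any JSON parse, here is a Python sketch with a made-up payload; ConvertFrom-Json behaves the same way:
import json

# The file stores the XML declaration with JSON-escaped quotes:
raw = '{"payload": "<?xml version=\\"1.0\\" encoding=\\"UTF-16\\"?><ExchangeRates/>"}'
print(json.loads(raw)["payload"])
# <?xml version="1.0" encoding="UTF-16"?><ExchangeRates/>
# The backslashes are JSON escape characters, not part of the stored value,
# so any JSON deserializer removes them.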

What is the best value for "Unit Separator" in XML?

I used the Unit Separator character (US, 0x1F) in a database. When I export to an XML 1.0 file, it is not accepted and leaves the attribute with an empty value.
I have data in database like this:
"option1=10;option2=20;option3=aaa[US]bbb[US]ccc;"
I expect to export it to an XML 1.0 file like this:
<elementname attr1="option1=10;option2=20;option3=aaa[US]bbb[US]ccc;"/>
However, the [US] is not accepted by XML 1.0. Any suggestions?
I can replace '\37' (oct 37, hex 1F) with something like "XXX", "$", "(0x1f)"... before writing to XML, and replace it back when importing from XML and writing to the database. However, if I replace it with "&#x1F;", which is the HTML entity for Unit Separator, I end up with "&amp;#x1F;", which is definitely not what I wanted.
If I manually modify the XML file to contain "&#x1F;", I cannot use MSXML to load it; it gives the error "Invalid Unicode Character".
Any suggestions?
Thank you
Summary:
Let's make an analogy with how a compiler works: there are two phases, "Pre-compile" and "Compile".
XML file generation acts like the "Compile" phase, e.g. converting "<" to "&lt;".
However, the Unit Separator is not supported by XML 1.0, so the "Compile" phase will not convert it to the entity "&#x1F;".
So we have to seek a solution in the "Pre-compile" phase, which is our own application's responsibility.
When writing:
Option 1: <unit>aaa</unit><unit>bbb</unit>
Option 2: simply replace "\37" in the string with "_x241F_", provided "_x241F_" does not conflict with any existing token in the string.
When reading:
With Option 1: load the elements and concatenate them into a single string with "\37" as the separator.
With Option 2: simply replace "_x241F_" with "\37".
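For example, a minimal Python sketch of Option 2 (the "_x241F_" token is arbitrary, as noted above, and assumed not to occur in the real data):
US = "\x1f"        # the raw Unit Separator stored in the database
TOKEN = "_x241F_"  # placeholder that survives XML 1.0

def encode_for_xml(value):
    # "Pre-compile" step before handing the string to the XML writer
    return value.replace(US, TOKEN)

def decode_from_xml(value):
    # Reverse step after reading the attribute back from XML
    return value.replace(TOKEN, US)

print(encode_for_xml("aaa\x1fbbb\x1fccc"))  # aaa_x241F_bbb_x241F_ccc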
I've also found out that MSXML (even the highest version, msxml6.dll) will not load XML 1.1.
So if we are unfortunately using MSXML, we have to write our own "Pre-compile" code to handle these characters before feeding the "Compile" phase.
Note: I borrowed the idea of "_x241F_" from here.
Thanks for everyone's help
There is no HTML entity for U+001F UNIT SEPARATOR. Besides, HTML entities would be irrelevant when dealing with generic XML.
The character references would be &#31; and &#x1F;, in HTML and in XML, but the character is not allowed in HTML or in XML. For XML 1.0, which this seems to be about, please refer to section 2.2 Characters, where the normative definition is the following production (the associated comment is misleading, and comments are non-normative):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
The conclusions to be drawn depend on the meaning and purpose of UNIT SEPARATOR in the text. It has no generally defined meaning; it is up to applications to assign a meaning to it and process it accordingly.
Usually UNIT SEPARATOR is used to separate units of some kind, so the natural approach would be to process the incoming data so that instead of such separators, the data, when converted to XML format, has units denoted by markup. So for data like aaa[US]bbb[US]ccc where [US] is UNIT SEPARATOR, you would generate something like <unit>aaa</unit><unit>bbb</unit><unit>ccc</unit>.
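For example, a rough Python sketch of that conversion (the element names are purely illustrative):
from xml.etree.ElementTree import Element, SubElement, tostring

US = "\x1f"  # UNIT SEPARATOR

def units_to_xml(value):
    # Turn "aaa<US>bbb<US>ccc" into <units><unit>aaa</unit>...</units>
    root = Element("units")
    for part in value.split(US):
        SubElement(root, "unit").text = part
    return root

print(tostring(units_to_xml("aaa\x1fbbb\x1fccc"), encoding="unicode"))
# <units><unit>aaa</unit><unit>bbb</unit><unit>ccc</unit></units>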
This website
http://www.fileformat.info/info/unicode/char/1f/index.htm
suggests one of the following:
HTML Entity (decimal): &#31;
HTML Entity (hex): &#x1f;

How to embed HTML string syntax in CoffeeScript using VIM?

I have looked at how to embed HTML syntax in JavaScript strings from "HTML syntax highlighting in javascript strings in vim".
However, when I use CoffeeScript I cannot get the same thing working by editing the coffee.vim syntax file in a similar way. I got recursive errors saying that including html.vim makes the nesting too deep.
I have some HTML template in CoffeeScript like the following:
angular.module('m', [])
  .directive(
    'myDirective'
    [
      ->
        template: """
          <div>
            <div>This is <b>bold</b> text</div>
            <div><i>This should be italic.</i></div>
          </div>
        """
    ]
  )
How do I get the template HTML syntax in CoffeeScript string properly highlighted in VIM?
I would proceed as follows:
Find out the syntax groups that should be highlighted as pure HTML would be, and add HTML syntax highlighting to these groups.
To find the valid syntax group under the cursor you can follow the instructions here.
In your example the syntax group of interest is coffeeHereDoc.
To add html highlighting to this group execute the following commands
unlet b:current_syntax
syntax include @HTML syntax/html.vim
syn region HtmlEmbeddedInCoffeeScript start="" end=""
\ contains=@HTML containedin=coffeeHereDoc
Since Vim complains about recursion if you add these lines to coffee.vim, I would go with an autocommand:
function! Coffee_syntax()
if exists("b:current_syntax")
unlet b:current_syntax
endif
syn include @HTML syntax/html.vim
syn region HtmlEmbeddedInCoffeeScript start="" end="" contains=@HTML
\ containedin=coffeeHereDoc
endfunction
autocmd BufEnter *.coffee call Coffee_syntax()
I was also running into various issues while trying to get this to work. After some experimentation, here's what I came up with. Just create .vim/after/syntax/coffee.vim with the following contents:
unlet b:current_syntax
syntax include @HTML $VIMRUNTIME/syntax/html.vim
syntax region coffeeHtmlString matchgroup=coffeeHeredoc
\ start=+'''\(\_s*<\w\)\@=+ end=+\(\w>\_s*\)\@<='''+
\ contains=@HTML
syn sync minlines=300
The unlet b:current_syntax line disables the current syntax matching and lets the HTML syntax definition take over for matching regions.
Using an absolute path for the html.vim inclusion avoids the recursion problem (described more below).
The region definition matches heredoc strings that look like they contain HTML. Specifically, the start pattern looks for three single quotes followed by something that looks like the beginning of an HTML tag (there can be whitespace between the two), and the end pattern looks for the end of an HTML tag followed by three single quotes. Heredoc strings that don't look like they contain HTML are still matched using the coffeeHeredoc pattern. This works because this syntax file is being loaded after the syntax definitions from the coffeescript plugin, so we get a chance to make the more specific match (a heredoc containing HTML) before the more general match (the coffeeHeredoc region) happens.
The syn sync minlines=300 widens the matching region. My embedded HTML strings sometimes stretched over 50 lines, and Vim's syntax highlighter would get confused about how the string should be highlighted. For complete surety you could use syn sync fromstart, but for large files this could theoretically be slow (I didn't try it).
The recursion problem originally experienced by @heartbreaker was caused by the html.vim script that comes with the vim-coffeescript plugin (I'm assuming that was being used). That plugin's html.vim file includes its coffee.vim syntax file to add CoffeeScript highlighting to HTML files. Using a relative syntax include, a la
syntax include @HTML syntax/html.vim
you get all the syntax/html.vim files in VIM's runtime path, including the one from the coffeescript plugin (which includes coffee.vim, hence the recursion). Using an absolute path will restrict you to only getting the particular syntax file you specify, but this seems like a reasonable tradeoff since the HTML one would embed in a coffeescript string is likely fairly simple.

Error loading schemas from Eclipse - Mule IDE

I am attempting to create my first Mule server, but I get an error for any external schema I try to include.
My config file is as follows (working on Eclipse Indigo with a Mule standalone 3.2 installation):
<?xml version="1.0" encoding="UTF-8"?>
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:spring="http://www.springframework.org/schema/beans"
xmlns:http="http://www.mulesoft.org/schema/mule/http"
xmlns:vm="http://www.mulesoft.org/schema/mule/vm"
xmlns:quartz="http://www.mulesoft.org/schema/mule/quartz"
xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/3.2/mule.xsd
http://www.mulesoft.org/schema/mule/quartz http://www.mulesoft.org/schema/mule/quartz/current/mule-quartz.xsd
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/3.2/mule-http.xsd
http://www.mulesoft.org/schema/mule/vm http://www.mulesoft.org/schema/mule/vm/3.2/mule-vm.xsd">
<flow name="ChatListener">
<quartz:inbound-endpoint jobName="eventTimer" repeatInterval="2000">
<quartz:event-generator-job>
<quartz:payload>Poll Chat DB</quartz:payload>
</quartz:event-generator-job>
</quartz:inbound-endpoint>
<component>
<singleton-object class="com.TimeLineListener.ChatListener" />
</component>
<vm:outbound-endpoint path="ChatMsgs" exchange-pattern="one-way"/>
</flow>
<flow name="TimeLineMsgSender">
<composite-source>
<!-- Incoming Chat Msgs -->
<vm:inbound-endpoint path="ChatMsgs" exchange-pattern="one-way"/>
<!-- Incoming SIEM Msgs -->
<vm:inbound-endpoint path="SIEMMsgs" exchange-pattern="one-way"/>
<!-- Incoming NMS Msgs -->
<vm:inbound-endpoint path="NMSMsgs" exchange-pattern="one-way"/>
</composite-source>
<!-- Tested OutPut endpoint -->
<stdio:outbound-endpoint system="OUT"/>
</flow>
</mule>
and the errors I receive are:
1.
The prefix "stdio" for element "stdio:outbound-endpoint" is not bound. mule-config.xml ‪/ChatTester‬ line 41 XML Problem
2.
cvc-complex-type.2.4.a: Invalid content was found starting with element 'vm:inbound-endpoint'. One of '{"http://www.mulesoft.org/schema/mule/core":abstract-inbound-endpoint}' is expected. mule-config.xml ‪/ChatTester‬ line 31 XML Problem
3.
cvc-complex-type.2.4.a: Invalid content was found starting with element 'quartz:inbound-endpoint'. One of '{"http://www.mulesoft.org/schema/mule/core":description, "http://www.mulesoft.org/schema/mule/core":composite-source, "http://www.mulesoft.org/schema/mule/core":abstract-inbound-endpoint, "http://www.mulesoft.org/schema/mule/core":abstract-message-processor, "http://www.mulesoft.org/schema/mule/core":abstract-outbound-endpoint, "http://www.mulesoft.org/schema/mule/core":response}' is expected. mule-config.xml ‪/ChatTester‬ line 17 XML Problem
Any idea what I'm doing wrong?
1.
The prefix "stdio" for element "stdio:outbound-endpoint" is not bound. mule-config.xml ‪/ChatTester‬ line 41 XML Problem
This one is easy: you're missing the stdio namespace declaration.
2.
cvc-complex-type.2.4.a: Invalid content was found starting with element 'vm:inbound-endpoint'. One of '{"http://www.mulesoft.org/schema/mule/core":abstract-inbound-endpoint}' is expected. mule-config.xml ‪/ChatTester‬ line 31 XML Problem
3.
cvc-complex-type.2.4.a: Invalid content was found starting with element 'quartz:inbound-endpoint'. One of '{"http://www.mulesoft.org/schema/mule/core":description, "http://www.mulesoft.org/schema/mule/core":composite-source, "http://www.mulesoft.org/schema/mule/core":abstract-inbound-endpoint, "http://www.mulesoft.org/schema/mule/core":abstract-message-processor, "http://www.mulesoft.org/schema/mule/core":abstract-outbound-endpoint, "http://www.mulesoft.org/schema/mule/core":response}' is expected. mule-config.xml ‪/ChatTester‬ line 17 XML Problem
For these ones: I don't know. Maybe due to the mix of "current" and "3.2" you're using in the namespace locations? Try only with "3.2" instead of current to see if it helps.
Otherwise, nothing visibly crazy in your config :)
Actually, this is a problem with Eclipse and doesn't relate to your configuration. Hope this helps:
"Since Mule’s schemas are not split over multiple schema files, it’s safe to turn off this feature. In Eclipse’s preferences, go to XML > XML Files > Validation and clear the Honour all schema locations check box. Mule config files should now validate without errors again." - MuleSoft blog.
For more details: http://blogs.mulesoft.org/overcoming-xml-validation-errors-in-eclipse-35/

XML output from MySQL query

Is there any chance of getting the output from a MySQL query directly as XML?
I'm referring to something like MSSQL has with the SQL-XML plugin, for example:
SELECT * FROM table WHERE 1 FOR XML AUTO
returns text (or the xml data type in MSSQL, to be precise) which contains an XML markup structure generated according to the columns in the table.
With SQL-XML there is also an option of explicitly defining the output XML structure like this:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
cust_id AS [customer!2!cust_id],
region AS [customer!2!region]
FROM table
FOR XML EXPLICIT
which generates an XML code as follows:
<employee emp_id='129'>
<customer cust_id='107' region='Eastern'/>
</employee>
Do you have any clues how to achieve this in MySQL?
Thanks in advance for your answers.
The mysql command can output XML directly, using the --xml option, which is available at least as far back as MySql 4.1.
However, this doesn't allow you to customize the structure of the XML output. It will output something like this:
<?xml version="1.0"?>
<resultset statement="SELECT * FROM orders" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<field name="emp_id">129</field>
<field name="cust_id">107</field>
<field name="region">Eastern</field>
</row>
</resultset>
And you want:
<?xml version="1.0"?>
<orders>
<employee emp_id="129">
<customer cust_id="107" region="Eastern"/>
</employee>
</orders>
The transformation can be done with XSLT using a script like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="resultset">
<orders>
<xsl:apply-templates/>
</orders>
</xsl:template>
<xsl:template match="row">
<employee emp_id="{field[#name='emp_id']}">
<customer
cust_id="{field[#name='cust_id']}"
region="{field[#name='region']}"/>
</employee>
</xsl:template>
</xsl:stylesheet>
This is obviously way more verbose than the concise MSSQL syntax, but on the other hand it is a lot more powerful and can do all sorts of things that wouldn't be possible in MSSQL.
If you use a command-line XSLT processor such as xsltproc or saxon, you can pipe the output of mysql directly into the XSLT program. For example:
mysql -e 'select * from table' -X database | xsltproc script.xsl -
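If you would rather drive the transform from Python than from the shell, lxml can apply the same stylesheet; a sketch, assuming the stylesheet is saved as script.xsl and the mysql --xml output as resultset.xml:
from lxml import etree

# script.xsl is the stylesheet above; resultset.xml is the saved `mysql --xml` output.
transform = etree.XSLT(etree.parse("script.xsl"))
result = transform(etree.parse("resultset.xml"))
print(str(result))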
The article Using XML with MySQL seems to be a good place to start; it covers various ways to get from a MySQL query to XML.
From the article:
use strict;
use DBI;
use XML::Generator::DBI;
use XML::Handler::YAWriter;
my $dbh = DBI->connect ("DBI:mysql:test",
"testuser", "testpass",
{ RaiseError => 1, PrintError => 0});
my $out = XML::Handler::YAWriter->new (AsFile => "-");
my $gen = XML::Generator::DBI->new (
Handler => $out,
dbh => $dbh
);
$gen->execute ("SELECT name, category FROM animal");
$dbh->disconnect ();
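If Perl is not an option, a rough Python equivalent of the same idea might look like this (a sketch only; it assumes the pymysql package and the same test database and animal table as the Perl example):
import pymysql
import xml.etree.ElementTree as ET

# Connection details mirror the Perl example; adjust them to your environment.
conn = pymysql.connect(host="localhost", user="testuser",
                       password="testpass", database="test")
with conn.cursor() as cur:
    cur.execute("SELECT name, category FROM animal")
    columns = [col[0] for col in cur.description]
    root = ET.Element("resultset")
    for values in cur.fetchall():
        row = ET.SubElement(root, "row")
        for col, val in zip(columns, values):
            ET.SubElement(row, col).text = "" if val is None else str(val)
conn.close()

print(ET.tostring(root, encoding="unicode"))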
Do you have any clue how to achieve this in MySQL?
Yes: build the XML yourself by hand with CONCAT strings. Try
SELECT concat('<orders><employee emp_id="', emp_id, '"><customer cust_id="', cust_id, '" region="', region, '"/></employee></orders>') FROM table
I took this from a 2009 answer, How to convert a MySQL DB to XML?, and it still seems to work. It is not very handy, and if you have large trees per item they will all end up in one concatenated value of the root item, but it works; see this test with dummy values:
SELECT concat('<orders><employee emp_id="', 1, '"><customer cust_id="', 2, '" region="', 3, '"/></employee></orders>') FROM DUAL
gives
<orders><employee emp_id="1"><customer cust_id="2" region="3"/></employee></orders>
With "manual coding" you can get to this structure.
<?xml version="1.0"?>
<orders>
<employee emp_id="1">
<customer cust_id="2" region="3" />
</employee>
</orders>
I checked this with a larger tree per root item and it worked, but I had to run some additional Python code on the output to remove the extra opening and closing tags that are generated when there are middle-level nodes in an XML path. It can be done with backward-looking lists together with entries in a temporary set; I got it working, though an object-oriented approach would be more professional. I simply dropped the last x items from the list as soon as a new head item was found, plus some other tricks for nested branches. It worked.
I puzzled out a regex that finds the text between tags:
string = " <some tag><another tag>test string<another tag></some tag>"
pattern = r'(?:^\s*)?(?:(?:<[^\/]*?)>)?(.*?)?(?:(?:<\/[^>]*)>)?'
p = re.compile(pattern)
val = r''.join(p.findall(string))
val_escaped = escape(val)
if val_escaped != val:
string.replace(val, val_escaped)
This Regex helps you to access the text between the tags. If you are allowed to use CDATA, it is easiest to use that everywhere. Just make the content "CDATA" (character data) already in MySQL:
<Title><![CDATA[', t.title, ']]></Title>
And you will not have any issues anymore, except for very strange characters like (U+001A), which you should replace already in MySQL. You then do not need to care about escaping and replacing the rest of the special characters at all. This worked for me on an XML file of about a million lines with heavy use of special characters.
Still, you should validate the file against the required XML schema using Python's xmlschema module. It will alert you if you are not allowed to use that CDATA trick.
If you need fully UTF-8 formatted content without CDATA, which is often the task, you can achieve that even in a million-line file by validating the generated XML output step by step against the XML schema file (the .xsd you are aiming for). It is a bit of fiddly work, but it can be done with some patience.
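A minimal sketch of that validation step with the xmlschema module (the file names orders.xsd and orders.xml are assumptions):
import xmlschema

schema = xmlschema.XMLSchema("orders.xsd")   # the target XSD
if schema.is_valid("orders.xml"):            # quick boolean check
    print("orders.xml validates against orders.xsd")
else:
    schema.validate("orders.xml")            # raises an error describing the first problem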
Replacements are possible with:
MySQL using replace()
Python using string.replace()
Python using Regex replace (though I did not need it in the end, it would look like: re.sub(re.escape(val), 'xyz', i))
string.encode(encoding = 'UTF-8', errors = 'strict')
Mind that encoding as UTF-8 is the most powerful step; it can even stand in for the three other replacement approaches above. Mind also that it turns the text into bytes, so you then need to treat it as binary (b'...') and can therefore write it to a file only in binary mode using 'wb'.
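A small Python sketch tying those steps together (escape, encode, write in binary mode; the names are illustrative):
from xml.sax.saxutils import escape

value = 'aaa & <bbb> "ccc"'
xml_text = "<item>%s</item>" % escape(value)               # replaces &, < and >
data = xml_text.encode(encoding="UTF-8", errors="strict")  # bytes from here on
with open("out.xml", "wb") as f:                           # binary mode, as noted above
    f.write(data)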
At the end of it all, you can open the XML output in a normal browser like Firefox for a final check and watch the XML at work, or check it in vscode/codium with an XML extension. These checks are not strictly needed; in my case the xmlschema module reported everything very well. Mind also that vscode/codium handles XML problems quite easily and still shows a tree when Firefox cannot, so you will need a validator or a browser to see all the XML errors.
Quite a huge project could be done using this XML-building-with-MySQL approach; in the end there was a triply nested XML tree with many repeating tags inside parent nodes, all built from a two-dimensional MySQL result.