XML character entities not parsing - html

I'm working with a server.xml file...
Case 1:
<?xml version="1.0" encoding="UTF-8"?>
<Resource name="${app.name}" />
In catalina.properties I have declared app.name:
app.name=&#111;&#114;
Case 2:
<?xml version="1.0" encoding="UTF-8"?>
<Resource name="&#111;&#114;" />
Why does case 2 work while case 1 does not?
Why are the XML entities not parsed in case 1?
That is, the output is:
<Resource name= "&#111;&#114;" /> //in case 1
<Resource name= "or" /> //in case 2

Key point: Entity expansion happens during XML parsing.
Case 1
In case 1, during parsing, there are no entities in Resource/@name, just ${app.name}. The program calling the XML parser would presumably go on to substitute the literal text from catalina.properties, that is, the unexpanded character references (such as &#111;&#114; for "or"), for the variable:
<Resource name="&#111;&#114;" />
Downstream processing likely doesn't know how to deal with the unexpanded references, and you have your "not working" case.
Case 2
In case 2, the character references exist in the XML file prior to parsing. After parsing, the program calling the XML parser effectively sees the entities expanded:
<Resource name="or" />
and is able to "work" because it knows what to do when @name is "or".
Note that had catalina.properties been an XML file, the expansion would have occurred when that file was parsed, and you'd be back to your "working" case.
Solution
Options include one of the following:
Hardwire the entities in server.xml rather than in catalina.properties.
Force the property substitution to happen prior to XML parsing of server.xml.
Use Unicode characters directly (not encoded as XML entities) in your catalina.properties file.
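The difference between the two cases can be reproduced outside Tomcat. This is only a minimal Python sketch, not Tomcat's actual code: the properties dict stands in for catalina.properties, the value &#111;&#114; is an assumed example of character references for "or", and the substitute helper is a hypothetical stand-in for the real property substitution.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for catalina.properties; the value is assumed to be
# the character references for "or".
properties = {"app.name": "&#111;&#114;"}
template = '<Resource name="${app.name}" />'


def substitute(text):
    """Naive ${key} substitution, standing in for the real property lookup."""
    for key, value in properties.items():
        text = text.replace("${" + key + "}", value)
    return text


# Case 1: parse first, substitute afterwards. The parser never sees the
# references, so they stay as literal text.
after_parse = substitute(ET.fromstring(template).get("name"))

# Case 2: the references are in the XML before parsing, so the parser
# expands them.
hardwired = ET.fromstring('<Resource name="&#111;&#114;" />').get("name")

# Second option from the list above: substitute before parsing.
before_parse = ET.fromstring(substitute(template)).get("name")

print(repr(after_parse), repr(hardwired), repr(before_parse))
```

Only the first value keeps the raw references; the other two come out of the parser already expanded.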

Related

How do I reformat XML to work with MySQL LoadXML

I am on a Red Hat system and I have multiple XML files, generated from various SOAP requests, in a format that is not compatible with MySQL's LoadXML function. I need to load the data into MySQL tables. One table will be set up for each type of XML file, depending on the data received via the SOAP XML API.
A sample of one of the files is shown below, but each file will have a different number of columns and different column names. I am trying to find a way to convert them to a compatible format in the most generic way possible, since otherwise I will have to create a customized solution for each API request/response.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
      <DataObjects>
        <ObjectSelect>
          <mdNm>controller-ac</mdNm>
          <meNm>WALL-EQPT-A</meNm>
        </ObjectSelect>
        <DataInstances>
          <DataInstance>
            <instanceId>DSS1</instanceId>
            <Attribute>
              <name>Name</name>
              <value>DSS1</value>
            </Attribute>
            <Attribute>
              <name>Operational Mode</name>
              <value>mode-fast</value>
            </Attribute>
            <Attribute>
              <name>Rate - Down</name>
              <value>1099289</value>
            </Attribute>
            <Attribute>
              <name>Rate - Up</name>
              <value>1479899</value>
            </Attribute>
          </DataInstance>
          <DataInstance>
            <instanceId>DSS2</instanceId>
            <Attribute>
              <name>Name</name>
              <value>DSS2</value>
            </Attribute>
            <Attribute>
              <name>Operational Mode</name>
              <value>mode-fast</value>
            </Attribute>
            <Attribute>
              <name>Rate - Down</name>
              <value>1299433</value>
            </Attribute>
            <Attribute>
              <name>Rate - Up</name>
              <value>1379823</value>
            </Attribute>
          </DataInstance>
        </DataInstances>
      </DataObjects>
    </dbd:DataRetrievalRequestResponse>
  </soap:Body>
</soap:Envelope>
Of course I want the data to be entered into a MySQL table with column names 'id, Name, Group', with one row for each unique instance:
Name   Operational Mode   Rate - Down   Rate - Up
DSS1   mode-fast          1099289       1479899
DSS2   mode-fast          1299433       1379823
Do I need to create an XSLT and preprocess this XML data from the command line before passing it to LoadXML, to get it into a format that MySQL's LoadXML function will accept? This would not be a problem, but I am not familiar with XSLT transformations.
Is there a way to reformat the above XML to straight CSV (preferred), or to another XML format that is compatible, such as the examples given in mysql documentation for loadxml?
<row>
<field name='column1'>value1</field>
<field name='column2'>value2</field>
</row>
I tried doing LOAD DATA INFILE and using ExtractValue function, but some of the values have spaces in them, and the delimiter for ExtractValue is hard coded to single-space. This makes it unusable as a workaround.
Your question is very general (which is fine!) so my answer is also quite general.
Firstly, it's certainly true that XSLT is an ideal generic tool for problems of this sort. I have absolutely no doubt that every one of your SOAP messages could be coerced into a suitable form, using an XSLT that's customised for each type of message, while still remaining structurally very similar, which is what you'd want if you're new to XSLT.
I'm not sure how familiar you are with XPath, XML, XML namespaces, etc, but I think the task here is simple enough to tackle, and if you do have any tricky XPath expressions to write you can always come back to StackOverflow and ask for help.
From what you've said it sounds like you're confident that each SOAP message can be mapped to a single table. I'm going to suggest an XSLT pattern that would be customisable for each type of SOAP message, where you have an xsl:for-each statement that iterates over each row, and within that you create a row element and populate it with fields.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- indent the output, for ease of reading -->
  <xsl:output indent="yes"/>
  <!-- process the document -->
  <xsl:template match="/">
    <!-- create the root element of the output -->
    <resultset>
      <!-- create each row of the output, by iterating over the
           repeating elements in the SOAP message -->
      <xsl:for-each select="//DataInstance">
        <row>
          <!-- create each field -->
          <!-- This field is defined individually, and the value
               is produced by evaluating the 'instanceId' xpath
               relative to the current DataInstance -->
          <field name="id"><xsl:value-of select="instanceId"/></field>
          <!-- these fields can be generated with a loop -->
          <xsl:for-each select="Attribute">
            <field name="{name}"><xsl:value-of select="value"/></field>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </resultset>
  </xsl:template>
</xsl:stylesheet>
Result of this, run over your sample SOAP message:
<resultset>
  <row>
    <field name="id">DSS1</field>
    <field name="Name">DSS1</field>
    <field name="Operational Mode">mode-fast</field>
    <field name="Rate - Down">1099289</field>
    <field name="Rate - Up">1479899</field>
  </row>
  <row>
    <field name="id">DSS2</field>
    <field name="Name">DSS2</field>
    <field name="Operational Mode">mode-fast</field>
    <field name="Rate - Down">1299433</field>
    <field name="Rate - Up">1379823</field>
  </row>
</resultset>
If you can follow this general pattern, you should be able to write a custom XSLT for every kind of SOAP message in your collection. You will just need to modify the various XPath expressions in the stylesheet:
//DataInstance means "every DataInstance element in the document".
instanceId means "the instanceId element that's a child of the current ('context') element".
name means "the name element that's a child of the current element".
value means "the value element that's a child of the current element".
In the example SOAP message you gave, the Attribute element maps to a field, so all those elements could be copied generically, with another xsl:for-each, but for your other documents you may have to just define each field element individually, as I did for the id element in my answer.
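Since the question mentions that straight CSV would be preferred, the same row/field pattern can also be sketched without XSLT. The following is a rough Python illustration under stated assumptions, not a drop-in solution: the function name soap_to_csv and the shortened SAMPLE message are made up for the example, and it assumes every message uses the DataInstance / instanceId / Attribute layout shown in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the SOAP message from the question.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <dbd:DataRetrievalRequestResponse xmlns:dbd="dbd.v1">
      <DataObjects>
        <DataInstances>
          <DataInstance>
            <instanceId>DSS1</instanceId>
            <Attribute><name>Name</name><value>DSS1</value></Attribute>
            <Attribute><name>Rate - Down</name><value>1099289</value></Attribute>
          </DataInstance>
          <DataInstance>
            <instanceId>DSS2</instanceId>
            <Attribute><name>Name</name><value>DSS2</value></Attribute>
            <Attribute><name>Rate - Down</name><value>1299433</value></Attribute>
          </DataInstance>
        </DataInstances>
      </DataObjects>
    </dbd:DataRetrievalRequestResponse>
  </soap:Body>
</soap:Envelope>"""


def soap_to_csv(xml_text):
    """Turn each DataInstance into one CSV row, one column per Attribute name."""
    root = ET.fromstring(xml_text)
    columns = ["id"]
    rows = []
    # Equivalent of the //DataInstance loop in the stylesheet.
    for instance in root.iter("DataInstance"):
        row = {"id": instance.findtext("instanceId")}
        # Equivalent of the inner loop over Attribute elements.
        for attribute in instance.findall("Attribute"):
            name = attribute.findtext("name")
            row[name] = attribute.findtext("value")
            if name not in columns:
                columns.append(name)
        rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


csv_text = soap_to_csv(SAMPLE)
print(csv_text)
```

The csv module quotes values that contain the delimiter, so spaces in names such as "Rate - Down" are not a problem here. The resulting file could then be loaded with LOAD DATA INFILE instead of LoadXML.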

Unable to parse xml with null content in azure logic apps

I am getting an XML document with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<datym>
<bla bla>
</datym>
<datym>
<bla bla>
</datym>
</Data>
I can successfully parse this to JSON and do all the work. Sometimes, however, I get an empty XML document in the following format:
<?xml version="1.0" encoding="UTF-8"?>
<Data></data>
This, however, fails to parse as XML or JSON in Logic Apps. So how do I validate whether the input is parsable XML or the empty XML? I thought of using the contains() function after initializing a string variable, but this is a huge performance hit.
Thanks for your ideas.
I think your empty XML example is still parsable. I tried to parse an XML file to a JSON file. This is my XML content:
<Invoices
    xmlns="http://gateway.com/schemas/Invoices"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gateway..com/schemas/Invoices Invoices.xsd">
  <DocumentInfo></DocumentInfo>
  <Header></Header>
  <Documents></Documents>
</Invoices>
After parsing, this is the JSON content:
{
  "Invoices": {
    "@xmlns": "http://gateway.com/schemas/Invoices",
    "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "@xsi:schemaLocation": "http://gateway..com/schemas/Invoices Invoices.xsd",
    "DocumentInfo": "",
    "Header": "",
    "Documents": ""
  }
}
So maybe you could refer to my Logic App flow. I used an XML file as a demonstration.
Hope this could help you, if you still have other questions, please let me know.
I actually found a way around this, so I thought I would answer my own question so that others may find it useful in the future.
My method uses XPath.
Simply check for the first node. If the expression returns an empty array, the document is empty; otherwise go on with the normal processing.
xpath(xml(base64ToString(variables('content'))),'//datym')
or
xpath(xml(base64ToString(variables('content'))),'//datym[1]')
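For anyone who wants to try the idea outside Logic Apps, the same "does the XPath return anything" check can be sketched in Python. This is only an analogue of the expression above, not Logic Apps syntax: the helper name has_datym and the item child element are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def has_datym(xml_text):
    """Return True when the document contains at least one datym element."""
    root = ET.fromstring(xml_text)
    # './/datym' plays the role of '//datym' in the Logic Apps expression.
    return len(root.findall(".//datym")) > 0


# Hypothetical payloads: one with data, one empty but well-formed.
full = "<Data><datym><item>1</item></datym><datym><item>2</item></datym></Data>"
empty = "<Data></Data>"

print(has_datym(full), has_datym(empty))
```

An empty result list corresponds to the empty array that xpath() returns in the Logic App.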

Generate XML from XSD repeatable elements

What tool can generate XML from an XSD so that the generated XML contains more than one entry for repeatable elements? I tried the tools that are available in Eclipse and some online tools like xml-generator, but none of these work. They all generate only one entry for the repeatable elements.
Note: I want to convert the generated XML to JSON, but the XML-to-JSON converter treats a repeatable element in the XML as an array only if it has more than one entry.
Generating XMLs from XSD can be quite challenging, if only because what people expect to see may not be possible to be captured by an XSD.
QTAssistant (I am associated with it) has quite extensive features when it comes to sample XML creation.
The simplest (and dumbest) one (available by right clicking on the element in the graphical XSD visualizer) is still able to create two elements if the associated maxOccurs is greater than one.
However, the XML may be off: just because one may have named a field dateTime, it doesn't mean the generated text node will be a valid date time value, if the schema defined it as a string. The tool will also only create one of the choices (if your schema uses xsd:choice), etc.
QTAssistant can make use of additional metadata which gives the user ultimate control over the generated samples. It can even create thousands of XMLs by doing combinations captured using metadata items. (You should contact us on the support site if you're interested in these scenarios).
Regarding XML to JSON conversion, QTAssistant can also correctly convert XMLs to JSON for repeating fields. Given this XML:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Sample XML generated by QTAssistant (http://www.paschidev.com) -->
<fundamo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:common="http://www.myorg.com/xsd/gen_fin">
  <response>
    <common:code>code1</common:code>
    <common:description>description1</common:description>
  </response>
  <transaction>
    <transactionRef>transactionRef1</transactionRef>
    <dateTime>dateTime1</dateTime>
    <userName>userName1</userName>
  </transaction>
</fundamo>
The corresponding JSON is:
{
  "response": {
    "code": "code1",
    "description": "description1"
  },
  "transaction": [
    {
      "transactionRef": "transactionRef1",
      "dateTime": "dateTime1",
      "userName": "userName1"
    }
  ]
}
You may notice that transaction is an array, even if XML has only one of those elements. This conversion works for valid XMLs, as long as you have a defined XSD for all its content. For the past 2+ years we've been calling it "XSD-aware JSON conversion". It is also possible to define casing conversion strategies (e.g. change upper case to lower case since XML elements tend to be upper case, while JSON "people" prefer them lower case).
If you're looking for a free tool, or want to write your own, I am sure you can use the free evaluation as a source of inspiration to address only the specific features you're interested in.
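If you do write your own, the core of the array handling is small. The sketch below is not how QTAssistant works internally. It is a minimal Python illustration that assumes you already know, from the XSD, which element names have maxOccurs greater than one, and that you pass them in by hand as the hypothetical repeatable set.

```python
import json
import xml.etree.ElementTree as ET

# Sample document from above, without the XML declaration and comment.
SAMPLE = """<fundamo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:common="http://www.myorg.com/xsd/gen_fin">
  <response>
    <common:code>code1</common:code>
    <common:description>description1</common:description>
  </response>
  <transaction>
    <transactionRef>transactionRef1</transactionRef>
    <dateTime>dateTime1</dateTime>
    <userName>userName1</userName>
  </transaction>
</fundamo>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def to_json_value(element, repeatable):
    """Convert an element to a dict/str, forcing names in `repeatable` to lists."""
    children = list(element)
    if not children:
        return element.text or ""
    result = {}
    for child in children:
        name = local_name(child.tag)
        value = to_json_value(child, repeatable)
        if name in repeatable:
            # Always an array, even when there is only one occurrence.
            result.setdefault(name, []).append(value)
        else:
            # Simplification: a non-repeatable name keeps its last value.
            result[name] = value
    return result


# Assumption: the schema declares 'transaction' with maxOccurs > 1.
converted = to_json_value(ET.fromstring(SAMPLE), {"transaction"})
print(json.dumps(converted, indent=2))
```

Reading the repeatable names out of the XSD itself is the harder part and is left out of this sketch.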

How to transform & nbsp; in XSLT

I have the following XSLT:
<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>
After transformation I get:
<span>&amp;nbsp;Some text</span>
which is rendered as the literal text: &nbsp;Some text
I want &nbsp; to be rendered as a space character. I have also tried changing disable-output-escaping to no, but it didn't help.
Thanks for your help.
The other two answers are correct, but I decided to take a little broader view to this subject.
What everyone should know about CDATA sections
A CDATA section is just an alternative serialization form to an escaped XML string. This means that a parser produces the same result for <span><![CDATA[ a & b < 2 ]]></span> and <span> a &amp; b &lt; 2 </span>. XML applications work on the parsed data, so an XML application should produce the same output for both example input elements.
Briefly: escaped data and un-escaped data inside a CDATA section mean exactly the same.
In this case
<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>
is identical to
<span><xsl:text disable-output-escaping="yes">&amp;nbsp;Some text</xsl:text></span>
Note that the & character has been escaped to &amp; in the latter serialization form.
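This equivalence is easy to check with any XML parser. As a small illustration (using Python's standard xml.etree.ElementTree here, which is just one convenient parser and not something the question uses), both serializations yield the same text once parsed:

```python
import xml.etree.ElementTree as ET

# The same content, once as a CDATA section and once with escaped characters.
cdata_form = ET.fromstring("<span><![CDATA[ a & b < 2 ]]></span>").text
escaped_form = ET.fromstring("<span> a &amp; b &lt; 2 </span>").text

# The same holds for the text from the question: inside CDATA, '&nbsp;' is
# just the six literal characters, exactly like the escaped '&amp;nbsp;'.
cdata_nbsp = ET.fromstring("<span><![CDATA[&nbsp;Some text]]></span>").text
escaped_nbsp = ET.fromstring("<span>&amp;nbsp;Some text</span>").text

print(repr(cdata_form), repr(escaped_form))
print(repr(cdata_nbsp), repr(escaped_nbsp))
```

In both pairs the application sees identical strings, which is why wrapping the entity in CDATA cannot make it behave like an entity.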
What everyone should know about disable-output-escaping
disable-output-escaping is a feature that concerns the serialization only. In order to maintain the well-formedness of the serialized XML, XSLT processors escape & and < (and possibly other characters) by using entities. Their escaped forms are &amp; and &lt;. Escaped or not, the XML data is the same. XSLT elements <xsl:value-of> and <xsl:text> can have a disable-output-escaping attribute but it is generally advised to avoid using this feature. Reasons for this are:
XSLT processor may produce only a result tree, which is passed on to another process without serializing it between the processes. In such case disabling output escaping will fail because the XSLT processor is not able to control the serialization of the result tree.
An XSLT processor is not required to support disable-output-escaping attribute. In such case the processor must escape the output (or it may raise an error) so again, disabling output escaping will fail.
An XSLT processor must escape characters that cannot be represented as such in the encoding that is used for the document output. Using disable-output-escaping on such characters will result in error or escaped text so again, disabling output escaping will fail.
Disabling output escaping will easily lead to malformed or invalid XML so using it requires great attention or post processing of the output with non-XML tools.
disable-output-escaping is often misunderstood and misused and the same result could be achieved with more regular ways e.g. creating new elements as literals or with <xsl:element>.
In this case
<span><xsl:text disable-output-escaping="yes"><![CDATA[&nbsp;Some text]]></xsl:text></span>
should output
<span>&nbsp;Some text</span>
but the & character got escaped instead, so in this case the output escaping seems to fail.
What everyone should know about using entities
If an XML document contains an entity reference, the entity must be declared; if it is not, the document is not valid. XML has only 5 predefined entities. They are:
&amp; for &
&lt; for <
&gt; for >
&quot; for "
&apos; for '
All other entity references must be defined either in an internal DTD of the document or in an external DTD that the document refers to (directly or indirectly). Therefore blindly adding entity references to an XML document might result in invalid documents. Documents with an (X)HTML DOCTYPE can use several entities (like &nbsp;) because the XHTML DTD refers to a DTD that contains their definition. The entities are defined in these three DTDs: http://www.w3.org/TR/html4/HTMLlat1.ent , http://www.w3.org/TR/html4/HTMLsymbol.ent and http://www.w3.org/TR/html4/HTMLspecial.ent .
An entity reference does not always get replaced with its replacement text. This could happen, for example, if the parser has no network connection to retrieve the DTD. Also, non-validating parsers do not need to include the replacement text. In such cases the data represented by the entity is "lost". If the entity replacement works, there will be no sign in the parsed data model that the XML serialization had any entity references at all. The data model will be the same whether one uses entities or their replacement values. Briefly: entities are only an alternative way to represent the replacement text of the entity reference.
In this case the replacement text of &nbsp; is &#160; (which is the same as &#xA0; in hexadecimal notation). Instead of trying to output the entity, it will be easier and more robust to just use the solution suggested by @phihag. If you like the readability of the entity you can follow the solution suggested by @Michael Krelin and define that entity in an internal DTD. After that, you can use it directly within your XSLT code.
Do note that in both cases the XSLT processor will output the literal non-breaking space character and not the entity reference or the &#160; character reference. Creating such references manually with XSLT 1.0 requires the usage of the disable-output-escaping feature, which has its own problems as stated above.
I think you should use &#160;, because the entity is likely not to be defined. And no CDATA.
One more possibility is to define nbsp entity for your xsl file:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY nbsp "&#160;">
]>
<xsl:stylesheet version="1.0" …
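To see why the declaration matters, here is a small check with a generic XML parser. It is a hedged illustration using Python's standard xml.etree.ElementTree rather than an XSLT processor, and the helper name parsed_text is made up, but the parsing rules for entities are the same ones an XSLT processor applies when it reads the stylesheet.

```python
import xml.etree.ElementTree as ET


def parsed_text(document):
    """Return the text of the root element, or None if the parser rejects it."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# nbsp is not one of the five predefined XML entities, so this is rejected.
undeclared = parsed_text("<span>&nbsp;Some text</span>")

# A numeric character reference needs no declaration.
numeric = parsed_text("<span>&#160;Some text</span>")

# Declaring the entity in an internal DTD makes the reference usable.
declared = parsed_text(
    '<!DOCTYPE span [<!ENTITY nbsp "&#160;">]><span>&nbsp;Some text</span>'
)

print(repr(undeclared), repr(numeric), repr(declared))
```

The numeric and declared forms both parse to a string that starts with the non-breaking space character U+00A0, with no trace of how it was written in the source.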
In CDATA, all values are literal. You want:
<span><xsl:text>&#160;Some text</xsl:text></span>

how to save serializable object to database field

I'm using LINQ to SQL and I need to save a few of our serializable objects to a field in our SQL Server database. I've defined the field as varbinary(MAX) but I'm not 100% sure that's correct. So, does anyone know how I can save an object to this field and then read it back? Thanks.
I think this depends on the format the objects are serialized in. For instance, if you are serializing out to XML then SQL Server (at least the latest version) supports an xml type. If you are indeed serializing out to binary, then sticking with binary is probably okay. Picking a database type that is similar to the type of the serialization will minimize the transformation needed to bridge the database type and the object type.
I have gotten the object to serialize but I'm running into a problem getting the resulting XML saved. The object field is defined as xml, so in LINQ to SQL it's an XElement. I'm using the following code to serialize and get the XML string. But when I try to load the XML string into the XElement I get the "Illegal characters in path" error. However, when I look at the generated XML I don't see any illegal characters. The code is shown first, followed by the generated XML.
using (StringWriter sw = new StringWriter())
{
    var x = new XmlSerializer(sessionObject.ObjectToStore.GetType());
    x.Serialize(sw, sessionObject.ObjectToStore);
    sessionRecord.sessionObject = XElement.Load(sw.ToString());
}
<?xml version="1.0" encoding="utf-16"?>
<SerializableJobSearch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Keywords>electronics</Keywords>
  <StateList />
  <Radius>50</Radius>
  <City />
  <State />
  <DisciplineIdList />
  <IndustryIdList />
  <Direct>false</Direct>
  <TempHire>false</TempHire>
  <Contract>false</Contract>
  <PerHour>false</PerHour>
  <PerYear>false</PerYear>
  <DegreeLevel />
  <IncludeJobsRequiringLess>false</IncludeJobsRequiringLess>
  <AddedWithin>0</AddedWithin>
  <OrderBy>Rank</OrderBy>
  <OrderByRank>true</OrderByRank>
  <resultsPerPage>50</resultsPerPage>
  <NeedToMake />
</SerializableJobSearch>