XML to JSON conversion with HTML string data

I have XML documents that I am trying to convert to JSON, but some of the string fields contain HTML tags (from text copied and pasted out of Word documents). The source XML looks like this:
<my:Request_Description>
<html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml">
<div>test</div>
</html>
</my:Request_Description>
When JsonConvert.SerializeXmlNode is called, the JSON ends up like this:
"Request_Description": {
  "html": {
    "@xml:space": "preserve",
    "@xmlns": "http://www.w3.org/1999/xhtml",
    "#significant-whitespace": [
      "\r\n ",
      "\r\n"
    ],
    "div": "test"
  }
}
I tried simply declaring the field as a string, but JsonConvert.DeserializeObject then fails with the error "Unexpected character encountered while parsing value."
Is there something I should do in the SerializeXmlNode call to make the JSON result different? Or is there something I can do in the DeserializeObject call to make it ignore the HTML tags?
Ideally the JSON would look something like the example below, although I assume the quote marks would need escape characters. The main point is that the HTML tags should NOT denote separate nodes; they should be part of the node's value. I started looking into XSLT and thought that might be an option.
{
"Request_Description": "<html xml:space="preserve" xmlns="http://www.w3.org/1999/xhtml"><div>test</div></html>"
}

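I switched to XDocument, and this code worked:
// Replace the element with a same-named element that holds only a string value.
XElement req_desc = newxdoc.Root.Element("Request_Description");
if (req_desc != null)
{
    // Value is the element's concatenated text content, so the HTML tags themselves are dropped.
    XElement replacenode = new XElement(req_desc.Name, req_desc.Value);
    req_desc.Parent.Add(replacenode);
    req_desc.Remove();
}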
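If the markup itself should survive as the string value (as in the "ideal" JSON in the question), the element's children can be serialized back to text before the JSON is built. The following is only a minimal sketch of that idea, written in Python rather than C#. The wrapper element, the namespace URI bound to the my prefix, and the function name description_as_json are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MY_NS = "http://example.com/my"  # hypothetical URI for the "my" prefix

# Serialize XHTML elements without a prefix (<div>, not <html:div>).
ET.register_namespace("", XHTML_NS)

# Hypothetical wrapper document around the fragment from the question.
SAMPLE = f"""<my:Request xmlns:my="{MY_NS}">
<my:Request_Description>
<html xml:space="preserve" xmlns="{XHTML_NS}">
<div>test</div>
</html>
</my:Request_Description>
</my:Request>"""


def description_as_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    desc = root.find(f"{{{MY_NS}}}Request_Description")
    # Turn each child element back into markup text instead of a nested node.
    markup = "".join(ET.tostring(child, encoding="unicode") for child in desc)
    # json.dumps escapes the embedded double quotes and newlines.
    return json.dumps({"Request_Description": markup.strip()})


print(description_as_json(SAMPLE))
```

The JSON serializer adds the backslash escapes for the quote marks, so they never have to be written by hand.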

Related

odd result converting XML to JSON

I get an odd result when converting XML to JSON. I am using JsonConvert.SerializeXmlNode():
XmlNodeList requestNode = xmlDocument.GetElementsByTagName("root","*");
XmlNode objNode = requestNode[0];
string json = JsonConvert.SerializeXmlNode(objNode);
If my nodes include a namespace prefix and URL, JsonConvert.SerializeXmlNode comes back with odd-looking JSON that contains attributes like:
{"prefix:Amount":{"@xmlns:prefix":"http://BLA","#text":"1000"}}
I expect:
{"prefix:Amount": 1000, etc etc.}
The XML I am trying to convert looks something like:
<a:root>
<prefix:Amount xmlns:prefix="http://BLA>1000</prefix:Amount>
</a:root>
Your XML is not well-formed: xmlns:prefix="http://BLA is missing a closing quote.
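A quick way to confirm this kind of problem is to run the text through any XML parser before converting it. The sketch below uses Python's standard parser rather than .NET. The helper name is_well_formed is made up for illustration, and the undeclared a prefix from the question is left out so that the closing quote is the only difference between the two inputs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Same element as in the question, without and with the closing quote.
BROKEN = '<root><prefix:Amount xmlns:prefix="http://BLA>1000</prefix:Amount></root>'
FIXED = '<root><prefix:Amount xmlns:prefix="http://BLA">1000</prefix:Amount></root>'

print(is_well_formed(BROKEN), is_well_formed(FIXED))
```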

Docx4j - Replacing Word merge field with HTML content

I am trying to replace a Word merge field "test" with HTML content using Docx4j:
String myText = "<html><body><h1>Hello</h1></body></html>";
try {
    WordprocessingMLPackage docxOut =
            WordprocessingMLPackage.load(new java.io.File("/tmp/template.docx"));
    Map<DataFieldName, String> data = new HashMap<>();
    data.put(new DataFieldName("test"), myText);
    org.docx4j.model.fields.merge.MailMerger.performMerge(docxOut, data, true);
    docxOut.save(new java.io.File("/tmp/newTemplate.docx"));
} catch (Docx4JException e) {
    LOGGER.error(e.getMessage());
}
As a result, the output file (newTemplate.docx) has my merge field replaced by the literal text
"<html><body><h1>Hello</h1></body></html>"
rather than by rendered HTML. I tried adding:
docxOut.getContentTypeManager().addDefaultContentType("html", "text/html");
but it still didn't work. I am not even sure if interpreting HTML while replacing a Word merge field can be done using Docx4j or if I'm missing something.
Any help would be welcome.
You can use the OpenDoPE approach to bind a content control to a Custom XML element which contains escaped XHTML.

HtmlAgilityPack The '"' character, hexadecimal value 0x22, cannot be included in a name

This line:
Dim NewHTMLString As String = XDocument.Parse(htmldoc.DocumentNode.OuterHtml).ToString()
Produces this error:
The '"' character, hexadecimal value 0x22, cannot be included in a name.
This is the line in the HTML it says is wrong:
if ( typeof JSON != 'object' || !JSON.stringify || !JSON.parse ) { document.write( "<scr" + "ipt type=\"text\/javascript\" src=\"http:\/\/blahblah"><\/script>\n" ); };
That's because XDocument is meant to deal with XML, so it doesn't support arbitrary JavaScript strings. XDocument treats this part: <scr", as the beginning of an XML node, and a double-quote (") character is not valid in an XML node name.
I used XDocument in the answer to your previous question to get nicely formatted XML output in the console, and I could do that because I knew my HTML was XML-compliant. In this case, your HTML isn't valid from an XML point of view, and it isn't clear what you're trying to achieve with XDocument here. If you simply need to check the result of the modifications you made to the original HTML, you can either print htmldoc.DocumentNode.OuterHtml directly to the console or save the HTML to a new file like so:
htmldoc.Save("path_to_new_file.html")

Passing apostrophe as part of JSON string

My JSON service is not being called, probably because the request is badly formatted.
Still, I don't understand what is wrong with it. I read up on this and found that apostrophes should not be escaped. It doesn't work when I escape them either.
"{
"fields": [
{
"Text": "PaymentReminders",
"Value": "'yes'"
}
]
}"
And yes, I really need 'yes' to be enclosed in apostrophes.
I am expecting a String on the server side, which I then deserialize. It works without apostrophes.
Thanks!
edit1:
This is the structure that accepts it on the server:
Public Class TemplateField
Public Property Value() As String = "val"
Public Property Text() As String = "tex"
End Class
Public Class FieldsList
Public Property fields() As TemplateField()
End Class
and it gets deserialized like this:
Dim jsSerializer As New JavaScriptSerializer
Dim fieldsArray As EventInfoDetails.FieldsList
fieldsArray = jsSerializer.Deserialize(Of EventInfoDetails.FieldsList)(fields)
All of that works unless the string contains apostrophes; it seems I cannot put an apostrophe inside a string value.
Not only does JSON not require you to escape apostrophes, it does not allow it (unlike JavaScript). So your
"Value": "'yes'"
is perfectly valid JSON. The exception is if you are inserting this JSON as a string literal inside JavaScript code, in which case JavaScript itself requires you to escape each ' as \' (you'd need two layers of escaping: the JSON one, with the JavaScript one on top of it).
Anyway, there's something strange about your code:
"{
"fields": [
{
"Text": "PaymentReminders",
"Value": "'yes'"
}
]
}"
Why is your entire JSON structure surrounded by quotes (")? Is it a string literal inside some other programming language? If so, you need to follow that language's escaping rules for the inner quote (") characters: Java, for example, uses \" there, whereas VB doubles the quote ("").
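Both points can be checked with any strict JSON parser. The sketch below uses Python's json module rather than JavaScriptSerializer, so it says nothing about how the .NET deserializer behaves. The names VALID, ESCAPED and parses are made up for illustration.

```python
import json

# Raw strings, so the backslashes below reach the JSON parser unchanged.
VALID = r'''{"fields": [{"Text": "PaymentReminders", "Value": "'yes'"}]}'''
ESCAPED = r'''{"fields": [{"Text": "PaymentReminders", "Value": "\'yes\'"}]}'''


def parses(text: str) -> bool:
    """Return True if the text is accepted as JSON by a strict parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses(VALID))    # plain apostrophes are accepted
print(parses(ESCAPED))  # \' is not a legal JSON escape sequence

# Wrapping the JSON text in another string literal adds a second escaping
# layer: here every inner double quote becomes \".
as_literal = json.dumps(VALID)
print(as_literal)
```

Decoding as_literal once gives back the original JSON text, and decoding that a second time gives the actual data, which is the double layer described above.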

Escape included file in freemarker

I am trying to render JSON that contains escaped XML, so " should become \", new lines should become \n, and so on. To make it more readable, I want to split it into two files: one with the JSON, the other with the XML. Both must be templates because they contain some dynamic values.
{
"xml" : "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n ..."
}
json.ftl:
{
"xml" : "<#include "xml.ftl">"
}
xml.ftl:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>...
How can I achieve that escaping? I know that there are js_string and json_string built-ins, but I do not know how to apply them to an include. Thanks for your help.
You could make a macro like this:
<#macro includeAsJsonString templateName>
<#local captured><#include templateName></#local>
${captured?json_string}<#t>
</#macro>
and then you do this:
{
"xml" : "<@includeAsJsonString 'xml.ftl' />"
}
(Of course, you don't have to create a macro for this, but I think that's more reusable.)
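The same capture-then-escape idea can be sketched outside FreeMarker to see what the rendered result should look like. This is a Python analogue only, not FreeMarker code. The XML content and the helper name as_json_string_body are hypothetical, and FreeMarker's own json_string built-in may escape a few more characters than json.dumps does.

```python
import json

# Hypothetical rendered output of xml.ftl.
XML = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<note>hi</note>'


def as_json_string_body(text: str) -> str:
    # json.dumps returns a complete quoted literal; strip the outer quotes
    # so the result can be placed between quotes already in the template.
    return json.dumps(text)[1:-1]


# Equivalent of json.ftl with the escaped include spliced in.
document = '{\n"xml" : "' + as_json_string_body(XML) + '"\n}'
print(document)
```

Parsing the resulting document with a JSON parser should return the original XML text unchanged, which is a simple check that the escaping was applied exactly once.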