Dart/Flutter json.decode and multiline value - json

I have JSON data with a multiline value. When I use json.decode, I get an error.
"FormatException (FormatException: Control character in string
... "count": "1", "price": "5", "description": "It is a long established fact
My Json Data
var str = {
"description": "It is a long established fact
that a reader will be distracted by the readable content
of a page when looking at its layout. The point
of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English."
}
Thanks,

Not sure if you want to encode or decode.
If you'd like to encode (create a JSON string out of data), you should make sure that the variable you supply to json.encode is of type Map<String, dynamic>. Also if you want to have multiline strings, you should use triple quotes """ for that in Dart. Here's an example
var data = {
"description": """It is a long established fact
that a reader will be distracted by the readable content
of a page when looking at its layout. The point
of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English."""
};
final jsonString = json.encode(data);
If you want to decode (turning a JSON string into a Dart object), your input string should be formatted properly. JSON doesn't support multiline strings, so you'll need to write the line breaks as \n. These then have to be escaped in your Dart string declaration as well, just like the quotes within your JSON, resulting in \\n and \":
var str = "{\"description\": \"It is a long established fact\\nthat a reader will be distracted by the readable content\\nof a page when looking at its layout. The point\\nof using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.\"}";
final data = json.decode(str);
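The same rule can be checked outside Dart, since it comes from the JSON format itself. Here is a minimal sketch in JavaScript (not Dart; the sample strings are made up for illustration) showing that a raw line break inside a JSON string is rejected, while an escaped \n is accepted and decodes to a real line break:

```javascript
// A real line break inside a JSON string value is a control character,
// so a strict parser rejects it. Here \n in the JS literal is a real newline.
const broken = '{"description": "line one\nline two"}';
let failed = false;
try {
  JSON.parse(broken);
} catch (e) {
  failed = e instanceof SyntaxError;
}

// Escaped version: the JSON text contains a backslash followed by "n".
const valid = '{"description": "line one\\nline two"}';
const data = JSON.parse(valid);

// Going the other way, the encoder does the escaping for you.
const encoded = JSON.stringify({ description: "line one\nline two" });

console.log(failed, data.description.split("\n").length, encoded);
```

If you control the side that produces the JSON, letting the encoder build the string is usually less error-prone than escaping by hand.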

Related

HTML Purifier: disable syntax repair

Consider the following setup of HTML Purifier:
require_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.EscapeInvalidTags', true);
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
If you run the following case:
$dirty_html = "<p>lorem <script>ipsum</script></p>";
//output
<p>lorem &lt;script&gt;ipsum&lt;/script&gt;</p>
As expected, instead of removing the invalid tags, it just escaped them all.
However, consider these other test cases:
case 1
$dirty_html = "<p>lorem <b>ipsum</p>";
//output
<p>lorem <b>ipsum</b></p>
//desired output
<p>lorem <b>ipsum</p>
case 2
$dirty_html = "<p>lorem ipsum</b></p>";
//output
<p>lorem ipsum</p>
//desired output
<p>lorem ipsum</b></p>
case 3
$dirty_html = "<p>lorem ipsum<script></script></p>";
//output
<p>lorem ipsum&lt;script /&gt;</p>
//desired output
<p>lorem ipsum&lt;script&gt;&lt;/script&gt;</p>
Instead of just escaping the invalid tags, first it repairs them and then escapes them. This way things can get very strange, for example:
case 4
$dirty_html = "<p><a href='...'><div>Text</div></a></p>";
//output
<p></p><div>Text</div></p>
Question
Therefore, is it possible to disable the syntax repair and just escape the invalid tags?
The reason you're seeing a syntax repair is because of the fundamental way that HTML Purifier approaches the topic of HTML sanitation: It first parses the HTML to understand it, then decides which of the elements to keep in the parsed representation, then renders the HTML.
You might be familiar with one of Stack Overflow's most famous answers, which is an amused and exasperated observation that true regular expressions can't parse HTML - you need additional logic, since HTML is a context-free language, not a regular language. (Modern 'regular' expressions are not formal regular expressions, but that's another matter.) In other words, if you actually want to know what's going on in your HTML - so that you correctly apply your white- or blacklisting - you need to parse it, which means the text ends up in a totally different representation.
An example of how parsing causes changes between input and output is that HTML Purifier strips extraneous whitespace from between attributes. That may not bother you in your case, but it still stems from the fact that the parsed representation of HTML is quite different from the text representation. It's not trying to preserve the form of your input - it's trying to preserve the function.
This gets tricky when there is no clear function and it has to start guessing. To pick an example, imagine that while going through the HTML input, you come across what looks like an opening <td> tag in the middle of nowhere. You can consider it valid if there was an unclosed <td> tag a while back, as long as you add a closing tag. But if you had escaped the first tag as &lt;td&gt;, you would need to discard the text data that would have been in the <td>, since - depending on browser rendering - it may put data into parts of the page visually outside the fragment, i.e. places that are not clearly user-submitted.
In brief: You can't easily disable all syntax repair and/or tidying without having to rummage through the parsing guts of HTML Purifier and ensuring no information you find valuable is lost.
That said, you can try switching the underlying parsing engine with Core.LexerImpl and see if it gets you better results! :) DOMLex definitely adds missing ending nodes right from the get-go, but from a cursory glance, DirectLex may not. There is a large chunk of autoclosing logic in HTMLPurifier's MakeWellFormed strategy class which might also pose a problem for you.
Depending on why you want to preserve this data, though (to allow analysis?), saving the original input separately (while leaving HTML Purifier itself be) may provide you with a better solution.
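If the goal is only to keep and display the original input for analysis, one option is to store it untouched and escape every special character when showing it, with no parsing and therefore no repair. The following is a minimal sketch in JavaScript, not PHP and not part of HTML Purifier; the helper name escapeHtml is hypothetical. Note that this escapes valid tags too, so it is not a substitute for sanitizing.

```javascript
// Escape every HTML-significant character so the input is shown verbatim.
// No parsing happens, so unbalanced tags stay exactly as they were typed.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const dirty = "<p>lorem <b>ipsum</p>";
const shown = escapeHtml(dirty);
console.log(shown);
```

The purified HTML would still come from HTML Purifier; the escaped copy is only for inspecting what the user actually submitted.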

How to format text in WKWebView by wrapping it into HTML using Swift iOS?

I am loading text into a WKWebView. My goal is to:
set font size to a fixed value
align text vertically in the WKWebView
align text horizontally in the WKWebView
I have not been able to find functions on the WKWebView that would do that. So I am trying to wrap the string into some HTML code.
Here is my code:
func loadQuestions() {
questionView.questionLabel.text = questionArray[questionIndex].questionText
let questionText: String = questionArray[questionIndex].questionText!
let answerText: String = questionArray[questionIndex].answerText!
let htmlWrap = "<p>\(questionText)</p><p>\(answerText)</p>"
answerView.webView.loadHTMLString(htmlWrap, baseURL: nil)
}
This works, but the moment I am trying to add some advanced formatting that I get using a converter website (since I don't know html) I get all the quotation marks (") which invalidate my string.
let htmlWrap = "<p style="text-align: center;"><span style="font-size: 36pt;">\(questionText)</span></p><p style="text-align: center;"><span style="font-size: 36pt;">\(answerText)</span></p>"
That above will not compile.
It seems I can use the escape character (\) in the html string before each quotation mark (") and that seems to work if it is a simple line. But does not work for complex html. So how do I do this?
Thank you!
Two ways:
As you already know, using \ is a good way to go around this, because for Swift a string starts with " and ends with ". Adding \ before " will convert it into a part of the string and not the control character specifying the end of the string. So you can copy the output of your converter, paste it in a text editor, and find and replace " with \". There is no other way if you want to use " inside the HTML string (if the option of directly getting the html string from your converter using an API to store into a variable is ruled out).
The other way is to use ' instead of " inside the HTML string. So again you will have to copy the output of your converter to a text editor and find and replace " with '.
Method 1: Escaping
As you alluded to in your initial question, you can always escape any embedded quote characters by prefixing the '\' character. If you go this route, I would make two passes in a global find-and-replace: first replace each single backslash (\) with two backslashes (\\), then replace each quote with \".
Method 2: Multiline String Literal
As of Swift 4.0 (a year and a half old now), you can also use multiline string literals, which are bounded by """ (three double-quotes) to get yourself partway there. You would do it like this:
let htmlWrap = """
    <p style="text-align: center;"><span style="font-size: 36pt;">\(questionText)</span></p><p style="text-align: center;"><span style="font-size: 36pt;">\(answerText)</span></p>
    """
The line break right after the opening """ and the one right before the closing """ are not part of the string.
Indentation up to the point of the terminating """ will not be preserved. So for instance the string above would not include the extra four-space indent I put in the code sample, because the terminating """ is also indented by four spaces.
Taking text out of some other generator and inserting into your document between the multiline literal string markers should generally be a safe copy-paste except in the very odd situation where you have three quotes actually in your HTML (it has no special meaning in HTML, so would only come if your plain text had that construct in it, such as if you were quoting this answer).
An important thing to note, however, is that escaping in multiline literals still escapes. So, something like this will not end up as it is written:
let testing = "Testing 123"
let htmlWrap = """
<p>\(testing)</p>
"""
... would end up printing "Testing 123" in your web view.
Method 3: Raw Multiline String Literals (Swift 5+)
Swift 5 (in Xcode 10.2 which is in beta now) remedies this by adding Raw Strings, which can also be multiline like so:
let testing = "Testing 123"
let htmlWrap = #"""
<p>\(testing)</p>
"""#
... which will print out "\(testing)" in your web view as you might expect.
Method 4: Separate the HTML into its own file
In my opinion, however, if I am dealing with HTML, I put it in its own file and load that text into a string using the String(contentsOfFile: myFile) initializer. This keeps the HTML text in its own file, which as a bonus Xcode will be able to intelligently help you with if you ever go in to edit it.

In XML, is there a way to turn all of a node's children into a single string including the nested XML tags?

I have bits like the following in an XML file that is a data source for an HTML page that uses CSS and javascript only. The special XML codes are my own, and I want to process them with javascript.
<listitem>regular text could be in here</listitem>
<listitem>possibly with <b>HTML markup</b></listitem>
<listitem>or <special>special xml</special></listitem>
What I dream of is a way to get from .getElementsByTagName("listitem") to the following array.
["regular text could be in here", "possibly with <b>HTML markup</b>", "or <special>special xml</special>"]
That way, I could process each listitem as part of the array. However, the XML parser breaks apart all the XML for each listitem. Other than using CDATA, which gets messy, is there another way?
I think the answer is:
Array.prototype.slice.call(document.getElementsByTagName("listitem")).map(function(x) {return x.innerHTML})
It will return:
["regular text could be in here", "possibly with <b>HTML markup</b>", "or <special>special xml</special>"]
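A slightly more modern spelling of the same idea is sketched below. It assumes a browser where elements of a parsed XML document expose innerHTML; the function name listItemsAsStrings and the sample XML are made up for illustration.

```javascript
// Collect the inner markup of every <listitem> as a plain string.
// `doc` is any object with getElementsByTagName, e.g. a parsed XML document.
function listItemsAsStrings(doc) {
  return Array.from(doc.getElementsByTagName("listitem"), (el) => el.innerHTML);
}

// Browser-only usage; skipped where DOMParser is not available.
if (typeof DOMParser !== "undefined") {
  const xml =
    "<list><listitem>possibly with <b>HTML markup</b></listitem>" +
    "<listitem>or <special>special xml</special></listitem></list>";
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  console.log(listItemsAsStrings(doc));
}
```

Array.from accepts the live HTMLCollection directly, so the Array.prototype.slice.call step is not needed.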

preserving linebreaks in values in JSON key-value pairs

I am not sure if line breaks are allowed in JSON values. I certainly am unable to create the following in JSON
{"foo": "I am not sure if line breaks are
allowed in JSON values. I certainly
am unable to create the following in JSON"}
The following certainly does not work
{"foo": "I am not sure if line breaks are\nallowed in JSON values. I certainly\nam unable to create the following in JSON"}
Preamble: I want to send a long message like above either to the browser or to the console app and display it neatly formatted so it is legible to the user.
If you are displaying (or inserting) the json value directly in HTML, you can't use it as it is because in html new lines are ignored and replaced by a space.
For example if you have:
<p>Hello,
I'm in other line.</p>
It will be represented as:
Hello, I'm in other line.
You must convert the new lines to paragraph or <br>, for example:
<p>Hello,<br>
I'm in other line.</p>
That will be shown as:
Hello,
I'm in other line
If this is your case, you can simply use String.replace to change \n into <br>\n.
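As a rough sketch of that replacement in JavaScript (the payload below is a shortened, made-up version of the message in the question):

```javascript
// The JSON text carries the line break as the two characters \ and n.
const payload = '{"foo": "I am not sure if line breaks are\\nallowed in JSON values."}';

// After parsing, the value contains a real newline character.
const message = JSON.parse(payload).foo;

// Turn every newline into <br> plus a newline so HTML shows the break.
const html = message.replace(/\n/g, "<br>\n");
console.log(html);
```

The regular expression needs the g flag, otherwise only the first line break is replaced. If the message can contain user-supplied text, it should also be HTML-escaped before the <br> tags are added.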
If you're displaying it in HTML, <br/> will simply do.
Or in a text area or something, try "\r\n", wrapped in double quotes.
Double backslashes are to escape.

Structured text in JSON

I have been looking for a way to capture structured text (sections, paragraphs, emphasis, lists, etc.) in JSON, but I haven't found anything yet. Any suggestions? (Markdown crossed my mind, but there might be something better out there.)
How about something like this:
[ { "heading": "Foobar Example" },
{ "paragraph":
[
"This is normal text, followed by... ",
{ "bold": "some bold text" },
"etc."
]
}
]
That is:
use a string for plain text without formatting or other mark-up;
use an array whenever you want to indicate an ordered sequence of certain text elements;
use an object where the key indicates the mark-up and the value the text element to which the formatting is applied.
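To show that these three rules are enough to process such a document, here is a minimal sketch of a renderer in JavaScript. The TAGS mapping from keys to HTML elements and the render function are assumptions made for this example, not part of any standard.

```javascript
// Hypothetical mapping from mark-up keys to HTML tag names.
const TAGS = { heading: "h1", paragraph: "p", bold: "b" };

// string -> plain text, array -> ordered sequence, object -> mark-up around its value.
function render(node) {
  if (typeof node === "string") return node;
  if (Array.isArray(node)) return node.map(render).join("");
  return Object.entries(node)
    .map(([key, value]) => {
      const tag = TAGS[key];
      if (!tag) throw new Error("unknown mark-up key: " + key);
      return `<${tag}>${render(value)}</${tag}>`;
    })
    .join("");
}

const structured = [
  { heading: "Foobar Example" },
  { paragraph: ["This is normal text, followed by... ", { bold: "some bold text" }, "etc."] },
];
const rendered = render(structured);
console.log(rendered);
```

The sketch does no HTML escaping of the text itself, so real input would need that added before use.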
HTML is a well-established way to describe structured text, in a plain-text format(!). Markdown (as you mentioned) would work as well.
My view is that your best bet is probably going to be using some sort of plain-text markup such as those choices, and placing your text in a single JSON string variable. Depending on your application, it may make sense to have an array of sections, containing an array of paragraphs, containing an array of normal/bold/list sections etc. However, in the general case I think good old-fashioned blocks of markup will ironically be cleaner and more scalable, due to the ease of passing them around, and the well-developed libraries for full-blown parsing if/when required.
There also seems to be a specification that might accomplish this: Markdown Syntax for Object Notation (MSON).
Not sure if for you it's worth the trouble of implementing the spec, but it seems to be an option.