Structured text in JSON - json

I have been looking for a way to capture structured text (sections, paragraphs, emphasis, lists, etc.) in JSON, but I haven't found anything yet. Any suggestions? (Markdown crossed my mind, but there might be something better out there.)

How about something like this:
[ { "heading": "Foobar Example" },
{ "paragraph":
[
"This is normal text, followed by... ",
{ "bold": "some bold text" },
"etc."
]
}
]
That is:
use a string for plain text without formatting or other mark-up;
use an array whenever you want to indicate an ordered sequence of certain text elements;
use an object where the key indicates the mark-up and the value the text element to which the formatting is applied.

HTML is a well-established way to describe structured text, in a plain-text format(!). Markdown (as you mentioned) would work as well.
My view is that your best bet is probably going to be using some sort of plain-text markup such as those choices, and place your text in a single JSON string variable. Depending on your application, it may make sense to have an array of sections, containing an array of paragraphs, containing an array of normal/bold/list sections etc. However, in the general case I think good old-fashioned blocks are markup will ironically be cleaner and more scalable, due to the ease of passing them around, and the well-developed libraries for full-blown parsing if/when required.

There also seems to be a specification that might accomplish this Markdown Syntax for Object Notation (MSON)
Not sure if for you it's worth the trouble of implementing the spec, but it seems to be an option.

Related

Why do some strings contain " " and some " ", when my input is the same(" ")?

My problem occurs when I try to use some data/strings in a p-element.
I start of with data like this:
data: function() {
return {
reportText: {
text1: "This is some subject text",
text2: "This is the conclusion",
}
}
}
I use this data as follows in my (vue-)html:
<p> {{ reportText.text1 }} </p>
<p> {{ reportText.text2 }} </p>
In my browser, when I inspect my elements I get to see the following results:
<p>This is some subject text</p>
<p>This is the conclusion</p>
As you can see, there is suddenly a difference, one p element uses and the other , even though I started of with both strings only using . I know and technically represent the same thingm, but the problem with the string is that it gets treated as a string with 1 large word instead of multiple separate words. This screws up my layout and I can't solve this by using certain css properties (word-wrap etc.)
Other things I have tried:
Tried sanitizing the strings by using .replace( , ), but that doesn't do anything. I assume this is because it basically is the same, so there is nothing to really replace. Same reason why I have to use blockcode on stackoverflow to make the destinction between and .
Logged the data from vue to see if there is any noticeable difference, but I can't see any. If I log the data/reportText I again only see string with 's
So I have the following questions:
Why does this happen? I can't seem to find any logical explanation why it sometimes uses 's and sometimes uses 's, it seems random, but I am sure I am missing something.
Any other things I could try to follow the path my string takes, so I can see where the transformation from to happens?
Per the comments, the solution devised ended up being a simple unicode character replacement targeting the \u00A0 unicode code point (i.e. replacing unicode non-breaking spaces with ordinary spaces):
str.replace(/[\\u00A0]/g, ' ')
Explanation:
JavaScript typically allows the use of unicode characters in two ways: you can input the rendered character directly, or you can use a unicode code point (i.e. in the case of JavaScript, a hexadecimal code prefixed with \u like \u00A0). It has no concept of an HTML entity (i.e. a character sequence between a & and ; like ).
The inspector tool for some browsers, however, utilizes the HTML concept of the HTML entity and will often display unicode characters using their corresponding HTML entities where applicable. If you check the same source code in Chrome's inspector vs. Firefox's inspector (as of writing this answer, anyway), you will see that Chrome uses HTML entities while Firefox uses the rendered character result. While it's a handy feature to be able to see non-printable unicode characters in the inspector, Chrome's use of HTML entities is only a convenience feature, not a reflection of the actual contents of your source code.
With that in mind, we can infer that your source code contains unicode characters in their fully rendered form. Regardless of the form of your unicode character, the fix is identical: you need to target these unicode space characters explicitly and replace them with ordinary spaces.

In XML, is there a way to turn all of a node's children into a single string including the nested XML tags?

I have bits like the following in an XML file that is a data source for an HTML page that uses CSS and javascript only. The special XML codes are my own, and I want to process them with javascript.
<listitem>regular text could be in here</listitem>
<listitem>possibly with <b>HTML markup</b></listitem>
<listitem>or <special>special xml</special></listitem>
What I dream of is a way to get from .getElementsByTagName("listitem") to the following array.
["regular text could be in here", "possibly with <b>HTML markup</b>", "or <special>special xml</special>"]
That way, I could process each listitem as part of the array. However, the XML parser breaks apart all the XML for each listitem. Other than using CDATA, which gets messy, is there another way?
I think the answer is:
Array.prototype.slice.call(document.getElementsByTagName("listitem")).map(function(x) {return x.innerHTML})
It will return:
["regular text could be in here", "possibly with <b>HTML markup</b>", "or <special>special xml</special>"]

Potential pitfalls with my new markup language?

Something that's really bothered me about XHTML, and XML in general is why it's necessary to indicate what tag you're closing. Things like <b>bold <i>bold and italic</b> just italic</i> aren't legal anyways. Thus I think using {} makes more sense. Anyway, here's what I came up with:
doctype;
html
{
head
{
title "my webpage"
javascript '''
// code here
// single quotes do not allow variable substitution, like PHP
// triple quotes can be used like Python
'''
}
body
{
table {
tr {
td "cell 1"
td "cell 2"
td #var|filter1|filter2:arg
}
}
p "variable #var in a string"
p "variable #{var|withfilter}"
input(type=password, value=secret); // attributes are specified like this
br; // semi-colons are used on elements that don't have content
p { "strings are" "automatically" "concatenated together" #andvars "too" }
}
}
Tags that only contain one element do not need to be enclosed in braces (for example td "cell 1" the td is closed immediately after the text). Strings are outputted directly, except double-quoted strings allow variable substitution, and single quotes do not. I'm adopting a filtering scheme similar to Django's. The thing I'm most concerned about, I think, is variable substitution in double-quotes.. I don't want people to have to open and close single quotes everywhere because the syntax things are being treated as vars that shouldn't. I don't think the # character is very commonly used in code. I was going to use $ like PHP, but jQuery uses that, and I want to allow people to do substitutions in their JS too (of course, if they don't need to, they should use single quotes!)
Templates will use "dictionaries". By default, it uses this HTML dict, with familiar tags, but you can easily add your own. "Tags" may consist of not just one, but multiple HTML tags.
Still need to decide how to do loops and including partials...
Edit: Started an open source project, for those interested.
I believe you can get close to that with the syntax of TCL script language.
The thing I like the most about your idea is the removal of the (to me very) redundant information in the closing tags of the (has it's roots in) SGML markup.
Another clean option IMO is to go the road of using indenting to specify scope, eliminating braces all together. With the assumption of a little editor support, I can imagine this happening.
I think it's possibly stiflling that globally used specifications cater to the theorhetical person using VI or Notepad to type out their markup...

Formatting a String Array to Display to Users

What is the best format to communicate an array of strings in one string to users who are not geeks?
I could do it like this:
Item1, Item2, Item3
But that becomes meaningless when the strings contain spaces and commas.
I could also do it this way:
"Item1", "Item2", "Item3"
However, I would like to avoid escaping the array elements because escaped characters can be confusing to the uninitiated.
Edit: I should have clarified that I need the formatted string to be one-line. Basically, I have a list of lists displayed in a .Net Winforms ListView (although this question is language-agnostic). I need to show the users a one-line "snapshot" of the list next to the list's name in the ListView, so they get a general idea of what the list contains.
You can pick a character like pipe (|) which are not used much outside programs. It also used in wiki markup for tables which may be intuitive to those who are familiar with wiki markup.
Item1| Item2| Item3
In a GUI or color TUI, shade each element individually. In a monochrome TUI, add a couple of spaces and advance to the next tab position (\t) between each word.
Using JSON, the above list would look like:
'["Item1", "Item2", "Item3"]'.
This is unambiguous and a syntax in widespread use. Just explain the nested syntax a little bit and they'll probably get it.
Of course, if this is to be displayed in a UI, then you don't necessarily want unambiguous syntax as much as you want it to actually look like something intended for the end user. In that case it would depend exactly how you are displaying this to the user.
Display each element as a cell in a table.
How about line breaks after each string? :>
Display each string on a separate line, with line numbers:
1. Make a list
2. Check it twice
3. Say something nice
It's the way people write lists in the real world, y'know :)
Use some kind of typographical convention, for example a bold hashmark and space between strings.
milk # eggs # bread # apples # lettuce # carrots
CSV. Because the very first thing your non-technical user is going to do with delimited data is import it into a spreadsheet.

TextArea() - handling text content as list of words

I like to have access to items/list of words in TextArea, like Word(2). Is there native support for that in TextArea() or some good List object to be used?
E.g.
1stword = TextArea.TextAsList(1)
2ndword = TextArea.TextAsList(2)
Since there is already .htmltext, is there some HTML object that could be used to make such a list easily?
Nope. Just get the text in there and use a String.Split.
Besides, if there were such a method, it would also do the split.