System.Text.Json parse document that exists internal to a string - json

I receive string content that starts with a JSON value (could be simple or complex) and has some additional content afterward. I'd like to be able to parse the JSON document.
I don't have control over the string, so I can't put any kind of delimiter after the JSON content that would enable me to isolate it.
Examples:
"true and some more" - yields <true>
"false this is different" - yields <false>
"5.6/7" - yields <5.6>
"\"a string\"; then this" - yields <"a string">
"[null, true]; and some more" - yields <[null, true]>
"{\"key\": \"value\"}, then the end" - yields <{"key": "value"}>
The issue is the trailing content. The parser expects the input to end and throws an exception:
')' is invalid after a single JSON value. Expected end of data.
There isn't an option in JsonDocumentOptions to allow trailing content.
As a bonus, if you can give a solution that uses ReadOnlySpan<char>, that'd be awesome.

The suggested answer of the custom reader wasn't working for me because the problem existed in the base reader: it just doesn't like certain trailing characters.
Since I still wanted to rely on JsonDocument.Parse() to extract the element for me, I really just needed to find where the element stopped, break that bit off as a separate piece, and submit that to the parse method. Here's what I came up with:
public static bool TryParseJsonElement(this ReadOnlySpan<char> span, ref int i, out JsonElement element)
{
try
{
int end = i;
char endChar;
switch (span[i])
{
case 'f':
end += 5;
break;
case 't':
case 'n':
end += 4;
break;
case '.': case '-': case '0':
case '1': case '2': case '3':
case '4': case '5': case '6':
case '7': case '8': case '9':
end = i;
var allowDash = true; // a leading '-' is valid; after that, only directly after 'e'
while (end < span.Length && (span[end].In('0'..'9') ||
span[end].In('e', '.', '-')))
{
if (!allowDash && span[end] == '-') break;
allowDash = span[end] == 'e';
end++;
}
break;
case '\'':
case '"':
end = i + 1;
endChar = span[i];
while (end < span.Length && span[end] != endChar)
{
if (span[end] == '\\')
{
end++;
if (end >= span.Length) break;
}
end++;
}
end++;
break;
case '{':
case '[':
end = i + 1;
endChar = span[i] == '{' ? '}' : ']';
var inString = false;
while (end < span.Length)
{
var escaped = false;
if (span[end] == '\\')
{
escaped = true;
end++;
if (end >= span.Length) break;
}
if (!escaped && span[end] == '"')
{
inString = !inString;
}
else if (!inString && span[end] == endChar) break;
end++;
}
end++;
break;
default:
element = default;
return false;
}
var block = span[i..end];
if (block[0] == '\'' && block[^1] == '\'')
block = $"\"{block[1..^1].ToString()}\"".AsSpan();
element = JsonDocument.Parse(block.ToString()).RootElement;
i = end;
return true;
}
catch
{
element = default;
return false;
}
}
It doesn't care so much about what's in the middle except (for strings, objects, and arrays) to know whether it's in the middle of a string (where it would be valid for the end character to be found) and checking for \-delimited characters. It works well enough for my purposes.
It takes a ReadOnlySpan<char> and an integer by reference. i needs to be the start of the expected JSON value, and it will be advanced to the next character after, if a valid value is found. It also follows the standard Try* pattern of returning a bool with an output parameter for the value.
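The scanner above stops at the first closing bracket it meets outside a string, so a nested value such as [[1], 2] would be cut short. Below is a minimal sketch of the same "find the end, then hand the slice to the real parser" idea with a nesting counter. It is written in JavaScript rather than C#, and the names skipString, findJsonValueEnd and parseLeadingJson are hypothetical, not part of System.Text.Json or of the answer's code.

```javascript
// Hypothetical helpers; a sketch, not the C# extension method above.

// Returns the index just past the string literal whose opening quote is at
// `start`, or -1 if the string is never closed.
function skipString(text, start) {
  for (var i = start + 1; i < text.length; i++) {
    if (text[i] === "\\") {
      i++; // skip the escaped character
    } else if (text[i] === '"') {
      return i + 1;
    }
  }
  return -1;
}

// Returns the index just past the JSON value that begins at `start`,
// or -1 if no value can be delimited there.
function findJsonValueEnd(text, start) {
  var literals = ["true", "false", "null"];
  for (var k = 0; k < literals.length; k++) {
    if (text.startsWith(literals[k], start)) return start + literals[k].length;
  }
  var number = /-?\d+(\.\d+)?([eE][+-]?\d+)?/y; // sticky: must match at lastIndex
  number.lastIndex = start;
  if (number.test(text)) return number.lastIndex;
  if (text[start] === '"') return skipString(text, start);
  if (text[start] !== "{" && text[start] !== "[") return -1;
  var depth = 0;
  for (var i = start; i < text.length; i++) {
    var ch = text[i];
    if (ch === '"') {
      i = skipString(text, i) - 1; // continue from the closing quote
      if (i < 0) return -1;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// Parses the leading JSON value and reports where it ended; null on failure.
function parseLeadingJson(text, start) {
  start = start || 0;
  var end = findJsonValueEnd(text, start);
  if (end < 0) return null;
  try {
    return { value: JSON.parse(text.slice(start, end)), end: end };
  } catch (e) {
    return null; // e.g. mismatched brackets or a malformed number
  }
}
```

As in the original, the scanner only locates the boundary; validating the slice is left to the real parser (JSON.parse here, JsonDocument.Parse in the C# version).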


xpath in apps script?

I made a formula to extract some Wikipedia data in Google Sheets, which works fine. Here is the formula:
=regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Geography')]]"))),"\[[^\]]+\]","")&char(10)&char(10)&iferror(regexreplace(join("",flatten(IMPORTXML(D2,".//p[preceding-sibling::h2[1][contains(., 'Education')]]"))),"\[[^\]]+\]",""))
Where D2 is a URL like https://en.wikipedia.org/wiki/Abbeville,_Alabama
This extracts some Geography and Education data from the Wikipedia page. Trouble is that importxml only runs a few times before it dies due to quota.
So I thought it might be better to use Apps Script, where the limits on fetching and parsing are much higher. However, I could not see a good way of using XPath in Apps Script. Older posts on the web discuss using a deprecated service called Xml, but it seems to no longer work. There is a service called XmlService which looks like it may do the job, but you can't just plug in an XPath expression. It looks like a lot of sweating to get to the result. Are there any solutions out there where you can just plug in XPath?
Here is an alternative solution I actually do in a case like this.
I have used XmlService, but only to parse the content, not to evaluate XPath. The script relies on the element tags and has been fairly consistent in my tests, although it might need tweaks when other tags show up in the result; you may have to add them to the exclusion condition.
Tested the code below in both links:
https://en.wikipedia.org/wiki/Abbeville,_Alabama#Geography
https://en.wikipedia.org/wiki/Montgomery,_Alabama#Education
In my tests, the formula from the question did not return the proper output for the second link, while the code does (possibly because the section is too long).
Code:
function getGeoAndEdu(path) {
var data = UrlFetchApp.fetch(path).getContentText();
// wikipedia is divided into sections, if output is cut, increase the number
var regex = /.{1,100000}/g;
var results = [];
// flag to determine if matches should be added
var foundFlag = false;
do {
var m = regex.exec(data);
// m is null once the input is exhausted, so guard before reading m[0]
if (foundFlag && m != null) {
// if another header is found during generation of data, stop appending the matches
if (matchTag(m[0], "<h2>"))
foundFlag = false;
// exclude tables, sub-headers and divs containing image description
else if(matchTag(m[0], "<div") || matchTag(m[0], "<h3") ||
matchTag(m[0], "<td") || matchTag(m[0], "<th"))
continue;
else
results.push(m[0]);
}
// start capturing if either IDs are found
if (m != null && (matchTag(m[0], "id=\"Geography\"") ||
matchTag(m[0], "id=\"Education\""))) {
foundFlag = true;
}
} while (m);
var output = results.map(function (str) {
// clean tags for XmlService
str = str.replace(/<[^>]*>/g, '').trim();
var decode = XmlService.parse('<d>' + str + '</d>');
// convert html entity codes (e.g. &nbsp;) to text
return decode.getRootElement().getText();
// filter blank results due to cleaning and empty sections
// separate data and remove citations before returning output
}).filter(result => result.trim().length > 1).join("\n").replace(/\[\d+\]/g, '');
return output;
}
// check if tag is found in string
function matchTag(string, tag) {
var regex = RegExp(tag);
return string.match(regex) && string.match(regex)[0] == tag;
}
Output and difference: [screenshots not preserved: formula ending output, script ending output, and the end of the Education section on Wikipedia]
Note:
You still have quota when using UrlFetchApp but should be better than IMPORTXML's limit depending on the type of your account.
Reference:
Apps Script Quotas
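For comparison, the intent of the XPath in the question (paragraphs that follow a given h2 heading, up to the next h2) can be approximated with plain string handling. This is only a sketch: getSectionText is a hypothetical helper, it assumes reasonably regular markup, it also picks up paragraphs nested inside other elements, and it does not decode HTML entities. In Apps Script the html argument would come from UrlFetchApp.fetch(url).getContentText(), as in the code above.

```javascript
// Hypothetical helper, not part of the answer's script: returns the text of
// all <p> elements in the first <h2> section whose heading contains `keyword`.
function getSectionText(html, keyword) {
  // Split so that every chunk begins at an <h2 ...> heading.
  var sections = html.split(/(?=<h2[\s>])/i);
  for (var i = 0; i < sections.length; i++) {
    var heading = /<h2[^>]*>([\s\S]*?)<\/h2>/i.exec(sections[i]);
    if (!heading || heading[1].indexOf(keyword) === -1) continue;
    var paragraphs = sections[i].match(/<p[\s>][\s\S]*?<\/p>/gi) || [];
    return paragraphs.map(function (p) {
      return p.replace(/<[^>]*>/g, "")     // drop tags
              .replace(/\[[^\]]+\]/g, "")  // drop citation markers such as [12]
              .trim();
    }).filter(function (text) {
      return text.length > 0;              // skip paragraphs left empty
    }).join("\n");
  }
  return ""; // no matching section found
}
```

Calling it once per keyword, for example "Geography" and then "Education", and joining the results mirrors what the spreadsheet formula does.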
Sorry, I got very busy this week so I didn't reply. I took a look at your answer, which seems to work fine, but it was quite code-heavy. I wanted something I would understand, so I coded my own solution. Not that mine is any simpler; it's just my own code, so it's easier for me to follow:
function getTextBetweenTags(html, paramatersInFirstTag, paramatersInLastTag) { //finds text values between 2 tags and removes internal tags to leave plain text.
//eg getTextBetweenTags(html,[['class="mw-headline"'],['id="Geography"']],[['class="wikitable mw-collapsible mw-made-collapsible"']])
// **Note: you may want to replace &#number; with ascII number
var openingTagPos = null;
var closingTagPos = null;
var previousChar = '';
var readingTag = false;
var newTag = '';
var tagEnd = false;
var regexFirstTagParams = [];
var regexLastTagParams = [];
//prepare regexes to test for parameters in opening and closing tags. put regexes in arrays so each condition can be tested separately
for (var i in paramatersInFirstTag) {
regexFirstTagParams.push(new RegExp(escapeRegex(paramatersInFirstTag[i][0])))
}
for (var i in paramatersInLastTag) {
regexLastTagParams.push(new RegExp(escapeRegex(paramatersInLastTag[i][0])))
}
var startTagIndex = null;
var endTagIndex = null;
var matches = 0;
for (var i = 0; i < html.length - 1; i++) {
var nextChar = html.substr(i, 1);
if (nextChar == '<' && previousChar != '\\') {
readingTag = true;
}
if (nextChar == '>' && previousChar != '\\') { //if end of tag found, check tag matches start or end tag
readingTag = false;
newTag += nextChar;
//test for firstTag
if (startTagIndex == null) {
var alltestsPass = true;
for (var j in regexFirstTagParams) {
if (!regexFirstTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
startTagIndex = i + 1;
//console.log('Start Tag',startTagIndex)
matches++;
}
}
//test for lastTag
else if (startTagIndex != null) {
var alltestsPass = true;
for (var j in regexLastTagParams) {
if (!regexLastTagParams[j].test(newTag)) alltestsPass = false;
}
if (alltestsPass) {
endTagIndex = i + 1;
matches++;
}
}
if(startTagIndex && endTagIndex) break;
newTag = '';
}
if (readingTag) newTag += nextChar;
previousChar = nextChar;
}
if (matches < 2) return 'No matches';
else return html.substring(startTagIndex, endTagIndex).replace(/<[^>]+>/g, '');
}
function escapeRegex(string) {
if (string == null) return string;
return string.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
My function requires an array of attributes for the start tag and an array of attributes for the end tag. It gets any text in between and removes any tags found in between. One issue I also noticed was that there were often special characters (e.g. &nbsp;), so they need to be replaced. I did that outside the scope of the function above.
The function could be easily improved to check the tag type (eg h2), but it wasn't necessary for the wikipedia case.
Here is a function where I call the above function. The html variable is just the result of UrlFetchApp.fetch('some wikipedia city url').getContentText();
function getWikiTexts(html) {
var geography = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Geography']], [['class="mw-headline"']]);
var economy = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Economy']], [['class="mw-headline"']]);
var education = getTextBetweenTags(html, [['class="mw-headline"'], ['id="Education']], [['class="mw-headline"']]);
var returnString = '';
if (geography != 'No matches' && !/Wikipedia/.test(geography)) returnString += geography + '\n';
if (economy != 'No matches' && !/Wikipedia/.test(economy)) returnString += economy + '\n';
if (education != 'No matches' && !/Wikipedia/.test(education)) returnString += education + '\n';
return returnString
}
Thanks for posting your answer.

html to plaintext with NodeJS on server side

On the server side, using Node.js, I receive a text message containing HTML. I want a function that converts the HTML to plain text. Please don't tell me to add a <plaintext> or <pre> tag. (The convert_to_html function doesn't exist in Node.js.)
socket.on('echo', (text) => {
plaintext = convert_to_html(text);
socket.emit('echo', {
message: plaintext
});
});
ideal results:
input: <h1>haha i am big</h1>
plaintext (what I want plaintext to be): &lt;h1&gt;haha i am big&lt;/h1&gt;
output: <h1>haha i am big</h1>
current result:
input: <h1>haha i am big</h1>
plaintext: <h1>haha i am big</h1>
output: haha i am big
You can use the insertAdjacentHTML method on the browser side. Here is an example:
socket.on("response", function (msg) {
const messages = document.getElementById("messages");
messages.insertAdjacentHTML("beforebegin", msg);
window.scrollTo(0, document.body.scrollHeight);
});
I still don't have a proper solution. While I wait for one, I will replace the reserved characters with HTML entities as a temporary solution.
https://devpractical.com/display-html-tags-as-plain-text/#:~:text=You%20can%20show%20HTML%20tags,the%20reader%20on%20the%20browser.
function parse_to_plain_text(html){
var result = "";
for (var i = 0; i < html.length; i++) {
var current_char = html[i];
if (current_char == ' '){
result += "&nbsp;"
}
else if (current_char == '<'){
result += "&lt;"
}
else if (current_char == '>'){
result += "&gt;"
}
else if (current_char == '&'){
result += "&amp;"
}
else if (current_char == '"'){
result += "&quot;"
}
else if (current_char == "'"){
result += "&apos;"
}
else{
result += current_char;
}
}
return result;
}
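The same character-by-character replacement can be written more compactly with a lookup table and a single String.prototype.replace call. This is a sketch under the assumption that only the five characters below need escaping; escapeHtml and HTML_ESCAPES are hypothetical names, and spaces are left alone here rather than turned into non-breaking spaces.

```javascript
// Hypothetical lookup table: reserved character -> HTML entity.
var HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;"
};

// Returns `text` with reserved HTML characters replaced by entities, so a
// browser displays the markup as literal text instead of rendering it.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, function (ch) {
    return HTML_ESCAPES[ch];
  });
}
```

In the socket handler from the question, this would take the place of the missing convert_to_html call, for example plaintext = escapeHtml(text).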

Normalizing a string

I am pretty new at coding so I would appreciate any advice or suggestion.
I have a script that normalizes header names, but it ignores any digits at the beginning. I have run into an issue where multiple columns share the same header title, so I added a number in front to differentiate them, for example "1.Project #".
The purpose of the script is to do a mail merge. I was wondering if there is a way to modify it so that it does not ignore the number in front? Please see script below:
// Normalizes a string, by removing all non-alphanumeric characters and using mixed case
// to separate words. The output will always start with a lower case letter.
// This function is designed to produce JavaScript object property names.
// Arguments:
// - header: string to normalize
// Examples:
// "First Name" -> "firstName"
// "Market Cap (millions)" -> "marketCapMillions"
// "1 number at the beginning is ignored" -> "numberAtTheBeginningIsIgnored"
function normalizeHeader(header) {
var key = "";
var upperCase = false;
for (var i = 0; i < header.length; ++i) {
var letter = header[i];
if (letter == " " && key.length > 0) {
upperCase = true;
continue;
}
if (!isAlnum(letter)) {
continue;
}
if (key.length == 0 && isDigit(letter)) {
continue; // first character must be a letter
}
if (upperCase) {
upperCase = false;
key += letter.toUpperCase();
} else {
key += letter.toLowerCase();
}
}
return key;
}
Thank you in advance!
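One possible change, shown as a sketch: the leading digit is dropped by the key.length == 0 && isDigit(letter) check, so removing that check keeps it. The isAlnum and isDigit helpers are not shown in the question, so the versions below are assumptions about how they behave, and normalizeHeaderKeepDigits is a hypothetical name.

```javascript
// Assumed helper: true for a single ASCII digit.
function isDigit(ch) {
  return ch >= "0" && ch <= "9";
}

// Assumed helper: true for a single ASCII letter or digit.
function isAlnum(ch) {
  return (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z") || isDigit(ch);
}

// Variant of normalizeHeader that no longer skips digits at the start.
function normalizeHeaderKeepDigits(header) {
  var key = "";
  var upperCase = false;
  for (var i = 0; i < header.length; ++i) {
    var letter = header[i];
    if (letter == " " && key.length > 0) {
      upperCase = true; // capitalize the next kept character
      continue;
    }
    if (!isAlnum(letter)) {
      continue; // drop punctuation such as "." and "#"
    }
    if (upperCase) {
      upperCase = false;
      key += letter.toUpperCase();
    } else {
      key += letter.toLowerCase();
    }
  }
  return key;
}
```

A key that starts with a digit is not a valid identifier, so it has to be read with bracket notation (row["1project"]) rather than dot notation.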

Ambiguous use of operator '!='?

I am trying to create an if-else statement. If the randomNumber equals the text of a label, then I want to add 1 to the CorrectLabel. If they do not equal each other, then I want to add 1 to the IncorrectLabel. Here is my code:
@IBAction func checkButton(sender: UIButton) {
if ( "\(randomImageGeneratorNumber)" == "\(currentCountLabel.text)"){
currectAmountCorrect += 1
CorrectLabel.text = "\(currectAmountCorrect)"
}else if ("\(randomImageGeneratorNumber)" != "\(currentCountLabel.text)"){
currentAmountIncorrect += 1
IncorrectLabel.text = "\(currentAmountIncorrect)"
}
}
I am getting an error on the "else if" statement line saying "Ambiguous use of operator '!=' ". I am unsure of what this error means or how to fix it.
What does this error mean and how can it be fixed?
You shouldn't compare like that. Just use .toInt() to convert the label text to an Int and compare the numbers:
var currentCount = currentCountLabel.text?.toInt()
if randomImageGeneratorNumber == currentCount {
currectAmountCorrect += 1
CorrectLabel.text = "\(currectAmountCorrect)"
} else {
currentAmountIncorrect += 1
IncorrectLabel.text = "\(currentAmountIncorrect)"
}
There is no need to put your value into a "".
First of all, you don't need to make the comparison twice. Your code looks like
if true {
...
} else if false {
...
}
And, yes, int comparison would be better:
if let textAmount = currentCountLabel.text where randomImageGeneratorNumber == textAmount.toInt() {
currectAmountCorrect += 1
CorrectLabel.text = "\(currectAmountCorrect)"
} else {
currentAmountIncorrect += 1
IncorrectLabel.text = "\(currentAmountIncorrect)"
}

Adobe AIR 3.2 Glitch

I just finished a successful build of my program last night. Then, I get up this morning, and after an update from Adobe AIR 3.1 to AIR 3.2, I find THIS bug!
The same build under 3.1 works perfectly. However, as soon as 3.2 is installed, the following code after stopDrag fails silently. Mind you, it only fails in the packed and installed AIR application. It works perfectly when I test it inside of Adobe Flash Professional CS5.5
WHAT is going on? Here's the code I'm dealing with. Again, this works without error for Adobe AIR 3.1, but fails for 3.2. I cannot get to any other MouseEvent.MOUSE_UP events in my program at this point, due to my structure.
I omitted the irrelevant parts of the code. All the same, there is a lot, due to the fact that I don't know where the error occurs exactly. Instead of everything that is supposed to happen happening, stopDrag is the last line of code that fires in this block.
tile5.addEventListener(MouseEvent.MOUSE_UP, mouseUpHandler5);
function mouseUpHandler5(evt:MouseEvent):void
{
Mouse.cursor = "paw";
var obj = evt.target;
var target = obj.dropTarget;
obj.stopDrag();
if (target != null && target.parent == hsSlot1)
{
brdgcheck(5, 1);
}
}
function brdgcheck(tile:int, slot:int)
{
var ck_brdg_l:String = "osr.Langue.brdg_l" + String(slot);
var ck_brdg_t:String = "osr.Langue.brdg_t" + String(tile);
var ck_slotfilled:String = "Slot" + String(slot) + "Filled";
var ck_tile:String = "tile" + String(tile);
var ck_slot:String = "hsSlot" + String(slot);
var ck_txtTile:String;
switch(tile)
{
case 1:
ck_brdg_t = osr.Langue.brdg_t1;
ck_txtTile = tile1.txtTile1.text;
break;
case 2:
ck_brdg_t = osr.Langue.brdg_t2;
ck_txtTile = tile2.txtTile2.text;
break;
case 3:
ck_brdg_t = osr.Langue.brdg_t3;
ck_txtTile = tile3.txtTile3.text;
break;
case 4:
ck_brdg_t = osr.Langue.brdg_t4;
ck_txtTile = tile4.txtTile4.text;
break;
case 5:
ck_brdg_t = osr.Langue.brdg_t5;
ck_txtTile = tile5.txtTile5.text;
break;
}
switch(slot)
{
case 1:
ck_brdg_l = osr.Langue.brdg_l1;
break;
case 2:
ck_brdg_l = osr.Langue.brdg_l2;
break;
case 3:
ck_brdg_l = osr.Langue.brdg_l3;
break;
case 4:
ck_brdg_l = osr.Langue.brdg_l4;
break;
case 5:
ck_brdg_l = osr.Langue.brdg_l5;
break;
}
if (ck_brdg_l == ck_brdg_t)
{
osr.Sonus.PlaySound("concretehit");
this[ck_slotfilled].visible = true;
switch(slot)
{
case 1:
Slot1Filled.txtSlot1.text = ck_txtTile;
break;
case 2:
Slot2Filled.txtSlot2.text = ck_txtTile;
break;
case 3:
Slot3Filled.txtSlot3.text = ck_txtTile;
break;
case 4:
Slot4Filled.txtSlot4.text = ck_txtTile;
break;
case 5:
Slot5Filled.txtSlot5.text = ck_txtTile;
break;
}
this[ck_tile].visible = false;
this[ck_slot].visible = false;
if (hsSlot1.visible == false && hsSlot2.visible == false && hsSlot3.visible == false && hsSlot4.visible == false && hsSlot5.visible == false)
{
osr.Gradua.Score(true);
osr.Gradua.Evaluate("brdg");
btnReset.visible = false;
hsChar.visible = false;
if (osr.Gradua.Fetch("brdg", "arr_act_stcnt") < 4)
{
bga.gotoAndPlay("FINKEY");
win_key();
}
else
{
bga.gotoAndPlay("FINNON");
}
}
else
{
osr.Gradua.Score(true);
}
}
else
{
osr.Gradua.Score(false);
osr.Sonus.PlaySound("glassbreak");
switch(tile)
{
case 1:
tile1.x = 92.85;
tile1.y = 65.85;
break;
case 2:
tile2.x = 208.80;
tile2.y = 162.85;
break;
case 3:
tile3.x = 324.80;
tile3.y = 65.85;
break;
case 4:
tile4.x = 437.80;
tile4.y = 162.85;
break;
case 5:
tile5.x = 549.80;
tile5.y = 65.85;
break;
}
}
}
EDIT: I found a good workaround, to use "if (hsSlot1.hitTestPoint(mouseX,mouseY) && hsSlot1.visible == true)"
However, a solution to this problem would still be appreciated!