How to send MarkDown to API - json

I'm trying to send some Markdown text to a REST API. I just figured out that line breaks are not accepted in JSON strings.
For example, how do I send this to my API:
An h1 header
============
Paragraphs are separated by a blank line.
2nd paragraph. *Italic*, **bold**, and `monospace`. Itemized lists
look like:
* this one
* that one
* the other one
Note that --- not considering the asterisk --- the actual text
content starts at 4-columns in.
> Block quotes are
> written like so.
>
> They can span multiple paragraphs,
> if you like.
Use 3 dashes for an em-dash. Use 2 dashes for ranges (ex., "it's all
in chapters 12--14"). Three dots ... will be converted to an ellipsis.
Unicode is supported. ☺
as
{
  "body": " (the markdown) "
}

Since you're sending this to a REST API endpoint, I'll assume you want to do it in JavaScript (you didn't specify which technology you're using).
Rule of thumb: unless your goal is to write a JSON builder yourself, use one that already exists.
Conveniently, JavaScript ships with its own JSON tools (see the documentation here).
As the documentation shows, you can use the JSON.stringify function to convert a value, such as an object or a string, into a JSON-compliant encoded string that can later be decoded on the server side.
This example illustrates how to do so:
var arr = {
  text: "This is some text"
};
var json_string = JSON.stringify(arr);
// Result is:
// '{"text":"This is some text"}'
// json_string now contains a JSON-compliant encoded string.
You can also decode JSON client-side in JavaScript with the companion JSON.parse() method (see documentation):
var json_string = '{"text":"This is some text"}';
var arr = JSON.parse(json_string);
// arr is now an object whose "text" key holds
// the value "This is some text"
If that doesn't answer your question, please edit it to make it more precise, especially regarding the technology you're using. I'll update this answer accordingly.

You need to replace the line endings with \n and then pass the result as the value of your body key.
Also, make sure you escape double quotes (") as \", otherwise your body string will end at the first one.
# An h1 header\n============\n\nParagraphs are separated by a blank line.\n\n2nd paragraph. *Italic*, **bold**, and `monospace`. Itemized lists\nlook like:\n\n * this one\n * that one\n * the other one\n\nNote that --- not considering the asterisk --- the actual text\ncontent starts at 4-columns in.\n\n> Block quotes are\n> written like so.\n>\n> They can span multiple paragraphs,\n> if you like.\n\nUse 3 dashes for an em-dash. Use 2 dashes for ranges (ex., \"it's all\nin chapters 12--14\"). Three dots ... will be converted to an ellipsis.\nUnicode is supported.
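The manual escaping above can also be left to a JSON library. Here is a minimal sketch in Python (the asker did not name a language, so Python is an assumption here, and the markdown variable is a shortened, hypothetical stand-in for the real text):

```python
import json

# Hypothetical Markdown input containing line breaks and double quotes.
markdown = """An h1 header
============

Paragraphs are separated by a blank line.

Use 2 dashes for ranges (ex., "it's all
in chapters 12--14")."""

# json.dumps escapes newlines as \n and double quotes as \" for us.
payload = json.dumps({"body": markdown})
print(payload)

# Decoding on the receiving side restores the original text unchanged.
restored = json.loads(payload)["body"]
```

The resulting payload is a single line with no raw line breaks, so it can be sent as the request body as-is.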


How can I define a Raku grammar to parse TSV text?

I have some TSV data
ID Name Email
1 test test#email.com
321 stan stan#nowhere.net
I would like to parse this into a list of hashes
@entities[0]<Name> eq "test";
@entities[1]<Email> eq "stan#nowhere.net";
I'm having trouble with using the newline metacharacter to delimit the header row from the value rows. My grammar definition:
use v6;
grammar Parser {
token TOP { <headerRow><valueRow>+ }
token headerRow { [\s*<header>]+\n }
token header { \S+ }
token valueRow { [\s*<value>]+\n? }
token value { \S+ }
}
my $dat = q:to/EOF/;
ID Name Email
1 test test#email.com
321 stan stan#nowhere.net
EOF
say Parser.parse($dat);
But this is returning Nil. I think I'm misunderstanding something fundamental about regexes in raku.
Probably the main thing that's throwing it off is that \s matches both horizontal and vertical whitespace, so it also consumes the newlines. To match only horizontal whitespace, use \h; to match only vertical whitespace, use \v.
One small recommendation I'd make is to avoid including the newlines in the token. You might also want to use the quantifier separator modifiers % or %%, as they're designed for handling this type of work:
grammar Parser {
token TOP {
<headerRow> \n
<valueRow>+ %% \n
}
token headerRow { <.ws>* %% <header> }
token valueRow { <.ws>* %% <value> }
token header { \S+ }
token value { \S+ }
token ws { \h* }
}
The result of Parser.parse($dat) for this is the following:
「ID Name Email
1 test test#email.com
321 stan stan#nowhere.net
」
 headerRow => 「ID Name Email」
  header => 「ID」
  header => 「Name」
  header => 「Email」
 valueRow => 「 1 test test#email.com」
  value => 「1」
  value => 「test」
  value => 「test#email.com」
 valueRow => 「 321 stan stan#nowhere.net」
  value => 「321」
  value => 「stan」
  value => 「stan#nowhere.net」
 valueRow => 「」
which shows us that the grammar has successfully parsed everything. However, let's focus on the second part of your question: you want the result to be available in a variable. To do that, you'll need to supply an actions class, which is very simple for this project. You just make a class whose methods are named after the tokens of your grammar (although very simple ones, like value/header, that don't require special processing besides stringification can be left out). There are more creative/compact ways to handle the processing, but I'll go with a fairly rudimentary approach for illustration. Here's our class:
class ParserActions {
method headerRow ($/) { ... }
method valueRow ($/) { ... }
method TOP ($/) { ... }
}
Each method has the signature ($/) which is the regex match variable. So now, let's ask what information we want from each token. In header row, we want each of the header values, in a row. So:
method headerRow ($/) {
    my @headers = $<header>.map: *.Str;
    make @headers;
}
Any token with a quantifier on it will be treated as a Positional, so we could also access each individual header match with $<header>[0], $<header>[1], etc. But those are match objects, so we just quickly stringify them. The make command allows other tokens to access this special data that we've created.
Our value row will look identical, because the $<value> tokens are what we care about.
method valueRow ($/) {
    my @values = $<value>.map: *.Str;
    make @values;
}
When we get to the last method, we will want to create the array of hashes.
method TOP ($/) {
    my @entries;
    my @headers = $<headerRow>.made;
    my @rows = $<valueRow>.map: *.made;
    for @rows -> @values {
        my %entry = flat @headers Z @values;
        @entries.push: %entry;
    }
    make @entries;
}
Here you can see how we access the data we processed in headerRow() and valueRow(): we use the .made method. Because there are multiple valueRows, we need a map to get each of their made values (this is a situation where I tend to write my grammar to have simply <header><data>, and define the data as being multiple rows, but this is simple enough that it's not too bad).
Now that we have the headers and rows in two arrays, it's simply a matter of turning them into an array of hashes, which we do in the for loop. The flat @x Z @y just interleaves the elements, and the hash assignment Does What We Mean, but there are other ways to get the array of hashes you want.
Once you're done, you just make it, and then it will be available in the made of the parse:
say Parser.parse($dat, :actions(ParserActions)).made;
# [{Email => test#email.com, ID => 1, Name => test} {Email => stan#nowhere.net, ID => 321, Name => stan} {}]
It's fairly common to wrap this in a sub, like
sub parse-tsv($tsv) {
    return Parser.parse($tsv, :actions(ParserActions)).made
}
That way you can just say
my @entries = parse-tsv($dat);
say @entries[0]<Name>; # test
say @entries[1]<Email>; # stan#nowhere.net
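For comparison, the step that pairs each header with the value in the same column can be sketched outside of a grammar. This is a rough Python illustration, not part of the Raku solution: parse_tsv is a hypothetical helper, and it splits on any whitespace, mirroring the \S+ tokens above.

```python
def parse_tsv(text):
    """Turn whitespace-separated rows into a list of dicts keyed by the header row."""
    # Skip blank lines so a trailing newline does not produce an empty entry.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    headers, *value_rows = rows
    # zip pairs each header with the value in the same column.
    return [dict(zip(headers, values)) for values in value_rows]


data = "ID\tName\tEmail\n1\ttest\ttest#email.com\n321\tstan\tstan#nowhere.net\n"
entries = parse_tsv(data)
print(entries[0]["Name"])
print(entries[1]["Email"])
```

Because blank lines are filtered out first, this sketch does not produce the trailing empty entry seen in the grammar's output.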
TL;DR: you don't. Just use Text::CSV, which can handle this kind of format.
Here is how good old Text::CSV can be put to use:
use Text::CSV;
my $text = q:to/EOF/;
ID Name Email
1 test test#email.com
321 stan stan#nowhere.net
EOF
my @data = $text.lines.map: *.split(/\t/).list;
say @data.perl;
my $csv = csv( in => @data, key => "ID");
print $csv.perl;
The key part here is the data munging that converts the initial text into an array of arrays (in @data). It's only needed, however, because the csv command is not able to deal with strings; if the data is in a file, you're good to go.
The last line will print:
${" 1" => ${:Email("test\#email.com"), :ID(" 1"), :Name("test")}, " 321" => ${:Email("stan\#nowhere.net"), :ID(" 321"), :Name("stan")}}
The ID field becomes the key of the outer hash, and the whole thing is a hash of hashes.
TL;DR regexes backtrack. tokens don't. That's why your pattern isn't matching. This answer focuses on explaining that, and on how to trivially fix your grammar. However, you should probably rewrite it, or use an existing parser, which is definitely what you should do if you just want to parse TSV rather than learn about Raku regexes.
A fundamental misunderstanding?
I think I'm misunderstanding something fundamental about regexes in raku.
(If you already know the term "regexes" is a highly ambiguous one, consider skipping this section.)
One fundamental thing you may be misunderstanding is the meaning of the word "regexes". Here are some popular meanings folk assume:
Formal regular expressions.
Perl regexes.
Perl Compatible Regular Expressions (PCRE).
Text pattern matching expressions called "regexes" that look like any of the above and do something similar.
None of these meanings are compatible with each other.
While Perl regexes are semantically a superset of formal regular expressions, they are far more useful in many ways but also more vulnerable to pathological backtracking.
While Perl Compatible Regular Expressions are compatible with Perl in the sense they were originally the same as standard Perl regexes in the late 1990s, and in the sense that Perl supports pluggable regex engines including the PCRE engine, PCRE regex syntax is not identical to the standard Perl regex used by default by Perl in 2020.
And while text pattern matching expressions called "regexes" generally do look somewhat like each other, and do all match text, there are dozens, perhaps hundreds, of variations in syntax, and even in semantics for the same syntax.
Raku text pattern matching expressions are typically called either "rules" or "regexes". The use of the term "regexes" conveys the fact that they look somewhat like other regexes (although the syntax has been cleaned up). The term "rules" conveys the fact they are part of a much broader set of features and tools that scale up to parsing (and beyond).
The quick fix
With the above fundamental aspect of the word "regexes" out of the way, I can now turn to the fundamental aspect of your "regex"'s behavior.
If we switch three of the patterns in your grammar from the token declarator to the regex declarator, your grammar works as you intended:
grammar Parser {
regex TOP { <headerRow><valueRow>+ }
regex headerRow { [\s*<header>]+\n }
token header { \S+ }
regex valueRow { [\s*<value>]+\n? }
token value { \S+ }
}
The sole difference between a token and a regex is that a regex backtracks whereas a token doesn't. Thus:
say 'ab' ~~ regex { [ \s* a ]+ b } # 「ab」
say 'ab' ~~ token { [ \s* a ]+ b } # 「ab」
say 'ab' ~~ regex { [ \s* \S ]+ b } # 「ab」
say 'ab' ~~ token { [ \s* \S ]+ b } # Nil
During processing of the last pattern (that could be and often is called a "regex", but whose actual declarator is token, not regex), the \S will swallow the 'b', just as it temporarily will have done during processing of the regex in the prior line. But, because the pattern is declared as a token, the rules engine (aka "regex engine") does not backtrack, so the overall match fails.
That's what's going on in your OP.
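The same contrast can be reproduced outside Raku. The sketch below uses Python's re module, where a possessive quantifier (++, available from Python 3.11) plays roughly the role of ratcheting for that one quantifier. It is an analogy under that assumption, not a description of how Raku's engine is implemented.

```python
import re
import sys

text = "ab"

# Ordinary (backtracking) quantifier: the group first swallows "ab", then
# gives the "b" back so that the trailing literal b can match.
backtracking = re.search(r"(?:\s*\S)+b", text)

# Possessive quantifier: the group keeps everything it swallowed, so the
# trailing b has nothing left to match and the overall match fails.
if sys.version_info >= (3, 11):
    ratcheting = re.search(r"(?:\s*\S)++b", text)
else:
    ratcheting = None  # possessive quantifiers are not available before 3.11

print(backtracking)
print(ratcheting)
```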
The right fix
A better solution in general is to wean yourself off assuming backtracking behavior, because it can be slow, even catastrophically slow (indistinguishable from the program hanging), when matching against a maliciously constructed string or one with an accidentally unfortunate combination of characters.
Sometimes regexes are appropriate. For example, if you're writing a one-off and a regex does the job, then you're done. That's fine. That's part of the reason that the / ... / syntax in Raku declares a backtracking pattern, just like regex. (Then again, you can write / :r ... / if you want to switch on ratcheting -- "ratchet" means the opposite of "backtrack", so :r switches a regex to token semantics.)
Occasionally backtracking still has a role in a parsing context. For example, while the grammar for Raku generally eschews backtracking, and instead has hundreds of rules and tokens, it nevertheless still has 3 regexes.
I've upvoted @user0721090601++'s answer because it's useful. It also addresses several things that immediately seemed to me to be idiomatically off in your code and, importantly, sticks to tokens. It may well be the answer you prefer, which will be cool.

How do I search for a string in this JSON with Python

My JSON file looks something like:
{
"generator": {
"name": "Xfer Records Serum",
....
},
"generator": {
"name": "Lennar Digital Sylenth1",
....
}
}
I ask the user for a search term, and the input is searched for in the name key only. All matching results should be returned. This means that if I input only 's', both of the entries above would be returned. Please also explain how to return the names of all objects that are generators. The simpler the method, the better. I use the json library, but if another library is required, that's not a problem.
Before switching to JSON I tried XML, but it did not work.
If your goal is just to search all name properties, this will do the trick:
import re

def search_names(term, lines):
    # Match a line of the form  "name": "<value>"  whose value contains the term.
    # re.escape keeps regex metacharacters in the user's input from being interpreted.
    name_search = re.compile(r'\s*"name"\s*:\s*"(.*' + re.escape(term) + r'.*)",?$', re.I)
    return [x.group(1) for x in [name_search.search(y) for y in lines] if x]

with open('path/to/your.json') as f:
    lines = f.readlines()

print(search_names('s', lines))
which would return both names you listed in your example.
The search_names() function builds a regular expression that matches any line containing "name": " (with a varying amount of whitespace), followed by your search term with any other characters around it, and terminated by " followed by an optional , and the end of the line. It then applies that expression to each line of the file. Finally, it filters out the non-matching lines and returns the value of the name property (the capture group contents) for each match.
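Since the question mentions the json library, here is a hedged alternative sketch that parses the file instead of scanning lines. One caveat drives the design: the sample repeats the "generator" key, and a plain json.loads keeps only the last duplicate, so the sketch passes object_pairs_hook=list to keep every pair. The raw string is a cut-down, hypothetical version of the file with the elided fields left out.

```python
import json

# Hypothetical, trimmed-down version of the file from the question.
raw = """
{
  "generator": {"name": "Xfer Records Serum"},
  "generator": {"name": "Lennar Digital Sylenth1"}
}
"""

def search_names(term, text):
    # With object_pairs_hook=list every JSON object becomes a list of
    # (key, value) tuples, so repeated keys are all preserved.
    pairs = json.loads(text, object_pairs_hook=list)
    return [
        value
        for key, obj in pairs if key == "generator"
        for inner_key, value in obj
        if inner_key == "name" and term.lower() in value.lower()
    ]

print(search_names("s", raw))
print(search_names("xfer", raw))
```

Passing an empty search term returns the names of all generators, which covers the second part of the question.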

How to target all attributes within an HTML tag [duplicate]

Example:
This is just\na simple sentence.
I want to match every character between This is and sentence. Line breaks should be ignored. I can't figure out the correct syntax.
For example
(?<=This is)(.*)(?=sentence)
Regexr
I used lookbehind (?<=) and lookahead (?=) so that "This is" and "sentence" are not included in the match, but this is up to your use case; you can also simply write This is(.*)sentence.
The important thing here is that you activate the "dotall" mode of your regex engine, so that the . matches the newline. How you do this depends on your regex engine.
The next question is whether you use .* or .*?. The first one is greedy and will match up to the last "sentence" in your string; the second one is lazy and will match up to the next "sentence" in your string.
Update
Regexr
This is(?s)(.*)sentence
Where the (?s) turns on the dotall modifier, making the . match newline characters.
Update 2:
(?<=is \()(.*?)(?=\s*\))
is matching your example "This is (a simple) sentence". See here on Regexr
Lazy Quantifier Needed
Resurrecting this question because the regex in the accepted answer doesn't seem quite correct to me. Why? Because
(?<=This is)(.*)(?=sentence)
will match my first sentence. This is my second in This is my first sentence. This is my second sentence.
See demo.
You need a lazy quantifier between the two lookarounds. Adding a ? makes the star lazy.
This matches what you want:
(?<=This is).*?(?=sentence)
See demo. I removed the capture group, which was not needed.
DOTALL Mode to Match Across Line Breaks
Note that in the demo the "dot matches line breaks" mode (a.k.a. dot-all) is set (see how to turn on DOTALL in various languages). In many regex flavors, you can set it with the inline modifier (?s), turning the expression into:
(?s)(?<=This is).*?(?=sentence)
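To see the effect of both the lazy quantifier and DOTALL mode in one place, here is a small sketch using Python's re module (Python is an assumption here; the sample text is made up to contain two sentences and one line break):

```python
import re

text = "This is my first sentence. This is my\nsecond sentence."

# Greedy: runs from the first "This is" up to the LAST "sentence".
greedy = re.search(r"(?<=This is).*(?=sentence)", text, flags=re.DOTALL)

# Lazy: stops at the NEXT "sentence", so each sentence is matched separately.
lazy = re.findall(r"(?<=This is).*?(?=sentence)", text, flags=re.DOTALL)

# Without DOTALL the dot cannot cross the line break in the second sentence.
lazy_no_dotall = re.findall(r"(?<=This is).*?(?=sentence)", text)

print(repr(greedy.group()))
print(lazy)
print(lazy_no_dotall)
```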
Reference
The Many Degrees of Regex Greed
Repetition with Star and Plus
Try This is[\s\S]*?sentence, works in javascript
This:
This is (.*?) sentence
works in javascript.
use this: (?<=beginningstringname)(.*\n?)(?=endstringname)
This worked for me (I'm using VS Code):
for:
This is just\na simple sentence
Use:
This .+ sentence
You can simply use this: \This is .*? \sentence
RegEx to match everything between two strings using the Java approach.
List<String> results = new ArrayList<>(); //For storing results
String example = "Code will save the world";
Let's use Pattern and Matcher objects with the RegEx (.*?).
Pattern p = Pattern.compile("Code (.*?) world"); //java.util.regex.Pattern;
Matcher m = p.matcher(example); //java.util.regex.Matcher;
Since the Matcher might find more than one match, we need to loop over the results and store them.
while(m.find()){ //Loop through all matches
    results.add(m.group(1)); //Get the captured text between the two strings and store it in the collection.
}
In this example, results will contain only "will save the", but in a bigger text it will probably find more matches.
In case of JavaScript you can use [^] to match any character including newlines.
Using the /s flag with a dot . to match any character also works, but the flag applies to the whole pattern, and JavaScript does not support inline modifiers to turn it on or off.
To match as few characters as possible, you can make the quantifier non-greedy by appending a question mark, and use a capture group to extract the part in between.
This is([^]*?)sentence
See a regex101 demo.
As a side note, to not match partial words you can use word boundaries like \bThis and sentence\b
const s = "This is just\na simple sentence";
const regex = /This is([^]*?)sentence/;
const m = s.match(regex);
if (m) {
console.log(m[1]);
}
The lookaround variant in JavaScript is (?<=This is)[^]*?(?=sentence) and you could check Lookbehind in JS regular expressions for the support.
Also see Important Notes About Lookbehind.
const s = "This is just\na simple sentence";
const regex = /(?<=This is)[^]*?(?=sentence)/;
const m = s.match(regex);
if (m) {
console.log(m[0]);
}
In case anyone is looking for an example of this within a Jenkins context: the following parses the build.log and, if it finds a match, fails the build with the matched text.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
node{
    stage("parse"){
        def file = readFile 'build.log'
        def regex = ~"(?s)(firstStringToUse(.*)secondStringToUse)"
        Matcher match = regex.matcher(file)
        if (match.find()) {
            def capturedText = match.group(1)
            error(capturedText)
        }
    }
}
Is there a way to deal with repeated instances of this pattern in a block of text? For instance: "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence. ". To match each instance instead of the entire string, use the code below:
import re
data = "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence."
# (?s) must come first so that . also matches newlines; .*? keeps each match as short as possible
pattern = re.compile(r'(?s)This is .*? sentence')
for match_instance in pattern.finditer(data):
    do_something(match_instance.group())  # do_something is a placeholder for your own handling
I landed here while searching for a regex to convert the Python 2 print "string" syntax in old scripts to print("string") for Python 3. It works well; otherwise, use 2to3.py for additional conversions. Here is my solution for others:
Try it out on Regexr.com (it doesn't work in NP++ for some reason):
find: (?<=print)( ')(.*)(')
replace: ('$2')
for variables:
(?<=print)( )(.*)(\n)
('$2')\n
for label and variable:
(?<=print)( ')(.*)(',)(.*)(\n)
('$2',$4)\n
How to replace all print "string" in Python2 with print("string") for Python3?
Here is how I did it:
This was easier for me than trying to figure out the specific regex necessary.
int indexPictureData = result.IndexOf("-PictureData:");
int indexIdentity = result.IndexOf("-Identity:");
string returnValue = result.Remove(indexPictureData + 13);
returnValue = returnValue + " [bytecoderemoved] " + result.Remove(0, indexIdentity);
For a quick search in Vim, you could use
at the Vim command prompt: /This is.*\_.*sentence
I had this string:
headers:
Date:
schema:
type: string
example: Tue, 23 Aug 2022 11:36:23 GMT
Content-Type:
schema:
type: string
example: application/json; charset=utf-8
Transfer-Encoding:
schema:
type: string
example: chunked
Connection:
schema:
type: string
example: keep-alive
Content-Encoding:
schema:
type: string
example: gzip
Vary:
schema:
type: string
example: Accept-Encoding
Server:
schema:
type: number
example: Microsoft-IIS/10.0
X-Powered-By:
schema:
type: string
example: ASP.NET
Access-Control-Allow-Origin:
schema:
type: string
example: '*'
Access-Control-Allow-Credentials:
schema:
type: boolean
example: 'true'
Access-Control-Allow-Headers:
schema:
type: string
example: '*'
Access-Control-Max-Age:
schema:
type: string
example: '-1'
Access-Control-Allow-Methods:
schema:
type: string
example: GET, PUT, POST, DELETE
X-Content-Type-Options:
schema:
type: string
example: nosniff
X-XSS-Protection:
schema:
type: string
example: 1; mode=block
content:
application/json:
and I wanted to remove everything from the word headers: to content, so I wrote this regex: (headers:)[^]*?(content)
It worked as expected, finding every occurrence of that block.
Sublime Text 3.x
In Sublime Text, you simply write the two words you want to match between, in your case
"This is" and "sentence"
and put .* in between,
i.e. This is .* sentence
and this should do the job.

render Displacy Entity Recognition Visualization into Plotly Dash

I want to render a piece of Entity Recognition Visualization by Spacy into a Plotly Dash app.
The html of ER Visualization for rendering is as follows:
<div class="entities" style="line-height: 2.5">
<mark class="entities" style="background: ...>
<span>...</span>
</mark>
<mark class="entities" style="background: ...>
<span>...</span>
</mark>
</div>
I have tried parsing the HTML using BeautifulSoup, and converting the HTML to Dash by the following code. But when I run convert_html_to_dash(html_parsed), it is throwing KeyError: 'style'
html_parsed = bs.BeautifulSoup(html, 'html.parser')

def convert_html_to_dash(el, style = None):
    if type(el) == bs.element.NavigableString:
        return str(el)
    else:
        name = el.name
        style = extract_style(el) if style is None else style
        contents = [convert_html_to_dash(x) for x in el.contents]
        return getattr(html,name.title())(contents, style=style)

def extract_style(el):
    return {k.strip():v.strip() for k,v in [x.split(": ") for x in
        el.attrs["style"].split(";")]}
Not every tag has a style attribute. For tags that don't, you are attempting to access a non-existent key in the attrs dictionary. Python's response is a KeyError.
If you use get() instead, it will return a default value instead of raising a KeyError. You can specify a default value as the second argument to get():
return { k.strip() : v.strip() for k, v in
[ x.split(': ') for x in el.attrs.get('style', '').split(';') ]
}
Here I have chosen the empty string as the default value.
With only this change, your code still remains somewhat brittle. What if the input does not exactly match what you expect?
For one thing, there might not be a space after the colon. Changing split(': ') to split(':') makes it work even if there is no space; if there is one, it is removed anyway since you call strip() after splitting.
And what if, after splitting on ';', you receive something other than a key-value pair in the list? This already happens when the style attribute is missing or ends with a semicolon: the split then yields an empty string, which cannot be unpacked into k and v. It is best to check whether each item is a valid pair (contains exactly one colon), and skip it otherwise.
Your code becomes:
return { k.strip() : v.strip() for k, v in
[ x.split(':') for x in el.attrs.get('style', '').split(';')
if x.count(':') == 1 ] }
Note that I have opted for single-quotation marks. Your code uses both, but it is best to pick one and stick with it.
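The final version can be tried without BeautifulSoup or Dash installed. In the sketch below, extract_style takes a plain attrs dictionary rather than a tag object. That signature is a simplification made for illustration, not the signature used in the question.

```python
def extract_style(attrs):
    """Parse an inline CSS style string from a tag's attribute dict into a dict."""
    style = attrs.get('style', '')  # a missing attribute yields an empty string
    return {
        k.strip(): v.strip()
        for k, v in (x.split(':') for x in style.split(';') if x.count(':') == 1)
    }


print(extract_style({'style': 'line-height: 2.5; background:#ddd;'}))
print(extract_style({'class': 'entities'}))
```

Items without exactly one colon, including the empty string left over after a trailing semicolon, are skipped rather than raising an error.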

How do I match a CSV-style quoted string in nom?

A CSV style quoted string, for the purposes of this question, is a string in which:
The string starts and ends with exactly one ".
Two double quotes inside the string are collapsed to one double quote. "Alo""ha"→Alo"ha.
"" on its own is an empty string.
Error inputs, such as "A""" e", cannot be parsed. It's an A", followed by junk e".
I've tried several things, none of which have worked fully.
The closest I've gotten, thanks to some help from user pinkieval in #nom on the Mozilla IRC:
use std::error as stderror; /* Avoids needing nightly to compile */
named!(csv_style_string<&str, String>, map_res!(
terminated!(tag!("\""), not!(peek!(char!('"')))),
csv_string_to_string
));
fn csv_string_to_string(s: &str) -> Result<String, Box<stderror::Error>> {
Ok(s.to_string().replace("\"\"", "\""))
}
This does not catch the end of the string correctly.
I've also attempted to use the re_match! macro with r#""([^"]|"")*""#, but that always results in an Err::Incomplete(1).
I've determined that the given CSV example for Nom 1.0 doesn't work for a quoted CSV string as I'm describing it, but I do know implementations differ.
Here is one way of doing it:
use nom::types::CompleteStr;
use nom::*;
named!(csv_style_string<CompleteStr, String>,
delimited!(
char!('"'),
map!(
many0!(
alt!(
// Eat a " delimiter and the " that follows it
tag!("\"\"") => { |_| '"' }
| // Normal character
none_of!("\"")
)
),
// Make a string from a vector of chars
|v| v.iter().collect::<String>()
),
char!('"')
)
);
fn main() {
println!(r#""Alo\"ha" = {:?}"#, csv_style_string(CompleteStr(r#""Alo""ha""#)));
println!(r#""" = {:?}"#, csv_style_string(CompleteStr(r#""""#)));
println!(r#"bad format: {:?}"#, csv_style_string(CompleteStr(r#""A""" e""#)));
}
(I wrote it in full nom, but a solution like yours, based on an external function instead of map!() each character, would work too, and may be more efficient.)
The magic here, that would also solve your regexp issue, is to use CompleteStr. This basically tells nom that nothing will come after that input (otherwise, nom assumes you're doing a streaming parser, so more input may follow).
This is needed because we need to know what to do with a " if it is the last character fed to nom. Depending on the character that comes after it (another ", a normal character, or EOF), we have to take a different decision -- hence the Incomplete result, meaning nom does not have enough input to make the decision. Telling nom that EOF comes next solves this indecision.
Further reading on Incomplete on nom's author's blog: http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html#dealing-with-incomplete-usage
You may note that this parser does not actually reject the invalid input, but parses the beginning and returns the rest. If you use this parser as a subparser in another parser, the latter would then feed the remainder to the next subparser, which would fail as well (because it would expect a comma), causing the overall parser to fail.
If you don't want that, you could make csv_style_string match peek!(alt!(char!(',')|char!('\n')|eof!())).
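As a cross-check of the quoting rules stated in the question, here is a small reference sketch in Python. It is not nom code; parse_csv_quoted is a hypothetical helper that treats its argument as the complete input, which is the same assumption CompleteStr expresses.

```python
import re

# An opening quote, then any mix of non-quote characters and doubled quotes,
# then a closing quote.
_QUOTED = re.compile(r'"((?:[^"]|"")*)"')

def parse_csv_quoted(s):
    """Return the unescaped content of a CSV-style quoted string, or raise ValueError."""
    m = _QUOTED.fullmatch(s)  # fullmatch rejects trailing junk after the closing quote
    if m is None:
        raise ValueError("not a valid CSV-style quoted string: %r" % s)
    return m.group(1).replace('""', '"')


print(parse_csv_quoted('"Alo""ha"'))
print(repr(parse_csv_quoted('""')))
```

Unlike the nom parser above, this sketch rejects an input such as "A""" e" outright instead of returning the unparsed remainder.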