MediaWiki Semantic Template: Property "" (as page type) with input value contains invalid characters or is incomplete can cause unexpected results - mediawiki

Thanks to this answer I've created a template which mixes a visual representation with a semantic one that it populates.
I've used the template in a page but the last part doesn't work and returns the following message.
Property "" (as page type) with input value "psicologia|governo|politica|lavoro|+sep=|" contains invalid characters or is incomplete and therefore can cause unexpected results during a query or annotation process.
The template:
[{{{url}}} {{{title}}}] - {{{categories}}}
{{#subobject:
|url = {{{url}}}
|title = {{{title}}}
|#category={{{categories}}}
}}
The desired behaviour:
{{#subobject:
|url = https://www.instagram.com/p/CXeE2j-NT6s/
|title = Bullismo: proposta una legge in Francia per punirlo penalmente. Si rischia anche il carcere
|#category=bullismo|violenza|leggi|punizioni|+sep=|
}}

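You cannot use | as the separator: MediaWiki already treats | as the parameter delimiter, so it is never recognized as a value separator. Try a comma, a hyphen or another character that you generally don't use in MediaWiki markup, for example bullismo,violenza,leggi,punizioni followed by |+sep=,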


render Displacy Entity Recognition Visualization into Plotly Dash

I want to render a piece of Entity Recognition Visualization by Spacy into a Plotly Dash app.
The html of ER Visualization for rendering is as follows:
<div class="entities" style="line-height: 2.5">
<mark class="entities" style="background: ...>
<span>...</span>
</mark>
<mark class="entities" style="background: ...>
<span>...</span>
</mark>
</div>
I have tried parsing the HTML using BeautifulSoup, and converting the HTML to Dash by the following code. But when I run convert_html_to_dash(html_parsed), it is throwing KeyError: 'style'
html_parsed = bs.BeautifulSoup(html, 'html.parser')
def convert_html_to_dash(el, style = None):
    if type(el) == bs.element.NavigableString:
        return str(el)
    else:
        name = el.name
        style = extract_style(el) if style is None else style
        contents = [convert_html_to_dash(x) for x in el.contents]
        return getattr(html,name.title())(contents, style=style)
def extract_style(el):
    return {k.strip():v.strip() for k,v in [x.split(": ") for x in
        el.attrs["style"].split(";")]}
Not every tag has a style attribute. For tags that don't, you are attempting to access a non-existent key in the attrs dictionary. Python's response is a KeyError.
If you use get() instead, it will return a default value instead of raising a KeyError. You can specify a default value as the second argument to get():
return { k.strip() : v.strip() for k, v in
[ x.split(': ') for x in el.attrs.get('style', '').split(';') ]
}
Here I have chosen the empty string as the default value.
With only this change, your code still remains somewhat brittle. What if the input does not exactly match what you expect?
For one thing, there might not be a space after the colon. Changing split(': ') to split(':') will make it work even if there is no space – if there is one it will be removed anyway since you are calling strip() after splitting.
And what if after splitting on ';' you receive something other than a key-value pair in the list? It is best to check if it is a valid pair (contains exactly one colon), and skip it otherwise.
Your code becomes:
return { k.strip() : v.strip() for k, v in
[ x.split(':') for x in el.attrs.get('style', '').split(';')
if x.count(':') == 1 ] }
Note that I have opted for single-quotation marks. Your code uses both, but it is best to pick one and stick with it.
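Putting the pieces together, here is a minimal, self-contained sketch of the more defensive parser. It is only an illustration: unlike the original extract_style, it takes the attribute mapping directly (an assumption made so it can run without BeautifulSoup), and with a real tag you would pass el.attrs.

```python
def extract_style(attrs):
    """Parse an inline CSS 'style' attribute into a dict.

    attrs is a tag's attribute mapping (for a BeautifulSoup tag, el.attrs).
    A missing 'style' key yields an empty dict instead of a KeyError.
    """
    style = attrs.get('style', '')
    return {
        key.strip(): value.strip()
        for key, value in (
            decl.split(':')
            for decl in style.split(';')
            if decl.count(':') == 1  # skip anything that is not a single key:value pair
        )
    }


print(extract_style({'style': 'line-height: 2.5; background:#ddd;'}))
print(extract_style({'class': 'entities'}))
```

Declarations whose value itself contains a colon (a URL, for instance) are skipped by the count(':') == 1 check, which is the trade-off of this simple filter.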

How do I match a CSV-style quoted string in nom?

A CSV style quoted string, for the purposes of this question, is a string in which:
The string starts and ends with exactly one ".
Two double quotes inside the string are collapsed to one double quote. "Alo""ha"→Alo"ha.
"" on its own is an empty string.
Error inputs, such as "A""" e", cannot be parsed. It's an A", followed by junk e".
I've tried several things, none of which have worked fully.
The closest I've gotten, thanks to some help from user pinkieval in #nom on the Mozilla IRC:
use std::error as stderror; /* Avoids needing nightly to compile */
named!(csv_style_string<&str, String>, map_res!(
    terminated!(tag!("\""), not!(peek!(char!('"')))),
    csv_string_to_string
));
fn csv_string_to_string(s: &str) -> Result<String, Box<stderror::Error>> {
    Ok(s.to_string().replace("\"\"", "\""))
}
This does not catch the end of the string correctly.
I've also attempted to use the re_match! macro with r#""([^"]|"")*""#, but that always results in an Err::Incomplete(1).
I've determined that the given CSV example for Nom 1.0 doesn't work for a quoted CSV string as I'm describing it, but I do know implementations differ.
Here is one way of doing it:
use nom::types::CompleteStr;
use nom::*;
named!(csv_style_string<CompleteStr, String>,
    delimited!(
        char!('"'),
        map!(
            many0!(
                alt!(
                    // Eat a " delimiter and the " that follows it
                    tag!("\"\"") => { |_| '"' }
                    | // Normal character
                    none_of!("\"")
                )
            ),
            // Make a string from a vector of chars
            |v| v.iter().collect::<String>()
        ),
        char!('"')
    )
);
fn main() {
    println!(r#""Alo\"ha" = {:?}"#, csv_style_string(CompleteStr(r#""Alo""ha""#)));
    println!(r#""" = {:?}"#, csv_style_string(CompleteStr(r#""""#)));
    println!(r#"bad format: {:?}"#, csv_style_string(CompleteStr(r#""A""" e""#)));
}
(I wrote it in full nom, but a solution like yours, based on an external function instead of map!() each character, would work too, and may be more efficient.)
The magic here, that would also solve your regexp issue, is to use CompleteStr. This basically tells nom that nothing will come after that input (otherwise, nom assumes you're doing a streaming parser, so more input may follow).
This is needed because we need to know what to do with a " if it is the last character fed to nom. Depending on the character that comes after it (another ", a normal character, or EOF), we have to take a different decision -- hence the Incomplete result, meaning nom does not have enough input to make the decision. Telling nom that EOF comes next solves this indecision.
Further reading on Incomplete on nom's author's blog: http://unhandledexpression.com/general/2018/05/14/nom-4-0-faster-safer-simpler-parsers.html#dealing-with-incomplete-usage
You may note that this parser does not actually reject the invalid input: it parses the beginning and returns the rest. If you use this parser as a subparser in another parser, the latter would then feed the remainder to the next subparser, which would fail as well (because it would expect a comma), causing the overall parser to fail.
If you don't want that, you could make csv_style_string also require peek!(alt!(char!(',') | char!('\n') | eof!())) after the closing quote.
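For readers who want to see the decision logic without the nom macros, here is a rough Python sketch of the same idea. It is not nom code, and parse_csv_quoted is a hypothetical name. Like the parser above, it returns the parsed value together with the unconsumed remainder, so the caller decides whether leftover input is an error.

```python
def parse_csv_quoted(s):
    """Parse a CSV-style quoted string at the start of s.

    Returns (value, rest) on success, or None if s does not start with a
    quote or the closing quote is missing.
    """
    if not s.startswith('"'):
        return None
    out = []
    i = 1
    while i < len(s):
        if s[i] == '"':
            if s[i + 1:i + 2] == '"':
                # A doubled quote collapses to a single literal quote.
                out.append('"')
                i += 2
                continue
            # A lone quote closes the string; everything after it is the remainder.
            return ''.join(out), s[i + 1:]
        out.append(s[i])
        i += 1
    return None  # ran out of input before the closing quote


print(parse_csv_quoted('"Alo""ha"'))
print(parse_csv_quoted('""'))
print(parse_csv_quoted('"A""" e"'))
```

Because the whole input is available as one string, the "what follows this quote?" question always has an answer here, which is the same guarantee CompleteStr gives nom.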

How to pass a concatenated string as a parameter to function: Angular 2

I have the following code:
...
<tr ngFor= "let i = index">
<myCustomElement myName="{{'nameEdit'+i}}">
<button
<--This is where I get the "Got interpolation ({{}}) where expression was expected" error-->
(click)="myFunction(requestForm.controls.'nameEdit'{{i}}.value)">
</button>
</myCustomElement>
</tr>
...
My goal is to pass to myFunction the value of nameEdit for each element (so this will be nameEdit1, nameEdit2, nameEdit3 and so on). My existing code results in a "Got interpolation ({{}}) where expression was expected" error.
What's the proper way to pass my value to myFunction?
(click)="myFunction(requestForm.controls['nameEdit' + i].value)" should do the trick.
Since the quoted value of an event binding (...) is already evaluated as an expression, the {{ ... }} is unnecessary. You also need to use JavaScript bracket notation [...] to access a property whose name is built dynamically.
Lastly, this will obviously throw an error if controls doesn't have a key with the name you're building. It would be best practice to have myFunction(...) manage this case.
Working stackblitz example that outputs the values: https://stackblitz.com/edit/angular-whq8ll-od64hx?file=app/slider-overview-example.html

How to send MarkDown to API

I'm trying to send some Markdown text to a REST API. I have just figured out that line breaks are not accepted in JSON strings.
Example. How to send this to my api:
An h1 header
============
Paragraphs are separated by a blank line.
2nd paragraph. *Italic*, **bold**, and `monospace`. Itemized lists
look like:
* this one
* that one
* the other one
Note that --- not considering the asterisk --- the actual text
content starts at 4-columns in.
> Block quotes are
> written like so.
>
> They can span multiple paragraphs,
> if you like.
Use 3 dashes for an em-dash. Use 2 dashes for ranges (ex., "it's all
in chapters 12--14"). Three dots ... will be converted to an ellipsis.
Unicode is supported. ☺
as
{
    "body" : " (the markdown) "
}
As you're trying to send it to a REST API endpoint, I'll assume you're looking for a way to do it in JavaScript (since you didn't specify what tech you are using).
Rule of thumb: unless your goal is to write your own JSON builder, use one that already exists.
And, guess what, JavaScript ships with its own JSON tools! (see documentation here)
As shown in the documentation, you can use the JSON.stringify function to convert a value, such as an object or a string, into a JSON-compliant encoded string that can later be decoded on the server side.
This example illustrates how to do so:
var arr = {
    text: "This is some text"
};
var json_string = JSON.stringify(arr);
// Result is:
// '{"text":"This is some text"}'
// Now json_string contains a JSON-compliant encoded string.
You can also decode JSON client-side in JavaScript using the JSON.parse() method (see documentation):
var json_string = '{"text":"This is some text"}';
var arr = JSON.parse(json_string);
// Now arr contains an object whose value
// "This is some text" is accessible with the key "text"
If that doesn't answer your question, please edit it to make it more precise, especially on what tech you're using. I'll edit this answer accordingly
You need to replace the line endings with \n and then pass the result in your body key.
Also, make sure you escape double quotes (") as \", otherwise your body string will end at the first one.
# An h1 header\n============\n\nParagraphs are separated by a blank line.\n\n2nd paragraph. *Italic*, **bold**, and `monospace`. Itemized lists\nlook like:\n\n * this one\n * that one\n * the other one\n\nNote that --- not considering the asterisk --- the actual text\ncontent starts at 4-columns in.\n\n> Block quotes are\n> written like so.\n>\n> They can span multiple paragraphs,\n> if you like.\n\nUse 3 dashes for an em-dash. Use 2 dashes for ranges (ex., \"it's all\nin chapters 12--14\"). Three dots ... will be converted to an ellipsis.\nUnicode is supported.
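Both answers come down to the same escaping, which a JSON library does for you. The question does not name a language, so the following is only a sketch in Python using the standard json module; the body key comes from the question and the shortened Markdown sample is an assumption.

```python
import json

# A shortened sample containing real line breaks and double quotes.
markdown = (
    'An h1 header\n'
    '============\n'
    '\n'
    'Use 2 dashes for ranges (ex., "it\'s all\n'
    'in chapters 12--14").'
)

# json.dumps escapes line breaks as \n and double quotes as \" automatically.
payload = json.dumps({"body": markdown})
print(payload)

# Decoding the payload gives back the original text unchanged.
assert json.loads(payload)["body"] == markdown
```

The resulting payload is a single line of text, so it can be sent as the request body as-is.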

Escape quotes inside quoted fields when parsing CSV in Flink

In Flink, parsing a CSV file using readCsvFile raises an exception when encountering a field containing quotes like "Fazenda São José ""OB"" Airport":
org.apache.flink.api.common.io.ParseException: Line could not be parsed: '191,"SDOB","small_airport","Fazenda São José ""OB"" Airport",-21.425199508666992,-46.75429916381836,2585,"SA","BR","BR-SP","Tapiratiba","no","SDOB",,"SDOB",,,'
I've found in this mailing list thread and this JIRA issue that quotes inside a field should be escaped with the \ character, but I don't have control over the data to modify it. Is there a way to work around this?
I've also tried using ignoreInvalidLines() (which is the less preferable solution) but it gave me the following error:
08:49:05,737 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner - Assigning remote split to host localhost
08:49:05,765 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN DataSource (at main(Job.java:53) (org.apache.flink.api.java.io.TupleCsvInputFormat)) -> Map (Map at main(Job.java:54)) -> Combine(SUM(1), at main(Job.java:56) (2/8)
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.flink.api.common.io.GenericCsvInputFormat.skipFields(GenericCsvInputFormat.java:443)
at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:412)
at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:111)
at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:454)
at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:79)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:176)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Here is my code:
DataSet<Tuple2<String, Integer>> csvInput = env.readCsvFile("resources/airports.csv")
    .ignoreFirstLine()
    .ignoreInvalidLines()
    .parseQuotedStrings('"')
    .includeFields("100000001")
    .types(String.class, String.class)
    .map((Tuple2<String, String> value) -> new Tuple2<>(value.f1, 1))
    .groupBy(0)
    .sum(1);
If you cannot change the input data, then you should turn off parseQuotedStrings(). The parser will then simply look for the next field delimiter and return everything in between as a string (including the quotation marks). Then you can remove the leading and trailing quotation marks in a subsequent map operation. Note that this only works as long as no quoted field contains the field delimiter itself.
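The clean-up done in that map step can be sketched as follows. This is Python rather than Flink's Java API, purely to show the string handling; unquote_field is a hypothetical helper, and collapsing doubled quotes is an extra step that the answer above does not mention.

```python
def unquote_field(raw):
    """Strip one pair of surrounding quotes and collapse doubled quotes.

    Fields that are not wrapped in quotes are returned unchanged.
    """
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1].replace('""', '"')
    return raw


print(unquote_field('"Fazenda São José ""OB"" Airport"'))
print(unquote_field('2585'))
```

The same logic would go inside the function passed to map in the Flink job.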