Within Infobox at wikipedia some attributes values are also inside curly braces {{}}.. Some time they have lins also.. I need values inside the braces, which is displayed on wikipedia web page.
I read these are templates also.. Can anyone give me some link or guide me how do I deal with it?
Double-curly-braces {{}} define a call to some kind of magic word, variable, parser function, or template.. Help can be found on MediaWiki.org/.../Manual:Magic_words. The little lines that look like | are called pipes and are used to as separators that allow the wikicore parsing engine to define parameters that can be used with the magic word, variable, parser function, or template..
Hopefully this will help everyone who come across this very same issue.
Considering you will be parsing the infobox with PHP, you can use this:
http://www.mywiki.com/wiki/api.php?format=xml&action=query&titles=PAGE_TITLE_THAT_CONTAINS_AN_INFOBOX&prop=revisions&rvprop=content&rvgeneratexml=1
'rvgeneratexml' is being set to true (1), this will make the xml node <rev> generate an attribute "parsetree" containing the infobox information in XML format.
Then, in PHP, you can load the whole information (<api>everything including <rev></api>) with simpleXML:
$xml = simplexml_load_file($url);
Then you can load the template's information by getting the "parsetree" attribute and loading the string with:
$template = simplexml_load_string($xml->query->pages->page->revisions->rev->attributes()->parsetree);
$template = $template->template; // If more than 1 template, check template[0], [1], etc
Then, by using the correct structure, you can access the elements with something like:
if ($template->part[0]->name='name')
$film = $template->part[0]->value;
Then, $film will contain the film's name (->name is the parameter's name, and ->value is its value).
Related
In my AEM project, we have client-side dynamic variable functionality which checks for any strings that are formed inside of a ${ } wrapper. The dynamic variable values are coming from our cookies. Replacing this with a more friendly format that does not conflict with Sightly is not an option at the moment, so please don't tell me to do that :)
When creating an anchor tag in the source editor of the Text core component, I am setting the href as the following: href="/content/en/opt-in.html?hash=${/profile/hash}". The anti-Samy configuration is blocking the href attribute from being rendered on this element, but I have tried to add the following to the overlayed file /apps/cq/xssprotection/config.xml:
<regexp name="expressionURLWithSpecialCharacters" value="(\$\{(\w|\/|:)+\})"/>
<regexp-list>
<regexp name="onsiteURL"/>
<regexp name="offsiteURL"/>
<regexp name="expressionURL"/>
<regexp name="expressionURLWithSpecialCharacters"/>
</regexp-list>
^ inside of the <attribute name="href"> block of common-attributes. Is there something else I need to do in order to make this not be filtered out so that it can be correctly parsed by the global variable replacement? Thanks!
There are two issues here:
The RTE will encode your URL and turn hash=${/profile/hash} into hash=$%7B/profile/hash%7D when storing into JCR
Even if you pass 1, the expression you are trying to use will only match EXACTLY the URL of ${/profile/hash}. You would need to expand the expression to include everything else (scheme, domain/host, path, query etc.). Think onsiteURL and offsiteURL but allowing your expression as well in query parameters. Have a look at https://github.com/apache/sling-org-apache-sling-xss/blob/master/src/main/java/org/apache/sling/xss/impl/XSSFilterImpl.java#L115 to get a starting point.
Have you tried adding disableXSSFiltering="{Boolean}true”?
Vlad, your second point was helpful in that I hadn't considered that one of the regular expressions in the XSS Protection configuration href attribute block needed to match the ${/profile/hash} in addition to the rest of the URL preceding and following it. Although to your first point, the RTE actually did save the special characters as-is into the JCR and did not encode them, probably since I was using the source editor mode and not the inline text editor.
What I ended up doing was creating a new regular expression as follows:
<regexp name="onsiteURLWithVariableExpression"
value="(?!\s*javascript(?::|:))(?:(?://(?:(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])|(?:%\p{XDigit}\p{XDigit})|(?:[!$&'()*+,;=]))*#)?(?:\[(?:(?:(?:\p{XDigit}{1,4}:){6}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:::(?:\p{XDigit}{1,4}:){5}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:\p{XDigit}{1,4}){0,1}::(?:\p{XDigit}{1,4}:){4}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,1}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){3}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,2}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){2}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,3}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}:){1}(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,4}\p{XDigit}{1,4})?::(?:(?:\p{XDigit}{1,4}:\p{XDigit}{1,4})|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])))|(?:(?:(?:\p{XDigit}{1,4}:){0,5}\p{XDigit}{1,4})?::(?:\p{XDigit}{1,4}))|(?:(?:(?:\p{XDigit}{1,4}:){0,6}\p{XDigit}{1,4})?::))]|(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])\.(?:\p{N}|[\x31-\x39]\p{N}|1\p{N}{2}|2[\x30-\x34]\p{N}|25[\x30-\x35])|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~])*|(?:%\p{XDigit}\p{XDigit})*|(?:[!$&'()*+,;=])*))(?::\p{Digit}+)?(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#)+/?)*))|(?:/(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#)+/?)*))?)|(?:(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#)+(?:/|(/(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#)+)*)))?(?:\?(?:(?:\p{L}\p{M}*)|(\$\{(\w|\/|:)+\})|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#|/|\?)*)?(?:#(?:(?:\p{L}\p{M}*)|[\p{N}-._~]|%\p{XDigit}\p{XDigit}|[!$&'()*+,;=]|:|#|/|\?)*)?"/>
which is just the onsiteURL with my original expressionURLWithSpecialCharacters: (\$\{(\w|\/|:)+\}) value added as a group in the query string parameter section. This enabled AEM to accept this as an href value in my anchor tag.
I appreciate everyone's help!
I am creating a MediaWiki plug-in that lists many files. For each file, I want to print a [Talk] or [Discuss] link. (It seems that the original name was talk but that it was renamed to discuss.) These links should be red if the page does not exist and blue if it does exist.
There should be a way to add such links in OutputPage.php, but I can't figure it out.
I know about these functions "foo":
$page = WikiPage::factory ( $title )
$talk = $title->getTalkPage()
But I'm not sure how to get $title from foo.
I'm also not sure how to change $talk into the appropriate HTML. I'd rather not add it to the output stream, because I'm building a lot of HTML separately, but I suppose I can refactor so that instead of passing my strings around, I pass around a handle to the output.
Why don't you use OutputPage::addWikiText() to add the appropriate link without worrying about the technical details: [[{{ns:11}}:Foo|Text]] for example.
Alternatively you can get $title from OutputPage::getTitle() for the current page, or from Title::newFromText() for any title you want to use. You can get $talk directly by specifying the correct namespace constant, which might be even easier than the trip via a WikiPage object.
Correct styling for the link can be done with the helper methods Title::exists() and one of the appropriate helpers for generating urls for pages.
See also https://doc.wikimedia.org/mediawiki-core/master/php/classTitle.html
I'm looking for some guidance on a web scraping script i'm working on.
All is going well but I'm stuck on stripping out the image file data.
I'm currently doing a WebRequest, getting elements by class, selecting outerHTML, but need to strip out just the contents of attribute data-imagezoom as per this example.
Sample data:
<a class="aaImg" href="https://imagehost.ssl.server123.com/Product-800x800/image.jpg">
<img class="aaTmb" alt="Matrix 900 x 900 test" src="https://imagehost.ssl.server123.com/Product-190x190/image.jpg" item="image"
data-imagezoom="https://imagehost.ssl.server123.com/Product-1600x1600/image.jpg" data-thumbnail="https://imagehost.ssl.server123.com/Product-190x190/image.jpg">
</img>
</a>
Current code to get that data:
$ProductInfo = Invoke-WebRequest -Uri $ProductURL
$ProductImageRaw = $ProductInfo.ParsedHTML.body.getElementsByClassName("aaImg") |
Select outerHTML
I can obviously get the first image by selecting the href attribute easily.
I was 'dirty coding' by replacing 800x800 with 1600x1600 as the filenames are the same, just a different path, but that came unstuck pretty quick when there were inconsistencies in path names.
You need to access the outer <a> element's <img> child element and call its .getAttribute() method to get the attribute value of interest:
$ProductInfo.ParsedHTML.body.getElementsByClassName("aaImg").
childnodes[0].getAttribute('data-imagezoom')
.childnodes[0] returns the first child node (element)
.getAttributes('data-imagezoom') returns the value of the data-imagezoom attribute.[1]
This should return string https://imagehost.ssl.server123.com/Product-1600x1600/image.jpg.
As for your own answer:
Using regexes (or substring search) to parse structured data such as HTML and XML is brittle and best avoided.
For instance, if the source HTML changes to use '...' instead of "..." around attribute values, your solution breaks (this particular case is not hard to account for in a regex, but there are many more ways in which such parsing can go wrong).
Cross-platform perspective:
Regrettably, the .ParsedHTML property with its HTML DOM is only available in Windows PowerShell (and its COM implementation is cumbersome and slow to work with in PowerShell).
PowerShell Core, even on Windows, doesn't support it, and there's no in-box HTML parser available (as of PowerShell Core 6.2.0).
The HtmlAgilityPack NuGet package is a popular open-source HTML parser, but it is aimed at C# and therefore nontrivial to install and use in PowerShell.
That said, this answer by TheIncorrigible1 has a working example that downloads the required assembly on demand.
[1] Note that .getAttribute() is necessary to access custom attributes, whereas standard attributes such as id and, in the case of <a> elements, href, are represented directly as object properties (e.g., .id; note that .getAttribute() works with standard attributes too.)
So, after a quick crash course in some Regex, this is what I've come up with.
(?<=data-imagezoom=").*?(?="\s)
A positive lookbehind, select all until the closing quotes and whitespace.
Thanks all.
I am working on an assignment where I must use an HTML font tag to change the font color of an perl output statement. I've followed the direction in the assignment, and this is what the code looks like:
print $FILE2 "<font color = 'red'> var1 </font>";
$FILE2 is the filehandle for the file I am writing to. Var1, Var1 is a test variables.
However, there is no color change and the syntax above is printed exactly like that to the screen. I'm not really sure what I am doing, wrong, can somebody please help me?
I have not included any headers other than strict and warnings.
Variables in Perl are always indicated by sigils. In this case you are dealing with scalars (single value container), and the sigil for a scalar variable is $. This sigil is also what allows it to be interpolated (replaced with its value) in a double-quoted string like you have. See perldata for more info.
As a side note, interpolating variables directly into HTML can be dangerous because they can contain HTML tags. If you want to avoid the possibility of the input adding tags of its own, pass the input through an HTML-escaper like encode_entities from HTML::Entities or xml_escape from Mojo::Util first. The usual way to construct HTML like this is with templates, and templating modules often provide such functionality built in.
I am using:
var template = HtmlService.createTemplateFromFile("MyForm.html");
Which contains a variable:
<?= defaultInnerHtml ?>
And I am trying to assign THIS variable with the content of another HTML file:
template.defaultInnerHtml
= HtmlService.createHtmlOutputFromFile(
"MyInnerHtml.html").getContent();
But the contents of MyInnerHtml.html become encoded: the angle brackets are replaced with entities etc. I am trying just to stuff some more Html inside the MyForm.html file before serving that back.
I can't see how to get just the raw contents to show up in place of the variable in the template.
Any help?
To disable this contextual escaping of special characters, change your template so it embeds the value of the variable like this:
<?!= defaultInnerHtml ?>
Note the exclamation mark. This specifies force-printing, which directs the template engine to embed the contents of the variable verbatim.
From Google's documentation:
Contextual escaping is important if your script allows untrusted user input. By contrast, you’ll need to force-print if your scriptlet’s output intentionally contains HTML or scripts that you want to insert exactly as specified.