How to incorporate HTML tags for styling certain parts of localized strings (Polymer) - html

The web application I am working on uses resource strings for localization. The issue I am having is with styling certain parts of these strings. Let's say I want to display this string:
user1234 created a new document.
So in the resource file it would be localized like so:
{username} created a new document.
The issue is I also need <b></b> tags around {username}. I can't put these tags in the html file because I need it to apply just to the username, not to the whole localized string. So unless I split up the string into two localized strings (which I should definitely not do, because other languages do not necessarily have the same sentence structure), I have to put these html tags in the localized string itself:
<b>{username}</b> created a new document.
Even if we disregard best practices for a moment (which I have only read about briefly) and go with this, this solution isn't working for me. I believe this is because the application is using Polymer (the same approach seems to work with Angular). So if we stick to the following two requirements:
Use Polymer
Have the whole string together as one resource string
then there doesn't seem to be a way to style certain parts of the string. Does anyone know a solution?

I got it to work by setting the resource string as the inner HTML of the element that contains the string. So let's say the div containing the text has id="textElem"; in the JavaScript I set the inner HTML like so:
this.$.textElem.innerHTML = this.localize('user_created_document', 'username', this.username)
I suppose I should have specified in the question that my previous attempts at setting the string were just (a) binding the string to a property of an object and referencing that in the HTML, and (b) localizing the string directly in the HTML, neither of which worked.

Related

Extracting string from html web scrape

I'm looking for some guidance on a web scraping script I'm working on.
All is going well but I'm stuck on stripping out the image file data.
I'm currently doing a WebRequest, getting elements by class, selecting outerHTML, but need to strip out just the contents of attribute data-imagezoom as per this example.
Sample data:
<a class="aaImg" href="https://imagehost.ssl.server123.com/Product-800x800/image.jpg">
<img class="aaTmb" alt="Matrix 900 x 900 test" src="https://imagehost.ssl.server123.com/Product-190x190/image.jpg" item="image"
data-imagezoom="https://imagehost.ssl.server123.com/Product-1600x1600/image.jpg" data-thumbnail="https://imagehost.ssl.server123.com/Product-190x190/image.jpg">
</img>
</a>
Current code to get that data:
$ProductInfo = Invoke-WebRequest -Uri $ProductURL
$ProductImageRaw = $ProductInfo.ParsedHTML.body.getElementsByClassName("aaImg") |
Select outerHTML
I can obviously get the first image by selecting the href attribute easily.
I was 'dirty coding' by replacing 800x800 with 1600x1600 as the filenames are the same, just a different path, but that came unstuck pretty quick when there were inconsistencies in path names.
You need to access the outer <a> element's <img> child element and call its .getAttribute() method to get the attribute value of interest:
$ProductInfo.ParsedHTML.body.getElementsByClassName("aaImg").
childnodes[0].getAttribute('data-imagezoom')
.childnodes[0] returns the first child node (element)
.getAttribute('data-imagezoom') returns the value of the data-imagezoom attribute.[1]
This should return string https://imagehost.ssl.server123.com/Product-1600x1600/image.jpg.
As for your own answer:
Using regexes (or substring search) to parse structured data such as HTML and XML is brittle and best avoided.
For instance, if the source HTML changes to use '...' instead of "..." around attribute values, your solution breaks (this particular case is not hard to account for in a regex, but there are many more ways in which such parsing can go wrong).
Cross-platform perspective:
Regrettably, the .ParsedHTML property with its HTML DOM is only available in Windows PowerShell (and its COM implementation is cumbersome and slow to work with in PowerShell).
PowerShell Core, even on Windows, doesn't support it, and there's no in-box HTML parser available (as of PowerShell Core 6.2.0).
The HtmlAgilityPack NuGet package is a popular open-source HTML parser, but it is aimed at C# and therefore nontrivial to install and use in PowerShell.
That said, this answer by TheIncorrigible1 has a working example that downloads the required assembly on demand.
[1] Note that .getAttribute() is necessary to access custom attributes, whereas standard attributes such as id and, in the case of <a> elements, href, are represented directly as object properties (e.g., .id; note that .getAttribute() works with standard attributes too.)
So, after a quick crash course in some Regex, this is what I've come up with.
(?<=data-imagezoom=").*?(?="\s)
A positive lookbehind, select all until the closing quotes and whitespace.
Thanks all.
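For anyone curious how this pattern behaves, here is a minimal sketch in Java rather than the PowerShell used above. The class and method names are made up for illustration, and the markup is a shortened copy of the sample data. It shows the pattern matching the double-quoted sample and finding nothing once the same markup uses single quotes, which is the brittleness the other answer warns about.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageZoomRegexDemo {
    // Same pattern as above: lookbehind for data-imagezoom=", lazy match,
    // lookahead for a closing double quote followed by whitespace.
    private static final Pattern IMAGE_ZOOM =
        Pattern.compile("(?<=data-imagezoom=\").*?(?=\"\\s)");

    // Returns the first match, or an empty Optional if the pattern does not match.
    static Optional<String> extract(String html) {
        Matcher m = IMAGE_ZOOM.matcher(html);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String doubleQuoted =
            "<img class=\"aaTmb\" "
          + "data-imagezoom=\"https://imagehost.ssl.server123.com/Product-1600x1600/image.jpg\" "
          + "data-thumbnail=\"https://imagehost.ssl.server123.com/Product-190x190/image.jpg\">";
        // Same markup, but with single quotes around the attribute values.
        String singleQuoted = doubleQuoted.replace('"', '\'');

        System.out.println(extract(doubleQuoted)); // finds the 1600x1600 URL
        System.out.println(extract(singleQuoted)); // finds nothing
    }
}
```

The pattern also requires whitespace after the closing quote, so it would miss the attribute if it were the last one before the closing angle bracket.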

Internationalizing Very Long Texts With GWT (preferably including HTML)

I'm developing an application using GWT (first-timer) and am now at the stage where I want to establish a central structure to provide actual text-based content to my views.
Even though it's obviously possible to define those text-values inline (using UiBinder or calling the appropriate methods on the corresponding objects), I'd be much more comfortable storing them in a central place as is possible using GWT's Constants. Indeed, my application will only be available in one language (for now), so all-the-way i18n may seem overkill, but I'm assuming that those facilities might be best-suited for what I require, seeing how they, too, must have been designed with providing all (constant) text content in mind.
However, my application features several passages of text that are somewhat longer and more complex than your average label text, meaning they could span several lines and might require basic text formatting. I have come up with several ideas on how to fix those issues, but I'm far from satisfied.
First problem: Lengthy string values.
import com.google.gwt.i18n.client.Constants;
public interface AppConstants extends Constants {
@Constants.DefaultStringValue("User Administration")
String userAdministrationTitle();
// ...
}
The sample above contains a very simple string value, defined in the manner that static string internationalization dictates (as far as I know). To add support for another language, say, German, one would provide a .properties file containing the translation:
userAdministrationTitle = Benutzeradministration
Now, one could easily abuse this pattern to a point and never provide a DefaultStringValue, leaving an empty string instead. Then, one could create a .properties file for the default language and add text like one would with a translation. Even then, however, it is (to my knowledge) not possible to apply line-breaks for long values simply to keep the file somewhat well-formatted, like this:
aVeryLongText = This is a really long text that describes some features of the
application in enough detail to allow the user to act on a basis
of information rather than guesswork.
Second problem: Formatting parts of the text.
Since the values are plain strings, there isn't much room for formatting there. Instinctively, I would do the same thing as I would if I were writing the text straight into the regular HTML document and add HTML-tags like <strong> or <em>.
Further down the road, at the point where the strings are read and applied to the widget that's going to display them, there is a problem though: setting the value using a method like setText(String) causes that string to be escaped and the HTML-tags to be printed alongside the rest of the text rather than to be interpreted as formatting instructions. So no luck.
A way to solve this would be to dissect the string provided by the i18n file and isolate any HTML tags, then bake the mess together again using a SafeHtmlBuilder and use that to set the value of the widget, which would indeed result in formatted text being displayed. That sounds like quite an overcomplication though, so I don't much like that idea.
So what am I looking for now, dear user who actually read this all the way through (thanks!)? I'm looking for solutions that don't require hacks like the ones described above and provide the functionality that I'm looking for. Alternatively, I welcome any guidance if I'm on the wrong path entirely (GWT first-timer, as I mentioned one eternity ago :-) ). Or basically anything that's on topic and might help find a solution. An acceptable solution, for example, would be a system like the string value files used in Android development (which allows for HTML-styling the texts but obviously requires the containing UI elements to accept that).
Fortunately, there is a standard solution that you can use. First, you need to create a ClientBundle:
public interface HelpResources extends ClientBundle {
public static final HelpResources INSTANCE = GWT.create(HelpResources.class);
@Source("account.html")
public ExternalTextResource account();
@Source("organization.html")
public ExternalTextResource organization();
}
You need to put this bundle into its own package. Then you add HTML files to the same package - one for each language:
account.html
account_es.html
organization.html
organization_es.html
Now, when you need to use it, you do:
private HelpResources help = GWT.create(HelpResources.class);
...
try {
    help.account().getText(new ResourceCallback<TextResource>() {
        @Override
        public void onError(ResourceException e) {
            // show error message
        }
        @Override
        public void onSuccess(TextResource r) {
            String text = r.getText();
            // Pass this text to HTML widget
        }
    }); // closes the anonymous callback and the getText(...) call
} catch (ResourceException e) {
    e.printStackTrace();
}
You need to use an HTML widget to display this text if it contains HTML tags.
If you're using UiBinder, i18n support is built-in. Otherwise, use Messages and Constants and use the value with setHTML rather than setText.
For long lines, you should be able to use multiline values in properties files by ending lines with a backslash.
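As a rough illustration of the backslash continuation, the sketch below feeds the long value from the question through the JDK's java.util.Properties parser. This is only a stand-in: GWT reads its .properties files at compile time, and the assumption here is that it follows the same continuation rule. The class name and the load helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class MultilinePropertyDemo {
    // Contents of a hypothetical .properties file. Each "\\\n" below is a single
    // backslash at the end of a line in the real file. The space before the
    // backslash is kept; leading whitespace on the continuation line is dropped.
    static final String FILE_CONTENTS =
        "aVeryLongText = This is a really long text that describes some features of the \\\n"
      + "    application in enough detail to allow the user to act on a basis \\\n"
      + "    of information rather than guesswork.\n";

    // Parses the file contents and returns the value stored under the given key.
    static String load(String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(FILE_CONTENTS));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Prints the three physical lines joined into one logical value.
        System.out.println(load("aVeryLongText"));
    }
}
```

Putting the separating space before the backslash matters, because indentation on the continuation line is not part of the value.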

Adding metadata to markdown text

I'm working on software creating annotations and would like my main data structure to be based around markdown.
I was thinking of working with an existing markdown editor, but hacking it so that certain tags, i.e. [annotation-id-001]Sample text.[/annotation-id-001] did not show up as rendered HTML; the above would output Sample text. in an HTML preview and link to a separate annotation with the ID 001.
My question is, is this the most efficient way to represent this kind of metadata inside of a markdown document? Also, if a user wants to legitimately use something like "[annotation-id-001]" as text inside of their document, I assume that I would have to make that string syntax illegal, correct?
I don't know which Markdown parser you use, but you can approach your problem from different angles:
first, you can "hack" an existing parser to exclude your annotation tags from "classic" parsing and include them only in a certain mode
you can also use the internal "meta-data" information offered by certain parsers (like MultiMarkdown or MarkdownExtended) and write your annotations only as meta-data, with a reference to their final place in the content
or, as mentioned by mb21, you can use simple link notation like [Sample text.](#annotation-id-001) or use footnotes like [Sample text.](^annotation-id-001) and put your annotations as footnotes.

Getting all image tags and changing src in Java

I have created a method that for a given url converts the html into a string. With this string in memory, I would like to find all img tags with a certain data-XXX attribute, extract their src attribute and then change it.
What would be the cleanest way to do this? I have tried XPathReader but it crashes when it finds comments in code... Any other XML parser that would allow me to query for certain attributes without having to go through all tags myself?
Also I've read about regexp but it does not feel right somehow.

Mapping plain text back into HTML document

Situation: I have a group of strings that represent Named Entities that were extracted from something that used to be an HTML doc. I also have both the original HTML doc, the stripped-of-all-markup plain text that was fed to the NER engine, and the offset/length of the strings in the stripped file.
I need to annotate the original HTML doc with highlighted instances of the NEs. To do that I need to do the following:
Find the start / end points of the NE strings in the HTML doc. Something that resulted in a DOM Range Object would probably be ideal.
Given that Range object, apply a styling (probably using something like <span class="ne-person" data-ne="123">...</span>) to the range. This is tricky because there is no guarantee that the range won't include multiple DOM elements (<a>, <strong>, etc.) and the span needs to start/stop correctly within each containing element so I don't end up with totally bogus HTML.
Any solutions (full or partial) are welcome. The back-end is mostly Python/Django, and the front-end is using jQuery. We would rather do this on the back-end, but I'm open to anything.
(I was a bit iffy on how to tag this question, so feel free to re-tag it.)
Use a range utility method plus an annotation library such as one of the following:
artisan.js
annotator.js
vie.js
The free software Rangy JavaScript library is your friend. Regarding your two tasks:
Find the start / end points of the […] strings in the HTML doc. You can use Range#findText() from the TextRange extension. It indeed results in a DOM Level 2 Range compatible object [source].
Given that Range object, apply a styling […] to the range. This can be handled with the Rangy Highlighter module. If necessary, it will use multiple DOM elements for the highlighting to keep up a DOM tree structure.
Discussion: Rangy is a cross-browser implementation of the DOM Level 2 range utility methods proposed by @Paul Sweatte. Using an annotation library would be a further extension of range library functionality; for example, Rangy will be the basis of Annotator 2.0 [source]. It's just not required in your case, since you only want to render highlights, not allow users to add them.