Translation prevention mechanism does not work - microsoft-translator

As an example, I'd like to translate "Overview of Azure Machines" to "de", where "Azure Machines" shouldn't be translated, but there are several issues with it.
This is the documentation I am referring to.
Escaping single words via @ or # works.
Not translating a sentence or parts of a sentence does not:
Input (plain): Overview of <span class="notranslate">Azure Machines</span>
Output : Übersicht über die Option "span class="notranslate">Azure Machines</span>
Issue : Opening angle bracket removed
Input (HTML) : Overview of <span class="notranslate">Azure Machines</span>
Output : --- This is the only combination that works but not documented. ---
Issue : --- This is the only combination that works but not documented. ---
Input (plain): Overview of <div class="notranslate">Azure Machines</div>
Output : Übersicht über die Datei »div class="notranslate">Azure Machines</div>
Issue : Opening angle bracket replaced
Input (HTML) : Overview of <div class="notranslate">Azure Machines</div>
Output : Übersicht über<div class="notranslate">Azure Machines</div>
Issue : Space before opening angle bracket removed
NOTE: The docs have been updated and now say that the class="notranslate" scenario works for HTML only (so the Input (plain) examples are now out of scope here).
The following questions remain for me though:
Why is the space being removed in the last example? This basically makes it a blocker for us.
Is class="notranslate" working on every HTML tag? For example if I have a table <table class="notranslate">...</table>. If so, this should be reflected in the documentation accordingly.
I suppose the dynamic dictionary syntax (<mstrans:dictionary translation="translation of phrase">phrase</mstrans:dictionary>) is supported in textType: plain mode, why is this not the case for <something class="notranslate">phrase</something>?
For those "no-translate" scenarios I would not want to use the dictionary syntax because it would lead to a drastically increased character count and hence, increased costs.
And just a side note which has nothing to do with the question: Naming it "Twitter tag" is really weird. It has nothing to do with Twitter. These are simply two possible escaping prefix characters.

The documentation clearly states this only works in html mode. You need to use the no-translate tag to not translate part of the sentence.
Only div and span are supported, not all html tags. Also you should not repeat the content you want to be not translated outside of the no-translate tag.
For your example, span preserves the space.
Overview of <span class="notranslate">Azure Machines</span>
Übersicht über <span class=\"notranslate\">Azure Machines</span>
It's by design that this works only when the input textType is set to HTML.
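A minimal sketch of what such a request could look like, assuming the public Translator v3 REST endpoint. The helper names wrapNoTranslate and buildTranslateRequest and the placeholder key are invented for illustration, and the request is only built here, not sent:

```javascript
// Sketch only: wrapNoTranslate and buildTranslateRequest are hypothetical helpers.
// The endpoint and header names are those of the Translator v3 REST API.

// Wrap every occurrence of a protected term in a notranslate span.
function wrapNoTranslate(text, term) {
  return text.split(term).join('<span class="notranslate">' + term + '</span>');
}

// Build (but do not send) a translate request that uses textType=html.
function buildTranslateRequest(text, to, subscriptionKey) {
  const params = new URLSearchParams({ 'api-version': '3.0', to: to, textType: 'html' });
  return {
    url: 'https://api.cognitive.microsofttranslator.com/translate?' + params.toString(),
    options: {
      method: 'POST',
      headers: {
        'Ocp-Apim-Subscription-Key': subscriptionKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify([{ Text: text }])
    }
  };
}

const input = wrapNoTranslate('Overview of Azure Machines', 'Azure Machines');
const request = buildTranslateRequest(input, 'de', 'YOUR_KEY'); // placeholder key
// fetch(request.url, request.options) would send it.
console.log(input);
```

Depending on the resource type, an additional Ocp-Apim-Subscription-Region header may be required; that is left out of the sketch.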

Related

Why do some strings contain "&nbsp;" and some " ", when my input is the same (" ")?

My problem occurs when I try to use some data/strings in a p-element.
I start off with data like this:
data: function() {
  return {
    reportText: {
      text1: "This is some subject text",
      text2: "This is the conclusion",
    }
  }
}
I use this data as follows in my (vue-)html:
<p> {{ reportText.text1 }} </p>
<p> {{ reportText.text2 }} </p>
In my browser, when I inspect my elements I get to see the following results:
<p>This is some subject text</p>
<p>This is the conclusion</p>
As you can see, there is suddenly a difference: one p element uses &nbsp; and the other uses regular spaces, even though both strings started out with regular spaces only. I know &nbsp; and a regular space look the same, but the problem with the &nbsp; string is that it gets treated as one large word instead of multiple separate words. This breaks my layout, and I can't solve it with CSS properties (word-wrap etc.).
Other things I have tried:
Tried sanitizing the strings by using .replace("&nbsp;", " "), but that doesn't do anything. I assume this is because it basically is the same, so there is nothing to really replace. Same reason why I have to use a code block on Stack Overflow to make the distinction between &nbsp; and a regular space.
Logged the data from Vue to see if there is any noticeable difference, but I can't see any. If I log the data/reportText, I again only see strings with regular spaces.
So I have the following questions:
Why does this happen? I can't find any logical explanation for why it sometimes uses &nbsp; and sometimes regular spaces. It seems random, but I am sure I am missing something.
Are there any other things I could try to follow the path my string takes, so I can see where the transformation from a regular space to &nbsp; happens?
Per the comments, the solution devised ended up being a simple unicode character replacement targeting the \u00A0 unicode code point (i.e. replacing unicode non-breaking spaces with ordinary spaces):
str.replace(/\u00A0/g, ' ')
Explanation:
JavaScript typically allows the use of unicode characters in two ways: you can input the rendered character directly, or you can use a unicode code point (i.e. in the case of JavaScript, a hexadecimal code prefixed with \u like \u00A0). It has no concept of an HTML entity (i.e. a character sequence between a & and ; like &nbsp;).
The inspector tool for some browsers, however, utilizes the HTML concept of the HTML entity and will often display unicode characters using their corresponding HTML entities where applicable. If you check the same source code in Chrome's inspector vs. Firefox's inspector (as of writing this answer, anyway), you will see that Chrome uses HTML entities while Firefox uses the rendered character result. While it's a handy feature to be able to see non-printable unicode characters in the inspector, Chrome's use of HTML entities is only a convenience feature, not a reflection of the actual contents of your source code.
With that in mind, we can infer that your source code contains unicode characters in their fully rendered form. Regardless of the form of your unicode character, the fix is identical: you need to target these unicode space characters explicitly and replace them with ordinary spaces.
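A small sketch of how this could be checked and fixed in plain JavaScript. The sample string and the helper names whitespaceCodePoints and normalizeSpaces are made up for illustration; the non-breaking spaces are written as \u00A0 escapes so they stay visible:

```javascript
// Sample string containing non-breaking spaces (U+00A0) instead of ordinary spaces.
const suspicious = 'This\u00A0is\u00A0some\u00A0subject\u00A0text';

// List the code point of every whitespace character to see what is really in the string.
function whitespaceCodePoints(str) {
  return Array.from(str)
    .filter((ch) => /\s/.test(ch))
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Replace non-breaking spaces with ordinary spaces.
function normalizeSpaces(str) {
  return str.replace(/\u00A0/g, ' ');
}

console.log(whitespaceCodePoints(suspicious));
console.log(normalizeSpaces(suspicious) === 'This is some subject text');
```

Logging code points like this sidesteps the inspector's display choices, since U+00A0 and U+0020 look identical when printed.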

Style guide for documentation in HTML urges to use spaces in <code>...</code>

In the style guide for the bulky HTML documentation of an existing system, which I have to maintain for a client, I found that text given in a code tag should be enclosed in spaces, like:
..., the element<code> STATE </code>matches datatype ...
In most cases the whole text is enclosed in <p> tags:
<p>..., the element<code> STATE </code>matches datatype ...</p>
Does anyone have an idea why I should write <code> STATE </code> with no space before and after the tags?
One explanation could be that rendering the HTML leads to "better" (i.e. same / bigger width, ...) constant spaces between normal text and the code (the space inside the code tag seems to be "bigger"). Is that approach meaningful? Or are there arguments against this rule that I could use to convince the program director to drop it?
This sounds like a way of enforcing a style without, for whatever reason, using CSS.
There's no reason to do this other than to conform to somebody's preference (your boss or a client, presumably, in this case).
To back this up, the HTML specification itself uses examples of <code> elements wrapped within <p> elements which do not follow this format:
Example 104
The following example shows how the element can be used in a paragraph to mark up element names and computer code, including punctuation.
<p>The <code>code</code> element represents a fragment of computer code.</p>
— Example 104 within the HTML5.1 specification

RegExp to search text inside HTML tags

I'm having some difficulty using a RegExp to search for text between HTML tags. This is for a search function that searches the text of an HTML page without matching characters inside the tags or attributes. When a match is found, I surround it with a div and assign it a highlight class to highlight the search words in the page. If the RegExp also matches inside tags or attributes, the HTML becomes corrupt.
Here is the HTML code:
<html>
<span>assigned</span>
<span>Assigned > to</span>
<span>assigned > to</span>
<div>ticket assigned to</div>
<div id="assigned" class="assignedClass">Ticket being assigned to</div>
</html>
and the current RegExp I've come up with is:
/(?<=(>))assigned(?!\<)(?!>)/gi
which matches if assigned or Assigned is the start of text in a tag, but not on the others. It does a good job of ignoring the attributes and tags but it is not working well if the text does not start with the search string.
Can anyone help me out here? I've been working on this for an hour now but can't find a solution (RegExp noob here..)
UPDATE 2
https://regex101.com/r/ZwXr4Y/1 shows the remaining problem regarding HTML entities and HTML comments.
The remaining problem when searching is that &nbsp; is not ignored; all text inside HTML entities and comments should be ignored. So a search for "b" should not match inside the entity, even if the HTML entity is correctly placed between HTML tags.
Update #2
Regex:
(<)(script[^>]*>[^<]*(?:<(?!\/script>)[^<]*)*<\/script>|\/?\b[^<>]+>|!(?:--\s*(?:(?:\[if\s*!IE]>\s*-->)?[^-]*(?:-(?!->)-*[^-]*)*)--|\[CDATA[^\]]*(?:](?!]>)[^\]]*)*]])>)|(e)
Usage:
html.replace(/.../g, function(match, p1, p2, p3) {
return p3 ? "<div class=\"highlight\">" + p3 + "</div>" : match;
})
Explanation:
As you went through more situations, I had to modify the RegEx to cover more cases. This version covers almost all of them. How it works:
Captures all <script> tags and their contents
Captures all CDATA blocks
Captures all HTML tags (opening / closing)
Captures all HTML comments (as well as IE conditional comments)
Captures all targeted strings defined in the last group inside the remaining text (here it is (e))
Doing so lets us manipulate only the target, e.g. wrap it in tags as shown in the usage section. Performance-wise, I tried to write it so that it performs well.
This RegEx doesn't provide a 100% guarantee to match correct positions (closer to 99%), but it should give the expected results most of the time and can easily be modified later.
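To make the idea easier to follow, here is a deliberately simplified sketch of the same "match what must be skipped first, capture the target last" technique. It is not the full expression above: it only skips plain tags and entities, ignores script, CDATA and comment handling, and the names escapeRegExp and highlight are made up for illustration:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Alternatives are tried left to right at each position: tags and entities are
// consumed (and returned unchanged), so the search term in the last group can
// only match in the remaining text.
function highlight(html, term) {
  const pattern = new RegExp('<[^>]*>|&#?\\w+;|(' + escapeRegExp(term) + ')', 'gi');
  return html.replace(pattern, (match, target) =>
    target ? '<div class="highlight">' + target + '</div>' : match);
}

console.log(highlight('<div id="assigned">Ticket&nbsp;assigned</div>', 'assigned'));
console.log(highlight('<p>a&nbsp;b</p>', 'b'));
```

Attribute values containing ">" or markup inside comments would still trip this reduced version, which is why the full expression has the extra branches.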
try this
string.match(/<.{1,15}>(.*?)<\/.{1,15}>/g)
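Here <.{1,15}>(.*?)<\/.{1,15}> matches anything between an opening and a closing HTML tag:
<any> Content </any>
The text in between is the captured group. For example, for
<div> this is the content </div>
the captured text is "this is the content". Note that match() with the g flag returns the whole matches, so use exec() or matchAll() to read the captured group.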

Parsing on HTML some specific datas

I'm working on a small app that requires me to parse an html site on the web.
My problem is as follows :
The parsing routine is working fine for some infos BUT I'm searching for hours for a way to get some infos that refuse to appear.
Here is the partial code structure I want to parse:
<body>
<header>
<nav>
<div.....>
<aside......>
<main>
<div .....>
<a ......>
<a ......>
</div>
.
.
.
<div id="general">
<h2> ........</h2>
<p>
<span class="label">text</span>
"text 2 to be parsed"
<br>
<span class="label">other text</span>
"text 3 to be parsed"
<br>
This is just an example of the structure; to be precise, the URL is http://www.ourairports.com/airports/EBBR/pilot-info.html
OK, it seems that the HTML code is not appearing in the preview. In the source code of the page above, where you see [div id="general"], below it you have a [p] followed by [span class="label"]some text[/span], and just below that you have text between quotes. This happens on several lines and I need to catch that information.
I've tried with //body/div/main/div[@id='general']/p as XpathQueryString, but the result is 1 node and empty,
also with div[@id='general'], but the result is no node found,
with div[@id='general']/p/span, the result is no node found,
with //div/p/span[@class='label'] the results are the titles between the <span> and </span> tags, but I'm looking to retrieve the text between quotes just after them and I cannot figure out how to succeed. I think I've tried all combinations (many more than explained above) but with no luck. Is there a special path to get to this text?
Thanks for your advices.
By the way, this is my very first post on stackoverflow.com and My first language is french, so I do apologize in advance for any rule not followed or my bad english.
Enjoy your day, evening, ... night on the keyboard.
Alain
Your first expression //body/div/main/div[@id='general']/p is expected to return a single node, the <p>, and it works exactly that way on the referenced website, as you observed. The expression reaches down to that node but not deeper, where the text nests. The text should still be there, just encapsulated in HTML with the tags around it: a good XPath selector API, used properly, should return the HTML node that was matched, including the <p> tag itself.
If all you want in the end is just the text nodes, try the following:
Think of the text among the <span>s as HTML nodes, namely text() nodes.
//div[@id='general']/p/text()
This will match the "text to be parsed".
A node() will match any html node (even text among tags) and a * any non-text() node.
For any number of steps, use the double slash:
//div[@id='general']/p//text()
Now you match every text node under the <p> tag, regardless of the nesting level. And since text nodes are by definition leaf nodes (cannot contain other nodes), this guarantees that you will not match members of the same path down the tree more than once.
Some comments on your expressions:
//body is superfluous; there is only one body and HTML defines exactly where it is.
Nodes qualified by @id should not need to be preceded by selectors for their parents; start with //div[@id='something unique'].
Learn more about XPath. An API that properly returns selected "nodes" and not just concatenated text can play an important role in the understanding of how the expressions work in practice.
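As an illustration of selecting text() nodes, here is a sketch that assumes a browser DOM and its document.evaluate API; the asker's own parsing library is not known, so this is not a drop-in answer. The expression is written with the standard @id attribute syntax, and the helper name textNodesUnder is made up:

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Evaluate an XPath expression that selects text nodes and return the
// trimmed, non-empty strings in document order.
function textNodesUnder(doc, xpath) {
  const snapshot = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const texts = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const value = snapshot.snapshotItem(i).textContent.trim();
    if (value !== '') texts.push(value); // skip whitespace-only nodes between tags
  }
  return texts;
}

// In a browser console on the page in question:
// textNodesUnder(document, "//div[@id='general']/p/text()");
```

Whitespace between the tags also counts as text nodes, which is why the sketch filters out empty strings after trimming.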

Html tags inside of double quote

I need to bold the words inside of double quotes.
title="Character needs to be bold"
When I put <b></b> inside the title's double quotes, it just displays the tags as they are.
So, is there any way I can bold the characters inside the double quotes?
Are you trying to markup the text inside a title attribute? Because that's not going to work, you'll have to resort to some kind of extended tooltip solution (can be js, but there's also ways to do it with just html/css).
See this question:
Tooltip with HTML content without JavaScript
Some more context would be appreciated though, just the title attribute doesn't give us much information
Title was edited to give context, my answer remains the same.
There is a way depending on the node you are using the 'title' attribute under. See if it has a 'format' attribute as by default the title is set to 'text'. If so, set the format="html", then you will be able to use <b> within your title.
As far as I know, you cannot format the string that is used as the title of an HTML document. That is the string that becomes the title of the browser tab or window.
If you have an HTML element that needs formatting, you can format the whole thing with style="your css style" or other CSS integration. If the formatting changes inside one HTML element, you should look into splitting it into multiple elements or using another approach. Do you have a complete example?
cheers,
nx
Use <b>Character needs to be bold</b>, with no spaces between the b and the angle brackets.