Microsoft Translator Text API breaks notranslate spans


I'm using the Microsoft Translator Text API to translate some sentences. My sentences contain some parts that must not be translated.
To achieve this, I wrap the non-translatable text in <span class="notranslate"></span>. It works well in most cases, but in some cases the MT API breaks these spans.
Examples (Input -> Output):
some <span class="notranslate">1</span> text -> деякий 1 текст
some <span class="notranslate">1</span> another text -> деякий <span class="notranslate">1 інший </span> текст
Good Example:
some <span class="notranslate">1</span> text -> деякий <span class="notranslate">1</span> текст
I cannot see any pattern; it seems to happen randomly. Am I missing something?
UPD:
I tried sending the header Content-Type: text/xml or Content-Type: text/html, with the same result in both cases: the engine breaks some spans.

I found the solution.
The Microsoft Translator API 3.0 documentation recommends using <div class="notranslate"></div> instead of <span class="notranslate"></span>.
I use version 2 of the API, but after changing the wrapper to <div>, the MT API seems to have stopped breaking my notranslate wrappers.

With version 3.0, it's not enough to use <div>. Also, as Denis Kurochkin warned, it will reduce the effectiveness of the translation (by ending the sentence prematurely).
To keep text untranslated reliably, use <span class="notranslate">Text won't translate here</span> or <span translate="no">Text</span>, and include the textType=html query parameter so that the markup is handled correctly:
/translate?api-version=3.0&to=zh&textType=html
Without it (regardless of span/div), the API will not translate the text inside the tags, but it will translate the attributes within the tag.
That is, if the <span> tag has other attributes, they will be modified, something like this:
<span data-type="mention" class="mention" data-id="39dcf29b-fce0-4a26-90ef-6342e017c1b8" data-label="My name has words inside it | Super cool company" class="notranslate">My name has words inside it | Super cool company
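
For illustration, here is a minimal Python sketch of a v3 request that carries textType=html. The helper names (build_translate_request, translate_html) and the key and region arguments are my own placeholders, not something from the posts above, and I am assuming the public global endpoint; adjust it for your resource.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public global endpoint of the Translator Text API v3.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(html_text, to_lang, key, region):
    """Build (but do not send) a v3 translate request with textType=html."""
    query = urllib.parse.urlencode(
        {"api-version": "3.0", "to": to_lang, "textType": "html"}
    )
    # The v3 body is a JSON array of objects with a "Text" field.
    body = json.dumps([{"Text": html_text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # The region header may be unnecessary for a global resource.
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{query}", data=body, headers=headers, method="POST"
    )


def translate_html(html_text, to_lang, key, region):
    """Send the request and return the first translation string."""
    request = build_translate_request(html_text, to_lang, key, region)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload[0]["translations"][0]["text"]
```

Calling translate_html('some <span class="notranslate">1</span> text', "zh", key, region) would then send the span as HTML rather than as plain text.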

Related

How to use Google Cloud translator for HTML text and preserve the line breaks?

I'm using the Google Cloud Translator API to translate some HTML texts. I set the format to HTML, and the translation quality is pretty good (it keeps all the tags untranslated and only translates the text between the tags). However, it often removes all the line breaks in the HTML text. For example, I selected the English-German option, and
<p><a class="selfLink" id="notes" href="#notes" rel="help"><strong>Notes</strong></a>
<ul>
<li><a class="selfLink" id="disclaimer" href="#disclaimer" rel="help">DISCLAIMER OF LIABILITY</a>
...
becomes
<p><a class="selfLink" id="notes" href="#notes" rel="help"><strong>Anmerkungen</strong></a><ul><li> <a class="selfLink" id="disclaimer" href="#disclaimer" rel="help">...
It's very difficult to read the translated text since it's all in one line. I know that I can set the translator mode to treat the input text as "text" to preserve the line breaks, but in text mode, the translator is not able to identify HTML entities and determine whether a piece of text should be translated or not. Manually adding the line breaks is not a desirable approach. What can I do to improve the readability of the HTML translation?

Disappearing newlines are one of the features of HTML mode; another is that some Unicode characters turn into HTML entities. You will run into it sooner or later :-)
The solution is to replace all newlines with <br/> before sending the text to the Google Translate API and, after getting the translation, to replace <br/> with newlines and HTML-decode the result.
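
A rough Python sketch of that round trip, in case it helps. The function names are made up for this example, and the regex assumes the API may hand back <br>, <br/> or <br /> variants. Note that any <br/> already present in the source will also come back as a newline, and html.unescape decodes every entity, including ones you may have wanted to keep.

```python
import html
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def protect_newlines(source_html):
    """Replace newlines with <br/> before sending the text for translation."""
    return source_html.replace("\r\n", "\n").replace("\n", "<br/>")


def restore_newlines(translated_html):
    """Turn <br/> back into newlines, then decode HTML entities."""
    return html.unescape(BR_TAG.sub("\n", translated_html))
```

You would call protect_newlines on the input, send the result to the translation API in HTML mode, and pass the response through restore_newlines.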

Non-breaking space removed by Text API

I'm using the Microsoft Translator Text API to translate parts of a webpage. The platform we use inserts &nbsp; in the HTML to render empty lines. So a part of the webpage can be:
<p>
<span>This is a dummy text</span>
</p>
<p>
<span>&nbsp;</span>
</p>
When I send this to the Microsoft Translator Text API, it returns the following HTML:
<p>
<span>Il s’agit d’un texte factice</span>
</p>
<p>
<span></span>
</p>
I've set the content type to text/html and escaped the HTML characters to be able to send it to the API (so &nbsp; is replaced with &amp;nbsp;). But the text returned by the API has completely lost the &nbsp;.
How can I prevent the API from removing the &nbsp; instances in the HTML? Or is this a bug in the API?

A notranslate span may help to prevent translation. You would have to try it to see whether it does indeed preserve the &nbsp; entity.

See the answer to "Microsoft Translator API - notranslate trimming leading space?" from Chris Wendt (Microsoft):
Translator trims leading and trailing space, and compresses any other white space to a single space. This is by design. Translator needs to move the words around freely to form the newly composed sentence, and wouldn't know what to do with the extra white space. A workaround would be to trim in your code before translation, and then restore the trimmed off pieces afterwards, depending on the context.
Line breaks and non-breaking spaces tend to be used for specific line layout based on the particular source text that would need to be laid out differently in another language in any case because of different word lengths and arrangements of the significant words.
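
The trim-and-restore workaround from the quote could be sketched in Python like this. The translate argument is a stand-in for whatever function actually calls the API; it is hypothetical and not part of any SDK. Python's str.strip() also strips the non-breaking space (U+00A0), which is what makes this work for the &nbsp; case once the entity has been decoded.

```python
def translate_preserving_edges(text, translate):
    """Translate the trimmed core of text and put the original edges back.

    `translate` is any callable that takes a string and returns its translation.
    """
    core = text.strip()
    if not core:
        # Whitespace-only input (for example a lone non-breaking space):
        # nothing to translate, so return it untouched.
        return text
    lead_len = len(text) - len(text.lstrip())
    leading = text[:lead_len]
    trailing = text[lead_len + len(core):]
    return leading + translate(core) + trailing
```

Whitespace inside the sentence is still collapsed by the service; this only restores what was trimmed at the two ends.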

Draw multiple overlines in Qt rich text

I want to display logic equations in a QTextBrowser. It would be a lot better if I could draw overlines to symbolise the "not".
Right now I am able to draw one overline using text-decoration:overline :
Not(A) = <span style="text-decoration:overline"> A </span>
But it doesn't work if I want multiple overlines. For example the following equation :
Not(A or not B) = <span style="text-decoration:overline"> A or <span style="text-decoration:overline"> B </span> </span>
Is there a solution or a workaround to be able to do this?

MathML is not HTML, and QTextBrowser supports a simplified variant of HTML. It knows nothing of MathML. Alas, it might be a relatively simple change to the layout engine to implement this aspect of MathML, though. Worth looking into it I'd think.

How to edit html classes inline?

Let's say you have HTML like this:
<span class="full-sentence">
<span class="subject">She</span><span class="verb">loves</span><span class="object">him</span>
.</span>
What the user sees is,
She loves him.
Using a wysiwyg HTML inline editor, you could change the "She loves him." string into something else, like "He loves her coat." for example, but you would have no way of adding the span class "noun" to the word coat in the wysiwyg editor without displaying the source code to some extent.
I'm trying to find a way to do this: first display the span's class text, such as "verb" from <span class="verb">, in the output; allow it to be changed inline; and have the edit update the string inside the quotes of class="" in the source code.
I'm trying to accomplish this WITHOUT displaying anything irrelevant, such as the <span class=""></span> characters. All the user really needs to work with is the spot inside the "" marks, the text itself, and have the ability to add new span class boundaries, by highlighting a string and pressing some kind of button that wraps that highlighted string in <span class=""></span> and then allows you to write classes to fill the "".
It would ideally look something like this, without the awkward spacing between text strings as in a libreoffice writer table, which this is a picture of,
![enter image description here][2]

The default ckeditor shows parent elements in the bottom bar, and permits right click to edit properties of select elements. You could use a bar or a mouseover to edit and display the word type, and you would also want some cleanup code for when people paste html and join two words etc.
From a technical perspective, building your own custom WYSIWYG editor is relatively trivial once you are clear on the interface you want.
<div contenteditable="true">
Editable text...
</div>

Can I slice a word with </span> for the sake of structured data?

I have this line inside a ProfessionalService itemscope:
Az <span itemprop="makesOffer">ágyi poloska irtását</span> permetezéses módszerrel végezzük.
This is in Hungarian and the problem comes from my language too. For search engines I would like to communicate the offer is "ágyi poloska irtás" without the addendum "át" so it would look like this:
Az <span itemprop="makesOffer">ágyi poloska irtás</span>át permetezéses módszerrel végezzük.
Is this legal? Can I break a word with a </span> closing tag?
Sorry, I can't come up with an English example. The example sentence is about how the company exterminates bed bugs. In English it would read like this: The <span itemprop="makesOffer">bed bug extermination</span> is done by the spraying method. But in English it works.

Yes, it is valid and it can make sense to do this.
Any conforming Microdata parser will get the value "ágyi poloska irtás" for the property makesOffer.
Following the HTML5 specification, consumers would have no reason to break the word (e.g., by adding whitespace or a line break) if it contains a span element (… which does not necessarily mean that you won’t find consumers that do this nonetheless).
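
To see what a consumer would extract, here is a deliberately simplified Python sketch built on the standard html.parser module. It is not a conforming Microdata parser: it ignores itemscope, multi-token itemprop values, and void elements such as <br> nested inside the property. The class and function names are mine.

```python
from html.parser import HTMLParser


class ItempropText(HTMLParser):
    """Collect the text content of the first element carrying a given itemprop."""

    def __init__(self, prop):
        super().__init__()
        self.prop = prop
        self.depth = 0      # > 0 while inside the matching element
        self.done = False   # True once the matching element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("itemprop") == self.prop:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def itemprop_value(markup, prop):
    parser = ItempropText(prop)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

Feeding it the second version of the sentence from the question yields only the text inside the span, without the "át" suffix that follows the closing tag.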
