API call - the SMT category - microsoft-translator

I have recently tried to review the Chinese -> English system. According to https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its-offerings/ , those systems were already switched to NMT models. There is also a statement that users can still use the statistical system by setting the category to "SMT".
However, https://blogs.msdn.microsoft.com/translation/2016/01/27/new-microsoft-translator-customization-features-help-unleash-the-power-of-artificial-intelligence-for-everyone/ mentions that there were actually three standard categories available for SMT engines: general (default), tech, and speech.
Could you please explain which domain is offered by the SMT category now, and how long it will be supported on your side?
Thanks

We are working on customization using a neural network decoder. Currently, the Microsoft Translator Hub has three category IDs for SMT: general, tech, and speech.
For content that is not narrowly confined to your domain, you may find that category=generalnn gives better results than your current customization.

Chinese uses the NMT system, so specifying category=generalnn would result in the same translation as the default when calling the service through the Microsoft Translator Text API.
The second article addresses customization, where you can create your own custom translation system or dictionary tuned to your domain, style, and terminology. If you're interested in customization (SMT only at this time), there are categories associated with using the Translator Text API and the Microsoft Translator Hub. The category identifies the domain for the project you create using the Hub. Two of the categories are tech and speech.
See the Microsoft Translator Hub User Guide to learn more about the Hub.
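A minimal sketch of what such a call might look like, assuming the v3.0 REST endpoint and Java 11's HttpClient; the subscription key, language pair, and sample sentence are placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TranslateWithCategory {
        public static void main(String[] args) throws Exception {
            // Placeholder key: use your own Translator subscription key.
            String key = "YOUR_SUBSCRIPTION_KEY";
            // The category query parameter selects the system, e.g. generalnn,
            // tech, speech, or a custom category ID from the Hub.
            String url = "https://api.cognitive.microsofttranslator.com/translate"
                       + "?api-version=3.0&from=zh-Hans&to=en&category=generalnn";
            String body = "[{\"Text\": \"我的电脑无法启动。\"}]"; // "My computer doesn't boot up."

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(url))
                    .header("Ocp-Apim-Subscription-Key", key)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }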

The tech category will produce different results only when translating FROM English to other languages. In the case of English>Chinese, with my sample sentence "My computer doesn't boot up.", it does produce a different result. For Chinese>English, specifying "tech" falls back to the default, which is neural in the case of Chinese<>English. "speech" generates the same results as "generalnn" in all cases.
It is generally true, including for Hub categories, that a category that is valid in one language pair is valid in all language pairs. The API will fail with an "invalid category" error only if the category doesn't exist at all. The reason for this design is so that you can build out your custom systems language by language, over time, while still allowing the user to choose among all available languages, at the cost of occasionally suboptimal domain vocabulary in an as-yet-uncustomized language pair.
The API does not tell you whether a customized system was used. A trick to get that feature anyway is to watermark your custom system with a dictionary entry. Make a dictionary entry "_mywatermark" that translates to "CustomSystem180309_1700_en_ru", for instance, and then you can test at any time, in any application, whether you are getting your custom system or not.
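A sketch of how that check could look in code, reusing a call like the one above; translate() is a hypothetical helper wrapping the HTTP request, and the category ID and watermark values are the ones you would have configured in the Hub:

    // Hypothetical helper: POSTs "_mywatermark" to the translate endpoint
    // with your custom category and returns the translated text.
    String result = translate("_mywatermark", "en", "ru", "myCustomCategoryId");

    if (result.contains("CustomSystem180309_1700_en_ru")) {
        System.out.println("Custom system answered for this language pair.");
    } else {
        System.out.println("Fell back to a default (non-customized) system.");
    }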

Person schema is not detected by the Rich Results Test?

I have used the Person schema on my site per the schema.org standard, but the Rich Results Test tool does not detect it, whereas the older Structured Data Testing Tool does.
Is the Person schema supported by Google or not?
Person schema on schema.org: https://schema.org/Person
Testing screenshot: https://a.cl.ly/E0ur0e69
The Rich Results Test only reports top-level entities that generate rich results in Google. Person is not one of them. Person is a supported entity, but only when used inside other entities, such as the author of a review.
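For illustration, a minimal JSON-LD sketch (all names and values are placeholders) of a Person nested inside a Review; the Rich Results Test would report the review snippet, not the Person itself:

    {
      "@context": "https://schema.org",
      "@type": "Review",
      "itemReviewed": { "@type": "Product", "name": "Example Product" },
      "reviewRating": { "@type": "Rating", "ratingValue": "4" },
      "author": {
        "@type": "Person",
        "name": "Jane Doe"
      }
    }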
I faced the same issue with the "Person" schema type while trying to test my markup in Google's Rich Results Test tool. After spending much time searching online, I came across a very good article that answers this question. Apparently, not all schema types are supported in Google's Rich Results Test tool.
The following is a list of the rich result types that Google's tool currently supports, as mentioned in their Search Console Help page.
AMP article
Article
Breadcrumb
Carousel
Course
Critic Review
Dataset
Employer rating
Estimated salary
Event
FAQ
Fact check
Guided recipe
How-to
Image License
Job posting
Job training
Local business
Logo
Movie
Product
Q&A page
Recipe
Review snippet
Sitelinks searchbox
Software app
Special Announcement
Video
As you can see, schema type "Person" is not in the supported list.
Thanks to this wonderful article, SEO for 2021: How to Use Google's New Testing Tool for Structured Data, written by Anne Fernandez, which has some additional tips.

How to determine the language set for a published workshared Revit model?

Our company has offices in both Germany and overseas.
Therefore our employees work on Revit models using both the German and English versions of the software.
We have found that the workshared models published to BIM 360 have the language of their model derivatives set to the language of the last Revit instance that saved / synchronized the model.
As we work with these model derivatives, we are very interested in pre-emptively determining the language of the data.
The Version json data for the model obtained from the GET projects/:project_id/folders/:folder_id/contents endpoint doesn't appear to include the language the model was last saved in.
Is there anything we missed with regard to determining the language of the model derivative data before obtaining it (as opposed to determining the language from the data itself)?
Alternatively, having the option to specify the desired language when requesting the data from the API using the GET :urn/metadata/:guid and GET :urn/metadata/:guid/properties endpoints would be the best, and most proper, solution.
After checking with our engineering team: unfortunately, the Forge API doesn't currently support such a feature. So I submitted a wishlist item, RVTLMV-1692, on your behalf for adding i18n support to the Revit built-in properties. You're welcome to track updates or provide us with additional information in the future by sending an email quoting the RVTLMV-prefixed ID to the Forge Help email alias. Cheers!

Visualizing chatbot structure

I have created a chatbot using SnatchBot for the purpose of a quiz. I have been asked to create a dynamic decision tree structure for the chatbot, which must be displayed on the web page, i.e. every time the user answers a question, a branch on the tree must be created according to the user's response. Is there any way to do this? Is it possible to generate the JSON for the structure of the chatbot rather than the JSON for previous conversations? Would any other platform, such as Dialogflow, be more suitable?
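As a general sketch, independent of any particular platform: one way to model this is a tree that grows one node per answered question and serializes to JSON for whatever front-end library draws it. All class and field names below are made up for illustration:

    import java.util.ArrayList;
    import java.util.List;

    // One node per question; children are appended as the user answers.
    class QuizNode {
        final String question;
        final String answerTaken; // the answer that led here (null at the root)
        final List<QuizNode> children = new ArrayList<>();

        QuizNode(String question, String answerTaken) {
            this.question = question;
            this.answerTaken = answerTaken;
        }

        // Called every time the user answers: grows a branch and returns it.
        QuizNode branch(String nextQuestion, String answer) {
            QuizNode child = new QuizNode(nextQuestion, answer);
            children.add(child);
            return child;
        }

        // Minimal hand-rolled JSON (no string escaping; illustration only),
        // enough for a client-side tree renderer to consume.
        String toJson() {
            StringBuilder sb = new StringBuilder("{\"question\":\"" + question + "\"");
            if (answerTaken != null) {
                sb.append(",\"answer\":\"").append(answerTaken).append("\"");
            }
            if (!children.isEmpty()) {
                sb.append(",\"children\":[");
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) sb.append(",");
                    sb.append(children.get(i).toJson());
                }
                sb.append("]");
            }
            return sb.append("}").toString();
        }
    }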
I am also using SnatchBot. You will need to use the NLP section to create all your samples and train your data; then you can add global connections, giving you the possibility to direct the bot to the needed subject at any point in the conversation.
The value of this tool is that it allows the user to immediately (and at any point in the conversation) direct the bot to a particular subject.
From a technical perspective, I have some recommendations for you:
https://jorin.me/chatbots.pdf (Development and Applications)
https://www.researchgate.net/publication/325607065_Implementation_of_a_Chat_Bot_System_using_AI_and_NLP (Implementation Using AI And NLP)
From a strategy perspective, here are the crucial main criteria for enterprise chatbot implementation success:
Defining clear audience profiles for the project
Identifying a clear goal for the project
Defining clear dialog-flow-related key intents
Assessing the platform's customer experience with a SWOT analysis
Forming coherent teams
Testing and involving the audience in the validation of the project from early on
Implementing feedback analytics to be used as the basis for continuous improvement
(Source: http://athenka.com)

Autodesk Forge randomly loses object and room information

I'm using Autodesk Forge to integrate with our remodeling tool. In particular, I need to count objects of different families and types and determine which room they actually belong to. I use the Model Derivative API for this purpose. To keep the room/area information, I convert .rvt files to .nwc files, as suggested here. However, when I retrieve data with GET /modelderivative/v2/designdata/{urn}/metadata/{guid}/properties, I face the following problems from time to time:
Room information sometimes disappears from Objects for some reason
Objects disappear from result data for some reason (but they seem to exist when I browse them in A360)
I have no idea what the reason for this could be.
I have no explanation for the disappearance of room data or objects for you.
If you can provide a reproducible case demonstrating that, I will gladly pass it on to the development team for analysis.
If you are interested in an immediate reliable solution and full control, which I assume is the case, I would suggest following the second bullet item in the advice provided by Eason in the previous answer that you refer to above:
Extract all the room information and object relationships you are interested in via the Revit API, store that data somewhere yourself, and use it later on wherever you like to your heart's content.
Then you will be completely safe and independent of all other components and their unpredictable behaviour.
If the only information that you need is the room containing each family instance, I can even implement a suitable Revit add-in for you.
Another suggestion that might help, if that is indeed the data you require: determine that information in a Revit add-in and attach it to each family instance in your own shared parameter. That will ensure that it remains intact through the translation process. As far as I know, all shared parameter data is retained, independent of other behaviour.

Reliably extracting identity fields from scanned documents / images?

I have to pull two pre-printed (not hand-written) fields out of a paper form, such that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G".
I've tried the following software:
Tesseract-OCR
Cuneiform
Canon ImageRunner built-in OCR
Asprise OCR Java API (demo)
I've tried the following settings:
Scanning at resolutions of 300dpi and 600dpi
Using different fonts, including OCR-A and OCR-B
In all cases output was pretty much all over the place. I can kick back documents for which I can't properly extract the necessary information, but I'm thinking it's going to be at least half of them. I considered some sort of fuzzy logic based on known values in a database, but sometimes these identifiers can differ by a single character, like "123G" and "123C".
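For reference, the fuzzy-logic idea might look like the sketch below (class and method names are illustrative). It also makes the ambiguity concrete: "123G" and "123C" are themselves only one edit apart, so a single misread character between two valid identifiers cannot be resolved:

    import java.util.ArrayList;
    import java.util.List;

    class IdMatcher {
        // Classic dynamic-programming edit distance (Levenshtein).
        static int distance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++) {
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            }
            return d[a.length()][b.length()];
        }

        // Accept an OCR result only if exactly one known ID is within
        // distance 1; otherwise return null so the document can be kicked
        // back for manual review.
        static String match(String ocr, List<String> knownIds) {
            List<String> close = new ArrayList<>();
            for (String id : knownIds) {
                if (distance(ocr, id) <= 1) close.add(id);
            }
            return close.size() == 1 ? close.get(0) : null;
        }
    }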
Is this a lost cause? Perhaps OCR just isn't mature enough to handle a requirement of this nature? What other techniques might you recommend? Barcodes?
Edit: the containing application is in Java, so any recommendations for free or cheap Java-based APIs would help.
Edit 2: if anyone is interested... without any special tuning, Cuneiform for Linux and the Canon ImageRunner worked best, with Tesseract-OCR and the Asprise Java API producing the worst results... none of the four was acceptable for anything but standard document-search-grade OCR. I'm beginning to think that this isn't going to work out.
If you have control over the fields, why use a human-readable format in the first place? For scanning, it seems like a QR code or something similar would be best. It is marked for orientation and has some built-in error correction.
http://en.wikipedia.org/wiki/QR_Code
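Since the containing application is in Java (per the edit above), here is a minimal sketch of generating such a code with the open-source ZXing library (the core and javase artifacts); the identifier and output path are just examples:

    import com.google.zxing.BarcodeFormat;
    import com.google.zxing.MultiFormatWriter;
    import com.google.zxing.client.j2se.MatrixToImageWriter;
    import com.google.zxing.common.BitMatrix;
    import java.nio.file.Paths;

    public class MakeQr {
        public static void main(String[] args) throws Exception {
            // Encode the batch/item identifier instead of printing it as text.
            BitMatrix matrix = new MultiFormatWriter()
                    .encode("GG-9192", BarcodeFormat.QR_CODE, 200, 200);
            MatrixToImageWriter.writeToPath(matrix, "PNG", Paths.get("label.png"));
        }
    }

ZXing can also read such codes back from scanned images, which would cover the routing side as well.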
I started digging for products starting with Tomato's suggestion. I tried ABBYY and CVISION. Both have products that can automate OCR:
CVISION Maestro Recognition Server 4.0
ABBYY Recognition Server 2.0
In addition, ABBYY has SDKs for various platforms, and CVISION has an SDK that appears to work with at least VB/VC++.
I haven't tried either SDK yet, and I'm not sure it's necessary for my project. All I need is to extract the text from incoming PDFs. I did, however, try CVISION's server product, and with the OCR on its most accurate settings it worked really well. I haven't tried ABBYY's server product yet because I have to go through a reseller to get a trial. I'm in the process of doing so, but if it starts getting annoying I'm probably going to go with CVISION. I did try ABBYY's FineReader standalone product, and it worked very well, so I assume that their server product would too.