Possible to add multiple language variants to projects in the same workspace? - microsoft-translator

I'm in the process of setting up projects and training engines in a workspace.
I want to create all the projects using the same metadata so that they will all share the same Category ID (so they can all be called from the same TMS workflow).
However, it won't let me set up projects that share the same language pair from the drop-down, even if the project names are different (e.g. Project_fr-fr, Project_fr-ca).
This is a blocker because, for example, the drop-down only has English>French, while I have projects with fr-fr and fr-ca as target.
Any workaround suggestions welcome.
Thanks,
Colin

The language pair must be unique within a workspace so that the service knows which custom model to use at translation time. When you create Project_fr-ca, add a label to make it unique. So, the fr-fr endpoint (category ID) would look like -, and fr-ca would be --.
-ME

Related

API call - the SMT category

I have recently tried to review the Chinese -> English system. According to https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its-offerings/ , those systems have already been switched to NMT models. There is also a statement that users can still use the statistical system by setting the category to "SMT".
However, https://blogs.msdn.microsoft.com/translation/2016/01/27/new-microsoft-translator-customization-features-help-unleash-the-power-of-artificial-intelligence-for-everyone/ mentions that there were actually three standard categories available for SMT engines: General (default), TECH and SPEECH.
Could you please explain which domain is offered by the SMT category now? And how long will it be supported on your side?
Thanks
We are working on customization using a neural network decoder. Currently, the Microsoft Translator Hub has three Category IDs for SMT: general, tech and speech.
With content that is not narrowly confined to your domain, you may get better results using category=generalnn than with your current customization.
Chinese is using the NMT system so using Category=generalnn would result in the same translation when calling the service using the Microsoft Translator Text API.
The second article is addressing Customization where you can create your own custom translation system or dictionary tuned to your domain, style and terminology. If you're interested in customization (SMT at this time), there are categories associated with using the Translator Text API and the Microsoft Translator Hub. The category identifies the domain for the project you create using the Hub. Two of the categories are Tech and Speech.
See the Microsoft Translator Hub User Guide to learn more about the Hub.
The tech category will produce different results only when translating FROM English to other languages. In the case of English>Chinese, with my sample sentence "My computer doesn't boot up.", it does. For Chinese>English, specifying "tech" will fall back to the default, which is neural in the case of Chinese<>English. "speech" generates the same results as "generalnn" in all cases.
It is generally true, including for Hub categories, that a category that is valid in one language pair is valid in all language pairs. The API will fail with an "invalid category" error only if that category doesn't exist at all. The reason for this design is so that you can build your custom systems out language by language, over time, while still allowing the user to choose between all available languages, at the cost of, maybe, occasionally suboptimal domain vocabulary in an as of yet uncustomized language pair.
The API does not return to you whether a customized system was used or not. A trick to get that feature anyway is to watermark your custom system using a dictionary entry. Make a dictionary entry "_mywatermark" that translates to "CustomSystem180309_1700_en_ru" for instance, and then you can test anytime, in any application, whether you are getting your custom system or not.
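The watermark check described above can be automated. The following is a minimal sketch, assuming the watermark pair from the example; the `translate` callable is a hypothetical stand-in for whatever function in your application calls the translation service, and the two `fake_*` functions exist only to make the sketch runnable.

```python
from typing import Callable

# Watermark pair from the example above; replace with your own dictionary entry.
WATERMARK_SOURCE = "_mywatermark"
WATERMARK_TARGET = "CustomSystem180309_1700_en_ru"


def uses_custom_system(translate: Callable[[str], str],
                       source: str = WATERMARK_SOURCE,
                       expected: str = WATERMARK_TARGET) -> bool:
    """Return True if translating the watermark yields the custom marker."""
    return translate(source).strip() == expected


# Hypothetical stand-ins for a real API call (illustration only).
def fake_custom_translate(text: str) -> str:
    # A custom system that contains the watermark dictionary entry.
    dictionary = {WATERMARK_SOURCE: WATERMARK_TARGET}
    return dictionary.get(text, text)


def fake_generic_translate(text: str) -> str:
    # A generic system has no such entry and passes the token through.
    return text
```

In a real application you would pass in your own wrapper around the translation call instead of the fake functions.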

Namespace for (DDD) entities cutting across domains

I have a couple of business-related domains like Purchase, Marketing and Economy. Having the models arranged into a namespace* for each domain would be nice, but there are some entities cutting across domains, like an Item. How to organize those cross-cutting objects?
* = As in C#/Java/Python namespaces.
Since you have the concept of Bounded Context, you should not share domains between the namespaces. Instead, you should have one Item for each namespace that requires it, and each of those Items should have its own fields, as required by the context in which it is included.
As Eric Evans said, it is not a big deal to replicate data so that contexts never share the same domain, only data.
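A rough sketch of what "one Item per context" could look like, in Python for brevity. The field names are invented for illustration; in a real project each class would simply be called Item and live in its own namespace (for example a purchase package and a marketing package).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PurchaseItem:
    """Item as the Purchase context sees it (hypothetical fields)."""
    item_id: str       # the shared identifier is data, not a shared class
    supplier: str
    unit_cost: float


@dataclass
class MarketingItem:
    """Item as the Marketing context sees it (hypothetical fields)."""
    item_id: str
    display_name: str
    campaign_tags: Tuple[str, ...] = ()
```

The two classes only agree on the identifier, so either context can change its own model without touching the other.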
Determining whether you have the correct design will require some experience with the domain so you should check with your domain expert.
You may very well require a Shared Kernel for classes that are cross-cutting. You'll have to be careful that you do not abuse the shared kernel by placing too many generic / logical classes in there.
To add to what @rafaels88 has answered, you may need to create a BC-specific domain construct where some logical entity exists. For instance, a User in the Identity & Access Control BC would be an Author in one BC but perhaps a Supervisor in another.
You could also duplicate an AR in one BC as a VO in another. A Customer in the CRM BC may be the system of record for a customer and, therefore, contain a whole lot more information. In the Order BC, however, a Customer VO may only contain an Id, Name, and perhaps Address (for example).
So you will need to evaluate what type of object you have before deciding where to place it.
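As an illustration of the AR-to-VO idea, here is a minimal sketch in Python. The class names, the extra CRM fields and the `to_order_customer` helper are assumptions made for the example, not part of any framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrmCustomer:
    """Aggregate root in the CRM context (system of record)."""
    customer_id: str
    name: str
    address: str
    phone: str = ""
    notes: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class OrderCustomer:
    """Value object in the Order context: immutable, compared by value."""
    customer_id: str
    name: str
    address: str


def to_order_customer(customer: CrmCustomer) -> OrderCustomer:
    """Copy only the data the Order context needs."""
    return OrderCustomer(customer.customer_id, customer.name, customer.address)
```

Making the value object frozen means the Order context cannot accidentally become a second place where customer data is edited.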

How can I disable semantic notations in text areas in Semantic MediaWiki Forms?

I am working on a user-moderated database and settled on MediaWiki with Semantic MediaWiki as an engine. I installed Semantic Forms to force the end users to conform to a certain standard when creating or editing entries. The problem is that, since a user can add a semantic notation to any form text input, it can throw off the proper structure of the system. For example, if it were an IMDB clone, a user could add [[Directed by::Forrest Gump]], which would then result in the movie "Forrest Gump" showing up under a list of directors.
I doubt that there's any setting that can simply turn this off or on, but I've had one or two ideas as to how to get it working.
One, perhaps there's a way to disable semantic notation on specific namespaces and put the forms on those namespaces. I have a feeling that this will cause the forms to merely break.
Another idea is to modify the code. This is clearly the less ideal approach. I believe I would need to create some sort of filter on SFTextAreaInput that disables semantic notations in the user-inserted text, but alas I'm unsure how to get started on that.
Well, Semantic MediaWiki is still a Wiki. In your classical enterprise database, you restrict the users' input options as a means of ensuring data integrity. That isn't what wikis do; the thinking with a wiki is, yes, the user can enter incorrect information, but another user will amend it and let the first user know what was wrong.
I wouldn't try to coerce SMW into rigid data acquisition. I mean, you do have options such as removing the standard input fields in forms:
'''Free text:'''
{{{standard input|free text|rows=10}}}
If users are selecting a movie page when they should be selecting a director page, then you probably want to encourage correct selection by populating the form control from the Directors category, like:
{{{field|Director|input type=combobox|values from category=Directors}}}
Yes, they can still go very far out of their way to select "Forrest Gump", but if that happens then the fact that someone wilfully circumvented the preselected correct options is a more pressing concern than the fact that the system permits it.
Wikis work best when the system encourages rather than enforces valid knowledge.
My name is Wolfgang Fahl and I am behind the smartMediaWiki approach. You might want to go the smartMediaWiki route; see http://semantic-mediawiki.org/wiki/SMWCon_Spring_2015/smartMediaWiki
For a start, don't go just by the property values but also, for example, by a category.
{{#ask: [[Category:Movie]] [[Directed by::+]]
|?Directed by
}}
will only show pages that have both the property set and are in the correct category.
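To make the effect of the combined condition concrete, the query can be modelled outside the wiki. This is only a sketch in plain Python of the selection logic, not of how Semantic MediaWiki evaluates queries; the page data structure and the `ask` function are invented for the example.

```python
from typing import Dict, List


def ask(pages: Dict[str, dict], category: str, prop: str) -> List[str]:
    """Return titles that are in `category` AND have any value for `prop`."""
    return sorted(
        title
        for title, page in pages.items()
        if category in page["categories"] and page["properties"].get(prop)
    )


# Hypothetical wiki content.
pages = {
    "Movie A": {"categories": {"Movie"}, "properties": {"Directed by": "Director X"}},
    "Movie B": {"categories": {"Movie"}, "properties": {}},
    "Stray page": {"categories": set(), "properties": {"Directed by": "Movie A"}},
}
```

Here only "Movie A" is selected: "Movie B" lacks the property, and "Stray page" carries the property but is not in the category.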
In the smartMediaWiki approach you'd create a topic "Movie" and the entry of movies would be done via Forms. This is an elaboration of the SemanticForms and semantic PageSchemas idea that recently evolved. You can find out more about this at SMWCon Barcelona 2015 this fall.

MediaWiki extension to support taxonomy by genus and species

I'm trying to build a MediaWiki-based website for a very specific purpose. Namely, I would like to create a field guide for a specific group of animals (reptiles and amphibians). Since the people I would want to generate content on the website aren't necessarily techies, I'd like to make things as easy and painless as possible for contributors.
Now, in most groups of animals, taxonomic designations are fluid, and change all the time. As an example, consider the following:
A species used to be called Genus1 species1. It was then called Genus2 species1. As of now, this species has been split into several species, say Genus2 species1, Genus2 species2, Genus2 species3, etc. In the worst case, anything about the nomenclature and classification of the species could change, including, but not limited to, the species being moved, split or merged with any other species.
For users, these changes should be transparent. That is, on typing in http://url_of_wiki/wiki/Genus1_species1, they should automatically be redirected to the lowest taxonomic group (in this case Genus2) that is non-ambiguous. Essentially, if a page is redesignated (moved, split or merged), I would like to automatically create all new pages and redirects required.
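To make the intended behaviour concrete, here is a small sketch of the redirect rule in Python, independent of MediaWiki. The `SUCCESSORS` table and the `resolve` function are hypothetical, the table is assumed to contain no cycles, and names are assumed to have the form "Genus species".

```python
from typing import Dict, List

# Hypothetical history: outdated name -> names that currently cover it.
SUCCESSORS: Dict[str, List[str]] = {
    "Genus1 species1": ["Genus2 species1", "Genus2 species2", "Genus2 species3"],
}


def resolve(name: str, successors: Dict[str, List[str]]) -> str:
    """Return the lowest unambiguous taxon for a possibly outdated name."""
    current = successors.get(name)
    if not current:
        return name                              # name is still valid
    if len(current) == 1:
        return resolve(current[0], successors)   # plain rename: follow the chain
    genera = {species.split()[0] for species in current}
    if len(genera) == 1:
        return genera.pop()                      # split within a single genus
    raise LookupError(f"{name!r} is ambiguous above genus level")
```

With the table above, `resolve("Genus1 species1", SUCCESSORS)` yields the genus page "Genus2", matching the example in the question.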
I should be able to implement this as an extension quite easily. However, I've read the MediaWiki documentation on extensions, but haven't been able to figure out just what part of MediaWiki it would be best to target.
So, the question is, is this type of extension best implemented as a parser extension, by adding new tags, or a user-interface extension, or a combination of the two (a user-interface extension backed by a parser extension)?
Nice challenging problem! If it were up to me I would solve it in a different way:
- use page level for genera and
- sub page level for species.
This will automatically take care of renaming since redirects will be made.
Alternatively:
- use page level for species and
- categories for genera.
Then use an if pagename template (see Wikipedia example) to change the category based on the page name.
Or possibly combine these methods.
(See also Wikis and Wikipedia)

How do you come up with names for your namespaces?

I'll preface this by saying that I usually work in C#/.Net.
Normally, I use a naming scheme that puts common, reusable components into a namespace that reflects our organization and project-specific components into a namespace tied to the project. One of the reasons I do this is that I sometimes share my components with others outside my department, but within the organization. Project-specific namespaces are typically prefaced with the name or abbreviation of the department. When I reuse code between projects, I typically migrate it into one of the organization-based namespaces.
For example:
UIOWA.DirectoryServices contains classes that deal with the specific implementation of our Active Directory.
UIOWA.Calendar contains classes that deal with the University's master calendar.
LST.Inventory.Datalayer holds the classes implementing the data layer of the Learning Spaces Technology group inventory application.
I'm embarking on a project now for an entity that has a fuzzier connection to the University (a student group that runs a charity event). The project has the potential to be sold outside of our University and, thus, doesn't really fit into my normal naming conventions, i.e., the department is only the first of potentially many customers that might use it.
My inclination is to go the organization naming route and create an "organizational project" name space for this application. I'd like to hear how others handle this and any advice you might have.
Thanks.
See also this related question about namespace organization.
EDIT
I ended up creating the org/project namespace UIOWA.MasterEvent and deriving further namespaces from there. Still interested in other opinions for future projects.
My department has had its name changed three times in the last five years, so we're all glad that someone decided against using namespaces with organisational names...
Our namespaces are organised by project names. Reusable stuff is put into the Toolbox namespace. Perhaps a bit crude, but it works quite well so far.
I'm a .NET developer, and I always use the organisational project namespace (com.bolidian.projectspace) because it guarantees uniqueness.
I use the organisation, followed by the product, e.g. Acme.Crm. When grouping classes together in a subnamespace, always use a plural or an action so that it can't clash with a class, e.g.
Acme.Crm.Letters
Acme.Crm.Invoicing
I follow Microsoft's convention of not capitalising acronyms, e.g. Crm instead of CRM, Sql instead of SQL, but that's more a personal preference.
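The acronym rule is easy to apply mechanically. The sketch below is written in Python only for illustration; the function names are invented, and leaving two-letter acronyms such as IO in upper case is an assumption based on my reading of the .NET naming guidelines.

```python
def normalize_segment(segment: str) -> str:
    """Pascal-case an all-caps acronym of three or more letters (CRM -> Crm)."""
    if segment.isupper() and len(segment) > 2:
        return segment[0] + segment[1:].lower()
    return segment  # mixed-case words and two-letter acronyms are left alone


def build_namespace(*segments: str) -> str:
    """Join organisation, product and sub-namespace segments with dots."""
    return ".".join(normalize_segment(s) for s in segments)
```

For example, `build_namespace("Acme", "CRM", "Invoicing")` gives the Acme.Crm.Invoicing namespace used above.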