I've tried several times to train a Microsoft Custom Translator model on my dictionary, but every attempt fails. When I choose only "Phrase dictionary", model creation finishes successfully, but when I try to create a training model I get the error "DATA PROCESSING FAILED" every time.
My dictionary data is a set of 40,000 Russian words and their English translations. The words are already mapped, so I'm using .align files.
What could be the reason for this error?
One of the common reasons training fails with the training document type is that the number of parallel sentences is below the required minimum of 10,000. For more information about the different document types and their minimum/maximum limits, please refer to
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/custom-translator/sentence-alignment
First time posting a question, so please forgive me if I don't include enough information.
I have a tiered BIML Script that has the following tiers:
10 - Connection – creates the connection nodes
20 - Model – loops through the connections to build the database and table nodes
30 - Create/Drop Staging Tables – included because these packages need to be run prior to the remainder of the creation process
30 - Flat File – loops through the table objects to create flat file formats and connections
40 - Packages – loops through the table objects and creates the extract and load packages
45 - Project Params & Connections – attaches the project parameters and connections (using the named connections and GUIDs from 10 - Connection). The project parameters are created manually in SSIS.
The process successfully connects to the source SQL Server database, generates the Create/Drop Staging Tables packages with correct metadata, and will create the extract packages successfully the first time.
Upon a second attempt to process the same BIML scripts, with no changes made to the files, the process fails with "Object reference not set to an instance of an object." and "Unable to Query on Connection" on the OleDBSource node.
The BIML files generated in preview and output debugging have valid queries and source metadata that indicate a positive connection and a proper model. I have run the emitted queries in SSMS without error. When I move the BIML files to a new project, the process is successful the first time and fails on subsequent runs.
I have tried the following:
Connection Managers
Deleted the project connection managers prior to package re-generation
Annotated the GUIDs and used them in the PackageProject and Packages nodes
Delay Validation / Validate External Metadata – tried both true and false on the Package, DFT, and OleDBSource
Project
Deleted the .proj files from the directory
Pointed PackageProject to a new ProjectSubpath
I also tried simply hard-coding the BimlScript to simplify it and remove any variables, with the same result.
The most maddening point is that the metadata and queries all indicate the process can connect to and query this exact table, and it works, but only on initial creation. Adding to or re-generating the project during testing fails. Has anyone ever come across this before?
Many thanks and a shout-out to cathrine-wilhelmsen and billinkc, whose posts and tutorials have been very helpful. Any and all help would be greatly appreciated.
I changed the provider from SQLNCLI11 to SQLOLEDB with no changes to the code. I tested different providers after seeing a few example connection strings that used different ones.
I wish I could explain why.
There seems to be a significant difference between the DataType values stored in the .dtproj and .dtsx files. I created a project deployment package in VS 2017 and added a package parameter for each available type. Is this as designed? Am I missing something?
DTSX is the Microsoft SQL Server Integration Services package file format.
DTPROJ is the Microsoft SQL Server Data Tools project file format.
The "DataType values" you see correspond to enumerations specific (and apparently unique) to the individual file formats. It's not really designed to be such, it just ended up that way. These two formats were defined possibly by different teams, at different times, and toward different goals. While it would be nice if both file formats used the same enumeration they just they don't... and that's okay, because the responsible code will decode these values into typed variables compatible with one another... it's okay so long as only code modifies them.
It is nice that these formats are human readable being that they are in a variant of XML, it's unfortunate that the data within is practically incomprehensible. Using integers to represent types and decoding with arbitrary enumerations doesn't really lend these formats to be human modifiable without considerable risk of introducing bugs making the human readability of the format pointless.
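To make the "compatible typed variables" point concrete, here is a rough C# sketch of the idea. The enum names and integer codes below are invented purely for illustration; they are not the actual values used by the .dtproj or .dtsx formats.

```csharp
// Purely illustrative: these enum names and integer codes are hypothetical,
// NOT the real values serialized by the .dtproj or .dtsx formats.
using System;
using System.Collections.Generic;

enum ProjectFileType   // stand-in for the enumeration used in the project file
{
    String = 0,
    Int32  = 1,
    Date   = 2
}

enum PackageFileType   // stand-in for the (different) enumeration used in the package file
{
    Int32  = 3,
    String = 8,
    Date   = 9
}

static class TypeCodeBridge
{
    // The "responsible code" keeps a mapping between the two enumerations,
    // so the raw integer codes never have to agree across the two files.
    private static readonly Dictionary<ProjectFileType, PackageFileType> Map =
        new Dictionary<ProjectFileType, PackageFileType>
        {
            { ProjectFileType.String, PackageFileType.String },
            { ProjectFileType.Int32,  PackageFileType.Int32  },
            { ProjectFileType.Date,   PackageFileType.Date   }
        };

    public static PackageFileType ToPackageType(int rawProjectValue)
    {
        // Decode the raw integer into a typed value first...
        var typed = (ProjectFileType)rawProjectValue;
        // ...then translate it. Hand-editing the raw integers in the XML skips
        // this step, which is exactly how mismatches and subtle bugs creep in.
        return Map[typed];
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(TypeCodeBridge.ToPackageType(0)); // prints "String"
    }
}
```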
So I have this FileMaker Pro 7 database. For my senior project, I am supposed to migrate the database to MySQL and then give it a PHP-based interface in third normal form...
The company allows us $200 at most to spend on the project, but if I pay for something, it has to work. However, I am having trouble finding a way to migrate the database. Any suggestions?
I have found FmPro Migrator (http://www.fmpromigrator.com); would the trial version be enough for us? In the worst case, we will start from the beginning and throw away the whole database that the company has.
I can also download FileMaker Pro 12 and use the trial version for a month for free. Would I be able to convert the database using FMP12?
I am totally lost... open to any free suggestions...
Plus, the company I'm doing the project for is a non-profit.
If I had to do it, I'd look at the design of the FileMaker database and create something similar in MySQL. Then I would export the FileMaker data to text and import it somehow. The details depend on foreign key values and such.
The PHP interface would be done separately.
MySQL Data Conversion:
Yes, if your database is small enough, the demo version of FmPro Migrator will convert the database and also build you a PHP web application - at no cost.
Here are the limitations of the demo version:
5 fields
5 scripts
5 layouts
PHP Web Application:
Most people don't realize it, but there is a wealth of FileMaker metadata available in XML format for performing these types of conversions. This XML info is available either through copying the layout via the clipboard or by reading it from the Database Design Report XML file. I have found the clipboard data to be the most reliable source of this info.
FmPro Migrator is able to parse the XML and convert it into the PHP web application.
Each object on a layout is represented in XML, along with style and position info. This info can be used to create form files representing the same look as the original layout. In fact, it can be difficult to see the difference between the web application and the original database if you get all of the object properties implemented. This can be helpful for situations in which companies don't want to have to retrain their employees. They want the web application to look and work the same as the original desktop application.
I have done a few of these conversions recently into the CakePHP framework. Here are a few techniques I used:
Auto-Enter Calculation Fields - Stored calculation fields are calculated and stored within the model when it saves a record to the database.
Unstored Calculation Fields - Unstored Calculation fields are calculated in real-time within the form controller - but only for fields actually displayed on the form. This prevents unnecessarily calculating these values if they aren't being used on a form, improving performance.
Global Fields - A Global field in FileMaker is used like a global variable in programming languages. It is important to know that each FileMaker user gets their own private copy of global field data. There is no equivalent feature in MySQL or other SQL database servers, but this functionality can easily be simulated using SESSION variables. Therefore each web user will still get their own private SESSION data, simulating the same functionality originally present in the FileMaker database. I structure these globals in the model data array as if they were retrieved from the model, meaning that converted scripts and fields on forms can reference them easily. Just before the record gets written into the database, the results are saved into SESSION variables for persistence.
Global Variables in Scripts - Global variables within FileMaker scripts match up very well with the use of PHP SESSION variables, if you want to implement the same functionality.
Vector Graphic Objects - FileMaker layouts frequently include rectangle, oval, and line objects. These objects can be replaced with the RaphaelJS library, providing high-quality, resolution-independent graphics.
Value Lists - Custom and field-based value lists are implemented in a centralized location within the AppController.php file. Therefore, changing the definition of a value list within the AppController automatically changes the menus throughout the whole application.
I have an iPad app that uses Core Data with SQLite. I keep getting errors when a save is called on the managed object context. The error is SQL error 19, "constraint failed". I found a few websites that led me to modify my generation code and update the Z_MAX field in the Z_PRIMARYKEY table. Are there any other things that Core Data does behind the scenes, similar to this?
Note: Yes, I know I shouldn't be doing this, but part of the problem is that the Core Data database is over 5 MB and it takes a long time to process the data from a plist. (Maybe JSON would be faster?)
EDIT: I just noticed Z_ENT, which is the entity ID. I have to add that into the generation as well.
EDIT 2: Got the entities mapped, but I'm still getting the error. It is having trouble doing deletes and updates even though everything appears to be valid.
It appeared to just be an issue with the app itself.
I have an ETL-type requirement for SQL Server 2005. I am new to SSIS, but I believe it will be the right tool for the job.
The project is related to a loyalty card reward system. Each month, partners in the scheme send one or more XML files detailing the qualifying transactions from the previous month. Each XML file can contain up to 10,000 records. The format of the XML is very simple: 4 "header" elements, then a repeating sequence containing the record elements. The key record elements are card_number, partner_id and points_awarded.
The process is currently running in production, but it was developed as a C# app that runs an insert for each record individually. It is very slow, taking over 8 hours to process a 10,000-record file. By using SSIS I am hoping to improve performance and maintainability.
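To give an idea of what "an insert for each record" means in practice, the existing app boils down to something like the sketch below: one parameterized INSERT and one database round trip per record. The file name, element names, table and connection details here are placeholders, not the real code.

```csharp
// Simplified, hypothetical sketch of the current per-record approach.
// File name, element names, table and connection details are placeholders.
using System.Data.SqlClient;
using System.Xml.Linq;

class PerRecordLoader
{
    static void Main()
    {
        var doc = XDocument.Load("partner_file.xml");

        using (var conn = new SqlConnection("Server=.;Database=Loyalty;Integrated Security=SSPI;"))
        {
            conn.Open();

            // One INSERT, one round trip per record: roughly 10,000 round trips per file.
            foreach (var rec in doc.Descendants("record"))
            {
                using (var cmd = new SqlCommand(
                    "INSERT INTO dbo.Transactions (card_number, partner_id, points_awarded) " +
                    "VALUES (@card, @partner, @points)", conn))
                {
                    cmd.Parameters.AddWithValue("@card", (string)rec.Element("card_number"));
                    cmd.Parameters.AddWithValue("@partner", (string)rec.Element("partner_id"));
                    cmd.Parameters.AddWithValue("@points", (int)rec.Element("points_awarded"));
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```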
What I need to do:
Collect the file
Validate against XSD
Business Rule Validation on the records. For each record I need to ensure that a valid partner_id and card_number have been supplied. To do this I need to execute a lookup against the partner and card tables. Any "bad" records should be stripped out and written to a response XML file. This is the same format as the request XML, with the addition of an error_code element. The "good" records need to be imported into a single table.
I have points 1 and 2 working OK. I have also created an XSLT to transform the XML into a flat format ready for insert. For point 3 I had started down the road of using a ForEach Loop Container in the control flow, looping over each XML node, with an Execute SQL task for the lookup. However, this would require a call to the database for each lookup, plus calls to the file system to write out the XML files for the "bad" and "good" records.
I believe better performance could be achieved by using the Lookup transformation on the data flow surface. Unfortunately, I have no experience of working with the data flow surface.
Does anyone have a suggestion as to the best way to solve the problem? I searched the web for examples of SSIS packages that do something similar to what I need but found none - are there any out there?
Thanks
Rob.
SSIS is frequently used to load data warehouses, so your requirement is nothing new. Take a look at this question/answer to get you started with tutorials etc.
The ForEach loop in the control flow is used to loop through files in a directory, tables in a database, and so on. The data flow is where records fly through transformations from a source (your XML file) to a destination (your tables).
You do need a lookup in one of its many flavors. Google "ssis loading data warehouse dimensions"; that will eventually show you several techniques for using the Lookup transformation efficiently.
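If it helps to see the principle behind a cached lookup, here is a rough C# sketch of what the Lookup transformation effectively does for you: read the reference keys once, then validate every record in memory instead of querying the database once per record. The table and column names are taken from your description; everything else, including the assumption that the key columns are strings, is made up for illustration.

```csharp
// Illustrative only: mimics what a full-cache lookup does conceptually.
// Table/column names come from the question; the connection string, sample
// data, and the string-typed keys are assumptions.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class CachedLookupDemo
{
    // Load a single key column into an in-memory set, one query per reference table.
    static HashSet<string> LoadKeys(SqlConnection conn, string sql)
    {
        var keys = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        using (var cmd = new SqlCommand(sql, conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                keys.Add(reader.GetString(0));
        }
        return keys;
    }

    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Loyalty;Integrated Security=SSPI;"))
        {
            conn.Open();

            // Two queries in total, instead of two queries per record.
            var validPartners = LoadKeys(conn, "SELECT partner_id FROM dbo.partner");
            var validCards    = LoadKeys(conn, "SELECT card_number FROM dbo.card");

            // In the real package these rows come from the flattened XML;
            // hard-coded here to keep the sketch self-contained.
            var records = new[]
            {
                new { CardNumber = "1234567890", PartnerId = "P001", Points = 50 },
                new { CardNumber = "0000000000", PartnerId = "P999", Points = 10 }
            };

            foreach (var rec in records)
            {
                bool good = validPartners.Contains(rec.PartnerId)
                         && validCards.Contains(rec.CardNumber);

                // "Good" rows would go to the destination table, "bad" rows to the
                // response XML with an error_code element.
                Console.WriteLine("{0}: {1}", rec.CardNumber, good ? "load" : "reject");
            }
        }
    }
}
```

In the data flow, the equivalent is a pair of Lookup transformations (one against partner, one against card) with the non-matching rows redirected down a "bad" path and everything else flowing on to the destination table.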
To flatten the XML (if it's simple enough), I would simply use the XML Source in the data flow; the XML Task is for heavier stuff.