Pentaho Data Integration - Connection time out - mysql

I am developing a PDI transformation that takes data from a MySQL database and outputs it into an MSSQL table. Before the output step, I added a Delete step that removes records in the destination table that have the same key field values. With this setup the transformation always fails with a connection timeout exception from the data source, and I do not know why.
However, after I added a "Block" step between "Table input" and "Delete", the issue went away and the transformation finished successfully.
My configuration and the exception message are below:
[Screenshot: Transformation setting and system exception message]
[Screenshot: Data Input SQL, and Delete condition]

In the screenshot you attached, the fourth error line from the top gives a recommendation: "consider raising value of 'net_write_timeout' on the server".
The default value is 60 seconds, so increase it on the MySQL server.
See the document below for more reference.
https://wiki.pentaho.com/display/EAI/MySQL

IllegalStateException while trying create NativeQuery with EntityManager

I have been getting this annoying exception while trying to create a native query with my entity manager. The full error message is:
java.lang.IllegalStateException: During synchronization a new object was found through a relationship that was not marked cascade PERSIST: com.model.OneToManyEntity2#61f3b3b.
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.discoverUnregisteredNewObjects(RepeatableWriteUnitOfWork.java:313)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.calculateChanges(UnitOfWorkImpl.java:723)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.writeChanges(RepeatableWriteUnitOfWork.java:441)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.flush(EntityManagerImpl.java:874)
at org.eclipse.persistence.internal.jpa.QueryImpl.performPreQueryFlush(QueryImpl.java:967)
at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:207)
at org.eclipse.persistence.internal.jpa.QueryImpl.getSingleResult(QueryImpl.java:521)
at org.eclipse.persistence.internal.jpa.EJBQueryImpl.getSingleResult(EJBQueryImpl.java:400)
The actual code that triggers the error is:
Query query;
query = entityManager.createNativeQuery(
    "SELECT MAX(CAST(SUBSTRING_INDEX(RecordID,'-',-1) as Decimal)) FROM `QueriedEntityTable`");
String recordID = (query.getSingleResult() == null ?
    null :
    query.getSingleResult()
        .toString());
This is being executed with an EntityTransaction in the doTransaction part. The part that is getting me with this though is that this is the first code to be executed within the doTransaction method, simplified below to:
updateOneToManyEntity1();
updateOneToManyEntity2();
entityManager.merge(parentEntity);
The entity it has a problem with "OneToManyEntity1" isn't even the table I'm trying to create the query on. I'm not doing any persist or merge up until this point either, so I'm also not sure what is supposedly causing it to be out of sync. The only database work that's being done up until this code is executed is just pulling in data, not changing anything. The foreign keys are properly set up in the database.
I'm able to get rid of this error by doing as it says and marking these relationships as Cascade.PERSIST, but then I get a MySQLIntegrityConstraintViolationException on the query.getSingleResult() line. My logs show that it's doing some INSERT queries right before this, so it looks like it's reaching the EntityManager.merge part of my doTransaction method, but the error and call stack point to a completely different part of the code.
Using EclipseLink (2.6.1), Glassfish 4, and MySQL. The entitymanager is using RESOURCE_LOCAL with all the necessary classes listed under the persistence-unit tag and exclude-unlisted-classes is set to false.
Edit: So some more info as I'm trying to work through this. If I put a breakpoint at the beginning of the transaction and then execute entityManager.clear() through IntelliJ's "Evaluate Expression" tool, everything works fine at least the first time through. Without it, I get an error as it tries to insert empty objects into the table.
Edit #2: I converted the nativeQuery part to use the Criteria API, and this let me actually make it through my code so I could find where it was unintentionally adding a null object to my entity list. I'm still confused as to why the entity manager is caching these errors, to the point that creating a native query breaks because it's still trying to insert bad data. Is this something I'd need to call EntityManager.clear() for each time? Or am I supposed to call it when there is an error in the doTransaction method?
So after reworking the code and setting this aside, I stumbled on at least part of the answer to my question. My issue was caused by the object being persisted prior to the transaction starting. So when I was entering my transaction, it first tried to insert/update data from my entity objects and threw an error since I hadn't set the values of most of the non-null columns. I believe this is the reason I was getting the cascade errors and I'm positive this is the source of the random insert queries I saw being fired off at the beginning of my transaction. Hope this helps someone else avoid a lot of trouble.

SSIS truncation error

First, I have searched and searched and searched and not found anything that helps me with this.
I have an SSIS project that will fetch a lot of data from an iSeries AS400 and it does this in two very different steps.
Step 1 works perfectly so I manage to fetch tons of info from the AS400, so the connection itself is not the issue.
Step two fails horribly with the following three error codes:
[OLE DB Source [41]] Error: There was an error with OLE DB
Source.Outputs[OLE DB Source Output].Columns[NAME] on OLE DB
Source.Outputs[OLE DB Source Output]. The column status returned was: "Text
was truncated or one or more characters had no match in the target code
page.".
[OLE DB Source [41]] Error: The "OLE DB Source.Outputs[OLE DB Source
Output].Columns[NAME]" failed because truncation occurred, and the
truncation row disposition on "OLE DB Source.Outputs[OLE DB Source
Output].Columns[NAME]" specifies failure on truncation. A truncation error
occurred on the specified object of the specified component.
[SSIS.Pipeline] Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The
PrimeOutput method on OLE DB Source returned error code 0xC020902A. The
component returned a failure code when the pipeline engine called
PrimeOutput(). The meaning of the failure code is defined by the component,
but the error is fatal and the pipeline stopped executing. There may be
error messages posted before this with more information about the failure.
I have desperately tried to find the solution to this problem, and this is what I have done (none of which has helped at all):
1 - Advanced Editor on SOURCE -> tab: Input and Output Properties -> OLE DB Source Output -> Output Column changed to
a) 40 (from 28) in length - no change
b) data text (from string) - complete crash
c) changed codepage from 1251 to UTF-8 - no change
2 - Fetched the information with OPENQUERY in MSSMS, it works perfectly.
3 - Screamed in frustration at the screen (didn't help).
I am at the end of the road. I don't know what to do anymore. Help...?
Yes, this is completely maddening.
There are two sets of columns under OLE DB Source Output: "External Columns" and "Output Columns".
Have you tried changing the lengths of both columns: column "Name" under "External Columns" and under "Output Columns"?
This kind of error often happens from a mismatch between the External Column definition and its corresponding Output Column.
In an OLE DB Source, External Columns are supposed to be auto-typed according to the source data types: the external provider is supposed to talk metadata to SSIS, saying "well, this column is typed String(40)", for example. But either the provider or SSIS are often, let's say, "less than entirely competent" at getting the types and lengths right.
UPDATE: Have you tried checking the length of the data in the source, independently of SSIS? Something like:
SELECT MAX(Len(TheReallyAnnoyingColumn)) FROM TheTable
You may find setting the Error Output for Truncation on the Source editor dialog to "Ignore Failure" gets you around the issue.
Update - Truncation Redirect:
I forced truncation on surname with the output set to redirect, and enabled a Data Viewer on the error output.
I then copied the row from the data viewer to Notepad to show the error.
Running the same dtsx with truncation set to fail:
Everybody else is focused on the truncation. I'm curious about the "one or more characters had no match in the target code page" part of the error message.
How is the column actually defined on the IBM i? I'm particularly interested in the Coded Character Set Identifier (aka CCSID).
In a green screen you can use the Display File Field Description (DSPFFD) command.
You could also use the iNav GUI.
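To illustrate the "no match in the target code page" half of the message, here is a minimal Python sketch, run outside SSIS, that lists the characters of a value which cannot be represented in a given code page. The function name `unencodable_chars` and the sample strings are hypothetical; `cp1251` is used only because the question mentions code page 1251, and the actual CCSID of the source column may call for a different codec.

```python
def unencodable_chars(text: str, codepage: str = "cp1251") -> list:
    """Return the characters in `text` that have no match in `codepage`."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            # This character cannot be represented in the target code page.
            missing.append(ch)
    return missing


# Hypothetical sample values, not taken from the original data.
print(unencodable_chars("Привет"))  # Cyrillic text fits in cp1251
print(unencodable_chars("abc 日"))  # the CJK character does not
```

Running a check like this over the values of the NAME column (exported by some other route, such as the OPENQUERY that already works) would show whether the failure is about length or about characters the target code page cannot hold.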

SSIS - Use Derived Column to Cast String to Float

I'm having a problem getting data from a .CSV into a column of datatype FLOAT. I've tried to link it directly and also use the Data Conversion Task, but (in both cases) it kept telling me that it couldn't convert:
Error: 0xC02020C5 at DC_Weekly_Cost_Target csv to FatzWklyCst_Target, Data Conversion [156]: Data conversion failed while converting column "Target" (22) to column "Copy of Target" (163). The conversion returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
My research led me to using the Derived Column Transformation Editor. I found a few websites that walked me through how to properly use the "Expression" portion:
Above is how I'm attempting to transform the strings (Target and Waste) into datatype Float. I'm not receiving an error message when using the Editor (i.e. it will let me click OK without an error); however, I am receiving an error when I attempt to run the package:
Error: 0xC0049064 at DC_Weekly_Cost_Target csv to FatzWklyCst_Target, Map Target in correct datatype 1 1 [222]: An error occurred while attempting to perform a type cast.
Error: 0xC0209029 at DC_Weekly_Cost_Target csv to FatzWklyCst_Target, Map Target in correct datatype 1 1 [222]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "component "Map Target in correct datatype 1 1" (222)" failed because error code 0xC0049064 occurred, and the error row disposition on "output column "Target_Float" (227)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
Error: 0xC0047022 at DC_Weekly_Cost_Target csv to FatzWklyCst_Target, SSIS.Pipeline: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Map Target in correct datatype 1 1" (222) failed with error code 0xC0209029 while processing input "Derived Column Input" (223). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure.
This is my first time using the Derived Column Transformation Editor. Does anyone see what I'm doing incorrectly? Or, do you have any suggestions as to what may be the best approach to getting data from a .csv file into a column of datatype float? I appreciate any help that anyone can give me.
You have tried a reasonable approach but something in the data is blowing it up - possibly "invalid" characters e.g. $ or ,
I would replace the Derived Column transformation with a Script Task. There you can leverage the .NET Framework e.g. Try ... Catch, TryParse, Regex. You can debug your code line-by-line to inspect the rows with errors. You can also use Reflection to factor your conversion code as a function that you call for each column passed into the Script Task.
PS: your destination is irrelevant.
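The kind of tolerant parsing suggested above can be sketched as follows. This is Python rather than the .NET code a Script Task would use, so treat it as an outline of the logic only. The function name `parse_float` and the set of stripped characters (`$`, `,` and whitespace) are assumptions based on the "invalid characters" guess, not something confirmed by the data.

```python
import re
from typing import Optional

# Characters assumed to be noise in the CSV values: currency sign,
# thousands separator and whitespace.
_NOISE = re.compile(r"[,$\s]")


def parse_float(raw: Optional[str]) -> Optional[float]:
    """Return the value as a float, or None if it cannot be converted."""
    if raw is None:
        return None
    cleaned = _NOISE.sub("", raw)
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        # Analogous to a failed TryParse: report "no value" instead of
        # failing the whole row.
        return None


print(parse_float("$1,234.50"))
print(parse_float("n/a"))
```

Rows for which the function returns None are the ones worth inspecting or redirecting, since they are what makes the type cast fail.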

SSIS Package Fails on Status Code 4

I've created an SSIS package that executes inline SQL queries against our database and is supposed to output the contents to a text file. I originally had the text file comma delimited, but changed to pipe delimited after researching the error further. I also took a substring of the FirstName field and ensured that the SSIS placeholder fields matched in length. The error message is as follows:
[Customers Flat File [196]] Error: Data conversion failed. The data conversion for
column "FirstName" returned status value 4 and status text "Text was truncated or one or more
characters had no match in the target code page.".
The SQL statement I'm using in my OLE DB Source is as follows:
SELECT
    dbo.Customer.Email,
    SUBSTRING(dbo.Customer.FirstName, 1, 100) AS FirstName,
    dbo.Customer.LastName,
    dbo.Customer.Gender,
    dbo.Customer.DateOfBirth,
    dbo.Address.Zip,
    dbo.Customer.CustomerID,
    dbo.Customer.IsRegistered
FROM
    dbo.Customer
    INNER JOIN dbo.Address ON dbo.Customer.CustomerID = dbo.Address.CustomerID
What other fixes should I put in place to ensure the package runs without error?
Have you tried running this query in SSMS? If so, did you get a successful result?
If you haven't tried it yet, paste this query into a new SSMS window and wait for it to complete.
If the query completes, then the problem is not with the query, and something could be off inside the package.
But if the query fails, you know where to look.
EDIT
On second thoughts, is your Customer source a flat file or something? It looks like there is a value in the Customer table/file which does not match with the output metadata of the source. Check your source again.

SSIS (2008R2) import from mssql to mysql failing due to a date column

I have an oledb connection to mssql and an ado.net destination (using an odbc driver) to mysql. The tables are exactly the same and all the columns are working bar one.
The error message received is:
[ADO NET Destination [325]] Error: An exception has occurred during data insertion, the message returned from the provider is: Unable to cast object of type 'System.DateTime' to type 'System.Char[]'.
I've seen similar questions on other data types but the resolution of changing to string does not work here. If I convert to string (has to be length 29 otherwise the conversion step fails) I get the following error message:
[ADO NET Destination [325]] Error: An exception has occurred during data insertion, the message returned from the provider is: ERROR [HY000] [MySQL][ODBC 5.1 Driver][mysqld-5.5.15]Incorrect datetime value: '2011-03-21 11:23:48.573000000' for column 'LastModificationDate' at row 1
Other potentially relevant details:
connection driver- {MySQL ODBC 5.1 Driver}
script run before dataflow - set sql_mode='STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,ANSI_QUOTES'
Other datetime columns are working
This column has a reasonably high proportion of nulls
mssql spec: [LastModificationDate] [datetime] NULL
mysql spec: LastModificationDate datetime NULL
Has anyone had experience with this issue and could provide some advice on resolving it?
Can you try converting it to a string on the SQL Server side, in your query, using:
convert(char(10),LastModificationDate,111)+' '+convert(char(8),LastModificationDate,108)
This works for me all the time.
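For reference, CONVERT style 111 yields yyyy/mm/dd and style 108 yields hh:mi:ss, so the expression above produces a value with no fractional seconds. A minimal Python sketch of the same reshaping, applied to the value from the error message, is shown below. The function name `to_second_precision` is hypothetical, and the sketch assumes the input always starts with a 19-character 'YYYY-MM-DD HH:MM:SS' prefix.

```python
from datetime import datetime


def to_second_precision(value: str) -> str:
    """Drop fractional seconds and format like CONVERT 111 + ' ' + 108."""
    # The first 19 characters are 'YYYY-MM-DD HH:MM:SS'; anything after
    # that is the fractional part that gets discarded.
    parsed = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y/%m/%d %H:%M:%S")


print(to_second_precision("2011-03-21 11:23:48.573000000"))
# prints 2011/03/21 11:23:48
```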
I had the same big headache this week. I tried many approaches and, thank God, one of them finally worked. I hope it helps you a little.
This applies to columns with data types such as int, datetime, or decimal. Here I call the column ColumnA and treat it as a datetime.
1. In the Data Flow Source, use SQL Command to retrieve the data. Something like select isnull(ColumnA,'1800-01-01') as ColumnA, C1, C2, ... Cn from Table
Make sure to use the isnull function for all columns with the data types mentioned above.
2. Execute the SSIS package. It should work.
3. Go back to Control Flow and, after the data flow task, add an SQL Task to put the data back, that is, update ColumnA from '1800-01-01' to null again.
That works for me. In my situation I cannot use the ignore failure option, because if I do, I will lose thousands of rows of data.
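The sentinel round trip in the steps above can be sketched in Python as follows. The names `SENTINEL`, `null_to_sentinel`, and `sentinel_to_null` are hypothetical, and the sketch assumes that no genuine row carries the date 1800-01-01, since such a row would be turned into a null in the final step.

```python
from datetime import datetime
from typing import Optional

# Placeholder date used in the steps above; assumed never to occur
# as a real value in the column.
SENTINEL = datetime(1800, 1, 1)


def null_to_sentinel(value: Optional[datetime]) -> datetime:
    """Step 1: what isnull(ColumnA, '1800-01-01') does in the source query."""
    return SENTINEL if value is None else value


def sentinel_to_null(value: datetime) -> Optional[datetime]:
    """Step 3: what the follow-up update does on the destination table."""
    return None if value == SENTINEL else value


original = [datetime(2011, 3, 21, 11, 23, 48), None]
loaded = [null_to_sentinel(v) for v in original]
restored = [sentinel_to_null(v) for v in loaded]
print(restored == original)
```

If the column could legitimately contain the placeholder date, a different sentinel would be needed for the round trip to be lossless.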