Unable to update the auto-generated schema in a Google AutoML Tables dataset

I am setting up a prediction model using AutoML Tables. Here are the steps I have completed:
1. Prepare a CSV file with the relevant data (4 fields).
2. Preprocess the data file (remove records with null values and duplicate records) - see the sketch below.
3. Create a dataset by importing the processed file in the import step.
In the train step I can see the auto-generated schema, but I am unable to edit it (specify the target column or modify the data type of a column). Please advise.
I encounter the errors below:
"Schema update failed - The attempted action failed, please try again"
"Failed to update schema - Schema update failed with errors"

Related

bigquery error: "Could not parse '41.66666667' as INT64"

I am attempting to create a table using a .tsv file in BigQuery, but keep getting the following error:
"Failed to create table: Error while reading data, error message: Could not parse '41.66666667' as INT64 for field Team_Percentage (position 8) starting at location 14419658 with message 'Unable to parse'"
I am not sure what to do as I am completely new to this.
Here is a file with the first 100 lines of the full data:
https://wetransfer.com/downloads/25c18d56eb863bafcfdb5956a46449c920220502031838/f5ed2f
Here are the steps I am currently taking to create the table:
https://i.gyazo.com/07815cec446b5c0869d7c9323a7fdee4.mp4
Appreciate any help I can get!
As confirmed with the OP (#dan), the error is caused by selecting Auto detect for the schema when creating a table from a .tsv file as the source.
The fix is to manually create the schema and define the data type of each column properly. For more on specifying schemas in BigQuery, see this document.
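For illustration, with the BigQuery Python client the schema can be supplied explicitly instead of auto-detected. A minimal sketch (the project, dataset, table and every column name other than Team_Percentage are placeholders):
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.team_stats"   # placeholder destination table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="\t",      # the source is a .tsv file
    skip_leading_rows=1,       # skip the header row
    autodetect=False,          # do not auto-detect the schema
    schema=[
        bigquery.SchemaField("Team", "STRING"),               # placeholder column
        bigquery.SchemaField("Team_Percentage", "FLOAT64"),   # holds values like 41.66666667
    ],
)

with open("data.tsv", "rb") as source_file:                   # placeholder local file
    load_job = client.load_table_from_file(source_file, table_id, job_config=job_config)
load_job.result()   # wait for the load to finish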

Pentaho Data Integration - Connection time out

I am developing a PDI transformation that takes data from a MySQL database and outputs the data into an MSSQL table. Before the output, I added a Delete step to remove records in the destination table that have the same key field values. With this setup the transformation always fails, throwing a connection timeout exception from the data source, and I do not know why.
However, after I added a "Block" step between the "Table input" and "Delete" steps, the issue went away and the transformation finished successfully.
My configuration and the exception message are below:
Transformation setting and system exception message
Data Input SQL, and Delete condition
From the screenshot you attached, the error itself contains the recommendation, in the 4th error line from the top: "consider raising value of 'net_write_timeout' on the server".
The default value is 60; kindly increase it.
Follow the document below for more reference:
https://wiki.pentaho.com/display/EAI/MySQL
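If you have administrative access to the MySQL server, the variable can be checked and raised at runtime. A minimal sketch using mysql-connector-python (host, user and password are placeholders; the change can also be made permanent in my.cnf):
import mysql.connector   # assumes the mysql-connector-python package is installed

conn = mysql.connector.connect(host="localhost", user="root", password="secret")
cur = conn.cursor()

cur.execute("SHOW VARIABLES LIKE 'net_write_timeout'")
print(cur.fetchone())    # default is ('net_write_timeout', '60')

# Raise the timeout (in seconds) so long-running reads from the Table input
# step do not drop the connection while the Delete step is still running.
cur.execute("SET GLOBAL net_write_timeout = 600")

cur.close()
conn.close()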

Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns

I am new to BigQuery. I am trying to load data into a GCP BigQuery table that I created manually. I have a bash file which contains the bq load command:
bq load --source_format=CSV --field_delimiter=$(printf '\u0001') dataset_name.table_name gs://bucket-name/sample_file.csv
My CSV file contains multiple rows with 16 columns; a sample row is:
100563^3b9888^Buckname^https://www.settttt.ff/setlllll/buckkkkk-73d58581.html^Buckcherry^null^null^2019-12-14^23d74444^Reverb^Reading^Pennsylvania^United States^US^40.3356483^-75.9268747
Table schema:
When I execute the bash script from Cloud Shell, I get the following error:
Waiting on bqjob_r10e3855fc60c6e88_0000016f42380943_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'project-name-staging:bqjob_r10e3855fc60c6e88_0000ug00004521': Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- gs://bucket-name/sample_file.csv: Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns.
What would be the solution? Thanks in advance.
You are trying to insert values that do not match the schema you provided.
Based on the table schema and your data example, I ran this command:
./bq load --source_format=CSV --field_delimiter=$(printf '^') mydataset.testLoad /Users/tamirklein/data2.csv
1st error
Failure details:
- Error while reading data, error message: Could not parse '39b888' as int for field Field2 (position 1) starting at location 0
At this point, I manually removed the b from 39b888, and now I get this:
2nd error
Failure details:
- Error while reading data, error message: Could not parse '14/12/2019' as date for field Field8 (position 7) starting at location 0
At this point, I changed 14/12/2019 to 2019-12-14, which is the BigQuery date format, and now everything is OK:
Upload complete.
Waiting on bqjob_r9cb3e4ef5ad596e_0000016f42abd4f6_1 ... (0s) Current status: DONE
You will need to clean your data before uploading, or load it with the --max_bad_records flag (some of the lines will load and some will not, depending on your data quality).
Note: unfortunately there is no way to control the date format during the upload; see this answer as a reference.
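If you prefer to clean the file programmatically before the upload, a small pandas sketch along these lines could fix both issues (column positions 1 and 7 follow the error messages above; the ^ delimiter matches the sample row):
import pandas as pd

# Read the delimited file as plain strings; keep literal 'null' values as-is.
df = pd.read_csv("sample_file.csv", sep="^", header=None, dtype=str, keep_default_na=False)

# Field2 (position 1) must be an integer: drop any non-digit characters such as the stray 'b'.
df[1] = df[1].str.replace(r"\D", "", regex=True)

# Field8 (position 7) must be a DATE in YYYY-MM-DD, e.g. 14/12/2019 -> 2019-12-14.
df[7] = pd.to_datetime(df[7], dayfirst=True, errors="coerce").dt.strftime("%Y-%m-%d")

df.to_csv("sample_file_clean.csv", sep="^", header=False, index=False)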
We had the same problem while importing data from a local file into BigQuery. After examining the data, we saw that some values started with \r or whitespace.
After applying ua['ColumnName'].str.strip() and ua['District'].str.rstrip(), we could load the data into BigQuery.
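A self-contained version of that cleanup might look like this (the file name is a placeholder; the column names mirror the snippet above):
import pandas as pd

ua = pd.read_csv("local_data.csv", dtype=str)   # placeholder file name

# Strip stray \r and other leading/trailing whitespace before loading into BigQuery.
ua["ColumnName"] = ua["ColumnName"].str.strip()
ua["District"] = ua["District"].str.rstrip()

ua.to_csv("local_data_clean.csv", index=False)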
Thanks

Unable to connect with multiple MySQL schemas in Spring Batch Admin

I am new to Spring Batch and Spring Batch Admin. I am stuck in a scenario where I want to use multiple data sources, i.e. one for the batch metadata and one for the business schema (application tables).
I am using the configuration below in my batch-mysql.properties file.
For the batch metadata tables:
batch.jdbc.driver=com.mysql.jdbc.Driver
batch.jdbc.url=jdbc:mysql://localhost:3306/batch
batch.jdbc.user=root
batch.jdbc.password=root
batch.jdbc.testWhileIdle=true
batch.jdbc.validationQuery=SELECT 1
batch.drop.script=classpath:/org/springframework/batch/core/schema-drop-mysql.sql
batch.schema.script=classpath:/org/springframework/batch/core/schema-mysql.sql
batch.business.schema.script=classpath*:business-schema-mysql.sql
For the application schema:
db.driver=com.mysql.jdbc.Driver
db.url=jdbc:mysql://localhost:3306/applicationschema
db.user=root
db.password=root
If I remove the line below:
batch.business.schema.script=classpath*:business-schema-mysql.sql
then I get an exception that the property could not be found.
If I keep it as it is, the application tables are created in the batch metadata schema.
Just don't supply any value for the property batch.business.schema.script. Leave it as shown below:
batch.business.schema.script=
Also, if you don't want to drop and re-create the batch meta tables every time your application starts, set batch.data.source.init=false in your batch properties file.
EDIT: This is how my batch-oracle.properties file looks:
batch.database.incrementer.class=org.springframework.jdbc.support.incrementer.OracleSequenceMaxValueIncrementer
batch.isolationlevel=READ_COMMITTED
batch.business.schema.script=
batch.data.source.init=false
batch.job.service.reaper.interval=6000
batch.schema.script=classpath:/org/springframework/batch/core/schema-drop-oracle10g.sql
batch.jdbc.url=jdbc:oracle:thin:@localhost:1521:mydb
batch.table.prefix=BATCH_
batch.lob.handler.class=org.springframework.jdbc.support.lob.DefaultLobHandler
batch.verify.cursor.position=true
batch.jdbc.validationQuery=SELECT 1 FROM dual
batch.jdbc.password=mypassword
batch.jdbc.testWhileIdle=false
batch.jdbc.user=user
batch.jdbc.pool.size=5
batch.drop.script=classpath:/org/springframework/batch/core/schema-drop-oracle10g.sql
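Applied to the MySQL setup from the question, the relevant lines of batch-mysql.properties would then look roughly like this (a sketch reusing the values posted above):
batch.jdbc.driver=com.mysql.jdbc.Driver
batch.jdbc.url=jdbc:mysql://localhost:3306/batch
batch.jdbc.user=root
batch.jdbc.password=root
batch.jdbc.validationQuery=SELECT 1
batch.schema.script=classpath:/org/springframework/batch/core/schema-mysql.sql
batch.drop.script=classpath:/org/springframework/batch/core/schema-drop-mysql.sql
# leave the business schema script empty and do not re-initialise the batch schema on startup
batch.business.schema.script=
batch.data.source.init=false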

SSIS Import a date and time from a txt to a table datetime

So I want to import a datetime from a txt:
2015-01-22 09:19:59
into a table using a data flow. I have my Flat File Source and my destination DB set up fine. I changed the data type for that column of the txt input, in the advanced settings and the input and output properties, to:
database timestamp [DT_DBTIMESTAMP]
This is the same data type as the DB used for the table so this should work.
However, when I execute the package I get an error saying the data conversion failed... How do I make this work?
[Import txt data [1743]] Error: Data conversion failed. The data conversion for column "statdate" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
[Import txt data [1743]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "statdate" (2098)" failed because error code 0xC0209084 occurred, and the error row disposition on "output column "statdate" (2098)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
[Import txt data [1743]] Error: An error occurred while processing file "C:\Program Files\Microsoft SQL Server\MON_Datamart\Sourcefiles\tbl_L30T1.txt" on data row 14939.
On the row where it gives the error, the datetime field is filled with spaces. That is why "allow nulls" is checked on the table, but my SSIS package still gives the error for some reason... Can I tell the package somewhere to allow nulls as well?
I suggest you import the data into a character field and then parse it after entry.
The following function should help you:
SELECT IsDate('2015-01-22 09:19:59')
, IsDate(Current_Timestamp)
, IsDate(' ')
, IsDate('')
The IsDate() function returns a 1 when it thinks the value is a date and a 0 when it is not.
This would allow you to do something like:
SELECT value_as_string
     , CASE WHEN IsDate(value_as_string) = 1
            THEN Cast(value_as_string As datetime)
            ELSE NULL
       END As value_as_datetime
FROM ...
I solved it myself. Thank you for your suggestion gvee, but the way I did it is easier.
In the Flat File Source, when making a new connection, in the Advanced tab I set all the data types according to the table in the database EXCEPT the column with the timestamp (in my case it was called "statdate"). I changed that data type to a string, because otherwise my Flat File Source would give a conversion error before anything else could run, and the only way around that was setting the error output to ignore failure, which I don't want. (You still have to change the data type after you set it to a string in the advanced settings: right-click the Flat File Source -> Show Advanced Editor -> go to the output columns and change the data type there from date to string.)
After the timestamp was set to a string, I added a Derived Column with this expression to trim the spaces and replace empty values with NULL:
TRIM(<YourColumnName>) == "" ? (DT_STR,4,1252)NULL(DT_STR,4,1252) : <YourColumnName>
Next I added a Data Conversion step to convert the string back to a timestamp. The Data Conversion is finally connected to the OLE DB Destination.
I hope this helps anyone with the same problem in the future.
End result: Picture of data flow