Drill "VALIDATION ERROR: A table or view with given name already exists in schema" for empty directory - apache-drill

After upgrading Drill on our cluster to drill-1.12.0-mapr and testing our daily ETL scripts (which all use Drill to convert Parquet files to TSV), a validation error ("table or view with given name already exists") is consistently thrown when running a CREATE TABLE statement against certain empty directories in a writable workspace.
[Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on mapr002.ucera.local:31010]
at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
:
:
:
Caused by: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]
After some brief debugging, I see that the filesystem directory in question under the specified dfs.etl_internal workspace (i.e. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv) is in fact empty, yet the error is still thrown.
Looking up the error ID in the drillbit.log file on the node named in the error message above, we see
2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 23f92019-db56-862f-e7b9-cd51b3e174ae: create table dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv` as
select <a bunch of fields>
from dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]
[Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to complete before shutting down
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO o.apache.drill.exec.work.WorkManager - Waiting for 0 running fragments to complete before shutting down
This error occurs even when using DROP TABLE [IF EXISTS] <workspace>.<table path name> before the CREATE TABLE statement. Furthermore, the configuration for the dfs workspace itself does not appear to have changed from before the upgrade to drill-1.12; see below:
:
:
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"etl_internal": {
"location": "/etl/internal",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
:
:
Note that the full process in question is intended to move (mv) the directory contents out every day and run CREATE TABLE with the new data from the current day (in case that makes a difference), and this process had been working fine when we were using drill-1.11.
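For context, the failing daily flow looks roughly like this (a sketch; the hadoop paths are placeholders, and piping SQL into sqlline is an assumption about how the scripts run their statements):

# Move yesterday's output aside; the quoted glob is expanded by hadoop fs itself.
# This leaves the (now empty) tsv directory in place.
hadoop fs -mv '/hdfs/path/to/folder/*' /hdfs/path/to/archive/
# Recreate the table from the current day's parquet data.
echo "create table dfs.etl_internal.\`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv\` as
select <a bunch of fields>
from dfs.etl_internal.\`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet\`;" | sqlline -u "jdbc:drill:zk=mapr001:5181,mapr002:5181"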
More debugging information:
Simply deleting the .../tsv endpoint folder and relying on Drill to create the directory during the CREATE TABLE statement does not work; it throws the unsurprising error
Error: VALIDATION ERROR: Table [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] not found
[Error Id: 02e7c088-9162-4731-9fa8-85dfd39e1dec on mapr001.ucera.local:31010] (state=,code=0)
I.e., Drill does not appear to be creating the table automatically.
Undoing these changes and rerunning to get the original error, we can examine the location via the sqlline interpreter interface. Doing so, we see
0: jdbc:drill:zk=mapr001:5181,mapr002:5181,ma> describe dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`;
+--------------+------------+--------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+--------------+------------+--------------+
+--------------+------------+--------------+
No rows selected (1.791 seconds)
So Drill sees something there, but only when I create the directory myself, which is like a catch-22 given that the original error complains that something is already there.
If anyone with more experience using drill knows what could be happening here, any opinions or advice would be appreciated.

It looks like some mistake was made in the process of upgrading the Drill version on your MapR cluster. Please see this doc for more info: http://doc.mapr.com/display/MapR/Upgrading+to+the+Latest+Version+of+Drill
or the following docs if you are using the latest MapR Core version:
https://mapr.com/docs/home/UpgradeGuide/PreupgradeStepsDrill.html?hl=drill%2Cupgrade
https://mapr.com/docs/home/UpgradeGuide/PostUpgradeStepsDrill.html?hl=drill%2Cupgrade
DROP TABLE for Drill schemaless tables works fine. See more info about Drill schemaless tables (empty directories):
https://drill.apache.org/docs/data-sources-and-file-formats-introduction/#schemaless-tables
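For instance, dropping the empty-directory table ahead of the CTAS should work (a sketch; the connection string mirrors the abbreviated one shown in the question, and running the statement through sqlline this way is an assumption):

# Drop the schemaless table (the empty directory) so the CTAS target is free.
echo "DROP TABLE IF EXISTS dfs.etl_internal.\`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv\`;" | sqlline -u "jdbc:drill:zk=mapr001:5181,mapr002:5181"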

TLDR: restarted the drillbits on the nodes and everything appears to be working now.
What was done to get Drill to run the CTAS statement without error:
1. Restarted the Drill services from the MapR MCS. This was done purely on a hunch, due to the hanging-drill-1.11-processes issue encountered earlier: after upgrading from drill-1.11 to drill-1.12, we had to go to each node manually, run jps to see that the 1.11 drillbit was still running, kill -9 <pid of 1.11 drillbit>, and restart the drillbits to get 1.12 working. Not sure how much this helped, but documenting it, as it was the only change made during debugging that was not undone before the changes that ultimately appear to have resolved the error.
2. Changed the drill-using scripts to delete the target folder of the CTAS statement (hadoop fs -rm -r /hdfs/path/to/folder) after running some necessary processes on it, then let the CTAS statement re-create the folder itself (even though, as mentioned in the original post, trying this earlier produced "Table not found" errors in a weird catch-22 situation, hence my thinking that restarting the Drill services may have contributed). A sketch of the adjusted step follows.
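Roughly (same placeholder path as above; the point is that the target directory no longer exists at all when the CTAS runs):

# Remove the CTAS target entirely instead of leaving an empty directory behind...
hadoop fs -rm -r /hdfs/path/to/folder
# ...then run the same CTAS statement as before; it now recreates the directory itself.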
I know that just restarting the services may not be the most informative answer, but that's what appeared to work here. If anyone has any more information or thoughts to add based on the solution description above, please do leave a comment.

Related

MySQL crash while generating entity data model in Visual Studio 2019

When I try to generate the entity models from the existing database the mysqld service crashes.
This does not occur with MySQL 8.0.20, only with 8.0.21. I was hoping to use the new JSON features added in the update, but this problem is driving me nuts.
MySQL installed: [screenshot]
The entity wizard connects OK to the server and shows the tables I want to import; when the importing process begins, it throws a "connection lost" exception and I see that mysqld.exe has stopped.
The MySQL log shows nothing useful except for part of the query generated by the wizard:
15:45:58 UTC - mysqld got exception 0xc0000005 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x1c9945fcfc0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
7ff7c812f74b mysqld.exe!?get_full_info@Item_aggregate_type@@IEAAXPEAVItem@@@Z()
7ff7c81228d6 mysqld.exe!??0Item_aggregate_type@@QEAA@PEAVTHD@@PEAVItem@@@Z()
7ff7c83ab288 mysqld.exe!?prepare@SELECT_LEX_UNIT@@QEAA_NPEAVTHD@@PEAVQuery_result@@_K2@Z()
7ff7c841dd0e mysqld.exe!?resolve_derived@TABLE_LIST@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d40d6 mysqld.exe!?resolve_placeholder_tables@SELECT_LEX@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d25aa mysqld.exe!?prepare@SELECT_LEX@@QEAA_NPEAVTHD@@@Z()
7ff7c83ab191 mysqld.exe!?prepare@SELECT_LEX_UNIT@@QEAA_NPEAVTHD@@PEAVQuery_result@@_K2@Z()
7ff7c841dd0e mysqld.exe!?resolve_derived@TABLE_LIST@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d40d6 mysqld.exe!?resolve_placeholder_tables@SELECT_LEX@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d25aa mysqld.exe!?prepare@SELECT_LEX@@QEAA_NPEAVTHD@@@Z()
7ff7c83ab191 mysqld.exe!?prepare@SELECT_LEX_UNIT@@QEAA_NPEAVTHD@@PEAVQuery_result@@_K2@Z()
7ff7c841dd0e mysqld.exe!?resolve_derived@TABLE_LIST@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d40d6 mysqld.exe!?resolve_placeholder_tables@SELECT_LEX@@QEAA_NPEAVTHD@@_N@Z()
7ff7c83d25aa mysqld.exe!?prepare@SELECT_LEX@@QEAA_NPEAVTHD@@@Z()
7ff7c832980c mysqld.exe!?prepare_inner@Sql_cmd_select@@MEAA_NPEAVTHD@@@Z()
7ff7c832942c mysqld.exe!?prepare@Sql_cmd_dml@@UEAA_NPEAVTHD@@@Z()
7ff7c8325ef5 mysqld.exe!?execute@Sql_cmd_dml@@UEAA_NPEAVTHD@@@Z()
7ff7c822d36d mysqld.exe!?mysql_execute_command@@YAHPEAVTHD@@_N@Z()
7ff7c822dfc9 mysqld.exe!?mysql_parse@@YAXPEAVTHD@@PEAVParser_state@@@Z()
7ff7c8226eb2 mysqld.exe!?dispatch_command@@YA_NPEAVTHD@@PEBTCOM_DATA@@W4enum_server_command@@@Z()
7ff7c8227e6e mysqld.exe!?do_command@@YA_NPEAVTHD@@@Z()
7ff7c80726c8 mysqld.exe!?modify_thread_cache_size@Per_thread_connection_handler@@SAXK@Z()
7ff7c93322a1 mysqld.exe!?set_compression_level@Zstd_comp@compression@transaction@binary_log@@UEAAXI@Z()
7ff7c8f3739c mysqld.exe!?my_thread_join@@YAHPEAUmy_thread_handle@@PEAPEAX@Z()
7fff75851542 ucrtbase.dll!_configthreadlocale()
7fff77ab6fd4 KERNEL32.DLL!BaseThreadInitThunk()
7fff77bfcec1 ntdll.dll!RtlUserThreadStart()
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (1c999a99d08): SELECT
`Project7`.`C12` AS `C1`,
`Project7`.`C1` AS `C2`,
`Project7`.`C2` AS `C3`,
`Project7`.`C3` AS `C4`,
`Project7`.`C4` AS `C5`,
`Project7`.`C5` AS `C6`,
`Project7`.`C6` AS `C7`,
`Project7`.`C7` AS `C8`,
`Project7`.`C8` AS `C9`,
`Project7`.`C9` AS `C10`,
`Project7`.`C10` AS `C11`
FROM (SELECT
`UnionAll3`.`SchemaName` AS `C1`,
`UnionAll3`.`Name` AS `C2`,
`UnionAll3`.`ReturnTypeName` AS `C3`,
`UnionAll3`.`IsAggregate` AS `C4`,
`UnionAll3`.`C1` AS `C5`,
`UnionAll3`.`IsBuiltIn` AS `C6`,
`UnionAll3`.`IsNiladic` AS `C7`,
`UnionAll3`.`C2` AS `C8`,
`UnionAll3`.`C3` AS `C9`,
`UnionAll3`.`C4` AS `C10`,
`UnionAll3`.`C5` AS `C11`,
1 AS `C12`
FROM ((SELECT
`Extent1`.`SchemaName`,
`Extent1`.`Name`,
`Extent1`.`ReturnTypeName`,
`Extent1`.`IsAggregate`,
1 AS `C1`,
`Extent1`.`IsBuiltIn`,
`Extent1`.`IsNiladic`,
`UnionAll1`.`Name` AS `C2`,
`UnionAll1`.`TypeName` AS `C3`,
`UnionAll1`.`Mode` AS `C4`,
`UnionAll1`.`Ordinal` AS `C5`
FROM (
SELECT /* Funct
Connection ID (thread ID): 18
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Is this a bug or some kind of configuration error on my part?
This proved to be a bug in the new query optimizations, but I was able to find a workaround by disabling some of the optimizations on the server.
Here is the command I used:
SET GLOBAL optimizer_switch='derived_merge=off,subquery_to_derived=off,prefer_ordering_index=off,semijoin=off';
Hope it helps until they fix the bug.
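Note that SET GLOBAL does not survive a server restart. On MySQL 8 you can persist the change and verify it from the shell, for example (a sketch using the mysql client; credentials are placeholders):

# Persist the switches to mysqld-auto.cnf so they survive restarts.
mysql -u root -p -e "SET PERSIST optimizer_switch='derived_merge=off,subquery_to_derived=off,prefer_ordering_index=off,semijoin=off'"
# Confirm the switches are off.
mysql -u root -p -e "SELECT @@global.optimizer_switch\G"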

Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns

I am new to BigQuery. I am trying to load data into a GCP BigQuery table which I created manually, and I have a bash file which contains the following bq load command:
bq load --source_format=CSV --field_delimiter=$(printf '\u0001') dataset_name.table_name gs://bucket-name/sample_file.csv
My CSV file contains multiple rows with 16 columns; a sample row is:
100563^3b9888^Buckname^https://www.settttt.ff/setlllll/buckkkkk-73d58581.html^Buckcherry^null^null^2019-12-14^23d74444^Reverb^Reading^Pennsylvania^United States^US^40.3356483^-75.9268747
Table schema: [screenshot]
When I execute the bash script file from Cloud Shell, I get the following error:
Waiting on bqjob_r10e3855fc60c6e88_0000016f42380943_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'project-name-staging:bqjob_r10e3855fc60c6e88_0000ug00004521': Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- gs://bucket-name/sample_file.csv: Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns.
What would be the solution? Thanks in advance.
You are trying to insert values that do not match the schema you provided for your table.
Based on the table schema and your data example, I ran this command:
./bq load --source_format=CSV --field_delimiter=$(printf '^') mydataset.testLoad /Users/tamirklein/data2.csv
1st error
Failure details:
- Error while reading data, error message: Could not parse '39b888'
as int for field Field2 (position 1) starting at location 0
At this point, I manually removed the b from 39b888 and now I get this
2nd error
Failure details:
- Error while reading data, error message: Could not parse
'14/12/2019' as date for field Field8 (position 7) starting at
location 0
At this point, I changed 14/12/2019 to 2019-12-14, which is the BQ date format, and now everything is OK:
Upload complete.
Waiting on bqjob_r9cb3e4ef5ad596e_0000016f42abd4f6_1 ... (0s) Current status: DONE
You will need to clean your data before uploading, or load it with the --max_bad_records flag (some of the lines will load and some will not, depending on your data quality).
Note: unfortunately there is no way to control the date format during the upload; see this answer as a reference.
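For example, a run that tolerates a handful of bad rows could look like this (a sketch based on the command above; the threshold of 10 is arbitrary):

# Skip up to 10 unparseable rows instead of failing the whole load job.
bq load --source_format=CSV --field_delimiter='^' --max_bad_records=10 mydataset.testLoad /Users/tamirklein/data2.csv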
We had the same problem while importing data from local files to BigQuery. After researching the data, we saw that some values started with \r or \s.
After applying ua['ColumnName'].str.strip() and ua['District'].str.rstrip(), we could add the data to BigQuery.
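If you would rather clean the file outside pandas, a rough shell equivalent could be (a sketch with GNU sed, assuming the same '^' delimiter; the file name is a placeholder):

# Drop carriage returns, then trim whitespace at line start and after each '^' delimiter.
sed -E -i 's/\r//g; s/^[[:space:]]+//; s/\^[[:space:]]+/^/g' sample_file.csv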
Thanks

MySQLNonTransientConnectionException in PDI

I have a problem with MySQL in PDI (Kettle). This error appears while reading information with a Table Input step. Even though all the data is read out of the database successfully, this error appears and probably doesn't affect the transformation.
Error comitting connection
Communications link failure during commit(). Transaction resolution unknown.
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Communications link failure during commit(). Transaction resolution unknown.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)...
Why does this problem happen?
This is a MySQL error documented in a manual page with a nice title: MySQL server has gone away.
Matt Casters (the main author of Kettle) gives a bunch of solutions on the Pentaho wiki, which has not yet been migrated to the Hitachi Vantara forum.
Matt's first solution is to increase net_write_timeout. The default is 60, and he increased it to 1800, mentioning that less may be sufficient.
In order to do this, edit the connection and select Options on the left panel.
Then write net_write_timeout in the Parameters column and 1800 as the value.
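Alternatively (my own suggestion, not from the wiki), the same variable can be inspected and raised server-side with the mysql client:

# Check the current value, then raise it for all new connections.
mysql -u root -p -e "SHOW VARIABLES LIKE 'net_write_timeout'"
mysql -u root -p -e "SET GLOBAL net_write_timeout = 1800"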

Progress SQL error in ssis package: buffer too small for generated record

I have an SSIS package which uses a SQL command to get data from a Progress database. Every time I execute the query, it throws this specific error:
ERROR [HY000] [DataDirect][ODBC Progress OpenEdge Wire Protocol driver][OPENEDGE]Internal error -1 (buffer too small for generated record) in SQL from subsystem RECORD SERVICES function recPutLONG called from sts_srtt_t:::add_row on (ttbl# 4, len/maxlen/reqlen = 33/32/33) for . Save log for Progress technical support.
I am running the following query:
Select max(ROWID) as maxRowID from TableA
GROUP BY ColumnA,ColumnB,ColumnC,ColumnD
I've had the same error.
After changing the startup parameters -SQLTempStorePageSize and -SQLTempStoreBuff to 24 and 3000 respectively, the problem was solved.
I think that for you the values must be changed to 40 and 20000.
You can find more information here. The names of the parameters in that article were a bit different than in my database; it depends on which Progress version is used.
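For reference, these parameters are typically supplied when the database is started, something like this (a sketch; 'mydb' is a placeholder, and your site may set them in a .pf parameter file instead of on the proserve command line):

# Start the database with a larger SQL temp-store page size and buffer count.
proserve mydb -SQLTempStorePageSize 40 -SQLTempStoreBuff 20000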

Hive on Cosmos global instance returns error code 12 in Java client

I recently developed a Java client which allows me to query my Hive tables from a simple URL.
Unfortunately, since last Thursday the queries seem to have some issues. From time to time, a query which worked before doesn't return anything.
So I decided to take a look at my logs, and every time I run a query this occurs:
java.sql.SQLException: Query returned non-zero code: 12, cause: FAILED: Hive Internal Error: java.lang.RuntimeException(org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive-root/hive_2015-06-29_09-19-53_268_7855618362212093455. Name node is in safe mode.
Resources are low on NN. Safe mode must be turned off manually.
I think the issue comes from the node itself, because I didn't make any changes to my code or to my Hive tables. Where do you think the problem comes from? And what can I do to resolve it?
Thank you for reading my question.
This was because the cluster automatically entered safe mode. We fixed it by freeing/adding some disk space.
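If the NameNode stays in safe mode even after space has been freed, it can be inspected and, cautiously, released by hand with the standard HDFS admin commands (run on the cluster, not from the Java client):

hdfs dfsadmin -safemode get    # report whether safe mode is currently on
hdfs dfsadmin -safemode leave  # force the NameNode out of safe mode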