Insert into Select command causing exception ParseException line 1:12 missing TABLE at 'table_name' near '<EOF>'

I am 2 days into Hadoop and Hive, so my understanding is very basic and this question might be silly. I have a Hive external table ABC and have created a sample test table similar to it, called ABC_TEST. My goal is to copy certain contents of ABC into ABC_TEST depending on a select clause. So I created ABC_TEST using the following command:
CREATE TABLE ABC_TEST LIKE ABC;
The problems with this are:
1) ABC_TEST is not an external table.
2) Using the DESC command, the LOCATION for ABC_TEST was something like
hdfs://somepath/somdbname.db/ABC_TEST
--> On running "hadoop fs -ls hdfs://somepath/somdbname.db/ABC_TEST" I found no files.
--> Whereas "hadoop fs -ls hdfs://somepath/somdbname.db/ABC" returned 2 files.
3) When trying to insert values into ABC_TEST from ABC, I get the exception mentioned in the title. This is the command I used:
INSERT INTO ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
Is it wrong to use INSERT INTO ... SELECT in Hive? What am I missing? Please help.

The correct syntax is "INSERT INTO TABLE [TABLE_NAME]":
INSERT INTO TABLE ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
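As a side note on point (1) in the question: Hive's CREATE TABLE ... LIKE also accepts the EXTERNAL keyword and an optional LOCATION, so a sketch for keeping the copy external (the path here is only a placeholder) would be:
CREATE EXTERNAL TABLE ABC_TEST LIKE ABC
LOCATION 'hdfs://somepath/somdbname.db/ABC_TEST';  -- placeholder path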

I faced exactly the same issue, and the reason is the Hive version.
On one of our clusters we are using Hive 0.14, and on a new setup we're using Hive 2.3.4.
In Hive 0.14 the TABLE keyword is mandatory in the INSERT command.
In Hive 2.3.4, however, it is optional.
So in Hive 2.3.4 the query you've mentioned above in your question will work perfectly fine, but in older versions you'll face the exception "FAILED: ParseException line 1:12 missing TABLE <>".
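To make the difference concrete, here are both forms from this question side by side; the first parses on both versions, the second only on newer ones:
-- Hive 0.14 and later (TABLE keyword present):
INSERT INTO TABLE ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
-- Newer versions such as Hive 2.3.4 only (TABLE keyword omitted):
INSERT INTO ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;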
Hope this helps.

Related

Trying to configure an Insert into command for MySQL

I'm pretty new to SQL and need some help configuring a command. The details of my database structure can be found in this thread:
How to copy new data but skip old data from 2 tables in MySQL
The general problem is that I'm merging a new (temporary) database with an old one. I want to keep all the data in the old database but copy over any new data from the new one. If there is a duplicate, the old data should be favored/kept.
My current command is:
INSERT INTO BAT_players
SELECT *
FROM bat2.bat_players
WHERE NOT EXISTS (SELECT 1 FROM BAT_players WHERE BAT_players(UUID) = bat2.bat_players(UUID));
When I run this, I get
Function bat2.bat_players undefined or Function bat.BAT_players undefined
I do not know how to proceed and would appreciate the help.
Columns are accessed using ., not parens:
INSERT INTO BAT_players
SELECT *
FROM bat2.bat_players bp2
WHERE NOT EXISTS (SELECT 1
                  FROM BAT_players bp
                  WHERE bp.UUID = bp2.UUID
                 );
Note that the columns have to correspond by position, because you are not explicitly listing them. As a general rule, you want to list all the columns in an insert:
INSERT INTO BAT_players ( . . . )
SELECT . . .
. . .
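For example, with the question's UUID column plus two hypothetical columns name and score standing in for the real column list, the fully spelled-out version would be:
INSERT INTO BAT_players (UUID, name, score)  -- name and score are hypothetical columns
SELECT bp2.UUID, bp2.name, bp2.score
FROM bat2.bat_players bp2
WHERE NOT EXISTS (SELECT 1
                  FROM BAT_players bp
                  WHERE bp.UUID = bp2.UUID
                 );
Listing the columns keeps the statement working even if the two tables' column orders ever diverge.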
I am not familiar with MySQL; I have worked with SQL Server, to be honest. But if the infrastructure is the same, and I say IF, then there is a trick to these kinds of cross-database statements, and that's simply the dbo prefix.
Like below:
USE BAT
INSERT INTO bat_players
SELECT * FROM bat2.dbo.bat_players
and also the rest of your conditions
or
instead of the USE BAT statement you can simply add dbo to the insert target:
INSERT INTO bat.dbo.bat_players
and again the rest of your conditions;
just remember to use dbo before each table name.
HUGE UPDATE
If you want to access the fields (columns), you have to use ., as @Gordon Linoff explained above. For example:
...
Where bat2.dbo.bat_players.UUID = --the condition--

Apache Drill Parser error while creating table from simple select of JSON file

I get a parse error while creating a table from a SQL query of a JSON file in Apache Drill.
USE dfs.tmp;
CREATE Table myt AS
(SELECT KVGEN(repo)[1] reponame FROM dfs.`f:\DemoData\201901-000000000000.json`
WHERE STRPOS(payload,'ARM') >0)
Error:
org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR: Encountered ";" at line 1, column 12. Was expecting one of: <EOF> "." ... "[" ... SQL Query USE dfs.tmp; ^ CREATE Table myt AS (SELECT KVGEN(repo)[1] reponame FROM dfs.`f:\DemoData\201901-000000000000.json` WHERE STRPOS(payload,'ARM') >0)
What am I doing wrong?
You are trying to submit two queries, but Drill doesn't support submitting several queries via a single form in the Drill Web UI.
Please create a Jira ticket to improve it: https://issues.apache.org/jira/browse/DRILL.
You can use Drill SqlLine (the Drill shell) instead; it doesn't have this limitation.
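Alternatively, you can avoid submitting two statements altogether by fully qualifying the table name instead of relying on USE; a sketch of the same query from the question:
CREATE TABLE dfs.tmp.myt AS
(SELECT KVGEN(repo)[1] reponame
 FROM dfs.`f:\DemoData\201901-000000000000.json`
 WHERE STRPOS(payload,'ARM') > 0);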

Problem dropping Hive table from pyspark script

I have a table in Hive created from many JSON files using the hive-json-serde method, WITH SERDEPROPERTIES ('dots.in.keys' = 'true'), as some keys there have a dot in them, like `aaa.bbb`. I created it as an external table and use backticks for these keys. Now I have a problem dropping this table from a pyspark script using sqlContext.sql("DROP TABLE IF EXISTS "+table_name); I'm getting this error message:
An error occurred while calling o63.sql.
: org.apache.spark.SparkException: Cannot recognize hive type string: struct<associations:struct<aaa.bbb:array<string> ...
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting ':'(line 1, pos 33)
== SQL ==
struct<associations:struct<aaa.bbb:array<string>,...
---------------------------------^^^
In HUE I can drop this table without any problem. Am I doing it wrong, or maybe there is a better way to do it?
It looks like it is not possible to work with Hive tables created with the hive-json-serde method with dots in keys using sqlContext.sql("...") from a pyspark script, as I want. I always get the same error, whether I try to drop such a Hive table or create it (haven't tried other things yet). So my workaround is to use Python's os.system() and execute the required query through hive itself:
import os

q = 'hive -e "DROP TABLE IF EXISTS ' + table_name + ';"'
os.system(q)
It's more complicated with a CREATE TABLE query, as we need to escape the backticks with '\' so that the shell passes them through to hive:
statement = ("CREATE TABLE test111 (testA struct<\\`aa.bb\\`:string>) "
             "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' "
             "LOCATION 's3a://bucket/test111';")
q = 'hive -e "' + statement + '"'
os.system(q)
It outputs some additional Hive-related info, but it works!

Inserting into MySQL tables through SparkSQL, by querying from the same table

I have a MySQL table that was created like this:
create table nnll (a integer, b integer)
I've initialized pyspark (2.1) and executed the code:
sql('create table nnll using org.apache.spark.sql.jdbc options (url "jdbc:mysql://127.0.0.1:3306", dbtable "prod.nnll", user \'user\', password \'pass\')')
sql('insert into nnll select 1,2')
sql('insert into nnll select * from nnll')
For some reason, I get the exception:
AnalysisException: u'Cannot insert overwrite into table that is also being read from.;;\nInsertIntoTable Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1], OverwriteOptions(false,Map()), false\n+- Project [a#2, b#3]\n +- SubqueryAlias nnll\n +- Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1]\n'
It seems like my INSERT statement is translated into an INSERT OVERWRITE statement by Spark, because I'm trying to insert into the same table that I'm querying (and into the same partition; I have only one).
Is there any way to avoid this, and make spark translate this query to a regular query?
Thank you very much!

RODBC ERROR: Could not SQLExecDirect in mysql

I have been trying to write an R script to query an Impala database. Here is the query:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA
When I run this query manually (read: outside the R script, via impala-shell), I am able to get the table contents. However, when the same is tried via the R script, I get the following error:
[1] "HY000 140 [Cloudera][ImpalaODBC] (140) Unsupported query."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA'
closing unused RODBC handle 1
Why does the query fail when tried via R, and how do I fix this? Thanks in advance :)
Edit 1:
The connection script looks as below:
library("RODBC");
connection <- odbcConnect("Impala");
query <- "select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA";
data <- sqlQuery(connection,query);
You need to install the relevant drivers; please look at the following link.
I had the same issue; all I had to do was update the ODBC drivers.
Also, if you can, update your odbcConnect call with the username and password, from
connection <- odbcConnect("Impala");
to
connection <- odbcConnect("Impala", uid="root", pwd="password")
The RODBC package is quirky: if there's no row updated/deleted in the query execution it will throw an error.
So before using sqlDelete to delete rows, or using sqlUpdate to update values, first check if there's at least one row that will be deleted/updated by querying COUNT(*).
I've had no problem after implementing the check, on Oracle 12c.
An alternative would be to use a staging table for the new batch of data, and use sqlQuery to execute a MERGE command; RODBC won't complain if there are zero rows merged.
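A minimal sketch of that staging-table approach, assuming hypothetical tables target and staging that share the columns id and val (Oracle-style MERGE):
MERGE INTO target t            -- target and staging are hypothetical names
USING staging s
ON (t.id = s.id)
WHEN MATCHED THEN
    UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);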
This might also be due to an error in your SQL query itself. For example, I got this error when I missed an 'in' in the following generalized statement:
stringstuff <- someDT$columnyouwanttouse
somestring <- toString(sprintf("'%s'", stringstuff))
RESULTS <- sqlQuery(con, paste0("select
fling as flam
and toot in (", somestring, ")
limit 30
;"))
I got the error you did when I left out the 'in', so double check your syntax.
This error message can arise if the table doesn't exist in the database.
A few sensible checks:
1) Check for typos in the table name in your query.
2) See if you can run the same query on the same database via another SQL client.
3) Talk to your database administrator to confirm that the table does exist.
Re-installing the RODBC package did the trick for me!
I had a similar problem. After uninstalling R version 4.2.1 and installing R version 4.1.3, the problem was solved.