I use Spark 2.1. Below is my code:
delta="insert overwrite table schema1.table1 select * from schema2.table2"
try:
spark.sql(delta)
except Exception as e:
spark.sql("drop table schema2.table2")
print("Overall load failed for schema1.table1", e)
sqlCtx.sql("drop table schema1.table1 ")
Below is what I am trying to do: insert into table1 of schema1 from table2 in another schema, schema2.
I am putting it in a try block so that, if it fails, it goes to the except block, drops the table, and prints the message "Overall load failed for schema1.table1".
Now the problem is that whenever I execute the above statement, it drops the table in the schema; PySpark is not being controlled by Python's try and except. I sense that it is going into the except block and dropping the tables even when the try block has not failed.
Please help me get past this hurdle. Thanks in advance!
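For reference, a minimal sketch of the intended flow (assuming an existing SparkSession named spark and that both tables exist; dropping only the target table here for illustration). Printing the full traceback before any destructive cleanup shows which branch actually ran and why:
import traceback

delta = "insert overwrite table schema1.table1 select * from schema2.table2"
try:
    spark.sql(delta)
    print("Load succeeded for schema1.table1")
except Exception:
    # Show exactly why we got here before dropping anything.
    traceback.print_exc()
    spark.sql("drop table if exists schema1.table1")
    print("Overall load failed for schema1.table1")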
I'm pretty new to SQL and need some help configuring a command. The details of my database structure can be found in this thread:
How to copy new data but skip old data from 2 tables in MySQL
The general problem is that I'm merging a new (temporary) database with an old one. I want to keep all the data in the old one but copy over any new data from the new one. If there is a duplicate, the old should be favored/kept.
My current command is:
INSERT INTO BAT_players
SELECT *
FROM bat2.bat_players
WHERE NOT EXISTS (SELECT 1 FROM BAT_players WHERE BAT_players(UUID) = bat2.bat_players(UUID));
When I run this, I get
Function bat2.bat_players undefined or Function bat.BAT_players undefined
I do not know how to proceed and would appreciate the help.
Columns are accessed using ., not parentheses:
INSERT INTO BAT_players
SELECT *
FROM bat2.bat_players bp2
WHERE NOT EXISTS (SELECT 1
FROM BAT_players bp
WHERE bp.UUID = bp2.UUID
);
Note that the columns have to correspond by position, because you are not explicitly listing them. As a general rule, you want to list all the columns in an insert:
INSERT INTO BAT_players ( . . . )
SELECT . . .
. . .
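To make that concrete, a hedged sketch run from Python with mysql-connector (the real column list isn't shown in the question, so name and score below are hypothetical placeholders alongside the real UUID, and the connection parameters are placeholders too):
import mysql.connector

# Placeholder connection details; adjust for your server.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="bat")
cur = conn.cursor()
# UUID comes from the question; name and score are hypothetical columns.
cur.execute("""
    INSERT INTO BAT_players (UUID, name, score)
    SELECT bp2.UUID, bp2.name, bp2.score
    FROM bat2.bat_players bp2
    WHERE NOT EXISTS (SELECT 1 FROM BAT_players bp WHERE bp.UUID = bp2.UUID)
""")
conn.commit()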
I am not familiar with MySQL; I have worked with SQL Server, to be honest. But if all the infrastructure is the same, and I say IF, then there is a trick to these kinds of cross-database queries, and it's simply the prefix dbo.
Like below:
USE BAT
INSERT INTO bat_players
SELECT * FROM bat2.dbo.bat_players
and also the rest of your conditions,
or,
instead of the USE BAT statement, you can simply add the dbo prefix directly:
INSERT INTO bat.dbo.bat_players
and again the rest of your conditions;
just remember to use dbo before each table name.
HUGE UPDATE
if you want to access the fields (columns) you have to use . as @Gordon Linoff explained above. For example:
...
Where bat2.dbo.bat_players.UUID = --the condition--
I have a table in Hive created from many JSON files using the hive-json-serde method, WITH SERDEPROPERTIES ('dots.in.keys' = 'true'), as some keys there have a dot in them, like `aaa.bbb`. I create an external table and use backticks for these keys. Now I have a problem dropping this table from a pyspark script using sqlContext.sql("DROP TABLE IF EXISTS " + table_name); I'm getting this error message:
An error occurred while calling o63.sql.
: org.apache.spark.SparkException: Cannot recognize hive type string: struct<associations:struct<aaa.bbb:array<string> ...
Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '.' expecting ':'(line 1, pos 33)
== SQL ==
struct<associations:struct<aaa.bbb:array<string>,...
---------------------------------^^^
In Hue I can drop this table without any problem. Am I doing it wrong, or maybe there is a better way to do it?
It looks like it is not possible to work with Hive tables created with the hive-json-serde method, with dots in keys, using sqlContext.sql("...") from a pyspark script, as I want. There is always the same error, whether I want to drop such a Hive table or create it (I haven't tried other things yet). So my workaround is to use Python's os.system() and execute the required query through Hive itself:
import os

q = 'hive -e "DROP TABLE IF EXISTS ' + table_name + ';"'
os.system(q)
It's more complicated with a CREATE TABLE query, as we need to escape the backticks with '\' so they survive the shell's double quotes:
statement = """CREATE TABLE test111 (testA struct<\`aa.bb\`:string>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://bucket/test111';"""
q = 'hive -e "' + statement + '"'
os.system(q)
It outputs some additional Hive-related info, but it works!
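A slightly safer variant of the same workaround (a sketch, assuming the hive CLI is on the PATH) is subprocess.run with an argument list, which skips the shell entirely and so needs no backtick escaping:
import subprocess

statement = ("CREATE TABLE test111 (testA struct<`aa.bb`:string>) "
             "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' "
             "LOCATION 's3a://bucket/test111';")
# No shell is involved, so the backticks can be written as-is.
subprocess.run(["hive", "-e", statement], check=True)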
I am two days into Hadoop and Hive, so my understanding is very basic, and I have a question which might be silly. Question: I have a Hive external table ABC and have created a sample test table similar to it, ABC_TEST. My goal is to copy certain contents of ABC to ABC_TEST depending on a select clause. So I created ABC_TEST using the following command:
CREATE TABLE ABC_TEST LIKE ABC;
The problems with this are:
1) This ABC_TEST is not an external table.
2) Using the DESC command, the LOCATION for ABC_TEST was something like
hdfs://somepath/somdbname.db/ABC_TEST
--> On the command "hadoop fs -ls hdfs://somepath/somdbname.db/ABC_TEST" I found no files.
--> Whereas "hadoop fs -ls hdfs://somepath/somdbname.db/ABC" returned 2 files.
3) When trying to insert values into ABC_TEST from ABC, I get the exception mentioned in the title. Following is the command I used to insert values into ABC_TEST:
INSERT INTO ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
Is it wrong to use INSERT INTO ... SELECT in Hive? What am I missing? Please help.
The correct syntax is "INSERT INTO TABLE [TABLE_NAME]"
INSERT INTO TABLE ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
I faced exactly the same issue, and the reason is the Hive version.
On one of our clusters we are using Hive 0.14, and on a new setup we're using Hive 2.3.4.
In Hive 0.14, the "TABLE" keyword is mandatory in the INSERT command.
However, in Hive 2.3.4 it is not.
So in Hive 2.3.4 the query you've mentioned above in your question will work perfectly fine, but in older versions you'll face the exception "FAILED: ParseException line 1:12 missing TABLE <>".
Hope this helps.
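If you run the insert from pyspark rather than the Hive CLI, the same portability rule applies; a minimal sketch (assuming an existing SparkSession named spark) that keeps the TABLE keyword so it parses on both the old and new Hive lines:
# Including TABLE keeps the statement valid on Hive 0.14 as well as 2.x.
spark.sql("INSERT INTO TABLE ABC_TEST "
          "SELECT * FROM ABC WHERE column_name = 'a_valid_value' LIMIT 5")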
I'm trying to insert a ton of rows into my MySQL database. I have a query like this, but with about 700 more repetitive entries in it, and for some reason the query is only inserting the first row into the database. In this case that would be '374','4957','0'.
INSERT INTO table VALUES ('374','4957','0'),('374','3834','0'),('374','4958','0'),('374','5076','0'),('374','4921','0'),('374','3835','0'),('374','4922','0'),('374','3836','0'),('374','3837','0'),('374','4879','0'),('374','3838','0')
I can't figure out what I'm doing wrong.
Thank you in advance.
Don't mean to state the obvious, but if the first field '374' is your primary key field, then this is the issue.
Otherwise, are there any error messages received from the database? That is always a good place to look for bugs.
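One way to test the duplicate-key theory from Python (a sketch using mysql-connector; the connection details and the table name my_table are placeholders) is to retry with INSERT IGNORE, which skips rows that would violate a unique key instead of aborting:
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="mydb")
cur = conn.cursor()
# If this reports more than one row, duplicate keys were blocking the rest.
cur.execute("INSERT IGNORE INTO my_table VALUES "
            "('374','4957','0'),('374','3834','0'),('374','4958','0')")
conn.commit()
print(cur.rowcount, "rows affected")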
To better understand why something is not working, next time use code like this:
$sql = "INSERT INTO table VALUES ('374','4957','0'),('374','3834','0')";
if (!mysqli_query($link, $sql)) {
printf("Errormessage: %s\n", mysqli_error($link));
}
That should display error message returned from MySQL.
More information: PHP manual - mysqli_error
Try writing the column names before VALUES. For example:
INSERT INTO table (column1,column2,column3) VALUES ...
Using CI for the first time and I'm smashing my head against this seemingly simple issue. My query won't insert the record.
In an attempt to debug a possible problem, the insert code has been simplified, but I'm still getting no joy.
Essentially, I'm using:
$data = array('post_post' => $this->input->post('ask_question'));
$this->db->insert('posts', $data);
I'm getting no errors (although that's possibly due to disabling them in config/database.php because of another CI-related trauma :-$).
I've used
echo $this->db->last_query();
to get the generated query, shown as below:
INSERT INTO `posts` (`post_post`) VALUES ('some text')
I have pasted this query into phpMyAdmin and it inserts with no problem. I've even tried using $this->db->query() to run the outputted query above 'manually', but again the record will not insert.
The schema of the DB table 'posts' is simply two columns, post_id and post_post.
Please, any pointers on what's going on here would be greatly appreciated... thanks.
OK... solved, after much messing with CI.
Got it to work by setting the persistent connection to false:
$db['default']['pconnect'] = FALSE;
sigh
Things generally look OK; everything you have said suggests that it should work. My first instinct would be to check that what you're inserting is compatible with your SQL field.
Just a cool CI feature; I'd suggest you take a look at the CI Database Transaction class. Transactions allow you to wrap your query/queries inside a transaction, which can be rolled back on failure, and can also make error handling easier:
$this->db->trans_start();
$this->db->query('INSERT INTO posts ...etc ');
$this->db->trans_complete();
if ($this->db->trans_status() === FALSE)
{
// generate an error... or use the log_message() function to log your error
}
Alternatively, one thing you can do is put your INSERT SQL statement into $this->db->query(your_query_here) instead of calling insert(). There is a CI feature called Query Binding which will also auto-escape your passed data array.
Let me know how it goes, and hope this helps!