Inserting into a MySQL table through Spark SQL by querying from the same table

I have a MySQL table that was created like this:
create table nnll (a integer, b integer)
I've initialized PySpark (2.1) and executed the following code:
sql('create table nnll using org.apache.spark.sql.jdbc options (url "jdbc:mysql://127.0.0.1:3306", dbtable "prod.nnll", user \'user\', password \'pass\')')
sql('insert into nnll select 1,2')
sql('insert into nnll select * from nnll')
For some reason, I get the exception:
AnalysisException: Cannot insert overwrite into table that is also being read from.;;
InsertIntoTable Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1], OverwriteOptions(false,Map()), false
+- Project [a#2, b#3]
   +- SubqueryAlias nnll
      +- Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1]
It seems that my INSERT statement is translated into an INSERT OVERWRITE by Spark, because I'm trying to insert into the same table that I'm querying from (and on the same partition, since I have only one).
Is there any way to avoid this and make Spark treat it as a regular insert?
Thank you very much!
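One possible workaround (only a sketch, not verified against this exact setup) is to avoid reading from and writing to the same JDBC relation in a single statement by bouncing the rows through a second JDBC table definition that points at a separate MySQL staging table. Here prod.nnll_stage is hypothetical and would have to exist in MySQL with the same schema as prod.nnll; each statement is run through sql() just like the ones above.
-- hypothetical staging definition; prod.nnll_stage must already exist in MySQL
create table nnll_stage using org.apache.spark.sql.jdbc options (url "jdbc:mysql://127.0.0.1:3306", dbtable "prod.nnll_stage", user 'user', password 'pass')
-- stage the rows, then copy them back; the source and target of each INSERT
-- are now different relations, so the "also being read from" check should not fire
insert into nnll_stage select * from nnll
insert into nnll select * from nnll_stage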

Related

How to convert an existing table in GridDB into a partitioned table?

I have a table with data similar to the following:
gs[public]> tql t1 select *;
9 results. (11 ms)
gs[public]> get
id,serial,intime
1,137192745719237,2022-11-11T05:33:15.979Z
2,137192745719246,2022-11-11T05:34:16.271Z
3,237192745719237,2022-11-11T05:34:21.189Z
5,337192745719237,2022-11-11T05:35:30.048Z
6,137192745719255,2022-11-11T05:35:38.121Z
7,137192745719279,2022-11-11T05:35:41.322Z
8,137192745719210,2022-11-11T05:35:47.521Z
9,137192745719201,2022-11-11T05:35:50.586Z
10,137192745719205,2022-11-11T05:35:53.671Z
The 9 results had been acquired.
gs[public]>
The table currently has more than 30 million rows. Some queries are relatively slow, and archiving historical data is also a problem, so I need to convert the existing table into a partitioned table.
I can think of several possible approaches, but I don't know how to implement them and I haven't found the corresponding reference material:
1. Use some kind of conversion function, so that the original table name does not change. A relational database table should be convertible in place, or there could be a parent/child table approach similar to PostgreSQL's. I checked the help and found no related functions:
data:
connect createcollection createcompindex
createindex createtimeseries disconnect
dropcompindex dropcontainer dropindex
droptrigger get getcsv
getnoprint getplanjson getplantxt
gettaskplan killsql putrow
queryclose removerow searchcontainer
searchview settimezone showconnection
showcontainer showevent showsql
showtable showtrigger sql
tql tqlanalyze tqlclose
tqlexplain
2. Export the data, create a partitioned table, and then import the data into it (a rough sketch is given below). I am not sure how long this method would take, and dropping the original table directly carries risk.
gs[public]> showtable
Database : public
Name Type PartitionId
---------------------------------------------
t2 COLLECTION 13
t3 COLLECTION 27
t1 COLLECTION 55
t1_Partition COLLECTION 55
myHashPartition COLLECTION 101
gs[public]>
3. Use an ALTER statement, though I am not sure whether the NewSQL interface supports this. I know the following statement is wrong; is there a correct one?
gs[public]> alter table t1 to t1_Partition;
D20332: An unexpected error occurred while executing a SQL. : msg=[[240001:SQL_COMPILE_SYNTAX_ERROR] Parse SQL failed, reason = Syntax error at or near "to" (line=1, column=15) on updating (sql="alter table t1 to t1_Partition") (db='public') (user='admin') (appName='gs_sh') (clientId='6045b94-4626-4d38-a96a-ff396a16791:7') (clientNd='{clientId=8, address=192.168.5.120:60478}') (address=192.168.5.120:20001, partitionId=6946)]
I'm hoping someone can tell me how to properly convert an existing table into a partitioned table in GridDB with minimal downtime. Thanks.
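For reference, option 2 would look roughly like the following through the SQL (NewSQL) interface. This is only a sketch: the table name t1_part, the 30-day interval, and the column types are assumptions based on the sample data, so check the syntax against the GridDB version in use.
-- new interval-partitioned table; the partitioning key must be NOT NULL
CREATE TABLE t1_part (
  id INTEGER,
  serial LONG,
  intime TIMESTAMP NOT NULL
) PARTITION BY RANGE (intime) EVERY (30, DAY);
-- copy the existing rows; if INSERT ... SELECT is not supported by your
-- version, export with gs_export and re-import with gs_import instead
INSERT INTO t1_part SELECT id, serial, intime FROM t1;
Once the copy is verified, the application can be pointed at the new table and the old one dropped when it is no longer needed, which keeps the risky window short.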

PyFlink Error/Exception: "Hive Table doesn't support consuming update changes which is produced by node PythonGroupAggregate"

Using Flink 1.13.1 and PyFlink with a user-defined table aggregate function (UDTAGG) and Hive tables as source and sink, I've been encountering an error:
pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException:
Table sink 'myhive.mydb.flink_tmp_model' doesn't support consuming update changes
which is produced by node PythonGroupAggregate
This is the CREATE TABLE statement for the sink:
table_env.execute_sql(
    """
    CREATE TABLE IF NOT EXISTS flink_tmp_model (
        run_id STRING,
        model_blob BINARY,
        roc_auc FLOAT
    ) PARTITIONED BY (dt STRING) STORED AS parquet TBLPROPERTIES (
        'sink.partition-commit.delay'='1 s',
        'sink.partition-commit.policy.kind'='success-file'
    )
    """
)
What's wrong here?
I imagine you are executing a streaming query that is doing some sort of aggregation that requires updating previously emitted results. The parquet/hive sink does not support this -- once results are written, they are final.
One solution would be to execute the query in batch mode. Another would be to use a sink (or a format) that can handle updates. Or modify the query so that it only produces final results -- e.g., a time-windowed aggregation rather than an unbounded one.
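To illustrate the last option, here is a hedged sketch of an insert-only query: a tumbling-window aggregation over an event-time attribute. The names (events, hive_sink, user_id, ts) are hypothetical and not taken from the question; the point is that each window is emitted exactly once when it closes, so the result stream contains no updates and an append-only Hive/parquet sink can consume it.
-- ts must be declared as an event-time attribute (with a watermark) on events
INSERT INTO hive_sink
SELECT
  TUMBLE_START(ts, INTERVAL '1' HOUR) AS window_start,
  user_id,
  COUNT(*) AS cnt
FROM events
GROUP BY TUMBLE(ts, INTERVAL '1' HOUR), user_id;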

Extract character from db to another db with all items

I tried to copy the data related to the guid of a character to another db (same account), but the character always appears without any items.
I am using DELETE/INSERT.
This is from the inventory db, for example:
DELETE FROM `character_inventory` WHERE `item`=item;
INSERT INTO `character_inventory` VALUES (guid, bag, slot, item);
Every time I export/import, the character appears without items: no equipment and no inventory.
PS: I executed the query both with the server turned off and with it on, with the same result.
You can use this query to import the table content from another database.
INSERT INTO db1.`character_inventory` SELECT * FROM db2.`character_inventory` WHERE guid=XXX;
You probably also need to copy the item_instance table, so use:
INSERT INTO db1.`item_instance` SELECT * FROM db2.`item_instance`;
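If you prefer to copy only the item rows that belong to that one character instead of the whole item_instance table, you can filter through character_inventory. This is a sketch that assumes the usual emulator schema, in which character_inventory.item holds the guid of the matching row in item_instance; replace XXX with the character guid as above.
INSERT INTO db1.`item_instance`
SELECT i.*
FROM db2.`item_instance` i
JOIN db2.`character_inventory` ci ON ci.`item` = i.`guid`
WHERE ci.`guid` = XXX;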

SQL: filling a table by importing data from another table and doing the math

I am trying to develop software for one of my classes.
It is supposed to create a table contrato where I fill in the clients' info, how much they are going to pay, and how many payments they will make to pay off the contract.
I also have another table cuotas, which should be filled by importing some info from the first table (contrato). I'm trying to perform the math and save the payment info directly in SQL, but it keeps telling me I can't save the SQL because of error #1241.
I'm using phpMyAdmin and XAMPP.
Here is my SQL code:
INSERT INTO `cuotas`(`Ncontrato`, `Vcontrato`, `Ncuotas`) SELECT (`Ncontrato`,`Vcontrato`,`Vcuotas`) FROM contrato;
SELECT `Vcuotaunit` = `Vcontrato`/`Ncuotas`;
SELECT `Vcuotadic`=`Vcuotaunit`*2;
Can you please help me out and fix whatever I'm doing wrong?
Those SELECTs are missing a FROM clause, so it's unknown from which table or view they should take the columns. Also, in the INSERT the parentheses around the select list turn it into a single row expression, which is what raises error #1241.
You could use an UPDATE after that INSERT.
INSERT INTO cuotas (Ncontrato, Vcontrato, Ncuotas)
SELECT Ncontrato, Vcontrato, Vcuotas
FROM contrato;
UPDATE cuotas
SET Vcuotaunit = (Vcontrato/Ncuotas),
Vcuotadic = (Vcontrato/Ncuotas)*2
WHERE Vcuotaunit IS NULL;
Or use a single INSERT that also does the calculations.
INSERT INTO cuotas (Ncontrato, Vcontrato, Ncuotas, Vcuotaunit, Vcuotadic)
SELECT Ncontrato, Vcontrato, Vcuotas,
(Vcontrato/Vcuotas) as Vcuotaunit,
(Vcontrato/Vcuotas)*2 as Vcuotadic
FROM contrato;

Insert into Select command causing exception ParseException line 1:12 missing TABLE at 'table_name' near '<EOF>'

I am 2 days into Hadoop and Hive, so my understanding is very basic and this question might be silly. I have a Hive external table ABC and have created a sample test table similar to it, called ABC_TEST. My goal is to copy certain contents of ABC to ABC_TEST depending on a SELECT clause. I created ABC_TEST using the following command:
CREATE TABLE ABC_TEST LIKE ABC;
The problems with this are:
1) ABC_TEST is not an external table.
2) Using the DESCRIBE command, the LOCATION for ABC_TEST was something like
hdfs://somepath/somdbname.db/ABC_TEST
--> Running "hadoop fs -ls hdfs://somepath/somdbname.db/ABC_TEST" I found no files.
--> Whereas "hadoop fs -ls hdfs://somepath/somdbname.db/ABC" returned 2 files.
3) When trying to insert values into ABC_TEST from ABC, I get the exception mentioned in the title. The following is the command I used to insert values into ABC_TEST:
INSERT INTO ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
Is it wrong to use INSERT INTO ... SELECT in Hive? What am I missing? Please help.
The correct syntax is "INSERT INTO TABLE [TABLE_NAME]"
INSERT INTO TABLE ABC_TEST select * from ABC where column_name='a_valid_value' limit 5;
I faced exactly the same issue, and the reason is the Hive version.
In one of our clusters we are using Hive 0.14, and on a new setup we're using Hive 2.3.4.
In Hive 0.14 the TABLE keyword is mandatory in the INSERT command.
In Hive 2.3.4, however, it is not mandatory.
So in Hive 2.3.4 the query you've mentioned in your question will work perfectly fine, but in older versions you'll face the exception "FAILED: ParseException line 1:12 missing TABLE <>".
Hope this helps.
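As a side note on point 1) of the question: Hive's CREATE TABLE ... LIKE also accepts the EXTERNAL keyword and a LOCATION clause, so the copy can be created as an external table pointing at a directory of your choice. This is only a sketch (the path is a placeholder, and behaviour can differ between Hive versions, so verify the result with DESCRIBE FORMATTED):
-- drop the earlier managed ABC_TEST first, or pick a different name
CREATE EXTERNAL TABLE ABC_TEST LIKE ABC
LOCATION 'hdfs://somepath/some_external_dir/ABC_TEST';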