How to convert an existing table in GridDB into a partitioned table?

I have a table with data similar to the following:
gs[public]> tql t1 select *;
9 results. (11 ms)
gs[public]> get
id,serial,intime
1,137192745719237,2022-11-11T05:33:15.979Z
2,137192745719246,2022-11-11T05:34:16.271Z
3,237192745719237,2022-11-11T05:34:21.189Z
5,337192745719237,2022-11-11T05:35:30.048Z
6,137192745719255,2022-11-11T05:35:38.121Z
7,137192745719279,2022-11-11T05:35:41.322Z
8,137192745719210,2022-11-11T05:35:47.521Z
9,137192745719201,2022-11-11T05:35:50.586Z
10,137192745719205,2022-11-11T05:35:53.671Z
The 9 results had been acquired.
gs[public]>
The table currently has more than 30 million rows. Some queries against it are relatively slow, and archiving historical data is also a problem. I need to convert this existing table into a partitioned table.
I can think of several possible approaches, but I don't know how to implement them, and I haven't found any reference material for this.
Option 1: an in-place conversion function, so that the original table name does not change. Relational databases can usually convert a table this way, or use a parent-child table scheme like PostgreSQL's. But I checked the gs_sh help and found no related function:
data:
connect createcollection createcompindex
createindex createtimeseries disconnect
dropcompindex dropcontainer dropindex
droptrigger get getcsv
getnoprint getplanjson getplantxt
gettaskplan killsql putrow
queryclose removerow searchcontainer
searchview settimezone showconnection
showcontainer showevent showsql
showtable showtrigger sql
tql tqlanalyze tqlclose
tqlexplain
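From what I can tell from the SQL DDL syntax, partitioning apparently has to be declared when the table is created through the SQL (NewSQL) interface. A minimal sketch of what I understand the DDL to look like, with t1_part as a placeholder name, the column types guessed from my data, and the interval chosen arbitrarily (interval partitioning seems to require the partitioning column to be the primary key, so check the DDL reference for your release):
CREATE TABLE t1_part (
  intime TIMESTAMP PRIMARY KEY, -- partitioning column; must be (part of) the primary key
  id INTEGER,
  serial LONG
) PARTITION BY RANGE (intime) EVERY (30, DAY); -- one data partition per 30 days
That creates a new table, though; it does not convert t1 in place.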
Option 2: export the data, create a partitioned table, and import the data into it. But I am not sure how long this would take on 30 million rows, and dropping the original table directly carries risk.
gs[public]> showtable
Database : public
Name Type PartitionId
---------------------------------------------
t2 COLLECTION 13
t3 COLLECTION 27
t1 COLLECTION 55
t1_Partition COLLECTION 55
myHashPartition COLLECTION 101
gs[public]>
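If export/import turns out to be too slow, a variant I am considering is copying server-side into a partitioned table created as sketched above (t1_part being a placeholder name), assuming the SQL interface of my GridDB version supports INSERT ... SELECT:
INSERT INTO t1_part (id, serial, intime)
  SELECT id, serial, intime FROM t1;
After verifying that the row counts match, t1 could be dropped. Since I have not found a rename command, the partitioned table would have to be created under its final name from the start.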
Option 3: an ALTER statement, though I am not sure whether the NewSQL interface supports this. I know the following statement is wrong; is there a correct one?
gs[public]> alter table t1 to t1_Partition;
D20332: An unexpected error occurred while executing a SQL. : msg=[[240001:SQL_COMPILE_SYNTAX_ERROR] Parse SQL failed, reason = Syntax error at or near "to" (line=1, column=15) on updating (sql="alter table t1 to t1_Partition") (db='public') (user='admin') (appName='gs_sh') (clientId='6045b94-4626-4d38-a96a-ff396a16791:7') (clientNd='{clientId=8, address=192.168.5.120:60478}') (address=192.168.5.120:20001, partitionId=6946)]
I'm hoping someone can tell me how to properly convert an existing table into a partitioned table in GridDB with minimal downtime. Thanks!

Related

How do I insert data from a dataframe into selected columns in a MySQL table using R

I have a data frame made up of 3 columns named INTERNAL_ID, NT_CLONOTYPE and SAMPLE_ID. I need to write a script in R that will transfer this data into the appropriate 3 columns with the exact names in a MySQL table. However, the table has more than 3 columns, say 5 (INTERNAL_ID, COUNT, NT_CLONOTYPE, AA_CLONOTYPE, and SAMPLE_ID). The MySQL table already exists and may or may not include preexisting rows of data.
I'm using the dbx and RMariaDB libraries in R. I've been able to connect to the MySQL database with dbxConnect(), but when I try to run dbxUpsert() I hit a problem:
conx <- dbxConnect(adapter = "mysql", dbname = "TCR_DB", host = "127.0.0.1", user = "xxxxx", password = "xxxxxxx")
table <- "TCR"
records <- newdf #dataframe previously created with the update data.
dbxUpsert(conx, table, records, where_cols = c("INTERNAL_ID"))
dbxDisconnect(conx)
I expect to obtain an updated MySQL table with the new rows, which may or may not have NULL entries in the columns not contained in the data frame.
Ex.
INTERNAL_ID  COUNT  NT_CLONOTYPE  AA_CLONOTYPE  SAMPLE_ID
Pxxxxxx.01          CTTGGAACTG                  PMA.01
The connection and disconnection both run fine, but instead of the expected result I obtain the following error:
Error in .local(conn, statement, ...) :
could not run statement: Field 'COUNT' doesn't have a default value
I suspect it's because the number of columns in the data frame and in the table are not the same, but I'm not sure. If so, how can I get around this?
I figured it out. I changed the table definition so that COUNT defaults to NULL. This allowed the program to proceed by simply ignoring COUNT.
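For anyone hitting the same error, the change was along these lines (a sketch, assuming COUNT is an integer column; adjust the type to match your schema):
ALTER TABLE TCR MODIFY `COUNT` INT NULL DEFAULT NULL;
With the default in place, dbxUpsert() can omit COUNT and MySQL fills in NULL.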

SQL filling a table importing data from another table and math

I am trying to develop software for one of my classes.
It is supposed to create a table contrato where I fill in the clients' info: how much they are going to pay and how many payments they will make to settle the contract.
On the other hand, I have another table cuotas which should be filled by importing some info from contrato. I'm trying to perform the math and save the payment info directly in SQL, but it keeps telling me I can't save because of error #1241.
I'm using PHPMyAdmin and Xampp
Here is my SQL code
INSERT INTO `cuotas`(`Ncontrato`, `Vcontrato`, `Ncuotas`) SELECT (`Ncontrato`,`Vcontrato`,`Vcuotas`) FROM contrato;
SELECT `Vcuotaunit` = `Vcontrato`/`Ncuotas`;
SELECT `Vcuotadic`=`Vcuotaunit`*2;
Can you please help me out and fix whatever I'm doing wrong?
The parentheses around the column list in your INSERT ... SELECT make MySQL read it as a single row expression, which is what raises error #1241 ("Operand should contain 1 column(s)"); list the columns without parentheses. The two standalone SELECTs are also missing a FROM clause, so it's unknown from which table or view they should take the columns.
You could use an UPDATE after that INSERT.
INSERT INTO cuotas (Ncontrato, Vcontrato, Ncuotas)
SELECT Ncontrato, Vcontrato, Vcuotas
FROM contrato;
UPDATE cuotas
SET Vcuotaunit = (Vcontrato/Ncuotas),
    Vcuotadic = (Vcontrato/Ncuotas)*2
WHERE Vcuotaunit IS NULL;
Or use 1 INSERT that also does the calculations.
INSERT INTO cuotas (Ncontrato, Vcontrato, Ncuotas, Vcuotaunit, Vcuotadic)
SELECT Ncontrato, Vcontrato, Vcuotas,
       (Vcontrato/Vcuotas) AS Vcuotaunit,
       (Vcontrato/Vcuotas)*2 AS Vcuotadic
FROM contrato;

Inserting into MySQL tables through SparkSQL, by querying from the same table

I have a MySQL table that was created like this:
create table nnll (a integer, b integer)
I've initialized pyspark (2.1) and executed the code:
sql('create table nnll using org.apache.spark.sql.jdbc options (url "jdbc:mysql://127.0.0.1:3306", dbtable "prod.nnll", user \'user\', password \'pass\')')
sql('insert into nnll select 1,2')
sql('insert into nnll select * from nnll')
For some reason, I get the exception:
AnalysisException: u'Cannot insert overwrite into table that is also being read from.;;\nInsertIntoTable Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1], OverwriteOptions(false,Map()), false\n+- Project [a#2, b#3]\n +- SubqueryAlias nnll\n +- Relation[a#2,b#3] JDBCRelation(prod.nnll) [numPartitions=1]\n'
It seems that my INSERT statement is translated into an INSERT OVERWRITE statement by Spark, because I'm trying to insert into the same table that I'm querying (on the same partition; I have only one).
Is there any way to avoid this and make Spark translate it into a regular insert?
Thank you very much!
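One workaround sketch, untested and assuming your Spark version supports CACHE TABLE ... AS SELECT: materialize the read under another name first, so Spark is no longer reading from and writing to the same JDBC relation (nnll_snapshot is a hypothetical name):
CACHE TABLE nnll_snapshot AS SELECT * FROM nnll;
INSERT INTO nnll SELECT * FROM nnll_snapshot;
UNCACHE TABLE nnll_snapshot;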

update SQL table from foreign data source without first deleting all entries (but do delete entries no longer present)

I have a bunch of MySQL tables I work with where the ultimate data source is a very slow SQL server administered by someone else. My predecessors' solution for dealing with this is to run queries more or less like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table;')
other_python_wrapper('INSERT INTO local_table VALUES() %s;' % results)
The problem is this means you can never use values in local_table as foreign key constraints for other tables, because they are constantly being deleted and re-inserted whenever you update the table from the foreign source. However, if a record really does disappear from the results of the query on the foreign server, then that usually means you would want to trigger a cascade to drop records in other local tables that you've linked via a foreign key constraint to data propagated from the foreign table.
The only semi-reasonable solution I've come up with is to do something like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table_temp;')
other_python_wrapper('INSERT INTO local_table_temp VALUES() %s;' % results)
other_python_wrapper('DELETE FROM local_table WHERE primary_key NOT IN (SELECT primary_key FROM local_table_temp);')
other_python_wrapper('INSERT INTO local_table SELECT * FROM local_table_temp ON DUPLICATE KEY UPDATE local_table.col2 = local_table_temp.col2, local_table.col3 = local_table_temp.col3;')
The problem is there are a fair number of these tables, and many of them have a large number of columns that need to be updated, so it's tedious to write the same boilerplate over and over. And if a table's schema changes, there's more than one place where you need to update the listing of all its columns.
Is there any more concise way to do this in SQL?
Thanks!
I have a somewhat unsatisfactory answer to my own question. Since I'm using Python to query the foreign Oracle database and load the results into MySQL, and I trust the table and column names to be pretty well behaved, I can wrap the whole procedure in Python code and have Python generate the SQL update queries by inspecting the tables.
For a number of reasons, I'd still like to see a better way to do this, but it works for me because:
I'm using an external scripting language that can inspect the database schema anyway.
I trust the database, column, and table names I'm working with to be well-behaved because these are all things I have direct control over.
My solution depends on the local SQL table structure; specifically which keys are primary keys. The code won't work without properly structured tables. But that's OK, because I can restructure the MySQL tables to make my python code work.
While I do hope someone else can think up a more-elegant and/or general-purpose solution, I will offer up my own python code to anyone who is working on a similar problem who can safely make the same assumptions I did above.
Below is a python wrapper I use to do simple SQL queries in python:
import config, MySQLdb

class SimpleSQLConn(object):
    '''simplified wrapper around a MySQLdb.connection'''

    def __init__(self, **kwargs):
        self._connection = MySQLdb.connect(host=config.mysql_host,
                                           user=config.mysql_user,
                                           passwd=config.mysql_pass,
                                           **kwargs)
        self._cursor = self._connection.cursor()

    def query(self, query_str):
        self._cursor.execute(query_str)
        self._connection.commit()
        return self._cursor.fetchall()

    def columns(self, database, table):
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table))]

    def primary_keys(self, database, table):
        # DESCRIBE rows contain 'PRI' in the Key field for primary key columns
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table)) if 'PRI' in x]
And here is the actual update function, using the SQL wrapper class above:
def update_table(database,
                 table,
                 mysql_insert_with_dbtable_placeholder):
    '''update a mysql table without first deleting all the old records

    mysql_insert_with_dbtable_placeholder should be set to a string with
    placeholders for database and table, something like:
        mysql_insert_with_dbtable_placeholder = """
            INSERT INTO `%(database)s`.`%(table)s` VALUES (a, b, c);"""

    note: code as is will update all the non-primary keys, structure
    your tables accordingly
    '''
    sql = SimpleSQLConn()
    query = 'DROP TABLE IF EXISTS `%(database)s`.`%(table)s_temp_for_update`' % \
            {'database': database, 'table': table}
    sql.query(query)
    query = 'CREATE TABLE `%(database)s`.`%(table)s_temp_for_update` LIKE `%(database)s`.`%(table)s`' % \
            {'database': database, 'table': table}
    sql.query(query)
    query = mysql_insert_with_dbtable_placeholder % \
            {'database': database, 'table': '%s_temp_for_update' % table}
    sql.query(query)
    query = '''DELETE FROM `%(database)s`.`%(table)s` WHERE
               (%(primary_keys)s) NOT IN
               (SELECT %(primary_keys)s FROM `%(database)s`.`%(table)s_temp_for_update`);
            ''' % {'database': database,
                   'table': table,
                   'primary_keys': ', '.join(['`%s`' % key for key in sql.primary_keys(database, table)])}
    sql.query(query)
    update_columns = [col for col in sql.columns(database, table)
                      if col not in sql.primary_keys(database, table)]
    query = '''INSERT INTO `%(database)s`.`%(table)s`
               SELECT * FROM `%(database)s`.`%(table)s_temp_for_update`
               ON DUPLICATE KEY UPDATE
               %(update_cols)s
            ''' % {'database': database,
                   'table': table,
                   'update_cols': ',\n'.join(['`%(table)s`.`%(col)s` = `%(table)s_temp_for_update`.`%(col)s`'
                                              % {'table': table, 'col': col} for col in update_columns])}
    sql.query(query)
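For illustration, with a hypothetical table db.foo whose primary key is id and whose other columns are col2 and col3, the wrapper above would emit statements along these lines:
DROP TABLE IF EXISTS `db`.`foo_temp_for_update`;
CREATE TABLE `db`.`foo_temp_for_update` LIKE `db`.`foo`;
-- the caller-supplied INSERT fills foo_temp_for_update here
DELETE FROM `db`.`foo` WHERE
  (`id`) NOT IN (SELECT `id` FROM `db`.`foo_temp_for_update`);
INSERT INTO `db`.`foo`
  SELECT * FROM `db`.`foo_temp_for_update`
  ON DUPLICATE KEY UPDATE
    `foo`.`col2` = `foo_temp_for_update`.`col2`,
    `foo`.`col3` = `foo_temp_for_update`.`col3`;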

Inserting DATA in MS Access

I have this code:
SELECT VISA41717.Fraud_Post_Date, VISA41717.Merchant_Name_Raw, VISA41717.Merchant_City, VISA41717.Merchant_Country, VISA41717.Merchant_Category_Code, VISA41717.ARN, VISA41717.POS_Entry_Mode, VISA41717.Fraud_Type, VISA41717.Local_Amt, VISA41717.Fraud_Amt, VISA41717.Purch_Date, VISA41717.Currency_Code, VISA41717.Cashback_Indicator, VISA41717.Card_Account_Num
FROM VISA41717 LEFT JOIN MASTERCARD_VISA ON VISA41717.ARN=MASTERCARD_VISA.MICROFILM_NUMBER
WHERE VISA41717.ARN IS NULL OR MASTERCARD_VISA.MICROFILM_NUMBER IS NULL
ORDER BY VISA41717.ARN;
This really works, but I also need to match the first 6 digits of VISA41717.Card_Account_Num against BIN.INT to get the other data from the BIN table and combine it all into one result.
Can you help me with this?
Thanks!
What do you mean by 'all in one table'? Just build a query that joins tables.
Try:
SELECT ... FROM VISA41717 RIGHT JOIN BIN ON Left(VISA41717.Card_Account_Num, 6) = Bin.Int ...
You won't be able to build this join in Design View; use SQL View. Alternatively, build a query object that creates a field by extracting the first 6 characters, and then build another query joining that query with the MASTERCARD_VISA and BIN tables.
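For what it's worth, a combined sketch in one Access query might look like the following (untested; Access needs the extra parentheses when chaining joins, INT is bracketed because it is a reserved word, and Left() may need CStr() around Card_Account_Num if that field is numeric):
SELECT VISA41717.*, BIN.*
FROM (VISA41717
LEFT JOIN MASTERCARD_VISA ON VISA41717.ARN = MASTERCARD_VISA.MICROFILM_NUMBER)
LEFT JOIN BIN ON Left(VISA41717.Card_Account_Num, 6) = BIN.[INT]
WHERE MASTERCARD_VISA.MICROFILM_NUMBER IS NULL
ORDER BY VISA41717.ARN;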