I'm working on automating the process of building a database. After the initial build, the database needs daily updates.
The database has 51 tables, divided into 3 schemas (17 tables in each schema), and holds a total of 20 million records, each record with a PK of manage_number.
I need to update 2,000~3,000 records every day, but I don't know which method to use.
Option 1: Make a table for PK indexing
This method creates a separate lookup table in the same database that stores each manage_number together with the name of the table it lives in, i.e., a metadata table recording which table each manage_number is stored in (a sketch of such a lookup table is shown below). This is the approach currently applied. The problem is that the build now takes 5-6 times longer than before (it increased from 2 minutes to 12 minutes).
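For reference, a minimal sketch of what such a lookup table could look like; the table name, column names, and lengths other than manage_number are assumptions, not part of the question:

CREATE TABLE manage_number_index (
    manage_number VARCHAR(50) NOT NULL,  -- the record's PK (length is an assumption)
    table_name    VARCHAR(64) NOT NULL,  -- which of the 51 tables holds the record
    PRIMARY KEY (manage_number)
);

-- A daily update would first resolve the target table:
-- SELECT table_name FROM manage_number_index WHERE manage_number = 'TARGET_NUMBER';
-- and then run a single-table UPDATE against that table only.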
Option 2: Multi-table update query
This approach runs one update query against the 17 tables that share a schema. However, in this case the load from joining all the tables is expected to be very high, and I think it will put a burden on the server.
The update query might look like the one below.
UPDATE table1, table2, table3, ..., table17
SET data_here
WHERE manage_number = 'TARGET_NUMBER';
Please share which way is better, or let me know if you have a better way.
Thank you.
I have a database table that is around 700 GB with 1 billion rows; the data is approximately 500 GB and the index is 200 GB.
I am trying to delete all the data from before 2021.
There are roughly 298,970,576 rows from 2021, and 708,337,583 rows remaining.
To delete the old data I am running this query non-stop from my Python shell:
DELETE FROM table_name WHERE id < 1762163840 LIMIT 1000000;
The id 1762163840 marks the start of the 2021 data. Deleting 1 million rows takes almost 1,200-1,800 seconds.
Is there any way I can speed this up? The current approach has been running for more than 15 days, not much data has been deleted so far, and it is going to take many more days.
I thought that I could make a table containing just the ids of all the records I want to delete and then do an exact match like:
DELETE FROM table_name WHERE id IN (SELECT id FROM _tmp_table_name);
Will that be fast? Is it going to be faster than first making a new table with all the records to keep and then deleting the old one?
The database is set up on RDS; the instance class is db.r3.large (2 vCPU, 15.25 GB RAM), with only 4-5 connections running.
I would suggest recreating the data you want to keep -- if you have enough space:
create table keep_data as
select *
from table_name
where id >= 1762163840;
Then you can truncate the table and re-insert new data:
truncate table table_name;
insert into table_name
select *
from keep_data;
This will recreate the index.
The downside is that this will still take a while to re-insert the data (renaming keep_data would be faster). But it should be much faster than deleting the rows.
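A rough sketch of that rename variant, using the names above. Note that a table built with CREATE TABLE ... AS SELECT does not copy the original indexes, so they would need to be added to keep_data before the swap:

-- swap the rebuilt table into place, keeping the old one for safety
RENAME TABLE table_name TO table_name_old,
             keep_data  TO table_name;

-- DROP TABLE table_name_old;   -- once the swap has been verified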
AND . . . this will give you the opportunity to partition the table so future deletes can be handled much faster. You should look into table partitioning if you have such a large table.
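As a hypothetical illustration of that idea, assuming id ranges track time the way the question implies, range partitioning could look like this (the ALTER rebuilds the table once, but afterwards removing an old range is a metadata operation rather than a row-by-row delete):

ALTER TABLE table_name
PARTITION BY RANGE (id) (
    PARTITION p_before_2021 VALUES LESS THAN (1762163840),
    PARTITION p_current     VALUES LESS THAN MAXVALUE
);

-- Further partitions would be added over time; dropping an old one is nearly instant:
-- ALTER TABLE table_name DROP PARTITION p_before_2021;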
Multiple techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
It points out that LIMIT 1000000 is unnecessarily big and causes more locking than might be desirable.
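As a rough illustration of that point (the batch size here is just an assumption), the same delete can be run in much smaller chunks, repeated until it affects 0 rows:

DELETE FROM table_name
WHERE id < 1762163840
LIMIT 1000;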
In the long run, PARTITIONing would be beneficial; it mentions that as well.
If you do Gordon's technique (rebuilding the table with what you need), you lose access to the table for a long time; I provide an alternative that has essentially zero downtime.
id IN (SELECT...) can be terribly slow -- both because of the inefficiency of in-SELECT and due to the fact that DELETE will hang on to a huge number of rows for transactional integrity.
I have two big tables for example:
'tbl_items' and 'tbl_items_transactions'
The first table keeps item metadata (maybe 20 varchar columns, millions of rows), and the second table keeps each transaction against the first table.
For example, if a user inserts a new record into tbl_items, then a new record is automatically added to tbl_items_transactions with the same data plus a date, username, and transaction type, to keep a history of each row.
So in the above scenario the two tables have the same columns, except that tbl_items_transactions has 3 extra columns (date, username, transaction_type) to keep the tbl_items history.
Now assume we have 1,000 users who want to insert, update, and delete tbl_items records through a web application, so these two tables grow very quickly (maybe to a billion rows in tbl_items_transactions).
I have tried MySQL, MariaDB, and PostgreSQL. They are very good, but as the tables scale and millions of rows are inserted, they become slow when running some SELECT queries on tbl_items_transactions (although sometimes PostgreSQL is faster than MySQL or MariaDB).
Now I think I'm doing something wrong. If you were me, would you use MariaDB, PostgreSQL, or something like that, and would you structure your database the way I did?
Your setup is wrong.
You should not duplicate the columns from tbl_items in tbl_items_transactions, rather you should have a foreign key in the latter table pointing to the former.
That way data integrity is preserved, and tbl_items_transactions will be much smaller. This technique is called normalization.
To speed up queries when the tables get large, define indexes on them that match your WHERE and JOIN conditions.
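A minimal sketch of that layout, assuming MySQL/InnoDB; every column name except those taken from the question is illustrative:

CREATE TABLE tbl_items (
    item_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY
    -- ... the ~20 varchar metadata columns live here only, not in the history table
) ENGINE=InnoDB;

CREATE TABLE tbl_items_transactions (
    transaction_id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    item_id          BIGINT UNSIGNED NOT NULL,       -- foreign key instead of copied columns
    transaction_type VARCHAR(20)  NOT NULL,          -- insert / update / delete
    username         VARCHAR(64)  NOT NULL,
    transaction_date DATETIME     NOT NULL,
    CONSTRAINT fk_item FOREIGN KEY (item_id) REFERENCES tbl_items (item_id),
    KEY idx_item_date (item_id, transaction_date)    -- matches a typical WHERE/JOIN pattern
) ENGINE=InnoDB;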
I have to join 3 tables from 3 different databases.
I am fetching records from table1 every hour (around 100K records each hour). Using the key from table1, I then have to get records from table2, and using the key from table2, I have to fetch records from table3.
I am thinking of using an IN clause, but I am not sure whether that would be a good option.
Also, table2 and table3 have far fewer records and won't change frequently, so another option is to use a second-level cache to hold the records of those two tables and then filter them based on what is required for table1.
Should I use the join approach, or use a cache for the less frequently updated tables?
Please suggest.
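If all three databases live on the same MySQL server (an assumption; the question doesn't say), the join itself could be written across them with qualified names, roughly like this; the database, key, and column names are all hypothetical:

SELECT t1.*, t3.*
FROM db1.table1 AS t1
JOIN db2.table2 AS t2 ON t2.t1_key = t1.t1_key
JOIN db3.table3 AS t3 ON t3.t2_key = t2.t2_key
WHERE t1.created_at >= NOW() - INTERVAL 1 HOUR;   -- the hourly batch window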
A MySQL database has only one table, products. Each record contains 1 integer auto-increment primary key, 1 integer modified_time field, and 10 varchar fields.
Twice a day, cron launches a process that receives 700,000 XML records with new/updated/unchanged products from another server. Usually about 100,000 products a day are new or updated (the other 600,000 don't change).
The question is which way will be faster:
1) Do something like DROP TABLE and recreate the same table (or DELETE FROM products), then INSERT everything we receive (700K records).
2) Loop over every XML record, compare the modified_time field, and UPDATE if the XML modified_time is newer than the modified_time in the database (see the sketch after this question).
As I understand it, the second way will lead to 700K SELECT queries, 100K UPDATE queries, and some DELETE queries.
So which way will be faster? Thanks in advance.
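A rough sketch of what option 2 does per record; the :placeholders stand for values bound by the importing script, and the column names are assumptions:

-- 1) read the stored timestamp for this product
SELECT modified_time FROM products WHERE id = :id;

-- 2) only when the XML modified_time is newer, rewrite the row
UPDATE products
SET modified_time = :xml_modified_time,
    col1 = :col1   -- ... and the remaining varchar columns
WHERE id = :id;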
The scripts I've been working with in SQL handle close to 40,000 records, and I've noticed a huge increase in execution time when I use an UPDATE command.
For 2 tables that have about 10 fields each, the INSERTs for both combined execute quicker than this UPDATE command:
UPDATE table1
INNER JOIN table2 ON table1.primarykey = table2.primarykey
SET table1.code = table2.code
Really, all the UPDATE does is copy the code from one table to the other where identical records exist. Table1 is a staging table between 2 databases, while table2 is a processing table used to insert the staging table's data across multiple tables; both tables have the same number of records, about 40,000.
Now, to me the UPDATE should execute a lot quicker: considering it only joins 2 identical tables and sets data for 1 field, it should run quicker than 2 INSERTs where 40,000 records are created across 10 fields (in other words, inserting 800,000 pieces of data). I'm running the queries in a SQL console window to avoid PHP timeouts.
Is UPDATE somehow more resource-hungry than INSERT, and is there any way to get it to go faster? (Apart from changing the fact that I use a separate table for processing: the staging table updates frequently, so I copy the data like a snapshot and work with that. The code field is NULL to begin with, so I only copy over records with a NULL code, meaning records where code is not NULL have already been worked with.)
Is that UPDATE command the actual SQL? Because you need a WHERE clause to avoid updating every record in the table...
Also, INSERT doesn't first need to find the record to update from 2 joined tables.
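Putting that point together with the question's own rule (only rows whose code is still NULL need touching), the statement might look like this:

UPDATE table1
INNER JOIN table2 ON table1.primarykey = table2.primarykey
SET table1.code = table2.code
WHERE table1.code IS NULL;   -- skip rows that have already been worked with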