I have a MySQL database with a table that is populated with approximately 1.5 million rows of data that needs to be entirely refreshed every 5 minutes. The data is no longer needed once it is older than 5 minutes.
Getting the data into the table is no problem...I can populate it in approximately 50-70 seconds. Where I'm having some trouble is figuring out how to shift all the old data out and replace it with new data. I need to be able to run queries at any time across the entire data set. These queries need to run very fast and they must contain only data from one data set at a time (i.e., the query should not pull a combination of new and old data during the 1 minute that the table is being updated).
I do not have much experience working with large temporary data sets, so I would appreciate some advice on how best to solve this problem.
Create partitions. You can then populate one partition while users query from the other.
To do this manually you just need something like...
CREATE TABLE tbl0 (blah);
CREATE TABLE tbl1 (blah);
CREATE TABLE meta (combined_source INT);
INSERT INTO meta VALUES (0);

-- The view reads tbl0 while combined_source is even and tbl1 while it is odd
CREATE VIEW combined AS
SELECT * FROM tbl0 WHERE 0 = (SELECT combined_source FROM meta) % 2
UNION ALL
SELECT * FROM tbl1 WHERE 1 = (SELECT combined_source FROM meta) % 2;
Now you can insert new data into the 'inactive' table and it WON'T appear in the view.
Next, increment the value in meta. Immediately the view switches from showing data from one table to showing data from the other table.
On your next iteration you just check meta to determine which table to empty and load the new data into.
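For example, assuming meta currently holds an even value (so the view is reading tbl0 and tbl1 is the inactive table), one refresh cycle might look like this, with the data load left as a placeholder:
TRUNCATE TABLE tbl1;
INSERT INTO tbl1 SELECT blah FROM your_new_data;   -- load the fresh 5-minute data set
UPDATE meta SET combined_source = combined_source + 1;   -- readers switch to tbl1 immediately
On the following cycle the roles reverse and tbl0 is the table you truncate and reload.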
One benefit of this approach is that you don't even need to be within a transaction.
Related
I'm working on automating the process of building a database. This database needs daily updates after the initial build.
The database has 51 tables, divided into 3 schemas (17 tables in each), and holds a total of 20 million records; each record's PK is manage_number.
I need to update 2000~3000 records every day, but I don't know which method to use.
Make a table for PK indexing
This method adds a separate lookup table in the same database that maps each PK to the table it lives in, i.e., a table of metadata recording which table each manage_number is stored in. This is the approach currently applied. The problem is that the build now takes 5-6 times longer than before (it went from 2 minutes to 12 minutes).
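Roughly, the lookup table and the daily flow look like this (the table and column definitions here are simplified placeholders, not the real schema):
CREATE TABLE manage_number_location (
    manage_number VARCHAR(32) PRIMARY KEY,
    table_name    VARCHAR(64) NOT NULL
);
-- Daily update: find the owning table first, then update only that table
SELECT table_name FROM manage_number_location WHERE manage_number = 'TARGET_NUMBER';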
Multi-table update query
This runs one update against all 17 tables that share the same schema. However, in this case the load from referencing that many tables in the FROM clause is expected to be very high, and I think it will be a burden on the server.
The update query might look like the one below.
UPDATE table1, table2, table3, ..., table17
SET table1.data_here = ..., table2.data_here = ..., table17.data_here = ...
WHERE table1.manage_number = 'TARGET_NUMBER'
  AND table2.manage_number = table1.manage_number
  AND ... AND table17.manage_number = table1.manage_number;
Please share which way is better, or if you have a better way.
Thank you.
I have n (source) tables with the same structure that each have a few million rows. Each of these tables receives new data from a different source on a regular basis.
(Ex: sales tables. Each store has its own sales table. There are 1,000 stores selling hundreds of thousands of items each day. How would you combine those tables?)
I would like to merge them into one summary table. Changes to any of the source tables should be reflected in the summary, and changes to the summary should be reflected in the appropriate source table.
(Ex: when a new sale occurs, the summary table is updated. If a change to a sale is made in the summary table, it is reflected in the appropriate store table.)
I can see three solutions.
1. Create an event/trigger that would refresh my summary table at a given time or after an INSERT/UPDATE/DELETE.
Something like:
#Some event triggers this
TRUNCATE TABLE table_summary;
INSERT INTO table_summary
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
UNION ALL
SELECT * FROM tablen...
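For reference, the "event" here could be a MySQL scheduled event wrapping that refresh; the schedule is arbitrary and the event scheduler has to be enabled (a sketch, not a tested solution):
DELIMITER //
CREATE EVENT refresh_table_summary
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
    -- rebuild the summary from the source tables
    TRUNCATE TABLE table_summary;
    INSERT INTO table_summary
    SELECT * FROM table1
    UNION ALL
    SELECT * FROM table2;
END//
DELIMITER ;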
The downside here, I believe, is performance; I do not think I can afford to run this query every time there is an INSERT/UPDATE/DELETE on one of the tables.
2. Create a view.
CREATE VIEW table_summary AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;
#This query takes 90s to complete
Performance-wise, I have the same kind of problem as with solution #1.
3. Create INSERT/UPDATE/DELETE triggers for each table. That's a lot of triggers, and MySQL limits you to one trigger per event and timing on each table. I started down that path, but the amount of code scaffolding appears daunting and likely hard to maintain.
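For illustration, each source table would need something along these lines, plus matching UPDATE and DELETE triggers (the column names are placeholders):
DELIMITER //
CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
BEGIN
    -- mirror the new row into the summary table
    INSERT INTO table_summary (store_id, item_id, quantity, sold_at)
    VALUES (NEW.store_id, NEW.item_id, NEW.quantity, NEW.sold_at);
END//
DELIMITER ;
Multiply that by n source tables and three trigger events each, and the maintenance burden becomes clear.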
I am sure there's a better way I have not thought of.
I have a huge number of records in subscribers_table, about 4,200,000, and when I select one user by email it takes a long time. I don't want to delete those records; I need to work with separate tables like subscribers_table1, subscribers_table2, ..., subscribers_table42.
Now I need to define a procedure in MySQL that moves the whole of the data in subscribers_table into the separate tables subscribers_table1, subscribers_table2, ..., subscribers_table42.
The following is pseudocode:
table_number = 1

function table_to_migrate_into_separate_tables():
    // loop over subscribers_table in chunks of 100,000 records until the end of the table
    for every 100,000 records in subscribers_table:
        // create a table named (original name + table_number)
        Create table("subscribers_table" + table_number)
        // move only this chunk of 100,000 records into the newly created table
        Move 100,000 records to table("subscribers_table" + table_number)
        // increase the table number so each name is unique
        table_number++
        // if all records in subscribers_table have been migrated into separate tables, stop
        if subscribers_table has finished:
            break
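As a concrete starting point, that loop might translate into a MySQL stored procedure roughly like the following; the id column used for ordering, the dynamic SQL, and the copy-only behaviour are my assumptions, not a tested solution:
DELIMITER //
CREATE PROCEDURE table_to_migrate_into_separate_tables()
BEGIN
    DECLARE table_number INT DEFAULT 1;
    DECLARE moved_rows INT DEFAULT 0;
    DECLARE total_rows INT;
    SELECT COUNT(*) INTO total_rows FROM subscribers_table;

    WHILE moved_rows < total_rows DO
        -- create a table named (original name + table_number) with the same structure
        SET @ddl = CONCAT('CREATE TABLE subscribers_table', table_number,
                          ' LIKE subscribers_table');
        PREPARE stmt FROM @ddl; EXECUTE stmt; DEALLOCATE PREPARE stmt;

        -- copy the next chunk of 100,000 records into the new table
        -- (assumes an id column for a stable order; deleting the copied rows is left out)
        SET @ins = CONCAT('INSERT INTO subscribers_table', table_number,
                          ' SELECT * FROM subscribers_table ORDER BY id',
                          ' LIMIT 100000 OFFSET ', moved_rows);
        PREPARE stmt FROM @ins; EXECUTE stmt; DEALLOCATE PREPARE stmt;

        SET moved_rows = moved_rows + 100000;
        SET table_number = table_number + 1;
    END WHILE;
END//
DELIMITER ;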
This is called partitioning, and MySQL can do the job for you:
ALTER TABLE your_table PARTITION BY KEY(some_column_here) PARTITIONS 40;
However, 4M rows is not that large after all; perhaps all you lack is an index on email.
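If the slow part is looking up a single user by email, an index may be all that is needed (assuming the column is called email):
ALTER TABLE subscribers_table ADD INDEX idx_email (email);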
I have a table with about 35 million rows. Each row has about 35 integer values and one time value (last updated).
The table has two indexes:
Primary - uses two of the table's integer columns
Secondary - uses the first integer column from the primary key plus another integer column.
I would like to delete old records (about 20 million of them) according to the date field.
What is the fastest way:
1. Delete as is, according to the date field?
2. Create another index on the date field and then delete by date?
There will be a one-time deletion of a large portion of the data, followed by incremental weekly deletions of much smaller parts.
Is there another way to do it more efficiently?
It might be quicker to create a new table containing the rows you want to keep, drop the old table, and then rename the new table.
For the weekly deletions, an index on the date field would speed things up.
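A sketch of both suggestions; the table name, date column, and cutoffs are placeholders:
-- One-time purge: rebuild with only the rows to keep, then swap the tables
CREATE TABLE big_table_new LIKE big_table;
INSERT INTO big_table_new SELECT * FROM big_table WHERE last_updated >= NOW() - INTERVAL 90 DAY;
RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;
DROP TABLE big_table_old;
-- Weekly incremental deletes: an index on the date column keeps them cheap
ALTER TABLE big_table ADD INDEX idx_last_updated (last_updated);
DELETE FROM big_table WHERE last_updated < NOW() - INTERVAL 90 DAY;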
The fastest (but not easiest) way, I think, is to keep your records segmented into multiple tables based on date, e.g. one per week, and then have a union table over all of those tables for the regular queries across the whole data set (so your queries would be unaltered). Each week you would create the new tables and redefine the union table.
When you wish to drop old records, you simply recreate the union table to leave the old tables out, and then drop the tables that were left out (remember to truncate before you drop, depending on your filesystem). This is probably the fastest way to get there with MySQL.
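A sketch using a view as the union table (the weekly table names are invented for the example):
CREATE OR REPLACE VIEW all_records AS
SELECT * FROM records_week_10
UNION ALL SELECT * FROM records_week_11
UNION ALL SELECT * FROM records_week_12;
-- To expire week 10: redefine the view without it, then drop the old table
CREATE OR REPLACE VIEW all_records AS
SELECT * FROM records_week_11
UNION ALL SELECT * FROM records_week_12;
DROP TABLE records_week_10;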
A mess to manage though :)
I have a table with:
location_id (location) - ranges from 0-90,000
time (time_period) - ranges from 1-15 for each location
temperature (temp) - A unique value for each location x time_period.
Example data:
location  time_period  temp
91        1            -4
91        2            3
91        3            12
...       ...          ...
91        15           20
I'd like to create a new field called cum_temp and store the cumulative value for each row up to the current time_period. My current thought is to duplicate the table and run:
UPDATE site_a
SET cum_temp = (SELECT SUM(temp)
                FROM site_a_copy
                WHERE site_a_copy.location = site_a.location
                  AND site_a_copy.time_period <= site_a.time_period);
Is this the most efficient way to do this or can you suggest something better?
Note that you don't need a separate copy of the table unless you don't want to store the cumulative temperature in the original.
Running a query periodically or manually will work (though do it from a stored procedure), or you can try to use a trigger to do it automatically:
CREATE TRIGGER sum_temp
BEFORE INSERT ON site
FOR EACH ROW
SET NEW.cum_temp = NEW.temp + IFNULL(
    (SELECT SUM(temp) FROM site
     WHERE site.location = NEW.location
       AND site.time_period < NEW.time_period),
    0);
However, triggers for this application are fraught with peril. For one, the above won't update any rows that are temporally after the new row. As long as you only append new records, this isn't a serious problem. Getting a trigger to update later rows is basically impossible, as MySQL does not allow a trigger to modify the table it is defined on, so you cannot issue UPDATEs against the same table within INSERT and UPDATE triggers. Storing the cumulative temperature in another table would get around this.
Second, when inserting multiple values with a single INSERT statement, the new rows must be in temporal order within the statement, or the cumulative temperature will be set incorrectly: rows that are earlier in time but appear later in the INSERT won't be taken into account when calculating the cumulative temperature of rows that are later in time but appear earlier in the INSERT. This is a consequence of the first issue.
You could also use a view:
CREATE VIEW site_a AS
SELECT s.location, s.time_period, s.temp, SUM(t.temp) AS cum_temp
FROM site AS s
JOIN site AS t
  ON s.location = t.location AND s.time_period >= t.time_period
GROUP BY s.location, s.time_period, s.temp;
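For example, pulling the cumulative series for one location from the sample data above:
SELECT location, time_period, temp, cum_temp
FROM site_a
WHERE location = 91
ORDER BY time_period;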
See also "UPDATE TABLE WITH SUM"