Is there any way to load data into a specific partition of a partitioned table? I am looking for something like 'insert into a_partitioned_table.partition....'.
That way I can avoid the overhead of SQL Server scanning for the appropriate partition.
Thanks
You can write the query like this:
insert into target_table
partition (partitions_column_name of target_table)
select * from another_table where partition='888888';
Related
In an indexed query, why is a SELECT so much faster than an UPDATE or INSERT in SQL?
Put simply, a SELECT only reads data that has already been written. An UPDATE or INSERT has to write the data to the pages and also update the indexes, so it has to traverse every index on the affected tables.
Additionally (credit to obe), SELECT queries can take advantage of the cache if the data was touched by a prior query. The server then does not need to go back to the original data pages or indexes to re-pull the data.
I have a MySQL 8 RDS (innodb) instance that I am trying to insert / update to.
The target table contains approx. 120m rows and I am trying to insert 2.5m rows into it from a CSV. Some of the data in the source may already exist in the target table, which is constrained by a primary key; in that case the existing row should be updated.
Having done some research, I have found that the quickest way seems to be to bulk load the CSV into a temporary table and then run:
insert into target_table
select col1, col2 from source_table a
on duplicate key update col1 = a.col1, col2 = a.col2;
However this seems to be taking hours.
Is there a best practice to optimise inserts of this sort?
Would it be quicker to split the work into separate inserts and updates? Can I disable indexes on the target table (I know this is possible for MyISAM)?
Thanks
I need to "update" some table data I receive from an external source (every time I receive "all" the data, with some fields of some records updated).
There's no unique field or combination of fields, so I figured the best way would be to wipe out all the data from the DB each time and write all the (now updated) data in again. There are up to 1000 records (there will never be more than that), with about 15 short fields each: text, numbers, datetime. And I'm writing to a remote DB (so it's slow).
Currently I'm doing:
delete from `table` where `date_dt` > ?
and then for each row
INSERT INTO `table` ( `field_0`,`field_1`,... ) VALUES (?,?,...)
It's not only slow, but it's possible that the end user may not see the complete data while I'm still inserting.
I figured I could do:
CREATE TEMPORARY TABLE `temp_table` ( ... ); -- same structure as in main table
INSERT INTO `temp_table` ( `field_0`,`field_1`,... ) VALUES (?,?,...) -- repeat 1000x
START TRANSACTION;
DELETE FROM `table`;
INSERT INTO `table` SELECT * FROM `temp_table`;
DROP TEMPORARY TABLE `temp_table`;
COMMIT;
Does this make any sense? What is a better way of solving this?
The speed of filling up the temp table with data is not crucial, but filling the main table with data is (so users don't see incomplete data, or the period of time they do is minimal).
mysqlimport --delete will truncate the table first, and then load your external data from a CSV file. It runs many times faster than doing INSERT one row at a time.
See https://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html
I did a presentation in April 2017 about performance of bulk data loads for MySQL:
https://www.slideshare.net/billkarwin/load-data-fast
P.S.: Don't use the temp table solution if you have a MySQL replication environment. This is a well-known way of breaking replication. If the slave restarts in between your creation of the temp table and the INSERT...SELECT that reads from the temp table, then the slave will find the temp table is gone, and this will result in an error and stop replication. This might seem unlikely, but it does happen eventually.
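For the "users should never see incomplete data" part of the question, here is a minimal sketch of doing the delete and the re-insert inside one transaction, without a temp table. It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; the `items` table, its two columns, and both function names are made up for illustration. On MySQL the same pattern assumes a transactional engine such as InnoDB.

```python
import sqlite3


def replace_all_rows(conn, rows):
    """Replace the whole contents of the hypothetical `items` table atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM items")
        # executemany reuses one prepared statement for every row
        conn.executemany("INSERT INTO items (id, label) VALUES (?, ?)", rows)


def demo_connection():
    """Build an in-memory database holding one 'old' row (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
    with conn:
        conn.execute("INSERT INTO items (id, label) VALUES (1, 'old')")
    return conn
```

If any insert fails, the rollback also undoes the DELETE, so the previous data stays in place and readers only ever see the old set or the new set.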
I've created a view on a partitioned table. When I filter on the partitioning column in a SELECT against the view, EXPLAIN shows that the optimizer does not restrict itself to the matching partition.
Is there any way to make the view access a single partition of its table?
[Edit] : Here is how I created the view on two partitioned tables
CREATE TABLE Partition1 (ID INT,NAME VARCHAR(100),DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE TABLE NOPART (ID INT,DOB DATE)
PARTITION BY LIST (YEAR(DOB))
(
PARTITION P_2000 VALUES IN (2000),
PARTITION P_2001 VALUES IN (2001)
);
CREATE OR REPLACE VIEW P_VIEW
AS
SELECT ID,DOB
FROM Partition1
UNION
SELECT ID,DOB
FROM NOPART;
EXPLAIN
SELECT * FROM P_VIEW
WHERE DOB = '2001-01-01';
When I run the EXPLAIN, it shows the optimizer is going to both partitions, "p_2000" and "p_2001".
There are many deficiencies in the implementation of VIEWs. You may have hit one.
There are many uses of PARTITIONing that do not provide any performance benefit. BY RANGE is probably the only variant that helps performance for some use cases. A table with less than a million rows is not worth partitioning.
Without seeing your CREATE TABLE, CREATE VIEW, and SELECT, we can only give you vague answers like I have.
(Responding to added code) Unless there is more to it than that, PARTITIONing in that way provides no benefit over having an index on DOB.
Furthermore, the VIEW + PARTITION approach (without an index) must scan the entire 2001 partition looking for the few rows for '2001-01-01'. The simple index approach can instead find them immediately -- 365 times as fast. (OK, not really that much faster, but still.)
I want to keep the last 45 days of log data in a MySQL table for statistical reporting purposes. Each day could be 20-30 million rows. I'm planning on creating a flat file and using LOAD DATA INFILE to get the data in there each day. Ideally I'd like to have each day in its own partition without having to write a script to create a partition every day.
Is there a way in MySQL to just say each day gets its own partition automatically?
thanks
I would strongly suggest using Redis or Cassandra rather than MySQL to store high traffic data such as logs. Then you could stream it all day long rather than doing daily imports.
You can read more on those two (and more) in this comparison of "NoSQL" databases.
If you insist on MySQL, I think the easiest would be to just create a new table per day, like logs_2011_01_13, and then load it all in there. It makes dropping older dates very easy, and you could also easily move different tables to different servers.
er.., number them in Mod 45 with a composite key and cycle through them...
Seriously 1 table per day was a valid suggestion, and since it is static data I would create packed MyISAM, depending upon my host's ability to sort.
Building queries to union some or all of them would be only moderately challenging.
1 table per day, and partition those to improve load performance.
Yes, you can partition MySQL tables by date:
CREATE TABLE ExampleTable (
id INT AUTO_INCREMENT,
d DATE,
PRIMARY KEY (id, d)
) PARTITION BY RANGE COLUMNS(d) (
PARTITION p1 VALUES LESS THAN ('2014-01-01'),
PARTITION p2 VALUES LESS THAN ('2014-01-02'),
PARTITION pN VALUES LESS THAN (MAXVALUE)
);
Later, when you get close to overflowing into partition pN, you can split it:
ALTER TABLE ExampleTable REORGANIZE PARTITION pN INTO (
PARTITION p3 VALUES LESS THAN ('2014-01-03'),
PARTITION pN VALUES LESS THAN (MAXVALUE)
);
This doesn't automatically partition by date, but you can reorganize when you need to. Best to reorganize before you fill the last partition, so the operation will be quick.
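Since the reorganize step has to be repeated, a small script can generate the statements. The following is only a sketch: the function names, the pYYYYMMDD partition naming, and the 45-day default are assumptions made for illustration (the example above uses p1, p2, ...), and it just builds SQL strings without running them.

```python
from datetime import date, timedelta


def reorganize_statement(table, first_day, days, catch_all="pN"):
    """Build an ALTER TABLE that splits the catch-all partition into daily ones."""
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        upper = day + timedelta(days=1)  # each partition holds dates < next day
        parts.append(
            f"  PARTITION p{day:%Y%m%d} VALUES LESS THAN ('{upper.isoformat()}')"
        )
    parts.append(f"  PARTITION {catch_all} VALUES LESS THAN (MAXVALUE)")
    body = ",\n".join(parts)
    return f"ALTER TABLE {table} REORGANIZE PARTITION {catch_all} INTO (\n{body}\n);"


def drop_statement(table, today, keep_days=45):
    """Build an ALTER TABLE that drops the partition that just left the window."""
    expired = today - timedelta(days=keep_days)
    return f"ALTER TABLE {table} DROP PARTITION p{expired:%Y%m%d};"
```

Calling reorganize_statement("ExampleTable", date(2014, 1, 2), 7) from a daily or weekly scheduled job would prepare a week of partitions ahead of time, and drop_statement covers the "keep only the last 45 days" part of the question, provided the partitions were created with the same naming scheme.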
I have stumbled on this question while looking for something else and wanted to point out the MERGE storage engine (http://dev.mysql.com/doc/refman/5.7/en/merge-storage-engine.html).
The MERGE storage engine is more or less a simple pointer to multiple tables, and a MERGE table can be redone in seconds. For cycling logs, it can be very powerful! Here's what I'd do:
Create one table per day and use LOAD DATA, as the OP mentioned, to fill it up. Once that is done, drop the MERGE table and recreate it, including the new table while omitting the oldest one. After that, I could delete or archive the old table. This would allow me to rapidly query a specific day, or all of them, as both the original tables and the MERGE table are valid.
CREATE TABLE logs_day_46 LIKE logs_day_45; -- LIKE copies the MyISAM engine; it takes no ENGINE clause
DROP TABLE IF EXISTS logs;
CREATE TABLE logs ( ... ) ENGINE=MERGE UNION=(logs_day_2,[...],logs_day_46); -- same column definitions as the daily tables
DROP TABLE logs_day_1;
Note that a MERGE table is not the same as a partitioned one, and each has its own advantages and drawbacks. Do remember that if you aggregate across all the tables, it will be slower than if all the data were in a single table (the same is true for partitions, as they are basically separate tables under the hood). If you are mostly going to query specific days, with MERGE you will need to choose the table yourself, whereas if the partitions are defined on the day values, MySQL will automatically pick the correct partition(s), which might come out faster and is easier to write.
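The daily rotation can be scripted. Below is a rough sketch that only builds the SQL strings; the function name and its parameters are hypothetical. Note that it swaps in ALTER TABLE ... UNION=(...), which the MERGE engine documentation lists as an alternative to dropping and recreating the MERGE table, so it differs slightly from the statements above.

```python
def rotate_merge(newest_day, window=45, merge_table="logs", prefix="logs_day_"):
    """Return the statements that roll the MERGE window forward by one day."""
    oldest_kept = newest_day - window + 1
    members = ", ".join(
        f"{prefix}{n}" for n in range(oldest_kept, newest_day + 1)
    )
    return [
        # New empty daily table with the same definition as yesterday's;
        # the LOAD DATA step would run right after this statement.
        f"CREATE TABLE {prefix}{newest_day} LIKE {prefix}{newest_day - 1};",
        # Repoint the MERGE table at the current window of daily tables.
        f"ALTER TABLE {merge_table} UNION=({members});",
        # The table that just fell out of the window can be dropped or archived.
        f"DROP TABLE {prefix}{oldest_kept - 1};",
    ]
```

With the defaults, rotate_merge(46) keeps logs_day_2 through logs_day_46 in the MERGE table and drops logs_day_1, matching the example above.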