MySQL partitioning and temporary tables

A large table (~10.5M rows) has been causing issues lately. I previously modified my application to use temporary tables for faster selects, but was still having issues due to UPDATE statements. Today I implemented partitions so that writes happen more quickly, but now my temporary-table creation fails. The UPDATE's purpose is to group events, placing the first event ID of a set in the EVENT_ID column. Example: writing 4 events beginning at 1000 would result in events 1000, 1001, 1002, 1003, all with an EVENT_ID of 1000. I have tried to do away with the UPDATE statements, but that would require too much refactoring, so it is not an option. Here is the table definition:
CREATE TABLE `all_events` (
`ID` bigint NOT NULL AUTO_INCREMENT,
`EVENT_ID` bigint unsigned DEFAULT NULL,
`LAST_UPDATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`EMPLOYEE_ID` int unsigned NOT NULL,
`QUANTITY` float unsigned NOT NULL,
`OPERATORS` float unsigned NOT NULL DEFAULT '0',
`SECSEARNED` decimal(10,2) unsigned NOT NULL DEFAULT '0.00' COMMENT 'for all parts in QUANTITY',
`SECSBURNED` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`YR` smallint unsigned NOT NULL DEFAULT (year(curdate())),
PRIMARY KEY (`ID`,`YR`),
KEY `LAST_UPDATE` (`LAST_UPDATE`),
KEY `EMPLOYEE_ID` (`EMPLOYEE_ID`),
KEY `EVENT_ID` (`EVENT_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=17464583 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
/*!50100 PARTITION BY RANGE (`YR`)
(PARTITION p2015 VALUES LESS THAN (2016) ENGINE = InnoDB,
PARTITION p2016 VALUES LESS THAN (2017) ENGINE = InnoDB,
PARTITION p2017 VALUES LESS THAN (2018) ENGINE = InnoDB,
PARTITION p2018 VALUES LESS THAN (2019) ENGINE = InnoDB,
PARTITION p2019 VALUES LESS THAN (2020) ENGINE = InnoDB,
PARTITION p2020 VALUES LESS THAN (2021) ENGINE = InnoDB,
PARTITION p2021 VALUES LESS THAN (2022) ENGINE = InnoDB,
PARTITION p2022 VALUES LESS THAN (2023) ENGINE = InnoDB,
PARTITION p2023 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Now, when my application runs a report, the statement:
CREATE TEMPORARY TABLE IF NOT EXISTS ape ENGINE=MEMORY AS
SELECT * FROM all_events
WHERE LAST_UPDATE BETWEEN '2022-05-01 00:00:00' AND CURRENT_TIMESTAMP()
Produces the error: 'Specified storage engine' is not supported for default value expressions.
Is there a way to still use temporary tables with ENGINE=MEMORY, or is there another high-performance engine I can use? The statement worked until the partitioning was implemented. InnoDB is the only engine my base tables can be in due to the MySQL implementation, and they have been InnoDB since before partitioning.
Edit: When I remove ENGINE=MEMORY it does work, but SHOW CREATE TABLE tells me the temporary table is using InnoDB. I would prefer the performance increase of MEMORY over InnoDB.
Second edit:
The MySQL server has been crashing 2 to 3 times daily, and every time I catch it I find this error:
TRANSACTION 795211228, ACTIVE 0 sec fetching rows
mysql tables in use 13, locked 13
LOCK WAIT 866 lock struct(s), heap size 106704, 4800 row lock(s), undo log entries 1
MySQL thread id 5032986, OS thread handle 140442167994112, query id 141216988 myserver 192.168.1.100 my-user Searching rows for update
UPDATE `all_events` SET `EVENT_ID`=LAST_INSERT_ID() WHERE `EVENT_ID` IS NULL
RECORD LOCKS space id 30558 page no 16 n bits 792 index EVENT_ID of table `mydb`.`all_events` trx id 795211228 lock_mode X
It's running a 3-node Galera Cluster. Node 3 is the primary; it becomes unavailable, and node 1 goes offline to resync node 3. I fail over to node 2 and we're usually good until it catches up, but it's causing downtime. The temp tables I'm using are for faster reads; the partitioning is my attempt at improving write performance.
Third edit:
Added an example SELECT. Note there are fields in it that are not in the table definition; I reduced what was displayed for simplicity of the post, but all fields in the SELECT do in fact exist.
CREATE TEMPORARY TABLE IF NOT EXISTS allpe AS
SELECT * FROM all_events
WHERE LAST_UPDATE BETWEEN ? AND ?;
CREATE TEMPORARY TABLE IF NOT EXISTS ap1 AS SELECT * FROM allpe;
CREATE TEMPORARY TABLE IF NOT EXISTS ap2 AS SELECT * FROM allpe;
SELECT PART_NUMBER, WORKCENTER_NAME, SUM(SECSEARNED) AS EARNED, SUM(SECSBURNED) AS BURNED, SUM(QUANTITY) AS QUANTITY, (
SELECT SUM(ap1.SECSEARNED)
FROM ap1
WHERE ap1.PART_NUMBER = ape.PART_NUMBER AND ap1.WORKCENTER_ID = ape.WORKCENTER_ID
) AS EARNEDALL, (
SELECT SUM(ap2.SECSBURNED)
FROM ap2
WHERE ap2.PART_NUMBER = ape.PART_NUMBER AND ap2.WORKCENTER_ID = ape.WORKCENTER_ID
) AS BURNEDALL
FROM allpe ape
WHERE EMPLOYEE_ID = ?
GROUP BY PART_NUMBER, WORKCENTER_ID, WORKCENTER_NAME, EMPLOYEE_ID
ORDER BY EARNED;
DROP TEMPORARY TABLE allpe;
DROP TEMPORARY TABLE ap1;
DROP TEMPORARY TABLE ap2;
Fourth edit:
Writing happens inside a stored procedure. This is not in a loop, but multiple rows can come from the join to employee_presence, so I cannot get the ID and store it for writing subsequent rows.
INSERT INTO `all_events`
  (`EVENT_ID`, `LAST_UPDATE`, `PART_NUMBER`, `WORKCENTER_ID`, `XPPS_WC`, `EMPLOYEE_ID`,
   `WORKCENTER_NAME`, `QUANTITY`, `LEVEL_PART_NUMBER`, `OPERATORS`, `SECSEARNED`, `SECSBURNED`)
SELECT NULL, NOW(), NEW.PART_NUMBER, NEW.ID, OLD.XPPS_WC, ep.EMPLOYEE_ID, NEW.NAME,
       (NEW.PARTS_MADE - OLD.PARTS_MADE) * WorkerContrib(ep.EMPLOYEE_ID, OLD.ID),
       IFNULL(NEW.LEVEL_PART_NUMBER, NEW.PART_NUMBER),
       WorkerCount(NEW.ID) * WorkerContrib(ep.EMPLOYEE_ID, OLD.ID),
       WorkerContrib(ep.EMPLOYEE_ID, OLD.ID) * CreditSeconds,
       WorkerCount(NEW.ID) * WorkerContrib(ep.EMPLOYEE_ID, OLD.ID)
         * IFNULL(TIMESTAMPDIFF(SECOND, GREATEST(NEW.LAST_PART_TIME, NEW.JOB_START_TIME), NOW()), 0)
FROM employee_presence ep
WHERE ep.WORKCENTER_ID = OLD.ID;
UPDATE `all_events` SET `EVENT_ID`=LAST_INSERT_ID() WHERE `WORKCENTER_ID`=NEW.ID AND `EVENT_ID` IS NULL;

I would suggest reading the following from the dev.mysql.com documentation:
You cannot use CREATE TEMPORARY TABLE ... LIKE to create an empty
table based on the definition of a table that resides in the mysql
tablespace, InnoDB system tablespace (innodb_system), or a general
tablespace. The tablespace definition for such a table includes a
TABLESPACE attribute that defines the tablespace where the table
resides, and the aforementioned tablespaces do not support temporary
tables. To create a temporary table based on the definition of such a
table, use this syntax instead:
CREATE TEMPORARY TABLE new_tbl SELECT * FROM orig_tbl LIMIT 0;
So it seems the correct syntax for your case will be:
CREATE TEMPORARY TABLE ape
SELECT * FROM all_events
WHERE...

In the current issue the problematic column is YR smallint unsigned NOT NULL DEFAULT (year(curdate())). This DEFAULT value is not legal for a column that is used in a partitioning expression; the error would be "Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed ...".
Only once you fix that by removing the partitioning will you receive the error "'Specified storage engine' is not supported for default value expressions".
CREATE TABLE ... SELECT inherits the main column properties from the source tables.
Here the problematic column is YR smallint unsigned NOT NULL DEFAULT (year(curdate())) again: the column in the temporary table inherits its main properties, including the DEFAULT expression, but that expression is not allowed under the MEMORY engine.
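One workaround consistent with this explanation is to avoid inheriting YR at all: list the columns the report needs explicitly, so the expression DEFAULT is never copied into the MEMORY table. A minimal sketch (the column list is an assumption; keep whichever columns you actually use):
CREATE TEMPORARY TABLE IF NOT EXISTS ape ENGINE=MEMORY AS
SELECT ID, EVENT_ID, LAST_UPDATE, EMPLOYEE_ID,
       QUANTITY, OPERATORS, SECSEARNED, SECSBURNED   -- no YR, so no expression DEFAULT is inherited
FROM all_events
WHERE LAST_UPDATE BETWEEN '2022-05-01 00:00:00' AND CURRENT_TIMESTAMP();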

As the error suggests, the expression default does not work with the MEMORY storage engine.
One solution would be to remove that default from your all_events.yr column.
The other solution is to create an empty temporary table initially as an InnoDB table, then use ALTER TABLE to remove the expression default and convert to MEMORY engine before filling it with data.
Example:
mysql> create temporary table t as select * from all_events where false;
mysql> alter table t alter column yr drop default, engine=memory;
mysql> insert into t select * from all_events;

Sufficient? If I am not mistaken, this is equivalent to what your SELECT finds (no temp tables needed):
SELECT PART_NUMBER, WORKCENTER_ID, WORKCENTER_NAME, EMPLOYEE_ID,
SUM(SECSEARNED) AS TOT_EARNED,
SUM(SECSBURNED) AS TOT_BURNED,
SUM(QUANTITY) AS TOT_QUANTITY
FROM all_events
WHERE EMPLOYEE_ID = ?
AND LAST_UPDATE >= '2022-05-01'
GROUP BY PART_NUMBER, WORKCENTER_ID, WORKCENTER_NAME;
For performance, it would need this index:
INDEX(EMPLOYEE_ID, LAST_UPDATE)
Also, removing the partitioning might speed it up a little more.
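For reference, a sketch of adding that index (the index name idx_emp_update is mine):
ALTER TABLE all_events
  ADD INDEX idx_emp_update (EMPLOYEE_ID, LAST_UPDATE);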
Otherwise, notes on other fixes to the path you have taken:
Since YR is not needed, avoid it by changing `*` to a list of the needed columns in
CREATE TEMPORARY TABLE IF NOT EXISTS ape ENGINE=MEMORY AS
SELECT * FROM all_events
WHERE LAST_UPDATE BETWEEN '2022-05-01 00:00:00' AND CURRENT_TIMESTAMP()
For the correlated subqueries, e.g.
WHERE ap2.PART_NUMBER = ape.PART_NUMBER AND ap2.WORKCENTER_ID = ape.WORKCENTER_ID
add this composite index to all_events:
INDEX(PART_NUMBER, WORKCENTER_ID)
That will probably suffice to make the query fast enough without the temp tables.
Also add that index to `allpe` after building it.
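A sketch of both additions (index names are mine):
ALTER TABLE all_events ADD INDEX idx_part_wc (PART_NUMBER, WORKCENTER_ID);
-- and, once the temp table has been built:
ALTER TABLE allpe ADD INDEX idx_part_wc (PART_NUMBER, WORKCENTER_ID);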
If you are running MySQL 8.0, you can use WITH instead of needing the two extra temp tables.
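A sketch of that rewrite (MySQL 8.0+, same columns and placeholders as the posted statements; unlike a temporary table, a CTE may be referenced several times within one statement, which is exactly what forced the ap1/ap2 copies):
WITH allpe AS (
  SELECT * FROM all_events
  WHERE LAST_UPDATE BETWEEN ? AND ?
)
SELECT PART_NUMBER, WORKCENTER_NAME,
       SUM(SECSEARNED) AS EARNED, SUM(SECSBURNED) AS BURNED, SUM(QUANTITY) AS QUANTITY,
       ( SELECT SUM(a1.SECSEARNED) FROM allpe a1
         WHERE a1.PART_NUMBER = ape.PART_NUMBER AND a1.WORKCENTER_ID = ape.WORKCENTER_ID ) AS EARNEDALL,
       ( SELECT SUM(a2.SECSBURNED) FROM allpe a2
         WHERE a2.PART_NUMBER = ape.PART_NUMBER AND a2.WORKCENTER_ID = ape.WORKCENTER_ID ) AS BURNEDALL
FROM allpe ape
WHERE EMPLOYEE_ID = ?
GROUP BY PART_NUMBER, WORKCENTER_ID, WORKCENTER_NAME, EMPLOYEE_ID
ORDER BY EARNED;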

Related

Does InnoDB use ROW locks when using UPDATE without a where clause

I have a question about locks on InnoDB tables. If I understand the documentation correctly, InnoDB uses row locks when performing an UPDATE statement.
I have a table like this:
CREATE TABLE `table_1` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`info` TEXT NULL,
`deactivated` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;
Due to the size of the table this statement:
UPDATE tbl_name set deactivated = 1
can take up to 60 seconds.
During that time it happens that other processes want to INSERT another row into table_1.
Those processes have to wait for the UPDATE to be finished as the table is using the MyISAM engine which locks the whole table.
Now for a solution I'd imagine I just have to change the DB engine to InnoDB to enable row locks rather than table locks.
My question now is: do I have to specify a WHERE clause in the UPDATE statement to force InnoDB to use row locks? Like so:
UPDATE tbl_name set deactivated = 1 WHERE deactivated IS NULL
Or does InnoDB handle the UPDATE on a row basis regardless of the existence of a WHERE clause?
Also, due to the table having an auto_increment column, does that change the way InnoDB handles locks?

mysql drop partition does not work

I have created a main partition 20180621 and 24 subpartitions
20180621_0 .. 20180621_23
Now I would like to delete the main partition, but I get an error:
alter table VAL90W02 drop PARTITION `20180621`
#1508 - Cannot remove all partitions, use DROP TABLE instead.
I can't drop the subpartitions either. So how do I drop the partition?
(from Comment)
create table mytable (
id int(11) NOT NULL AUTO_INCREMENT,
...,
x_date datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id, x_date)
) ENGINE = MYISAM
PARTITION BY RANGE (day(x_date))
SUBPARTITION BY HASH (hour(x_date))
( PARTITION 20180621 VALUES LESS THAN (24)
( SUBPARTITION 20180621_0 ENGINE = MyISAM,
SUBPARTITION 20180621_1 ENGINE = MyISAM, ...)
), ...;
Irritatingly, when deleting the last partition of a partitioned table, you have to use
ALTER TABLE VAL90W02 REMOVE PARTITIONING;
instead.
This is a misleading error thrown by MySQL (I'm using 5.7 Aurora; not sure which other versions this affects).
Arguably, it's a failure of MySQL to handle this edge case in the ALTER TABLE ... DROP PARTITION command.
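If the goal was only to empty that partition while keeping the table partitioned, ALTER TABLE ... TRUNCATE PARTITION may be worth testing as an alternative; it deletes the rows but keeps the partition definition:
-- removes all rows from the named partition without dropping it
ALTER TABLE VAL90W02 TRUNCATE PARTITION `20180621`;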

MySQL - Hash Partition not working

I am using MySQL 5.6 Server. I created a table with HASH subpartitioning, but somehow I am unable to get my query to use specific partitions.
Table Structure
CREATE TABLE `testtable` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`purchased` DATE DEFAULT NULL,
KEY `id` (`id`),
KEY `Purchased` (`purchased`)
) ENGINE=INNODB
/*!50100 PARTITION BY RANGE ( YEAR(purchased))
SUBPARTITION BY HASH ( dayofyear(purchased))
SUBPARTITIONS 366
(PARTITION p0 VALUES LESS THAN (2015) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (2016) ENGINE = InnoDB) */
My Query
EXPLAIN PARTITIONS
SELECT *
FROM testtable
WHERE purchased BETWEEN '2014-12-29' AND '2014-12-31';
My EXPLAIN plan tells me that the server is using all partitions instead of specific ones.
How can I write the query so that the server scans only the relevant partitions?
I would also like to know what the problem with my current query is and why it is not working.
Thanks in advance...
True. HASH partitioning is essentially useless; in particular, the optimizer will not prune the DAYOFYEAR hash subpartitions for a date-range condition like yours, so every subpartition gets checked.
Other things to note...
Having more than about 50 partitions leads to certain inefficiencies.
If you will be purging "old" rows, then consider BY RANGE and have a month in each partition. Then do the purging via DROP PARTITION. More details, including sample code: http://mysql.rjweb.org/doc.php/partitionmaint
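A minimal sketch of that suggestion applied to this table (the table name, partition names, and boundaries are mine; TO_DAYS is one of the expressions the optimizer can prune on):
CREATE TABLE testtable_by_month (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  purchased DATE NOT NULL,
  PRIMARY KEY (id, purchased),   -- the partition column must appear in every unique key
  KEY purchased (purchased)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(purchased)) (
  PARTITION p201411 VALUES LESS THAN (TO_DAYS('2014-12-01')),
  PARTITION p201412 VALUES LESS THAN (TO_DAYS('2015-01-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);
-- purging a whole month is then a cheap metadata operation:
ALTER TABLE testtable_by_month DROP PARTITION p201411;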

Maintaining large quantities of historical data efficiently

I've been thinking about keeping a history in the following table structure:
`id` bigint unsigned not null auto_increment,
`userid` bigint unsigned not null,
`date` date not null,
`points_earned` int unsigned not null,
primary key (`id`),
key `userid` (`userid`),
key `date` (`date`)
This will allow me to do something like SO does with its Reputation Graph (where I can see my rep gain since I joined the site).
Here's the problem, though: I just ran a simple calculation:
SELECT SUM(DATEDIFF(`lastclick`,`registered`)) FROM `users`
The result was, as near as makes no difference, 25,000,000 man-days. If I intend to keep one row per user per day, that's a [expletive]ing large table, and I'm expecting further growth. Even if I exclude days where a user doesn't come online, that's still huge.
Can anyone offer any advice on maintaining such a large amount of data? The only queries that will be run on this table are:
SELECT * FROM `history` WHERE `userid`=?
SELECT SUM(`points_earned`) FROM `history` WHERE `userid`=? AND `date`>?
INSERT INTO `history` VALUES (null,?,?,?)
Would the ARCHIVE engine be of any use here, for instance? Or do I just not need to worry because of the indexes?
Assuming it's MySQL:
For history tables you should consider partitioning. You can set the partition rule that fits you best, and looking at the queries you have, there are 2 choices:
a. partition by date (1 partition = 1 month, for example) - see the sketch after this answer
b. partition by user (say you have 300 partitions and 1 partition = 100,000 users)
This will help you a lot if you use partition pruning.
You could use a composite index on (user, date); it will be used by the first 2 queries.
Avoid row-by-row INSERT statements; when you have huge data, use LOAD DATA (though this will not work if the table is partitioned).
And most important: the best engine for huge volumes of data is MyISAM.
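A sketch of choice (a) combined with the composite index (partition boundaries are illustrative; the engine clause is omitted so you can test the MyISAM suggestion above against InnoDB yourself):
CREATE TABLE `history` (
  `id`            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `userid`        BIGINT UNSIGNED NOT NULL,
  `date`          DATE NOT NULL,
  `points_earned` INT UNSIGNED NOT NULL,
  PRIMARY KEY (`id`, `date`),           -- partition column must be part of the primary key
  KEY `userid_date` (`userid`, `date`)  -- serves both SELECT queries
)
PARTITION BY RANGE (TO_DAYS(`date`)) (
  PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
  PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);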

Mysql 5.5 Table partition user and friends

I have two tables in my DB that now have millions of rows; selection and insertion are getting slower and slower.
I am using Spring + Hibernate + MySQL 5.5 and have read about sharding as well as partitioning tables, and I like the idea of partitioning my tables.
My current Db structure is like
CREATE TABLE `user` (
`id` BIGINT(20) NOT NULL,
`name` VARCHAR(255) DEFAULT NULL,
`email` VARCHAR(255) DEFAULT NULL,
`location_id` bigint(20) default NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `FK3DC99772C476E06B` (`location_id`),
CONSTRAINT `FK3DC99772C476E06B` FOREIGN KEY (`location_id`) REFERENCES `places` (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
CREATE TABLE `friends` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` BIGINT(20) DEFAULT NULL,
`friend_id` BIGINT(20) DEFAULT NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_friend` (`user_id`,`friend_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
Now I am testing how to best use partitioning; for the user table, I thought the following would be good based on my usage.
CREATE TABLE `user_partition` (
`id` BIGINT(20) NOT NULL,
`name` VARCHAR(255) DEFAULT NULL,
`email` VARCHAR(255) DEFAULT NULL,
`location_id` bigint(20) default NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `FK3DC99772C476E06B` (`location_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY HASH(id DIV 100000)
PARTITIONS 30;
I created procedures to load data into the two tables and check the performance of the two tables:
DELIMITER //
CREATE PROCEDURE load_partition_table()
BEGIN
DECLARE v INT DEFAULT 0;
WHILE v < 1000000
DO
INSERT INTO user_partition (id,NAME,email)
VALUES (v,CONCAT(v,' name'),CONCAT(v,'#yahoo.com')),
(v+1,CONCAT(v+1,' name'),CONCAT(v+1,'#yahoo.com')),
(v+2,CONCAT(v+2,' name'),CONCAT(v+2,'#yahoo.com')),
(v+3,CONCAT(v+3,' name'),CONCAT(v+3,'#yahoo.com')),
(v+4,CONCAT(v+4,' name'),CONCAT(v+4,'#yahoo.com')),
(v+5,CONCAT(v+5,' name'),CONCAT(v+5,'#yahoo.com')),
(v+6,CONCAT(v+6,' name'),CONCAT(v+6,'#yahoo.com')),
(v+7,CONCAT(v+7,' name'),CONCAT(v+7,'#yahoo.com')),
(v+8,CONCAT(v+8,' name'),CONCAT(v+8,'#yahoo.com')),
(v+9,CONCAT(v+9,' name'),CONCAT(v+9,'#yahoo.com'))
;
SET v = v + 10;
END WHILE;
END
//
CREATE PROCEDURE load_table()
BEGIN
DECLARE v INT DEFAULT 0;
WHILE v < 1000000
DO
INSERT INTO user (id,NAME,email)
VALUES (v,CONCAT(v,' name'),CONCAT(v,'#yahoo.com')),
(v+1,CONCAT(v+1,' name'),CONCAT(v+1,'#yahoo.com')),
(v+2,CONCAT(v+2,' name'),CONCAT(v+2,'#yahoo.com')),
(v+3,CONCAT(v+3,' name'),CONCAT(v+3,'#yahoo.com')),
(v+4,CONCAT(v+4,' name'),CONCAT(v+4,'#yahoo.com')),
(v+5,CONCAT(v+5,' name'),CONCAT(v+5,'#yahoo.com')),
(v+6,CONCAT(v+6,' name'),CONCAT(v+6,'#yahoo.com')),
(v+7,CONCAT(v+7,' name'),CONCAT(v+7,'#yahoo.com')),
(v+8,CONCAT(v+8,' name'),CONCAT(v+8,'#yahoo.com')),
(v+9,CONCAT(v+9,' name'),CONCAT(v+9,'#yahoo.com'))
;
SET v = v + 10;
END WHILE;
END
//
The results were surprising: insert/select on the non-partitioned table gave better results.
mysql> select count(*) from user_partition;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.40 sec)
mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.00 sec)
mysql> call load_table();
Query OK, 10 rows affected (20.31 sec)
mysql> call load_partition_table();
Query OK, 10 rows affected (21.22 sec)
mysql> select * from user where id = 999999;
+--------+-------------+------------------+---------------------+
| id | name | email | updated_time |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999#yahoo.com | 2012-11-27 08:06:54 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)
mysql> select * from user_no_part where id = 999999;
+--------+-------------+------------------+---------------------+
| id | name | email | updated_time |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999#yahoo.com | 2012-11-27 08:03:14 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)
So, two questions:
1) What's the best way to partition the user table so that inserts and selects both become fast, and is removing the FOREIGN KEY on location_id correct? I know partitioning is only good if we access based on the partition key; in my case I want to read the table only by id. Why are inserts slower on the partitioned table?
2) What's the best way to partition the friends table? I want to partition friends on the basis of user_id, to place all of a user's friends in the same partition and always access it using a user_id. Should I drop the primary key on friends.id, or add user_id to the primary key?
First, I would recommend, if possible, that you upgrade to MySQL 5.6.5 or later to ensure you are taking advantage of partitioning properly and with the best performance. This is not always possible due to GA concerns, but my experience is that there was a difference in performance between 5.5 and 5.6, and 5.6 offers some other types of partitioning.
1) My experience is that inserts and updates ARE faster on partitioned sets as well as selects AS LONG AS YOU ARE INCLUDING THE COLUMN THAT YOU ARE PARTITIONING ON IN THE QUERY. If I ask for a count of all records across all partitions, I see slower responses. That is to be expected because the partitions are functioning LIKE separate tables, so if you have 30 partitions it is like reading 30 tables and not just one.
You must include the value you are partitioning on in the primary key AND it must remain stable during the life of the record.
2) I would include user_id and id in the primary key - assuming that your friends table's user_id and id do not change at all once the record is established (i.e. any change would be a delete/insert). In my case it was "redundant" but more than worth the access. Whether you choose user_id/id or id/user_id depends on your most frequent access.
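A sketch of what that could look like for the friends table (the table name, partition count, and index names are mine, not from the question):
CREATE TABLE `friends_hash` (
  `id`        BIGINT NOT NULL AUTO_INCREMENT,
  `user_id`   BIGINT NOT NULL,
  `friend_id` BIGINT NOT NULL,
  PRIMARY KEY (`user_id`, `id`),                      -- partition column must be in every unique key
  KEY `id_idx` (`id`),                                -- InnoDB needs AUTO_INCREMENT as a key prefix
  UNIQUE KEY `unique_friend` (`user_id`, `friend_id`)
) ENGINE=InnoDB
PARTITION BY HASH (`user_id`) PARTITIONS 12;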
A final note. I tried to create LOTS of partitions when I first started breaking my data into partitions, and found that just a few seemed to hit the sweet spot - 6-12 partitions seemed to work best for me. YMMV.
1. For the user table:
I'll answer what you need: I suggest you remove the FOREIGN KEY and PRIMARY KEY. I know this sounds crazy, but asking the server to figure out the current id, the last id, and the next id takes longer than creating the id yourself; you can generate an int id manually in Java instead.
Use this SQL query to insert quickly:
INSERT INTO user (id,NAME,email)
VALUES ('CREATE ID WITH JAVA', 'NAME', 'EMAIL#YAHOO.COM')
I can't say whether my query will be faster or not, because it all depends on your machine's performance; make sure you run it on the server, since the server can finish all tasks quickly.
As for the select: on the page where the profile info is located, you will need one row for the one user defined by the profile id. Use MySQL's LIMIT if you only need one row; if you need more than one, just change the limit value, like this.
For one row:
select * from user where id = 999999 limit 1;
and for seven rows:
select * from user where id = 999999 limit 7;
I think this query will work faster than without LIMIT, and remember that LIMIT can be used with INSERT ... SELECT too.
2. For the friends partition:
The answer is to drop the primary key. A table with no primary key is no problem. Once again, create the id in Java; Java is built for this kind of thing, your code already contains the loop, and Java can do it.
For example, if you need to retrieve all of a user's friend data, use this query to perform faster:
select fr.friend_id, usr.* from friends as fr INNER JOIN user as usr
ON fr.friend_id = usr.id
where fr.user_id = 999999 LIMIT 10;
I think this is enough. Sorry, I can only explain the MySQL side and not the Java side; I'm not an expert in Java, but I understand it.
1) If you always (or mostly) use only id to select data, it is obvious to use that field as the basis of the partitioning condition. As it is a number, there is no need for a hash function; simply use range partitioning. How many partitions to create (what numbers to choose as borders) you need to find out yourself, but as @TJChambers mentioned above, around 8-10 should be efficient enough.
Inserts are slower because you tested them wrong.
You simply insert 1,000,000 rows one after another without any randomness, and the only difference is that for the partitioned table MySQL needs to calculate the hash, which takes extra time. Moreover, since in your case id is the basis of the partitioning condition, you will never gain anything on inserts, as all new rows go at the end of the table.
If you had, for example, a table of GPS locations partitioned by lat and lon, you could see a difference on inserts if, say, each partition were a different continent. A difference would also show up if you had a table with random (real) data and were inserting random, non-sequential values.
Your select on the partitioned table is slower because, again, you tested it wrong.
As @TJChambers wrote before me, your query has to work on all partitions (it is like working with many tables), so it takes longer. Use a WHERE clause that restricts the work to data from just one partition to see the difference.
For example, run:
select count(*) from user_partition where id<99999;
and
select count(*) from user where id<99999;
You will see a difference.
2) This one is hard. There is no way to partition it without redundancy of data (at least none I can think of), but if access time (select speed) is the most important thing, the best way may be to partition it the same way as the user table (range on one of the ids) and insert 2 rows for each relationship, i.e. (a,b) and (b,a). It will double the number of rows, but if you partition into more than 4 parts you will still work on fewer records per query, and you will have just one condition to check with no need for OR.
I tested it with this schema:
CREATE TABLE `test`.`friends` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` ),
INDEX ( `b` )
) ENGINE = InnoDB;
CREATE TABLE `test`.`friends_part` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` , `b` )
) ENGINE = InnoDB
PARTITION BY RANGE (a) (
PARTITION p0 VALUES LESS THAN (1000),
PARTITION p1 VALUES LESS THAN (2000),
PARTITION p2 VALUES LESS THAN (3000),
PARTITION p3 VALUES LESS THAN (4000),
PARTITION p4 VALUES LESS THAN (5000),
PARTITION p5 VALUES LESS THAN (6000),
PARTITION p6 VALUES LESS THAN (7000),
PARTITION p7 VALUES LESS THAN (8000),
PARTITION p8 VALUES LESS THAN (9000),
PARTITION p9 VALUES LESS THAN MAXVALUE
);
delimiter //
DROP procedure IF EXISTS fill_friends//
create procedure fill_friends()
begin
declare i int default 0;
declare a int;
declare b int;
while i<2000000
do
set a = rand()*10000;
set b = rand()*10000;
insert into friends values(a,b);
set i = i + 1;
end while;
end
//
delimiter ;
delimiter //
DROP procedure IF EXISTS fill_friends_part//
create procedure fill_friends_part()
begin
insert into friends_part (select a,b from friends);
insert into friends_part (select b as a, a as b from friends);
end
//
delimiter ;
Queries I have run are:
select * from friends where a=317 or b=317;
result set: 475
times: 1.43, 0.02, 0.01
select * from friends_part where a=317;
result set: 475
times: 0.10, 0.00, 0.00
select * from friends where a=4887 or b=4887;
result set: 483
times: 1.33, 0.01, 0.01
select * from friends_part where a=4887;
result set: 483
times: 0.06, 0.01, 0.00
I didn't bother about uniqueness of the data, but in your case you may use a unique index.
I also used the InnoDB engine, but MyISAM may be better if most of the queries are selects and you are not going to do many writes.
There is no big difference between the 2nd and 3rd runs, probably because of caching, but there is a visible difference on the 1st run. It is faster because we are breaking one of the prime rules of database design, but the end justifies the means, so it may be a good solution for really big tables. If you are going to have fewer than 1M records, I think you can survive without partitioning.
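On the uniqueness point, a minimal sketch under the duplicated-row design above (the partition column a must be part of any unique key, which (a,b) satisfies):
ALTER TABLE friends_part ADD UNIQUE KEY uniq_pair (a, b);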