"Lost" 30% of records after partitioning

"Lost" 30% of records after partitioning - mysql

I have a MyISAM table with 90 million records and 18 GB of data, and tests suggest it's a candidate for partitioning.
Original schema:
CREATE TABLE `email_tracker` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`hash` varchar(65) COLLATE utf8_unicode_ci NOT NULL,
`userId` int(11) NOT NULL,
`dateSent` datetime NOT NULL,
`dateViewed` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `userId` (`userId`),
KEY `dateSent` (`dateSent`),
KEY `dateViewed` (`dateViewed`),
KEY `hash` (`hash`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I've previously partitioned the table on a test server with "ALTER TABLE email_tracker PARTITION BY HASH..." and run typical queries against it, and there were no problems with the queries. To avoid locking the table on the production DB, I'm testing again on the test server using the approach below, since we can afford to lose some tracking data while it runs:
RENAME TABLE email_tracker TO email_tracker_orig;
CREATE TABLE email_tracker LIKE email_tracker_orig;
CREATE TABLE email_tracker_part LIKE email_tracker_orig;
ALTER TABLE email_tracker_part DROP PRIMARY KEY, ADD PRIMARY KEY (id, userId);
ALTER TABLE email_tracker_part PARTITION BY HASH (id + userId) partitions 30;
INSERT INTO email_tracker_part (SELECT * FROM email_tracker_orig);
The _orig table has 90,795,103 records. After the query, the _part table only has 68,282,298, and I have no idea why. Any ideas?
mysql> select count(*) from email_tracker_orig;
+----------+
| count(*) |
+----------+
| 90795103 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from email_tracker_part;
+----------+
| count(*) |
+----------+
| 68274818 |
+----------+
1 row in set (0.00 sec)
(On subsequent tests, the _part table contains slightly different numbers of records, which is weirder still.)
Edit #1: I just realised that half of the partitions are empty because auto-increment-increment = 2 is set for replication, so I'm going to repartition BY KEY (userId) and see how that works out.
Edit #2: Still the same after repartitioning, so I'm trying to identify the missing rows to establish a pattern.

I am not sure of your requirements, but the MySQL documentation states that "the use of hashing expressions involving multiple columns is not particularly recommended." I would recommend that you just partition by id. Partitioning by id + userId doesn't give any obviously better distribution of rows across the partitions.
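Edit #1's observation can be checked with a little arithmetic. For plain (non-LINEAR) HASH partitioning, MySQL stores a row in partition MOD(expr, N), where N is the number of partitions.

Below is a minimal Python sketch. It assumes a single-column HASH(id) and ids generated with auto-increment-increment = 2 and an offset of 1, so only odd values occur. The id range is hypothetical.

```python
def hash_partition(expr_value: int, num_partitions: int) -> int:
    """Plain HASH partitioning: the row goes to partition MOD(expr, N)."""
    return expr_value % num_partitions


def used_partitions(ids, num_partitions: int) -> set:
    """Partition numbers that receive at least one row."""
    return {hash_partition(i, num_partitions) for i in ids}


# Hypothetical ids: auto-increment-increment = 2, offset 1, so only odd values
odd_ids = range(1, 200_001, 2)

print(len(used_partitions(odd_ids, 30)))  # 15: an even N leaves every even partition empty
print(len(used_partitions(odd_ids, 31)))  # 31: an odd N shares no factor with the step of 2
```

So with HASH(id), a partition count that is not a multiple of the auto-increment step (31 rather than 30, for example) should avoid the empty partitions.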

It looks like the INSERT query simply terminated prematurely, after exactly 40 minutes in this case. Re-running it for just the missing records does the trick:
INSERT INTO email_tracker_part (SELECT * FROM email_tracker_orig WHERE id > 148893974);
There's nothing in the my.cnf that suggests a timeout of 40 mins, and I've been running longer queries than this on this test server, but I have my solution so I'll close this even though the underlying reason remains unclear to me.
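Re-running the copy for the missing id range can be made systematic. The idea is to copy in primary-key ranges, each as its own short statement, and to resume after the highest id already in the target. An interrupted run can then simply be restarted.

This is a minimal Python sketch, with SQLite's in-memory database standing in for MySQL. The function names, the chunk size and the reduced two-column schema are all hypothetical.

MyISAM is not transactional, so on the real tables an interrupted chunk may be only partly copied. Re-copying that range would need something like INSERT IGNORE.

```python
import sqlite3


def copy_in_chunks(conn: sqlite3.Connection, chunk: int = 1000) -> None:
    """Copy email_tracker_orig into email_tracker_part one id range at a time,
    starting after the highest id already present in the target table."""
    last = conn.execute("SELECT COALESCE(MAX(id), 0) FROM email_tracker_part").fetchone()[0]
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM email_tracker_orig").fetchone()[0]
    while last < max_id:
        with conn:  # each range is committed on its own
            conn.execute(
                "INSERT INTO email_tracker_part "
                "SELECT * FROM email_tracker_orig WHERE id > ? AND id <= ?",
                (last, last + chunk),
            )
        last += chunk


def row_count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_tracker_orig (id INTEGER PRIMARY KEY, userId INTEGER NOT NULL);
CREATE TABLE email_tracker_part (id INTEGER PRIMARY KEY, userId INTEGER NOT NULL);
""")
# Odd ids only, mimicking auto-increment-increment = 2
conn.executemany("INSERT INTO email_tracker_orig VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 10_001, 2)])
conn.commit()

copy_in_chunks(conn)
copy_in_chunks(conn)  # a second run finds nothing left to copy
print(row_count(conn, "email_tracker_orig"), row_count(conn, "email_tracker_part"))  # 5000 5000
```

Comparing the two counts at the end is the same check as the `count(*)` queries in the question.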

Related

MySQL InnoDB row/table lock when performing ALTER

I created the sysbench table shown below, with 25,000,000 records (5.7 GB in size):
CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=25000001 DEFAULT CHARSET=latin1
Then I added an index on c by running the ALTER statement directly, which took about 18 minutes, as shown below:
mysql> alter table sbtest1 add index c_1(c);
Query OK, 0 rows affected (18 min 47.32 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table sbtest1\G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`),
KEY `c_1` (`c`)
) ENGINE=InnoDB AUTO_INCREMENT=25000002 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
During those 18 minutes, I tried to run some transactions against the table, inserting new records and updating existing records in column c. To my surprise they all worked, although I expected a lock to prevent this. I have always understood that running an ALTER on an InnoDB table, especially a large one, can result in a record lock for the duration of the process, so I'm wondering why I was able to perform inserts and updates without any problems.
Here is some information about my server:
mysql> show variables like '%isolation%';
+-----------------------+-----------------+
| Variable_name | Value |
+-----------------------+-----------------+
| transaction_isolation | REPEATABLE-READ |
| tx_isolation | REPEATABLE-READ |
+-----------------------+-----------------+
mysql> select version()
-> ;
+-----------+
| version() |
+-----------+
| 5.7.25-28 |
+-----------+
So it now seems to me that in MySQL 5.7 it's okay to run the ALTER statement directly without worrying about locks. Is this an accurate conclusion?
UPDATED
When I dropped the added index c_1, it took less than a second. That also surprised me, because I expected this to take longer than adding the index. I have always believed that adding an index is simple and quick, while deleting or updating one takes a long time because the entire table structure has to be altered. So I'm a bit confused about this.

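Adding a secondary index can be done in place (InnoDB online DDL, available since MySQL 5.6) and permits concurrent DML; only brief metadata locks are needed at the start and end of the operation. Dropping a secondary index is cheaper still, because it only modifies the table metadata, which explains why removing c_1 took less than a second.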

Update large table from smaller, mission critical, table without locking small table

In MySQL, I have two InnoDB tables. One is a small, mission-critical table that needs to be readily available for reads and writes at all times; call it mission_critical. The other is a larger table (tens of millions of rows) called big_table. I need to update big_table, for instance:
update mission_critical c, big_table b
set
b.something = c.something_else
where b.refID=c.id
This query could take more than an hour, and it creates a write lock on the mission_critical table. Is there a way to tell MySQL "I don't want a lock on mission_critical" so that the table can still be written to?
I understand that this is not ideal from a transactional point of view. The only workaround I can think of right now is to make a copy of the small mission_critical table and do the update from that (which I don't care gets locked), but I'd rather not do that if there's a way to make MySQL natively deal with this more gracefully.
It is not the table that is locking but all of the records in mission_critical that are locked, since they are basically all scanned by the update. I am not assuming this; the symptom is that when a user logs in to an online system, it tries to update a datetime column in mission_critical to update the last time they logged in. These queries die due to a Lock wait timeout exceeded error while the query above is running. If I kill the query above, all pending queries run immediately.
mission_critical.id and big_table.refID are both indexed.
The pertinent portions of the CREATE statements for each table are:
mission_critical:
CREATE TABLE `mission_critical` (
`intID` int(11) NOT NULL AUTO_INCREMENT,
`id` smallint(6) DEFAULT NULL,
`something_else` varchar(50) NOT NULL,
`lastLoginDate` datetime DEFAULT NULL,
PRIMARY KEY (`intID`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `something_else` (`something_else`)
) ENGINE=InnoDB AUTO_INCREMENT=1432 DEFAULT CHARSET=latin1
big_table:
CREATE TABLE `big_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`postDate` date DEFAULT NULL,
`postTime` int(11) DEFAULT NULL,
`refID` smallint(6) DEFAULT NULL,
`something` varchar(50) NOT NULL,
`change` decimal(20,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `refID` (`refID`),
KEY `postDate` (`postDate`)
) ENGINE=InnoDB AUTO_INCREMENT=138139125 DEFAULT CHARSET=latin1
The explanation of the query is:
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+
| 1 | SIMPLE | mission_critical | | ALL | id | | | | 774 | 100 | Using where |
| 1 | UPDATE | big_table | | ref | refID | refID | 3 | db.mission_critical.something_else | 7475 | 100 | |
+----+-------------+------------------+------------+------+---------------+-------+---------+------------------------------------+------+----------+-------------+

I first suggested a workaround with a subquery, to create a copy in an internal temporary table, but in my test the small table was still locked for writes. So I guess your best bet is to make a copy manually.
The reason for the lock is described in this bug report: https://bugs.mysql.com/bug.php?id=72005
This is what Sinisa Milivojevic wrote in an answer:
update table t1,t2 ....
any UPDATE with a join is considered a multiple-table update. In that
case, a referenced table has to be read-locked, because rows must not
be changed in the referenced table during UPDATE until it has
finished. There can not be concurrent changes of the rows, nor DELETE
of the rows, nor, much less, exempli gratia any DDL on the referenced
table. The goal is simple, which is to have all tables with consistent
contents when UPDATE finishes, particularly since multiple-table
UPDATE can be executed with several passes.
In short, this behavior is for a good reason.
Consider writing INSERT and UPDATE triggers on mission_critical that update big_table on the fly. That would delay writes to the mission_critical table, but it might be fast enough for you, and you wouldn't need the mass update query any more.
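Here is a minimal sketch of the trigger idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The schema is reduced to the columns the UPDATE touches, and the trigger and index names are hypothetical.

SQLite's trigger syntax differs somewhat from MySQL's. MySQL, for example, has no UPDATE OF column clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mission_critical (
    intID INTEGER PRIMARY KEY,
    id INTEGER UNIQUE,
    something_else TEXT NOT NULL
);
CREATE TABLE big_table (
    id INTEGER PRIMARY KEY,
    refID INTEGER,
    something TEXT NOT NULL
);
CREATE INDEX big_table_refID ON big_table (refID);

-- New mission_critical row: fill in big_table rows that already reference it
CREATE TRIGGER mc_after_insert AFTER INSERT ON mission_critical
FOR EACH ROW BEGIN
    UPDATE big_table SET something = NEW.something_else WHERE refID = NEW.id;
END;

-- Changed something_else: propagate it to the referencing big_table rows
CREATE TRIGGER mc_after_update AFTER UPDATE OF something_else ON mission_critical
FOR EACH ROW BEGIN
    UPDATE big_table SET something = NEW.something_else WHERE refID = NEW.id;
END;
""")

conn.executemany("INSERT INTO big_table (refID, something) VALUES (?, ?)",
                 [(1, "stale"), (1, "stale"), (2, "stale")])
conn.execute("INSERT INTO mission_critical (id, something_else) VALUES (1, 'old')")
conn.execute("UPDATE mission_critical SET something_else = 'new' WHERE id = 1")
conn.execute("INSERT INTO mission_critical (id, something_else) VALUES (2, 'fresh')")
conn.commit()

print(conn.execute("SELECT refID, something FROM big_table ORDER BY id").fetchall())
# [(1, 'new'), (1, 'new'), (2, 'fresh')]
```

Each write to mission_critical now also updates every big_table row that references it. The EXPLAIN in the question estimates about 7,475 rows per refID, so that per-write cost is worth measuring.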
Also check whether it would be better to use char(50) instead of varchar(50). I'm not sure, but it may improve update performance because the row size wouldn't need to change; in a test, I improved update performance by about 50% this way.

UPDATE will lock the rows that it needs to change. It may also lock the "gaps" after those rows.
You can run the update as a loop of small transactions, updating only 100 rows at a time:
BEGIN;
SELECT ... FOR UPDATE; -- arrange to have this select include the 100 rows
UPDATE ...; -- update the 100 rows
COMMIT;
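The loop above can be driven from application code. Here is a minimal sketch in Python, again with sqlite3 standing in for MySQL.

SQLite has no SELECT ... FOR UPDATE, so each batch here is simply one short transaction over a primary-key range of big_table. That is the property that keeps each lock short. The function name batched_update, the batch size and the simplified schema (mission_critical keyed directly on id) are hypothetical.

```python
import sqlite3


def batched_update(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Copy mission_critical.something_else into big_table.something,
    one short transaction per batch of big_table primary keys."""
    last_id, total = 0, 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM big_table WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size))]
        if not ids:
            return total
        with conn:  # commit after each batch (rolled back if it raises)
            cur = conn.execute(
                """UPDATE big_table
                   SET something = (SELECT c.something_else FROM mission_critical c
                                    WHERE c.id = big_table.refID)
                   WHERE id BETWEEN ? AND ?
                     AND refID IN (SELECT id FROM mission_critical)""",
                (ids[0], ids[-1]))
            total += cur.rowcount
        last_id = ids[-1]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mission_critical (id INTEGER PRIMARY KEY, something_else TEXT NOT NULL);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, refID INTEGER, something TEXT NOT NULL);
""")
conn.executemany("INSERT INTO mission_critical VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO big_table (refID, something) VALUES (?, '')",
                 [(i % 3,) for i in range(250)])
conn.commit()

print(batched_update(conn))  # 166 rows updated (refID 0 has no match)
```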

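It may be worth trying a correlated subquery to see whether the optimiser comes up with a different plan, but performance may be worse. The WHERE EXISTS clause restricts the update to rows with a matching mission_critical row, as the original join does:
update big_table b
set b.something = (select c.something_else from mission_critical c where b.refID = c.id)
where exists (select 1 from mission_critical c where c.id = b.refID);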

Mysql Innodb "select count(*)" performance [duplicate]

I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):
DROP TABLE IF EXISTS `perf2`;
CREATE TABLE `perf2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;
While the statement is not run too often it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:
SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);
The explain plan seems fine:
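+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+
| 1  | SIMPLE      | perf2 | index | NULL          | PRIMARY | 4       | NULL | 8906459 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+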
Unfortunately, the statement is as slow as before. Following "SELECT COUNT(*)" is slow, even with where clause, I've also tried optimizing the table, without success.
Is there a way to optimize COUNT(*) performance on InnoDB?

As of MySQL 5.1.6, you can use the Event Scheduler to insert the count into a stats table regularly.
First create a table to hold the count:
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL);
Then create an event to update the table:
CREATE EVENT update_stats
ON SCHEDULE
EVERY 5 MINUTE
DO
INSERT INTO stats (`key`, `value`)
VALUES ('data_count', (select count(id) from data))
ON DUPLICATE KEY UPDATE value=VALUES(value);
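It's not perfect, but it offers a self-contained solution (no cron job or queue) that can easily be tailored to run as often as the required freshness of the count demands. Note that the event scheduler must be enabled (event_scheduler = ON) for the event to run.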

For the time being I've solved the problem by using this approximation:
EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)
The approximate number of rows can be read from the rows column of the EXPLAIN output when using InnoDB, as shown above. When using MyISAM, this column will be empty because the table reference is optimized away, so if it is empty, fall back to a traditional SELECT COUNT instead.

Based on @Che's code, you can also use triggers on INSERT and DELETE on perf2 to keep the value in the stats table up to date in real time.
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL
);
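Seed the 'perf2_count' row once with the current row count (otherwise the triggers' UPDATE statements have no row to change), then create the triggers: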
CREATE TRIGGER `count_up` AFTER INSERT ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` + 1
WHERE `stats`.`key` = 'perf2_count';
CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` - 1
WHERE `stats`.`key` = 'perf2_count';
So the number of rows in the perf2 table can be read using this query, in realtime:
SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';
This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.
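Here is a minimal, runnable sketch of the trigger-maintained counter, using Python's sqlite3 module in place of MySQL. SQLite requires trigger bodies to be wrapped in BEGIN ... END.

The sketch seeds the 'perf2_count' row before creating the triggers. One trade-off is worth measuring: every INSERT and DELETE on perf2 now also updates the same stats row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE perf2 (id INTEGER PRIMARY KEY, channel_id INTEGER, "value" REAL NOT NULL);
CREATE TABLE stats ("key" TEXT NOT NULL PRIMARY KEY, "value" INTEGER NOT NULL);

-- Seed the counter once, before the triggers exist
INSERT INTO stats ("key", "value") SELECT 'perf2_count', COUNT(*) FROM perf2;

CREATE TRIGGER count_up AFTER INSERT ON perf2 FOR EACH ROW
BEGIN
    UPDATE stats SET "value" = "value" + 1 WHERE "key" = 'perf2_count';
END;

CREATE TRIGGER count_down AFTER DELETE ON perf2 FOR EACH ROW
BEGIN
    UPDATE stats SET "value" = "value" - 1 WHERE "key" = 'perf2_count';
END;
""")

conn.executemany('INSERT INTO perf2 (channel_id, "value") VALUES (?, ?)',
                 [(1, 0.5), (1, 1.5), (2, 2.5)])
conn.execute("DELETE FROM perf2 WHERE channel_id = 2")
conn.commit()

counter = conn.execute("""SELECT "value" FROM stats WHERE "key" = 'perf2_count'""").fetchone()[0]
print(counter)  # 2
```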

Mysql 5.5 Table partition user and friends

I have two tables in my DB that now have millions of rows, and selects and inserts are getting slower and slower.
I am using Spring + Hibernate + MySQL 5.5. I have read about sharding as well as partitioning, and I like the idea of partitioning my tables.
My current DB structure is:
CREATE TABLE `user` (
`id` BIGINT(20) NOT NULL,
`name` VARCHAR(255) DEFAULT NULL,
`email` VARCHAR(255) DEFAULT NULL,
`location_id` bigint(20) default NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `FK3DC99772C476E06B` (`location_id`),
CONSTRAINT `FK3DC99772C476E06B` FOREIGN KEY (`location_id`) REFERENCES `places` (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
CREATE TABLE `friends` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id` BIGINT(20) DEFAULT NULL,
`friend_id` BIGINT(20) DEFAULT NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_friend` (`user_id`,`friend_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
Now I am testing how best to use partitioning. For the user table, I thought the following would be good, based on my usage:
CREATE TABLE `user_partition` (
`id` BIGINT(20) NOT NULL,
`name` VARCHAR(255) DEFAULT NULL,
`email` VARCHAR(255) DEFAULT NULL,
`location_id` bigint(20) default NULL,
`updated_time` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `FK3DC99772C476E06B` (`location_id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8
PARTITION BY HASH(id DIV 100000)
PARTITIONS 30;
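With this partitioning expression, it is worth checking where the test rows actually land. Plain HASH partitioning stores a row in partition MOD(expr, 30), and here expr is id DIV 100000.

Below is a quick Python sketch over the 1,000,000 test ids (0 to 999,999) that the load procedures insert.

```python
from collections import Counter

NUM_PARTITIONS = 30


def partition_for(user_id: int) -> int:
    # PARTITION BY HASH(id DIV 100000) PARTITIONS 30  ->  MOD(id DIV 100000, 30)
    return (user_id // 100_000) % NUM_PARTITIONS


rows_per_partition = Counter(partition_for(i) for i in range(1_000_000))
print(sorted(rows_per_partition))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(rows_per_partition[9])       # 100000
```

So in this test only 10 of the 30 partitions hold any data. Each block of 100,000 consecutive ids goes to a single partition, and partitions 10 to 29 stay empty until ids reach 1,000,000.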
I created procedures to load data into the two tables and compare their performance:
DELIMITER //
CREATE PROCEDURE load_partition_table()
BEGIN
DECLARE v INT DEFAULT 0;
WHILE v < 1000000
DO
INSERT INTO user_partition (id,NAME,email)
VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
(v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
(v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
(v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
(v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
(v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
(v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
(v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
(v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
(v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
;
SET v = v + 10;
END WHILE;
END
//
CREATE PROCEDURE load_table()
BEGIN
DECLARE v INT DEFAULT 0;
WHILE v < 1000000
DO
INSERT INTO user (id,NAME,email)
VALUES (v,CONCAT(v,' name'),CONCAT(v,'@yahoo.com')),
(v+1,CONCAT(v+1,' name'),CONCAT(v+1,'@yahoo.com')),
(v+2,CONCAT(v+2,' name'),CONCAT(v+2,'@yahoo.com')),
(v+3,CONCAT(v+3,' name'),CONCAT(v+3,'@yahoo.com')),
(v+4,CONCAT(v+4,' name'),CONCAT(v+4,'@yahoo.com')),
(v+5,CONCAT(v+5,' name'),CONCAT(v+5,'@yahoo.com')),
(v+6,CONCAT(v+6,' name'),CONCAT(v+6,'@yahoo.com')),
(v+7,CONCAT(v+7,' name'),CONCAT(v+7,'@yahoo.com')),
(v+8,CONCAT(v+8,' name'),CONCAT(v+8,'@yahoo.com')),
(v+9,CONCAT(v+9,' name'),CONCAT(v+9,'@yahoo.com'))
;
SET v = v + 10;
END WHILE;
END
//
DELIMITER ;
The results were surprising: inserts and selects on the non-partitioned table performed better.
mysql> select count(*) from user_partition;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.40 sec)
mysql> select count(*) from user;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.00 sec)
mysql> call load_table();
Query OK, 10 rows affected (20.31 sec)
mysql> call load_partition_table();
Query OK, 10 rows affected (21.22 sec)
mysql> select * from user where id = 999999;
+--------+-------------+------------------+---------------------+
| id | name | email | updated_time |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999@yahoo.com | 2012-11-27 08:06:54 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)
mysql> select * from user_no_part where id = 999999;
+--------+-------------+------------------+---------------------+
| id | name | email | updated_time |
+--------+-------------+------------------+---------------------+
| 999999 | 999999 name | 999999@yahoo.com | 2012-11-27 08:03:14 |
+--------+-------------+------------------+---------------------+
1 row in set (0.00 sec)
So, two questions:
1) What's the best way to partition the user table so that inserts and selects also become fast, and is removing the FOREIGN KEY on location_id correct? I know partitioning helps only if we access the table by the partition key; in my case I want to read the table only by id. Why are inserts slower on the partitioned table?
2) What's the best way to partition the friends table? I want to partition friends on user_id, so that all of a user's friends are placed in the same partition and always accessed by user_id. Should I drop the primary key on friends.id, or add user_id to the primary key?

First, I would recommend, if possible, that you upgrade to MySQL 5.6.5 or later to make sure you are taking advantage of partitioning properly and with the best performance. This is not always possible due to GA concerns, but in my experience there was a difference in performance between 5.5 and 5.6, and 5.6 offers some other types of partitioning.
1) My experience is that inserts and updates ARE faster on partitioned sets as well as selects AS LONG AS YOU ARE INCLUDING THE COLUMN THAT YOU ARE PARTITIONING ON IN THE QUERY. If I ask for a count of all records across all partitions, I see slower responses. That is to be expected because the partitions are functioning LIKE separate tables, so if you have 30 partitions it is like reading 30 tables and not just one.
You must include the column you are partitioning on in the primary key, AND its value must remain stable for the life of the record.
2) I would include user_id and id in the primary key, assuming that your friends table's user_id and id never change once the record is created (i.e. any change would be a delete plus an insert). In my case it was "redundant", but more than worth it for the access. Whether you choose (user_id, id) or (id, user_id) depends on your most frequent access pattern.
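The primary-key requirement above is one case of a general MySQL rule: every unique key on a partitioned table, including the primary key, must contain every column used in the partitioning expression.

Below is a small Python check of that rule against the friends table from the question, partitioned on user_id. The helper name is hypothetical.

```python
def partitioning_allowed(partition_columns, unique_keys) -> bool:
    """True if every unique key (primary key included) contains
    every column used in the partitioning expression."""
    needed = set(partition_columns)
    return all(needed <= set(key) for key in unique_keys)


# friends table: PRIMARY KEY (id) and UNIQUE KEY unique_friend (user_id, friend_id)
current_keys = [("id",), ("user_id", "friend_id")]
# Suggested change: PRIMARY KEY (user_id, id), unique_friend unchanged
proposed_keys = [("user_id", "id"), ("user_id", "friend_id")]

print(partitioning_allowed(["user_id"], current_keys))   # False
print(partitioning_allowed(["user_id"], proposed_keys))  # True
```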
A final note. I tried to create LOTS of partitions when I first started breaking my data into partitions, and found that just a few seemed to hit the sweet spot - 6-12 partitions seemed to work best for me. YMMV.

1. Use this SQL query to select from the table, excluding all columns except id:
Here is my answer to what you need.
I suggest removing the FOREIGN KEY and the PRIMARY KEY.
I know this sounds crazy, but with them the server has to work out the current id, the last id and the next id, and this will take longer than creating the id manually.
Alternatively, you can generate the int id manually in Java.
Use this SQL query to insert quickly:
INSERT INTO user (id,NAME,email)
VALUES ('CREATE ID WITH JAVA', 'NAME', 'EMAIL@YAHOO.COM');
I can't say for sure whether my query will be faster.
It all depends on your hardware, so make sure you run it on a server, because a server can finish these tasks quickly.
And for selects: on the page where the profile info is shown, you only need one row, for the user identified by the profile id.
Use a MySQL LIMIT if you only need one row, and if you need more than one, just change the LIMIT value like this.
For one row:
select * from user where id = 999999 limit 1;
And for seven rows:
select * from user where id = 999999 limit 7;
I think this query will work faster than without LIMIT.
And remember, LIMIT can work with INSERT too.
2. For the friends partition:
The answer is: drop the primary key.
A table with no primary key is no problem.
Once again, create the id in Java.
Java is designed to be fast at the interface level, your code includes a while loop, and Java can do that.
For example, if you need to retrieve all of a user's friends' data, use this query to perform faster:
select fr.friend_id, usr.* from friends as fr INNER JOIN user as usr
ON fr.friend_id = usr.id
where fr.user_id = 999999 LIMIT 10;
And I think this is enough.
Sorry, I can only explain the MySQL side, not the Java side,
because I'm not an expert in Java, though I understand it.

1) If you always (or mostly) select data by id only, it is natural to use this field as the basis of the partitioning condition. As it is a number, there is no need for a hash function; simply use range partitioning. You will need to find out for yourself how many partitions to create (which values to choose as boundaries), but as @TJChambers mentioned before, around 8-10 should be efficient enough.
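How RANGE partitioning routes a row can be sketched in a few lines. Each partition is declared VALUES LESS THAN some bound, and a row goes to the first partition whose bound is greater than its id.

The boundaries below are hypothetical, chosen only for illustration. Python's bisect_right finds the matching partition.

```python
from bisect import bisect_right

# Hypothetical layout for PARTITION BY RANGE (id):
#   p0 VALUES LESS THAN (250000), p1 VALUES LESS THAN (500000),
#   p2 VALUES LESS THAN (750000), p3 VALUES LESS THAN MAXVALUE
UPPER_BOUNDS = [250_000, 500_000, 750_000]  # the MAXVALUE partition is the last index


def range_partition(user_id: int) -> int:
    """Index of the first partition whose LESS THAN bound exceeds user_id."""
    return bisect_right(UPPER_BOUNDS, user_id)


for user_id in (0, 249_999, 250_000, 999_999):
    print(user_id, "-> p%d" % range_partition(user_id))
# 0 -> p0, 249999 -> p0, 250000 -> p1, 999999 -> p3
```

Because the mapping is a plain comparison on id, a lookup such as WHERE id = 999999, or a narrow id range, should only need to touch one partition.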
Inserts are slower because you are testing it the wrong way.
You simply insert 1,000,000 rows one after another without any randomness, and the only difference is that for the partitioned table MySQL needs to calculate the hash, which takes extra time.
But since in your case id is the basis of the partitioning condition, you will never gain anything on inserts, as all new rows go to the end of the table.
If you had, for example, a table of GPS locations partitioned by lat and lon, you could see a difference in inserts if, for example, each partition were a different continent.
A difference would also be seen if you had a table with some random (real) data and were inserting random, non-sequential values.
Your select on the partitioned table is slower because, again, you are testing it the wrong way.
As @TJChambers wrote before me, your query needs to work on all partitions (it is like working with many tables), so it takes longer. Try using a WHERE clause that works with data from just one partition to see a difference.
for example run:
select count(*) from user_partition where id<99999;
and
select count(*) from user where id<99999;
You will see a difference.
2) This one is hard. There is no way to partition it without data redundancy (at least none that comes to my mind). But if access time (select speed) is the most important thing, the best way may be to partition it the same way as the user table (a range on one of the ids) and insert 2 rows for each relationship, i.e. (a,b) and (b,a). It will double the number of rows, but if you partition into more than 4 parts you will work on fewer records per query anyway, and you will have just one condition to check, with no need for an OR.
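The effect of storing each friendship in both directions can be checked on a toy data set. Below is a minimal sketch in Python with sqlite3; there is no partitioning here, because SQLite has none.

The point is only that the single-condition query on the doubled table returns the same friends as the OR query on the original table. The sample pairs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (a INTEGER NOT NULL, b INTEGER NOT NULL);
CREATE TABLE friends_part (a INTEGER NOT NULL, b INTEGER NOT NULL);
CREATE INDEX friends_part_a_b ON friends_part (a, b);
""")
pairs = [(317, 5), (8, 317), (4887, 317), (1, 2)]  # made-up relationships
conn.executemany("INSERT INTO friends VALUES (?, ?)", pairs)
# Store each relationship twice, (a, b) and (b, a), as fill_friends_part() does
conn.execute("INSERT INTO friends_part SELECT a, b FROM friends")
conn.execute("INSERT INTO friends_part SELECT b AS a, a AS b FROM friends")


def friends_of_or(user: int) -> list:
    """Original layout: the user may appear in either column, so OR is needed."""
    rows = conn.execute("SELECT a, b FROM friends WHERE a = ? OR b = ?", (user, user))
    return sorted(b if a == user else a for a, b in rows)


def friends_of_symmetric(user: int) -> list:
    """Doubled layout: a single condition on the leading column is enough."""
    rows = conn.execute("SELECT b FROM friends_part WHERE a = ?", (user,))
    return sorted(b for (b,) in rows)


print(friends_of_or(317))         # [5, 8, 4887]
print(friends_of_symmetric(317))  # [5, 8, 4887]
```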
I tested it with this schema:
CREATE TABLE `test`.`friends` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` ),
INDEX ( `b` )
) ENGINE = InnoDB;
CREATE TABLE `test`.`friends_part` (
`a` INT NOT NULL ,
`b` INT NOT NULL ,
INDEX ( `a` , `b` )
) ENGINE = InnoDB
PARTITION BY RANGE (a) (
PARTITION p0 VALUES LESS THAN (1000),
PARTITION p1 VALUES LESS THAN (2000),
PARTITION p2 VALUES LESS THAN (3000),
PARTITION p3 VALUES LESS THAN (4000),
PARTITION p4 VALUES LESS THAN (5000),
PARTITION p5 VALUES LESS THAN (6000),
PARTITION p6 VALUES LESS THAN (7000),
PARTITION p7 VALUES LESS THAN (8000),
PARTITION p8 VALUES LESS THAN (9000),
PARTITION p9 VALUES LESS THAN MAXVALUE
);
delimiter //
DROP procedure IF EXISTS fill_friends//
create procedure fill_friends()
begin
declare i int default 0;
declare a int;
declare b int;
while i<2000000
do
set a = rand()*10000;
set b = rand()*10000;
insert into friends values(a,b);
set i = i + 1;
end while;
end
//
delimiter ;
delimiter //
DROP procedure IF EXISTS fill_friends_part//
create procedure fill_friends_part()
begin
insert into friends_part (select a,b from friends);
insert into friends_part (select b as a, a as b from friends);
end
//
delimiter ;
The queries I ran were:
select * from friends where a=317 or b=317;
result set: 475
times: 1.43, 0.02, 0.01
select * from friends_part where a=317;
result set: 475
times: 0.10, 0.00, 0.00
select * from friends where a=4887 or b=4887;
result set: 483
times: 1.33, 0.01, 0.01
select * from friends_part where a=4887;
result set: 483
times: 0.06, 0.01, 0.00
I didn't bother about uniqueness of the data, but in your case you may use a unique index.
Also, I used the InnoDB engine, but MyISAM is better if most of the queries are selects and you are not going to do many writes.
There is no big difference for the 2nd and 3rd runs, probably because of caching, but there is a visible difference for the 1st run. It is faster because we are breaking one of the prime rules of database design, but the end justifies the means, so it may be a good solution for really big tables. If you are going to have fewer than 1M records, I think you can survive without partitioning.
