I have a question about joining two tables.
TWD_CSD_NEWS_DETAIL (200 million rows):
+-----------+------------+--------------+------------------+
| CSD_ID | CSD_ID_DRI | CSD_PARTY_ID | CSD_PARTY_AMOUNT |
+-----------+------------+--------------+------------------+
| 1 | 1 | 1183 | 27870 |
+-----------+------------+--------------+------------------+
| 2 | 1 | 1723 | 12 |
+-----------+------------+--------------+------------------+
| 3 | 1 | 1243 | 87474 |
+-----------+------------+--------------+------------------+
.
.
.
+-----------+------------+--------------+------------------+
| 18575622 | 8881 | 1183 | 27870 |
+-----------+------------+--------------+------------------+
the result of SHOW CREATE TABLE TWD_CSD_NEWS_DETAIL:
CREATE TABLE `TWD_CSD_NEWS_DETAIL` (
`CSD_ID` int(11) NOT NULL AUTO_INCREMENT,
`CSD_ID_CREATED_BY` int(11) DEFAULT NULL,
`CSD_DT_CREATED` datetime DEFAULT NULL,
`CSD_DT_UPD` datetime DEFAULT NULL,
`CSD_ID_DRI` int(11) DEFAULT NULL,
`CSD_ID_UPD_BY` int(11) DEFAULT NULL,
`CSD_PARTY_ID` int(11) DEFAULT NULL,
`CSD_AMOUNT` decimal(26,0) DEFAULT NULL,
`CSD_TIMESTAMP` datetime DEFAULT NULL,
PRIMARY KEY (`CSD_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=184035984 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
TWD_DRI_NEWS_RESULT_HEADER (1 million rows):
+--------+---------------------+----------------+
| DRI_ID | DRI_DATE | DRI_SYM_SYMBOL |
+--------+---------------------+----------------+
| 1 | 2011-11-08 00:00:00 | 1 |
+--------+---------------------+----------------+
| 2 | 2011-11-08 00:00:00 | 2 |
+--------+---------------------+----------------+
| 3 | 2011-11-08 00:00:00 | 3 |
+--------+---------------------+----------------+
.
.
+--------+---------------------+----------------+
| 10001 | 2011-11-11 00:00:00 | 8881 |
+--------+---------------------+----------------+
the result of SHOW CREATE TABLE TWD_DRI_NEWS_RESULT_HEADER :
CREATE TABLE `TWD_DRI_NEWS_RESULT_HEADER` (
`DRI_ID` int(11) NOT NULL AUTO_INCREMENT,
`DRI_DATE` datetime DEFAULT NULL,
`DRI_SYM_SYMBOL` int(11) DEFAULT NULL,
`DRI_TIMESTAMP` datetime DEFAULT NULL,
PRIMARY KEY (`DRI_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1592193 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
I join them with the following SQL. It works, but the query takes a very long time to complete as I keep adding csd_id ranges to the WHERE clause:
SELECT
csd.CSD_ID, csd.CSD_ID_DRI, csd.CSD_PARTY_ID, csd.CSD_AMOUNT , dri.DRI_DATE, dri.DRI_SYM_TICKER
FROM
TWD_CSD_NEWS_DETAIL csd
LEFT JOIN
TWD_DRI_NEWS_RESULT_HEADER dri ON dri.DRI_ID = csd.CSD_ID_DRI
WHERE
(
(
( csd_id between 1 and 426029)
|| ( csd_id between 426030 and 851977)
|| ( csd_id between 851978 and 1277890)
..
...
...
)
AND dri.DRI_SYM_SYMBOL = 1
)
Do I need to create a view to hold the result, or is there a faster way to run this query? With the range between 1 and 200000000, the duration and fetch time were 0.197 seconds and 26 seconds respectively.
Have you tried using a single range for your query?
SELECT
csd.CSD_ID, csd.CSD_ID_DRI, csd.CSD_PARTY_ID, csd.CSD_AMOUNT ,
dri.DRI_DATE, dri.DRI_SYM_TICKER
FROM
TWD_CSD_NEWS_DETAIL csd
LEFT JOIN
TWD_DRI_NEWS_RESULT_HEADER dri ON dri.DRI_ID = csd.CSD_ID_DRI
WHERE
csd.CSD_ID BETWEEN 1 AND 1277890 AND dri.DRI_SYM_SYMBOL = 1;
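The three ranges shown in the question are contiguous (each one starts exactly one past the end of the previous one), which is why they can collapse into a single BETWEEN. A minimal Python sketch of that check follows; `merge_ranges` is a hypothetical helper, not something from the question, and the bounds are only the three ranges the question actually lists.

```python
def merge_ranges(ranges):
    """Merge inclusive integer ranges that overlap or touch end-to-start."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Extends (or sits inside) the previous range: widen it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# The csd_id ranges listed in the question's WHERE clause.
csd_id_ranges = [(1, 426029), (426030, 851977), (851978, 1277890)]
print(merge_ranges(csd_id_ranges))
```

If the merged list has more than one entry, the ranges have gaps and a single BETWEEN would return extra rows.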
I have a MariaDB table that looks like this:
+--------+--------+--------+---------------------+
| realm | key2 | userId | date |
+--------+--------+--------+---------------------+
| AB3 | 123 | 1 | 2017-08-04 17:30:00 |
| AB3 | 124 | 1 | 2017-08-04 17:30:00 |
| AB3 | 125 | 1 | 2017-08-04 17:30:00 |
| XY7 | 97 | 2 | 2017-08-04 17:35:00 |
| XY7 | 98 | 2 | 2017-08-04 17:35:00 |
| XY7 | 99 | 2 | 2017-08-04 17:35:00 |
| AB3 | 110 | 3 | 2017-08-04 17:40:00 |
| AB3 | 111 | 3 | 2017-08-04 17:40:00 |
+--------+--------+--------+---------------------+
PRIMARY_KEY (realm, key2)
INDEX (realm, userId)
INDEX (date)
This table operates as some sort of queue for processing user actions. Basically a server always takes the oldest data from this table, processes it and deletes it from this table. Each realm has its own server processing this queue.
Now I want to find out a user's position in queue for that realm. So, using the example above, when I request the position for userId 3 in realm 'AB3', I want to get the result 2 because only one other user (userId 1) is to be processed earlier for realm AB3.
(The column key2 might be irrelevant in this example. I only included it because it is part of the primary key, which may make it relevant for finding a good solution.)
Here is the SQL schema:
CREATE TABLE `queue` (
`realm` varchar(5) NOT NULL,
`key2` int(10) UNSIGNED NOT NULL,
`userId` int(10) UNSIGNED NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `queue` (`realm`, `key2`, `userId`, `date`) VALUES
('AB3', 110, 3, '2017-08-04 17:40:00'),
('AB3', 111, 3, '2017-08-04 17:40:00'),
('AB3', 123, 1, '2017-08-04 17:30:00'),
('AB3', 124, 1, '2017-08-04 17:30:00'),
('AB3', 125, 1, '2017-08-04 17:30:00'),
('XY7', 97, 2, '2017-08-04 17:35:00'),
('XY7', 98, 2, '2017-08-04 17:35:00'),
('XY7', 99, 2, '2017-08-04 17:35:00');
ALTER TABLE `queue`
ADD PRIMARY KEY (`realm`,`key2`),
ADD KEY `ru` (`realm`,`userId`) USING BTREE,
ADD KEY `date` (`date`);
I came up with this query that seems to work but is pretty slow (~3 seconds) on a table with 10,000,000 entries:
SELECT (COUNT(DISTINCT `realm`, `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT `date`
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
GROUP BY `realm`, `userId`
)
SQL Fiddle: http://sqlfiddle.com/#!9/fb04fd/9/0
EXPLAIN EXTENDED of this query:
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
| 1 | PRIMARY | queue | ref | PRIMARY,ru,date | PRIMARY | 767 | const | 5266123 | 100.00 | Using where |
| 2 | SUBQUERY | queue | index_merge | PRIMARY,ru | ru,PRIMARY | 771,767 | | 496 | 75.00 | Using intersect(ru,PRIMARY); Using where |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
Do you have any ideas how I can optimize this query to run faster on a table with like 10,000,000 entries?
Other queries that are run on this table:
SELECT `m`.*
FROM `queue` `m`
JOIN (
SELECT `m`.*
FROM `queue` `m`
WHERE `m`.`realm` = ?
ORDER BY `date` ASC
LIMIT 1
) `mm` ON `m`.`realm` = `mm`.`realm` AND `m`.`userId` = `mm`.`userId`;
and
DELETE FROM `queue` WHERE `realm` = ? AND `userId` = ?;
How could I optimize my indexes?
Something looks wrong with the table DDL. In any case, I would rewrite your query like this:
SELECT (COUNT(DISTINCT `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT min(`date`)
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
)
and perhaps add an index specific to this query, such as:
index (realm, date)
You could also try a covering index:
index (realm, date, userId)
but I am not sure it will be faster than the previous one.
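To check that the rewritten query returns the expected position, here is a small sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 as a stand-in for MariaDB, so it only verifies the logic and says nothing about speed on 10,000,000 rows; the index name `realm_date_user` and the function `queue_position` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE queue (
    realm  TEXT    NOT NULL,
    key2   INTEGER NOT NULL,
    userId INTEGER NOT NULL,
    date   TEXT    NOT NULL,
    PRIMARY KEY (realm, key2)
);
-- Composite index matching the filter (realm = ?, date < ?).
CREATE INDEX realm_date_user ON queue (realm, date, userId);
INSERT INTO queue (realm, key2, userId, date) VALUES
    ('AB3', 110, 3, '2017-08-04 17:40:00'),
    ('AB3', 111, 3, '2017-08-04 17:40:00'),
    ('AB3', 123, 1, '2017-08-04 17:30:00'),
    ('AB3', 124, 1, '2017-08-04 17:30:00'),
    ('AB3', 125, 1, '2017-08-04 17:30:00'),
    ('XY7',  97, 2, '2017-08-04 17:35:00'),
    ('XY7',  98, 2, '2017-08-04 17:35:00'),
    ('XY7',  99, 2, '2017-08-04 17:35:00');
""")

POSITION_SQL = """
SELECT COUNT(DISTINCT userId) + 1
FROM queue
WHERE realm = :realm
  AND date < (SELECT MIN(date) FROM queue
              WHERE realm = :realm AND userId = :user)
"""


def queue_position(realm, user_id):
    """1-based position of a user within a realm's queue."""
    row = conn.execute(POSITION_SQL, {"realm": realm, "user": user_id}).fetchone()
    return row[0]


print(queue_position("AB3", 3))
```

Note that a user who is not in the queue at all also gets position 1 here, because the subquery yields NULL and no rows pass the date comparison; the caller would need to check for that case separately.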
Description
I have a MySQL table like the following one:
CREATE TABLE `ticket` (
`ticket_id` int(11) NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` varchar(2) DEFAULT NULL,
`pick2` varchar(2) DEFAULT NULL,
`pick3` varchar(2) DEFAULT NULL,
`pick4` varchar(2) DEFAULT NULL,
`pick5` varchar(2) DEFAULT NULL,
`pick6` varchar(2) DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=19675 DEFAULT CHARSET=latin1;
Let's also assume we have the following values already stored in the DB:
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
| ticket_id | ticket_number | pick1 | pick2 | pick3 | pick4 | pick5 | pick6 |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
| 655 | 08-09-21-24-46-52 | 8 | 9 | 21 | 24 | 46 | 52 |
| 658 | 08-23-24-40-42-45 | 8 | 23 | 24 | 40 | 42 | 45 |
| 660 | 07-18-19-20-22-31 | 7 | 18 | 19 | 20 | 22 | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 19674 | 06-18-33-43-49-50 | 6 | 18 | 33 | 43 | 49 | 50 |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
Now, my goal is to compare each ticket with every other ticket in the table (except itself) by the values in its ticket_number field (6 elements per set, separated by -). For instance, if I compare ticket_id = 655 with ticket_id = 658 by the elements in their respective ticket_number fields, I find that elements 08 and 24 appear in both sets. If we compare ticket_id = 660 with ticket_id = 19674, there is only one match: 18.
What I am actually using to carry out these comparisons is the following query:
select A.ticket_id, A.ticket_number, P.ticket_id, P.ticket_number, count(P.ticket_number) as cnt
from ticket A
inner join ticket P on A.ticket_id != P.ticket_id
where
((A.ticket_number like concat("%", lpad(P.pick1,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick2,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick3,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick4,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick5,2,0), "%"))
+ (A.ticket_number like concat("%", lpad(P.pick6,2,0), "%")) > 3)
group by A.ticket_id
having cnt > 5;
That is, first I create an INNER JOIN pairing all rows with different ticket_id, then I compare each P.pickX (X=[1..6]) with the A.ticket_number of the joined row and count the number of matches between both sets.
Finally, after executing, I obtain something like this:
+-------------+-------------------+-------------+-------------------+-----+
| A.ticket_id | A.ticket_number | P.ticket_id | P.ticket_number | cnt |
+-------------+-------------------+-------------+-------------------+-----+
| 8489 | 14-21-28-32-48-49 | 2528 | 14-21-33-45-48-49 | 6 |
| 8553 | 02-14-17-38-47-53 | 2364 | 02-30-38-44-47-53 | 6 |
| 8615 | 05-12-29-33-36-43 | 4654 | 12-21-29-33-36-37 | 6 |
| 8686 | 09-13-29-34-44-48 | 6038 | 09-13-17-29-33-44 | 6 |
| 8693 | 01-10-14-17-42-50 | 5330 | 01-10-37-42-48-50 | 6 |
| ... | ... | ... | ... | ... |
| 19195 | 05-13-29-41-46-51 | 5106 | 07-13-14-29-41-51 | 6 |
+-------------+-------------------+-------------+-------------------+-----+
Problem
The problem is that I execute this on a table of 10476 rows, resulting in more than 100 million ticket_number vs pickX comparisons, which takes around 172 seconds in total. This is too slow.
GOAL
My goal is to make this execution as fast as possible so as to be completed in less than a second, since this must work in real-time.
Is that possible?
If you want to keep the current structure, change pick1..6 to the TINYINT type instead of VARCHAR.
A signed TINYINT stores values from -128 to 127 (0 to 255 if unsigned). Your query then no longer needs the concat with %, which is the cause of the slow run.
Then, these two queries will give you the same result
select * FROM ticket where pick1 = '8';
select * FROM ticket where pick1 = '08';
This is the sql structure:
CREATE TABLE `ticket` (
`ticket_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick2` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick3` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick4` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick5` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick6` tinyint(1) unsigned zerofill DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
I think you can even remove the zerofill.
If this doesn't work, change the table design.
How big can the numbers be? Looks like 50. If the answer is 63 or less, then change the format to this:
All 6 numbers are stored in a single SET ('0','1','2',...,'50') and use suitable operations to set the nth bit.
Then, comparing two sets becomes BIT_COUNT(x & y) to find out how many match. A simple comparison will test for equality.
If your goal is to see if a particular lottery guess is already in the table, then index that column so that a lookup will be fast. I don't mean minutes or even seconds, but rather a few milliseconds. Even for a billion rows.
The bit arithmetic can be done in SQL or in your client language. For example, to build the SET for (11, 33, 7), the code would be
INSERT INTO t SET picks = '11,33,7' -- order does not matter
Also this would work:
... picks = (1 << 11) |
(1 << 33) |
(1 << 7)
A quick example:
CREATE TABLE `setx` (
`picks` set('1','2','3','4','5','6','7','8','9','10') NOT NULL
) ENGINE=InnoDB;
INSERT INTO setx (picks) VALUES ('2,10,6');
INSERT INTO setx (picks) VALUES ('1,3,5,7,9'), ('2,4,6,8,10'), ('9,8,7,6,5,4,3,2,1,10');
SELECT picks, HEX(picks+0) FROM setx;
+----------------------+--------------+
| picks | HEX(picks+0) |
+----------------------+--------------+
| 2,6,10 | 222 |
| 1,3,5,7,9 | 155 |
| 2,4,6,8,10 | 2AA |
| 1,2,3,4,5,6,7,8,9,10 | 3FF |
+----------------------+--------------+
4 rows in set (0.00 sec)
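The same bit arithmetic can be done in the client language, as the answer mentions. Below is a minimal Python sketch of the BIT_COUNT(x & y) idea using tickets 655 and 658 from the question; `picks_to_mask` and `matches` are hypothetical helper names. It follows the `1 << n` numbering (number n sets bit n), which differs from the `setx` example where the first SET member '1' occupies bit 0.

```python
def picks_to_mask(picks):
    """Pack lottery numbers (0..63) into one integer, one bit per number."""
    mask = 0
    for n in picks:
        if not 0 <= n <= 63:
            raise ValueError(f"pick out of range: {n}")
        mask |= 1 << n
    return mask


def matches(a, b):
    """Count numbers common to two tickets: popcount of the AND of both masks."""
    return bin(picks_to_mask(a) & picks_to_mask(b)).count("1")


ticket_655 = [8, 9, 21, 24, 46, 52]
ticket_658 = [8, 23, 24, 40, 42, 45]
print(matches(ticket_655, ticket_658))  # 8 and 24 are shared
```

Each comparison is then a single AND plus a population count on a 64-bit integer, instead of six LIKE pattern matches on a string.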
We've a table called message.
CREATE TABLE IF NOT EXISTS `message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_user_id` int(11) NOT NULL,
`to_user_id` int(11) NOT NULL,
`content` text NOT NULL,
`club_id` int(11) NOT NULL,
`read_flag` int(11) NOT NULL DEFAULT '0',
`parent_id` int(11) NOT NULL,
`status` tinyint(1) DEFAULT NULL,
`create_user_id` int(11) NOT NULL,
`update_user_id` int(11) NOT NULL,
`create_dt_tm` datetime NOT NULL,
`update_dt_tm` datetime NOT NULL,
`delete_flag` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
We need to display the messages and their replies to the user.
Entries in the table look like this:
id | from_user_id | to_user_id | content | parent_id
1 | 2 | 3 | hai | 0
2 | 3 | 2 | hi | 1
3 | 3 | 2 | hwru | 1
4 | 3 | 4 | hwru | 1
5 | 4 | 5 | u added | 1
6 | 4 | 5 | new msg | 0
Here is the flow.
Let's assume 2 => A, 3 => B, 4 => C, 5 => D.
A sends a message to B.
B replies to that message.
B sends one more reply, adding a new recipient C.
C replies to that thread, adding a new recipient D.
All users who are part of this thread should be able to read the full message thread.
A, B, C and D can each see all of messages 1 to 5 when they log in, but not the 6th.
Only C and D can see the 6th message, which belongs to a different thread.
I'm using two queries now:
One lists all messages.
The second shows the details for a message (when the user clicks on it, it shows the whole thread related to that message).
I need a single query that shows all threads to the logged-in user.
Can someone please help me write this SELECT query?
Make the default for parent_id NULL. This query gets the threads the user is allowed to view; replace <thisuserid> with the user id:
SELECT DISTINCT(COALESCE(parent_id, id)) thread_id FROM message m WHERE from_user_id = <thisuserid> OR to_user_id = <thisuserid>
This gets the whole thread, including duplicates when a message was sent to several recipients, since I can't think of a foolproof way to filter them out while they are stored as separate messages. Replace <threadid> with the thread id:
SELECT * from message m WHERE id = <threadid> OR parent_id = <threadid>
However, I would completely separate the recipients from the message itself, not only to make querying the whole chain easier but also to save space. The way you do it now, every new recipient of a message increases the storage required by an amount equal to the size of the message, which can get out of hand very quickly.
CREATE TABLE IF NOT EXISTS `message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_user_id` int(11) NOT NULL,
`content` text NOT NULL,
`parent_id` int(11),
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `message_to` (
`message_id` int(11) NOT NULL,
`recipient_id` int(11) NOT NULL,
`read_flag` int(11) NOT NULL DEFAULT '0',
`status` tinyint(1) DEFAULT NULL,
`delete_flag` tinyint(1) NOT NULL DEFAULT '0',
UNIQUE KEY (`message_id`, `recipient_id`)
);
INSERT INTO message VALUES (1, 2, 'hai', null), (2, 3, 'hi', 1), (3, 3, 'hwru', 1), (4, 4, 'u added', 1), (5, 4, 'new msg', null);
INSERT INTO message_to (`message_id`, `recipient_id`) VALUES (1,3), (2,2), (3,2), (3,4), (4,5), (5,5);
Get threads user is allowed to view
SET @user := 2;
SELECT DISTINCT(COALESCE(parent_id, id)) thread_id FROM message m WHERE id IN (
SELECT message_id as id FROM message_to WHERE recipient_id = @user
union
SELECT id from message where from_user_id = @user
);
Get whole thread
SELECT * FROM message m WHERE m.id = 1 OR m.parent_id = 1
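To sanity-check the two queries against the sample rows from this answer, here is a sketch that loads the split `message` / `message_to` schema into an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the table definitions are trimmed to the columns the queries touch, and `visible_threads` and `thread_messages` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id           INTEGER PRIMARY KEY,
    from_user_id INTEGER NOT NULL,
    content      TEXT    NOT NULL,
    parent_id    INTEGER
);
CREATE TABLE message_to (
    message_id   INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    UNIQUE (message_id, recipient_id)
);
INSERT INTO message VALUES
    (1, 2, 'hai', NULL), (2, 3, 'hi', 1), (3, 3, 'hwru', 1),
    (4, 4, 'u added', 1), (5, 4, 'new msg', NULL);
INSERT INTO message_to (message_id, recipient_id) VALUES
    (1, 3), (2, 2), (3, 2), (3, 4), (4, 5), (5, 5);
""")

THREADS_SQL = """
SELECT DISTINCT COALESCE(parent_id, id) AS thread_id
FROM message
WHERE id IN (
    SELECT message_id FROM message_to WHERE recipient_id = :user
    UNION
    SELECT id FROM message WHERE from_user_id = :user
)
ORDER BY thread_id
"""


def visible_threads(user_id):
    """Thread ids in which the user sent or received at least one message."""
    return [row[0] for row in conn.execute(THREADS_SQL, {"user": user_id})]


def thread_messages(thread_id):
    """Root message plus its direct replies, oldest first."""
    rows = conn.execute(
        "SELECT id, content FROM message WHERE id = ? OR parent_id = ? ORDER BY id",
        (thread_id, thread_id),
    )
    return rows.fetchall()


print(visible_threads(2))
print(thread_messages(1))
```

With the recipients in their own table, message 3 (sent to both user 2 and user 4) is stored once and appears once in the thread.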
Your storage type is called Adjacency list, i.e. just store immediate parent id in parent_id column.
To query node's children:
mysql> SELECT * FROM message m1 INNER JOIN message m2 ON m2.parent_id = m1.id WHERE m1.id = 1;
+----+--------------+------------+---------+-----------+----+--------------+------------+---------+-----------+
| id | from_user_id | to_user_id | content | parent_id | id | from_user_id | to_user_id | content | parent_id |
+----+--------------+------------+---------+-----------+----+--------------+------------+---------+-----------+
| 1 | 2 | 3 | hai | 0 | 2 | 3 | 2 | hi | 1 |
| 1 | 2 | 3 | hai | 0 | 3 | 3 | 2 | hwru | 1 |
| 1 | 2 | 3 | hai | 0 | 4 | 3 | 4 | hwru | 1 |
| 1 | 2 | 3 | hai | 0 | 5 | 4 | 5 | u added | 1 |
+----+--------------+------------+---------+-----------+----+--------------+------------+---------+-----------+
4 rows in set (0.00 sec)
If you would like a flat structure, you can do the following query:
mysql> select * from message m WHERE id = 1 OR parent_id = 1;
+----+--------------+------------+---------+-----------+
| id | from_user_id | to_user_id | content | parent_id |
+----+--------------+------------+---------+-----------+
| 1 | 2 | 3 | hai | 0 |
| 2 | 3 | 2 | hi | 1 |
| 3 | 3 | 2 | hwru | 1 |
| 4 | 3 | 4 | hwru | 1 |
| 5 | 4 | 5 | u added | 1 |
+----+--------------+------------+---------+-----------+
5 rows in set (0.00 sec)
Adjacency list has serious drawbacks: it's hard to query deeply nested trees (we're querying only immediate children of message #1 here).
Please, take a look at linked question and also this excellent presentation by Bill Karwin for other options.
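One way to reach replies nested more than one level deep in an adjacency list is a recursive common table expression, which SQLite supports and MySQL added in version 8.0. The sketch below runs one through Python's sqlite3; the nested rows are made up for illustration (the question's data only has one level of replies), and `subtree` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (id INTEGER PRIMARY KEY, content TEXT, parent_id INTEGER);
-- Hypothetical nested thread: 1 <- 2 <- 3 <- 4, plus an unrelated message 5.
INSERT INTO message VALUES
    (1, 'root', NULL), (2, 'reply', 1), (3, 'reply to reply', 2),
    (4, 'third level', 3), (5, 'other thread', NULL);
""")

SUBTREE_SQL = """
WITH RECURSIVE subtree(id, depth) AS (
    SELECT id, 0 FROM message WHERE id = :root
    UNION ALL
    SELECT m.id, s.depth + 1
    FROM message m JOIN subtree s ON m.parent_id = s.id
)
SELECT id, depth FROM subtree ORDER BY depth, id
"""


def subtree(root_id):
    """All messages below root_id (inclusive) with their nesting depth."""
    return conn.execute(SUBTREE_SQL, {"root": root_id}).fetchall()


print(subtree(1))
```

The plain `id = 1 OR parent_id = 1` query would stop at message 2 for this data, since messages 3 and 4 do not point at the root directly.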
I've got an interesting dilemma now. I have a database schema like the following:
GameList:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | NULL | NULL | Dragon Age: Origins |
| 2 | NULL | 138994900 | NULL | Dragon Age(TM): Origins |
| 3 | NULL | NULL | dragonage | Dragon Age Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
+-------+----------+-----------+------------+--------------------------------+
GameAlias:
+----------+-----------+
| old_id | new_id |
+----------+-----------+
| 2 | 1 |
| 3 | 1 |
| ... | ... |
+----------+-----------+
Depending on whether the stores use the same title for the game there may be no issues, or there may be multiple rows for the same game. The Alias table exists to resolve this issue, by stating that id 2 and id 3 are just aliases for id 1.
What I need is an SQL query which uses both the GameList table and the GameAlias table and returns the following:
ConglomerateGameList:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | 138994900 | dragonage | Dragon Age: Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
+-------+----------+-----------+------------+--------------------------------+
Note that I want the game title of the "new id". The game title for any "old ids" should simply be discarded/ignored.
I would also like to note that I can't make any modifications to the GameList table to solve this issue. If I were to simply re-write the table to look like my desired output then every night when I grab an updated game list from the stores it would fail to find the game in the database, generating yet another row like so:
+-------+----------+-----------+------------+--------------------------------+
| id | steam_id | origin_id | impulse_id | game_title |
+-------+----------+-----------+------------+--------------------------------+
| 1 | 17450 | 138994900 | dragonage | Dragon Age: Origins |
| 4 | 47850 | 201841300 | fifamgr11 | FIFA Manager 11 |
| ... | ... | ... | ... | ... |
| 8139 | NULL | 138994900 | NULL | Dragon Age(TM): Origins |
| 8140 | NULL | NULL | dragonage | Dragon Age Origins |
+-------+----------+-----------+------------+--------------------------------+
I also can't work on the assumption that a game's id will never change as Steam has been known to change them when a major update to the game is released.
Bonus points if it can recognize recursive aliases, like the following:
GameAlias:
+----------+-----------+
| old_id | new_id |
+----------+-----------+
| 2 | 1 |
| 3 | 2 |
| ... | ... |
+----------+-----------+
Since id 3 is an alias for id 2, which itself is an alias for id 1. If recursive aliases are impossible, then I can just develop my application logic to prevent them.
Does this work?
select ga1.new_id, max(gl1.steam_id), max(gl1.origin_id), max(gl1.impulse_id),
max(if(gl1.id = ga1.new_id,gl1.game_title,NULL)) as game_title
from GameList gl1, GameAlias ga1
where (gl1.id = ga1.new_id OR gl1.id = ga1.old_id)
group by ga1.new_id
union
select gl2.id, gl2.steam_id, gl2.origin_id, gl2.impulse_id, gl2.game_title
from GameList gl2
where (gl2.id not in (
select ga3.new_id from GameAlias ga3
union
select ga4.old_id from GameAlias ga4))
1. First solution (without recursion):
CREATE TABLE GameList
(
id INT NOT NULL PRIMARY KEY
,steam_id INT NULL
,origin_id INT NULL
,impulse_id NVARCHAR(50) NULL
,game_title NVARCHAR(50) NOT NULL
);
INSERT GameList(id, steam_id, origin_id, impulse_id, game_title)
SELECT 1, 17450, NULL, NULL, 'Dragon Age: Origins'
UNION ALL
SELECT 2, NULL, 138994900, NULL, 'Dragon Age(TM): Origins'
UNION ALL
SELECT 3, NULL, NULL, 'dragonage','Dragon Age Origins'
UNION ALL
SELECT 4, 47850, 201841300, 'fifamgr11','FIFA Manager 11';
CREATE TABLE GameAlias
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT GameAlias (old_id, new_id) VALUES (2,1);
INSERT GameAlias (old_id, new_id) VALUES (3,1);
-- Solution 1
SELECT COALESCE(ga.new_id, gl.id) new_id
,MAX(gl.steam_id) new_steam_id
,MAX(gl.origin_id) new_origin_id
,MAX(gl.impulse_id) new_impulse_id
,MAX( CASE WHEN ga.old_id IS NULL THEN gl.game_title ELSE NULL END ) new_game_title
FROM GameList gl
LEFT OUTER JOIN GameAlias ga ON gl.id = ga.old_id
GROUP BY COALESCE(ga.new_id, gl.id);
-- End of Solution 1
DROP TABLE GameList;
DROP TABLE GameAlias;
Results:
1 17450 138994900 dragonage Dragon Age: Origins
4 47850 201841300 fifamgr11 FIFA Manager 11
2. Second solution (up to three levels of recursion):
CREATE TABLE GameList
(
id INT NOT NULL PRIMARY KEY
,steam_id INT NULL
,origin_id INT NULL
,impulse_id NVARCHAR(50) NULL
,game_title NVARCHAR(50) NOT NULL
);
INSERT GameList(id, steam_id, origin_id, impulse_id, game_title)
SELECT 1, 17450, NULL, NULL, 'Dragon Age: Origins'
UNION ALL
SELECT 2, NULL, 138994900, NULL, 'Dragon Age(TM): Origins'
UNION ALL
SELECT 3, NULL, NULL, 'dragonage','Dragon Age Origins'
UNION ALL
SELECT 4, 47850, 201841300, 'fifamgr11','FIFA Manager 11'
UNION ALL
SELECT 5, 11111, NULL, NULL, 'Starcraft 1'
UNION ALL
SELECT 6, NULL, 1111111111, NULL, 'Starcraft 1.1'
UNION ALL
SELECT 7, NULL, NULL, NULL, 'Starcraft 1.2'
UNION ALL
SELECT 8, NULL, NULL, 'sc1', 'Starcraft 1.3';
CREATE TABLE GameAlias
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT GameAlias (old_id, new_id) VALUES (2,1);
INSERT GameAlias (old_id, new_id) VALUES (3,1);
INSERT GameAlias (old_id, new_id) VALUES (6,5);
INSERT GameAlias (old_id, new_id) VALUES (7,6);
INSERT GameAlias (old_id, new_id) VALUES (8,7);
-- Solution 2
CREATE TEMPORARY TABLE Mappings
(
old_id INT NOT NULL PRIMARY KEY
,new_id INT NOT NULL
);
INSERT Mappings (old_id, new_id)
-- first level mapping
SELECT ga.old_id, ga.new_id
FROM GameAlias ga
WHERE ga.new_id NOT IN (SELECT t.old_id FROM GameAlias t)
-- second level mapping
UNION ALL
SELECT ga.old_id, ga2.new_id
FROM GameAlias ga
INNER JOIN GameAlias ga2 ON ga.new_id = ga2.old_id
WHERE ga2.new_id NOT IN (SELECT t.old_id FROM GameAlias t)
-- third level mapping
UNION ALL
SELECT ga.old_id, ga3.new_id
FROM GameAlias ga
INNER JOIN GameAlias ga2 ON ga.new_id = ga2.old_id
INNER JOIN GameAlias ga3 ON ga2.new_id = ga3.old_id;
SELECT COALESCE(ga.new_id, gl.id) new_id
,MAX(gl.steam_id) new_steam_id
,MAX(gl.origin_id) new_origin_id
,MAX(gl.impulse_id) new_impulse_id
,MAX( CASE WHEN ga.old_id IS NULL THEN gl.game_title ELSE NULL END ) new_game_title
FROM GameList gl
LEFT OUTER JOIN Mappings ga ON gl.id = ga.old_id
GROUP BY COALESCE(ga.new_id, gl.id);
DROP TEMPORARY TABLE Mappings;
-- End of Solution 2
DROP TABLE GameList;
DROP TABLE GameAlias;
Results:
1 17450 138994900 dragonage Dragon Age: Origins
4 47850 201841300 fifamgr11 FIFA Manager 11
5 11111 1111111111 sc1 Starcraft 1
I'm sorry, but MySQL versions before 8.0 don't have recursive queries/CTEs.
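If the alias chains can be arbitrarily deep, another option is to resolve them in application code, as the question itself suggests. The following is a rough Python sketch under stated assumptions: `resolve` and `conglomerate` are hypothetical helpers, the rows are the Starcraft sample from the second solution, and each store id is taken from the first row in the chain that has one (the SQL above uses MAX instead).

```python
def resolve(alias, game_id):
    """Follow old_id -> new_id links until reaching an id that is not aliased."""
    seen = set()
    while game_id in alias:
        if game_id in seen:
            raise ValueError(f"alias cycle at id {game_id}")
        seen.add(game_id)
        game_id = alias[game_id]
    return game_id


def conglomerate(games, alias):
    """Merge aliased rows; the canonical (final) row supplies the title."""
    merged = {}
    for gid, row in games.items():
        root = resolve(alias, gid)
        target = merged.setdefault(
            root,
            {"steam_id": None, "origin_id": None, "impulse_id": None, "game_title": None},
        )
        for col in ("steam_id", "origin_id", "impulse_id"):
            if target[col] is None:
                target[col] = row[col]
        if gid == root:
            target["game_title"] = row["game_title"]
    return merged


games = {
    5: {"steam_id": 11111, "origin_id": None, "impulse_id": None, "game_title": "Starcraft 1"},
    6: {"steam_id": None, "origin_id": 1111111111, "impulse_id": None, "game_title": "Starcraft 1.1"},
    7: {"steam_id": None, "origin_id": None, "impulse_id": None, "game_title": "Starcraft 1.2"},
    8: {"steam_id": None, "origin_id": None, "impulse_id": "sc1", "game_title": "Starcraft 1.3"},
}
alias = {6: 5, 7: 6, 8: 7}  # old_id -> new_id, a three-level chain

print(conglomerate(games, alias))
```

The `seen` set guards against alias loops such as 2 -> 3 -> 2, which would otherwise spin forever.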