Database schema User Matching - mysql

Which of these two schema designs for user matching would work best with a large amount of data?
Solution 1:
CREATE TABLE `user_matches` (
`user_id_1` int(11) NOT NULL,
`user_id_2` int(11) NOT NULL,
`like_user_1` tinyint(1) DEFAULT '0',
`like_user_2` tinyint(1) DEFAULT '0',
PRIMARY KEY (`user_id_1`,`user_id_2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
To select the matching among two users I should write this query:
SELECT *
FROM user_matches
WHERE (user_id_1 = 123 OR user_id_2 = 123) AND (like_user_1 = 1 AND like_user_2 = 1)
PS: Imagine that like_user_1 and like_user_2 are both indexed
Solution 2:
CREATE TABLE `user_matches` (
`user_id` int(11) NOT NULL,
`user_id_liked` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`user_id_liked`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
To select the matching among users I should write this query:
SELECT me.user_id_liked
FROM user_matches me
INNER JOIN user_matches you ON me.user_id = you.user_id_liked
AND you.user_id = me.user_id_liked
AND me.user_id = 123
I think the second solution is better, both as a schema and for querying, because the FROM and JOIN clauses are evaluated before the WHERE clause. On the other hand, the first solution does not need a join at all.

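If you index like_user_1 and like_user_2, queries against solution 1 should be fast.
I would test both solutions and compare their execution plans.
EXPLAIN and EXPLAIN EXTENDED will be useful (EXPLAIN EXTENDED was removed in MySQL 8.0, where plain EXPLAIN already reports the extended information).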
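A minimal sketch of that comparison, assuming the two candidate tables are created side by side under the hypothetical names user_matches_v1 and user_matches_v2 (the question uses user_matches for both):

```sql
-- Solution 1: one row per pair, a match is two flags set on the same row.
-- user_matches_v1 is a hypothetical name for the first schema.
EXPLAIN
SELECT *
FROM user_matches_v1
WHERE (user_id_1 = 123 OR user_id_2 = 123)
  AND like_user_1 = 1
  AND like_user_2 = 1;

-- Solution 2: one row per like, a match is found with a self-join.
-- user_matches_v2 is a hypothetical name for the second schema.
EXPLAIN
SELECT me.user_id_liked
FROM user_matches_v2 AS me
INNER JOIN user_matches_v2 AS you
        ON you.user_id = me.user_id_liked
       AND you.user_id_liked = me.user_id
WHERE me.user_id = 123;
```

In the output, the type, key, and rows columns show whether an index is used and roughly how many rows are examined for each table. Load a realistic volume of data first, since the optimizer may pick a different plan on nearly empty tables.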

Related

MySQL query with multiple joins taking too long to execute

I have 3 tables. The first one is called map_life, the second one is called scripts and the third one is called npc_data.
I'm running the following query to get all the properties from map_life while also getting the script column from scripts and the storage_cost column from npc_data if the ids match.
SELECT life.*
, script.script
, npc.storage_cost
FROM map_life life
LEFT
JOIN scripts script
ON script.objectid = life.lifeid
AND script.script_type = 'npc'
LEFT
JOIN npc_data npc
ON npc.npcid = life.lifeid
As you can see, map_life id is lifeid, while scripts id is objectid and npc_data id is npcid.
This query takes about 5 seconds to execute, and I have no idea why. Here are the CREATE statements for all three tables; maybe I'm missing something?
CREATE TABLE `mcdb83`.`map_life` (
`id` bigint(21) unsigned NOT NULL AUTO_INCREMENT,
`mapid` int(11) NOT NULL,
`life_type` enum('npc','mob','reactor') NOT NULL,
`lifeid` int(11) NOT NULL,
`life_name` varchar(50) DEFAULT NULL COMMENT 'For reactors, specifies a handle so scripts may interact with them; for NPC/mob, this field is useless',
`x_pos` smallint(6) NOT NULL DEFAULT '0',
`y_pos` smallint(6) NOT NULL DEFAULT '0',
`foothold` smallint(6) NOT NULL DEFAULT '0',
`min_click_pos` smallint(6) NOT NULL DEFAULT '0',
`max_click_pos` smallint(6) NOT NULL DEFAULT '0',
`respawn_time` int(11) NOT NULL DEFAULT '0',
`flags` set('faces_left') NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `lifetype` (`mapid`,`life_type`)
) ENGINE=InnoDB AUTO_INCREMENT=32122 DEFAULT CHARSET=latin1;
CREATE TABLE `mcdb83`.`scripts` (
`script_type` enum('npc','reactor','quest','item','map_enter','map_first_enter') NOT NULL,
`helper` tinyint(3) NOT NULL DEFAULT '-1' COMMENT 'Represents the quest state for quests, and the index of the script for NPCs (NPCs may have multiple scripts).',
`objectid` int(11) NOT NULL DEFAULT '0',
`script` varchar(30) NOT NULL DEFAULT '',
PRIMARY KEY (`script_type`,`helper`,`objectid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Lists all the scripts that belong to NPCs/reactors/etc. ';
CREATE TABLE `mcdb83`.`npc_data` (
`npcid` int(11) NOT NULL,
`storage_cost` int(11) NOT NULL DEFAULT '0',
`flags` set('maple_tv','is_guild_rank') NOT NULL DEFAULT '',
PRIMARY KEY (`npcid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For this query:
SELECT l.*, s.script, npc.storage_cost
FROM map_life l LEFT JOIN
scripts s
ON s.objectid = l.lifeid AND
s.script_type = 'npc' LEFT JOIN
npc_data npc
ON npc.npcid = l.lifeid;
You want indexes on: scripts(objectid, script_type, script) and npc_data(npcid, storage_cost). The order of the columns in these indexes is important.
map_life.lifeid does not have any indexes defined, therefore the joins will result in full table scans. Define an index on map_life.lifeid field.
In the scripts table, the primary key is defined on the following fields in this order: script_type, helper, objectid. The join is done on objectid and there is a constant filter on script_type. Because helper sits between those two columns, MySQL can use the primary key only for the script_type prefix, not for the objectid lookup. For this query the order of the fields in the index should be: objectid, script_type, helper.
The above will significantly speed up the joins. You may further increase the speed of the query if your indexes actually cover all fields that are in the query because in this case MySQL does not even have to touch the tables.
Consider adding an index on (objectid, script_type, script), in that order, to the scripts table, and an index on (npcid, storage_cost) to the npc_data table. However, these indexes may slow down INSERT / UPDATE / DELETE statements, so do some performance testing before adding them to a production environment.
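The index suggestions from both answers could be written roughly as follows. This is only a sketch: the index names (idx_...) are made up for illustration, while the table and column names come from the CREATE TABLE statements in the question.

```sql
-- Index on the join column of the driving table, as suggested above.
ALTER TABLE map_life ADD INDEX idx_lifeid (lifeid);

-- Covering index for the scripts join: join column first, then the
-- constant filter, then the selected column.
ALTER TABLE scripts ADD INDEX idx_object_type_script (objectid, script_type, script);

-- Covering index for the npc_data join.
ALTER TABLE npc_data ADD INDEX idx_npc_storage (npcid, storage_cost);

-- Re-check the plan afterwards to see which indexes are actually chosen.
EXPLAIN
SELECT l.*, s.script, npc.storage_cost
FROM map_life AS l
LEFT JOIN scripts AS s
       ON s.objectid = l.lifeid
      AND s.script_type = 'npc'
LEFT JOIN npc_data AS npc
       ON npc.npcid = l.lifeid;
```

If EXPLAIN shows an index is never picked, it can be dropped again to avoid the extra write cost.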

optimizing this query - which index or changes should i do?

Quite a simple question, but I'm a little lost when it comes to SQL optimization and indexes; I'm still learning.
Query
SELECT A.*, count(A.ID) as count
FROM tableB B
JOIN tableA A ON A.ID = B.ID
WHERE B.otherID=xx and B.value='test' and B.languageID=3
Table A
CREATE TABLE `tableA` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`info1` varchar(64) NOT NULL default '',
`info2` varchar(64) NOT NULL default '',
PRIMARY KEY (`ID`)
) TYPE=MyISAM
Table B
CREATE TABLE `tableB` (
`ID` int(11) NOT NULL default '0',
`otherID` int(11) NOT NULL default '0',
`value` varchar(64) NOT NULL default '',
`languageID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`,`otherID`,`languageID`)
) TYPE=MyISAM
So the query is quite simple: I'm looking for the rows with a specific ID and value in table B, and I join on table A because I need some information stored there.
I guess the query itself can't be optimized, but maybe I can speed things up by creating an index, perhaps on (B.otherID, B.value)?
Thanks for your insights!
Normally the name ID is used for the PRIMARY KEY. A PRIMARY KEY is necessarily Unique. Yet you say
PRIMARY KEY (`ID`,`otherID`,`languageID`)
Is ID not unique, but this triple is? (Just checking.)
Back to your question...
WHERE B.otherID=xx and B.value='test' and B.languageID=3
Says that B needs those 3 columns in a composite index in any order. With that, the Optimizer will start with B, quickly find the row(s) needed there. Then it will move over to A, which already has an index on ID to handle ON A.ID = B.ID.
My Cookbook on creating indexes.
The normal pattern is COUNT(*). COUNT(x) has the extra burden of checking all the x values for being not NULL. (I suspect you did not need that.)
Use InnoDB, not MyISAM.
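A minimal sketch of the composite index described above. The index name is hypothetical, and 42 stands in for the xx placeholder from the question:

```sql
-- All three WHERE columns are equality tests, so any column order works;
-- this order is just one possibility.
ALTER TABLE tableB ADD INDEX idx_other_value_lang (otherID, `value`, languageID);

-- Check that the optimizer starts with B and uses the new index.
EXPLAIN
SELECT A.*
FROM tableB AS B
JOIN tableA AS A ON A.ID = B.ID
WHERE B.otherID = 42
  AND B.`value` = 'test'
  AND B.languageID = 3;
```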

Ordering in MySQL Bogs Down

I've been working on a small Perl program that works with a table of articles, displaying them to the user if they have not been already read. It has been working nicely and it has been quite speedy, overall. However, this afternoon, the performance has degraded from fast enough that I wasn't worried about optimizing the query to a glacial 3-4 seconds per query. To select articles, I present this query:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
WHERE ciid NOT
IN (
SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)
AND (
cid =117
OR cid =308
OR cid =310
)
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
The list of possible cid's varies and could be quite a bit more. In any case, I noted that about 2-3 seconds of the total time to make the query is devoted to "ORDER BY." If I remove that, it only takes about a half second to give me the query back. If I drop the subquery, the performance goes back to normal... but the subquery didn't seem to be problematic until just this afternoon, after working fine for a week or so.
Any ideas what could be slowing it down so much? What might I do to try to get the performance back up to snuff? The table being queried has 45,000 rows. The subquery's table has fewer than 3,000 rows at present.
Update: Incidentally, if anyone has suggestions on how to do multiple queries or some other technique that would be more efficient to accomplish what I am trying to do, I am all ears. I'm really puzzled how to solve the problem at this point. Can I somehow apply the order by before the join to make it apply to the real table and not the derived table? Would that be more efficient?
Here is the latest version of the query, derived from suggestions from @Gordon, below:
SELECT channelitem.ciid, channelitem.cid, name, description, url, creationdate, author
FROM `channelitem`
LEFT JOIN (
SELECT ciid, dateRead
FROM `uninet_channelitem_read`
WHERE uid = '1030'
)alreadyRead ON channelitem.ciid = alreadyRead.ciid
WHERE (
alreadyRead.ciid IS NULL
)
AND `cid`
IN ( 6648, 329, 323, 6654, 6647 )
ORDER BY `channelitem`.`creationdate` DESC
LIMIT 0 , 100
Also, I should mention what my db structure looks like with regards to these two tables -- maybe someone can spot something odd about the structure:
CREATE TABLE IF NOT EXISTS `channelitem` (
`newsversion` int(11) NOT NULL DEFAULT '0',
`cid` int(11) NOT NULL DEFAULT '0',
`ciid` int(11) NOT NULL AUTO_INCREMENT,
`description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`url` varchar(222) DEFAULT NULL,
`creationdate` datetime DEFAULT NULL,
`urgent` varchar(10) DEFAULT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`lastchanged` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`author` varchar(255) NOT NULL,
PRIMARY KEY (`ciid`),
KEY `newsversion` (`newsversion`),
KEY `cid` (`cid`),
KEY `creationdate` (`creationdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1638554365 ;
CREATE TABLE IF NOT EXISTS `uninet_channelitem_read` (
`ciid` int(11) NOT NULL,
`uid` int(11) NOT NULL,
`dateRead` datetime NOT NULL,
PRIMARY KEY (`ciid`,`uid`),
KEY `ciid` (`ciid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
It never hurts to try the left outer join version of such a query:
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM `channelitem` ci left outer join
(SELECT ciid
FROM `uninet_channelitem_read`
WHERE uid = '1030'
) cr
on ci.ciid = cr.ciid
where cr.ciid is null and
ci.cid in (117, 308, 310)
ORDER BY ci.`creationdate` DESC
LIMIT 0 , 100
This query will be faster with an index on uninet_channelitem_read(ciid) and probably on channelitem(cid, ciid, creationdate).
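As a sketch, the suggested channelitem index could be added like this (the index name is hypothetical). The posted schema already defines KEY ciid on uninet_channelitem_read, so only the channelitem index is new here:

```sql
-- Composite index suggested above; column names are from the posted schema.
ALTER TABLE channelitem ADD INDEX idx_cid_ciid_created (cid, ciid, creationdate);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT ci.ciid, ci.cid, ci.name, ci.description, ci.url, ci.creationdate, ci.author
FROM channelitem AS ci
LEFT OUTER JOIN (
    SELECT ciid
    FROM uninet_channelitem_read
    WHERE uid = 1030
) AS cr ON ci.ciid = cr.ciid
WHERE cr.ciid IS NULL
  AND ci.cid IN (117, 308, 310)
ORDER BY ci.creationdate DESC
LIMIT 0, 100;
```

If the Extra column still reports "Using filesort", the ORDER BY is not being served by the index, and that is the step to investigate next.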
The problem could be that you need to create an index on the channelitem table for the column creationdate. Indexes help a database to run queries faster. Here is a link about MySQL Indexing

Mysql Join Query optimization

I have two tables in mysql:
Results Table : 1046928 rows.
Nodes Table : 50 rows.
I am joining these two tables with the following query and the execution of the query is very very slow.
select res.TIndex, res.PNumber, res.Sender, res.Receiver,
sta.Nickname, rta.Nickname from ((Results res join
Nodes sta) join Nodes rta) where ((res.sender_h=sta.name) and
(res.receiver_h=rta.name));
Please help me optimize this query. Right now if I want to pull just top 5 rows, It takes about 5-6 MINUTES. Thank you.
CREATE TABLE `nodes1` (
`NodeID` int(11) NOT NULL,
`Name` varchar(254) NOT NULL,
`Nickname` varchar(254) NOT NULL,
PRIMARY KEY (`NodeID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `Results1` (
`TIndex` int(11) NOT NULL,
`PNumber` int(11) NOT NULL,
`Sender` varchar(254) NOT NULL,
`Receiver` varchar(254) NOT NULL,
`PTime` datetime NOT NULL,
PRIMARY KEY (`TIndex`,`PNumber`),
KEY `PERIOD_TIME_IDX` (`PTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT res.TIndex ,
res.PNumber ,
res.Sender ,
res.Receiver ,
sta.Nickname ,
rta.Nickname
FROM Results AS res
INNER JOIN Nodes AS sta ON res.sender_h = sta.name
INNER JOIN Nodes AS rta ON res.receiver_h = rta.NAME
Create an index on Results (sender_h)
Create an index on Results (receiver_h)
Create an index on Nodes (name)
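Written out as statements, those three indexes might look like the sketch below. The index names are hypothetical, and sender_h and receiver_h are taken from the query: they do not appear in the posted CREATE TABLE, so their existence on Results is an assumption.

```sql
-- Join columns on the large table.
CREATE INDEX idx_results_sender_h ON Results (sender_h);
CREATE INDEX idx_results_receiver_h ON Results (receiver_h);

-- Lookup column on the small table.
CREATE INDEX idx_nodes_name ON Nodes (name);
```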
Joining on the node's name rather than NodeId (the primary key) doesn't look good at all.
Perhaps you should store NodeID as a foreign key for sender and receiver in the Results table instead of the name. Adding foreign key constraints is a good idea too; among other things, this might create indexes automatically, depending on your configuration.
If this change is difficult, at the very least you should enforce uniqueness on the node's name field.
If you change the table definitions in this manner, change your query to John's recommendation, and add indexes, it should run a lot better and be more readable.
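One way the schema change described above could look, as a rough sketch. SenderID and ReceiverID are hypothetical column names, the constraint and index names are made up, and sender_h / receiver_h are assumed to exist on Results as in the query:

```sql
-- Enforce uniqueness of the node name (this fails if duplicates already exist).
ALTER TABLE Nodes ADD UNIQUE INDEX uq_nodes_name (Name);

-- Hypothetical integer columns that will hold Nodes.NodeID.
ALTER TABLE Results
    ADD COLUMN SenderID INT NULL,
    ADD COLUMN ReceiverID INT NULL;

-- Backfill the new columns from the existing name-based columns.
UPDATE Results AS res
JOIN Nodes AS sta ON res.sender_h = sta.Name
JOIN Nodes AS rta ON res.receiver_h = rta.Name
SET res.SenderID = sta.NodeID,
    res.ReceiverID = rta.NodeID;

-- Foreign keys; InnoDB creates an index on the referencing columns
-- if a usable one does not already exist.
ALTER TABLE Results
    ADD CONSTRAINT fk_results_sender FOREIGN KEY (SenderID) REFERENCES Nodes (NodeID),
    ADD CONSTRAINT fk_results_receiver FOREIGN KEY (ReceiverID) REFERENCES Nodes (NodeID);

-- The join then runs on integer keys instead of varchar names.
SELECT res.TIndex, res.PNumber, res.Sender, res.Receiver,
       sta.Nickname AS SenderNickname, rta.Nickname AS ReceiverNickname
FROM Results AS res
INNER JOIN Nodes AS sta ON sta.NodeID = res.SenderID
INNER JOIN Nodes AS rta ON rta.NodeID = res.ReceiverID;
```

Try this on a copy of the data first: the backfill touches every row of a table with about a million rows.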

Which of the following is the best query in terms of execution time and load on the server

Here are my two MySQL queries. Can someone tell me which one is better to use with a MySQL database?
query 1)
select cast(sum(G1.amount)as decimal(8,2)) as YTDRegularPay,cast(sum(b1.amount)as decimal(8,2))as YTDBonusPay
from tbl_employees_swc_grosswagedetails g1,tbl_employees_swc_grosswagedetails b1
where g1.empid=b1.empid
and g1.PayYear=b1.PayYear
and g1.PayperiodNumber=b1.PayperiodNumber
and g1.Fedtaxid=b1.Fedtaxid
and g1.fedtaxid=998899889
and g1.payyear=2011
and g1.PayperiodNumber<=26
and g1.Wage_code='GRTT'
and g1.Taxing_AuthType=b1.Taxing_AuthType
and g1.empid=1005 and b1.wage_code='GRSP'
and g1.taxing_AuthType='FED' ;
and
Query 2)
select abc.Amount as YTDRegularPay,def.Amount as YTDBonusPay
from (select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRTT'
and Taxing_AuthType='FED') as abc,
(select Cast(sum(EG.Amount) as Decimal(8,2)) as Amount
from tbl_employees_swc_grosswagedetails EG
where EG.FedTaxID=998899889
and EG.EmpID=1005
and PayYear=2011
and EG.PayPeriodNumber<=26
and EG.Wage_code='GRSP'
and Taxing_AuthType='FED') as def ;
Here goes my Table structure
delimiter $$
CREATE TABLE `tbl_employees_swc_grosswagedetails` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`empid` int(11) NOT NULL,
`Fedtaxid` varchar(9) NOT NULL,
`Wage_code` varchar(45) NOT NULL,
`Amount` double NOT NULL,
`Hrly_Rate` double DEFAULT NULL,
`Num_hours` double DEFAULT NULL,
`Taxing_AuthType` varchar(10) DEFAULT NULL,
`Taxing_Auth_Name` varchar(10) DEFAULT NULL,
`PayperiodNumber` int(11) NOT NULL,
`PayYear` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `empid` (`empid`),
CONSTRAINT `empid` FOREIGN KEY (`empid`) REFERENCES `tblemployee` (`EmpID`)
ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=359 DEFAULT CHARSET=latin1$$
Any better query than these would be very much appreciated.
Thanks in advance,
Raghavendra.V
I would say the first one is better, since using JOIN is almost always better than using a subquery. It is also recommended to write the JOIN explicitly (though it does not matter in terms of performance), like this:
SELECT
CAST(SUM(G1.amount) AS decimal(8,2)) AS YTDRegularPay,
CAST(SUM(b1.amount) AS decimal(8,2)) AS YTDBonusPay
FROM
tbl_employees_swc_grosswagedetails g1
JOIN
tbl_employees_swc_grosswagedetails b1 ON g1.empid = b1.empid
AND g1.PayYear = b1.PayYear
AND g1.PayperiodNumber = b1.PayperiodNumber
AND g1.Taxing_AuthType = b1.Taxing_AuthType
AND g1.Fedtaxid = b1.Fedtaxid
WHERE
g1.fedtaxid = 998899889
AND g1.payyear = 2011
AND g1.PayperiodNumber <= 26
AND g1.Wage_code = 'GRTT'
AND b1.wage_code = 'GRSP'
AND g1.empid = 1005
AND g1.taxing_AuthType = 'FED';
Adding some indexes will probably help make both queries quicker. Since you use many columns in your WHERE clause, you need to choose which ones to index according to the data. Try adding a few candidate indexes, run the query with EXPLAIN, and see which index is used; that one is the most effective, and then you can drop the others.
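As one candidate to try, here is a sketch of a composite index that puts the equality columns first and the range column (PayperiodNumber) last. The index name is hypothetical, and whether this is the best column order depends on the data, so treat it as a starting point for the EXPLAIN comparison:

```sql
-- Candidate composite index: equality filters first, range filter last.
ALTER TABLE tbl_employees_swc_grosswagedetails
    ADD INDEX idx_emp_tax_year_code_auth_period
        (empid, Fedtaxid, PayYear, Wage_code, Taxing_AuthType, PayperiodNumber);

-- Check one of the two subqueries from Query 2 against the new index.
-- Fedtaxid is declared varchar(9), so the literal is quoted here; comparing
-- a string column with a number can keep MySQL from using an index on it.
EXPLAIN
SELECT CAST(SUM(EG.Amount) AS DECIMAL(8,2)) AS Amount
FROM tbl_employees_swc_grosswagedetails AS EG
WHERE EG.empid = 1005
  AND EG.Fedtaxid = '998899889'
  AND EG.PayYear = 2011
  AND EG.Wage_code = 'GRTT'
  AND EG.Taxing_AuthType = 'FED'
  AND EG.PayperiodNumber <= 26;
```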