My current project is a social media app somewhat like Facebook. Posts created by users and news posts (a cron job runs every 15 minutes and fetches the latest news from various news channels) are currently kept in the same table, the post table. Because of the news posts the table is growing very fast and the timeline is taking longer to load. So we are planning to split normal posts (post table) and news posts (news_post table) into separate tables, and then move old news posts to a backup table (news_post_backup table).
Then, in the post listing API, we would have to take the union of all three tables, sort by post creation time, and select posts based on pagination data and other conditions.
I want to know whether there is any benefit in doing this. I am doubtful because once I take the union, the result is effectively the same as the previous single-table structure.
The MySQL version on the server is 5.6.
UPDATE
Here is some more information. The query I am running is:
select CP.id,CP.user_id,post_title,post_content,post_type,new_title,is_spam,spam_reportedby,CP.privacy,CP.link_title,CP.link_content,CP.link_image,CP.is_paid,CP.payment_status,CP.is_breaking,
CUP.id as channel_userspost_id,CUP.parent_id,
SU.full_name as reporteduser_full_name,SU.user_name as reporteduser_user_name,
SU.user_profile_pic as reporteduser_user_profile_pic,
FU.id as from_user_id, FU.full_name as from_user_full_name,
FU.user_name as from_user_name,
FU.user_profile_pic as from_user_profile_pic,
TU.id as to_user_id, TU.full_name as to_user_full_name,
TU.user_name as to_user_name,
TU.user_profile_pic as to_user_profile_pic,
TUA.authentication_status as to_user_authentication_status,
FUA.authentication_status as from_user_authentication_status,
C.verification_status as channel_verification_status,
CUP.created_at,CUP.updated_at,
guid,external_url,
CP.channel_id,CP.rss_channel_id,if(CP.rss_channel_id!=0,RC.rss_name,C.channel_name) as channel_name,
if(CP.rss_channel_id!=0,RC.rss_logo,C.profile_pic) as channel_logo,
C.channel_type,
PCD.like_count as like_count,
PCD.search_count as search_count,
PCD.view_count as view_count,
CM.channel_member_status,C.payment_status as channel_payment_status,C.payment_method as channel_payment_method,
CP.is_live_finished from `channel_users_posts` as `CUP` inner join `channel_posts` as `CP` on `CUP`.`channel_post_id` = `CP`.`id` and `is_spam` = 'N'
left join `channels` as `C` on `CP`.`channel_id` = `C`.`id`
left join `rss_channels` as `RC` on `CP`.`rss_channel_id` = `RC`.`id` left join `channel_members` as `CM` on `CM`.`channel_id` = `C`.`id` and `CM`.`user_id` = 427 and `CM`.`channel_member_status` != -1
left join `test_develop_new`.`users` as `FU` on `FU`.`id` = `CUP`.`shared_from` left join `test_develop_new`.`users` as `SU` on `SU`.`id` = `CP`.`spam_reportedby`
left join `test_develop_new`.`users` as `TU` on `TU`.`id` = `CUP`.`user_id` left join `common_auth_develop_new`.`user_authentication` as `FUA` on `FUA`.`user_id` = `FU`.`id`
left join `common_auth_develop_new`.`user_authentication` as `TUA` on `TUA`.`user_id` = `TU`.`id` left join `post_count_details` as `PCD` on `PCD`.`channel_userspost_id` = `CUP`.`id`
where (`CP`.`is_paid` = 'N' or (`CP`.`is_paid` = 'Y' and `CP`.`payment_status` = 'S')) and (`CP`.`channel_id` in (705, 537) or (`CUP`.`user_id` in (8, 12, 427))) and `CUP`.`updated_at` < '2019-04-12 11:09:59.000000' and ((`CP`.`channel_id` != 0 and `CM`.`channel_member_status` is not null) or `CP`.`channel_id` = 0) and ((`CP`.`post_type` != 'BV' or `CP`.`user_id` = 427) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility='PA'))) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type != 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_A','CRY_A')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1 AND channel_member_role = '1'
) AND channels.channel_type = 46
)) or (CP.post_type ='BV' AND EXISTS(SELECT id FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S')) AND EXISTS(
SELECT DISTINCT channel_members.channel_id
FROM channel_members
INNER JOIN channels ON channels.id=channel_members.channel_id
WHERE channel_members.channel_id IN (
705,537
) AND channel_members.channel_id IN (
select channel_id from channel_members where user_id = CP.user_id AND channel_member_status = 1
) AND channel_members.channel_id IN (SELECT visibility_ids FROM broadcast_visibility_ids WHERE post_id=CP.id AND post_visibility IN ('CNL_S','CRY_S'))
)) order by `CUP`.`updated_at` desc limit 30
The core post table is named channel_posts. Here is the schema for the table:
CREATE TABLE `channel_posts` (
`id` bigint(20) UNSIGNED NOT NULL,
`user_id` bigint(20) NOT NULL,
`channel_id` bigint(20) NOT NULL,
`rss_channel_id` int(11) NOT NULL,
`post_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
`post_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`post_type` enum('T','L','I','V','Y','G','A','MI','MV','MY','MG','MA','NS_T','NS_I','C_T','BV') COLLATE utf8mb4_unicode_ci DEFAULT 'T',
`is_spam` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
`spam_reportedby` bigint(20) NOT NULL,
`privacy` int(11) NOT NULL DEFAULT '2',
`guid` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`external_url` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`link_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
`link_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`is_breaking` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
`is_paid` enum('N','Y') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
`payment_status` enum('F','S') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'F',
`link_image` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`is_live_finished` tinyint(1) NOT NULL DEFAULT '0',
`created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
`updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
There is one more table, channel_users_posts:
CREATE TABLE `channel_users_posts` (
`id` bigint(20) UNSIGNED NOT NULL,
`channel_post_id` bigint(20) NOT NULL,
`parent_id` int(11) NOT NULL DEFAULT '0',
`user_id` bigint(20) NOT NULL,
`shared_from` bigint(20) NOT NULL,
`new_title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`created_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000',
`updated_at` timestamp(6) NOT NULL DEFAULT '0000-00-00 00:00:00.000000'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
There are 200,000 records in the channel_posts table and 600,000 records in the channel_users_posts table, and the query takes 48586 ms to load.
Another option is to partition the post table by post type and date. It is still one table, so no code change is needed on the client side, and MySQL can do partition elimination (pruning) for queries that filter on the partitioning columns.
Have you considered paging your queries rather than splitting the table? Assuming the table has a clustered index on the time column, you could do something like
SELECT id, time, content
FROM post
ORDER BY time DESC
LIMIT 50 OFFSET 5000
to get the 5001st through 5050th newest posts. The ORDER BY matters: without it, the order of the returned rows is not guaranteed.
In terms of insertion time, you'd probably have a B-tree index on the time, so it would be logarithmic.
Additionally, it seems like "content" might be fairly big relative to the rest of the data, so you could either make sure the index on time is "alternative 2" (index entries hold only the key and a pointer to the row, not the row itself), or split the content off into its own table and run a separate query when you actually want it.
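To make the paging idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the post table, its columns, and the helper names page_by_offset and page_by_keyset are hypothetical. It also shows a keyset ("seek") variant, which is roughly what the `updated_at < ...` condition in the question's query already does. The keyset form assumes the time values are unique; otherwise a tie-breaker column such as id would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, time INTEGER NOT NULL, content TEXT)"
)
conn.execute("CREATE INDEX ix_post_time ON post (time)")
conn.executemany(
    "INSERT INTO post (id, time, content) VALUES (?, ?, ?)",
    [(i, 1000 + i, f"post {i}") for i in range(1, 201)],
)


def page_by_offset(page_size, offset):
    # ORDER BY is required; without it the row order is undefined.
    return conn.execute(
        "SELECT id, time FROM post ORDER BY time DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def page_by_keyset(page_size, before_time):
    # Seek past the last row already shown instead of counting skipped rows.
    return conn.execute(
        "SELECT id, time FROM post WHERE time < ? ORDER BY time DESC LIMIT ?",
        (before_time, page_size),
    ).fetchall()


first = page_by_offset(50, 0)
second_offset = page_by_offset(50, 50)
second_keyset = page_by_keyset(50, first[-1][1])
```

Both calls return the same second page here. The difference is that OFFSET makes the server walk past all skipped rows, while the keyset form can start directly at the right place in the index.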
EDIT
That's a very big query and I can tell you almost immediately that the reason why it's so slow has less to do with the size of the table and more to do with the amount of data you're processing (10 JOINs with 11 nested SELECTs which have their own JOINs).
Do you have to return all of this at once? Or can you get the very basic information you need and then make some calculations in your application, and then make another query? This way, the disk and memory don't have to do as much work, and you're moving that onto the CPU.
If this query is necessary, please see this SO post for how to optimize queries with 10+ JOINs. However, note that in the end, the OP there ended up splitting the query because it still took too long.
The takeaway here is to write smaller queries which usually don't waste as much time/resources.
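As a rough illustration of "get the basic information first, then make another query", here is a sketch in Python with sqlite3 standing in for MySQL. The posts and users tables and the timeline_page helper are hypothetical and far simpler than the real schema; the point is only the two-step shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, updated_at INTEGER, body TEXT);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO posts VALUES (1, 1, 10, 'a'), (2, 2, 20, 'b'), (3, 1, 30, 'c'), (4, 2, 40, 'd');
""")


def timeline_page(limit):
    # Step 1: a narrow query that only picks the ids for this page.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM posts ORDER BY updated_at DESC LIMIT ?", (limit,)
    )]
    if not ids:
        return []
    # Step 2: fetch the wide columns and the joins for just those ids.
    marks = ",".join("?" * len(ids))
    return conn.execute(
        "SELECT p.id, p.body, u.full_name FROM posts p "
        "LEFT JOIN users u ON u.id = p.user_id "
        f"WHERE p.id IN ({marks}) ORDER BY p.updated_at DESC",
        ids,
    ).fetchall()


page = timeline_page(2)
```

The second query only ever touches 30 or so rows, so the expensive joins run against a tiny set instead of the whole table.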
I am trying to use the following MySQL query:
SELECT *
FROM top_lines t
LEFT OUTER JOIN last_24_topline AS l ON l.`member_no` = t.`member_no`
AND l.`mfg` = t.`line_no`
WHERE l.account_no = 32049 OR l.account_no IS NULL
However, this returns no rows, as there are no account_no rows in last_24_topline that match. From all that I understand and have read, this query should still return all the rows from top_lines even though no rows match in last_24_topline, since I am checking for a value or NULL, but it does not. Are there any options or settings in MySQL (5.7.2) that would cause this behavior?
Just for information, this query works as expected:
SELECT *
FROM top_lines t
LEFT OUTER JOIN last_24_topline l ON l.`member_no` = t.`member_no`
AND l.`mfg` = t.`line_no`
AND l.`account_no` = 32049
I'm unable to use this construct, however, since I am using Entity Framework and you can only pass columns, not values, to the joins.
CREATE TABLE `last_24_topline` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`member_no` varchar(30) NOT NULL,
`branch_no` int(11) DEFAULT NULL,
`employee_no` varchar(25) DEFAULT NULL,
`account_no` varchar(25) DEFAULT NULL,
`salesperson_name` varchar(255) DEFAULT NULL,
`customer_name` varchar(255) DEFAULT NULL,
`mfg` varchar(5) DEFAULT NULL,
`mfg_description` varchar(255) DEFAULT NULL,
`last_three` decimal(10,2) DEFAULT '0.00',
`last_twelve` decimal(10,2) DEFAULT '0.00',
`ly_last_three` decimal(10,2) DEFAULT '0.00',
`ly_last_twelve` decimal(10,2) DEFAULT '0.00',
PRIMARY KEY (`id`),
KEY `ix_branch_no` (`branch_no`),
KEY `ix_employee_no` (`employee_no`),
KEY `ix_member_line_account` (`member_no`,`mfg`,`account_no`),
KEY `ix_member_line` (`member_no`,`mfg`),
KEY `ix_account_no` (`account_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `top_lines` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`member_no` varchar(30) NOT NULL,
`line_no` varchar(5) NOT NULL,
`line_description` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_line_no` (`member_no`,`line_no`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=latin1
insert into `top_lines`(`id`,`member_no`,`line_no`,`line_description`) values (1,'520','772','FED ROTOR/DRUM');
insert into `top_lines`(`id`,`member_no`,`line_no`,`line_description`) values (2,'520','952','FED SST CERAMIC');
insert into `top_lines`(`id`,`member_no`,`line_no`,`line_description`) values (3,'520','954','FED SST FRICTION');
insert into `top_lines`(`id`,`member_no`,`line_no`,`line_description`) values (4,'520','162','EVS FRICTION');
INSERT INTO `last_24_topline` (`id`, `member_no`, `branch_no`, `employee_no`, `account_no`, `salesperson_name`, `customer_name`, `mfg`, `mfg_description`, `last_three`, `last_twelve`, `ly_last_three`, `ly_last_twelve`) VALUES('1','520','0','10856','463854','FORD, JAMES,','JIFFY LUBE','459','FEDERATED AIR FILTER','0.00','15.21','0.00','0.00');
INSERT INTO `last_24_topline` (`id`, `member_no`, `branch_no`, `employee_no`, `account_no`, `salesperson_name`, `customer_name`, `mfg`, `mfg_description`, `last_three`, `last_twelve`, `ly_last_three`, `ly_last_twelve`) VALUES('2','520','0','10856','463854','FORD, JAMES,','JIFFY LUBE','460','FILTERS','0.00','0.00','0.00','16.48');
INSERT INTO `last_24_topline` (`id`, `member_no`, `branch_no`, `employee_no`, `account_no`, `salesperson_name`, `customer_name`, `mfg`, `mfg_description`, `last_three`, `last_twelve`, `ly_last_three`, `ly_last_twelve`) VALUES('3','520','0','10856','463854','FORD, JAMES,','JIFFY LUBE','863','SMP T SERIES','0.00','0.00','0.00','50.67');
I would expect, even with no data in last_24_topline that matches, for the first query to produce a result set containing all the rows in top_lines with null values for the columns from last_24_topline.
Expected results:
So, creating the same schema into another database and inserting only the example data I provided above, I get the results I expect. I am testing further with copying the full rows to the second database to see if it still gives the expected results.
update
Copying all data into the new tables causes the problem to resurface. I'm trying to pare down to the minimum necessary to replicate the issue.
Try detecting for empty string too, maybe the fields are not NULL, but empty strings.
SELECT *
FROM
top_lines t
LEFT JOIN
last_24_topline AS l ON l.member_no = t.member_no AND l.mfg = t.line_no
WHERE
(l.account_no = '' OR l.account_no = '32049' OR l.account_no IS NULL)
If you want more help, I will need sample data for the last_24_topline table and the expected output after the join.
As a second try, you can use this one:
SELECT *
FROM
top_lines t
LEFT JOIN
last_24_topline AS l ON l.member_no = t.member_no AND l.mfg = t.line_no
WHERE
l.id IS NULL
OR
(l.id IS NOT NULL AND l.account_no = '32049')
Use a column involved in the join instead of l.account_no. This way, rows from top_lines will be returned if there is no matching row in the right-hand table (last_24_topline).
SELECT *
FROM top_lines t
LEFT OUTER JOIN last_24_topline AS l ON l.`member_no` = t.`member_no`
AND l.`mfg` = t.`line_no`
WHERE l.account_no = 32049 OR l.`member_no` IS NULL
Alternatively, place the account number filter directly in the join:
SELECT *
FROM top_lines t
LEFT OUTER JOIN last_24_topline AS l ON l.`member_no` = t.`member_no`
AND l.`mfg` = t.`line_no`
AND l.account_no = 32049
In the query without the account number in the join, the join does match a row in the last_24_topline table, but that row does not match the account number in the WHERE clause, so it is filtered out. It is not treated as a row from top_lines without a matching row from last_24_topline.
So, for example, this row from the top_lines table
will match this row from last_24_topline
but both will then be filtered out, because the account_no didn't match what was in the WHERE clause: WHERE l.account_no = 32049 OR l.account_no IS NULL
The query with the check for account_no within the join will still return the row from top_lines, but will not have a matching row from last_24_topline, so you will get a row with top_lines data and NULLs for the last_24_topline columns.
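The difference between filtering in WHERE and filtering in ON can be reproduced with a small sketch. This uses Python's sqlite3 as a stand-in for MySQL and trimmed-down versions of the two tables. The single last_24_topline row is hypothetical: it matches line 772 on the join keys but belongs to a different account, which is the situation described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE top_lines (member_no TEXT, line_no TEXT);
CREATE TABLE last_24_topline (id INTEGER PRIMARY KEY, member_no TEXT, mfg TEXT, account_no TEXT);
INSERT INTO top_lines VALUES ('520', '772'), ('520', '952');
-- Hypothetical row: matches line 772 on the join keys, but for another account.
INSERT INTO last_24_topline VALUES (1, '520', '772', '463854');
""")

join = (
    "FROM top_lines t LEFT JOIN last_24_topline l "
    "ON l.member_no = t.member_no AND l.mfg = t.line_no "
)

# Filter applied after the join: the matched row for 772 is discarded entirely.
filter_in_where = conn.execute(
    "SELECT t.line_no, l.account_no " + join +
    "WHERE l.account_no = '32049' OR l.account_no IS NULL ORDER BY t.line_no"
).fetchall()

# Filter applied as part of the join: 772 simply finds no match and is NULL-padded.
filter_in_on = conn.execute(
    "SELECT t.line_no, l.account_no " + join +
    "AND l.account_no = '32049' ORDER BY t.line_no"
).fetchall()
```

With the filter in WHERE, line 772 disappears because its joined row has a non-NULL account_no that is not 32049. With the filter in ON, both lines come back with NULL for account_no.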
I have this query for example (good, it works how I want it to)
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 283
6 230
9 198
Now I want to join the discusComments table with the discusTopics table on memberid, because what I really want is to get my results only from a specific group, and the group ID is only in the topic table, not in the comment one; hence the join.
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
LEFT JOIN `discusTopics` ON `discusComments`.`memberID` = `discusTopics`.`memberID`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Example Results:
memberid postcount
3 14789
6 8678
9 6987
How can I stop this huge increase happening in the postcount? I need to preserve it as before.
Once I have this sorted I want to have some kind of line which says WHERE discusTopics.groupID = 6, for example
CREATE TABLE IF NOT EXISTS `discusComments` (
`id` bigint(255) NOT NULL auto_increment,
`topicID` bigint(255) NOT NULL,
`comment` text NOT NULL,
`timeStamp` bigint(12) NOT NULL,
`memberID` bigint(255) NOT NULL,
`thumbsUp` int(15) NOT NULL default '0',
`thumbsDown` int(15) NOT NULL default '0',
`status` int(1) NOT NULL default '1',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7190 ;
.
CREATE TABLE IF NOT EXISTS `discusTopics` (
`id` bigint(255) NOT NULL auto_increment,
`groupID` bigint(255) NOT NULL,
`memberID` bigint(255) NOT NULL,
`name` varchar(255) NOT NULL,
`views` bigint(255) NOT NULL default '0',
`lastUpdated` bigint(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `groupID` (`groupID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=913 ;
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments`
JOIN `discusTopics` ON `discusComments`.`topicID` = `discusTopics`.`id`
GROUP BY `discusComments`.`memberID` ORDER BY postcount DESC
Joining on the topic ID (discusComments.topicID = discusTopics.id) instead of memberID solved the inflated count. Thanks @Andiry M
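For anyone wondering why the count blew up, here is a small sketch using Python's sqlite3 as a stand-in for MySQL, with cut-down versions of the two tables and made-up rows. Joining on memberID pairs every comment by a member with every topic started by that member, so the count is multiplied; joining on the topic ID pairs each comment with exactly one topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discusTopics (id INTEGER PRIMARY KEY, groupID INTEGER, memberID INTEGER);
CREATE TABLE discusComments (id INTEGER PRIMARY KEY, topicID INTEGER, memberID INTEGER);
-- Member 3 started two topics and wrote three comments.
INSERT INTO discusTopics VALUES (1, 6, 3), (2, 7, 3);
INSERT INTO discusComments VALUES (1, 1, 3), (2, 1, 3), (3, 2, 3);
""")

count_sql = (
    "SELECT c.memberID, COUNT(c.memberID) FROM discusComments c "
    "JOIN discusTopics t ON {on} {where} GROUP BY c.memberID"
)

# 3 comments x 2 topics by the same member = 6 joined rows.
by_member = conn.execute(
    count_sql.format(on="c.memberID = t.memberID", where="")
).fetchall()

# Each comment belongs to exactly one topic, so the count stays at 3.
by_topic = conn.execute(
    count_sql.format(on="c.topicID = t.id", where="")
).fetchall()

# The group filter can now be added safely.
in_group_6 = conn.execute(
    count_sql.format(on="c.topicID = t.id", where="WHERE t.groupID = 6")
).fetchall()
```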
You need to use just JOIN, not LEFT JOIN, and you can add AND discusTopics.memberID = 6 after ON discusComments.memberID = discusTopics.memberID
You can use a subquery like this:
SELECT `discusComments`.`memberID`, COUNT( `discusComments`.`memberID`) AS postcount
FROM `discusComments` WHERE `discusComments`.`memberID` IN
(SELECT DISTINCT memberID FROM `discusTopics` WHERE groupID = 6)
GROUP BY `discusComments`.`memberID`
If I understand your question right, you do not need to use JOIN here at all. JOINs are needed when you have many-to-many relationships and, for each value in one table, you need to select all corresponding values in another table.
But here you have a many-to-one relationship, if I got it right. Then you can simply select from two tables like this:
SELECT a.*, b.id FROM a, b WHERE a.pid = b.id
This is a simple request and won't create a giant overhead as JOIN does.
PS: In the future try to experiment with your queries, try to avoid JOINs especially in MySQL. They are slow and dangerous in their complexity. For 90% of cases when you want to use JOIN there is simple and much faster solution.
I am basically having the exact same problem as here:
SQL View: Join tables without causing the data to duplicate on every row?
Except in that question he was using SQL, and I am using MySQL. I am wondering if the same query is possible in MySQL. If so, do I maybe have the wrong syntax?
I am trying to do something like
select a.name as account_Name,
p.description as property_DESCRIPTION,
p.address as property_ADDRESS,
null as vehicles_DESCRIPTION,
null as vehicles_MAKE,
null as vehicles_MODEL
from Accounts a
inner join Properties p
on a.accountid = p.accountid
UNION ALL
select a.name as account_Name,
null as property_DESCRIPTION,
null as property_ADDRESS,
v.description as vehicles_DESCRIPTION,
v.make as vehicles_MAKE,
v.model as vehicles_MODEL
from Accounts a
inner join vehicles v
on a.accountid = v.accountid
Here is my actual code:
SELECT user.first_name, user.last_name, upi.image_id, NULL AS friends.friend_user_id FROM user
INNER JOIN user_profile_images as upi ON (user.user_id = upi.user_id)
UNION
SELECT user.first_name, user.last_name, NULL AS upi.image_id, friends.friend_user_id FROM user
INNER JOIN friends ON (user.user_id = friends.user_id)
WHERE user.user_id = '$profile_id'
where I have 3 tables: user, user_profile_images, and friends. Both user_profile_images and friends are related to the user through user_id, so a user can have multiple profile images as well as multiple friend entries. I can post the table diagrams if this doesn't make sense. What I want is basically a view of all the info, with fields set to NULL if they don't apply to the overall view.
If I do the query with 2 tables, either with user and user_profile_images, or user and friends, I get the desired results, but adding the third table gives me duplicate rows.
The solution, as @MarcB suggests, is to use UNION rather than UNION ALL.
However, I have a question for you - why use the UNION at all? The following is equivalent, except that if (say) account 1 has one property and one vehicle, instead of getting:
account_Name property_DESCRIPTION vehicles_MAKE
account1 property1 NULL
account1 NULL vehicle1
You'll get
account_Name property_DESCRIPTION vehicles_MAKE
account1 property1 vehicle1
Query:
SELECT a.name as account_Name,
p.description as property_DESCRIPTION,
p.address as property_ADDRESS,
v.description as vehicles_DESCRIPTION,
v.make as vehicles_MAKE,
v.model as vehicles_MODEL
FROM Accounts a
LEFT JOIN Properties p
on a.accountid = p.accountid
LEFT JOIN vehicles v
on a.accountid = v.accountid
WHERE p.description IS NOT NULL OR v.make IS NOT NULL
Note - the last line (IS NOT NULL for either p or v) simulates the 'accounts table' part of the INNER JOIN and makes sure that only accounts with at least a property OR a vehicle are shown. Substitute the id columns of p and v there.
If I were going after data like that where two tables are related by an id on the third I would consider using outer joins. In the comments of the question you referenced, outer joins were mentioned as a possible solution. The code for that would look like this.
select a.name as account_Name,
p.description as property_DESCRIPTION,
p.address as property_ADDRESS,
v.description as vehicles_DESCRIPTION,
v.make as vehicles_MAKE,
v.model as vehicles_MODEL
from accounts a
left outer join properties p on p.accountid = a.accountid
left outer join vehicles v on v.accountid = a.accountid;
Here is how the solution was tested. First I created the three tables.
CREATE TABLE `accounts` (
`accountid` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`accountid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `properties` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`accountid` int(11) NOT NULL,
`description` varchar(255) COLLATE utf8_bin NOT NULL,
`address` varchar(255) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `vehicles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`accountid` int(11) NOT NULL,
`description` varchar(255) COLLATE utf8_bin NOT NULL,
`make` varchar(100) COLLATE utf8_bin NOT NULL,
`model` varchar(100) COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Next I inserted data into each table.
INSERT INTO `demo`.`accounts` (
`accountid` ,
`name`
)
VALUES (
NULL , 'techport80.com'
);
INSERT INTO `demo`.`properties` (
`id` ,
`accountid` ,
`description` ,
`address`
)
VALUES (
NULL , '1', 'office', '123 may street');
INSERT INTO `demo`.`vehicles` (
`id` ,
`accountid` ,
`description` ,
`make` ,
`model`
)
VALUES (
NULL , '1', 'motorcycle', 'honda', 'shadow'
);
At this point if I test the solution, I will receive one row with account, property and vehicle information.
INSERT INTO `demo`.`vehicles` (
`id` ,
`accountid` ,
`description` ,
`make` ,
`model`
)
VALUES (
NULL , '1', 'passenger car', 'Ford', 'Mustang'
);
Now if I test my solution, I see 2 rows, one for each vehicle. The two rows are not exact duplicates, though some columns contain the same data, most notably the account information.
INSERT INTO `demo`.`properties` (
`id` ,
`accountid` ,
`description` ,
`address`
)
VALUES (
NULL , '1', 'home', '321 yam street'
);
With that insert I added a second address. Now four rows are returned, one for each combination of vehicle and address (2 x 2). Still, none of the rows are duplicates.
Finally I added another vehicle. This was to create an imbalance between vehicles and properties.
INSERT INTO `demo`.`vehicles` (
`id` ,
`accountid` ,
`description` ,
`make` ,
`model`
)
VALUES (
NULL , '1', 'Van', 'Chev', 'Cargo'
);
Now we have 6 rows: each of the 3 vehicles is paired with each of the 2 properties (2 x 3).
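The row counts from this walk-through can be checked with a short sketch. It uses Python's sqlite3 as a stand-in for MySQL and simplified versions of the three tables (fewer columns than the real ones). For comparison it also runs the UNION ALL shape from the question, which stacks properties and vehicles instead of multiplying them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (accountid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE properties (id INTEGER PRIMARY KEY, accountid INTEGER, description TEXT);
CREATE TABLE vehicles (id INTEGER PRIMARY KEY, accountid INTEGER, description TEXT);
INSERT INTO accounts VALUES (1, 'techport80.com');
INSERT INTO properties (accountid, description) VALUES (1, 'office'), (1, 'home');
INSERT INTO vehicles (accountid, description)
VALUES (1, 'motorcycle'), (1, 'passenger car'), (1, 'Van');
""")

# Two outer joins off the same account: every property meets every vehicle.
joined = conn.execute("""
    SELECT a.name, p.description, v.description
    FROM accounts a
    LEFT JOIN properties p ON p.accountid = a.accountid
    LEFT JOIN vehicles v ON v.accountid = a.accountid
""").fetchall()

# UNION ALL keeps the two child tables in separate rows instead.
stacked = conn.execute("""
    SELECT a.name, p.description, NULL FROM accounts a
    JOIN properties p ON p.accountid = a.accountid
    UNION ALL
    SELECT a.name, NULL, v.description FROM accounts a
    JOIN vehicles v ON v.accountid = a.accountid
""").fetchall()
```

The join yields 2 x 3 = 6 distinct rows, while the UNION ALL yields 2 + 3 = 5, one per property and one per vehicle.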
Having gone through this exercise, I wonder if a hierarchical view would be a better model for this data. Perhaps XML or JSON could be used to represent it. For this task you could use a stored function, but I personally would first consider a programming language like PHP, C#, C++, or one of many others.
HTH
Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
Without the tables/columns, this is my best guess from reverse engineering the given queries:
UPDATE vehicle_make m
INNER JOIN (
SELECT v.make_id, COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
GROUP BY v.make_id
) c ON c.make_id=m.id
SET m.product_count=c.CountOf
The given code loops over each make and then runs a counting query for each one. My answer does them all in one query and should be a lot faster.
Have an index for each of these:
vehicle_make.id (covering name)
vehicle_catalog.id (covering make_id)
taxonomy_master.v_id
EDIT
give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE vehicle_make,CountsOf
SET vehicle_make.product_count=CountsOf.CountOf
WHERE vehicle_make.id=CountsOf.id;
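Here is a small runnable sketch of the "one statement instead of a loop" idea, using Python's sqlite3 as a stand-in for MySQL. SQLite has no multi-table UPDATE, so this swaps in a correlated subquery rather than the temporary table; the table layouts are cut down and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle_make (
    id INTEGER PRIMARY KEY, name TEXT, product_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE vehicle_catalog (id INTEGER PRIMARY KEY, make_id INTEGER);
CREATE TABLE taxonomy_master (
    product_id INTEGER, v_id INTEGER, PRIMARY KEY (product_id, v_id));
INSERT INTO vehicle_make (id, name) VALUES (1, 'Ford'), (2, 'Honda'), (3, 'Saab');
INSERT INTO vehicle_catalog VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO taxonomy_master VALUES (100, 10), (101, 10), (100, 11), (200, 20);
""")

# One statement updates every make; there is no per-make loop in application code.
conn.execute("""
    UPDATE vehicle_make
    SET product_count = (
        SELECT COUNT(t.product_id)
        FROM taxonomy_master t
        JOIN vehicle_catalog v ON t.v_id = v.id
        WHERE v.make_id = vehicle_make.id
    )
""")

counts = conn.execute(
    "SELECT name, product_count FROM vehicle_make ORDER BY id"
).fetchall()
```

One difference from the join-based versions: a make with no products (Saab here) is set to 0 by the correlated subquery, whereas an inner join would leave its row untouched.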
Instead of using a nested query, you can separate this query into 2 or 3 queries, and in PHP insert the result of the inner query into the outer query. It's faster!
@haim-evgi Separating the queries will not increase the speed significantly; it will just shift the load from the DB server to the web server and add the overhead of moving data between the two servers.
I doubt that, with the appropriate indexes, such a query would run for 7 minutes. Could you please show the structure of the tables involved in these queries?
Seems like you need the following indices:
ALTER TABLE vehicle_catalog ADD INDEX (make_id);
ALTER TABLE taxonomy_master ADD INDEX (v_id);