MySQL: Update rows in table by iterating and joining with another one


I have a table papers
CREATE TABLE `papers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(1000) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`my_count` int(11) NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `title_fulltext` (`title`)
) ENGINE=MyISAM AUTO_INCREMENT=1617432 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and another table, auth2paper2loc (called link_table below)
CREATE TABLE `auth2paper2loc` (
`auth_id` int(11) NOT NULL,
`paper_id` int(11) NOT NULL,
`loc_id` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
papers.id in the first table corresponds to link_table.paper_id in the second table. I want to go through every row in the first table, count how many times its id appears in the second table, and store that count in the my_count column of the first table.
Example: if the paper with id = 1 appears 5 times as paper_id in link_table, then my_count = 5.
I can do that with a Python script, but it results in too many queries, and with millions of entries it is really slow. I can't figure out the right syntax to do this inside MySQL.
This is what I run in a for loop in Python (too slow):
SELECT count(link_table.auth_id) FROM link_table
WHERE link_table.paper_id = %s
UPDATE papers SET my_count = %s WHERE id = %s
Could someone please tell me how to write this? There must be a way to nest this and run it directly in MySQL so it is faster, isn't there?

How does this perform for you?
update papers a
set my_count = (select count(*)
from auth2paper2loc b
where b.paper_id = a.id);

Use either:
UPDATE papers
SET my_count = (SELECT COUNT(b.paper_id)
FROM auth2paper2loc b
WHERE b.paper_id = papers.id)
...or:
UPDATE papers
LEFT JOIN (SELECT b.paper_id,
COUNT(b.paper_id) AS numCount
FROM auth2paper2loc b
GROUP BY b.paper_id) x ON x.paper_id = papers.id
SET my_count = COALESCE(x.numCount, 0)
The COALESCE is necessary to convert the NULL to a zero when there aren't any instances of papers.id in the auth2paper2loc table.
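To see why the COALESCE matters, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for MySQL, and the sample rows are made up for illustration. It runs only the LEFT JOIN part as a SELECT, so the raw count and the COALESCE'd count can be compared side by side.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for MySQL; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (id INTEGER PRIMARY KEY, my_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE auth2paper2loc (auth_id INTEGER NOT NULL, paper_id INTEGER NOT NULL);
    INSERT INTO papers (id) VALUES (1), (2);
    INSERT INTO auth2paper2loc (auth_id, paper_id) VALUES (10, 1), (11, 1), (12, 1);
""")

# Paper 2 has no rows in auth2paper2loc, so the LEFT JOIN yields NULL for numCount.
rows = conn.execute("""
    SELECT p.id,
           x.numCount AS raw_count,
           COALESCE(x.numCount, 0) AS safe_count
    FROM papers p
    LEFT JOIN (SELECT paper_id, COUNT(paper_id) AS numCount
               FROM auth2paper2loc
               GROUP BY paper_id) x ON x.paper_id = p.id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Since my_count is declared NOT NULL, the unmatched paper is exactly the row where the raw count would cause trouble in the UPDATE.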

update papers left join
(select paper_id, count(*) total from auth2paper2loc group by paper_id) X
on papers.id = X.paper_id
set papers.my_count = IFNULL(X.total, 0)

Related

Speed up the following queries by merging them into one

I need to find a way for merging two queries into one.
This is the structure of the tables I am using:
(There are also other fields in contents and ratings, but I left them out since they aren't needed for this.)
-- Create syntax for TABLE 'contents'
CREATE TABLE `contents` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`rating` decimal(5,4) DEFAULT '0.0000',
`ratingsCount` int(8) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8;
-- Create syntax for TABLE 'ratings'
CREATE TABLE `ratings` (
`what` int(11) DEFAULT NULL,
`time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`rating` decimal(3,2) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The last time I asked something here on Stack Overflow, someone told me to post the code I'm currently using. Here it is:
db.query("SELECT AVG(rating) `avg`, COUNT(rating) cnt FROM `ratings` WHERE what = ?", [req.params.id], function(err, avg) {
db.query("UPDATE contents SET `rating` = ?, `ratingsCount` = ? WHERE id = ?", [avg[0].avg, avg[0].cnt, req.params.id], function() { });
});

You could use an UPDATE/JOIN combination to do it in a single round trip to the database:
UPDATE contents c
JOIN (
SELECT what, AVG(rating) rating, COUNT(rating) ratingsCount
FROM ratings WHERE what = ? GROUP BY what
) r
ON c.id = r.what
SET c.rating = r.rating, c.ratingsCount = r.ratingsCount
The subquery computes the average and count for the given value of "what"; the outer query just joins that result to contents to perform the update.

This one will do the work:
UPDATE contents c SET
rating=(SELECT AVG(rating) FROM ratings r WHERE r.what=c.id),
ratingsCount=(SELECT COUNT(rating) FROM ratings r WHERE r.what=c.id);
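For this to perform well, ratings.what needs an index (contents.id is already the primary key). Note that without a WHERE clause this recalculates every row in contents; add WHERE c.id = ? to update a single one.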

Mysql query optimisation to transform subquery in join without DISTINCT

I have tables:
CREATE TABLE IF NOT EXISTS `bk_cart_rule` (
`id_cart_rule` int(10) unsigned NOT NULL DEFAULT '0',
`cart_rule_restriction` tinyint(1) unsigned NOT NULL DEFAULT '0',
KEY `id_cart_rule` (`id_cart_rule`),
KEY `cart_rule_restriction` (`cart_rule_restriction`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_combination` (
`id_cart_rule_1` int(10) unsigned NOT NULL,
`id_cart_rule_2` int(10) unsigned NOT NULL,
KEY `id_cart_rule_1` (`id_cart_rule_1`),
KEY `id_cart_rule_2` (`id_cart_rule_2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `bk_cart_rule_lang` (
`id_cart_rule` int(10) unsigned NOT NULL,
`id_lang` int(10) unsigned NOT NULL,
KEY `id_cart_rule` (`id_cart_rule`),
KEY `id_lang` (`id_lang`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And a query:
SELECT SQL_NO_CACHE cr.*, crl.*, 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
WHERE cr.id_cart_rule != 375 AND
( cr.cart_rule_restriction = 0 OR
cr.id_cart_rule IN (
SELECT IF(id_cart_rule_1 = 375, id_cart_rule_2, id_cart_rule_1) FROM bk_cart_rule_combination WHERE 375 = id_cart_rule_1 OR 375 = id_cart_rule_2 ) )
The obvious optimization is:
SELECT SQL_NO_CACHE DISTINCT cr.*, crl.*, 1 as selected FROM bk_cart_rule cr
LEFT JOIN bk_cart_rule_lang crl ON (cr.id_cart_rule = crl.id_cart_rule AND crl.id_lang = 2)
LEFT JOIN bk_cart_rule_combination crc ON (375 = crc.id_cart_rule_1 AND cr.id_cart_rule = crc.id_cart_rule_2) OR (375 = crc.id_cart_rule_2 AND cr.id_cart_rule = crc.id_cart_rule_1)
WHERE cr.id_cart_rule != 375 AND (cr.cart_rule_restriction = 0 OR NOT ISNULL(crc.id_cart_rule_1))
But how can I get rid of DISTINCT? (In bk_cart_rule_combination I have two-way combinations:)
id_cart_rule_1 id_cart_rule_2
375 776
776 375
Or maybe there is a better optimization possible?

If the ordering of the cart rules is not important, then add the constraint that the id for the first one is less than the id of the second one. That is, put them in the table in order.
Sadly, MySQL versions before 8.0.16 parse CHECK constraints but do not enforce them. On those versions you have to implement the rule in some other way. Here are three:
Implement an insert/update trigger to maintain the ordering (and prevent duplicates).
Implement the logic on the application side.
Wrap all data modifications in stored procedures and implement the logic in the stored procedure.
If you don't want to go through all that trouble (which would probably help with other issues), you can replace the select distinct with:
group by least(id_cart_rule_1, id_cart_rule_2), greatest(id_cart_rule_1, id_cart_rule_2)
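As a rough illustration of that grouping, the sketch below uses Python with an in-memory SQLite database, which is an assumption made only so the example is runnable. SQLite's two-argument MIN() and MAX() play the role of MySQL's LEAST() and GREATEST(), and the third sample row is invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Its scalar MIN(a, b) / MAX(a, b)
# correspond to MySQL's LEAST(a, b) / GREATEST(a, b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bk_cart_rule_combination (
        id_cart_rule_1 INTEGER NOT NULL,
        id_cart_rule_2 INTEGER NOT NULL
    );
    INSERT INTO bk_cart_rule_combination VALUES (375, 776), (776, 375), (375, 900);
""")

# Both directions of a pair normalize to the same (smaller, larger) tuple,
# so grouping on that tuple keeps one row per unordered pair.
pairs = conn.execute("""
    SELECT MIN(id_cart_rule_1, id_cart_rule_2) AS rule_lo,
           MAX(id_cart_rule_1, id_cart_rule_2) AS rule_hi
    FROM bk_cart_rule_combination
    GROUP BY MIN(id_cart_rule_1, id_cart_rule_2),
             MAX(id_cart_rule_1, id_cart_rule_2)
    ORDER BY 1, 2
""").fetchall()

print(pairs)
```

The two mirrored rows (375, 776) and (776, 375) collapse into a single pair, which is what DISTINCT was compensating for.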

complicated sql query returns a result with empty tables

I have three empty tables
--
-- Table structure for table `projects`
--
CREATE TABLE IF NOT EXISTS `projects` (
`id_project` int(11) NOT NULL AUTO_INCREMENT,
`id_plan` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
PRIMARY KEY (`id_project`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Table structure for table `project_plans`
--
CREATE TABLE IF NOT EXISTS `project_plans` (
`id_plan` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
`max_projects` int(11) DEFAULT NULL,
`max_member` int(11) DEFAULT NULL,
`max_filestorage` bigint(20) NOT NULL DEFAULT '3221225472' COMMENT '3GB Speicherplatz',
PRIMARY KEY (`id_plan`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Table structure for table `project_users`
--
CREATE TABLE IF NOT EXISTS `project_users` (
`id_user` int(11) NOT NULL,
`id_project` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
All these tables are empty, but my query still returns a result. Why?
My query:
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
".$this->config_vars["projects_plans_table"]." AS A
LEFT JOIN
".$this->config_vars["projects_table"]." AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
".$this->config_vars["projects_user_table"]." AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."'
&& B.deleted = '0'
I think the problem is the COUNT(*) AS ...
How can I solve this?

For one, you are getting a record because of the COUNT(). Even though there are no matching rows, you are asking the engine how many rows there are, and the answer is at worst zero. COUNT(), like other aggregates, is normally used with a GROUP BY; without one, the whole result is treated as a single group, so you still get one row back.
So the engine is basically saying: there are no records, but I have to send you one row so you can read the COUNT() column and do with it what you will. It is doing what you asked.
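The behaviour is easy to reproduce on an empty table. The sketch below assumes Python with an in-memory SQLite database rather than MySQL; it runs the same aggregate once without and once with a GROUP BY.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table is deliberately left empty.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_users (id_user INTEGER NOT NULL, id_project INTEGER NOT NULL)"
)

# No GROUP BY: the whole (empty) input is one group, so one row comes back.
ungrouped = conn.execute(
    "SELECT id_user, COUNT(*) FROM project_users WHERE id_project = 1"
).fetchall()

# With GROUP BY: there are no groups at all, so no rows come back.
grouped = conn.execute(
    "SELECT id_project, COUNT(*) FROM project_users "
    "WHERE id_project = 1 GROUP BY id_project"
).fetchall()

print("without GROUP BY:", ungrouped)
print("with GROUP BY:", grouped)
```

The ungrouped query returns a single row with a NULL column and a count of 0, which matches the all-NULL row the asker is seeing.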
Now, regarding your comment on the other answer, where you asked:
Yes, but I want to count the members of a project. How can I count the users in project_users where all users have id_project 1?
Since you only care about a count, and not about WHO is involved, you can get this result directly from the project_users table (which should have one index on id_user and another on id_project):
select count(*)
from project_users
where id_project = 1
To expand on your original question and get the extra details, I would do:
select
p.id_project,
p.id_plan,
p.name as projectName,
p.description as projectDescription,
pp.name as planName,
pp.description as planDescription,
pp.max_projects,
pp.max_member,
pp.max_filestorage,
PJCnt.ProjectMemberCount
from
( select id_project,
count(*) as ProjectMemberCount
from
project_users
where
id_project = 1
group by
id_project ) PJCnt
JOIN projects p
on PJCnt.id_project = p.id_project
JOIN project_plans pp
on p.id_plan = pp.id_plan
Now, based on this table layout, a plan can have a maximum member count, but nothing indicates whether that maximum applies across all projects under the plan or to each SINGLE project. So, if a plan allows 20 people, can there be 20 people on each of 10 different projects under the same plan? Only you know the answer; it is just something to consider given what you are asking for.

Your cleaned-up query should look like this:
See the SQL Fiddle demo as well: http://sqlfiddle.com/#!2/e693f5/9
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
project_plans AS A
LEFT JOIN
projects AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
project_users AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."';
This returns NULL for all the selected columns because the result set has exactly one legitimate row, the one carrying the COUNT(*) output of 0.
To fix this, either add a GROUP BY at the end (see the GROUP BY example: http://sqlfiddle.com/#!2/14d46/2) or remove the COUNT(*), in which case both the NULL values and the COUNT(*) value of 0 disappear.
See a simple SQL example here: http://sqlfiddle.com/#!2/ab7dd/5
Just comment out the COUNT() and your NULL problem is fixed!

Update statement causes fields to be updated with NULL or maximum value

If you had to pick one of the two following queries, which would you choose and why:
UPDATE `table1` AS e
SET e.points = e.points+(
SELECT points FROM `table2` AS ep WHERE e.cardnbr=ep.cardnbr);
or:
UPDATE `table1` AS e
INNER JOIN
(
SELECT points, cardnbr
FROM `table2`
) AS ep ON (e.cardnbr=ep.cardnbr)
SET e.points = e.points+ep.points;
Tables' definitions:
CREATE TABLE `table1` (
`cardnbr` int(10) DEFAULT NULL,
`name` varchar(50) DEFAULT NULL,
`points` decimal(7,3) DEFAULT '0.000',
`email` varchar(50) NOT NULL DEFAULT 'user#company.com',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25205 DEFAULT CHARSET=latin1$$
CREATE TABLE `table2` (
`cardnbr` int(10) DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`points` decimal(7,3) DEFAULT '0.000',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
UPDATE: BOTH are causing problems. The first updates non-matched rows to NULL.
The second updates them to the maximum value, 9999.999 (decimal(7,3)).
PS: the cardnbr field is NOT a key.

I prefer the second one. The reason is:
When using a JOIN, the database can create a better execution plan for your query and save time, whereas a correlated subquery (like your first one) may be run once per row and load all the data, which can take time.
I think subqueries are easier to read, but performance-wise a JOIN is usually faster.

First, the two statements are not equivalent, as you found out yourself. The first one will update all rows of table1, putting NULL values for those rows that have no related rows in table2.
So the second query looks better because it doesn't update all rows of table1. It could be written more simply, though, like this:
UPDATE table1 AS e
INNER JOIN table2 AS ep
ON e.cardnbr = ep.cardnbr
SET e.points = e.points + ep.points ;
So, the 2nd query would be the best to use, if cardnbr was the primary key of table2. Is it?
If it isn't, then which values from table2 should be used for the update of table1 (added to points)? All of them? You could use this:
UPDATE table1 AS e
INNER JOIN
( SELECT SUM(points) AS points, cardnbr
FROM table2
GROUP BY cardnbr
) AS ep ON e.cardnbr = ep.cardnbr
SET
e.points = e.points + ep.points ;
Just one of them? That would require some other derived table, depending on what you want.
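Both points can be checked on a tiny data set. The following is only a sketch, assuming Python with an in-memory SQLite database in place of MySQL and invented sample rows. SQLite has no UPDATE ... JOIN in this form, so a WHERE ... IN filter is swapped in to restrict the update to matched rows the way the INNER JOIN does.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; REAL replaces decimal(7,3).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, cardnbr INTEGER, points REAL DEFAULT 0);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, cardnbr INTEGER, points REAL DEFAULT 0);
    INSERT INTO table1 (cardnbr, points) VALUES (100, 1.0), (200, 2.0);
    INSERT INTO table2 (cardnbr, points) VALUES (100, 5.0), (100, 2.5);
""")

add_points = """
    UPDATE table1
    SET points = points + (SELECT SUM(ep.points) FROM table2 ep
                           WHERE ep.cardnbr = table1.cardnbr)
"""
read_back = "SELECT cardnbr, points FROM table1 ORDER BY cardnbr"

# Unguarded: card 200 has no rows in table2, so the subquery is NULL
# and points + NULL becomes NULL.
conn.execute(add_points)
unguarded = conn.execute(read_back).fetchall()
conn.rollback()  # undo the update so the second variant starts from the same data

# Guarded: only rows with a match in table2 are touched.
conn.execute(add_points + " WHERE cardnbr IN (SELECT cardnbr FROM table2)")
guarded = conn.execute(read_back).fetchall()

print("unguarded:", unguarded)
print("guarded:  ", guarded)
```

Card 100 receives the sum of both of its table2 rows in either variant; only the unguarded one wipes out card 200.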

MySQL query killing my server

Looking at this query there's got to be something bogging it down that I'm not noticing. I ran it for 7 minutes and it only updated 2 rows.
//set product count for makes
$tru->query->run(array(
'name' => 'get-make-list',
'sql' => 'SELECT id, name FROM vehicle_make',
'connection' => 'core'
));
while($tempMake = $tru->query->getArray('get-make-list')) {
$tru->query->run(array(
'name' => 'update-product-count',
'sql' => 'UPDATE vehicle_make SET product_count = (
SELECT COUNT(product_id) FROM taxonomy_master WHERE v_id IN (
SELECT id FROM vehicle_catalog WHERE make_id = '.$tempMake['id'].'
)
) WHERE id = '.$tempMake['id'],
'connection' => 'core'
));
}
I'm sure this query can be optimized to perform better, but I can't think of how to do it.
vehicle_make = 45 rows
taxonomy_master = 11,223 rows
vehicle_catalog = 5,108 rows
All tables have appropriate indexes
UPDATE: I should note that this is a 1-time script so overhead isn't a big deal as long as it runs.
CREATE TABLE IF NOT EXISTS `vehicle_make` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(32) NOT NULL,
`product_count` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=46 ;
CREATE TABLE IF NOT EXISTS `taxonomy_master` (
`product_id` int(10) NOT NULL,
`v_id` int(10) NOT NULL,
`vehicle_requirement` varchar(255) DEFAULT NULL,
`is_sellable` enum('True','False') DEFAULT 'True',
`programming_override` varchar(25) DEFAULT NULL,
PRIMARY KEY (`product_id`,`v_id`),
KEY `idx2` (`product_id`),
KEY `idx3` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `vehicle_catalog` (
`v_id` int(10) NOT NULL,
`id` int(11) NOT NULL,
`v_make` varchar(255) NOT NULL,
`make_id` int(11) NOT NULL,
`v_model` varchar(255) NOT NULL,
`model_id` int(11) NOT NULL,
`v_year` varchar(255) NOT NULL,
PRIMARY KEY (`v_id`,`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx` (`v_make`,`v_model`,`v_year`),
UNIQUE KEY `idx2` (`v_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Update: The successful query to get what I needed is here....
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;

Without the tables/columns, this is my best guess from reverse engineering the given queries:
UPDATE vehicle_make m
INNER JOIN (
SELECT v.make_id, COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
GROUP BY v.make_id
) c ON c.make_id=m.id
SET m.product_count=c.CountOf
The given code loops over each make and then runs a query that counts the products for each one. My answer does them all in one query and should be a lot faster.
Have an index for each of these:
vehicle_make.id cover on name
vehicle_catalog.id cover make_id
taxonomy_master.v_id
EDIT
Give this a try:
CREATE TEMPORARY TABLE CountsOf (
id int(11) NOT NULL
, CountOf int(11) NOT NULL DEFAULT 0
);
INSERT INTO CountsOf
(id, CountOf )
SELECT
m.id,COUNT(t.product_id) AS CountOf
FROM taxonomy_master t
INNER JOIN vehicle_catalog v ON t.v_id=v.id
INNER JOIN vehicle_make m ON v.make_id=m.id
GROUP BY m.id;
UPDATE vehicle_make,CountsOf
SET vehicle_make.product_count=CountsOf.CountOf
WHERE vehicle_make.id=CountsOf.id;
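The same "count everything once, then write it back" idea can be sketched from the application side. This is a hedged illustration in Python with an in-memory SQLite database instead of MySQL and PHP; the table shapes are trimmed to the columns the queries use, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only the columns used by the queries exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_make (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                               product_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE vehicle_catalog (id INTEGER NOT NULL, make_id INTEGER NOT NULL);
    CREATE TABLE taxonomy_master (product_id INTEGER NOT NULL, v_id INTEGER NOT NULL);
    INSERT INTO vehicle_make (id, name) VALUES (1, 'Make A'), (2, 'Make B');
    INSERT INTO vehicle_catalog (id, make_id) VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO taxonomy_master (product_id, v_id)
        VALUES (500, 10), (501, 10), (502, 11), (600, 20);
""")

# One grouped query produces the count for every make at once.
counts = conn.execute("""
    SELECT m.id, COUNT(t.product_id) AS CountOf
    FROM taxonomy_master t
    INNER JOIN vehicle_catalog v ON t.v_id = v.id
    INNER JOIN vehicle_make m ON v.make_id = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

# Write the counts back with simple keyed updates (no nested subqueries).
conn.executemany(
    "UPDATE vehicle_make SET product_count = ? WHERE id = ?",
    [(count, make_id) for make_id, count in counts],
)
conn.commit()

result = conn.execute("SELECT id, product_count FROM vehicle_make ORDER BY id").fetchall()
print(result)
```

With only 45 makes, the write-back is a handful of primary-key updates, so the grouped SELECT is the only query that touches the larger tables.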

Instead of using a nested query,
you can separate this query into 2 or 3 queries,
and in PHP insert the result of the inner query into the outer query.
It's faster!

@haim-evgi Separating the queries will not increase the speed significantly; it will just shift the load from the DB server to the web server and add the overhead of moving data between the two servers.
I doubt that, with appropriate indexes, such a query would run for 7 minutes. Could you please show the structure of the tables involved in these queries?

Seems like you need the following indices:
INDEX BTREE('make_id') on vehicle_catalog
INDEX BTREE('v_id') on taxonomy_master
