When I execute this query it takes a long time, because the user_fans table contains 10,000 user entries. How can I optimize it?
Query
SELECT uf.`user_name`, uf.`user_id`,
@post := (SELECT COUNT(*) FROM post WHERE user_id = uf.`user_id`) AS post,
@post_comment_likes := (SELECT COUNT(*) FROM post_comment_likes WHERE user_id = uf.`user_id`) AS post_comment_likes,
@post_comments := (SELECT COUNT(*) FROM post_comments WHERE user_id = uf.`user_id`) AS post_comments,
@post_likes := (SELECT COUNT(*) FROM post_likes WHERE user_id = uf.`user_id`) AS post_likes,
(@post + @post_comments) AS `sum_post`,
(@post_likes + @post_comment_likes) AS `sum_like`,
((@post + @post_comments) * 10) AS `post_cal`,
((@post_likes + @post_comment_likes) * 5) AS `like_cal`,
((@post * 10) + (@post_comments * 10) + (@post_likes * 5) + (@post_comment_likes * 5)) AS `total`
FROM `user_fans` uf ORDER BY `total` DESC LIMIT 20
I would try to simplify this COMPLETELY by putting triggers on your other tables and just adding a few columns to your User_Fans table... one for each respective COUNT() you are trying to get... from Posts, PostLikes, PostComments, PostCommentLikes.
When a record is added to whichever table, just update your user_fans table to add 1 to the count... it will be virtually instantaneous since it is keyed on the user's ID anyhow. As for the "LIKES"... similar, only under the condition that something is triggered as a "Like", add 1. Then your query becomes direct math on a single record and does not rely on ANY joins to compute a "weighted" total value. As your tables get even larger, the queries too will get slower, as they have more data to pour through and aggregate. You are going through EVERY user_fans record, which in essence queries every record from all the other tables.
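As a minimal sketch of that trigger idea (assuming a hypothetical post_count column has been added to user_fans; the trigger and column names here are illustrative, not from the original schema):

CREATE TRIGGER trg_post_after_insert
AFTER INSERT ON post
FOR EACH ROW
    -- keep the denormalized counter in sync on every new post
    UPDATE user_fans
       SET post_count = post_count + 1
     WHERE user_id = NEW.user_id;

You would create one such trigger per counted table (and, for the likes tables, fire it only when the row actually represents a "Like").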
All that being said, keeping the tables as you have them, I would restructure as follows...
SELECT
      uf.user_name,
      uf.user_id,
      @pc := coalesce( PostSummary.PostCount, 000000 ) as PostCount,
      @pl := coalesce( PostLikes.LikesCount, 000000 ) as PostLikes,
      @cc := coalesce( CommentSummary.CommentsCount, 000000 ) as PostComments,
      @cl := coalesce( CommentLikes.LikesCount, 000000 ) as CommentLikes,
      @pc + @cc AS sum_post,
      @pl + @cl AS sum_like,
      @pCalc := (@pc + @cc) * 10 AS post_cal,
      @lCalc := (@pl + @cl) * 5 AS like_cal,
      @pCalc + @lCalc AS `total`
   FROM
      ( select @pc := 0,
               @pl := 0,
               @cc := 0,
               @cl := 0,
               @pCalc := 0,
               @lCalc := 0 ) sqlvars,
      user_fans uf
         LEFT JOIN ( select user_id, COUNT(*) as PostCount
                        from post
                        group by user_id ) as PostSummary
            ON uf.user_id = PostSummary.user_id
         LEFT JOIN ( select user_id, COUNT(*) as LikesCount
                        from post_likes
                        group by user_id ) as PostLikes
            ON uf.user_id = PostLikes.user_id
         LEFT JOIN ( select user_id, COUNT(*) as CommentsCount
                        from post_comments
                        group by user_id ) as CommentSummary
            ON uf.user_id = CommentSummary.user_id
         LEFT JOIN ( select user_id, COUNT(*) as LikesCount
                        from post_comment_likes
                        group by user_id ) as CommentLikes
            ON uf.user_id = CommentLikes.user_id
   ORDER BY
      `total` DESC
   LIMIT 20
My variables are abbreviated as
"@pc" = PostCount
"@pl" = PostLikes
"@cc" = CommentsCount
"@cl" = CommentLikes
"@pCalc" = weighted calc of post and comment counts (* 10 weight)
"@lCalc" = weighted calc of post and comment likes (* 5 weight)
The LEFT JOIN to the prequeries runs those queries ONCE through, then the entire result is joined, instead of being hit as a sub-query for every record. By using COALESCE(), if there are no such entries in the LEFT JOINed table results, you won't get hit with NULL values messing up the calcs, so I've defaulted them to 000000.
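A trivial illustration of that COALESCE() fallback (not part of the original query):

SELECT COALESCE(NULL, 000000) AS val;  -- a missed LEFT JOIN row yields 0 instead of NULL
SELECT COALESCE(7, 000000) AS val;     -- an actual count passes through as 7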
CLARIFICATION OF YOUR QUESTIONS
You can have any QUERY as an "AS AliasResult". The "AS" can also be used to shorten long table names for simpler readability. The same table can even be aliased more than once to pull similar content for different purposes.
select
MyAlias.SomeField
from
MySuperLongTableNameInDatabase MyAlias ...
select
c.LastName,
o.OrderAmount
from
customers c
join orders o
on c.customerID = o.customerID ...
select
PQ.SomeKey
from
( select ST.SomeKey
from SomeTable ST
where ST.SomeDate between X and Y ) as PQ
JOIN SomeOtherTable SOT
on PQ.SomeKey = SOT.SomeKey ...
Now, the third query above is not always practical, requiring the full query that results in the alias "PQ" (representing "PreQuery"). But it can be useful if you want to pre-limit a certain set of records under complex conditions and get a smaller set BEFORE doing extra joins to many other tables for the final results.
Since a "FROM" does not HAVE to be an actual table, but can be a query in itself, any place else used in the query, it has to know how to reference this prequery resultset.
Also, when querying fields, they too can be aliased "AS FinalColumnName" to simplify the results wherever they will be used.
select
CONCAT( User.Salutation, User.LastName ) as CourtesyName
from ...
select
Order.NonTaxable
+ Order.Taxable
+ ( Order.Taxable * Order.SalesTaxRate ) as OrderTotalWithTax
from ...
The "As" columnName is NOT required being an aggregate, but is most commonly seen that way.
Now, with respect to the MySQL variables... If you were writing a stored procedure, many people pre-declare them, setting their default values before the rest of the procedure. You can do them in-line in a query by just setting them and giving that result an "Alias" reference. When doing these variables, the select will simulate always returning a SINGLE RECORD worth of the values. It's almost like an update-able single record used within the query. You don't need to apply any specific "JOIN" conditions, as it may not have any bearing on the rest of the tables in the query... In essence, it creates a Cartesian result, but one record joined against any other table will never create duplicates anyhow, so no damage downstream.
select
...
from
( select @SomeVar := 0,
         @SomeDate := curdate(),
         @SomeString := "hello" ) as SQLVars
Now, how the sqlvars work. Think of a linear program... One command is executed in the exact sequence the query runs. That value is then re-stored back in the "SQLVars" record, ready for the next time through. However, you don't reference it as SQLVars.SomeVar or SQLVars.SomeDate... just @SomeVar := someNewValue. Now, when the @var is used in a query, it is also stored as an "AS ColumnName" in the result set. Sometimes this can be just a place-holder computed value in preparation for the next record. Each value is then directly available for the next row. So, given the following sample...
select
      @SomeVar := @SomeVar * 2 as FirstVal,
      @SomeVar := @SomeVar * 2 as SecondVal,
      @SomeVar := @SomeVar * 2 as ThirdVal
   from
      ( select @SomeVar := 1 ) sqlvars,
      AnotherTable
   limit 3
Will result in 3 records with the values of
FirstVal SecondVal ThirdVal
2 4 8
16 32 64
128 256 512
Notice how the value of @SomeVar is updated as each column uses it... So even on the same record, the updated value is immediately available for the next column... That said, now look at trying to build a simulated record count / ranking per customer...
select
      o.CustomerID,
      o.OrderID,
      @SeqNo := if( @LastID = o.CustomerID, @SeqNo + 1, 1 ) as CustomerSequence,
      @LastID := o.CustomerID as PlaceHolderToSaveForNextRecordCompare
   from
      orders o,
      ( select @SeqNo := 0, @LastID := 0 ) sqlvars
   order by
      o.CustomerID
The "Order By" clause forces the results to be returned in sequence first. So, here, the records per customer are returned. First time through, LastID is 0 and customer ID is say...5. Since different, it returns 1 as the #SeqNo, THEN it preserves that customer ID into the #LastID field for the next record. Now, next record for customer... Last ID is the the same, so it takes the #SeqNo (now = 1), and adds 1 to 1 and becomes #2 for the same customer... Continue on the path...
As for getting better at writing queries, take a look at the MySQL tag and look at some of the heavy contributors. Look into the questions, some of the complex answers, and how the problem solving works. Not to say there are not others with lower reputation scores just starting out and completely competent, but you'll find who gives good answers and why. Look at their history of answers posted too. The more you read and follow, the better a handle you'll get on writing more complex queries.
You can convert this query to use a GROUP BY clause instead of a subquery for each column.
You can also create indexes on the relationship columns (this will be the most helpful way of optimizing your query response); see the sketch below.
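A hedged sketch of what that looks like for just one of the counts (the index name idx_post_user is an assumption, not from the original schema):

CREATE INDEX idx_post_user ON post (user_id);

SELECT uf.user_id, uf.user_name, COUNT(p.user_id) AS post
FROM user_fans uf
LEFT JOIN post p ON p.user_id = uf.user_id
GROUP BY uf.user_id, uf.user_name;

COUNT(p.user_id) counts only matched rows, so users with no posts correctly get 0 rather than 1.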
10,000 user records isn't much data at all.
There may be work you can do on the database itself:
1) Have you got the relevant indexes set on the foreign keys (an index on user_id in each of the tables)? Try running EXPLAIN before the query (see http://www.slideshare.net/phpcodemonkey/mysql-explain-explained); a sketch of both steps follows after this list.
2) Are your data types correct?
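For example (a minimal sketch; the index name is an assumption, not from the original schema):

-- see whether the count subquery can use an index on user_id
EXPLAIN SELECT COUNT(*) FROM post_likes WHERE user_id = 42;

-- if the plan shows a full table scan, add the missing index
CREATE INDEX idx_post_likes_user ON post_likes (user_id);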
See the difference between @me (see image 1) and @DRapp (see image 2) in query execution time and EXPLAIN output. When I read @DRapp's answer I realized what I was doing wrong in this query and why it took so much time. Basically the answer is simple: my query depended on correlated subqueries, whereas @DRapp used derived tables (temporary/filesort) with the help of session variables, aliases and joins...
image 1 execution time (00:02:56.321)
image 2 execution time (00:00:32.860)
Related
At the end of this process I need to have a maximum of 15 records for each type in a table
My (hypothetical) table "stickorder" has 3 columns: StickColor, OrderNumber, PrimeryKey. (OrderNumber, PrimeryKey are unique)
I can only handle 15 orders for each stick color, so I need to delete all the extra orders. (They will be processed another day and are in a master table, so I don't need them in this table.)
I have tried some similar solutions on this site but nothing seems to work; this is the closest:
INSERT INTO stickorder2
(select posts_ordered.*
 from (
       select
         stickorder.*,
         @row := if(@last_order = stickorder.OrderNumber, @row + 1, 1) as row,
         @last_orders := stickorder.OrderNumber
       from
         stickorder inner join
         (select OrderNumber from
            (select distinct OrderNumber
             from stickorder
             order by OrderNumber) limit_orders
         ) limit_orders
         on stickorder.OrderNumber = limit_orders.OrderNumber,
         (select @last_order := 0, @row := 0) r
      ) posts_ordered
 where row <= 15);
When using insert, you should always list the columns. Alternatively, you might really want create table as.
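For the create table as alternative, a minimal sketch (using the column names from the question; the per-color filtering would still be applied as in the query below):

-- creates and populates the table in one step, no column list needed
CREATE TABLE stickorder2 AS
SELECT StickColor, OrderNumber, PrimeryKey
FROM stickorder;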
Then, there are lots of other issues with your query. For instance, you say you want a limit on the number for each color, and yet you have no reference to StickColor in your query. I think you want something more along these lines:
INSERT INTO stickorder2(col1, . . . col2)
    select so.*
    from (select so.*,
                 @row := if(@lastcolor = so.StickColor, @row + 1,
                            if(@lastcolor := so.StickColor, 1, 1)
                           ) as row
          from stickorder so cross join
               (select @lastcolor := 0, @row := 0) vars
          order by so.StickColor
         ) so
    where row <= 15;
I have a simple process I'm trying to do in a single SQL statement.
I've got a table of players (called tplayers) with columns indicating their userid and tourneyid, as well as a "playerpoints" column. I've also got a table called "tscores" which contains scores, a userid and a column called "scorerankpoints". I want to take the top 3 rows per player with the highest scorerankpoints, sum them, and put that value in the corresponding user record in tplayers -- all for a specific tourneyid.
Here's the query:
update tplayers p set playerpoints=
(
select sum(b.mypoints) y from
(
select scorerankpoints as mypoints from tscores t where t.tourneyid=p.tourneyid and p.userid=t.userid and t.scorerankpoints>0 order by scorerankpoints desc limit 3
) as b
) where p.tourneyid='12'
This generates this error: Unknown column 'p.tourneyid' in 'where clause'
I'm basically looking to take the top 3 values of "scorerankpoints" from table tscores and put the summed value into a column in table tplayers called playerpoints,
and I want to do this for all players and scores who have the same tourneyid in their tables.
It appears that the inner reference to p.tourneyid is undefined... Is there a way to do this in a single statement or do I have to break it up?
MySQL has a problem resolving correlated references that are more than one layer deep. This is a hard one to fix.
The following uses variables to enumerate the rows and then chooses the right rows for aggregation in an update/join:
update tplayers p join
       (select ts.userid, sum(ts.scorerankpoints) as mypoints
        from (select ts.*,
                     @rn := if(@userid = userid, @rn + 1, 1) as rn,
                     @userid := userid
              from tscores ts cross join
                   (select @rn := 0, @userid := '') const
              where ts.tourneyid = '12'
              order by ts.userid, ts.scorerankpoints desc
             ) ts
        where rn <= 3
        group by ts.userid
       ) ts
       on p.userid = ts.userid
    set playerpoints = ts.mypoints
    where p.tourneyid = '12' ;
I have something like this:
SELECT id, fruit, pip
FROM plant
WHERE COUNT(*) = 2;
This weird query is self-explanatory, I guess. COUNT(*) here means the number of rows in the plant table. My requirement is that I need to retrieve values from the specified fields only if the total number of rows in the table = 2. This doesn't work, though: invalid use of aggregate function COUNT.
I cannot do this:
SELECT COUNT(*) as cnt, id, fruit, pip
FROM plant
WHERE cnt = 2;
for one, it limits the number of rows output to 1, and two, it gives the same error: invalid use of aggregate function.
What I can do is instead:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
But then that subquery is the main query re-run. I'm presenting here a small example of the larger problem, though I know an additional COUNT(*) subquery in the given example isn't that big an overhead.
Edit: I do not know why the question was downvoted. The COUNT(*) I'm trying to get is from a view (a temporary table) in a large query with 5 or 6 joins and additional where clauses. Re-running that query as a subquery just to get the count is inefficient, and I can see the bottleneck as well.
Here is the actual query:
SELECT U.UserName, E.Title, AE.Mode, AE.AttemptNo,
    IF(AE.Completed = 1, 'Completed', 'Incomplete'),
    (
        SELECT COUNT(DISTINCT(FK_QId))
        FROM attempt_question AS AQ
        WHERE FK_ExcAttemptId = @excAttemptId
    ) AS Inst_Count,
    (
        SELECT COUNT(DISTINCT(AQ.FK_QId))
        FROM attempt_question AS AQ
        JOIN `question` AS Q
            ON Q.PK_Id = AQ.FK_QId
        LEFT JOIN actions AS A
            ON A.FK_QId = AQ.FK_QId
        WHERE AQ.FK_ExcAttemptId = @excAttemptId
        AND (
            Q.Type = @descQtn
            OR Q.Type = @actQtn
            AND A.type = 'CTVI.NotImplemented'
            AND A.IsDelete = @status
            AND (
                SELECT COUNT(*)
                FROM actions
                WHERE FK_QId = A.FK_QId
                AND type != 'CTVI.NotImplemented'
                AND IsDelete = @status
            ) = 0
        )
    ) AS NotEvalInst_Count,
    (
        SELECT COUNT(DISTINCT(FK_QId))
        FROM attempt_question AS AQ
        WHERE FK_ExcAttemptId = @excAttemptId
        AND Mark = @mark
    ) AS CorrectAns_Count,
    E.AllottedTime, AE.TimeTaken
FROM attempt_exercise AS AE
JOIN ctvi_exercise_tblexercise AS E
    ON AE.FK_EId = E.PK_EId
JOIN ctvi_user_table AS U
    ON AE.FK_UId = U.PK_Id
JOIN ctvi_grade AS G
    ON AE.FK_GId = G.PK_GId
WHERE AE.PK_Id = @excAttemptId
-- AND COUNT(AE.*) = @number --the portion in contention.
Kindly ignore the above query and guide me in the right direction from the small example query I posted, thanks.
In MySQL, you can only do what you tried:
SELECT id, fruit, pip
FROM plant
WHERE (
SELECT COUNT(*)
FROM plant
) = 2;
or this variation:
SELECT id, fruit, pip
FROM plant
JOIN
(
SELECT COUNT(*) AS cnt
FROM plant
) AS c
ON c.cnt = 2;
Whether the 1st or the 2nd is more efficient depends on the version of MySQL (and the optimizer). I would bet on the 2nd one, on most versions.
In other DBMSs that have window functions, you can also do the first query that @Andomar suggests.
Here is a suggestion to avoid the bottleneck of calculating the derived table twice, once to get the rows and once more to get the count. If the derived table is expensive to calculate and yields thousands or millions of rows, calculating them twice only to throw them away is a problem indeed. This may improve efficiency, as it limits the intermediate (twice-calculated) rows to 3:
SELECT p.*
FROM
( SELECT id, fruit, pip
FROM plant
LIMIT 3
) AS p
JOIN
( SELECT COUNT(*) AS cnt
FROM
( SELECT 1
FROM plant
LIMIT 3
) AS tmp
) AS c
ON c.cnt = 2 ;
After re-reading your question, you're trying to return rows only if there are 2 rows in the entire table. In that case I think your own example query is already the best.
On another DBMS, you could use a Windowing function:
select *
from (
select *
, count(*) over () as cnt
from plant
) as SubQueryAlias
where cnt = 2
But the OVER clause is not supported in MySQL.
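Worth noting: window functions were added in MySQL 8.0, so on 8.0 and later the same idea works natively there too:

SELECT id, fruit, pip
FROM (
    SELECT id, fruit, pip, COUNT(*) OVER () AS cnt
    FROM plant
) AS t
WHERE cnt = 2;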
Old wrong answer below:
The where clause works before grouping. It works on single rows, not groups of rows, so you can't use aggregates like count or max in the where clause.
To set filters that work on groups of rows, use the having clause. It works after grouping and can be used to filter with aggregates:
SELECT id, fruit, pip
FROM plant
GROUP BY
id, fruit, pip
HAVING COUNT(*) = 2;
The other answers do not fulfill the original question which was to filter the results "without using a subquery".
You can actually do this by using a variable in 2 consecutive MySQL statements:
SET @count = 0;
SELECT * FROM
(
    SELECT id, fruit, pip, @count := @count + 1 AS count
    FROM plant
) tmp
WHERE @count = 2;
Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from "get all results" into "get top 3 per group". At first this was going well, and I used several recommendations and answers from this site to achieve it (Most Viewed Products). However, I'm running into difficulty with my last one, "Best Selling Products", because of multiple joins.
Basically, I need to get all products ordered by highest sales per product, with a maximum of 3 products per vendor. I've got multiple tables being joined to create the original query, and each time I attempt to use variables to generate rankings it produces invalid results. The following should help better understand the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally (and here's where I'm stumped), I'm trying to alter the above statement such that I receive only the top 3 products (by number sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.
Even though you specify LIMIT 100, this type of query requires a full scan: the grouped table has to be built up, then every record inspected and row-numbered, before finally filtering for the 100 you want to display.
select
  vendorid, productid, NumSales
from
(
  select
    vendorid, productid, NumSales,
    @r := IF(@g = vendorid, @r + 1, 1) RowNum,
    @g := vendorid
  from (select @g := null) initvars
  CROSS JOIN
  (
    SELECT COUNT(oi.price) AS NumSales,
           p.productid,
           p.vendorid
    FROM products p
    INNER JOIN vendors v ON (p.vendorid = v.vendorid)
    INNER JOIN orders_items oi ON (p.productid = oi.productid)
    INNER JOIN orders o ON (oi.orderid = o.orderid)
    WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
      AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
      AND o.`Status` = 'SETTLED'
      AND o.Deleted = 0
    GROUP BY p.vendorid, p.productid
    ORDER BY p.vendorid, NumSales DESC
  ) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
The approach here is
Group by to get NumSales
Use variables to row number the sales per vendor/product
Filter the numbered dataset to allow for a max of 3 per vendor
Order the remaining by NumSales DESC and return only 100
I like this elegant solution; however, when I run an adapted but similar query on my dev machine, I get a non-deterministic result-set back. I believe this is due to the way the MySQL optimiser deals with assigning and reading user variables within the same statement.
From the docs:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
Just adding this note here in case someone else comes across this weird behaviour.
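If you are on MySQL 8.0 or later, one way to sidestep the user-variable caveat entirely is ROW_NUMBER(). A hedged rewrite of the query above (same assumed table and column names):

SELECT vendorid, productid, NumSales
FROM (
    SELECT vendorid, productid, NumSales,
           ROW_NUMBER() OVER (PARTITION BY vendorid
                              ORDER BY NumSales DESC) AS RowNum
    FROM (
        SELECT COUNT(oi.price) AS NumSales, p.productid, p.vendorid
        FROM products p
        INNER JOIN vendors v ON (p.vendorid = v.vendorid)
        INNER JOIN orders_items oi ON (p.productid = oi.productid)
        INNER JOIN orders o ON (oi.orderid = o.orderid)
        WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
          AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
          AND o.`Status` = 'SETTLED'
          AND o.Deleted = 0
        GROUP BY p.vendorid, p.productid
    ) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;

The window function is evaluated in a well-defined order, so the per-vendor numbering is deterministic.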
The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed
from (select @g := null) initvars
to
from (select @g := null, @r := null) initvars
You could also initialize @r to 0 and it would work the same. And for those less familiar with this type of syntax: the added section reads through each sorted group, and if a row has the same vendorid as the previous row (tracked with the @g variable), it increments the row number stored in the variable @r. When this process reaches the next group with a new vendorid, the IF statement no longer evaluates as true, and the @r variable (and thereby the RowNum) is reset to 1.
I have two tables, Contract and Contractuser (shown as images in the original post).
My job was to fetch the latest invoice date for each contract number from the Contractuser table and display the results; the resultant table was also shown as an image ("Result Table").
Now I wanted an auto-increment column to display as the first column in my result set.
I used the following query for it:
SELECT @i := @i + 1 AS Sno, a.ContractNo, a.SoftwareName, a.CompanyName, b.InvoiceNo, b.InvoiceDate,
       b.InvAmount, b.InvoicePF, max(b.InvoicePT) AS InvoicePeriodTo, b.InvoiceRD, b.ISD
FROM contract AS a, contractuser AS b, (SELECT @i := 0) AS i
WHERE a.ContractNo = b.ContractNo
GROUP BY b.ContractNo
ORDER BY a.SoftwareName ASC;
But it seems that the auto-increment is being applied before the GROUP BY, which is why the serial numbers come out non-contiguous.
GROUP BY and variables don't necessarily work as expected. Just use a subquery:
SELECT (@i := @i + 1) AS Sno, c.*
FROM (SELECT c.ContractNo, c.SoftwareName, c.CompanyName, cu.InvoiceNo, cu.InvoiceDate,
             cu.InvAmount, cu.InvoicePF, max(cu.InvoicePT) AS InvoicePeriodTo, cu.InvoiceRD, cu.ISD
      FROM contract c JOIN
           contractuser cu
           ON c.ContractNo = cu.ContractNo
      GROUP BY cu.ContractNo
      ORDER BY c.SoftwareName ASC
     ) c CROSS JOIN
     (SELECT @i := 0) params;
Notes:
I also fixed the JOIN syntax. Never use commas in the FROM clause.
I also added reasonable table aliases -- abbreviations for the tables. a and b don't mean anything, so they make the query harder to follow.
I left the GROUP BY with only one key. It should really have all the unaggregated keys but this is allowed under some circumstances.
SELECT @row_no := IF(@prev_val = lpsm.lit_no, @row_no + 1, 1) AS row_num,
       @prev_val := get_pad_value(lpsm.lit_no, 26) AS LAWSUIT_NO,
       lpsm.cust_rm_no
FROM lit_person_sue_map lpsm,
     (SELECT @row_no := 0) x,
     (SELECT @prev_val := '') y
ORDER BY lpsm.lit_no ASC;
This will return a sequence number per group of lit_no.