Two MySQL functions work independently but not when combined (RAND()?) - mysql

Two MySQL functions work as expected independently, but together they return random results. Consider these two queries:
Query 1
This returns me the row where id=8 obviously:
SELECT * FROM table_categories WHERE id = 8
Query 2
According to my table setup this consistently returns a number between 1 and 16:
SELECT cat.id FROM table_categories cat
LEFT JOIN table_user_to_category u2c
ON u2c.cat_id = cat.id AND u2c.user_id = 0
ORDER BY IFNULL(u2c.count,0), RAND() LIMIT 1
Combined
Oddly enough, this sometimes returns a row from table_categories, sometimes returns 2 rows, sometimes 3, sometimes none. What the heck?
SELECT * FROM table_categories WHERE id = (
SELECT cat.id FROM table_categories cat
LEFT JOIN table_user_to_category u2c
ON u2c.cat_id = cat.id AND u2c.user_id = 0
ORDER BY IFNULL(u2c.count,0), RAND() LIMIT 1)
Is this because of RAND()? I can't figure it out! Seems like odd behavior to me, but I'm a relative newbie.

Are you sure id is defined as a UNIQUE or PRIMARY KEY? Otherwise you could easily have multiple rows with the same id, and all of them would be returned whenever the subquery happens to pick that id.
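Another possible explanation, offered as an assumption rather than a confirmed diagnosis of this table: MySQL treats a subquery that contains RAND() as uncacheable, so it can be re-evaluated for every row of the outer table. Each outer row is then compared against a different random id, which would produce zero, one, or several matches even when id is unique. A sketch of a workaround is to select the random id once in a derived table and join to it (the alias picked is made up for illustration):

```sql
-- The derived table has a LIMIT, so it is materialized once
-- instead of being re-run for each row of table_categories.
SELECT c.*
FROM table_categories AS c
JOIN (
    SELECT cat.id
    FROM table_categories AS cat
    LEFT JOIN table_user_to_category AS u2c
           ON u2c.cat_id = cat.id AND u2c.user_id = 0
    ORDER BY IFNULL(u2c.count, 0), RAND()
    LIMIT 1
) AS picked ON picked.id = c.id;
```

Running EXPLAIN on the original combined query should show whether the subquery is marked UNCACHEABLE SUBQUERY on your server.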

Related

MySQL query that limits first join to specific number of rows

I'm trying to run a query that joins 3 tables and I want to limit the first join to just 5 rows. The end result can return any number of rows so I don't want to add LIMIT to the end of the query.
Here is the query I have, which works but obviously does not limit the first join to 5 rows. I've attempted a subquery, which I believe is the only way to accomplish this, and everything I try gives an error. I can't seem to apply examples I have seen, to my situation.
SELECT mw_customer.customer_id, mw_customer.customer_uid, mw_campaign.customer_id, mw_campaign.campaign_id, mw_campaign.type, mw_campaign.status, mw_campaign_delivery_log.campaign_id, mw_campaign_delivery_log.subscriber_id
FROM mw_customer
JOIN mw_campaign
ON mw_customer.customer_id = mw_campaign.customer_id
AND mw_customer.customer_uid = 'XYZ'
AND mw_campaign.type = 'regular'
AND mw_campaign.status = 'sent'
JOIN mw_campaign_delivery_log
ON mw_campaign.campaign_id = mw_campaign_delivery_log.campaign_id
So what I want to do is limit the first join (mw_customer to mw_campaign) to a maximum of 5 rows; after the join to mw_campaign_delivery_log, there can be any number of rows.
Thanks
Wrap the first join in a subquery with LIMIT 5.
SELECT t.customer_id, t.customer_uid, t.campaign_id, t.type, t.status, l.subscriber_id
FROM (SELECT cus.customer_id, cus.customer_uid, cam.campaign_id, cam.type, cam.status
FROM mw_customer AS cus
JOIN mw_campaign AS cam
ON cus.customer_id = cam.customer_id
WHERE cus.customer_uid = 'XYZ'
AND cam.type = 'regular'
AND cam.status = 'sent'
LIMIT 5) AS t
JOIN mw_campaign_delivery_log AS l
ON t.campaign_id = l.campaign_id
Note that LIMIT without ORDER BY means that the 5 rows selected will be unpredictable.
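If the five rows need to be predictable, an ORDER BY can go inside the subquery. The sketch below assumes that a higher campaign_id means a more recent campaign, which the question does not state; substitute whatever column defines the order you want.

```sql
SELECT t.customer_id, t.customer_uid, t.campaign_id, t.type, t.status, l.subscriber_id
FROM (SELECT cus.customer_id, cus.customer_uid, cam.campaign_id, cam.type, cam.status
      FROM mw_customer AS cus
      JOIN mw_campaign AS cam
        ON cus.customer_id = cam.customer_id
      WHERE cus.customer_uid = 'XYZ'
        AND cam.type = 'regular'
        AND cam.status = 'sent'
      -- Assumption: campaign_id increases over time, so this keeps the 5 newest.
      ORDER BY cam.campaign_id DESC
      LIMIT 5) AS t
JOIN mw_campaign_delivery_log AS l
  ON t.campaign_id = l.campaign_id;
```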

SQL optimize "IS NULL" query with LEFT JOIN

I'm working on a project involving words and their translations. One of the queries a translator must run frequently (once every 10 seconds or so) is:
SELECT * FROM pwords p
LEFT JOIN words w ON p.id = w.wordid
WHERE w.code IS NULL
OR (w.code <> "USER1" AND w.code <> "USER2")
ORDER BY rand() LIMIT 10
The goal is to fetch a word to be translated that the user has not already translated. In this case we also want to exclude words entered by USER2.
The pwords table has around 66k entries and the words table has around 55k entries.
This query takes about 500 seconds to complete, whereas if I remove the IS NULL the query takes 0.0245 ms. My question here is: is there a way to optimize this query? I really need to squeeze the numbers.
The scenario is: USER1 does not want any database entries from USER2 in the words table, nor its own entries from the same table. Therefore I need the IS NULL check or a similar method to get entries from all users except USER1 and USER2, that is, either entries from other users or NULL entries.
tl;dr So my question is: is there a way to make this query run faster? Is "IS NULL" optimizable?
Any and all help is greatly appreciated.
You can try using a subquery so that the rows are filtered (by its WHERE clause) as early as possible:
SELECT *
FROM pwords p
LEFT JOIN
(SELECT *
FROM words w
WHERE (w.code <> "USER1" AND w.code <> "USER2")) subq
ON p.id = subq.wordid
WHERE subq.code IS NULL
ORDER BY rand() LIMIT 10
Another (and maybe more efficient) option is to use NOT EXISTS:
SELECT *
FROM pwords p
WHERE NOT EXISTS
(SELECT *
FROM words w
WHERE p.id=w.wordid AND (w.code <> "USER1" AND w.code <> "USER2"))
ORDER BY rand() LIMIT 10
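If the requirement is read as "words for which neither USER1 nor USER2 has a row in words" (this is one interpretation of the question, not something confirmed by the asker), the anti-join would test for those two users' rows instead. A minimal sketch under that assumption, with a hypothetical index name:

```sql
-- Hypothetical index so each NOT EXISTS probe can be an index lookup.
CREATE INDEX idx_words_wordid_code ON words (wordid, code);

SELECT p.*
FROM pwords AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM words AS w
    WHERE w.wordid = p.id
      AND w.code IN ('USER1', 'USER2')
)
ORDER BY RAND()
LIMIT 10;
```

The string literals use single quotes here so the query also works when the ANSI_QUOTES SQL mode is enabled.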

Sub query on a condition

Here's my query:
SELECT a.product_title, b.product_title FROM products a, products b
WHERE b.color_id = a.color_id
AND b.price_id = a.price_id
AND b.size_id = a.size_id
AND a.id = 1
AND ??? (SELECT * FROM products LIMIT ???);
I'm trying to run a subquery only if the first query returns fewer than 10 rows. How would I do this? Is it possible to count the rows the query returns within the same query, without running another query?
Also, is it possible to set the LIMIT to whatever is still required? For example, if the first query gets 6 rows, I then need the limit to be 4, to make up 10 altogether.
I don't fully understand your question, but you can use a user variable.
Example:
SET @ACount = (select count(a.id) from products a where ...=...);
SELECT a.product_title, b.product_title FROM products a, products b
WHERE b.color_id = a.color_id
AND b.price_id = a.price_id
AND b.size_id = a.size_id
AND a.id = 1
AND if(@ACount<10, "Your where statement here",0);
You can do this using "UNION".
If you don't care about performance and just want to save one query, you can always UNION a second query and take the top 10 rows from the combined result:
SELECT * FROM (
(SELECT a.product_title AS title_a, b.product_title AS title_b, 0 AS sort_rank
FROM products a, products b
WHERE b.color_id = a.color_id
AND b.price_id = a.price_id
AND b.size_id = a.size_id
AND a.id = 1
LIMIT 10)
UNION
(SELECT product_title, '', 1
FROM products
WHERE (your condition)
LIMIT 20)
) E
ORDER BY sort_rank
LIMIT 10
Because the extra rows from the second query have a higher rank value, they will be dropped by the outer LIMIT if the first query already returns 10 rows.
Because UNION removes duplicates, the second query needs to return enough rows to make sure you still get at least 10.
The code above only shows the concept; you will need to adjust it to suit your needs.

Query logic, wrong desired results mysql

SELECT * FROM
MobileApps as dtable
WHERE (SELECT COUNT(*) as c
FROM app_details
WHERE trackId=dtable.SourceID)=0
ORDER BY id ASC
LIMIT 0,2
The problem is this: say the first two results ordered by id are in app_details, so COUNT(*) doesn't equal 0 for the first two results. But there are many more rows in the MobileApps table for which it would equal 0.
I supposed it would first apply the condition (SELECT COUNT(*) FROM app_details WHERE trackId=dtable.SourceID)=0 and then ORDER BY id ASC LIMIT 0,2, not the other way around. What is a possible way around this?
Thanks
Your query works, but a better way to write it is:
SELECT dtable.*
FROM MobileApps dtable
LEFT JOIN app_details d ON d.trackId = dtable.SourceID
WHERE d.trackId IS NULL
ORDER BY dtable.id
LIMIT 0, 2
or:
SELECT *
From MobileApps dtable
WHERE NOT EXISTS (SELECT *
FROM app_details d
WHERE d.trackId = dtable.SourceID)
ORDER BY id
LIMIT 0, 2
See all 3 versions here: http://www.sqlfiddle.com/#!2/536db/2
For a large table, you should probably benchmark them to see which one MySQL optimizes best.
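As a starting point for that comparison, each variant can be prefixed with EXPLAIN to see the plan MySQL chooses before timing anything. A sketch for two of the versions above:

```sql
-- Plan for the LEFT JOIN ... IS NULL form.
EXPLAIN
SELECT dtable.*
FROM MobileApps dtable
LEFT JOIN app_details d ON d.trackId = dtable.SourceID
WHERE d.trackId IS NULL
ORDER BY dtable.id
LIMIT 0, 2;

-- Plan for the NOT EXISTS form.
EXPLAIN
SELECT *
FROM MobileApps dtable
WHERE NOT EXISTS (SELECT *
                  FROM app_details d
                  WHERE d.trackId = dtable.SourceID)
ORDER BY id
LIMIT 0, 2;
```

Whether the plans differ depends on the server version and on the indexes present, in particular whether app_details.trackId is indexed.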

Top N Per Group with Multiple Table Joins

Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.
Basically, I need to get all products ordered by highest number of sales per product, with a maximum of 3 products per vendor. I've got multiple tables being joined to create the original query, and each time I attempt to use variables to generate rankings it produces invalid results. The following should help explain the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally (and here's where I'm stumped), I'm trying to alter the above statement so that I receive only the top 3 products (by number sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.
Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.
select
vendorid, productid, NumSales
from
(
select
vendorid, productid, NumSales,
@r := IF(@g=vendorid,@r+1,1) RowNum,
@g := vendorid
from (select @g:=null) initvars
CROSS JOIN
(
SELECT COUNT(oi.price) AS NumSales,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY p.vendorid, p.productid
ORDER BY p.vendorid, NumSales DESC
) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
The approach here is
Group by to get NumSales
Use variables to row number the sales per vendor/product
Filter the numbered dataset to allow for a max of 3 per vendor
Order the remaining by NumSales DESC and return only 100
I like this elegant solution; however, when I run an adapted but similar query on my dev machine I get a non-deterministic result set. I believe this is due to the way the MySQL optimiser handles assigning and reading user variables within the same statement.
From the docs:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
Just adding this note here in case someone else comes across this weird behaviour.
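On MySQL 8.0 or later, window functions sidestep the user-variable evaluation-order problem entirely. The following is only a sketch that reuses the tables and columns from the question; the alias ranked and the productid tie-breaker are my own additions, not part of the original answer.

```sql
SELECT vendorid, productid, NumSales
FROM (
    SELECT p.vendorid,
           p.productid,
           COUNT(oi.price) AS NumSales,
           -- Number each vendor's products from best-selling down;
           -- productid breaks ties so the numbering is deterministic.
           ROW_NUMBER() OVER (PARTITION BY p.vendorid
                              ORDER BY COUNT(oi.price) DESC, p.productid) AS RowNum
    FROM products p
    INNER JOIN vendors v ON p.vendorid = v.vendorid
    INNER JOIN orders_items oi ON p.productid = oi.productid
    INNER JOIN orders o ON oi.orderid = o.orderid
    WHERE p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0
      AND v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0
      AND o.`Status` = 'SETTLED'
      AND o.Deleted = 0
    GROUP BY p.vendorid, p.productid
) ranked
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
```

The window function is evaluated after GROUP BY, so it can rank on the aggregated count without any session variables.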
The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed,
from (select @g:=null) initvars
to
from (select @g:=null, @r:=null) initvars
You could also initialize @r to 0 and it would work the same. And for those less familiar with this type of syntax, the additional section is reading through each sorted group and if a row has the same vendorid as the previous row, which is tracked with the @g variable, it increments the row number, which is stored in the variable @r. When this process reaches the next group with a new vendorid, the IF statement will no longer evaluate as true and the @r variable (and thereby the RowNum) will be reset to 1.