Unknown Column in 'IN/ALL/ANY' subquery - mysql

I've got 2 tables: members and member_logs.
Members can belong to groups, which are in the members table. Given a date range and a group I'm trying to figure out how to get the 10 days with the highest number of successful logins. What I have so far is a massive nest of subquery terror.
SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `reg_date` IN
(SELECT DISTINCT DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
ORDER BY `num_users` desc
LIMIT 0, 10
As far as I understand, what is happening is that the WHERE clause is evaluated before the subqueries are generated, and I should also be using joins. If anyone can help me out or point me in the right direction, that would be incredible.
EDIT: Limit was wrong, fixed it

The first subquery is totally unnecessary because you can filter by dates directly in the current table member_logs. I also prefer a JOIN for the second subquery. Then what you are missing is grouping by date (day).
A query like the following one (not tested) will do the job you want:
SELECT COUNT(ml.member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs ml
INNER JOIN members m ON ml.member_id = m.member_id
WHERE `login_success` = 1
AND DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' AND '2014-03-04'
AND `group_id` = 'XXXXXXX'
AND `deleted` = 0
GROUP BY `reg_date`
ORDER BY `num_users` desc
LIMIT 10

SELECT count(member_id) AS `num_users`,
DATE_FORMAT(`login_date`,'%Y-%m-%d') AS `reg_date`
FROM member_logs
WHERE `login_success` = 1
and `login_date` IN
(SELECT `login_date`
FROM member_logs
WHERE `login_success` = 1
and (DATE_FORMAT(`login_date`,'%Y-%m-%d') BETWEEN '2012-02-25' and '2014-03-04'))
and `member_id` IN
(SELECT `member_id`
FROM members
WHERE `group_id` = 'XXXXXXX'
and `deleted` = 0)
GROUP BY `reg_date`
ORDER BY `num_users` desc
LIMIT 0, 10

A slightly more index-friendly version of the previous answers:
To keep the query index friendly, you shouldn't do per-row calculations in the search conditions. This query removes the per-row string formatting of the date in the WHERE clause, so it should be faster if there are many rows to eliminate by date range:
SELECT COUNT(*) num_users, DATE(login_date) reg_date
FROM member_logs JOIN members ON member_logs.member_id = members.member_id
WHERE login_success = 1 AND group_id = 'XXX' AND deleted = 0
AND login_date >= '2012-02-25'
AND login_date < DATE_ADD('2014-03-04', INTERVAL 1 DAY)
GROUP BY DATE(login_date)
ORDER BY num_users DESC
LIMIT 10
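A minimal sketch of the half-open date range above, run against an in-memory SQLite database as a stand-in for MySQL. The sample rows and the helper name top_login_days are assumptions made for illustration, and SQLite's DATE(x, '+1 day') replaces MySQL's DATE_ADD(x, INTERVAL 1 DAY). The point is that a login at 23:59:59 on the end date is still counted, while nothing is computed per row on the filtered column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, group_id TEXT, deleted INTEGER);
CREATE TABLE member_logs (member_id INTEGER, login_date TEXT, login_success INTEGER);
INSERT INTO members VALUES (1, 'G1', 0), (2, 'G1', 0), (3, 'G2', 0), (4, 'G1', 1);
INSERT INTO member_logs VALUES
  (1, '2014-03-04 23:59:59', 1),  -- last second of the end date: must be counted
  (2, '2014-03-04 08:00:00', 1),
  (1, '2014-03-03 10:00:00', 1),
  (2, '2014-03-03 11:00:00', 0),  -- failed login
  (3, '2014-03-03 12:00:00', 1),  -- member of another group
  (4, '2014-03-03 13:00:00', 1),  -- deleted member
  (1, '2014-03-05 00:00:00', 1);  -- just past the range
""")


def top_login_days(conn, group_id, first_day, last_day, limit=10):
    """Return (day, successful_logins) for the busiest days in the range."""
    sql = """
    SELECT DATE(ml.login_date) AS reg_date, COUNT(*) AS num_users
    FROM member_logs ml
    JOIN members m ON m.member_id = ml.member_id
    WHERE ml.login_success = 1
      AND m.group_id = ?
      AND m.deleted = 0
      AND ml.login_date >= ?                 -- bare column on the left: index friendly
      AND ml.login_date < DATE(?, '+1 day')  -- exclusive upper bound, one day after last_day
    GROUP BY DATE(ml.login_date)
    ORDER BY num_users DESC, reg_date
    LIMIT ?
    """
    return conn.execute(sql, (group_id, first_day, last_day, limit)).fetchall()


rows = top_login_days(conn, "G1", "2012-02-25", "2014-03-04")
print(rows)
```

In MySQL, an index that includes login_date would be what lets the range predicate skip rows; whether the optimizer actually uses it depends on the data and is best checked with EXPLAIN.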

Calculate the average date difference

This is the essential setup of the table (only the DDL for relevant columns is present). MySQL version 8.0.15
The intent is to show an average of date difference interval between orders.
CREATE TABLE final (
prim_id INT(11) NOT NULL AUTO_INCREMENT,
order_ID INT(11) NOT NULL,
cust_ID VARCHAR(45) NOT NULL,
created_at DATETIME NOT NULL,
item_name VARCHAR(255) NOT NULL,
cust_name VARCHAR(255) NOT NULL,
PRIMARY KEY (prim_id)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=145699;
Additional information:
cust_ID -> cust_name (one-to-many)
cust_ID -> order_ID (one-to-many)
order_ID -> item_name (one-to-many)
order_ID -> created_at (one-to-one)
prim_id -> *everything* (one-to-many)
I've thought of using min(created_at) and max(created_at) but that will exclude all the orders between oldest and newest. I need a more refined solution.
The end result should be like this:
The average time interval, measured in days, between all of a customer's consecutive orders (not just between the oldest and newest, because there are quite often more than two), next to a column showing the client's name (cust_name).
If I understand this correctly, you can use a subquery to get the date of the previous order, datediff() to get the difference between the two dates, and avg() to get the average of those differences.
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM final f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM final f1
GROUP BY f1.cust_id;
Edit:
If there can be several rows for one order ID, as KIKO Software mentioned, we need to select from the distinct set of orders, like this:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT DISTINCT f3.cust_id,
f3.created_at,
f3.order_id
FROM final f3) f1
GROUP BY f1.cust_id;
This may fail if there can be two rows for an order with different customer IDs or different creation time stamps. But in that case the data is just complete garbage and needs to be corrected before anything else.
2nd Edit:
Or alternatively getting the maximum creation timestamp per order if these can differ:
SELECT f1.cust_id,
avg(datediff(f1.created_at,
(SELECT f2.created_at
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f2
WHERE f2.cust_id = f1.cust_id
AND (f2.created_at < f1.created_at
OR f2.created_at = f1.created_at
AND f2.order_id < f1.order_id)
ORDER BY f2.created_at DESC,
f2.order_id DESC
LIMIT 1)))
FROM (SELECT max(f3.cust_id) cust_id,
max(f3.created_at) created_at,
f3.order_id
FROM final f3
GROUP BY f3.order_id) f1
GROUP BY f1.cust_id;
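To see the previous-order subquery from this answer produce numbers, here is a small sketch run on SQLite instead of MySQL. The sample orders are invented, a CTE named orders stands in for the repeated DISTINCT derived table, and a julianday() difference on the date parts stands in for datediff(). None of that comes from the original answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE final (
  prim_id INTEGER PRIMARY KEY,
  order_ID INTEGER NOT NULL,
  cust_ID TEXT NOT NULL,
  created_at TEXT NOT NULL,
  item_name TEXT NOT NULL
);
-- order 1 has two item rows, so the orders must be de-duplicated first
INSERT INTO final (order_ID, cust_ID, created_at, item_name) VALUES
  (1, 'A', '2019-01-01 09:00:00', 'pen'),
  (1, 'A', '2019-01-01 09:00:00', 'ink'),
  (2, 'A', '2019-01-05 10:00:00', 'pad'),
  (3, 'A', '2019-01-15 11:00:00', 'pen'),
  (4, 'B', '2019-02-01 12:00:00', 'pen');
""")

sql = """
WITH orders AS (
  SELECT DISTINCT cust_ID, order_ID, created_at FROM final
)
SELECT o1.cust_ID,
       AVG(julianday(DATE(o1.created_at)) - julianday(DATE((
           -- most recent earlier order of the same customer
           SELECT o2.created_at
           FROM orders o2
           WHERE o2.cust_ID = o1.cust_ID
             AND (o2.created_at < o1.created_at
                  OR (o2.created_at = o1.created_at AND o2.order_ID < o1.order_ID))
           ORDER BY o2.created_at DESC, o2.order_ID DESC
           LIMIT 1)))) AS avg_days
FROM orders o1
GROUP BY o1.cust_ID
ORDER BY o1.cust_ID
"""
avg_gaps = conn.execute(sql).fetchall()
print(avg_gaps)
```

Customer A has gaps of 4 and 10 days, so the average is 7. A customer's first order has no predecessor, the subquery yields NULL for it, and AVG() skips NULLs, which is why customer B (a single order) ends up with no average at all.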

how to implement two aggregate functions on the same column mysql

SELECT max(sum(`orderquantity`)), `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
I want to get the max of the result (the sum of the order quantity), but it gives an error. Is there any solution to this?
You don't need Max() here. Instead, sort your result set by Sum(`orderquantity`) descending and take the first record returned:
SELECT sum(`orderquantity`) as sumoforderqty, `medicinename`
FROM `orerdetails`
WHERE `OID`=
(
SELECT `OrderID`
FROM `order`
where `VID` = 5 AND `OrerResponse` = 1
)
GROUP BY `medicinename`
ORDER BY sumoforderqty DESC
LIMIT 1
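As a quick check of the sort-and-limit idea, the sketch below runs it on SQLite with a few invented rows (the order ID is passed in directly instead of coming from the subquery on `order`). It also shows one alternative that is not part of the answer above: aggregates cannot be nested directly, but the outer one can be applied to a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orerdetails (OID INTEGER, medicinename TEXT, orderquantity INTEGER);
INSERT INTO orerdetails VALUES
  (10, 'aspirin', 5), (10, 'aspirin', 7), (10, 'ibuprofen', 9), (11, 'ibuprofen', 50);
""")

# Sort the per-medicine sums and keep the largest one, together with its name.
top = conn.execute(
    """
    SELECT medicinename, SUM(orderquantity) AS sumoforderqty
    FROM orerdetails
    WHERE OID = ?
    GROUP BY medicinename
    ORDER BY sumoforderqty DESC
    LIMIT 1
    """,
    (10,),
).fetchone()

# Alternative: MAX over a derived table of sums (returns the value only, not the name).
max_sum = conn.execute(
    """
    SELECT MAX(s)
    FROM (SELECT SUM(orderquantity) AS s
          FROM orerdetails
          WHERE OID = ?
          GROUP BY medicinename) AS sums
    """,
    (10,),
).fetchone()[0]

print(top, max_sum)
```

The ORDER BY ... LIMIT 1 form keeps the medicine name alongside the sum, which is usually what is wanted here; if two medicines tie, it returns only one of them.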

MySQL - Limit Field To 5 Maximum Occurrences

Background:
I run a platform which allows users to follow creators and view their content.
The following query successfully displays 50 posts ordered by popularity. There is also some other logic to not show posts the user has already saved/removed, but that is not relevant for this question.
Problem:
If one creator is particularly popular (high popularity), the top 50 posts returned will nearly all be by that creator.
This skews the results as ideally the 50 posts returned will not be in favor of one particular author.
Question:
How can I limit it so the author (which uses the field posted_by) is returned no more than 5 times. It could be less, but definitely no more than 5 times should one particular author be returned.
It should still be finally ordered by popularity DESC
SELECT *
FROM `source_posts`
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
ORDER BY `popularity` DESC
LIMIT 50
Thank you.
Edit:
I am using MySQL version 5.7.24, so unfortunately the row_number() function will not work in this instance.
In MySQL 8+, you would simply use row_number():
select sp.*
from (select sp.*,
row_number() over (partition by posted_by order by popularity desc) as seqnum
from source_posts sp
) sp
where seqnum <= 5
order by popularity desc
limit 50;
I'm not sure what the rest of your query is doing, because it is not described in your question. You can, of course, add additional filtering criteria or joins.
EDIT:
In earlier versions, you can use variables:
select sp.*
from (select sp.*,
(@rn := if(@p = posted_by, @rn + 1,
if(@p := posted_by, 1, 1)
)
) as rn
from (select sp.*
from source_posts sp
order by posted_by, popularity desc
) sp cross join
(select @p := '', @rn := 0) params
) sp
where rn <= 5
order by popularity desc
limit 50;
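For a feel of what the row_number() query returns, here is a small sketch on SQLite (3.25 or later, which also supports window functions) rather than MySQL 8. The table is cut down to three columns and the posts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_posts (id INTEGER PRIMARY KEY, posted_by TEXT, popularity INTEGER)"
)
# alice has 8 very popular posts; bob and carol have only a few.
posts = [("alice", p) for p in range(100, 92, -1)]
posts += [("bob", 50), ("bob", 40), ("carol", 30)]
conn.executemany("INSERT INTO source_posts (posted_by, popularity) VALUES (?, ?)", posts)

sql = """
SELECT posted_by, popularity
FROM (
  SELECT sp.*,
         ROW_NUMBER() OVER (PARTITION BY posted_by ORDER BY popularity DESC) AS seqnum
  FROM source_posts sp
) ranked
WHERE seqnum <= ?          -- per-author cap
ORDER BY popularity DESC
LIMIT ?                    -- overall cap, applied after the per-author cap
"""
rows = conn.execute(sql, (5, 50)).fetchall()
for row in rows:
    print(row)
```

Only alice's five most popular posts survive, and the overall LIMIT is applied after the per-author cap. Applying LIMIT 50 inside the subquery instead would pick the 50 most popular posts first and could still leave a single author dominating.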
You could try the row_number() function (available in MySQL 8.0 and later). It numbers each author's posts by popularity, so if one author has 50 posts, only those with a row number (named "rn" here) less than or equal to 5 are returned.
SELECT *
FROM (
SELECT sp.*, row_number() OVER (PARTITION BY `posted_by` ORDER BY `popularity` DESC) AS rn
FROM `source_posts` sp
WHERE `posted_by` IN (SELECT `username`
FROM `source_accounts`
WHERE `id` IN (SELECT `sourceid`
FROM `user_source_accounts`
WHERE `profileid` = '100'))
AND `id` NOT IN (SELECT `postid`
FROM `user_posts_removed`
WHERE `profileid` = '100')
AND `live` = '1'
AND `added` >= Date_sub(Now(), INTERVAL 1 month)
AND `popularity` > 1
) ranked
WHERE rn <= 5
ORDER BY `popularity` DESC
LIMIT 50

MYSQL Selecting oldest date record for each unique event

I have the following two tables
CREATE TABLE IF NOT EXISTS `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `events_dates` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_id` bigint(20) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL,
PRIMARY KEY (`id`),
KEY `event_id` (`event_id`),
KEY `date` (`event_id`)
) ENGINE=MyISAM;
Where the link is event_id
What I want is to retrieve all unique event records with their respective event dates ordered by the smallest date ascending within a certain period
Basically the following query does exactly what I want
SELECT Event.id, Event.title, EventDate.date, EventDate.start_time, EventDate.end_time
FROM
events AS Event
JOIN
events_dates AS EventDate
ON (Event.id = EventDate.event_id AND EventDate.date = (
SELECT MIN(MinEventDate.date) FROM events_dates AS MinEventDate
WHERE MinEventDate.event_id = Event.id AND MinEventDate.date >= CURDATE() # AND `MinEventDate`.`date` < '2013-02-27'
)
)
WHERE
EventDate.date >= CURDATE() # AND `EventDate`.`date` < '2013-02-27'
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
This query is the result of multiple attempts to improve on the initial slow time (1.5 seconds), when I was using GROUP BY and other subqueries. It's the fastest one yet, but considering that there are only 1400 event records and 10000 event date records in total, the query still takes 400+ ms to process. I also run a count based on this (for paging purposes), which takes a lot of time as well.
Strangely enough omitting the EventDate condition in the main where clause causes this to be even higher 1s+.
Is there anything I can do to improve this or a different approach at the table structure?
Just to clarify for anyone else: the "#" in MySQL starts a comment that runs to the end of the line, so the "AND EventDate.date < '2013-02-27'" part is ignored in the query. That said, it appears you want a list of all upcoming events that have not happened yet. I would start with a simple "prequery" that grabs each event and its minimum date among the dates that have not passed yet, then join that result to the other tables to get the rest of the fields you want:
SELECT
E.ID,
E.Title,
ED2.`date`,
ED2.Start_Time,
ED2.End_Time
FROM
( SELECT
ED.Event_ID,
MIN( ED.`date` ) as MinEventDate
from
events_dates ED
where
ED.`date` >= curdate()
group by
ED.Event_ID ) PreQuery
JOIN events E
ON PreQuery.Event_ID = E.ID
JOIN events_dates ED2
ON PreQuery.Event_ID = ED2.Event_ID
AND PreQuery.MinEventDate = ED2.`date`
ORDER BY
ED2.`date`,
ED2.Start_Time,
ED2.End_Time DESC
LIMIT 20
Your table has a redundant index on event_id, just under a different name. Naming an index `date` does not mean that is the column being indexed; the column(s) in parentheses, here ( event_id ), are what the index is built on.
So, I would change the key in your CREATE TABLE to:
KEY `date` ( `event_id`, `date`, `start_time` )
Or create the index manually:
CREATE INDEX ByEventAndDate ON events_dates ( `event_id`, `date`, `start_time` )
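The prequery-then-join pattern and the composite index can be tried out with the sketch below. It uses SQLite in place of MySQL, passes "today" in as a parameter instead of calling CURDATE() so the result is reproducible, and the sample events and the helper name upcoming_events are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE events_dates (
  id INTEGER PRIMARY KEY,
  event_id INTEGER NOT NULL,
  date TEXT NOT NULL,
  start_time TEXT NOT NULL,
  end_time TEXT NOT NULL
);
-- composite index as suggested above
CREATE INDEX by_event_and_date ON events_dates (event_id, date, start_time);
INSERT INTO events VALUES (1, 'Concert'), (2, 'Workshop'), (3, 'Past only');
INSERT INTO events_dates (event_id, date, start_time, end_time) VALUES
  (1, '2013-01-10', '19:00', '22:00'),
  (1, '2013-02-20', '19:00', '22:00'),
  (1, '2013-03-01', '19:00', '22:00'),
  (2, '2013-02-15', '09:00', '17:00'),
  (3, '2013-01-05', '10:00', '11:00');
""")


def upcoming_events(conn, today, limit=20):
    """One row per event: its earliest date on or after `today`."""
    sql = """
    SELECT e.id, e.title, ed2.date, ed2.start_time, ed2.end_time
    FROM (SELECT event_id, MIN(date) AS min_date   -- the "prequery"
          FROM events_dates
          WHERE date >= ?
          GROUP BY event_id) pre
    JOIN events e ON e.id = pre.event_id
    JOIN events_dates ed2
      ON ed2.event_id = pre.event_id AND ed2.date = pre.min_date
    ORDER BY ed2.date, ed2.start_time, ed2.end_time DESC
    LIMIT ?
    """
    return conn.execute(sql, (today, limit)).fetchall()


rows = upcoming_events(conn, "2013-02-01")
for row in rows:
    print(row)
```

The event that only has past dates drops out, and each remaining event appears once with its next date. If an event can have two rows on the same minimum date, both rows are returned, which may or may not be what the paging logic expects.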
If you are asking about optimization, it helps to include execution plans when possible.
By the way, try these (if you have not tried them already):
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(date) as MinDate
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinDate = EventDate.date
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20
;
# assuming a greater events_dates.id always has a greater events_dates.date
SELECT
Event.id,
Event.title,
EventDate.date,
EventDate.start_time,
EventDate.end_time
FROM
(select e.id, e.title, min(ed.id) as MinID
from events_dates as ed
join events as e on e.id = ed.event_id
where date >= CURDATE() and date < '2013-02-27'
group by e.id, e.title) as Event
JOIN events_dates AS EventDate ON Event.id = EventDate.event_id
and Event.MinID = EventDate.id
ORDER BY EventDate.date ASC , EventDate.start_time ASC , EventDate.end_time DESC
LIMIT 20

MySQL: group by `topic_id`, if `topic_id` > 0

I have entries which might or might not be assigned to a topic.
I want to select entries with the highest score by summing up the score of those, which belong to the same topic. In addition, I want to get a number of similar entries, which were grouped together.
SELECT *, COUNT(`topic_id`) FROM `entries`
GROUP BY `topic_id` HAVING `topic_id` > 0
ORDER BY SUM(`score`) DESC LIMIT 30
This query misses a few things. Firstly, I want entries without topic_id (topic_id = 0) not to be grouped, but to be treated individually. Secondly, COUNT(topic_id) does not always return a real number of entries belonging to the same topic.
SELECT * FROM
(
SELECT topic_id, COUNT(*) AS entry_count, SUM(`score`) AS total_score
FROM `entries`
WHERE `topic_id` > 0
GROUP BY `topic_id`
UNION ALL
SELECT topic_id, 1, `score`
FROM `entries`
WHERE `topic_id` = 0
) AS t
ORDER BY total_score DESC LIMIT 30
SELECT
`topic_id` > 0 IsGroupedTopid,
if(`topic_id`>0, `topic_id`, id) TopicOrEntryId,
COUNT(*) CountEntries,
SUM(`score`) TotalScore
FROM `entries`
GROUP BY
`topic_id` > 0,
if(`topic_id`>0, `topic_id`, id)
ORDER BY SUM(`score`) DESC
LIMIT 30;
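A small sketch of the conditional GROUP BY above, run on SQLite with invented rows. CASE is used in place of MySQL's if(), and the snake_case column aliases are an assumption of this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
  id INTEGER PRIMARY KEY,
  topic_id INTEGER NOT NULL,
  score INTEGER NOT NULL
);
INSERT INTO entries (id, topic_id, score) VALUES
  (1, 7, 10), (2, 7, 15),   -- two entries in topic 7
  (3, 0, 20), (4, 0, 5),    -- no topic: must stay separate
  (5, 9, 8);                -- single entry in topic 9
""")

sql = """
SELECT topic_id > 0 AS is_grouped,
       CASE WHEN topic_id > 0 THEN topic_id ELSE id END AS topic_or_entry_id,
       COUNT(*) AS count_entries,
       SUM(score) AS total_score
FROM entries
GROUP BY topic_id > 0,
         CASE WHEN topic_id > 0 THEN topic_id ELSE id END
ORDER BY total_score DESC
LIMIT 30
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Grouping on the pair (is_grouped, topic_or_entry_id) is what keeps an ungrouped entry with id 7 from being merged into topic 7: the ids may collide, but the first column differs.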