I have users and likes tables. A foreign key in the latter references id in the users table. The task at hand is to retrieve all distinct users who have more than 100 likes in March 2018, so I need to extract date-related values from a column of type TIMESTAMP.
So far I've only managed to see that pretty much all of them have some likes in that period:
SELECT DISTINCT u.name
FROM users AS u
JOIN likes AS l ON u.id = l.user_id
WHERE MONTH(l.timestamp) = 3 AND YEAR(l.timestamp) = 2018;
I guess I have to make use of COUNT() and GROUP BY somehow, but all my attempts led to syntax errors. Please give me a hand.
You don't want select distinct. You want group by and having:
SELECT u.name
FROM users u JOIN
likes l
ON u.id = l.user_id
WHERE MONTH(l.timestamp) = 3 AND YEAR(l.timestamp) = 2018
GROUP BY u.name
HAVING COUNT(*) > 100;
To be honest, it is better to write the WHERE clause as:
WHERE l.timestamp >= '2018-03-01' AND l.timestamp < '2018-04-01'
This allows the SQL engine to use an index on timestamp, if one is available.
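As a rough, runnable illustration of the GROUP BY / HAVING query with the half-open date range, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, the sample rows are invented, timestamps are stored as ISO-8601 text, and the helper name users_with_more_likes_than is hypothetical. Grouping by u.id as well as u.name is an addition so that two users who share a name are not merged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY,
                        user_id INTEGER REFERENCES users(id),
                        timestamp TEXT);
    CREATE INDEX idx_likes_timestamp ON likes (timestamp);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

# Invented sample data: alice has 3 likes in March 2018, bob has 1 (plus 2 outside).
rows = [
    (1, '2018-03-01 00:00:00'), (1, '2018-03-15 12:30:00'), (1, '2018-03-31 23:59:59'),
    (2, '2018-03-10 08:00:00'), (2, '2018-02-28 23:59:59'), (2, '2018-04-01 00:00:00'),
]
conn.executemany("INSERT INTO likes (user_id, timestamp) VALUES (?, ?)", rows)


def users_with_more_likes_than(conn, threshold):
    """Names of users with more than `threshold` likes in March 2018."""
    sql = """
        SELECT u.name
        FROM users u
        JOIN likes l ON u.id = l.user_id
        WHERE l.timestamp >= '2018-03-01' AND l.timestamp < '2018-04-01'
        GROUP BY u.id, u.name
        HAVING COUNT(*) > ?
        ORDER BY u.name
    """
    return [name for (name,) in conn.execute(sql, (threshold,))]


print(users_with_more_likes_than(conn, 2))  # ['alice']
```

The threshold is a parameter only so the tiny data set can exercise the HAVING clause; for the original task it would be 100.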
I want to get customer data from all the businesses with more than 1 user.
For this I think I need a subquery to count more than 1 user and then the outer query to give me their emails.
I have tried subqueries in the WHERE and HAVING clauses:
SELECT u.mail
FROM users u
WHERE count IN (
SELECT count (u.id_business)
FROM businesses b
INNER JOIN users u ON b.id = u.id_business
GROUP BY b.id, u.id_business
HAVING COUNT (u.id_business) >= 2
)
I believe that you do not need a subquery; everything can be achieved in a joined aggregate query with a HAVING clause, like:
SELECT u.mail
FROM users u
INNER JOIN businesses b on b.id = u.id_business
GROUP BY u.id, u.mail
HAVING COUNT (*) >= 2
NB: in case several users share the same email, I have added the primary key of users to the GROUP BY clause (I assumed that the pk is called id); you may remove this if email is a unique field in users.
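One caveat worth checking against real data: assuming businesses.id is unique, the join yields at most one row per user, so groups keyed on the user's id hold a single row each and a COUNT(*) >= 2 filter would not match them. The sketch below shows one possible alternative that counts users per business and joins the result back. It is not the answerer's query; it runs on Python's sqlite3 as a stand-in for MySQL, and the sample rows and the derived-table alias big are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE businesses (id INTEGER PRIMARY KEY);
    CREATE TABLE users (id INTEGER PRIMARY KEY,
                        mail TEXT,
                        id_business INTEGER REFERENCES businesses(id));
    INSERT INTO businesses VALUES (1), (2);
    -- Business 1 has two users, business 2 has one (invented data).
    INSERT INTO users VALUES (1, 'a@one.test', 1),
                             (2, 'b@one.test', 1),
                             (3, 'c@two.test', 2);
""")

sql = """
    SELECT u.mail
    FROM users u
    JOIN (SELECT id_business
          FROM users
          GROUP BY id_business
          HAVING COUNT(*) >= 2) big      -- businesses with at least two users
      ON big.id_business = u.id_business
    ORDER BY u.mail
"""
mails = [mail for (mail,) in conn.execute(sql)]
print(mails)  # ['a@one.test', 'b@one.test']
```

The aggregation happens once per business in the derived table, and the outer query only picks up the mail addresses that belong to those businesses.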
I have a scenario with two tables in MySQL.
Table 1: Users
Table 2: login_history
I'm trying to come up with a query to get those users who did not log in during a given time period, for example between 2017-09-25 and 2017-10-02.
I tried to use a subquery, but that query is quite slow. In the example I have given dummy data, but the two real tables, especially login_history, hold a huge amount of data, so the subquery takes a long time.
It would be something like this:
select u.*
from users u
where not exists (select 1
from logins l
where l.user_id = u.id and
l.login_at >= '2017-09-25' and
l.login_at <= '2017-10-02'
);
If this is slow, then try creating an index on logins(user_id, login_at).
Assuming you only want users who have logged in at some point, you could also try aggregation:
select l.user_id
from logins l
group by l.user_id
having sum(l.login_at >= '2017-09-25' and l.login_at <= '2017-10-02') = 0;
However, the not exists should be faster.
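To make the difference between the two queries concrete, here is a small sketch on Python's sqlite3 (a stand-in for MySQL, with invented rows and text timestamps). Note one deliberate deviation that is an assumption about intent: the upper bound is written as < '2017-10-03' so that logins later in the day on 2017-10-02 are counted as inside the period.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER, login_at TEXT);
    CREATE INDEX idx_logins_user_login ON logins (user_id, login_at);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    -- ann logged in during the period, ben only outside it, cat never.
    INSERT INTO logins (user_id, login_at) VALUES
        (1, '2017-09-26 10:00:00'),
        (2, '2017-09-01 09:00:00'),
        (2, '2017-10-05 09:00:00');
""")

# Anti-join: keep users for whom no login row falls inside the period.
not_exists_sql = """
    SELECT u.name
    FROM users u
    WHERE NOT EXISTS (SELECT 1
                      FROM logins l
                      WHERE l.user_id = u.id
                        AND l.login_at >= '2017-09-25'
                        AND l.login_at < '2017-10-03')
    ORDER BY u.name
"""

# Aggregation: only sees users that have at least one login row at all.
aggregate_sql = """
    SELECT u.name
    FROM logins l
    JOIN users u ON u.id = l.user_id
    GROUP BY u.id, u.name
    HAVING SUM(l.login_at >= '2017-09-25' AND l.login_at < '2017-10-03') = 0
    ORDER BY u.name
"""

absent = [name for (name,) in conn.execute(not_exists_sql)]
absent_but_seen = [name for (name,) in conn.execute(aggregate_sql)]
print(absent)           # ['ben', 'cat']
print(absent_but_seen)  # ['ben']
```

The user who never logged in appears only in the NOT EXISTS result, which matches the "assuming you only want users who have logged in at some point" condition on the aggregation variant.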
Couldn't you just use a basic query?
SELECT * FROM USERS U
JOIN login_history L
ON U.id = L.user_id
WHERE login_at NOT BETWEEN '2017-09-25' AND '2017-10-2'
I have 2 tables: users and their points.
Users have fields:
id
name
Points have fields:
id
start_date
end_date
count_of_points
user_id
So some users may or may not have points. Points entries are limited to a time interval (from start_date to end_date) and hold the count of points the user has during that interval.
I need to display a table of users sorted by their total sum of points at this moment (the current timestamp must be between start_date and end_date) and display this sum later in the view. If a user has no points, this sum must equal 0.
I have something like this:
$qb = $this->getEntityManager()
->getRepository('MyBundle:User')
->createQueryBuilder('u');
$qb->select('u, SUM(p.count_of_points) AS HIDDEN sum_points')
->leftJoin('u.points', 'p')
->orderBy('sum_points', 'DESC');
$qb->groupBy('u');
return $qb->getQuery()
->getResult();
But this has no limit on the date interval and has no field for the sum of points that I can use in the view from the object.
I tried to find how to solve this task and I made something like this in SQL:
SELECT u.*, up.points FROM users AS u LEFT OUTER JOIN
(SELECT u.*, SUM(p.count_of_points) AS points FROM `users` AS u
LEFT OUTER JOIN points AS p ON p.user_id = u.id
WHERE p.start_date <= 1463578691 AND p.end_date >= 1463578691
) AS up ON u.id = up.id ORDER BY up.points DESC
But this query gives me only users with entries in the points table, so I think I must use another JOIN to add users without points. It's a complicated query. I have no idea how to implement this in Doctrine because DQL can't use inner queries with LEFT JOIN.
Maybe there are other ways to solve this task? Maybe my table schema is wrong and I can do this a different way?
EDIT: forgot the date conditions. Corrected answer:
In plain MySQL your query would look like this:
SELECT u.id, u.name, COALESCE(SUM(p.count_of_points),0) AS sum_points
FROM Users u
LEFT JOIN Points p ON p.user_id=u.id
WHERE (p.start_date <= 1463578691 AND p.end_date >= 1463578691) OR p.id IS NULL
GROUP BY u.id
ORDER BY sum_points DESC
The COALESCE function returns its first non-NULL argument, so if a user doesn't have points, the SUM evaluates to NULL but COALESCE turns it into 0.
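One edge case to check with a filter of the form "(date condition) OR p.id IS NULL" in the WHERE clause: a user whose only points rows lie outside the current interval has matching rows in the join, so p.id is not NULL and the user is filtered out rather than shown with 0. A possible variant, sketched below on Python's sqlite3 as a stand-in for MySQL, moves the date condition into the ON clause of the LEFT JOIN. The sample rows and the timestamp value 200 are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE points (id INTEGER PRIMARY KEY,
                         start_date INTEGER, end_date INTEGER,
                         count_of_points INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
    INSERT INTO points (start_date, end_date, count_of_points, user_id) VALUES
        (100, 300, 5, 1),   -- active at t=200
        (150, 250, 7, 1),   -- active at t=200
        (10,  50,  9, 2);   -- expired before t=200; cat has no rows at all
""")
NOW = 200  # invented "current timestamp"

# Date condition in the ON clause: non-matching points rows become NULLs.
on_sql = """
    SELECT u.name, COALESCE(SUM(p.count_of_points), 0) AS sum_points
    FROM users u
    LEFT JOIN points p
           ON p.user_id = u.id
          AND p.start_date <= :now AND p.end_date >= :now
    GROUP BY u.id, u.name
    ORDER BY sum_points DESC, u.name
"""

# Date condition in the WHERE clause with the IS NULL escape hatch.
where_sql = """
    SELECT u.name, COALESCE(SUM(p.count_of_points), 0) AS sum_points
    FROM users u
    LEFT JOIN points p ON p.user_id = u.id
    WHERE (p.start_date <= :now AND p.end_date >= :now) OR p.id IS NULL
    GROUP BY u.id, u.name
    ORDER BY sum_points DESC, u.name
"""

on_rows = conn.execute(on_sql, {"now": NOW}).fetchall()
where_rows = conn.execute(where_sql, {"now": NOW}).fetchall()
print(on_rows)     # [('ann', 12), ('ben', 0), ('cat', 0)]
print(where_rows)  # [('ann', 12), ('cat', 0)]
```

With the condition in ON, the user with only expired points still appears with a sum of 0; with the WHERE form that user drops out of the result.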
I'm not sure of the translation using the Doctrine query builder, but you could try:
$qb = $this->getEntityManager()->createQueryBuilder();
$qb->select('u')
->addSelect('COALESCE(SUM(p.count_of_points),0) AS sum_points')
->from('MyBundle:User', 'u')
->leftJoin('u.points', 'p')
->where('(p.start_date <= ?1 AND p.end_date >= ?1) OR p.id IS NULL')
->groupBy('u.id')
->orderBy('sum_points','DESC')
->setParameter(1, $date_now);
Using a SQL query, I am trying to find the number of users that have had page views greater than 5 in a given month.
What I have so far is exactly the above except, I can't add the condition of a minimum of 5 page views. It is currently showing the number of users who have had at least 1 page view in a given month.
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
LEFT JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
I tried adding AND views > 5 in the where clause but that didn't work as I get an unknown column.
I don't think a HAVING clause will work as this is applied after the GROUP BY and I need to find individual users who have had more than 5 page views.
How else can I achieve this?
If this is your requirement, then you need to aggregate twice, once at the user level and second at the analytics level. Or, use a subquery in the where clause. Here is what you may need:
SELECT CONCAT(MONTH(a.date),'/',YEAR(a.date)) AS DATE,
COUNT(a.id) AS views,
COUNT(DISTINCT u.id) AS num_users
FROM users u LEFT JOIN
analytics a
ON u.id = a.user_id
WHERE u.banned = 0 AND a.id IS NOT NULL AND
5 <= (SELECT COUNT(*) FROM analytics a2 WHERE a2.user_id = u.id)
GROUP BY YEAR(a.date), MONTH(a.date);
This uses the overall count for the limit.
EDIT: To speed up the subquery, be sure you have an index on analytics(user_id, date).
You have to use a subquery for this, since you're selecting which users feed into the GROUP BY. Here, we do a subquery in the WHERE clause to ask for each row if the user has at least five entries in the analytics table.
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
LEFT JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND (SELECT COUNT(*) FROM analytics AS a WHERE a.user_id = users.id) > 5
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
If you want there to be more than 5 views for the user in the given month, then you have to modify your query and you'll need to use an inner join:
SELECT CONCAT(MONTH(analytics.date),'/',YEAR(analytics.date)) AS DATE,
COUNT(analytics.id) AS views,
COUNT(DISTINCT users.id) AS num_users
FROM users
JOIN analytics ON users.id = analytics.user_id
WHERE users.banned = 0
AND (SELECT COUNT(*) FROM analytics AS a WHERE a.user_id = users.id AND EXTRACT(YEAR_MONTH FROM a.date) = EXTRACT(YEAR_MONTH FROM analytics.date)) > 5
AND analytics.id IS NOT NULL
GROUP BY YEAR(analytics.date), MONTH(analytics.date)
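The "aggregate twice" idea mentioned earlier can also be written with a derived table: first count views per user per month and keep only the groups above the limit, then count users per month. Here is a minimal sketch on Python's sqlite3 (a stand-in for MySQL; strftime('%Y-%m', ...) replaces YEAR()/MONTH(), dates are stored as text, and all rows are invented). It uses "more than 5" as the limit, which is one reading of the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, banned INTEGER);
    CREATE TABLE analytics (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 0), (2, 0), (3, 1);
""")

# Invented page views, one per day.
views = (
    [(1, '2015-01-%02d' % d) for d in range(1, 7)]     # user 1: 6 views in January
    + [(2, '2015-01-%02d' % d) for d in range(1, 4)]   # user 2: 3 views in January
    + [(2, '2015-02-%02d' % d) for d in range(1, 8)]   # user 2: 7 views in February
    + [(3, '2015-01-%02d' % d) for d in range(1, 10)]  # user 3 (banned): 9 views
)
conn.executemany("INSERT INTO analytics (user_id, date) VALUES (?, ?)", views)

sql = """
    SELECT per_user.month,
           SUM(per_user.views) AS views,
           COUNT(*)            AS num_users
    FROM (SELECT strftime('%Y-%m', a.date) AS month,
                 a.user_id,
                 COUNT(*) AS views
          FROM analytics a
          JOIN users u ON u.id = a.user_id
          WHERE u.banned = 0
          GROUP BY strftime('%Y-%m', a.date), a.user_id
          HAVING COUNT(*) > 5) per_user   -- first level: one row per user and month
    GROUP BY per_user.month               -- second level: one row per month
    ORDER BY per_user.month
"""
monthly = conn.execute(sql).fetchall()
print(monthly)  # [('2015-01', 6, 1), ('2015-02', 7, 1)]
```

Because the inner query already yields one row per qualifying user and month, the outer COUNT(*) is the number of users and no correlated subquery is evaluated per row.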
I'm trying to get a subset of data based on the latest id and dates. It seems that when selecting other fields in the table they are not in sync with the max id and dates returned.
Any idea how I can fix this?
MySQL:
SELECT MAX(m.id) as id, m.sender_id, m.receiver_id, MAX(m.date) as date, m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id=3
GROUP BY m.sender_id ORDER BY date DESC LIMIT 0, 7
The data for content isn't the correct one. It seems to be returning random content and not the content that is tied to the row for max id and max date.
Do I need to do some sort of sub select to fix this?
To answer the question in the title, "Why doesn't my content field match my MAX(id) field", that's because there is no guarantee that the values returned for the non-aggregate fields will be from the row where the MAX value is found. This is the documented behavior, and this is what we expect.
Other DBMS would throw an error on the statement, MySQL is just more lax, and you are getting values from one row, but it's not guaranteed to be the row that either of the MAX values (id or date) is found on.
You have two separate aggregate expressions, MAX(m.id) and MAX(m.date). Note that there is no guarantee that those values will come from the same row.
The rule in other databases is that every non-aggregate expression in the SELECT list needs to appear in the GROUP BY. (MySQL is more lax about that, and doesn't make that a requirement.)
One way to "fix" the query so that it does return values from the row with the MAX value is to use an inline view (query) that gets the MAX(id) grouped by what you want to GROUP BY, and then a JOIN back to the original table to get other values on the row.
From your statement it's not clear what result set you want returned. If you want the row that has the maximum id and you also want the row with the maximum date, then you could do something like this:
SELECT m.id
, m.sender_id
, m.receiver_id
, m.date
, m.content
, l.username
, p.gender
FROM ( SELECT t.sender_id
, t.receiver_id
, MAX(t.id) AS max_id
, MAX(t.date) AS max_date
FROM messages t
WHERE t.receiver_id=3
GROUP
BY t.sender_id
, t.receiver_id
) s
JOIN messages m
ON m.sender_id = s.sender_id
AND m.receiver_id = s.receiver_id
AND ( m.id = s.max_id OR m.date = s.max_date)
LEFT
JOIN login_users l on l.user_id = m.sender_id
LEFT
JOIN profiles p ON p.user_id = l.user_id
ORDER BY m.date DESC LIMIT 0, 7
The inline view aliased as "s" returns the max values, and then that gets joined back to the messages table, aliased as "m".
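A stripped-down version of the same join-back pattern can be tried on Python's sqlite3 (a stand-in for MySQL). This sketch simplifies the query above: it joins only on the maximum id per sender, which assumes that a higher id always means a later message, and it leaves out the login_users and profiles joins. The sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY,
                           sender_id INTEGER, receiver_id INTEGER,
                           date TEXT, content TEXT);
    INSERT INTO messages VALUES
        (1, 10, 3, '2014-01-01', 'old from 10'),
        (2, 11, 3, '2014-01-02', 'only from 11'),
        (3, 10, 3, '2014-01-03', 'new from 10'),
        (4, 10, 4, '2014-01-04', 'to someone else');
""")

sql = """
    SELECT m.id, m.sender_id, m.date, m.content
    FROM (SELECT sender_id, MAX(id) AS max_id   -- inline view: latest id per sender
          FROM messages
          WHERE receiver_id = ?
          GROUP BY sender_id) s
    JOIN messages m ON m.id = s.max_id          -- join back to fetch the whole row
    ORDER BY m.date DESC
    LIMIT 7
"""
latest = conn.execute(sql, (3,)).fetchall()
print(latest)
# [(3, 10, '2014-01-03', 'new from 10'), (2, 11, '2014-01-02', 'only from 11')]
```

Every selected column now comes from the row identified by the inline view, so content always belongs to the message with the reported id and date.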
NOTE
In most cases, we find that a JOIN (query) will perform better than an IN (query), because of the different access plans. You can see the difference in plans with an EXPLAIN.
For performance, you'll want an index
... ON messages (`receiver_id`, `sender_id`, `id`, `date`)
There's an equality predicate on receiver_id, so that should be the leading column, to get a range scan (instead of a full scan). You want the sender_id column next, because that should allow MySQL to avoid a "Using filesort" operation to get the rows grouped. The id and date columns are included, so that the inline view query can be satisfied entirely from the index pages without a need to access the pages in the table. (The EXPLAIN should show "Using where; Using index".)
That same index should also be suitable for the outer query, though it does need to access the "content" column from the table pages, so the EXPLAIN will not show "Using index" for that step. (It's likely that the "content" column is much longer than we would want in the index.)
Using a join
SELECT LatestM.id, m.sender_id, m.receiver_id, m.date, m.content, l.username, p.gender
FROM (
SELECT sender_id, MAX(id) AS id
FROM messages
WHERE receiver_id=3
GROUP BY sender_id
) LatestM
INNER JOIN messages m
ON LatestM.sender_id = m.sender_id AND LatestM.id = m.id
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id = 3
ORDER BY date DESC
LIMIT 0, 7
Problem with this is that if the latest id does not reflect the latest date then the date returned will not be the latest one.
Well, you could probably solve it without a subselect, but doing one is fairly straightforward. Something like this should work: just make the subselect return the ids of the interesting rows in messages, and get the data for only them.
SELECT m.id as id, m.sender_id, m.receiver_id, m.date as date,
m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.id IN (
SELECT max(id) FROM messages
WHERE receiver_id=3
GROUP BY sender_id
)
ORDER BY date DESC
LIMIT 0, 7
The reason that your original query does not match up fields is that GROUP BY really requires aggregate functions (like MAX/MIN/SUM/...) applied to every field you select that's not grouped by. The reason the query even runs is that MySQL does not enforce that, but instead returns indeterminate values from any matching row. Afaik, most other SQL RDBMSs refuse to run the query.
EDIT: As for performance, a few indexes that are likely to help are:
CREATE INDEX ix_inner ON messages(receiver_id, sender_id, id);
CREATE INDEX ix_login_users ON login_users(user_id);
CREATE INDEX ix_profiles ON profiles(user_id);