We are attempting to count how many unique telephone numbers called a particular number each day, with 0 (or NULL) for days with no calls. To simplify the data schema, our table contains four fields:
| id | fromz | toz | date |
fromz: inbound call number
toz: number called
date: yyyy-mm-dd
When all we need to know is how many unique numbers called each day, regardless of which number was called, it is simple to include no-call days in the results.
We JOIN to another table containing only sequential dates: calendar
| id | date |
date:yyyy-mm-dd
SELECT p.fromz, count(unique(p.fromz)), p.date FROM phones p
RIGHT OUTER JOIN calendar c ON
p.date = c.date
GROUP BY c.date
If no p.fromz on c.date, the result for that p.date is "0" (or NULL)
The problem arises when we begin to filter by the number called:
SELECT p.fromz, count(unique(p.fromz)), p.date FROM phones p
RIGHT OUTER JOIN calendar c ON
p.date = c.date
WHERE p.toz = "#myNumber"
GROUP BY c.date
Because no rows satisfy toz = #myNumber on some days, we only get results for days on which there were calls to #myNumber.
Any suggestions?
I've modified some of your field names slightly. I'm not sure if this 100% meets your use-case, but if it doesn't, you should be able to figure it out from here. Basically, move your current WHERE condition into the ON clause.
SELECT p.fromz, count(distinct p.fromz), p.callDate, c.theDate FROM phones p
RIGHT OUTER JOIN calendar c ON
p.callDate = c.theDate AND p.toz = 2125555555
GROUP BY c.theDate
Full Disclosure
This is not standards-compliant. For a compliant query, take a look at Ollie's answer and simply move the p.toz part from the WHERE to the ON to get the results you want.
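To see why moving the filter from WHERE into ON matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration. The join is written as calendar LEFT JOIN phones, which is equivalent to the RIGHT OUTER JOIN above, because older SQLite versions do not support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, theDate TEXT);
    CREATE TABLE phones (id INTEGER PRIMARY KEY, fromz INTEGER, toz INTEGER, callDate TEXT);
    INSERT INTO calendar (theDate) VALUES ('2013-01-01'), ('2013-01-02'), ('2013-01-03');
    -- Hypothetical sample calls: none at all on 2013-01-02,
    -- and only a call to a different number on 2013-01-03.
    INSERT INTO phones (fromz, toz, callDate) VALUES
        (1111, 2125555555, '2013-01-01'),
        (1111, 2125555555, '2013-01-01'),
        (2222, 2125555555, '2013-01-01'),
        (3333, 9999999999, '2013-01-03');
""")

# Filter in WHERE: joined rows whose p.toz is NULL (or a different number)
# are discarded after the join, so days without matching calls disappear.
where_rows = conn.execute("""
    SELECT c.theDate, COUNT(DISTINCT p.fromz)
    FROM calendar c
    LEFT JOIN phones p ON p.callDate = c.theDate
    WHERE p.toz = 2125555555
    GROUP BY c.theDate
    ORDER BY c.theDate
""").fetchall()

# Filter in ON: the condition only decides which phone rows match,
# so every calendar day survives and unmatched days count as 0.
on_rows = conn.execute("""
    SELECT c.theDate, COUNT(DISTINCT p.fromz)
    FROM calendar c
    LEFT JOIN phones p ON p.callDate = c.theDate AND p.toz = 2125555555
    GROUP BY c.theDate
    ORDER BY c.theDate
""").fetchall()

print(where_rows)
print(on_rows)
```

With this sample data the WHERE version returns a single row for 2013-01-01, while the ON version returns all three days, with a count of 0 for the two days without calls to the number.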
You are very close.
You're using a nasty MySQL nonstandard extension to GROUP BY and it is driving you crazy, as it has driven many others crazy before you.
http://dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
You need to follow the rules for standard GROUP BY. These rules specify that each column mentioned in the SELECT clause must be either (1) also mentioned in the GROUP BY clause or (2) an aggregate function.
There are two problems in your query. One is that you mention p.date instead of c.date in your SELECT clause. That p.date will be null if there aren't any items on the date in question in your phones table. The second item is that you're mentioning p.fromz twice.
I think your basic aggregate query should look like this http://sqlfiddle.com/#!2/0ef4e/3/0:
SELECT p.toz,
COUNT(DISTINCT p.fromz) AS unique_calling_number_count,
c.date
FROM phones AS p
RIGHT OUTER JOIN calendar AS c ON p.date = c.date
GROUP BY p.toz, c.date
ORDER BY c.date, p.toz
That will summarize call-origination numbers by day and call-destination numbers.
Then, when you need to filter your phones records according to some kind of criterion, do that in a subquery as follows.
SELECT p.toz,
COUNT(DISTINCT p.fromz) AS unique_calling_number_count,
c.date
FROM (
SELECT * /* filtering subquery */
FROM phones
WHERE toz = "#myNumber"
) AS p
RIGHT OUTER JOIN calendar AS c ON p.date = c.date
GROUP BY p.toz, c.date
ORDER BY c.date, p.toz
Here's an example of this working. http://sqlfiddle.com/#!2/0ef4e/2/0 Thanks to @PatrickQ for noticing the WHERE at the wrong level of the query.
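The filtering-subquery idea can be tried out with a small sketch like the one below, which runs on Python's built-in sqlite3 module rather than MySQL. The sample rows and the '#otherNumber' value are invented for illustration, and the join is written as calendar LEFT JOIN (subquery), the mirror image of the RIGHT OUTER JOIN, so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE phones (id INTEGER PRIMARY KEY, fromz TEXT, toz TEXT, date TEXT);
    INSERT INTO calendar (date) VALUES ('2013-01-01'), ('2013-01-02');
    -- Hypothetical sample calls.
    INSERT INTO phones (fromz, toz, date) VALUES
        ('555-0001', '#myNumber',    '2013-01-01'),
        ('555-0002', '#myNumber',    '2013-01-01'),
        ('555-0003', '#otherNumber', '2013-01-02');
""")

# The phones rows are filtered inside the derived table, before the outer
# join, so calendar days without matching calls are still returned.
rows = conn.execute("""
    SELECT p.toz,
           COUNT(DISTINCT p.fromz) AS unique_calling_number_count,
           c.date
    FROM calendar AS c
    LEFT JOIN (
        SELECT *            /* filtering subquery */
        FROM phones
        WHERE toz = '#myNumber'
    ) AS p ON p.date = c.date
    GROUP BY p.toz, c.date
    ORDER BY c.date, p.toz
""").fetchall()

for row in rows:
    print(row)
```

On the day with no calls to '#myNumber', p.toz comes back as NULL (None in Python) and the count is 0, which is the row an outer WHERE on p.toz would have thrown away.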
Related
I have attached the tables that are included in this question for MySQL. My question states:
First and last names of the top 5 authors clients borrowed in 2017. My code so far:
SELECT BookID, BorrowDate, COUNT(BookID) AS BookIDCount
FROM Borrower
WHERE BorrowDate = 2017
ORDER BY BookIDCount DESC
LIMIT 5
I think so far my code just displays the top 5 Author ID in 2017 but I can't figure out how to display the names. I see the link between AuthorID and BookAuthor (maybe). Thank you so much for any help you may provide.
Here are the tables:
You can bring in the client table with a join. I think that you want:
select c.clientFirstName, c.clientLastName, count(*) no_books
from borrower b
inner join client c on c.clientId = b.clientId
where b.borrowDate >= '2017-01-01' and b.borrowDate < '2018-01-01'
group by c.clientId, c.clientFirstName, c.clientLastName
order by count(*) desc
limit 5
This treats borrowDate as a column of type date (or the like), because that is what it seems to be. If it is just a number that represents the year, then you can change the where clause back to your original condition.
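As a rough check of the half-open date range (>= '2017-01-01' and < '2018-01-01'), here is a small sketch using Python's sqlite3 module in place of MySQL. The table layout beyond the columns used in the query (for example borrowId and bookId) and all sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (clientId INTEGER PRIMARY KEY,
                         clientFirstName TEXT, clientLastName TEXT);
    CREATE TABLE borrower (borrowId INTEGER PRIMARY KEY, clientId INTEGER,
                           bookId INTEGER, borrowDate TEXT);
    INSERT INTO client VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO borrower (clientId, bookId, borrowDate) VALUES
        (1, 10, '2016-12-31'),   -- day before the range: excluded
        (1, 11, '2017-01-01'),   -- first day of 2017: included
        (2, 12, '2017-06-15'),
        (2, 13, '2017-12-31'),   -- last day of 2017: included
        (2, 14, '2018-01-01');   -- first day of 2018: excluded
""")

# Same query as above; dates are stored as ISO yyyy-mm-dd text here,
# so string comparison orders them chronologically.
rows = conn.execute("""
    select c.clientFirstName, c.clientLastName, count(*) no_books
    from borrower b
    inner join client c on c.clientId = b.clientId
    where b.borrowDate >= '2017-01-01' and b.borrowDate < '2018-01-01'
    group by c.clientId, c.clientFirstName, c.clientLastName
    order by count(*) desc
    limit 5
""").fetchall()

print(rows)
```

The half-open range keeps every day of 2017 and nothing else, and it would keep working unchanged if borrowDate also carried a time of day.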
I have two tables; one is called rules and the other data. The Rules table holds events, which have a description, id and date_created and is simply used to categorize events.
The data table has a date and id column. It stores the actual dates of an event, as events can span up to several months.
My issue is this: I wish to select everything from data and group it by date, so each date is represented only once. However, the event with the most recent creation date should have precedence if there is a collision, i.e. two events happen on the same day. Here is what I've tried, which doesn't offer control over date_created:
SELECT d.date, r.description FROM data d LEFT JOIN rules r ON d.id = r.id GROUP BY date ORDER BY d.date
I haven't included date_created yet because I'm stuck, and not sure where it should go in the query to get the desired effect. Any ideas would be greatly appreciated!
From your question, it seems to me that you first need to select the maximum creation date of each event, and then you can get the desired result using a subquery:
SELECT a.date, b.description
FROM data a
INNER JOIN (
SELECT id, description,MAX(date_created) as mdate
FROM rules
GROUP BY id,description
) b ON a.id = b.id AND a.date = b.mdate
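For comparison, here is a sketch of a different technique than the subquery above, under one reading of the question: for each date, keep only the event whose rule has the latest date_created among all events on that date. It uses NOT EXISTS and runs on Python's sqlite3 module rather than MySQL. The sample rows are invented, and data.id is assumed to reference rules.id as in the question's join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rules (id INTEGER PRIMARY KEY, description TEXT, date_created TEXT);
    CREATE TABLE data (id INTEGER, date TEXT);
    -- Hypothetical events: 'Holiday' was created after 'Spring sale'.
    INSERT INTO rules VALUES
        (1, 'Spring sale', '2020-01-01'),
        (2, 'Holiday',     '2020-02-01');
    -- Both events cover 2020-03-02, so that date is a collision.
    INSERT INTO data VALUES
        (1, '2020-03-01'), (1, '2020-03-02'),
        (2, '2020-03-02'), (2, '2020-03-03');
""")

# Keep a (date, rule) pair only if no other event on the same date
# belongs to a rule that was created later.
rows = conn.execute("""
    SELECT d.date, r.description
    FROM data d
    JOIN rules r ON r.id = d.id
    WHERE NOT EXISTS (
        SELECT 1
        FROM data d2
        JOIN rules r2 ON r2.id = d2.id
        WHERE d2.date = d.date
          AND r2.date_created > r.date_created
    )
    ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
```

If two colliding events share exactly the same date_created, this sketch returns both, so a further tiebreaker (such as the rule id) would be needed to guarantee one row per date.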
Currently I have a simple SQL request to get all group departure dates and the associated group size (teamLength) between 2 dates, but it doesn't work properly.
SELECT `groups`.`departure`, COUNT(`group_users`.`group_id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users`
ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
In fact, if I have more than 1 group between the 2 dates, only 1 date will be recovered in association with the total number of teamLength.
For example, if I have 2 groups in the same interval, with 2 people in group 1 and 1 person in group 2, the result will be:
Here are 2 screenshots of the current state of my groups and group_users tables:
Is it even possible to do what I want in only 1 SQL request ? Thanks
In addition to what jarlh commented (JOIN with ON): don't ever aggregate data without an explicit GROUP BY. I don't know why MySQL still allows this...
Change your query to something like this and you should get the result you are looking for. Currently, the other departure dates get lost in the aggregation.
SELECT
groups.departure,
COUNT(1) as team_length
FROM
groups
INNER JOIN group_users
ON groups.id = group_users.group_id
WHERE
groups.departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY
groups.departure
I think that you have a syntax issue in your query. You are missing the ON statement so your database could be trying to get a cartesian product since there is no join clause.
SELECT `groups`.`departure`, COUNT(`group_users`.`id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users` ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY `groups`.`departure`
You are also missing the GROUP BY clause, which is not mandatory in all RDBMSs, but it is good practice to include it.
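The effect of the GROUP BY can be checked with a small sketch like this one, run on Python's sqlite3 module instead of MySQL. The rows mirror the example from the question (one group of 2 people and one group of 1 person in March), but the dates and the user_id column are made up. The table name is double-quoted because groups is a keyword in SQLite, much as the backticks are used in the MySQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, departure TEXT);
    CREATE TABLE group_users (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);
    -- Hypothetical data: two groups depart in March, one in April.
    INSERT INTO "groups" (id, departure) VALUES
        (1, '2017-03-05'), (2, '2017-03-20'), (3, '2017-04-02');
    INSERT INTO group_users (group_id, user_id) VALUES
        (1, 100), (1, 101), (2, 102), (3, 103);
""")

# GROUP BY produces one row per departure date instead of collapsing
# every matching row into a single total.
rows = conn.execute("""
    SELECT g.departure, COUNT(gu.id) AS teamLength
    FROM "groups" AS g
    INNER JOIN group_users AS gu ON g.id = gu.group_id
    WHERE g.departure BETWEEN '2017-03-01' AND '2017-03-31'
    GROUP BY g.departure
    ORDER BY g.departure
""").fetchall()

print(rows)
```

Note that two groups leaving on the same day would be merged into one row here. Grouping by the group id as well would keep them apart, if that is what the report needs.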
I have a query that attempts to retrieve IDs of people, but only if those people have more than one address. I'm also checking that the last time I called them was at least 30 days ago. Finally, I'm trying to order the results, because I want to pull up results with the oldest last_called datetime:
SELECT
p.id,
COUNT(*) AS cnt
FROM
people p
LEFT JOIN addresses a
ON p.id = a.id
WHERE p.last_called <= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY p.id
HAVING COUNT(*) > 1
ORDER BY p.last_called ASC
LIMIT 25
Right now, the results are not excluding people with only one address. I haven't even got to the point where I know if the sort order is correct, but right now I'd just like to know why my query isn't limited to results where there are at least 2 addresses for the person.
If you don't want to include people with no address, then I would recommend using INNER JOIN instead of LEFT JOIN, and DISTINCT to count distinct address ids (just in case you have duplicate mappings), e.g.:
SELECT
p.id,
COUNT(DISTINCT(a.id)) AS cnt
FROM
people p
JOIN addresses a
ON p.id= a.peopleid
WHERE p.last_called <= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY p.id
HAVING COUNT(DISTINCT(a.id)) > 1
As far as Ordering is concerned, MySQL evaluates GROUP BY before ordering the results and hence, you need to wrap the query inside another query to get the ordered results.
Update
Instead of joining on a.id, you need to join on the peopleid column of the address record to get the matching people record.
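Putting the pieces together (join on peopleid, HAVING on the distinct address count, and an outer query for the ordering), a minimal sketch might look like this. It runs on Python's sqlite3 module rather than MySQL, so a fixed cutoff timestamp stands in for DATE_SUB(NOW(), INTERVAL 30 DAY). The street column and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, last_called TEXT);
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, peopleid INTEGER, street TEXT);
    INSERT INTO people VALUES
        (1, '2020-01-10 09:00:00'),   -- two addresses, called long ago
        (2, '2020-01-05 09:00:00'),   -- only one address
        (3, '2020-01-02 09:00:00'),   -- three addresses, oldest call
        (4, '2020-03-01 09:00:00');   -- two addresses, called too recently
    INSERT INTO addresses (peopleid, street) VALUES
        (1, 'a'), (1, 'b'), (2, 'c'),
        (3, 'd'), (3, 'e'), (3, 'f'),
        (4, 'g'), (4, 'h');
""")

# Fixed stand-in for DATE_SUB(NOW(), INTERVAL 30 DAY).
cutoff = '2020-02-01 00:00:00'

# Inner query: group per person and keep those with more than one address.
# Outer query: order the surviving rows by the oldest last_called first.
rows = conn.execute("""
    SELECT id, cnt
    FROM (
        SELECT p.id AS id,
               p.last_called AS last_called,
               COUNT(DISTINCT a.id) AS cnt
        FROM people p
        JOIN addresses a ON p.id = a.peopleid
        WHERE p.last_called <= ?
        GROUP BY p.id, p.last_called
        HAVING COUNT(DISTINCT a.id) > 1
    ) AS t
    ORDER BY last_called ASC
    LIMIT 25
""", (cutoff,)).fetchall()

print(rows)
```

Person 2 is dropped for having a single address and person 4 for having been called after the cutoff, which leaves person 3 ahead of person 1.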
I have two tables players and scores.
I want to generate a report that looks something like this:
player  first_score  points
foo     2010-05-20   19
bar     2010-04-15   29
baz     2010-02-04   13
Right now, my query looks something like this:
select p.name player,
min(s.date) first_score,
s.points points
from players p
join scores s on s.player_id = p.id
group by p.name, s.points
I need the s.points that is associated with the row that min(s.date) returns. Is that happening with this query? That is, how can I be certain I'm getting the correct s.points value for the joined row?
Side note: I imagine this is somehow related to MySQL's lack of dense ranking. What's the best workaround here?
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow.
Here's my usual answer:
select
p.name player,
s.date first_score,
s.points points
from players p
join scores s
on s.player_id = p.id
left outer join scores s2
on s2.player_id = p.id
and s2.date < s.date
where
s2.player_id is null
;
In other words, given score s, try to find a score s2 for the same player, but with an earlier date. If no earlier score is found, then s is the earliest one.
Re your comment about ties: You have to have a policy for which one to use in case of a tie. One possibility is if you use auto-incrementing primary keys, the one with the least value is the earlier one. See the additional term in the outer join below:
select
p.name player,
s.date first_score,
s.points points
from players p
join scores s
on s.player_id = p.id
left outer join scores s2
on s2.player_id = p.id
and (s2.date < s.date or s2.date = s.date and s2.id < s.id)
where
s2.player_id is null
;
Basically you need to add tiebreaker terms until you get down to a column that's guaranteed to be unique, at least for the given player. The primary key of the table is often the best solution, but I've seen cases where another column was suitable.
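Here is a small runnable sketch of the self-join with the tiebreaker term, using Python's sqlite3 module in place of MySQL. The sample scores are invented, and scores.id is assumed to be an auto-incrementing primary key as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, player_id INTEGER,
                         date TEXT, points INTEGER);
    INSERT INTO players VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO scores (id, player_id, date, points) VALUES
        (1, 1, '2010-05-20', 19),
        (2, 1, '2010-06-01', 40),
        (3, 2, '2010-04-15', 29),   -- bar has two scores on the same date...
        (4, 2, '2010-04-15', 31);   -- ...so the smaller id (3) wins the tie
""")

# For each score s, look for a score s2 of the same player that is earlier,
# or equally early with a smaller id. If none exists, s is the first score.
rows = conn.execute("""
    select p.name, s.date, s.points
    from players p
    join scores s
      on s.player_id = p.id
    left outer join scores s2
      on s2.player_id = p.id
     and (s2.date < s.date or (s2.date = s.date and s2.id < s.id))
    where s2.player_id is null
    order by p.name
""").fetchall()

for row in rows:
    print(row)
```

Each player appears exactly once, and the points value comes from the same row as the earliest date rather than from an arbitrary row in the group.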
Regarding the comments I shared with @OMG Ponies, remember that this type of query benefits hugely from the right index.
Most RDBMSs won't even let you include non-aggregate columns in your SELECT clause when using GROUP BY unless they also appear in the GROUP BY clause. In MySQL, you'll end up with values from arbitrary rows for your non-aggregate columns. This is useful if you actually have the same value in a particular column for all the rows. Therefore, it's nice that MySQL doesn't restrict us, though it's an important thing to understand.
A whole chapter is devoted to this in SQL Antipatterns.