MySQL count(distinct(email)) and Group By DATE(entrydate)

Having a bit of trouble with getting total unique records and grouping by date. The end result is I am getting totals per day but it is not using unique emails based on the distinct function. Here is my query...
SELECT count(distinct(emailaddress)), DATE(EntryDate)
FROM tblentries
group by DATE(EntryDate)
ORDER BY DATE(EntryDate) desc
The results end up not de-duping the count for each day. Thoughts?
Thanks!

Based on the conversation, I believe what you are looking for is the number of distinct never-before-seen email addresses per day:
SELECT
DATE(t.EntryDate) as RecordDate,
COUNT(DISTINCT t.emailaddress) as NewEmailAddresses
FROM
tblentries t
WHERE
NOT EXISTS(
SELECT 1
FROM tblentries t2
WHERE
t2.emailaddress = t.emailaddress
AND DATE(t2.EntryDate) < DATE(t.EntryDate)
)
GROUP BY
DATE(t.EntryDate)
ORDER BY
DATE(t.EntryDate) ASC;
This is off the top of my head, so it may not be right, and it will be slow, but I think this is in the right direction. On a side note, if you plan on running this regularly, an index on emailaddress would be a good idea.
Let me know if this works.
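To make the difference between the two counts concrete, here is a minimal sketch that runs both queries against a small in-memory table. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblentries (emailaddress TEXT, EntryDate TEXT)")
conn.executemany(
    "INSERT INTO tblentries VALUES (?, ?)",
    [
        ("a@example.com", "2024-01-01 09:00:00"),
        ("a@example.com", "2024-01-01 17:30:00"),  # repeat on the same day
        ("b@example.com", "2024-01-01 10:00:00"),
        ("a@example.com", "2024-01-02 08:00:00"),  # already seen on day 1
        ("c@example.com", "2024-01-02 11:00:00"),  # first seen on day 2
    ],
)

# The question's query: distinct addresses within each day.
per_day = conn.execute(
    """
    SELECT DATE(EntryDate), COUNT(DISTINCT emailaddress)
    FROM tblentries
    GROUP BY DATE(EntryDate)
    ORDER BY DATE(EntryDate)
    """
).fetchall()

# The answer's query: addresses with no entry on any earlier day.
first_seen = conn.execute(
    """
    SELECT DATE(t.EntryDate), COUNT(DISTINCT t.emailaddress)
    FROM tblentries t
    WHERE NOT EXISTS (
        SELECT 1
        FROM tblentries t2
        WHERE t2.emailaddress = t.emailaddress
          AND DATE(t2.EntryDate) < DATE(t.EntryDate)
    )
    GROUP BY DATE(t.EntryDate)
    ORDER BY DATE(t.EntryDate)
    """
).fetchall()

print(per_day)     # distinct per day
print(first_seen)  # never-before-seen per day
```

With this data the first query counts a@example.com on both days, while the second counts it only on the day it first appears.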

Related

Filtering out early entries meeting a condition on a per user basis

I am new to SQL and want to filter a database in a way that doesn’t quite map to any of the examples I’ve read. I’m using mySQL with MariaDB.
The table is game result reports that takes the following structure:
id (unique integer id for each report, primary key)
user_id (integer id allocated to each player)
day (integer, the identity of the daily puzzle being reported)
result (integer score)
submitted (timestamp of the report submission)
The day puzzle ticks over at midnight local time so two reports can be submitted at the same time but be from different days.
I want to be able to find the average score reported for each day BUT I want to exclude any user’s score if they’ve never got a particular score prior to that (to eliminate complete newbies and people who are unusually bad at the game). However I can’t work out how to layer in this exclusion.
Where I’ve got to is that I can get the earliest successful game for each user with this query:
SELECT id, user_id, MIN(submitted), result FROM `results_daily` WHERE result > 5 GROUP BY user_id
(Let’s call that output1)
However I can’t see how to apply this to a set of day results as a filter (so only include a result in my average calculation where a user features in output1 AND their daily report has been submitted on or after the date for that user in output1).
It feels like it might be some kind of JOIN operation but I can’t wrap my head around it. Can anyone help?
** EDITED TO ADD:
OK, I think I've got it, although my solution uses a function and I'm not sure that's the most efficient or the most SQL-y way to do this. Instinctively it feels like this should be possible without a function, but I'm not practiced enough to work it out if it is! O. Jones's answer set me off down the right path; I just needed to refine the excluded set with a function. So now my query looks like this:
SELECT day,
AVG(result) average_score,
COUNT(*) number_of_plays,
COUNT(DISTINCT user_id) number_of_non_n00b_players
FROM results_daily
WHERE user_id NOT IN (
SELECT user_id
FROM results_daily
WHERE submitted < GetEureka(user_id)
GROUP BY user_id )
GROUP BY day;
and my function GetEureka() looks like this:
DECLARE eureka TIMESTAMP DEFAULT CURRENT_TIMESTAMP;
SELECT
MIN(submitted) INTO eureka
FROM
results_daily
WHERE
user_id = user
AND
result >= 5
GROUP BY
user_id;
RETURN eureka;
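On whether this is possible without a function: one option is to compute each user's first qualifying timestamp in a derived table and join to it. The sketch below is only an illustration of that idea, not the poster's method. It uses Python's sqlite3 in place of MySQL/MariaDB, invented sample rows, and a hypothetical alias `eureka`. It implements the intent stated in the question (keep a report only if it was submitted on or after that user's first result of 5 or more), which is slightly different from excluding the whole user.

```python
import sqlite3

# SQLite stands in for MySQL/MariaDB here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results_daily ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, day INTEGER, "
    "result INTEGER, submitted TEXT)"
)
conn.executemany(
    "INSERT INTO results_daily (user_id, day, result, submitted) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, "2024-01-01 10:00:00"),  # before user 1's first result >= 5
        (1, 2, 6, "2024-01-02 10:00:00"),  # user 1's first result >= 5
        (1, 3, 4, "2024-01-03 10:00:00"),  # later play, kept even though low
        (2, 2, 8, "2024-01-02 11:00:00"),
        (2, 3, 6, "2024-01-03 11:00:00"),
        (3, 3, 1, "2024-01-03 12:00:00"),  # user 3 never reaches 5
    ],
)

rows = conn.execute(
    """
    SELECT r.day, AVG(r.result) AS average_score, COUNT(*) AS number_of_plays
    FROM results_daily r
    JOIN (
        -- one row per user: timestamp of the first result >= 5
        SELECT user_id, MIN(submitted) AS eureka
        FROM results_daily
        WHERE result >= 5
        GROUP BY user_id
    ) e ON e.user_id = r.user_id
    WHERE r.submitted >= e.eureka
    GROUP BY r.day
    ORDER BY r.day
    """
).fetchall()

print(rows)
```

Users with no qualifying result drop out because the inner join finds no matching row for them, and day 1 disappears entirely because its only report predates that user's first qualifying result.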
In SQL think about sets. You need the set of user_id values of all n00bz meeting either of these two criteria:
No score greater than 5.
Only one play of the game.
Then, when you compute your averages you exclude rows with those user ids.
So let's get you the n00bz, with the magic of the GROUP BY and HAVING clauses.
SELECT user_id
FROM results_daily
GROUP BY user_id
HAVING COUNT(*) = 1 OR MAX(result) <= 5
Now we can run your stats.
SELECT day,
AVG(result) average_score,
COUNT(*) number_of_plays,
COUNT(DISTINCT user_id) number_of_non_n00b_players
FROM results_daily
WHERE user_id NOT IN (
SELECT user_id
FROM results_daily
GROUP BY user_id
HAVING COUNT(*) = 1 OR MAX(result) <= 5 )
GROUP BY day;
The "structured" in Structured Query Language comes from this use of nested subqueries to define sets of rows and work with them.
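The set-based approach above can be tried end to end with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL/MariaDB (an assumption for the sake of a runnable example), averages the `result` column from the question's table description, and uses invented sample rows.

```python
import sqlite3

# SQLite stands in for MySQL/MariaDB (assumption for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results_daily ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, day INTEGER, result INTEGER)"
)
conn.executemany(
    "INSERT INTO results_daily (user_id, day, result) VALUES (?, ?, ?)",
    [
        (1, 1, 7), (1, 2, 3),  # two plays, best score 7: kept
        (2, 1, 9),             # only one play: excluded
        (3, 1, 2), (3, 2, 4),  # never scored above 5: excluded
        (4, 1, 6), (4, 2, 8),  # two plays, best score 8: kept
    ],
)

# Step 1: the set of user ids to exclude.
noobs = conn.execute(
    """
    SELECT user_id
    FROM results_daily
    GROUP BY user_id
    HAVING COUNT(*) = 1 OR MAX(result) <= 5
    ORDER BY user_id
    """
).fetchall()

# Step 2: per-day statistics over everyone else.
stats = conn.execute(
    """
    SELECT day,
           AVG(result) AS average_score,
           COUNT(*) AS number_of_plays,
           COUNT(DISTINCT user_id) AS number_of_non_n00b_players
    FROM results_daily
    WHERE user_id NOT IN (
        SELECT user_id
        FROM results_daily
        GROUP BY user_id
        HAVING COUNT(*) = 1 OR MAX(result) <= 5
    )
    GROUP BY day
    ORDER BY day
    """
).fetchall()

print(noobs)
print(stats)
```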

access get top and last rows

Let's say I have a table CData with the columns CName, Amount1, Amount2.
Now I want to use a query to calculate the difference between Amount1 and Amount2 for each distinct CName and, as a result of the query, get the ~1000 rows with the biggest difference and the ~1000 rows with the smallest (or most negative) difference. It doesn't matter if the results come in one table or two.
1) I am aware of the function TOP and so I could do this with two queries and sort by Difference (once ascending, once descending). Is there a way to do this in one query, though? This would save some time.
2) General question: When I define a field in my query (in this example "Difference"), can I somehow use it to, for example, sort the data by it? Like this (well, it's not working, but to give you an idea of what I mean):
SELECT CData.CName, CData.Amount2-CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY Difference
Or do I always have to do the following:
...
ORDER BY CData.Amount2-CData.Amount1
Not much of a difference in this example, I just wanted to know if that's possible in general.
Sort the first time ASC (Ascending) and the second time DESC (Descending)
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 ASC
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 DESC
Which aggregate function do you want to apply to your differences? Avg? Sum?
SELECT CName, avg(Amount2-Amount1) AS Difference
FROM CData
GROUP BY CName
By the way, to do it in "one" query, you could use a union query on two subqueries, one with the TOP 1000 ascending and one with the TOP 1000 descending.
It looks like Access does not allow you to use an alias in the ORDER BY clause. If you build the query in the QBE grid and switch the view from the UI to SQL, you will see that it repeats the calculation in the ORDER BY clause.
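The union idea can be sketched as follows. This is an illustration only: it runs on SQLite through Python's sqlite3 rather than on Access, so it uses LIMIT where Access would use TOP, and it assumes one row per CName so that no aggregate is needed. The sample rows and the limit of 2 are invented.

```python
import sqlite3

# SQLite stands in for Access (assumption); LIMIT plays the role of TOP.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CData (CName TEXT, Amount1 INTEGER, Amount2 INTEGER)")
conn.executemany(
    "INSERT INTO CData VALUES (?, ?, ?)",
    [
        ("A", 10, 50),   # difference  40
        ("B", 30, 10),   # difference -20
        ("C", 5, 5),     # difference   0
        ("D", 0, 100),   # difference 100
        ("E", 80, 20),   # difference -60
    ],
)

rows = conn.execute(
    """
    SELECT * FROM (
        SELECT CName, Amount2 - Amount1 AS Difference
        FROM CData
        ORDER BY Amount2 - Amount1 ASC
        LIMIT 2            -- the two smallest differences
    )
    UNION ALL
    SELECT * FROM (
        SELECT CName, Amount2 - Amount1 AS Difference
        FROM CData
        ORDER BY Amount2 - Amount1 DESC
        LIMIT 2            -- the two largest differences
    )
    ORDER BY 2             -- sort the combined rows by the second column
    """
).fetchall()

print(rows)
```

Each half is wrapped in its own subquery so that its ORDER BY and row limit apply before the two halves are combined. If fewer rows exist than twice the limit, the same row can appear in both halves.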
Hi, John.
Check out the SO tour for instructions on how to use options such as formatting code.
Not sure if this will work for you, but you can try something like:
select * from
(SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, 99999-Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale
UNION
SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale) u
order by
srt

PHP + MySQL Forum display

I am currently building a simple PHP + MySQL forum but I am having problems with getting the information to show in the correct format.
My current SQL code is
SELECT forum_posts.catId, forum_posts.postId, forum_posts.date, forum_posts.message,
forum_posts.userId, users.userId, users.username, forum_thread.threadId, forum_thread.subjectTitle
FROM forum_posts
LEFT JOIN forum_thread ON forum_posts.threadId = forum_thread.threadId
LEFT JOIN users ON users.userId = forum_posts.userId
GROUP BY forum_posts.catId
ORDER BY forum_posts.postId DESC, forum_posts.date DESC, forum_posts.catId ASC
The problem I have is that it brings back everything in the right category but it brings back the first result of the category and not the last one.
I simply want the code to show the last reply in each category.
Any help is much appreciated thank you.
Your query should return a range of rows. Try to limit the result to 1 element. If you sort the results descending, you will get the last item.
ORDER BY ... DESC LIMIT 1
I am not sure whether you find the latest entry by postId or date. If you find it by date, you must start the grouping with the date.
But I don't understand why you are sorting the results so much for getting only one dataset.
ORDER BY forum_posts.date DESC LIMIT 1;
Is this what you want? Additionally this could help you: Select last row in MySQL.
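Note that LIMIT 1 returns a single row overall. If the goal is the last reply in each category, a common pattern is to find the highest postId per category in a subquery and join back to it. The sketch below illustrates that pattern under two assumptions: that a higher postId means a later post, and that SQLite (via Python's sqlite3) is an acceptable stand-in for MySQL. The table is trimmed to a few columns, the alias `latest` is hypothetical, and the rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); forum_thread is omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userId INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE forum_posts ("
    "postId INTEGER PRIMARY KEY, catId INTEGER, userId INTEGER, message TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO forum_posts VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "first in cat 1"),
        (2, 2, 2, "first in cat 2"),
        (3, 1, 2, "reply in cat 1"),
        (4, 2, 1, "reply in cat 2"),
        (5, 1, 1, "latest in cat 1"),
    ],
)

rows = conn.execute(
    """
    SELECT p.catId, p.postId, p.message, u.username
    FROM forum_posts p
    JOIN (
        -- one row per category: the id of its most recent post
        SELECT catId, MAX(postId) AS lastPostId
        FROM forum_posts
        GROUP BY catId
    ) latest ON latest.lastPostId = p.postId
    LEFT JOIN users u ON u.userId = p.userId
    ORDER BY p.catId
    """
).fetchall()

print(rows)
```

If "latest" should be decided by the date column instead, the subquery would select MAX(date) per category and the join would match on both catId and date.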

Mysql COUNT, GROUP BY and ORDER BY

This sounds quite simple but I just can't figure it out.
I have a table orders (id, username, telephone_number).
I want to get number of orders from one user by comparing the last 8 numbers in telephone_number.
I tried using SUBSTR(telephone_number, -8), I've searched and experimented a lot, but still I can't get it to work.
Any suggestions?
Untested:
SELECT
SUBSTR(telephone_number, -8) AS phone_suffix,
COUNT(*) AS cnt
FROM
Orders
GROUP BY
SUBSTR(telephone_number, -8)
ORDER BY
cnt DESC
The idea:
Select the last eight digits of telephone_number together with COUNT(*) (i.e., the number of rows in each group). Adding a bare * after another select item can produce a parse error in MySQL, and the non-grouped columns would hold arbitrary values per group anyway.
GROUP BY the last eight digits of telephone_number [1].
Optionally, ORDER BY the number of rows in each group, descending.
[1] If you plan to do this type of query often, some kind of index on the last part of the phone number could be desirable. How this could be best implemented depends on the concrete values stored in the field.
-- Memory intensive.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` REGEXP '12345678$'
or
-- The same, but better and quicker.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` LIKE '%12345678'
You can use the query below to get the last 8 characters of a column's values (the bracketed names are SQL Server syntax; RIGHT() and RTRIM() also exist in MySQL).
select right(rtrim(First_Name),8) FROM [ated].[dbo].[Employee]
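As a quick check of the grouping and LIKE approaches, here is a minimal sketch on an in-memory SQLite database through Python's sqlite3 (an assumption; the question is about MySQL, where SUBSTR with a negative start also counts from the end of the string). The sample orders are invented.

```python
import sqlite3

# SQLite stands in for MySQL (assumption for a self-contained sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, username TEXT, telephone_number TEXT)"
)
conn.executemany(
    "INSERT INTO orders (username, telephone_number) VALUES (?, ?)",
    [
        ("ann", "+4712345678"),
        ("ann b", "004712345678"),  # same subscriber, different prefix
        ("bob", "87654321"),
        ("ann", "12345678"),
    ],
)

# Orders per phone suffix: group on the last eight characters.
per_suffix = conn.execute(
    """
    SELECT SUBSTR(telephone_number, -8) AS phone_suffix, COUNT(*) AS cnt
    FROM orders
    GROUP BY SUBSTR(telephone_number, -8)
    ORDER BY cnt DESC
    """
).fetchall()

# Orders for one known suffix, using a parameterized LIKE pattern.
(one_user,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE telephone_number LIKE ?",
    ("%12345678",),
).fetchone()

print(per_suffix)
print(one_user)
```

The grouped query is useful for a report over all customers, while the LIKE form fits a lookup for a single number.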

Is mysql's dayofweek Expensive in memory... Or I should not care?

I am trying to speed up a database SELECT for reporting on a table with more than 3 million rows. Is it a good idea to use dayofweek?
Query is something like this:
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
`tblOrder`.`OrderDate`,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1) AND
(tblOrder.OrderDate >= '2004-05-17') AND
(tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`
ORDER BY `tblOrder`.`OrderDate` DESC
But what I really want is to show a report for every day in a week: Sunday, Monday, and so on.
So would it be a good idea to use dayofweek in the query, or to render the result in the view?
No, using dayofweek as one of the columns you're selecting is not going to hurt your performance significantly, nor will it blow out your server memory. Your query shows that you're displaying seven distinct OrderDate days' worth of orders. Maybe there are plenty of orders, but not many days.
But you may be better off using DATE_FORMAT (see http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date-format) to display the day of the week alongside the OrderDate column. Then you don't have to muck around in your client turning numbers into day names, remembering that DAYOFWEEK() returns 1..7 for Sunday..Saturday while WEEKDAY() returns 0..6 for Monday..Sunday. You get my point.
A good bet is to wrap your existing query in an extra one just for formatting. This doesn't cost much and lets you control your presentation without making your data-retrieval query more complex.
Note also that you omitted OrderDate from your GROUP BY expression. I think this is going to yield unpredictable results in MySQL. In Oracle, it yields an error message. Also, I don't know what you're doing with this result set, but don't you want it ordered by franchise and box products as well as date?
I presume your OrderDate columns contain only days -- that is, all the times in those column values are midnight. Your GROUP BY won't do what you hope for it to do if there are actual order timestamps in your OrderDate columns. Use the DATE() function to make sure of this, if you aren't sure already. Notice that the way you're doing the date range in your WHERE clause is already correct for dates with timestamps. http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_date
So, here's a suggested revision to your query. I didn't fix the ordering, but I did fix the GROUP BY expression and used the DATE() function.
SELECT BoxCount, BoxProducts,
DATE_FORMAT(OrderDate, '%W') AS DayOfWeek,
OrderDate,
Franchise
FROM (
SELECT
COUNT(tblOrderDetail.Qty) AS `BoxCount`,
`tblBoxProducts`.`ProductId` AS `BoxProducts`,
DATE(`tblOrder`.`OrderDate`) AS OrderDate,
`tblFranchise`.`FranchiseId` AS `Franchise`
FROM `tblOrder`
INNER JOIN `tblOrderDetail` ON tblOrderDetail.OrderId=tblOrder.OrderId
INNER JOIN `tblFranchise` ON tblFranchise.FranchiseeId=tblOrderDetail.FranchiseeId
INNER JOIN `tblBoxProducts` ON tblOrderDetail.ProductId=tblBoxProducts.ProductId
WHERE (tblOrderDetail.Delivered = 1)
AND (tblOrder.OrderDate >= '2004-05-17')
AND (tblOrder.OrderDate < '2004-05-24')
GROUP BY `tblBoxProducts`.`ProductId`,`tblFranchise`.`FranchiseId`, DATE(`tblOrder`.`OrderDate`)
) Q
ORDER BY OrderDate DESC
You have lots of inner join operations; this query may still take a while. Make sure tblOrder has some kind of index on OrderDate for best performance.
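For the other option raised in the question, rendering the day name in the view, here is a minimal sketch. It groups on the calendar day in SQL and maps each day to a weekday name on the client side. It uses Python's sqlite3 in place of MySQL (an assumption), a single simplified table instead of the four joined tables, and invented rows inside the question's date range.

```python
import sqlite3
from datetime import date

# SQLite stands in for MySQL (assumption); one simplified table, no joins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblOrder (OrderId INTEGER PRIMARY KEY, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO tblOrder (OrderDate) VALUES (?)",
    [
        ("2004-05-17 09:15:00",),
        ("2004-05-17 18:40:00",),  # same day, different time
        ("2004-05-18 12:00:00",),
        ("2004-05-23 23:59:59",),  # last moment inside the half-open range
        ("2004-05-24 00:00:00",),  # excluded by the < '2004-05-24' bound
    ],
)

# Group on the calendar day so timestamps on the same day collapse together.
grouped = conn.execute(
    """
    SELECT DATE(OrderDate) AS OrderDay, COUNT(*) AS OrderCount
    FROM tblOrder
    WHERE OrderDate >= '2004-05-17' AND OrderDate < '2004-05-24'
    GROUP BY DATE(OrderDate)
    ORDER BY OrderDay
    """
).fetchall()

# Fixed list so the names do not depend on the process locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# date.weekday() returns 0 for Monday through 6 for Sunday.
report = [
    (day, DAY_NAMES[date.fromisoformat(day).weekday()], count)
    for day, count in grouped
]

print(report)
```

Either place works for a week's worth of rows; doing it in SQL with DATE_FORMAT keeps the client simpler, while doing it in the view keeps the query free of presentation concerns.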