I am trying to write a SQL query that counts unique personid values per month, where a person counts as a 'Returning' visitor unless they also have a 'New' record for that month.
month   | personid | visitstat
---------------------------------
January | john     | new
January | john     | returning
January | Bill     | returning
So the query I'm looking for should get a count for each unique personid that has "returning" unless a "new" exists for that personid as well - in this instance returning a count of 1 for
January Bill returning
because john is new for the month.
The query I've tried is
SELECT COUNT(distinct personid) as count FROM visit_info WHERE visitstat = 'Returning' GROUP BY MONTH(date) ORDER BY date
Unfortunately this counts "Returning" even if a "New" record exists for the person in that month.
Thanks in advance, hopefully I explained this clearly enough.
You already wrote the "magic" word yourself, "exists". You can use exactly that, a NOT EXISTS and a correlated subquery.
SELECT count(DISTINCT vi1.personid) count
FROM visit_info vi1
WHERE vi1.visitstat = 'Returning'
AND NOT EXISTS (SELECT *
FROM visit_info vi2
WHERE vi2.personid = vi1.personid
AND year(vi2.date) = year(vi1.date)
AND month(vi2.date) = month(vi1.date)
AND vi2.visitstat = 'New')
GROUP BY year(vi1.date),
month(vi1.date)
ORDER BY year(vi1.date),
month(vi1.date);
I also recommend including the year in the GROUP BY expression, as you otherwise might get unexpected results when the data spans more than one year. Also, in the ORDER BY clause, only use expressions that are included in the GROUP BY clause or passed to an aggregation function. MySQL, unlike virtually any other DBMS, might accept other expressions, but may then produce indeterminate results.
I faced a similar scenario while working with a database. One way I handled it was to use GROUP BY with a HAVING clause and a subquery.
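The answer above does not show its query, so the following is only a sketch of what a GROUP BY / HAVING variant could look like. The column types and sample rows are assumptions made for illustration; only the table and column names come from the question.

```sql
-- Hypothetical sample data mirroring the question's table; column types are assumed.
CREATE TABLE visit_info (
    personid  VARCHAR(50) NOT NULL,
    visitstat VARCHAR(20) NOT NULL,
    `date`    DATE        NOT NULL
);

INSERT INTO visit_info (personid, visitstat, `date`) VALUES
    ('john', 'New',       '2019-01-05'),
    ('john', 'Returning', '2019-01-20'),
    ('Bill', 'Returning', '2019-01-12');

-- Inner query: one row per person per month, kept only when that person
-- has at least one 'Returning' row and no 'New' row in that month.
-- Outer query: count the surviving persons per month.
SELECT t.yr, t.mon, COUNT(*) AS returning_count
FROM (SELECT YEAR(`date`)  AS yr,
             MONTH(`date`) AS mon,
             personid
      FROM visit_info
      GROUP BY YEAR(`date`), MONTH(`date`), personid
      HAVING SUM(visitstat = 'New') = 0
         AND SUM(visitstat = 'Returning') > 0) AS t
GROUP BY t.yr, t.mon
ORDER BY t.yr, t.mon;
```

For the three sample rows this should yield one row for January with a count of 1 (Bill), since john also has a 'New' record in that month. The `SUM(condition)` idiom relies on MySQL evaluating a comparison to 1 or 0.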
I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get an array of users (an array of userid values) who placed their 25th order during a specified period (for example, in May 2019), along with the date of the 25th order for each user and the number of days it took to place the 25th order (the difference between the user's registration date and the date of the 25th order).
For example, if a user registered in April 2018, placed 20 orders in 2018, and then placed their 21st to 30th orders in Jan-May 2019, this user should be in the array if their 25th order (overall for the account) was placed in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
A single query is not required; if this can't be done in one query, a few queries are fine.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
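WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
-- parentheses are required: AND binds tighter than OR, so without them
-- the tie-break branch would match orders from any user
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24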
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
1. Select all rows, ordered by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
2. From this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1).
3. From this, select all plus a field that equals itself plus one in case the first added field is 1, else 0.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something that won't let you keep variable values in memory unless you commit, or that has some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
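The "rank each order within its user, then pick rank 25" idea can also be sketched with a window function, assuming MySQL 8.0 or later (earlier versions have no ROW_NUMBER()). The schema below is a hypothetical reconstruction from the question's description, the sample rows are made up, and the example looks for the 3rd order rather than the 25th to keep the data small.

```sql
-- Hypothetical schema following the question's description; types are assumed.
CREATE TABLE users (
    userid     INT PRIMARY KEY,
    registered DATE NOT NULL
);
CREATE TABLE orders (
    id      INT PRIMARY KEY,
    `date`  DATE NOT NULL,
    user_id INT NOT NULL
);

INSERT INTO users (userid, registered) VALUES
    (1, '2018-04-01'),
    (2, '2019-01-15');
INSERT INTO orders (id, `date`, user_id) VALUES
    (1, '2018-05-01', 1), (2, '2019-02-10', 1), (3, '2019-05-03', 1),
    (4, '2019-02-01', 2), (5, '2019-03-01', 2);

-- Number each user's orders chronologically (id breaks ties on equal dates),
-- then keep the row with the wanted number if it falls in the period.
SELECT r.user_id,
       r.`date`                         AS nth_order_date,
       DATEDIFF(r.`date`, u.registered) AS days_to_nth_order
FROM (SELECT o.user_id,
             o.`date`,
             ROW_NUMBER() OVER (PARTITION BY o.user_id
                                ORDER BY o.`date`, o.id) AS order_no
      FROM orders o) AS r
JOIN users u ON u.userid = r.user_id
WHERE r.order_no = 3                 -- use 25 for the real case
  AND r.`date` >= '2019-05-01'
  AND r.`date` <  '2019-06-01';
```

With this sample data only user 1 should be returned, because user 2 has just two orders. The window function makes a single pass over each user's orders, which avoids both the per-row COUNT subquery and the reliance on user-variable evaluation order.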
The question I am working on is as follows:
What is the difference in the amount received for each month of 2004 compared to 2003?
This is what I have so far,
SELECT @2003 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT @2004 = (SELECT sum(amount) FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate));
SELECT MONTH(orderDate), (@2004 - @2003) AS Diff
FROM Payments, Orders
WHERE Orders.customerNumber = Payments.customerNumber
Group By MONTH(orderDate);
In the output I am getting the months, but for Diff I am getting NULL. Please help. Thanks
I cannot test this because I don't have your tables, but try something like this:
SELECT a.orderMonth, (a.orderTotal - b.orderTotal ) AS Diff
FROM
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal
FROM Payments, Orders
WHERE YEAR(orderDate) = 2004
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as a,
(SELECT MONTH(orderDate) as orderMonth,sum(amount) as orderTotal FROM Payments, Orders
WHERE YEAR(orderDate) = 2003
AND Payments.customerNumber = Orders.customerNumber
GROUP BY MONTH(orderDate)) as b
WHERE a.orderMonth=b.orderMonth
Q: How do I subtract two declared variables in MySQL.
A: You'd first have to DECLARE them, in the context of a MySQL stored program. But those variable names wouldn't begin with an at sign. Variable names that start with an at sign (@) are user-defined variables. There is no DECLARE statement for them, and we can't declare them to be a particular type.
To subtract them within a SQL statement
SELECT @foo - @bar AS diff
Note that MySQL user-defined variables are scalar values.
Assignment of a value to a user-defined variable in a SELECT statement is done with the Pascal style assignment operator :=. In an expression in a SELECT statement, the equals sign is an equality comparison operator.
As a simple example of how to assign a value in a SQL SELECT statement
SELECT @foo := '123.45' ;
In the OP queries, there's no assignment being done. The equals sign is a comparison, of the scalar value to the return from a subquery. Are those first statements actually running without throwing an error?
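As a small illustration of the assignment-versus-comparison point, here is a sketch that assigns two user-defined variables and subtracts them. The variable names and values are arbitrary, and the inline derived table exists only to give the scalar subquery something to sum.

```sql
-- Assign with SET. The := operator also works inside SELECT, but assigning
-- variables in statements other than SET is deprecated in recent MySQL 8.0 releases.
SET @foo := 123.45;
SET @bar := 100.00;

-- Subtract the two scalar values; this should give 23.45.
SELECT @foo - @bar AS diff;

-- A scalar subquery can populate a variable too, but it must return
-- at most one row and one column (here: 1 + 2 = 3).
SET @total := (SELECT SUM(x)
               FROM (SELECT 1 AS x UNION ALL SELECT 2) AS t);
SELECT @total AS total;
```

This also hints at why a grouped subquery cannot be stored this way: a query with GROUP BY MONTH(...) returns one row per month, and a user-defined variable holds only a single scalar.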
User-defined variables are probably not necessary to solve this problem.
You want to return how many rows? Sounds like you want one for each month. We'll assume that by "year" we're referring to a calendar year, as in January through December. (We might want to check that assumption. Just so we don't find out way too late, that what was meant was the "fiscal year", running from July through June, or something.)
How can we get a list of months? Looks like you've got a start. We can use a GROUP BY or a DISTINCT.
The question was... "What is the difference in the amount received ... "
So, we want amount received. Would that be the amount of payments we received? Or the amount of orders that we received? (Are we taking orders and receiving payments? Or are we placing orders and making payments?)
When I think of "amount received", I'm thinking in terms of income.
Given the only two tables that we see, I'm thinking we're filling orders and receiving payments. (I probably want to check that, so when I'm done, I'm not told... "oh, we meant the number of orders we received" and/or "the payments table is the payments we made, the 'amount we received' is in some other table".)
We're going to assume that there's a column that identifies the "date" that a payment was received, and that the datatype of that column is DATE (or DATETIME or TIMESTAMP), some type that we can reliably determine what "month" a payment was received in.
To get a list of months that we received payments in, in 2003...
SELECT MONTH(p.payment_received_date)
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2004-01-01'
GROUP BY MONTH(p.payment_received_date)
ORDER BY MONTH(p.payment_received_date)
That should get us twelve rows. Unless we didn't receive any payments in a given month. Then we might only get 11 rows. Or 10. Or, if we didn't receive any payments in all of 2003, we won't get any rows back.
For performance, we want our predicates (conditions in the WHERE clause) to reference bare columns. With an appropriate index available, MySQL can make effective use of an index range scan operation. If we wrap the columns in a function, e.g.
WHERE YEAR(p.payment_received_date) = 2003
With that, we will be forcing MySQL to evaluate that function on every row in the table, and then compare the return from the function to the literal. We prefer not to do that, and instead reference bare columns in predicates.
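One way to see the difference is to compare the two predicates with EXPLAIN. This is only a sketch: the table definition and index name are assumptions built from the column names used in this answer, and the plan MySQL actually reports depends on the version and on the data in the table.

```sql
-- Hypothetical table matching the names used in this answer; types are assumed.
CREATE TABLE payment_received (
    id                    INT PRIMARY KEY,
    payment_received_date DATE          NOT NULL,
    payment_amount        DECIMAL(10,2) NOT NULL,
    INDEX idx_received_date (payment_received_date)
);

-- Column wrapped in a function: the index on the date column generally
-- cannot be used to narrow the range, so every row has to be examined.
EXPLAIN
SELECT SUM(p.payment_amount)
FROM payment_received p
WHERE YEAR(p.payment_received_date) = 2003;

-- Bare column compared to literals: eligible for an index range scan.
EXPLAIN
SELECT SUM(p.payment_amount)
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
  AND p.payment_received_date <  '2004-01-01';
```

Both queries select the same rows; the half-open range (>= start, < next start) also keeps working if the column is later changed to DATETIME.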
We could repeat the same query to get the payments received in 2004. All we need to do is change the date literals.
Or, we could get all the rows in 2003 and 2004 all together, and collapse that into a list of distinct months.
We can use conditional aggregation. Since we're using calendar years, I'll use the YEAR() shortcut (rather than a range check). Here, we're not as concerned with using a bare column inside the expression.
SELECT MONTH(p.payment_received_date) AS `mm`
, MAX(MONTHNAME(p.payment_received_date)) AS `month`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0)) AS `2004_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2003_month_total`
, SUM(IF(YEAR(p.payment_received_date)=2004,p.payment_amount,0))
- SUM(IF(YEAR(p.payment_received_date)=2003,p.payment_amount,0)) AS `2004_2003_diff`
FROM payment_received p
WHERE p.payment_received_date >= '2003-01-01'
AND p.payment_received_date < '2005-01-01'
GROUP
BY MONTH(p.payment_received_date)
ORDER
BY MONTH(p.payment_received_date)
If this is a homework problem, I strongly recommend you work on this problem yourself. There are other query patterns that will return an equivalent result.
I think this is the problem:
In @2003 and @2004, you select only the sum. Even though you group by the month, you still select a single column, so no row says which month it belongs to. When you then try to subtract, SQL cannot tell which row in @2003 should be subtracted from which row in @2004.
So I think the solution is to select the month with the sum and do the subtract later based on the month.
Say I have this .csv file which holds data that describes sales of a product. Now say I want a monthly breakdown of number of sales. I mean I wanna see how many orders were received in JAN2005, FEB2005...JAN2008, FEB2008...NOV2012, DEC2012.
Now one very simple way I can think of is to count them one by one like this. (BTW I am using logparser to run my queries)
logparser -i:csv -o:csv "SELECT COUNT(*) AS NumberOfSales INTO 'C:\Users\blah.csv' FROM 'C:\User\whatever.csv' WHERE OrderReceiveddate LIKE '%JAN2005%'"
My question is whether there is a smarter way to do this. I mean, instead of changing the month again and again and rerunning my query, can I write one query that produces the whole result in one Excel file all at once?
Yes.
If you add a group by clause to the statement, then the sql will return a separate count for each unique value of the group by column.
So if you write:
SELECT OrderReceiveddate, COUNT(*) AS NumberOfSales INTO 'C:\Users\blah.csv'
FROM 'C:\User\whatever.csv' GROUP BY OrderReceiveddate
you will get results like:
JAN2005 12
FEB2005 19
MAR2005 21
Assuming OrderReceiveDate is a date, you would format the date to have a year and month and then aggregate:
SELECT date_format(OrderReceiveddate, '%Y-%m') as YYYYMM, COUNT(*) AS NumberOfSales
INTO 'C:\Users\blah.csv'
FROM 'C:\User\whatever.csv'
WHERE OrderReceiveddate >= '2015-01-01'
GROUP BY date_format(OrderReceiveddate, '%Y-%m')
ORDER BY YYYYMM
You don't want to use like on a date column. like expects string arguments. Use date functions instead.
What is the best way to think about the Group By function in MySQL?
I am writing a MySQL query to pull data through an ODBC connection in a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is that around year end the calendar years don't always match up, so it is important to have them by date so that I can manually filter down to the correct dates using a pivot table (in 2013/2014 there was a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some are working 4 x 10's, others 5 x 8's with possibly a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3-7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay code per employee so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Over Time, LOA = Leave of Absence, etc). The distinct count will show me 1, where often times I will have 2-3 for the same emp in the same day (hits 40 hours and starts accruing OT then takes VTO or uses personal time in the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Ttile'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
The order you use in group by doesn't matter. Each unique combination of the values gets a group of its own. Selecting columns you don't group by gives you somewhat arbitrary results; you'd probably want to use some aggregation function on them, such as SUM to get the group total.
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
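To make the grouping point concrete, here is a sketch of counting distinct employees per site per week. The table is a hypothetical, simplified stand-in for the hours table in the question, and the rows are invented. YEARWEEK with mode 4 is used instead of WEEK so that the year is part of the grouping key.

```sql
-- Hypothetical, simplified stand-in for the hours table in the question.
CREATE TABLE hours_demo (
    statistic_date DATE        NOT NULL,
    site           VARCHAR(10) NOT NULL,
    employee_id    INT         NOT NULL,
    pay_code       VARCHAR(10) NOT NULL
);

INSERT INTO hours_demo (statistic_date, site, employee_id, pay_code) VALUES
    ('2014-01-06', 'A', 1, 'REG'),
    ('2014-01-06', 'A', 1, 'OT'),   -- same employee, same day, second pay code
    ('2014-01-07', 'A', 1, 'REG'),
    ('2014-01-07', 'A', 2, 'REG'),
    ('2014-01-07', 'B', 3, 'REG');

-- Group only by the dimensions you want one row for (site and week).
-- Leaving employee_id and the raw date out of GROUP BY lets
-- COUNT(DISTINCT ...) count each employee once per site and week.
SELECT site,
       YEARWEEK(statistic_date, 4) AS year_week,
       COUNT(DISTINCT employee_id) AS distinct_employees
FROM hours_demo
GROUP BY site, YEARWEEK(statistic_date, 4)
ORDER BY site, year_week;
```

For these rows, site A should show 2 distinct employees and site B should show 1, all in the same week. Swapping the two GROUP BY expressions gives the same groups; adding employee_id or statistic_date to the GROUP BY would split the groups so that every count becomes 1.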
I have a query that shows me the number of calls per day for the last 14 days within my app.
The query:
SELECT count(id) as count, DATE(FROM_UNIXTIME(timestamp)) as date FROM calls GROUP BY DATE(FROM_UNIXTIME(timestamp)) DESC LIMIT 14
On days where there were 0 calls, this query does not show those days. Rather than skip those days, I'd like to have a 0 or NULL in that spot.
Any ideas for how I can achieve this? If you have any questions as to what I'm asking please let me know.
Thanks
I don't believe your query is "skipping over NULL values", as your title suggests. Rather, your data probably looks something like this:
id | timestamp
----+------------
1 | 2014-01-01
2 | 2014-01-02
3 | 2014-01-04
As a result, there are no rows that contain the missing date, so there are no rows to be counted. The answer is that you need to generate a list of all the dates you want and then do a LEFT or RIGHT JOIN to it.
Unfortunately, MySQL doesn't make this as easy as other databases. There doesn't seem to be an effective way of generating a list of anything inline. So you'll need some sort of table.
I think I would create a static table containing a set of integers to be subtracted from the current date. Then you can use this table to generate your list of dates inline and JOIN to it.
CREATE TABLE days_ago_list (days_ago INTEGER);
INSERT INTO days_ago_list VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13)
;
Then:
SELECT COUNT(id), list_date
FROM (SELECT SUBDATE(CURDATE(), days_ago) AS list_date FROM days_ago_list) dates_to_list
LEFT JOIN (SELECT id, DATE(FROM_UNIXTIME(timestamp)) call_date FROM calls) calls_with_date
ON calls_with_date.call_date = dates_to_list.list_date
GROUP BY list_date
It is very important that you group by list_date; call_date will be NULL for any days without calls. It is also important to COUNT on id since NULL ids will not be counted. (That ensures you get a correct count of 0 for days with no calls.) If you need to change the dates listed, you simply update the table containing the integer list.
Here is a SQL Fiddle demonstrating this.
Alternatively, if this is for a web application, you could generate the list of dates code side and match up the counts with the dates after the query is done. This would make your web app logic somewhat more complicated, but it would also simplify the query and eliminate the need for the extra table.
Create a table that contains a row for each date you want to appear in the results, then LEFT OUTER JOIN it with the results of your current query. Use the date from that table and the count from your query, substituting 0 when the count is NULL.
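On MySQL 8.0 and later, the list of dates can be generated inline with a recursive common table expression instead of a helper table. The following is a sketch under that version assumption; the calls table definition and its rows are invented to match the column names in the question.

```sql
-- Hypothetical table matching the question; `timestamp` holds Unix epoch seconds.
CREATE TABLE calls (
    id          INT PRIMARY KEY,
    `timestamp` INT NOT NULL
);

INSERT INTO calls (id, `timestamp`) VALUES
    (1, UNIX_TIMESTAMP(CURDATE())),
    (2, UNIX_TIMESTAMP(CURDATE())),
    (3, UNIX_TIMESTAMP(CURDATE() - INTERVAL 2 DAY));

-- Build the last 14 calendar days (today included), then LEFT JOIN the calls
-- so that days without calls still appear, with a count of 0.
WITH RECURSIVE last_days (list_date) AS (
    SELECT CURDATE()
    UNION ALL
    SELECT list_date - INTERVAL 1 DAY
    FROM last_days
    WHERE list_date > CURDATE() - INTERVAL 13 DAY
)
SELECT d.list_date,
       COUNT(c.id) AS call_count
FROM last_days d
LEFT JOIN calls c
       ON DATE(FROM_UNIXTIME(c.`timestamp`)) = d.list_date
GROUP BY d.list_date
ORDER BY d.list_date DESC;
```

This should return 14 rows: today with a count of 2, the day before yesterday with a count of 1, and 0 for every other day. As in the helper-table version, the grouping is on the generated date and the count is on `c.id`, so unmatched days are counted as 0 rather than 1.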