Group By date SQL Query on a fact table - mysql

I have this fact table here, and I would like to use it to list, grouped by year, the total number of patients that have PatientType_id = 1101.
Example:
2012 5
2012 8
The Date_DateKey is actually the date 2012-03-14. I've managed to list the total patients with type ID 1101 for a single year, but I don't know how to list all the years. Could you give me some hints, please?
And here's the Date dimension

Normally, a key column in a fact table would reference another table. So, you should have a date/calendar table somewhere with information like the year. That would be the proper way to get this information.
I discourage you from parsing key values in general. In this case, with the information you have provided, it seems to be the only solution:
select floor(date_datekey / 10000) as year, count(distinct patient_id)
from your_fact_table t  -- placeholder name: TABLE is a reserved word in MySQL
where PatientType_id = 1101
group by floor(date_datekey / 10000)
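The sketch below compares the key-parsing approach with a join to the date dimension. It uses Python's built-in sqlite3 module as a stand-in for MySQL. In SQLite, integer-by-integer division already truncates, so it plays the role of FLOOR(... / 10000) for positive YYYYMMDD keys. The table names fact_visits and dim_date, the column calendar_year, and the sample rows are all hypothetical, so adapt them to your schema.

```python
import sqlite3

def patients_per_year(conn, patient_type=1101):
    """Count distinct patients of one type per year by parsing the YYYYMMDD key."""
    # INTEGER / INTEGER truncates in SQLite, like FLOOR(x / 10000) in MySQL.
    sql = """
        SELECT Date_DateKey / 10000 AS year, COUNT(DISTINCT patient_id)
        FROM fact_visits
        WHERE PatientType_id = ?
        GROUP BY Date_DateKey / 10000
        ORDER BY year
    """
    return conn.execute(sql, (patient_type,)).fetchall()

def patients_per_year_via_dimension(conn, patient_type=1101):
    """Same count, but read the year from a (hypothetical) date dimension table."""
    sql = """
        SELECT d.calendar_year, COUNT(DISTINCT f.patient_id)
        FROM fact_visits f
        JOIN dim_date d ON d.date_key = f.Date_DateKey
        WHERE f.PatientType_id = ?
        GROUP BY d.calendar_year
        ORDER BY d.calendar_year
    """
    return conn.execute(sql, (patient_type,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_visits (patient_id INTEGER, PatientType_id INTEGER, Date_DateKey INTEGER);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
""")
rows = [(1, 1101, 20120314), (2, 1101, 20120520), (1, 1101, 20120601),
        (3, 1101, 20130102), (4, 2202, 20130102)]
conn.executemany("INSERT INTO fact_visits VALUES (?, ?, ?)", rows)
# One dimension row per distinct key; the year is derived here only to fill the demo table.
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 {(key, key // 10000) for _, _, key in rows})

print(patients_per_year(conn))  # [(2012, 2), (2013, 1)]
```

When a date dimension is available, joining to it avoids depending on how the key happens to be encoded.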

Try this:
SELECT LEFT(Date_DateKey, 4) AS Year, COUNT(patient_id) FROM your_fact_table WHERE PatientType_id = 1101 GROUP BY LEFT(Date_DateKey, 4)

Related

MySQL question about using the BETWEEN operator

I am learning MySQL and found a project about e-commerce and customer behaviour that I want to follow along with.
However, when calculating the number of unique customers retained on the second day, the original author used a different approach and got a different result.
The code is below:
select count(distinct user_id) as first_day_customer_num from userbehavior
where date = '2017-11-25';-- 359 unique customers counted and retained on the first day
select count(distinct user_id) as second_day_customer_num from userbehavior
where date = '2017-11-26' and user_id in (SELECT user_id FROM userbehavior
WHERE date = '2017-11-25');-- 295 unique customers counted and retained on the second day
I used BETWEEN on the date instead. Here is my code for calculating the number of unique customers retained on the second day:
select count(distinct user_id) as trial from userbehavior
where date between '2017-11-25' and '2017-11-26'; -- 450 unique customers counted
Why are our results different, and which part did I get wrong?
Thank you so much for your help and support, really appreciate it.
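The two queries count different sets of users. BETWEEN '2017-11-25' AND '2017-11-26' keeps rows from either day, so COUNT(DISTINCT user_id) counts users who were active on at least one of the two days (the union). The author's IN subquery keeps only second-day users who were also active on the first day (the intersection), and that is what "retained" means. This also explains why 450 is larger than 359: the union of two days' users can never be smaller than either day's count. The sketch below uses Python's sqlite3 module and a few made-up rows to show the gap. Only the userbehavior table and its user_id and date columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userbehavior (user_id INTEGER, date TEXT)")
# Hypothetical activity: user 1 on both days, user 2 only on day one, user 3 only on day two.
conn.executemany("INSERT INTO userbehavior VALUES (?, ?)", [
    (1, "2017-11-25"), (1, "2017-11-26"),
    (2, "2017-11-25"),
    (3, "2017-11-26"),
])

# Union: users active on either day (what the BETWEEN query counts).
either_day = conn.execute("""
    SELECT COUNT(DISTINCT user_id) FROM userbehavior
    WHERE date BETWEEN '2017-11-25' AND '2017-11-26'
""").fetchone()[0]

# Intersection: day-two users who were also active on day one (retained users).
retained = conn.execute("""
    SELECT COUNT(DISTINCT user_id) FROM userbehavior
    WHERE date = '2017-11-26'
      AND user_id IN (SELECT user_id FROM userbehavior WHERE date = '2017-11-25')
""").fetchone()[0]

print(either_day, retained)  # 3 1
```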

Getting average value based on grouped data

I'm trying to find the average of net total for a given month, based on previous years to help show things like seasonal trends in sales.
I have a table called "Invoice" which looks similar to the below (slimmed down for the purpose of this post):
ID - int
IssueDate - DATE
NetTotal - Decimal
Status - Enum
For example, the data I'm trying to get would be similar to this:
(sum of invoices in June 2018 + sum of invoices in June 2019 + sum of invoices in June 2020) divided by number of years covered (3) = Overall average for June
But I want to do this for all 12 months of the year, based on all the data (not just 2018 through 2020).
I'm a bit stumped on how to pull this data. I've tried subqueries and even tried using a SUM within an AVG select, but the query either fails or returns incorrect data.
An example of what I've tried:
SELECT MONTHNAME(`Invoice`.`IssueDate`) AS `CalendarMonth`, AVG(`subtotal`)
FROM (SELECT SUM(`Invoice`.`NetTotal`) AS `subtotal`
FROM `Invoice`
GROUP BY EXTRACT(YEAR_MONTH FROM `Invoice`.`IssueDate`)) AS `sub`, `Invoice`
GROUP BY MONTH(`Invoice`.`IssueDate`)
which returns:
I see two parts to this query, but I'm unsure how to structure it:
A sum and count of all data based on the month
An average based on the number of years
I'm not sure where to go from here and would appreciate any pointers.
Ideally, I'd want to get the totals only from rows where "Status" = "Paid", but I'm trying to crack the first part first. Walk before running, as they say!
Any guidance greatly appreciated!
Basically you want two levels of aggregation:
SELECT mm, AVG(month_total)
FROM (SELECT YEAR(i.IssueDate) as yyyy, MONTH(i.issueDate) as mm,
SUM(i.`NetTotal`) as month_total
FROM Invoice i
GROUP BY yyyy, mm
) ym
GROUP BY mm;
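Here is a runnable sketch of the same two-level aggregation. It uses Python's sqlite3 module, with strftime standing in for MySQL's YEAR() and MONTH(). It also applies the Status = 'Paid' filter mentioned in the question. The sample invoices are made up. Note that the outer AVG divides by the number of years in which that month had paid invoices, not by every year present in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (ID INTEGER, IssueDate TEXT, NetTotal REAL, Status TEXT)")
conn.executemany("INSERT INTO Invoice VALUES (?, ?, ?, ?)", [
    (1, "2018-06-03", 100.0, "Paid"),
    (2, "2018-06-20", 50.0, "Paid"),
    (3, "2019-06-11", 90.0, "Paid"),
    (4, "2019-06-12", 999.0, "Unpaid"),  # excluded by the Status filter
    (5, "2019-07-01", 40.0, "Paid"),
])

sql = """
    SELECT mm, AVG(month_total) AS avg_month_total
    FROM (
        -- Inner level: one total per (year, month).
        SELECT strftime('%Y', IssueDate) AS yyyy,
               CAST(strftime('%m', IssueDate) AS INTEGER) AS mm,
               SUM(NetTotal) AS month_total
        FROM Invoice
        WHERE Status = 'Paid'
        GROUP BY yyyy, mm
    ) AS ym
    -- Outer level: average those monthly totals across years.
    GROUP BY mm
    ORDER BY mm
"""
result = conn.execute(sql).fetchall()
print(result)  # [(6, 120.0), (7, 40.0)]
```

June's figure is (150 + 90) / 2 = 120, which matches the "sum per year, then divide by the number of years" description in the question.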
For just the average-amount part, you could use a query like:
Select Date From Your_Table Where Date Like '20__-06-%'
You can sort the results in ascending or descending order with ORDER BY.

How can I calculate the SUM in 4-day buckets over all dates

I have a MySQL DB where one column is the DATE and the other column is the SIGNAL. Now I would like to calculate the SUM over Signal for 4 days each.
For example:
SUM(signal over DATE1,DATE2,DATE3,DATE4)
SUM(signal over DATE5,DATE6,DATE7,DATE8)
...
where DATE_N is the successor of DATE_N-1 but need not be the day immediately before it.
Moreover, the group size should be variable; 4 is just an example.
Can anyone give me advice on how to do this in MySQL?
I found the question "group by with count"; maybe it could help with my issue?
Thanks
Edit: One important note: my date ranges have gaps. You can see this in the picture below, in the count(DISTINCT(TradeDate)) column, which would always be 4 if there were no gaps. But I DO have gaps. When I sort the dates in descending order, I would like to always group 4 dates together, for example Group1: 2017-08-22 + 2017-08-21 + 2017-08-20 + 2017-08-19, Group2: 2017-08-18 + 2017-08-17 + 2017-08-15 + 2017-08-14, ...
Maybe I could map the descending dates to a descending auto-increment integer; then I would have a number without gaps: number1="2017-08-17", number2="2017-08-15", and so on.
Edit2:
Looking at the result of this query on my table, it seems I may have duplicate entries for the same date. How can I collapse these duplicate dates into a single representative?
SELECT SUM(CondN1), count(id), count(DISTINCT(TradeDate)), min(TradeDate), max(TradeDate), min(TO_DAYS(DATE(TradeDate))), id
FROM marketstat
WHERE Stockplace LIKE '%'
GROUP BY TO_DAYS(DATE(TradeDate)) DIV 4
ORDER BY TO_DAYS(DATE(TradeDate))
SUM() is an aggregate function, so you need to GROUP BY something, and that something should change only every four days. Let's start by grouping by one day:
SELECT SUM(`signal`)  -- SIGNAL is a reserved word in MySQL, so it must be quoted
FROM tableName
GROUP BY date
date should really be of type DATE, as you mentioned, not DATETIME or anything else; you could use DATE(date) to convert other date types to dates. Now we need to group by four days:
SELECT SUM(`signal`)
FROM tableName
GROUP BY TO_DAYS(date) DIV 4
Note that this creates an arbitrary grouping of four days. If you want control over where each group starts, you can add an offset like this:
SELECT SUM(`signal`)
FROM tableName
GROUP BY (TO_DAYS(date)+2) DIV 4
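The sketch below shows how gaps affect these calendar-based buckets. It uses Python's sqlite3 module. SQLite has no TO_DAYS, so the sketch substitutes days since 1970-01-01, computed from julianday(). That offset differs from MySQL's TO_DAYS, so bucket boundaries will not line up with MySQL's, but gaps behave the same way. The table and column names follow the answer, and the rows are hypothetical, chosen to mirror the dates in the question's Edit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (date TEXT, signal REAL)")
# Hypothetical trading days; 2017-08-16 is missing (a gap).
days = ["2017-08-14", "2017-08-15", "2017-08-17", "2017-08-18",
        "2017-08-19", "2017-08-20", "2017-08-21", "2017-08-22"]
conn.executemany("INSERT INTO tableName VALUES (?, 1.0)", [(d,) for d in days])

# Calendar bucket = day number DIV 4, with days since 1970-01-01 standing in for TO_DAYS.
sql = """
    SELECT CAST(julianday(date) - julianday('1970-01-01') AS INTEGER) / 4 AS bucket,
           COUNT(DISTINCT date) AS n_dates,
           SUM(signal) AS total
    FROM tableName
    GROUP BY bucket
    ORDER BY bucket
"""
buckets = conn.execute(sql).fetchall()
for bucket, n_dates, total in buckets:
    print(bucket, n_dates, total)
```

Because 2017-08-16 is missing and the buckets follow the calendar, they hold 3, 4 and 1 dates rather than 4 each. This is the behaviour described in the question's Edit.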
In the meantime, with help from KIKO, I found the solution:
I create a temporary table with:
CREATE TEMPORARY TABLE IF NOT EXISTS tradedatemaptmp (id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY)
SELECT Tradedate AS Tradedate, CondN1, CondN2
FROM marketstat
WHERE marketstat.Stockplace LIKE 'US'
GROUP BY TradeDate
ORDER BY TradeDate ASC;
Then, instead of the original TradeDate, I use the newly created id from the temporary table. Even when there are gaps in the TradeDate range, the ids in the temporary table have none, so I can DIV 4 and always get the corresponding 4 dates together.
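On MySQL 8.0 or later, the ROW_NUMBER() window function can produce the same gap-free numbering without a temporary table. The sketch below shows the idea using Python's sqlite3 module, where window functions need SQLite 3.25 or newer. It makes several assumptions. Duplicate rows for one date (the concern in Edit2) are collapsed first by keeping MAX(signal), so swap in whatever rule fits your data. Dates are numbered newest first, as in the question's example. Numbering starts at 0, so the first group holds a full group_size dates. The same off-by-one applies to an AUTO_INCREMENT id that starts at 1: id DIV 4 would put only ids 1 to 3 in the first group, so (id - 1) DIV 4 is needed there. The function name grouped_dates is hypothetical.

```python
import sqlite3

def grouped_dates(conn, group_size=4):
    """Group distinct dates, newest first, into gap-free blocks of group_size."""
    sql = """
        WITH per_day AS (
            -- Collapse duplicate rows for one date into a single row
            -- (assumption: keep the largest signal for that date).
            SELECT date, MAX(signal) AS day_signal
            FROM tableName
            GROUP BY date
        ),
        numbered AS (
            -- ROW_NUMBER() has no gaps even when calendar days are missing;
            -- subtracting 1 makes the first block hold a full group_size dates.
            SELECT date, day_signal,
                   (ROW_NUMBER() OVER (ORDER BY date DESC) - 1) / :size AS grp
            FROM per_day
        )
        SELECT grp, MIN(date), MAX(date), COUNT(*), SUM(day_signal)
        FROM numbered
        GROUP BY grp
        ORDER BY grp
    """
    return conn.execute(sql, {"size": group_size}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (date TEXT, signal REAL)")
conn.executemany("INSERT INTO tableName VALUES (?, ?)", [
    ("2017-08-22", 1.0), ("2017-08-21", 1.0), ("2017-08-20", 1.0),
    ("2017-08-20", 2.0),  # duplicate entry for the same date (Edit2)
    ("2017-08-19", 1.0), ("2017-08-18", 1.0), ("2017-08-17", 1.0),
    ("2017-08-15", 1.0), ("2017-08-14", 1.0),  # 2017-08-16 is missing
])
for row in grouped_dates(conn):
    print(row)
```

With these rows, the first group is 2017-08-22 to 2017-08-19 and the second is 2017-08-18 to 2017-08-14, skipping the missing 2017-08-16, as in the question's Group1 and Group2.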

Stop query from skipping over null values

I have a query that shows me the number of calls per day for the last 14 days within my app.
The query:
SELECT count(id) as count, DATE(FROM_UNIXTIME(timestamp)) as date FROM calls GROUP BY DATE(FROM_UNIXTIME(timestamp)) DESC LIMIT 14
On days where there were 0 calls, this query does not show those days. Rather than skip those days, I'd like to have a 0 or NULL in that spot.
Any ideas for how I can achieve this? If you have any questions as to what I'm asking please let me know.
Thanks
I don't believe your query is "skipping over NULL values", as your title suggests. Rather, your data probably looks something like this:
id | timestamp
----+------------
1 | 2014-01-01
2 | 2014-01-02
3 | 2014-01-04
As a result, there are no rows that contain the missing date, so there are no rows to be counted. The answer is that you need to generate a list of all the dates you want and then do a LEFT or RIGHT JOIN to it.
Unfortunately, MySQL doesn't make this as easy as some other databases. Before MySQL 8.0, which added recursive common table expressions (WITH RECURSIVE), there was no effective way of generating a list of values inline, so you'll need some sort of table.
I think I would create a static table containing a set of integers to be subtracted from the current date. Then you can use this table to generate your list of dates inline and JOIN to it.
CREATE TABLE days_ago_list (days_ago INTEGER);
INSERT INTO days_ago_list VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13)
;
Then:
SELECT COUNT(id), list_date
FROM (SELECT SUBDATE(CURDATE(), days_ago) AS list_date FROM days_ago_list) dates_to_list
LEFT JOIN (SELECT id, DATE(FROM_UNIXTIME(timestamp)) call_date FROM calls) calls_with_date
ON calls_with_date.call_date = dates_to_list.list_date
GROUP BY list_date
It is very important that you group by list_date; call_date will be NULL for any days without calls. It is also important to COUNT on id since NULL ids will not be counted. (That ensures you get a correct count of 0 for days with no calls.) If you need to change the dates listed, you simply update the table containing the integer list.
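Here is a sketch of the same list-and-join technique using Python's sqlite3 module. SQLite's date(x, '-N days') and date(ts, 'unixepoch') stand in for MySQL's SUBDATE() and FROM_UNIXTIME(). A fixed reference date (the :today parameter) replaces CURDATE() so that the output is reproducible. The call timestamps are hypothetical. date(ts, 'unixepoch') interprets timestamps in UTC, whereas MySQL's FROM_UNIXTIME uses the session time zone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, timestamp INTEGER)")
conn.execute("CREATE TABLE days_ago_list (days_ago INTEGER)")
conn.executemany("INSERT INTO days_ago_list VALUES (?)", [(n,) for n in range(14)])

# Hypothetical calls (Unix seconds, UTC): two on 2014-01-14, one on 2014-01-12.
conn.executemany("INSERT INTO calls (timestamp) VALUES (?)",
                 [(1389700800,), (1389704400,), (1389528000,)])

sql = """
    SELECT dates_to_list.list_date, COUNT(calls_with_date.id) AS n_calls
    FROM (SELECT date(:today, '-' || days_ago || ' days') AS list_date
          FROM days_ago_list) AS dates_to_list
    LEFT JOIN (SELECT id, date(timestamp, 'unixepoch') AS call_date
               FROM calls) AS calls_with_date
      ON calls_with_date.call_date = dates_to_list.list_date
    -- Group on the generated date; call_date is NULL on days without calls,
    -- and COUNT(id) ignores those NULLs, giving 0.
    GROUP BY dates_to_list.list_date
    ORDER BY dates_to_list.list_date DESC
"""
rows = conn.execute(sql, {"today": "2014-01-14"}).fetchall()
for list_date, n_calls in rows[:3]:
    print(list_date, n_calls)
# 2014-01-14 2
# 2014-01-13 0
# 2014-01-12 1
```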
Alternatively, if this is for a web application, you could generate the list of dates in application code and match the counts to the dates after the query runs. This would make your web app logic somewhat more complicated, but it would simplify the query and eliminate the need for the extra table.
Create a table that contains a row for each date you want to ensure is in the results, and LEFT OUTER JOIN it to the results of your current query. Select the date from that table together with the count from your query, using 0 when the count is NULL (for example with COALESCE).
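The sketch below follows this approach using Python's sqlite3 module. Instead of a stored table, it builds the list of wanted dates with a recursive CTE; SQLite and MySQL 8.0+ both support WITH RECURSIVE, and this substitution is an assumption, not part of the answer. The original per-day count is kept as a subquery, and COALESCE turns a missing count into 0. The date range and timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (id INTEGER PRIMARY KEY, timestamp INTEGER)")
# Hypothetical calls (Unix seconds, UTC): two on 2014-01-14, one on 2014-01-12.
conn.executemany("INSERT INTO calls (timestamp) VALUES (?)",
                 [(1389700800,), (1389704400,), (1389528000,)])

sql = """
    WITH RECURSIVE wanted_dates(d) AS (
        -- One row per date from :start to :end inclusive.
        SELECT date(:start)
        UNION ALL
        SELECT date(d, '+1 day') FROM wanted_dates WHERE d < date(:end)
    ),
    daily_counts AS (
        -- The original per-day count, which has no row for days without calls.
        SELECT date(timestamp, 'unixepoch') AS call_date, COUNT(id) AS n
        FROM calls
        GROUP BY call_date
    )
    SELECT w.d, COALESCE(c.n, 0) AS n_calls
    FROM wanted_dates AS w
    LEFT OUTER JOIN daily_counts AS c ON c.call_date = w.d
    ORDER BY w.d
"""
rows = conn.execute(sql, {"start": "2014-01-11", "end": "2014-01-14"}).fetchall()
print(rows)
# [('2014-01-11', 0), ('2014-01-12', 1), ('2014-01-13', 0), ('2014-01-14', 2)]
```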

Find max value among different tables, grouped by date

I have 4 tables for different dates; the tables look like this:
What I'm trying to do is find the maximum tps for each (service_name, function_name) pair across all four days, per hour. For example, in the figure I posted, the first row has service_name BatchItemService with function_name getItemAvailability at date 13-06-12 01. The same service_name and function_name appear in the other 3 tables for the same hour "01", but on different days (13, 14, 15). I want to find the maximum tps for this service_name/function_name pair for hour "01" across all four days.
I tried this, but it gives me incorrect results:
SELECT
t.service_name,
t.function_name,
t.date,
max(t.tps)
FROM
(SELECT
service_name, function_name, date, tps
FROM
trans_per_hr_2013_06_12
UNION ALL
SELECT
service_name, function_name, date,tps
FROM
trans_per_hr_2013_06_13
GROUP BY service_name,function_name,date
UNION ALL
SELECT
service_name, function_name,date, tps
FROM
trans_per_hr_2013_06_14
UNION ALL
SELECT
service_name, function_name, date, tps
FROM
trans_per_hr_2013_06_15
UNION ALL
SELECT
service_name, function_name,date, tps
FROM
trans_per_hr_2013_06_16
) t
GROUP BY t.service_name,t.function_name,hour(t.Date);
Thanks a lot...
Your query looks like it should be returning what you want.
One possible issue is the type of the date column. As shown in the output, it looks like it might be stored as a character string rather than a date. If so, the following would work for the GROUP BY clause (assuming the format is as shown: YY-MM-DD HH, with the hour in the last two characters).
GROUP BY t.service_name,t.function_name, right(t.Date, 2);
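The sketch below applies this fix using Python's sqlite3 module, where substr(x, -2) stands in for MySQL's RIGHT(x, 2). For brevity it uses only two of the daily tables, and the rows are made up. It selects the extracted hour instead of t.date. A non-aggregated date column in a query grouped by hour would, in MySQL, report an arbitrary one of the grouped days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for day in ("12", "13"):
    conn.execute(f"CREATE TABLE trans_per_hr_2013_06_{day} "
                 "(service_name TEXT, function_name TEXT, date TEXT, tps REAL)")
# Hypothetical rows; the date strings use the YY-MM-DD HH layout from the question.
conn.executemany("INSERT INTO trans_per_hr_2013_06_12 VALUES (?, ?, ?, ?)", [
    ("BatchItemService", "getItemAvailability", "13-06-12 01", 4.0),
    ("BatchItemService", "getItemAvailability", "13-06-12 02", 9.0),
])
conn.executemany("INSERT INTO trans_per_hr_2013_06_13 VALUES (?, ?, ?, ?)", [
    ("BatchItemService", "getItemAvailability", "13-06-13 01", 7.0),
    ("BatchItemService", "getItemAvailability", "13-06-13 02", 3.0),
])

sql = """
    SELECT t.service_name, t.function_name, substr(t.date, -2) AS hr, MAX(t.tps)
    FROM (
        SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_12
        UNION ALL
        SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_13
    ) AS t
    -- Group on the hour only, so the same hour on different days is compared.
    GROUP BY t.service_name, t.function_name, hr
    ORDER BY hr
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```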
As Bohemian says in the comment, this is not a good data structure. You have parallel tables, and you are storing the date both in the table name and in a column. You should learn about table partitioning, which lets you store different days in different files while MySQL still treats them as one table. It would probably make this data much simpler to work with.