I have 4 tables for different dates; the tables look like this:
What I'm trying to do is find the maximum tps for each service_name, function_name pair among all four days, by hour. For example, in the figure I posted there is a service_name (BatchItemService) in the first row that has (getItemAvailability) as function_name at date 13-06-12 01. I have the same service_name and function_name in all the other 3 tables for the same hour "01" but on different days, like day 13, 14, 15. I want to find the maximum tps for this service_name, function_name pair for hour "01" among all four days.
I tried this, but it gives me an incorrect result.
SELECT
    t.service_name,
    t.function_name,
    t.date,
    MAX(t.tps)
FROM
    (SELECT service_name, function_name, date, tps
     FROM trans_per_hr_2013_06_12
     UNION ALL
     SELECT service_name, function_name, date, tps
     FROM trans_per_hr_2013_06_13
     GROUP BY service_name, function_name, date
     UNION ALL
     SELECT service_name, function_name, date, tps
     FROM trans_per_hr_2013_06_14
     UNION ALL
     SELECT service_name, function_name, date, tps
     FROM trans_per_hr_2013_06_15
     UNION ALL
     SELECT service_name, function_name, date, tps
     FROM trans_per_hr_2013_06_16
    ) t
GROUP BY t.service_name, t.function_name, HOUR(t.date);
Thanks a lot...
Your query looks like it should be returning what you want.
One possible issue is the type of the date column. As shown in the output, it looks like the date might be stored as a character string rather than a date. If so, the following would work for the GROUP BY clause (assuming the format is as shown, DD-MM-YY H):
GROUP BY t.service_name,t.function_name, right(t.Date, 2);
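For reference, the full query might then look like this (a sketch, under the same assumption that date is a 'DD-MM-YY H' string whose last two characters are the hour; the stray GROUP BY inside the second UNION branch is dropped):
SELECT
    t.service_name,
    t.function_name,
    RIGHT(t.date, 2) AS hr,
    MAX(t.tps)
FROM
    (SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_12
     UNION ALL
     SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_13
     UNION ALL
     SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_14
     UNION ALL
     SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_15
     UNION ALL
     SELECT service_name, function_name, date, tps FROM trans_per_hr_2013_06_16
    ) t
GROUP BY t.service_name, t.function_name, RIGHT(t.date, 2);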
As Bohemian says in the comment, this is not a good data structure. You have parallel tables, and you are storing the date both in the table name and in a column. You should learn about table partitioning: it lets you store different days in different files while MySQL still interprets them as one table. It would probably greatly simplify working with this data; a sketch follows.
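A minimal sketch of what that could look like (column types and partition boundaries are assumptions, inferred from the query above):
-- Sketch: one table partitioned by day instead of one table per day.
CREATE TABLE trans_per_hr (
    service_name  VARCHAR(100),
    function_name VARCHAR(100),
    date          DATETIME,
    tps           INT,
    KEY (service_name, function_name, date)
)
PARTITION BY RANGE (TO_DAYS(date)) (
    PARTITION p20130612 VALUES LESS THAN (TO_DAYS('2013-06-13')),
    PARTITION p20130613 VALUES LESS THAN (TO_DAYS('2013-06-14')),
    PARTITION p20130614 VALUES LESS THAN (TO_DAYS('2013-06-15')),
    PARTITION p20130615 VALUES LESS THAN (TO_DAYS('2013-06-16')),
    PARTITION p20130616 VALUES LESS THAN (TO_DAYS('2013-06-17'))
);

-- The max-per-hour query then needs no UNION ALL:
SELECT service_name, function_name, HOUR(date), MAX(tps)
FROM trans_per_hr
GROUP BY service_name, function_name, HOUR(date);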
Noobie to SQL. I have a simple query here that returns 70 million rows, and my work laptop can't handle that volume when I import it into Tableau. Usually 20 million rows or fewer works fine. Here's my problem.
Table name: Table1
Fields: UniqueID, State, Date, claim_type
Query:
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
This gives me what I want, BUT I can limit the query significantly if I count only the uniqueIDs that have been used in 3 or more different states. I use this query to do that.
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '11-09-2021'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
The only issue is, when I put this query into Tableau it only displays the FIRST state a unique_id showed up in, and the first date it showed up. A unique_id shows up in multiple states over multiple dates, so when I use this count aggregation it's only giving me the first result and not the whole picture.
Any ideas here? I am totally lost and have spent a whole business day trying to fix this.
Expected output would be something like
uniqueID | state    | claim type | Date
---------+----------+------------+-----------
123      | Ohio     | C          | 01-01-2021
123      | Nebraska | I          | 02-08-2021
123      | Georgia  | D          | 03-08-2021
If your table has only those four columns, and your queries filter on date ranges, you need an index to help optimize that. If 70 million records exist, how far back does that go... years? If your data since 2021-09-11 is only, say, 30k records, that should be all you are blowing through for your results.
I would ensure you have an index based on (and in this order)
(date, uniqueID, claim_type, states). Also, you mentioned you wanted a count of 3 OR MORE; your query's > 3 will return 4 or more unless you change it to COUNT(DISTINCT states) >= 3.
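For example (a sketch; the index name is arbitrary, and the table/column names are taken from the question):
ALTER TABLE table1
    ADD INDEX idx_date_id_claim_states (date, uniqueID, claim_type, states);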
Then, to get the entries you care about, you need
SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
GROUP BY date, uniqueID, claim_type
HAVING COUNT(DISTINCT states) >= 3
This would give just the 3-part qualifier for date/id/claim that HAD them. Then you would use THIS result set to get the other entries via
SELECT DISTINCT
    t1.date, t1.uniqueID, t1.claim_type, t1.states
FROM
    ( SELECT date, uniqueID, claim_type
      FROM table1
      WHERE date >= '2021-09-11'
      GROUP BY date, uniqueID, claim_type
      HAVING COUNT(DISTINCT states) >= 3 ) PQ
JOIN table1 t1
    ON  PQ.date = t1.date
    AND PQ.uniqueID = t1.uniqueID
    AND PQ.claim_type = t1.claim_type
The "PQ" (preQuery) gets the qualified records. Then it joins back to the original table and grabs all records that qualified from the unique date/id/claim_type and returns all the states.
Yes, you are grouping rows, so you 'lose' information in the grouped result.
You won't get 70m records with your grouped query.
Why don't you split your imports into smaller chunks? Like limiting the rows to chunks of, say, 15m:
1st:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000;
2nd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 15000000;
3rd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 30000000;
and so on..
I know it's not a perfect or very handy solution, but maybe it gets you to the desired outcome. One caveat: LIMIT ... OFFSET without an ORDER BY does not guarantee a stable row order, so rows can repeat or go missing between chunks; see the sketch below the link.
See this link for info about LIMIT and OFFSET:
https://www.bitdegree.org/learn/mysql-limit-offset
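A minimal sketch of a deterministic chunk (assuming uniqueID plus date gives a stable enough ordering):
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
ORDER BY uniqueID, date
LIMIT 15000000 OFFSET 15000000;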
It is wise in the long run to use the DATE datatype. That requires dates to look like '2021-09-11', not '09-11-2021'. That will let > correctly compare dates that are in two different years.
If your data is coming from some source that formats it '11-09-2021', use STR_TO_DATE() to convert it as it goes in; you can reconstruct that format on output via DATE_FORMAT().
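For example (assuming, as noted below, that the ambiguous '11-09-2021' is DD-MM-YYYY):
SELECT STR_TO_DATE('11-09-2021', '%d-%m-%Y');  -- 2021-09-11
SELECT DATE_FORMAT('2021-09-11', '%d-%m-%Y');  -- 11-09-2021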
Once you have done that, we can talk about optimizing
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '2021-09-11'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
Tentatively I recommend this composite index to speed up the query:
INDEX(Unique_id, claim_type, date, states)
That will also help with your other query.
(I am assuming the ambiguous '11-09-2021' is DD-MM-YYYY.)
I have this fact table here. Using this table, I would like to group by year and get the total number of patients that have PatientType_id = 1101.
Example:
2012 5
2012 8
The Date_DateKey is actually the date 2012-03-14. I've managed to list the total patients with type ID 1101 for a single year, but I don't know how to list it for all the years. Could you give me some hints please?
And here's the Date dimension
Normally, a key column in a fact table would reference another table. So, you should have a date/calendar table somewhere with information like the year. That would be the proper way to get this information; a sketch follows.
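A sketch of that approach (the dimension table and its column names are assumptions, since only the fact table was shown):
SELECT d.year, COUNT(DISTINCT t.patient_id)
FROM table t
JOIN dim_date d ON d.date_datekey = t.date_datekey  -- hypothetical date dimension
WHERE t.PatientType_id = 1101
GROUP BY d.year;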
I discourage you from parsing key values in general. In this case, with the information you have provided, it seems to be the only solution:
select floor(date_datekey / 10000) as year, count(distinct patient_id)
from table t
where PatientType_id = 1101
group by floor(date_datekey / 10000)
Try this:
SELECT LEFT(Date_DateKey, 4) AS Year, COUNT(patient_id) FROM Table WHERE PatientType_id = 1101 GROUP BY LEFT(Date_DateKey, 4)
Say I have this .csv file which holds data describing sales of a product. Now say I want a monthly breakdown of the number of sales; I mean I wanna see how many orders were received in JAN2005, FEB2005...JAN2008, FEB2008...NOV2012, DEC2012.
Now one very simple way I can think of is to count them one by one, like this (BTW I am using logparser to run my queries):
logparser -i:csv -o:csv "SELECT COUNT(*) AS NumberOfSales INTO 'C:\Users\blah.csv' FROM 'C:\User\whatever.csv' WHERE OrderReceiveddate LIKE '%JAN2005%'"
My question is whether there is a smarter way to do this. I mean, instead of changing the month again and again and re-running my query, can I write one query which produces the result in one Excel file all at once?
Yes.
If you add a GROUP BY clause to the statement, then the SQL will return a separate count for each unique value of the GROUP BY column.
So if you write:
SELECT OrderReceiveddate, COUNT(*) AS NumberOfSales INTO 'C:\Users\blah.csv'
FROM 'C:\User\whatever.csv' GROUP BY OrderReceiveddate
you will get results like:
JAN2005 12
FEB2005 19
MAR2005 21
Assuming OrderReceiveddate is a date, you would format the date as year and month and then aggregate:
SELECT date_format(OrderReceiveddate, '%Y-%m') as YYYYMM, COUNT(*) AS NumberOfSales
INTO 'C:\Users\blah.csv'
FROM 'C:\User\whatever.csv'
WHERE OrderReceiveddate >= '2015-01-01'
GROUP BY date_format(OrderReceiveddate, '%Y-%m')
ORDER BY YYYYMM
You don't want to use LIKE on a date column; LIKE expects string arguments. Use date functions instead.
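If instead the column really is a string like 'JAN2005', a MySQL-side sketch for deriving a sortable year-month (assumptions: MySQL rather than logparser, a stand-in table name sales, and that %b matches abbreviated month names like JAN):
SELECT DATE_FORMAT(STR_TO_DATE(CONCAT('01', OrderReceiveddate), '%d%b%Y'), '%Y-%m') AS YYYYMM,
       COUNT(*) AS NumberOfSales
FROM sales
GROUP BY YYYYMM
ORDER BY YYYYMM;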
I have one table with four fields:
trip_paramid, creation_time, fuel_content, vehicle_id
I want to find the difference between two rows. In my table I have one field, fuel_content. Every two minutes I receive a packet and insert it into the database. From this I want to find the total refuel quantity: if the fuel content difference between two packets is greater than 2, I treat it as a refuel. Multiple refuels may happen in the same day, so I want to find the total refuel quantity per day for a vehicle. I created a table schema & sample data in SQLFiddle; can anyone help me find a solution? Here is the link for the table schema: http://www.sqlfiddle.com/#!2/4cf36
Here is a good query.
Parameters (vehicle_id=13) and (date='2012-11-08') are hard-coded in the query, but they are meant to be changed as needed.
Note that I chose an expression using creation_time >= ... AND creation_time < ... instead of DATE(creation_time) = '...': the first expression can use an index on creation_time, while the second cannot.
SELECT
SUM(fuel_content-prev_content) AS refuel_tot
, COUNT(*) AS refuel_nbr
FROM (
SELECT
p.trip_paramid
, fuel_content
, creation_time
, (
SELECT ps.fuel_content
FROM trip_parameters AS ps
WHERE (ps.vehicle_id=p.vehicle_id)
AND (ps.trip_paramid<p.trip_paramid)
ORDER BY trip_paramid DESC
LIMIT 1
) AS prev_content
FROM trip_parameters AS p
WHERE (p.vehicle_id=13)
AND (creation_time>='2012-11-08')
AND (creation_time<DATE_ADD('2012-11-08', INTERVAL 1 DAY))
ORDER BY p.trip_paramid
) AS log
WHERE (fuel_content-prev_content)>2
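On MySQL 8.0+, a window-function sketch of the same idea (same table and parameters; LAG() replaces the correlated subquery, though here the first packet of the day has no previous row, while the subquery version can look back past midnight):
SELECT
    SUM(fuel_content - prev_content) AS refuel_tot,
    COUNT(*) AS refuel_nbr
FROM (
    SELECT
        fuel_content,
        LAG(fuel_content) OVER (PARTITION BY vehicle_id
                                ORDER BY trip_paramid) AS prev_content
    FROM trip_parameters
    WHERE vehicle_id = 13
      AND creation_time >= '2012-11-08'
      AND creation_time < DATE_ADD('2012-11-08', INTERVAL 1 DAY)
) AS diffs
WHERE (fuel_content - prev_content) > 2;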
Test it:
select sum(t2.fuel_content-t1.fuel_content) TotalFuel,t1.vehicle_id,t1.trip_paramid as rowIdA,
t2.trip_paramid as rowIdB,
t1.creation_time as timeA,
t2.creation_time as timeB,
t2.fuel_content fuel2,
t1.fuel_content fuel1,
(t2.fuel_content-t1.fuel_content) diffFuel
from trip_parameters t1, trip_parameters t2
where t1.trip_paramid<t2.trip_paramid
and t1.vehicle_id=t2.vehicle_id
and t1.vehicle_id=13
and t2.fuel_content-t1.fuel_content>2
order by rowIdA,rowIdB
where (rowIdA, rowIdB) are all possible tuples without repetition, diffFuel is the difference in fuel quantity, and TotalFuel is the sum of all refuel quantities.
The query compares all fuel content differences for the same vehicle (in this example, the vehicle with id=13) and only sums the fuel quantity when the difference is > 2.
Regards.
Is there a function to find the average time difference, in the standard time format, in MySQL?
You can use timestampdiff to find the difference between two times.
I'm not sure what you mean by "average," though. Average across the table? Average across a row?
If it's the table or a subset of rows:
select
avg(timestampdiff(SECOND, startTimestamp, endTimestamp)) as avgdiff
from
table
The avg function works like any other aggregate function, and will respond to group by. For example:
select
col1,
avg(timestampdiff(SECOND, startTimestamp, endTimestamp)) as avgdiff
from
table
group by col1
That will give you the average differences for each distinct value of col1.
Hopefully this gets you pointed in the right direction!
What I like to do is a
SELECT count(*), AVG(TIME_TO_SEC(TIMEDIFF(end,start)))
FROM
table
Gives the number of rows as well...
In order to get actual averages in the standard time format from mysql I had to convert to seconds, average, and then convert back:
SEC_TO_TIME(AVG(TIME_TO_SEC(TIMEDIFF(timeA, timeB))))
If you don't convert to seconds, you get an odd decimal representation of the minutes that doesn't really make any sense (to me).
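A small runnable demonstration (the TIME values get cast to a numeric hhmmss form when averaged directly):
-- Durations 00:00:30 and 00:01:30; the true average is 00:01:00.
SELECT
    AVG(t) AS raw_avg,                          -- 80.0: average of 30 and 130 as hhmmss numbers
    SEC_TO_TIME(AVG(TIME_TO_SEC(t))) AS fixed   -- 00:01:00
FROM (
    SELECT TIMEDIFF('00:01:00', '00:00:30') AS t
    UNION ALL
    SELECT TIMEDIFF('00:02:00', '00:00:30')
) AS d;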
I was curious whether AVG() is accurate, the way row counts are sometimes shown as mere approximations in client tools ("this value is an approximation"). After all, let's review the average formula: average = sum / count. So knowing that the count is accurate is really important for this formula!
After testing multiple combinations, AVG() definitely seems to work and is a great approach. You can verify it yourself with...
SELECT
COUNT(id) AS count,
AVG(TIMESTAMPDIFF(SECOND, OrigDateTime, LastDateTime)) AS avg_average,
SUM(TIMESTAMPDIFF(SECOND, OrigDateTime, LastDateTime)) / (select COUNT(id) FROM yourTable) as calculated_average,
AVG(TIME_TO_SEC(TIMEDIFF(LastDateTime,OrigDateTime))) as timediff_average,
SEC_TO_TIME(AVG(TIME_TO_SEC(TIMEDIFF(LastDateTime, OrigDateTime)))) as date_display
FROM yourTable
Sample Results:
count: 441000
avg_average: 5045436.4376
calculated_average: 5045436.4376
timediff_average: 5045436.4376
date_display: 1401:30:36
Seems to be pretty accurate!
This will return:
count: The count.
avg_average: The average based on AVG(). (Thanks to Eric for their answer on this!)
calculated_average: The average based on SUM()/COUNT().
timediff_average: The average based on TIMEDIFF(). (Thanks to Andrew for their answer on this!)
date_display: A nicely-formatted display version. (Thanks to C S for their answer on this!)