I have the following two queries, and their outcomes differ. What I want is described below.
I have two tables:
Subject:
-subject_id (Primary key)
-about
-details
feedback:
-id (Primary Key)
-subject_id (Foreign key)
-rating
-DateAndTime
Following are the queries and their result in words:
SELECT distinct about, details, subject.subject_id, round(AVG(rating),2) as Rating,
Max(DATE_FORMAT( DateAndTime, '%d-%m-%Y' )) as Date,
Max(TIME_FORMAT( DateAndTime, '%h:%i:%s' )) as Time
FROM `subject` , `feedback`
WHERE subject.Subject_ID = feedback.Subject_ID
GROUP BY about,details,subject_id
ORDER BY DateAndTime DESC
In this query the output has unique about, details, and subject_id columns plus the average rating. The problem is with the date and time. I want the last date and time entered for each result; the result does contain them, but the rows are not in order.
When I run this query the order is perfect, but the ratings change:
SELECT distinct about, details, subject.subject_id, round(AVG(rating),2) as Rating,
Max(DATE_FORMAT( DateAndTime, '%d-%m-%Y' )) as Date,
Max(TIME_FORMAT( DateAndTime, '%h:%i:%s' )) as Time
FROM `subject` , `feedback`
WHERE subject.Subject_ID = feedback.Subject_ID
GROUP BY about,details,subject_id,dateandtime
ORDER BY DateAndTime,Time DESC
The only difference is in the GROUP BY clause.
Could anyone help me, please?
When you put the day first in the date format, MAX() will select the date with the highest day number, which isn't necessarily the most recent date. For instance, 26-10-2016 is higher than 21-11-2017 because 26 is higher than 21. To order by a formatted date, it has to be in %Y-%m-%d format.
And when you select the maximum time after formatting, you're getting the highest time ignoring the date entirely.
Instead of applying MAX() to the result of DATE_FORMAT, get the maximum DateAndTime and format that as desired:
DATE_FORMAT(MAX(DateAndTime), '%d-%m-%Y') AS Date,
DATE_FORMAT(MAX(DateAndTime), '%h:%i:%s') AS Time
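To order the results by the displayed date and time, use:
ORDER BY STR_TO_DATE(CONCAT(Date, ' ', Time), '%d-%m-%Y %h:%i:%s') DESC
Because %h is a 12-hour value, ordering by MAX(DateAndTime) DESC directly is simpler and avoids mixing up morning and afternoon times.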
And you shouldn't have DateAndTime in the GROUP BY, because then you'll get a separate group for each time. You need to combine all the times into a single group so you can then get the last value for it with MAX(DateAndTime).
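The difference between formatting before and after taking the maximum can be reproduced in a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, so strftime() stands in for DATE_FORMAT(), and the two feedback rows are made-up sample data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; strftime() plays the role of DATE_FORMAT().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (subject_id INTEGER, rating INTEGER, DateAndTime TEXT)")
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?)",
    [(1, 4, "2016-10-26 09:00:00"), (1, 2, "2017-11-21 08:30:00")],  # hypothetical rows
)

row = conn.execute(
    """
    SELECT MAX(strftime('%d-%m-%Y', DateAndTime)),  -- format first: MAX compares strings
           MAX(strftime('%H:%M:%S', DateAndTime)),  -- highest clock time on any day
           strftime('%d-%m-%Y', MAX(DateAndTime)),  -- MAX first, then format
           strftime('%H:%M:%S', MAX(DateAndTime))
    FROM feedback
    GROUP BY subject_id
    """
).fetchone()

format_then_max = (row[0], row[1])
max_then_format = (row[2], row[3])
print(format_then_max)   # date and time taken from different rows
print(max_then_format)   # date and time of the most recent row
```

Formatting first pairs the date of the 2016 row with a time chosen independently of it, while taking MAX(DateAndTime) first keeps the date and time of the same, most recent row.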
Noobie to SQL. I have a simple query here that is 70 million rows, and my work laptop will not handle the capacity when I import it into Tableau. Usually 20 million rows and less seem to work fine. Here's my problem.
Table name: Table1
Fields: UniqueID, State, Date, claim_type
Query:
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
This gives me what I want, BUT, I can limit the query significantly if I count the number of uniqueIDs that have been used in 3 or more different states. I use this query to do that.
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '11-09-2021'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
The only issue is, when I put this query into Tableau it only displays the FIRST state a unique_id showed up in, and the first date it showed up. A unique_id shows up in multiple states over multiple dates, so when I use this count aggregation it's only giving me the first result and not the whole picture.
Any ideas here? I am totally lost and spent a whole business day trying to fix this
Expected output would be something like
uniqueID | state | claim type | Date
123 Ohio C 01-01-2021
123 Nebraska I 02-08-2021
123 Georgia D 03-08-2021
If your table has only those four columns and your queries are based on date ranges, you need an index to optimize that. If 70 million records exist, how far back does that go? Years? If your data since 2021-09-11 is only, say, 30k records, that should be all you are scanning for your results.
I would ensure you have an index on (in this order)
(date, uniqueId, claim_type, states). Also, you mentioned you wanted a count of 3 OR MORE; your query's > 3 will result in 4 or more unless you change it to count(distinct states) >= 3.
Then, to get the entries you care about, you need
SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3
This would give just the 3-part qualifier for date/id/claim that HAD them. Then you would use THIS result set to get the other entries via
select distinct
date, uniqueID, claim_type, states
from
( SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3 ) PQ
JOIN Table1 t1
on PQ.date = t1.date
and PQ.UniqueID = t1.UniqueID
and PQ.Claim_Type = t1.Claim_Type
The "PQ" (preQuery) gets the qualified records. Then it joins back to the original table and grabs all records that qualified from the unique date/id/claim_type and returns all the states.
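As a rough illustration of the pre-query-then-join-back pattern, here is a sketch using Python's sqlite3 module with made-up rows. It assumes the intent is "IDs seen in three or more states, regardless of date or claim type", so the pre-query groups by uniqueID alone; that grouping is my assumption and differs from the date/uniqueID/claim_type grouping in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (uniqueID INTEGER, states TEXT, claim_type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        # hypothetical sample data
        (123, "Ohio", "C", "2021-10-01"),
        (123, "Nebraska", "I", "2021-10-08"),
        (123, "Georgia", "D", "2021-11-08"),
        (456, "Ohio", "C", "2021-10-02"),
        (456, "Ohio", "I", "2021-10-03"),
    ],
)

rows = conn.execute(
    """
    SELECT t1.uniqueID, t1.states, t1.claim_type, t1.date
    FROM (
        -- pre-query: IDs that appear in at least 3 distinct states
        SELECT uniqueID
        FROM table1
        WHERE date >= '2021-09-11'
        GROUP BY uniqueID
        HAVING COUNT(DISTINCT states) >= 3
    ) PQ
    JOIN table1 t1 ON PQ.uniqueID = t1.uniqueID
    WHERE t1.date >= '2021-09-11'
    ORDER BY t1.date
    """
).fetchall()

for r in rows:
    print(r)
```

Only the rows for ID 123 come back, one per state, because 456 appears in a single state. The aggregation decides which IDs qualify, and the join restores the detail rows that GROUP BY collapsed.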
Yes, you are grouping rows, so you 'lose' information in the grouped result.
You won't get 70m records with your grouped query.
Why don't you split your imports in smaller chunks? Like limit the rows to chunks of, say 15m:
1st:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000;
2nd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 15000000;
3rd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 30000000;
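and so on.
I know it's not a perfect or very handy solution, but maybe it gets you to the desired outcome. Note that without an ORDER BY clause the order of rows is not guaranteed, so the chunks may overlap or skip rows between runs.
See this link for information about LIMIT and OFFSET: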
https://www.bitdegree.org/learn/mysql-limit-offset
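A minimal sketch of chunked reading, using Python's sqlite3 module and ten made-up rows in place of the real 70 million. The read_chunks helper and the chunk size are hypothetical. An ORDER BY is added so that successive LIMIT/OFFSET windows neither overlap nor skip rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (uniqueID INTEGER, states TEXT, claim_type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(i, "Ohio", "C", "2021-10-01") for i in range(10)],  # hypothetical rows
)

def read_chunks(conn, chunk_size):
    """Yield lists of rows, chunk_size at a time, in a stable order."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT uniqueID, states, claim_type, date FROM table1 "
            "WHERE date >= '2021-09-11' "
            "ORDER BY uniqueID, states, claim_type, date "  # stable order across chunks
            "LIMIT ? OFFSET ?",
            (chunk_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += chunk_size

chunks = list(read_chunks(conn, 4))
print([len(c) for c in chunks])
```

With large offsets the database still has to walk past all the skipped rows, so this is a workaround for import limits rather than a performance fix.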
It is wise in the long run to use the DATE datatype. That requires dates to look like '2021-09-11', not '11-09-2021'. That will let > correctly compare dates that are in two different years.
If your data is coming from some source that formats it as '11-09-2021', use STR_TO_DATE() to convert it as it goes in; you can reconstruct that format on output via DATE_FORMAT().
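To see why day-first strings compare badly, here is a small Python sketch. The to_iso helper is hypothetical and only mirrors what STR_TO_DATE() would do on the way in; the two dates are made up.

```python
from datetime import datetime

def to_iso(dmy: str) -> str:
    """Convert a 'DD-MM-YYYY' string to 'YYYY-MM-DD'."""
    return datetime.strptime(dmy, "%d-%m-%Y").strftime("%Y-%m-%d")

cutoff = "11-09-2021"  # 11 September 2021
later = "01-01-2022"   # 1 January 2022, clearly after the cutoff

# Comparing the raw strings looks at the day digits first.
string_says_later = later >= cutoff

# Comparing year-first strings (or real DATE values) orders them correctly.
iso_says_later = to_iso(later) >= to_iso(cutoff)

print(string_says_later, iso_says_later)
```

The raw string comparison rejects a date that is actually later, which is the same thing a WHERE date >= '11-09-2021' filter would do on a text column.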
Once you have done that, we can talk about optimizing
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '2021-09-11'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
Tentatively, I recommend this composite index to speed up the query:
INDEX(Unique_id, claim_type, date, states)
That will also help with your other query.
(I am assuming the ambiguous '11-09-2021' is DD-MM-YYYY.)
I have a date/time field in my table called startTime.
I would like the output as follows:
select
YEAR(startTime),
MONTH(startTime),
DAY(startTime),
dayofmonth(startTime),
startTime,
...
This is fine, and I only have to group by startTime.
However, for my output, I am only really interested in the date part of the startTime.
So I changed my output to be
select
YEAR(startTime),
MONTH(startTime),
DAY(startTime),
dayofmonth(startTime),
DATE(startTime),
...
When I try to run this, SQL makes me group by Year, Month, day, dayofmonth and date(startTime).
This seems to be a quirk of the date() function?
I thought maybe it's due to the time part of the startTime field, but Year, Month, Day and dayofmonth are no more granular than a date so I am confused as why I have to group by those.
Any insights greatly appreciated!
My code currently:
select
YEAR(startTime),
MONTH(startTime),
DAY(startTime),
dayofmonth(startTime),
date(startTime),
count(id)
from
bookings
group by 1, 2, 3, 4, 5
In a grouped query, the selected columns (taken together) must have a distinct value for every distinct value of whatever you group by.
For example, if you have startTime in both your SELECT clause and your GROUP BY clause, you get distinct values of the selected attributes (date, month, year, and startTime) for every unique value of startTime. But if you remove startTime from the SELECT clause, the selected attributes are no longer guaranteed to be unique.
Consider two startTime values, 2020-03-12T01:01:01.000UTC and 2020-03-12T02:02:02.000UTC. Grouping by startTime is expected to produce two rows (as there are two distinct startTimes), but the selected values for year, month, and date would be the same in both rows (as they differ only in the time part), which breaks the GROUP BY contract.
Hence the SELECT clause can only contain a set of attributes that MUST provide a different value for every combination of attributes in the GROUP BY clause.
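The effect of grouping by the full timestamp versus the date part can be sketched with Python's sqlite3 module. SQLite's date() stands in for MySQL's DATE(), and the three bookings are made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, startTime TEXT)")
conn.executemany(
    "INSERT INTO bookings (startTime) VALUES (?)",
    [
        # hypothetical rows: two bookings on the same day, one on the next
        ("2020-03-12 01:01:01",),
        ("2020-03-12 02:02:02",),
        ("2020-03-13 09:00:00",),
    ],
)

# Grouping by the full timestamp: one group per distinct startTime.
by_timestamp = conn.execute(
    "SELECT date(startTime), COUNT(id) FROM bookings "
    "GROUP BY startTime ORDER BY startTime"
).fetchall()

# Grouping by the date part: bookings on the same day share a group.
by_day = conn.execute(
    "SELECT date(startTime), COUNT(id) FROM bookings "
    "GROUP BY date(startTime) ORDER BY date(startTime)"
).fetchall()

print(by_timestamp)
print(by_day)
```

Grouping by startTime repeats the same date in two output rows, each with a count of 1, whereas grouping by the date expression gives one row per day with the combined count.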
I have a MySQL database named mydb in which I store daily share prices for
423 companies in a table named data. Table data has the following columns:
`epic`, `date`, `open`, `high`, `low`, `close`, `volume`
with epic and date together forming the composite primary key.
I update the data table each day using a csv file which would normally have 423 rows
of data all having the same date. However, on some days prices may not be available
for all 423 companies and data for a particular epic and date pair will
not be updated. In order to determine the missing pair I have resorted
to comparing a full list of epics against the incomplete list of epics using
two simple SELECT queries with different dates and then using a file comparator, thus
revealing the missing epic(s). This is not a very satisfactory solution and so far
I have not been able to construct a query that would identify any epics that
have not been updated for any particular day.
SELECT `epic`, `date` FROM `data`
WHERE `date` IN ('2019-05-07', '2019-05-08')
ORDER BY `epic`, `date`;
Produces pairs of values:
`epic` `date`
"3IN" "2019-05-07"
"3IN" "2019-05-08"
"888" "2019-05-07"
"888" "2019-05-08"
"AA." "2019-05-07"
"AAL" "2019-05-07"
"AAL" "2019-05-08"
Where in this case AA. has not been updated on 2019-05-08. The problem with this is that it is not easy to spot a value that is not a pair.
Any help with this problem would be greatly appreciated.
You could do a COUNT on epic with a GROUP BY epic for rows in that date range, then select from that result where UpdateCount is less than 2. Forgive me if the syntax on the column names is not correct; I work in SQL Server, but the logic of the query should still work for you.
SELECT x.epic
FROM
(
SELECT COUNT(*) AS UpdateCount, epic
FROM data
WHERE date IN ('2019-05-07', '2019-05-08')
GROUP BY epic
) AS x
WHERE x.UpdateCount < 2
Assuming you only want to check the last date uploaded, the following will return every item not updated on 2019-05-08:
SELECT last_updated.epic, last_updated.date
FROM (
SELECT `epic`, MAX(`date`) AS `date` FROM `data`
GROUP BY `epic`
) AS last_updated
WHERE last_updated.`date` <> '2019-05-08'
ORDER BY last_updated.`epic`
;
Or, for any upload date, the following compares against the entire table, so you don't rely on '2019-05-07' having a row for every epic. That is, if the epic has ever been in the table, it will show up when it has not been updated:
SELECT d.epic, max(d.date)
FROM data as d
WHERE d.epic NOT IN (
SELECT d2.epic
FROM data as d2
WHERE d2.date = '2019-05-08'
)
GROUP BY d.epic
ORDER BY d.epic
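Here is a small sketch of the NOT IN approach run against the sample pairs from the question, using Python's sqlite3 module in place of MySQL. Only the epic and date columns are modelled; the price columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (epic TEXT, date TEXT, PRIMARY KEY (epic, date))")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        ("3IN", "2019-05-07"), ("3IN", "2019-05-08"),
        ("888", "2019-05-07"), ("888", "2019-05-08"),
        ("AA.", "2019-05-07"),                      # no row for 2019-05-08
        ("AAL", "2019-05-07"), ("AAL", "2019-05-08"),
    ],
)

missing = conn.execute(
    """
    SELECT d.epic, MAX(d.date)          -- last date each missing epic was updated
    FROM data AS d
    WHERE d.epic NOT IN (
        SELECT d2.epic FROM data AS d2 WHERE d2.date = ?
    )
    GROUP BY d.epic
    ORDER BY d.epic
    """,
    ("2019-05-08",),
).fetchall()

print(missing)
```

The check date is passed as a parameter, so the same query can be reused after each daily upload.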
Not sure how this would work. I have a BETWEEN query, but how would I run a query to list results that match each and every day? For example, entries that exist on 2011-06-17, 2011-06-18, 2011-06-19 and 2011-06-20.
SELECT lookup, `loc`, `octect1` ,`octect2` ,`octect3` ,`octect4`, date, time, count(`lookup`) as count FROM `index`
WHERE date between '2011-06-17' AND '2011-06-20'
GROUP BY lookup
ORDER BY count DESC
Thanks
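Instead of BETWEEN, you can use comparison operators. Use >= and <= so that both end dates are included; strict > and < would leave out 2011-06-17 and 2011-06-20:
SELECT lookup, `loc`, `octect1` ,`octect2` ,`octect3` ,`octect4`,
date, time, count(`lookup`) as count
FROM `index`
WHERE date >= '2011-06-17' AND date <= '2011-06-20'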
GROUP BY lookup
ORDER BY count DESC
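A date range alone returns entries that appear on any day in the range. One common way to require every day is to count the distinct dates per lookup, sketched here with Python's sqlite3 module. The table name lookups and the rows are hypothetical, and the day count of 4 is hard-coded for this particular range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookups (lookup TEXT, date TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO lookups VALUES (?, ?)",
    [
        # 'a' appears on all four days (twice on the 18th)
        ("a", "2011-06-17"), ("a", "2011-06-18"), ("a", "2011-06-18"),
        ("a", "2011-06-19"), ("a", "2011-06-20"),
        # 'b' is missing the 19th
        ("b", "2011-06-17"), ("b", "2011-06-18"), ("b", "2011-06-20"),
    ],
)

every_day = conn.execute(
    """
    SELECT lookup
    FROM lookups
    WHERE date BETWEEN '2011-06-17' AND '2011-06-20'
    GROUP BY lookup
    HAVING COUNT(DISTINCT date) = 4   -- one distinct date per day in the range
    ORDER BY lookup
    """
).fetchall()

print(every_day)
```

COUNT(DISTINCT date) rather than COUNT(*) keeps repeated entries on one day from making up for a missing day.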
I have a table that has a serial number and the date and time when that serial number was modified. I would like to retrieve the latest date and time when a particular serial number was modified. Any suggestions? The dates and times are in different columns.
Thanks
Assuming the date and time columns are of types that MySQL knows how to sort correctly (i.e. DATE and TIME types), this should work:
SELECT * FROM table_name
ORDER BY date_col DESC, time_col DESC
LIMIT 1
It depends on how one can find the most recent record: can there be multiple rows with the same date and time?
Is the serial number monotonically increasing or decreasing?
Try using ORDER BY (DESC sorts from newest to oldest) and LIMIT to get what you want, e.g.
SELECT * FROM `table` ORDER BY `date` DESC, `time` DESC, `serial` DESC LIMIT 1
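A short sketch of the ORDER BY ... LIMIT 1 approach for one serial number, using Python's sqlite3 module. The table name mods, the column names, and the rows are all hypothetical. It also shows that DESC applies to each sort column separately, so every column that should sort newest-first needs its own DESC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mods (serial TEXT, date TEXT, time TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO mods VALUES (?, ?, ?)",
    [
        ("SN1", "2020-01-02", "08:00:00"),
        ("SN1", "2020-01-02", "17:30:00"),
        ("SN1", "2020-01-01", "23:59:00"),
        ("SN2", "2020-01-03", "10:00:00"),
    ],
)

# Newest date first, and within that date the newest time first.
latest = conn.execute(
    "SELECT serial, date, time FROM mods WHERE serial = ? "
    "ORDER BY date DESC, time DESC LIMIT 1",
    ("SN1",),
).fetchone()

# Without DESC on time, the earliest time on the newest date wins.
time_ascending = conn.execute(
    "SELECT serial, date, time FROM mods WHERE serial = ? "
    "ORDER BY date DESC, time LIMIT 1",
    ("SN1",),
).fetchone()

print(latest)
print(time_ascending)
```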