Strange SQL result order by date, count(date) - mysql

I'm currently seeing strange behaviour in an SQL query.
My table pointLog:
# | Column | Type
1 | date   | timestamp
2 | uid    | varchar(30)
3 | ssid   | varchar(40)
4 | reason | varchar(50)
5 | points | int(5)
The statement:
SELECT date, count(date) as anzahl FROM pointLog WHERE uid = 1 order by date desc
returns the following result:
date anzahl
2012-09-01 12:21:16 14
But the statement:
SELECT date FROM pointLog WHERE uid = 1 order by date desc
Returns
2012-09-02 12:44:08
as first result.
So my question: why am I not receiving 2012-09-02 as the first result in the first statement, which includes a count?
Thanks a lot!
Edit:
The count (anzahl) is currently only used to verify that there is at least one entry in this example. I know that should be tested in the program.
But my main problem is that I can't explain why I get two different dates when both queries sort by the same column (date). The only difference is the count. That shouldn't normally change the sorting, should it?
Final: the solution
SELECT MAX(date) AS maxdate, COUNT(date) AS anzahl FROM pointLog WHERE uid = 1

You're getting one result row: a count of the dates, preceded by an arbitrary date. The ORDER BY clause then orders that, which is a no-op because there's only one row of output.

As @prosfilaes answered, your query does not return deterministic results; the date returned is an arbitrary one.
If you want the latest date and the count, you can use:
SELECT
MAX(date) AS maxdate,
COUNT(date) AS anzahl
FROM
pointLog
WHERE
uid = 1 ;
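To make the difference concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, an assumption made only so the example runs without a server. The rows are invented; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pointLog (date TEXT, uid TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO pointLog (date, uid, points) VALUES (?, ?, ?)",
    [
        ("2012-09-01 12:21:16", "1", 5),
        ("2012-09-02 12:44:08", "1", 3),
        ("2012-08-30 09:00:00", "1", 1),
        ("2012-09-03 10:00:00", "2", 7),  # different uid, filtered out
    ],
)

# Without an aggregate: one output row per matching row, newest first.
plain_rows = conn.execute(
    "SELECT date FROM pointLog WHERE uid = '1' ORDER BY date DESC"
).fetchall()

# With aggregates and no GROUP BY: all matching rows collapse into one row,
# and MAX(date) states explicitly which date that row should carry.
agg_rows = conn.execute(
    "SELECT MAX(date) AS maxdate, COUNT(date) AS anzahl "
    "FROM pointLog WHERE uid = '1'"
).fetchall()

print(plain_rows[0][0])
print(agg_rows)
```

The aggregate query yields exactly one row, so an ORDER BY on it has nothing left to sort.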

You need a GROUP BY in your SQL statement, otherwise the COUNT() will not give separate totals for each date.

Optimizing a simple query with 70mil rows to fit into Tableau

Newbie to SQL. I have a simple query here that returns 70 million rows, and my work laptop cannot handle that volume when I import it into Tableau. Usually 20 million rows or fewer seems to work fine. Here's my problem.
Table name: Table1
Fields: UniqueID, State, Date, claim_type
Query:
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
This gives me what I want, BUT, I can limit the query significantly if I count the number of uniqueIDs that have been used in 3 or more different states. I use this query to do that.
SELECT uniqueID, COUNT(DISTINCT states), claim_type, date
FROM table1
WHERE date >= '11-09-2021'
GROUP BY uniqueID, claim_type, date
HAVING COUNT(DISTINCT states) > 3
The only issue is, when I put this query into Tableau it only displays the FIRST state a unique_id showed up in, and the first date it showed up. A unique_id shows up in multiple states over multiple dates, so when I use this count aggregation it's only giving me the first result and not the whole picture.
Any ideas here? I am totally lost and spent a whole business day trying to fix this
Expected output would be something like:
uniqueID | state    | claim type | Date
123      | Ohio     | C          | 01-01-2021
123      | Nebraska | I          | 02-08-2021
123      | Georgia  | D          | 03-08-2021
If your table has only those four columns, and your queries are based on date ranges, an index must exist to help optimize that. If 70 million records exist, how far back does that go... years? If your data since 2021-09-11 is only, say, 30k records, that should be all you are reading through for your results.
I would ensure you have an index on (and in this order)
(date, uniqueId, claim_type, states). Also, you mentioned you wanted a count of 3 OR MORE; your query's > 3 will result in 4 or more unless you change it to count(distinct states) >= 3.
Then, to get the entries you care about, you need
SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3
This returns only the date/uniqueID/claim_type combinations that qualified. Then you would use THIS result set to get the other entries via
select distinct
t1.date, t1.uniqueID, t1.claim_type, t1.states
from
( SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3 ) PQ
JOIN Table1 t1
on PQ.date = t1.date
and PQ.UniqueID = t1.UniqueID
and PQ.Claim_Type = t1.Claim_Type
The "PQ" (preQuery) gets the qualified records. Then it joins back to the original table and grabs all records that qualified from the unique date/id/claim_type and returns all the states.
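The expected output in the question lists one ID across different dates and claim types, which suggests the "3 or more states" test may be meant per uniqueID rather than per date/uniqueID/claim_type. Below is a minimal sketch of that variant of the join-back pattern, under that assumption. It uses Python's sqlite3 as a stand-in for MySQL, invented rows, and dates stored as YYYY-MM-DD text so that >= compares correctly.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (uniqueID INTEGER, states TEXT, claim_type TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 (uniqueID, states, claim_type, date) VALUES (?, ?, ?, ?)",
    [
        (123, "Ohio", "C", "2021-09-12"),
        (123, "Nebraska", "I", "2021-10-08"),
        (123, "Georgia", "D", "2021-11-08"),
        (456, "Ohio", "C", "2021-09-15"),   # only one state, does not qualify
        (456, "Ohio", "I", "2021-10-01"),
        (789, "Texas", "C", "2021-01-01"),  # before the cutoff date
    ],
)

# The derived table pq finds IDs seen in 3 or more distinct states;
# joining back to table1 returns every detail row for those IDs.
detail_rows = conn.execute(
    """
    SELECT t1.uniqueID, t1.states, t1.claim_type, t1.date
    FROM (
        SELECT uniqueID
        FROM table1
        WHERE date >= '2021-09-11'
        GROUP BY uniqueID
        HAVING COUNT(DISTINCT states) >= 3
    ) AS pq
    JOIN table1 AS t1 ON t1.uniqueID = pq.uniqueID
    WHERE t1.date >= '2021-09-11'
    ORDER BY t1.date
    """
).fetchall()

for row in detail_rows:
    print(row)
```

Grouping only by uniqueID keeps the state count independent of date and claim type, while the outer query still returns one row per original record.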
Yes, you are grouping rows, so you 'lose' information in the grouped result.
You won't get 70m records with your grouped query.
Why don't you split your imports into smaller chunks? For example, limit the rows to chunks of, say, 15m:
1st:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000;
2nd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 15000000;
3rd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 30000000;
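and so on.
I know it's not a perfect or very handy solution, but maybe it gets you to the desired outcome. Note that without an ORDER BY the row order is not guaranteed, so chunks can overlap or skip rows; adding a stable ORDER BY avoids that.
See this link for information about LIMIT and OFFSET: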
https://www.bitdegree.org/learn/mysql-limit-offset
It is wise in the long run to use the DATE datatype. That requires dates to look like '2021-09-11', not '09-11-2021'. That will let > correctly compare dates that fall in two different years.
If your data is coming from some source that formats it as '11-09-2021', use STR_TO_DATE() to convert it as it goes in; you can reconstruct that format on output via DATE_FORMAT().
Once you have done that, we can talk about optimizing
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '2021-09-11'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
Tentatively, I recommend this composite index to speed up the query:
INDEX(Unique_id, claim_type, date, states)
That will also help with your other query.
(I am assuming the ambiguous '11-09-2021' is DD-MM-YYYY.)

mysql date conversion returns null converting certain months

I have this query (take a look at the BETWEEN dates):
SELECT user_name, COUNT(*) AS 'COUNT'
FROM user_records
WHERE date_created between (STR_TO_DATE('11/24/2020','%m/%d/%y'))
and (STR_TO_DATE('12/26/2021','%m/%d/%y'))
GROUP BY user_name ;
The select is between dates:
startDate: (STR_TO_DATE('11/24/2020','%m/%d/%y'))
finishDate: (STR_TO_DATE('12/26/2021','%m/%d/%y'))
This query returns rows because there are records in the year 2020.
The problem appears when I change the month of the finishDate. I tried:
finishDate: (STR_TO_DATE('1/26/2021','%m/%d/%y')) = null
finishDate: (STR_TO_DATE('01/26/2021','%m/%d/%y')) = null
finishDate: (STR_TO_DATE('10/26/2021','%m/%d/%y')) = null
It just makes no sense... I'm using MySQL Community 8.0.20.
Since the problem only occurs with the finishDate, perhaps this could be helpful:
SELECT user_name, COUNT(*) AS 'COUNT'
FROM user_records
WHERE date_created between (STR_TO_DATE('11/24/2020','%m/%d/%y'))
and (DATE_ADD(STR_TO_DATE('11/24/2020','%m/%d/%y'), INTERVAL 367 DAY))
GROUP BY user_name ;
Of course, you should also check the MySQL server logs for relevant errors or warnings that could explain the problem with finishDate.
UPDATE, SOLUTION:
For some unknown reason my DB IDE shows dates in the format "$DAY/$MONTH/$YEAR" even if I insert them in the proper MySQL DATE format ("$YEAR-$MONTH-$DAY").
I got the following warnings:
And this is the final query that worked, although your solution worked as well:
SELECT user_name, COUNT(*) AS 'COUNT'
FROM user_records
WHERE date_created between '2020-11-24' AND '2021-01-24'
GROUP BY user_name ;
The problem with your query is the date format. Lowercase '%y' matches a two-digit year, so only the first two characters of 2021 are used for the year, and you get the wrong year. Uppercase '%Y' is the specifier for a four-digit year.
But that is not the real problem. You don't need str_to_date() at all. Just use properly formatted date literals.
Assuming that the dates are stored correctly as date data types, you can simply use:
SELECT user_name, COUNT(*) AS COUNT
FROM user_records
WHERE date_created between '2020-11-24' and '2021-12-26'
GROUP BY user_name ;
If date_created is stored as a string, then fix your data model so it is either a date or datetime. Dates should not be stored as strings.
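The '%y' versus '%Y' mismatch can be reproduced outside MySQL, because Python's datetime.strptime uses the same two specifiers. The sketch below is only an analogy: Python raises an error on the leftover digits, whereas MySQL reports the problem differently (the question saw NULL and warnings). The helper name parse_us_date is hypothetical.

```python
from datetime import datetime


def parse_us_date(text, fmt):
    """Return the parsed date, or None if the text does not fit the format."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # For example, '%y' consumes only '20' and the trailing '21' is left over.
        return None


two_digit_spec = parse_us_date("12/26/2021", "%m/%d/%y")   # wrong specifier
four_digit_spec = parse_us_date("12/26/2021", "%m/%d/%Y")  # matching specifier
short_year = parse_us_date("12/26/21", "%m/%d/%y")         # two-digit input

print(two_digit_spec, four_digit_spec, short_year)
```

Either match the specifier to the input or, as the answer above recommends, skip the conversion and write the literal as 'YYYY-MM-DD'.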

MySQL select records using MAX(datefield) minus three days

Clearly, I am missing the forest for the trees...I am missing something obvious here!
Scenario:
I've a typical table asset_locator with multiple fields:
id, int(11) PRIMARY
logref, int(11)
unitno, int(11)
tunits, int(11)
operator, varchar(24)
lineid, varchar(24)
uniqueid, varchar(64)
timestamp, timestamp
My current challenge is to SELECT records from this table based on a date range. More specifically, a date range using the MAX(timestamp) field.
So...when selecting I need to start with the latest timestamp value and go back 3 days.
EX: I select all records WHERE the lineid = 'xyz' and going back 3 days from the latest timestamp. Below is an actual example (of the dozens) I've been trying to run.
MySQL returns a single row with all NULL values for the following:
SELECT id, logref, unitno, tunits, operator, lineid,
uniqueid, timestamp, MAX( timestamp ) AS maxdate
FROM asset_locator
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
ORDER BY uniqueid DESC
There MUST be something obvious I am missing. If anyone has any ideas, please share.
Many thanks!
MAX() is an aggregate function, which means your SELECT will always return one row containing the maximum value, unless you use GROUP BY, but it looks like that's not what you need.
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_max
If you need all the entries between MAX(timestamp) and 3 days before, then you need to do a subselect to obtain the max date, and after that use it in the search condition. Like this:
SELECT id, logref, unitno, tunits, operator, lineid, uniqueid, timestamp
FROM asset_locator
WHERE timestamp >= DATE_ADD( (SELECT MAX(timestamp) FROM asset_locator), INTERVAL -3 DAY)
It will still run efficiently as long as you have an index defined on the timestamp column.
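Here is a minimal runnable sketch of the subselect approach, using Python's sqlite3 as a stand-in for MySQL. SQLite has no DATE_ADD(), so its datetime(value, '-3 days') modifier plays that role here. The rows and the lineid filter placement are assumptions for illustration; the filter is applied both to the subselect and to the outer query so that "latest" means the latest for that line.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE asset_locator (id INTEGER PRIMARY KEY, lineid TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO asset_locator (id, lineid, timestamp) VALUES (?, ?, ?)",
    [
        (1, "xyz", "2024-01-10 12:00:00"),  # latest row for line xyz
        (2, "xyz", "2024-01-08 09:30:00"),
        (3, "xyz", "2024-01-07 12:00:00"),  # exactly 3 days back, kept by >=
        (4, "xyz", "2024-01-05 08:00:00"),  # older than 3 days, dropped
        (5, "abc", "2024-01-12 00:00:00"),  # other line, ignored
    ],
)

# The scalar subselect finds the latest timestamp; the outer query keeps
# every row within three days before it.
recent = conn.execute(
    """
    SELECT id, timestamp
    FROM asset_locator
    WHERE lineid = 'xyz'
      AND timestamp >= datetime(
            (SELECT MAX(timestamp) FROM asset_locator WHERE lineid = 'xyz'),
            '-3 days')
    ORDER BY timestamp DESC
    """
).fetchall()

print(recent)
```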
Note: In your example
WHERE 'maxdate' < DATE_ADD('maxdate',INTERVAL -3 DAY)
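Here you are actually using the string 'maxdate', because of the quotes. DATE_ADD() cannot parse that string as a date and returns NULL, so the condition is never true and no row matches. Because MAX() without GROUP BY still returns one row, you see a single row with NULL in every field.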
Edit: Oops, forgot the "FROM asset_locator" in query. It got lost at some point when writing the answer :)

sql get row with max date and relative id record

I'm trying to write a simple query; I've spent a ton of hours on it and got nowhere.
All I need is to get the MAX date and all of its corresponding fields.
Let me explain:
I have a table with these fields: BasketID, OrderStatusTypeID, StatusDate.
I'm trying to get only the one record that contains the OrderStatusTypeID value with the latest StatusDate.
This is the data
BasketID | OrderStatusTypeID | date
1111 | 13 | 2013-04-01 11:38:31
1111 | 26 | 2013-04-04 17:44:17
1111 | 39 | 2013-04-02 12:35:07
1111 | 40 | 2013-04-08 12:52:55
This is my query:
SELECT BasketID, OrderStatusTypeID, max(StatusDate) date
FROM st
where BasketID=1111
group by BasketID
This is the result I need:
BasketID | OrderStatusTypeID | date
63558 | 40 | 2013-04-08 12:52:55
For some reason I only get OrderStatusTypeID = 13 and not 40!
(I want the max of StatusDate, and NOT the max of OrderStatusTypeID.)
Why???
BasketID | OrderStatusTypeID | date
63558 | 13 | 2013-04-08 12:52:55
Thanks for fast response!
I assume you are using MySQL, because the query runs even though you have not listed every non-aggregate column in the GROUP BY clause.
There are many ways to solve the problem, but this is how I usually do it. The query uses a subquery that separately gets the latest date for every BasketID. Since the subquery returns only two columns, you need to join it back to the table itself to get the other columns, matching on both columns: BasketID and the date.
SELECT a.*
FROM st a
INNER JOIN
(
SELECT BasketID, MAX(StatusDate) max_date
FROM st
GROUP BY BasketID
) b ON a.BasketID = b.BasketID AND
a.StatusDate = b.max_date
Your query has executed successfully without throwing an exception even though there are non-aggregate columns that are not specified in the GROUP BY clause because it is permitted in MySQL. See MySQL Extensions to GROUP BY.
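As a quick check of the join-back approach, here is a minimal sketch that loads the four rows from the question into an in-memory SQLite database (a stand-in for MySQL, assumed only so the example runs without a server) and uses the column name StatusDate from the question's field list.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE st (BasketID INTEGER, OrderStatusTypeID INTEGER, StatusDate TEXT)"
)
conn.executemany(
    "INSERT INTO st (BasketID, OrderStatusTypeID, StatusDate) VALUES (?, ?, ?)",
    [
        (1111, 13, "2013-04-01 11:38:31"),
        (1111, 26, "2013-04-04 17:44:17"),
        (1111, 39, "2013-04-02 12:35:07"),
        (1111, 40, "2013-04-08 12:52:55"),
    ],
)

# Subquery b: latest StatusDate per basket. Joining back on both columns
# picks the complete row that carries that date.
latest = conn.execute(
    """
    SELECT a.BasketID, a.OrderStatusTypeID, a.StatusDate
    FROM st AS a
    INNER JOIN (
        SELECT BasketID, MAX(StatusDate) AS max_date
        FROM st
        GROUP BY BasketID
    ) AS b
      ON a.BasketID = b.BasketID
     AND a.StatusDate = b.max_date
    """
).fetchall()

print(latest)
```

The status type now comes from the same row as the maximum date, instead of from an arbitrary row in the group.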
I know this is an old thread, but
why isn't it possible to use ORDER BY and LIMIT in this case?
A query like this:
SELECT * FROM st
WHERE BasketID=1111
ORDER BY StatusDate DESC
LIMIT 1
On SQL Server it's pretty easy with TOP (MySQL has no TOP; use LIMIT as above):
SELECT TOP 1 * FROM st
WHERE BasketID = 1111
ORDER BY StatusDate DESC

how to count records with newest date only

How do I modify this MySQL query to only count leadIDs from table leads where column 'Date' contains the newest (youngest) date?
SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%'
The problem is that leadID can have multiple instances in table leads. The original query result is "4" because of one duplicate. The correct result is "3".
The date is stored in this format: 2011-10-26 18:23:52. The result should take hours and minutes into consideration when determining the youngest date.
TABLE leads:
leadID | date | change
1 | 2011-10-26 18:23:52 | BAD
1 | 2011-10-26 17:00:00 | OK
2 | 2011-10-26 19:23:52 | OK
3 | 2011-10-26 20:23:52 | OK
4 | 2011-10-26 21:23:52 | OK
5 | 2011-10-26 22:23:52 | BAD
I think this is what you're looking for:
select count(distinct l1.leadId) as accepted from leads l1
left join leads l2
on l1.leadId = l2.leadId and l1.date < l2.date
where l2.date is null and l1.`change` like '%OK%'
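To see why the self-join yields 3 for the sample data, here is a minimal sketch that loads the table from the question into an in-memory SQLite database (a stand-in for MySQL, assumed only so the example runs without a server). The column name change is quoted with double quotes, SQLite's counterpart to MySQL's backticks.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE leads (leadID INTEGER, date TEXT, "change" TEXT)')
conn.executemany(
    'INSERT INTO leads (leadID, date, "change") VALUES (?, ?, ?)',
    [
        (1, "2011-10-26 18:23:52", "BAD"),
        (1, "2011-10-26 17:00:00", "OK"),
        (2, "2011-10-26 19:23:52", "OK"),
        (3, "2011-10-26 20:23:52", "OK"),
        (4, "2011-10-26 21:23:52", "OK"),
        (5, "2011-10-26 22:23:52", "BAD"),
    ],
)

# The LEFT JOIN looks for a newer row (l2) with the same leadID. Where none
# exists, l2.date is NULL, so l1 is the newest row for that lead.
accepted = conn.execute(
    """
    SELECT COUNT(DISTINCT l1.leadID)
    FROM leads AS l1
    LEFT JOIN leads AS l2
      ON l1.leadID = l2.leadID AND l1.date < l2.date
    WHERE l2.date IS NULL
      AND l1."change" LIKE '%OK%'
    """
).fetchone()[0]

print(accepted)
```

Lead 1 drops out because its newest row is BAD, even though an older row for the same lead was OK.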
You must decide what you mean by newest date: the single latest? Yesterday? Today?
If yesterday, then add a condition like this to your query:
select * from mytable where date >= date_sub(now(), interval 1 day)
If you are using an Oracle database, you can use the max() function to extract the newest date from the table and then filter on that date:
SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%'
and date_col = (select max(date_col) from leads)
I am assuming that by newest date you mean the newest date in the table data.
Changes, following the edits to the question and the comments:
I think you want the newest date among the records whose "change" column value is like '%OK%', and want to count the distinct leadIDs.
Please try the following query:
SELECT COUNT(distinct leadID) as accepted FROM leads WHERE change like '%OK%'
and date_col = (select max(date_col) from leads WHERE change like '%OK%')
You can try this (in case your date is an int like the one returned by the time() function):
$sql = "SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%' ORDER BY Date DESC LIMIT 1"
You will only extract the newest entry.
Edit: This should also work for your date format YYYY-MM-DD hh:mm:ss.
Edit 2: Okay, I did not understand your question.
You have a table leads: leadID, date.
You want to count the number of rows for the newest date.
As another answer pointed out, you can use the MAX function:
SELECT COUNT(DISTINCT l.leadID)
FROM leads AS l,
( SELECT MAX(date) AS mdate FROM leads ) AS MaxDate
WHERE l.date = MaxDate.mdate
AND l.`change` LIKE '%OK%'