How to select and select count in one query in mysql? - mysql

I have this table:
Activity Date
reading 12-10-2014
watching 12-10-2014
reading 13-10-2014
reading 12-10-2014
watching 13-10-2014
What I want to do is select each activity and count how many times it occurs on a given date. I want the output to look like this (with the condition where date = '12-10-2014'):
Activity count date
reading 2 12-10-2014
watching 1 12-10-2014
How can I do that? Please help. Thanks.

SELECT Activity,COUNT(1) `count`,date
FROM mytable WHERE date='12-10-2014'
GROUP BY date,Activity;
or, if date is stored as a DATE column (which expects YYYY-MM-DD literals):
SELECT Activity,COUNT(1) `count`,date
FROM mytable WHERE date='2014-10-12'
GROUP BY date,Activity;
Make sure your table has an index on date and Activity:
ALTER TABLE mytable ADD INDEX date_Activity_ndx (date,Activity);

This should work:
select activity, count(activity) as count, date from my_table
where date = '12-10-2014' group by activity;
(not sure about your column labels--you may have to adjust for capitalization)
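The grouping used in these answers can be checked on the sample data from the question. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name count_activities is made up for illustration. It assumes the dates are stored as plain text, as shown in the question.

```python
import sqlite3

def count_activities(rows, day):
    """Return one (activity, count, date) tuple per activity for the given date."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE mytable (activity TEXT, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT activity, COUNT(*) AS cnt, date FROM mytable "
        "WHERE date = ? GROUP BY date, activity ORDER BY activity",
        (day,),
    ).fetchall()
    conn.close()
    return result

# Sample rows copied from the question
rows = [
    ("reading", "12-10-2014"),
    ("watching", "12-10-2014"),
    ("reading", "13-10-2014"),
    ("reading", "12-10-2014"),
    ("watching", "13-10-2014"),
]
print(count_activities(rows, "12-10-2014"))
# [('reading', 2, '12-10-2014'), ('watching', 1, '12-10-2014')]
```

Grouping by both date and activity keeps the query valid even if the WHERE clause is later widened to several dates.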

Related

Optimizing a simple query with 70mil rows to fit into Tableau

Noobie to SQL. I have a simple query here that is 70 million rows, and my work laptop will not handle the capacity when I import it into Tableau. Usually 20 million rows and less seem to work fine. Here's my problem.
Table name: Table1
Fields: UniqueID, State, Date, claim_type
Query:
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
This gives me what I want, BUT, I can limit the query significantly if I count the number of uniqueIDs that have been used in 3 or more different states. I use this query to do that.
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '11-09-2021'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
The only issue is, when I put this query into Tableau it only displays the FIRST state a unique_id showed up in, and the first date it showed up. A unique_id shows up in multiple states over multiple dates, so when I use this count aggregation it's only giving me the first result and not the whole picture.
Any ideas here? I am totally lost and spent a whole business day trying to fix this
Expected output would be something like
uniqueID | state | claim type | Date
123 Ohio C 01-01-2021
123 Nebraska I 02-08-2021
123 Georgia D 03-08-2021
If your table has only those four columns and your queries are based on date ranges, you need an index to help optimize that. If 70 million records exist, how far back do they go? Years? If your data since 2021-09-11 is only, say, 30k records, that should be all you are blowing through for your results.
I would ensure you have an index on these columns, in this order:
(date, uniqueId, claim_type, states). Also, you mentioned you wanted a count of 3 OR MORE; your query's > 3 will result in 4 or more unless you change it to COUNT(DISTINCT states) >= 3.
Then, to get the entries you care about, you need
SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3
This would give just the 3-part qualifier for date/id/claim that HAD them. Then you would use THIS result set to get the other entries via
select distinct
t1.date, t1.uniqueID, t1.claim_type, t1.states
from
( SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3 ) PQ
JOIN Table1 t1
on PQ.date = t1.date
and PQ.UniqueID = t1.UniqueID
and PQ.Claim_Type = t1.Claim_Type
The "PQ" (preQuery) gets the qualified records. Then it joins back to the original table and grabs all records that qualified from the unique date/id/claim_type and returns all the states.
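To see the pre-query and join-back pattern at work, here is a small sketch using Python's sqlite3 module in place of MySQL. The sample rows and the helper name qualifying_rows are invented for illustration, and the dates are assumed to be stored in YYYY-MM-DD form so that >= compares correctly.

```python
import sqlite3

QUERY = """
SELECT DISTINCT t1.date, t1.uniqueID, t1.claim_type, t1.states
FROM (SELECT date, uniqueID, claim_type
      FROM table1
      WHERE date >= ?
      GROUP BY date, uniqueID, claim_type
      HAVING COUNT(DISTINCT states) >= 3) AS pq
JOIN table1 AS t1
  ON pq.date = t1.date
 AND pq.uniqueID = t1.uniqueID
 AND pq.claim_type = t1.claim_type
ORDER BY t1.uniqueID, t1.states
"""

def qualifying_rows(rows, since):
    """Return every row whose (date, uniqueID, claim_type) group spans 3+ states."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (uniqueID INTEGER, states TEXT, claim_type TEXT, date TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (since,)).fetchall()
    conn.close()
    return result

# Hypothetical sample data: only ID 123 reaches three distinct states
sample = [
    (123, "Ohio", "C", "2021-09-12"),
    (123, "Nebraska", "C", "2021-09-12"),
    (123, "Georgia", "C", "2021-09-12"),
    (456, "Ohio", "C", "2021-09-12"),
    (456, "Texas", "C", "2021-09-12"),
]
for row in qualifying_rows(sample, "2021-09-11"):
    print(row)
```

Note that this grouping only counts states within the same date and claim type. If the intent is three states across all dates and claim types, as the expected output in the question suggests, the pre-query would presumably group by uniqueID alone and join back on uniqueID only.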
Yes, you are grouping rows, so you 'lose' information in the grouped result.
You won't get 70m records with your grouped query.
Why don't you split your imports in smaller chunks? Like limit the rows to chunks of, say 15m:
1st:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000;
2nd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 15000000;
3rd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 30000000;
and so on..
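I know it's not a perfect or very handy solution, but maybe it gets you to the desired outcome. Keep in mind that without an ORDER BY the row order is not guaranteed, so the chunks may overlap or skip rows.
See this link for information about LIMIT and OFFSET: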
https://www.bitdegree.org/learn/mysql-limit-offset
It is wise in the long run to use the DATE datatype. That requires dates to look like '2021-09-11', not '09-11-2021'. That will let > correctly compare dates that are in two different years.
If your data is coming from some source that formats it as '11-09-2021', use STR_TO_DATE() to convert it as it goes in. You can reconstruct that format on output via DATE_FORMAT().
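A quick way to see why the text format matters is to compare the strings directly. This is a small Python sketch (not MySQL), assuming the source format is DD-MM-YYYY; the helper name to_iso is made up. In MySQL the equivalent conversion would be STR_TO_DATE(col, '%d-%m-%Y').

```python
from datetime import datetime

def to_iso(ddmmyyyy):
    """Convert 'DD-MM-YYYY' text to ISO 'YYYY-MM-DD' text."""
    return datetime.strptime(ddmmyyyy, "%d-%m-%Y").date().isoformat()

# As plain strings the comparison is character by character,
# so 1 January 2022 looks "smaller" than 11 September 2021.
print("01-01-2022" >= "11-09-2021")                   # False
# After conversion to ISO format, text order matches calendar order.
print(to_iso("01-01-2022") >= to_iso("11-09-2021"))   # True
```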
Once you have done that, we can talk about optimizing
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '2021-09-11'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
Tentatively, I recommend this composite index to speed up the query:
INDEX(Unique_id, claim_type, date, states)
That will also help with your other query.
(I am assuming the ambiguous '11-09-2021' is DD-MM-YYYY.)

How do I SELECT a MySQL Table value that has not been updated on a given date?

I have a MySQL database named mydb in which I store daily share prices for
423 companies in a table named data. Table data has the following columns:
`epic`, `date`, `open`, `high`, `low`, `close`, `volume`
with epic and date together forming the primary key.
I update the data table each day using a csv file which would normally have 423 rows
of data all having the same date. However, on some days prices may not be available
for all 423 companies and data for a particular epic and date pair will
not be updated. In order to determine the missing pair I have resorted
to comparing a full list of epics against the incomplete list of epics using
two simple SELECT queries with different dates and then using a file comparator, thus
revealing the missing epic(s). This is not a very satisfactory solution and so far
I have not been able to construct a query that would identify any epics that
have not been updated for any particular day.
SELECT `epic`, `date` FROM `data`
WHERE `date` IN ('2019-05-07', '2019-05-08')
ORDER BY `epic`, `date`;
Produces pairs of values:
`epic` `date`
"3IN" "2019-05-07"
"3IN" "2019-05-08"
"888" "2019-05-07"
"888" "2019-05-08"
"AA." "2019-05-07"
"AAL" "2019-05-07"
"AAL" "2019-05-08"
Where in this case AA. has not been updated on 2019-05-08. The problem with this is that it is not easy to spot a value that is not a pair.
Any help with this problem would be greatly appreciated.
You could do a COUNT on epic with a GROUP BY epic for the rows in that date range, then select from that result the epics whose UpdateCount is less than 2. Forgive me if the syntax on the column names is not correct (I work in SQL Server), but the logic of the query should still work for you.
SELECT x.epic
FROM
(
SELECT COUNT(*) AS UpdateCount, epic
FROM data
WHERE date IN ('2019-05-07', '2019-05-08')
GROUP BY epic
) AS x
WHERE x.UpdateCount < 2
Assuming you only want to check the last date uploaded, the following will return every item not updated on 2019-05-08:
SELECT last_updated.epic, last_updated.date
FROM (
SELECT epic, MAX(`date`) AS `date` FROM `data`
GROUP BY epic
) AS last_updated
WHERE last_updated.`date` <> '2019-05-08'
ORDER BY last_updated.epic
;
or, for any upload date, the following compares against the entire table, so you don't rely on '2019-05-07' having every epic row. That is, if the epic has ever been in the database, it will show up when it was not updated:
SELECT d.epic, max(d.date)
FROM data as d
WHERE d.epic NOT IN (
SELECT d2.epic
FROM data as d2
WHERE d2.date = '2019-05-08'
)
GROUP BY d.epic
ORDER BY d.epic
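As a sanity check, the NOT IN approach can be run against the sample pairs from the question. This is a sketch using Python's sqlite3 module rather than MySQL; the function name missing_epics is hypothetical.

```python
import sqlite3

def missing_epics(rows, day):
    """Return (epic, last date seen) for every epic with no row on the given day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (epic TEXT, date TEXT, PRIMARY KEY (epic, date))")
    conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT d.epic, MAX(d.date) AS last_seen
        FROM data AS d
        WHERE d.epic NOT IN (SELECT d2.epic FROM data AS d2 WHERE d2.date = ?)
        GROUP BY d.epic
        ORDER BY d.epic
        """,
        (day,),
    ).fetchall()
    conn.close()
    return result

# The epic/date pairs listed in the question
pairs = [
    ("3IN", "2019-05-07"), ("3IN", "2019-05-08"),
    ("888", "2019-05-07"), ("888", "2019-05-08"),
    ("AA.", "2019-05-07"),
    ("AAL", "2019-05-07"), ("AAL", "2019-05-08"),
]
print(missing_epics(pairs, "2019-05-08"))
# [('AA.', '2019-05-07')]
```

The second column shows the last date on which each missing epic was updated, which helps tell a one-day gap from a company that dropped out long ago.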

Get amount of active user of the last n days grouped by date

Suppose I have a Hive table logins with the following columns:
user_id | login_timestamp
I'm now interested in getting some activity KPIs. For instance, daily active user:
SELECT
to_date(login_timestamp) as date,
COUNT(DISTINCT user_id) daily_active_user
FROM
logins
GROUP BY to_date(login_timestamp)
ORDER BY date asc
Changing it from daily active to weekly/monthly active is not a big deal, because I can just swap the to_date() function for one that returns the month and then group by that value.
What I now want is the number of distinct users who were active in the last n days (e.g. 3), grouped by date. Additionally, I'm looking for a solution that works for a variable time window and not only for a single day (getting the number of active users of the last 3 days on day x only would be easy).
The result is supposed to look somewhat like this:
date, 3d_active_user
2017-12-01, 111
2017-12-02, 234
2017-12-03, 254
2017-12-04, 100
2017-12-05, 103
2017-12-06, 103
2017-12-07, 230
Building a workaround for the moving time window with a subquery in the select list (e.g. select x, (select max(x) from x) as y from z) is not possible because the Hive version I'm using does not support it.
I tried something like COUNT(DISTINCT IF(DATEDIFF(today,login_date)<=3,user_id,null)), but nothing I have tried so far works.
Do you have any idea on how to solve this issue?
Any help appreciated!
You can use the BETWEEN operator.
If you want to find the users who were active from a particular date until now:
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN startDate_timeStamp AND now()
GROUP BY to_date(login_timestamp)
ORDER BY date asc
If you want the users who logged in within a specific date range, then:
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN to_date(startDate_timeStamp) AND to_date(endDate_timeStamp)
GROUP BY to_date(login_timestamp)
ORDER BY date asc
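The queries above count users per single day. For the moving window asked about in the question, one possible approach (not taken from the answer) is to join each distinct login date to all logins that fall inside its window and count distinct users per date. The sketch below shows the idea with Python's sqlite3 module and invented sample data; the function name rolling_active_users is hypothetical, and the window is assumed to include the day itself. Whether a given Hive version accepts a range condition in the ON clause needs checking; if it does not, the range test might have to move into a WHERE clause over a cross join.

```python
import sqlite3

def rolling_active_users(logins, window_days):
    """For each login date, count distinct users seen in the window ending that day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_timestamp TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", logins)
    # A 3-day window covers the day itself plus the 2 days before it.
    offset = "-{} days".format(window_days - 1)
    result = conn.execute(
        """
        SELECT d.day, COUNT(DISTINCT l.user_id) AS active_users
        FROM (SELECT DISTINCT date(login_timestamp) AS day FROM logins) AS d
        JOIN logins AS l
          ON date(l.login_timestamp) BETWEEN date(d.day, ?) AND d.day
        GROUP BY d.day
        ORDER BY d.day
        """,
        (offset,),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample logins
sample_logins = [
    (1, "2017-12-01 08:00:00"),
    (2, "2017-12-01 09:00:00"),
    (1, "2017-12-02 10:00:00"),
    (3, "2017-12-03 11:00:00"),
    (4, "2017-12-04 12:00:00"),
]
print(rolling_active_users(sample_logins, 3))
# [('2017-12-01', 2), ('2017-12-02', 2), ('2017-12-03', 3), ('2017-12-04', 3)]
```

Dates on which nobody logged in do not appear in the result, because the list of days is derived from the logins table itself.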

MySQL get date of record where count was achieved

This honestly sounds like a job for a function in MySQL, but I'm wondering if there's a way to write a query that selects the date of the record at which the count reaches x.
Setup: 1000 records, each meeting the same qualifying conditions. Let's say user_id, visit information and a create_date.
Desired query result: select the date of the 100th visit.
SELECT create_date
FROM user_visits
HAVING COUNT(id) = 100;
You can use ORDER BY on your auto_increment column with LIMIT 99,1 to pick the 100th visit:
SELECT create_date
FROM user_visits
ORDER BY your_auto_increment_column
LIMIT 99,1
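The offset is always one less than the position wanted, since the first row has offset 0. Here is a minimal sketch of the same idea for an arbitrary n, using Python's sqlite3 module instead of MySQL. The table layout and the helper name nth_visit_date are assumptions for illustration.

```python
import sqlite3

def nth_visit_date(visits, n):
    """Return create_date of the n-th inserted visit, or None if there are fewer rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_visits ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER, create_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_visits (user_id, create_date) VALUES (?, ?)", visits
    )
    # Skip the first n-1 rows in id order and take the next one.
    row = conn.execute(
        "SELECT create_date FROM user_visits ORDER BY id LIMIT 1 OFFSET ?",
        (n - 1,),
    ).fetchone()
    conn.close()
    return row[0] if row else None

# Hypothetical data: ten visits by user 7 on consecutive days
visits = [(7, "2020-01-{:02d}".format(day)) for day in range(1, 11)]
print(nth_visit_date(visits, 3))   # 2020-01-03
print(nth_visit_date(visits, 11))  # None
```

In a real table the query would also need a WHERE clause for the qualifying conditions (for example the user_id), otherwise it counts visits from all users.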

how to count records with newest date only

How do I modify this MySQL query to only count leadIDs from table leads where column 'Date' contains the newest (youngest) date?
SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%'
The problem is that leadID can have multiple instances in table leads. The original query result is "4" because of one duplicate. The correct result is "3".
The date is stored in this format: 2011-10-26 18:23:52. The result should take hours and minutes into consideration when determining the youngest date.
TABLE leads:
leadID | date | change
1 | 2011-10-26 18:23:52 | BAD
1 | 2011-10-26 17:00:00 | OK
2 | 2011-10-26 19:23:52 | OK
3 | 2011-10-26 20:23:52 | OK
4 | 2011-10-26 21:23:52 | OK
5 | 2011-10-26 22:23:52 | BAD
I think this is what you're looking for:
select count(distinct l1.leadId) as accepted from leads l1
left join leads l2
on l1.leadId = l2.leadId and l1.date < l2.date
where l2.date is null and l1.`change` like '%OK%'
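The self-join keeps only rows for which no later row with the same leadID exists, so each lead is judged by its newest entry. A sketch of this on the table from the question, using Python's sqlite3 module as a stand-in for MySQL (the function name count_accepted is made up):

```python
import sqlite3

def count_accepted(rows):
    """Count leads whose most recent row has a change value containing 'OK'."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE leads (leadID INTEGER, date TEXT, "change" TEXT)')
    conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", rows)
    (accepted,) = conn.execute(
        """
        SELECT COUNT(DISTINCT l1.leadID)
        FROM leads AS l1
        LEFT JOIN leads AS l2
          ON l1.leadID = l2.leadID AND l1.date < l2.date
        WHERE l2.date IS NULL          -- no newer row exists for this lead
          AND l1."change" LIKE '%OK%'
        """
    ).fetchone()
    conn.close()
    return accepted

# Rows copied from the question
leads = [
    (1, "2011-10-26 18:23:52", "BAD"),
    (1, "2011-10-26 17:00:00", "OK"),
    (2, "2011-10-26 19:23:52", "OK"),
    (3, "2011-10-26 20:23:52", "OK"),
    (4, "2011-10-26 21:23:52", "OK"),
    (5, "2011-10-26 22:23:52", "BAD"),
]
print(count_accepted(leads))  # 3
```

Lead 1 is excluded because its newest row (18:23:52) is BAD, even though an older row is OK.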
You must decide what you mean by newest date: the single latest? yesterday? today?
If yesterday, then add this condition to your query:
select * from mytable where date >= date_sub(now(), interval 1 day)
If you are using an Oracle database, you can use the MAX() function to find the newest date in the table and then filter on it:
SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%'
and date_col = (select max(date_col) from leads)
I am assuming that by newest date you mean the newest date in the table data.
Update, following the changes to the question and the comments:
I think you want the newest date among the records whose "change" column value is like '%OK%', and you want to count distinct leadIDs.
Please try the following query:
SELECT COUNT(distinct leadID) as accepted FROM leads WHERE change like '%OK%'
and date_col = (select max(date_col) from leads WHERE change like '%OK%')
You can try this (in case your date is an int like the one returned by the time() function):
$sql = "SELECT COUNT(leadID) as accepted FROM leads WHERE change like '%OK%' ORDER BY Date DESC LIMIT 1"
You will only extract the newest entry.
Edit: This should also work for your date format YYYY-MM-DD hh:mm:ss.
Edit 2: Okay, I did not understand your question.
You have a table leads with the columns leadid and date.
You want to count the number of rows for the newest date.
As someone else pointed out, you can use the MAX function:
SELECT COUNT(distinct leadid)
FROM leads AS l,
( SELECT MAX(date) AS mdate FROM leads ) AS MaxDate
WHERE l.date = MaxDate.mdate
AND l.`change` LIKE '%OK%'