How can this SQL query be more efficient? - mysql

mysql 5.7
linux
The query below takes about 210 seconds on 9000 records. Not really desirable performance.
The data table has these fields:
login_attempt_id integer
user_id integer
login_attempt_date datetime
login_attempt_ip string
I wish to query the data to count the failed logins per IP address. For example:
109 119.27.191.202
93 118.25.146.128
83 132.232.31.117
81 132.232.160.234
The query:
select count(t0.login_attempt_ip) as `ip_count`, t0.login_attempt_ip
from sohne_sma_v4.wp_login_fails t0
where t0.login_attempt_ip in
(select distinct t1.login_attempt_ip from sohne_sma_v4.wp_login_fails t1
where 20 <
(select count(t2.login_attempt_ip) from sohne_sma_v4.wp_login_fails t2
where t2.login_attempt_ip like t1.login_attempt_ip
)
)
and datediff(now(), t0.login_attempt_date) < 15
group by t0.login_attempt_ip
order by ip_count desc;
I can guess the time is spent in the two inner queries.
What is a better way to achieve this query?

You don't really need all these subqueries.
You can just use GROUP BY ... HAVING to keep only the groups with a count greater than 20.
Something like this should work:
https://www.db-fiddle.com/f/vCMPWJaRxeSPVVDNSPVhcD/0
SELECT COUNT(t.login_attempt_id) AS ip_count, t.login_attempt_ip
FROM sohne_sma_v4.wp_login_fails t
WHERE DATEDIFF(NOW(), t.login_attempt_date) < 15
GROUP BY t.login_attempt_ip
HAVING ip_count > 20
ORDER BY ip_count DESC;
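A possible further refinement, offered as a sketch rather than a measured result: wrapping login_attempt_date in DATEDIFF() prevents MySQL from using an index range scan on that column. Comparing the bare column against a computed boundary keeps the same rows (DATEDIFF only looks at the date parts, so "< 15" corresponds to "on or after midnight 14 days ago"). The index name below is hypothetical, and the index assumes login_attempt_ip is an indexable type such as VARCHAR.

```sql
-- Hypothetical index; the name and column order are assumptions, not from the question.
CREATE INDEX idx_login_fails_date_ip
    ON sohne_sma_v4.wp_login_fails (login_attempt_date, login_attempt_ip);

-- Same filter as DATEDIFF(NOW(), login_attempt_date) < 15,
-- written so the column is not wrapped in a function.
SELECT t.login_attempt_ip, COUNT(*) AS ip_count
FROM sohne_sma_v4.wp_login_fails t
WHERE t.login_attempt_date >= CURDATE() - INTERVAL 14 DAY
GROUP BY t.login_attempt_ip
HAVING ip_count > 20
ORDER BY ip_count DESC;
```

COUNT(*) is used here on the assumption that login_attempt_id is never NULL, in which case it gives the same number as COUNT(t.login_attempt_id). Running EXPLAIN on both versions should show whether the index is actually picked up.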

Related

Mysql: Get records from last date

I want to get all records which are not "older" than 20 days. If there are no records within 20 days, I want all records from the most recent day. I'm doing this:
SELECT COUNT(DISTINCT t.id) FROM t
WHERE
(DATEDIFF(NOW(), t.created) <= 20
OR
(date(t.created) >= (SELECT max(date(created)) FROM t)));
This works so far, but it is awfully slow. created is a datetime, so the slowness might be due to the conversion to a date. Any ideas how to speed this up?
SELECT COUNT(*) FROM (
SELECT * FROM t WHERE datediff(now(),created) between 0 and 20
UNION
SELECT * FROM (SELECT * FROM t WHERE created<now() ORDER BY created DESC LIMIT 1) last1
) last20d
I used the BETWEEN clause in case there are dates in the future in the table; these will be excluded. If you just need the count(), you can also simplify the select to:
SELECT COUNT(*) FROM (
SELECT id FROM t WHERE datediff(now(),created) between 0 and 20
UNION
SELECT id FROM (SELECT id FROM t WHERE created<now() ORDER BY created DESC LIMIT 1) last1
) last20d
Otherwise, if you want all the data of the chosen records, use the first version and leave out the outer select. The UNION makes sure that duplicates are excluded (in other cases I always use UNION ALL since it is faster).
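As a hedged alternative to the UNION, the original condition from the question can also be kept and simply rewritten so that created is compared directly instead of being passed through DATEDIFF() or DATE(). This sketch assumes id is the primary key of t (so COUNT(*) matches COUNT(DISTINCT t.id)) and that an index on created exists or can be added.

```sql
-- Rows from the last 20 days, or, failing that, every row from the most recent day.
-- created is compared as-is, so an index on created remains usable.
SELECT COUNT(*)
FROM t
WHERE t.created >= CURDATE() - INTERVAL 20 DAY
   OR t.created >= (SELECT DATE(MAX(created)) FROM t);
```

The subquery is not correlated, so it only needs to be evaluated once; with an index on created, MAX(created) is typically resolved from the index alone. Unlike the BETWEEN version, this one does not exclude future dates.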

MySQL query using count() and if/case

I need to find the count of users who have filled in the questionnaire on a given day.
The query below gives the total counts, but I need counts per day. Here is a picture of the database: http://www.upload.ee/image/3800828/pildike.png
SELECT DISTINCT USER_ID, COUNT(ANSWER_TIME) AS ARV FROM RESULT WHERE ANSWER_TIME IS NOT NULL GROUP BY USER_ID ORDER BY ARV DESC;
For example this gives me:
32 2142
143 1098
26 979
76 878
But I need distinct rows. answer_day_of_week runs from 1 to 7, depending on the day, and answer_time is a TIMESTAMP. The statistics cover 97 days; for example, person 32 has filled it in 2000+ times in 97 days, but I only need to count them once per day...
I thought of using if/elseif, while, case, or some sort of subquery. I have tried some queries, but I always fail...
For example, one person can have 15 rows on 1 Oct but 0 rows on 2 Oct; the answer should then be that he has filled in the survey only once.
SELECT USER_ID, COUNT(ARV) AS ARV
FROM
(
SELECT DISTINCT USER_ID, DATE(ANSWER_TIME) AS ARV FROM RESULT WHERE ANSWER_TIME IS NOT NULL
) A
GROUP BY USER_ID ORDER BY ARV DESC;
Please try the above query.
Just change your SQL query like this:
SELECT DISTINCT USER_ID, COUNT(ANSWER_TIME) AS ARV FROM RESULT WHERE DATE(ANSWER_TIME) >= DATE_SUB(CURDATE(), INTERVAL 97 DAY) GROUP BY USER_ID ORDER BY ARV DESC;
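The "count each day only once" idea from the derived-table answer can also be sketched with COUNT(DISTINCT ...), which avoids the inner query. The first statement counts the distinct days per user; the second turns the grouping around and counts the distinct users per day, in case that is what "counts per day" means. The aliases are made up for illustration.

```sql
-- Number of distinct days on which each user answered at least once.
SELECT USER_ID, COUNT(DISTINCT DATE(ANSWER_TIME)) AS ARV
FROM RESULT
WHERE ANSWER_TIME IS NOT NULL
GROUP BY USER_ID
ORDER BY ARV DESC;

-- Number of distinct users who answered on each day.
SELECT DATE(ANSWER_TIME) AS answer_day, COUNT(DISTINCT USER_ID) AS users
FROM RESULT
WHERE ANSWER_TIME IS NOT NULL
GROUP BY answer_day
ORDER BY answer_day;
```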

MySQL: Getting average or sum from 200,000 rows

I would like to get the average or at least the sum of 200,000 rows from mySQL database. This is how I am querying the database but the amount is too large for me to query because I cannot afford to overload the server.
SELECT user_id, total_email FROM email_users
WHERE email_code = 1
LIMIT 200000
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
01, 02,..., 200000-th user_id
)
My question is: is there a way to combine the two queries into one, so that I get just the sum or average of the 200,000 email_users rows which have email_code = 1?
EDIT: Thanks to all that have answered. I didn't realise the answer was so easy - nested select statement.
You can do this with a subquery:
SELECT SUM(total_email), AVG(total_email)
from (SELECT eu.*
FROM email_users eu
WHERE eu.email_code = 1
LIMIT 200000
) eu
Some notes. First, using limit without an order by gives indeterminate results. You could (in theory) run this query twice and get different results. Second, this assumes that there is a field called total_email in email_users.
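To illustrate the first note, here is a sketch of the same query with an explicit ordering, so that repeated runs pick the same 200,000 rows. Ordering by user_id is an assumption; any column (or combination) that is unique per row would serve.

```sql
-- Deterministic variant: the ORDER BY fixes which 200,000 rows are aggregated.
SELECT SUM(eu.total_email), AVG(eu.total_email)
FROM (SELECT total_email
      FROM email_users
      WHERE email_code = 1
      ORDER BY user_id
      LIMIT 200000
     ) eu;
```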
SELECT SUM(total_email), AVG(total_email)
FROM (SELECT total_email
FROM email_users
WHERE email_code = 1
LIMIT 200000) x
How about something like this, assuming you just want any 200K records from the DB where email_code = 1:
SELECT SUM(total_email), AVG(total_email) FROM email_users
WHERE user_id IN
(
-- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
SELECT user_id FROM
(SELECT user_id
FROM email_users
WHERE email_code = 1 LIMIT 200000) limited
)
or
SELECT SUM(total_email), AVG(total_email) FROM
(SELECT user_id , total_email
FROM email_users
WHERE email_code = 1 LIMIT 200000) x

Why is my SQL so slow?

My table is reasonably small around 50,000 rows. My schema is as follows:
DAILY
match_id
user_id
result
round
tournament_id
Query:
SELECT user_id
FROM `daily`
WHERE user_id IN (SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND (result = 'Won' OR result = 'Lost'))
Using the IN keyword the way you are is a very dangerous thing to do from a performance perspective. It can result in the subquery [(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost'))] being run 50,000 times in this case.
You'll want to convert this into a join, something to the effect of:
select a.user_id from daily a join
(select distinct user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost')) b on a.user_id = b.user_id
Doing something similar to this will result in only two queries and a join.
As Cybernate pointed out in your specific example you can simply use where clauses, but I went ahead and suggested this in case your query is actually more complex than what you posted.
First, verify and add indexes as suggested earlier.
Also, why are you using IN if you are querying data from the same table?
Change your query to:
SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND ( result = 'Won'
OR result = 'Lost' )
Your query only needs to be:
SELECT d.user_id
FROM DAILY d
WHERE d.round > 25
AND d.tournament_id = 24
AND d.result IN ('Won', 'Lost')
Indexes should be considered on:
DAILY.round
DAILY.tournament_id
DAILY.result
This should return in a millisecond.
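As a sketch of the indexing advice in MySQL syntax: instead of three separate single-column indexes, one composite index can cover the whole WHERE clause. The index name is hypothetical, and the column order (equality-style filters first, the range filter on round last) is a common rule of thumb rather than something verified against this table.

```sql
-- Hypothetical composite index covering all three filter columns.
CREATE INDEX idx_daily_tournament_result_round
    ON daily (tournament_id, result, round);

-- EXPLAIN shows whether the optimizer actually chooses the index.
EXPLAIN
SELECT d.user_id
FROM daily d
WHERE d.tournament_id = 24
  AND d.result IN ('Won', 'Lost')
  AND d.round > 25;
```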
SELECT user_id FROM daily WITH(NOLOCK)
where user_id in (select user_id from daily WITH(NOLOCK) where round > 25 and tournament_id = 24 and (result = 'Won' or result = 'Lost'))
Then make sure there is an index on the filter columns.
CREATE NONCLUSTERED INDEX IX_1 ON daily (round ASC, tournament_id ASC, result ASC)

How to use query results in another query?

I am trying to write a query which will give me the last entry of each month in a table called transactions. I believe I am halfway there as I have the following query which groups all the entries by month then selects the highest id in each group which is the last entry for each month.
SELECT max(id),
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm
Gives the correct results
id yyyymm
100 201006
105 201007
111 201008
118 201009
120 201010
I don't know how to then run a query on the same table that selects the balance column where the id matches an id from the first query, to give results like:
id balance date
120 10000 2010-10-08
118 11000 2010-09-29
I've tried subqueries and looked at joins, but I'm not sure how to go about using them.
You can make your first select an inline view, and then join to it. Something like this (not tested, but should give you the idea):
SELECT x.id
, t.balance
, t.date
FROM your_table t
/* here, we make your select an inline view, then we can join to it */
, (SELECT max(id) id,
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm) x
WHERE t.id = x.id
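The same idea can be written with explicit JOIN syntax. This is a sketch that assumes the table is transactions throughout (as in the question) and adds an ORDER BY to match the newest-first sample output; like the query above, it has not been run against the real data.

```sql
-- Inner query: the highest id per year-month, i.e. the last entry of each month.
-- Outer query: look up balance and date for exactly those ids.
SELECT t.id, t.balance, t.date
FROM transactions t
JOIN (SELECT MAX(id) AS id
      FROM transactions
      GROUP BY EXTRACT(YEAR_MONTH FROM date)
     ) x ON x.id = t.id
ORDER BY t.id DESC;
```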