SQL query not finding correct answer - mysql

I am trying to figure out website visits. Every visit within 30 minutes should count as one visit for that user.
My table looks like this
TimeUser, Userid, OrderID
10/7/2013 14:37:14 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:39:59 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:40:35 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/11/2013 0:09:23 _2MrGz4L_d5AF3UHpP-oJQ 151078706
10/2/2013 20:55:05 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
In this case 151078706 should return 3 visits.
I think my SQL query looks right, but when I check my answer against the visit numbers I computed in Excel, some orders are off by 5%. I am one hundred percent sure the Excel numbers are correct.
Here is what I have so far. If anyone sees an issue with my query, please correct me. Also, are there any better ways to find visits?
SET @row_num=0,
    @temp_row=1;
SELECT orderidtable.orders,
       count(orderidtable.users)
FROM
  (SELECT temptab.temprow,
          temptab.userid users,
          temptab.orderid orders,
          temptab.TimeUser
   FROM
     (SELECT @row_num := @row_num + 1 AS rownumber, TimeUser,
             userid,
             orderid
      FROM order.order_dec
      ORDER BY orderid,
               userid,
               timeuser) subtable ,
     (SELECT @temp_row:= @temp_row+1 AS temprow, Timeuser,
             userid,
             orderid
      FROM
        ORDER.order_dec
      ORDER BY orderid,
               userid,
               timeuser) temptab
   WHERE (subtable.rownumber=temptab.temprow
          AND abs(Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))>=1800)
     OR (subtable.rownumber=temptab.temprow
         AND subtable.userid<>temptab.userid)
     OR (subtable.rownumber=temptab.temprow
         AND subtable.orderid<>temptab.orderid)) orderidtable
GROUP BY orderidtable.orders

Numbering the rows is the right strategy; your query goes wrong in the WHERE condition.
An algorithm to solve it:
Number the rows, ordering by orderid, userid, timeuser. Make two copies (subtable and temptab) of this dataset, as you are already doing.
Join these tables on the following condition:
subtable.rownumber = temptab.temprow + 1
What we are trying to do here is join the tables so that each row of subtable joins with the row of temptab whose row number is one less than its own. This lets us compare consecutive visit times of a user to an ad. (You have already done this by setting @row_num=0, @temp_row=1.) This is the only condition we should apply to the JOIN.
Now, in the SELECT statement, use a CASE expression like the one below:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser)) < 1800 THEN 0
ELSE 1 END) AS IsVisit
Now, in an outer query, GROUP BY orderid and sum up IsVisit in the SELECT.
Let me know if you need more clarity, or whether it worked.
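To make the rule above concrete, here is a minimal Python sketch (not the MySQL query itself) of the same idea: sort by orderid, userid, timeuser, compare each row with the previous one, and start a new visit unless order and user match and the gap is under 1800 seconds. The function name count_visits and the date format string are assumptions for illustration. Unlike Time_To_Sec, it compares full timestamps, so gaps that cross midnight are measured correctly, and the very first row is counted as a visit.

```python
from collections import defaultdict
from datetime import datetime

GAP_SECONDS = 1800  # 30-minute window from the question

# Sample rows from the question: (TimeUser, Userid, OrderID)
rows = [
    ("10/7/2013 14:37:14", "_26Tf-0PjaS0dpiZXB61Rg", 151078706),
    ("10/7/2013 14:39:59", "_26Tf-0PjaS0dpiZXB61Rg", 151078706),
    ("10/7/2013 14:40:35", "_26Tf-0PjaS0dpiZXB61Rg", 151078706),
    ("10/11/2013 0:09:23", "_2MrGz4L_d5AF3UHpP-oJQ", 151078706),
    ("10/2/2013 20:55:05", "_4Pb2wEwiQomUny_XwVuvQ", 151078706),
    ("10/2/2013 20:55:06", "_4Pb2wEwiQomUny_XwVuvQ", 151078706),
    ("10/2/2013 20:55:06", "_4Pb2wEwiQomUny_XwVuvQ", 151078706),
]


def count_visits(rows, gap_seconds=GAP_SECONDS):
    """Return {order_id: visits}; a row opens a new visit unless the previous
    row has the same order and user and is less than gap_seconds earlier."""
    # Equivalent of ORDER BY orderid, userid, timeuser
    parsed = sorted(
        (order_id, user_id, datetime.strptime(ts, "%m/%d/%Y %H:%M:%S"))
        for ts, user_id, order_id in rows
    )
    visits = defaultdict(int)
    prev = None
    for order_id, user_id, ts in parsed:
        same_group = prev is not None and prev[0] == order_id and prev[1] == user_id
        within_gap = same_group and (ts - prev[2]).total_seconds() < gap_seconds
        if not within_gap:  # plays the role of IsVisit = 1
            visits[order_id] += 1
        prev = (order_id, user_id, ts)
    return dict(visits)


print(count_visits(rows))  # {151078706: 3}
```

On the sample data this gives 3 visits for order 151078706, matching the expected answer in the question.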
Addendum:
Starting from the previous query, you can try replacing the WHERE condition with subtable.rownumber = temptab.temprow + 4 and, in the SELECT statement, replacing the CASE expression of the above query with the following:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser)) < 900 THEN 1
ELSE 0 END) AS IsVisit
Take the UNION of the result set returned by the previous query and this one, and then apply GROUP BY.

One issue I see: your query is overly complex.
What about this?
Note that both your original and this query will give the wrong result when there is a visit shortly before midnight and another visit shortly after it: both queries will count them as 2 visits when they should really be counted as one, if I understood your request correctly. From this simplified query, though, it should be easy for you to make the required change.
SELECT orderidtable.OrderID, COUNT(orderidtable.UserID) visits
FROM (
    SELECT Timeuser, Userid, OrderID
    FROM order.order_dec SubTab1
    -- keep only rows with no later row within 30 minutes (the last row of each visit)
    WHERE NOT EXISTS (
        SELECT 1 FROM order.order_dec SubTab2
        WHERE SubTab1.OrderID = SubTab2.OrderID
        AND SubTab1.Userid = SubTab2.Userid -- visits are counted per user
        AND SubTab2.TimeUser > SubTab1.TimeUser
        AND Time_To_Sec(SubTab2.TimeUser)
            BETWEEN Time_To_Sec(SubTab1.TimeUser)
            AND Time_To_Sec(SubTab1.TimeUser)+1800
    )
) orderidtable
GROUP BY orderidtable.OrderID
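The midnight problem mentioned above comes from Time_To_Sec, which for a DATETIME returns only the seconds since midnight and ignores the date. A small Python sketch of the arithmetic, with a hypothetical helper time_of_day_seconds standing in for Time_To_Sec and made-up timestamps:

```python
from datetime import datetime


def time_of_day_seconds(ts):
    """Seconds since midnight, ignoring the date (what TIME_TO_SEC gives for a DATETIME)."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


# Case 1: two hits 20 minutes apart, on either side of midnight.
before = datetime(2013, 10, 7, 23, 50, 0)
after = datetime(2013, 10, 8, 0, 10, 0)
tod_gap = abs(time_of_day_seconds(after) - time_of_day_seconds(before))
real_gap = int((after - before).total_seconds())
print(tod_gap, real_gap)  # 85200 1200 -> wrongly treated as two visits

# Case 2: two hits two days apart, at nearly the same clock time.
day_a = datetime(2013, 10, 7, 14, 0, 0)
day_b = datetime(2013, 10, 9, 14, 10, 0)
tod_gap_days = abs(time_of_day_seconds(day_b) - time_of_day_seconds(day_a))
real_gap_days = int((day_b - day_a).total_seconds())
print(tod_gap_days, real_gap_days)  # 600 173400 -> wrongly merged into one visit
```

In MySQL, comparing full timestamps, for example with TIMESTAMPDIFF(SECOND, a, b) or UNIX_TIMESTAMP(), should avoid both cases. Whether this accounts for the 5% difference in the question is only a guess.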

I think a single full table scan is sufficient for what you want, as follows.
You can test it here: http://www.sqlfiddle.com/#!2/a5dbcd/1
Although my query has not been tested on much sample data, I think only minor changes will be needed if it has bugs.
SELECT MAX(current_uv) AS uv
FROM (
    SELECT orderid, userid, timeuser,
        IF(orderid != @prev_orderid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser,
        @prev_orderid := orderid AS prev_orderid,
        IF(userid != @prev_userid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser2,
        @prev_userid := userid AS prev_userid,
        IF(TO_SECONDS(timeuser) - @prev_timeuser > 1800, @current_uv := @current_uv + 1, @current_uv) AS current_uv,
        @prev_timeuser := TO_SECONDS(timeuser) AS prev_timeuser3
    FROM order_dec,
        (SELECT @prev_orderid := 0, @prev_userid := '', @prev_timeuser := 0, @current_uv := 0) init
    ORDER BY orderid, userid, timeuser
) x;

Related

Trying to use multiple AS, but I'm getting: Subquery returns more than 1 row

I'm using 3 tables from my database, from which I read the data for my rank (top 15) table. I'm trying to fill one 'tr' with only one query (using multiple aliases), but I'm stuck here:
My last try was:
SELECT DISTINCT(mapname),
(SELECT his_time FROM primekz_records
WHERE primekz_records.id=$player_id AND his_aa = 10 AND tp > 0) AS nub10,
(SELECT his_time FROM primekz_records
WHERE primekz_records.id=$player_id AND his_aa = 10 AND tp = 0) AS pro10
FROM primekz_records
JOIN primekz_players ON primekz_records.id=primekz_players.id
JOIN primekz_maps ON primekz_maps.mid=primekz_records.mid
WHERE primekz_players.id=$player_id
Tables are structured:
primekz_players( id, steamid, name ...)
primekz_maps( mid, mapname )
primekz_records( id, mid, his_time, his_aa, tp, ... ) <-- this means one ID(player) can be max 4 times for one mid (map), variations are: his_aa (10/100), tp (0/more)
If I try with only one alias I get this result, which is totally wrong (see Noob100 column).
https://i.snag.gy/tHpUK8.jpg
Does it have something to do with ROW_NUMBER() + 4x AS ?

SQL select from date ranges

I have a table of RADIUS session records that includes start time, stop time, and MAC address. I have a requirement to collect a list of users that were online during two time ranges. I believe I'm getting a list of all users online during the time ranges with the following query:
SELECT s_session_id, s_start_time, s_stop_time, s_calling_station_id
FROM sessions
WHERE (
("2015-10-01 08:00:00" BETWEEN s_start_time AND s_stop_time OR "2015-10-01 08:30:00" BETWEEN s_start_time AND s_stop_time)
OR
("2015-10-01 12:00:00" BETWEEN s_start_time AND s_stop_time OR "2015-10-01 12:30:00" BETWEEN s_start_time AND s_stop_time)
)
ORDER BY s_start_time;
But the next step, isolating details for only those users online during both periods, is eluding me. The closest I get is adding
GROUP BY s_calling_station_id HAVING COUNT(s_calling_station_id) > 1
but that doesn't provide me with all the session details.
Fiddle is here: http://sqlfiddle.com/#!9/1df471/1
Thanks for any assistance!
Use a self-join. Use column aliases so you can access the columns from each session with different names.
SELECT s1.s_calling_station_id,
s1.s_session_id AS s1_session_id, s1.s_start_time AS s1_start_time, s1.s_stop_time AS s1_stop_time,
s2.s_session_id AS s2_session_id, s2.s_start_time AS s2_start_time, s2.s_stop_time AS s2_stop_time
FROM sessions AS s1
JOIN sessions AS s2
ON s1.s_calling_station_id = s2.s_calling_station_id
AND s1.s_session_id != s2.s_session_id
WHERE ("2015-10-01 08:00:00" BETWEEN s1.s_start_time AND s1.s_stop_time OR "2015-10-01 08:30:00" BETWEEN s1.s_start_time AND s1.s_stop_time)
AND
("2015-10-01 12:00:00" BETWEEN s2.s_start_time AND s2.s_stop_time OR "2015-10-01 12:30:00" BETWEEN s2.s_start_time AND s2.s_stop_time)
DEMO
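As a small runnable illustration of the self-join, here is a sketch that uses Python's sqlite3 module as a stand-in for MySQL. The four session rows and the MAC values 'AA', 'BB', 'CC' are made up for the example; the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sessions (
           s_session_id INTEGER PRIMARY KEY,
           s_start_time TEXT,
           s_stop_time TEXT,
           s_calling_station_id TEXT)"""
)
# Hypothetical sample data: only 'AA' is online in both periods.
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        (1, "2015-10-01 07:50:00", "2015-10-01 08:10:00", "AA"),  # morning
        (2, "2015-10-01 11:55:00", "2015-10-01 12:05:00", "AA"),  # noon
        (3, "2015-10-01 07:55:00", "2015-10-01 08:20:00", "BB"),  # morning only
        (4, "2015-10-01 12:20:00", "2015-10-01 12:40:00", "CC"),  # noon only
    ],
)

# s1 must match the first period and s2 the second, for the same MAC address.
query = """
SELECT s1.s_calling_station_id, s1.s_session_id, s2.s_session_id
FROM sessions AS s1
JOIN sessions AS s2
  ON s1.s_calling_station_id = s2.s_calling_station_id
 AND s1.s_session_id != s2.s_session_id
WHERE ('2015-10-01 08:00:00' BETWEEN s1.s_start_time AND s1.s_stop_time
    OR '2015-10-01 08:30:00' BETWEEN s1.s_start_time AND s1.s_stop_time)
  AND ('2015-10-01 12:00:00' BETWEEN s2.s_start_time AND s2.s_stop_time
    OR '2015-10-01 12:30:00' BETWEEN s2.s_start_time AND s2.s_stop_time)
"""
result = conn.execute(query).fetchall()
print(result)  # [('AA', 1, 2)]
```

Note that the BETWEEN checks only test whether a period's start or end point falls inside a session, so a session that starts and ends strictly inside a period would not match. An overlap test (session start <= period end AND session stop >= period start) would cover that case as well.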
Although this question already has an accepted answer, I'd like to add this one (it avoids duplicates and pulls the data from the sessions table of all sessions that fulfill the condition):
First, create a table that holds the filtered data (the MAC addresses that have connections in both intervals):
create table temp_sessions
select s1.s_calling_station_id
     , if(@t1_1 between s1.s_start_time and s1.s_stop_time or @t1_2 between s1.s_start_time and s1.s_stop_time, s1.s_session_id, null) as s_1
     , if(@t2_1 between s2.s_start_time and s2.s_stop_time or @t2_2 between s2.s_start_time and s2.s_stop_time, s2.s_session_id, null) as s_2
from -- I use user variables because it makes it easier to modify the time intervals if needed
    (select @t1_1 := '2015-10-01 08:00:00', @t1_2 := '2015-10-01 08:30:00'
          , @t2_1 := '2015-10-01 12:00:00', @t2_2 := '2015-10-01 12:30:00') as init
    , sessions as s1
    inner join sessions as s2
        on s1.s_calling_station_id = s2.s_calling_station_id
        and s1.s_session_id != s2.s_session_id
having s_1 is not null and s_2 is not null;
And now, simply use this table to get what you need:
select sessions.*
from sessions
inner join (
select s_calling_station_id, s_1 as s_session_id
from temp_sessions
union
select s_calling_station_id, s_2 as s_session_id
from temp_sessions
) as a using (s_calling_station_id, s_session_id);
Here's the SQL fiddle

query optimization for mysql

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4907424 rows on matrix_report1 and 335740 rows on matrix_normalized_notes table. These tables will grow as we have more requests.
First, the others are right that you should format your samples better. Explaining in plain language what you are trying to do also helps, and sample data with the expected results is even better.
However, that said, I think it can be significantly simplified. Your queries are almost completely identical with the exception of the one field of "final_value" = 1 or 0 respectively. Since each query will result in 1 record per "person_id", you can just do the average based on a CASE/WHEN AND remove the rest.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
Since your outer query averages the per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query is pulling all transactions for the final value as either a zero OR one. But then, the AVG is based on a case/when of the respective value for the normalized value. When the condition is NOT the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point, the data is already grouped per person and store, with Avg1 and Avg0 set for each. Now, roll these values up directly per person regardless of the store. Again, NULL values are not considered in the average computation. So, if Store "A" doesn't have a value in Avg1, it will not skew the results. Similarly if Store "B" doesn't have a value in Avg0.
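The NULL behaviour described above can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL and a single hypothetical, already-joined table called notes with made-up rows; AVG skips NULLs in the same way in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-joined table standing in for matrix_report1 JOIN matrix_normalized_notes.
conn.execute(
    """CREATE TABLE notes (
           person_id INTEGER,
           store_name TEXT,
           final_value INTEGER,
           normalized_value REAL)"""
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?)",
    [
        (1, "A", 1, 0.4),
        (1, "A", 1, 0.6),
        (1, "A", 0, 0.8),
        (1, "B", 1, 1.0),  # store B has no final_value = 0 rows
    ],
)

query = """
SELECT person_id, AVG(anv1) AS t1_value, AVG(anv0) AS t0_value
FROM (SELECT person_id, store_name,
             -- CASE without ELSE yields NULL, which AVG ignores
             AVG(CASE WHEN final_value = 1 THEN normalized_value END) AS anv1,
             AVG(CASE WHEN final_value = 0 THEN normalized_value END) AS anv0
      FROM notes
      GROUP BY person_id, store_name) r1
GROUP BY person_id
"""
person_id, t1_value, t0_value = conn.execute(query).fetchone()
print(person_id, t1_value, t0_value)
```

Store A contributes averages of 0.5 and 0.8, store B contributes 1.0 and NULL, so t1_value is the average of 0.5 and 1.0 while t0_value is based on store A alone.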

MYSQL retrieve data dependent on rows returned

I am working on a mysql query that will filter out certain occurrences dependent on how many rows are returned.
I am trying to filter out any support categories when the number of rows returned is 1, but leave the support category in when the result set returned has more than 1 row.
I originally had this idea, but it seems it will not work.
SELECT stockmaster.description, SUM(salesorderdetails.quantity), stockmaster.categoryid as qty
FROM salesorderdetails, stockmaster
where salesorderdetails.stkcode=stockmaster.stockid
and orderno='5222'
group by stockmaster.description
HAVING CASE WHEN stockmaster.categoryid = 'S&M' THEN COUNT(*) >= 2 ELSE COUNT(*) = 1 END
Any help will be gratefully accepted.
Try this
SELECT *
FROM
(
    SELECT stockmaster.description,
        SUM(salesorderdetails.quantity) AS qty,
        stockmaster.categoryid,
        COUNT(*) AS count
    FROM salesorderdetails, stockmaster
    WHERE salesorderdetails.stkcode=stockmaster.stockid
    AND orderno='5222'
    GROUP BY stockmaster.description
) MAIN_DATA
WHERE MAIN_DATA.count > 1

How can I optimize my query (rank query)?

For the last two days, I have been asking questions on rank queries in Mysql. So far, I have working queries for
query all the rows from a table and order by their rank.
query ONLY one row with its rank
Here is a link for my question from last night
How to get a row rank?
As you might notice, btilly's query is pretty fast.
Here is a query for getting ONLY one row with its rank that I made based on btilly's query.
set @points = -1;
set @num = 0;
select * from (
    SELECT id
    , points
    , @num := if(@points = points, @num, @num + 1) as point_rank
    , @points := points as dummy
    FROM points
    ORDER BY points desc, id asc
) as test where test.id = 3
The above query uses a subquery, so I am worried about performance.
Are there any faster queries that I can use?
Table points
id points
1 50
2 50
3 40
4 30
5 30
6 20
Don't get into a panic about subqueries. Subqueries aren't always slow - only in some situations. The problem with your query is that it requires a full scan.
Here's an alternative that should be faster:
SELECT COUNT(DISTINCT points) + 1
FROM points
WHERE points > (SELECT points FROM points WHERE id = 3)
Add an index on id (I'm guessing that you probably want a primary key here) and another index on points to make this query perform efficiently.
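A quick way to check this against the sample table is the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL. The helper name rank_of and the index name idx_points_points are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, points INTEGER)")
conn.execute("CREATE INDEX idx_points_points ON points (points)")
# Sample table from the question
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [(1, 50), (2, 50), (3, 40), (4, 30), (5, 30), (6, 20)],
)


def rank_of(conn, row_id):
    """Rank = number of distinct point values strictly above this row's points, plus one."""
    (rank,) = conn.execute(
        "SELECT COUNT(DISTINCT points) + 1 FROM points "
        "WHERE points > (SELECT points FROM points WHERE id = ?)",
        (row_id,),
    ).fetchone()
    return rank


print(rank_of(conn, 3))  # 2
print([rank_of(conn, i) for i in range(1, 7)])  # [1, 1, 2, 3, 3, 4]
```

Tied rows share a rank and no ranks are skipped, which matches what the user-variable query in the question computes.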