SQL Server 2008 Optimize FULL JOIN with ISNULL statements

Hi all,
I was hoping someone could help me improve a query I have to run periodically. At the moment it takes more than 40 minutes to execute. It uses the full allocated memory during this time, but CPU usage mostly meanders at 2% - 5%, every now and then jumping to 40% for a few seconds.
I have this table (simplified example):
CREATE TABLE [dbo].[dataTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[dteEffectiveDate] [date] NULL,
[dtePrevious] [date] NULL,
[dteNext] [date] NULL,
[Age] [int] NULL,
[Count] [int] NULL
) ON [PRIMARY]
GO
Here are some input values:
INSERT INTO [YourDB].[dbo].[dataTable]
([dteEffectiveDate]
,[dtePrevious]
,[dteNext]
,[Age]
,[Count])
VALUES
('2009-01-01',NULL,'2010-01-01',40,300),
('2010-01-01','2009-01-01', NULL,40,200),
('2009-01-01',NULL, '2010-01-01',20,100),
('2010-01-01','2009-01-01', NULL,20,50),
('2009-01-01',NULL,'2010-01-01',30,10)
GO
Each entry has a dteEffectiveDate field. In addition, each has a dtePrevious and dteNext, which reflects the dates of the nearest previous/next effective date. Now what I want is a query that will calculate the mid-value on the Count fields between successive periods, within a specific age.
So for example, in the data above, for age 40 we have 300 at 2009/01/01 and 200 at 2010/01/01 so the query should produce 250.
Note that age 30 has only one entry, 10. This is at 2009/01/01. There is no entry at 2010/01/01, but we know that data was captured at this point, so the fact that there is nothing means that 30 is 0 at this date. Hence the query should produce 5.
In order to achieve this I use a FULL JOIN of the table on itself, and use ISNULL to select values. Here is my code:
SELECT
ISNULL(T1.dteEffectiveDate,T2.dtePrevious) as [Start Date]
,ISNULL(T1.dteNext,T2.dteEffectiveDate) as [End Date]
,ISNULL(T1.Age,T2.Age) as Age
,ISNULL(T1.[Count],0) as [Count Start]
,ISNULL(T2.[Count],0) as [Count End]
,(ISNULL(T1.[Count],0)+ISNULL(T2.[Count],0))/2 as [Mid Count]
FROM
[ExpDBClient].[dbo].[dataTable] as T1
FULL JOIN [ExpDBClient].[dbo].[dataTable] as T2
ON
T2.dteEffectiveDate = T1.dteNext
AND T2.Age = T1.Age
WHERE ISNULL(T1.dteEffectiveDate,T2.dtePrevious) is not null
AND ISNULL(T1.dteNext,T2.dteEffectiveDate) is not null
GO
which outputs:
Start Date   End Date     Age  Count Start  Count End  Mid Count
2009-01-01   2010-01-01   40   300          200        250
2009-01-01   2010-01-01   20   100          50         75
2009-01-01   2010-01-01   30   10           0          5
It works perfectly, but when I run this on the actual data, which is about 7m records, it takes painfully long to execute.
Does anyone have any suggestions?
Thanks
Karl

It's hard to make a lot of recommendations.
One thing I would definitely recommend is indexes on the columns that you use in your JOIN conditions, i.e.
Age
dteEffectiveDate
dteNext
Create a NONCLUSTERED index on each of those columns separately and measure again. With just a few rows no improvement will be measurable, but with millions of rows it might make a difference.
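To try the query and the suggested indexes on the sample rows, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for SQL Server (an assumption, made only so the example is self-contained). Because older SQLite builds lack FULL JOIN, the sketch swaps in a LEFT JOIN plus a UNION ALL of the unmatched right-hand rows, and IFNULL replaces ISNULL. The index names are made up, and the sketch says nothing about timing on 7m rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataTable (
    ID INTEGER PRIMARY KEY,
    dteEffectiveDate TEXT, dtePrevious TEXT, dteNext TEXT,
    Age INTEGER, [Count] INTEGER
);
INSERT INTO dataTable (dteEffectiveDate, dtePrevious, dteNext, Age, [Count]) VALUES
    ('2009-01-01', NULL,         '2010-01-01', 40, 300),
    ('2010-01-01', '2009-01-01', NULL,         40, 200),
    ('2009-01-01', NULL,         '2010-01-01', 20, 100),
    ('2010-01-01', '2009-01-01', NULL,         20, 50),
    ('2009-01-01', NULL,         '2010-01-01', 30, 10);
-- One single-column index per join column, as suggested above (names are hypothetical).
CREATE INDEX ix_age ON dataTable (Age);
CREATE INDEX ix_effective ON dataTable (dteEffectiveDate);
CREATE INDEX ix_next ON dataTable (dteNext);
""")

# FULL JOIN emulated as: (T1 LEFT JOIN T2) UNION ALL (T2 rows that no T1 row points at).
MID_COUNT_SQL = """
SELECT IFNULL(T1.dteEffectiveDate, T2.dtePrevious) AS start_date,
       T1.dteNext AS end_date,
       T1.Age AS age,
       IFNULL(T1.[Count], 0) AS count_start,
       IFNULL(T2.[Count], 0) AS count_end,
       (IFNULL(T1.[Count], 0) + IFNULL(T2.[Count], 0)) / 2 AS mid_count
FROM dataTable AS T1
LEFT JOIN dataTable AS T2
       ON T2.dteEffectiveDate = T1.dteNext AND T2.Age = T1.Age
WHERE T1.dteNext IS NOT NULL
  AND IFNULL(T1.dteEffectiveDate, T2.dtePrevious) IS NOT NULL
UNION ALL
SELECT T2.dtePrevious, T2.dteEffectiveDate, T2.Age, 0, T2.[Count], T2.[Count] / 2
FROM dataTable AS T2
WHERE T2.dtePrevious IS NOT NULL
  AND T2.dteEffectiveDate IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM dataTable AS T1
                  WHERE T1.dteNext = T2.dteEffectiveDate AND T1.Age = T2.Age)
ORDER BY age
"""

mid_rows = conn.execute(MID_COUNT_SQL).fetchall()
for row in mid_rows:
    print(row)
```

On the sample data this yields one row per age, with age 30 paired against an implied count of 0 at the end date.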

SQL join each row in a table with a one row from another table

The Problem
I have a table window with start and end timestamps. I have another table activity that has a timestamp. I would like to create a query that:
For each row in activity, joins a single row from window whose start and end enclose the activity's timestamp, choosing the oldest such window (the one with the earliest start).
Window Table
Start  End  ISBN
0      10   "ABC"
5      15   "ABC"
20     30   "ABC"
25     35   "ABC"
Activity Table
Timestamp  ISBN
7.5        "ABC"
27.5       "ABC"
Desired Result
Start  End  ISBN   Timestamp
0      10   "ABC"  7.5
20     30   "ABC"  27.5
The Attempt
My attempt at solving this so far has ended with the following query:
SELECT
*
FROM
test.activity AS a
JOIN test.`window` AS w ON w.isbn = (
SELECT
w1.isbn
FROM
test.window as w1
WHERE a.`timestamp` BETWEEN w1.`start` AND w1.`end`
ORDER BY w1.`start`
LIMIT 1
)
The output of this query is 8 rows.
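Your query returns 8 rows because the subquery yields only an ISBN, and all four window rows share it, so each of the two activity rows joins to all four windows. When there is guaranteed to be a single oldest window (i.e. no two Start times are the same for any ISBN), you can rank the matching windows per activity row and keep the first: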
with activity_window as (
select
a.`Timestamp`,
a.`ISBN`,
w.`Start`,
w.`End`,
row_number() over (partition by a.`ISBN`, a.`Timestamp` order by w.`Start`) rn
from
`Activity` a
inner join `Window` w on a.`ISBN` = w.`ISBN` and a.`Timestamp` between w.`Start` and w.`End`
)
select `Start`, `End`, `ISBN`, `Timestamp` from activity_window where rn = 1;
Result:
Start  End  ISBN  Timestamp
0      10   ABC   7.5
20     30   ABC   27.5
(see complete example at DB<>Fiddle)
CTEs and window functions such as row_number() are available from MySQL 8.0. Use subqueries if you are still on MySQL 5. Try to avoid table and column names that are SQL keywords or reserved words (Window, Start, End and Timestamp are examples of poor name choices).
An index over (ISBN, Start, End) on Window (or clustering the entire table that way by defining those three columns as the primary key) helps this query.
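For servers without CTEs or window functions, the same "oldest matching window" rule can be written with a correlated MIN() subquery. The following is a minimal sketch, run through Python's sqlite3 module as a stand-in for MySQL 5 (an assumption). The names time_window, activity, start_ts, end_ts and ts are hypothetical, chosen to avoid the keyword clashes mentioned above. Like the row_number() version, it assumes no two windows for one ISBN share a start time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names that avoid Window / Start / End / Timestamp.
CREATE TABLE time_window (start_ts REAL, end_ts REAL, isbn TEXT);
CREATE TABLE activity (ts REAL, isbn TEXT);
INSERT INTO time_window VALUES
    (0, 10, 'ABC'), (5, 15, 'ABC'), (20, 30, 'ABC'), (25, 35, 'ABC');
INSERT INTO activity VALUES (7.5, 'ABC'), (27.5, 'ABC');
-- Composite index in the column order suggested above.
CREATE INDEX ix_time_window ON time_window (isbn, start_ts, end_ts);
""")

# For each activity row, keep only the window whose start equals the
# smallest start among all windows that contain the activity timestamp.
OLDEST_WINDOW_SQL = """
SELECT w.start_ts, w.end_ts, w.isbn, a.ts
FROM activity AS a
JOIN time_window AS w
  ON w.isbn = a.isbn
 AND w.start_ts = (SELECT MIN(w1.start_ts)
                   FROM time_window AS w1
                   WHERE w1.isbn = a.isbn
                     AND a.ts BETWEEN w1.start_ts AND w1.end_ts)
ORDER BY a.ts
"""

oldest = conn.execute(OLDEST_WINDOW_SQL).fetchall()
for row in oldest:
    print(row)
```

Activity rows that fall inside no window drop out, because the subquery then returns NULL and the join condition fails.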

Converting mysql query to mssql query for sql server 2000

I am stuck figuring out why I get an error when I run a query that works on MySQL against MS SQL Server. I am using SQL Server 2000. My goal is to get the same result there as I do on MySQL.
A little explanation about the database: it belongs to a GPS tracker and has 3 main tables: incoming, md_login, and master_device.
Here are the tables and their structure:
Structure of table incoming:
This table is mainly used for the incoming data from the GPS tracker of each vehicle, so at every interval new data arrives in this table. Regarding the column names, 'tanggal' means 'date' in English.
Text1 varchar
...
Text18 varchar <used as imei>
...
Text31 varchar
Distance varchar
Tanggal datetime
TanggalIncoming datetime
StartDate datetime
EndDate datetime
EngineStatus varchar
AccStatus varchar
Moving varchar
Address varchar
Structure of table md_login:
This table is used to store each vehicle with its IMEI, so one Log_ID can have many Log_DeviceID values.
Log_ID char <used as username>
Log_DeviceID varchar <used as vehicle number>
Log_DeviceIMEI varchar <used as imei>
Log_Date datetime
Sample data of table md_login:
Log_ID - Alex
Log_DeviceID - B 7777 GHI
Log_DeviceImei - 012896001194123
Log_Date - 2017-05-30 13:46:57
Structure of table master_device:
Device_Imei varchar
Device_PoliceNumber char
Device_MobileNumber char
Device_MobileNumber2 char
Model varchar
Port char
PortDevice char
ActiveDate datetime
LastUpdate datetime
IdxConn varchar
CommandOperate char
Picture varchar
Sample data of table master_device:
Device_Imei - 012896001194123
Device_PoliceNumber - B 7777 GHI
Device_MobileNumber - 01234567
Device_MobileNumber2 -
Model - STV-08
Port - 340
PortDevice - 20557
ActiveDate - 2017-05-30 13:46:57
LastUpdate - Null
IdxConn - Null
CommandOperate - Null
Picture - livina_grey.png
Here's the query that already works on mysql:
SELECT fi.text18 as Imei,
md.Device_PoliceNumber,
fi.Text6 as Lat,
fi.Text8 as Lng,
fi.Text10 as Speed,
fi.Text16 as Gps_Signal,
fi.Text21 as Battery,
fi.Text22 as Charging,
fi.Text29 as Oil,
fi.Text30 as Temperature,
md.Picture,
fi.EngineStatus,
fi.TanggalIncoming,
fi.Moving,
fi.Address
FROM incoming fi
INNER JOIN (SELECT MAX(tanggalincoming) as maxtglincoming,text18,moving
FROM incoming
GROUP BY text18) ri
ON ri.maxtglincoming = fi.tanggalincoming AND
ri.text18=fi.text18
INNER JOIN md_login AS mdl ON (ri.text18=mdl.log_deviceimei AND
mdl.log_id='alex')
INNER JOIN master_device AS md ON md.device_imei=mdl.log_deviceimei
GROUP BY fi.text18
ORDER BY md.Device_PoliceNumber ASC
A little explanation about the query:
First I use MAX(tanggalincoming) to find the latest update per device in the incoming table. Then I inner join that result back to the full incoming table, so that only the rows belonging to the latest incoming data are returned.
Here is a sample of the result I get when I execute the query in MySQL. There can be more than one row, since one username can have more than one vehicle.
Imei - 012896001194123
Device_PoliceNumber - B 7777 GHI
Lat - -6.27585
Lng - 106.66172
Speed - 0
Gps_Signal - F
Battery - F:4.18V
Charging - 1
Oil - Null
Temperature - Null
Picture - livina_grey.png
EngineStatus - OFF
TanggalIncoming - 2017-05-31 05:25:59
Moving - STOP
Address - Example Street
But when I try to execute the query on SQL Server 2000, I get this error:
Server: Msg 8120, Level 16, State 1, Line 1. Column 'md.Device_PoliceNumber' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
So the main question is: how can I achieve the same result on SQL Server?
In SQL Server, when a query uses aggregate functions (MAX, SUM, etc.) or a GROUP BY, every column in the SELECT list that is not inside an aggregate must appear in the GROUP BY clause.
In your sub-query you have SELECT MAX(tanggalincoming) as maxtglincoming,text18,moving, but only text18 is included in the GROUP BY.
It should look like this:
SELECT MAX(tanggalincoming) as maxtglincoming,text18,moving
FROM incoming
GROUP BY text18,moving
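Second, the outer query has no aggregate functions at all, so you should remove its GROUP BY fi.text18.
If you used that GROUP BY to suppress duplicates, use DISTINCT instead.
Note that grouping by moving as well returns one row per (text18, moving) combination rather than one per device. Since the outer query never reads ri.moving, you can alternatively drop moving from the sub-query and keep GROUP BY text18.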
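The "latest row per device" pattern can be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite is lenient about GROUP BY and will not reproduce Msg 8120). The sub-query selects only the grouping column and the aggregate, which is the shape SQL Server accepts. The table is cut down to three columns, and the second device and all timestamps except the latest one are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical version of the incoming table.
CREATE TABLE incoming (text18 TEXT, moving TEXT, tanggalincoming TEXT);
INSERT INTO incoming VALUES
    ('012896001194123', 'MOVE', '2017-05-31 05:20:00'),
    ('012896001194123', 'STOP', '2017-05-31 05:25:59'),
    ('999',             'MOVE', '2017-05-30 10:00:00');
""")

# The derived table ri holds one row per device: its IMEI and latest timestamp.
# Every non-aggregated column in ri's SELECT list (text18) is in its GROUP BY.
LATEST_SQL = """
SELECT fi.text18, fi.moving, fi.tanggalincoming
FROM incoming AS fi
INNER JOIN (SELECT text18, MAX(tanggalincoming) AS maxtglincoming
            FROM incoming
            GROUP BY text18) AS ri
        ON ri.text18 = fi.text18
       AND ri.maxtglincoming = fi.tanggalincoming
ORDER BY fi.text18
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

The outer query needs no GROUP BY: the join already leaves one row per device, unless two rows of a device share the same latest timestamp.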

If more than 10% of results are over X in mysql

I have a database table with lists of temperature readings from many locations in a number of buildings. I need a query that will give me a true or false if more than 10% of the readings in a building, taken on a date, are greater than X
I am not looking for a average. If there are 100 measurements taken in a building on a date, and 10 of them are over X (say 80 degrees) then create a flag.
The table is laid out as
| Building # | location # | date       | temperature |
| 123        | 555        | 2016-04-08 | 68.5        |
| 123        | 556        | 2016-04-08 | 70.2        |
| 123        | 557        | 2016-04-08 | 65.4        |
| 888        | 999        | 2013-03-22 | 80.4        |
Typically a building would have over 100 readings. There are many hundreds of building/date entries in the table
Can this be done with a single mysql query and can you share that query with me?
I obviously haven't made my question clear.
The result I am looking for is a single True or False.
If more than 10% of the results for a building/date combination were over X (say 80 degrees) then show true, or some flag equal to true.
The known fields will be building and date. The location is not relevant, and can be ignored. So given the input of building (123) and date (2016-04-08) are more than 10% of the entries in the table that have that building number and date greater than X (e.g. 80). The only data to be tested are those for that building and date. So the query would end in:
where building_id = '123' AND date = '2016-04-08'
I am NOT looking for an average or a median. I am NOT looking to see a list of the data for that 10%. I am just looking for true or false.
You can use conditional aggregation, something like this:
select building, date,
(case when avg(temperature > x) > 0.1 then 'Y' else 'N' end) as flag
from t
group by building, date;
To return building and date, and "create a flag" for rows where more than 10% of the readings for that building on that date are over a given value X ...
SELECT r.building
, DATE(r.date)
, ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
GROUP BY r.building, DATE(r.date)
Absent a more precise specification of the result set you want returned, we're just guessing.
FOLLOWUP
Based on the update to the question... to return a row for a single building and a single date, add the WHERE clause as shown in the question. And remove expressions from the SELECT list.
SELECT ( SUM(r.reading > X ) > SUM(.10) ) AS _flag
FROM myreadings r
WHERE r.building = '123'
AND r.date >= '2016-04-08'
AND r.date < '2016-04-08' + INTERVAL 1 DAY
If there are no rows for the given building and given date, the query will return zero rows. If there is at least one row, and the number of rows that have a reading greater than X is more than 10% of the total number of rows, the query will return a single row, with _flag having a value of 1 (TRUE). Otherwise, the query will return a single row with _flag having a value of 0 (FALSE).
If you want the query to return a row even when there are no matching rows in the table, that can be accomplished with a more complex SQL statement.
If you want the query to return string values 'TRUE' or 'FALSE', that can be accomplished as well.
Again, absent an example of the resultset you are expecting to have returned, (without an actual specification which we can compare a resultset to), we're just guessing.
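The conditional-aggregation idea from the answers above can be tried out with the sketch below. It uses Python's sqlite3 module in place of MySQL (an assumption; both treat a comparison as 0 or 1, so AVG of the comparison is the fraction of readings over X). The table name readings, the function name and all sample temperatures are hypothetical.

```python
import sqlite3

def more_than_10_percent_over(conn, building, day, x):
    """Return True/False, or None when there are no readings for that building and day."""
    row = conn.execute(
        # AVG(temperature > x) is the share of readings above x.
        "SELECT AVG(temperature > ?) > 0.1 "
        "FROM readings WHERE building = ? AND date = ?",
        (x, building, day),
    ).fetchone()
    return None if row[0] is None else bool(row[0])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (building INTEGER, location INTEGER, date TEXT, temperature REAL)"
)
# Building 123: 2 of 10 readings over 80. Building 888: exactly 1 of 10 over 80.
sample = [(123, i, "2016-04-08", 85.0 if i < 2 else 70.0) for i in range(10)]
sample += [(888, i, "2016-04-08", 85.0 if i < 1 else 70.0) for i in range(10)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", sample)

print(more_than_10_percent_over(conn, 123, "2016-04-08", 80))  # 20% over
print(more_than_10_percent_over(conn, 888, "2016-04-08", 80))  # exactly 10% over
print(more_than_10_percent_over(conn, 555, "2016-04-08", 80))  # no readings at all
```

Because the comparison is strict, exactly 10% does not raise the flag; change > 0.1 to >= 0.1 if "10 out of 100" should count.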

how to add two millisecond column in mysql

I have table as shown below
gid code time qid
1 123 08:108 15
1 145 11:012 15
1 145 11:216 16
1 123 12:102 16
Now I want to group by 'gid' and average the times that share the same code. For example, for code 123 I calculate (08:108 + 12:102)/2. It is divided by 2 because code 123 appears twice; if it appeared three times it would be divided by 3, so this should be dynamic.
The result I want is
gid code time
1 123 10:105
1 145 11:114
I tried using these queries:
SELECT sum(time) FROM results group by code; -- result in integer values
SELECT timestamp(sum(time)) FROM results group by code; -- result is null
Your time field does not look like it is of the type TIME. A TIME field is in the format HH:MM:SS. Before MySQL 5.6.4 it could not store milliseconds: the MySQL documentation for those versions states that trailing fractions of seconds are allowed in date and time values, but are discarded and not stored. (From 5.6.4 on, a column declared as TIME(3) keeps them.)
Your time field looks like it is a varchar, and while you can use functions like SUM() or AVG() on that, MySQL cannot interpret your seconds:milliseconds notation as a number.
You can use the following query:
SELECT code,AVG(REPLACE(time,':','.')) FROM results group by code
This replaces the : in your value with ., creating a float number AVG() can handle correctly.
The result:
code AVG(REPLACE(time,':','.'))
123 10.105
145 11.114
Of course this creates extra work for the MySQL server on every query. A better way would be to change your column definition to FLOAT and store your seconds and milliseconds as a float:
code time
123 8.108
145 11.012
145 11.216
123 12.102
The result of SELECT code,AVG(time) FROM results GROUP BY code:
code AVG(time)
123 10.1050000190735
145 11.1139998435974
You can use the avg aggregate function on a time column - you'd just need to convert it back to time when you're done, and use time_format if the default format doesn't suit you:
SELECT gid, code, TIME_FORMAT(TIME(AVG(`time`)), '%H-%i.%f')
FROM mytable
GROUP BY gid, code
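The REPLACE approach can be checked against the sample rows with the sketch below. It runs through Python's sqlite3 module instead of MySQL (an assumption), with an explicit CAST because SQLite does not convert the string implicitly in the same way. It only works if the millisecond part is always zero-padded to three digits, as in the sample. Formatting the average back to seconds:milliseconds is done in Python here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (gid INTEGER, code INTEGER, time TEXT, qid INTEGER);
INSERT INTO results VALUES
    (1, 123, '08:108', 15),
    (1, 145, '11:012', 15),
    (1, 145, '11:216', 16),
    (1, 123, '12:102', 16);
""")

# '08:108' -> '08.108' -> 8.108 seconds, then average per (gid, code).
AVG_SQL = """
SELECT gid, code, AVG(CAST(REPLACE(time, ':', '.') AS REAL)) AS avg_seconds
FROM results
GROUP BY gid, code
ORDER BY code
"""

averages = [
    # Zero-padded to width 6 with 3 decimals, then back to the ss:mmm notation.
    (gid, code, f"{avg_seconds:06.3f}".replace(".", ":"))
    for gid, code, avg_seconds in conn.execute(AVG_SQL)
]
for row in averages:
    print(row)
```

AVG divides by the number of rows in each group, so the "divide by 2 or 3" part is dynamic without extra work.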

MySQL Calculating sum over pairwise time differences of log file

I have a table in MySQL to log user actions. Each row in the table corresponds to a user action, like login, logout etc.
The table looks like:
CREATE TABLE IF NOT EXISTS `user_activity_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`action_type` smallint NOT NULL,
`action_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
id user_id action_type action_created
...
22 6 1 2013-07-21 14:31:14
23 6 2 2013-07-21 14:31:16
24 8 2 2013-07-21 14:31:18
25 8 1 2013-07-21 14:45:18
26 8 0 2013-07-21 14:45:25
27 8 1 2013-07-21 14:54:54
28 8 2 2013-07-21 15:09:11
29 6 1 2013-07-21 15:09:17
30 6 2 2013-07-21 15:09:29
...
Imagine that action 1 is login and action 2 is logout, and that I want to find out the total time (in hours:minutes:seconds) the user with id 6 was logged in within a specific range of dates.
My first idea was to fetch all rows with either action 1 or 2 and calculate the date differences in PHP myself. This seems rather complicated and I am sure this can be done in one query with MySQL, too!
What I tried was this:
SELECT TIMEDIFF(ual1.action_created, ual2.action_created) FROM user_activity_log
ual1,user_activity_log ual2 WHERE ual1.user_id = 6 AND ual2.user_id = 6 AND
ual1.action_type = 1 AND ual2.action_type = 2 AND
DATE(ual1.action_created) >= '2013-07-21' AND
DATE(ual1.action_created) <= '2013-07-21'
ORDER BY ual1.action_created
to select all login events from ual1 and all logout events from ual2 for the same user and then calculate the pairwise time differences for the day 2013-07-21. This does not really work and I don't know why.
How can I calculate the total login time (the sum over all time differences, date of action 2 minus date of action 1)?
The result from the correct operation should be 2 seconds from log id pair 22,23 + 12 seconds from log id pair 29,30 = 14 seconds.
Thank you very much for your help in advance. Best regards
I think the easiest way to structure this type of query is using correlated subqueries (and, to be honest, I generally don't like correlated subqueries, but this is an exception). Your query would probably work with the right group by clause.
Here is an alternative method:
select TIMEDIFF(LogoutTS, action_created)
from (select ual.*,
             (select ual2.action_created
              from user_activity_log ual2
              where ual2.user_id = ual.user_id and
                    ual2.action_type = 2 and
                    ual2.action_created > ual.action_created
              order by ual2.action_created asc
              limit 1
             ) as LogoutTS
      from user_activity_log ual
      where ual.user_id = 6 and
            ual.action_type = 1
     ) ual
The subquery picks the first logout after each login, which is why it sorts ascending. To get the total, you then need to sum the differences. Summing TIMEDIFF() values directly is unreliable because they are TIME values, so convert to seconds first. UNIX_TIMESTAMP() already returns seconds, so no further scaling is needed. The outer select list might look something like this:
select SUM(UNIX_TIMESTAMP(LogoutTS) - UNIX_TIMESTAMP(action_created))
Or, formatted as hours:minutes:seconds:
select sec_to_time(SUM(UNIX_TIMESTAMP(LogoutTS) - UNIX_TIMESTAMP(action_created)))
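To check the pairing logic against the sample log, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL (an assumption). SQLite has no UNIX_TIMESTAMP(), so strftime('%s', ...) is swapped in to get epoch seconds, and MIN() replaces the ORDER BY ... LIMIT 1 subquery. The function name total_login_seconds is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activity_log (
    id INTEGER PRIMARY KEY, user_id INTEGER,
    action_type INTEGER, action_created TEXT
);
INSERT INTO user_activity_log VALUES
    (22, 6, 1, '2013-07-21 14:31:14'), (23, 6, 2, '2013-07-21 14:31:16'),
    (24, 8, 2, '2013-07-21 14:31:18'), (25, 8, 1, '2013-07-21 14:45:18'),
    (26, 8, 0, '2013-07-21 14:45:25'), (27, 8, 1, '2013-07-21 14:54:54'),
    (28, 8, 2, '2013-07-21 15:09:11'), (29, 6, 1, '2013-07-21 15:09:17'),
    (30, 6, 2, '2013-07-21 15:09:29');
""")

def total_login_seconds(conn, user_id, day):
    """Sum (first logout after login) - login over all logins of user_id on day."""
    row = conn.execute("""
        SELECT SUM(CAST(strftime('%s', logout_ts) AS INTEGER)
                 - CAST(strftime('%s', action_created) AS INTEGER))
        FROM (SELECT ual.action_created,
                     -- earliest logout of the same user after this login
                     (SELECT MIN(ual2.action_created)
                      FROM user_activity_log ual2
                      WHERE ual2.user_id = ual.user_id
                        AND ual2.action_type = 2
                        AND ual2.action_created > ual.action_created) AS logout_ts
              FROM user_activity_log ual
              WHERE ual.user_id = ?
                AND ual.action_type = 1
                AND date(ual.action_created) = ?) AS sessions
        WHERE logout_ts IS NOT NULL
    """, (user_id, day)).fetchone()
    return row[0] or 0  # SUM over zero rows is NULL

print(total_login_seconds(conn, 6, "2013-07-21"))
```

For user 6 this pairs log ids 22/23 and 29/30, matching the 2 + 12 seconds expected in the question. Logins that never get a logout are skipped rather than counted.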