Optimizing similar MySQL subqueries

This is a subquery I have in a larger SQL script. It performs the same lookup inside several different CASE expressions, so I was hoping to combine them so the same work isn't repeated over and over. However, I can't get the right results if I move the ORDER BY clause outside of the CASE expressions.
I'm joining 2 tables, met_data and flexgridlayers_table, on JDAY. Flexgridlayers_table has fields JDAY and Segment, and met_data has fields JDAY, TAIR, and TDEW (in this simplified example; the real table has more fields). I'm running this through Matlab, so variable1 and variable2 are values set by a nested loop. I use CASE statements to handle the situation where variable1 is not equal to 1, in which case I want to output 0. Otherwise, I want to find the values corresponding to a JDAY join, but F.JDAY and M.JDAY may not match exactly. I want to match on the closest value that is <= F.JDAY, so I use ORDER BY M.JDAY DESC LIMIT 1 in each subquery.
The output is a table with fields JDAY (from F.JDAY), TAIR, and TDEW. Whenever I try moving the ORDER BY part outside of the CASE statements to get rid of the repeating subqueries, I get only a single row of results representing the largest JDAY. This query gives me the correct result - is there a way to optimize this?
SELECT F.JDAY,
CASE
WHEN *variable1*<>1 THEN 0
ELSE
(SELECT M.TAIR
FROM met_data AS M
WHERE M.Year=2000 AND M.JDAY<=F.JDAY
ORDER BY M.JDAY DESC LIMIT 1)
END AS TAIR,
CASE
WHEN *variable1*<>1 THEN 0
ELSE
(SELECT M.TDEW
FROM met_data AS M
WHERE M.Year=2000 AND M.JDAY<=F.JDAY
ORDER BY M.JDAY DESC LIMIT 1)
END AS TDEW
FROM FlexGridLayers_table AS F
WHERE F.SEGMENT=*variable2*
Further explanation:
This query pulls all JDAY values from flexgridlayers_table, and then searches within the table met_data to find values corresponding to the closest <= JDAY values in that table. For example, consider the following flexgridlayers_table and met_data tables:
flexgridlayers_table:
Segment JDAY
2 1.5
2 2.5
2 3.5
3 1.5
3 2.5
3 3.5
met_data:
JDAY Year TAIR TDEW
1.0 2000 7 8
1.1 2000 9 10
1.6 2000 11 12
2.5 2000 13 14
2.6 2000 15 16
3.4 2000 17 18
4.0 2000 19 20
What I want (and what the query above returns) would be this, for variable1=1 and variable2=2:
JDAY TAIR TDEW
1.5 9 10
2.5 13 14
3.5 17 18
I'm just wondering if there is a more efficient way of writing this query, so I'm not performing the ORDER BY command on the same list of JDAY values over and over for each TAIR, TDEW, etc. field.
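To pin down the matching rule, here is a small in-memory Python sketch of the "closest JDAY at or before F.JDAY" lookup, using the standard `bisect` module on the sample met_data rows above. It is not SQL and not the asker's code. It only shows the values any rewritten query should reproduce. The function name `closest_at_or_before` is hypothetical.

```python
from bisect import bisect_right

# Sample met_data rows from the question: (JDAY, TAIR, TDEW), sorted by JDAY.
MET_DATA = [
    (1.0, 7, 8), (1.1, 9, 10), (1.6, 11, 12), (2.5, 13, 14),
    (2.6, 15, 16), (3.4, 17, 18), (4.0, 19, 20),
]
MET_DAYS = [row[0] for row in MET_DATA]


def closest_at_or_before(jday):
    """Return (TAIR, TDEW) for the row with the largest JDAY <= jday, or None."""
    i = bisect_right(MET_DAYS, jday)  # number of rows whose JDAY <= jday
    if i == 0:
        return None  # no met_data reading at or before this JDAY
    _, tair, tdew = MET_DATA[i - 1]
    return tair, tdew


for jday in (1.5, 2.5, 3.5):
    print(jday, *closest_at_or_before(jday))
```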

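Then I would write it as follows. It looks like you want one TAIR and one TDEW per JDAY. If that is the case, join your met_data table once, on the year condition and the F vs. M JDAY comparison. Normally that join would return multiple rows per JDAY (every met_data row at or before it), so collapse them with MAX() in a pre-query. The query uses an inner JOIN, so an F.JDAY with no earlier met_data row is dropped; use LEFT JOIN in both places if you need such rows kept with NULLs.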
SELECT
PQ.JDay,
PQ.MaxJDayPerFDay,
CASE WHEN *var1* <> 1 THEN 0 ELSE M2.TAIR END AS TAIR,
CASE WHEN *var1* <> 1 THEN 0 ELSE M2.TDEW END AS TDEW
FROM
( SELECT
F.JDay,
MAX( M.JDAY ) AS MaxJDayPerFDay
FROM
FlexGridLayers_table F
JOIN met_data M
ON M.Year = 2000
AND F.JDay >= M.JDay
WHERE
F.Segment = *var2*
GROUP BY
F.JDay ) PQ
JOIN met_data M2
ON M2.Year = 2000
AND PQ.MaxJDayPerFDay = M2.JDay
ORDER BY
PQ.JDay
This pre-query applies MAX() to met_data's JDAY only ONCE and groups by F.JDAY, so it always returns exactly one record per F.JDAY, already restricted to your F.Segment = variable2. If you want other columns from the "F" table, add them to this pre-query (PQ alias), and to its GROUP BY, as needed.
Then this result can be joined straight back to met_data, because the matching day value is now explicitly known from the pre-query. That lets you get TAIR, TDEW, and any other met_data columns at once, rather than running a separate subquery per column for every record.
Hope this makes sense; if not, let me know.
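The pre-query approach can be checked on the question's sample data with Python's built-in `sqlite3` module. The sketch below assumes SQLite as an in-memory stand-in for MySQL. The Matlab placeholders are bound as hypothetical named parameters `:var1` and `:var2`. The `MaxJDayPerFDay` helper column is left out of the final SELECT for brevity.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FlexGridLayers_table (Segment INTEGER, JDAY REAL);
CREATE TABLE met_data (JDAY REAL, Year INTEGER, TAIR REAL, TDEW REAL);
INSERT INTO FlexGridLayers_table VALUES
  (2, 1.5), (2, 2.5), (2, 3.5), (3, 1.5), (3, 2.5), (3, 3.5);
INSERT INTO met_data VALUES
  (1.0, 2000, 7, 8), (1.1, 2000, 9, 10), (1.6, 2000, 11, 12), (2.5, 2000, 13, 14),
  (2.6, 2000, 15, 16), (3.4, 2000, 17, 18), (4.0, 2000, 19, 20);
""")

QUERY = """
SELECT PQ.JDAY,
       CASE WHEN :var1 <> 1 THEN 0 ELSE M2.TAIR END AS TAIR,
       CASE WHEN :var1 <> 1 THEN 0 ELSE M2.TDEW END AS TDEW
FROM (SELECT F.JDAY, MAX(M.JDAY) AS MaxJDayPerFDay   -- closest met_data day <= F.JDAY
      FROM FlexGridLayers_table AS F
      JOIN met_data AS M ON M.Year = 2000 AND M.JDAY <= F.JDAY
      WHERE F.Segment = :var2
      GROUP BY F.JDAY) AS PQ
JOIN met_data AS M2 ON M2.Year = 2000 AND M2.JDAY = PQ.MaxJDayPerFDay
ORDER BY PQ.JDAY
"""

rows = conn.execute(QUERY, {"var1": 1, "var2": 2}).fetchall()
print(rows)
```

The equality join `M2.JDAY = PQ.MaxJDayPerFDay` is safe even with REAL columns here, because MAX() hands back the exact stored value it picked.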

Related

Reorganizing mysql aggregate row into single piece rows

Consider the following mysql table:
ID WeightS AmountS WeightM AmountM WeightL AmountL Someothercolumnshere
1 6 3 10 2 18 2 ...
I need to reorganize this data into a pivot-friendly table, where each piece in the amount columns becomes one result row. For example, from the first pair of columns, WeightS and AmountS, the SELECT should produce 3 result rows, each with a weight of 2 kg (6 kg total). So the full result table should look like this:
Weight Someothercolumnshere
2 ...
2 ...
2 ...
5 ...
5 ...
9 ...
9 ...
I don't even know if there's SQL syntax that can do this kind of operation; I've never had a request like this before. Worst case, I'll have to do it in PHP instead, but I think MySQL is a lot more fun :p
I've built the schema on sqlfiddle, but I'm afraid that's all I've got.
You need a tally (numbers) table for a task like this. Create as many rows in it as the largest Amount value you expect.
Create table Tally(`N` int);
insert into Tally( `N`) values(1),(2),(3),(4),(5);
Then
(select `ID`, `WeightS`/`AmountS` as `Weight`, `Someothercolumnshere`
from Catches
join Tally on Catches.`AmountS` >= Tally.`N`
)
UNION ALL
(select `ID`, `WeightM`/`AmountM`, `Someothercolumnshere`
from Catches
join Tally on Catches.`AmountM` >= Tally.`N`
)
UNION ALL
(select `ID`, `WeightL`/`AmountL`, `Someothercolumnshere`
from Catches
join Tally on Catches.`AmountL` >= Tally.`N`
);
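The tally-join trick can be tried out with Python's built-in `sqlite3` module as a stand-in for MySQL, using the question's sample row. Some details here are assumptions for the sketch:

- `Someothercolumnshere` is renamed to a hypothetical `Other` column.
- A helper `SizeOrder` column is added only so the output order is deterministic.
- The Weight columns are declared REAL, so SQLite does floating-point division the way MySQL's `/` would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Catches (ID INTEGER, WeightS REAL, AmountS INTEGER, WeightM REAL,
                      AmountM INTEGER, WeightL REAL, AmountL INTEGER, Other TEXT);
INSERT INTO Catches VALUES (1, 6, 3, 10, 2, 18, 2, 'x');
CREATE TABLE Tally (N INTEGER);
INSERT INTO Tally (N) VALUES (1), (2), (3), (4), (5);
""")

# One SELECT per size; joining on Amount >= N repeats each catch row Amount times.
QUERY = """
SELECT ID, WeightS / AmountS AS Weight, Other, 1 AS SizeOrder
FROM Catches JOIN Tally ON Catches.AmountS >= Tally.N
UNION ALL
SELECT ID, WeightM / AmountM, Other, 2
FROM Catches JOIN Tally ON Catches.AmountM >= Tally.N
UNION ALL
SELECT ID, WeightL / AmountL, Other, 3
FROM Catches JOIN Tally ON Catches.AmountL >= Tally.N
ORDER BY ID, SizeOrder
"""
weights = [row[1] for row in conn.execute(QUERY)]
print(weights)
```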

For Loop in MySQL, looping through a table and applying it to a where statement

My problem is how to loop through a table and extract information from another table.
I have a table X with 470 records:
A B C
111 12 18
121 21 29
127 37 101
I would like to write the following query:
create or replace view NEW as
For j = 1-3
Select * from Y
where imei = X.A(j) and id > X.B(j) and id < X.C(j)
Apologies, I am a MATLAB programmer, so I have used that syntax above to explain what I want. How can I do this in MySQL? I have looked up for loops, but most examples loop within the same table. I need to loop through one table and use its values as criteria in the WHERE clause on a different table.
To get 3 rows from a table, use LIMIT 3 in a subquery. To get related rows in another table, use JOIN.
CREATE OR REPLACE VIEW new AS
SELECT Y.*
FROM Y
JOIN (SELECT *
FROM X
LIMIT 3) AS X ON Y.imei = X.A AND Y.id > X.B AND Y.id < X.C;
To make LIMIT 3 produce predictable results, add an ORDER BY clause to the subquery; otherwise it selects an arbitrary set of 3 rows from X. If you actually want to apply every row of X (all 470), drop the subquery and join X directly.
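The sketch below uses Python's built-in `sqlite3` module to show that one JOIN returns the same rows as the question's row-by-row loop. The X rows come from the question. The Y rows are made-up sample data for illustration only. Since this X has exactly 3 rows, joining X directly is equivalent to the `LIMIT 3` subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# X rows come from the question; the Y rows are hypothetical sample data.
conn.executescript("""
CREATE TABLE X (A INTEGER, B INTEGER, C INTEGER);
INSERT INTO X VALUES (111, 12, 18), (121, 21, 29), (127, 37, 101);
CREATE TABLE Y (imei INTEGER, id INTEGER);
INSERT INTO Y VALUES (111, 10), (111, 15), (121, 25), (121, 30), (127, 50), (999, 50);
""")

# Set-based version: a single JOIN applies every (A, B, C) triple at once.
joined = conn.execute("""
    SELECT Y.imei, Y.id
    FROM Y JOIN X ON Y.imei = X.A AND Y.id > X.B AND Y.id < X.C
    ORDER BY Y.imei, Y.id
""").fetchall()

# Loop version mirroring the question's pseudocode, kept only for comparison.
looped = []
for a, b, c in conn.execute("SELECT A, B, C FROM X"):
    looped += conn.execute(
        "SELECT imei, id FROM Y WHERE imei = ? AND id > ? AND id < ?", (a, b, c)
    ).fetchall()

print(joined)
```

Letting the database do the join is usually faster than issuing one query per X row, and it keeps the view definable in plain SQL.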

Unable to apply WHERE/AND on MySQL table with 2 columns on MAMP

I thought I had a very simple query to perform, but I can't seem to make it work.
I have this table with 2 columns:
version_id trim_id
1 15
1 25
1 28
1 30
1 35
2 12
2 25
2 33
2 48
3 11
3 25
3 30
3 32
I am trying to get the version_ids that have a given subset of trim_ids, say all version_ids that have both trim_id 25 and trim_id 30. My obvious attempt was:
SELECT * FROM table WHERE trim_id=25 AND trim_id=30
I was expecting to get version_ids 1 and 3 as a result, but instead I get nothing.
I am working with the latest version of MAMP, which has some odd behavior: in this case it just tells me it's 'LOADING' and never gives me an error message or anything. But that's normally what happens when there is no data to return.
This is InnoDB, if that helps.
Thanks for your input.
Your query does not work because a single row's trim_id cannot equal two different values at the same time, so the AND condition is never true. You need to apply relational division to get the result.
You will need to use something similar to the following:
SELECT version_id
FROM yourtable
WHERE trim_id in (25, 30)
group by version_id
having count(distinct trim_id) = 2
This will return the version_id values that have both 25 and 30. Then if you wanted to include additional columns in the final result, you can expand the query to:
select t1.version_id, t1.trim_id
from yourtable t1
where exists (SELECT t2.version_id
FROM yourtable t2
WHERE t2.trim_id in (25, 30)
and t1.version_id = t2.version_id
group by t2.version_id
having count(distinct t2.trim_id) = 2);
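A plain IN filter was also suggested, but it matches rows with either value, so on its own it also returns version_id 2 (which has 25 but not 30). It needs the GROUP BY/HAVING from the first query to require both:
SELECT *
FROM `table`
WHERE trim_id IN (25, 30)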
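The relational-division query and the plain IN filter above can be compared on the question's sample data. This sketch uses Python's built-in `sqlite3` module in place of MySQL. The table name `yourtable` follows the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (version_id INTEGER, trim_id INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1, 15), (1, 25), (1, 28), (1, 30), (1, 35),
     (2, 12), (2, 25), (2, 33), (2, 48),
     (3, 11), (3, 25), (3, 30), (3, 32)],
)

# Relational division: keep versions whose matching rows cover BOTH trim ids.
both = [r[0] for r in conn.execute("""
    SELECT version_id FROM yourtable
    WHERE trim_id IN (25, 30)
    GROUP BY version_id
    HAVING COUNT(DISTINCT trim_id) = 2
    ORDER BY version_id""")]

# Plain IN filter: matches rows having EITHER trim id, so version 2 slips in.
either = sorted({r[0] for r in conn.execute(
    "SELECT version_id FROM yourtable WHERE trim_id IN (25, 30)")})

print(both, either)
```

To require a different set of trim ids, change both the IN list and the number in the HAVING clause so they stay in sync.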

MS Access Crosstab query - sum of columns reported by month

I am putting together a crosstab query and want a report of multiple column values (all numbers) grouped by month. Here is the SQL I used; I understand that it won't bring back the desired results, since every "timeX" column holds a different number. I want a query that returns the sum of each column grouped by month.
TRANSFORM Sum(tblTimeTracking.time1+ tblTimeTracking.time2+ tblTimeTracking.time3+ tblTimeTracking.time4+ tblTimeTracking.time5+ tblTimeTracking.time6+ tblTimeTracking.time7+ tblTimeTracking.time8+ tblTimeTracking.time9+ tblTimeTracking.time10+ tblTimeTracking.time11+ tblTimeTracking.time12+ tblTimeTracking.time13+ tblTimeTracking.time14+ tblTimeTracking.time15+ tblTimeTracking.time16+ tblTimeTracking.time17+ tblTimeTracking.time18+ tblTimeTracking.time19+ tblTimeTracking.time20+ tblTimeTracking.time21+ tblTimeTracking.time22 ) AS Total
SELECT tbl_vlookup.Manager AS Manager
FROM tbl_vlookup INNER JOIN tblTimeTracking ON tbl_vlookup.[Associate Name] = tblTimeTracking.Associate
GROUP BY tbl_vlookup.Manager
PIVOT Format([Day],"yyyy-mm");
Associate Day Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7
John Smith 12/1/9999 1 0 0 5.5 1 0.25 0.25
Something like this:
TRANSFORM Sum(q.TimeValue) AS SumOfTime
SELECT q.Associate, q.[Day]
FROM (SELECT t.Associate, t.[Day], t.Time1 AS TimeValue, "Time1" AS TimeType
FROM tblTimeTracking AS t
UNION ALL
SELECT t.Associate, t.[Day], t.Time2, "Time2"
FROM tblTimeTracking AS t
UNION ALL
SELECT t.Associate, t.[Day], t.Time3, "Time3"
FROM tblTimeTracking AS t) AS q
GROUP BY q.Associate, q.[Day]
PIVOT q.TimeType;
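As I mentioned, you need to flatten (normalize) the table: the TimeN columns only look different, but they hold the same kind of data :) Add one SELECT per TimeN column (up to Time22) to the UNION ALL.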
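The flatten-then-aggregate idea can be checked outside Access with Python's built-in `sqlite3` module. Some parts of this sketch are assumptions:

- The table `tblTimeTracking` has only Time1 to Time3, and its rows are made up.
- `Day` is stored as ISO text, so `strftime('%Y-%m', ...)` stands in for Access's `Format([Day],"yyyy-mm")`.
- SQLite has no TRANSFORM/PIVOT, so the result stays in long (month, type, total) form. Access's PIVOT would spread it into columns.
- The Manager lookup is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only three of the TimeN columns are modelled.
conn.executescript("""
CREATE TABLE tblTimeTracking (Associate TEXT, Day TEXT, Time1 REAL, Time2 REAL, Time3 REAL);
INSERT INTO tblTimeTracking VALUES
  ('John Smith', '2012-01-05', 1, 0, 0),
  ('John Smith', '2012-01-20', 2, 1, 0.5),
  ('John Smith', '2012-02-03', 1, 0.25, 0.25);
""")

# Step 1: UNION ALL flattens the TimeN columns into (TimeType, TimeValue) rows.
# Step 2: sum per month and type, the long-format equivalent of the crosstab.
QUERY = """
SELECT strftime('%Y-%m', q.Day) AS Month, q.TimeType, SUM(q.TimeValue) AS Total
FROM (SELECT Day, Time1 AS TimeValue, 'Time1' AS TimeType FROM tblTimeTracking
      UNION ALL
      SELECT Day, Time2, 'Time2' FROM tblTimeTracking
      UNION ALL
      SELECT Day, Time3, 'Time3' FROM tblTimeTracking) AS q
GROUP BY Month, q.TimeType
ORDER BY Month, q.TimeType
"""
totals = {(month, ttype): total for month, ttype, total in conn.execute(QUERY)}
print(totals)
```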

SQL Query Help - Grouping By Sequences of Digits

I have a table, which includes the following columns and data:
id dtime instance data dtype
1 2012-10-22 10000 d 1
2 2012-10-22 10000 d 1
..
7 2012-10-22 10004 d 1
..
15 2012-10-22 10000 # 1
16 2012-10-22 10004 d 1
17 2012-10-22 10000 d 1
I want to group sequences of 'd's in the data column, with the '#' at the end of the sequence.
This could have been done by grouping on the instance column, which is an individual stream of data; however, there can be multiple sequences within one stream.
I also want to end a sequence if no further rows arrive in the same instance for, say, 3 seconds after that instance's last row, and no '#' has been found within that interval.
I have managed to do exactly this using cursors and while loops, which worked reasonably well for tables with thousands of rows. However, this query will eventually run on far more rows, and both methods take around a minute on a dataset of just 3,000 to 5,000 rows.
Reading on this website and others, it seems that set-based logic may be the way to go, however I can think of no way to do what I need without some kind of loop on each row that compares it to every other to build the 'sequences'.
If anyone could help, or point me in the direction of something that could, it would be greatly appreciated. :)
I would ideally like the data to be output in the following format:
datacount instance lastdata dtime
20 10000 # 2012-10-22
19 10000 d 2012-10-22
22 10004 # 2012-10-22
20 10022 # 2012-10-22
Where (datacount) is a count of the number of rows in a 'sequence' (which is the data leading up to a '#' or 3 second delay), (instance) is the instance ID from the original table, (lastdata) is the last data value in the sequence, (dtime) is the datetime value of the last data value.
Let me show you how to do this for the final '#'; the time difference follows a similar idea. The key idea is to label each row with the id of the next '#' at or after it in the same instance, since that id identifies the sequence. For this you need a correlated subquery. After that, you can do a group by:
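select instance, groupid, count(*) as NumInSeq, max(dtime) as LastDateTime
from (select t.*,
(select min(t2.id)
from t t2
where t2.id >= t.id -- >= so the '#' row belongs to its own sequence
and t2.instance = t.instance -- sequences never span instances
and t2.data = '#'
) as groupid
from t
) s
group by instance, groupid;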
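Here is a runnable sketch of the '#'-terminated grouping, using Python's built-in `sqlite3` module in place of MySQL. It labels each row with the first '#' at or after it (`id >=`) within the same instance. The table name `t` follows the answer. The sample rows are hypothetical, and ids are assumed to increase with `dtime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; ids are assumed to increase with dtime.
conn.execute("CREATE TABLE t (id INTEGER, dtime TEXT, instance INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2012-10-22 10:00:00", 10000, "d"),
    (2, "2012-10-22 10:00:01", 10000, "d"),
    (3, "2012-10-22 10:00:01", 10004, "d"),
    (4, "2012-10-22 10:00:02", 10000, "#"),
    (5, "2012-10-22 10:00:02", 10004, "d"),
    (6, "2012-10-22 10:00:03", 10000, "d"),
    (7, "2012-10-22 10:00:04", 10004, "#"),
    (8, "2012-10-22 10:00:05", 10000, "#"),
])

# Each row gets the id of the first '#' at or after it in the same instance;
# grouping on that label yields one output row per sequence.
QUERY = """
SELECT instance, groupid, COUNT(*) AS NumInSeq, MAX(dtime) AS LastDateTime
FROM (SELECT t.*,
             (SELECT MIN(t2.id) FROM t AS t2
              WHERE t2.id >= t.id AND t2.instance = t.instance AND t2.data = '#'
             ) AS groupid
      FROM t) AS s
GROUP BY instance, groupid
ORDER BY groupid
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Rows after the last '#' of an instance get a NULL label and end up together in one trailing group.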
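Handling the time sequence is a bit more complicated. A row also ends a sequence when no later row of the same instance arrives within 3 seconds (including when it is the instance's last row). It is something like this, assuming id increases with dtime within an instance: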
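select instance, groupid, count(*) as NumInSeq, max(dtime) as LastDateTime,
(case when sum(case when data = '#' then 1 else 0 end) > 0 then '#' else 'd' end) as FinalData
from (select t.*,
(select min(t2.id)
from t t2
where t2.id >= t.id
and t2.instance = t.instance
and (t2.data = '#'
-- or: no later row of this instance within 3 seconds
or not exists (select 1
from t t3
where t3.instance = t2.instance
and t3.id > t2.id
and UNIX_TIMESTAMP(t3.dtime) - UNIX_TIMESTAMP(t2.dtime) < 3))
) as groupid
from t
) s
group by instance, groupid;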
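The combined rule can be exercised with Python's built-in `sqlite3` module in place of MySQL. A row ends a sequence if it is a '#', or if no later row of the same instance arrives within 3 seconds. SQLite has no `UNIX_TIMESTAMP()`, so this sketch substitutes `CAST(strftime('%s', ...) AS INTEGER)`. The sample rows are hypothetical, and ids are assumed to increase with `dtime` within each instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; ids are assumed to increase with dtime per instance.
conn.execute("CREATE TABLE t (id INTEGER, dtime TEXT, instance INTEGER, data TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "2012-10-22 10:00:00", 10000, "d"),
    (2, "2012-10-22 10:00:01", 10000, "d"),
    (3, "2012-10-22 10:00:02", 10000, "#"),
    (4, "2012-10-22 10:00:10", 10000, "d"),
    (5, "2012-10-22 10:00:11", 10000, "d"),  # next row is 9 s later: sequence ends
    (6, "2012-10-22 10:00:20", 10000, "d"),
    (7, "2012-10-22 10:00:21", 10000, "#"),
    (8, "2012-10-22 10:00:21", 10004, "d"),
    (9, "2012-10-22 10:00:22", 10004, "d"),  # last row of its instance: sequence ends
])

# CAST(strftime('%s', ...) AS INTEGER) converts a datetime to Unix seconds in SQLite.
QUERY = """
SELECT instance, groupid, COUNT(*) AS NumInSeq, MAX(dtime) AS LastDateTime,
       CASE WHEN SUM(CASE WHEN data = '#' THEN 1 ELSE 0 END) > 0
            THEN '#' ELSE 'd' END AS FinalData
FROM (SELECT t.*,
             (SELECT MIN(t2.id) FROM t AS t2
              WHERE t2.id >= t.id AND t2.instance = t.instance
                AND (t2.data = '#'
                     OR NOT EXISTS (SELECT 1 FROM t AS t3
                                    WHERE t3.instance = t2.instance AND t3.id > t2.id
                                      AND CAST(strftime('%s', t3.dtime) AS INTEGER)
                                          - CAST(strftime('%s', t2.dtime) AS INTEGER) < 3))
             ) AS groupid
      FROM t) AS s
GROUP BY instance, groupid
ORDER BY groupid
"""
for row in conn.execute(QUERY):
    print(row)
```

Triple-nested correlated subqueries like this scan the table repeatedly. On large tables, an index on (instance, id) is likely to matter, although that has not been measured here.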