MySQL Complicated SELECT - mysql

I have a MySQL table (tbl_filters) with 3 columns: id, cat, val
id & val are numeric, cat is varchar. There are multiple rows for each id.
I also have another table (tbl_info) with multiple columns, including an id which corresponds to the id from tbl_filters. There is a column called name, which is what I'm looking for.
I would like to select the name for every id that has a row matching a given value for cat, but only if that row's val is the maximum val for the id, and only if it is at or above a minimum val.
In pseudocode it would be something like:
SELECT tbl_info.name FROM tbl_info,tbl_filters
WHERE (tbl_info.id=tbl_filters.id) AND (cat="mycat") AND (val>=0.3)
AND (there are no other rows for this id in tbl_filters with a higher value for val)
Example:
tbl_filters
id,cat,val
1 eg1 0.43
1 eg2 0.60
1 eg3 0.78
tbl_info
id name
1 MyName
In the above example, a value should only be returned if I am looking for the cat called eg3, since that has the highest value. For the other cats, nothing should be returned, since they are not the highest value.
Another option would be to make a column in tbl_info just for the cat with the highest value, but that is a messy solution I would prefer to avoid.

I THINK I'm following you... The innermost query pre-qualifies the HIGHEST val per id among the rows that meet your category and minimum-value conditions. Once you have that list, join back to tbl_info to get the name. I've joined to tbl_filters a second time in case there are other columns on that record you want, such as the date of the rate. If you DON'T need that, you can drop the second "tf2" join and change tf2.val in the field list to PreQualified.HighestQualVal.
select
ti.id,
ti.name,
tf2.val
from
( select
tf.id,
max( tf.val ) as HighestQualVal
from
tbl_filters tf
where
tf.cat = "mycat"
and tf.val >= 0.3
group by
tf.id
) PreQualified
JOIN tbl_info ti
on PreQualified.id = ti.id
JOIN tbl_filters tf2
on PreQualified.id = tf2.id
AND PreQualified.HighestQualVal = tf2.val
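To see what the pre-qualifying query does, here is a minimal sketch that runs it against the example data from the question. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and it includes the GROUP BY tf.id that the inner query needs to return one maximum per id. Because MAX is taken only over rows of the requested category, the query never compares against the id's other categories, so asking for eg1 still returns MyName.

```python
import sqlite3

# In-memory copy of the example tables (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_filters (id INTEGER, cat TEXT, val REAL);
    CREATE TABLE tbl_info (id INTEGER, name TEXT);
    INSERT INTO tbl_filters VALUES (1, 'eg1', 0.43), (1, 'eg2', 0.60), (1, 'eg3', 0.78);
    INSERT INTO tbl_info VALUES (1, 'MyName');
""")


def prequalified_names(cat, min_val):
    """Highest qualifying val per id for one category, joined back to the name."""
    sql = """
        SELECT ti.id, ti.name, tf2.val
        FROM (SELECT tf.id, MAX(tf.val) AS HighestQualVal
              FROM tbl_filters tf
              WHERE tf.cat = ? AND tf.val >= ?
              GROUP BY tf.id) PreQualified
        JOIN tbl_info ti ON PreQualified.id = ti.id
        JOIN tbl_filters tf2
          ON PreQualified.id = tf2.id
         AND PreQualified.HighestQualVal = tf2.val
    """
    return conn.execute(sql, (cat, min_val)).fetchall()


print(prequalified_names("eg3", 0.3))  # [(1, 'MyName', 0.78)]
print(prequalified_names("eg1", 0.3))  # [(1, 'MyName', 0.43)]
```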

What about?
select ti.name, MaxId.maxVal from
(select tf1.id, tf1.cat, max(tf1.val) as maxVal from tbl_filters tf1
where tf1.cat = 'eg3' and tf1.val >= 0.3
group by tf1.id, tf1.cat) MaxCat
inner join (
select tf2.id, max(tf2.val) as maxVal from tbl_filters tf2
group by tf2.id) MaxId
on (MaxCat.id = MaxId.id and MaxCat.maxVal = MaxId.maxVal)
inner join tbl_info ti on MaxId.id = ti.id
Basically (if I'm not wrong again), I get the maximum val for each id and cat pair, then the maximum val for each id. If both match, i.e. the max for the cat equals the max for the whole id, I return the result.
Feel free to correct me if I'm wrong.
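A minimal sketch of the two-maximum comparison on the question's example data, again assuming SQLite via Python's sqlite3 as a stand-in for MySQL. Per the question's requirement, only eg3 (the id's highest val) should return a name.

```python
import sqlite3

# In-memory copy of the example tables (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_filters (id INTEGER, cat TEXT, val REAL);
    CREATE TABLE tbl_info (id INTEGER, name TEXT);
    INSERT INTO tbl_filters VALUES (1, 'eg1', 0.43), (1, 'eg2', 0.60), (1, 'eg3', 0.78);
    INSERT INTO tbl_info VALUES (1, 'MyName');
""")


def names_where_cat_is_max(cat, min_val):
    """Max val per (id, cat) must equal the max val per id over all categories."""
    sql = """
        SELECT ti.name, MaxId.maxVal
        FROM (SELECT id, cat, MAX(val) AS maxVal FROM tbl_filters
              WHERE cat = ? AND val >= ?
              GROUP BY id, cat) MaxCat
        JOIN (SELECT id, MAX(val) AS maxVal FROM tbl_filters
              GROUP BY id) MaxId
          ON MaxCat.id = MaxId.id AND MaxCat.maxVal = MaxId.maxVal
        JOIN tbl_info ti ON MaxId.id = ti.id
    """
    return conn.execute(sql, (cat, min_val)).fetchall()


print(names_where_cat_is_max("eg3", 0.3))  # [('MyName', 0.78)]
print(names_where_cat_is_max("eg1", 0.3))  # []
```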

Related

MySQL - Match certain IDs, but only those IDs

I have a table like so:
id_type id_option
"1" "1"
"1" "5"
"2" "1"
"2" "5"
"2" "8"
I am trying to write a query that, given a list of option IDs, finds the "type" whose options are exactly the ones in the list.
For example, given 1 and 5 as options, it should return type 1 and only type 1, because type 2 also requires option 8, which is not in the list.
I have tried the following:
SELECT *
FROM my_table
WHERE id_option IN (1, 5)
GROUP BY id_type
HAVING COUNT(DISTINCT id_option) = 2
This returns both "types". I had hoped the COUNT restriction of 2 would help, and I now understand why it doesn't, but I can't think of a clever way to limit this.
I could just take the first record, since types with fewer options are typically saved first, but I don't think I can rely on that 100%.
Thank you for your time
Here's a solution:
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
It relies on a trick specific to MySQL: boolean true is literally the integer 1. So you can use SUM() to count the rows where a condition is true, by putting a boolean expression inside SUM().
For folks reading this who use other databases besides MySQL, you'd have to use an expression to convert the boolean condition to the integer 1:
HAVING SUM(CASE WHEN id_option IN (1,5) THEN 1 ELSE 0 END) = COUNT(*)
In this case, let all rows become part of the groups. That is, do not use a WHERE clause to restrict the query to rows where id_option is 1 or 5. Then count the total rows in each group, and "count" (i.e. use the SUM() trick) the rows where id_option is 1 or 5. The two counts are equal only if the group has no id_option values other than 1 or 5.
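A minimal sketch of the SUM() = COUNT(*) comparison on the question's data, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (SQLite also evaluates a comparison to the integer 1 or 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_type INTEGER, id_option INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, 1), (1, 5), (2, 1), (2, 5), (2, 8)])

# SUM() counts the rows whose id_option is in the list; a type qualifies
# only when that count equals the total number of rows in its group.
rows = conn.execute("""
    SELECT id_type
    FROM my_table
    GROUP BY id_type
    HAVING SUM(id_option IN (1, 5)) = COUNT(*)
""").fetchall()
print(rows)  # [(1,)]
```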
If you also want to make sure that both 1 and 5 are found, you need another condition:
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
AND COUNT(DISTINCT CASE WHEN id_option IN (1,5) THEN id_option END) = 2
The CASE expression will return 1 or 5, or if there are any other values, those are converted to NULL. The COUNT() function ignores NULLs.
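A sketch of why the second condition matters, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. Type 3 is a hypothetical extra type that uses only option 1: it passes the SUM() = COUNT(*) test but lacks option 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_type INTEGER, id_option INTEGER)")
# Type 3 is hypothetical: only option 1, so option 5 is missing.
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, 1), (1, 5), (2, 1), (2, 5), (2, 8), (3, 1)])


def matching_types(require_all):
    having = "SUM(id_option IN (1, 5)) = COUNT(*)"
    if require_all:
        # CASE maps options outside the list to NULL, which COUNT ignores.
        having += (" AND COUNT(DISTINCT CASE WHEN id_option IN (1, 5)"
                   " THEN id_option END) = 2")
    sql = ("SELECT id_type FROM my_table GROUP BY id_type "
           f"HAVING {having} ORDER BY id_type")
    return [row[0] for row in conn.execute(sql)]


print(matching_types(False))  # [1, 3]
print(matching_types(True))   # [1]
```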
If you can pass the options as a sorted comma separated list string, then use GROUP_CONCAT():
SELECT id_type
FROM my_table
GROUP BY id_type
HAVING GROUP_CONCAT(id_option ORDER BY id_option) = '1,5'
If there are duplicate options for each type, use DISTINCT:
HAVING GROUP_CONCAT(DISTINCT id_option ORDER BY id_option) = '1,5'
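The GROUP_CONCAT() comparison only works if the string you pass is built the same way MySQL builds it: distinct values, sorted numerically, joined by commas with no spaces. SQLite's group_concat() does not accept an ORDER BY clause in every version, so this sketch emulates the comparison in plain Python instead of running SQL; the helper name is hypothetical.

```python
from collections import defaultdict

# (id_type, id_option) rows; (1, 5) is duplicated on purpose.
rows = [(1, 1), (1, 5), (2, 1), (2, 5), (2, 8), (1, 5)]


def types_matching(rows, wanted):
    """Emulate GROUP_CONCAT(DISTINCT id_option ORDER BY id_option) = wanted."""
    options = defaultdict(set)
    for id_type, id_option in rows:
        options[id_type].add(id_option)  # DISTINCT
    # Sort numerically (ORDER BY id_option), then join with commas.
    return sorted(t for t, opts in options.items()
                  if ",".join(str(o) for o in sorted(opts)) == wanted)


print(types_matching(rows, "1,5"))    # [1]
print(types_matching(rows, "1,5,8"))  # [2]
```

Passing "5,1" or "1, 5" would match nothing, which is why the options must be sent pre-sorted.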
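While I can't comment yet, here's a tiny adjustment to Bill Karwin's last example (in the accepted solution). Because the first condition already guarantees that every option in the group is 1 or 5, the second condition can count distinct options directly:
SELECT id_type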
FROM my_table
GROUP BY id_type
HAVING SUM(id_option IN (1,5)) = COUNT(*)
AND COUNT(DISTINCT id_option) = 2
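A sketch comparing the two HAVING forms, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. Types 3 and 4 are hypothetical: type 3 lacks option 5, and type 4 repeats option 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id_type INTEGER, id_option INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, 1), (1, 5), (2, 1), (2, 5), (2, 8), (3, 1),
                  (4, 1), (4, 5), (4, 5)])

BASE = ("SELECT id_type FROM my_table GROUP BY id_type "
        "HAVING SUM(id_option IN (1, 5)) = COUNT(*) AND ")
accepted = BASE + ("COUNT(DISTINCT CASE WHEN id_option IN (1, 5) "
                   "THEN id_option END) = 2 ORDER BY id_type")
adjusted = BASE + "COUNT(DISTINCT id_option) = 2 ORDER BY id_type"


def run(sql):
    return [row[0] for row in conn.execute(sql)]


# The first condition already excludes options outside (1, 5),
# so both second conditions select the same types.
print(run(accepted))  # [1, 4]
print(run(adjusted))  # [1, 4]
```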

Trying to use multiple AS, but I'm getting: Subquery returns more than 1 row

I'm reading data from 3 tables in my database for my rank (top 15) table. I'm trying to fill one 'tr' with a single query (using multiple aliases), but I'm stuck here.
My last try was:
SELECT DISTINCT(mapname),
(SELECT his_time FROM primekz_records
WHERE primekz_records.id=$player_id AND his_aa = 10 AND tp > 0) AS nub10,
(SELECT his_time FROM primekz_records
WHERE primekz_records.id=$player_id AND his_aa = 10 AND tp = 0) AS pro10
FROM primekz_records
JOIN primekz_players ON primekz_records.id=primekz_players.id
JOIN primekz_maps ON primekz_maps.mid=primekz_records.mid
WHERE primekz_players.id=$player_id
Tables are structured:
primekz_players( id, steamid, name ...)
primekz_maps( mid, mapname )
primekz_records( id, mid, his_time, his_aa, tp, ... ) <-- one player id can appear at most 4 times for one mid (map); the variations are his_aa (10 or 100) and tp (0 or more)
If I try with only one alias I get this result, which is totally wrong (see Noob100 column).
https://i.snag.gy/tHpUK8.jpg
Does it have something to do with ROW_NUMBER() + 4x AS ?
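The scalar subqueries above filter only on the player id, not on the map, so for a player with matching records on several maps each subquery returns several rows, which is what raises "Subquery returns more than 1 row". One common alternative (not from this thread) is conditional aggregation: group by map and let a CASE pick out each variation. This sketch assumes SQLite via Python's sqlite3 as a stand-in for MySQL, binds the player id as a parameter instead of interpolating $player_id, and uses hypothetical sample data and hypothetical column names nub100/pro100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE primekz_maps (mid INTEGER, mapname TEXT);
    CREATE TABLE primekz_records (id INTEGER, mid INTEGER, his_time REAL,
                                  his_aa INTEGER, tp INTEGER);
    INSERT INTO primekz_maps VALUES (1, 'map_a'), (2, 'map_b');
    -- Hypothetical sample data for player 7.
    INSERT INTO primekz_records VALUES
        (7, 1, 95.0, 10, 3), (7, 1, 120.0, 10, 0), (7, 1, 88.0, 100, 2),
        (7, 2, 60.0, 100, 0);
""")

# One row per map; each CASE keeps his_time only for its variation, and
# MIN() collapses the (at most one) matching row into its own column.
sql = """
    SELECT m.mapname,
           MIN(CASE WHEN r.his_aa = 10  AND r.tp > 0 THEN r.his_time END) AS nub10,
           MIN(CASE WHEN r.his_aa = 10  AND r.tp = 0 THEN r.his_time END) AS pro10,
           MIN(CASE WHEN r.his_aa = 100 AND r.tp > 0 THEN r.his_time END) AS nub100,
           MIN(CASE WHEN r.his_aa = 100 AND r.tp = 0 THEN r.his_time END) AS pro100
    FROM primekz_records r
    JOIN primekz_maps m ON m.mid = r.mid
    WHERE r.id = ?
    GROUP BY m.mapname
    ORDER BY m.mapname
"""
rows = conn.execute(sql, (7,)).fetchall()
print(rows)
# [('map_a', 95.0, 120.0, 88.0, None), ('map_b', None, None, None, 60.0)]
```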

Average a column based upon the value in another column SQL

Suppose I have the following data:
SqlUnixTime BID ASK VALID ASSET_ID
1504900871 101.50 101.6 Y XY1
1504900870 0 101.6 Y XY1
1504900871 101.50 20 N XY1
...
In the BID & ASK columns I can have a valid price, a 0 (meaning no data) or an invalid price (see the final row).
I'd like to compute a 30 day average. I have managed to handle the 0 case using the following query:
Select ASSET_ID, AVG(NULLIF(BID,0)) as AVG_BID_30D, AVG(NULLIF(ASK,0)) as AVG_ASK_30D FROM myDB.myTable where SqlUnixTime > 1504900870 GROUP BY ASSET_ID;
However, how would I average only those values where VALID = "Y"? I thought about putting a WHERE clause at the end, but then it might not select asset IDs that only have invalid rows. I just want those to show NULL.
UPDATED
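Group it by (ASSET_ID, VALID='Y') first, then group the result again by ASSET_ID, keeping only the averages from the valid group. An asset with no valid rows ends up with NULL averages.
select A.ASSET_ID,
max(case when A.IS_VALID then A.AVG_BID_30D end) as AVG_BID_30D,
max(case when A.IS_VALID then A.AVG_ASK_30D end) as AVG_ASK_30D
from (Select ASSET_ID, VALID = 'Y' as IS_VALID,
AVG(NULLIF(BID,0)) as AVG_BID_30D, AVG(NULLIF(ASK,0)) as AVG_ASK_30D
FROM myDB.myTable where SqlUnixTime > 1504900870
GROUP BY ASSET_ID, IS_VALID) as A
group by A.ASSET_ID;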
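A minimal sketch of the two-step grouping on the sample rows from the question, assuming SQLite via Python's sqlite3 as a stand-in for MySQL (the table is plain myTable rather than myDB.myTable). XY2 is a hypothetical asset with only invalid rows, so its averages should come back NULL. For comparison, the sketch also runs a single-pass variant that folds the VALID test into the CASE inside AVG(); that variant is a swapped-in technique, not the answer's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (SqlUnixTime INTEGER, BID REAL, ASK REAL,
                          VALID TEXT, ASSET_ID TEXT);
    INSERT INTO myTable VALUES
        (1504900871, 101.50, 101.6, 'Y', 'XY1'),
        (1504900870, 0,      101.6, 'Y', 'XY1'),
        (1504900871, 101.50, 20,    'N', 'XY1'),
        (1504900872, 99.0,   99.5,  'N', 'XY2');  -- hypothetical invalid-only asset
""")

# Step 1 averages per (asset, validity); step 2 keeps only the valid group.
two_step = conn.execute("""
    SELECT A.ASSET_ID,
           MAX(CASE WHEN A.IS_VALID THEN A.AVG_BID_30D END) AS AVG_BID_30D,
           MAX(CASE WHEN A.IS_VALID THEN A.AVG_ASK_30D END) AS AVG_ASK_30D
    FROM (SELECT ASSET_ID, VALID = 'Y' AS IS_VALID,
                 AVG(NULLIF(BID, 0)) AS AVG_BID_30D,
                 AVG(NULLIF(ASK, 0)) AS AVG_ASK_30D
          FROM myTable
          WHERE SqlUnixTime > 1504900870
          GROUP BY ASSET_ID, VALID = 'Y') AS A
    GROUP BY A.ASSET_ID
    ORDER BY A.ASSET_ID
""").fetchall()

# Single pass: invalid rows become NULL inside AVG() and are ignored.
one_pass = conn.execute("""
    SELECT ASSET_ID,
           AVG(CASE WHEN VALID = 'Y' THEN NULLIF(BID, 0) END),
           AVG(CASE WHEN VALID = 'Y' THEN NULLIF(ASK, 0) END)
    FROM myTable
    WHERE SqlUnixTime > 1504900870
    GROUP BY ASSET_ID
    ORDER BY ASSET_ID
""").fetchall()

print(two_step)  # [('XY1', 101.5, 101.6), ('XY2', None, None)]
print(one_pass)  # [('XY1', 101.5, 101.6), ('XY2', None, None)]
```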

query optimization for mysql

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4,907,424 rows in matrix_report1 and 335,740 rows in matrix_normalized_notes. These tables will grow as we get more requests.
First, the others are right that you should format your samples better. Explaining in plain language what you are trying to do also helps, and sample data with the expected results is even better.
However, I think the query can be significantly simplified. Your two subqueries are almost identical except for the "final_value" = 1 or 0 condition. Since each one produces 1 record per "person_id", you can compute both averages in a single pass with CASE/WHEN and remove the duplicated subquery.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
Since your outer query averages the per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query pulls all rows whose final_value is either 0 OR 1. Each AVG then uses a CASE/WHEN that returns normalized_value only when final_value is the respective value (1 for ANV1, 0 for ANV0). Otherwise the CASE returns NULL, which is not considered when the average is computed.
So at this point, the result is grouped per person and per store, with ANV1 and ANV0 already set. The outer query then rolls these values up per person regardless of store. Again, NULL values are ignored by the average, so if store "A" has no ANV1 value, it does not skew the results; likewise if store "B" has no ANV0 value.
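A small sketch of the NULL-skipping behaviour, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and hypothetical data chosen so the averages are exact in binary. Store B has no final_value = 0 row left after the 0.2 filter, so its ANV0 is NULL and does not drag down t0_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matrix_report1 (person_id INTEGER, user_id INTEGER,
                                 request_id INTEGER, final_value INTEGER);
    CREATE TABLE matrix_normalized_notes (user_id INTEGER, request_id INTEGER,
                                          store_name TEXT, normalized_value REAL);
    -- Hypothetical: person 1 has final_value 1 via user 10, 0 via user 11.
    INSERT INTO matrix_report1 VALUES (1, 10, 4, 1), (1, 11, 4, 0);
    INSERT INTO matrix_normalized_notes VALUES
        (10, 4, 'A', 0.25), (10, 4, 'A', 0.75), (10, 4, 'B', 0.875),
        (11, 4, 'A', 0.625),
        (11, 4, 'B', 0.2);  -- removed by the NOT IN (0.0, 0.2) filter
""")

rows = conn.execute("""
    SELECT r1.person_id, AVG(r1.ANV1) AS t1_value, AVG(r1.ANV0) AS t0_value
    FROM (SELECT ma1.person_id, mn1.store_name,
                 AVG(CASE WHEN ma1.final_value = 1
                          THEN mn1.normalized_value END) AS ANV1,
                 AVG(CASE WHEN ma1.final_value = 0
                          THEN mn1.normalized_value END) AS ANV0
          FROM matrix_report1 ma1
          JOIN matrix_normalized_notes mn1
            ON ma1.request_id = mn1.request_id
           AND ma1.user_id = mn1.user_id
           AND mn1.normalized_value NOT IN (0.0, 0.2)
          WHERE ma1.request_id = 4 AND ma1.final_value IN (0, 1)
          GROUP BY ma1.person_id, mn1.store_name) r1
    GROUP BY r1.person_id
""").fetchall()

# Store A: ANV1 = 0.5, ANV0 = 0.625; store B: ANV1 = 0.875, ANV0 = NULL.
print(rows)  # [(1, 0.6875, 0.625)]
```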

MySQL count occurrences of value on multiple fields. How?

I have a table with 5 fields. Each field can store a number from 1 to 59.
Similar to COUNTIF in Excel, how do I count the number of times each number from 1 to 59 shows up across all 5 fields?
Here's an example that counts the occurrences of the number 1 across all five fields:
SELECT SUM(pick_1 = 1 OR pick_2 = 1 OR pick_3 = 1 OR pick_4 = 1 OR pick_5 = 1) AS total_count_1
FROM tbldraw
Hopefully I made sense.
There was an answer here that had a solution. I think this is just a variation.
Step 1: Create a numbers table (1 field called number, 59 records with values 1 to 59)
Step 2:
SELECT numbers_table.number as number
, COUNT(tbldraw.pk_record)
FROM numbers_table
LEFT JOIN tbldraw
ON numbers_table.number = tbldraw.pick_1
OR numbers_table.number = tbldraw.pick_2
OR numbers_table.number = tbldraw.pick_3
OR numbers_table.number = tbldraw.pick_4
OR numbers_table.number = tbldraw.pick_5
GROUP BY number
ORDER BY number
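A minimal sketch of the numbers-table approach, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a few hypothetical draws. The LEFT JOIN keeps numbers that never appear, and COUNT(d.pk_record) reports 0 for them because unmatched rows have a NULL pk_record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers_table (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers_table VALUES (?)",
                 [(n,) for n in range(1, 60)])
conn.execute("""CREATE TABLE tbldraw (pk_record INTEGER PRIMARY KEY,
                pick_1 INTEGER, pick_2 INTEGER, pick_3 INTEGER,
                pick_4 INTEGER, pick_5 INTEGER)""")
# Hypothetical draws.
conn.executemany(
    "INSERT INTO tbldraw (pick_1, pick_2, pick_3, pick_4, pick_5) VALUES (?, ?, ?, ?, ?)",
    [(1, 7, 23, 42, 59), (3, 7, 11, 42, 58), (1, 2, 3, 4, 5)])

counts = dict(conn.execute("""
    SELECT n.number, COUNT(d.pk_record)
    FROM numbers_table n
    LEFT JOIN tbldraw d
      ON n.number = d.pick_1 OR n.number = d.pick_2 OR n.number = d.pick_3
      OR n.number = d.pick_4 OR n.number = d.pick_5
    GROUP BY n.number
""").fetchall())

print(counts[1], counts[7], counts[59], counts[6])  # 2 2 1 0
```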
How about a two step process? Assuming a table called summary_table ( int id, int ttl), for each number you care about...
insert into summary_table values (1,
(select count(*)
from tbldraw
where pick_1 = 1 or pick_2 = 1 or pick_3 = 1 or pick_4 = 1 or pick_5 = 1))
Do that 59 times, once for each value. You can use a loop in most cases. Then you can select from the summary_table:
select *
from summary_table
order by id
That will do it. I leave the conversion of this SQL into a stored procedure to those who know which database is in use.
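A sketch of the two-step process with the loop driven from the client instead of a stored procedure, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and the same hypothetical draws. The named parameter :n is reused for the id and all five comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbldraw (pick_1 INTEGER, pick_2 INTEGER, pick_3 INTEGER,
                          pick_4 INTEGER, pick_5 INTEGER);
    INSERT INTO tbldraw VALUES (1, 7, 23, 42, 59), (3, 7, 11, 42, 58), (1, 2, 3, 4, 5);
    CREATE TABLE summary_table (id INTEGER, ttl INTEGER);
""")

# Step 1: one INSERT ... (SELECT COUNT(*)) per number from 1 to 59.
for n in range(1, 60):
    conn.execute("""
        INSERT INTO summary_table VALUES (:n,
            (SELECT COUNT(*) FROM tbldraw
             WHERE pick_1 = :n OR pick_2 = :n OR pick_3 = :n
                OR pick_4 = :n OR pick_5 = :n))
    """, {"n": n})

# Step 2: read the totals back in order.
rows = conn.execute("SELECT * FROM summary_table ORDER BY id").fetchall()
print(rows[:3])  # [(1, 2), (2, 1), (3, 2)]
```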
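The IN() operator, which returns true if the value on its left equals any of the expressions in the list, makes the query particularly succinct.
To count the draws containing a particular number (e.g. 3):
select count(*)
from tbldraw
where 3 in (pick_1, pick_2, pick_3, pick_4, pick_5)
To count every number at once, stack the five columns with UNION ALL and group:
select pick, count(*) as occurrences
from (select pick_1 as pick from tbldraw union all select pick_2 from tbldraw
union all select pick_3 from tbldraw union all select pick_4 from tbldraw
union all select pick_5 from tbldraw) as picks
group by pick
order by pick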
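A minimal sketch of an any-match count (IN) and of per-number counts by unpivoting the five columns with UNION ALL, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and the same hypothetical draws. Numbers that never appear are simply absent from the UNION ALL result, unlike the numbers-table approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbldraw (pick_1 INTEGER, pick_2 INTEGER, pick_3 INTEGER,
                          pick_4 INTEGER, pick_5 INTEGER);
    INSERT INTO tbldraw VALUES (1, 7, 23, 42, 59), (3, 7, 11, 42, 58), (1, 2, 3, 4, 5);
""")

# Draws containing 3 in any column: IN is true if 3 equals ANY listed column.
draws_with_3 = conn.execute(
    "SELECT COUNT(*) FROM tbldraw "
    "WHERE 3 IN (pick_1, pick_2, pick_3, pick_4, pick_5)"
).fetchone()[0]

# Per-number counts: stack the five columns into one, then group.
per_number = dict(conn.execute("""
    SELECT pick, COUNT(*) FROM (
        SELECT pick_1 AS pick FROM tbldraw UNION ALL
        SELECT pick_2 FROM tbldraw UNION ALL
        SELECT pick_3 FROM tbldraw UNION ALL
        SELECT pick_4 FROM tbldraw UNION ALL
        SELECT pick_5 FROM tbldraw) AS picks
    GROUP BY pick
""").fetchall())

print(draws_with_3)                     # 2
print(per_number[7], per_number[59])    # 2 1
```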