Compare rows and get percentage

Compare rows and get percentage - mysql

I found it hard to find a fitting title. For simplicity let's say I have the following table:
cook_id cook_rating
1 2
1 1
1 3
1 4
1 2
1 2
1 1
1 3
1 5
1 4
2 5
2 2
Now I would like to get an output of 'good' cooks. A good cook is someone who has a rating of at least 70% of 1, 2 or 3, but not 4 or 5.
So in my example table, the cook with id 1 has a total of 10 ratings, 7 of which have type 1, 2 and 3. Only three have type 4 or 5. Therefore the cook with id 1 would be a 'good' cook, and the output should be the cook's id with the number of good ratings.
cook_id cook_rating
1 7
The cook with id 2, however, doesn't satisfy my condition, therefore should not be listed at all.
select cook_id, count(cook_rating) - sum(case when cook_rating = 4 OR cook_rating = 5 then 1 else 0 end) as numberOfGoodRatings from cook
where cook_rating in (1,2,3,4,5)
group by cook_id
order by numberOfGoodRatings desc
However, this doesn't take into account the fact that there might be more 4 or 5 than good ratings, resulting in negative outputs. Plus, the requirement of at least 70% is not included.

You can get this with a comparison in your HAVING clause. If you must have just the two columns in the result set, this can be wrapped as a sub-select select cook_id, positive_ratings FROM (...)
SELECT
cook_id,
count(cook_rating < 4 OR cook_rating IS NULL) as positive_ratings,
count(*) as total_ratings
FROM cook
GROUP BY cook_id
HAVING (positive_ratings / total_ratings) >= 0.70
ORDER BY positive_ratings DESC
Edit Note that count(cook_rating < 4) is intended to only count rows where the rating is less than 4. The MySQL documentation says that count will only count non-null rows. I haven't tested this to see if it equates FALSE with NULL but I would be surprised it it doesn't. Worst case scenario we would need to wrap that in an IF(cook_rating < 4, 1,NULL).

I suggest you change a little your schema to make this kind of queries trivial.
Suppose you add 5 columns to your cook table, to simply count the number of each ratings :
nb_ratings_1 nb_ratings_2 nb_ratings_3 nb_ratings_4 nb_ratings_5
Updating such a table when a new rating is entered in DB is trivial, just as would be recomputing those numbers if having redundancy makes you nervous. And it makes all filterings and sortings fast and easy.

Related

aggregate of the group columns along with the group in a table

I am very new to Microsoft reporting. I have the following table in my database:
CategoryName Id
Normal 1
High 2
Normal 3
Low 4
Normal 5
Normal 6
Normal 7
Normal 8
Low 9
Low 10
Low 11
High 12
I want to group by Category and also show the count of each category. Here is what I did:
I inserted a two column table and I grouped by the categoryName in the first column and in the second column, I tried doing
=CountDistinct(Fields!CategoryName.Value)
This is what I am seeing in the report
High 1
1
Normal 1
1
1
1
1
1
1
Low 1
1
1
1
I want to see something like this:
Category Count
Normal 6
Low 4
High 2
any help will be highly appreciated

Delete the Detail row and put the Count epxression in the Group row.

You'd probably be better off doing this in the query. Easier and leaves less room for error. Something like the following should work.
SELECT categoryName, COUNT(*)
FROM your table
GROUP BY categoryName

Limit selected results by unique selected IDs when using left joins

I have a table users and some other tables like images and products
Table users:
user_id user_name
1 andrew
2 lutz
3 sophie
4 michael
5 peter
6 oscor
7 anton
8 billy
9 henry
10 jon
Tables images:
user_id img_type img_url
1 0 url1
1 1 url4
2 0 url5
7 0 url7
8 0 url8
9 1 url9
Table Products
user_id prod_id
1 5
1 55
2 555
8 5555
9 5
9 55
I use this kind of SELECT:
SELECT * FROM
(SELECT user.user_id,user.user_name, img.img_type, prod.prod_id FROM
users
LEFT JOIN images img ON img.user_id = users.user_id
LEFT JOIN products prod ON prod.user_id = users.user_id
WHERE user.user_id <= 5) AS users
ORDER BY user.user_id ASC
The result should be the following output. Due to performance improvements, I use ORDER BY and an inner select. If I put a LIMIT 5 within the inner or outer select, things won't work. MySQL will hard LIMIT the results to 5. However I need the LIMIT of 5 (pagination) found unique user_id results which would lead to 9 in this case.
Can I use maybe an if-statement to push an array with found user_id and break/finish up the select when the array consist of 5 UIDs? Or can I modify somehow the select?
user_id user_name img_type prod_id
1 andrew 0 5
1 andrew 1 5
1 andrew 0 55
1 andrew 1 55
2 lutz 0 5
2 lutz 0 55
3 sophie null null
4 michael null null
5 peter null null
results: 9

LIMIT 5 and user_id <= 5 do not necessarily give you the same results. One reason: There are multiple rows (after the JOINs) for user_id = 1. This is because there can be multiple images and/or multiple products for a given 'user'.
So, first decide which you want.
LIMIT without ORDER BY gives you an arbitrary set of rows. (Yeah, it is somewhat predictable, but you should not depend on it.)
ORDER BY + LIMIT usually implies gathering all the potentially relevant rows, sorting them, then doing the "limit". There are sometimes ways around this sluggishness.
LEFT leads to the NULLs you got; did you want that?
What do you want pagination to do if you are displaying 5 items per page, but user 1 has 6 images? You need to think about this edge case before we can help you with a solution. Maybe you want all of user 1 on a page, even if it exceeds 5? Maybe you want to break in the middle of '1'; but then we need an unambiguous way to know where to continue from for the next page.
Probably any viable solution will not use nested SELECTs. As you are finding out, it leads to "errors". Think of it this way: First find all the rows you need to display on all the pages, then carve out 5 for the current page.
Here are some more musings on pagination: http://mysql.rjweb.org/doc.php/pagination

MySQL custom sort by subquery impacts performance

I have 1 table from which I return search results and display them in a a specific order. This example is an exact, simplified version of my db structure: http://www.java2s.com/Code/SQL/Select-Clause/Orderbyvaluefromsubquery.htm
and here is my current code, which works but heavily impacts performance to a large extend because of the subquery used:
SELECT * FROM `table` AS p1
WHERE CONCAT(title,artist,creator,version) LIKE '%searchInput%'
ORDER BY
(SELECT
MAX(`rating`) FROM `table` AS p2 WHERE p1.setId=p2.setId
) DESC
the above code searches and sorts the result sets by the highest rating in the set and that all rows from the same set are kept together, for example:
id setId rating title,artist,etc...
1 1 5
2 1 5
3 2 7
4 1 6
5 2 1
6 3 3
would sort to:
id setId rating title,artist,etc...
3 2 7
5 2 1
4 1 6
1 1 5
2 1 5
6 3 3
Currently it takes around 8.5sec to query 1000 rows and over half a minute for a large amount of rows, is there any way to improve the performance or would it be better to fetch all the results and sort them in PHP memory?
Help is much appreciated

You can probably speed things up a bit by separating the LIKEs:
SELECT p1.* FROM `table` AS p1
WHERE (title LIKE '%searchInput%')
OR (artist LIKE '%searchInput%')
OR (creator LIKE '%searchInput%')
OR (version LIKE '%searchInput%')
ORDER BY
(SELECT MAX(`rating`) FROM `table` AS p2 WHERE p1.setId=p2.setId) DESC
You could also try to
CREATE INDEX tbl_ndx ON table(setId, rating)
to improve sorting performances.

MYSQL Can WHERE IN default to ALL if no rows returned

Have a existing table of results like this;
race_id race_num racer_id place
1 0 32 2
1 1 32 3
1 2 32 1
1 3 32 6
1 0 44 2
1 1 44 2
1 2 44 2
1 3 44 2
etc...
Have lots of PHP scripts that access this table output the results in a nice format.
Now I have a case where I need to output the results for only certain race_nums.
So I have created this table races_included.
race_view race_id race_num
Day 1 1 0
Day 1 1 1
Day 2 1 2
Day 2 1 3
And can use this query to get the right results.
SELECT racer_id, place from results WHERE race_id=1
AND race_num IN
(SELECT race_num FROM races_included WHERE race_id='1' AND race_view='Day 1')
This is great but I only need this feature for a few races and to have it work in a compatible mode for the simple case show all races. I need to add alot of rows to the races_included table. Like
race_view race_id race_num
All 1 0
All 1 1
All 1 2
All 1 3
95% of my races don't use the daily feature.
So I am looking for a way to change the query so that if for race 1 there are no records in the races_included table it defaults to all races. In addition I need it to be close the same execution speed as the query without the IN clause, because this query Or variations of it are used a lot.
One way that does work is to redefine the table as races_excluded and use NOT IN. This works great but is a pain to manage the table when races are added or deleted.
Is there a simple way to use EXISTS and IN in tandem as a subquery to get the desired results? Or some other neat trick I am missing.
To clarify I have found a working but very slow solution.
SELECT * FROM race_results WHERE race_id=1
AND FIND_IN_SET(race_num, (SELECT IF((SELECT Count(*) FROM races_excluded
WHERE rid=1>0),(SELECT GROUP_CONCAT(rnum) FROM races_excluded
WHERE rid=1 AND race_view='Day 1' GROUP BY rid),race_num)))
It basically checks if any records exists for that race_id and if not return a set equal to the current race_num and if yes returns a list of included race nums.

You can do this by using or in the subquery:
SELECT racer_id, plac
from results
WHERE race_id = 1 AND
race_num IN (SELECT race_num
FROM races_included
WHERE race_id = '1' AND (race_view = 'Day 1' or raw_view = 'ANY')
);

Please explain the functionality of select max(...) ... group by in sql

I appreciate that this may appear to many as a dum question but I cannot find a clear explanation anywhere as to what the effect of "group by" has on a select max(...) from SQL statement.
I have the following data (there is another column image of type mediumblob which is not shown):
id title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0
I know if I perform
select max(id) from images
where image is not null;
I get the max value of id i.e.:
max(id)
9
However can someone please explain what is happening when I perform
select max(id), title, test_id
from images
where image is not null
group by id;
I find that the max(id) serves no useful purpose (results shown below)?
max(id) title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0

In the case of using MAX() the GROUP BY clause essentially tells the query engine how to group the items from which to determine a maximum. In your first example you were selecting only a single column, so there was no need for grouping. But in your second example you had multiple columns. So you need to tell the query engine how to determine which ones are going to be compared to find a maximum.
You told it to group by the id column. Which means that it's going to compare records which have the same id and give you the maximum one for each unique id. Since every record has a different id, you essentially didn't do anything with that clause.
It grouped all records with an id of 1 (which was a single record), and returned the record with the maximum id from that group (which was that record). It did the same for 2, 3, etc.
In the case of the three columns shown here, the only place where it would make sense to group your records would be on the test_id column. Something like this:
SELECT MAX(id), title, test_id
FROM images
WHERE image IS NOT null
GROUP BY test_id
This would group them by the test_id, so the results will include records 6 (the maximum id for test_id 0), 4 (the maximum id for test_id 1), and 8 (the maximum id for test_id 2). By splitting the records into those three groups based on the three unique values in the test_id column, it can effectively find a "maximum" id within each group.

Yes, in your example it serves no useful purpose.
You're grouping by ID then finding the maximum ID. But that doesn't make sense since there's only one of each ID. Normally MAX() is used on quantities, like prices or item counts or such like.

Group by is not used for this kind of queries
Its is used for queries like this
OId OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now if you want to get sum of material bought by each customer of these you will use group by
SELECT Customer,SUM(OrderPrice) FROM Orders
GROUP BY Customer
customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000
In above case id is unique so group by id will not make any sense

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Compare rows and get percentage - mysql

Related

aggregate of the group columns along with the group in a table

Limit selected results by unique selected IDs when using left joins

MySQL custom sort by subquery impacts performance

MYSQL Can WHERE IN default to ALL if no rows returned

Please explain the functionality of select max(...) ... group by in sql

Categories

Resources