I have a Table with the following structure.
The Table has mostly records where gender = 1.
I'm looking for a solution to get a result set where, at the top, roughly 60% of the records have gender = 1 and roughly 40% have gender = 2, mixed together and ordered by popularity descending.
There are far fewer members with gender = 2, so once they run out, the rest of the result set should contain only gender = 1 records.
Member table
id | nickname | gender | popularity
1 | jake | 1 | 80
2 | mike | 1 | 88
3 | dave | 1 | 75
4 | jenny | 2 | 85
5 | peter | 1 | 83
6 | nina | 2 | 88
7 | mister | 1 | 77
8 | drake | 1 | 80
The result should look something like this; it does not have to match the weighting exactly. The goal is to see results of both genders mixed together.
id | nickname | gender | popularity
2 | mike | 1 | 88
5 | peter | 1 | 83
6 | nina | 2 | 88
1 | jake | 1 | 80
8 | drake | 1 | 80
4 | jenny | 2 | 85
7 | mister | 1 | 77
3 | dave | 1 | 75
My best attempt so far (it does not take the 60:40 split into account):
SET @rank=0;
SET @rank2=0;
-- the alias is rnk because RANK is a reserved word in MySQL 8.0+
SELECT * FROM (
SELECT @rank:=@rank+1 AS rnk, q.* FROM (SELECT * FROM test WHERE gender = 1 ORDER BY popularity DESC) AS q
UNION
SELECT @rank2:=@rank2+1 AS rnk, q.* FROM (SELECT * FROM test WHERE gender = 2 ORDER BY popularity DESC) AS q
) AS r ORDER BY rnk;
Please try...
SET @gender1Count = (SELECT COUNT(*) FROM tblMember WHERE gender = 1);
SET @gender2Count = (SELECT COUNT(*) FROM tblMember WHERE gender = 2);
SET @totalCount = (SELECT COUNT(*) FROM tblMember);
-- LIMIT cannot take a CASE expression, so compute both limits up front
SET @limit1 = CAST(CASE
    WHEN @gender1Count > @totalCount * 3 / 5
    THEN ROUND(@gender2Count * 3 / 2)
    ELSE @gender1Count
  END AS UNSIGNED);
SET @limit2 = CAST(CASE
    WHEN @gender1Count > @totalCount * 3 / 5
    THEN @gender2Count
    ELSE ROUND(@gender1Count * 2 / 3)
  END AS UNSIGNED);
-- inside a prepared statement, LIMIT may use ? placeholders
PREPARE stmt FROM '
SELECT tblMember.id, tblMember.nickname, tblMember.gender, tblMember.popularity
FROM tblMember
JOIN ( SELECT id FROM ( SELECT id
                        FROM tblMember
                        WHERE gender = 1
                        ORDER BY popularity DESC
                        LIMIT ? ) AS g1
       UNION
       SELECT id FROM ( SELECT id
                        FROM tblMember
                        WHERE gender = 2
                        ORDER BY popularity DESC
                        LIMIT ? ) AS g2
     ) nominees ON tblMember.id = nominees.id
ORDER BY popularity DESC';
EXECUTE stmt USING @limit1, @limit2;
DEALLOCATE PREPARE stmt;
The above will give you a list where 60% of entries are gender = 1 and 40% are gender = 2. Please note that this is not the same as 60% or more of the total list as gender = 1 with the balance gender = 2 (or 40% or more of the total list as gender = 2 and the balance gender = 1).
It does this by forming a list of those whose gender equals 1 and sorting it into descending order of popularity. It then uses LIMIT to decide how many of the top entries to take from this list, by checking whether the count of gender = 1 members exceeds 60% (3/5ths) of the list. If it does, the number of gender = 1 records retrieved is reduced to 3/2 times the count of gender = 2 members. The ids of the chosen records are then returned.
(A quick explanation for those who aren't great at fractions: 40% is the same as 2/5 (two fifths). If gender = 2 makes up two fifths of the final list, then gender = 1 must make up the other three fifths (3/5). To find the size of 3/5ths of the list, we start with the known 2/5ths (the count of gender = 2) and divide it by 2 to get the size of 1/5th of the list. We then multiply this 1/5th by 3 to find how many records make up 3/5ths (60%) of our list.)
Similar logic is used to form the list of gender = 2 members to be included in the final list.
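The CASE logic above boils down to a little integer arithmetic. As a sketch (Python, with the hypothetical helper names `round_half_up` and `split_limits`), this reproduces the two LIMIT values without floating point, rounding halves up the way MySQL's ROUND does for exact values such as 4.5:

```python
def round_half_up(num: int, den: int) -> int:
    """Round num/den to the nearest integer, halves rounded up (num, den >= 0).

    Integer maths avoids float error and Python's round-half-to-even,
    which would differ from MySQL's ROUND on exact values such as 4.5.
    """
    return (2 * num + den) // (2 * den)


def split_limits(g1: int, g2: int, total: int):
    """Return (limit for gender = 1, limit for gender = 2) per the CASE logic."""
    if 5 * g1 > 3 * total:                       # gender = 1 exceeds 3/5 of all rows
        return round_half_up(3 * g2, 2), g2      # keep all of gender 2; gender 1 = 3/2 of that
    return g1, round_half_up(2 * g1, 3)          # keep all of gender 1; gender 2 = 2/3 of that


# Sample table from the question: 6 members with gender 1, 2 with gender 2.
print(split_limits(6, 2, 8))  # (3, 2)
```

With the sample data, 3 of the 6 gender = 1 members are kept next to the 2 gender = 2 members, giving the 3:2 (60:40) ratio.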
(Please note that the last records in each list may have the same popularity values as the most popular excluded members of the same gender. Without a secondary sort key in the ORDER BY of each list, which of those tied members are chosen is arbitrary (essentially semi-random).)
The two lists are then combined using the UNION operator, which is a simple kind of vertical join: it stacks one result set on top of the other. (Note: the more familiar INNER JOIN, LEFT JOIN, etc. are all types of horizontal joins.)
An inner JOIN is then performed upon our list of amalgamated id's with our original table, giving us our 60% / 40% list. Finally, this list is sorted into descending order of popularity.
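To try the whole pipeline end to end, here is a sketch that runs it against the sample data from the question, using Python's built-in sqlite3 module as a stand-in for MySQL. Assumptions: SQLite instead of MySQL (so the two limits are bound as query parameters rather than prepared-statement variables), the table name tblMember from the answer, and id added as a secondary sort key so the tie at popularity 80 resolves deterministically.

```python
import sqlite3

MEMBERS = [
    (1, "jake", 1, 80), (2, "mike", 1, 88), (3, "dave", 1, 75),
    (4, "jenny", 2, 85), (5, "peter", 1, 83), (6, "nina", 2, 88),
    (7, "mister", 1, 77), (8, "drake", 1, 80),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMember (id INTEGER PRIMARY KEY, nickname TEXT, "
             "gender INTEGER, popularity INTEGER)")
conn.executemany("INSERT INTO tblMember VALUES (?, ?, ?, ?)", MEMBERS)

# Step 1: the three counts (a comparison evaluates to 0 or 1 in SQLite).
g1, g2, total = conn.execute(
    "SELECT SUM(gender = 1), SUM(gender = 2), COUNT(*) FROM tblMember").fetchone()

# Step 2: how many rows to take per gender (same CASE logic, halves rounded up).
if 5 * g1 > 3 * total:
    lim1, lim2 = (6 * g2 + 2) // 4, g2        # ROUND(g2 * 3 / 2)
else:
    lim1, lim2 = g1, (4 * g1 + 3) // 6        # ROUND(g1 * 2 / 3)

# Step 3: top-N ids per gender stacked with UNION, joined back, sorted.
# Each UNION arm is wrapped in a subquery so it can have ORDER BY and LIMIT.
rows = conn.execute("""
    SELECT m.id, m.nickname, m.gender, m.popularity
    FROM tblMember AS m
    JOIN (
        SELECT id FROM (SELECT id FROM tblMember WHERE gender = 1
                        ORDER BY popularity DESC, id LIMIT ?)
        UNION
        SELECT id FROM (SELECT id FROM tblMember WHERE gender = 2
                        ORDER BY popularity DESC, id LIMIT ?)
    ) AS nominees ON m.id = nominees.id
    ORDER BY m.popularity DESC, m.id
""", (lim1, lim2)).fetchall()

for row in rows:
    print(row)
```

Note that the excess gender = 1 members (here dave, mister and drake) are dropped entirely, as the answer describes.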
If you have any questions or comments, then please feel free to post a Comment accordingly.
Related
I'm not even sure what the title of this question should be, but let's start with my data.
I have a table of users who have taken a few lessons while belonging to a particular training center.
lesson table
id | lesson_id | user_id | has_completed
----------------------------------------
1 | asdf3314 | 2 | 1
2 | d13saf12 | 2 | 1
3 | a33adff5 | 2 | 0
4 | a33adff5 | 1 | 1
5 | d13saf12 | 1 | 0
user table
id | center_id | ...
----------------------------------------
1 | 20 | ...
2 | 30 | ...
training center table
id | center_name | ...
----------------------------------------
20 | learn.co | ...
30 | teach.co | ...
I've written a small chunk but am now stuck, as I don't know how to proceed. This statement counts the completed lessons per user and then computes the average of those counts for one center id. If two users belong to a center and have completed 3 lessons and 2 lessons, it finds the average of 3 and 2 and returns that.
SELECT
FLOOR(AVG(a.total)) AS avg_completion
FROM
(SELECT
user_id,
user.center_id,
count(user_id) AS total
FROM lesson
LEFT JOIN user ON user.id = user_id
WHERE has_completed = 1 AND center_id = 2
GROUP BY user_id) AS a;
My question is: how do I loop through the training centers table and attach the average from a SELECT statement like the one above to each center? I can't seem to pass the center id down to the subquery, so there must be a fundamentally different way to write this query so that it covers every training center.
An example of desired result:
center.id | avg_completion | ...training center table
-----------------------------------------------------
20 | 2 | ...
Your main query needs to select a.center_id and use GROUP BY center_id, and the inner query must no longer filter on a single center. You can then join the result with the training_center table.
SELECT c.*, x.avg_completion
FROM training_center AS c
JOIN (
SELECT
a.center_id,
FLOOR(AVG(a.total)) AS avg_completion
FROM (
SELECT
user_id,
user.center_id,
count(*) AS total
FROM lesson
JOIN user ON user.id = user_id
WHERE has_completed = 1
GROUP BY user_id, user.center_id) AS a
GROUP BY a.center_id) AS x
ON x.center_id = c.id
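A sketch of the corrected query run against the three sample tables, using Python's sqlite3 module in place of MySQL. Assumptions: the center table is named training_center (as in the answer), and FLOOR is replaced by CAST(... AS INTEGER), which gives the same result for non-negative averages without relying on SQLite's optional math functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lesson (id INTEGER PRIMARY KEY, lesson_id TEXT,
                         user_id INTEGER, has_completed INTEGER);
    CREATE TABLE user (id INTEGER PRIMARY KEY, center_id INTEGER);
    CREATE TABLE training_center (id INTEGER PRIMARY KEY, center_name TEXT);
    INSERT INTO lesson VALUES (1, 'asdf3314', 2, 1), (2, 'd13saf12', 2, 1),
        (3, 'a33adff5', 2, 0), (4, 'a33adff5', 1, 1), (5, 'd13saf12', 1, 0);
    INSERT INTO user VALUES (1, 20), (2, 30);
    INSERT INTO training_center VALUES (20, 'learn.co'), (30, 'teach.co');
""")

rows = conn.execute("""
    SELECT c.id, c.center_name, x.avg_completion
    FROM training_center AS c
    JOIN (
        SELECT a.center_id,
               -- CAST truncates, which equals FLOOR for non-negative values
               CAST(AVG(a.total) AS INTEGER) AS avg_completion
        FROM (
            SELECT lesson.user_id, user.center_id, COUNT(*) AS total
            FROM lesson
            JOIN user ON user.id = lesson.user_id
            WHERE lesson.has_completed = 1
            GROUP BY lesson.user_id, user.center_id
        ) AS a
        GROUP BY a.center_id
    ) AS x ON x.center_id = c.id
    ORDER BY c.id
""").fetchall()
print(rows)  # [(20, 'learn.co', 1), (30, 'teach.co', 2)]
```

Users with no completed lessons are removed by the has_completed = 1 filter, so they do not pull a center's average down; use a LEFT JOIN with a conditional count if they should count as zero.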
If I understand correctly:
select u.center_id, count(distinct l.user_id) as num_users,
sum(l.has_completed) as num_completed,
avg(l.has_completed) as completed_ratio
from lesson l join
user u
on l.user_id = u.id
group by u.center_id
I am stuck on a rather complicated problem. Let me first explain what I am doing right now:
I have a table named feedback in which I store grades per course id. The table looks like this:
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 6 | 12 | B | 3 | 3 | 3
| 7 | 11 | B+ | 2 | 7 | 8
| 8 | 11 | A+ | 1 | 1 | 2
g_point holds a fixed numeric value for each grade, so I can use these values to show the user courses sorted by grade.
First, my task is to print the grade of each course. The grade of a course is the grade that occurs most often for it. For example, from this table we can see that the result for cid = 10 is A+, because it appears twice. This is simple, and I have already implemented that query, which I include at the end.
The main problem is a course like cid = 11, which has two different grades with the same number of occurrences. In that situation the client asks me to take the average of workload and easiness for each of these grades and show whichever grade has the greater average. The average is computed like this:
(average workload of that grade for the course
+ average easiness of that grade for the course) / 2
In this example cid = 11 has four entries, split equally between two grades:
B+ grade average
avg workload (2 + 7)/2 = 4.5 = x
avg easiness (3 + 8)/2 = 5.5 = y
answer (x + y)/2 = 5
A+ grade average
avg workload (5 + 1)/2 = 3 = x
avg easiness (4 + 2)/2 = 3 = y
answer (x + y)/2 = 3
so the grade should be B+.
This is the query I am running to get the most frequent grade:
SELECT
f3.coursecodeID cid,
f3.grade_point p,
f3.grade g
FROM (
SELECT
coursecodeID,
MAX(mode_qty) mode_qty
FROM (
SELECT
coursecodeID,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f1
GROUP BY coursecodeID
) f2
INNER JOIN (
SELECT
coursecodeID,
grade_point,
grade,
COUNT(grade_point) mode_qty
FROM feedback
GROUP BY
coursecodeID, grade_point
) f3
ON
f2.coursecodeID = f3.coursecodeID AND
f2.mode_qty = f3.mode_qty
GROUP BY f3.coursecodeID
ORDER BY f3.grade_point
Here is an SQL Fiddle.
I added a table Courses with the list of all course IDs, to make the main idea of the query easier to see. Most likely you have it in the real database. If not, you can generate it on the fly from feedback by grouping by cid.
For each cid we need to find the grade. Group feedback by cid, grade to get a list of all grades for the cid. We need to pick only one grade per cid, so we use LIMIT 1. To determine which grade to pick, we order them: first by number of occurrences (a simple COUNT), second by the average score. Finally, if several grades have the same number of occurrences and the same average score, we pick the grade with the smallest g_point. You can adjust the rules by tweaking the ORDER BY clause.
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
FROM courses
ORDER BY courses.cid
result set
cid CourseGrade
10 A+
11 B+
12 B
UPDATE
MySQL (before version 8.0.14) doesn't have lateral joins, so one possible way to get the second column g_point is to repeat the correlated subquery. SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
FROM courses
ORDER BY CourseGPoint
result set
cid CourseGrade CourseGPoint
10 A+ 1
11 B+ 2
12 B 3
Update 2: added the average score to the ORDER BY. SQL Fiddle
SELECT
courses.cid
,(
SELECT feedback.grade
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGrade
,(
SELECT feedback.g_point
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS CourseGPoint
,(
SELECT (AVG(workload) + AVG(easiness))/2
FROM feedback
WHERE feedback.cid = courses.cid
GROUP BY
cid
,grade
ORDER BY
COUNT(*) DESC
,(AVG(workload) + AVG(easiness))/2 DESC
,g_point
LIMIT 1
) AS AvgScore
FROM courses
ORDER BY CourseGPoint, AvgScore DESC
result
cid CourseGrade CourseGPoint AvgScore
10 A+ 1 3.75
11 B+ 2 5
12 B 3 3
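On engines with window functions (MySQL 8.0+, or SQLite 3.25+), the three repeated correlated subqueries can be collapsed into one pass that ranks the grades of each course with ROW_NUMBER() and keeps the first. This is a sketch using Python's sqlite3 module and the question's sample data; the CTE names graded and ranked are illustrative, and MIN(g_point) is used so the query does not depend on g_point being functionally dependent on grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (id INTEGER PRIMARY KEY, cid INTEGER, grade TEXT, "
             "g_point INTEGER, workload INTEGER, easiness INTEGER)")
conn.executemany("INSERT INTO feedback VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, "A+", 1, 5, 4), (2, 10, "A+", 1, 2, 4), (3, 10, "B", 3, 3, 3),
    (4, 11, "B+", 2, 2, 3), (5, 11, "A+", 1, 5, 4), (6, 12, "B", 3, 3, 3),
    (7, 11, "B+", 2, 7, 8), (8, 11, "A+", 1, 1, 2),
])

rows = conn.execute("""
    WITH graded AS (            -- one row per (course, grade)
        SELECT cid, grade, MIN(g_point) AS g_point, COUNT(*) AS n,
               (AVG(workload) + AVG(easiness)) / 2 AS avg_score
        FROM feedback
        GROUP BY cid, grade
    ), ranked AS (              -- same tie-break rules as the ORDER BY above
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY cid
                      ORDER BY n DESC, avg_score DESC, g_point) AS rn
        FROM graded
    )
    SELECT cid, grade, g_point, avg_score
    FROM ranked
    WHERE rn = 1
    ORDER BY g_point, avg_score DESC
""").fetchall()

for row in rows:
    print(row)
```

This returns the same three rows as the result above, with grade, g_point and average score computed once per course.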
If I understand correctly, you need an inner select to compute the averages and an outer select to find the maximum average.
select cid, grade, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid, grade
This solution has been tested on your data using SQL Fiddle at this link.
If you change the previous query to
select cid, max(average)/2 from (
select cid, grade, avg(workload + easiness) as average
from feedback
group by cid, grade
) x group by cid
You will find the max average for each cid.
As mentioned in the comments, you have to choose which strategy to use if more than one grade meets the maximum average. For example, if you have
+-------+-------+-------+-------+-----------+--------------
| id | cid | grade |g_point| workload | easiness
+-------+-------+-------+-------+-----------+--------------
| 1 | 10 | A+ | 1 | 5 | 4
| 2 | 10 | A+ | 1 | 2 | 4
| 3 | 10 | B | 3 | 3 | 3
| 4 | 11 | B+ | 2 | 2 | 3
| 5 | 11 | A+ | 1 | 5 | 4
| 9 | 11 | C | 1 | 3 | 6
you will have grades A+ and C both satisfying the maximum average of 4.5.
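To see the tie concretely, here is a sketch that lists every grade reaching its course's maximum average, run on the modified data above with Python's sqlite3 module standing in for MySQL. The CTE name per_grade is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (id INTEGER PRIMARY KEY, cid INTEGER, grade TEXT, "
             "g_point INTEGER, workload INTEGER, easiness INTEGER)")
conn.executemany("INSERT INTO feedback VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, "A+", 1, 5, 4), (2, 10, "A+", 1, 2, 4), (3, 10, "B", 3, 3, 3),
    (4, 11, "B+", 2, 2, 3), (5, 11, "A+", 1, 5, 4), (9, 11, "C", 1, 3, 6),
])

rows = conn.execute("""
    WITH per_grade AS (         -- the answer's inner select
        SELECT cid, grade, AVG(workload + easiness) / 2.0 AS average
        FROM feedback
        GROUP BY cid, grade
    )
    SELECT p.cid, p.grade, p.average
    FROM per_grade AS p
    JOIN (SELECT cid, MAX(average) AS best FROM per_grade GROUP BY cid) AS m
      ON m.cid = p.cid AND p.average = m.best   -- keep every grade at the max
    ORDER BY p.cid, p.grade
""").fetchall()

for row in rows:
    print(row)
```

Course 11 comes back twice (A+ and C at 4.5), so a further rule, such as the smallest g_point, is needed to pick a single grade.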
Is there a way to write a query that orders by a field, starting after a certain element id? I am trying to implement pagination based on the last returned element, and want to be able both to order elements by a property and to return the next page based on the last element of the previous page.
For example, a user may ask for 25 elements after the element with id = 10, sorted by cost.
Imagine you have:
id | name | price
1 | Fish | 5
2 | Burger | 2
3 | Veggies | 6
If we want to get after id=2 sorted by price it should return
2 | Burger | 2
1 | Fish | 5
3 | Veggies | 6
If we want to get after id=1 sorted by price it should return
1 | Fish | 5
3 | Veggies | 6
You can do:
SELECT *
FROM YOURTABLE
WHERE id >= 10
ORDER BY cost ASC
LIMIT 25;
EDIT:
According to your new information, you can do that with:
SELECT *
FROM YOURTABLE
WHERE price >= (SELECT price FROM YOURTABLE WHERE id = 2)
ORDER BY price ASC
LIMIT 25;
sql fiddle demo
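Filtering on price alone repeats rows when several items share a price: anything with the anchor's price comes back even if it sorted before the anchor. A common fix (keyset pagination) orders by (price, id) and compares on both columns. This is a sketch using Python's sqlite3 module; the table name items, the helper page_from and the extra Salad row (which ties with Fish on price) are hypothetical. The comparison is spelled out with OR so it does not rely on row-value syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "Fish", 5), (2, "Burger", 2), (3, "Veggies", 6),
    (4, "Salad", 5),  # hypothetical row that ties with Fish on price
])


def page_from(conn, anchor_id, page_size=25):
    """Rows in (price, id) order, starting at anchor_id (inclusive, as asked)."""
    (anchor_price,) = conn.execute(
        "SELECT price FROM items WHERE id = ?", (anchor_id,)).fetchone()
    return conn.execute("""
        SELECT id, name, price FROM items
        WHERE price > ? OR (price = ? AND id >= ?)   -- (price, id) >= anchor
        ORDER BY price, id
        LIMIT ?
    """, (anchor_price, anchor_price, anchor_id, page_size)).fetchall()


print(page_from(conn, 2))
print(page_from(conn, 4))  # [(4, 'Salad', 5), (3, 'Veggies', 6)]
```

For a "next page" that excludes the last element already shown, change id >= ? to id > ?. A composite index on (price, id) lets the database serve each page without sorting the whole table.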
I've got a table containing persons gender-coded as 0 and 1. I need to select the rows so that they alternate between male and female. I thought I could manage this somehow by using modulo and the gender codes 0 and 1, but I haven't managed to figure it out yet...
The result I'm looking for would look like this:
+-----+--------+-------+
| row | gender | name |
+-----+--------+-------+
| 1 | female | Lisa |
| 2 | male | Greg |
| 3 | female | Mary |
| 4 | male | John |
| 5 | female | Jenny |
+-----+--------+-------+
etc.
The alternative is to do it in PHP by merging 2 separate arrays, but I would really like it as a SQL query...
Any suggestions are appreciated!
Use two subqueries, one for males and one for females, with a ranking function to number the rows in each.
Males:
1 | Peter
2 | John
3 | Chris
Females:
1 | Marry
2 | Christina
3 | Kate
Then multiply each rank by 10, and add 5 for females. So you have this:
Males:
10 | Peter
20 | John
30 | Chris
Females:
15 | Marry
25 | Christina
35 | Kate
Then do the UNION ALL and sort by new sort order/new ID.
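Put together, it should look like this (MySQL 8.0+ for ROW_NUMBER(); assumes gender 0 = male and 1 = female):
SELECT Name
FROM (
  -- id is an assumed column; any column can set the order within a gender
  SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 AS SortOrd, name AS Name
  FROM persons WHERE gender = 0          -- males: 10, 20, 30, ...
  UNION ALL
  SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 + 5, name
  FROM persons WHERE gender = 1          -- females: 15, 25, 35, ...
) AS t
ORDER BY SortOrd;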
Result should be like this:
Males and Females:
10 | Peter
15 | Marry
20 | John
25 | Christina
30 | Chris
35 | Kate
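The "times 10, plus 5" sort key can be checked with a small sketch in Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER()). Assumptions: a persons table with an id column that fixes the order within each gender, and gender 0 = male, 1 = female; the names are the ones from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, gender INTEGER)")
conn.executemany("INSERT INTO persons (name, gender) VALUES (?, ?)", [
    ("Peter", 0), ("John", 0), ("Chris", 0),          # assumed: 0 = male
    ("Marry", 1), ("Christina", 1), ("Kate", 1),      # assumed: 1 = female
])

rows = conn.execute("""
    SELECT SortOrd, name FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 AS SortOrd, name
        FROM persons WHERE gender = 0                 -- 10, 20, 30, ...
        UNION ALL
        SELECT ROW_NUMBER() OVER (ORDER BY id) * 10 + 5, name
        FROM persons WHERE gender = 1                 -- 15, 25, 35, ...
    ) AS t
    ORDER BY SortOrd
""").fetchall()

for sort_ord, name in rows:
    print(sort_ord, "|", name)
```

If one gender has more rows than the other, its surplus simply appears at the end, still in order.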
I found "Emulate Row_Number()" and modified it a bit for your case.
set @rownum := 0;
set @pg := -1;
select p.name,
p.gender
from
(
select name,
gender,
@rownum := if(@pg = gender, @rownum+1, 1) as rn,
@pg := gender as pg
from persons
order by gender
) as p
order by p.rn, p.gender
Try on SQL Fiddle
Note: From 9.4. User-Defined Variables
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed.
I will leave it up to you to decide whether you can use this. I don't use MySQL, so I can't really tell you whether you should be concerned or not.
Similar to Mikael's solution, but without the need to order the result set multiple times:
SELECT *
FROM (
SELECT people.*,
IF(gender=0, @mr:=@mr+1, @fr:=@fr+1) AS rnk
FROM people, (SELECT @mr:=0, @fr:=0) initvars
) tmp
ORDER BY rnk ASC, gender ASC;
To avoid having to order both the inner and outer selects, I have used separate counters (@mr for male rank, @fr for female rank) in the inner select. The alias is rnk because RANK is a reserved word in MySQL 8.0 and later.
I've got a table containing persons gender-coded as 0 and 1
Then why would you make assumptions on the order of rows in the result set? Seems to me transforming the 0/1 into 'male'/'female' is far more robust:
select name, case gender when 0 then 'male' else 'female' end
from Person
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY GENDER ORDER BY GENDER) rnk
FROM TABLE_NAME AS t
ORDER BY rnk, GENDER DESC
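A sketch of the ROW_NUMBER() approach on the question's expected output, using Python's sqlite3 module (SQLite 3.25+). Assumptions: a persons table with an id column, gender 1 = female and 0 = male, and ORDER BY id inside OVER (...) so the order within each gender is deterministic rather than arbitrary, as it is with ORDER BY GENDER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, gender INTEGER)")
conn.executemany("INSERT INTO persons (name, gender) VALUES (?, ?)", [
    ("Lisa", 1), ("Mary", 1), ("Jenny", 1),   # assumed: 1 = female
    ("Greg", 0), ("John", 0),                 # assumed: 0 = male
])

rows = conn.execute("""
    SELECT t.name, t.gender,
           ROW_NUMBER() OVER (PARTITION BY t.gender ORDER BY t.id) AS rnk
    FROM persons AS t
    ORDER BY rnk, t.gender DESC   -- pair rows by rank, female (1) first
""").fetchall()

for name, gender, rnk in rows:
    print(rnk, "female" if gender == 1 else "male", name)
```

This yields Lisa, Greg, Mary, John, Jenny, matching the layout requested in the question.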
I am fetching all stations which belong to a station group from my database. SELECT * FROM stations WHERE station_group_id = 1.
Now, from all the fetched results, I want certain ones to appear first (e.g. the stations which have line_id = 2 to appear first). For example, if this is my stations table:
id | station_group_id | line_id
-------------------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
I would like the output to be:
id | station_group_id | line_id
-------------------------------
2 | 1 | 2
1 | 1 | 1
3 | 1 | 3
So that line_id = 2 is the first record in the output.
I thought about using ORDER BY, but it isn't quite an ordering issue; it is more a "preference" one.
So, is it possible to place some records on top of the output, based on a condition, preferably in one query? Thanks!
Try the following:
SELECT * FROM stations
WHERE station_group_id = 1
ORDER BY IF(line_id IN (2), 0, 1)  -- list any other preferred line_ids inside IN (...)
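The same "preferred rows first" ordering can be tried in Python's sqlite3 module. SQLite has no IF() in older versions, so this sketch uses the portable CASE WHEN form; PREFERRED is a hypothetical tuple of line_ids to float to the top, and line_id is added as a secondary key so the remaining rows keep a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (id INTEGER PRIMARY KEY, "
             "station_group_id INTEGER, line_id INTEGER)")
conn.executemany("INSERT INTO stations VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 2), (3, 1, 3)])

PREFERRED = (2,)  # hypothetical: line_ids that should come first
placeholders = ", ".join("?" * len(PREFERRED))

rows = conn.execute(f"""
    SELECT id, station_group_id, line_id FROM stations
    WHERE station_group_id = ?
    ORDER BY CASE WHEN line_id IN ({placeholders}) THEN 0 ELSE 1 END,
             line_id
""", (1, *PREFERRED)).fetchall()

for row in rows:
    print(row)
```

Only "?" markers are interpolated into the SQL string; the preferred values themselves are still passed as parameters.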
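-- ORDER BY applies to the whole UNION, so a helper column keeps the preferred rows first
SELECT *, 0 AS pref FROM stations WHERE station_group_id = 1 AND line_id = 2
UNION ALL
SELECT *, 1 AS pref FROM stations WHERE station_group_id = 1 AND
line_id != 2
ORDER BY pref, line_id ASC;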
As you say, it is really a preference, so you should either model it as an extra field on the table (e.g. ordinal, order, or preferredOrder), or keep sorting by line_id and do the "special sort" in code (find the element with line_id = 2 and move it to the top).
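The "special sort in code" option can be a one-liner, because Python's sorted() is stable: rows that compare equal keep their existing order. This sketch assumes the rows already arrive ordered by line_id; the Station type and the preferred_first helper are hypothetical.

```python
from typing import NamedTuple


class Station(NamedTuple):
    id: int
    station_group_id: int
    line_id: int


def preferred_first(rows, preferred_line_id=2):
    """Move rows with the preferred line_id to the top, keeping the rest in order.

    The key is False (0) for preferred rows and True (1) for the others;
    sorted() is stable, so ties keep the order the database returned.
    """
    return sorted(rows, key=lambda s: s.line_id != preferred_line_id)


rows = [Station(1, 1, 1), Station(2, 1, 2), Station(3, 1, 3)]  # ORDER BY line_id
print([s.id for s in preferred_first(rows)])  # [2, 1, 3]
```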