How to distribute records based on column


I have a table storing competition entries.
Contestants must enter text and can optionally upload a photo.
When I display entries on a page, I paginate through them 9 at a time.
How can I ensure, as far as possible, that each page contains at least one entry with a photo (presuming there are enough photo entries for one per page)? It would probably be sufficient to distribute entries with photos evenly among the pages.

This was one of the more challenging questions I've seen recently -- thanks for that! I'm not able to get it working using a single SQL statement, but I was able to get it working (at least it appears so) like this. Basically, it determines how many results will be returned and how many of those have photos, then divides the photo count by the number of pages to get the number of photos per page (using CEILING to ensure at least one for the first few pages).
Anyhow, here goes:
SET @page = 1;
SET @resultsPerPage = 9;
SELECT @recCount := COUNT(Id) AS RecCount
FROM Entries;
SELECT @photoCount := COUNT(Photo) AS PhotoCount
FROM Entries
WHERE Photo IS NOT NULL;
SET @pageCount = CEILING(@recCount/@resultsPerPage);
SET @photosPerPage = CEILING(@photoCount/@pageCount);
SET @nonPhotosPerPage = @resultsPerPage - CEILING(@photosPerPage);
SELECT *
FROM (
  SELECT *,
    @rownum := @rownum + 1 row_number
  FROM Entries JOIN (SELECT @rownum := 0) r
  WHERE Photo IS NOT NULL
) a
WHERE a.row_number > (@photosPerPage*(@page-1))
  AND a.row_number <= (@photosPerPage*(@page))
UNION
SELECT *
FROM (
  SELECT *,
    @rownum2 := @rownum2 + 1 row_number
  FROM Entries JOIN (SELECT @rownum2 := 0) r
  WHERE Photo IS NULL
) b
WHERE b.row_number > (@nonPhotosPerPage*(@page-1))
  AND b.row_number <= (@nonPhotosPerPage*(@page))
And the SQL Fiddle.
Best of luck!
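To make the arithmetic behind those variables concrete, here is a minimal Python sketch of the same per-page calculation. The counts (30 entries, 7 of them with photos) are made-up illustration values, and `page_slices` is a hypothetical helper, not part of the SQL above.

```python
import math

def page_slices(rec_count, photo_count, results_per_page, page):
    """Mirror the SQL variables: return the row_number ranges
    (exclusive lower bound, inclusive upper bound) for photo rows
    and non-photo rows on the given 1-based page."""
    page_count = math.ceil(rec_count / results_per_page)
    photos_per_page = math.ceil(photo_count / page_count)
    non_photos_per_page = results_per_page - photos_per_page
    photo_range = (photos_per_page * (page - 1), photos_per_page * page)
    non_photo_range = (non_photos_per_page * (page - 1),
                       non_photos_per_page * page)
    return photo_range, non_photo_range

# Hypothetical data: 30 entries, 7 with photos, 9 entries per page.
for page in range(1, 5):
    print(page, page_slices(30, 7, 9, page))
```

With these numbers there are 4 pages, 2 photo rows per page and 7 non-photo rows per page; the last page is simply shorter because both kinds of rows run out.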

I would suggest that you randomly order the rows:
order by rand()
This doesn't guarantee a photo on every page, but it helps.
The alternative is to do something like this:
select *, @seqnum:=@seqnum+1
from t
where nophoto
select *, @seqnum:=@seqnum+8
from t
where photo
Then sort by seqnum. What makes this cumbersome is handling the cases where there is fewer than one photo per page or more than one photo per page. The random method is probably sufficient.
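One possible way around the fixed step sizes is to give every row a fractional position within its own group and sort on that, which spaces the photo rows evenly whatever the ratio is. The following is only a sketch in Python, done application-side rather than in SQL; the `photo` key, the sample data and the names `spread_photos` and `paginate` are assumptions for illustration. It spreads photos evenly but does not strictly guarantee one per page.

```python
from fractions import Fraction

def spread_photos(entries):
    """Return entries reordered so photo rows are spaced evenly among the rest."""
    photos = [e for e in entries if e["photo"] is not None]
    others = [e for e in entries if e["photo"] is None]
    keyed = []
    for group in (photos, others):
        for i, entry in enumerate(group):
            # Midpoint of the i-th of len(group) equal slices of [0, 1).
            keyed.append((Fraction(2 * i + 1, 2 * len(group)), entry))
    # The sort is stable, so on equal keys the photo row stays first.
    keyed.sort(key=lambda pair: pair[0])
    return [entry for _, entry in keyed]

def paginate(rows, per_page):
    """Split rows into consecutive pages of at most per_page items."""
    return [rows[i:i + per_page] for i in range(0, len(rows), per_page)]

# Hypothetical data: 6 entries, ids 1 and 2 have photos, 3 entries per page.
entries = [{"id": n, "photo": "p.jpg" if n in (1, 2) else None}
           for n in range(1, 7)]
pages = paginate(spread_photos(entries), 3)
print([[e["id"] for e in page] for page in pages])
```

In this small example each of the two pages ends up with exactly one photo entry.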

For each page, do this (e.g. for page 3, page size 10):
(select ...
from ...
where has_photo
order by created
limit 3, 1)
union
(select ...
from ...
where not has_photo
order by created
limit 27, 9)
This query splits the two types of rows into two separate queries recombined by union. Each select is parenthesized so that its own order by and limit apply to it rather than to the whole union.
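The two offsets in that example (3 and 27) come from multiplying the page index by the number of rows of each kind per page. A tiny Python sketch, assuming the page index is zero-based (which is what the example's offsets imply) and using a hypothetical helper name `page_offsets`:

```python
def page_offsets(page, photos_per_page=1, others_per_page=9):
    """Return the (photo_offset, other_offset) to put in the two LIMIT clauses
    for a zero-based page index."""
    return page * photos_per_page, page * others_per_page

print(page_offsets(3))  # offsets for the example page
```

The row counts in the LIMIT clauses stay fixed at `photos_per_page` and `others_per_page`; only the offsets change from page to page.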

Related

how to number a resultset from MySQL (another option to not use user variables)

For example, I created this query to get all the users and their associated posts:
SELECT users.nameUser, posts.namePost
FROM users
JOIN posts ON users.id = posts.user_id;
It works fine, but I need to number the results, so I declared a user variable this way:
SET @counter = 0;
Now I use it:
SELECT (@counter := @counter + 1) AS NP, users.nameUser, posts.namePost
FROM users
JOIN posts ON users.id = posts.user_id;
But now, with MySQL 8 and window functions, I need another way that avoids user variables.
How can I achieve this?
Remember...
If you need the variable to increment its value, you must use the := syntax; otherwise it keeps the same value no matter the size of the result set.

Since MySQL 8 supports window functions, you can use row_number().
What does row_number() do?
It starts counting at 1 and increments by one for each row.
row_number() needs an OVER clause, which sets the ascending or descending order over a specific column (in this case nameUser).
The code should look like this:
SELECT row_number() OVER(ORDER BY users.nameUser) AS NP, users.nameUser, posts.namePost
FROM users
JOIN posts ON users.id = posts.user_id;
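To see the numbering in action without a MySQL server, here is a small sketch that runs essentially the same query through Python's built-in `sqlite3` module, used only as a stand-in (SQLite 3.25+ also supports `row_number()`). The sample rows are invented, and `posts.id` is added to the window ordering as a tie-breaker so the numbering is deterministic; that extra column is my assumption, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nameUser TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, namePost TEXT);
    INSERT INTO users VALUES (1, 'bob'), (2, 'alice');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'hello'), (3, 2, 'again');
""")

# Number the joined rows by user name, then by post id within a user.
rows = conn.execute("""
    SELECT row_number() OVER (ORDER BY users.nameUser, posts.id) AS NP,
           users.nameUser, posts.namePost
    FROM users
    JOIN posts ON users.id = posts.user_id
    ORDER BY NP
""").fetchall()

for row in rows:
    print(row)
```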

Rails select top n records per group (memory leak)

I have this method, which uses find_by_sql to return the 10 latest records of each source:
def latest_results
Entry.find_by_sql(["
select x.id,x.created_at,x.updated_at,x.source_id,x.`data`,x.`uuid`,x.source_entry_id
from
(select t.*,
(@num:=if(@group = `source_id`, @num +1, if(@group := `source_id`, 1, 1))) row_number
from (
select d.id,d.created_at,d.updated_at,d.source_id,d.`data`,d.`uuid`,d.source_entry_id
from `streams` a
JOIN `stream_filters` b
on b.stream_id=a.id
JOIN `filter_results` c
on c.filter_id=b.id
JOIN `entries` d
on d.id=c.entry_id
where a.id=?
) t
order by `source_id`,created_at desc
) as x
where x.row_number <= 10
ORDER BY x.created_at DESC
",self.id])
end
It works properly in my local environment with a limited number of records.
I have a t2.micro with 2 GiB of memory serving the application. Now this query uses up all my memory and the app freezes.
Any suggestions on how I can do this better? I want to solve it without increasing the size of the machine.

I had a similar problem once. The solution with MySQL variables seems neat at first, though it is hard to optimize. It seems that it is doing a full table scan in your case.
I would recommend fetching the sources you want to display first, and then running a second query with multiple top-10 selects, one per source, all combined with a union.
The union of top-10 selects will have some repetitive statements, which you can easily autogenerate with Ruby.
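# pseudo code: column names follow the question (source_id, created_at); n is the number of sources to show
source_ids = Entry.distinct.limit(n).pluck(:source_id)
sql = source_ids.map do |source_id|
  # parenthesize each select so its order by / limit apply to it alone
  "(select * from entries where source_id = #{source_id.to_i} order by created_at desc limit 10)"
end.join("\nunion all\n")
Entry.find_by_sql(sql)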
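If the server happens to be MySQL 8.0 or later, a different technique from the union above is a window function: `ROW_NUMBER()` with `PARTITION BY` numbers the rows inside each source, and an outer filter keeps the first few. The sketch below runs that idea through Python's `sqlite3` purely as a stand-in; the simplified `entries` table, the sample rows and `top_n = 2` are assumptions for illustration, and whether this uses less memory in the asker's setup is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, source_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO entries (source_id, created_at) VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-02"), (1, "2020-01-03"),
        (2, "2020-01-01"), (2, "2020-01-04"),
    ],
)

top_n = 2  # the question uses 10; 2 keeps the example small
rows = conn.execute("""
    SELECT id, source_id, created_at
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY source_id
                                  ORDER BY created_at DESC) AS rn
        FROM entries e
    ) ranked
    WHERE rn <= ?
    ORDER BY created_at DESC
""", (top_n,)).fetchall()

for row in rows:
    print(row)
```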

Getting previous row in MySQL

I'm stuck on a MySQL problem that I have not been able to solve yet. I have the following query, which returns the month-year and the number of new users for each period on my platform:
select
u.period ,
u.count_new as new_users
from
(select DATE_FORMAT(u.registration_date,'%Y-%m') as period, count(distinct u.id) as count_new from users u group by DATE_FORMAT(u.registration_date,'%Y-%m')) u
order by period desc;
The result is the table:
period,new_users
2016-10,103699
2016-09,149001
2016-08,169841
2016-07,150672
2016-06,148920
2016-05,160206
2016-04,147715
2016-03,173394
2016-02,157743
2016-01,173013
So, I need to calculate, for each month-year, the difference between that period and the previous month-year. I need a result table like this:
period,new_users
2016-10,calculate(103699 - 149001)
2016-09,calculate(149001- 169841)
2016-08,calculate(169841- 150672)
2016-07,So on...
2016-06,...
2016-05,...
2016-04,...
2016-03,...
2016-02,...
2016-01,...
Any ideas? =/
Thanks

You should be able to use a similar approach as I posted in another S/O question. You are on a good track to start. You have your inner query get the counts and have it ordered in the final direction you need. By using inline mysql variables, you can have a holding column of the previous record's value, then use that as computation base for the next result, then set the variable to the new balance to be used for each subsequent cycle.
The JOIN to the SqlVars alias does not have any "ON" condition as the SqlVars would only return a single row anyhow and would not result in any Cartesian product.
select
  u.period,
  if( @prevCount = -1, 0, u.count_new - @prevCount ) as new_users,
  @prevCount := u.count_new as HoldColumnForNextCycle
from
  ( select
      DATE_FORMAT(u.registration_date,'%Y-%m') as period,
      count(distinct u.id) as count_new
    from
      users u
    group by
      DATE_FORMAT(u.registration_date,'%Y-%m') ) u
  JOIN ( select @prevCount := -1 ) as SqlVars
order by
  u.period desc;
You may have to play with it a little, as there is no "starting" point in the counts, so the first entry in either sort direction may look strange. I start the @prevCount variable at -1, so the first record processed gets a value of 0 in the "new_users" column. Then, whatever the distinct new user count was for that record is assigned back to @prevCount as the basis for the next record processed. Yes, it is an extra column in the result set that can be ignored, but it is needed. Again, it is just a per-row placeholder, and you can see in the query result how it gets its value as each row is processed.
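On MySQL 8.0 and later, the `LAG()` window function is an alternative to the user-variable approach: it reads the value from the previous row in a stated order, so no holding column is needed. Below is a minimal sketch run through Python's `sqlite3` as a stand-in for MySQL; the `monthly` table is a hypothetical substitute for the aggregated subquery, loaded with the first three rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (period TEXT, count_new INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [("2016-10", 103699), ("2016-09", 149001), ("2016-08", 169841)],
)

# LAG looks one row back in chronological order, whatever the final sort is.
rows = conn.execute("""
    SELECT period,
           count_new - LAG(count_new) OVER (ORDER BY period) AS diff
    FROM monthly
    ORDER BY period DESC
""").fetchall()

for row in rows:
    print(row)
```

The earliest period has no previous row, so its difference comes back as NULL (`None` in Python) rather than 0.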

I would create a temp table with two columns and then fill it using a cursor that
does something like this (I don't remember the exact syntax, so this is just pseudo-code):
@val = CURSOR.col2 - (select col2 from OriginalTable t2 where (t2.Period = (CURSOR.Period-1)))
INSERT tmpTable (Period, NewUsers) Values ( CURSOR.Period, @val)

improve mysql query performance, full text search

I have two tables: posts and comments. Using MySQL full-text search, I need to find posts and comments that match some entered text. For each post I also need to find 0-3 comments (3 can be changed to 5 or more). I'd also like to paginate by posts.
Right now I have the following query:
SELECT score_comment + score_post AS score_sum, temp.* FROM (
SELECT t_1.*,
@num := IF(@post_id = post_id, @num + 1, 1) AS row_number,
@post_id := post_id AS dummy
FROM (SELECT * FROM (SELECT MATCH (comm.content) AGAINST ("standart server web detected") AS score_comment, paged_post.*,
comm.comment_id, comm.last_edited AS c_last_edited, comm.content FROM comments AS comm
INNER JOIN
(SELECT *,
MATCH (description) AGAINST ("standart server web detected") AS score_post
FROM posts
WHERE MATCH (description) AGAINST ("standart server web detected") > 0
LIMIT 0, 10) AS paged_post
ON comm.post_id = paged_post.post_id
WHERE MATCH (comm.content) AGAINST ("standart server web detected") > 0) AS comm_view ORDER BY post_id, c_last_edited DESC) AS t_1
) AS temp WHERE temp.row_number <=3
ORDER BY score_comment + score_post DESC;
It looks like this query does the right job, but I'm worried about performance. Right now, on my home machine with 10k posts and 500k comments, it takes approximately 2.1-2.2 seconds.
Is there any way to improve this query to decrease the execution time?

MySQL user variable addition: '+1' return '+2'

I'm coding a movie search engine called Movieovo. I want to select the top 5 rows (movie links) from each group (movie title), but I ran into a problem with this SQL query:
SELECT link_movie_id, link_id,
@num := if(@link_movie_id = link_movie_id, @num + 1, 1) as row_number,
@link_movie_id := link_movie_id as dummy
FROM link GROUP BY link_movie_id, link_id HAVING row_number <= 5 LIMIT 30
Result (too many characters, so I uploaded it as an image):
http://i.imgur.com/phFzUF1.png
You can see that "row_number" does not increase by 1 each time.
I tried it directly in the MySQL command line and it shows the same results. Can anyone help me? I have already wasted 5 hours on this problem.

You are incrementing the counter per link_movie_id, but you are grouping by link_movie_id and link_id. As a result, for some link_movie_id values there are multiple link_id values in the result. To get what you are looking for, either group by a single field or increment the counter based on a concatenated combination of link_movie_id and link_id.
