LIMIT showing duplicate results - mysql

I can't figure out why this is happening. I have a table with the following columns:
+-------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------+------+-----+---------+----------------+
| adid | int(11) | NO | PRI | NULL | auto_increment |
| price | float | YES | | NULL | |
| categoryid | int(11) | YES | | NULL | |
| visible | tinyint(4) | YES | MUL | NULL | |
+-------------+------------+------+-----+---------+----------------+
There are 7 records in this table that are visible and have categoryid set to 3. I run a simple query like this:
SELECT adid FROM ads as a
WHERE categoryid = 3
and visible = 1
order by price desc
limit 0, 5
I get the following adids returned: 1,4,3,15,7
On the next page the query is:
SELECT adid FROM ads as a
WHERE categoryid = 3
and visible = 1
order by price desc
limit 5, 5
I get: 11,15
Maybe I am up too late, but why do I get 15 twice?

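For the results to be stable and consistent, a unique column needs to take part in the sort. When several rows share the same price, MySQL is free to return the tied rows in any order, and that order can differ between the LIMIT 0, 5 and LIMIT 5, 5 queries, so a row can show up on both pages (and another can be skipped).
In this case, that could be: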
ORDER BY price DESC, adid
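A small sketch of the fix, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption; the prices are made up, and several rows deliberately share a price). The helper name page() is hypothetical. With adid as the tiebreaker, consecutive LIMIT pages partition the rows, so no adid appears twice:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `ads` table.
# Made-up prices: 3 and 4 tie at 40.0; 7, 11 and 15 tie at 20.0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (adid INTEGER PRIMARY KEY, price REAL, categoryid INTEGER, visible INTEGER)"
)
rows = [(1, 50.0), (3, 40.0), (4, 40.0), (7, 20.0), (11, 20.0), (15, 20.0), (20, 10.0)]
conn.executemany("INSERT INTO ads VALUES (?, ?, 3, 1)", rows)


def page(offset, count):
    # adid is unique, so it breaks ties between equal prices deterministically.
    sql = """SELECT adid FROM ads
             WHERE categoryid = 3 AND visible = 1
             ORDER BY price DESC, adid
             LIMIT ?, ?"""  # LIMIT offset, count (same order as in MySQL)
    return [r[0] for r in conn.execute(sql, (offset, count))]


first, second = page(0, 5), page(5, 5)
print(first, second)  # [1, 3, 4, 7, 11] [15, 20]
```

Without the adid tiebreaker, nothing obliges the database to order tied rows the same way for both pages, which is how 15 can end up on both.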

HQL/MySQL for listing distincts and duplicates

I have a list of 20,000+ objects (tipps). These objects have a foreign key to a table called title. Two tipps are considered duplicates if they are linked to the same title and belong to the same package (tipp_pkg_fk, this is a parameter).
I need a list of all objects, with the duplicates listed together. For example:
tippA.title.name = "One"
tippB.title.name = "Two"
tippC.title.name = "Two"
Ideally from the above I will get a list result like this: [[tippA],[tippB,tippC]]
I am not sure how to do this. I have made an attempt (first in MySQL so I can test it; then I'll change it to HQL):
select tipp.tipp_id, 1 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates,
title_instance_package_platform tipp
where tipp.tipp_id != duplicates.id
union all
select duplicates.id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
This ran for 330 seconds; then MySQL Workbench showed the message "fetching", and at that point my computer started grinding to a halt. The idea is to first select all the IDs that are not duplicates, then select all the IDs that are duplicates, and then merge and order them so that the duplicates appear together. I am looking for the most efficient way to do this, as I will be executing this query several times during an overnight job.
For my TIPP model, the following are part of the mapping:
static mapping = {
pkg column:'tipp_pkg_fk', index: 'tipp_idx'
title column:'tipp_ti_fk', index: 'tipp_idx'
}
title_instance_package_platform:
+-----------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+--------------+------+-----+---------+----------------+
| tipp_id | bigint(20) | NO | PRI | NULL | auto_increment |
| tipp_version | bigint(20) | NO | | NULL | |
| tipp_pkg_fk | bigint(20) | NO | MUL | NULL | |
| tipp_plat_fk | bigint(20) | NO | MUL | NULL | |
| tipp_ti_fk | bigint(20) | NO | MUL | NULL | |
| date_created | datetime | NO | | NULL | |
| last_updated | datetime | NO | | NULL | |
+-----------------------------+--------------+------+-----+---------+----------------+

title:
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| ti_id | bigint(20) | NO | PRI | NULL | auto_increment |
| ti_version | bigint(20) | NO | | NULL | |
| date_created | datetime | NO | | NULL | |
| ti_imp_id | varchar(255) | NO | MUL | NULL | |
| last_updated | datetime | NO | | NULL | |
| ti_title | varchar(1024) | YES | | NULL | |
| ti_key_title | varchar(1024) | YES | | NULL | |
| ti_norm_title | varchar(1024) | YES | | NULL | |
| sort_title | varchar(1024) | YES | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
Update
After some changes it is working:
select tipp.tipp_id as id, 1 as sortOrder
from
title_instance_package_platform tipp
where tipp.tipp_id not in (select distinct a.tipp_id as id
from title_instance_package_platform a, title_instance_package_platform b
where a.tipp_pkg_fk= 1 and b.tipp_pkg_fk = 1 and a.tipp_ti_fk = b.tipp_ti_fk)
union all
select duplicates.id as id, 2 as sortOrder
from (select distinct a.tipp_id as id
from title_instance_package_platform a , title_instance_package_platform b
where a.tipp_pkg_fk = 1 and b.tipp_pkg_fk=1 and a.tipp_ti_fk = b.tipp_ti_fk) duplicates
order by sortOrder, id;
I still haven't got the duplicates grouped together, though: everything comes back as one flat list, which means I still need to group them.
Can you do your select from the other side?
Select all titles and packages, list all tipps for each, keep only the combinations where a tipp exists (count > 0), and bundle these together to get the array you showed.
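A minimal sketch of that "other side" approach, assuming the final bundling can happen in application code rather than in HQL: sort the package's tipps by title so that tipps sharing a title sit next to each other, then collect each run into a list. It uses Python's sqlite3 with a hypothetical three-column miniature of title_instance_package_platform and made-up rows; tipps_grouped_by_title() is an illustrative name:

```python
import sqlite3
from itertools import groupby

# Hypothetical miniature of title_instance_package_platform with made-up rows:
# tipps 2 and 3 share title 20 in package 1; tipp 4 belongs to package 2.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE title_instance_package_platform (
    tipp_id INTEGER PRIMARY KEY, tipp_pkg_fk INTEGER, tipp_ti_fk INTEGER)""")
conn.executemany(
    "INSERT INTO title_instance_package_platform VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 20), (4, 2, 20)],
)


def tipps_grouped_by_title(pkg_id):
    # Ordering by title puts tipps that share a title next to each other,
    # so a single pass with groupby() can bundle them into lists.
    rows = conn.execute(
        """SELECT tipp_ti_fk, tipp_id
           FROM title_instance_package_platform
           WHERE tipp_pkg_fk = ?
           ORDER BY tipp_ti_fk, tipp_id""",
        (pkg_id,),
    )
    return [[tipp_id for _, tipp_id in grp] for _, grp in groupby(rows, key=lambda r: r[0])]


print(tipps_grouped_by_title(1))  # [[1], [2, 3]]
```

This reads each package's rows once with no self-join, which is likely to matter at 20,000+ rows; lists with more than one element are the duplicates.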
It seems you could compute both the dups and the non-dups at the same time, with something like:
SELECT ( a.tipp_ti_fk = b.tipp_ti_fk ) AS sortOrder,
a.tipp_id as id
from title_instance_package_platform a ,
title_instance_package_platform b
where a.tipp_pkg_fk = 1
and b.tipp_pkg_fk = 1
You might need a DISTINCT.
This composite index would help:
INDEX(tipp_pkg_fk, tipp_ti_fk, tipp_id)
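One thing to watch in the self-join: a.tipp_ti_fk = b.tipp_ti_fk is also true when a and b are the same row, so every tipp matches itself. Below is a sketch of a single-pass variant that excludes the row itself by swapping the self-join for a correlated EXISTS (a substitution, not the answer's exact query). It again uses SQLite via Python with made-up rows, plus the composite index suggested in the answer:

```python
import sqlite3

# Hypothetical miniature of title_instance_package_platform with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE title_instance_package_platform (
    tipp_id INTEGER PRIMARY KEY, tipp_pkg_fk INTEGER, tipp_ti_fk INTEGER)""")
# The composite index from the answer; it lets the EXISTS probe use an index lookup.
conn.execute("""CREATE INDEX tipp_pkg_ti_id
    ON title_instance_package_platform (tipp_pkg_fk, tipp_ti_fk, tipp_id)""")
conn.executemany(
    "INSERT INTO title_instance_package_platform VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 20), (4, 2, 20)],
)

# Each tipp in the package gets is_dup = 1 if another tipp in the same package
# has the same title; b.tipp_id <> a.tipp_id stops a row from matching itself.
sql = """
SELECT a.tipp_id AS id,
       EXISTS (SELECT 1
               FROM title_instance_package_platform b
               WHERE b.tipp_pkg_fk = a.tipp_pkg_fk
                 AND b.tipp_ti_fk = a.tipp_ti_fk
                 AND b.tipp_id <> a.tipp_id) AS is_dup
FROM title_instance_package_platform a
WHERE a.tipp_pkg_fk = ?
ORDER BY is_dup, a.tipp_ti_fk, a.tipp_id
"""
flags = conn.execute(sql, (1,)).fetchall()
print(flags)  # [(1, 0), (2, 1), (3, 1)]
```

Ordering by tipp_ti_fk after is_dup keeps duplicates of the same title adjacent, matching the sortOrder idea in the question.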

Conditional logic in SQL query

I have a table that looks like the following:
mysql> desc mlb_lineups;
+----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| player_id | int(11) | NO | MUL | NULL | |
| team_id | int(11) | NO | | NULL | |
| game_id | int(11) | NO | MUL | NULL | |
| gamedate | date | NO | | NULL | |
| pos | int(11) | NO | | NULL | |
| is_home | int(11) | NO | | 0 | |
| is_pitcher | int(11) | YES | MUL | 0 | |
| opponent_team_id | int(11) | NO | MUL | NULL | |
| first_name | varchar(255) | YES | | NULL | |
| last_name | varchar(255) | YES | | NULL | |
| position | varchar(20) | YES | | NULL | |
| hand_throws_with | varchar(1) | YES | | NULL | |
+----------------------+--------------+------+-----+---------+----------------+
To retrieve the lineup a team used last (let's say the team with team_id 31), I'd run the following query:
select * from mlb_lineups
where team_id = 31
and pos > -1
order by gamedate DESC,
pos ASC LIMIT 9;
That works fine. What I'm trying to do, though, is a bit tricky, and I can't work out how the inner query and/or conditional logic would fit together here. I want to run a query that basically says: retrieve the lineup a team used last in a game where the opponent_team_id had a pitcher (is_pitcher = 1) with hand_throws_with equal to L. The mlb_lineups table will contain at least one row with is_pitcher = 1 and hand_throws_with = 'L' for every game in which a team has a lefty throwing on the mound.
Essentially, to find the last lineup a team_id used when the opposing pitcher had hand_throws_with equal to L, I'd have to run a query that figures out the last opponent_team_id they faced with a pitcher of that handedness, and then retrieve their lineup for that game_id. Does this schema provide enough information to do that in a single query? Did I provide enough information to make my problem understandable?
Presumably you'll need to JOIN the table back to itself using the opponent_team_id and game_id fields. There are a couple of ways to do this.
Here is one method using EXISTS:
select *
from mlb_lineups ml1
where team_id = 31
and pos > -1
and exists (
select 1
from mlb_lineups ml2
where ml1.opponent_team_id = ml2.team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
and ml1.game_id = ml2.game_id
)
order by gamedate desc, pos
limit 9;
This method uses a standard JOIN, but it may require DISTINCT, depending on the data:
select ml1.*
from mlb_lineups ml1
inner join mlb_lineups ml2 on ml1.game_id = ml2.game_id
and ml1.team_id = ml2.opponent_team_id
and ml2.is_pitcher = 1
and ml2.hand_throws_with = 'L'
where ml1.team_id = 31
and ml1.pos > -1
order by ml1.gamedate desc, ml1.pos
limit 9;
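A quick way to sanity-check both queries is to run them against a few rows whose answer is known. This sketch uses an in-memory SQLite database via Python as a stand-in for MySQL, a trimmed-down mlb_lineups with only the columns the queries touch, and made-up games: team 31 faced a lefty in game 100 and, later, a righty in game 200, so only the game 100 lineup should come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down mlb_lineups: only the columns the two queries use.
conn.execute("""CREATE TABLE mlb_lineups (
    player_id INTEGER, team_id INTEGER, game_id INTEGER, gamedate TEXT,
    pos INTEGER, is_pitcher INTEGER, opponent_team_id INTEGER, hand_throws_with TEXT)""")
conn.executemany("INSERT INTO mlb_lineups VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    # Team 31's lineups (made-up players)
    (1, 31, 100, "2015-04-01", 1, 0, 7, "R"),
    (2, 31, 100, "2015-04-01", 2, 0, 7, "R"),
    (3, 31, 200, "2015-04-05", 1, 0, 8, "R"),
    (4, 31, 200, "2015-04-05", 2, 0, 8, "R"),
    # Opposing pitchers: a lefty in game 100, a righty in game 200
    (50, 7, 100, "2015-04-01", 0, 1, 31, "L"),
    (60, 8, 200, "2015-04-05", 0, 1, 31, "R"),
])

exists_sql = """
SELECT ml1.player_id FROM mlb_lineups ml1
WHERE ml1.team_id = 31 AND ml1.pos > -1
  AND EXISTS (SELECT 1 FROM mlb_lineups ml2
              WHERE ml1.opponent_team_id = ml2.team_id
                AND ml2.is_pitcher = 1
                AND ml2.hand_throws_with = 'L'
                AND ml1.game_id = ml2.game_id)
ORDER BY ml1.gamedate DESC, ml1.pos
LIMIT 9"""

join_sql = """
SELECT ml1.player_id FROM mlb_lineups ml1
JOIN mlb_lineups ml2 ON ml1.game_id = ml2.game_id
                    AND ml1.team_id = ml2.opponent_team_id
                    AND ml2.is_pitcher = 1
                    AND ml2.hand_throws_with = 'L'
WHERE ml1.team_id = 31 AND ml1.pos > -1
ORDER BY ml1.gamedate DESC, ml1.pos
LIMIT 9"""

exists_ids = [r[0] for r in conn.execute(exists_sql)]
join_ids = [r[0] for r in conn.execute(join_sql)]
print(exists_ids, join_ids)  # [1, 2] [1, 2]
```

Both queries rely on LIMIT 9 counting rows rather than games, so they assume each lineup has exactly nine rows with pos > -1; with fewer, rows from an earlier game against a lefty could fill the remaining slots.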

mysql average latest 5 rows

I have a table:
describe tests;
+-----------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------+------+-----+-------------------+-----------------------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| line_id | int(11) | NO | | NULL | |
| test_time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| alarm_id | int(11) | YES | | NULL | |
| result | int(11) | NO | | NULL | |
+-----------+-----------+------+-----+-------------------+-----------------------------+
And I execute this query:
SELECT avg(result) FROM tests WHERE line_id = 4 ORDER BY test_time LIMIT 5;
which I want to return the average of the 5 latest results.
But something is not right: the query returns the average of all the data rather than of just 5 rows.
What can be wrong?
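AVG() collapses all the matching rows into a single result row, and ORDER BY and LIMIT are then applied to that one row, so LIMIT 5 has nothing to trim. Pick the rows in a subquery first and average them in the outer query. Also, if you want the last five rows, you need to order by the time column in descending order: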
select avg(result)
from (select result
from tests
where line_id = 4
order by test_time desc
limit 5
) t
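The difference is easy to see on a small, made-up data set. This sketch uses Python's sqlite3 with an in-memory database in place of MySQL and stores test_time as ISO-8601 text (an assumption) so that it sorts chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests (id INTEGER PRIMARY KEY, line_id INTEGER, test_time TEXT, result INTEGER)"
)
# Ten made-up results for line 4: 10, 20, ..., 100 on consecutive days.
conn.executemany(
    "INSERT INTO tests (line_id, test_time, result) VALUES (4, ?, ?)",
    [(f"2015-01-{day:02d} 12:00:00", day * 10) for day in range(1, 11)],
)

# The question's query: AVG() runs over every matching row, then LIMIT trims
# the single result row, so the limit has no effect.
naive = conn.execute(
    "SELECT avg(result) FROM tests WHERE line_id = 4 ORDER BY test_time LIMIT 5"
).fetchone()[0]

# The answer's query: pick the five latest rows first, then average them.
latest = conn.execute(
    """SELECT avg(result)
       FROM (SELECT result FROM tests
             WHERE line_id = 4
             ORDER BY test_time DESC
             LIMIT 5) t"""
).fetchone()[0]

print(naive, latest)  # 55.0 80.0
```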
The previous answer suggested something like that, and it works for me:
select avg( id ) from ( select id from rand limit 5) as id;
Only one row will be returned, because AVG is an aggregate function.

Selecting latest conversations from table containing private messages

I have a table containing personal messages from one user to another.
Here is the table structure:
mysql> describe pms;
+---------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| time | datetime | NO | | NULL | |
| from | int(11) | NO | | NULL | |
| from_ip | int(11) | NO | | NULL | |
| to | int(11) | NO | | NULL | |
| message | varchar(255) | NO | | NULL | |
| read | tinyint(4) | NO | | 0 | |
+---------+--------------+------+-----+---------+----------------+
I am creating a view that shows the 10 latest conversations of a particular user ID. Since I want to find conversations, I thought of using GROUP BY `from`, `to`. This, however, returned duplicate rows (one group for messages from this user and another for messages to this user), and I also noticed that the ordering does not work as it should.
In order to be able to properly order the results and thus select the 10 latest conversations, the groups should contain the latest row of the group instead of the first.
Here is the query I tried:
SELECT *
FROM `pms`
WHERE `from` = 1
OR `to` = 1
GROUP BY `from` , `to`
ORDER BY `id` DESC
LIMIT 10
This picks the wrong row from each group, so ordering by id (or time) gives the wrong order.
Any ideas how I could get it working?
This assumes that a conversation is defined by the from-to pair in either order, and that the latest conversation has the largest id:
SELECT least(`from`, `to`), greatest(`from`, `to`)
FROM `pms`
WHERE `from` = 1 OR `to` = 1
GROUP BY least(`from`, `to`), greatest(`from`, `to`)
ORDER BY max( `id`) DESC
LIMIT 10
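A sketch of the same query against made-up messages, using Python's sqlite3 in place of MySQL. SQLite's multi-argument scalar min() and max() stand in for LEAST() and GREATEST() here (a portability substitution, not part of the original answer), and max(id) is selected so the result can be checked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `from` and `to` are reserved words, so they are quoted with backticks.
conn.execute("CREATE TABLE pms (id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, message TEXT)")
conn.executemany("INSERT INTO pms VALUES (?, ?, ?, ?)", [
    (1, 1, 2, "hi 2"),
    (2, 1, 3, "hi 3"),
    (3, 2, 1, "reply from 2"),
    (4, 4, 5, "unrelated"),
])

# min(a, b) / max(a, b) with two arguments are scalar in SQLite, so
# (1, 2) and (2, 1) land in the same group; max(id) alone is the aggregate.
sql = """
SELECT min(`from`, `to`) AS user_a, max(`from`, `to`) AS user_b, max(id) AS latest_id
FROM pms
WHERE `from` = ? OR `to` = ?
GROUP BY min(`from`, `to`), max(`from`, `to`)
ORDER BY max(id) DESC
LIMIT 10
"""
conversations = conn.execute(sql, (1, 1)).fetchall()
print(conversations)  # [(1, 2, 3), (1, 3, 2)]
```

To display the latest message of each conversation, the latest_id values can be joined back to pms on id.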

MySQL: Improving query performance on join with ORDER BY clause

I have two tables that contain the daily activities of a user. I have to join these tables and select the top ten IDs.
Table 1 : buildlog
+----------------+------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------+------+-----+---------+----------------+
| NAME | varchar(50) | YES | | NULL | |
| ID | int(11) | NO | PRI | NULL | auto_increment |
| DATE_AND_TIME | datetime | YES | | NULL | |
| COMMENT | mediumtext | YES | | NULL | |
+----------------+------------------------+------+-----+---------+----------------+
Number Of Rows : 276186
Table 2 : reports
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| r_id | int(10) | NO | PRI | NULL | auto_increment |
| id | int(15) | YES | UNI | NULL | |
| label | varchar(200) | YES | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
Number Of Rows : 134058
If I run just the join of these two tables on id, it returns very quickly.
Query 1:
select buildlog.id,reports.label from buildlog join reports on reports.id = buildlog.id limit 10\G
Query Time : 10 rows in set (0.01 sec)
If I add ORDER BY to get the latest ten build IDs and labels, it takes 1 to 2 minutes to execute.
Query 2 :
select buildlog.id,reports.label from buildlog join reports on reports.id = buildlog.id order by buildlog.id desc limit 10\G
Query Time : 10 rows in set (0.98 sec)
The ORDER BY column, buildlog.id, is the primary key, so it's already indexed. Why does this query take so much longer to execute? Can anyone suggest how I can optimize it?
SELECT * FROM (
SELECT
buildlog.id,
reports.label
FROM
buildlog
JOIN
reports
ON
reports.id = buildlog.id
) AS myval_new
ORDER BY id DESC limit 10
The slowdown probably comes from the optimizer choosing to do the ordering before the join. Doing the ORDER BY in an outer query forces it to order only the joined rows.