How to use SELECT DISTINCT ON with MySQL and Rails

I have quite a complex query to essentially select the cheapest delivery service price per delivery service.
In order to get unique records per delivery service, I utilise the DISTINCT function in SQL. This query provides correct results:
DeliveryServicePrice.active.select('DISTINCT ON (delivery_service_id) *').order('delivery_service_id, price ASC')
(only a part of the query)
However, this query only seems to work with PostgreSQL (which I find strange, considering PostgreSQL is much stricter about SQL standards); it does not work with MySQL and SQLite. I receive the following error:
Mysql2::Error: You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to
use near 'ON (delivery_service_id) * FROM `delivery_service_prices`
WHERE `delivery_servi' at line 1: SELECT DISTINCT ON
(delivery_service_id) * FROM `delivery_service_prices` WHERE
`delivery_service_prices`.`active` = 1 AND (2808.0 >= min_weight AND
2808.0 <= max_weight AND 104.0 >= min_length AND 104.0 <= max_length AND 104.0 >= min_thickness AND 104.0 <= max_thickness) ORDER BY delivery_service_id, price ASC
The application I'm building is open source, so it needs to support all three database types.
How do I create DISTINCT ON queries for MySQL and SQLite in the Rails framework syntax?
I'm using Rails 4.1.
Resources
My previous problem for reference:
How to select unique records based on foreign key column in Rails?
File and line number for where the query is being used.
Finished answer
DeliveryServicePrice.select('delivery_service_prices.id').active.joins('LEFT OUTER JOIN delivery_service_prices t2 ON (delivery_service_prices.delivery_service_id = t2.delivery_service_id AND delivery_service_prices.price > t2.price)').where('t2.delivery_service_id IS NULL')
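To sanity-check what the "Finished answer" join does (a self LEFT OUTER JOIN that keeps only rows with no cheaper sibling), here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. The table is a hypothetical, cut-down delivery_service_prices with only id, delivery_service_id, price and active; the weight/size filters are omitted. As written, this join returns every row tied for the lowest price.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the technique needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery_service_prices ("
    " id INTEGER PRIMARY KEY, delivery_service_id INTEGER,"
    " price REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO delivery_service_prices VALUES (?, ?, ?, ?)",
    [
        (1, 10, 5.00, 1),
        (2, 10, 3.50, 1),  # cheapest for service 10
        (3, 20, 7.25, 1),  # cheapest for service 20
        (4, 20, 9.00, 1),
    ],
)

# Anti-join: a row survives only if the join finds no cheaper row
# for the same delivery service (so t2's columns stay NULL).
rows = conn.execute(
    """
    SELECT p.id, p.delivery_service_id, p.price
    FROM delivery_service_prices p
    LEFT OUTER JOIN delivery_service_prices t2
      ON p.delivery_service_id = t2.delivery_service_id
     AND p.price > t2.price
    WHERE p.active = 1
      AND t2.delivery_service_id IS NULL
    ORDER BY p.delivery_service_id
    """
).fetchall()

print(rows)  # [(2, 10, 3.5), (3, 20, 7.25)]
```

One thing the sketch makes easy to see: only the outer row is filtered by active, so a cheaper inactive row would hide an active one. Adding AND t2.active = 1 to the ON clause would avoid that, if that is the intent.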

DISTINCT ON is a Postgres-specific extension to the standard SQL DISTINCT. Neither of them is a "function"; both are SQL keywords, even though the parentheses required after DISTINCT ON make it look like a function.
There are a couple of techniques to rewrite this in standard SQL, all of them more verbose, though. Since MySQL (before version 8.0) does not support window functions, row_number() is out.
Details and more possible query techniques:
Select first row in each GROUP BY group?
Fetch the row which has the Max value for a column
Rewritten with NOT EXISTS:
SELECT *
FROM delivery_service_prices d1
WHERE active = 1
AND 2808.0 BETWEEN min_weight AND max_weight
AND 104.0 BETWEEN min_length AND max_length
AND 104.0 BETWEEN min_thickness AND max_thickness
AND NOT EXISTS (
SELECT 1
FROM delivery_service_prices d2
WHERE active = 1
AND 2808.0 BETWEEN min_weight AND max_weight
AND 104.0 BETWEEN min_length AND max_length
AND 104.0 BETWEEN min_thickness AND max_thickness
AND d2.delivery_service_id = d1.delivery_service_id
AND (d2.price < d1.price
OR (d2.price = d1.price AND d2.<some_unique_id> < d1.<some_unique_id>)) -- tiebreaker!
)
ORDER BY delivery_service_id
If there can be multiple rows with the same price for the same delivery_service_id, you need to add some unique tiebreaker to avoid multiple results per delivery_service_id, at least if you want a perfectly equivalent query. My example selects the row with the smallest <some_unique_id> from each set of dupes.
Unlike with DISTINCT ON, ORDER BY is optional here.
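A minimal sketch of the NOT EXISTS rewrite, run through Python's sqlite3 module on a hypothetical, cut-down table (id plays the role of <some_unique_id>; the weight/size filters are omitted). The tiebreaker is written as "cheaper, OR equally cheap with a smaller id", and the sample data includes a price tie (ids 2 and 3) plus a service whose cheapest row has the larger id (ids 4 and 5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery_service_prices ("
    " id INTEGER PRIMARY KEY, delivery_service_id INTEGER,"
    " price REAL, active INTEGER)"
)
conn.executemany(
    "INSERT INTO delivery_service_prices VALUES (?, ?, ?, ?)",
    [
        (1, 10, 5.00, 1),
        (2, 10, 3.50, 1),
        (3, 10, 3.50, 1),  # ties with id 2 on price
        (4, 20, 9.00, 1),
        (5, 20, 7.25, 1),  # cheapest for service 20, but has the larger id
    ],
)

# Keep d1 unless some d2 of the same service is "better":
# strictly cheaper, or equally cheap with a smaller id (the tiebreaker).
query = """
SELECT d1.id, d1.delivery_service_id, d1.price
FROM delivery_service_prices d1
WHERE d1.active = 1
  AND NOT EXISTS (
    SELECT 1
    FROM delivery_service_prices d2
    WHERE d2.active = 1
      AND d2.delivery_service_id = d1.delivery_service_id
      AND (d2.price < d1.price
           OR (d2.price = d1.price AND d2.id < d1.id))
  )
ORDER BY d1.delivery_service_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 10, 3.5), (5, 20, 7.25)]
```

Combining the two comparisons with a plain AND instead would also keep id 4, because no row for service 20 is both cheaper and lower-numbered than it.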

DeliveryServicePrice.active.select(:delivery_service_id).distinct.order('delivery_service_id, price ASC')

Related

Rewrite the SQL for MySQL 5.7

I have the following SQL, which works in MySQL 5.6 but breaks in MySQL 5.7.x.
SELECT * FROM (SELECT * FROM photos WHERE photoable_type = 'Mobilehome'
AND photoable_id IN (SELECT id FROM mobilehomes WHERE
mobilehomes.community_id = 1) AND photos.image_file_size IS NOT NULL
AND photos.is_published IS TRUE ORDER BY photos.priority ASC) AS tmp_table
GROUP BY photoable_id
It throws the following error:
Expression #1 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'tmp_table.id' which is not functionally
dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
In this case you can either change the SQL mode so the database behaves like MySQL 5.6, or adapt your query to the new behavior.
In the second case:
If you use GROUP BY without an aggregate function, you are accepting arbitrary values for every column other than photoable_id.
This means you could probably also accept an aggregated result based on an aggregate function, e.g. MIN() or MAX().
Assuming your table contains col1, col2, ..., you must explicitly declare the columns you need:
SELECT photos.photoable_id, min(col1), min(col2),....
FROM photos
INNER JOIN mobilehomes ON mobilehomes.community_id = 1
AND photos.photoable_type = 'Mobilehome'
AND photos.photoable_id = mobilehomes.id
AND photos.image_file_size IS NOT NULL
AND photos.is_published IS TRUE
GROUP BY photos.photoable_id
ORDER BY MIN(photos.priority) ASC
Looking at your code, it also seems that you could avoid the subquery.
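A minimal sqlite3 sketch of the aggregate approach, using a hypothetical, trimmed-down photos table (id, photoable_id, priority, and title standing in for col1, col2, ...). It also shows a caveat: each MIN() is computed independently, so one result row can combine values taken from different photos.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, photoable_id INTEGER,"
    " priority INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?, ?)",
    [
        (1, 100, 2, "b-front"),
        (2, 100, 1, "z-kitchen"),  # highest priority (lowest number) for 100
        (3, 200, 5, "a-yard"),
    ],
)

# Every non-grouped column is wrapped in an aggregate, as only_full_group_by
# requires; the ORDER BY uses an aggregate for the same reason.
rows = conn.execute(
    """
    SELECT photoable_id, MIN(id), MIN(priority), MIN(title)
    FROM photos
    GROUP BY photoable_id
    ORDER BY MIN(priority) ASC
    """
).fetchall()
print(rows)  # [(100, 1, 1, 'b-front'), (200, 3, 5, 'a-yard')]
```

For photoable_id 100, MIN(id) = 1 and MIN(priority) = 1 come from different photos (photo 1 has priority 2). If whole rows are needed, such as the top-priority photo per photoable_id, a per-group pattern like the NOT EXISTS query in the DISTINCT ON question above is safer than independent MIN() calls.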

Delete duplicates for multiple columns in JOIN on same table

I am trying to delete from a table joined with itself, like this:
DELETE FROM `sp10_seo_url` AS sp1 JOIN
(
SELECT seo_url_pk, COUNT(*) AS maxc
FROM `sp10_seo_url`
GROUP BY seo_url_entity_type, seo_url_entity_id, seo_url_language_fk
HAVING maxc > 1
) AS sp2
ON sp1.seo_url_pk = sp2.seo_url_pk
However, I am getting a MySQL error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS sp1 JOIN ( SELECT seo_url_pk, COUNT(*) AS maxc FROM `sp10_s' at line 1
And I am not sure at all where the error is. The inner query runs just fine and returns the expected set of results. The "ON" keys are properly named (same since we are talking about the same table).
I guess the idea of the query is pretty clear (clean the table of different rows that have the same set of values for the three "group by" columns). Is there another way to do this?
Thanks!
You can "cheat" MySQL with a double indirection (as explained in Deleting a row based on the max value):
delete from `sp10_seo_url`
where seo_url_pk in (
select seo_url_pk from (
SELECT seo_url_pk
FROM `sp10_seo_url` sp1,
(
SELECT seo_url_entity_type, seo_url_entity_id, seo_url_language_fk
FROM `sp10_seo_url`
GROUP BY seo_url_entity_type, seo_url_entity_id, seo_url_language_fk
HAVING count(*) > 1
) sp2
where sp1.seo_url_entity_type = sp2.seo_url_entity_type
and sp1.seo_url_entity_id = sp2.seo_url_entity_id
and sp1.seo_url_language_fk = sp2.seo_url_language_fk
) t
);
http://sqlfiddle.com/#!2/899ff5/1
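To see exactly which rows the answer's statement removes, here is a sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. SQLite accepts a subquery on the table being deleted from, so the extra derived table t is not strictly needed there; it is kept only to mirror the MySQL version. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sp10_seo_url (seo_url_pk INTEGER PRIMARY KEY,"
    " seo_url_entity_type TEXT, seo_url_entity_id INTEGER,"
    " seo_url_language_fk INTEGER)"
)
conn.executemany(
    "INSERT INTO sp10_seo_url VALUES (?, ?, ?, ?)",
    [
        (1, "product", 7, 1),
        (2, "product", 7, 1),  # same three values as pk 1
        (3, "product", 7, 2),  # different language: not a duplicate
        (4, "category", 3, 1),
    ],
)

# Same shape as the answer: pks of every row whose three key columns
# appear more than once, wrapped in an extra derived table (t).
conn.execute(
    """
    DELETE FROM sp10_seo_url
    WHERE seo_url_pk IN (
      SELECT seo_url_pk FROM (
        SELECT sp1.seo_url_pk
        FROM sp10_seo_url sp1,
             (SELECT seo_url_entity_type, seo_url_entity_id, seo_url_language_fk
              FROM sp10_seo_url
              GROUP BY seo_url_entity_type, seo_url_entity_id, seo_url_language_fk
              HAVING COUNT(*) > 1) sp2
        WHERE sp1.seo_url_entity_type = sp2.seo_url_entity_type
          AND sp1.seo_url_entity_id = sp2.seo_url_entity_id
          AND sp1.seo_url_language_fk = sp2.seo_url_language_fk
      ) t
    )
    """
)
remaining = [r[0] for r in conn.execute(
    "SELECT seo_url_pk FROM sp10_seo_url ORDER BY seo_url_pk")]
print(remaining)  # [3, 4]
```

Both rows of the duplicated group are removed, not just the extra copy. If one copy should survive, the inner select needs an additional condition, for example excluding the smallest seo_url_pk of each group.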

How to set correct limit for mysql delete statement

I have a script that checks for duplicate pairs in a database and selects all entries that need to be deleted except for one.
I have this script that selects the first 100 entries that need to be deleted and works fine:
SELECT *
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
LIMIT 0,100
However when I try to delete this set of data using:
DELETE vl_posts_testing
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
LIMIT 100
I receive the following error:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'LIMIT 10' at line 8
The script has been constructed using this answer https://stackoverflow.com/a/6108860/1168944
The script actually works fine on a small amount of data without the limits set. However, because I run it on a big table (some 600k entries), I need to split it into a routine that processes only a limited amount of data at a time, due to server limits such as processor and memory.
I also took this example into consideration: MySQL LIMIT on DELETE statement. But the result is different: no modification is executed, no matter how small the limit is.
After several retries I have found a way to make it work:
DELETE vl_posts_testing
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1
LIMIT 0,100 ) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
What I actually did was put the limit on the inner (grouped) set of data and compare it to the rest of the table. It works, but I am not sure this is the correct way to do it.
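The second statement fails because MySQL's multiple-table DELETE syntax does not accept LIMIT (or ORDER BY), so moving the LIMIT into the derived table, as in the working version, is a legitimate workaround. Note that it caps the number of duplicate groups handled per pass, not the number of rows deleted, so the statement is typically repeated until it deletes nothing. Below is a minimal sketch of that loop using Python's sqlite3 module and a hypothetical, trimmed-down vl_posts_testing table. SQLite has no multiple-table DELETE, so the join is expressed as DELETE ... WHERE ID IN (...); BATCH_GROUPS is a hypothetical batch size.

```python
import sqlite3

BATCH_GROUPS = 2  # hypothetical: duplicate groups processed per pass

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vl_posts_testing (ID INTEGER PRIMARY KEY,"
    " post_title TEXT, post_content TEXT, post_type TEXT)"
)
posts = []
for n in range(5):        # 5 distinct (title, content) pairs...
    for _ in range(3):    # ...each stored 3 times
        posts.append((f"title {n}", f"body {n}", "post"))
conn.executemany(
    "INSERT INTO vl_posts_testing (post_title, post_content, post_type)"
    " VALUES (?, ?, ?)",
    posts,
)

# Delete every copy except the one with the highest ID, limited to
# BATCH_GROUPS duplicate groups per statement (LIMIT is on the grouped query).
delete_batch = """
DELETE FROM vl_posts_testing
WHERE ID IN (
  SELECT p.ID
  FROM vl_posts_testing p
  JOIN (SELECT MAX(ID) AS lastId, post_content, post_title
        FROM vl_posts_testing
        WHERE post_type = 'post'
        GROUP BY post_content, post_title
        HAVING COUNT(*) > 1
        LIMIT ?) duplic
    ON duplic.post_content = p.post_content
   AND duplic.post_title = p.post_title
  WHERE p.ID < duplic.lastId
    AND p.post_type = 'post'
)
"""

passes = 0
while True:
    deleted = conn.execute(delete_batch, (BATCH_GROUPS,)).rowcount
    conn.commit()  # commit each batch so work is kept in small transactions
    if deleted == 0:
        break
    passes += 1

remaining = conn.execute("SELECT COUNT(*) FROM vl_posts_testing").fetchone()[0]
print(passes, remaining)  # 3 5
```

With 5 duplicated groups and 2 groups per pass, the loop needs three deleting passes and leaves one row per pair.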

Common Query for mysql and sqlserver to retrieve top 1 record

I have an SQL query, which works perfectly in SQL Server, to get the last purchase rate of an item supplied by a particular supplier:
Select Top 1 Rate
From TXPurchaseDetail
Where CompanyCode = 200
And VoucherSeries = 'INPURSCR'
And SupplierCode = 1042
And ItemCode = 1521
And voucherdate <= '2011/05/25'
Order By voucherdate desc, vouchernumber desc ;
Now I'm converting the whole application so that it works with both SQL Server and MySQL. In MySQL the TOP keyword isn't supported.
How can I make it work on both database servers? Any help, please.
Regards
Vaishu
In MySQL you use LIMIT. This query would then be written as:
Select Rate
From TXPurchaseDetail
Where CompanyCode = 200
And VoucherSeries = 'INPURSCR'
And SupplierCode = 1042
And ItemCode = 1521
And voucherdate <= '2011/05/25'
Order By voucherdate desc, vouchernumber desc LIMIT 1;
Note the LIMIT 1 at the end.
So you cannot have a single query for both SQL Server and MySQL, as they use different syntax for limiting the result set to one row.
If "Top 1" just means return the first row, then "LIMIT 1" is what you're after in MySQL.
Since the two platforms use different syntax for limiting rows, I suggest you consider implementing a stored procedure on each platform. Each stored procedure can then use the appropriate syntax, and the application just calls the stored procedure; it doesn't have to care whether it's calling MySQL or SQL Server. This is one of the primary benefits of using stored procedures rather than riddling application code with ad hoc SQL.
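If the SQL stays in application code rather than in stored procedures, the platform difference can at least be kept in one place by generating the row-limiting clause per dialect. A minimal sketch in Python; the function name and the dialect strings are hypothetical, and only the two syntaxes discussed above (TOP n and LIMIT n) are handled.

```python
def last_purchase_rate_sql(dialect: str) -> str:
    """Build the 'last purchase rate' query for one dialect (hypothetical helper).

    Filter values are left as ? placeholders; the placeholder style a driver
    expects (? or %s) varies, so adjust it for the driver actually used.
    """
    where = (
        "WHERE CompanyCode = ? AND VoucherSeries = ? AND SupplierCode = ?"
        " AND ItemCode = ? AND voucherdate <= ?"
    )
    order = "ORDER BY voucherdate DESC, vouchernumber DESC"
    if dialect == "sqlserver":
        # SQL Server limits rows with TOP right after SELECT.
        return f"SELECT TOP 1 Rate FROM TXPurchaseDetail {where} {order}"
    if dialect == "mysql":
        # MySQL limits rows with LIMIT at the end of the statement.
        return f"SELECT Rate FROM TXPurchaseDetail {where} {order} LIMIT 1"
    raise ValueError(f"unsupported dialect: {dialect}")


print(last_purchase_rate_sql("mysql"))
```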

MySQL to Oracle Syntax Error (Limit / Offset / Update)

I have a MySQL query that works on my current MySQL database. I've been forced to move over to Oracle, so I'm trying to port all my stored procedures and programs to Oracle SQL syntax. I'm having a lot of trouble with one particular query. Here is the MySQL query; it updates a table using a subquery.
update table1 alf
set nextcontractid =
(
select
contractid from table1copy alf2
where
alf2.assetid = alf.AssetID
and
alf2.lasttradedate > alf.LastTradeDate
order by lasttradedate asc limit 1
)
where complete = 0
In Oracle I can't use LIMIT, so I looked for a workaround. Here is my Oracle query (which doesn't work):
update table1 alf
set nextcontractid =
(select contractid from
(
SELECT contractid, rownum as row_number
FROM table1copy alf2
WHERE alf2.assetid = alf.assetid
AND alf2.lasttradedate > alf.lasttradedate
ORDER BY lasttradedate ASC
)
where row_number = 1)
where alf.complete = 0
I get the following error:
Error at Command Line:8 Column:29
Error report:
SQL Error: ORA-00904: "ALF"."LASTTRADEDATE": invalid identifier
00904. 00000 - "%s: invalid identifier"
line 8 is:
AND alf2.lasttradedate > alf.lasttradedate
Removing the update statement and putting in some dummy values into the subquery yields the correct results for the subquery:
(select contractid from
(
SELECT contractid, rownum as row_number
FROM asset_list_futures_copy alf2
WHERE alf2.assetid = 'GOLD'
AND alf2.lasttradedate > '20110101'
ORDER BY lasttradedate ASC
)
where row_number = 1)
Looking at the error, it looks like the second reference to alf isn't working. Any idea how I can change my query so that it works in Oracle?
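The parser seems not to like that, even though it is syntactically correct. I reproduced that. Probably this is because Oracle (at least in older versions) does not let a correlated reference such as alf.lasttradedate reach more than one level of subquery nesting, and here it sits two levels down, inside the inline view.
You can avoid the inline view with the FIRST function, KEEP (DENSE_RANK FIRST ...):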
update table1 alf
set nextcontractid =
(SELECT min(contractid) keep (dense_rank first order by lasttradedate asc)
FROM table1copy alf2
WHERE alf2.assetid = alf.assetid
AND alf2.lasttradedate > alf.lasttradedate
)
where alf.complete = 0
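For readers new to KEEP: MIN(contractid) KEEP (DENSE_RANK FIRST ORDER BY lasttradedate ASC) keeps only the matching rows with the earliest lasttradedate and returns the smallest contractid among them. The sketch below expresses that same logic in standard SQL and runs it as a correlated UPDATE on SQLite through Python's sqlite3 module, with hypothetical sample contracts. It only illustrates what the expression computes; it is not a drop-in Oracle statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table1copy"):
    conn.execute(
        f"CREATE TABLE {name} (contractid TEXT, assetid TEXT,"
        " lasttradedate TEXT, nextcontractid TEXT, complete INTEGER)"
    )
sample = [  # hypothetical contracts; ISO dates compare correctly as text
    ("GCG1", "GOLD", "2011-02-24", None, 0),
    ("GCJ1", "GOLD", "2011-04-27", None, 0),
    ("GCM1", "GOLD", "2011-06-28", None, 0),
    ("SIH1", "SILVER", "2011-03-29", None, 0),
]
for name in ("table1", "table1copy"):
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", sample)

# Standard-SQL stand-in for
#   MIN(contractid) KEEP (DENSE_RANK FIRST ORDER BY lasttradedate ASC):
# restrict to the earliest later trade date, then take MIN(contractid).
conn.execute(
    """
    UPDATE table1
    SET nextcontractid = (
      SELECT MIN(alf2.contractid)
      FROM table1copy alf2
      WHERE alf2.assetid = table1.assetid
        AND alf2.lasttradedate > table1.lasttradedate
        AND alf2.lasttradedate = (
          SELECT MIN(alf3.lasttradedate)
          FROM table1copy alf3
          WHERE alf3.assetid = table1.assetid
            AND alf3.lasttradedate > table1.lasttradedate)
    )
    WHERE complete = 0
    """
)
result = conn.execute(
    "SELECT contractid, nextcontractid FROM table1 ORDER BY contractid"
).fetchall()
print(result)
# [('GCG1', 'GCJ1'), ('GCJ1', 'GCM1'), ('GCM1', None), ('SIH1', None)]
```

The last contract of each asset gets NULL, because no later contract exists for it.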
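You can use WHERE rownum = 1, or WHERE rownum <= n in cases where you want more results. (A condition such as rownum BETWEEN x AND y returns no rows unless x is 1, because rownum is only assigned to rows that pass the filter.)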