Select all but max(column) with different values for max(column) - mysql

I have a table called document_versions that looks like this:
CREATE TABLE `document_versions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`document_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`document` text,
`created_on` datetime NOT NULL,
`version` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
)
I want to select all but the max(version) of each document for a list of document_ids. I can get all entries for a list of document_ids without any problem. The issue is applying the constraint of "all but max(version)". I've been doing it like this, which obviously doesn't work properly:
SELECT *
FROM document_versions
WHERE document_id IN (SELECT document_id FROM documents WHERE account_id = ?)
AND version < (SELECT MAX(version) FROM document_versions)
Is there a way to apply the document_id constraint inside that second subselect, or am I approaching this the wrong way?

You could use:
SELECT *
FROM document_versions
WHERE (document_id, version) NOT IN (SELECT document_id, MAX(version)
                                     FROM document_versions
                                     GROUP BY document_id)
AND document_id IN (SELECT document_id
                    FROM documents
                    WHERE account_id = ?);
Note that document_versions has no account_id column, so the account filter has to go through documents, the same way it does in your original query, and the exclusion should be on (document_id, version) since version is what you are ranking by.
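If you are on MySQL 8.0 or later, a window-function sketch of the same idea (reusing the documents/account_id lookup from the question):

```sql
-- MySQL 8.0+ sketch: compute each document's max version once,
-- then keep every row strictly below it.
SELECT *
FROM (
    SELECT dv.*,
           MAX(version) OVER (PARTITION BY document_id) AS max_version
    FROM document_versions dv
    WHERE document_id IN (SELECT document_id
                          FROM documents
                          WHERE account_id = ?)
) ranked
WHERE version < max_version;
```

This scans the filtered rows once instead of running a grouped subquery, which often helps when the document list is large.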

Related

How to delete all rows in a group except the newest one?

Say, I have a table similar to this:
CREATE TABLE `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`application_id` int(11) NOT NULL,
`company_name` varchar(100) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
I want to make application_id unique, but there are already some duplicates in the table. How can I group by application_id and remove all records in each group except the one with the highest id?
MySQL won't let you delete from a table you are selecting from in the same statement (error 1093), so wrap the subquery in a derived table:
delete from mytable
where id not in
(
    select keep_id from
    (
        select max(id) as keep_id
        from mytable
        group by application_id
    ) as keep_rows
)
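Once the duplicates are deleted, you can enforce uniqueness going forward; a sketch (the index name uniq_application is an assumption, pick whatever fits your schema):

```sql
-- Hypothetical follow-up: prevent new duplicates after the cleanup.
-- The index name is arbitrary.
ALTER TABLE `mytable`
    ADD UNIQUE KEY `uniq_application` (`application_id`);
```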

MySQL 'Unrecognized Data Type' When Creating a Table

I'm creating a materialized view in MySQL to reduce server load when data is queried from a bunch of other tables (one product at a time). My simplified code is as follows:
DROP TABLE IF EXISTS `db`.`view_stock`;
CREATE TABLE IF NOT EXISTS `db`.`view_stock` (
    SELECT A.title, on_order, (stock - sales) AS 'Stock'
    FROM
        (SELECT SUM(`bought_products`.`qty`) AS 'on_order',
                `bought_products`.product_id, title
         FROM `bought_products`
         GROUP BY product_id) A,
        (SELECT SUM(num) AS `stock`, product_id
         FROM plugins__stock
         GROUP BY product_id) B,
        (SELECT SUM(`bought_products`.`qty`) AS `sales`,
                `bought_products`.`product_id`
         FROM `storage__bought_products` JOIN `plugins__orders`
         WHERE `bought_products`.`order_id` = `plugins__orders`.`id`
           AND ((`plugins__orders`.`status` = 'paid')
             OR (`plugins__orders`.`status` = 'shipped'))
         GROUP BY product_id) C
    WHERE B.product_id = A.product_id
      AND C.product_id = A.product_id
    ORDER BY on_order)
When I run the query by itself it works and returns the data as expected. However, when I try to create the table in the above context, I get the following error: Unrecognized data type. (near 'A') This error is highlighted at the beginning of the query where 'A' is first mentioned (near 'A.title').
Here's a sample result:
Title     on_order  Stock
'Widget'  6         15
'Gadget'  3         10
I've tried other ways to declare the table, but nothing seems to work. Does anyone have any ideas?
The table structure of bought_products is:
CREATE TABLE `bought_products` (
`id` bigint(20) NOT NULL,
`order_id` bigint(20) NOT NULL,
`product_id` bigint(20) NOT NULL,
`qty` int(11) NOT NULL,
`stock_count` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
)
The table structure of plugins__stock is:
CREATE TABLE `plugins__stock` (
`id` bigint(20) NOT NULL,
`product_id` bigint(20) NOT NULL,
`num` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
The table structure of plugins__orders is:
CREATE TABLE `plugins__orders` (
`id` bigint(20) NOT NULL,
`name` tinytext NOT NULL,
...
`status` enum('open','paid','shipped','deleted')
)
These are shortened to keep the post brief.
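The "Unrecognized data type" error most likely comes from the parentheses after CREATE TABLE, which make MySQL parse the SELECT as a column-definition list (so it reads the alias A as a data type). A minimal sketch of the usual CREATE TABLE ... AS SELECT form, with the query body elided:

```sql
-- Without parentheses around the SELECT, MySQL treats it as the
-- table's row source rather than a column-definition list.
DROP TABLE IF EXISTS `db`.`view_stock`;
CREATE TABLE `db`.`view_stock` AS
SELECT A.title, on_order, (stock - sales) AS `Stock`
FROM ...;  -- the derived tables A, B, C from the question go here unchanged
```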

performance issue when joining two large tables

I have a multilingual CMS that uses a translation table (70k rows) containing all of the text:
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `key` (`key`,`lang`)
) ENGINE=MyISAM
and a products table (4k rows) containing products with translation keys:
CREATE TABLE IF NOT EXISTS `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name_trans_id` int(11) NOT NULL,
`desc_trans_id` int(11) DEFAULT NULL,
`text_trans_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name_trans_id`),
KEY `desc_index` (`desc_trans_id`),
KEY `text_index` (`text_trans_id`)
) ENGINE=MyISAM
Now I need to get the top 20 products in alphabetical order. To do that I use this query:
SELECT
SQL_CALC_FOUND_ROWS
dt_table.* ,
t_name.value as 'name'
FROM
products as dt_table
LEFT JOIN
`translations` as t_name on dt_table.name_trans_id = t_name.key
WHERE
(t_name.lang = 1 OR t_name.lang is null)
ORDER BY
name ASC LIMIT 0, 20
It takes forever.
Any help optimizing this query or these tables would be appreciated.
Thank you.
Try changing the structure of your translations table to:
CREATE TABLE IF NOT EXISTS `translations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`key` int(11) NOT NULL,
`lang` int(11) NOT NULL DEFAULT 0,
`value` text CHARACTER SET utf8,
PRIMARY KEY (`id`),
KEY `lang` (`lang`),
KEY `key` (`key`,`lang`),
FULLTEXT idx (`value`)
) ENGINE=InnoDB;
because lang really needs to be indexed once you use it in a WHERE clause.
And try changing your query a little:
SELECT
    dt_table.* ,
    t_name.value as 'name',
    SUBSTR(t_name.value, 1, 100) as text_order
FROM
    products as dt_table
LEFT JOIN (
    SELECT `key`, value FROM `translations`
    WHERE lang = 1 OR lang is null
) as t_name
ON dt_table.name_trans_id = t_name.key
ORDER BY
    text_order ASC LIMIT 0, 20
(Note that SUBSTR positions are 1-based in MySQL, so SUBSTR(value, 0, 100) would return an empty string, and `key` is a reserved word that needs backticks in the subquery.)
And if you really need SQL_CALC_FOUND_ROWS (I don't understand why you need a row count here), you can run another query right after the first one:
SELECT COUNT(*) FROM products;
I am pretty sure you will be surprised by the performance :-)
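An alternative sketch that keeps the LEFT JOIN but moves the language filter into the join condition, so products without a translation still survive without the OR ... IS NULL workaround:

```sql
-- Putting lang = 1 in the ON clause (not the WHERE clause) preserves
-- LEFT JOIN semantics: products with no matching translation still
-- appear, with a NULL name, instead of being filtered out.
SELECT p.*, t.value AS name
FROM products AS p
LEFT JOIN translations AS t
       ON t.`key` = p.name_trans_id AND t.lang = 1
ORDER BY name ASC
LIMIT 0, 20;
```

With the (`key`, `lang`) composite index this lets the join use the index for both columns.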

SELECT with WHERE IN and subquery extremely slow

I want to execute the following query:
SELECT *
FROM `bm_tracking`
WHERE `oid` IN
(SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid` HAVING COUNT(*) >1)
The subquery:
SELECT `oid`
FROM `bm_tracking`
GROUP BY `oid`
HAVING COUNT( * ) >1
executes in 0.0525 secs
The whole query gets "stuck" (still processing after 3 minutes...). Column oid is indexed.
Table bm_tracking contains around 64k rows.
What could be the reason it gets stuck?
[Edit: Upon request]
CREATE TABLE `bm_tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`oid` varchar(10) NOT NULL,
`trk_main` varchar(50) NOT NULL,
`tracking` varchar(50) NOT NULL,
`label` text NOT NULL,
`void` int(11) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `oid` (`oid`),
KEY `trk_main` (`trk_main`),
KEY `tracking` (`tracking`),
KEY `created` (`created`)
) ENGINE=MyISAM AUTO_INCREMENT=63331 DEFAULT CHARSET=latin1
[Execution Plan]
Generally, EXISTS is faster than IN, so you can try this and see if it executes better for you:
SELECT *
FROM `bm_tracking` bt
WHERE EXISTS
( SELECT 1
FROM `bm_tracking` bt1
WHERE bt.oid = bt1.oid
GROUP BY `oid`
HAVING COUNT(*) >1
)
EDIT:
If you notice from the EXPLAIN you posted, the IN() is treated as a DEPENDENT SUBQUERY, which is a correlated subquery. That means that for every row in the table, all rows in the table are pulled and compared; for example, 1,000 rows in the table would mean 1,000 * 1,000 = 1 million comparisons -- that's why it's taking such a long time.
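Another common workaround on older MySQL versions is to force the subquery to run once by joining against it as a derived table (a sketch):

```sql
-- Joining a derived table materializes the duplicate-oid list once,
-- instead of re-evaluating it per outer row as a dependent subquery.
SELECT bt.*
FROM `bm_tracking` AS bt
JOIN (SELECT `oid`
      FROM `bm_tracking`
      GROUP BY `oid`
      HAVING COUNT(*) > 1) AS dup
  ON dup.oid = bt.oid;
```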

select count, group by and having optimization

I have this query
SELECT
t2.counter_id,
t2.hash_counter,
count(1) AS cnt
FROM
table1 t1
RIGHT JOIN
table2 t2 USING(counter_id)
WHERE
t2.hash_id = 973
GROUP BY
t1.counter_id
HAVING
cnt < 8000
Here are the tables.
CREATE TABLE `table1` (
`id` varchar(255) NOT NULL,
`platform` varchar(32) DEFAULT NULL,
`version` varchar(10) DEFAULT NULL,
`edition` varchar(2) NOT NULL DEFAULT 'us',
`counter_id` int(11) NOT NULL,
`created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `counter_id` (`counter_id`)
) ENGINE=InnoDB
CREATE TABLE `table2` (
`counter_id` int(11) NOT NULL AUTO_INCREMENT,
`hash_id` int(11) DEFAULT NULL,
`hash_counter` int(11) DEFAULT NULL,
PRIMARY KEY (`counter_id`),
UNIQUE KEY `counter_key` (`hash_id`,`hash_counter`)
) ENGINE=InnoDB
The "EXPLAIN" shows "Using index; Using temporary; Using filesort" for table t2. Is there any way to get rid of the temporary/filesort? Or any other ideas about optimizing this query?
Your comment above gives more insight into what you want. It is always better to explain more about what you are trying to achieve - just looking at the non-working SQL leads people down the wrong path.
So, you want to know which table2 rows have < 8000 table1 rows?
Why not this:
select *
from table2 as t2
where hash_id = 973
and (select count(*) from table1 as t1 where t1.counter_id = t2.counter_id) < 8000
;
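If the correlated count turns out to be slow, a sketch of the same result driven from table2 with an outer join (so counters with no table1 rows come out as cnt = 0 instead of being dropped):

```sql
-- Drive the aggregate from table2; COUNT(t1.counter_id) counts only
-- matched table1 rows, so unmatched counters get cnt = 0.
SELECT t2.counter_id,
       t2.hash_counter,
       COUNT(t1.counter_id) AS cnt
FROM table2 AS t2
LEFT JOIN table1 AS t1 USING (counter_id)
WHERE t2.hash_id = 973
GROUP BY t2.counter_id, t2.hash_counter
HAVING cnt < 8000;
```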