How to use GROUP BY in MySQL subquery

I'm using phpMyAdmin to submit queries. When I use GROUP BY in a subquery, the whole application just hangs, without any errors, until I restart the browser.
I have three tables: files stores information about uploaded files, file_category defines the available categories for files and file_category_r stores relations between files and categories.
I want to count how many files each category has, but some files can have multiple entries in the files table, so I need to group them by files.filename.
I tried two different approaches, both resulting in a hang:
SELECT
fc.*,
(SELECT COUNT(*) FROM file_category_r
WHERE file_category_r.category_id = fc.id
AND file_category_r.file_id IN
(SELECT f2.id FROM
(SELECT * FROM files f3 GROUP BY f3.filename) f2
WHERE f2.mandant_id = 1)
) as file_count
FROM file_category fc ORDER BY name ASC
or
SELECT
fc.*,
(SELECT COUNT(*) FROM file_category_r
WHERE file_category_r.category_id = fc.id
AND file_category_r.file_id IN
(SELECT id FROM files WHERE mandant_id = 1 GROUP BY filename)
) as file_count
FROM file_category fc ORDER BY name ASC
I don't see a problem with my queries; running the subquery alone works fine. Removing the GROUP BY also returns a result, but that result is wrong because it counts duplicate values.
Here is the table schema:
CREATE TABLE IF NOT EXISTS `files` (
`id` bigint(20) unsigned NOT NULL,
`project_id` bigint(20) unsigned DEFAULT NULL,
`customer_id` bigint(20) unsigned DEFAULT NULL,
`opportunity_id` int(11) DEFAULT NULL,
`task_id` bigint(20) unsigned DEFAULT NULL,
`calendar_event_id` bigint(20) unsigned DEFAULT NULL,
`mandant_id` tinyint(4) DEFAULT NULL,
`time` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`size` float NOT NULL,
`mime_type` varchar(100) NOT NULL,
`filename` text NOT NULL,
`file` longblob NOT NULL,
`folder_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`is_public` tinyint(1) unsigned NOT NULL DEFAULT '0',
`description` text,
`file_link` varchar(500) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=104832 ;
CREATE TABLE IF NOT EXISTS `file_category` (
`id` int(11) NOT NULL,
`name` varchar(200) NOT NULL,
`parent` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=445 ;
CREATE TABLE IF NOT EXISTS `file_category_r` (
`id` bigint(20) unsigned NOT NULL,
`file_id` bigint(20) unsigned NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=300346 ;
What am I doing wrong? The tables are quite big; is it possible that the query is simply too heavy? I'm out of ideas, please help! Thanks!

select fc.name, count(*)
from file_category fc
inner join file_category_r fcr on fc.id = fcr.category_id
group by fc.name
I'm not quite sure what you mean by "some files can have multiple entries in the files table, so I need to group them by files.filename", though.
Maybe you need something like this:
select fc.name, count(distinct f.filename)
from file_category fc
inner join file_category_r fcr on fc.id = fcr.category_id
inner join files f on fcr.file_id = f.id
group by fc.name
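To see why COUNT(DISTINCT f.filename) differs from COUNT(*) when the same file was uploaded twice, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The trimmed table definitions and the sample rows are hypothetical; only the query shape comes from the answer above.

```python
import sqlite3

def category_counts(conn):
    """Return {category name: (raw link count, distinct filename count)}."""
    rows = conn.execute("""
        SELECT fc.name,
               COUNT(*)                   AS raw_links,
               COUNT(DISTINCT f.filename) AS distinct_files
        FROM file_category fc
        INNER JOIN file_category_r fcr ON fc.id = fcr.category_id
        INNER JOIN files f ON fcr.file_id = f.id
        GROUP BY fc.name
    """).fetchall()
    return {name: (raw, distinct) for name, raw, distinct in rows}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down, hypothetical versions of the question's tables.
    CREATE TABLE files (id INTEGER PRIMARY KEY, mandant_id INTEGER, filename TEXT);
    CREATE TABLE file_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE file_category_r (id INTEGER PRIMARY KEY, file_id INTEGER, category_id INTEGER);

    -- Hypothetical sample data: report.pdf was uploaded twice (ids 1 and 2).
    INSERT INTO files VALUES (1, 1, 'report.pdf'), (2, 1, 'report.pdf'), (3, 1, 'logo.png');
    INSERT INTO file_category VALUES (10, 'Documents'), (20, 'Images');
    INSERT INTO file_category_r VALUES (100, 1, 10), (101, 2, 10), (102, 3, 20);
""")

counts = category_counts(conn)
# Documents has 2 category links but only 1 distinct filename.
print(counts)
```

COUNT(DISTINCT ...) deduplicates inside each group, so no GROUP BY filename subquery is needed at all.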

Using IN with a subquery can often result in inefficient query plans. You can try EXISTS instead:
SELECT fc.*,
(SELECT COUNT(*)
FROM file_category_r fcr
WHERE fcr.category_id = fc.id AND
exists (select 1 from files f where f.mandant_id = 1 and fcr.file_id = f.id)
) as file_count
FROM file_category fc
ORDER BY name ASC;
Now, you should add indexes. Start with file_category_r(category_id, file_id) and files(id, mandant_id).
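A quick way to convince yourself that the EXISTS rewrite returns the same counts as the IN version is to run both side by side. The sketch below uses SQLite (via Python's sqlite3) rather than MySQL, so it checks correctness only and says nothing about MySQL's query plans; the table contents and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, mandant_id INTEGER, filename TEXT);
    CREATE TABLE file_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE file_category_r (id INTEGER PRIMARY KEY, file_id INTEGER, category_id INTEGER);

    -- The two composite indexes suggested above (index names are hypothetical).
    CREATE INDEX idx_fcr_cat_file ON file_category_r (category_id, file_id);
    CREATE INDEX idx_files_id_mandant ON files (id, mandant_id);

    -- Hypothetical sample data: file 3 belongs to another mandant.
    INSERT INTO files VALUES (1, 1, 'a.pdf'), (2, 1, 'b.pdf'), (3, 2, 'c.pdf');
    INSERT INTO file_category VALUES (10, 'Documents'), (20, 'Images');
    INSERT INTO file_category_r VALUES (100, 1, 10), (101, 2, 10), (102, 3, 10), (103, 3, 20);
""")

IN_QUERY = """
    SELECT fc.name,
           (SELECT COUNT(*) FROM file_category_r fcr
            WHERE fcr.category_id = fc.id
              AND fcr.file_id IN (SELECT f.id FROM files f WHERE f.mandant_id = 1)
           ) AS file_count
    FROM file_category fc ORDER BY fc.name
"""

EXISTS_QUERY = """
    SELECT fc.name,
           (SELECT COUNT(*) FROM file_category_r fcr
            WHERE fcr.category_id = fc.id
              AND EXISTS (SELECT 1 FROM files f
                          WHERE f.mandant_id = 1 AND fcr.file_id = f.id)
           ) AS file_count
    FROM file_category fc ORDER BY fc.name
"""

in_rows = conn.execute(IN_QUERY).fetchall()
exists_rows = conn.execute(EXISTS_QUERY).fetchall()
print(exists_rows)  # [('Documents', 2), ('Images', 0)]
```

Note that, like the answer's query, this counts category links per file id; it does not deduplicate by filename.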

I use HeidiSQL, not phpMyAdmin, and your query works fine here. Maybe phpMyAdmin has problems parsing your query.
Edit: also, there is a limit on query length. If your IN statement is too long, MySQL will return an error, which phpMyAdmin should display.
But if phpMyAdmin hangs, I'd try running your query with the mysql command-line client or another MySQL client such as HeidiSQL.

Related

MySQL LEFT JOIN on 5 tables?

Thanks in advance for any help. I am working with five tables in a MySQL database. The system has a top-level table called "owners" (clients) who own local businesses (shops). These owners create accounts at websites like Yelp (citation_sources) and therefore have login credentials (citation_logins). Once they have an account at a citation source, they add their shops to that directory.
I am hoping to create one query that selects ALL of the citation sources, regardless of whether an owner has an account there, and then loop through the recordset, showing the login for each citation source the owner has an account with, as well as any shop listings.
My question is about doing a LEFT JOIN across five tables. I left out most of the fields, but I have set up primary and foreign keys. Is the order of the joins important, i.e. should I start with one particular table and end with another?
I tried the query shown further below, but it only returns 33 rows, when in fact there are 96 citation sources.
Update: I think I figured it out. I created a new table called "citation_shop" with a composite primary key (citation, shop), then ran the query below, which got me the results I was after. The key was putting a condition in the first LEFT JOIN.
SELECT citation_sources.name, citation_shop.shop
from citation_sources
left join citation_shop on citation_sources.id = citation_shop.citation and citation_shop.shop in (6,7)
left join shops on citation_shop.shop = shops.id
group by citation_sources.name, citation_shop.shop
limit 100
CREATE TABLE `citation_shop` (
`shop` smallint(5) UNSIGNED NOT NULL,
`citation` smallint(6) UNSIGNED NOT NULL,
`url` text NOT NULL,
`count` smallint(3) UNSIGNED NOT NULL,
`status` tinyint(1) UNSIGNED NOT NULL,
`sort` tinyint(3) UNSIGNED NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
--
-- Indexes for dumped tables
--
--
-- Indexes for table `citation_shop`
--
ALTER TABLE `citation_shop`
ADD PRIMARY KEY (`citation`,`shop`);
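The reason the condition has to live in the ON clause can be shown with a small experiment. This is a minimal sketch in SQLite (through Python's sqlite3) with hypothetical rows: a filter in ON keeps every citation source and just leaves the shop NULL, while the same filter in WHERE discards the NULL rows and quietly turns the LEFT JOIN into an inner join, which is the usual cause of "33 rows instead of 96".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citation_sources (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE citation_shop (shop INTEGER, citation INTEGER, PRIMARY KEY (citation, shop));

    -- Hypothetical data: shop 6 is listed on Yelp, shop 8 (not 6 or 7) on Bing.
    INSERT INTO citation_sources VALUES (1, 'Yelp'), (2, 'Bing'), (3, 'Foursquare');
    INSERT INTO citation_shop VALUES (6, 1), (8, 2);
""")

# Filter in ON: every source survives; non-matching shops come back as NULL.
on_rows = conn.execute("""
    SELECT cs.name, csh.shop
    FROM citation_sources cs
    LEFT JOIN citation_shop csh
           ON cs.id = csh.citation AND csh.shop IN (6, 7)
    ORDER BY cs.id
""").fetchall()

# Filter in WHERE: NULL shops fail the test, so unmatched sources disappear.
where_rows = conn.execute("""
    SELECT cs.name, csh.shop
    FROM citation_sources cs
    LEFT JOIN citation_shop csh ON cs.id = csh.citation
    WHERE csh.shop IN (6, 7)
    ORDER BY cs.id
""").fetchall()

print(on_rows)     # [('Yelp', 6), ('Bing', None), ('Foursquare', None)]
print(where_rows)  # [('Yelp', 6)]
```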
select owners.id as owner_id, shops.id as shop_id, citation_sources.name, citation_shop_urls.url, citation_logins.password
from owners
inner join shops on owners.id = shops.owner_id
left join citation_logins on owners.id = citation_logins.owner
left join citation_sources on citation_logins.c_source = citation_sources.id
left join citation_shop_urls on citation_sources.id = citation_shop_urls.citation_id
where owners.id = 3
group by citation_sources.name
Here are my tables, in what I think is order of relevance:
CREATE TABLE `owners` (
`id` smallint(6) UNSIGNED NOT NULL
-- other columns omitted
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `shops` (
`id` smallint(5) UNSIGNED NOT NULL,
`title` varchar(50) DEFAULT '',
`owner_id` smallint(5) UNSIGNED NOT NULL
-- other columns omitted
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `citation_sources` (
`id` smallint(6) UNSIGNED NOT NULL
-- other columns omitted
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `citation_shop_urls` (
`shop` smallint(5) UNSIGNED NOT NULL DEFAULT '0',
`citation_id` tinyint(5) UNSIGNED NOT NULL DEFAULT '0',
`owner` smallint(6) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `citation_logins` (
`c_source` smallint(5) UNSIGNED NOT NULL DEFAULT '0',
`owner` smallint(6) NOT NULL,
`user_name` text NOT NULL,
`password` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
In a LEFT JOIN, the first table is the one where you get all the rows, even if they don't have a match in other tables. So if you want all citation_sources, even those not associated with any owner, then citation_sources should be the table on the left of the LEFT JOIN.
To filter the owner information only to id = 3, put o.id = 3 in the ON clause that joins with owners. Then use a WHERE clause to remove all the other rows.
SELECT o.id as owner_id, s.id as shop_id, cs.name, u.url, cl.password
FROM citation_sources AS cs
LEFT JOIN citation_shop_urls AS u ON u.citation_id = cs.id
LEFT JOIN citation_logins AS cl ON cs.id = cl.c_source
LEFT JOIN owners AS o ON o.id = cl.owner AND o.id = 3
LEFT JOIN shops AS s ON s.owner_id = o.id
WHERE o.id IS NULL OR o.id = 3
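One detail worth checking: once o.id = 3 sits in the ON clause, o.id can only come out as 3 or NULL, so the WHERE condition keeps every row, and logins that belong to other owners still appear, just with a NULL owner_id. The sketch below runs the answer's query in SQLite with hypothetical data (the url column is assumed, since the posted schema omits it), then a variant that, assuming the goal is to show only owner 3's logins, moves the owner filter onto citation_logins instead. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER PRIMARY KEY);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT, owner_id INTEGER);
    CREATE TABLE citation_sources (id INTEGER PRIMARY KEY, name TEXT);
    -- url is assumed; the posted schema omits it but the queries select u.url.
    CREATE TABLE citation_shop_urls (shop INTEGER, citation_id INTEGER, owner INTEGER, url TEXT);
    CREATE TABLE citation_logins (c_source INTEGER, owner INTEGER, user_name TEXT, password TEXT);

    -- Hypothetical data: owner 3 has a Yelp login, owner 5 has a Bing login.
    INSERT INTO owners VALUES (3), (5);
    INSERT INTO shops VALUES (1, 'Main St shop', 3);
    INSERT INTO citation_sources VALUES (1, 'Yelp'), (2, 'Bing'), (3, 'Foursquare');
    INSERT INTO citation_logins VALUES (1, 3, 'owner3', 'pw3'), (2, 5, 'owner5', 'pw5');
""")

ANSWER_QUERY = """
    SELECT o.id AS owner_id, s.id AS shop_id, cs.name, u.url, cl.password
    FROM citation_sources AS cs
    LEFT JOIN citation_shop_urls AS u ON u.citation_id = cs.id
    LEFT JOIN citation_logins AS cl ON cs.id = cl.c_source
    LEFT JOIN owners AS o ON o.id = cl.owner AND o.id = 3
    LEFT JOIN shops AS s ON s.owner_id = o.id
    WHERE o.id IS NULL OR o.id = 3
    ORDER BY cs.id
"""

# Variant (assumed intent): restrict the logins themselves to owner 3.
VARIANT_QUERY = """
    SELECT o.id AS owner_id, s.id AS shop_id, cs.name, u.url, cl.password
    FROM citation_sources AS cs
    LEFT JOIN citation_shop_urls AS u ON u.citation_id = cs.id
    LEFT JOIN citation_logins AS cl ON cs.id = cl.c_source AND cl.owner = 3
    LEFT JOIN owners AS o ON o.id = cl.owner
    LEFT JOIN shops AS s ON s.owner_id = o.id
    ORDER BY cs.id
"""

answer_rows = conn.execute(ANSWER_QUERY).fetchall()
variant_rows = conn.execute(VARIANT_QUERY).fetchall()
print(answer_rows)
# [(3, 1, 'Yelp', None, 'pw3'), (None, None, 'Bing', None, 'pw5'), (None, None, 'Foursquare', None, None)]
print(variant_rows)
# [(3, 1, 'Yelp', None, 'pw3'), (None, None, 'Bing', None, None), (None, None, 'Foursquare', None, None)]
```

Both versions keep all citation sources on the left, as the answer recommends; the variant only changes which table carries the owner filter.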

MySQL returning all matches from a table and indicating if an id is on another table

How can I return, in a SELECT, a field that indicates whether an id was found in another table?
My goal is to return all songs (song) from a specific source (source), checking whether a user (user) has each song or not (user_song).
The query I wrote almost works. If I remove 'hasSong' (which is meant to indicate whether a user has a song), I can see all songs.
If I keep 'hasSong', each song is repeated once per user.
QUERY:
SELECT DISTINCT(song.id) AS id_song, CONCAT(song.article, ' ', song.name) AS name
FROM `song`
LEFT JOIN `user_song` ON `song`.`id` = `user_song`.`id_song`
LEFT JOIN `user` ON `user`.`id` = `user_song`.`id_user`
JOIN `song_source` ON `song`.`id` = `song_source`.`id_song`
WHERE `song_source`.`id_source` = '1'
AND ( `user_song`.`id_user` = '3' OR song.id = song_source.id_song )
ORDER BY `song`.`name` ASC
DB:
CREATE TABLE `song` (
`id` int(11) NOT NULL,
`article` varchar(10) NOT NULL,
`name` varchar(150) NOT NULL,
`shortname` varchar(150) NOT NULL,
`year` int(11) NOT NULL,
`artist` int(11) NOT NULL,
`duration` int(11) NOT NULL,
`genre` int(11) NOT NULL,
`updated` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `song_source` (
`id_song` int(11) NOT NULL,
`id_source` int(11) NOT NULL
);
CREATE TABLE `source` (
`id` int(11) NOT NULL,
`article` varchar(10) NOT NULL,
`name` varchar(150) NOT NULL,
`updated` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` int(11) NOT NULL,
`email` varchar(100) NOT NULL,
`password` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `user_song` (
`id_user` int(11) NOT NULL,
`id_song` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The specification isn't entirely clear, ...
To return all songs (with no repeated values of song.id) from a particular source (id_source = '1'), along with an indicator (a value of 0 or 1) that tells us whether there's a row in user_song that matches on id_song and belongs to a particular user (id_user = '3'), we could use something like this:
SELECT s.id AS id_song
, MAX( CONCAT(s.article,' ',s.name) ) AS name
, MAX( IF(us.id_user = '3' ,1,0) ) AS has_song
FROM `song` s
JOIN `song_source` ss
ON ss.id_song = s.id
AND ss.id_source = '1'
LEFT
JOIN `user_song` us
ON us.id_song = s.id
AND us.id_user = '3'
GROUP BY s.id
ORDER BY MAX(s.name)
There are a couple of other query patterns that will return an equivalent result. For example, we could use a correlated subquery in the SELECT list.
SELECT s.id AS id_song
, MAX( CONCAT(s.article,' ',s.name) ) AS name
, ( SELECT IF( COUNT(us.id_user) >0,1,0)
FROM `user_song` us
WHERE us.id_song = s.id
AND us.id_user = '3'
) AS has_song
FROM `song` s
JOIN `song_source` ss
ON ss.id_song = s.id
AND ss.id_source = '1'
GROUP BY s.id
ORDER BY MAX(s.name)
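The two patterns can be checked against each other on a few rows. This is a minimal sketch in SQLite via Python's sqlite3, with hypothetical data; since SQLite has no IF() or CONCAT(), it substitutes CASE WHEN and the || operator, which otherwise keeps the queries above unchanged. User 4 also owns songs, which is exactly the case that produced duplicates in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (id INTEGER, article TEXT, name TEXT);
    CREATE TABLE song_source (id_song INTEGER, id_source INTEGER);
    CREATE TABLE user_song (id_user INTEGER, id_song INTEGER);

    -- Hypothetical data: songs 1-3 are in source 1, song 4 only in source 2.
    -- User 3 owns song 2; user 4 owns songs 1 and 2.
    INSERT INTO song VALUES (1, 'The', 'Anthem'), (2, 'A', 'Ballad'),
                            (3, 'The', 'Chorus'), (4, 'A', 'Dirge');
    INSERT INTO song_source VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO user_song VALUES (3, 2), (4, 1), (4, 2);
""")

# Pattern 1: LEFT JOIN with the user filter in the ON clause, collapsed by GROUP BY.
LEFT_JOIN_QUERY = """
    SELECT s.id AS id_song,
           MAX(s.article || ' ' || s.name) AS name,
           MAX(CASE WHEN us.id_user = 3 THEN 1 ELSE 0 END) AS has_song
    FROM song s
    JOIN song_source ss ON ss.id_song = s.id AND ss.id_source = 1
    LEFT JOIN user_song us ON us.id_song = s.id AND us.id_user = 3
    GROUP BY s.id
    ORDER BY MAX(s.name)
"""

# Pattern 2: correlated subquery in the SELECT list.
SUBQUERY_QUERY = """
    SELECT s.id AS id_song,
           MAX(s.article || ' ' || s.name) AS name,
           (SELECT CASE WHEN COUNT(us.id_user) > 0 THEN 1 ELSE 0 END
            FROM user_song us
            WHERE us.id_song = s.id AND us.id_user = 3) AS has_song
    FROM song s
    JOIN song_source ss ON ss.id_song = s.id AND ss.id_source = 1
    GROUP BY s.id
    ORDER BY MAX(s.name)
"""

rows_join = conn.execute(LEFT_JOIN_QUERY).fetchall()
rows_sub = conn.execute(SUBQUERY_QUERY).fetchall()
print(rows_join)  # [(1, 'The Anthem', 0), (2, 'A Ballad', 1), (3, 'The Chorus', 0)]
```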
These queries are complicated by the fact that there are no guarantees of uniqueness in any of the tables. If we had guarantees, we could eliminate the need for a GROUP BY and aggregate functions.
Please consider adding PRIMARY and/or UNIQUE KEY constraints on the tables, to prevent duplication. The way the tables are defined, we could add multiple rows to song with the same id value. (And those could have different name values.)
(And the queries would be much simpler if we had some guarantees of uniqueness.)
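A minimal sketch of what those guarantees buy, again in SQLite via Python's sqlite3 with hypothetical data: PRIMARY KEY constraints on song, song_source and user_song (the composite keys are an assumption about the intended model) reject duplicate rows, and because each join can then match at most one row per song, the GROUP BY and MAX() wrappers can be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical constraints: one row per song id, one row per (user, song) pair.
    CREATE TABLE song (id INTEGER PRIMARY KEY, article TEXT, name TEXT);
    CREATE TABLE song_source (id_song INTEGER, id_source INTEGER, PRIMARY KEY (id_song, id_source));
    CREATE TABLE user_song (id_user INTEGER, id_song INTEGER, PRIMARY KEY (id_user, id_song));

    INSERT INTO song VALUES (1, 'The', 'Anthem'), (2, 'A', 'Ballad');
    INSERT INTO song_source VALUES (1, 1), (2, 1);
    INSERT INTO user_song VALUES (3, 2);
""")

# The constraint rejects a second row with the same song id.
try:
    conn.execute("INSERT INTO song VALUES (1, 'The', 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With uniqueness guaranteed, no GROUP BY or aggregate is needed.
rows = conn.execute("""
    SELECT s.id AS id_song,
           s.article || ' ' || s.name AS name,
           CASE WHEN us.id_user IS NULL THEN 0 ELSE 1 END AS has_song
    FROM song s
    JOIN song_source ss ON ss.id_song = s.id AND ss.id_source = 1
    LEFT JOIN user_song us ON us.id_song = s.id AND us.id_user = 3
    ORDER BY s.name
""").fetchall()
print(duplicate_rejected, rows)  # True [(1, 'The Anthem', 0), (2, 'A Ballad', 1)]
```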

MySQL 'Unrecognized Data Type' When Creating a Table

I'm creating a materialized view in MySQL to reduce server load when data is queried from a bunch of other tables (one product at a time). My simplified code is as follows:
DROP TABLE IF EXISTS `db`.`view_stock`;
CREATE TABLE IF NOT EXISTS `db`.`view_stock` (
SELECT A.title, on_order,(stock-sales) AS 'Stock' FROM
(SELECT SUM(`bought_products`.`qty`) AS 'on_order'
,`bought_products`.product_id, title FROM
`bought_products` GROUP BY product_id)
A,
(SELECT SUM(num) AS `stock`,product_id FROM plugins__stock GROUP BY
product_id)
B,
(SELECT SUM(`bought_products`.`qty`) AS `sales`
,`bought_products`.`product_id` FROM `storage__bought_products` JOIN
`plugins__orders` WHERE `bought_products`.`order_id` =
`plugins__orders`.`id` AND
((`plugins__orders`.`status` = 'paid') OR
(`plugins__orders`.`status` = 'shipped'))
GROUP BY product_id)
C
WHERE B.product_id = A.product_id AND C.product_id = A.product_id ORDER BY on_order)
When I run the query by itself, it works and returns the data as expected. However, when I try to create the table in the above context, I get the following error: "Unrecognized data type. (near 'A')". The error is highlighted at the start of the query, where A is first mentioned (near A.title).
Here's a sample result:
Title     on_order  Stock
'Widget'  6         15
'Gadget'  3         10
I've tried using other ways to declare the table, but it seems like nothing is working. Does anyone have any ideas?
The table structure of bought_products is:
CREATE TABLE `bought_products` (
`id` bigint(20) NOT NULL,
`order_id` bigint(20) NOT NULL,
`product_id` bigint(20) NOT NULL,
`qty` int(11) NOT NULL,
`stock_count` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
)
The table structure of plugins__stock is:
CREATE TABLE `plugins__stock` (
`id` bigint(20) NOT NULL,
`product_id` bigint(20) NOT NULL,
`num` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
The table structure of plugins__orders is:
CREATE TABLE `plugins__orders` (
`id` bigint(20) NOT NULL,
`name` tinytext NOT NULL,
...
`status` enum('open','paid','shipped','deleted')
)
These are obviously shortened to keep the post length short.
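For reference, the snapshot idea itself can be written as CREATE TABLE ... AS SELECT with no parentheses around the SELECT and with explicit JOIN ... ON clauses; MySQL accepts this form too. Whether the parenthesized form is what trips phpMyAdmin's parser is not confirmed here. This minimal sketch runs in SQLite via Python's sqlite3; the trimmed tables, the assumed title column on bought_products, and the rows are all hypothetical, chosen so the result matches the sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down tables; `title` on bought_products is assumed
    -- because the question's query selects it.
    CREATE TABLE bought_products (id INTEGER, order_id INTEGER, product_id INTEGER,
                                  qty INTEGER, title TEXT);
    CREATE TABLE plugins__stock (id INTEGER, product_id INTEGER, num INTEGER);
    CREATE TABLE plugins__orders (id INTEGER, status TEXT);

    INSERT INTO plugins__orders VALUES (1, 'paid'), (2, 'open'), (3, 'shipped');
    INSERT INTO bought_products VALUES
        (1, 1, 100, 2, 'Widget'), (2, 2, 100, 4, 'Widget'), (3, 3, 200, 3, 'Gadget');
    INSERT INTO plugins__stock VALUES (1, 100, 17), (2, 200, 13);

    -- Snapshot table built with CREATE TABLE ... AS SELECT (no parentheses).
    DROP TABLE IF EXISTS view_stock;
    CREATE TABLE view_stock AS
    SELECT a.title, a.on_order, b.stock - c.sales AS stock
    FROM (SELECT product_id, MAX(title) AS title, SUM(qty) AS on_order
          FROM bought_products GROUP BY product_id) AS a
    JOIN (SELECT product_id, SUM(num) AS stock
          FROM plugins__stock GROUP BY product_id) AS b
      ON b.product_id = a.product_id
    JOIN (SELECT bp.product_id, SUM(bp.qty) AS sales
          FROM bought_products bp
          JOIN plugins__orders o ON bp.order_id = o.id
          WHERE o.status IN ('paid', 'shipped')
          GROUP BY bp.product_id) AS c
      ON c.product_id = a.product_id;
""")

rows = conn.execute(
    "SELECT title, on_order, stock FROM view_stock ORDER BY on_order DESC"
).fetchall()
print(rows)  # [('Widget', 6, 15), ('Gadget', 3, 10)]
```

A table built this way is a point-in-time copy, so it has to be rebuilt (or refreshed) whenever the source tables change.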

Selecting a record from millions of records is slow

I have a standalone table; we insert its data through a weekly job and retrieve the data in our search module.
The table has around 4 million records (and will get bigger). When I execute a straightforward SELECT query, it takes a long time (around 15 seconds). I am using MySQL.
Here is my table structure:
CREATE TABLE `myTable` (
`myTableId` int(11) NOT NULL AUTO_INCREMENT,
`date` varchar(255) DEFAULT NULL,
`startTime` int(11) DEFAULT NULL,
`endTime` int(11) DEFAULT NULL,
`price` decimal(19,4) DEFAULT NULL,
`total` decimal(19,4) DEFAULT NULL,
`taxes` decimal(19,4) DEFAULT NULL,
`persons` int(11) NOT NULL DEFAULT '0',
`length` int(11) DEFAULT NULL,
`totalPerPerson` decimal(19,4) DEFAULT NULL,
`dayId` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`myTableId`)
);
When I run the following statement, it takes around 15 seconds to return results.
How can I optimize it to be faster?
SELECT
tt.testTableId,
(SELECT
totalPerPerson
FROM
myTable mt
WHERE
mt.venueId = tt.venueId
ORDER BY totalPerPerson ASC
LIMIT 1) AS minValue
FROM
testTable tt
WHERE
status is NULL;
Please note that testTable has only around 15 records.
This is the query:
SELECT tt.testTableId,
(SELECT mt.totalPerPerson
FROM myTable mt
WHERE mt.venueId = tt.venueId
ORDER BY mt.totalPerPerson ASC
LIMIT 1
) as minValue
FROM testTable tt
WHERE status is NULL;
For the subquery, you want an index on mytable(venueId, totalPerPerson). For the outer query, an index is unnecessary. However, if the table were larger, you would want an index on testTable(status, venueId, testTableId).
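The composite index works because the subquery filters on venueId by equality and then sorts by totalPerPerson, so the engine can jump to the first index entry for that venue and stop. The sketch below shows this in SQLite via Python's sqlite3, not MySQL; the index name and rows are hypothetical, the tables keep only the columns the query uses, and the exact EXPLAIN QUERY PLAN wording varies between SQLite versions, so only the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the columns the query touches; venueId and status are taken from the query.
    CREATE TABLE myTable (myTableId INTEGER PRIMARY KEY, venueId INTEGER, totalPerPerson REAL);
    CREATE TABLE testTable (testTableId INTEGER PRIMARY KEY, venueId INTEGER, status TEXT);

    -- Composite index: equality column first, then the sort column.
    CREATE INDEX idx_venue_total ON myTable (venueId, totalPerPerson);

    INSERT INTO myTable (venueId, totalPerPerson) VALUES (1, 30.0), (1, 12.5), (2, 40.0), (2, 22.0);
    INSERT INTO testTable VALUES (1, 1, NULL), (2, 2, NULL), (3, 2, 'closed');
""")

QUERY = """
    SELECT tt.testTableId,
           (SELECT mt.totalPerPerson FROM myTable mt
            WHERE mt.venueId = tt.venueId
            ORDER BY mt.totalPerPerson ASC
            LIMIT 1) AS minValue
    FROM testTable tt
    WHERE tt.status IS NULL
    ORDER BY tt.testTableId
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 12.5), (2, 22.0)]

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(plan)
```

In MySQL, the equivalent check is running EXPLAIN on the query and looking for the new index in the key column of the subquery's row.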
Using MIN and GROUP BY may be faster.
SELECT tt.testTableId, MIN(totalPerPerson)
FROM testTable tt
INNER JOIN mytable mt ON tt.venueId = mt.venueId
WHERE tt.status is NULL
GROUP BY tt.testTableId
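One behavioral difference to keep in mind: the INNER JOIN version drops testTable rows whose venue has no rows in myTable, whereas the correlated subquery returns them with a NULL minValue. A LEFT JOIN keeps them. This minimal sketch shows all three in SQLite via Python's sqlite3, with hypothetical data in which venue 3 has no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (myTableId INTEGER PRIMARY KEY, venueId INTEGER, totalPerPerson REAL);
    CREATE TABLE testTable (testTableId INTEGER PRIMARY KEY, venueId INTEGER, status TEXT);
    CREATE INDEX idx_venue_total ON myTable (venueId, totalPerPerson);

    -- Hypothetical data: venue 3 has no rows in myTable.
    INSERT INTO myTable (venueId, totalPerPerson) VALUES (1, 30.0), (1, 12.5), (2, 40.0), (2, 22.0);
    INSERT INTO testTable VALUES (1, 1, NULL), (2, 2, NULL), (3, 3, NULL);
""")

subquery_rows = conn.execute("""
    SELECT tt.testTableId,
           (SELECT mt.totalPerPerson FROM myTable mt
            WHERE mt.venueId = tt.venueId
            ORDER BY mt.totalPerPerson LIMIT 1) AS minValue
    FROM testTable tt WHERE tt.status IS NULL ORDER BY tt.testTableId
""").fetchall()

def min_per_test_row(join_kind):
    # join_kind is either "INNER" or "LEFT"; it is interpolated only from these literals.
    return conn.execute(f"""
        SELECT tt.testTableId, MIN(mt.totalPerPerson) AS minValue
        FROM testTable tt
        {join_kind} JOIN myTable mt ON tt.venueId = mt.venueId
        WHERE tt.status IS NULL
        GROUP BY tt.testTableId ORDER BY tt.testTableId
    """).fetchall()

inner_rows = min_per_test_row("INNER")
left_rows = min_per_test_row("LEFT")
print(subquery_rows)  # [(1, 12.5), (2, 22.0), (3, None)]
print(inner_rows)     # [(1, 12.5), (2, 22.0)]
print(left_rows)      # [(1, 12.5), (2, 22.0), (3, None)]
```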

Improving the MySQL Query

I have the following query, which filters the rows with replyAutoId = 0 and then fetches the most recent record for each propertyId. Currently the query takes 0.23225 s to fetch just 5,435 of 21,369 rows, and I want to improve this. Is there a better way of writing this query? Any suggestions?
SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN propertyComment as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
The output of SHOW CREATE TABLE propertyComment:
CREATE TABLE `propertyComment` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`propertyId` int(11) NOT NULL,
`agentId` int(11) NOT NULL,
`comment` longtext COLLATE utf8_unicode_ci NOT NULL,
`replyAutoId` int(11) NOT NULL,
`updatedDate` datetime NOT NULL,
`contactDate` date NOT NULL,
`status` enum('Y','N') COLLATE utf8_unicode_ci NOT NULL DEFAULT 'N',
`clientStatusId` int(11) NOT NULL,
`adminsId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `propertyId` (`propertyId`),
KEY `agentId` (`agentId`),
KEY `status` (`status`),
KEY `adminsId` (`adminsId`),
KEY `replyAutoId` (`replyAutoId`)
) ENGINE=MyISAM AUTO_INCREMENT=21404 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Try to get rid of the nested query.
The following query should give the same result as your original query:
SELECT pc1.*
FROM propertyComment AS pc1
LEFT JOIN propertyComment AS pc2
ON pc1.propertyID = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc1.replyAutoId = 0 AND pc2.propertyID IS NULL
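Because pc1 is the preserved (left) side of the LEFT JOIN, moving replyAutoId = 0 from the derived table into the outer WHERE does not change which rows come back. This minimal sketch checks that in SQLite via Python's sqlite3, using a trimmed propertyComment table and hypothetical rows. Note that in both versions property 20 disappears entirely, because its newest row is a reply (replyAutoId = 5), and the anti-join compares against all rows, not only replyAutoId = 0 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed-down propertyComment with only the columns the queries use.
    CREATE TABLE propertyComment (id INTEGER PRIMARY KEY, propertyId INTEGER,
                                  replyAutoId INTEGER, updatedDate TEXT);
    -- Hypothetical data: for property 20, a reply is the newest row.
    INSERT INTO propertyComment VALUES
        (1, 10, 0, '2020-01-01 10:00:00'),
        (2, 10, 0, '2020-01-02 10:00:00'),
        (3, 20, 0, '2020-01-01 09:00:00'),
        (4, 20, 5, '2020-01-03 09:00:00'),
        (5, 30, 0, '2020-01-05 08:00:00');
""")

ORIGINAL = """
    SELECT pc1.id FROM (SELECT * FROM propertyComment WHERE replyAutoId = 0) AS pc1
    LEFT JOIN propertyComment AS pc2
           ON pc1.propertyId = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
    WHERE pc2.propertyId IS NULL
    ORDER BY pc1.id
"""

REWRITTEN = """
    SELECT pc1.id FROM propertyComment AS pc1
    LEFT JOIN propertyComment AS pc2
           ON pc1.propertyId = pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
    WHERE pc1.replyAutoId = 0 AND pc2.propertyId IS NULL
    ORDER BY pc1.id
"""

original_ids = [row[0] for row in conn.execute(ORIGINAL)]
rewritten_ids = [row[0] for row in conn.execute(REWRITTEN)]
print(original_ids, rewritten_ids)  # [2, 5] [2, 5]
```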

SELECT pc1.* FROM (SELECT * FROM propertyComment WHERE replyAutoId=0) as pc1
LEFT JOIN (SELECT propertyID, updatedDate from propertyComment order by 1,2) as pc2
ON pc1.propertyId= pc2.propertyId AND pc1.updatedDate < pc2.updatedDate
WHERE pc2.propertyId IS NULL
You also don't have any indexes?
If you only have an index on the primary key, it won't help here, because you're not joining on it.
Why not select only the columns you're interested in from table B (pc2)? This limits the number of columns you read from it. Since you're pulling everything from table A (pc1) where replyAutoId = 0, it wouldn't make much sense to limit the columns there. This should speed it up a little.