Which of the following query styles is better for performance?
Basically, I'm returning many related records in one row with GROUP_CONCAT, and I need to filter on a GROUP_CONCAT value that comes from another join. I will need to add many more joins, GROUP_CONCATs and HAVING clauses (or subqueries) to filter by more related values. I have read that LEFT JOIN is generally the faster option, but I wonder whether GROUP_CONCAT and HAVING throw that off.
(This is a very simplified example. The actual data has many more attributes and is read from a Drupal MySQL schema.)
Thanks!
Main Records
+----+-----------------+----------------+-----------+-----------+
| id | other_record_id | value | type | attribute |
+----+-----------------+----------------+-----------+-----------+
| 1 | 0 | Red Building | building | |
| 2 | 1 | ACME Plumbing | attribute | company |
| 3 | 1 | east_side | attribute | location |
| 4 | 0 | Green Building | building | |
| 5 | 4 | AJAX Heating | attribute | company |
| 6 | 4 | west_side | attribute | location |
| 7 | 0 | Blue Building | building | |
| 8 | 7 | ZZZ Mattresses | attribute | company |
| 9 | 7 | south_side | attribute | location |
+----+-----------------+----------------+-----------+-----------+
location_translations
+-------------+------------+
| location_id | value |
+-------------+------------+
| 1 | east_side |
| 2 | west_side |
| 3 | south_side |
+-------------+------------+
locations
+----+--------------------+
| id | name |
+----+--------------------+
| 1 | Arts District |
| 2 | Warehouse District |
| 3 | Suburb |
+----+--------------------+
Query #1
SELECT
a.id,
GROUP_CONCAT(
IF(b.attribute = 'company', b.value, NULL)
) AS company_value,
GROUP_CONCAT(
IF(b.attribute = 'location', b.value, NULL)
) AS location_value,
GROUP_CONCAT(
IF(b.attribute = 'location', lt.location_id, NULL)
) AS location_id
FROM
records a
LEFT JOIN records b ON b.other_record_id = a.id AND b.type = 'attribute'
LEFT JOIN location_translations lt ON lt.value = b.value
WHERE a.type = 'building'
GROUP BY a.id
HAVING location_id = 2
Query #2
SELECT temp.* FROM (
SELECT
a.id,
GROUP_CONCAT(
IF(b.attribute = 'company', b.value, NULL)
) AS company_value,
GROUP_CONCAT(
IF(b.attribute = 'location', b.value, NULL)
) AS location_value
FROM
records a
LEFT JOIN records b ON b.other_record_id = a.id AND b.type = 'attribute'
WHERE a.type = 'building'
GROUP BY a.id
) as temp
LEFT JOIN location_translations lt ON lt.value = temp.location_value
WHERE location_id = 2
Using JOIN is preferable in most cases, because it helps the optimizer work out which indexes it can use. In your case, query #1 looks good enough.
Of course, this only helps if the tables have indexes. Check that the records table has indexes on the id, other_record_id, value and type columns, and that location_translations has an index on value.
Related
I got working code from three queries but I would like to combine them into one or two. Basically I am checking if a provided phone number exists in table contacts or leads as well as if it exists as a secondary number in customfieldsvalues (not all leads have a customfield value though). I am using a CRM system based on CodeIgniter.
What I want to do (non-correct/hypothetical query):
SELECT * FROM contacts OR leads WHERE phonenumber = replace(X, '-', '')
OR leads.id = customvaluefields.relid AND cfields.fieldid = 41 AND cfields.value = X
Tables
table : contacts
+-------+----------------+----------------+
| id | firstname | phonenumber |
+-------+----------------+----------------+
| 1 | John | 214-444-1234 |
| 2 | Mary | 555-111-1234 |
+-------+----------------+----------------+
table : leads
+-------+-----------+---------------------+
| id | name | phonenumber |
+-------+-----------+---------------------+
| 1 | John | 214-444-1234 |
| 2 | Mary | 555-111-1234 |
+-------+-----------+---------------------+
table : customvaluefields
+-------+-----------+-------------+-----------+
| id | relid | fieldid | value |
+-------+-----------+-------------+-----------+
| 1 | 1 | 41 | 222333444 |
| 2 | 1 | 20 | Management|
| 3 | 2 | 41 | 333444555 |
+-------+-----------+-------------+-----------+
If I understand what you are trying to do, maybe UNION ALL would work. This is something to get you started:
SELECT C.ID, C.FirstName, C.Phonenumber
FROM Contacts C
JOIN CustomValueFields CVF
ON C.ID = CVF.RelID AND
CVF.FieldID = 41
AND REPLACE(C.Phonenumber,'-','') = CVF.Value
UNION ALL
SELECT L.ID, L.Name, L.Phonenumber
FROM Leads L
JOIN CustomValueFields CVF
ON L.ID = CVF.RelID AND
CVF.FieldID = 41
AND REPLACE(L.Phonenumber,'-','') = CVF.Value
Each query joins contacts or leads to the customvaluefields table, with the filter conditions in the join, and UNION ALL then combines the two results. I'm sure it's not 100% correct for what you need, but it should get you headed toward a solution. Here is more information: https://dev.mysql.com/doc/refman/8.0/en/union.html
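As a rough, runnable sketch of the UNION ALL idea, the following uses Python's built-in sqlite3 module as a stand-in for MySQL and loads the sample rows from the question. It assumes that customvaluefields.relid points at leads.id, that secondary numbers are stored without dashes, and that dashes should be stripped on both sides of the comparison; the function name find_phone and the source labels are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, firstname TEXT, phonenumber TEXT);
CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT, phonenumber TEXT);
CREATE TABLE customvaluefields (id INTEGER PRIMARY KEY, relid INTEGER,
                                fieldid INTEGER, value TEXT);
INSERT INTO contacts VALUES (1, 'John', '214-444-1234'), (2, 'Mary', '555-111-1234');
INSERT INTO leads VALUES (1, 'John', '214-444-1234'), (2, 'Mary', '555-111-1234');
INSERT INTO customvaluefields VALUES
  (1, 1, 41, '222333444'), (2, 1, 20, 'Management'), (3, 2, 41, '333444555');
""")

# One branch per place the number can live; UNION ALL stacks the matches.
LOOKUP = """
SELECT 'contact' AS source, id, firstname AS name
  FROM contacts
 WHERE REPLACE(phonenumber, '-', '') = :phone
UNION ALL
SELECT 'lead', id, name
  FROM leads
 WHERE REPLACE(phonenumber, '-', '') = :phone
UNION ALL
SELECT 'lead_secondary', l.id, l.name
  FROM leads l
  JOIN customvaluefields cvf ON cvf.relid = l.id AND cvf.fieldid = 41
 WHERE cvf.value = :phone
"""

def find_phone(phone):
    """Return (source, id, name) rows for every table that knows this number."""
    digits = phone.replace("-", "")  # assumption: normalise the input as well
    return conn.execute(LOOKUP, {"phone": digits}).fetchall()

primary_hits = sorted(find_phone("214-444-1234"))
secondary_hits = find_phone("333444555")
```

Because every branch returns the same three columns, the caller only has to check whether the result is empty; the source column tells it where the match came from.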
I have this query. It joins two tables and gives me all the data under one condition: the CatID belongs to 'Videography'.
SELECT
pm_categories_images.Image,
pm_categories_images.FileURL,
pm_categories.catname,
pm_categories.`status`,
pm_categories.sortorder,
pm_categories.parentID,
pm_categories_images.CatID
FROM
pm_categories
LEFT JOIN pm_categories_images ON pm_categories_images.CatID = pm_categories.catID
where pm_categories_images.CatID IN (select catid from pm_categories where
parentID = (select catID from pm_categories where catname = 'Videography'))
The Videography results look like this:
http://prntscr.com/gpkuyl
Now I want to get one record for every catname.
Without an MCVE, the actual requirement for which image you want from the images table, an explanation of why you need a LEFT JOIN when your WHERE clause makes it behave like an INNER JOIN, and why the WHERE clause is so complex, I'm really unsure what the question is after. Here's a shot, with a demo: http://rextester.com/CRBN50943
Sample data and expected results are always a plus. I made up my own data and made several assumptions.
I interpreted the question as: I would like a list of the categories, each with the image that has the earliest alphabetic value for that category.
SELECT
CI.Image,
CI.FileURL,
C.catname,
C.`status`,
C.sortorder,
C.parentID,
CI.CatID
FROM pm_categories C
INNER JOIN pm_categories_images CI
ON CI.CatID = C.catID
INNER JOIN (SELECT Min(Image) MI, catID FROM pm_categories_images group by CATID) Z
on CI.Image = Z.MI
and CI.CatID = Z.CatId
##WHERE C.catname = 'Videography'
Order by sortOrder
Giving us
+----+------------+-----------------------------------------------+-------------+--------+-----------+----------+-------+
| | Image | FileURL | catname | status | sortorder | parentID | CatID |
+----+------------+-----------------------------------------------+-------------+--------+-----------+----------+-------+
| 1 | guid1.jpg | https://drive.google.com/BusinessID/Postings/ | Real Estate | 1 | 1 | NULL | 1 |
| 2 | guid4.jpg | https://drive.google.com/BusinessID/Postings/ | commercial | 1 | 2 | NULL | 2 |
| 3 | guid6.jpg | https://drive.google.com/BusinessID/Postings/ | Videography | 1 | 3 | NULL | 3 |
| 4 | guid10.jpg | https://drive.google.com/BusinessID/Postings/ | Other | 1 | 4 | NULL | 4 |
| 5 | guid11.jpg | https://drive.google.com/BusinessID/Postings/ | LackingMCVE | 1 | 5 | NULL | 5 |
+----+------------+-----------------------------------------------+-------------+--------+-----------+----------+-------+
I am trying to fetch all the categories, with their counts (the number of products in each category), for the products that match a keyword. The query I tried doesn't give me the correct result.
I also want the parent categories up to level 1, with their counts.
For example, if I search with the keyword watch, the category "watches" should be there with some count, along with the parent category "accessories" carrying the sum of its descendant categories' counts.
my table structures are:
tblProducts: There are 5 categories of a product, fldCategoryId1, fldCategoryId2, fldCategoryId3, fldCategoryId4 and fldCategoryId5. fldProductStatus should be 'A'
+-----------------------------+-------------------+
| Field | Type |
+-----------------------------+-------------------+
| fldUniqueId | bigint(20) |
| fldCategoryId1 | bigint(20) |
| fldCategoryId2 | bigint(20) |
| fldCategoryId3 | bigint(20) |
| fldCategoryId4 | bigint(20) |
| fldCategoryId5 | bigint(20) |
| fldProductStatus | enum('A','P','D') |
| fldForSearch | longtext |
+-----------------------------+-------------------+
tblCategory:
+------------------------------+-----------------------+
| Field | Type |
+------------------------------+-----------------------+
| fldCategoryId | bigint(20) |
| fldCategoryName | varchar(128) |
| fldCategoryParent | int(11) |
| fldCategoryLevel | enum('0','1','2','3') |
| fldCategoryActive | enum('Y','N') |
+------------------------------+-----------------------+
Search Query:
SELECT count( c.fldCategoryId ) AS cnt, c.fldCategoryLevel, c.fldCategoryParent, c.fldCategoryId, c.fldCategoryName, p.fldForSearch, c.fldCategoryParent
FROM tblCategory c, tblProducts p
WHERE (
c.fldCategoryId = p.fldCategoryId1
OR c.fldCategoryId = p.fldCategoryId2
OR c.fldCategoryId = p.fldCategoryId3
OR c.fldCategoryId = p.fldCategoryId4
OR c.fldCategoryId = p.fldCategoryId5
)
AND p.fldProductStatus = 'A'
AND (
MATCH ( p.fldForSearch )
AGAINST (
'+(watches watch)'
IN BOOLEAN MODE
)
)
GROUP BY c.fldCategoryId
Note: The table uses the InnoDB engine and has a FULLTEXT index on the 'fldForSearch' column.
EDIT: sample data can be found in sqlfiddle
I'm not sure what you mean by:
Also I want the parent categories till level 1 and their count as well.
But the following query will show you a count for each category (including those with 0 found products), and a general rollup:
SELECT
c.fldCategoryId,
c.fldCategoryLevel,
c.fldCategoryName,
COUNT( p.fldUniqueId ) AS cnt
FROM tblCategory c
LEFT JOIN tblProducts p ON
(c.fldCategoryId = p.fldCategoryId1
OR c.fldCategoryId = p.fldCategoryId2
OR c.fldCategoryId = p.fldCategoryId3
OR c.fldCategoryId = p.fldCategoryId4
OR c.fldCategoryId = p.fldCategoryId5)
AND p.fldProductStatus = 'A'
AND MATCH ( p.fldForSearch )
AGAINST (
'+(watches watch)'
IN BOOLEAN MODE
)
GROUP BY
c.fldCategoryId,
c.fldCategoryLevel,
c.fldCategoryName
WITH ROLLUP;
Notes:
You cannot select p.fldForSearch if you expect a count of all the products in the category. fldForSearch is a per-product value, so selecting it defeats the purpose of grouping.
I left joined with products so that the query also returns the categories with 0 products matching your keywords. If you don't want this, just remove the LEFT keyword.
I haven't checked the MATCH condition; I assume it's correct.
Start by not splaying an array (fldCategoryId...) across columns. Instead, add a new table.
Once you have done that, the queries change, such as getting rid of OR clauses.
Hopefully, any further issues will fall into place.
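A minimal sketch of what that extra table could look like, run here with Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (product, category, product_category) and the sample rows are invented for illustration, and a LIKE filter stands in for the MATCH ... AGAINST condition, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, status TEXT, search_text TEXT);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
-- One row per (product, category) pair replaces fldCategoryId1..5.
CREATE TABLE product_category (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (product_id, category_id)
);
CREATE INDEX idx_pc_category ON product_category (category_id);

INSERT INTO category VALUES (1, 'Accessories'), (2, 'Watches'), (3, 'Clocks');
INSERT INTO product VALUES
  (10, 'A', 'steel wrist watch'), (11, 'A', 'wall clock'), (12, 'P', 'pocket watch');
INSERT INTO product_category VALUES (10, 1), (10, 2), (11, 1), (11, 3), (12, 2);
""")

# The five-way OR becomes a plain equality join on the junction table.
COUNT_BY_CATEGORY = """
SELECT c.id, c.name, COUNT(p.id) AS cnt
  FROM category c
  LEFT JOIN product_category pc ON pc.category_id = c.id
  LEFT JOIN product p
         ON p.id = pc.product_id
        AND p.status = 'A'
        AND p.search_text LIKE '%watch%'
 GROUP BY c.id, c.name
 ORDER BY c.id
"""

counts = conn.execute(COUNT_BY_CATEGORY).fetchall()
```

Counting p.id rather than every row means a category with no matching product shows 0, and the index on category_id gives the join something to use.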
Since your category tree has a fixed height (4 levels), you can create a transitive closure table on the fly with
SELECT c1.fldCategoryId AS descendantId, c.fldCategoryId AS ancestorId
FROM tblcategory c1
LEFT JOIN tblcategory c2 ON c2.fldCategoryId = c1.fldCategoryParent
LEFT JOIN tblcategory c3 ON c3.fldCategoryId = c2.fldCategoryParent
JOIN tblcategory c ON c.fldCategoryId IN (
c1.fldCategoryId,
c1.fldCategoryParent,
c2.fldCategoryParent,
c3.fldCategoryParent
)
The result will look like
| descendantId | ancestorId |
|--------------|------------|
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| ... | ... |
| 5 | 1 |
| 5 | 2 |
| 5 | 5 |
| ... | ... |
You can now use it in a subquery (derived table) to join it with products using descendantId and with categories using ancestorId. That means that a product from category X will be indirectly associated with all ancestors of X (as well as with X). For example: Category 5 is a child of 2 - and 2 is a child of 1. So all products from category 5 must be counted for categories 5, 2 and 1.
Final query:
SELECT c.*, coalesce(sub.cnt, 0) as cnt
FROM tblCategory c
LEFT JOIN (
SELECT tc.ancestorId, COUNT(DISTINCT p.fldUniqueId) AS cnt
FROM tblProducts p
JOIN (
SELECT c1.fldCategoryId AS descendantId, c.fldCategoryId AS ancestorId
FROM tblcategory c1
LEFT JOIN tblcategory c2 ON c2.fldCategoryId = c1.fldCategoryParent
LEFT JOIN tblcategory c3 ON c3.fldCategoryId = c2.fldCategoryParent
JOIN tblcategory c ON c.fldCategoryId IN (
c1.fldCategoryId,
c1.fldCategoryParent,
c2.fldCategoryParent,
c3.fldCategoryParent
)
) tc ON tc.descendantId IN (
p.fldCategoryId1,
p.fldCategoryId2,
p.fldCategoryId3,
p.fldCategoryId4,
p.fldCategoryId5
)
WHERE p.fldProductStatus = 'A'
AND MATCH ( p.fldForSearch )
AGAINST ( '+(watches watch)' IN BOOLEAN MODE )
GROUP BY tc.ancestorId
) sub ON c.fldCategoryId = sub.ancestorId
Result for your sample data (without level, since it seems to be wrong anyway):
| fldCategoryId | fldCategoryName | fldCategoryParent | fldCategoryActive | cnt |
|---------------|-----------------|-------------------|-------------------|-----|
| 1 | Men | 0 | Y | 5 |
| 2 | Accessories | 1 | Y | 5 |
| 3 | Men Watch | 1 | Y | 3 |
| 5 | Watch | 2 | Y | 5 |
| 6 | Clock | 2 | Y | 3 |
| 7 | Wrist watch | 1 | Y | 2 |
| 8 | Watch | 2 | Y | 4 |
| 9 | watch2 | 3 | Y | 2 |
| 10 | fastrack | 8 | Y | 3 |
| 11 | swish | 8 | Y | 2 |
| 12 | digital | 5 | Y | 2 |
| 13 | analog | 5 | Y | 2 |
| 14 | dual | 5 | Y | 1 |
Demos:
sqlfiddle
rextester
Note that the outer (left joined) subquery is logically not necessary. But from my experience MySQL doesn't perform well without it.
There are still ways to optimise performance. One is to store the transitive closure table in an indexed temporary table. You can also persist it in a regular table if the categories rarely change, and you can maintain it with triggers.
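As an illustration of persisting the closure table, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The table names (category, category_closure, product_category), the three-level sample tree and the product rows are all assumptions made for the example; a real schema would keep the original column names and the full-text filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
-- Tree: 1 Men -> 2 Accessories -> 5 Watch, and 1 Men -> 3 Shoes
INSERT INTO category VALUES (1, 'Men', 0), (2, 'Accessories', 1),
                            (5, 'Watch', 2), (3, 'Shoes', 1);

CREATE TABLE category_closure (
    descendant_id INTEGER NOT NULL,
    ancestor_id   INTEGER NOT NULL,
    PRIMARY KEY (descendant_id, ancestor_id)
);
CREATE INDEX idx_closure_ancestor ON category_closure (ancestor_id);

-- Self-join up the parent chain; two hops are enough for this sample tree.
INSERT INTO category_closure (descendant_id, ancestor_id)
SELECT DISTINCT c1.id, c.id
  FROM category c1
  LEFT JOIN category c2 ON c2.id = c1.parent
  JOIN category c ON c.id IN (c1.id, c1.parent, c2.parent);

CREATE TABLE product_category (product_id INTEGER, category_id INTEGER);
INSERT INTO product_category VALUES (100, 5), (101, 5), (102, 3);
""")

# Each product counts once for its own category and once for every ancestor.
COUNT_WITH_ANCESTORS = """
SELECT c.id, c.name, COUNT(DISTINCT pc.product_id) AS cnt
  FROM category c
  LEFT JOIN category_closure cc ON cc.ancestor_id = c.id
  LEFT JOIN product_category pc ON pc.category_id = cc.descendant_id
 GROUP BY c.id, c.name
 ORDER BY c.id
"""

closure_rows = conn.execute("SELECT COUNT(*) FROM category_closure").fetchone()[0]
rollup = conn.execute(COUNT_WITH_ANCESTORS).fetchall()
```

With the closure stored once, the counting query no longer rebuilds the self-joins on every search; the trade-off is that the table has to be refreshed whenever the category tree changes.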
I have 2 tables in mysql database as shown below. I am looking for a query that will select * from books but if preview_image = 'none' then preview_image = the hash_id of the row with the largest size where books.id = images.parentid. Hope this makes sense.
table books
+----+-------+---------------+
| id | title | preview_image |
+----+-------+---------------+
| 1  | book1 | 55859076d906  |
| 2  | book2 | 20a14f9fd7cf  |
| 3  | book3 | none          |
| 4  | book4 | ce805ecff5c9  |
| 5  | book5 | e60a7217b3e2  |
+----+-------+---------------+
table images
+----------+------+--------------+
| parentid | size | hash_id      |
+----------+------+--------------+
| 2        | 100  | 55859076d906 |
| 1        | 200  | 20a14f9fd7cf |
| 3        | 300  | 34805fr5c9e5 |
| 3        | 400  | ce805ecff5c9 |
| 3        | 500  | e60a7217b3e2 |
+----------+------+--------------+
Thanks
You can use SUBSTRING_INDEX() to obtain the first record from a sorted GROUP_CONCAT(), and switch using a CASE expression:
SELECT books.id, books.title, CASE books.preview_image
WHEN 'none' THEN SUBSTRING_INDEX(
GROUP_CONCAT(images.hash_id ORDER BY images.size DESC SEPARATOR ',')
, ',', 1)
ELSE books.preview_image
END AS preview_image
FROM books LEFT JOIN images ON images.parentid = books.id
GROUP BY books.id
Write a subquery that finds the desired hash ID for each parent ID, using one of the techniques in SQL Select only rows with Max Value on a Column. Then join this with the books table.
SELECT b.id, b.title, IF(b.preview_image = 'none', i.hash_id, b.preview_image) AS image
FROM books AS b
LEFT JOIN (SELECT i1.parentid, i1.hash_id
FROM images AS i1
JOIN (SELECT parentid, MAX(size) AS maxsize
FROM images
GROUP BY parentid) AS i2
ON i1.parentid = i2.parentid AND i1.size = i2.maxsize) AS i
ON b.id = i.parentid
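To check the join-back approach against the sample data from the question, here is a small sketch that runs it with Python's built-in sqlite3 module instead of MySQL. CASE replaces MySQL's IF(), which SQLite lacks, and an ORDER BY is added only to make the output stable; everything else follows the tables shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, preview_image TEXT);
CREATE TABLE images (parentid INTEGER, size INTEGER, hash_id TEXT);
INSERT INTO books VALUES (1, 'book1', '55859076d906'), (2, 'book2', '20a14f9fd7cf'),
  (3, 'book3', 'none'), (4, 'book4', 'ce805ecff5c9'), (5, 'book5', 'e60a7217b3e2');
INSERT INTO images VALUES (2, 100, '55859076d906'), (1, 200, '20a14f9fd7cf'),
  (3, 300, '34805fr5c9e5'), (3, 400, 'ce805ecff5c9'), (3, 500, 'e60a7217b3e2');
""")

# Inner subquery: largest size per parent. Join back to fetch that row's hash.
QUERY = """
SELECT b.id, b.title,
       CASE WHEN b.preview_image = 'none' THEN i.hash_id
            ELSE b.preview_image END AS image
  FROM books AS b
  LEFT JOIN (SELECT i1.parentid, i1.hash_id
               FROM images AS i1
               JOIN (SELECT parentid, MAX(size) AS maxsize
                       FROM images
                      GROUP BY parentid) AS i2
                 ON i1.parentid = i2.parentid AND i1.size = i2.maxsize) AS i
    ON b.id = i.parentid
 ORDER BY b.id
"""

rows = conn.execute(QUERY).fetchall()
previews = {book_id: image for book_id, _title, image in rows}
```

Books 4 and 5 have no rows in images, so the LEFT JOIN matters: they keep their own preview_image instead of dropping out of the result. If two images of the same book tie for the largest size, this query returns one row per tied image.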
In this example, I have a listing of users (main_data), a pass list (pass_list) and a corresponding priority to each pass code type (pass_code). The query I am constructing is looking for a list of users and the corresponding pass code type with the lowest priority. The query below works but it just seems like there may be a faster way to construct it I am missing. SQL Fiddle: http://sqlfiddle.com/#!2/2ec8d/2/0 or see below for table details.
SELECT md.first_name, md.last_name, pl.*
FROM main_data md
JOIN pass_list pl on pl.main_data_id = md.id
AND
pl.id =
(
SELECT pl2.id
FROM pass_list pl2
JOIN pass_code pc2 on pl2.pass_code_type = pc2.type
WHERE pl2.main_data_id = md.id
ORDER BY pc2.priority
LIMIT 1
)
Results:
+------------+-----------+----+--------------+----------------+
| first_name | last_name | id | main_data_id | pass_code_type |
+------------+-----------+----+--------------+----------------+
| Bob | Smith | 1 | 1 | S |
| Mary | Vance | 8 | 2 | M |
| Margret | Cough | 5 | 3 | H |
| Mark | Johnson | 9 | 4 | H |
| Tim | Allen | 13 | 5 | M |
+------------+-----------+----+--------------+----------------+
users (main_data)
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 1 | Bob | Smith |
| 2 | Mary | Vance |
| 3 | Margret | Cough |
| 4 | Mark | Johnson |
| 5 | Tim | Allen |
+----+------------+-----------+
pass list (pass_list)
+----+--------------+----------------+
| id | main_data_id | pass_code_type |
+----+--------------+----------------+
| 1 | 1 | S |
| 3 | 2 | E |
| 4 | 2 | H |
| 5 | 3 | H |
| 7 | 4 | E |
| 8 | 2 | M |
| 9 | 4 | H |
| 10 | 4 | H |
| 11 | 5 | S |
| 12 | 3 | S |
| 13 | 5 | M |
| 14 | 1 | E |
+----+--------------+----------------+
Table which specifies priority (pass_code)
+----+------+----------+
| id | type | priority |
+----+------+----------+
| 1 | M | 1 |
| 2 | H | 2 |
| 3 | S | 3 |
| 4 | E | 4 |
+----+------+----------+
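Due to MySQL's unique extension to GROUP BY, it's simple:
SELECT * FROM
(SELECT md.first_name, md.last_name, pl.*
FROM main_data md
JOIN pass_list pl ON pl.main_data_id = md.id
JOIN pass_code pc ON pc.type = pl.pass_code_type
ORDER BY pc.priority) x
GROUP BY x.main_data_id
This returns one row for each unique value of main_data_id; the inner query orders the rows first so that the row MySQL picks for each group is the one with the lowest priority. Note that this needs ONLY_FULL_GROUP_BY to be disabled, and MySQL documents the values chosen for the non-grouped columns as indeterminate, so the ordering trick is not guaranteed to work.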
A version that will get the details as required, and should also work across different flavours of SQL
SELECT md.first_name, md.last_name, MinId, pl.main_data_id, pl.pass_code_type
FROM main_data md
INNER JOIN pass_list pl
ON md.id = pl.main_data_id
INNER JOIN pass_code pc
ON pl.pass_code_type = pc.type
INNER JOIN
(
SELECT pl.main_data_id, pl.pass_code_type, Sub0.MinPriority, MIN(pl.id) AS MinId
FROM pass_list pl
INNER JOIN pass_code pc
ON pl.pass_code_type = pc.type
INNER JOIN
(
SELECT main_data_id, MIN(priority) AS MinPriority
FROM pass_list a
INNER JOIN pass_code b
ON a.pass_code_type = b.type
GROUP BY main_data_id
) Sub0
ON pl.main_data_id = Sub0.main_data_id
AND pc.priority = Sub0.MinPriority
GROUP BY pl.main_data_id, pl.pass_code_type, Sub0.MinPriority
) Sub1
ON pl.main_data_id = Sub1.main_data_id
AND pl.id = Sub1.MinId
AND pc.priority = Sub1.MinPriority
ORDER BY pl.main_data_id
This does not rely on the flexibility of MySQL's GROUP BY functionality.
I'm not familiar with the special behavior of MySQL's GROUP BY, but my solution for this type of problem is simply to select the rows for which no row with a lower priority exists. This is standard SQL, so it should work on any database.
select distinct u.id, u.first_name, u.last_name, pl.pass_code_type, pc.id, pc.priority
from main_data u
inner join pass_list pl on pl.main_data_id = u.id
inner join pass_code pc on pc.type = pl.pass_code_type
where not exists (select 1
from pass_list pl2
inner join pass_code pc2 on pc2.type = pl2.pass_code_type
where pl2.main_data_id = u.id and pc2.priority < pc.priority);
How well this performs is going to depend on having the proper indexes (assuming that main_data and pass_list are somewhat large). In this case, indexes on the primary keys (which should be created automatically) and on the foreign keys should be sufficient. There may be other queries that are faster; I would start by comparing this one to your query.
Also, I had to add DISTINCT because you have duplicate rows in pass_list (ids 9 and 10). If you ensure that duplicates can't exist (with a unique index on main_data_id, pass_code_type), you will save some time by removing the DISTINCT, which forces a final sort of the result set. These savings become more noticeable the larger the result set is.
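For anyone who wants to try the NOT EXISTS version without a MySQL server, the sketch below loads the sample tables from the question into Python's built-in sqlite3 module and runs a trimmed copy of the query. The index name is made up, and the index is deliberately non-unique because the sample data contains the duplicate pass for user 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_data (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE pass_code (id INTEGER PRIMARY KEY, type TEXT UNIQUE, priority INTEGER);
CREATE TABLE pass_list (id INTEGER PRIMARY KEY, main_data_id INTEGER,
                        pass_code_type TEXT);
-- Non-unique on purpose: rows 9 and 10 repeat (4, 'H') in the sample data.
CREATE INDEX idx_pass_list_owner ON pass_list (main_data_id, pass_code_type);

INSERT INTO main_data VALUES (1, 'Bob', 'Smith'), (2, 'Mary', 'Vance'),
  (3, 'Margret', 'Cough'), (4, 'Mark', 'Johnson'), (5, 'Tim', 'Allen');
INSERT INTO pass_code VALUES (1, 'M', 1), (2, 'H', 2), (3, 'S', 3), (4, 'E', 4);
INSERT INTO pass_list VALUES (1, 1, 'S'), (3, 2, 'E'), (4, 2, 'H'), (5, 3, 'H'),
  (7, 4, 'E'), (8, 2, 'M'), (9, 4, 'H'), (10, 4, 'H'), (11, 5, 'S'),
  (12, 3, 'S'), (13, 5, 'M'), (14, 1, 'E');
""")

# Keep a pass only if the same user has no pass with a lower priority number.
QUERY = """
SELECT DISTINCT u.id, u.first_name, pl.pass_code_type
  FROM main_data u
  JOIN pass_list pl ON pl.main_data_id = u.id
  JOIN pass_code pc ON pc.type = pl.pass_code_type
 WHERE NOT EXISTS (SELECT 1
                     FROM pass_list pl2
                     JOIN pass_code pc2 ON pc2.type = pl2.pass_code_type
                    WHERE pl2.main_data_id = u.id
                      AND pc2.priority < pc.priority)
 ORDER BY u.id
"""

best = conn.execute(QUERY).fetchall()
```

The pass types it returns match the results table in the question (S, M, H, H, M). Dropping DISTINCT would show Mark twice, which is the duplicate-row effect described above.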