MySQL Query to Match Unrelated Terms - mysql
I'm trying to construct a query that's driving me crazy. I had no idea where to start with solving it, but after searching around a bit I started playing with subqueries. Now I'm at the point where I'm not sure if that will solve my issue or, if it will, how to create one that does what I want.
Here's a very simplistic view of my current table (call it tbl_1):
---------------------------------
| row | name | other_names |
|-------------------------------|
| 1 | A | B, C |
| 2 | B | C |
| 3 | A | C |
| 4 | D | E |
| 5 | C | A, B |
---------------------------------
Some of the items I'm working with have multiple names (brand names, names in other countries, code names, etc.), but ultimately all of those different names refer to the same item. I originally was running a search query along the lines of:
SELECT * FROM tbl_1
WHERE name LIKE '%A%'
OR other_names LIKE '%A%';
Which would return rows 1 and 3. However, I quickly realized that my query should also return row 2, as A = B = C. How would I go about doing something like that? I'm open to alternative suggestions outside of a fancy query, such as constructing another table that somehow combines all the names into one row, but I figure something like that would be error prone or inefficient.
Additionally, I'm running MySQL 5.5.23 using InnoDB with other code written in PHP and Python.
Thanks!
Update 5/26/12:
I went back to my original thinking of using a subquery, but right when I thought I was getting somewhere I ran into a documented MySQL issue where the query is evaluated from the outside in and my subquery will be evaluated for every row and won't finish in a realistic amount of time. Here's what I was attempting to do:
SELECT * FROM tbl_1
WHERE name = ANY
(SELECT name FROM tbl_1 WHERE other_names LIKE '%A%' or name LIKE '%A%')
OR other_names = ANY
(SELECT name FROM tbl_1 WHERE other_names LIKE '%A%' or name LIKE '%A%')
Which returns what I want using the example table, but the aforementioned MySQL issue/bug causes the subquery to be considered a dependent query rather than an independent one. As a result, I haven't been able to test the query on my real table (~250,000 rows) as it eventually times out.
I've read that the main workaround for the issue is to use joins rather than subqueries, but I'm not sure how I would apply that to what I'm trying to do. The more I think about it, I might be better off running the subqueries independently using PHP/Python and using the resulting arrays to craft the main query that I want. However, I still think there is the potential to miss some results because the terms in the columns aren't nearly as nice as in my example (some of the terms are multiple words, some have parentheses, the other names aren't necessarily comma-separated, etc.).
Alternatively, I'm thinking about constructing a separate table that will build the necessary links, something like:
| 1 | A | B, C|
| 2 | B | C, A|
| 3 | C | A, B|
but I think that's a lot easier said than done considering the data I'm working with and the non-standardized format in which it exists.
The route that I'm strongly considering at this point is to build a separate table with the links that are easily constructed (i.e. a 1:1 ratio for name:other_names) so I don't have to deal with the formatting issues that exist in the other_names column. I may also eliminate or limit the use of LIKE and require users to know at least one exact name, which should simplify the results and probably improve overall performance.
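A minimal Python sketch of how such a link table could be populated from the existing rows. It assumes the aliases are separated by commas or semicolons, which the real data may not honour; the function name name_pairs is made up for illustration.

```python
import re

def name_pairs(rows):
    """Yield one (name, alias) pair per alias found in other_names."""
    for name, other_names in rows:
        # Assumption: aliases are separated by commas or semicolons.
        for alias in re.split(r"[,;]", other_names or ""):
            alias = alias.strip()
            if alias and alias != name:
                yield (name, alias)

# The example table from the question, as (name, other_names) tuples.
rows = [("A", "B, C"), ("B", "C"), ("A", "C"), ("D", "E"), ("C", "A, B")]

# De-duplicate so each link is stored once.
pairs = sorted(set(name_pairs(rows)))
print(pairs)
```

Each pair could then be inserted as one row of the link table, so lookups become exact matches instead of LIKE scans.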
In conclusion, I hate working with input data that I have no control over.
Stumbled on this question by accident, so I don't know if my suggestion is relevant, but this looks like a good use case for something like a "union-find" structure.
The SELECT would be extremely easy and fast.
But the insert and update are relatively complex, and you will probably need an in-code loop (while updated rows > 0) and several database calls.
Example for the table:
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
---------------------------
selecting (group is a reserved word in MySQL, so it has to be quoted with backticks):
SELECT name FROM tbl WHERE `group` = (SELECT `group` FROM tbl WHERE name LIKE '%A%')
inserting relation K = T (pseudocode):
SELECT `group` AS gk FROM tbl WHERE name = 'K';
SELECT `group` AS gt FROM tbl WHERE name = 'T';
if both gk and gt are empty results, insert both names with a new group
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
| 8 | K | 3 |
| 9 | T | 3 |
---------------------------
if gk is an empty result and gt is not, insert K with group = gt
---------------------------
| row | name | group |
|-------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 4 | C | 1 |
| 5 | D | 2 |
| 6 | X | 1 |
| 7 | Z | 2 |
| 8 | K | 2 |
| 9 | T | 2 |
---------------------------
(the same applies in the opposite case)
and when both are not empty, update one group to be the other:
UPDATE tbl SET `group` = gt WHERE `group` = gk
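The same grouping logic can be sketched in memory with a small union-find in Python. This is only an illustration of the idea, not a replacement for the table above: the class name NameGroups and its methods are made up here, and persisting the groups back to the database is left out.

```python
class NameGroups:
    """Union-find (disjoint sets) over names; each set is one item."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        # Unseen names start out as their own group.
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited name straight at the root.
        while self.parent[name] != root:
            self.parent[name], name = root, self.parent[name]
        return root

    def union(self, a, b):
        # Merge the groups of a and b (the "K = T" relation).
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def group_of(self, name):
        root = self.find(name)
        return sorted(n for n in list(self.parent) if self.find(n) == root)


# Rows from the question, assuming comma-separated other_names.
rows = [("A", "B, C"), ("B", "C"), ("A", "C"), ("D", "E"), ("C", "A, B")]
groups = NameGroups()
for name, other_names in rows:
    for alias in other_names.split(","):
        groups.union(name, alias.strip())

print(groups.group_of("A"))
print(groups.group_of("D"))
```

The union step corresponds to the "update one group to be the other" case: merging is a single pointer change here, whereas the table version rewrites every row of the absorbed group.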
I can't think of a query that supports unlimited depth of name identity. But if you can work with a limited number of "recursions", you might consider a query similar to the one below. Starting from the query you provided, it retrieves all rows with name identities:
SELECT a.* FROM tbl_1 a
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT b.* FROM tbl_1 a
JOIN tbl_1 b ON a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%')
WHERE a.name='A'
OR a.other_names LIKE '%A%';
This query would return row 2, but it wouldn't return any additional rows having "B" as "other_name" in your example. So you would have to union another query:
SELECT a.* FROM tbl_1 a
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT b.* FROM tbl_1 a
JOIN tbl_1 b ON a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%')
WHERE a.name='A'
OR a.other_names LIKE '%A%'
UNION
SELECT c.* FROM tbl_1 a
JOIN tbl_1 b ON (a.other_names LIKE CONCAT('%', b.name, '%') OR b.other_names LIKE CONCAT('%', a.name, '%'))
JOIN tbl_1 c ON (b.other_names LIKE CONCAT('%', c.name, '%') OR c.other_names LIKE CONCAT('%', b.name, '%'))
WHERE a.name='A'
OR a.other_names LIKE '%A%';
As you can see, the query grows quickly with every additional level of depth, since each level adds another UNION branch with one more self-join, and it also isn't what I would call beautiful. But it might fit your needs. I'm not very experienced with MySQL functions, but I guess you could use those to create a more elegant solution that also works with unlimited depth. You might also consider solving the problem programmatically with Python.
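A rough sketch of the programmatic route in Python: load the rows once, then follow shared names breadth-first until nothing new turns up, which gives unlimited depth. It assumes exact name matches and comma-separated other_names (unlike the LIKE queries above); related_rows is a hypothetical helper name.

```python
from collections import deque

def related_rows(rows, term):
    """Return ids of all rows reachable from term through shared names."""
    rows_by_name = {}   # name -> ids of rows mentioning it
    names_by_row = {}   # row id -> every name on that row
    for row_id, name, other_names in rows:
        names = {name} | {n.strip() for n in other_names.split(",") if n.strip()}
        names_by_row[row_id] = names
        for n in names:
            rows_by_name.setdefault(n, set()).add(row_id)

    seen_names, seen_rows = {term}, set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for row_id in rows_by_name.get(current, ()):
            if row_id in seen_rows:
                continue
            seen_rows.add(row_id)
            # Queue every name on this row that has not been visited yet.
            for n in names_by_row[row_id] - seen_names:
                seen_names.add(n)
                queue.append(n)
    return sorted(seen_rows)

# (row, name, other_names) from the question's example table.
rows = [(1, "A", "B, C"), (2, "B", "C"), (3, "A", "C"), (4, "D", "E"), (5, "C", "A, B")]
print(related_rows(rows, "A"))
```

For the example table this also picks up row 2, which the plain LIKE search misses.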
Related
How to find data based on comma separated parameter in comma separated data in my SQL query
We have the data below:
plant table
----------------------------
| name | classification |
| A | 1,4,7 |
| B | 2,3,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
The front end will send multiple parameters like "4,9", and the expected output should be like this:
plant table
---------------------------
| name | classification |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
I already tried FIND_IN_SET, but I am only able to fetch with one parameter:
select * from plant o where find_in_set('4',classification ) <> 0
Another solution is to run multiple queries: for example, if the parameter is "4,9", we loop the query two times with parameters 4 and 9. But that solution would consume a lot of resources, since the data is around 10000+ rows and there can be more than 5 parameters.
If the table design is bad practice, then OK, but we are unable to change it since the table belongs to a third party.
Any solution or insight will be appreciated. Thank you.
Schema (MySQL v8.0)
CREATE TABLE broken_table (name CHAR(12) PRIMARY KEY, classification VARCHAR(12));
INSERT INTO broken_table VALUES
('A','1,4,7'),
('B','2,3,7'),
('C','3,4,9,8'),
('D','1,5,6,9');
Query #1
WITH RECURSIVE cte (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM cte WHERE n < 5
)
SELECT DISTINCT x.name, x.classification
FROM broken_table x
JOIN cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9);
| name | classification |
| A | 1,4,7 |
| C | 3,4,9,8 |
| D | 1,5,6,9 |
EDIT: or, for older versions...
SELECT DISTINCT x.name, x.classification
FROM broken_table x
JOIN (
  SELECT 1 n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) cte
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(classification,',',n),',',-1) IN (4,9)
Let's just avoid the CSV altogether and fix your table design:
plant table
----------------------------
| name | classification |
| A | 1 |
| A | 4 |
| A | 7 |
| B | 2 |
| B | 3 |
| B | 7 |
| ... | ... |
Now with this design, you may use the following statement:
SELECT * FROM plant WHERE classification IN (?);
To the ? placeholder, you may bind your collection of values to match (e.g. (4,9)).
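A small Python sketch of the normalized design, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). Many drivers, sqlite3 included, expect one placeholder per value, so the IN list is generated from the number of parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plant (name TEXT, classification INTEGER)")

# Split the original CSV column into one row per (name, classification).
csv_rows = [("A", "1,4,7"), ("B", "2,3,7"), ("C", "3,4,9,8"), ("D", "1,5,6,9")]
conn.executemany(
    "INSERT INTO plant VALUES (?, ?)",
    [(name, int(c)) for name, csv in csv_rows for c in csv.split(",")],
)

wanted = [4, 9]
# One "?" per requested value; the values themselves are still bound safely.
placeholders = ", ".join("?" for _ in wanted)
query = (
    "SELECT DISTINCT name FROM plant "
    f"WHERE classification IN ({placeholders}) ORDER BY name"
)
matches = [row[0] for row in conn.execute(query, wanted)]
print(matches)
```

DISTINCT is needed because a plant such as C matches more than one requested value.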
You want OR semantics, so you can use regular expressions. If every value were a single digit:
where classification regexp replace('4,9', ',', '|')
However, this would also match 42 and 19, which I'm guessing you do not want. So make it a little more complicated and require a delimiter (or the start or end of the string) on both sides of each value:
where classification regexp concat('(^|,)(', replace('4,9', ',', '|'), ')(,|$)')
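The delimiter-aware pattern is easy to check outside the database. A sketch with Python's re module, where the alternatives are wrapped in their own group so the start/end anchors apply to every value; the row E is a made-up case covering the 42 and 19 false positives, and csv_any_pattern is a hypothetical helper name.

```python
import re

def csv_any_pattern(wanted):
    """Build a regex matching a CSV string that contains any wanted value."""
    alternatives = "|".join(re.escape(v) for v in wanted.split(","))
    # (^|,) and (,|$) force a whole-value match, so 4 does not match 42.
    return re.compile(rf"(^|,)({alternatives})(,|$)")

pattern = csv_any_pattern("4,9")
rows = {"A": "1,4,7", "B": "2,3,7", "C": "3,4,9,8", "D": "1,5,6,9", "E": "42,19"}
matches = sorted(name for name, csv in rows.items() if pattern.search(csv))
print(matches)
```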
In mysql how can I get only rows from one table which do not link to any rows in another table with a specific ID
I have two tables with the following structures (unnecessary columns trimmed out):
-----------------    ---------------------
| mod_personnel |    | mod_skills        |
|               |    |                   |
| - prs_id      |    | - prs_id          |
| - name        |    | - skl_id          |
-----------------    |                   |
                     ---------------------
There may be 0 to many rows in the skills table for each prs_id.
What I want is all the personnel records which do NOT have an associated skill record with skl_id 1. In plain English: "I want all the people who do not have the skill x".
Currently, I have only been able to do it with the following nested select, but I am hoping to find a faster way.
SELECT * FROM `mod_personnel`
WHERE `prs_id` NOT IN (
  SELECT `prs_id` FROM `mod_skills` WHERE `skl_id` = 1
)
This may be faster:
SELECT `mod_personnel`.*
FROM `mod_personnel`
left outer join `mod_skills`
  on `mod_skills`.`prs_id` = `mod_personnel`.`prs_id`
  and `mod_skills`.`skl_id` = 1
WHERE `mod_skills`.`prs_id` is null;
Using a NOT EXISTS might be faster.
SELECT *
FROM `mod_personnel` p
WHERE NOT EXISTS (
  SELECT *
  FROM `mod_skills` s
  WHERE s.`prs_id` = p.`prs_id`
  AND s.`skl_id` = 1
);
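Before comparing speed, it is worth confirming that the three forms return the same rows. A quick Python check using the standard-library sqlite3 module as a stand-in for MySQL; the sample people and skills are invented for illustration, and timings in SQLite say nothing about MySQL's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mod_personnel (prs_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mod_skills (prs_id INTEGER, skl_id INTEGER);
INSERT INTO mod_personnel VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO mod_skills VALUES (1, 1), (1, 2), (2, 2);
""")

queries = {
    "not_in": """
        SELECT prs_id FROM mod_personnel
        WHERE prs_id NOT IN (SELECT prs_id FROM mod_skills WHERE skl_id = 1)
        ORDER BY prs_id""",
    "left_join": """
        SELECT p.prs_id FROM mod_personnel p
        LEFT JOIN mod_skills s ON s.prs_id = p.prs_id AND s.skl_id = 1
        WHERE s.prs_id IS NULL
        ORDER BY p.prs_id""",
    "not_exists": """
        SELECT p.prs_id FROM mod_personnel p
        WHERE NOT EXISTS (
            SELECT 1 FROM mod_skills s
            WHERE s.prs_id = p.prs_id AND s.skl_id = 1)
        ORDER BY p.prs_id""",
}

# Run each variant and collect the returned ids per label.
results = {label: [r[0] for r in conn.execute(sql)] for label, sql in queries.items()}
print(results)
```

Only person 1 has skill 1, so every variant should return persons 2 and 3, including person 3 who has no skill rows at all.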
Query two tables where one row is like another
I have two tables. The first table looks like:
| id | Number |
+--------+-------------+
| 1 | WDX |
| 2 | ABd32 |
| 3 | CACY |
and the second like:
| id | realNumber |
+--------+-------------+
| 1 | w_WDX_zed |
| 2 | ABd32_ala |
| 3 | guava |
The output needs to look like:
| id | output |
+--------+-------------+
| 1 | w_WDX_zed |
| 2 | ABd32_ala |
| 3 | CACY |
In the first table there are car plates, and in the second there are plates_username. I need to connect them and update the first table to match. I was trying to do it like this:
UPDATE `TAB_a` a, `TAB_b` b
SET a.`Number` = b.`realNumber`
WHERE a.`Number` LIKE CONCAT('%',b.`realNumber`,'%')
AND a.Number <> b.`realNumber`;
But that does not work.
update plates p
inner join plate_users pu
  on pu.realNumber like concat(concat('%', p.number), '%')
set p.number = pu.number
This is, however, fraught with danger: if there are plates that are substrings of another plate, you're likely to get unwanted results. if realNumber is supposed to be , how come the first one is _? If those types weren't in there, it would be a lot easier and safer.
EDIT
OK, after a little more info below, here are two more options you can try:
update plates p
inner join plate_users pu
  on pu.number like concat('%\_', p.number)
  or pu.number like concat(p.number, '\_%')
  or pu.number like concat(concat('%\_', p.number), '\_%')
set p.number = pu.number
Fiddle: http://sqlfiddle.com/#!9/e8999/1
and
update plates p
inner join plate_users pu
  on pu.number REGEXP concat(concat('.*_?', p.number), '_?.*')
set p.number = pu.number
Fiddle: http://sqlfiddle.com/#!9/c5bc7/1
All of the above give the desired results on your minimal dataset, but I strongly suggest backing up your data before you run any of these on your live data. The last two options are preferable, because they require at least one underscore in the realNumber.
mysql select from 2 other columns in the same table
I have a table which looks like this but much longer...
| CategoryID | Category | ParentCategoryID |
+------------+----------+------------------+
| 23 | Screws | 3 |
| 3 | Packs | 0 |
I am aiming to retrieve one column from this, which in this instance would give me the following...
| Category |
+--------------+
| Packs/Screws |
Please excuse me for not knowing exactly how to word this. So far I can only think to split the whole table into multiple tables and use LEFT JOIN; this seems like a very good opportunity for a learning curve, however. I realise that CONCAT() will come into play when combining the two retrieved Category names, but beyond that I am stumped.
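SELECT CONCAT(y.category,'/',x.category) Category
FROM my_table x
JOIN my_table y
  ON y.categoryid = x.parentcategoryid
[WHERE x.parentcategoryid = 0]
Here x is the child row and y is its parent, so the parent name goes first to produce Packs/Screws.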
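The self-join covers exactly one level of nesting. If the hierarchy can be deeper, one option is to walk the parent links in application code; a minimal Python sketch, assuming the table has been loaded into a dict and that a ParentCategoryID of 0 (or any id not in the table) marks a top-level category. The name category_path is made up for illustration.

```python
def category_path(categories, category_id):
    """Build 'Parent/Child' by following ParentCategoryID links upward."""
    parts = []
    seen = set()  # guards against accidental cycles in the data
    while category_id in categories and category_id not in seen:
        seen.add(category_id)
        name, parent_id = categories[category_id]
        parts.append(name)
        category_id = parent_id
    return "/".join(reversed(parts))

# CategoryID -> (Category, ParentCategoryID), from the question's table.
categories = {23: ("Screws", 3), 3: ("Packs", 0)}
print(category_path(categories, 23))
```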
Mysql select IN return null if id not exists
I have a table like this:
+----+---------+---------+
| Id | column1 | column2 |
+----+---------+---------+
| 1 | a | b |
| 2 | a | b |
+----+---------+---------+
and a query like this:
SELECT * FROM table WHERE id IN (1,2,3)
What query do I need to get a result like this (I need to get null values for nonexistent IDs)?
+----+---------+---------+
| Id | column1 | column2 |
+----+---------+---------+
| 1 | a | b |
| 2 | a | b |
| 3 | null | null |
+----+---------+---------+
EDIT
Thanks for the responses so far. Is there a more 'dynamic' way to do this? The query above is just an example. In reality I need to check around 1000 IDs!
You could use something like this:
SELECT ids.ID, your_table.column1, your_table.column2
FROM (SELECT 1 as ID UNION ALL SELECT 2 UNION ALL SELECT 3) ids
left join your_table
  on ids.ID = your_table.ID
The first subquery returns each value you need in a different row. Then you can try to join each row with your_table. If you use a left join, all values from the first subquery are shown, and if there's a match with your_table, values from your_table are shown; otherwise you will get nulls.
That is not the way SQL works, unfortunately. I would think it would be pretty trivial for your application to determine the differences between the IDs it asked for and the IDs returned. So rather than a hack or some weird query to mock up your result, why not have your application handle it? I still can't understand, though, what the use case might be where you would be querying rows in the database by IDs that may or may not exist.
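A sketch of what handling it in the application could look like in Python, assuming the query result arrives as a list of (id, column1, column2) tuples; rows_with_gaps is a hypothetical helper name.

```python
def rows_with_gaps(requested_ids, found_rows):
    """Return one row per requested id, padding missing ids with None."""
    by_id = {row[0]: row for row in found_rows}
    return [by_id.get(i, (i, None, None)) for i in requested_ids]

# Result of: SELECT * FROM table WHERE id IN (1,2,3)
found = [(1, "a", "b"), (2, "a", "b")]
result = rows_with_gaps([1, 2, 3], found)
print(result)
```

This works the same for 3 or 1000 requested IDs, since the lookup is a single pass over the returned rows.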