Using MySQL "NOT IN" but allowing for substrings

I am trying to query for rows that are not in another set of rows. However, the other set of rows may contain strings that include strings from the first table.
I'm confusing myself trying to explain so I'll use the following example tables:
mysql> DESCRIBE tablea;
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| name | char(40) | NO | PRI | | |
+------------+----------+------+-----+---------+-------+
mysql> DESCRIBE tableb;
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| nametag | char(40) | NO | PRI | | |
+------------+----------+------+-----+---------+-------+
mysql> SELECT name FROM tablea;
+------------+
| name |
+------------+
| cat |
| dog |
| cow |
+------------+
mysql> SELECT nametag FROM tableb;
+------------+
| nametag |
+------------+
| wolf |
| dog |
| browncow |
+------------+
I am trying to find a method similar to the NOT IN operation. However, because cow is "in" browncow, I also want to exclude this value.
mysql> SELECT name FROM tablea WHERE name NOT IN ( SELECT nametag FROM tableb );
+------------+
| name |
+------------+
| cat |
| cow |
+------------+
# I am looking for something that would only return "cat" for this example.
Is there any operation where I can search for rows that aren't contained in another set with additional modifiers?

You could use an anti-join pattern, with a LIKE predicate to do the matching. (The anti-join is an outer join that returns all the rows from one table, along with any matches from another table, followed by a predicate that excludes the rows that had a match.)
SELECT a.name
FROM tablea a
LEFT JOIN tableb b
ON b.nametag LIKE CONCAT('%',a.name,'%')
WHERE b.nametag IS NULL
(Any rows from a that had a matching row from b... the row from b will have a non-NULL value. Or, to put it another way... rows from a that didn't have a matching row in b will have a NULL value for the columns from b.)
If there's a row in a that has name='cow', and a row from b that has nametag='browncow', those rows will match.
The row from a with name='cat' will only be returned if the string 'cat' doesn't appear in any values of b.nametag.
NOTE: The percent and underscore characters are wildcards in the LIKE predicate. If you want to match those characters literally, you need to "escape" them with a backslash. There are similar issues with a REGEXP match, but with many more special characters to watch out for.
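The escaping described in the note can be sketched as follows. This is a minimal illustration rather than part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the helper names escape_like and contains are hypothetical. SQLite needs an explicit ESCAPE clause, whereas in MySQL the backslash is already the default escape character for LIKE.

```python
import sqlite3


def escape_like(value: str) -> str:
    """Backslash-escape the LIKE wildcards (hypothetical helper)."""
    # Escape the backslash first so the escapes added below are not doubled.
    return (
        value.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


def contains(conn, haystack: str, needle: str, escape: bool) -> bool:
    """Return True if haystack LIKE '%needle%' holds."""
    pattern = "%" + (escape_like(needle) if escape else needle) + "%"
    row = conn.execute(
        "SELECT ? LIKE ? ESCAPE '\\'", (haystack, pattern)
    ).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
# Unescaped, the underscore in 'a_b' matches any single character.
print(contains(conn, "xaXbx", "a_b", escape=False))
# Escaped, only a literal underscore matches.
print(contains(conn, "xaXbx", "a_b", escape=True))
print(contains(conn, "xa_bx", "a_b", escape=True))
```

The same idea applies to values stored in a.name: if they can contain % or _, they would have to be escaped before being wrapped in the surrounding % wildcards.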
There are other query patterns that will return an equivalent result.
For example:
SELECT a.name
FROM tablea a
WHERE NOT EXISTS
( SELECT 1
FROM tableb b
WHERE b.nametag LIKE CONCAT('%',a.name,'%')
)
Personally, I prefer the anti-join pattern.
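For anyone who wants to try both patterns without a MySQL server, here is a small sketch using Python's built-in sqlite3 module with the sample data from the question. It is an approximation, not the exact MySQL statements: SQLite concatenates with || instead of CONCAT, and the column types are simplified to TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tablea (name TEXT PRIMARY KEY);
    CREATE TABLE tableb (nametag TEXT PRIMARY KEY);
    INSERT INTO tablea VALUES ('cat'), ('dog'), ('cow');
    INSERT INTO tableb VALUES ('wolf'), ('dog'), ('browncow');
    """
)

# Anti-join: keep rows of tablea for which the outer join found no match.
anti_join = """
    SELECT a.name
    FROM tablea a
    LEFT JOIN tableb b
      ON b.nametag LIKE '%' || a.name || '%'
    WHERE b.nametag IS NULL
"""

# Equivalent NOT EXISTS form.
not_exists = """
    SELECT a.name
    FROM tablea a
    WHERE NOT EXISTS (
        SELECT 1 FROM tableb b
        WHERE b.nametag LIKE '%' || a.name || '%'
    )
"""

anti_join_names = [row[0] for row in conn.execute(anti_join)]
not_exists_names = [row[0] for row in conn.execute(not_exists)]
print(anti_join_names)   # ['cat']
print(not_exists_names)  # ['cat']
```

Note that the anti-join only works with an outer join: with an inner join, unmatched rows of tablea never reach the WHERE clause, so nothing would be returned.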

Use "NOT EXISTS" along with INSTR. Note that INSTR(str, substr) looks for its second argument inside its first, so the nametag goes first:
select *
from tablea a
where not exists(select 1 from tableb b where INSTR(b.nametag, a.name) > 0)
;
To exclude empty strings:
select *
from tablea a
where not exists(select 1 from tableb b where INSTR(b.nametag, a.name) > 0)
and length(a.name) > 0
;
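Argument order matters with INSTR: the first argument is the string being searched and the second is the substring to look for. A quick way to check this is the sketch below, which uses Python's built-in sqlite3 module as a stand-in (SQLite's instr takes its arguments in the same order as MySQL's INSTR; the table setup mirrors the question's sample data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# instr(haystack, needle): 1-based position of needle, or 0 if absent.
found = conn.execute("SELECT instr('browncow', 'cow')").fetchone()[0]
swapped = conn.execute("SELECT instr('cow', 'browncow')").fetchone()[0]
print(found, swapped)  # 6 0

conn.executescript(
    """
    CREATE TABLE tablea (name TEXT PRIMARY KEY);
    CREATE TABLE tableb (nametag TEXT PRIMARY KEY);
    INSERT INTO tablea VALUES ('cat'), ('dog'), ('cow');
    INSERT INTO tableb VALUES ('wolf'), ('dog'), ('browncow');
    """
)

# Search each nametag for the name, and keep names found in none of them.
remaining = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.name
        FROM tablea a
        WHERE NOT EXISTS (
            SELECT 1 FROM tableb b WHERE instr(b.nametag, a.name) > 0
        )
        """
    )
]
print(remaining)  # ['cat']
```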

Related

Very Strange MySQL Results

Can someone explain this... before I have myself committed? The first result set should have two results, same as the second, no?
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE attribute_id=31 AND entity_id=324134;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
1 row in set (0.00 sec)
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE value_id=885263 OR value_id=950181;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
| 950181 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
2 rows in set (0.00 sec)
attribute_id is a SMALLINT(5)
entity_id is an INT(10)

The problem is that you have a unique index on (entity_id,attribute_id). The query optimizer notices this when you write a query whose WHERE clause is covered by the index, and only returns 1 row since the uniqueness of the index implies that there's at most one matching row.
I'm not sure how you can have those duplicates in the first place; it seems like something is corrupted in the table. Adding a unique index to a table that already contains duplicates normally fails with a duplicate-key error. Only the older ALTER IGNORE TABLE syntax silently dropped the duplicate rows, which is why it was often suggested as a way to get rid of duplicates in a table; see How do I delete all the duplicate records in a MySQL table without temp tables.
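One way to confirm which (entity_id, attribute_id) pairs are duplicated is to group by those two columns. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the table is created without the unique index, and the third inserted row is a made-up non-duplicate added for contrast. The GROUP BY query itself is plain SQL, but whether it surfaces the duplicates on a damaged MySQL table depends on how the server executes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_address_entity_varchar (
        value_id INTEGER PRIMARY KEY,
        entity_type_id INTEGER,
        attribute_id INTEGER,
        entity_id INTEGER,
        value TEXT
    );
    -- The first two rows come from the question; the third is hypothetical.
    INSERT INTO customer_address_entity_varchar VALUES
        (885263, 2, 31, 324134, NULL),
        (950181, 2, 31, 324134, NULL),
        (950182, 2, 32, 324134, NULL);
    """
)

# List every (entity_id, attribute_id) pair that occurs more than once.
duplicates = conn.execute(
    """
    SELECT entity_id, attribute_id, COUNT(*) AS copies
    FROM customer_address_entity_varchar
    GROUP BY entity_id, attribute_id
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)  # [(324134, 31, 2)]
```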

In the first statement's selection (the 'WHERE' clause), you are using AND; in the second statement, you are using OR. This boils down to the definition of these logical operators. MySQL's official documentation doesn't say much about these other than that AND and OR are their own natural logical operators. If this is confusing, you may want to read up on basic Boolean Algebra.

My mysql statement to query by primary key sometimes returns more than one row, so what happened?

My schema is this:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_name` varchar(10) NOT NULL,
`account_type` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
INSERT INTO user VALUES (1, "zhangsan", "premiumv"), (2, "lisi", "premiumv"), (3, "wangwu", "p"), (4, "maliu", "p"), (5, "hengqi", "p"), (6, "shuba", "p");
I have the following 6 rows in the table:
+----+-----------+--------------+
| id | user_name | account_type |
+----+-----------+--------------+
| 1 | zhangsan | premiumv |
| 2 | lisi | premiumv |
| 3 | wangwu | p |
| 4 | maliu | p |
| 5 | hengqi | p |
| 6 | shuba | p |
+----+-----------+--------------+
Here is the MySQL statement I use to query the table by id:
SELECT * FROM user WHERE id = floor(rand()*6) + 1;
I expect it to return one row, but the actual result is unpredictable: it returns zero rows, one row, or sometimes more than one row. Can somebody help clarify this? Thanks!

You're testing each row against a different random number, so sometimes multiple rows will match. To fix this, calculate the random number once in a subquery.
SELECT u.*
FROM user AS u
JOIN (SELECT floor(rand()*6) + 1 AS r) AS r
ON u.id = r.r
This method of selecting a random row from a table seems like a poor design. If there are any gaps in the id sequence (which can happen easily -- MySQL doesn't guarantee that they'll always be sequential, and deleting rows will leave gaps) then it could return an empty result. The usual way to select a random row from a table is with:
SELECT *
FROM user
ORDER BY RAND()
LIMIT 1
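The difference between drawing a random number per row and drawing it once can be simulated outside the database. The following is a sketch in plain Python, not MySQL itself: the function names per_row and once are hypothetical, and the fixed seed is only there to make runs repeatable.

```python
import random

IDS = [1, 2, 3, 4, 5, 6]


def per_row(rng: random.Random) -> list:
    """Mimic WHERE id = FLOOR(RAND()*6) + 1: a fresh draw for every row."""
    return [i for i in IDS if i == int(rng.random() * 6) + 1]


def once(rng: random.Random) -> list:
    """Mimic the derived-table join: draw once, then compare every row."""
    r = int(rng.random() * 6) + 1
    return [i for i in IDS if i == r]


rng = random.Random(0)  # fixed seed so runs are repeatable
per_row_counts = {len(per_row(rng)) for _ in range(1000)}
once_counts = {len(once(rng)) for _ in range(1000)}
print(sorted(per_row_counts))  # several different row counts, including 0
print(sorted(once_counts))     # always exactly one row
```

With a fresh draw per row, each row matches independently with probability 1/6, which is why the number of returned rows varies from run to run.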

The WHERE part must be evaluated for each row to see if there is a match. Because of this, the rand() function is evaluated for every row. Getting an inconsistent number of rows seems reasonable.
If you add LIMIT 1 to your query, the probability of returning rows from the end diminishes.

It's because the expression floor(rand()*6) + 1 in the WHERE clause is evaluated against every row in the table to see whether that row matches. The value can be different each time it is compared against a row from the table.
You can test this with a table that has repeated values in the column used in the WHERE clause, and you can see the result:
select * from test;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | b |
| 1 | c |
| 2 | d |
| 1 | e |
| 2 | f |
+------+------+
select * from test where id = floor(rand()*2) + 1;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | d |
| 1 | e |
+------+------+
In the above example, the expression floor(rand()*2) + 1 returns 1 when matching against the first row (with name = 'a'), so that row is included in the result set. But then it returns 2 when matching against the fourth row (with name = 'd'), so that row is also included in the result set, even though its id differs from the id of the first row in the result set.

MySQL: Duplicate columns having the same name in join

Two questions on SO seem to be asking about different behaviour for equivalent MySQL queries. In both cases a join is being performed on tables having identical column names. One poster is asking how to eliminate duplicated columns having the same name from the result, and another poster is asking how to achieve the duplication of columns having the same name in the result.
To test this I created toy tables:
mysql> describe table_1;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| col_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+-------+
mysql> describe table_2;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| col_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+-------+
I inserted the value 'value' into both tables and executed these joins, where JOIN_OP is either "join" or "left join":
mysql> select * from table_1 as t1 JOIN_OP table_2 as t2 on t1.col_name = t2.col_name;
+----------+----------+
| col_name | col_name |
+----------+----------+
| value | value |
+----------+----------+
This result conforms to the results in the first post. What is the difference between the two queries and the two results? Why is the second poster not seeing any duplication?

By default MySQL will return all columns for all tables if you use *. Two columns are returned as one is for table_1 and other for table_2. You will need to explicitly enter column names in your query to retrieve them the way you want. Use the query as follows:
select t1.col_name from table_1 as t1 JOIN_OP table_2 as t2 on t1.col_name = t2.col_name;

SQL:2003 rules say that if you use the USING() join condition, the redundant column is eliminated from the select-list, because it's totally clear that the columns have the same name and the same value.
mysql> select * from table_1 as t1 JOIN table_2 as t2 USING (col_name);
+----------+
| col_name |
+----------+
| value |
+----------+
Whereas if you use the more verbose JOIN ON syntax, and you use select *, the select-list retains a separate column from each of the joined tables, even though in this case we can tell that the value is bound to be identical.
But apparently SQL isn't smart enough to make that inference, because a condition in the ON clause could be something other than equality, e.g. it could be ON t1.col_name <> t2.col_name, therefore both columns should be retained in the select-list because they'll have different values.
MySQL 5.0.12 and later supports this standard behavior (several fixes were made to join semantics in MySQL 5.0.12, to be more standard-compliant).
You can see more discussion here: http://bugs.mysql.com/bug.php?id=6489
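The USING versus ON difference is easy to observe by looking at the result column names. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also drops the redundant column for USING); the helper name column_names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_1 (col_name TEXT);
    CREATE TABLE table_2 (col_name TEXT);
    INSERT INTO table_1 VALUES ('value');
    INSERT INTO table_2 VALUES ('value');
    """
)


def column_names(sql: str) -> list:
    """Run a query and return the names of its result columns."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]


on_columns = column_names(
    "SELECT * FROM table_1 AS t1 JOIN table_2 AS t2 "
    "ON t1.col_name = t2.col_name"
)
using_columns = column_names(
    "SELECT * FROM table_1 AS t1 JOIN table_2 AS t2 USING (col_name)"
)
print(len(on_columns))  # 2: one column per joined table
print(using_columns)    # ['col_name']: the redundant column is dropped
```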

Pattern for LIKE in SQL Statement

I would like to select all rows that start with any letter.
SELECT * FROM table WHERE field LIKE '[a-z]%' ;
The type of rows I would like to find look like this:
ID DATA
993 DEF055900960
994 DEF055900961
995 DEF055900964
996 DEF056102254
997 DEF056131201
I have unsuccessfully tried RLIKE and REGEXP, and also added upper case A-Z to the pattern.
Why is the following not working?

SELECT * FROM table WHERE field RLIKE '[a-z]' ;
I have already read up on pattern matching in MySQL.

Try this instead:
SELECT * FROM table WHERE field LIKE '[^A-Z]%';

Try this.
SELECT `firstname`
FROM `users`
WHERE firstname
REGEXP BINARY '^[A-Z]'

Use REGEXP
http://dev.mysql.com/doc/refman/5.1/en/regexp.html#operator_regexp
For example, assume we have this data set.
mysql> select * from a;
+------+
| b |
+------+
| abc |
| zxxb |
| kkfy |
| 0002 |
+------+
4 rows in set
We want to select everything with the a-z pattern
mysql> SELECT * FROM a WHERE b REGEXP BINARY '[a-z]';
+------+
| b |
+------+
| abc |
| zxxb |
| kkfy |
+------+
3 rows in set
For your data set, use the regular expression [a-zA-Z0-9]:
mysql> SELECT * FROM a WHERE b REGEXP BINARY '[a-zA-Z0-9]';
+--------------+
| b |
+--------------+
| DEF055900960 |
| DEF055900961 |
| DEF055900964 |
| DEF056102254 |
| DEF056131201 |
+--------------+
5 rows in set
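One detail worth spelling out: an unanchored character class matches if the character occurs anywhere in the value, while a leading ^ restricts the match to the first character. The sketch below illustrates this with Python's re module rather than MySQL, on the assumption that these simple patterns (a character class and the ^ anchor) behave the same way in MySQL's REGEXP. The value '12ab' is a made-up row added to show the difference.

```python
import re

rows = ["abc", "zxxb", "kkfy", "0002", "DEF055900960", "12ab"]

# Unanchored: true if a lowercase letter occurs anywhere in the value.
anywhere = [r for r in rows if re.search(r"[a-z]", r)]

# Anchored with ^: true only if the value starts with a letter.
starts_with_letter = [r for r in rows if re.search(r"^[A-Za-z]", r)]

print(anywhere)            # ['abc', 'zxxb', 'kkfy', '12ab']
print(starts_with_letter)  # ['abc', 'zxxb', 'kkfy', 'DEF055900960']
```

In MySQL terms, the anchored version would correspond to something like WHERE field REGEXP '^[A-Za-z]'.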

Why is this MySQL JOIN statement returning more results?

I have two tables (characteristic_list and measure_list) that are related to each other by a column called 'm_id'. I want to retrieve records using filters (columns from characteristic_list) within a date range (columns from measure_list). When I run the following SQL using INNER JOIN, it takes a while to retrieve the records. What am I doing wrong?
mysql> explain select c.power_set_point, m.value, m.uut_id, m.m_id, m.measurement_status, m.step_name
-> from measure_list as m INNER JOIN characteristic_list as c ON (m.m_id=c.m_id)
-> WHERE (m.sequence_end_time BETWEEN '2010-06-18' AND '2010-06-20');
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 82952 | |
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 85321 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)
mysql> select count(*) from measure_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.18 sec)
mysql> select count(*) from characteristic_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.10 sec)

The reason this query takes a while to execute is that it has to scan both tables in full: "ALL" in the type column of the EXPLAIN output means a full table scan, which you generally want to avoid. To speed things up, you need to make smart decisions about what to index.
See the following documents at the MySQL site:
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html

As an add-on to the previous answer by Dan, you should consider indexing the join columns and the columns used in the WHERE clause. In this case, that means the m_id columns in both tables and the sequence_end_time column in the measure_list table. The tables are small enough that you could add an index, run EXPLAIN and time the query, then change the index and compare. It should be relatively quick to solve.
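The add-an-index-then-compare loop can be sketched as follows. This is an illustration under stated assumptions, not the MySQL session itself: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN in place of MySQL's EXPLAIN, the tables are reduced to a few of the columns from the question, and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measure_list (
        m_id INTEGER, value REAL, sequence_end_time TEXT
    );
    CREATE TABLE characteristic_list (
        m_id INTEGER, power_set_point REAL
    );
    """
)

QUERY = """
    SELECT c.power_set_point, m.value
    FROM measure_list AS m
    INNER JOIN characteristic_list AS c ON m.m_id = c.m_id
    WHERE m.sequence_end_time BETWEEN '2010-06-18' AND '2010-06-20'
"""


def plan(sql: str) -> list:
    """Return the human-readable steps of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)

# Index names are hypothetical; the columns are the join and WHERE columns.
conn.executescript(
    """
    CREATE INDEX idx_measure_m_id ON measure_list (m_id);
    CREATE INDEX idx_measure_end_time ON measure_list (sequence_end_time);
    CREATE INDEX idx_char_m_id ON characteristic_list (m_id);
    """
)
after = plan(QUERY)

print(before)  # plan without any of the named indexes
print(after)   # plan that mentions at least one idx_ index
```

The CREATE INDEX statements use the same syntax in MySQL; there you would rerun EXPLAIN on the original query and check that the type column no longer shows ALL for both tables.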
