I want to create a temporary table from a SELECT statement in MySQL. It involves several JOINs, and it can produce NULL values that I want MySQL to store as zeroes. It sounds like an easy problem (simply default to zero), but MySQL (5.6.12) fails to apply the default value.
For example, take the following two tables:
mysql> select * from TEST1;
+------+------+
| a | b |
+------+------+
| 1 | 2 |
| 4 | 25 |
+------+------+
2 rows in set (0.00 sec)
mysql> select * from TEST2;
+------+------+
| b | c |
+------+------+
| 2 | 100 |
| 3 | 100 |
+------+------+
2 rows in set (0.00 sec)
A left join gives:
mysql> select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 2 | 100 |
| 4 | 25 | NULL |
+------+------+------+
2 rows in set (0.00 sec)
Now, if I want to save these values in a temporary table (replacing NULL with zero), this is the code I would use:
mysql> create temporary table TEST_JOIN (a int, b int, c int default 0 not null)
select TEST1.*,c from TEST1 left join TEST2 on TEST1.b=TEST2.b;
ERROR 1048 (23000): Column 'c' cannot be null
What am I doing wrong? The worst part is that this code used to work before I did a system-wide upgrade (I don't remember which version of MySQL I had, but surely it was lower than my current 5.6). It used to produce the behavior I would expect: if it's NULL, use the default, not the frustrating error I'm getting now.
From the documentation of 5.6 (unchanged since 4.1):
Inserting NULL into a column that has been declared NOT NULL. For
multiple-row INSERT statements or INSERT INTO ... SELECT statements,
the column is set to the implicit default value for the column data
type. This is 0 for numeric types, the empty string ('') for string
types, and the “zero” value for date and time types. INSERT INTO ...
SELECT statements are handled the same way as multiple-row inserts
because the server does not examine the result set from the SELECT to
see whether it returns a single row. (For a single-row INSERT, no
warning occurs when NULL is inserted into a NOT NULL column. Instead,
the statement fails with an error.)
My current workaround is to store the NULL values in the temporary table and then replace them with zeroes, but that seems rather cumbersome with many columns (and terribly inefficient). Is there a better way to do it?
BTW, I cannot simply ignore some columns in the query (as suggested for another question), because it's a multirow query.
IFNULL(`my_column`,0);
That would set NULLs to 0. Other values stay as is.
Just wrap your values/column names with IFNULL and it will convert them to whatever default value you put into the function. E.g. 0. Or "european swallow", or whatever you want.
Then you can keep strict mode on and still handle NULLs gracefully.
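To make the suggestion concrete, here is a minimal sketch of the IFNULL approach applied to the TEST1/TEST2 example from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable anywhere); IFNULL and LEFT JOIN are used the same way in MySQL, but this sketch does not exercise MySQL's strict mode itself.

```python
import sqlite3

# SQLite is a stand-in for MySQL here (assumption, for a runnable demo only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST1 (a INT, b INT);
    CREATE TABLE TEST2 (b INT, c INT);
    INSERT INTO TEST1 VALUES (1, 2), (4, 25);
    INSERT INTO TEST2 VALUES (2, 100), (3, 100);

    -- Same column definitions as in the question, including NOT NULL on c.
    CREATE TEMPORARY TABLE TEST_JOIN (a INT, b INT, c INT NOT NULL DEFAULT 0);

    -- IFNULL turns the NULL produced by the unmatched LEFT JOIN row into 0
    -- before it ever reaches the NOT NULL column.
    INSERT INTO TEST_JOIN
        SELECT TEST1.a, TEST1.b, IFNULL(TEST2.c, 0)
        FROM TEST1 LEFT JOIN TEST2 ON TEST1.b = TEST2.b;
""")

rows = conn.execute("SELECT a, b, c FROM TEST_JOIN ORDER BY a").fetchall()
print(rows)
```

The row for a = 4 has no match in TEST2, so its c value is stored as 0 instead of triggering the NOT NULL error.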
When playing with MySQL I noticed that if I run
SELECT * FROM table WHERE !value;
it returns the same thing as
SELECT * FROM table WHERE value IS NOT NULL;
The table is "employees" and the column is "MiddleInitial" (strings).
So I get all employees whose MiddleInitial is not null.
Is this a proper shorthand or just a coincidence? Is it a safe way to write the query? I cannot seem to find any information on this.
I was expecting
SELECT * FROM table WHERE !value;
to return all null values. Oddly enough
SELECT * FROM table WHERE value;
returns nothing.
No, it's not.
It might seem like it because it's doing type coercion to force whatever is in the column into a boolean value. NULL values will coerce to false when forced as a boolean predicate, and most non-null column values will coerce to true. But some column values (that are not null) will also coerce to false.
You can see examples here:
https://dbfiddle.uk/ABLUgLex
Notice the last example is missing the 0 row. Also notice the one before that does not include the null row, which leads me to suspect your server might have an option set for non-standard null handling.
Here are a few more samples:
https://dbfiddle.uk/BNxiujKt
Notice the treatment of the '1' row.
In SQL, NULL is not the same as false.
Negating NULL is not true, it's still NULL.
mysql> select null;
+------+
| NULL |
+------+
| NULL |
+------+
mysql> select !(null);
+---------+
| !(null) |
+---------+
| NULL |
+---------+
Think of NULL as the value "unknown." If some piece of information is unknown, how can its opposite be known? It can't — the opposite is also unknown, because we don't know what we started with.
When used in a WHERE clause condition, NULL acts more or less like false because neither is strictly true. Only rows where the conditions are true become part of the result set of the query.
There are other values that act like false in MySQL:
mysql> select 1 where '';
Empty set (0.01 sec)
mysql> select 1 where 0;
Empty set (0.01 sec)
MySQL is a bit nonstandard because the boolean values true and false are literally the same as the integer values 1 and 0 respectively (this is not the way booleans are implemented in most other brands of SQL database).
These values are not NULL, so they can be negated and you can treat their opposites as true.
mysql> select 1 where !0;
+---+
| 1 |
+---+
| 1 |
+---+
mysql> select 1 where !'';
+---+
| 1 |
+---+
| 1 |
+---+
As the comment above said, the ! operator is deprecated in MySQL 8.0. It's not standard SQL, and using it makes your code less clear than if you use more explicit language like IS NOT NULL or <>.
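The three-valued logic described in this answer can be checked with a small sketch. It runs against Python's built-in sqlite3 module rather than MySQL (an assumption made for portability), and it uses the standard NOT keyword instead of MySQL's ! operator. Only NULL and integer cases are shown, since string-to-boolean coercion is product-specific.

```python
import sqlite3

# SQLite is used as a stand-in for MySQL (assumption); NOT replaces MySQL's "!".
conn = sqlite3.connect(":memory:")


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


negated_null = rows("SELECT NOT NULL")            # negating NULL yields NULL
where_null = rows("SELECT 1 WHERE NULL")          # NULL is not true: no row
where_not_null = rows("SELECT 1 WHERE NOT NULL")  # still NULL: no row
where_not_zero = rows("SELECT 1 WHERE NOT 0")     # NOT 0 is true: one row

print(negated_null, where_null, where_not_null, where_not_zero)
```

Neither `WHERE NULL` nor `WHERE NOT NULL` returns a row, which is why negating a nullable column can never be used to find the NULL rows.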
I have a MEDIUMTEXT blob in a table, which contains paths separated by newline characters. I'd like to add a "/" to the beginning of each line if it is not already there. Is there a way to write a query to do this with built-in functions?
I suppose an alternative would be to write a Python script to get the field, convert to a List, process each line and update the record. There aren't that many records in the DB, so I can take the processing delay (if it doesn't lock the entire DB or table). About 8K+ rows.
Either way would be fine. If the second option is recommended, are there specific locking semantics I need to know about before getting into this? It would run on a live production DB (of course, I'd take a DB snapshot first), and in-place updates would be best to avoid downtime.
Demo:
mysql> create table mytable (id int primary key, t text );
mysql> insert into mytable values (1, 'path1\npath2\npath3');
mysql> select * from mytable;
+----+-------------------+
| id | t |
+----+-------------------+
| 1 | path1
path2
path3 |
+----+-------------------+
1 row in set (0.00 sec)
mysql> update mytable set t = concat('/', replace(t, '\n', '\n/'));
mysql> select * from mytable;
+----+----------------------+
| id | t |
+----+----------------------+
| 1 | /path1
/path2
/path3 |
+----+----------------------+
Note that this prefixes every line, including any line that already starts with "/". However, I would strongly recommend storing each path on its own row, so you don't have to think about this. In SQL, each column should store one value per row, not a set of values.
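For the script-based route mentioned in the question, the per-line logic might look like the sketch below. The function name `add_leading_slash` is hypothetical, and leaving empty lines untouched is an assumption about the desired behavior; fetching and updating the rows is left out.

```python
def add_leading_slash(blob: str) -> str:
    """Prefix each line with "/" unless it already starts with one.

    Empty lines are left untouched (an assumption about what is wanted).
    """
    fixed = []
    for line in blob.split("\n"):
        if line and not line.startswith("/"):
            line = "/" + line
        fixed.append(line)
    return "\n".join(fixed)


example = add_leading_slash("path1\n/path2\npath3")
print(example)
```

Because already-prefixed lines are skipped, running the function twice gives the same result as running it once, so a re-run of the script on a partially updated table would be harmless.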
I'm trying to understand a huge performance difference that I'm seeing in equivalent code. Or at least code I think is equivalent.
I have a table with about 10 million records in it. It contains an indexed field, defined as:
USPatentNum char(8)
If I compare the column against a user variable set within MySQL, the query takes over 218 seconds. The exact same query with a string literal takes under a quarter of a second.
In the code below, the first select statement (with where USPatentNum = @pn;) takes forever, but the second, with the literal value
(where USPatentNum = '5288812';) is nearly instant:
mysql> select @pn := '5288812';
+------------------+
| @pn := '5288812' |
+------------------+
| 5288812 |
+------------------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = @pn;
+----------+-------------+------------+
| patentId | USPatentNum | grantDate |
+----------+-------------+------------+
| 306309 | 5288812 | 1994-02-22 |
+----------+-------------+------------+
1 row in set (3 min 38.17 sec)
mysql> select @pn;
+---------+
| @pn     |
+---------+
| 5288812 |
+---------+
1 row in set (0.00 sec)
mysql> select patentId, USPatentNum, grantDate from patents where USPatentNum = '5288812';
+----------+-------------+------------+
| patentId | USPatentNum | grantDate |
+----------+-------------+------------+
| 306309 | 5288812 | 1994-02-22 |
+----------+-------------+------------+
1 row in set (0.21 sec)
Two questions:
Why is the use of @pn so much slower?
Can I change the select statement so that the performance will be the same?
Declare @pn as char(8) before setting its value.
I suspect it is a varchar as you do it now. If so, the performance loss is because MySQL can't match the index with your variable.
It doesn't matter whether you use a constant or @var. You get different timings because the second time MySQL returns the result from its cache. If you run your scenario again with a different value, but swap the order of the constant query and the @var query, you will see the same pattern: the first query will be slow and the second will be fast.
Hope it helps
I have a MySQL table with 2 columns, and each column has thousands of records.
For example, there are 15000 email addresses in column1 and 15005 email addresses in column2.
How can I find the 5 records out of the 15005 that have no match in column1?
I want a MySQL query that compares both columns and returns only the 5 unmatched records.
Thanks
Not sure if I got it right, but would it be something like this?
select column2 from table
where column2 not in (select column1 from table)
Richard, it's highly unusual to find matching/missing rows from one column in a table compared against another column in the same table.
You can think of a table as being a collection of facts, with each row being one fact. Converting values into predicates is how we understand the data. The value "12" in one table may mean "there exists a day on which 12 widgets were made," or "12 people bought widgets on Jan. 1," or "on Jan. 12, no widgets were sold," but whatever the table's corresponding predicate is, "12" should represent a fact.
It's common to want to find the difference between two tables: "what facts are in B that aren't in A?" But in a table with two columns, each row should conceptually be a fact about that pair of values. Perhaps the predicate for the row (12, 13) might be "on Jan. 12, we sold 13 widgets." But in that case I doubt you'd be asking for this information.
So, if (12,13) is really two of the same predicate -- "someone in district 12 bought widgets, and also, someone in district 13 bought widgets" -- in the long run life will be easier if those are one column, not two. And if it's two different predicates, it would make more sense for them to be in two tables. SQL's flexible and can handle these situations, but you may run into more problems later. If you're interested in more about this subject, searching on "normalization" will find you way more than you want to know :)
Anyway, I think the query you're looking for uses a LEFT JOIN to compare the table against itself. I added the values 1-15000 to col1 and 1-15005 to col2 in this table:
CREATE TABLE `foo` (
`col1` int(11) DEFAULT NULL,
`col2` int(11) DEFAULT NULL,
KEY `idx_col1` (`col1`),
KEY `idx_col2` (`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
mysql> select count(distinct col1), count(distinct col2) from foo;
+----------------------+----------------------+
| count(distinct col1) | count(distinct col2) |
+----------------------+----------------------+
| 15000 | 15005 |
+----------------------+----------------------+
1 row in set (0.01 sec)
By giving the same table two names, I can compare its two columns against each other, and find the col2 values that have no corresponding col1 values -- in those cases, f1.col1 will be NULL:
mysql> select f2.col2
from foo as f2 left join foo as f1 on (f2.col2=f1.col1)
where f1.col1 is null;
+-------+
| col2 |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.03 sec)
Regarding Mosty's solution yesterday, I'm not sure it's correct. I try not to use subqueries, so I'm a little out of my depth here. But it doesn't seem to work for at least my attempt to replicate your data set:
mysql> select col2 from foo where col2 not in
(select col1 from foo);
Empty set (0.02 sec)
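It works if I exclude the 5 NULLs from the subquery, which suggests to me that "NOT IN (NULL)" doesn't necessarily work the way one might think it works. When the list contains a NULL, the comparison against it is unknown, so NOT IN evaluates to NULL rather than true for every value that has no match, and no rows pass the WHERE clause: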
mysql> select col2 from foo where col2 not in
(select col1 from foo where col1 is not null);
+-------+
| col2 |
+-------+
| 15001 |
| 15002 |
| 15003 |
| 15004 |
| 15005 |
+-------+
5 rows in set (0.02 sec)
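The three queries above can be reproduced on a much smaller data set. The sketch below uses Python's built-in sqlite3 module in place of MySQL (an assumption made for portability; the NULL handling of NOT IN shown here is standard SQL behavior), with col2 holding 1 to 15 and col1 holding 1 to 10 followed by five NULLs.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); the table mirrors `foo` in miniature.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 INT, col2 INT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(n if n <= 10 else None, n) for n in range(1, 16)],
)


def col2_values(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# NOT IN against a subquery that contains NULLs: nothing comes back.
naive = col2_values(
    "SELECT col2 FROM foo WHERE col2 NOT IN (SELECT col1 FROM foo) ORDER BY col2"
)

# Same query with the NULLs filtered out of the subquery.
filtered = col2_values(
    "SELECT col2 FROM foo WHERE col2 NOT IN "
    "(SELECT col1 FROM foo WHERE col1 IS NOT NULL) ORDER BY col2"
)

# Self LEFT JOIN, keeping the rows that found no partner.
anti_join = col2_values(
    "SELECT f2.col2 FROM foo AS f2 LEFT JOIN foo AS f1 ON f2.col2 = f1.col1 "
    "WHERE f1.col1 IS NULL ORDER BY f2.col2"
)

print(naive, filtered, anti_join)
```

The LEFT JOIN form is unaffected by the NULLs in col1 because a NULL never satisfies the join condition, so it simply never matches anything.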
The main reason I avoid subqueries in MySQL is that they have unpredictable performance characteristics, or at least, complex enough that I can't predict them. For more information, see the "O(MxN)" comment in http://dev.mysql.com/doc/refman/5.5/en/subquery-restrictions.html and the advice on the short webpage http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html .
For example, I create a database and a table from the CLI and insert some data:
CREATE DATABASE testdb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
USE testdb;
CREATE TABLE test (id INT, str VARCHAR(100)) ENGINE=InnoDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
INSERT INTO test VALUES (9, 'some string');
Now I can run the following, and both statements work (so the quotes don't seem to affect anything):
SELECT * FROM test WHERE id = '9';
INSERT INTO test VALUES ('11', 'some string');
So in these examples I've selected a row using a string for a value that is actually stored as INT in MySQL, and then I inserted a string into an INT column.
I don't quite get why this works the way it does. Why can a string be inserted into an INT column?
Can I insert all MySQL data types as strings?
Is this behavior standard across different RDBMS?
MySQL is a lot like PHP, and will auto-convert data types as best it can. Since you're working with an int field (left-hand side), it'll try to transparently convert the right-hand-side of the argument into an int as well, so '9' just becomes 9.
Strictly speaking, the quotes are unnecessary, and force MySQL to do a typecasting/conversion, so it wastes a bit of CPU time. In practice, unless you're running a Google-sized operation, such conversion overhead is going to be microscopically small.
You should never put quotes around numbers. There is a valid reason for this.
The real issue comes down to type casting. When you put a number inside quotes, it is treated as a string and MySQL must convert it to a number before it can execute the query. While this may take a small amount of time, the real problems start to occur when MySQL doesn't do a good job of converting your string. For example, MySQL will convert basic strings like '123' to the integer 123, but will convert some larger numbers, like '18015376320243459', to floating point. Since floating point can be rounded, your queries may return inconsistent results. Learn more about type casting here. Depending on your server hardware and software, these results will vary. MySQL explains this.
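The rounding itself is easy to see outside the database. The sketch below is plain Python, not MySQL; it assumes the conversion goes through an IEEE 754 double (which is what Python's `float` is) and shows what happens to the value from the paragraph above.

```python
as_text = "18015376320243459"

exact = int(as_text)             # arbitrary-precision integer, no loss
via_float = int(float(as_text))  # round trip through a 64-bit double

print(exact)      # 18015376320243459
print(via_float)  # 18015376320243460

# Two different numeric strings can collapse to the same double,
# so an equality test on the converted values cannot tell them apart.
neighbour = "18015376320243461"
collide = float(as_text) == float(neighbour)
print(collide)    # True
```

A double has 53 bits of precision, and above 2^54 only every fourth integer is representable, which is why both strings land on the same value.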
If you are worried about SQL injections, always check the value first and use PHP to strip out any non numbers. You can use preg_replace for this: preg_replace("/[^0-9]/", "", $string)
In addition, if you write your SQL queries with quotes they will not work on databases like PostgreSQL or Oracle.
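As a rough Python counterpart to the preg_replace suggestion, the sketch below strips non-digits and then passes the value as a bound parameter. Parameter binding is a different technique from the one in the answer, added here as an alternative; sqlite3 stands in for a MySQL driver (an assumption), and `digits_only` is a hypothetical helper name.

```python
import re
import sqlite3


def digits_only(value: str) -> str:
    """Remove every character that is not 0-9 (mirrors the preg_replace call)."""
    return re.sub(r"[^0-9]", "", value)


# SQLite stands in for MySQL (assumption); the table matches the question's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INT, str VARCHAR(100))")
conn.execute("INSERT INTO test VALUES (9, 'some string')")

user_input = "9; DROP TABLE test"
cleaned = digits_only(user_input)

# The value is sent as a typed parameter, never spliced into the SQL text.
# int() raises ValueError if nothing is left after stripping.
row = conn.execute(
    "SELECT id, str FROM test WHERE id = ?", (int(cleaned),)
).fetchone()
print(cleaned, row)
```

With a bound integer parameter there is no question of whether to quote the number, because the driver sends the value separately from the statement.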
Check this, you can understand better ...
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | test_no | index | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | NULL | 3126240 | Using where; Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| 1 | SIMPLE | test_no | const | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | const | 1 | Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set, 1 warning (7.94 sec)
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
AFAIK it is standard, but it is considered bad practice because
- using it in a WHERE clause will prevent the optimizer from using indices (explain plan should show that)
- the database has to do additional work to convert the string to a number
- if you're using this for floating-point numbers ('9.4'), you'll run into trouble if client and server use different language settings (9.4 vs 9,4)
In short: don't do it (but YMMV)
This is not standard behavior.
For MySQL 5.5, this is the default SQL mode:
mysql> select @@sql_mode;
+------------+
| @@sql_mode |
+------------+
| |
+------------+
1 row in set (0.00 sec)
ANSI and TRADITIONAL are used more rigorously by Oracle and PostgreSQL. The SQL Modes MySQL permits must be set IF AND ONLY IF you want to make the SQL more ANSI-compliant. Otherwise, you don't have to touch a thing. I've never done so.
It depends on the column type!
if you run
SELECT * FROM `users` WHERE `username` = 0;
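in MySQL/MariaDB you will get nearly all the records where username IS NOT NULL, because each username is converted to a number for the comparison and any string that does not start with a number converts to 0.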
Always quote values if the column is of type string (char, varchar,...) otherwise you'll get unexpected results!
You don't need to quote numbers, but doing so consistently is a good habit.
The issue is this: say we have a table called users with a column called current_balance of type FLOAT. If you run this query:
UPDATE `users` SET `current_balance`='231608.09' WHERE `user_id`=9;
The current_balance field will be updated to 231608, because MySQL rounds the value. Similarly, if you try this query:
UPDATE `users` SET `current_balance`='231608.55' WHERE `user_id`=9;
The current_balance field will be updated to 231609
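The two results above are consistent with single-precision storage. The sketch below is plain Python, not MySQL; it assumes FLOAT is a 4-byte IEEE 754 value and that the displayed result is rounded to about six significant digits, and under those assumptions it reproduces both numbers.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


results = []
for text in ("231608.09", "231608.55"):
    stored = as_float32(float(text))  # what a 4-byte float can actually hold
    shown = format(stored, ".6g")     # about six significant digits
    results.append((text, stored, shown))
    print(text, stored, shown)
```

Under these assumptions the loss comes from the column type rather than from the quotes, so a DECIMAL column would be the usual choice for a balance.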