MySQL SELECT column FROM table WHERE column IS NULL - mysql

This is not working for me, using Toad for MySQL. I'm using MySQL 5.5 from XAMPP 1.83 on Windows.
I have a table with column InstitutionState defined as VARCHAR(20). Some rows appear to have this column "empty", meaning LENGTH(InstitutionState) = 0.
If I SELECT ... WHERE InstitutionState IS NULL, I get no rows.
If I SELECT ... WHERE InstitutionState = '', it works. Why is this?
Here's sample data.
mysql> select InstitutionState, ISNULL(InstitutionState), length(InstitutionState)
-> from institution;
+----------------------+--------------------------+--------------------------+
| InstitutionState | ISNULL(InstitutionState) | length(InstitutionState) |
+----------------------+--------------------------+--------------------------+
| NY | 0 | 2 |
| NY | 0 | 2 |
| NY | 0 | 2 |
| IL | 0 | 2 |
| NC | 0 | 2 |
| TX | 0 | 2 |
| DC | 0 | 2 |
| NY | 0 | 2 |
| CA | 0 | 2 |
| | 0 | 0 |
| KS | 0 | 2 |
| | 0 | 0 |
| NY | 0 | 2 |
| ND | 0 | 2 |
| PA | 0 | 2 |
| WI | 0 | 2 |
| PA | 0 | 2 |
| MD | 0 | 2 |
| IN | 0 | 2 |
| PA | 0 | 2 |
| NE | 0 | 2 |
| ID | 0 | 2 |
| CA | 0 | 2 |
| | 0 | 0 |
| FL | 0 | 2 |
| MO | 0 | 2 |
| | 0 | 0 |
| OH | 0 | 2 |
| IL | 0 | 2 |
| OH | 0 | 2 |

Conceptually, NULL means “a missing unknown value”: no data, nothing, unknown. An empty string, by contrast, is an actual value, a string of length zero.
Confusing the NULL value with the empty string may cause data integrity problems.
At the storage level, NULL is typically recorded as a flag in the row's header rather than as column data, so there is no data to access.
NULL and '' take up about the same amount of space on disk, so using NULL gives no meaningful space savings.
You can add an index on a column that can have NULL values if you are using the MyISAM, InnoDB, or MEMORY storage engine. Otherwise, you must declare an indexed column NOT NULL, and you cannot insert NULL into the column.
Furthermore, allowing NULL is a less restrictive configuration than disallowing NULL. It only follows that if any entity integrity issues are to arise, it would be from FEWER checks that the data are sound. Therefore, logically, allowing NULL should always have a good, solid reason, and disallowing NULL is a good practice.
mysql> INSERT INTO ... (InstitutionState) VALUES (NULL);
mysql> INSERT INTO ... (InstitutionState) VALUES ('');
Both statements will insert a value into the InstitutionState column, but the first inserts a NULL value and the second inserts an empty string. The meaning of the first can be regarded as “InstitutionState is not known” and the meaning of the second can be regarded as “the Institution is known to have no state, and thus no InstitutionState.”
To search for column values that are NULL, you cannot use an expr = NULL test. The following statement returns no rows, because expr = NULL is never true for any expression:
mysql> SELECT ... WHERE InstitutionState = NULL;
To look for NULL values, you must use the IS NULL test. The following statements show how to find the NULL InstitutionState and the empty InstitutionState:
mysql> SELECT ... WHERE InstitutionState IS NULL;
mysql> SELECT ... WHERE InstitutionState = '';
mysql> SELECT 1 IS NULL, 1 IS NOT NULL;
+-----------+---------------+
| 1 IS NULL | 1 IS NOT NULL |
+-----------+---------------+
| 0 | 1 |
+-----------+---------------+
You cannot use arithmetic comparison operators such as =, <, or <> to test for NULL. To demonstrate this for yourself, try the following query:
mysql> SELECT 1 = NULL, 1 <> NULL, 1 < NULL, 1 > NULL;
+----------+-----------+----------+----------+
| 1 = NULL | 1 <> NULL | 1 < NULL | 1 > NULL |
+----------+-----------+----------+----------+
| NULL | NULL | NULL | NULL |
+----------+-----------+----------+----------+
In addition, in MyISAM you save one bit per row for each column that is declared NOT NULL.
While a NULL itself does not require any storage space, NDBCLUSTER reserves 4 bytes per row if the table definition contains any columns defined as NULL, up to 32 NULL columns. (If a MySQL Cluster table is defined with more than 32 NULL columns up to 64 NULL columns, then 8 bytes per row is reserved.)
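Declaring columns NOT NULL can also make queries faster, because indexes can be used more effectively and each value does not have to be tested for NULL.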
To match both '' and NULL values, use:
SELECT ... WHERE IFNULL(InstitutionState , '') = '';
This says: if the field is NULL, treat it as an empty string, i.e. ''.
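As a rough illustration of the four predicates discussed above, the sketch below runs them against a tiny in-memory table. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL (an assumption, made so the example is self-contained), and the table contents and the id column are made up. The NULL comparison rules it exercises are standard SQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE institution (id INTEGER PRIMARY KEY, InstitutionState VARCHAR(20))"
)
conn.executemany(
    "INSERT INTO institution (id, InstitutionState) VALUES (?, ?)",
    [(1, "NY"), (2, ""), (3, None), (4, "CA")],  # None is stored as NULL
)

def matching_ids(where_clause):
    """Return the ids of rows that satisfy the given WHERE clause, in id order."""
    sql = "SELECT id FROM institution WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

equals_null = matching_ids("InstitutionState = NULL")        # comparison is never true
is_null = matching_ids("InstitutionState IS NULL")           # only the NULL row
is_empty = matching_ids("InstitutionState = ''")             # only the '' row
either = matching_ids("IFNULL(InstitutionState, '') = ''")   # NULL and '' rows

print(equals_null, is_null, is_empty, either)
```

Only the IFNULL form returns both the empty-string row and the NULL row, which is why it is the one to use when the two should be treated alike.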

The NULL value isn't an actual value in SQL, but the lack of a value. One can think of it as unknown. For that reason, not even NULL is equal to another null.
Null values are actually implemented as a bitmask on the row, which indicates which columns have null values. So these values aren't even stored on the heap table in the same way as other values, which is one reason nullability is part of a column's definition.
The string '' is actually known. It's known to be ''. This isn't null, nor is that null bit set on the tuple.
For this reason, querying for rows where a column IS NULL will not return rows with a value of '' nor will querying for rows where a column is '' return null values. They are two completely different things.
There are actually a few exceptions. For example, in Oracle, any reference to '' will be implicitly cast to NULL. This behavior was implemented back in the 80s before a real SQL standard, so Oracle has had to maintain it for backwards compatibility reasons.
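The point that NULL is not even equal to another NULL, while '' is an ordinary known value, can be checked at the expression level. This is a minimal sketch that again assumes Python's sqlite3 module as a stand-in engine; it does not reproduce the Oracle exception mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in engine (assumption)

# Python's sqlite3 maps SQL NULL to None, and true/false to 1/0.
row = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, NULL IS NULL, '' = '', '' IS NULL"
).fetchone()

null_eq_null, null_ne_null, null_is_null, empty_eq_empty, empty_is_null = row
print(row)
```

Both comparisons against NULL come back as NULL (None in Python) rather than true or false, whereas '' compares equal to itself and is not NULL.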

Related

MySQL: Strange behavior of UPDATE query (ERROR 1062 Duplicate entry)

I have a MySQL database that stores news articles with the publication date (just day information), the source, and category. Based on these I want to generate a table that holds the article counts with respect to these 3 parameters.
Since for some combinations of these 3 parameters there might be no article, a simple GROUP BY won't do. I therefore first generate a table news_article_counts with all possible combinations of the 3 parameters and a default article_count of 0, like this:
SELECT * FROM news_article_counts;
+--------------+------------+----------+---------------+
| published_at | source | category | article_count |
+--------------+------------+----------+---------------+
| 2016-08-05 | 1826089206 | 0 | 0 |
| 2016-08-05 | 1826089206 | 1 | 0 |
| 2016-08-05 | 1826089206 | 2 | 0 |
| 2016-08-05 | 1826089206 | 3 | 0 |
| 2016-08-05 | 1826089206 | 4 | 0 |
| ... | ... | ... | ... |
+--------------+------------+----------+---------------+
For testing, I now created a temporary table tmp as the GROUP BY result from the original news article table:
SELECT * FROM tmp LIMIT 6;
+--------------+------------+----------+-----+
| published_at | source | category | cnt |
+--------------+------------+----------+-----+
| 2016-08-05 | 1826089206 | 3 | 1 |
| 2003-09-19 | 1826089206 | 4 | 1 |
| 2005-08-08 | 1826089206 | 3 | 1 |
| 2008-07-22 | 1826089206 | 4 | 1 |
| 2008-11-26 | 1826089206 | 8 | 1 |
| ... | ... | ... | ... |
+--------------+------------+----------+-----+
Given these two tables, the following query works as expected:
SELECT * FROM news_article_counts c, tmp t
WHERE c.published_at = t.published_at AND c.source = t.source AND c.category = t.category;
But now I need to update the article_count of table news_article_counts with the values in table tmp where the 3 parameters match up. For this I'm using the following query (I've tried different ways but with the same results):
UPDATE
news_article_counts c
INNER JOIN
tmp t
ON
c.published_at = t.published_at AND
c.source = t.source AND
c.category = t.category
SET
c.article_count = t.cnt;
Executing this query yields this error:
ERROR 1062 (23000): Duplicate entry '2018-04-07 14:46:17-1826089206-1' for key 'uniqueIndex'
uniqueIndex is a joint index over published_at, source, category of table news_article_counts. But this shouldn't be a problem since I do not -- as far as I can tell -- update any of those 3 values, only article_count.
What confuses me most is that the error mentions the timestamp at which I executed the query (here: 2018-04-07 14:46:17). I have absolutely no idea where this comes into play. In fact, some rows in news_article_counts now have 2018-04-07 14:46:17 as the value for published_at. While this explains the error, I cannot see why published_at gets overwritten with the current timestamp. There is no ON UPDATE CURRENT_TIMESTAMP on this column; see:
CREATE TABLE IF NOT EXISTS `test`.`news_article_counts` (
`published_at` TIMESTAMP NOT NULL,
`source` INT UNSIGNED NOT NULL,
`category` INT UNSIGNED NOT NULL,
`article_count` INT UNSIGNED NOT NULL DEFAULT 0,
UNIQUE INDEX `uniqueIndex` (`published_at` ASC, `source` ASC, `category` ASC))
ENGINE = MyISAM
DEFAULT CHARACTER SET = utf8mb4;
What am I missing here?
UPDATE 1: I actually checked the table definition of news_article_counts in the database. And there's indeed the following:
mysql> SHOW COLUMNS FROM news_article_counts;
+---------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+-------------------+-----------------------------+
| published_at | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| source | int(10) unsigned | NO | | NULL | |
| category | int(10) unsigned | NO | | NULL | |
| article_count | int(10) unsigned | NO | | 0 | |
+---------------+------------------+------+-----+-------------------+-----------------------------+
But why is on update CURRENT_TIMESTAMP set? I double- and triple-checked my CREATE TABLE statement. I removed the joint index and added an artificial primary key (auto_increment). Nothing helped. I've even tried to explicitly remove these attributes from published_at with:
ALTER TABLE `news_article_counts` CHANGE `published_at` `published_at` TIMESTAMP NOT NULL;
Nothing seems to work for me.
It looks like you have the explicit_defaults_for_timestamp system variable disabled. One of the effects of this is:
The first TIMESTAMP column in a table, if not explicitly declared with the NULL attribute or an explicit DEFAULT or ON UPDATE attribute, is automatically declared with the DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP attributes.
You could try enabling this system variable, but that could potentially impact other applications. I think it only takes effect when you're actually creating a table, so it shouldn't affect any existing tables.
If you don't want to make a system-level change like this, you could add an explicit DEFAULT attribute to the published_at column of this table; then ON UPDATE won't be added automatically.

Why is the size of a MySQL MyISAM table the same after stripping some data from a VARCHAR column?

I need to reduce the size of a MySQL database. I recoded some information, which stripped ';' and ':' from the sources column (~10% char reduction). After doing so, the size of the table is exactly the same as before. How is that possible? I'm using the MyISAM engine.
btw: Unfortunately, I cannot compress the tables with myisampack.
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (4 min 11.30 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test') ORDER BY TABLE_ROWS DESC LIMIT 10;
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| test | 41.917 | 0.857 | 1.075 |
| homologs_9606 | 41.917 | 0.887 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.01 sec)
mysql> select * from homologs_9606 limit 10;
+---------+---------+-------+--------------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+--------------------------------+
| 5635338 | 1028608 | 0.000 | 10:,1 |
| 5644385 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5652325 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5641128 | 1028612 | 1.000 | 8:1,10 |
| 5636414 | 1028616 | 0.038 | 8:0.038,104;10:,1 |
| 5636557 | 1028616 | 0.000 | 8:,4 |
| 5637419 | 1028616 | 0.011 | 5:,1;8:0.011,91;10:,1 |
| 5641196 | 1028616 | 0.080 | 5:1,1;8:0.074,94;10:,1;11:,4 |
| 5642914 | 1028616 | 0.000 | 8:,3 |
| 5643778 | 1028616 | 0.056 | 8:0.057,70;10:,1 |
+---------+---------+-------+--------------------------------+
10 rows in set (4.55 sec)
mysql> select * from test limit 10;
+---------+---------+-------+-------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+-------------------------+
| 5635338 | 1028608 | 0.000 | 10,1 |
| 5644385 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5652325 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5641128 | 1028612 | 1.000 | 81,10 |
| 5636414 | 1028616 | 0.038 | 80.038,10410,1 |
| 5636557 | 1028616 | 0.000 | 8,4 |
| 5637419 | 1028616 | 0.011 | 5,180.011,9110,1 |
| 5641196 | 1028616 | 0.080 | 51,180.074,9410,111,4 |
| 5642914 | 1028616 | 0.000 | 8,3 |
| 5643778 | 1028616 | 0.056 | 80.057,7010,1 |
+---------+---------+-------+-------------------------+
10 rows in set (0.00 sec)
mysql> describe test;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | YES | PRI | NULL | |
| protid2 | int(10) unsigned | YES | PRI | NULL | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> describe homologs_9606;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | NO | PRI | 0 | |
| protid2 | int(10) unsigned | NO | PRI | 0 | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
EDIT1: Added average column length.
mysql> select AVG(LENGTH(sources)) from test;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 5.2177 |
+----------------------+
1 row in set (10.04 sec)
mysql> select AVG(LENGTH(sources)) from homologs_9606;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 6.8792 |
+----------------------+
1 row in set (9.95 sec)
EDIT2: I was able to strip some more MB by setting NOT NULL to all columns.
mysql> drop table test;
Query OK, 0 rows affected (0.42 sec)
mysql> CREATE table test (protid1 INT UNSIGNED NOT NULL DEFAULT '0', protid2 INT UNSIGNED NOT NULL DEFAULT '0', CS FLOAT(4,3) NOT NULL DEFAULT '0', sources VARCHAR(100) NOT NULL DEFAULT '0', PRIMARY KEY (protid1, protid2), KEY `idx_protid2` (protid2)) ENGINE=MyISAM CHARSET=ascii;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (2 min 7.84 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test');
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| homologs_9606 | 41.917 | 0.887 | 1.075 |
| test | 41.917 | 0.842 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.02 sec)
They are not exactly the same. Your query clearly shows that test is about 30 MB smaller than homologs_9606:
+---------------+--------------+---------+
| name | million rows | data GB |
+---------------+--------------+---------+
| test | 41.917 | 0.857 | <-- 0.857 < 0.887
| homologs_9606 | 41.917 | 0.887 |
+---------------+--------------+---------+
How much storage should we expect for your table? Let us check Data Type Storage Requirements:
INTEGER(10): 4 bytes
FLOAT(4): 4 bytes
VARCHAR(100): L+1
where L is the number of character bytes, which is usually one byte per character but sometimes more if you use a Unicode character set.
Your rows on average will need:
INTEGER + INTEGER + FLOAT + VARCHAR =
4 + 4 + 4 + (L + 1) = L + 13 bytes
We can infer your original average L as (0.887*1024^3 / 41917131) - 13 = 9.72. You say that you stripped 10% from sources, which means your new L is 9.72*0.9 = 8.75. That gives an expected new total storage requirement of ((8.75 + 13) * 41917131) / 1024^3 = 0.849 GB
I suspect that the difference (between 0.849 and 0.857) might be due to the fact that test has two columns set as NULLable that homologs_9606 does not have, but I do not know enough about the MyISAM engine to calculate this exactly. I can however guess! At a minimum you would need 1 bit per column per row to store a NULL state, which in your case means two bits per row or 2*41917131 = 83834262 bits = 10 479 283 bytes = 0.010 GB. The total 0.849+0.010 = 0.859 slightly overshoots the target (by about 2 MB). But I have done some rounding and your 10% figure is also an estimate, so I am sure the rest is lost in translation.
Another reason could be a Unicode character set on sources in test, in which case some characters may use more than one byte each, but since the NULLable columns seem to account for everything I do not think this is the case for your table.
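The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same estimate: the row count and sizes are the figures from the question, the 10% reduction is the asker's own estimate, and the function names are made up for illustration.

```python
GIB = 1024 ** 3                # the "GB" used in the question's queries is 2^30 bytes
ROWS = 41_917_131              # row count reported in the question
FIXED_BYTES = 4 + 4 + 4 + 1    # two INTs, one FLOAT, one VARCHAR length byte

def avg_varchar_bytes(data_gib):
    """Infer the average VARCHAR payload L from a table's data size."""
    return data_gib * GIB / ROWS - FIXED_BYTES

def expected_gib(avg_l):
    """Expected data size for a given average VARCHAR payload L."""
    return (avg_l + FIXED_BYTES) * ROWS / GIB

old_l = avg_varchar_bytes(0.887)     # original table
new_l = old_l * 0.9                  # the asker's ~10% reduction
new_size = expected_gib(new_l)
null_flag_gib = 2 * ROWS / 8 / GIB   # 1 bit per row for each of 2 NULLable columns

print(round(old_l, 2), round(new_l, 2), round(new_size, 3), round(null_flag_gib, 3))
```

Running it gives values that agree with the hand calculation to within rounding, so the remaining gap to the measured 0.857 is within the uncertainty of the inputs.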
Summary
Your two tables are not the same size, they differ by 30 MB.
The size of your new table is around the expected size.
You can save some more space in your new table by making protid1 and protid2 into NOT NULL columns.
The "table" is stored in a .MYD file. This file will never shrink due to UPDATEs or DELETEs. SHOW TABLE STATUS (or the equivalent query into information_schema) may show Data_length shrinking, but Data_free will increase.
You can shrink the .MYD file by doing OPTIMIZE TABLE. But that will copy the table over, thereby needing extra disk space during the process. And this action is only very rarely worth doing.
Changing to NOT NULL may not free up space if you had a lot of nulls -- "" takes 1 or 2 bytes for a VARCHAR because of the length. (And your code may need to handle '' differently than NULL.)
The space taken for each row is actually 1 byte more than previously mentioned -- this byte handles knowing whether the row exists or is the beginning of a hole.
For large text fields, I like to do this to save space. (This applies to both MyISAM and InnoDB.) Compress the text and store it into a BLOB column (instead of TEXT). For most text, that is a 3:1 shrinkage. It takes a little extra code and CPU time in the client, but it saves a lot of I/O in the server. Often the net result is "faster". I would not use it for the varchar you have; I would only do it on columns bigger than, say, 50 characters average.
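A minimal sketch of the compress-in-the-client idea follows, assuming Python's zlib module for compression and sqlite3 as a stand-in for the MySQL server. The documents table, its body_z column, and the helper names are hypothetical, and the achieved ratio depends entirely on the text.

```python
import sqlite3
import zlib

# SQLite stands in for the MySQL server (assumption); the table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body_z BLOB)")

def save_text(doc_id, text):
    """Compress the text in the client and store the bytes in a BLOB column."""
    compressed = zlib.compress(text.encode("utf-8"))
    conn.execute(
        "INSERT INTO documents (id, body_z) VALUES (?, ?)", (doc_id, compressed)
    )
    return len(compressed)

def load_text(doc_id):
    """Fetch the BLOB and decompress it back into the original text."""
    (blob,) = conn.execute(
        "SELECT body_z FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return zlib.decompress(blob).decode("utf-8")

original = "The quick brown fox jumps over the lazy dog. " * 50
stored_bytes = save_text(1, original)
restored = load_text(1)
print(len(original.encode("utf-8")), stored_bytes, restored == original)
```

The trade-off is that the server can no longer search or index the column contents, so this suits text that is only ever read back whole.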
Back to the original question. It sounds like there were only about 30M colons and semicolons in the entire table. Could it be that the first 10 rows are not representative?

Why mysql matching rows don't update?

For some reason it seems that the rows are not being updated. Any idea why this would happen?
UPDATE hts SET assigned='1' AND Owner='ms' WHERE hid='217477'
Query OK, 0 rows affected (0.16 sec)
Rows matched: 1 Changed: 0 Warnings: 0
select assigned, Owner from hts where hid='217477';
+----------+-------+
| assigned | Owner |
+----------+-------+
| NULL | NULL |
+----------+-------+
Show columns from hts
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| hid | varchar(25) | YES | UNI | NULL | |
| assigned | int(11) | NO | | 0 | |
| Owner | varchar(10) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
Two things you can try.
First, replace the AND in the SET clause with a comma. With AND, MySQL parses the clause as a single assignment, assigned = ('1' AND Owner='ms'), so Owner is never assigned:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid='217477'
Second, if hid is an INT and not a VARCHAR, remove the quotes around its value:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid=217477
I'm not sure why you are storing integers as strings. When in doubt, you should ALWAYS store data using its intended datatype.
RECOMMENDATION: if hid only ever holds integers, change its datatype from VARCHAR to INT, so that hid is an integer just as assigned is. Your update would then be the second form above.
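To see the difference between the AND form and the comma form, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption; both follow the usual precedence in which = binds tighter than AND). The starting values 5 and 'xx' are made up so that the effect of each statement is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE hts (hid VARCHAR(25) UNIQUE, assigned INT NOT NULL DEFAULT 0, "
    "Owner VARCHAR(10))"
)
# Hypothetical starting row; Owner holds a placeholder value 'xx'.
conn.execute("INSERT INTO hts (hid, assigned, Owner) VALUES ('217477', 5, 'xx')")

def current_row():
    """Return (assigned, Owner) for the row being updated."""
    return conn.execute(
        "SELECT assigned, Owner FROM hts WHERE hid = '217477'"
    ).fetchone()

# AND form: a single assignment, assigned = (1 AND (Owner = 'ms')).
# Owner is 'xx', so the comparison is false and assigned becomes 0.
conn.execute("UPDATE hts SET assigned = 1 AND Owner = 'ms' WHERE hid = '217477'")
after_and = current_row()

# Comma form: two separate assignments.
conn.execute("UPDATE hts SET assigned = 1, Owner = 'ms' WHERE hid = '217477'")
after_comma = current_row()

print(after_and, after_comma)
```

In the AND form Owner is left untouched and assigned receives the result of a boolean expression, which is why the statement runs without error yet does not do what was intended.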

Delete a row (Record) in MySQL

Here is my code to delete my first row, but no rows are affected!
mysql> select * from myt;
+--------+--------------+------+---------+
| Fname | Lname | age | phone |
+--------+--------------+------+---------+
| NULL | Jackson | NULL | NULL |
| stive | NULL | NULL | NULL |
| ghbfgf | rtrgf | 22 | 111 |
| zxas | zxa | 30 | 6547812 |
| wewew | uytree | 22 | 658478 |
+--------+--------------+------+---------+
5 rows in set (0.00 sec)
mysql> delete from myt
-> Where Fname = "NULL";
Query OK, 0 rows affected (0.00 sec)
Thanks!
Use IS NULL.
You cannot use arithmetic comparison operators such as =, <, or <> to test for NULL.
DELETE FROM myt WHERE Fname IS NULL
See "Working with NULL Values" in the MySQL manual.
NULL is not a value.
NULL means nothing is present.
So usage of FNAME = "NULL" is wrong.
delete from myt Where Fname IS NULL;
Your first row is NULL (none) not "NULL"
NULL is not a value in RDBMS; it is a marker for a missing value. When you are using "NULL" it denotes a string value. You can simply use "IS NULL". Hope this helps.
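The difference between comparing with the string 'NULL' and testing IS NULL can be seen from the number of rows each DELETE removes. This sketch assumes Python's sqlite3 module as a stand-in for MySQL and uses a shortened copy of the sample data; the string literal is single-quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE myt (Fname VARCHAR(20), Lname VARCHAR(20), age INT, phone INT)"
)
conn.executemany(
    "INSERT INTO myt VALUES (?, ?, ?, ?)",
    [
        (None, "Jackson", None, None),   # Fname is a real NULL
        ("stive", None, None, None),
        ("zxas", "zxa", 30, 6547812),
    ],
)

# Compares Fname with the four-character string 'NULL'; no row holds that string.
deleted_by_string = conn.execute("DELETE FROM myt WHERE Fname = 'NULL'").rowcount

# Tests for the absence of a value; matches the first row.
deleted_by_is_null = conn.execute("DELETE FROM myt WHERE Fname IS NULL").rowcount

remaining = conn.execute("SELECT COUNT(*) FROM myt").fetchone()[0]
print(deleted_by_string, deleted_by_is_null, remaining)
```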

Mysql, as 1 query, if row does not exist, do other query

For a preferences module I have "system defaults", and "user preferences".
If there is no personal/user preference stored, then use the system default values instead.
Here is my system preferences table:
mysql> desc rbl;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| id | varchar(3) | NO | PRI | | |
| rbl_url | varchar(100) | NO | | | |
| description | varchar(100) | NO | | | |
| is_default | tinyint(1) unsigned | YES | | 1 | |
+-------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Example data from system prefs:
mysql> select * from rbl;
+----+----------------------+------------------------------+------------+
| id | rbl_url | description | is_default |
+----+----------------------+------------------------------+------------+
| 1 | sbl-xbl.spamhaus.org | Spamhaus SBL-XBL | 1 |
| 2 | pbl.spamhaus.org | Spamhaus PBL | 1 |
| 3 | bl.spamcop.net | Spamcop Blacklist | 1 |
| 4 | rbl.example.com | Example RBL - not functional | 0 |
+----+----------------------+------------------------------+------------+
... and Query for system defaults:
mysql> SELECT rbl_url FROM rbl WHERE is_default='1';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| bl.spamcop.net |
+----------------------+
3 rows in set (0.01 sec)
So far so good.
OK. Now I need a user preferences table, and I came up with this:
mysql> desc rbl_pref;
+-----------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| domain_id | mediumint(8) unsigned | NO | | NULL | |
| rbl_id | tinyint(1) unsigned | NO | | NULL | |
+-----------+-----------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
(FYI - A "user" is represented by "domain_id". )
Let's view the preferences of a specific user who has personalized preferences saved:
mysql> select * from rbl_pref where domain_id='2277';
+----+-----------+--------+
| id | domain_id | rbl_id |
+----+-----------+--------+
| 4 | 2277 | 1 |
| 5 | 2277 | 2 |
| 6 | 2277 | 4 |
+----+-----------+--------+
3 rows in set (0.00 sec)
... again, but in a simpler format:
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='2277';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| rbl.example.com |
+----------------------+
3 rows in set (0.00 sec)
.. so far so good. If a user has stored a preference, a result is found.
The problem example now is, user 1999 has no custom preferences.
In place of the "Empty set" result, I want the system defaults.
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='1999';
Empty set (0.00 sec)
I was excited to find a very similar question:
mysql if row doesn't exist, grab default value
However after a couple of days trial and error and documentation review, I could not translate that answer over to here.
Like the above question, this must be done as a single MySQL query. I am not actually making this query from PHP, but from Exim macros (and it is a very picky language... best to feed it "one liners" as variable assignments, as I try to do here.. )
UPDATE: Tried one type of a UNION query suggested by @Biff McGriff, below. The table did not display in my comment reply, so here it is again:
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='2277'
UNION SELECT rbl_url FROM rbl WHERE is_default='1';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| rbl.example.com |
| bl.spamcop.net |
+----------------------+
4 rows in set (0.00 sec)
As you can see above, user 2277 did not opt in to rbl_id 3 (bl.spamcop.net), but that's showing up anyways.
What my UNION query seems to be doing is combining the result set. So user_pref acts as "in addition to" global defaults, and I was assuming/expecting I would get a result set matching either half of the query.
So my question now is, is it better (or possible, how) to solve this as "either result set" (either subquery on either side of the UNION)? OR do I really need a new field on rbl_pref, called for example "enabled". The latter seems to be more correct - that I need something in rbl_pref to explicitly designate opt-in or opt-out (other than the implicit "that pref is not here - no rbl_id=3 - in the over ridden user result SET")
UPDATE: All set, thanks @Imre L, and everyone else. I learned something through this example.
You should be able to use a left join and then coalesce the user's field with the default field.
NOTE: you have to enter the domain_id in two places.
SELECT rbl.rbl_url FROM rbl
JOIN rbl_pref ON rbl_pref.rbl_id=rbl.id AND domain_id=2277
UNION
SELECT rbl.rbl_url FROM rbl
WHERE rbl.is_default
AND NOT EXISTS (SELECT 1 FROM rbl_pref WHERE domain_id=2277 LIMIT 1)
;
Now one side or the other of the UNION will be optimized away with an "Impossible WHERE".
You also should not use varchar(3) for rbl.id but some sort of integer, preferably the same type as rbl_pref.rbl_id, for which tinyint is too small.
When you compare integer fields in SQL, as in domain_id='2277', you should not put ' or " around integer constants.
You can mostly get away with it, but sometimes it may confuse the MySQL optimizer.
Also, for optimal performance and consistency I suggest you add this index:
ALTER TABLE rbl_pref
ADD UNIQUE INDEX ux_domain_rbl (domain_id, rbl_id);
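The UNION query can be tried end to end with the sample data from the question. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL, uses simplified table definitions (integer ids, no description column), and introduces a hypothetical helper rbl_urls. A named parameter lets the caller supply the domain_id once even though the SQL uses it in two places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE rbl (id INTEGER PRIMARY KEY, rbl_url VARCHAR(100) NOT NULL,
                  is_default INTEGER DEFAULT 1);
CREATE TABLE rbl_pref (id INTEGER PRIMARY KEY, domain_id INTEGER NOT NULL,
                       rbl_id INTEGER NOT NULL);
INSERT INTO rbl VALUES (1, 'sbl-xbl.spamhaus.org', 1), (2, 'pbl.spamhaus.org', 1),
                       (3, 'bl.spamcop.net', 1), (4, 'rbl.example.com', 0);
INSERT INTO rbl_pref (domain_id, rbl_id) VALUES (2277, 1), (2277, 2), (2277, 4);
""")

QUERY = """
SELECT rbl.rbl_url FROM rbl
JOIN rbl_pref ON rbl_pref.rbl_id = rbl.id AND rbl_pref.domain_id = :domain
UNION
SELECT rbl.rbl_url FROM rbl
WHERE rbl.is_default
  AND NOT EXISTS (SELECT 1 FROM rbl_pref WHERE domain_id = :domain LIMIT 1)
"""

def rbl_urls(domain_id):
    """Return the user's RBL URLs, or the system defaults if the user has none."""
    rows = conn.execute(QUERY, {"domain": domain_id}).fetchall()
    return sorted(url for (url,) in rows)

with_prefs = rbl_urls(2277)      # user with stored preferences
without_prefs = rbl_urls(1999)   # user without any preferences
print(with_prefs)
print(without_prefs)
```

For user 2277 only the three stored preferences come back, without bl.spamcop.net, while user 1999 falls through to the three system defaults.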