Mysql, as 1 query, if row does not exist, do other query - mysql

For a preferences module I have "system defaults", and "user preferences".
If there is no personal/user preference stored, then use the system default values instead.
Here is my system preferences table:
mysql> desc rbl;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| id | varchar(3) | NO | PRI | | |
| rbl_url | varchar(100) | NO | | | |
| description | varchar(100) | NO | | | |
| is_default | tinyint(1) unsigned | YES | | 1 | |
+-------------+---------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Example data from system prefs:
mysql> select * from rbl;
+----+----------------------+------------------------------+------------+
| id | rbl_url | description | is_default |
+----+----------------------+------------------------------+------------+
| 1 | sbl-xbl.spamhaus.org | Spamhaus SBL-XBL | 1 |
| 2 | pbl.spamhaus.org | Spamhaus PBL | 1 |
| 3 | bl.spamcop.net | Spamcop Blacklist | 1 |
| 4 | rbl.example.com | Example RBL - not functional | 0 |
+----+----------------------+------------------------------+------------+
... and Query for system defaults:
mysql> SELECT rbl_url FROM rbl WHERE is_default='1';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| bl.spamcop.net |
+----------------------+
3 rows in set (0.01 sec)
So far so good.
OK. Now I need a user preferences table, and I came up with this:
mysql> desc rbl_pref;
+-----------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| domain_id | mediumint(8) unsigned | NO | | NULL | |
| rbl_id | tinyint(1) unsigned | NO | | NULL | |
+-----------+-----------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
(FYI - A "user" is represented by "domain_id". )
Let's view the preferences of a specific user who has personalized preferences saved:
mysql> select * from rbl_pref where domain_id='2277';
+----+-----------+--------+
| id | domain_id | rbl_id |
+----+-----------+--------+
| 4 | 2277 | 1 |
| 5 | 2277 | 2 |
| 6 | 2277 | 4 |
+----+-----------+--------+
3 rows in set (0.00 sec)
... again, but in a simpler format:
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='2277';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| rbl.example.com |
+----------------------+
3 rows in set (0.00 sec)
.. so far so good. If a user has stored a preference, a result is found.
The problem example now is, user 1999 has no custom preferences.
In place of the "Empty set" result, I want the system defaults.
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='1999';
Empty set (0.00 sec)
I was excited to find a very similar question:
mysql if row doesn't exist, grab default value
However after a couple of days trial and error and documentation review, I could not translate that answer over to here.
Like the above question, this must be done as a single MySQL query. I am not actually making this query from PHP, but from Exim macros (and it is a very picky language... best to feed it "one liners" as variable assignments, as I try to do here.. )
UPDATE: Tried one type of UNION query suggested by @Biff McGriff, below. The table did not display in my comment reply, so here it is again:
mysql> SELECT rbl.rbl_url FROM rbl_pref,rbl
WHERE rbl_pref.rbl_id=rbl.id AND domain_id='2277'
UNION SELECT rbl_url FROM rbl WHERE is_default='1';
+----------------------+
| rbl_url |
+----------------------+
| sbl-xbl.spamhaus.org |
| pbl.spamhaus.org |
| rbl.example.com |
| bl.spamcop.net |
+----------------------+
4 rows in set (0.00 sec)
As you can see above, user 2277 did not opt in to rbl_id 3 (bl.spamcop.net), but it shows up anyway.
My UNION query combines the two result sets, so the user preferences act as "in addition to" the global defaults. I was expecting a result set matching only one half of the query.
So my question now is: is it better (and if so, how is it possible) to solve this as "either result set", meaning either subquery on either side of the UNION? Or do I really need a new field on rbl_pref, called for example "enabled"? The latter seems more correct: I need something in rbl_pref to explicitly designate opt-in or opt-out, rather than relying on the implicit "that pref is not here (no rbl_id=3) in the overridden user result set".
UPDATE: All set, thanks @Imre L, and everyone else. I learned something through this example.

You should be able to use a left join and then coalesce the user's field with the default field.

NOTE: you have to enter the domain_id in two places.
SELECT rbl.rbl_url FROM rbl
JOIN rbl_pref ON rbl_pref.rbl_id=rbl.id AND domain_id=2277
UNION
SELECT rbl.rbl_url FROM rbl
WHERE rbl.is_default
AND NOT EXISTS (SELECT 1 FROM rbl_pref WHERE domain_id=2277 LIMIT 1)
;
Now one side or the other of the UNION will be optimized away with an impossible WHERE.
You also should not use varchar(3) for rbl.id but some sort of integer,
preferably the same type as rbl_pref.rbl_id, for which tinyint is too tiny.
And when you compare integer fields in SQL code, as in domain_id='2277', you should not put ' or " around integer constants.
You can mostly get away with it, but sometimes it may confuse the MySQL optimizer.
Also, for optimal performance and consistency, I suggest you add this index:
ALTER TABLE rbl_pref
ADD UNIQUE INDEX ux_domain_rbl (domain_id, rbl_id);
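The "either result set" behaviour of the UNION / NOT EXISTS query above can be checked with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a trimmed-down copy of the rbl and rbl_pref tables, and a hypothetical helper named rbl_urls. The query text is the one from the answer, with the domain id passed as a named parameter so it only has to be supplied once.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only the columns the
# query touches are recreated, with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rbl (id INTEGER PRIMARY KEY, rbl_url TEXT, is_default INTEGER);
    CREATE TABLE rbl_pref (id INTEGER PRIMARY KEY, domain_id INTEGER, rbl_id INTEGER);
    INSERT INTO rbl VALUES
        (1, 'sbl-xbl.spamhaus.org', 1),
        (2, 'pbl.spamhaus.org', 1),
        (3, 'bl.spamcop.net', 1),
        (4, 'rbl.example.com', 0);
    INSERT INTO rbl_pref (domain_id, rbl_id) VALUES (2277, 1), (2277, 2), (2277, 4);
""")

# First half: the user's own choices. Second half: the defaults, but only
# when the user has no rows at all in rbl_pref.
FALLBACK_QUERY = """
    SELECT rbl.rbl_url FROM rbl
    JOIN rbl_pref ON rbl_pref.rbl_id = rbl.id AND rbl_pref.domain_id = :domain
    UNION
    SELECT rbl.rbl_url FROM rbl
    WHERE rbl.is_default = 1
      AND NOT EXISTS (SELECT 1 FROM rbl_pref WHERE domain_id = :domain)
"""

def rbl_urls(domain_id):
    """Hypothetical helper: return the effective RBL list, sorted by name."""
    rows = conn.execute(FALLBACK_QUERY, {"domain": domain_id}).fetchall()
    return sorted(url for (url,) in rows)

print(rbl_urls(2277))  # user with stored preferences
print(rbl_urls(1999))  # user without preferences, falls back to defaults
```

User 2277 gets only the three stored preferences (bl.spamcop.net is not mixed in), while user 1999 gets the three system defaults.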

Related

MySQL/Python learning error

I am currently learning the basics of creating a database and doing some data analysis. I have been struggling to understand how to 'start coding',
so I finally decided to come up with a simple diary project to kick-start my coding life.
Here is what I have so far. In terms of Python there is nothing yet, except that I managed to link Python and MySQL.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| diary |
| mysql |
| performance_schema |
| sakila |
| sys |
| world |
+--------------------+
7 rows in set (0.00 sec)
mysql> desc diary;
+---------------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+-------------------+-------+
| TASK_COMMENTS | varchar(255) | YES | | NULL | |
| TASK | varchar(55) | NO | | NULL | |
| TS | timestamp | NO | | CURRENT_TIMESTAMP | |
+---------------+--------------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
mysql> select * from diary;
+---------------+---------------+---------------------+
| TASK_COMMENTS | TASK | TS |
+---------------+---------------+---------------------+
| NULL | Food Shopping | 2016-12-25 18:53:32 |
+---------------+---------------+---------------------+
1 row in set (0.00 sec)
Here is the question, finally :) Is it correct to make the timestamp the primary key, or is it more 'database error-free' to create an actual id instead of using an automated timestamp as the PK?
Also, I am trying to make the TASK_COMMENTS field NOT NULL as well, but I get this:
mysql> ALTER TABLE Diary MODIFY COLUMN TASK_COMMENTS VARCHAR(255) NOT NULL;
ERROR 1138 (22004): Invalid use of NULL value
Thank you for helping.
You can't change a column to NOT NULL while a NULL value already exists in it. Either delete the row or set the value to something non-NULL; then you can alter the column.
Using a timestamp as a primary key is not a good idea because it is quite possible to get duplicate values. It is also easy to avoid them, but it is still not a good idea. Use an id column, make it the PK, and typically give it AUTO_INCREMENT to ensure no duplicates.
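The duplicate-timestamp problem described above is easy to reproduce. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL; the table names diary_ts and diary_id and the second task "Laundry" are made up for illustration. Two entries written within the same second collide when the timestamp is the primary key, but not when a generated id is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: the timestamp itself is the primary key.
    CREATE TABLE diary_ts (ts TEXT PRIMARY KEY, task TEXT NOT NULL);
    -- Variant 2: a generated id is the primary key (SQLite's AUTOINCREMENT
    -- plays the role of MySQL's AUTO_INCREMENT here).
    CREATE TABLE diary_id (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ts TEXT NOT NULL,
        task TEXT NOT NULL
    );
""")

same_second = "2016-12-25 18:53:32"

# Two diary entries in the same second: the second insert violates the key.
conn.execute("INSERT INTO diary_ts VALUES (?, ?)", (same_second, "Food Shopping"))
try:
    conn.execute("INSERT INTO diary_ts VALUES (?, ?)", (same_second, "Laundry"))
    ts_collision = False
except sqlite3.IntegrityError:
    ts_collision = True

# With an id column both rows are accepted and get distinct keys.
for task in ("Food Shopping", "Laundry"):
    conn.execute("INSERT INTO diary_id (ts, task) VALUES (?, ?)", (same_second, task))
id_rows = conn.execute("SELECT id, task FROM diary_id ORDER BY id").fetchall()

print(ts_collision, id_rows)
```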

Why is the size of a MySQL MyISAM table the same after stripping some data from a VARCHAR column?

I need to reduce the size of a MySQL database. I recoded some information, which stripped ';' and ':' from the sources column (~10% char reduction). After doing so, the size of the table is exactly the same as before. How is that possible? I'm using the MyISAM engine.
btw: Unfortunately, I cannot compress the tables with myisampack.
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (4 min 11.30 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test') ORDER BY TABLE_ROWS DESC LIMIT 10;
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| test | 41.917 | 0.857 | 1.075 |
| homologs_9606 | 41.917 | 0.887 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.01 sec)
mysql> select * from homologs_9606 limit 10;
+---------+---------+-------+--------------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+--------------------------------+
| 5635338 | 1028608 | 0.000 | 10:,1 |
| 5644385 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5652325 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5641128 | 1028612 | 1.000 | 8:1,10 |
| 5636414 | 1028616 | 0.038 | 8:0.038,104;10:,1 |
| 5636557 | 1028616 | 0.000 | 8:,4 |
| 5637419 | 1028616 | 0.011 | 5:,1;8:0.011,91;10:,1 |
| 5641196 | 1028616 | 0.080 | 5:1,1;8:0.074,94;10:,1;11:,4 |
| 5642914 | 1028616 | 0.000 | 8:,3 |
| 5643778 | 1028616 | 0.056 | 8:0.057,70;10:,1 |
+---------+---------+-------+--------------------------------+
10 rows in set (4.55 sec)
mysql> select * from test limit 10;
+---------+---------+-------+-------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+-------------------------+
| 5635338 | 1028608 | 0.000 | 10,1 |
| 5644385 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5652325 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5641128 | 1028612 | 1.000 | 81,10 |
| 5636414 | 1028616 | 0.038 | 80.038,10410,1 |
| 5636557 | 1028616 | 0.000 | 8,4 |
| 5637419 | 1028616 | 0.011 | 5,180.011,9110,1 |
| 5641196 | 1028616 | 0.080 | 51,180.074,9410,111,4 |
| 5642914 | 1028616 | 0.000 | 8,3 |
| 5643778 | 1028616 | 0.056 | 80.057,7010,1 |
+---------+---------+-------+-------------------------+
10 rows in set (0.00 sec)
mysql> describe test;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | YES | PRI | NULL | |
| protid2 | int(10) unsigned | YES | PRI | NULL | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> describe homologs_9606;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | NO | PRI | 0 | |
| protid2 | int(10) unsigned | NO | PRI | 0 | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
EDIT1: Added average column length.
mysql> select AVG(LENGTH(sources)) from test;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 5.2177 |
+----------------------+
1 row in set (10.04 sec)
mysql> select AVG(LENGTH(sources)) from homologs_9606;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 6.8792 |
+----------------------+
1 row in set (9.95 sec)
EDIT2: I was able to strip some more MB by setting NOT NULL to all columns.
mysql> drop table test
Query OK, 0 rows affected (0.42 sec)
mysql> CREATE table test (protid1 INT UNSIGNED NOT NULL DEFAULT '0', protid2 INT UNSIGNED NOT NULL DEFAULT '0', CS FLOAT(4,3) NOT NULL DEFAULT '0', sources VARCHAR(100) NOT NULL DEFAULT '0', PRIMARY KEY (protid1, protid2), KEY `idx_protid2` (protid2)) ENGINE=MyISAM CHARSET=ascii;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (2 min 7.84 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test');
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| homologs_9606 | 41.917 | 0.887 | 1.075 |
| test | 41.917 | 0.842 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.02 sec)
They are not exactly the same. Your query clearly shows that test is about 30 MB smaller than homologs_9606:
+---------------+--------------+---------+
| name | million rows | data GB |
+---------------+--------------+---------+
| test | 41.917 | 0.857 | <-- 0.857 < 0.887
| homologs_9606 | 41.917 | 0.887 |
+---------------+--------------+---------+
How much storage should we expect for your table? Let us check Data Type Storage Requirements:
INTEGER(10): 4 bytes
FLOAT(4): 4 bytes
VARCHAR(100): L+1
where L is the number of character bytes, which is usually one byte per character but sometimes more if you use a Unicode character set.
Your rows on average will need:
INTEGER + INTEGER + FLOAT + VARCHAR =
4 + 4 + 4 + (L + 1) = L + 13 bytes
We can infer your original average L as (0.887*1024^3 / 41917131) - 13 = 9.72. You say that you stripped 10% from sources, which means your new L is 9.72*0.9 = 8.75. That gives an expected new total storage requirement of ((8.75 + 13) * 41917131) / 1024^3 = 0.849 GB
I suspect that the difference (between 0.849 and 0.857) might be due to the fact that test has two columns set as NULLable that homologs_9606 does not, but I do not know enough about the MyISAM engine to calculate this exactly. I can, however, guess! At a minimum you would need 1 bit per column per row to store a NULL state, which in your case means two bits per row, or 2*41917131 = 83834262 bits = 10 479 283 bytes = 0.010 GB. The total 0.849+0.010 = 0.859 slightly overshoots the target (by about 2 MB). But I have done some rounding and your 10% figure is also an estimate, so I am sure the rest is lost in translation.
Another reason could be a Unicode character set on sources in test, in which case some characters may use more than one byte each, but since the NULLable columns seem to account for everything, I do not think this is the case for your table.
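The estimate above can be reproduced step by step. This is just the same arithmetic written out in Python, using the figures quoted in the question (0.887 GB, 41917131 rows, a 10% reduction); the variable names are mine, and the 13 fixed bytes per row and the one NULL bit per column are the answer's assumptions, not measured MyISAM internals.

```python
ROWS = 41_917_131
GIB = 2 ** 30
# Two 4-byte INTs, one 4-byte FLOAT, plus the 1-byte VARCHAR length prefix.
FIXED_BYTES = 4 + 4 + 4 + 1

# Average VARCHAR payload implied by the original table size.
original_len = 0.887 * GIB / ROWS - FIXED_BYTES

# The question states roughly 10% of the characters were stripped.
stripped_len = original_len * 0.9

# Expected data size of the new table, before any NULL bookkeeping.
expected_gb = (stripped_len + FIXED_BYTES) * ROWS / GIB

# Guess from the answer: one bit per extra NULLable column per row, two columns.
null_flag_gb = 2 * ROWS / 8 / GIB

total_gb = expected_gb + null_flag_gb

print(round(original_len, 2), round(expected_gb, 3), round(total_gb, 3))
```

The result lands within a couple of MB of the 0.857 GB that information_schema reports for test.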
Summary
Your two tables are not the same size, they differ by 30 MB.
The size of your new table is around the expected size.
You can save some more space in your new table by making protid1 and protid2 into NOT NULL columns.
The "table" is stored in a .MYD file. This file will never shrink due to UPDATEs or DELETEs. SHOW TABLE STATUS (or the equivalent query into information_schema) may show Data_length shrinking, but Data_free will increase.
You can shrink the .MYD file by doing OPTIMIZE TABLE. But that will copy the table over, thereby needing extra disk space during the process. And this action is only very rarely worth doing.
Changing to NOT NULL may not free up space if you had a lot of nulls -- "" takes 1 or 2 bytes for a VARCHAR because of the length. (And your code may need to handle '' differently than NULL.)
The space taken for each row is actually 1 byte more than previously mentioned -- this byte handles knowing whether the row exists or is the beginning of a hole.
For large text fields, I like to do this to save space. (This applies to both MyISAM and InnoDB.) Compress the text and store it into a BLOB column (instead of TEXT). For most text, that is a 3:1 shrinkage. It takes a little extra code and CPU time in the client, but it saves a lot of I/O in the server. Often the net result is "faster". I would not use it for the varchar you have; I would only do it on columns bigger than, say, 50 characters average.
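The client-side compression described above might look roughly like this. It is a sketch under assumptions: zlib from the Python standard library is my choice of compressor, pack_text and unpack_text are hypothetical helper names, and the sample text is artificial, so its compression ratio says nothing about real data. The compressed bytes would be stored in a BLOB column and decompressed after reading.

```python
import zlib

def pack_text(text: str) -> bytes:
    """Compress text on the client before writing it to a BLOB column."""
    return zlib.compress(text.encode("utf-8"))

def unpack_text(blob: bytes) -> str:
    """Reverse of pack_text: decompress bytes read back from the BLOB column."""
    return zlib.decompress(blob).decode("utf-8")

# Artificial, highly repetitive sample; real text will compress less well.
sample = "A long free-text comment that would otherwise sit in a TEXT column. " * 50
blob = pack_text(sample)

print(len(sample.encode("utf-8")), len(blob))
```

Note that compressed values cannot be searched or indexed by the server, which is part of the trade-off the answer mentions.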
Back to the original question. It sounds like there were only about 30M colons and semicolons in the entire table. Could it be that the first 10 rows are not representative?

MySQL: Slow avg query for 411M rows

I have a simple table (created by django) - engine InnoDB:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| correlation | double | NO | | NULL | |
| gene1_id | int(10) unsigned | NO | MUL | NULL | |
| gene2_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
The table has more than 411 million rows.
(The target table will have around 461M rows, 21471*21470 rows)
My main query looks like this, there might be up to 10 genes specified at most.
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
This query is very slow, it takes almost 2 mins to run:
21471 rows in set (1 min 11.03 sec)
Indexes (cardinality looks strange - too small?):
Non_unique| Key_name | Seq_in_index | Column_name | Collation | Cardinality |
0 | PRIMARY | 1 | id | A | 411512194 |
1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez | 1 | gene1_id | A | 18 |
1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez | 1 | gene2_id | A | 18 |
I just run select count(*) on that table and it took 22 mins:
select count(*) from predictions_genescorrelation;
+-----------+
| count(*) |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)
What could be wrong?
I suspect that mysql configuration is not set up right.
During the import of data I experienced problem with space, so that might also affected the database, although I ran check table later - it took 2hours and stated OK.
Additionally - the cardinality of the indexes look strange. I have set up smaller database locally and there values are totally different (254945589,56528,17).
Should I redo indexes?
What params should I check of MySQL?
My tables are set up as InnoDB, would MyISAM make any difference?
Thanks,
matali
https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/
SELECT COUNT(*) queries on InnoDB are very slow without a WHERE clause or without SELECT COUNT(id) ... USE INDEX (PRIMARY).
To speed up this query:
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
you should have composite key on (gene2_id, gene1_id, correlation) in that order. try
About index cardinality: the stats of InnoDB tables are approximate, not accurate (sometimes insane). There even was (is?) a bug report: https://bugs.mysql.com/bug.php?id=58382
Try ANALYZE TABLE and look at the cardinality again.
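As an illustration of the composite index suggested above, here is a small sketch. The MySQL statement is kept as a string only (it is not executed here); the index name idx_gene2_gene1_corr is hypothetical, and the question uses both genescorrelation and predictions_genescorrelation as the table name, so adjust accordingly. The runnable part uses SQLite through Python's sqlite3 module with a few made-up rows, purely to show that the index definition and the query fit together; it says nothing about timings on 411M rows.

```python
import sqlite3

# Assumed MySQL form of the suggestion (not executed in this sketch).
MYSQL_DDL = (
    "ALTER TABLE genescorrelation "
    "ADD INDEX idx_gene2_gene1_corr (gene2_id, gene1_id, correlation)"
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genescorrelation (
        id INTEGER PRIMARY KEY,
        correlation REAL NOT NULL,
        gene1_id INTEGER NOT NULL,
        gene2_id INTEGER NOT NULL
    );
    -- Filter column first, then the grouping column, then the aggregated
    -- value, so the query can be answered from the index alone.
    CREATE INDEX idx_gene2_gene1_corr
        ON genescorrelation (gene2_id, gene1_id, correlation);
    INSERT INTO genescorrelation (correlation, gene1_id, gene2_id) VALUES
        (0.25, 1, 176829), (0.75, 1, 176519), (1.0, 1, 999),
        (0.5, 2, 176230), (0.125, 2, 999);
""")

QUERY = """
    SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
    WHERE gene2_id IN (176829, 176519, 176230)
    GROUP BY gene1_id
"""
averages = conn.execute(QUERY + " ORDER BY gene1_id").fetchall()

# The plan text depends on the SQLite version, so it is only printed.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
print(averages)
print(plan)
```

In MySQL, running EXPLAIN on the original query after adding the index would be the way to confirm that the new index is actually chosen.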

Why mysql matching rows don't update?

For some reasons it seems that the rows are not being updated. Any idea why this would happen ?
UPDATE hts SET assigned='1' AND Owner='ms' WHERE hid='217477'
Query OK, 0 rows affected (0.16 sec)
Rows matched: 1 Changed: 0 Warnings: 0
select assigned, Owner from hts where hid='217477';
+----------+-------+
| assigned | Owner |
+----------+-------+
| NULL | NULL |
+----------+-------+
Show columns from hts
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| hid | varchar(25) | YES | UNI | NULL | |
| assigned | int(11) | NO | | 0 | |
| Owner | varchar(10) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
Two things you can try.
First, remove the AND from the SET clause; assignments are separated with a comma. With AND, MySQL parses '1' AND Owner='ms' as a single boolean expression and assigns its result to assigned, so Owner is never set.
UPDATE hts SET assigned=1, Owner='ms' WHERE hid='217477'
Second, try removing the quotes from the hid if it is an INT and not a VARCHAR:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid=217477
I am not sure why you are storing integers as strings. When in doubt, you should always store data in its intended datatype.
RECOMMENDATION: change the datatypes from varchar to int where the values are integers. Your update would then look like this:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid=217477
assigned should be an integer, as well as hid.
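The difference between AND and a comma in the SET clause can be seen in a small experiment. This sketch uses SQLite through Python's sqlite3 module rather than MySQL, and it starts from a hypothetical existing Owner value of 'old' (in the question the column was NULL) so that the effect is visible. SQLite parses the expression the same way: everything after the first = is one boolean expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hts (hid TEXT UNIQUE, assigned INTEGER DEFAULT 0, Owner TEXT)")
conn.execute("INSERT INTO hts VALUES ('217477', 0, 'old')")

def row():
    """Return (assigned, Owner) for the row under test."""
    return conn.execute(
        "SELECT assigned, Owner FROM hts WHERE hid = '217477'"
    ).fetchone()

# With AND: assigned receives the result of (1 AND Owner = 'ms'), which is
# false here, and Owner is not assigned at all.
conn.execute("UPDATE hts SET assigned = 1 AND Owner = 'ms' WHERE hid = '217477'")
after_and = row()

# With a comma: two separate assignments, both columns change.
conn.execute("UPDATE hts SET assigned = 1, Owner = 'ms' WHERE hid = '217477'")
after_comma = row()

print(after_and, after_comma)
```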

Converting MySQL warning into an error

Is there any way to convert the warning that MySQL is issuing about an invalid datetime into a hard error? I've tried using SET sql_mode='TRADITIONAL'; which apparently is supposed to turn (some) things that are warnings into errors, but it does not have any effect here. This is MySQL 5.1.56. Something that works on a session-level would be ideal, but I'll take what I can get.
mysql> describe test_table2;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| value | int(11) | YES | | NULL | |
| name | varchar(16) | YES | | NULL | |
| sometime | datetime | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> select * from test_table2;
+-------+-------+---------------------+
| value | name | sometime |
+-------+-------+---------------------+
| 1 | one | 2002-09-01 10:00:00 |
| 2 | two | 2002-09-02 11:00:00 |
| 3 | three | 2002-09-03 12:00:00 |
| 4 | four | 2002-01-04 13:00:00 |
| 5 | five | 2002-01-05 14:00:00 |
+-------+-------+---------------------+
5 rows in set (0.00 sec)
mysql> select * from test_table2 where sometime = 'foo';
Empty set, 2 warnings (0.00 sec)
Warning (Code 1292): Incorrect datetime value: 'foo' for column 'sometime' at row 1
Warning (Code 1292): Incorrect datetime value: 'foo' for column 'sometime' at row 1
With SET sql_mode='TRADITIONAL', doing an INSERT with an invalid date causes an error, but doing a SELECT with an invalid date still causes a warning. You can trigger the error by passing the (possibly invalid) date value to this query first:
CREATE TEMPORARY TABLE IF NOT EXISTS date_guard (date DATE) SELECT 'foo' AS date;
where 'foo' is the date value you want to validate.
Who is supposed to see the error?
If this is a fixed string 'foo', just try converting 'foo' to a date and see if you get a valid result (i.e. not 0000-00-00). Do a pre-query to check the validity of the date, and then continue afterwards.
I have not been able to make MySQL give an error in this case (or even convert the invalid date to NULL; it insists on making it 0000-00-00).
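If the value comes from application code, the validity check could also be done on the client before the SELECT is sent, instead of with a pre-query. The following is a sketch of that alternative, not of the pre-query itself: it assumes the application always uses the 'YYYY-MM-DD HH:MM:SS' form (MySQL itself accepts more formats), and the helper names are hypothetical.

```python
from datetime import datetime

# Assumption: the application only ever sends this one datetime format.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_datetime_or_raise(value: str) -> datetime:
    """Raise ValueError (a hard error) for anything that is not a valid datetime."""
    return datetime.strptime(value, DATETIME_FORMAT)

def is_valid_datetime(value: str) -> bool:
    """Boolean variant for callers that prefer to branch instead of catching."""
    try:
        parse_datetime_or_raise(value)
        return True
    except ValueError:
        return False

print(is_valid_datetime("2002-09-01 10:00:00"))
print(is_valid_datetime("foo"))
```

A caller would run the check first and only issue the query with sometime = ... when it passes, turning the silent warning into an explicit failure in the application.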