"is not null" vs boolean MySQL - Performance - mysql

I have a column that is a datetime, converted_at.
I plan on making calls that check WHERE converted_at IS NOT NULL very often. As such, I'm considering adding a boolean field, converted. Is there a significant performance difference between checking whether a field is not null versus whether it is false?
Thanks.

If something can be answered by a single field, favour that over splitting the same information into two fields. A second field creates more infrastructure, which in your case is avoidable.
As to the nub of the question, I believe most database implementations, MySQL included, already keep an internal boolean flag to represent the NULLability of a field.
You can rely on this being done correctly for you.
As to performance, the bigger question is profiling the typical queries you run against your database, creating appropriate indexes, and running ANALYZE TABLE to improve execution plans and the indexes used during queries. That will have a far bigger impact on performance.

Using WHERE converted_at IS NOT NULL or WHERE converted = FALSE will probably perform the same.
But if you have this additional bit field, used to store whether the converted_at field is NULL or not, you'll have to somehow maintain its integrity (via triggers?) whenever a new row is added and every time the column is updated. So this is a denormalization, and it also means more complicated code. Moreover, you'll need at least one more index on the table (which means slightly slower INSERT/UPDATE/DELETE operations).
Therefore, I don't think it's a good idea to add this bit field.
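For illustration, if you did maintain such a flag, the triggers might look like this (a sketch; the table name conversions is hypothetical, the column names follow the question):

```sql
-- Sketch: keep a denormalized `converted` flag in sync with `converted_at`
CREATE TRIGGER conv_flag_ins BEFORE INSERT ON conversions
FOR EACH ROW SET NEW.converted = (NEW.converted_at IS NOT NULL);

CREATE TRIGGER conv_flag_upd BEFORE UPDATE ON conversions
FOR EACH ROW SET NEW.converted = (NEW.converted_at IS NOT NULL);
```

Two triggers plus an extra index, all to duplicate information the NULL check already gives you, which is exactly the maintenance burden described above.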
If you can change the column in question from NULL to NOT NULL (possibly by normalizing the table), you may get some performance gain (at the cost of having more tables).

I had the same question for my own usage, so I decided to put it to the test.
I created all the fields required for the three possibilities I imagined:
# option 1
ALTER TABLE mytable ADD deleted_at DATETIME NULL;
ALTER TABLE mytable ADD archived_at DATETIME NULL;
# option 2
ALTER TABLE mytable ADD deleted boolean NOT NULL DEFAULT 0;
ALTER TABLE mytable ADD archived boolean NOT NULL DEFAULT 0;
# option 3
ALTER TABLE mytable ADD invisibility TINYINT(1) UNSIGNED NOT NULL DEFAULT 0
COMMENT '4 values possible' ;
The last is a bitfield where 1=archived, 2=deleted, 3=deleted + archived
The first difference: you have to create indexes for options 2 and 3.
CREATE INDEX mytable_deleted_IDX USING BTREE ON mytable (deleted) ;
CREATE INDEX mytable_archived_IDX USING BTREE ON mytable (archived) ;
CREATE INDEX mytable_invisibility_IDX USING BTREE ON mytable (invisibility) ;
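With the bitfield option, individual flags would be tested with bitwise operators, for example (note that such per-bit filters generally cannot use the BTREE index, though the equality test invisibility=0 used in the benchmark below can):

```sql
-- Bitfield queries for option 3 (1 = archived, 2 = deleted)
SELECT * FROM mytable WHERE (invisibility & 2) = 0;               -- rows not deleted
UPDATE mytable SET invisibility = invisibility | 1
WHERE id_mytable = 42;                                            -- mark one row archived
```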
Then I tried all of the options using a real-life SQL request, on 13k records in the main table. Here is how it looks:
SELECT *
FROM mytable
LEFT JOIN table1 ON mytable.id_qcm = table1.id_qcm
LEFT JOIN table2 ON table2.id_class = mytable.id_class
INNER JOIN user ON mytable.id_user = user.id_user
where mytable.id_user=1
and mytable.deleted_at is null and mytable.archived_at is null
# and deleted=0
# and invisibility=0
order BY id_mytable
I alternately used each of the commented filter options above.
Tested on MySQL 5.7.21-1 (Debian 9).
My conclusion:
The "is null" solution (option 1) is a bit faster, or at least the same performance.
The two others ("deleted=0" and "invisibility=0") seem on average a bit slower.
But the nullable-field option has decisive advantages: no index to create, easier to update, easier to query, and less storage space used.
(Additionally, inserts and updates should in principle be faster as well, since MySQL does not need to update the extra indexes, but you would likely never notice that.)
So you should use the nullable datetime fields option.

Related

Default scope of a null datetime. How to index?

In my app, there is a very large table (>40 million rows) that will have a default scope set on the model.
The default scope will look at a specific DATETIME column and check that it IS NULL. The DATETIME column will probably never be used to search for a specific date. Should I be using an index here, and if so, how?
The WHERE <column_name> IS NULL will be added to almost every single query made on this table from the app. On the one hand, since the column is essentially being treated as a boolean, I am tempted to think that it should not be indexed. However, it seems that with such a huge table, an index should provide value, especially for a query like
SELECT COUNT(*) FROM <table_name> WHERE <column_name> IS NULL
I am also a bit confused about how I should index, since the WHERE clause will be appended to every query. I do not think it would make sense to create an index on all columns of this table. This is being done in MySQL. Thanks
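For what it's worth, MySQL's BTREE indexes do store NULL values, so a plain single-column index can serve an IS NULL filter. A sketch (table and column names are hypothetical):

```sql
-- A single-column index; MySQL BTREE indexes include NULLs,
-- so `WHERE deleted_at IS NULL` can use it
CREATE INDEX ix_deleted_at ON my_large_table (deleted_at);
SELECT COUNT(*) FROM my_large_table WHERE deleted_at IS NULL;
```

Note that if the vast majority of rows are NULL, the optimizer may still prefer a full scan for queries returning most of the table, since the index would be too unselective to pay off.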

MySQL UPDATE performance with or without WHERE IS NULL

Let's say I have 300 million users in my MySQL database (InnoDB). Some of them have a username set, while some don't (username is null), and let's say 60% of them are not null (have an actual varchar value).
If I wanted to set all 300 million users' usernames to null, would
UPDATE users SET username = null WHERE username IS NOT NULL
perform better than
UPDATE users SET username = null - without a WHERE clause, just blanket null them all?
I know that WHERE always performs faster when setting actual values, but somehow null fields made me think about this.
Both will take terribly long. I suggest you do it in 'chunks' as described in my blog here:
http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
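A chunked pass over the primary key might look like this (a sketch, assuming an auto-increment id column):

```sql
-- Sketch: null out usernames in primary-key chunks of 10,000 rows
UPDATE users SET username = NULL
 WHERE id BETWEEN 1 AND 10000
   AND username IS NOT NULL;
-- ...then repeat for 10001-20000, and so on, committing between chunks
```

Each statement keeps the transaction (and the locks it holds) small, which is the point of chunking.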
Here is another solution:
ALTER TABLE t DROP COLUMN c;
ALTER TABLE t ADD COLUMN c VARCHAR(...) DEFAULT NULL;
Each ALTER will copy the table over once without writing to the rollback log (etc.), thereby being significantly faster. (I doubt you can combine the two into a single statement.)
But first, let's back up and discuss why you need to do this unusual task. It is likely to indicate a poor schema design. And rethinking the design may be a better approach.

Update field value but also apply cap?

Is it possible to update the value of a field but cap it at the same time?
UPDATE users SET num_apples=num_apples-1 WHERE xxx = ?
I don't want the field "num_apples" to fall below zero. Can I do that in one operation?
Thanks
----- Update ------------------
UPDATE users SET num_apples=num_apples-1 WHERE user_id = 123 AND num_apples > 0;
If I only have an index on "user_id", and not "num_apples", is that going to be bad for performance? I'm not sure how mysql implements this operation. I'm hoping that the WHERE on the user_id part makes it fast. I have to perform this operation somewhat frequently.
Thanks
Just add a WHERE condition specifying only rows > 0, so it won't update any rows into the negative.
UPDATE users SET num_apples=num_apples-1 WHERE num_apples > 0;
Update
Following your subquestion on indexing, as always, the way to test performance is to benchmark it for yourself. Examine the EXPLAIN for the query and make sure it is using the index on user_id (it should be). And finally, don't worry too much about performance of this simple operation until it becomes a problem. You don't have an index on num_apples now, but could you not add one if performance wasn't scaling to your needs?
You don't need to create two indexes, as only one will be used. You should index both fields in one index. The index should be the pair (user_id, num_apples):
ALTER TABLE t ADD INDEX yourNewIndex (user_id, num_apples);
You can actually remove the previous index as this will also include it:
ALTER TABLE t DROP INDEX yourOldIndex;
Before dropping it you can get information on what index is being used by running:
EXPLAIN UPDATE users SET num_apples=num_apples-1
WHERE user_id = 123 AND num_apples > 0;
If the index used is yourNewIndex, then MySQL determined that it is faster to use it than the previous one.
Edit:
do I even need any checks? Will mysql prevent the value from going < 0 by default in that case?
Yes, it will, if the column is UNSIGNED: rather than capping the value, you'll get a data truncation error when running the update if you do not control that:
Data truncation: BIGINT UNSIGNED value is out of range
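An alternative worth mentioning is to cap inside the assignment, so no row is filtered out or rejected with an error:

```sql
-- GREATEST() caps at 1 before subtracting, so an UNSIGNED column never underflows:
-- num_apples = 0 stays 0, anything larger is decremented by one
UPDATE users SET num_apples = GREATEST(num_apples, 1) - 1 WHERE user_id = 123;
```

This decrements when num_apples > 0 and leaves 0 unchanged, without relying on the truncation error for safety.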

MYSQL indexing issue

I am having some difficulties finding an answer to this question...
For simplicity, let's use this situation.
I create a table like this:
CREATE TABLE `test` (
`MerchID` int(10) DEFAULT NULL,
KEY `MerchID` (`MerchID`)
) ENGINE=InnoDB AUTO_INCREMENT=32769 DEFAULT CHARSET=utf8;
I will insert some data into the column of this table...
INSERT INTO test
SELECT 1
UNION
SELECT 2
UNION
SELECT null
Now I examine the query using MySQL's EXPLAIN feature...
EXPLAIN
SELECT * FROM test
WHERE merchid IS NOT NULL
Resulting in:
id = 1
select_type = SIMPLE
table = test
type = index
possible_keys = MerchID
key = MerchID
key_len = 5
ref = NULL
rows = 3
Extra = Using where; Using index
In production, in my real procedure, something like this takes a long time with this index. If I re-declare the table with the index line reading KEY `MerchID` (`MerchID`) USING BTREE, I get much better results. The EXPLAIN output seems to return the same results too. I have read some basics about the BTREE, HASH and RTREE storage types for indexes/keys. When no storage type is specified, I was under the assumption that BTREE would be assumed. However, I am kinda stumped why, when modifying my index to use this storage type, my procedure seems to fly. Any ideas?
I am using MySQL 5.1 and coding in MySQL Workbench. The part of the procedure that appears to be held up is like the one I illustrated above, where the column of a joined table is tested for NULL.
I think you are on the wrong path. For InnoDB storage the only available index method is BTREE, so you are safe to omit the USING BTREE keyword from your table-creation script. The MySQL documentation lists the supported index types per storage engine, along with other useful information.
The performance issue is coming from a different place.
Whenever testing performance, be sure to always use the SQL_NO_CACHE directive, otherwise, with query caching, the second time you run a query, your results may be returned a lot faster simply due to caching.
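For example, against the test table above:

```sql
-- Bypass the query cache so repeated runs measure real execution time
SELECT SQL_NO_CACHE * FROM test WHERE MerchID IS NOT NULL;
```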
With a covering index (all of the selected and filtered columns are in the index), the query is rather efficient. Using index in the EXPLAIN result shows that it's being used as a covering index.
However, if the index were not a covering index, MySQL would have to perform a seek for each row returned by the index in order to grab the actual table data. While this would still be fast for a small result set, with a result set of 1 million rows, that would be 1 million seeks. If the number of NULL rows were a high percentage, MySQL would abandon the index altogether to avoid the seeks.
Ensure that your real "production" index is a covering index as well.
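For a wider table, a covering index must include every column the query touches. A sketch with hypothetical columns:

```sql
-- Hypothetical wider table: the index covers both the filter column and the
-- selected columns, so MySQL can answer from the index alone
-- ("Using index" in EXPLAIN), with no per-row seek into the table data
ALTER TABLE orders ADD INDEX ix_merch_cover (MerchID, order_date, total);
SELECT MerchID, order_date, total FROM orders WHERE MerchID IS NOT NULL;
```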

mysql multiple index question

I have a table(users) with columns as
id INT AUTO_INCREMENT PRIMARY
uid INT index
email CHAR(128) UNIQUE
activated TINYINT
And I'll need to query this table like this:
SELECT * FROM users WHERE uid = ? AND activated = 1
My question is: since there's an index on the 'uid' column, do I need to add another index on the 'activated' column to get the best performance for the above query? This table (it will be a big one) will be heavily accessed by 'INSERT' and 'UPDATE' statements as well as 'SELECT' ones.
As I've learned from other sources, indexes work against 'INSERT' and 'UPDATE' performance, so if the index on the uid column is enough for the query above, I won't have to set another index on activated, for the sake of insert and update performance.
MySQL will generally use only one index per table in a query, so having a separate additional index will not help.
However, if you want really optimal performance, define your index on both columns in this order: (eg. 1 index across 2 columns)
index_name (uid, activated)
That will allow optimized lookups of just uid, or uid AND activated.
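As a concrete sketch of that suggestion (the index name is made up):

```sql
-- One two-column index; its uid prefix alone also serves lookups on uid only,
-- so the original single-column uid index becomes redundant
ALTER TABLE users ADD INDEX ix_uid_activated (uid, activated);
```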
It depends upon your data distribution and the selectivity of uid versus the selectivity of uid and activated. If you have lots of unique values of uid and this would have high selectivity ie searching for uid = x only returns a few rows then including activated in the index would provide little value. Whereas if uid = x returns lots of rows and uid = x and activated = 1 returns few rows then there's value in the index.
It's hard to provide a specific answer without knowing the data distribution.
Creating the index won't make your selects slower.
However, it will make them significantly faster only if you search for the rarer value.
This index will only be useful if the majority of your accounts are activated and you search for non-activated ones, or the other way round: the majority of your accounts are non-activated and you search for activated ones.
Creating this index will also improve UPDATE and DELETE concurrency: without this index, all accounts (both activated and not-activated) for a given uid will be locked for the duration of UPDATE operation in InnoDB.
However, an additional index will of course hamper the DML performance.