I have a MySQL table of around 150,000 rows, and a good half of them have a blob (image) stored in a longblob field. I'm trying to create a query to select rows and include a field that simply indicates whether the longblob (image) exists. Basically
select ID, address, IF(house_image != '', 1, 0) AS has_image from homes where userid='1234';
That query times out after 300 seconds. If I remove the 'IF(house_image != '', 1, 0)' it completes in less than a second. I've also tried the following, but they all time out.
IF(ISNULL(house_image),0,1) as has_image
LEFT (house_image,1) AS has_image
SUBSTRING(house_image,0,1) AS has_image
I am not a DBA (obviously), but I'm suspecting that the query is selecting the entire longblob to know if it's empty or null.
Is there an efficient way to know if a field is empty?
Thanks for any assistance.
I had a similar problem a long time ago, and the workaround I ended up with was to move all blob/text columns into a separate table (bonus: this design allows multiple images per home). Once you've changed the design and moved the data around, you could do this:
select id, address, (
select 1
from home_images
where home_images.home_id = homes.id
limit 1
) as has_image -- will be 1 or null
from homes
where userid = 1234
PS: I make no guarantees. Depending on storage engine and row format, the blobs could get stored inline. If that is the case then reading the data will take much more disk IO than needed even if you're not "select"ing the blob column.
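To make the layout concrete, here is a minimal sketch of the split-table design. It uses Python's built-in sqlite3 module rather than MySQL, so it only shows the shape of the result and says nothing about the performance difference. The home_images columns, the index name and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE homes (id INTEGER PRIMARY KEY, userid INTEGER, address TEXT);
    -- Hypothetical side table holding the blobs, one row per image.
    CREATE TABLE home_images (
        id      INTEGER PRIMARY KEY,
        home_id INTEGER NOT NULL REFERENCES homes(id),
        image   BLOB NOT NULL
    );
    CREATE INDEX idx_home_images_home ON home_images (home_id);

    INSERT INTO homes VALUES (1, 1234, '1 Main St'), (2, 1234, '2 Oak Ave');
    INSERT INTO home_images (home_id, image) VALUES (1, x'FFD8FF');
""")

# Same query shape as above: the subquery only needs to find one matching row.
rows = conn.execute("""
    SELECT id, address,
           (SELECT 1
              FROM home_images
             WHERE home_images.home_id = homes.id
             LIMIT 1) AS has_image
      FROM homes
     WHERE userid = 1234
     ORDER BY id
""").fetchall()

print(rows)
```

The home without an image comes back with NULL (None in Python) in has_image, matching the "will be 1 or null" comment in the query above.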
It looks to me like you are treating the house_image column as a string when really you should be checking it for NULL.
select ID, address, IF(house_image IS NOT NULL, 1, 0) AS has_image
from homes where userid='1234';
LONGBLOBs can be indexed in MariaDB / MySQL, but the indexes are imperfect: they are so-called prefix indexes, and only consider the first bytes of the BLOB.
Try creating this compound index with a 20-byte prefix on your BLOB.
ALTER TABLE homes ADD INDEX user_image (userid, house_image(20));
Then this subquery will, efficiently, give you the IDs of rows with empty house_image columns.
SELECT ID
FROM homes
WHERE userid = '1234'
AND (house_image IS NULL OR house_image = '')
The prefix index can satisfy (house_image IS NULL OR house_image = '') directly without inspecting the BLOBs. That saves a whole mess of IO and CPU on your database server.
You can then incorporate your subquery into a main query to get your result.
SELECT h.ID, h.address,
CASE WHEN no_image.ID IS NULL THEN 1 ELSE 0 END AS has_image
FROM homes h
LEFT JOIN (
-- rows for this user that have no image
SELECT ID
FROM homes
WHERE userid = '1234'
AND (house_image IS NULL OR house_image = '')
) no_image ON h.ID = no_image.ID
WHERE h.userid = '1234'
The IS NULL ... LEFT JOIN trick means "any rows that do NOT show up in the subquery have images."
This issue came up when I got different record counts for what I thought were identical queries, one using a not in where constraint and the other a left join. The table in the not in constraint had one null value (bad data), which caused that query to return a count of 0 records. I sort of understand why, but I could use some help fully grasping the concept.
To state it simply, why does query A return a result but B doesn't?
A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)
This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.
Query A is the same as:
select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null
Since 3 = 3 is true, you get a result.
Query B is the same as:
select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null
When ansi_nulls is on, 3 <> null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.
When ansi_nulls is off, 3 <> null is true, so the predicate evaluates to true, and you get a row.
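For anyone who wants to try this without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. SQLite has no ansi_nulls switch, so this is an assumption-laden stand-in that only reproduces the standard behaviour (the "ansi_nulls on" case).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Queries A and B from the question, unchanged.
rows_a = conn.execute("SELECT 'true' WHERE 3 IN (1, 2, 3, NULL)").fetchall()
rows_b = conn.execute("SELECT 'true' WHERE 3 NOT IN (1, 2, NULL)").fetchall()

# The raw predicate values: 1 means TRUE, None (SQL NULL) stands for UNKNOWN.
pred_a, pred_b = conn.execute(
    "SELECT 3 IN (1, 2, 3, NULL), 3 NOT IN (1, 2, NULL)"
).fetchone()

print(rows_a, rows_b)   # A returns one row, B returns none
print(pred_a, pred_b)   # A's predicate is TRUE, B's is UNKNOWN
```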
NOT IN returns 0 records when compared against an unknown value
Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.
Whenever you use NULL you are really dealing with a Three-Valued logic.
Your first query returns results as the WHERE clause evaluates to:
3 = 1 or 3 = 2 or 3 = 3 or 3 = null
which is:
FALSE or FALSE or TRUE or UNKNOWN
which evaluates to
TRUE
The second one:
3 <> 1 and 3 <> 2 and 3 <> null
which evaluates to:
TRUE and TRUE and UNKNOWN
which evaluates to:
UNKNOWN
The UNKNOWN is not the same as FALSE
you can easily test it by calling:
select 'true' where 3 <> null
select 'true' where not (3 <> null)
Both queries will give you no results
If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.
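If it helps, the three-valued rules can be sketched in a few lines of Python, with None standing in for UNKNOWN. The helper names (not3, and3, or3, neq, where_keeps_row) are made up for this illustration and are not part of any SQL engine.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None standing in for UNKNOWN


def not3(a: Tri) -> Tri:
    # NOT UNKNOWN stays UNKNOWN.
    return None if a is None else not a


def and3(a: Tri, b: Tri) -> Tri:
    # A single FALSE decides the result; otherwise UNKNOWN taints it.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    # A single TRUE decides the result; otherwise UNKNOWN taints it.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def neq(x, y) -> Tri:
    # Any comparison involving NULL is UNKNOWN.
    return None if x is None or y is None else x != y


def where_keeps_row(p: Tri) -> bool:
    # WHERE keeps a row only when the predicate is TRUE.
    return p is True


first = neq(3, None)          # 3 <> null
second = not3(neq(3, None))   # not (3 <> null)
print(where_keeps_row(first), where_keeps_row(second))
```

Both predicates come out UNKNOWN, so neither query keeps a row, which is exactly why UNKNOWN cannot be the same thing as FALSE.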
There is a very good article on this subject on SqlServerCentral.
The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL
Another article I would recommend is SQL Aggregate Functions and NULL.
A comparison with NULL is undefined unless you use IS NULL.
So whenever 3 is compared to NULL, the comparison returns undefined.
I.e. SELECT 'true' where 3 in (1,2,null)
and
SELECT 'true' where 3 not in (1,2,null)
will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE
If you want to filter with NOT IN on a subquery containing NULLs, just check for NOT NULL:
SELECT blah FROM t WHERE blah NOT IN
(SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL )
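Here is a quick way to see the difference, sketched with Python's built-in sqlite3 module. The table and column names follow the query above; the sample rows are invented, and SQLite is only a stand-in for whatever engine you are actually on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (blah INTEGER);
    CREATE TABLE t2 (someotherBlah INTEGER);
    INSERT INTO t  VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (NULL);   -- one NULL in the subquery source
""")

# Without the filter, the NULL makes every NOT IN test FALSE or UNKNOWN.
unfiltered = conn.execute("""
    SELECT blah FROM t
     WHERE blah NOT IN (SELECT someotherBlah FROM t2)
     ORDER BY blah
""").fetchall()

# With the filter, NOT IN behaves the way most people expect.
filtered = conn.execute("""
    SELECT blah FROM t
     WHERE blah NOT IN (SELECT someotherBlah FROM t2
                         WHERE someotherBlah IS NOT NULL)
     ORDER BY blah
""").fetchall()

print(unfiltered, filtered)
```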
The title of this question at the time of writing is
SQL NOT IN constraint and NULL values
From the text of the question it appears that the problem was occurring in a SQL DML SELECT query, rather than a SQL DDL CONSTRAINT.
However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading, namely those along the lines of (paraphrasing)
When the predicate evaluates to UNKNOWN you don't get any rows.
Although this is the case for SQL DML, when considering constraints the effect is different.
Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):
DECLARE @T TABLE
(
true CHAR(4) DEFAULT 'true' NOT NULL,
CHECK ( 3 IN (1, 2, 3, NULL )),
CHECK ( 3 NOT IN (1, 2, NULL ))
);
INSERT INTO @T VALUES ('true');
SELECT COUNT(*) AS tally FROM @T;
As per @Brannon's answer, the first constraint (using IN) evaluates to TRUE and the second constraint (using NOT IN) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.
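The same effect can be reproduced outside SQL Server. Below is a rough sketch using Python's built-in sqlite3 module, on the assumption that SQLite's CHECK handling is close enough for the illustration (it treats a NULL result as not a violation). The column name flag and the second table U are invented for the contrast case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same two predicates as above, used as table-level CHECK constraints.
conn.execute("""
    CREATE TABLE T (
        flag TEXT NOT NULL DEFAULT 'true',
        CHECK (3 IN (1, 2, 3, NULL)),      -- TRUE
        CHECK (3 NOT IN (1, 2, NULL))      -- UNKNOWN
    )
""")
conn.execute("INSERT INTO T VALUES ('true')")
tally = conn.execute("SELECT COUNT(*) FROM T").fetchone()[0]

# Contrast: a predicate that is definitely FALSE does reject the row.
conn.execute("CREATE TABLE U (flag TEXT, CHECK (3 NOT IN (1, 2, 3)))")
try:
    conn.execute("INSERT INTO U VALUES ('true')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(tally, rejected)
```

The UNKNOWN constraint lets the row in, while the FALSE one raises an integrity error.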
The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec
7.6 where clause
The result of the <where clause> is a table of those rows of T for
which the result of the <search condition> is true.
4.10 Integrity constraints
A table check constraint is satisfied if and only if the specified
<search condition> is not false for any row of a table.
In other words:
In SQL DML, rows are removed from the result when the WHERE evaluates to UNKNOWN because it does not satisfy the condition "is true".
In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".
Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is a practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraint would have to explicitly handle nulls, and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)
p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!
In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)
In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.
SQL uses three-valued logic for truth values. The IN query produces the expected result:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE col IN (NULL, 1)
-- returns first row
But adding a NOT does not invert the results:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT col IN (NULL, 1)
-- returns zero rows
This is because the above query is equivalent of the following:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT (col = NULL OR col = 1)
Here is how the where clause is evaluated:
| col | col = NULL⁽¹⁾ | col = 1 | col = NULL OR col = 1 | NOT (col = NULL OR col = 1) |
|-----|----------------|---------|-----------------------|-----------------------------|
| 1 | UNKNOWN | TRUE | TRUE | FALSE |
| 2 | UNKNOWN | FALSE | UNKNOWN⁽²⁾ | UNKNOWN⁽³⁾ |
Notice that:
The comparison involving NULL yields UNKNOWN
The OR expression where none of the operands are TRUE and at least one operand is UNKNOWN yields UNKNOWN (ref)
The NOT of UNKNOWN yields UNKNOWN (ref)
You can extend the above example to more than two values (e.g. NULL, 1 and 2) but the result will be same: if one of the values is NULL then no row will match.
Null signifies an absence of data, that is, it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this, because in C-type languages null is indeed nothing when using pointers.
Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned
In the second however you can reduce it to
select 'true' where 3 not in (null)
So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.
It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.
Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:
VALUES ('S1', 'P1', NULL),
('S2', 'P1', 200),
('S3', 'P1', 1000)
Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.
The task is to find the suppliers who are known to supply part number 'P1' but not in quantities of 1000.
The following uses NOT IN to correctly identify supplier 'S2' only:
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN (
SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
);
However, the query below uses the same general structure with NOT EXISTS, but incorrectly includes supplier 'S1' in the result (i.e. the one for which the quantity is null):
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS (
SELECT *
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000
);
So NOT EXISTS is not the silver bullet it may have appeared!
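To check the two results side by side, here is a small sketch using Python's built-in sqlite3 module. It loads the same three rows into a real table instead of the VALUES-based CTE (an adaptation for SQLite, not part of the original queries) and runs both predicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
    INSERT INTO sp VALUES ('S1', 'P1', NULL),
                          ('S2', 'P1', 200),
                          ('S3', 'P1', 1000);
""")

not_in = [r[0] for r in conn.execute("""
    SELECT DISTINCT spx.sno
      FROM sp spx
     WHERE spx.pno = 'P1'
       AND 1000 NOT IN (SELECT spy.qty
                          FROM sp spy
                         WHERE spy.sno = spx.sno
                           AND spy.pno = 'P1')
     ORDER BY spx.sno
""")]

not_exists = [r[0] for r in conn.execute("""
    SELECT DISTINCT spx.sno
      FROM sp spx
     WHERE spx.pno = 'P1'
       AND NOT EXISTS (SELECT *
                         FROM sp spy
                        WHERE spy.sno = spx.sno
                          AND spy.pno = 'P1'
                          AND spy.qty = 1000)
     ORDER BY spx.sno
""")]

print(not_in, not_exists)
```

NOT IN leaves S1 out because its quantity is unknown, while NOT EXISTS finds no row with qty = 1000 for S1 and therefore keeps it.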
Of course, the source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.
This can be achieved (among other possible designs) using two tables:
sp suppliers known to supply parts
spq suppliers known to supply parts in known quantities
noting there should probably be a foreign key constraint where spq references sp.
The result can then be obtained using the 'minus' relational operator (being the EXCEPT keyword in Standard SQL) e.g.
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1' ),
( 'S2', 'P1' ),
( 'S3', 'P1' ) )
AS T ( sno, pno )
),
spq AS
( SELECT *
FROM ( VALUES ( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT sno
FROM spq
WHERE pno = 'P1'
EXCEPT
SELECT sno
FROM spq
WHERE pno = 'P1'
AND qty = 1000;
this is for Boy:
select party_code
from abc as a
where party_code not in (select party_code
from xyz
where party_code = a.party_code);
this works regardless of ansi settings
also this might be of use to know the logical difference between join, exists and in
http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
I'm far from a MYSQL expert, and I'm struggling with a relatively complicated query.
I have two tables:
A Data table with columns as follows:
`Location` bigint(20) unsigned NOT NULL,
`Source` bigint(20) unsigned NOT NULL,
`Param` bigint(20) unsigned NOT NULL,
`Type` bigint(20) unsigned NOT NULL,
`InitTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`ValidTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`Value` double DEFAULT NULL
A Location Group table with columns as follows:
`Group` bigint(20) unsigned NOT NULL,
`Location` bigint(20) unsigned NOT NULL,
The data table stores data of interest, where each 'value' is valid for a particular 'validtime'. However, the data in the table comes from a calculation which is run periodically. The initialisation time at which the calculation is run is stored in the 'inittime' field. A given calculation with particular inittime may result in, say 10 values being output with valid times (A - J). A more recent calculation, with a more recent inittime, may result in another 10 values being output with valid times (B - K). There is therefore an overlap in available values. I always want a result set of Values and ValidTimes for the most recent inittime (i.e. max(inittime)).
I can determine the most recent inittime using the following query:
SELECT MAX(InitTime)
FROM Data
WHERE
Location = 100060 AND
Source = 10 AND
Param = 1 AND
Type = 1;
This takes 0.072 secs to execute.
However, using this as a sub-query to retrieve data from the Data table results in an execution time of 45 seconds (it's a pretty huge table, but not super ridiculous).
Sub-Query:
SELECT Location, ValidTime, Value
FROM Data data
WHERE Source = 10
AND Location IN (SELECT Location FROM LocationGroup WHERE `Group` = 3)
AND InitTime = (SELECT max(data2.InitTime) FROM Data data2 WHERE data.Location = data2.Location AND data.Source = data2.Source AND data.Param = data2.Param AND data.Type = data2.Type)
ORDER BY Location, ValidTime ASC;
(Snipped ValidTime qualifiers for brevity)
I know there's likely some optimisation that would help here, but I'm not sure where to start. Instead, I created a stored procedure to effectively perform the MAX(InitTime) query, but because the MAX(InitTime) is determined by a combo of Location, Source, Param and Type, I need to pass in all the locations that comprise a particular group. I implemented a cursors-based stored procedure for this before realising there must be an easier way.
Putting aside the question of optimisation via indices, how could I efficiently perform a query on the data table using the most recent InitTime for a given location group, source, param and type?
Thanks in advance!
MySQL can do a poor job optimizing IN with a subquery (sometimes). Also, indexes might be able to help. So, I would write the query as:
SELECT d.Location, d.ValidTime, d.Value
FROM Data d
WHERE d.Source = 10 AND
EXISTS (SELECT 1 FROM LocationGroup lg WHERE d.Location = lg.Location and lg.Group = 3) AND
d.InitTime = (SELECT max(d2.InitTime)
FROM Data d2
WHERE d.Location = d2.Location AND
d.Source = d2.Source AND
d.Param = d2.Param AND
d.Type = d2.Type
)
ORDER BY d.Location, d.ValidTime ASC;
For this query, you want indexes on data(Location, Source, Param, Type, InitTime) and LocationGroup(Location, Group), and data(Source, Location, ValidTime).
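As a sanity check on the logic (not the timing), here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The sample rows, the index names and the simplified column types are assumptions for the illustration; "Group" is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (
        Location INTEGER NOT NULL, Source INTEGER NOT NULL,
        Param INTEGER NOT NULL, Type INTEGER NOT NULL,
        InitTime TEXT NOT NULL, ValidTime TEXT NOT NULL, Value REAL
    );
    CREATE TABLE LocationGroup ("Group" INTEGER NOT NULL, Location INTEGER NOT NULL);

    -- The indexes suggested above (names are made up).
    CREATE INDEX idx_data_latest ON Data (Location, Source, Param, Type, InitTime);
    CREATE INDEX idx_data_valid  ON Data (Source, Location, ValidTime);
    CREATE INDEX idx_lg          ON LocationGroup (Location, "Group");

    INSERT INTO LocationGroup VALUES (3, 100060), (4, 100061);
    INSERT INTO Data VALUES
        (100060, 10, 1, 1, '2024-01-01 00:00:00', '2024-01-01 01:00:00', 1.0),
        (100060, 10, 1, 1, '2024-01-01 00:00:00', '2024-01-01 02:00:00', 2.0),
        (100060, 10, 1, 1, '2024-01-01 06:00:00', '2024-01-01 02:00:00', 2.5),
        (100060, 10, 1, 1, '2024-01-01 06:00:00', '2024-01-01 03:00:00', 3.5),
        (100061, 10, 1, 1, '2024-01-01 06:00:00', '2024-01-01 02:00:00', 9.9);
""")

rows = conn.execute("""
    SELECT d.Location, d.ValidTime, d.Value
      FROM Data d
     WHERE d.Source = 10
       AND EXISTS (SELECT 1 FROM LocationGroup lg
                    WHERE d.Location = lg.Location AND lg."Group" = 3)
       AND d.InitTime = (SELECT MAX(d2.InitTime)
                           FROM Data d2
                          WHERE d.Location = d2.Location
                            AND d.Source = d2.Source
                            AND d.Param = d2.Param
                            AND d.Type = d2.Type)
     ORDER BY d.Location, d.ValidTime ASC
""").fetchall()

print(rows)
```

Only the rows from the most recent InitTime for the location in group 3 come back; the older run and the location in group 4 are filtered out.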
I have a Customer table, my client want to not physically delete any record from this table so I use a TINYINT field "IsDeleted" to keep track of deleted customers.
Now I'm in a situation where I need to exclude deleted customers, but when I tried the following query it gave me fewer records than expected:
select count(*) from customer where IsDeleted <> 1; (Count = 1477)
then the following
select count(*) from customer where (IsDeleted = 0 or IsDeleted is null); (Count = 1552)
Why are the above query counts different?
Why is the NULL value not counted in the "IsDeleted <> 1" check?
Please suggest.
Like Duniyadnd and triclosan point out this is caused by the column type for IsDeleted.
Change the query in the right panel so you can see the difference between using int and varchar column types. sqlfiddle.com/#!2/7bf0a/5
You cannot correctly use comparison operators on NULL values. Consider changing the type of IsDeleted to enum('N','Y') with the NOT NULL option.
NULL is not an int, and it is neither equal nor unequal to anything. For rows where IsDeleted is NULL, IsDeleted <> 1 evaluates to NULL rather than true, so the WHERE clause skips them. Null ideally means something that does not exist (that's why a lot of people use it instead of 0 in case you do store 0s in the table).
If you only have null, 0 and 1 values in the isDeleted column, you would probably find the difference between the two queries (1552-1477) to be the total number of nulls in your table (75).
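A tiny reproduction of the count difference, sketched with Python's built-in sqlite3 module rather than MySQL. The five sample rows and the count helper are invented; the COALESCE variant is one common way to fold NULL into "not deleted" and is offered as a suggestion, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, IsDeleted INTEGER);
    INSERT INTO customer (IsDeleted) VALUES (0), (0), (1), (NULL), (NULL);
""")


def count(where: str) -> int:
    # Hypothetical helper: count the customers matching a WHERE clause.
    sql = f"SELECT COUNT(*) FROM customer WHERE {where}"
    return conn.execute(sql).fetchone()[0]


not_one = count("IsDeleted <> 1")                        # NULL rows are skipped
zero_or_null = count("IsDeleted = 0 OR IsDeleted IS NULL")
nulls = count("IsDeleted IS NULL")
coalesced = count("COALESCE(IsDeleted, 0) <> 1")         # treat NULL as 0

print(not_one, zero_or_null, nulls, coalesced)
```

The gap between the first two counts equals the number of NULL rows, just like 1552 - 1477 = 75 in the question.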
Greetings and thank you for looking at my question, I hope you can provide some insight or direction.
I have three tables (fundamentally): 'value_meta', 'value', and 'values_visibility'. The schema follows:
TABLE 'value_meta'
COMMENT: contains a list of different values, each referencing a different 'value' table
int id PK
tinyint value1 FK to value1.value
tinyint value2 FK to value2.value
tinyint value3 FK to value3.value
...
TABLE 'value'
COMMENT: there are different value tables (for example, if it were for user profile data, there would be a value table for "occupation", "body type", and/or "education level"
tinyint id PK
varchar(255) value
TABLE 'value_visibility'
COMMENT: one value visibility entry per value[n] in the 'value_meta' table, each a boolean value. If 1, the coding query will return the value as referenced in the 'value[n]' table. If 0, return null
int id PK
BOOLEAN 'value1_visibility'
BOOLEAN 'value2_visibility'
BOOLEAN 'value3_visibility'
....
What I want to do is create a proper MySQL query to check "for each 'value' in 'value_meta', if the corresponding value entry in 'value_visibility' is 1, display the value varchar; else return null". By proper I mean most efficient (derived tables vs. correlated subqueries, proper conditionals and function uses... I hear ISNULL is bad).
I used to be good at query building straight out of my mind back in college but after years of not using it, I've become three straws short of a full broom. Can anyone help me? Thanks!
SELECT vm.id,
IF(vv.value1_visibility = 1, vmv1.value, NULL) AS value1,
IF(vv.value2_visibility = 1, vmv2.value, NULL) AS value2,
IF(vv.value3_visibility = 1, vmv3.value, NULL) AS value3
FROM value_meta vm
LEFT JOIN value vmv1 ON vm.value1 = vmv1.id
LEFT JOIN value vmv2 ON vm.value2 = vmv2.id
LEFT JOIN value vmv3 ON vm.value3 = vmv3.id
-- one visibility row per value_meta row, matched on id
LEFT JOIN value_visibility vv ON vm.id = vv.id
You should think about restructuring your value_meta table. Is there a reason why you are storing value1, value2 and value3 in the same row?