Just wondering, is there any quick way to count all the NULL values (from all columns) in a MySQL table?
Thanks for any idea!
If you want this done entirely in MySQL, without enumerating all of the columns, take a look at this solution.
With this method you don't have to hard-code the column names or maintain their count. If your table schema gets modified, the method keeps working without any code change.
SET @db = 'testing'; -- database
SET @tb = 'fuzzysearch'; -- table
SET @x = ''; -- will hold the column names with the ASCII method applied to retrieve the code of the first char
SET @numcolumns = 0; -- will hold the number of columns in the table
-- figure out how many columns we have
SELECT count(*) INTO @numcolumns FROM information_schema.columns WHERE table_name=@tb AND table_schema=@db;
-- we have to prepare some query from all columns of the table
SELECT group_concat(CONCAT('ASCII(',column_name,')') SEPARATOR ",") INTO @x FROM information_schema.columns WHERE table_name=@tb AND table_schema=@db;
-- after this query we have a comma-separated variable like
-- ASCII(col1),ASCII(col2),ASCII(col3)
-- we now generate a query to concat the columns using comma as separator (null values are omitted from the concat)
-- then figure out how many commas are in that string (done with length(value)-length(replace(value,',','')))
-- the number returned is how many non-null columns we have in that row
-- then we subtract that number from the known number of columns, calculated previously
-- the +1 is added because there is no comma for a single value
SET @s = CONCAT('SELECT @numcolumns - (length(CONCAT_WS(\',\',', @x, '))-length(replace(CONCAT_WS(\',\',', @x, '),\',\',\'\')) + 1) FROM ',@db,'.',@tb,';');
PREPARE stmt FROM @s;
EXECUTE stmt;
-- after this execution we get, for each row, the number of null columns
-- I will leave it to you to add a SUM() / GROUP BY if you want the null count for the whole table
DEALLOCATE PREPARE stmt;
ASCII() is used to avoid reading and concatenating very long column values for nothing; it also keeps us safe from values whose first character happens to be a comma (,).
Since you are working with reports, you may find this helpful, as it can be reused for any table if you wrap it in a method.
I tried to leave as many comments as possible.
Let's break the compact solution above into pieces (working backwards):
I wanted to end up having a query like this
SELECT totalcolumns - notnullcolumns from table; -- to return null columns for each row
While the first one is easy to calculate by running:
SELECT count(*) FROM information_schema.columns WHERE table_name=@tb AND table_schema=@db;
The second one, notnullcolumns, is a bit of a pain.
After examining the functions available in MySQL, we notice that CONCAT_WS() does not concatenate NULL values.
So running a query like this:
SELECT CONCAT_WS(",","First name",NULL,"Last Name");
returns: 'First name,Last Name'
This is good: we get rid of the NULL values from the enumeration.
But how do we get how many columns were actually concatenated?
Well, that is tricky. We have to calculate the number of commas + 1 to get the number of columns actually concatenated.
For this trick we use the following SQL:
select length(value)-length(replace(value,',','')) +1 from table
Ok, so we have now the number of concatenated columns.
But the harder part is coming next.
We have to enumerate all the column values for CONCAT_WS().
We need to have something like this:
SELECT CONCAT_WS(",",col1,col2,col3,col4,col5);
This is where we have to make use of prepared statements, as we have to build the SQL query dynamically from columns that aren't known yet. We don't know how many columns our table will have.
So for this we use data from information_schema columns table. We need to pass the table name, but also the database name, as we might have the same table name in separate databases.
We need a query that returns col1,col2,col3,col4,col5 to us for the CONCAT_WS "string".
So for this we run the query:
SELECT group_concat(column_name SEPARATOR ",") INTO @x FROM information_schema.columns WHERE table_name=@tb AND table_schema=@db;
One more thing to mention: when we use the length() and replace() method to find out how many columns were concatenated, we have to make sure we do not have commas among the values themselves. Also note that we can have really long values in our database cells. For both of these issues we use ASCII(value), which returns the ASCII code of the first character; that code can never be a comma, and it returns NULL for NULL columns.
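To see that behaviour quickly, a couple of throwaway queries (the values are arbitrary):
SELECT ASCII('apple'); -- 97, the code of the first character
SELECT ASCII(NULL); -- NULL, so NULL columns stay invisible to CONCAT_WS
SELECT CONCAT_WS(',', ASCII('a'), NULL, ASCII('b')); -- '97,98'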
That being said, we can compact all of this into the comprehensive solution above.
Something like
select id
, sum ( case when col1 is null then 1 else 0 end ) col1
, sum ( case when col2 is null then 1 else 0 end ) col2
, sum ( case when col3 is null then 1 else 0 end ) col3
from contacts
group by id
Something like this (substitute COL_COUNT as appropriate):
select count(*) * COL_COUNT - count(col1) - count(col2) - ... - count(col_n) from table;
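For instance, for a hypothetical three-column table named mytable (so COL_COUNT = 3) it would look like this; COUNT(col) only counts non-NULL values:
SELECT count(*) * 3 - count(col1) - count(col2) - count(col3) AS total_nulls
FROM mytable;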
You should really do this not only with SQL, but also with the host language at your disposal:
Obtain the metadata of each table, either using DESCRIBE tablename or using the built-in metadata functionality of your DB access technology.
Build and run a query of the following type in a loop for each column (in pseudo-code):
int nulls = 0;
for (String columnName : columnNames) {
    query = "SELECT COUNT(*) FROM tableName WHERE " + columnName + " IS NULL";
    Result result = executeQuery(query);
    nulls += result.firstValue(); // add the COUNT(*) value, not the number of rows returned
}
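If you would rather generate those per-column queries in SQL itself, a rough sketch using information_schema (the database and table names are placeholders):
SELECT CONCAT('SELECT ''', column_name, ''' AS col, COUNT(*) AS nulls FROM ',
              table_schema, '.', table_name,
              ' WHERE ', column_name, ' IS NULL;') AS query_to_run
FROM information_schema.columns
WHERE table_schema = 'mydatabase' -- your database
  AND table_name = 'mytable'; -- your table
Run the generated statements afterwards and add up the results.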
I have working dynamic pivot code. I've been stuck on this for almost a week, trying to find a way to add another column that counts the 0's or NULLs.
SET @sql_dynamic := (SELECT GROUP_CONCAT
(DISTINCT
CONCAT('if(sum(if(attendance_date = "',
date_format(attendance_date, '%Y-%m-%d'),
'",1,0))=0,0,attendance_status) AS `',
date_format(attendance_date, '%Y-%m-%d'),'`'
)
) from attendance
WHERE subject_id=1 AND attendance_month = "January"
);
SET @sql = CONCAT('SELECT studentidnumber, student_fullname,
subject_id, attendance_month, ', @sql_dynamic,'
FROM attendance
WHERE subject_id=1 AND attendance_month = "January"
GROUP BY studentidnumber'
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
which results in this:
[screenshot: pivoted attendance result]
Now I want to add another column to the dynamic table which counts the 0 or NULL values.
Please help.
You need a separate table with all the dates in it. (Or at least enough dates to handle your date range.) MariaDB (not MySQL) has a neat feature of sequence tables. For example, seq_1_to_100 is a virtual table that contains all the integers from 1 to 100. Use something like that, together with + INTERVAL seq, to generate all the dates. Then LEFT JOIN with your table to get the "missing" dates to generate NULL values. Change those to 0 with IFNULL(), if desired.
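A rough sketch of that idea, assuming an attendance table with attendance_date and attendance_status columns and a January 2015 date range (the names and range are assumptions):
-- MariaDB: seq_0_to_30 is a virtual sequence table holding 0..30
SELECT d.dt,
       IFNULL(a.attendance_status, 0) AS attendance_status
FROM (SELECT '2015-01-01' + INTERVAL seq DAY AS dt FROM seq_0_to_30) d
LEFT JOIN attendance a
       ON a.attendance_date = d.dt
      AND a.subject_id = 1;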
I am a newbie in coding.
Here is what I did:
DECLARE @v VARCHAR(100)
SET @v = (SELECT TOP 100 NAMES FROM TestTable WITH(NOLOCK))
SELECT @v AS SampleData
But it returned an error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Is there any way to set multiple values in one variable in SQL Server 2008?
Thanks in advance.
There is one type of variable designed for holding multiple values. It's called a table variable:
declare @v table (Name varchar(100) not null)
insert into @v (Name)
select top 100 name from TestTable /* no ORDER BY means this is ill-defined */
You can insert/update/delete in this table variable and query from it via selects in exactly the same way as any other table.
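For instance, a quick follow-up query against that table variable (purely illustrative):
SELECT Name
FROM @v
WHERE Name LIKE 'A%'
ORDER BY Name;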
Note though that this looks like you're breaking things down into "procedural" steps - first I'll get the top 100 names, then I'll do X, then I'll do Y. In SQL, you should try to put as much as possible into single queries, and leave it to the optimizer to determine what order to do things in, which subresults should be stored, etc.
In Mytable I have some values = 'NA'.
Instead of this value I would like to put NULL.
So I've write:
INSERT INTO Mytable
VALUES(NULL)
WHERE VALUES('NA');
But it didn't work.
I didn't put the name of the column because potentially every column can have some 'NA' values.
I hope someone has an idea how to do it.
Regards
Sam
UPDATE Mytable
SET value = NULL
WHERE value = 'NA';
Yes, you must do this for each column/attribute that you want to update.
UPDATE Mytable SET value = NULL WHERE value = 'NA'
To replace occurrences of 'NA' with NULL in multiple columns, for all rows in a table, you can do this in a single update query. The trick is to assign the current value of the column back to the column when you don't want the value changed.
For example:
UPDATE Mytable t
SET t.column_one = IF(t.column_one='NA',NULL,t.column_one)
, t.column_two = IF(t.column_two='NA',NULL,t.column_two)
, t.column_fee = IF(t.column_fee='NA',NULL,t.column_fee)
WHERE t.column_one = 'NA'
OR t.column_two = 'NA'
OR t.column_fee = 'NA'
NOTES:
Repeat the column assignment for each column you need to do the replacement. (The example above references three columns, named column_one, column_two and column_fee. I don't know the names of the columns in your table; you would need to replace those references with the actual names of the columns in your table.)
The WHERE clause is optional; the query would have the same net result without that WHERE clause. (Without the WHERE clause, the query would update every row in the table; any rows that don't have an 'NA' in one of the three columns would not be changed, since the columns will all be assigned their current values.
For a lot of columns, it's more efficient to do it in a single operation, to apply several changes to a row in one statement, rather than separate statements each making updates to the same row.)
The expression IF(a,b,c) evaluates expression a as a boolean; if it is TRUE, it returns expression b, otherwise it returns expression c.
To see how this works, you can run a SELECT statement (remove the SET clause, replace the UPDATE keyword with SELECT, and put the relevant expressions in the SELECT list):
For example:
SELECT t.column_one AS _one_old
, IF(t.column_one='NA',NULL,t.column_one) AS _one_new
, t.column_two AS _two_old
, IF(t.column_two='NA',NULL,t.column_two) AS _two_new
, t.column_fee AS _fee_old
, IF(t.column_fee='NA',NULL,t.column_fee) AS _fee_new
FROM Mytable t
WHERE t.column_one = 'NA'
OR t.column_two = 'NA'
OR t.column_fee = 'NA'
The _old columns return the existing values in the columns; the _new columns return the value that would be assigned (by the UPDATE statement earlier in my answer.)
The results from that query will verify that IF() expressions will return a NULL when the existing value in the column is 'NA'; it will also confirm that the IF() expression will return the existing value when the existing value in the column is not 'NA'.
FOLLOWUP
With 20 different tables with 12 columns each, I'd make use of the information_schema.columns table in MySQL to help me generate the required expressions.
Something like this:
SELECT CONCAT(' , t.'
,c.column_name,' = IF(t.'
,c.column_name,'=''NA'',NULL,t.'
,c.column_name,')') AS expr
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase' -- the name of your database
AND c.table_name = 'mytable' -- the name of your table
AND c.data_type IN ('varchar','char') -- only character type columns
ORDER BY c.ordinal_position
Which will return something like this:
expr
-------------------------------------
, t.fee = IF(t.fee='NA',NULL,t.fee)
, t.fi = IF(t.fi='NA',NULL,t.fi)
, t.fo = IF(t.fo='NA',NULL,t.fo)
, t.fum = IF(t.fum='NA',NULL,t.fum)
So, this doesn't actually update the table; it's just a convenient way to avoid typing out a bunch of SQL expressions. You can copy that result set and use it to form a statement similar to the one I showed in my answer above. (Obviously, you would omit rows that you don't want to change, the first comma would need to be changed to the SET keyword, and the rest of the statement would need to be wrapped around this.)
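For example, pasted together with the column names from the sample output above, the assembled statement might look like this:
UPDATE mytable t
   SET t.fee = IF(t.fee='NA',NULL,t.fee)
     , t.fi  = IF(t.fi ='NA',NULL,t.fi)
     , t.fo  = IF(t.fo ='NA',NULL,t.fo)
     , t.fum = IF(t.fum='NA',NULL,t.fum);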
Personally, I wouldn't bother with a WHERE clause, because if it's a lot of columns, the query is going to do a full scan of the table anyway.
SELECT CONCAT(' OR t.',c.column_name,' = ''NA''') AS expr
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase' -- the name of your database
AND c.table_name = 'mytable' -- the name of your table
AND c.data_type IN ('varchar','char') -- only character type columns
ORDER BY c.ordinal_position
This will return something like:
expr
----------------------
OR t.fee = 'NA'
OR t.fi = 'NA'
OR t.fo = 'NA'
OR t.fum = 'NA'
CAUTION: be careful that you don't do comparison of numeric columns to 'NA', because MySQL will evaluate 'NA' as a numeric value of zero (0) in a numeric context.
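A quick way to see the hazard (the string 'NA' silently coerces to 0 in a numeric context):
SELECT 0 = 'NA'; -- returns 1 (true), along with a truncation warning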
I have a table of student scorecards.
Here is the table:
subject | mark1 | mark2 | mark3 | ... | markn
stud1   |   99  |   87  |   92  | ... |   46
stud2   |  ...
...
studn   |  ...
Now I need the total marks for each student. I got that by using sum(mark1+mark2+...+markn) group by stud. I want to know how to sum it without listing each column name, since that will be huge when there are up to 26 mark columns. Does anyone know how to do this?
SELECT student, (SUM(mark1)+SUM(mark2)+SUM(mark3)....+SUM(markn)) AS Total
FROM your_table
GROUP BY student
Another way of doing this is by generating the select query.
SELECT CONCAT('SELECT ', group_concat(`COLUMN_NAME` SEPARATOR '+'), ' FROM scorecard')
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA` = (select database())
AND `TABLE_NAME` = 'scorecard'
AND `COLUMN_NAME` LIKE 'mark%';
The query above will generate another query that will do the selecting for you.
Run the above query.
Get the result and run that resulting query.
Sample result:
SELECT mark1+mark2+mark3 FROM scorecard
You won't have to manually add all the columns anymore.
If any of your markn columns are nullable ("allow NULL"), then you will need to do a little extra to ensure the correct result is returned; this is because a single NULL value will result in a NULL total.
This is what I would consider to be the correct answer.
SUM(IFNULL(`mark1`, 0) + IFNULL(`mark2`, 0) + IFNULL(`mark3`, 0)) AS `total_marks`
IFNULL will return the 2nd parameter if the 1st is NULL.
COALESCE could be used, but I prefer to only use it when it is required.
See What is the difference between ifnull and coalesce in mysql?
SUM-ing the entire calculation is tidier than SUM-ing each column individually.
SELECT `student`, SUM(IFNULL(`mark1`, 0) + IFNULL(`mark2`, 0) + IFNULL(`mark3`, 0)) AS `total_marks`
FROM student_scorecard
GROUP BY `student`
I want to know how to sum it without listing each column name; it will be huge when there are up to 26 mark columns
To generate and execute this statement dynamically in SQL, you would need to use the INFORMATION_SCHEMA.COLUMNS table to create the query string, then execute it using a prepared statement saved in a variable.
SELECT CONCAT('SELECT `student`, SUM(IFNULL(`', group_concat(`COLUMN_NAME` SEPARATOR '`, 0) + IFNULL(`'), '`, 0)) AS `total_marks` FROM `student_scorecard` GROUP BY `student`')
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA` = (select DATABASE())
AND `TABLE_NAME` = 'student_scorecard'
AND `COLUMN_NAME` LIKE 'mark%'
# adapted from https://stackoverflow.com/a/22369767/2273611
# insert statement sql into a variable
INTO @statement_var;
#prepare the statement string
PREPARE stmt_name FROM @statement_var;
#execute the prepared statement/query
EXECUTE stmt_name;
# release statement
DEALLOCATE PREPARE stmt_name;
SELECT student, SUM(mark1+mark2+mark3+....+markn) AS Total FROM your_table GROUP BY student
The short answer is there's no great way to do this given the design you have. Here's a related question on the topic: Sum values of a single row?
If you normalized your schema and created a separate table called "Marks" which had a subject_id and a mark column this would allow you to take advantage of the SUM function as intended by a relational model.
Then your query would be
SELECT subject, SUM(mark) total
FROM Subjects s
INNER JOIN Marks m ON m.subject_id = s.id
GROUP BY s.id
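A minimal sketch of what that normalized Marks table might look like (the column names are assumptions):
CREATE TABLE Marks (
  id INT AUTO_INCREMENT PRIMARY KEY,
  subject_id INT NOT NULL, -- references Subjects.id
  mark INT NOT NULL,
  FOREIGN KEY (subject_id) REFERENCES Subjects (id)
);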
-- MySQL: sum of multiple columns
Hi, here is a simple way to do a sum of columns:
SELECT sum(IF(day_1 = 1,1,0)+IF(day_3 = 1,1,0)+IF(day_4 = 1,1,0)) from attendence WHERE class_period_id='1' and student_id='1'
You could change the database structure so that all subject rows become columns (like a spreadsheet). This makes such analysis much easier.
Is there a way to use LIKE and IN together?
I want to achieve something like this.
SELECT * FROM tablename WHERE column IN ('M510%', 'M615%', 'M515%', 'M612%');
So basically I want to be able to match the column with a bunch of different strings. Is there another way to do this with one query or will I have to loop over the array of strings I am looking for?
How about using a substring with IN?
select * from tablename where substring(column,1,4) IN ('M510','M615','M515','M612')
You can do it in one query by stringing together the individual LIKEs with ORs:
SELECT * FROM tablename
WHERE column LIKE 'M510%'
OR column LIKE 'M615%'
OR column LIKE 'M515%'
OR column LIKE 'M612%';
Just be aware that things like LIKE and per-row functions don't always scale that well. If your table is likely to grow large, you may want to consider adding another column to your table to store the first four characters of the field independently.
This duplicates data but you can guarantee it stays consistent by using insert and update triggers. Then put an index on that new column and your queries become:
SELECT * FROM tablename WHERE newcolumn IN ('M510','M615','M515','M612');
This moves the cost-of-calculation to the point where it's necessary (when the data changes), not every single time you read it. In fact, you could go even further and have your new column as a boolean indicating that it was one of the four special types (if that group of specials will change infrequently). Then the query would be even faster:
SELECT * FROM tablename WHERE is_special = 1;
This tradeoff of storage requirement for speed is a useful trick for larger databases - generally, disk space is cheap, CPU grunt is precious, and data is read far more often than written. By moving the cost-of-calculation to the write stage, you amortise the cost across all the reads.
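A rough sketch of that denormalization, using the placeholder column names from this answer (column is a reserved word, hence the backticks) and hypothetical trigger names:
ALTER TABLE tablename ADD COLUMN newcolumn CHAR(4);
CREATE INDEX idx_newcolumn ON tablename (newcolumn);
-- keep the prefix column in sync on insert and update
CREATE TRIGGER tablename_bi BEFORE INSERT ON tablename
FOR EACH ROW SET NEW.newcolumn = LEFT(NEW.`column`, 4);
CREATE TRIGGER tablename_bu BEFORE UPDATE ON tablename
FOR EACH ROW SET NEW.newcolumn = LEFT(NEW.`column`, 4);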
You'll need to use multiple LIKE terms, joined by OR.
Use the longer version of IN, which is a bunch of ORs.
SELECT * FROM tablename
WHERE column LIKE 'M510%'
OR column LIKE 'M615%'
OR column LIKE 'M515%'
OR column LIKE 'M612%';
SELECT * FROM tablename
WHERE column IN
(select column from tablename
where column like 'M510%'
or column like 'M615%'
OR column like 'M515%'
or column like'M612%'
)
substr([column name],
[desired starting position (numeric)],
[# characters to include (numeric)]) in ([complete as usual])
Example
substr([column name],1,4) in ('M510','M615', 'M515', 'M612')
I tried another way
Say the table has values
1 M510
2 M615
3 M515
4 M612
5 M510MM
6 M615NN
7 M515OO
8 M612PP
9 A
10 B
11 C
12 D
Here rows 1 to 8 are valid while the rest of them are invalid.
SELECT COL_VAL
FROM SO_LIKE_TABLE SLT
WHERE (SELECT DECODE(SUM(CASE
WHEN INSTR(SLT.COL_VAL, COLUMN_VALUE) > 0 THEN
1
ELSE
0
END),
0,
'FALSE',
'TRUE')
FROM TABLE(SYS.DBMS_DEBUG_VC2COLL('M510', 'M615', 'M515', 'M612'))) =
'TRUE'
What I have done is, using the INSTR function, check whether the value in the table matches any of the input values. If it does, INSTR returns its index, i.e. a value greater than ZERO. If the table's value does not match any of the input, it returns ZERO. These per-value results are summed up to indicate a successful match.
It seems to be working.
Hope it helps.
You can use a sub-query with wildcards:
SELECT 'Valid Expression'
WHERE 'Source Column' LIKE (SELECT '%Column' --FROM TABLE)
Or you can use a single string:
SELECT 'Valid Expression'
WHERE 'Source Column' LIKE ('%Source%' + '%Column%')
You can even try this
Function
CREATE FUNCTION [dbo].[fn_Split](@text varchar(8000), @delimiter varchar(20))
RETURNS @Strings TABLE
(
  position int IDENTITY PRIMARY KEY,
  value varchar(8000)
)
AS
BEGIN
  DECLARE @index int
  SET @index = -1
  WHILE (LEN(@text) > 0)
  BEGIN
    SET @index = CHARINDEX(@delimiter, @text)
    IF (@index = 0) AND (LEN(@text) > 0)
    BEGIN
      INSERT INTO @Strings VALUES (@text)
      BREAK
    END
    IF (@index > 1)
    BEGIN
      INSERT INTO @Strings VALUES (LEFT(@text, @index - 1))
      SET @text = RIGHT(@text, (LEN(@text) - @index))
    END
    ELSE
      SET @text = RIGHT(@text, (LEN(@text) - @index))
  END
  RETURN
END
Query
select * from my_table inner join (select value from dbo.fn_Split('M510,M615,M515,M612', ','))
as split_table on my_table.column_name like '%'+split_table.value+'%';
For a perfectly dynamic solution, this is achievable by combining a cursor and a temp table. With this solution you do not need to know the starting position nor the length, and it is expandable without having to add any OR's to your SQL query.
For this example, let's say you want to select the ID, Details & creation date from a table where a certain list of text is inside 'Details'.
First create a table FilterTable with the search strings in a column called Search.
As the original poster requested:
insert into [DATABASE].dbo.FilterTable
select 'M510' union
select 'M615' union
select 'M515' union
select 'M612'
Then you can filter your data as following:
DECLARE @DATA NVARCHAR(MAX)
CREATE TABLE #Result (ID uniqueIdentifier, Details nvarchar(MAX), Created datetime)
DECLARE DataCursor CURSOR local forward_only FOR
SELECT '%' + Search + '%'
FROM [DATABASE].dbo.FilterTable
OPEN DataCursor
FETCH NEXT FROM DataCursor INTO @DATA
WHILE @@FETCH_STATUS = 0
BEGIN
insert into #Result
select ID, Details, Created
from [DATABASE].dbo.Table (nolock)
where Details like @DATA
FETCH NEXT FROM DataCursor INTO @DATA
END
CLOSE DataCursor
DEALLOCATE DataCursor
select * from #Result
drop table #Result
Hope this helped
select *
from tablename
where regexp_like (column, '^M510|M615|^M515|^M612')
Note: this works even if, say, we want the code M615 to match when it occurs in the middle of the column. The other codes will match only if the column starts with them.
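If all four codes should only match at the start of the column, a grouped anchor is a tidier way to write it (assuming a regex-capable dialect such as Oracle or MySQL 8+):
select *
from tablename
where regexp_like (column, '^(M510|M615|M515|M612)')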