Unexpected result in MyISAM when grouping by bit and selecting distinct values - mysql

We have a MyISAM table with a single column bit and two rows, containing 0 and 1. We group by this column and select it together with a count. The result below is as expected.
select count( bit), bit from tab GROUP BY bit;
| count(bit) | bit |
|------------|-----|
| 1 | 0 |
| 1 | 1 |
But when using the distinct keyword, the output value of the column is always 1. Why?
select count(distinct bit), bit from tab GROUP BY bit;
| count(bit) | bit |
|------------|-----|
| 1 | 1 | # WHYYY
| 1 | 1 |
I've been crawling the documentation and the internet but with no luck.
Here is the setup:
CREATE TABLE `tab` (
`bit` bit(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8; # When using InnoDB everything's fine
INSERT INTO `tab` (`bit`) VALUES
(CONV('1', 2, 10) + 0),
(CONV('0', 2, 10) + 0);
PS: One more thing. I've been doing several experiments. Using group_concat, the column bit becomes independent again.
select count(distinct bit), group_concat(bit) from tab GROUP BY bit;
| count(bit) | bit |
|------------|------------|
| 1 | 1 byte (0) |
| 1 | 1 byte (1) |

Thanks to the comments, I am now convinced not to use bit columns at all. A more reliable alternative is tinyint(1).
Inspired by how the Adminer application handles bit columns, I recommend using the BIN function to cast the bit to the expected value whenever selecting it:
select count(distinct bit), BIN(bit) from tab GROUP BY bit;

Related

Find sequences of numbers, in the fastest way on mysql

I have one billion lines. Each line is a sequence of numbers:
32098;1278;23902;8469
42710;17864;32230
230984;812918;420322;182972
339028;232329;2190120;23302;182972
232329;17864;32230;23302;182972
How should I store this data and search it so that finding any subsequence takes minimal time?
Example: searching for sequence "17864;32230" outputs:
42710;17864;32230
232329;17864;32230;23302;182972
What I have tried:
storing lines in varchar (ascii), and searching: like "%17864;32230%" => very slow...
storing lines in varchar (ascii), with a fulltext index and searching: against(' "17864;32230" ' in boolean mode) => faster...
storing lines in varchar (ascii), with a fulltext index and searching: against(' +17864 +32230' in boolean mode) and line like "%17864;32230%" => fastest I found...
Is there any faster method?
"searching for sequence "17864;32230" outputs": would the following two values also be selected: "17864;123456;32230", "123456;32230;17864"? – Akina
@Akina, "17864;123456;32230" and "123456;32230;17864" must not be output, because they do not contain the sequence "17864;32230" – JoJo
I.e. your sequence is position-dependent... well. Is the sequence to be found always 2-valued, or may its length (in elements) vary? – Akina
@Akina, the sequence to be found is always 2-valued. You are right :) – JoJo
Does each separate value in the "array" have some upper limit? Not more than 6 digits, for example... – Akina
@Akina, you are right, in my specific case, numbers in a sequence are limited to 8 digits – JoJo
Look for this solution:
CREATE TABLE sourcetable ( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
dataarray TEXT );
INSERT INTO sourcetable (dataarray) VALUES
('32098;1278;23902;8469'),
('42710;17864;32230'),
('230984;812918;420322;182972'),
('339028;232329;2190120;23302;182972'),
('232329;17864;32230;23302;182972');
-- create indexing table
CREATE TABLE indexingtable ( id BIGINT UNSIGNED NOT NULL,
sequence BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (sequence, id) );
-- and fill it
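INSERT IGNORE INTO indexingtable
-- assume not more than 6 elements per "array"
WITH cte AS ( SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 )
SELECT id, CONCAT(LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX(dataarray, ';', num), ';', -1), 9, '0'),
LPAD(SUBSTRING_INDEX(SUBSTRING_INDEX(dataarray, ';', num+1), ';', -1), 9, '0'))
FROM sourcetable, cte
-- keep only positions that have a following element; otherwise SUBSTRING_INDEX
-- repeats the last element and indexes a pair that is not in the data
WHERE num <= CHAR_LENGTH(dataarray) - CHAR_LENGTH(REPLACE(dataarray, ';', ''));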
-- search for "17864;32230"
SET @criteria := 17864000032230;
-- perform searching
SELECT sourcetable.*
FROM sourcetable
JOIN indexingtable USING (id)
WHERE sequence = @criteria;
id | dataarray
-: | :------------------------------
2 | 42710;17864;32230
5 | 232329;17864;32230;23302;182972
EXPLAIN
SELECT sourcetable.*
FROM sourcetable
JOIN indexingtable USING (id)
WHERE sequence = @criteria;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
-: | :---------- | :------------ | :--------- | :----- | :------------ | :------ | :------ | :------------------------------------------- | ---: | -------: | :-----------------------
1 | SIMPLE | indexingtable | null | ref | PRIMARY | PRIMARY | 8 | const | 2 | 100.00 | Using where; Using index
1 | SIMPLE | sourcetable | null | eq_ref | PRIMARY | PRIMARY | 8 | fiddle_KJQBRBTPCZAIOJRJHGJJ.indexingtable.id | 1 | 100.00 | null
Building the indexingtable with this query will be an extremely long and expensive process on a billion source records. I'd recommend exporting the source data to text (SELECT .. INTO OUTFILE), converting it with any scripting/programming language, then importing the result into the indexingtable. That will also take a long time, but it will be much faster than the query.
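As a rough sketch of the "convert" step, assuming the export is a tab-separated file of id and dataarray: the function names, the file layout, and the choice of Python are assumptions for illustration, not part of the answer. The key is built the same way as in the query, as the two values zero-padded to 9 digits and concatenated, which is the same as first * 10^9 + second.

```python
import csv
import io

KEY_WIDTH = 9  # same padding width as LPAD(..., 9, '0') in the indexing query


def pair_keys(dataarray: str) -> list[int]:
    """Return one numeric key per adjacent pair of values in a ';'-separated line."""
    values = [int(v) for v in dataarray.split(";") if v]
    # zip(values, values[1:]) yields only real neighbours, so a line with a
    # single value produces no keys at all.
    return [a * 10**KEY_WIDTH + b for a, b in zip(values, values[1:])]


def convert(src, dst) -> None:
    """Read 'id<TAB>dataarray' lines from src and write 'id<TAB>sequence' lines to dst."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for line in src:
        row_id, dataarray = line.rstrip("\n").split("\t", 1)
        # set() drops repeated pairs within one line, mirroring INSERT IGNORE
        # on the (sequence, id) primary key.
        for key in sorted(set(pair_keys(dataarray))):
            writer.writerow([row_id, key])


if __name__ == "__main__":
    demo_in = io.StringIO("2\t42710;17864;32230\n")
    demo_out = io.StringIO()
    convert(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

The resulting file could then be bulk-loaded into indexingtable, for example with LOAD DATA INFILE.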
If you want to rebuild the data, you could do the following.
Structure the data vertically -- you'll have billions of rows:
id n val
1 1 32098
1 2 1278
1 3 23902
1 4 8469
2 1 42710
2 2 17864
2 3 32230
With another table:
id sequence
1 32098;1278;23902;8469
2 42710;17864;32230
Then you can try:
select ta.id
from table1 ta join
table1 tb
on tb.id = ta.id and
tb.n = ta.n + 1 and
tb.val = 32230
where ta.val = 17864
For this you want indexes on (id, n, val) and (val, id, n).
I would expect this to be fairly competitive with the full-text search method. I'm actually surprised that option (3) is faster than option (2).
The advantage is that it might give you more flexibility on the types of sequences that you are looking for.
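To try the vertical layout on the five sample lines, here is a small sketch that uses Python's built-in sqlite3 module purely as a stand-in so it runs without a MySQL server. The index names and helper functions are made up for the demo, and DISTINCT is added so a line that contains the pair twice is reported once.

```python
import sqlite3

SAMPLE_LINES = [
    "32098;1278;23902;8469",
    "42710;17864;32230",
    "230984;812918;420322;182972",
    "339028;232329;2190120;23302;182972",
    "232329;17864;32230;23302;182972",
]


def build_demo() -> sqlite3.Connection:
    """Create table1 (id, n, val) in memory and fill it from SAMPLE_LINES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, n INTEGER, val INTEGER)")
    # The two indexes suggested in the answer (names are hypothetical).
    conn.execute("CREATE INDEX idx_id_n_val ON table1 (id, n, val)")
    conn.execute("CREATE INDEX idx_val_id_n ON table1 (val, id, n)")
    for row_id, line in enumerate(SAMPLE_LINES, start=1):
        for n, val in enumerate(line.split(";"), start=1):
            conn.execute("INSERT INTO table1 VALUES (?, ?, ?)", (row_id, n, int(val)))
    return conn


def find_ids(conn: sqlite3.Connection, first: int, second: int) -> list[int]:
    """Return ids of lines where `second` directly follows `first`."""
    rows = conn.execute(
        """
        SELECT DISTINCT ta.id
        FROM table1 ta
        JOIN table1 tb
          ON tb.id = ta.id AND tb.n = ta.n + 1 AND tb.val = ?
        WHERE ta.val = ?
        ORDER BY ta.id
        """,
        (second, first),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(find_ids(build_demo(), 17864, 32230))
```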

How to count numbers or occurrences in all columns and list them using MySQL?

I have a table looking like:
| A | B | C | ... | Z | <- names of columns
-----------------------
| 1 | 0 | 1 | ... | 1 |
| 0 | 1 | 1 | ... | 1 |
| 1 | 1 | 1 | ... | 1 |
| 0 | 1 | 1 | ... | 0 |
| 1 | 0 | 1 | ... | 1 |
And I would like to sum up all 1s in each column and list the totals. How can I do that using MySQL? There are about 80 columns, so if possible I would like not to list them in the SQL call.
I would like to get a response similar to this one:
A: 3
B: 3
C: 5
...
Z: 4
This table has been designed in a way that makes the query you describe more difficult.
Using many columns for data values that should be compared or counted together because they're the same type of value is called repeating groups. It's a violation of database normalization rules.
The more traditional way to store this data would be over 80 rows, not 80 columns.
CREATE TABLE mytable (
id INT AUTO_INCREMENT PRIMARY KEY,
label CHAR(1) NOT NULL,
value TINYINT NOT NULL
);
INSERT INTO mytable (label, value) VALUES
('A', 1), ('B', 0), ('C', 1), ...
Then you could use a simple query with an aggregate function like this:
SELECT label, SUM(value)
FROM mytable
GROUP BY label;
There are times when it's worth using a denormalized table design (like your current table), but that time is when you want to optimize for a particular query. Be careful about using denormalized designs, because they optimize for one query at the expense of all other queries you might run against the same data. The query you want to make is one of those that is made more difficult by using the denormalized design you currently have.
There is no easy way; you will need to explicitly list the columns. A UNION query should be what you need, like:
SELECT 'A' column_name, SUM(A) cnt FROM mytable
UNION ALL SELECT 'B', SUM(B) FROM mytable
UNION ALL SELECT 'C', SUM(C) FROM mytable
...
NB: it should be possible to generate the query programmatically using any text manipulation tool (Excel, perl, ...), or dynamically using a prepared statement.
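For instance, a few lines of Python could generate the UNION ALL statement from a list of column names. This is only a sketch: the function name is made up, and the name check is a simplifying assumption so that names can be dropped into the SQL text safely.

```python
import re

_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_sum_query(table: str, columns: list[str]) -> str:
    """Build one 'SELECT name, SUM(col)' per column, glued with UNION ALL."""
    for name in [table, *columns]:
        # Identifiers are interpolated into the SQL text, so only accept
        # plain names here (an assumption of this sketch).
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    parts = [
        f"SELECT '{col}' AS column_name, SUM(`{col}`) AS cnt FROM `{table}`"
        for col in columns
    ]
    return "\nUNION ALL ".join(parts)


if __name__ == "__main__":
    print(build_sum_query("mytable", ["A", "B", "C"]))
```

The column list itself could be read from INFORMATION_SCHEMA.COLUMNS rather than typed by hand.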

What is this query supposed to do? (and why does it fail?)

I'm tasked with reviving this old piece of legacy software.
It used to run on an old server (2012) which has died the ugly way (hard disk failure).
Before this server died, the code worked without problems.
I've rebuilt the MySQL database and data from backups.
However, one query does not work and fails with the error: Query preparation failed: Unknown column '_operationId' in 'where clause'. The query in question is:
SELECT
@r AS _operationId
, @r := (
SELECT
operationId
FROM operations
WHERE operationId = _operationId
) AS includesOperationId
FROM (SELECT @r := %i) AS tmp
INNER JOIN operations
WHERE @r > 0 AND @r IS NOT NULL
From what I understand, the query tries to join back onto itself building a tree of some sort??
For some reason, this query must have worked on some previous version of MySQL (5.0??) but with the current version (MySQL 5.7) the query fails.
Is there any 'mysql whisperer' out there who can explain to me:
what the query attempts to do?
why it worked on some previous version but not anymore?
how to change the query to make it work again?
thanks a million in advance.
Update:
The operations table definition and data:
+-------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+----------------+
| operationId | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| operation | varchar(40) | NO | UNI | NULL | |
| description | text | YES | | NULL | |
+-------------+---------------------+------+-----+---------+----------------+
+-------------+-----------+-------------+
| operationId | operation | description |
+-------------+-----------+-------------+
| 1 | add | NULL |
| 2 | delete | NULL |
| 3 | edit | NULL |
| 4 | view | NULL |
| 5 | disable | NULL |
| 6 | execute | NULL |
+-------------+-----------+-------------+
The query is attempting to do some sort of tree traversal. I don't know that it would work in any version of MySQL, but my best guess is that the intention is something like this:
SELECT @r AS _operationId,
@r := (SELECT operationId
FROM operations
WHERE operationId = @r
) AS includesOperationId
FROM operations CROSS JOIN
(SELECT @r := %i) params
WHERE @r > 0 AND @r IS NOT NULL;
Having said that, if this happens to work, there is no guarantee that it will work again or in another version of MySQL. This violates two rules of using variables:
A variable assigned in one expression in a SELECT should not be used in another. The order of evaluation of expressions is not defined, so the expressions could be evaluated in any order.
There is no guarantee on when the conditions in the WHERE clause using variables are evaluated, and definitely no guarantee of any sort of "sequential" evaluation with respect to the SELECT.
The subquery is also problematic.
The good news is that if operations has no column called _operationId, then the query should fail on all versions of MySQL with an undefined column type of error (although perhaps older versions did something funky).
The bad news is that if you want to walk through a hierarchy in MySQL, you either need to change the data structure or use a stored procedure.

MySQL - search for formatted combinations of column values

I have a database where a table has, among others, two columns: group and article. Both are int but are always formatted before being shown to the user:
The group is always shown as 2 digits, with leading zeros if needed.
The article is always shown as 4 digits, with leading zeros if needed.
The group and article are separated by a dash (-).
Example:
An item with group = 1 and article = 23 will be shown to the user as 01-0023.
Moving on.
I have a php-script in which a user can search for an article. The user will of course write the article in the formatted way. The script I have today uses regular expressions to separate a search for an article from free text and then isolates the group and article from each other before searching the database.
My question is, is it possible to pass the formatted string (e.g. 01-0023) in the query instead and if so, how would I manipulate the SQL?
If you want a comparison to a value shown to the user, you can do:
where `group` = substring_index(@Group_Article, '-', 1) + 0 AND
`article` = substring_index(@Group_Article, '-', -1) + 0
SqlFiddleDemo
SELECT CAST(LEFT('01-0023', 2) AS UNSIGNED) AS `Group`
,CAST(RIGHT('01-0023',4) AS UNSIGNED) AS `Article`
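If the split is kept in application code instead, it can stay quite small. The following is a sketch in Python (the question uses PHP, so treat the language and the function names as assumptions); it accepts only the exact NN-NNNN shape and returns the two integers that would be bound to the `group` and `article` parameters.

```python
import re

# Exactly two digits, a dash, exactly four digits, e.g. "01-0023".
_CODE_RE = re.compile(r"(\d{2})-(\d{4})")


def parse_code(text: str) -> tuple[int, int]:
    """Split a formatted code like '01-0023' into (group, article) integers."""
    match = _CODE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a group-article code: {text!r}")
    return int(match.group(1)), int(match.group(2))


def format_code(group: int, article: int) -> str:
    """Inverse of parse_code: zero-pad to 2 and 4 digits and join with a dash."""
    return f"{group:02d}-{article:04d}"


if __name__ == "__main__":
    print(parse_code("01-0023"))
    print(format_code(1, 23))
```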
Alter your table like this:
ALTER TABLE `my_table`
CHANGE COLUMN `group` `group` INT(2) ZEROFILL,
CHANGE COLUMN `article` `article` INT(4) ZEROFILL;
Create a view
CREATE VIEW `v_my_table` AS
SELECT id,article,`group`,CONCAT(`group`,"-",`article`) AS conbined FROM my_table;
Use your view for select statements
mysql> select * from v_my_table;
+----+---------+-------+----------+
| id | article | group | conbined |
+----+---------+-------+----------+
| 1 | 0005 | 03 | 03-0005 |
| 2 | 0005 | 03 | 03-0005 |
| 3 | 0021 | 12 | 12-0021 |
| 4 | 0212 | 55 | 55-0212 |
| 5 | 2113 | 04 | 04-2113 |
+----+---------+-------+----------+

SQL to add a summary row to MySQL result set

If I have a MySQL table such as:
I want to use SQL to calculate the sum of the PositiveResult column and also the NegativeResult column. Normally I could simply do SUM(PositiveResult) in a query.
But what if I wanted to go a step further and place the totals in a row at the bottom of the result set:
Can this be achieved at the data level or is it a presentation layer issue? If it can be done by SQL, how might I do this? I am a bit of an SQL newbie.
Thanks to the respondents. I will now check things with the customer.
Also, can a text column be added so that the value of the last row of data is not shown in the summary row? Like this:
I would also do this in the presentation layer, but you can do it in MySQL...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,pos DECIMAL(5,2)
,neg DECIMAL(5,2)
);
INSERT INTO my_table VALUES
(1,0,0),
(2,1,-2.5),
(3,1.6,-1),
(4,1,-2);
SELECT COALESCE(id,'total') my_id,SUM(pos),SUM(neg) FROM my_table GROUP BY id WITH ROLLUP;
+-------+----------+----------+
| my_id | SUM(pos) | SUM(neg) |
+-------+----------+----------+
| 1 | 0.00 | 0.00 |
| 2 | 1.00 | -2.50 |
| 3 | 1.60 | -1.00 |
| 4 | 1.00 | -2.00 |
| total| 3.60 | -5.50 |
+-------+----------+----------+
5 rows in set (0.02 sec)
Here's a hack for the amended problem - it ain't pretty but I think it works...
SELECT COALESCE(id,'') my_id
, SUM(pos)
, SUM(neg)
, COALESCE(string,'') n
FROM my_table
GROUP
BY id
, string
WITH ROLLUP
HAVING n <> '' OR my_id = ''
;
select keyword,sum(positiveResults)+sum(NegativeResults)
from mytable
group by
Keyword
If you need the absolute value, use sum(abs(NegativeResults)).
This should be handled at least one layer above the SQL query layer.
The initial query can fetch the detail info and then the application layer can calculate the aggregation (summary row). Or, a second db call to fetch the summary directly can be used (although this would be efficient only for cases where the calculation of the summary is very resource-intensive and a second db call is really necessary - most of the time the app layer can do it more efficiently).
The ordering/layout of the results (i.e. the detail rows followed by the "footer" summary row) should be handled at the presentation layer.
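A minimal sketch of the application-layer variant, assuming the detail rows arrive as (id, positive, negative) tuples with Decimal values. The function name and the "total" label are illustrative only.

```python
from decimal import Decimal


def with_summary(rows):
    """Return the detail rows followed by one ('total', sum_pos, sum_neg) row."""
    rows = list(rows)
    # Start each sum from Decimal("0") so an empty result still yields Decimals.
    total_pos = sum((pos for _, pos, _ in rows), Decimal("0"))
    total_neg = sum((neg for _, _, neg in rows), Decimal("0"))
    return rows + [("total", total_pos, total_neg)]


if __name__ == "__main__":
    detail = [
        (1, Decimal("0"), Decimal("0")),
        (2, Decimal("1"), Decimal("-2.5")),
        (3, Decimal("1.6"), Decimal("-1")),
        (4, Decimal("1"), Decimal("-2")),
    ]
    for row in with_summary(detail):
        print(row)
```

Keeping the total as a separate trailing row lets the presentation layer decide how to style or place the footer.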
I'd recommend doing this at the presentation layer. To do something like this in SQL is also possible.
create table test (
keywordid int,
positiveresult decimal(10,2),
negativeresult decimal(10,2)
);
insert into test values
(1, 0, 0), (2, 1, -2.5), (3, 1.6, -1), (4, 1, -2);
select * from (
select keywordid, positiveresult, negativeresult
from test
union all
select null, sum(positiveresult), sum(negativeresult) from test
) main
order by
case when keywordid is null then 1000000 else keywordid end;
I added ordering using an arbitrarily high number when keywordid is null, so that the summary row sorts last and the ordered record set can be used directly by the view for display.
Result:
+-----------+----------------+----------------+
| keywordid | positiveresult | negativeresult |
+-----------+----------------+----------------+
| 1 | 0.00 | 0.00 |
| 2 | 1.00 | -2.50 |
| 3 | 1.60 | -1.00 |
| 4 | 1.00 | -2.00 |
| NULL | 3.60 | -5.50 |
+-----------+----------------+----------------+