UNION in MySQL 5.7.2 - mysql

I'm using MySQL 5.7.
I am getting bad results from a UNION of COUNT(*).
SELECT
COUNT(*) AS Piezas
, '' AS Motor
from parque
where `parque`.`CausasParalizacion` = 2
UNION
SELECT
'' AS Piezas
, COUNT(*) AS Motor
from parque
where `parque`.`CausasParalizacion` = 3
The result should be 30 and 12, and I am getting 3330 and 3132.
Can anyone help?

I don't think MySQL is returning a "bad" result. The results returned by MySQL are per the specification.
Given no GROUP BY, each of the SELECT statements will return one row. We can verify this by running each SELECT statement separately. We'd expect the UNION result of the two SELECTs to be something like:
Piezas  Motor
------  -----
mmm
        ppp
You say the results should be '30' and '12'
My guess is that MySQL is returning the characters '30' and '12'.
But we should be very suspicious, and note which ASCII characters these hex byte values encode:
x'30' -> '0'
x'31' -> '1'
x'32' -> '2'
x'33' -> '3'
As a demonstration
SELECT HEX('30'), HEX('12')
returns
HEX('30') HEX('12')
--------- ---------
3330 3132
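To make the byte-level explanation concrete, here is a small Python sketch (an illustration only, not what any particular client does internally) that renders the ASCII bytes of the strings '30' and '12' as uppercase hex, mirroring MySQL's HEX(). The helper names as_hex and from_hex are hypothetical.

```python
def as_hex(text: str) -> str:
    """Uppercase hex of the ASCII bytes of text, similar to MySQL's HEX('...')."""
    return text.encode("ascii").hex().upper()


def from_hex(digits: str) -> str:
    """Decode hex digits back into ASCII text (x'33' -> '3', x'30' -> '0')."""
    return bytes.fromhex(digits).decode("ascii")


if __name__ == "__main__":
    for value in ("30", "12"):
        # A client that shows raw bytes as hex would display these strings this way.
        print(f"{value!r} -> {as_hex(value)}")
```

Running it prints 3330 for '30' and 3132 for '12', the same values the client showed, which is consistent with the client displaying the raw bytes of strings rather than numbers.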
I don't think MySQL is returning "bad" results. I suspect that the column metadata is confusing the client. (Note that each column is a mix of two different datatypes being UNION'd: on one row the datatype is string/varchar (an empty string), and on the other row it is integer/numeric (the result of the COUNT() aggregate).)
And I'm not sure what the resultset metadata for the columns ends up as.
I suspect the issue is in how the client interprets the resultset metadata to determine the datatype of the columns, and that the client is deciding the most appropriate way to display the values is as a hex representation of the raw bytes.
Personally, I would avoid returning a UNION result of different/incompatible datatypes. I'd prefer the datatypes be consistent.
If I had to do the UNION of incompatible datatypes, I would include an explicit conversion into compatible/appropriate datatypes.
But once I am at that point, I have to question why I need any of that rigmarole with the mismatched datatypes, and why we need to return two separate rows when we could just return a single row (probably more efficiently, to boot):
SELECT SUM( p.`CausasParalizacion` = 2 ) AS Piezas
, SUM( p.`CausasParalizacion` = 3 ) AS Motor
FROM parque p
WHERE p.`CausasParalizacion` IN (2,3)
To avoid the aggregate functions returning NULL (SUM() returns NULL when no rows match),
we can wrap the aggregate expressions in an IFNULL (or ANSI-standard COALESCE) function:
SELECT IFNULL(SUM( p.`CausasParalizacion` = 2 ),0) AS Piezas
, IFNULL(SUM( p.`CausasParalizacion` = 3 ),0) AS Motor
FROM parque p
WHERE p.`CausasParalizacion` IN (2,3)
-or-
we could use a COUNT() of an expression that is either NULL or non-NULL (COUNT() counts only the non-NULL values):
SELECT COUNT(IF( p.`CausasParalizacion` = 2 ,1,NULL)) AS Piezas
, COUNT(IF( p.`CausasParalizacion` = 3 ,1,NULL)) AS Motor
FROM parque p
WHERE p.`CausasParalizacion` IN (2,3)
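The following sketch runs both single-row forms against a throwaway in-memory SQLite database from Python's standard sqlite3 module, used here as a stand-in for MySQL. SQLite, like MySQL, evaluates a comparison to 1 or 0, so SUM(condition) counts matches; because SQLite may not provide MySQL's IF(), the COUNT() form uses the standard CASE expression instead. The function name and the test data are hypothetical.

```python
import sqlite3


def count_by_cause(causes):
    """Return the (Piezas, Motor) counts from both conditional-aggregation forms."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE parque (CausasParalizacion INTEGER)")
    con.executemany("INSERT INTO parque VALUES (?)", [(c,) for c in causes])
    # Comparison yields 1/0, so SUM counts matches; IFNULL covers the no-row case.
    sum_form = con.execute(
        """SELECT IFNULL(SUM(CausasParalizacion = 2), 0) AS Piezas,
                  IFNULL(SUM(CausasParalizacion = 3), 0) AS Motor
           FROM parque
           WHERE CausasParalizacion IN (2, 3)"""
    ).fetchone()
    # COUNT ignores NULLs; CASE without ELSE yields NULL for non-matching rows.
    count_form = con.execute(
        """SELECT COUNT(CASE WHEN CausasParalizacion = 2 THEN 1 END) AS Piezas,
                  COUNT(CASE WHEN CausasParalizacion = 3 THEN 1 END) AS Motor
           FROM parque
           WHERE CausasParalizacion IN (2, 3)"""
    ).fetchone()
    con.close()
    return sum_form, count_form
```

With 30 rows of cause 2 and 12 rows of cause 3, both forms return (30, 12); with no matching rows, the IFNULL and COUNT() forms both return (0, 0).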
If, for some reason, it turns out to be faster to run two separate SELECT statements, we could still combine the results into a single row. For example:
SELECT s.Piezas
, t.Motor
FROM ( SELECT COUNT(*) AS Piezas
FROM parque p
WHERE p.`CausasParalizacion` = 2
) s
CROSS
JOIN ( SELECT COUNT(*) AS Motor
FROM parque q
WHERE q.`CausasParalizacion` = 3
) t
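A minimal sketch of the CROSS JOIN approach, using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL (the function name and data are hypothetical). Each derived table returns exactly one row, even when nothing matches, so the cross join always yields a single row holding both counts.

```python
import sqlite3


def cross_join_counts(causes):
    """Combine two separate COUNT(*) queries into one row via CROSS JOIN."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE parque (CausasParalizacion INTEGER)")
    con.executemany("INSERT INTO parque VALUES (?)", [(c,) for c in causes])
    row = con.execute(
        """
        SELECT s.Piezas, t.Motor
        FROM (SELECT COUNT(*) AS Piezas FROM parque p
              WHERE p.CausasParalizacion = 2) s
        CROSS JOIN (SELECT COUNT(*) AS Motor FROM parque q
                    WHERE q.CausasParalizacion = 3) t
        """
    ).fetchone()
    con.close()
    return row
```

Because COUNT(*) without GROUP BY returns 0 rather than no row, an empty table still produces (0, 0) here instead of an empty result.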

Spencer, I think the problem was about encoding. For example, when I execute the query in the console, the result is as expected, but not in phpMyAdmin.
However, I must say that your first solution works perfectly. Thanks a lot, bro.

Related

mysql - query to extract report from book register

I have the query below in MySQL. When I run it, it gives me the complete report, as if the WHERE clause does not work.
SELECT oo.dateaccessioned AS 'Date',
oo.barcode AS 'Acc. No.',
ooo.title AS 'Title',
ooo.author AS 'Author/Editor',
concat_ws(' , ', o.editionstatement, oo.enumchron) AS 'Ed./Vol.',
concat_ws(' ', o.place, o.publishercode) AS 'Place & Publisher',
ooo.copyrightdate AS 'Year', o.pages AS 'Page(s)',
ooooooo.name AS 'Source',
oo.itemcallnumber AS 'Class No./Book No.',
concat_ws(', ₹', concat(' ', ooooo.symbol, oooo.listprice), oooo.rrp_tax_included) AS 'Cost',
concat_ws(' , ', oooooo.invoicenumber, oooooo.shipmentdate) AS 'Bill No. & Date',
'' AS 'Withdrawn Date',
'' AS 'Remarks'
FROM biblioitems o
LEFT JOIN items oo ON oo.biblioitemnumber=o.biblioitemnumber
LEFT JOIN biblio ooo ON ooo.biblionumber=o.biblionumber
LEFT JOIN aqorders oooo ON oooo.biblionumber=o.biblionumber
LEFT JOIN currency ooooo ON ooooo.currency=oooo.currency
LEFT JOIN aqinvoices oooooo ON oooooo.booksellerid=oo.booksellerid
LEFT JOIN aqbooksellers ooooooo ON ooooooo.id=oo.booksellerid
WHERE cast(oo.barcode AS UNSIGNED) BETWEEN <<Accession Number>> AND <<To Accession Number>>
GROUP BY oo.barcode
ORDER BY oo.barcode ASC
Can you please help me generate a report based on the above query? oo.barcode is a VARCHAR. I am a library team member rather than a database administrator. My oo.barcode values begin with HYD followed by numerics. I know that if oo.barcode were a numbers-only field, the above query would work without any issue.
I searched for how CAST works but was not able to understand it, as I am not into database administration.
If the barcode column is VARCHAR and begins with "HYD", CAST AS UNSIGNED will cause a value of HYD123 to result in 0.
The non-numeric characters of the string would need to be removed prior to casting the value as an integer.
This can be achieved by trimming the leading text "HYD" from the barcode.
CAST(TRIM(LEADING 'HYD' FROM barcode) AS UNSIGNED)
Otherwise, if the prefix is always 3 characters, the substring position of barcode can be used.
CAST(SUBSTR(barcode, 4) AS UNSIGNED)
If any other non-numeric characters are contained within the string, such as HYD-123-456-789, HYD123-456-789PT, HYD123-456.789, etc., they will also need to be removed, as the type conversion will treat them in unexpected ways (for example, conversion stops at the first character that cannot be part of a number, so 123-456-789 becomes 123).
In addition, any leading 0s of the numeric string will be dropped from the resulting integer, so 0123 becomes 123.
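To see why CAST(barcode AS UNSIGNED) returns 0 and how the two fixes behave, here is a simplified Python model of the conversion. It is an approximation, not MySQL's actual code: it only reads a leading run of ASCII digits after optional leading spaces, and ignores signs, decimals, exponents and overflow. The helpers cast_unsigned, trim_leading and substr_from are hypothetical names modelling CAST ... AS UNSIGNED, TRIM(LEADING ... FROM ...) and SUBSTR(..., pos).

```python
import re

# Optional leading spaces, then the leading run of ASCII digits (possibly empty).
_LEADING_DIGITS = re.compile(r"\s*([0-9]*)")


def cast_unsigned(text: str) -> int:
    """Simplified model of CAST(text AS UNSIGNED): 0 if there are no leading digits."""
    digits = _LEADING_DIGITS.match(text).group(1)
    return int(digits) if digits else 0


def trim_leading(prefix: str, text: str) -> str:
    """Model of TRIM(LEADING prefix FROM text); repeated prefixes are all removed."""
    while prefix and text.startswith(prefix):
        text = text[len(prefix):]
    return text


def substr_from(text: str, pos: int) -> str:
    """Model of SUBSTR(text, pos) for pos >= 1; MySQL positions are 1-based."""
    return text[pos - 1:]


if __name__ == "__main__":
    for barcode in ("HYD123", "HYD0123", "HYD4231", "HYD123-456-789"):
        print(
            barcode,
            cast_unsigned(barcode),                       # raw cast
            cast_unsigned(trim_leading("HYD", barcode)),  # TRIM(LEADING 'HYD' ...)
            cast_unsigned(substr_from(barcode, 4)),       # SUBSTR(barcode, 4)
        )
```

In this model the raw cast gives 0 for every barcode, both fixes give 123, 123 and 4231 for the sample values, and HYD123-456-789 silently becomes 123, which is the kind of surprise the paragraph above warns about.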
For more details on how CAST functions see: 12.3 Type Conversion in Expression Evaluation
Examples db<>fiddle
CREATE TABLE tester (
barcode varchar(255)
);
INSERT INTO tester(barcode)
VALUES ('HYD123'), ('HYD0123'), ('HYD4231');
Results
SELECT cast(barcode AS UNSIGNED)
FROM tester;
cast(barcode AS UNSIGNED)
0
0
0
SELECT CAST(TRIM(LEADING 'HYD' FROM barcode) AS UNSIGNED)
FROM tester;
CAST(TRIM(LEADING 'HYD' FROM barcode) AS UNSIGNED)
123
123
4231
SELECT barcode
FROM tester
WHERE CAST(TRIM(LEADING 'HYD' FROM barcode) AS UNSIGNED) BETWEEN 120 AND 4232;
barcode
HYD123
HYD0123
HYD4231
SELECT CAST(SUBSTR(barcode, 4) AS UNSIGNED)
FROM tester;
CAST(SUBSTR(barcode, 4) AS UNSIGNED)
123
123
4231
SELECT barcode
FROM tester
WHERE CAST(SUBSTR(barcode, 4) AS UNSIGNED) BETWEEN 120 AND 4232;
barcode
HYD123
HYD0123
HYD4231
JOIN optimization
To obtain the expected results, you most likely want an INNER JOIN of the items table, with the desired barcode range condition in its ON criteria. Your current WHERE criteria on oo.barcode already exclude rows with no match in the items table (where oo.barcode is NULL), so the LEFT JOIN is effectively behaving as an INNER JOIN already.
INNER JOIN items AS oo
ON oo.biblioitemnumber = o.biblioitemnumber
AND CAST(SUBSTR(oo.barcode, 4) AS UNSIGNED) BETWEEN ? AND ?
Full-Table Scanning
It is important to understand that transforming the column value to suit a criterion prevents the query from using an index on that column, forcing a full-table scan that can run very slowly on a large table.
Instead, it is best to store the integer-only version of the value in the database so that the query can benefit from indexing.
This can be accomplished in many ways, such as with a generated column.
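A sketch of the generated-column idea, using SQLite 3.31 or later (through Python's sqlite3 module) as a stand-in for MySQL, whose generated-column syntax differs in details. The column barcode_num and the index idx_items_barcode_num are hypothetical names. The numeric part is derived from barcode by the database and indexed, so the range filter compares the indexed column directly instead of transforming barcode on every row.

```python
import sqlite3


def barcodes_in_range(barcodes, low, high):
    """Filter on an indexed generated column instead of a per-row CAST."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE items (
            barcode TEXT,
            -- Numeric part of e.g. 'HYD0123', maintained by the database.
            barcode_num INTEGER
                GENERATED ALWAYS AS (CAST(substr(barcode, 4) AS INTEGER)) VIRTUAL
        );
        CREATE INDEX idx_items_barcode_num ON items (barcode_num);
        """
    )
    con.executemany("INSERT INTO items (barcode) VALUES (?)", [(b,) for b in barcodes])
    query = (
        "SELECT barcode FROM items WHERE barcode_num BETWEEN ? AND ? "
        "ORDER BY barcode_num, barcode"
    )
    # EXPLAIN QUERY PLAN reports whether SQLite chose the index for this filter.
    plan = con.execute("EXPLAIN QUERY PLAN " + query, (low, high)).fetchall()
    rows = [r[0] for r in con.execute(query, (low, high))]
    con.close()
    return rows, plan
```

For the barcodes HYD123, HYD0123, HYD4231 and HYD9000 with the range 120 to 4232, this returns HYD0123, HYD123 and HYD4231; the returned plan can be printed to check that the index is used.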
GROUP BY without an aggregate
Lastly, you should avoid using GROUP BY without an aggregate function. You are most likely expecting DISTINCT or a similar way of limiting the record set. See "MySQL select one column DISTINCT, with corresponding other columns" for ways to accomplish this.
To ensure MySQL is not selecting "any value from each group" at random (leading to differing results between query executions), limit the subset data to the distinct biblioitemnumber column values from the available barcode matches. One approach to accomplish the limited subset is as follows.
/* ... */
FROM biblioitems o
INNER JOIN (
SELECT biblioitemnumber, barcode, booksellerid, enumchron, itemcallnumber
FROM items WHERE biblioitemnumber IN(
SELECT MIN(biblioitemnumber)
FROM items
WHERE CAST(SUBSTR(barcode, 4) AS UNSIGNED) BETWEEN ? AND ?
GROUP BY barcode
)
) AS oo
ON oo.biblioitemnumber = o.biblioitemnumber
LEFT JOIN biblio ooo ON ooo.biblionumber=o.biblionumber
LEFT JOIN aqorders oooo ON oooo.biblionumber=o.biblionumber
LEFT JOIN currency ooooo ON ooooo.currency=oooo.currency
LEFT JOIN aqinvoices oooooo ON oooooo.booksellerid=oo.booksellerid
LEFT JOIN aqbooksellers ooooooo ON ooooooo.id=oo.booksellerid
ORDER BY oo.barcode ASC
Try this:
...
WHERE cast(SUBSTRING_INDEX(oo.barcode,'HYD',-1) AS UNSIGNED INTEGER) BETWEEN <<Accession Number>> AND <<To Accession Number>>
...
SUBSTRING_INDEX(oo.barcode,'HYD',-1) will transform HYD132453741 to 132453741
demo here
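To compare SUBSTRING_INDEX with the other approaches, here is a simplified Python model of SUBSTRING_INDEX(str, delim, count). The helper name substring_index is hypothetical, and it raises for an empty delimiter rather than guessing MySQL's behaviour. With count = -1 it keeps everything after the last occurrence of the delimiter, and returns the whole string when the delimiter is absent.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Simplified model of MySQL's SUBSTRING_INDEX(text, delim, count)."""
    if not delim:
        raise ValueError("empty delimiter is not modelled")
    if count == 0:
        return ""
    parts = text.split(delim)
    # Positive count: left of the count-th delimiter; negative: counted from the end.
    kept = parts[:count] if count > 0 else parts[count:]
    return delim.join(kept)


if __name__ == "__main__":
    for barcode in ("HYD132453741", "132453741", "HYD12HYD3"):
        print(barcode, "->", substring_index(barcode, "HYD", -1))
```

The last case shows a difference from SUBSTR(barcode, 4): if 'HYD' appeared again later in a barcode, SUBSTRING_INDEX(..., -1) would keep only the text after its last occurrence ("3" rather than "12HYD3").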

Aggregating row values in MySQL or Snowflake

I would like to calculate the standard deviation, min and max of the mer_data array into 3 other fields called std_dev, min_mer and max_mer, grouped by mac and timestamp.
This needs to be done without flattening the data, as each mer_data row consists of 4000 float values, and multiplying that by 700k rows gives a very high-dimensional table.
The mer_data field is currently saved as varchar(30000); maybe JSON format would help, but I'm not sure.
This can be done in Snowflake or MySQL.
Also, the query needs to be optimized so that it does not take much computation time.
While you don't want to split the data up, you will need to if you want to do it in pure SQL. Snowflake has no problems with such aggregations.
WITH fake_data(mac, mer_data) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43')
)
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM fake_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
ORDER BY 1;
I would, however, discourage the use of strings in the grouping process, so I would break it apart like so:
WITH fake_data(mac, mer_data, timestamp) AS (
SELECT * FROM VALUES
('abc','43,44.25,44.5,42.75,44,44.25,42.75,43', '01-01-22'),
('def','32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43', '02-01-22')
), boost_data AS (
SELECT seq8() as seq, *
FROM fake_data
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
MAC  AVG_DEV       STD_DEV       MIN_MER  MAX_MER  TIMESTAMP
abc  43.5625       0.7529703087  42.75    44.5     01-01-22
def  34.611111111  3.226141056   32.75    43       02-01-22
Performance testing:
Using this SQL to make 70K rows of 4000 values each:
create table fake_data_tab AS
WITH cte_a AS (
SELECT SEQ8() as s
FROM TABLE(GENERATOR(ROWCOUNT =>70000))
), cte_b AS (
SELECT a.s, uniform(20::float, 50::float, random()) as v
FROM TABLE(GENERATOR(ROWCOUNT =>4000))
CROSS JOIN cte_a a
)
SELECT s::text as mac
,LISTAGG(v,',') AS mer_data
,dateadd(day,s,'2020-01-01')::date as timestamp
FROM cte_b
GROUP BY 1,3;
takes 79 seconds on an XTRA_SMALL warehouse.
Now, with that, we can test the two solutions.
The second set of code (group by numbers, with a join):
WITH boost_data AS (
SELECT seq8() as seq, *
FROM fake_data_tab
), math_step AS (
SELECT f.seq,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER
FROM boost_data f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1
)
SELECT b.mac,
m.avg_dev,
m.std_dev,
m.MIN_MER,
m.Max_MER,
b.timestamp
FROM boost_data b
JOIN math_step m
ON b.seq = m.seq
ORDER BY 1;
takes 1m47s
The original (group by strings/dates):
SELECT f.mac,
avg(d.value::float) as avg_dev,
stddev(d.value::float) as std_dev,
MIN(d.value::float) as MIN_MER,
Max(d.value::float) as Max_MER,
f.timestamp
FROM fake_data_tab f, table(split_to_table(f.mer_data,',')) d
GROUP BY 1,6
ORDER BY 1;
takes 1m46s
Hmm, so leaving "mac" as a number made the code very fast (~3s); dealing with strings either way changed the data processed: 1.5GB for strings versus 150MB for numbers.
If the numbers were in rows, not packed together like that, we could discuss how to do it in SQL.
In rows, GROUP_CONCAT(...) can construct a commalist like the one you show, and MIN(), STDDEV(), etc. can do the other stuff.
If you continue to have the commalist, then do the rest of the work in your app's programming language. (It is very ugly to have SQL pick apart an array.)
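A minimal sketch of that application-side approach in Python, standard library only (the function name mer_stats is hypothetical). It parses one mer_data string and computes the figures; statistics.stdev is the sample standard deviation, which matches the STD_DEV values in the Snowflake output above (0.7529703087 for the abc row).

```python
import statistics


def mer_stats(mer_data: str) -> dict:
    """Parse a comma-separated mer_data value and summarise it.

    Requires at least two values, since the sample standard deviation needs them.
    """
    values = [float(v) for v in mer_data.split(",")]
    return {
        "avg": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min_mer": min(values),
        "max_mer": max(values),
    }


if __name__ == "__main__":
    rows = [
        ("abc", "43,44.25,44.5,42.75,44,44.25,42.75,43", "01-01-22"),
        ("def", "32.75,33.25,34.25,34.5,32.75,34,34.25,32.75,43", "02-01-22"),
    ]
    for mac, mer_data, timestamp in rows:
        print(mac, mer_stats(mer_data), timestamp)
```

Processing one row at a time like this keeps memory proportional to a single 4000-value row rather than to the flattened table.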

Is there a way to use aggregate COUNT() values within CASE?

I need to retrieve unique yet truncated part numbers, with their description values being conditionally determined.
DATA:
Here's some simplified sample data:
(the real table has half a million rows)
create table inventory(
partnumber VARCHAR(10),
description VARCHAR(10)
);
INSERT INTO inventory (partnumber,description) VALUES
('12345','ABCDE'),
('123456','ABCDEF'),
('1234567','ABCDEFG'),
('98765','ZYXWV'),
('987654','ZYXWVU'),
('9876543','ZYXWVUT'),
('abcde',''),
('abcdef','123'),
('abcdefg','321'),
('zyxwv',NULL),
('zyxwvu','987'),
('zyxwvut','789');
TRIED:
I've tried too many things to list here.
I've finally found a way to get past all the 'unknown field' errors and at least get SOME results, but:
it's SUPER kludgy!
my results are not limited to unique prods.
Here's my current query:
SELECT
LEFT(i.partnumber, 6) AS prod,
CASE
WHEN agg.cnt > 1
OR i.description IS NULL
OR i.description = ''
THEN LEFT(i.partnumber, 6)
ELSE i.description
END AS `descrip`
FROM inventory i
INNER JOIN (SELECT LEFT(ii.partnumber, 6) t, COUNT(*) cnt
FROM inventory ii GROUP BY ii.partnumber) AS agg
ON LEFT(i.partnumber, 6) = agg.t;
GOAL:
My goal is to retrieve:
prod    descrip
12345   ABCDE
123456  123456
98765   ZYXWV
987654  987654
abcde   abcde
abcdef  abcdef
zyxwv   zyxwv
zyxwvu  zyxwvu
QUESTION:
What are some cleaner ways to use the COUNT() aggregate data with a CASE type conditional?
How can I limit my results so that all prods are UNIQUE?
You can check whether a left(partnumber, 6) is not unique in the result by checking if count(*) > 1. In that case, let descrip be left(partnumber, 6). Otherwise, you can use max(description) (or min(description)) to get the single description while satisfying the need to use an aggregate function on columns not in the GROUP BY. To replace empty or NULL descriptions, nullif() and coalesce() can be used.
That would lead to the following using just one level of aggregation and no joins:
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
left(partnumber, 6)
ELSE
coalesce(nullif(max(description), ''), left(partnumber, 6))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
But there seems to be a bug in MySQL, and this query fails. The engine doesn't "see" that, in the list after SELECT, partnumber is only used in the expression left(partnumber, 6), which is also in the GROUP BY. Instead, the engine falsely complains about partnumber not being in the GROUP BY and not being subject to an aggregate function.
As a workaround, we can use a derived table that does the shortening of partnumber to its first six characters. We then use that column of the derived table instead of left(partnumber, 6).
SELECT l6pn AS prod,
CASE
WHEN count(*) > 1 THEN
l6pn
ELSE
coalesce(nullif(max(description), ''), l6pn)
END AS descrip
FROM (SELECT left(partnumber, 6) AS l6pn,
description
FROM inventory) AS x
GROUP BY l6pn
ORDER BY l6pn;
Or we slap some otherwise pointless max() calls around every left(partnumber, 6) other than the first, to work around the bug.
SELECT left(partnumber, 6) AS prod,
CASE
WHEN count(*) > 1 THEN
max(left(partnumber, 6))
ELSE
coalesce(nullif(max(description), ''), max(left(partnumber, 6)))
END AS descrip
FROM inventory
GROUP BY left(partnumber, 6)
ORDER BY left(partnumber, 6);
db<>fiddle (Change the DBMS to another one, such as Postgres or MariaDB, to see that they also accept the first query.)
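To try the derived-table version outside MySQL, here is a sketch using Python's sqlite3 module with the sample inventory data from the question (the function name unique_prods is hypothetical). SQLite has no LEFT() function, so substr(partnumber, 1, 6) stands in for left(partnumber, 6); otherwise the query is the same.

```python
import sqlite3

INVENTORY = [
    ("12345", "ABCDE"), ("123456", "ABCDEF"), ("1234567", "ABCDEFG"),
    ("98765", "ZYXWV"), ("987654", "ZYXWVU"), ("9876543", "ZYXWVUT"),
    ("abcde", ""), ("abcdef", "123"), ("abcdefg", "321"),
    ("zyxwv", None), ("zyxwvu", "987"), ("zyxwvut", "789"),
]


def unique_prods(rows):
    """One row per 6-character prefix, with the conditional description."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE inventory (partnumber VARCHAR(10), description VARCHAR(10))")
    con.executemany("INSERT INTO inventory VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT l6pn AS prod,
               CASE WHEN count(*) > 1 THEN l6pn
                    ELSE coalesce(nullif(max(description), ''), l6pn)
               END AS descrip
        FROM (SELECT substr(partnumber, 1, 6) AS l6pn, description
              FROM inventory) AS x
        GROUP BY l6pn
        ORDER BY l6pn
        """
    ).fetchall()
    con.close()
    return result
```

Calling unique_prods(INVENTORY) should reproduce the GOAL table from the question, with prefixes shared by several part numbers showing the prefix itself and the empty or NULL descriptions falling back to the prefix.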

query optimization for mysql

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4,907,424 rows in matrix_report1 and 335,740 rows in the matrix_normalized_notes table. These tables will grow as we have more requests.
First, the others are right that you should format your samples better. Explaining in plain language what you are trying to do also helps, and sample data with the expected results is even better.
However, that said, I think it can be significantly simplified. Your queries are almost completely identical, with the exception of the one field "final_value" being 1 or 0 respectively. Since each query results in 1 record per "person_id", you can just compute each average with a CASE/WHEN and remove the rest.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
Since your outer query averages the per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query is pulling all transactions for the final value as either a zero OR one. But then, the AVG is based on a case/when of the respective value for the normalized value. When the condition is NOT the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point, the data is already grouped per person and store, with ANV1 and ANV0 set. Now roll these values up per person regardless of the store. Again, NULL values are not considered in the average computation, so if store "A" has no ANV1 value, it does not skew the results; the same applies if store "B" has no ANV0 value.
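A small sanity check of the CASE-inside-AVG idea, using SQLite through Python's sqlite3 as a stand-in for MySQL, with tiny hypothetical data: one person, two stores. Rows that do not match the CASE condition become NULL and are skipped by AVG, at both the per-store and the per-person level.

```python
import sqlite3

QUERY = """
SELECT r1.person_id,
       avg(r1.ANV1) AS t1_value,
       avg(r1.ANV0) AS t0_value
FROM (SELECT ma1.person_id,
             mn1.store_name,
             avg(CASE WHEN ma1.final_value = 1 THEN mn1.normalized_value END) AS ANV1,
             avg(CASE WHEN ma1.final_value = 0 THEN mn1.normalized_value END) AS ANV0
      FROM matrix_report1 ma1
      JOIN matrix_normalized_notes mn1
        ON ma1.request_id = mn1.request_id
       AND ma1.user_id = mn1.user_id
       AND mn1.normalized_value NOT IN (0.0, 0.2)
      WHERE ma1.request_id = 4
        AND ma1.final_value IN (0, 1)
      GROUP BY ma1.person_id, mn1.store_name) r1
GROUP BY r1.person_id
"""


def run_example():
    """Load hypothetical sample rows and run the combined query."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE matrix_report1 (person_id TEXT, user_id TEXT,
                                     request_id INTEGER, final_value INTEGER);
        CREATE TABLE matrix_normalized_notes (user_id TEXT, request_id INTEGER,
                                              store_name TEXT, normalized_value REAL);
        INSERT INTO matrix_report1 VALUES ('p1', 'u1', 4, 1), ('p1', 'u2', 4, 0);
        INSERT INTO matrix_normalized_notes VALUES
            ('u1', 4, 'A', 0.5), ('u1', 4, 'A', 0.7), ('u1', 4, 'B', 0.9),
            ('u2', 4, 'A', 0.4), ('u2', 4, 'A', 0.2), ('u2', 4, 'B', 0.0);
        """
    )
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

Working it through by hand: store A gives ANV1 = 0.6 and ANV0 = 0.4; store B gives ANV1 = 0.9 and ANV0 = NULL (its only final_value = 0 note is 0.0, which is filtered out). So person p1 ends up with t1_value = 0.75 and t0_value = 0.4.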

Return a boolean value if string matches one of list MYSQL

I have 2 tables
SEQUENCES
-----------------
sequence (blob)
KNOWN_SEQUENCES
-----------------
sequence (blob)
I need to return a list of all entries in the sequences table, and I'd like to return a boolean indicating whether each one is in the known table list:
sequence known
----------------------------------
111423fa686ca 0
066787caf5671 1
See use of 'CASE' in mysql. http://dev.mysql.com/doc/refman/5.0/en/case-statement.html.
I am posting a sample here. Sorry, I don't have access to a SQL server right now to test this. See if something like this helps.
Select s.sequence,
CASE
WHEN (select count(*) from KNOWN_SEQUENCES k where k.sequence = s.sequence) > 0 THEN '1'
ELSE '0'
END
`known`
from SEQUENCES s;
Also, indexing the table 'KNOWN_SEQUENCES' on column 'sequence' should help performance (for a BLOB column, MySQL requires an index prefix length).
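A sketch of the same idea with EXISTS instead of CASE plus COUNT(*) (a swapped-in, commonly used alternative, not what the answer above wrote), tested in SQLite through Python's sqlite3 as a stand-in for MySQL. In both databases EXISTS yields 1 or 0 directly. The function name flag_known and the index name are hypothetical. Because EXISTS only checks for a match, duplicate rows in known_sequences do not multiply the output rows.

```python
import sqlite3


def flag_known(sequences, known):
    """Return (sequence, known) pairs, where known is 1 if the sequence is listed."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sequences (sequence BLOB)")
    con.execute("CREATE TABLE known_sequences (sequence BLOB)")
    con.execute("CREATE INDEX idx_known_sequence ON known_sequences (sequence)")
    con.executemany("INSERT INTO sequences VALUES (?)", [(s,) for s in sequences])
    con.executemany("INSERT INTO known_sequences VALUES (?)", [(k,) for k in known])
    rows = con.execute(
        """
        SELECT s.sequence,
               EXISTS (SELECT 1 FROM known_sequences k
                       WHERE k.sequence = s.sequence) AS known
        FROM sequences s
        ORDER BY s.rowid
        """
    ).fetchall()
    con.close()
    return rows
```

With the two sample sequences from the question and 066787caf5671 listed as known (even twice), this returns one row per sequence: 111423fa686ca with 0 and 066787caf5671 with 1.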
I did get it working, but it takes 3 seconds for only 9,000 records. That may be the best I can do. Luckily I just need to run this once.
SELECT
sequences.*,
1 as `known`
FROM
sequences, known_sequences
WHERE sequences.sequence = known_sequences.sequence
UNION
SELECT
sequences.*,
0 as `known`
FROM
sequences
WHERE sequences.sequence NOT IN(
SELECT known_sequences.sequence
FROM known_sequences)
This might be a little long, but it should work.
SELECT
s.sequence,
1 AS `known`
FROM
sequences AS s
INNER JOIN
known_sequences AS ks
ON
s.sequence = ks.sequence
UNION
SELECT
s.sequence,
0 AS `known`
FROM
sequences AS s
WHERE
s.sequence NOT IN
(
SELECT
ks.sequence
FROM
known_sequences AS ks
)