Problems with a UNION in MySQL - mysql

I have this query:
SELECT COUNT(*) AS invoice_count, IFNULL(SUM(qa_invoices.invoice_total), 0)
AS invoice_total, IFNULL(SUM(qa_invoices.invoice_discount) ,0) AS invoice_discount
FROM qa_invoices
WHERE (DATE(qa_invoices.invoice_date) BETWEEN '12/06/25' AND '12/06/25')
AND qa_invoices.status_code IN (5, 8)
UNION
SELECT IFNULL(SUM(qa_returns.client_credit), 0)
FROM qa_returns
WHERE (DATE(qa_returns.returnlog_date) BETWEEN '12/06/25' AND '12/06/25');
I get the error:
The used SELECT statements have a different number of columns.
I'm trying to join these two SELECTs with a UNION. Both returnlog_date and invoice_date use the same date condition, so if there is a way to combine both queries into one, that would be better.

Use a subselect:
SELECT
COUNT(*) AS invoice_count,
IFNULL(SUM(invoice_total), 0) AS invoice_total,
IFNULL(SUM(invoice_discount), 0) AS invoice_discount,
(
SELECT IFNULL(SUM(qa_returns.client_credit), 0)
FROM qa_returns
WHERE qa_returns.returnlog_date >= '2012-06-25'
AND qa_returns.returnlog_date < '2012-06-26'
) AS client_credit
FROM qa_invoices
WHERE invoice_date >= '2012-06-25'
AND invoice_date < '2012-06-26'
AND status_code IN (5, 8)
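To see the scalar-subquery shape return a single row, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented for illustration. It also assumes the date columns hold date-and-time values, which is presumably why the answer switches from DATE(...) BETWEEN to a half-open range (>= start, < next day).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qa_invoices (invoice_date TEXT, status_code INT,
                              invoice_total REAL, invoice_discount REAL);
    CREATE TABLE qa_returns (returnlog_date TEXT, client_credit REAL);
    INSERT INTO qa_invoices VALUES
        ('2012-06-25 09:00:00', 5, 100, 10),
        ('2012-06-25 23:59:59', 8, 50, 0),
        ('2012-06-25 12:00:00', 1, 70, 0),   -- status not in (5, 8), ignored
        ('2012-06-26 00:00:00', 5, 999, 0);  -- next day, ignored
    INSERT INTO qa_returns VALUES
        ('2012-06-25 15:00:00', 20),
        ('2012-06-24 15:00:00', 7);          -- previous day, ignored
""")

# One row: three invoice aggregates plus the returns total as a scalar subquery.
summary = conn.execute("""
    SELECT COUNT(*),
           IFNULL(SUM(invoice_total), 0),
           IFNULL(SUM(invoice_discount), 0),
           (SELECT IFNULL(SUM(client_credit), 0)
            FROM qa_returns
            WHERE returnlog_date >= '2012-06-25'
              AND returnlog_date <  '2012-06-26')
    FROM qa_invoices
    WHERE invoice_date >= '2012-06-25'
      AND invoice_date <  '2012-06-26'
      AND status_code IN (5, 8)
""").fetchone()
```

The half-open range also leaves the column unwrapped, which generally gives the database a chance to use an index on it.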

The error is telling you exactly what the problem is: for a UNION, each query must return the same number of columns.
I am not sure which column in your second query corresponds to your first query, but you can insert a zero in your second query.
Something like this:
SELECT COUNT(*) AS invoice_count
, IFNULL(SUM(qa_invoices.invoice_total), 0) AS invoice_total
, IFNULL(SUM(qa_invoices.invoice_discount) ,0) AS invoice_discount
FROM qa_invoices
WHERE (DATE(qa_invoices.invoice_date) BETWEEN '12/06/25' AND '12/06/25')
AND qa_invoices.status_code IN (5, 8)
UNION
SELECT 0
, IFNULL(SUM(qa_returns.client_credit), 0)
, 0
FROM qa_returns
WHERE (DATE(qa_returns.returnlog_date) BETWEEN '12/06/25' AND '12/06/25');
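The column-count rule can be checked quickly. This is a sketch using Python's sqlite3 module rather than MySQL (the error text differs between engines, but the rule is the same), with made-up rows and the date filters left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qa_invoices (invoice_total REAL, invoice_discount REAL);
    CREATE TABLE qa_returns (client_credit REAL);
    INSERT INTO qa_invoices VALUES (100, 5), (50, 0);
    INSERT INTO qa_returns VALUES (20);
""")

# Three columns on the left, one on the right: the engine rejects the query.
try:
    conn.execute("""
        SELECT COUNT(*), SUM(invoice_total), SUM(invoice_discount) FROM qa_invoices
        UNION
        SELECT SUM(client_credit) FROM qa_returns
    """)
    mismatch_rejected = False
except sqlite3.OperationalError:
    mismatch_rejected = True

# Padding the second SELECT with zeros makes the column counts agree.
padded_rows = conn.execute("""
    SELECT COUNT(*), SUM(invoice_total), SUM(invoice_discount) FROM qa_invoices
    UNION ALL
    SELECT 0, SUM(client_credit), 0 FROM qa_returns
""").fetchall()
```

Note that the padded version returns two rows, one per SELECT, so the caller still has to know which row is which.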

Result sets that you union together must have the same number of columns.

In order to do a UNION, you need the same number of columns in each SELECT.

Related

Drop Off Funnel in SQL

I have a table that has user_seq_id and no of days a user was active in the program. I want to understand the drop-off funnel. Like how many users were active on day 0 (100%) and on day 1, 2 and so on.
Input table :
create table test (
user_seq_id int ,
NoOfDaysUserWasActive int
);
insert into test (user_seq_id , NoOfDaysUserWasActive)
values (13451, 2), (76453, 1), (22342, 3), (11654, 0),
(54659, 2), (64420, 1), (48906, 5);
I want Day, ActiveUsers, and % Distribution of these users.
One method doesn't use window functions at all. Just a list of days and aggregation:
select v.day, count(t.user_seq_id),
count(t.user_seq_id) / c.cnt as ratio
from (select 0 as day union all select 1 union all select 2 union all select 3 union all select 4 union all select 5
) v(day) left join
test t
on v.day <= t.NoOfDaysUserWasActive cross join
(select count(*) as cnt from test) c
group by v.day, c.cnt
order by v.day asc;
Here is a db<>fiddle.
The mention of window function suggests that you are thinking:
select NoOfDaysUserWasActive,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) as cnt,
sum(count(*)) over (order by NoOfDaysUserWasActive desc) / sum(count(*)) over () as ratio
from test
group by NoOfDaysUserWasActive
order by NoOfDaysUserWasActive
The problem is that this does not "fill in" the days that are not explicitly in the original data. If that is not an issue, then this should have better performance.
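For reference, here is the first (list-of-days) approach run against the sample data. It is a sketch using Python's sqlite3 module as a stand-in, so the derived table is aliased without a column list and the ratio is multiplied by 100.0 to avoid integer division; the pct column name is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (user_seq_id INT, NoOfDaysUserWasActive INT);
    INSERT INTO test VALUES (13451, 2), (76453, 1), (22342, 3), (11654, 0),
                            (54659, 2), (64420, 1), (48906, 5);
""")

# A user counts as active on day d if they were active for at least d days.
funnel = conn.execute("""
    SELECT v.day,
           COUNT(t.user_seq_id) AS active_users,
           ROUND(100.0 * COUNT(t.user_seq_id) / c.cnt, 1) AS pct
    FROM (SELECT 0 AS day UNION ALL SELECT 1 UNION ALL SELECT 2
          UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) v
    LEFT JOIN test t ON v.day <= t.NoOfDaysUserWasActive
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM test) c
    GROUP BY v.day, c.cnt
    ORDER BY v.day
""").fetchall()

for day, users, pct in funnel:
    print(day, users, pct)
```

Day 4 appears with one active user even though no row in the table has exactly 4 active days, which is the "fill in" behaviour the window-function version lacks.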

Mysql LEFT to match first 3 chars

I'm trying to get all matching records from the invoice_id field where the first 3 characters are RBK; case sensitivity is not important. I've tried to use the LEFT function in the two ways below, but it's not working. Any ideas on how to achieve this?
SELECT *, IF( LEFT( invoice_id, 3) = 'RBK') FROM `invoices` ORDER BY id ASC
SELECT *, IF( LEFT( invoice_id, 3) = 'RBK', 3, 0) FROM `invoices` ORDER BY id ASC
An IF inside the SELECT list does not filter results. If you want to filter results, use a WHERE clause:
SELECT * FROM `invoices` WHERE LEFT(invoice_id, 3) = "RBK" ORDER BY id ASC
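A prefix LIKE is a common alternative to LEFT() for this kind of filter. The sketch below is not the answer's method; it uses Python's sqlite3 module with invented rows, and relies on SQLite's LIKE being case-insensitive for ASCII by default. In MySQL the case sensitivity of both = and LIKE depends on the column's collation, so treat that part as an assumption to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_id TEXT);
    INSERT INTO invoices (invoice_id)
    VALUES ('RBK001'), ('rbk002'), ('ABC003'), ('XRBK04');
""")

# Filtering belongs in WHERE; 'RBK%' matches values that start with RBK.
matches = conn.execute("""
    SELECT invoice_id
    FROM invoices
    WHERE invoice_id LIKE 'RBK%'
    ORDER BY id
""").fetchall()
```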

aggregation condition in case when

I have a dataset with a structure similar to the one below:
fruit, value
apple, 234
apple, 2341
pear, 3233
grape, 323
pear, 3234
grape, 1234
I am trying to count the values that fall in the bottom 10% of the range, using a query like the one below. (The ultimate goal is to see how the counts change as the range goes up in increments of 10%.) I also have a GROUP BY clause, because I would like the counts to be grouped and aggregated by fruit. Below is the query I have tried:
select fruit, count(case when (value <= (((max(value) - min(value)) * .1) + min(value))) then 1 end)
from fruit_juice
group by substring(fruit, 5, 5);
Aggregate the table in the from clause to get the limits you want. Join those results back to your query and use those values for the query:
select substring(fj.fruit, 5, 5),
sum(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * 0.1)
from fruit_juice fj join
(select substring(fruit, 5, 5) as fruit5,
max(value) as maxv, min(value) as minv
from fruit_juice
group by substring(fruit, 5, 5)
) fmm
on fmm.fruit5 = substring(fj.fruit, 5, 5)
group by substring(fruit, 5, 5);
Note that your group by expressions should match the expressions in the select clause.
EDIT:
I'm not sure where the substring() is coming from in your question, so this version removes it:
select fj.fruit, sum(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * 0.1)
from fruit_juice fj join
(select fruit,
max(value) as maxv, min(value) as minv
from fruit_juice
group by fruit
) fmm
on fmm.fruit = fj.fruit
group by fj.fruit;
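Here is the join-back pattern on a tiny data set, as a sketch. It uses Python's sqlite3 module as a stand-in (summing a boolean comparison works there as it does in MySQL), the rows are invented so the counts are easy to check by hand, and the in_bottom_10pct alias is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit_juice (fruit TEXT, value INT);
    INSERT INTO fruit_juice VALUES
        ('apple', 100), ('apple', 105), ('apple', 150), ('apple', 1100),
        ('pear', 10), ('pear', 20);
""")

# The subquery computes min/max per fruit; the outer query compares each row
# against minv + 10% of (maxv - minv) and sums the 0/1 comparison results.
counts = conn.execute("""
    SELECT fj.fruit,
           SUM(fj.value <= fmm.minv + (fmm.maxv - fmm.minv) * 0.1) AS in_bottom_10pct
    FROM fruit_juice fj
    JOIN (SELECT fruit, MAX(value) AS maxv, MIN(value) AS minv
          FROM fruit_juice
          GROUP BY fruit) fmm ON fmm.fruit = fj.fruit
    GROUP BY fj.fruit
    ORDER BY fj.fruit
""").fetchall()
```

For apple the threshold is 100 + 0.1 * (1100 - 100) = 200, so three of the four values qualify; for pear it is 11, so only the minimum does.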

How to generate data in MySQL?

Here is my SQL:
SELECT
COUNT(id),
CONCAT(YEAR(created_at), '-', MONTH(created_at), '-', DAY(created_at))
FROM my_table
GROUP BY YEAR(created_at), MONTH(created_at), DAY(created_at)
I want a row to show up even for days where there was no ID created. Right now I'm missing a ton of dates for days where there was no activity.
Any thoughts on how to change this query to do that?
SQL is notoriously bad at returning data that is not in the database. You can find the beginning and ending values for gaps of dates, but getting all the dates is hard.
The solution is to create a calendar table with one record for each date and OUTER JOIN it to your query.
Here is an example assuming that created_at is type DATE:
SELECT calendar_date, COUNT(`id`)
FROM calendar LEFT OUTER JOIN my_table ON calendar.calendar_date = my_table.created_at
GROUP BY calendar_date
(I'm guessing that created_at is really DATETIME, so you'll have to do a bit more gymnastics to JOIN the tables).
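If created_at does carry a time component, one way to do those gymnastics is to join on the date part. The following is a sketch with Python's sqlite3 module as a stand-in for MySQL; the calendar table is filled from Python for brevity and all rows are invented.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (calendar_date TEXT PRIMARY KEY);
    CREATE TABLE my_table (id INTEGER, created_at TEXT);
    INSERT INTO my_table VALUES (1, '2012-04-01 08:30:00'),
                                (2, '2012-04-01 17:45:00'),
                                (3, '2012-04-03 12:00:00');
""")

# One calendar row per day for the period of interest (four days here).
start = date(2012, 4, 1)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(4)],
)

# DATE() strips the time part so the datetime can match a calendar day.
daily = conn.execute("""
    SELECT c.calendar_date, COUNT(m.id)
    FROM calendar c
    LEFT JOIN my_table m ON c.calendar_date = DATE(m.created_at)
    GROUP BY c.calendar_date
    ORDER BY c.calendar_date
""").fetchall()
```

Wrapping the column in DATE() usually prevents an index on created_at from being used for the join, so on a large table a range condition (created_at >= day AND created_at < next day) may be worth considering instead.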
General idea
There are two main approaches to generating data in MySQL. One is to generate the data on the fly when running the query, and the other is to store it in the database and use it when necessary. Of course, the second will be faster than the first if you are going to run your query frequently. However, it requires a table in the database whose only purpose is to generate the missing data, and it requires sufficient privileges to create that table.
Dynamic data generation
This approach involves making UNIONs to generate a fake table that can be used to join the actual table with. The awful and repetitive query is:
select aDate from (
select @maxDate - interval (a.a+(10*b.a)+(100*c.a)+(1000*d.a)) day aDate from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) b, /*100 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) c, /*1000 day range*/
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) d, /*10000 day range*/
(select @minDate := '2001-01-01', @maxDate := '2002-02-02') e
) f
where aDate between @minDate and @maxDate
Anyway, it is simpler than it seems. It makes cartesian products of derived tables with 10 numeric values each, so the result will have 10^X rows, where X is the number of derived tables in the query. This example covers a 10000-day range, so you can represent periods of over 27 years. If you need more, add another derived table (another block of UNIONs) to the query and update the interval expression; if you don't need that many, remove derived tables or individual values from them. Just to clarify, you can fine-tune the date period by filtering on the @minDate and @maxDate variables in the WHERE clause (but don't use a longer period than the one you created with the cartesian products).
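The cartesian-product idea can be tried at a smaller scale. This is a sketch using Python's sqlite3 module as a stand-in: two digit tables give offsets 0 to 99, SQLite's DATE(..., '-N days') modifier replaces MySQL's interval arithmetic, and bound parameters (min_date, max_date, my own names) replace the user variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "SELECT 0 AS a UNION ALL SELECT 1 AS a ... SELECT 9 AS a": a ten-row table.
digits = " UNION ALL ".join(f"SELECT {n} AS a" for n in range(10))

# Two ten-row tables cross-joined give 10^2 = 100 offsets (0..99 days back).
days = conn.execute(f"""
    SELECT a_date
    FROM (SELECT DATE(:max_date, '-' || (d1.a + 10 * d2.a) || ' days') AS a_date
          FROM ({digits}) d1, ({digits}) d2)
    WHERE a_date BETWEEN :min_date AND :max_date
    ORDER BY a_date
""", {"min_date": "2001-01-01", "max_date": "2001-01-10"}).fetchall()
```

Only the ten days inside the requested period survive the filter; the other ninety generated dates are discarded, which is the price of generating the range on the fly.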
Static data generation
This solution will require you to generate a table in your database. The approach is similar to the previous one. You'll have to first insert data into that table: a range of integers ranging from 1 to X where X is the maximum needed range. Again, if you are unsure just insert 100000 values and you'll be able to create day ranges for over 273 years. So, once you've got the integer sequence, you can transform it into a date range like this:
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
This assumes a table named seq with a column named value. The start date goes in the first line and the end date in the HAVING clause.
Turning this into something useful
Ok, now we have our date periods generated but we're still missing a way to query data and display the missing values as an actual 0. This is where left join comes to the rescue. To make sure we're all on the same page, a left join is similar to an inner join but with only one difference: it will preserve all records from the left table of the join, regardless of whether there is a matching record on the table of the right. In other words, an inner join will remove all non-matched rows on the join while the left join will keep the ones on the left table and, for the records on the left that have no matching record on the right table, the left join will fill that "space" with a null value.
So we should join our domain table (the one that has "missing" data) with our newly generated table putting the latter on the left part of the join and the former on the right, so that all elements are considered, regardless of their presence in the domain table.
For example, if we had a table domainTable with fields ID, birthDate and we would like to see a count of all the birthDate in the first 5 days of 2012 per day and if the count is 0 to show that value, then this query could be run:
select allDays.aDay, count(dt.id) from (
select '2012-01-01' + interval value - 1 day aDay from seq
having aDay <= '2012-01-05'
) allDays
left join domainTable dt on allDays.aDay = dt.birthDate
group by allDays.aDay
This generates a derived table with all the required days (notice I'm using the static data generation) and performs a left join against our domain table, so all days will be displayed, regardless of whether they have matching values in our domain table. Also note that the count should be done on the field that will have null values, as those are not counted.
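The point about which field to count is easy to get wrong, so here is a sketch that puts COUNT(dt.id) and COUNT(*) side by side. It uses Python's sqlite3 module as a stand-in for MySQL (with SQLite's '+N days' modifier instead of interval arithmetic and a WHERE filter instead of HAVING on the alias), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seq (value INTEGER);
    INSERT INTO seq VALUES (1), (2), (3), (4), (5), (6), (7);
    CREATE TABLE domainTable (id INTEGER, birthDate TEXT);
    INSERT INTO domainTable VALUES (1, '2012-01-01'), (2, '2012-01-01'),
                                   (3, '2012-01-04');
""")

# Second column counts non-null ids; third counts rows, including the
# all-NULL row that the left join produces for a day with no match.
rows = conn.execute("""
    SELECT allDays.aDay, COUNT(dt.id), COUNT(*)
    FROM (SELECT DATE('2012-01-01', '+' || (value - 1) || ' days') AS aDay
          FROM seq
          WHERE DATE('2012-01-01', '+' || (value - 1) || ' days') <= '2012-01-05'
         ) allDays
    LEFT JOIN domainTable dt ON allDays.aDay = dt.birthDate
    GROUP BY allDays.aDay
    ORDER BY allDays.aDay
""").fetchall()
```

On the empty days COUNT(dt.id) is 0 while COUNT(*) is 1, so counting rows would wrongly report one birth on every day without data.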
Notes to be considered
1) The queries can be used to query other intervals (months, years) performing small changes to the code
2) Instead of hardcoding the dates you can query for min and max values from the domain tables like this:
select (select min(aDate) from domainTable) + interval value - 1 day aDay
from seq
having aDay <= (select max(aDate) from domainTable)
This would avoid generating more records than necessary.
Actually answering your question
I think you should have already figured out how to do what you want. Anyway, here are the steps so that others can benefit from them too. Firstly, create the integer table. Secondly, run this query:
select allDays.aDay, count(mt.id) aCount from (
select (select date(min(created_at)) from my_table) + interval value - 1 day aDay
from seq s
having aDay <= (select date(max(created_at)) from my_table)
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
I guess created_at is a datetime and that's why you're concatenating that way. However, that happens to be the way MySQL natively stores dates, so I'm just grouping by the date field but casting the created_at to an actual date datatype. You can play with it using this fiddle.
And here is the solution generating data dynamically:
select allDays.aDay, count(mt.id) aCount from (
select @maxDate - interval a.a day aDay from
(select 0 as a union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9) a, /*10 day range*/
(select @minDate := (select date(min(created_at)) from my_table),
@maxDate := (select date(max(created_at)) from my_table)) e
where @maxDate - interval a.a day between @minDate and @maxDate
) allDays
left join my_table mt on allDays.aDay = date(mt.created_at)
group by allDays.aDay
As you can see, the skeleton of the query is the same as the previous one. The only thing that changes is how the derived table allDays is generated. The way the derived table is generated is also slightly different from the one I showed before, because in the example fiddle I only needed a 10-day range. As you can see, it is more readable than adding a 1000-day range. Here is the fiddle for the dynamic solution so that you can play with it too.
Hope this helps!
The way to do it in one query:
SELECT COUNT(my_table.id) AS total,
CONCAT(YEAR(dates.ddate), '-', MONTH(dates.ddate), '-', DAY(dates.ddate))
FROM (
-- Creates "on the fly" 65536 days beginning from 2000-01-01 (179 years)
SELECT DATE_ADD("2000-01-01", INTERVAL (b1.b + b2.b + b3.b + b4.b + b5.b + b6.b + b7.b + b8.b + b9.b + b10.b + b11.b + b12.b + b13.b + b14.b + b15.b + b16.b) DAY) AS ddate FROM
(SELECT 0 AS b UNION SELECT 1) b1,
(SELECT 0 AS b UNION SELECT 2) b2,
(SELECT 0 AS b UNION SELECT 4) b3,
(SELECT 0 AS b UNION SELECT 8) b4,
(SELECT 0 AS b UNION SELECT 16) b5,
(SELECT 0 AS b UNION SELECT 32) b6,
(SELECT 0 AS b UNION SELECT 64) b7,
(SELECT 0 AS b UNION SELECT 128) b8,
(SELECT 0 AS b UNION SELECT 256) b9,
(SELECT 0 AS b UNION SELECT 512) b10,
(SELECT 0 AS b UNION SELECT 1024) b11,
(SELECT 0 AS b UNION SELECT 2048) b12,
(SELECT 0 AS b UNION SELECT 4096) b13,
(SELECT 0 AS b UNION SELECT 8192) b14,
(SELECT 0 AS b UNION SELECT 16384) b15,
(SELECT 0 AS b UNION SELECT 32768) b16
) dates
LEFT JOIN my_table ON dates.ddate = my_table.created_at
GROUP BY dates.ddate
ORDER BY dates.ddate
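The trick in the query above is that each two-row table contributes one binary digit, so n tables yield every integer from 0 to 2^n - 1 exactly once. A scaled-down sketch with four tables (so 16 offsets), run through Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four two-row tables with values {0, 1}, {0, 2}, {0, 4}, {0, 8}.
bits = ", ".join(f"(SELECT 0 AS b UNION SELECT {2 ** i}) b{i + 1}" for i in range(4))
total = " + ".join(f"b{i + 1}.b" for i in range(4))

# The cross join has 2^4 = 16 rows; the sums are the numbers 0..15.
offsets = [row[0] for row in conn.execute(f"SELECT {total} AS n FROM {bits} ORDER BY n")]
```

With 16 such tables the same construction gives the 65536 day offsets used above.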
The next code is only necessary if you want to test and don't have the "my_table" indicated on the question:
create table `my_table` (
`id` int (11),
`created_at` date
);
insert into `my_table` (`id`, `created_at`) values('1','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('2','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('3','2000-01-01');
insert into `my_table` (`id`, `created_at`) values('4','2001-01-01');
insert into `my_table` (`id`, `created_at`) values('5','2100-06-06');
Testbed:
create table testbed (id integer, created_at date);
insert into testbed values
(1, '2012-04-01'),
(1, '2012-04-30'),
(2, '2012-04-02'),
(3, '2012-04-03'),
(3, '2012-04-04'),
(4, '2012-04-04');
I also use any_table, which I created artificially like this:
create table any_table (id integer);
insert into any_table values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
insert into any_table select * from any_table; -- repeat this insert 7-8 times
You can use any table in your database that is expected to have more rows than the max(created_at) - min(created_at) range, at least 365 to cover a year.
Query:
SELECT concat(year(dr._date),'-',month(dr._date),'-',day(dr._date)),
-- or, instead of concat(), simply: dr._date
count(id)
FROM (
SELECT date_add(r.mindt, INTERVAL @dist day) _date,
@dist := @dist + 1 AS days_away
FROM any_table t
JOIN (SELECT min(created_at) mindt,
max(created_at) maxdt,
@dist := 0
FROM testbed) r
WHERE date_add(r.mindt, INTERVAL @dist day) <= r.maxdt) dr
LEFT JOIN testbed tb ON dr._date = tb.created_at
GROUP BY dr._date;

question about group by

In MySQL, how do I write a query like this, to get the number of rows where X > 20 and where X < 20?
select date, numberOfXMoreThan20,numberOfXLessThan20, otherValues
from table
group by (date, X>20 and X<20)
My way, but I don't think it's good:
select less20.id_date, a, b
from
(select id_date, count(Duree_Attente_Avant_Abandon) as a
from cnav_reporting.contact_global
where Duree_Attente_Avant_Abandon > 20
group by id_date) as less20,
(select id_date, count(Duree_Attente_Avant_Abandon) as b
from cnav_reporting.contact_global
where Duree_Attente_Avant_Abandon < 20
group by id_date) as more20
where less20.id_date = more20.id_date
thanks
SELECT
date,
SUM( IF(X > 20, 1, 0) ) AS overTwenty,
SUM( IF(X < 20, 1, 0) ) AS belowTwenty,
otherValue
FROM `table`
GROUP BY `date`, `otherValue`
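A runnable sketch of the conditional-aggregation idea follows. It uses Python's sqlite3 module as a stand-in for MySQL, so the portable CASE WHEN form replaces MySQL's IF(); the contact table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id_date TEXT, x INTEGER);
    INSERT INTO contact VALUES
        ('2011-05-01', 25), ('2011-05-01', 30), ('2011-05-01', 5),
        ('2011-05-02', 20), ('2011-05-02', 10);
""")

# Each row adds 1 to the bucket whose condition it satisfies, 0 otherwise,
# so both counts come out of a single pass grouped by date.
per_day = conn.execute("""
    SELECT id_date,
           SUM(CASE WHEN x > 20 THEN 1 ELSE 0 END) AS overTwenty,
           SUM(CASE WHEN x < 20 THEN 1 ELSE 0 END) AS belowTwenty
    FROM contact
    GROUP BY id_date
    ORDER BY id_date
""").fetchall()
```

Unlike the two-subquery join in the question, a date with no rows over 20 still appears, with a count of 0. A value of exactly 20 falls into neither bucket, as in the original conditions.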
You're probably looking for the COUNT aggregate:
SELECT COUNT(*) FROM `table` WHERE X > 20