COUNT(DISTINCT Column1) for Only One Column of Multiple Columns - mysql

I've seen this question here a few times, but every answer solves the problem with a GROUP BY or a WHERE, so I was curious how to get around it when the query has grown too large for that to work.
For example, I'm writing a query that uses two left joins to my main table to bring the overlapping rows into the results. As I'm still relatively new to SQL, I'm not sure exactly what is causing it, but I get an extra thousand or so people when I run the counts. I imagine this is because there are (intentionally) duplicate IDs for each person in the two tables I'm joining.
All the columns I need for this project use COUNT() or SUM(), depending on the column. Is there a way to use DISTINCT so that only one column at a time treats each ID as a single person? From what I've tried so far, DISTINCT seems to apply beyond the one column I attach it to. Any suggestions would be much appreciated!
Here's my code so far, which includes the duplicate IDs:
SELECT
targeted.person AS "Person",
targeted.work AS "Occu",
(COUNT(targeted.id)) AS "Targeted",
(COALESCE(SUM(targeted.signed="Yes"),0)) AS "Signed",
(COALESCE(SUM(targeted.signed="Yes"),0))/COUNT(targeted.id)*100 AS "Signed %",
(COALESCE(COUNT(question.questionid="96766"),0)) AS "Donated",
(COALESCE(COUNT(question.questionid="96766"),0))/(COALESCE(SUM(targeted.signed="Yes"),0))*100 AS "Donated %",
(COALESCE(SUM(question.surveyresponsename),0)) AS "Donation $",
ROUND((COALESCE(SUM(question.surveyresponsename),0))/(COALESCE(COUNT(question.questionid="96766"),0)),2) AS "Avg Donation",
(CASE WHEN (left(targeted.datesigned,1)="5" AND right(question.datecontacted,2)="13") THEN (COALESCE(SUM(targeted.signed="Yes"),0)) ELSE 0 END) AS "Signed This Month",
(CASE WHEN (left(question.datecontacted,1)="5" AND right(question.datecontacted,2)="13") THEN (COALESCE(COUNT(question.questionid="96766"),0)) ELSE 0 END) AS "Donated This Month",
(CASE WHEN question.ContactType="House Visit" THEN COUNT(question.id) ELSE 0 END) AS "At Home",
(CASE WHEN question.ContactType="Worksite" THEN COUNT(question.id) ELSE 0 END) AS "At Work",
(CASE WHEN (left(events.day,1)="5" AND right(events.day,2)="13") THEN COUNT(events.id) ELSE 0 END) AS "Events This Month"
FROM targeted
LEFT JOIN question ON targeted.id=question.id
LEFT JOIN events ON targeted.id=events.id
GROUP BY targeted.person, targeted.work;
Here are the basics of the table structures:
Targeted:
Field Type Null Key Default
ID bigint(11) YES Primary NO
Work varchar(255) YES NULL
Person varchar(255) YES NULL
Signed varchar(255) YES NULL
DateSigned varchar(255) YES NULL
Question:
Field Type Null Key Default
ID bigint(11) YES Primary NO
QuestionID int(11) YES NULL
SurveyResponseId int(11) YES NULL
SurveyResponseName varchar(255) YES NULL
DateContacted varchar(255) YES NULL
ContactType varchar(255) YES NULL
Events:
Field Type Null Key Default
ID bigint(11) NO Primary NO
Day varchar(255) YES NULL
EventType varchar(255) YES NULL
The results are intended to look something like this:
Person  Occu   Targeted  Signed  Signed %  ...
1       Job 1  1413      765     54.14     ...
2       Job 2  111       80      72.072    ...
2       Job 3  931       715     76.7991   ...
3       Job 4  2720      1435    52.7573   ...
4       Job 5  401       218     54.364    ...
Thanks for the help!

The proper way to solve this problem is to do the aggregation in subqueries. To aggregate questions and events to the right level, each subquery joins to the targeted table and groups by person and work. Then you do not need any aggregation at the outermost level:
select . . .
from (-- one row per person/work: counts taken from targeted alone
      select t.person, t.work,
             count(t.id) as Targeted,
             . . .
      from targeted t
      group by t.person, t.work
     ) t left join
     (-- question rows aggregated to the same person/work level
      select t.person, t.work,
             sum(case when q.questionid = 96766 then 1 else 0 end) as Donated,
             . . .
      from question q join
           targeted t
           on q.id = t.id
      group by t.person, t.work
     ) q
     on t.person = q.person and t.work = q.work left join
     (-- events rows aggregated to the same person/work level
      select t.person, t.work,
             sum(case when left(e.day, 1) = '5' and right(e.day, 2) = '13' then 1 else 0 end
                ) as "Events This Month"
      from events e join
           targeted t
           on e.id = t.id
      group by t.person, t.work
     ) e
     on e.person = t.person and e.work = t.work
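On the literal question in the title: COUNT(DISTINCT ...) applies only inside the one aggregate it is written in, so it can be used for a single column without affecting the others. The following is a minimal sketch against the tables from the question; which columns should be de-duplicated is an assumption, and it has not been run against the real data.

```sql
-- Sketch: DISTINCT inside one aggregate affects only that aggregate.
-- Table and column names are the ones posted in the question.
SELECT
    t.person AS "Person",
    t.work   AS "Occu",
    -- each targeted row is counted once, however many question/events rows join to it
    COUNT(DISTINCT t.id) AS "Targeted",
    -- the CASE yields NULL for people who have not signed, and COUNT ignores NULLs
    COUNT(DISTINCT CASE WHEN t.signed = 'Yes' THEN t.id END) AS "Signed"
FROM targeted t
LEFT JOIN question q ON t.id = q.id
LEFT JOIN events   e ON t.id = e.id
GROUP BY t.person, t.work;
```

This does not help with SUM() over question columns such as the donation amount: those rows are still repeated once per matching events row, which is why aggregating in subqueries before joining is the more general fix.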

Related

Replace duplicate records set null or empty in the column mysql

This is my MySQL query. I want duplicate values to be shown as null or empty:
SELECT
som.sale_invoice_id
,CONCAT(cm.first_name,cm.last_name) AS customername
,product_master.product_name
FROM
sale_invoice_master as som
LEFT JOIN customer_master as cm
ON som.customer_id = cm.customer_id
LEFT JOIN product_sale_item_master as soi
ON som.sale_invoice_id = soi.sale_invoice_id
LEFT JOIN product_master
ON soi.product_id =product_master.product_id
LEFT JOIN vehicle_master
ON soi.vehicle_id = vehicle_master.id
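This is my current result:
+-----------------+----------------------+------------------------------------------+
| sale_invoice_id | customername         | product_name                             |
+-----------------+----------------------+------------------------------------------+
| 1               | JummakhanDilawarkhan | Apollo TYRE 16.9-28 12PR KRISHAK GOLD -D |
| 1               | JummakhanDilawarkhan | APOLLO TUBE 7.50x16                      |
| 2               | PareshKhanchandani   | Apollo TL 155R13 AMAZER XL 8PR           |
+-----------------+----------------------+------------------------------------------+
I want this:
+-----------------+----------------------+------------------------------------------+
| sale_invoice_id | customername         | product_name                             |
+-----------------+----------------------+------------------------------------------+
| 1               | JummakhanDilawarkhan | Apollo TYRE 16.9-28 12PR KRISHAK GOLD -D |
|                 |                      | APOLLO TUBE 7.50x16                      |
| 2               | PareshKhanchandani   | Apollo TL 155R13                         |
+-----------------+----------------------+------------------------------------------+
In the second row of an invoice, the repeated values should be null or empty.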
I don't have your data to work with, so I took your current result as my base data and wrote a query against it that produces your desired output.
The full setup is in the db-fiddle.
You can also rewrite this with window functions, since you are using MariaDB 10.4.
The query below is just one possible solution.
/*To create table*/
CREATE TABLE `sales_invoice_data` (
`sales_invoice_id` int(11) DEFAULT NULL,
`customername` varchar(50) DEFAULT NULL,
`product_name` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*to insert data*/
insert into `sales_invoice_data` (`sales_invoice_id`, `customername`, `product_name`) values('1','JummakhanDilawarkhan','Apollo TYRE 16.9-28 12PR KRISHAK GOLD -D');
insert into `sales_invoice_data` (`sales_invoice_id`, `customername`, `product_name`) values('1','JummakhanDilawarkhan','APOLLO TUBE 7.50x16');
insert into `sales_invoice_data` (`sales_invoice_id`, `customername`, `product_name`) values('2','PareshKhanchandani','Apollo TL 155R13');
/*to retrieve your OP*/
SELECT
  sales_invoice_id,
  customerName,
  product_name
FROM
  (SELECT
     (CASE WHEN sales_invoice_id=@running_sales_id THEN '' ELSE sales_invoice_id END) sales_invoice_id,
     (CASE WHEN customername=@running_customer THEN '' ELSE customername END) customerName,
     product_name,
     (CASE WHEN @running_sales_id=0 THEN @running_sales_id:=sales_invoice_id ELSE @running_sales_id:=@running_sales_id END) ,
     (CASE WHEN @running_customer='' THEN @running_customer:=customername ELSE @running_customer:=@running_customer END) ,
     @running_sales_id:=a.sales_invoice_id,
     @running_customer:=customername
   FROM
     (SELECT
        s.sales_invoice_id ,
        s.customername,
        s.product_name,
        @running_sales_id:=0,
        @running_customer:=''
      FROM
        `sales_invoice_data` s) a ) final
;
One possible solution would involve using MySQL's LAG() function. Here are the docs for it:
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag
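A minimal sketch of the LAG() approach, assuming MySQL 8.0 or a MariaDB release with window functions, and reusing the sales_invoice_data table created in the previous answer. The ordering by product_name within an invoice and the output aliases are assumptions for illustration; a real line-item key would be a better sort column.

```sql
-- Sketch: blank out values that repeat the previous row of the same invoice.
-- prev_id is NULL on the first row, so the comparison is not true there
-- and the ELSE branch keeps the value.
SELECT
    CASE WHEN x.prev_id = x.sales_invoice_id THEN '' ELSE x.sales_invoice_id END AS invoice_id_display,
    CASE WHEN x.prev_id = x.sales_invoice_id THEN '' ELSE x.customername END     AS customer_display,
    x.product_name
FROM (
    SELECT s.sales_invoice_id,
           s.customername,
           s.product_name,
           -- invoice id of the preceding row in the chosen order
           LAG(s.sales_invoice_id) OVER (ORDER BY s.sales_invoice_id, s.product_name) AS prev_id
    FROM sales_invoice_data s
) x
ORDER BY x.sales_invoice_id, x.product_name;
```

The outer ORDER BY repeats the window ordering on purpose: the blanks only make sense if the rows are displayed in the same order that LAG() saw them.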

MySQL: Using case when to sum/average values shared and not shared?

I have a query I am trying to expand on and have hit a roadblock. What I want to do is return rows that contain counts, sums and averages for data around attributes that are shared and not shared.
I have it pretty close, but it returns nulls and 0's where I need to see data.
Let me explain, and please let me know if I need to clarify anything.
First here is my table:
CREATE TABLE `fruits` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`fruit` varchar(11) DEFAULT NULL,
`fruit_attribute` varchar(11) DEFAULT '',
`submissions` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8;
INSERT INTO `fruits` (`id`, `fruit`, `fruit_attribute`, `submissions`)
VALUES
(1,'Orange','tough peel',59),
(2,'Lemon','tough peel',70),
(3,'Orange','citrus',100),
(4,'Orange','juice',90),
(5,'Lemon','juice',75),
(6,'Lemon','tart',35),
(7,'Lemon','citurs',65),
(8,'Orange','breakfast',110),
(9,'Lemon','lemonaid',120),
(10,'Orange','florida',50);
Next, my query:
SELECT ft.fruit,
COUNT(distinct ft1.fruit_attribute) As att_shared_lemon,
SUM(CASE WHEN ft1.fruit_attribute IS NULL THEN 1 ELSE 0 END) As not_shared_lemon,
SUM(CASE WHEN ft1.fruit_attribute IS NOT NULL THEN ft.submissions END) as sum_shared_submissions,
SUM(CASE WHEN ft1.fruit_attribute IS NULL THEN ft.submissions END) as sum_notshared_submissions
FROM fruits ft LEFT JOIN
fruits ft1
ON ft.fruit_attribute = ft1.fruit_attribute and ft1.fruit = 'Orange'
GROUP BY ft.fruit
having fruit='Orange'
ORDER BY att_shared_lemon desc;
Here is an SQL Fiddle of the above:
http://sqlfiddle.com/#!9/86e863/12
The desired output would not include the 0 and NULL values seen below:
+--------+--------------------+------------------------+------------------------+---------------------------+
| fruit  | attr_shared_orange | attr_not_shared_orange | sum_shared_submissions | sum_notshared_submissions |
+--------+--------------------+------------------------+------------------------+---------------------------+
| Orange | 5                  | 0                      | 409                    | (null)                    |
+--------+--------------------+------------------------+------------------------+---------------------------+
Instead, there would be the total number of attributes that are not shared by 'Orange' and the sum of submissions for the attributes not shared with 'Orange'.
I am running MySQL 5.6 on a Mac with Yosemite.
Ideally I would like to achieve this without a subselect, but if one is required and there is no other option, I would like to understand more about that.
I think there's a minor issue with your join logic. You want to compare attributes between fruits, but your query ensures that you're always joining oranges to oranges, so there will never be attributes which aren't shared:
ON ft.fruit_attribute = ft1.fruit_attribute and ft1.fruit = 'Orange'
Try this query instead:
SELECT ft.fruit,
COUNT(distinct ft1.fruit_attribute) As att_shared_lemon,
SUM(CASE WHEN ft1.fruit_attribute IS NULL THEN 1 ELSE 0 END) As not_shared_lemon,
SUM(CASE WHEN ft1.fruit_attribute IS NOT NULL THEN ft.submissions END) as sum_shared_submissions,
SUM(CASE WHEN ft1.fruit_attribute IS NULL THEN ft.submissions END) as sum_notshared_submissions
FROM fruits ft
LEFT JOIN fruits ft1
ON ft.fruit_attribute = ft1.fruit_attribute and ft.fruit = 'Orange'
AND ft1.fruit != ft.fruit
WHERE ft.fruit='Orange'
GROUP BY ft.fruit
ORDER BY att_shared_lemon desc;
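Since the question also asks what a subselect version would look like, here is a sketch that marks each 'Orange' row as shared or not with EXISTS and then aggregates. The alias names (is_shared, att_not_shared, other) are made up for illustration. The expected values in the comment were worked out by hand from the sample rows as posted, where Lemon's 'citurs' does not match Orange's 'citrus'.

```sql
-- Sketch: flag each Orange attribute as shared with another fruit or not,
-- then count and sum over that flag. EXISTS evaluates to 1 or 0 in MySQL.
SELECT o.fruit,
       SUM(o.is_shared)                                             AS att_shared,
       SUM(1 - o.is_shared)                                         AS att_not_shared,
       SUM(CASE WHEN o.is_shared = 1 THEN o.submissions ELSE 0 END) AS sum_shared_submissions,
       SUM(CASE WHEN o.is_shared = 0 THEN o.submissions ELSE 0 END) AS sum_notshared_submissions
FROM (
    SELECT ft.fruit,
           ft.submissions,
           EXISTS (SELECT 1
                   FROM fruits other
                   WHERE other.fruit_attribute = ft.fruit_attribute
                     AND other.fruit <> ft.fruit) AS is_shared
    FROM fruits ft
    WHERE ft.fruit = 'Orange'
) o
GROUP BY o.fruit;
-- With the sample rows: Orange, 2, 3, 149, 260
```

One design difference from the self-join: EXISTS produces exactly one row per 'Orange' attribute, so the sums are not inflated if more than one other fruit shares the same attribute.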

Making a score table (hard)

I need your help.
I have a database with a schema like this:
teams:
id
name
fundation_date
matchs:
id
date
id_local_team (foreign key to teams)
id_visit_team (foreign key to teams)
winner ('local', 'visit', 'draw')
players:
id
name
born
position ('arq','def','med','del')
id_team
goals:
id
id_match
id_player
time
and I need to do (among other things) this:
Show, per team: matches played, matches won and matches drawn (in separate columns)
I have something like this:
SELECT t.name,
SUM(CASE t.id WHEN m.id_local_team THEN 1 WHEN m.id_visit_team THEN 1 ELSE 0 END) AS played,
SUM(CASE (CASE m.winner
WHEN 'local' THEN m.id_local_team
WHEN 'visit' THEN m.id_visit_team
ELSE NULL END)
WHEN t.id THEN 1
ELSE 0 END) AS winned,
SUM(CASE m.winner WHEN 'draw' THEN 1 ELSE 0 END) AS drawn
FROM teams AS t
INNER JOIN matchs AS m
ON (t.id = m.id_local_team OR t.id = m.id_visit_team)
GROUP BY t.name;
But that is giving me wrong results. There are 8 matches in total, yet the 4 teams return 12, 9, or 10 matches played each (43 in total), a total of 16 matches won and a total of 10 drawn. All of these are above 8.
What is happening?
In the full query I also have two more inner joins:
INNER JOIN players AS p
ON (p.id_team = t.id)
INNER JOIN goals AS g
ON (p.id = g.id_player)
I don't think it has anything to do with these last ones. I know (think?) that I didn't do the matchs join correctly.
I appreciate it if you have made it this far into the post!
The real schema is actually in Spanish (sorry for that, guys), but here is all the magic:
SCHEMA
| equipos | CREATE TABLE `equipos` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`nombre` varchar(180) NOT NULL,
`f_fundacion` date DEFAULT NULL,
PRIMARY KEY (`id`)
)
| partidos | CREATE TABLE `partidos` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fecha` datetime DEFAULT NULL,
`id_equipo_local` int(11) DEFAULT NULL,
`id_equipo_visitante` int(11) DEFAULT NULL,
`ganador` enum('local','visitante','empate') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_partidos_equipos_1` (`id_equipo_local`),
KEY `fk_partidos_equipos_2` (`id_equipo_visitante`),
CONSTRAINT `fk_partidos_equipos_1` FOREIGN KEY (`id_equipo_local`) REFERENCES `equipos` (`id`),
CONSTRAINT `fk_partidos_equipos_2` FOREIGN KEY (`id_equipo_visitante`) REFERENCES `equipos` (`id`)
)
QUERY
SELECT e.nombre,
SUM(CASE e.id WHEN p.id_equipo_visitante THEN 1 WHEN p.id_equipo_local THEN 1 ELSE 0 END) AS jugados,
SUM(CASE (CASE ganador
WHEN 'local' THEN p.id_equipo_local
WHEN 'visitante' THEN p.id_equipo_visitante
ELSE NULL END)
WHEN e.id THEN 1
ELSE 0 END) AS ganados,
SUM(CASE ganador WHEN 'empate' THEN 1 ELSE 0 END) AS empatados,
SUM(CASE (CASE ganador
WHEN 'local' THEN p.id_equipo_local
WHEN 'visitante' THEN p.id_equipo_visitante
ELSE NULL END)
WHEN e.id THEN 1
ELSE 0 END) * 3 + SUM(CASE ganador WHEN 'empate' THEN 1 ELSE 0 END) AS puntos,
COUNT(DISTINCT g.id) AS goles_a_favor
FROM equipos AS e
INNER JOIN partidos AS p
ON (e.id = p.id_equipo_visitante OR e.id = p.id_equipo_local)
INNER JOIN jugadores AS j
ON (j.id_equipo = e.id)
INNER JOIN goles AS g
ON (j.id = g.id_jugador)
GROUP BY e.nombre;
RESULTS
+----------------------------------+---------+---------+-----------+--------+---------------+
| nombre | jugados | ganados | empatados | puntos | goles_a_favor |
+----------------------------------+---------+---------+-----------+--------+---------------+
| Club Atlético All Boys | 12 | 6 | 3 | 21 | 3 |
| Club Atlético Chacarita Juniors | 12 | 3 | 0 | 9 | 3 |
| Club Atlético Ferrocarril Oeste | 9 | 3 | 3 | 12 | 3 |
| Club Atlético Tucumán | 10 | 4 | 4 | 16 | 2 |
+----------------------------------+---------+---------+-----------+--------+---------------+
You say that the full query contains joins to each goal made in a given match. This would lead to a situation where each match is counted N times, where N is the number of goals in the match. So a 0-0 draw won't be counted at all, a 1-0 match is counted once for the home team and zero times for the visiting team, and a 1-2 match is counted once for the home team and twice for the visiting team.
To get the number of goals in favor, you should first calculate the goal balance per match using a subquery or a view and then join with that. Then you won't have the problem caused by joining with the players table.
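One way to read the posted results: for every team, jugados equals a plausible match count times goles_a_favor (12 = 4 × 3, 12 = 4 × 3, 9 = 3 × 3, 10 = 5 × 2), and 4 + 4 + 3 + 5 = 16 is exactly two team appearances for each of the 8 matches. That is consistent with each match row being repeated once per joined goal row. A sketch of the subquery approach, using the Spanish table and column names from the posted query; the alias names (m, gf, id_equipo in the first subquery) are made up, and every goal is assumed to count for the scoring player's team:

```sql
-- Sketch: aggregate matches and goals separately, then join one row per team.
SELECT e.nombre,
       COALESCE(m.jugados, 0)   AS jugados,
       COALESCE(m.ganados, 0)   AS ganados,
       COALESCE(m.empatados, 0) AS empatados,
       COALESCE(m.ganados, 0) * 3 + COALESCE(m.empatados, 0) AS puntos,
       COALESCE(gf.goles_a_favor, 0) AS goles_a_favor
FROM equipos AS e
LEFT JOIN (
    -- match statistics only: no player or goal rows can multiply these
    SELECT t.id AS id_equipo,
           COUNT(*) AS jugados,
           SUM(CASE WHEN (p.ganador = 'local' AND p.id_equipo_local = t.id)
                      OR (p.ganador = 'visitante' AND p.id_equipo_visitante = t.id)
                    THEN 1 ELSE 0 END) AS ganados,
           SUM(CASE WHEN p.ganador = 'empate' THEN 1 ELSE 0 END) AS empatados
    FROM equipos AS t
    INNER JOIN partidos AS p
            ON t.id IN (p.id_equipo_local, p.id_equipo_visitante)
    GROUP BY t.id
) AS m ON m.id_equipo = e.id
LEFT JOIN (
    -- goals per team, counted through the scoring player's team
    SELECT j.id_equipo, COUNT(*) AS goles_a_favor
    FROM jugadores AS j
    INNER JOIN goles AS g ON g.id_jugador = j.id
    GROUP BY j.id_equipo
) AS gf ON gf.id_equipo = e.id;
```

The LEFT JOINs keep teams that have no matches or no goals yet, which the original chain of INNER JOINs would drop.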
It does look like the matchs JOIN is a problem. You are matching every match at least twice, once for the home team and once for the visiting team, but that doesn't quite explain 43 matches being displayed. Would it be possible to see the full set of results? SQL can get tough to debug without access to the tables themselves, but seeing the results and which rows are duplicated might help.
You may want to join only on the winning teams, which should cut out half of the rows. Actually, since you seem to be after match information, I would SELECT from matchs rather than from teams. Selecting FROM the table that most limits your total selected rows is generally your best bet; then JOIN from there.

Validating presence of value(s) in a (sub)table and return a "boolean" result

I want to create a query in MySQL on an order table that checks whether the order has a booking_id. If the order does not have a booking_id, one should be present on all related rows in the invoice table.
I want the value returned to be a boolean in a single field.
Taking the example below:
For id #1 I expect an immediate true, because the order itself has a booking_id.
For id #2 I expect a "delayed" false from the invoice table, as not all related invoices have a booking_id. It should only return true if invoice id #3 actually had a booking_id, meaning all invoices have a booking_id when the order does not.
I've tried several ways but failed each time, and I don't know the best way to tackle this.
Thanks in advance for your input!
Table order:
+----+------------+
| id | booking_id |
+----+------------+
| 1  | 123        |
| 2  | NULL       |
+----+------------+
Table invoice:
+----+----------+------------+
| id | order_id | booking_id |
+----+----------+------------+
| 1  | 1        | 123        |
| 2  | 2        | 124        |
| 3  | 2        | NULL       |
+----+----------+------------+
Schema
CREATE TABLE IF NOT EXISTS `invoice` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`booking_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `order` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`booking_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
If I understand you correctly, this is the base query for your request:
SELECT
O.id
, SUM(CASE WHEN I.booking_id IS NOT NULL THEN 1 ELSE 0 END) AS booked_count
, COUNT(1) AS total_count
, CASE WHEN SUM(CASE WHEN I.booking_id IS NOT NULL THEN 1 ELSE 0 END) = COUNT(1) THEN 1 ELSE 0 END AS has_all_bookings
FROM
`order` O
LEFT JOIN invoice I
ON O.id = I.order_id
GROUP BY
O.id
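If you also want to detect orders that have no record in the invoice table, add COUNT(I.id) = 0 as an additional condition in the last CASE expression. COUNT(1) cannot be used for that check, because the LEFT JOIN always produces at least one row per order.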
I have not understood how the logic should work out when the order is booked but some of the invoices are not. I'll presume either condition is enough for a true value (OR logic). I'd avoid COUNT and GROUP BY and go for a subselect, which works fine in MySQL (I'm using an old 5.1.73-1 version).
This query gives you both values in distinct columns:
SELECT o.*
, (booking_id IS NOT NULL) AS order_booked
, (NOT EXISTS (SELECT id FROM `invoice` WHERE order_id=o.id AND booking_id IS NULL)) AS invoices_all_booked
FROM `order` o
Of course you can combine the values:
SELECT o.*
, (booking_id IS NOT NULL OR NOT EXISTS (SELECT id FROM `invoice` WHERE order_id=o.id AND booking_id IS NULL)) AS booked
FROM `order` o
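To check the combined expression against the example from the question, the sketch below loads the sample rows into the two tables (assumed to exist and be empty) and adds one hypothetical order, id 3, that has no booking_id and no invoices. The booked_strict column is an illustrative variant, not part of the answer: it additionally requires at least one invoice before an unbooked order counts as true.

```sql
-- Sample rows from the question, plus a hypothetical order 3
-- (no booking_id, no invoices) that is not part of the original data.
INSERT INTO `order` (id, booking_id) VALUES (1, 123), (2, NULL), (3, NULL);
INSERT INTO invoice (id, order_id, booking_id) VALUES (1, 1, 123), (2, 2, 124), (3, 2, NULL);

SELECT o.id,
       -- combined check from the answer above
       (o.booking_id IS NOT NULL
        OR NOT EXISTS (SELECT 1 FROM invoice i
                       WHERE i.order_id = o.id AND i.booking_id IS NULL)) AS booked,
       -- stricter variant: an unbooked order also needs at least one invoice
       (o.booking_id IS NOT NULL
        OR (EXISTS (SELECT 1 FROM invoice i WHERE i.order_id = o.id)
            AND NOT EXISTS (SELECT 1 FROM invoice i
                            WHERE i.order_id = o.id AND i.booking_id IS NULL))) AS booked_strict
FROM `order` o
ORDER BY o.id;
-- id 1: booked = 1, booked_strict = 1
-- id 2: booked = 0, booked_strict = 0
-- id 3: booked = 1, booked_strict = 0
```

Which of the two behaviours is right for an order without invoices depends on the business rule, which the question does not state.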
Here you go: create a view that does it.
create view booked_view as
select `order`.id as order_id,
       case when `order`.booking_id > 0 then true
            -- true only when no invoice of this order is missing a booking_id
            when not exists (SELECT id FROM invoice WHERE order_id=`order`.id AND invoice.booking_id IS NULL) then true
            else false
       end as booked
from `order` ;
Then just join your view to the order table and you will have your boolean column 'booked':
select o.id, booked from `order` o
join booked_view on (o.id = booked_view.order_id)

Join Distinct Id on non-distinct id (MySql)

I'm trying to join distinct IDs from a subquery in the FROM clause onto a table which has the same IDs, but non-distinct, as they are repeated across rows to build up a whole entity. How can one do this? All my attempts keep returning a single row per ID from the non-distinct table.
For example:
Table 1
ID val_string val_int val_datetime
1 null 3435 null
1 bla null null
1 null null 2013-08-27
2 null 428 null
2 blob null null
2 null null 2013-08-30
etc. etc. etc.
Virtual "v_table" from SubQuery
ID
1
2
Now, if I create the query along the lines of:
SELECT t.ID, t.val_string, t.val_int, t.val_datetime
FROM table1 AS t
JOIN (subquery) AS v_table
ON t.ID = v_table.ID
I get the result:
Result Table:
ID val_string val_int val_datetime
1 null 3436 null
2 null 428 null
What I'd like is to see the whole of Table 1 based on this example. (Actual query has some more parameters, but this is the issue I'm stuck on).
How would I go about making sure that I get everything from Table 1 where the ID's match the ID's from a virtual table?
SELECT t.ID, t.val_string, t.val_int, t.val_datetime
FROM table1 AS t
LEFT JOIN (subquery) AS v_table
ON t.ID = v_table.ID
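For comparison, a join against a list of distinct IDs normally returns every matching row of the joined table, not one row per ID. The sketch below rebuilds Table 1 from the question with assumed column types and uses a made-up subquery (IDs that have a val_int above 1000) as a stand-in for the real one.

```sql
-- Hypothetical DDL for the question's Table 1 (column types are assumptions).
CREATE TABLE table1 (
    ID           int,
    val_string   varchar(50),
    val_int      int,
    val_datetime date
);
INSERT INTO table1 (ID, val_string, val_int, val_datetime) VALUES
    (1, NULL,   3435, NULL),
    (1, 'bla',  NULL, NULL),
    (1, NULL,   NULL, '2013-08-27'),
    (2, NULL,   428,  NULL),
    (2, 'blob', NULL, NULL),
    (2, NULL,   NULL, '2013-08-30');

-- The subquery is a stand-in for the real one: it yields each qualifying ID once.
SELECT t.ID, t.val_string, t.val_int, t.val_datetime
FROM table1 AS t
JOIN (SELECT DISTINCT ID FROM table1 WHERE val_int > 1000) AS v_table
  ON t.ID = v_table.ID;
-- returns all three rows with ID = 1 and no rows with ID = 2
```

With LEFT JOIN instead, the rows for ID 2 would be returned as well, because a left join keeps every row of table1 whether or not the subquery contains its ID. If the real query still collapses to one row per ID, a GROUP BY, DISTINCT or filter elsewhere in it is a more likely cause than the join type; that is a guess, since the full query is not shown.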