GROUP rows when conditions are met - mysql

I hope you can help me out with the following problem. I am trying to figure out how to count how many records exist when certain conditions hold.
An example of my input is shown below:
fullname emailaddress1 telephone1
Juana Foster juana
Juana Foster juana 1933595322
Henzley 1841901633
Henzley henzley 1841901633
Hoyle hoyle 1584190699
Hoyle hoyle 1584190655
Aaron Jeans alpha2222 1816808600
Aaron Jeans alpha 1816808600
Erick Martin a1009 1816250211
Erick Martin martin 1565960141
Erick Martin a1009
Erick Martin martin 1565960141
I would like to group the rows that match the following condition:
Fullname = Fullname and ((emailaddress = emailaddress and emailaddress != '') OR (telephone=telephone and telephone != '')).
In other words, I want to group into one row all the rows that have the same fullname and either the same email or the same telephone. Email and telephone must be non-empty to be considered a match.
The expected output would be:
fullname occurrences
Juana Foster 1
Henzley 1
Hoyle 1
Aaron Jeans 1
Erick Martin 2
I have no trouble solving this in a loop, but I have been thinking about how to solve it in SQL. I tried GROUP BY and UNION, but I haven't reached a solution. I am using MySQL.
UPDATE:
I provide a new example with more specific cases, in order to clarify the information:
For example for the next input
fullname emailaddress1 telephone1
Aaron Jeans alpha2222 1816808600
Aaron Jeans 1816808600
Aaron Jeans alpha2222 1816808600
Aaron Jeans alpha 1816808600
Erick Martin a1009 1816250211
Erick Martin a1009
Erick Martin 1816250211
Erick Martin martin 1565960141
Erick Martin martin 1565960141
Nacho Mason 1111111111
Nacho Mason 2222222222
In this case the output should be:
Aaron Jeans 1
Erick Martin 2
Nacho Mason 2
Aaron Jeans has 1 occurrence because his 4 records share the same telephone.
Erick Martin has 2 occurrences. The first one covers these records:
Erick Martin a1009 1816250211
Erick Martin a1009
Erick Martin 1816250211
Because the 3 records share the same fullname and either the same email (a1009) or the same phone (1816250211), these three records are considered 1 occurrence.
The second occurrence for Erick Martin covers the following two records, because they have the same fullname, the same email and the same phone.
Erick Martin martin 1565960141
Erick Martin martin 1565960141
Nacho Mason has 2 occurrences, because he has 2 different phones, and his email is empty, so it can't be considered equal.

I think this might be what you're trying to do.
First, SELECT all rows where both the email and the telephone are non-blank, grouped by all three columns:
SELECT fullname,emailaddress1,telephone1,COUNT(*) AS occurrences
FROM T
WHERE emailaddress1 != ''
AND telephone1 != ''
GROUP BY fullname,emailaddress1,telephone1
Then wrap that in an outer SELECT that takes MAX(occurrences) with GROUP BY fullname:
SELECT fullname, MAX(occurrences) as occurrences
FROM
(SELECT fullname,emailaddress1,telephone1,COUNT(*) AS occurrences
FROM T
WHERE emailaddress1 != ''
AND telephone1 != ''
GROUP BY fullname,emailaddress1,telephone1
)AS result
GROUP BY fullname
sqlfiddle
At least, I think that's what you're trying to get at.
Curious, though: what if one full name had 3 rows with an identical email and phone number, and the same full name had another 2 rows with a different identical email and phone number, like this:
Adam Smith adam 1234
Adam Smith adam 1234
Adam Smith adam 1234
Adam Smith smith 5678
Adam Smith smith 5678
Do you want to show as Adam Smith 3 or Adam Smith 2 or both?
The query above will give you Adam Smith 3.
UPDATE: I guess based on your desired output you want Adam Smith 3 because with your Erick Martin 2, he would've had Erick Martin 1, Erick Martin 2 from the inner select.
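If the rule is read as in the question's update (rows with the same fullname that are linked through a shared non-empty email or phone count once), a GROUP BY on the three columns does not capture it, because matches can chain: row A shares an email with row B, and row B shares a phone with row C. Below is a minimal sketch of that reading outside SQL, in Python, using a union-find structure. The function name count_occurrences and the tuple layout are my own assumptions for illustration, and the rows are the ones from the question's second example.

```python
from collections import defaultdict

def count_occurrences(rows):
    """Count groups per fullname; rows are (fullname, email, phone) tuples.

    Two rows with the same fullname land in one group when they share a
    non-empty email or a non-empty phone, directly or through other rows.
    """
    parent = list(range(len(rows)))

    def find(i):
        # Walk up to the group representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (fullname, kind, value) -> index of first row carrying it
    for i, (name, email, phone) in enumerate(rows):
        for kind, value in (("email", email), ("phone", phone)):
            if not value:  # empty values never match anything
                continue
            key = (name, kind, value)
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups = defaultdict(set)
    for i, (name, _, _) in enumerate(rows):
        groups[name].add(find(i))
    return {name: len(roots) for name, roots in groups.items()}

rows = [
    ("Aaron Jeans", "alpha2222", "1816808600"),
    ("Aaron Jeans", "", "1816808600"),
    ("Aaron Jeans", "alpha2222", "1816808600"),
    ("Aaron Jeans", "alpha", "1816808600"),
    ("Erick Martin", "a1009", "1816250211"),
    ("Erick Martin", "a1009", ""),
    ("Erick Martin", "", "1816250211"),
    ("Erick Martin", "martin", "1565960141"),
    ("Erick Martin", "martin", "1565960141"),
    ("Nacho Mason", "", "1111111111"),
    ("Nacho Mason", "", "2222222222"),
]
result = count_occurrences(rows)
print(result)
```

On this input the sketch gives Aaron Jeans 1, Erick Martin 2 and Nacho Mason 2, which is the output the question's update asks for.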

Related

Using tab separation in linux/vim

I am having trouble with the cut command in linux. So say the original data looks something like this:
4567 Harrison Joel Accountant
Mitchell Barbara Admin
3589 Olson Timothy Supervisor
4591 Moore Sarah Dept
Note that in row 2, it is missing the value for the first column (so only has three fields instead of four).
When I run the following command:
$ awk '{print $3,$4,$5,$8}' data.txt |column -t
I get:
4567 Harrison Joel Accountant
Mitchell Barbara Admin
3589 Olson Timothy Supervisor
4591 Moore Sarah Dept
What I would like is this:
4567 Harrison Joel Accountant
Mitchell Barbara Admin
3589 Olson Timothy Supervisor
4591 Moore Sarah Dept
In other words, I want the columns to stay consistent after I perform the tab separation. So it's clear which column corresponds to first name, last name, description, etc.
Is there a good way to do this?

How to select value if only all criteria are met?

First-time poster here, so I apologize for the formatting. I am a real novice at SQL, and this has me stumped. I am using the SQL in MS Access 2016.
I have a table and I want to select only the names of the people who have fulfilled all the requirements.
Table Chore
ID Name Chore Done
1 Joe Sweep Yes
2 Joe Cook Yes
3 Joe Dust Yes
4 Bill Vacuum No
5 Bill Dust Yes
6 Carrie Bathroom Yes
7 John Cook No
8 John Beds No
9 John Laundry Yes
10 Mary Laundry No
11 Mary Sweep No
12 Cindy Car Yes
13 Cindy Garden Yes
In this case, only Joe, Carrie and Cindy's names should be returned because under their name, they finished all their chores.
Help please and thanks in advance!
You can use NOT IN:
select name from my_table
where name not in (select name from my_table where chore_done ='No');
You could check the value of max(done), like
select
name
from
my_table
group by name
having max(done) = -1
In Access, Yes/True is -1 and No/False is 0, so max(done) is -1 (Yes) only when every row for that name is Yes; a single No raises the maximum to 0.
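Both ideas can be tried quickly outside Access. The sketch below uses Python's sqlite3 module rather than Access, so it is only an illustration under assumptions: the table is called chore, Done is stored as the text 'Yes'/'No', and the second query uses SUM(done = 'No') = 0 in place of the Access-specific max(done) = -1 test. DISTINCT is added to the NOT IN query so that each name is returned once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chore (id INTEGER, name TEXT, chore TEXT, done TEXT)")
conn.executemany(
    "INSERT INTO chore VALUES (?, ?, ?, ?)",
    [
        (1, "Joe", "Sweep", "Yes"), (2, "Joe", "Cook", "Yes"),
        (3, "Joe", "Dust", "Yes"), (4, "Bill", "Vacuum", "No"),
        (5, "Bill", "Dust", "Yes"), (6, "Carrie", "Bathroom", "Yes"),
        (7, "John", "Cook", "No"), (8, "John", "Beds", "No"),
        (9, "John", "Laundry", "Yes"), (10, "Mary", "Laundry", "No"),
        (11, "Mary", "Sweep", "No"), (12, "Cindy", "Car", "Yes"),
        (13, "Cindy", "Garden", "Yes"),
    ],
)

# Approach 1: exclude every name that has at least one unfinished chore.
not_in = [r[0] for r in conn.execute(
    "SELECT DISTINCT name FROM chore "
    "WHERE name NOT IN (SELECT name FROM chore WHERE done = 'No') "
    "ORDER BY name"
)]

# Approach 2: group by name and keep groups with zero unfinished chores.
having = [r[0] for r in conn.execute(
    "SELECT name FROM chore GROUP BY name "
    "HAVING SUM(done = 'No') = 0 ORDER BY name"
)]

print(not_in)
print(having)
```

Both lists come out as Carrie, Cindy and Joe for the sample table in the question.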

prediction on rowwise data or progressive data

I am working on employee attrition analysis with a table that has one row per employee (ID, name, Date_Join, Date_Relieving, Dept, Role, etc.):
eID eName Joining Releiving Dept Married Experience
123 John Doe 10Oct15 12Oct16 HR No 12
234 Jen Doee 01jan16 -NA- HR No 11 (ie she is available)
I can run regression on this data to find the beta coefficient
eID eName Joining Releiving Dept Married Experience
123 John Doe 10Oct15 12Oct16 HR No 12
234 Jen Doee 01jan16 -NA- HR No 11
But I've seen another approach too, where each employee has multiple entries, one per month between the joining date and the current month or the relieving month (say Employee A joined in Jan and left in Dec, so he'll have 12 entries, with columns such as experience and marital status updated in each):
eID eName Dept Married Experience
123 John Doe HR No 0
123 John Doe HR No 1
123 John Doe HR Yes 2
123 John Doe HR Yes 3
Can someone explain what differentiates the two approaches, and what the outcome of the second approach is?
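To make the second layout concrete, here is a minimal Python sketch of how one row per employee could be expanded into one row per month of service. The function name expand_to_months, the as_of cut-off date (December 2016, chosen only because it reproduces the Experience values 12 and 11 shown above) and the left column are assumptions for illustration; left marks the month in which the employee was relieved, which is the kind of per-month indicator a model on this layout could be fitted to.

```python
from datetime import date

def expand_to_months(emp_id, joined, relieved, as_of):
    """Return one dict per month of service for a single employee.

    relieved is None for employees who are still with the company;
    as_of is the (assumed) month at which the data was extracted.
    """
    end = relieved if relieved is not None else as_of
    n_months = (end.year - joined.year) * 12 + (end.month - joined.month)
    rows = []
    for m in range(n_months + 1):
        rows.append({
            "eID": emp_id,
            "experience": m,  # months since joining
            # 1 only in the final month of an employee who actually left
            "left": int(relieved is not None and m == n_months),
        })
    return rows

as_of = date(2016, 12, 1)  # assumed extraction month
john = expand_to_months(123, date(2015, 10, 10), date(2016, 10, 12), as_of)
jen = expand_to_months(234, date(2016, 1, 1), None, as_of)

print(len(john), john[-1])
print(len(jen), jen[-1])
```

In this sketch John Doe gets 13 rows (experience 0 to 12) with left set to 1 only in the last one, while Jen Doee gets 12 rows with left equal to 0 throughout, since she has not been relieved.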

MySQL update the column with conditions

I am trying to update the middle name (mname) in a database table based on a certain condition. SQLfiddle: http://www.sqlfiddle.com/#!9/3c022/2
I would like to know whether HENRY {null} FORD is the same person as one of the other HENRY {A,B} FORD authors, based on their co-authors, and update the table accordingly.
The author with a null middle name is updated with the middle name of the author who has the same first and last name and with whom he shares the most co-authors.
For example, based on the data, the result is:
HENRY FORD HENRY FORD ---> this should be updated to 'B' due to more common authors
HENRY A FORD HENRY A FORD
HENRY B FORD HENRY B FORD
However,
JACK SMITH JACK SMITH ---> this shouldn't be updated due to no common authors
JACK A SMITH JACK A SMITH
JACK B SMITH JACK B SMITH
Any suggestions are appreciated.
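To pin down the rule itself, here is a small Python sketch of the selection logic, independent of MySQL. The function name pick_middle_name and all co-author names are made up for illustration, since the fiddle's data is not reproduced here. The sketch assumes that a tie keeps the first candidate encountered and that zero overlap means no update.

```python
def pick_middle_name(target_coauthors, candidates):
    """Pick the middle name whose author shares the most co-authors.

    target_coauthors: set of co-authors of the author with a null middle name.
    candidates: dict mapping a middle name to that author's set of co-authors.
    Returns None when no candidate shares any co-author.
    """
    best_name, best_overlap = None, 0
    for mname, coauthors in candidates.items():
        overlap = len(target_coauthors & coauthors)  # number of common co-authors
        if overlap > best_overlap:
            best_name, best_overlap = mname, overlap
    return best_name

# Hypothetical co-author sets for the two cases in the question.
henry = pick_middle_name(
    {"X", "Y", "Z"},
    {"A": {"X"}, "B": {"Y", "Z", "W"}},
)
jack = pick_middle_name(
    {"P"},
    {"A": {"Q"}, "B": {"R"}},
)
print(henry, jack)
```

With these made-up sets, HENRY FORD would be assigned 'B' (two common co-authors against one), and JACK SMITH would be left unchanged because nothing overlaps.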

SQL Server Reporting Services: Cell Value Not Repeating when using table/matrix wizard

I have a dataset which shows a worker’s scores on various skills using four test types along with their supervisor and the director above the supervisor. To save space, the dataset example below is for just one worker. This is what I start with:
Director Supervisor Worker Test Skill Score
Doris Smith Jane Awe Lorina Marc Overall 1: Identifying Support 1
Doris Smith Jane Awe Lorina Marc Test A 1: Identifying Support 4
Doris Smith Jane Awe Lorina Marc Test B 1: Identifying Support 1
Doris Smith Jane Awe Lorina Marc Test C 1: Identifying Support 5
Doris Smith Jane Awe Lorina Marc Overall 2: Tracking the Sequence 3
Doris Smith Jane Awe Lorina Marc Test A 2: Tracking the Sequence 2
Doris Smith Jane Awe Lorina Marc Test B 2: Tracking the Sequence 5
Doris Smith Jane Awe Lorina Marc Test C 2: Tracking the Sequence 5
Doris Smith Jane Awe Lorina Marc Overall 3: Searching for Exceptions 3
Doris Smith Jane Awe Lorina Marc Test A 3: Searching for Exceptions 3
Doris Smith Jane Awe Lorina Marc Test B 3: Searching for Exceptions 3
Doris Smith Jane Awe Lorina Marc Test C 3: Searching for Exceptions 3
I feed this into SQL Server Reporting Services using either table wizard or matrix wizard. I have to move Skill column over the Score column so the skills are now columns.
Row Groups: Director, Supervisor, Worker, Test
Column Group: Skill
Value: Score
I get this:
Director Supervisor Worker Test 1: Identifying Support 2: Tracking the Sequence 3: Searching for Exceptions
Doris Smith Jane Awe Lorina Marc Overall 1 3 3
Test A 4 2 3
Test B 1 5 3
Test C 5 5 3
Al Vega Overall 5 5 3
Test A 3 3 2
Test B 2 4 4
Test C 5 2 5
David Osorio Overall 1 1 3
Test A 2 4 2
Test B 4 5 1
Test C 2 3 2
Katie Lewis Ally McIntosh Overall 1 2 3
Test A 5 3 4
Test B 3 3 2
Test C 1 3 2
Christina Gooderd Overall 2 2 1
Test A 4 4 1
Test B 5 5 4
Test C 2 5 4
I need to have a value in each cell, so the values need to repeat for each group. So, what I want should look like this:
Director Supervisor Worker Test 1: Identifying Support 2: Tracking the Sequence 3: Searching for Exceptions
Doris Smith Jane Awe Lorina Marc Overall 1 3 3
Doris Smith Jane Awe Lorina Marc Test A 4 2 3
Doris Smith Jane Awe Lorina Marc Test B 1 5 3
Doris Smith Jane Awe Lorina Marc Test C 5 5 3
Doris Smith Jane Awe Al Vega Overall 5 5 3
Doris Smith Jane Awe Al Vega Test A 3 3 2
Doris Smith Jane Awe Al Vega Test B 2 4 4
Doris Smith Jane Awe Al Vega Test C 5 2 5
Doris Smith Jane Awe David Osorio Overall 1 1 3
Doris Smith Jane Awe David Osorio Test A 2 4 2
Doris Smith Jane Awe David Osorio Test B 4 5 1
Doris Smith Jane Awe David Osorio Test C 2 3 2
Doris Smith Katie Lewis Ally McIntosh Overall 1 2 3
Doris Smith Katie Lewis Ally McIntosh Test A 5 3 4
Doris Smith Katie Lewis Ally McIntosh Test B 3 3 2
Doris Smith Katie Lewis Ally McIntosh Test C 1 3 2
Doris Smith Katie Lewis Christina Gooderd Overall 2 2 1
Doris Smith Katie Lewis Christina Gooderd Test A 4 4 1
Doris Smith Katie Lewis Christina Gooderd Test B 5 5 4
Doris Smith Katie Lewis Christina Gooderd Test C 2 5 4
What do I fix/change/modify so I can have a value in each cell filled in?
As you've seen, SSRS treats row header cells differently, i.e. stretching them over any child groups.
Your report probably looks a bit like this:
I've highlighted the dotted lines that separate the report areas. This example report has the same issue as your example:
To get around this, the various row group values need to be moved from the row header area to the main report area.
First, delete the left four columns - when prompted choose Delete columns only.
You should only have the Skill column remaining.
Right click and Insert Column -> Outside Group - Left.
Keep adding columns using Insert Column - Left.
Once you have enough new columns, add the various grouping values. The report should look something like this:
Note that there are now no dotted lines between the Skill column and the Test column.
Now the group values are repeated for each row as required:
I had a similar issue, but in a matrix: I needed to repeat the prior value at the details group level when there is only one row group. To do this, I used custom code.
For example, I have a column in my matrix named "Clusters". The row group field is a simple date field. In my data set, I have Date, Clusters, and ResourceType as fields. There are several resource type values, so I can see the dates on which the clusters associated with a given resource type have data. My challenge came from the fact that my data set is sparse: for a given ResourceType value, not all dates have values. In my matrix, I ended up with blank cells wherever there is no corresponding row in the underlying data set. Using Previous doesn't work well here (there are lots of other examples of that).
To solve the problem, I used custom code and a hashtable as follows:
Private LastSeenValue As System.Collections.Hashtable = New System.Collections.Hashtable

' GetRowValue is used to fill in blank cells in the dynamic matrix with the nearest value above them in the same column.
' The data can contain multiple sets according to the ResourceType field, and not all dates are present in all of these sets.
' This has the effect on the screen of having blank cells for each given date where there is no corresponding resource type.
' The requirement this function enables is that it allows filling in the blank cells with the nearest real value above.
' The SSRS Previous function does not do this.
' Author: DanRo, 1/8/2016
'
' Some behavior notes for developers who follow and seek to alter the function.
' The prototype for the GetRowValue function performs "null to zero" coercion as a result of the return type. This was done purposefully.
' The Object type for the FieldVal input parameter allows null rows to be processed with the same type of coercion
' on the incoming side.
' This is report-specific logic that takes advantage of the fact that all of the data requiring this function is numeric.
Function GetRowValue(ByVal FieldName As String, ByVal FieldVal As Object, ByVal ResourceType As String) As Double
    ' TheKey variable allows this function to be used for any number of columns for any number of resource types.
    Dim TheKey As String
    TheKey = "[" & FieldName & "][" & ResourceType & "]"

    ' See if a value was passed. In SSRS, when the cell tries to render in the matrix and there
    ' is no underlying data row for the column region, a null (Nothing) gets passed by the runtime environment.
    If FieldVal Is Nothing Then
        ' Coercion on the return type happens when the Hashtable Item property returns Nothing because the lookup fails.
        ' If the lookup succeeds, the last value encountered (top to bottom) will be present.
        Return LastSeenValue(TheKey)
    End If

    ' Now we know that a value was passed.
    If (Not LastSeenValue.ContainsKey(TheKey)) Then
        LastSeenValue.Add(TheKey, FieldVal)
        Return FieldVal
    End If

    ' A value was passed and we have an old value. Update it.
    LastSeenValue(TheKey) = FieldVal
    Return FieldVal
End Function
Finally, in the Matrix Cell for Clusters column I set the expression to:
=Code.GetRowValue("Clusters", Fields!Clusters.Value, ReportItems!ResourceType.Value)
And that solved the problem. An added benefit is that the rows at the start of the table that were blank now contain zeroes (which was correct for me). One tricky thing was typing the FieldVal argument as Object instead of Integer (which was my Clusters data type), because you cannot check a value type for Nothing. Another was referring to ReportItems!ResourceType.Value instead of Fields!ResourceType.Value, because ResourceType was my column grouping. Finally, the choice of return type for the function determines whether your data can have decimal points, so Double lets you handle both integers and real numbers. The function would have to be modified to handle strings correctly.
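For readers who want to trace the fill-down behaviour without opening a report, here is a rough Python equivalent of the same idea. It is only a sketch: the names get_row_value and last_seen mirror the VB code, None stands in for Nothing, and the resource type "VM" and the sample column values are made up.

```python
last_seen = {}  # (field name, resource type) -> last non-blank value seen

def get_row_value(field_name, field_val, resource_type):
    """Return field_val, or the last value seen above it when it is blank."""
    key = (field_name, resource_type)
    if field_val is None:
        # Blank cell: reuse the value above; 0 if nothing has been seen yet.
        return float(last_seen.get(key, 0))
    last_seen[key] = field_val  # remember this value for the rows below
    return float(field_val)

# One matrix column, read top to bottom; None marks a blank cell.
column = [None, 4, None, None, 7, None]
filled = [get_row_value("Clusters", v, "VM") for v in column]
print(filled)
```

The leading blank becomes 0.0 and every later blank takes the nearest value above it, so the column reads 0.0, 4.0, 4.0, 4.0, 7.0, 7.0.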
Before, Original Experience:
After, Values are now repeating where previously blank: