N or more continuous year range - mysql

I have to create a report from a MySQL database that involves more than 4 tables. One table (S1) has S1_ID and S1_Year_Range (strings like 2001-2002), and another table (S2) has S2_ID (PK), S2_Customer_ID, S1_ID (FK), and other fields used for additional conditions in the WHERE clause of my query. There can be more than one row in S2 with the same S2_Customer_ID but different S1_ID. The report is built with VB.NET and asks the user for two values: a minimum number of continuous years (for example, >= 5 years) and a year range (for example, 2011-2012) that must be the highest value listed for every customer.
My report lists customer names (by joining the above query with another table), customer rank and all year range values (highest at the bottom) for that customer in one column for each customer. Any help for this query would be appreciated.
Data and results could be like the following:
S1:
S1_ID | S1_Year_Range
1 | 2000-2001
2 | 2001-2002
3 | 2002-2003
4 | 2003-2004
5 | 2004-2005
etc
S2:
S2_ID | S2_Customer_ID | S1_ID
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 2
5 | 2 | 3
6 | 2 | 5
7 | 3 | 2
8 | 3 | 3
9 | 3 | 4
10 | 3 | 5
11 | 4 | 3
12 | 4 | 4
13 | 4 | 5
etc
When the user enters the number 2 and the year range 2003-2004, the result should be the following:
customer 3 with 3 year range values (2003-2004, 2002-2003, and 2001-2002) and customer 4 with 2 year range values (2003-2004 and 2002-2003):
cname3
2001-2002
2002-2003
2003-2004
cname4
2002-2003
2003-2004
I hope you can see the columns of the report correctly.

I finally created a complex query to solve my problem. In the following query, I hard-coded the user's year range value as '2010-2011' and the number of continuous years as 14. One small difference from the question is the table name: table CSP here is the same as table S2 in my question, but the field names are the same as those in my question.
SELECT CSYWFY.S2_Customer_ID, COUNT(CSYWFY.S2_Customer_ID)
FROM (
    SELECT S1F.S1_Year_Range, S2.S2_Customer_ID, COUNT(S1F.S1_Year_Range)
    FROM CSP AS S2
    INNER JOIN S1 AS S1F ON S2.S1_ID = S1F.S1_ID
    -- keep only customers that have the requested year range at all
    WHERE '2010-2011' IN (
        SELECT S1N.S1_Year_Range
        FROM CSP AS S2N
        INNER JOIN S1 AS S1N ON S2N.S1_ID = S1N.S1_ID
        WHERE S2N.S2_Customer_ID = S2.S2_Customer_ID
    )
    -- ASC/DESC inside GROUP BY was removed in MySQL 8.0.13, so it is dropped here
    GROUP BY S2.S2_Customer_ID, S1F.S1_Year_Range
) CSYWFY
GROUP BY CSYWFY.S2_Customer_ID
HAVING COUNT(CSYWFY.S2_Customer_ID) > 14 -- strictly more than 14; use >= 14 for "14 or more"
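Note that the query above counts a customer's year ranges but does not check that they are consecutive. A minimal Python sketch of that check follows, using the sample data from the question. It assumes that S1_ID increases by exactly 1 from one year range to the next (true for the sample data, but an assumption in general), and the function name continuous_ranges is hypothetical.

```python
from collections import defaultdict

# Sample data copied from the question.
S1 = {1: "2000-2001", 2: "2001-2002", 3: "2002-2003", 4: "2003-2004", 5: "2004-2005"}
# (S2_Customer_ID, S1_ID) pairs
S2 = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 5),
      (3, 2), (3, 3), (3, 4), (3, 5), (4, 3), (4, 4), (4, 5)]


def continuous_ranges(min_years, last_range):
    """Return {customer_id: [year ranges]} for unbroken runs ending at last_range."""
    # Assumption: consecutive year ranges have consecutive S1_ID values.
    last_id = next(i for i, r in S1.items() if r == last_range)
    ids_by_customer = defaultdict(set)
    for customer_id, s1_id in S2:
        ids_by_customer[customer_id].add(s1_id)

    result = {}
    for customer_id, ids in sorted(ids_by_customer.items()):
        run = []
        current = last_id
        while current in ids:  # walk backwards while the years are unbroken
            run.append(S1[current])
            current -= 1
        if len(run) >= min_years:
            result[customer_id] = run[::-1]  # oldest first, highest at the bottom
    return result


print(continuous_ranges(2, "2003-2004"))
```

With the inputs 2 and 2003-2004 this returns customers 3 and 4 with the year ranges listed in the question.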
HTH

Related

SQL Capture duplicate records across two DIFFERENT columns

I am writing an Exception Catching Page using MySQL for catching duplicate billing entries the following scenario.
Items details are entered in a table which has the following two columns (among others).
ItemCode VARCHAR(50), BillEntryDate DATE
It often happens that same item's bill is entered multiple times, but over a period of few days. Like,
"Football","2019-01-02"
"Basketball","2019-01-02"
...
...
"Football","2019-01-05"
"Rugby","2019-01-05"
...
"Handball","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
In the above example, the item Football is billed twice - first on 2Jan and again on 5Jan. Similarly, item Rugby is billed thrice on 5,7,10Jan.
I am looking to write simple SQL which can pickup each item [say, using distinct(ItemCode) clause], and then display all the records which are duplicates over a period of 30 days.
In the above case, the expected output should be the following 5 records:
"Football","2019-01-02"
"Football","2019-01-05"
"Rugby","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
I am trying to run the following SQL:
select * from tablen a, tablen b where a.ItemCode=b.ItemCode and a.BillEntryDate = b.BillEntryDate+30;
However, this seems to be highly inefficient as it is running for long without displaying any records.
Is there any possibility for getting a less complex and faster method?
I did explore existing topics (like How do I find duplicates across multiple columns?), but it is catching duplicates where BOTH columns have same value. My requirement is one column same value, and second column varying over a month-long date range.
You can use:
select t.*
from tablen t
where exists (select 1
              from tablen t2
              where t2.ItemCode = t.ItemCode and
                    t2.BillEntryDate <> t.BillEntryDate and
                    t2.BillEntryDate >= t.BillEntryDate - interval 30 day and
                    t2.BillEntryDate <= t.BillEntryDate + interval 30 day
             );
This will pick up both duplicates in the pair.
For performance, you want an index on (ItemCode, BillEntryDate).
With EXISTS:
select ItemCode, BillEntryDate
from tablename t
where exists (
select 1 from tablename
where
ItemCode = t.ItemCode
and
abs(datediff(BillEntryDate, t.BillEntryDate)) between 1 and 30
)
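To try the EXISTS approach on the sample rows from the question, here is a small sketch that runs it in an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here: julianday() replaces datediff(), and the index name idx_item_date is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ItemCode TEXT, BillEntryDate TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        ("Football", "2019-01-02"),
        ("Basketball", "2019-01-02"),
        ("Football", "2019-01-05"),
        ("Rugby", "2019-01-05"),
        ("Handball", "2019-01-05"),
        ("Rugby", "2019-01-07"),
        ("Rugby", "2019-01-10"),
    ],
)
# Composite index on (ItemCode, BillEntryDate), as suggested for performance.
conn.execute("CREATE INDEX idx_item_date ON tablename (ItemCode, BillEntryDate)")

# SQLite has no datediff(); the difference of two julianday() values is in days.
duplicates = conn.execute(
    """
    SELECT ItemCode, BillEntryDate
    FROM tablename t
    WHERE EXISTS (
        SELECT 1 FROM tablename t2
        WHERE t2.ItemCode = t.ItemCode
          AND ABS(julianday(t2.BillEntryDate) - julianday(t.BillEntryDate))
              BETWEEN 1 AND 30
    )
    ORDER BY ItemCode, BillEntryDate
    """
).fetchall()

for row in duplicates:
    print(row)
```

This prints the two Football rows and the three Rugby rows, which are the 5 records expected in the question.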

Gathering data from three separate tables, sql

I have three separate tables that represent student attendance for three weeks, respectively. I want to generate four columns that break down the attendance by week for each student. If a student was present multiple times in a week, the number of times present should be added up. Also, if a student was present in one week and not the next, they would get 1 for the week present (assuming they were present only once) and 0 for the week absent. I have tried multiple variations of count() and joins, but to no avail. Any help would be greatly appreciated. The following is a truncated fiddle:
http://www.sqlfiddle.com/#!9/b847a
Here is a sample of what I am trying to achieve:
Name | CurrWeek | LastWeek | TwoWkAgo
Paula | 0 | 2 | 3
Rather than three tables you should have only one with a column for the week. So naturally one solution for your request is to build it on-the-fly with UNION ALL:
select
name,
sum(week = 'currentWeek') as currentWeek,
sum(week = 'lastWeek') as lastWeek,
sum(week = 'thirdWeek') as thirdWeek
from
(
select 'currentWeek' as week, name from currentWeek
union all
select 'lastWeek' as week, name from lastWeek
union all
select 'thirdWeek' as week, name from thirdWeek
) all_weeks
group by name
order by name;
(If you want to join the three tables instead, you'd need full outer joins, which MySQL does not support, if I remember correctly. Anyway, my advice is to change the data model.)
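As a quick check of the UNION ALL approach, the sketch below runs the same query in an in-memory SQLite database from Python (SQLite standing in for MySQL; both evaluate a comparison such as week = 'lastWeek' to 0 or 1). The single name column and the student Sam are assumptions made for the example; Paula's counts follow the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: one row per attendance, with only a name column.
for table in ("currentWeek", "lastWeek", "thirdWeek"):
    conn.execute(f"CREATE TABLE {table} (name TEXT)")

conn.executemany("INSERT INTO currentWeek VALUES (?)", [("Sam",)])
conn.executemany("INSERT INTO lastWeek VALUES (?)", [("Paula",), ("Paula",)])
conn.executemany("INSERT INTO thirdWeek VALUES (?)", [("Paula",)] * 3)

attendance = conn.execute(
    """
    SELECT name,
           SUM(week = 'currentWeek') AS currentWeek,
           SUM(week = 'lastWeek')    AS lastWeek,
           SUM(week = 'thirdWeek')   AS thirdWeek
    FROM (
        SELECT 'currentWeek' AS week, name FROM currentWeek
        UNION ALL
        SELECT 'lastWeek' AS week, name FROM lastWeek
        UNION ALL
        SELECT 'thirdWeek' AS week, name FROM thirdWeek
    ) all_weeks
    GROUP BY name
    ORDER BY name
    """
).fetchall()

for row in attendance:
    print(row)
```

Students who are missing from a week still appear, with 0 for that week, which an inner join of the three tables would not give you.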
You can try this query:
select currweek.name, currweek.att, lastweek.att, twoWkAgo.att from
(select name, count(attendance) as att from currentWeekTable group by name) currweek,
(select name, count(attendance) as att from lastWeekTable group by name) lastweek,
(select name, count(attendance) as att from twoWeekTable group by name) twoWkAgo
where twoWkAgo.name=currweek.name and twoWkAgo.name=lastweek.name;
Assuming your 3 attendance tables contain name as common field.

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So Here's the Question: What mysql tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: omitted the surrogate primary keys for Shift and SurveyResult that I would probably include.
To query, join shifts and surveys, group by leader, and take the average; then join that back to Leader for the name.
The query might be something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
     , AverageScore
FROM Leader a
INNER JOIN (
    SELECT leader_id
         , AVG(score) AverageScore
    FROM Shift
    INNER JOIN SurveyResult
        ON shift_date = visit_date
       AND shift_time = visit_time -- depends on how you are recording time what this really needs to be
    GROUP BY leader_id
) b ON a.leader_id = b.leader_id
I would use the following structure:
leaders
    id
    name
leaders_timetable (can be multiple per leader)
    id
    leader_id
    shift_datetime (I assume it stores date and hour here; minutes and seconds are always 0)
survey_scores
    id
    visit_datetime
    score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id
DATE_FORMAT here zeroes out the minutes and seconds of visit_datetime so that it can be matched against shift_datetime. This is a MySQL function, so if you use another database you'll need a different function.
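To see the hour-matching join work end to end, here is a small sketch in Python with an in-memory SQLite database. SQLite's strftime() plays the role of MySQL's DATE_FORMAT here, and all rows (George's shifts, the dates, and the scores) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE leaders_timetable (
        id INTEGER PRIMARY KEY, leader_id INTEGER, shift_datetime TEXT);
    CREATE TABLE survey_scores (
        id INTEGER PRIMARY KEY, visit_datetime TEXT, score INTEGER);

    INSERT INTO leaders VALUES (1, 'George');
    -- George works the 11am and 1pm hours (hypothetical shifts).
    INSERT INTO leaders_timetable (leader_id, shift_datetime) VALUES
        (1, '2024-01-08 11:00:00'),
        (1, '2024-01-08 13:00:00');
    -- The 15:10 survey falls outside any shift and must not be counted.
    INSERT INTO survey_scores (visit_datetime, score) VALUES
        ('2024-01-08 11:25:00', 4),
        ('2024-01-08 13:40:00', 5),
        ('2024-01-08 15:10:00', 1);
    """
)

averages = conn.execute(
    """
    SELECT l.id, l.name, AVG(s.score)
    FROM leaders l
    INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
    INNER JOIN survey_scores s
        -- truncate the visit time to the start of its hour before comparing
        ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
    GROUP BY l.id, l.name
    """
).fetchall()

print(averages)
```

Only the two surveys that fall inside George's shift hours are averaged, so the result is 4.5.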
Say you have a 'leader' who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys from this leader, sum the survey scores, and divide the sum by 5 (the number of surveys this leader has), you will have the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

SQL find rows in the same table based on certain columns

I have two tables which are for two different programs. Each program has a specific program_instance (the program_instance is the year of the program).
One table is called 'enrollees' and the other is 'nominations' - for two programs that aren't technically related.
I've been trying to get the count of past participants from both tables. For reference:
program_instance_id:
5 = GC 2014
3 = GC 2013
1 = GC 2012
4 = GE 2013
2 = GE 2012
So I ran this query on my enrollees table and it produced a result in 913ms:
SELECT count(*) AS prev_enrollees
FROM outreach.enrollees e1
WHERE e1.program_instance_id = 5 AND EXISTS
(SELECT * FROM outreach.enrollees e2
WHERE e1.first_name = e2.first_name
AND e1.last_name = e2.last_name
AND e1.address1 = e2.address1
AND e1.state = e2.state
AND e1.zip = e2.zip
AND e2.program_instance_id < 5);
This query, to my understanding, would give me the number of rows in the 'enrollees' table where an enrollee from the current year (program_instance_id = 5) had previously enrolled in another year. The result it produces is pretty accurate, to my understanding.
So... I ran this EXACT query (changing the table name) on my 'nominations' table. The nominations table has almost the exact structure of the 'enrollees' table (some columns are different, but the person's information fields are identical). This query ran for over a half hour before I cancelled it. It's not popping out an almost-instant result like it was on the enrollee table and I don't know why it would take longer.
I could imagine if there were a lot more rows in the table but the enrollee table has about 50k MORE rows than the nominations table.
I've also tried:
SELECT count(*) AS prev_enrollees
FROM outreach_grow_education.nominations e1
JOIN outreach_grow_education.nominations e2 ON e1.first_name = e2.first_name
AND e1.last_name = e2.last_name
AND e1.address1 = e2.address1
AND e1.state = e2.state
AND e1.zip = e2.zip
AND 4 = e2.program_instance_id
WHERE e1.id IS NOT NULL AND e1.program_instance_id = 2;
Alas, to the same result. Instant result on enrollees, never-ending on nominations.
Is there any other alternative for what I'm trying to achieve that wouldn't cause the never-ending cycle?
I suggest checking the indexes of the two tables, specifically for the columns you use in the JOIN clauses: first_name, last_name, address1, state, zip, and program_instance_id. Chances are one or more of these columns is indexed in the "enrollees" table and not in "nominations."
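As a rough illustration of what to look for, the sketch below builds a tiny nominations table in an in-memory SQLite database from Python, adds a composite index on the compared columns, and prints the query plan. SQLite is only a stand-in: in MySQL the corresponding checks would be SHOW INDEX FROM nominations and EXPLAIN. The index name idx_nominations_person and the sample rows are made up, and the instance ids 4 and 2 follow the GE 2013 / GE 2012 list in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE nominations (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        address1 TEXT, state TEXT, zip TEXT, program_instance_id INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO nominations"
    " (first_name, last_name, address1, state, zip, program_instance_id)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Ann", "Lee", "1 Main St", "OH", "44101", 2),  # nominated in 2012
        ("Ann", "Lee", "1 Main St", "OH", "44101", 4),  # and again in 2013
        ("Bob", "Ray", "9 Oak Ave", "OH", "44102", 4),  # first nominated in 2013
    ],
)
# One composite index covering every column compared in the subquery.
conn.execute(
    """
    CREATE INDEX idx_nominations_person
    ON nominations (first_name, last_name, address1, state, zip, program_instance_id)
    """
)

query = """
    SELECT COUNT(*) AS prev_nominees
    FROM nominations e1
    WHERE e1.program_instance_id = 4 AND EXISTS (
        SELECT 1 FROM nominations e2
        WHERE e1.first_name = e2.first_name
          AND e1.last_name = e2.last_name
          AND e1.address1 = e2.address1
          AND e1.state = e2.state
          AND e1.zip = e2.zip
          AND e2.program_instance_id < 4
    )
"""
prev_nominees = conn.execute(query).fetchone()[0]

# The plan shows whether the subquery does an index search or a full scan.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
print(prev_nominees)
```

If the plan for the slow table shows a full scan inside the subquery while the fast table shows an index lookup, a missing index is the likely cause.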

MySQL - The most occuring for the specific day?

I'm stuck on this problem.
Basically I need to find out for each department how to figure out which days had the most sales made in them. The results display the department number and the date of the day and a department number can appear several times in the results if there were several days that have equally made the most sales.
This is what I have so far:
SELECT departmentNo, sDate FROM Department
HAVING MAX(sDate)
ORDER BY departmentNo, sDate;
I tried using the max function to find which dates occurred most. But it only returns one row of values. To clarify more, the dates that has the most sales should appear with the corresponding column called departmentNo. Also, if two dates for department A has equal amount of most sales then department A would appear twice with both dates showing too.
NOTE: only dates with the most sales should appear and the departmentNo.
I started with MySQL a few weeks ago and am still struggling to grasp the likes of subqueries and stored functions. But I'll learn from experience. Thank you in advance.
UPDATED:
Results I should get:
DepartmentNo Column 1: 1 | Date Column 2: 15/08/2000
DepartmentNo Column 1: 2 | Date Column 2: 01/10/2012
DepartmentNo Column 1: 3 | Date Column 2: 01/06/1999
DepartmentNo Column 1: 4 | Date Column 2: 08/03/2002
DepartmentNo Column 1: nth | Date Column 2: nth date
These are the data:
INSERT INTO Department VALUES ('1','tv','2012-05-20','13:20:01','19:40:23','2');
INSERT INTO Department VALUES ('2','radio','2012-07-22','09:32:23','14:18:51','4');
INSERT INTO Department VALUES ('3','tv','2012-09-14','15:15:43','23:45:38','3');
INSERT INTO Department VALUES ('2','tv','2012-06-18','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('1','radio','2012-06-18','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('2','batteries','2012-06-18','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2012-06-18','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('1','computer','2012-06-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','dishwasher','2011-06-18','13:20:23','19:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','13:20:57','20:40:23','4');
INSERT INTO Department VALUES ('3','lawnmower','2011-06-18','11:20:57','20:40:23','4');
INSERT INTO Department VALUES ('1','mobile','2012-05-18','13:20:31','19:40:23','4');
INSERT INTO Department VALUES ('1','mouse','2012-05-18','13:20:34','19:40:23','4');
INSERT INTO Department VALUES ('1','radio','2012-05-18','13:20:12','19:40:23','4');
INSERT INTO Department VALUES ('2','lawnmowerphones','2012-05-18','13:20:54','19:40:23','4');
INSERT INTO Department VALUES ('2','tv','2012-05-12','06:20:29','09:57:37','1');
INSERT INTO Department VALUES ('2','radio','2011-05-23','11:34:07','15:41:09','2');
INSERT INTO Department VALUES ('1','batteries','2011-05-21','16:20:01','23:40:23','3');
INSERT INTO Department VALUES ('2','remote','2011-05-01','13:20:41','19:40:23','4');
INSERT INTO Department VALUES ('3','mobile','2011-05-09','13:20:31','19:40:23','4');
For department1 the date 2012-05-18 would appear because that date occurred the most. And for every department, it should only show the one with the most sales, and if same amount of sales appears on the same date then both will appear, e.g. Department 1 will appear twice with both the dates of max sales.
I've tested the following query based on the table and two columns you've provided along with sample data. So, let me describe it for you. The inner-most "PREQUERY" is doing a count by department and date. The results of this will be pre-ordered by Department first, THEN the highest count in DESCENDING ORDER (so highest sales count is listed FIRST), it doesn't matter what date the count happened.
Next, by utilizing MySQL @variables, I'm pre-declaring two to be used in the query. @variables are like inline programming with MySQL. They can be declared once and then changed as applied to each record being processed. So, I'm defaulting to a bogus department value and a zero sales count.
Now, I'm grabbing the results of the PreQuery (Dept, #Sales and Date), but now, adding a test. If it is the FIRST ENTRY for a given department, use that record's "NumberOfSales" and put it into the @maxSales variable and store it as a final column named "MaxSaleCnt". The next column uses @lastDept and is set to whatever the current record's Department # is, so it can be compared to the next record.
If the next record is the same department, then it just keeps whatever the @maxSales value was from the previous, thus keeping the same first count(*) result for ALL entries on each respective department.
Now, the closure. I've added a HAVING clause (not a WHERE, as that restricts what records get tested, but HAVING processes AFTER the records are part of the PROCESSED set). So now, it would have all 5 columns. I am saying ONLY KEEP those records where the final NumberOfSales for the record MATCHES the MaxSaleCnt for the department. If one, two or more dates match, no problem: it returns them all per respective department.
So, one department could have 5 dates with 10 sales each, and another department has 2 dates with only 3 sales each, and another with only 1 date with 6 sales.
select
      Final.DepartmentNo,
      Final.NumberOfSales,
      Final.sDate
   from
      ( select
              PreQuery.DepartmentNo,
              PreQuery.NumberOfSales,
              PreQuery.sDate,
              @maxSales := if( PreQuery.DepartmentNo = @lastDept, @maxSales, PreQuery.NumberOfSales ) MaxSaleCnt,
              @lastDept := PreQuery.DepartmentNo
           from
              ( select
                      D.DepartmentNo,
                      D.sDate,
                      count(*) as NumberOfSales
                   from
                      Department D
                   group by
                      D.DepartmentNo,
                      D.sDate
                   order by
                      D.DepartmentNo,
                      NumberOfSales DESC ) PreQuery,
              ( select @lastDept := '~',
                       @maxSales := 0 ) sqlvars
           having
              NumberOfSales = MaxSaleCnt ) Final
To clarify the "@" and "~" per your final comment. The "@" indicates a local variable to the program (or in this case an in-line SQL variable) that can be used in the query. The '~' is nothing more than a simple string that would probably never match any of your departments, so when it is compared to the first qualified record, it does an IF( '~' = YourFirstDepartmentNumber, then use this answer, otherwise use this answer).
Now, how does the above work? Let's say the following is the result of your data returned by the inner-most query, grouped and ordered by the most sales at the top going down... SLIGHTLY altered from your data, let's just assume the following to simulate multiple dates on Dept 2 that have the same sales quantity...
Row# | DeptNo | Sales Date | # Sales
1 | 1 | 2012-05-18 | 3
2 | 1 | 2012-06-18 | 2
3 | 1 | 2012-05-20 | 1
4 | 2 | 2012-06-18 | 4
5 | 2 | 2011-05-23 | 4
6 | 2 | 2012-05-18 | 2
7 | 2 | 2012-05-12 | 1
8 | 3 | 2011-06-18 | 2
9 | 3 | 2012-09-14 | 1
Keep track of the actual rows. The innermost query that finishes as alias "PreQuery" returns all the rows in the order you see here. Then, that is joined (implied) with the declarations of the @ sqlvariables (special to MySQL, other SQL engines don't do this), which start with lastDept = '~' and maxSales = 0 (via assignment with @someVariable := result of this side).
Now, think of the above being handled as a
DO WHILE WE HAVE RECORDS LEFT
   Get the department #, Number of Sales and sDate from the record.
   IF the PreQuery Record's Department # = whatever is in the @lastDept
      set MaxSales = whatever is ALREADY established as max sales for this dept
      This basically keeps the MaxSales the same value for ALL in the same Dept #
   ELSE
      set MaxSales = the # of sales since this is a new department number and is the highest count
   END IF
   NOW, set @lastDept = the department you just processed so it
   can be compared when you get to the next record.
   Skip to the next record to be processed and go back to the start of this loop
END DO WHILE LOOP
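The loop above can also be written out in ordinary code. Here is a minimal Python sketch that walks the simulated PreQuery rows from the table above in the same order and keeps only the rows whose sales count equals the first (highest) count seen for their department. The function name top_sales_days is hypothetical, and this mimics the logic only; it is not what MySQL executes internally.

```python
# Simulated PreQuery output: (DeptNo, Sales Date, # Sales),
# already ordered by department, then by sales count descending.
pre_query = [
    (1, "2012-05-18", 3),
    (1, "2012-06-18", 2),
    (1, "2012-05-20", 1),
    (2, "2012-06-18", 4),
    (2, "2011-05-23", 4),
    (2, "2012-05-18", 2),
    (2, "2012-05-12", 1),
    (3, "2011-06-18", 2),
    (3, "2012-09-14", 1),
]


def top_sales_days(rows):
    """Keep the rows that tie for the highest sales count in their department."""
    last_dept = "~"  # bogus department that never matches a real one
    max_sales = 0
    kept = []
    for dept, s_date, sales in rows:
        if dept != last_dept:
            # First row of a new department carries its highest count.
            max_sales = sales
        last_dept = dept  # remembered for the comparison on the next row
        if sales == max_sales:  # the HAVING NumberOfSales = MaxSaleCnt test
            kept.append((dept, s_date, sales))
    return kept


for row in top_sales_days(pre_query):
    print(row)
```

Department 2 comes back twice because two of its dates tie at 4 sales, which is the behavior asked for in the question.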
Now, the reason you need to have @maxSales and THEN @lastDept as returned columns is that they must be computed for each record to be used to compare to the NEXT record. This technique can be used for MANY application purposes. If you click on my name, look at my tags and click on the MySQL tag, it will show you the many MySQL answers I've responded to. Many of them do utilize @ sqlvariables. In addition, there are many other people who are very good at working queries, so don't just look in one place. As for any question, if you find a good answer that you find helpful, even if you didn't post the question, clicking on the up-arrow next to the answer helps indicate to others what really helped you understand and get resolution to questions -- again, even if it's not your question. Good luck on your MySQL growth.
I think this can be achieved with a single query, but my experiences for similar functionality have involved either WITH (as defined in SQL'99) using either Oracle or MSSQL.
The best (only?) way to approach a problem like this is to break it into smaller components. (I don't think your provided statement provides all columns, so I'm going to have to make a few assumptions.)
First, how many sales were made for each day for each group:
SELECT department, COUNT(1) AS dept_count, sale_date
FROM orders
GROUP BY department, sale_date
Next, what's the most sales for each department
SELECT tmp.department, MAX(tmp.dept_count)
FROM (
    SELECT department, COUNT(1) AS dept_count
    FROM orders
    GROUP BY department, sale_date -- count per day, so MAX picks the busiest day
) AS tmp
GROUP BY tmp.department
Finally, putting the two together:
SELECT a.department, b.dept_count, b.sale_date
FROM (
    SELECT tmp.department, MAX(tmp.dept_count) AS max_dept_count
    FROM (
        SELECT department, COUNT(1) AS dept_count
        FROM orders
        GROUP BY department, sale_date -- count per day, not per department
    ) AS tmp
    GROUP BY tmp.department
) AS a
JOIN (
    SELECT department, COUNT(1) AS dept_count, sale_date
    FROM orders
    GROUP BY department, sale_date
) AS b
ON a.department = b.department
AND a.max_dept_count = b.dept_count
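Since WITH was mentioned, here is a sketch of the same two-step breakdown written as a common table expression and run from Python against an in-memory SQLite database (a stand-in for MySQL, which supports WITH from version 8.0). The orders table with department and sale_date columns is the assumed schema from this answer, loaded with the department number and date of each INSERT in the question.

```python
import sqlite3

# (department, sale_date) pairs taken from the INSERT statements in the question.
sales = [
    (1, "2012-05-20"), (2, "2012-07-22"), (3, "2012-09-14"), (2, "2012-06-18"),
    (1, "2012-06-18"), (2, "2012-06-18"), (2, "2012-06-18"), (1, "2012-06-18"),
    (2, "2011-06-18"), (3, "2011-06-18"), (3, "2011-06-18"), (1, "2012-05-18"),
    (1, "2012-05-18"), (1, "2012-05-18"), (2, "2012-05-18"), (2, "2012-05-12"),
    (2, "2011-05-23"), (1, "2011-05-21"), (2, "2011-05-01"), (3, "2011-05-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (department INTEGER, sale_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", sales)

top_days = conn.execute(
    """
    WITH per_day AS (
        -- step 1: sales per department per day
        SELECT department, sale_date, COUNT(*) AS dept_count
        FROM orders
        GROUP BY department, sale_date
    )
    SELECT p.department, p.sale_date, p.dept_count
    FROM per_day p
    JOIN (
        -- step 2: the highest daily count for each department
        SELECT department, MAX(dept_count) AS max_dept_count
        FROM per_day
        GROUP BY department
    ) m ON m.department = p.department
       AND m.max_dept_count = p.dept_count
    ORDER BY p.department, p.sale_date
    """
).fetchall()

for row in top_days:
    print(row)
```

For department 1 this returns 2012-05-18, the date the question expects; days that tie for the maximum would each produce their own row.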