Can't figure out a proper MySQL query - mysql

I have a table with the following structure:
id | workerID | materialID | date | materialGathered
Different workers contribute different amounts of different material per day. A single worker can only contribute once a day, but not necessarily every day.
What I need to do is figure out which of them was the most productive and which was the least productive, where productivity is measured as the AVG() of material gathered per day.
I honestly have no idea how to do that, so I'll appreciate any help.
EDIT1:
Some sample data
1 | 1 | 2013-01-20 | 25
2 | 1 | 2013-01-21 | 15
3 | 1 | 2013-01-22 | 17
4 | 1 | 2013-01-25 | 28
5 | 2 | 2013-01-20 | 23
6 | 2 | 2013-01-21 | 21
7 | 3 | 2013-01-22 | 17
8 | 3 | 2013-01-24 | 15
9 | 3 | 2013-01-25 | 19
Doesn't really matter how the output looks, to be honest. Maybe a simple table like that:
workerID | avgMaterialGatheredPerDay
And I didn't really attempt anything because I literally have no idea, haha.
EDIT2:
Any time period that is in the table (from earliest to latest date in the table) is considered.
Material doesn't matter at the moment. Only the arbitrary units in the materialGathered column matter.

As you say in your comments that we should look at each worker's average daily output, rather than at who gathered the most in a given period, the answer is fairly easy: group by workerid to get one result row per worker, and use AVG to get their average amount:
select workerid, avg(materialgathered) as avg_gathered
from work
group by workerid;
Now to the best and worst workers. Several workers can tie for best or worst, so there may be more than two of them. You therefore cannot just take the first and last record; you need the maximum and the minimum avg_gathered:
select max(avg_gathered) as max_avg_gathered, min(avg_gathered) as min_avg_gathered
from
(
select avg(materialgathered) as avg_gathered
from work
group by workerid
) as per_worker;
Now join the two queries to get all workers whose average equals the minimum or the maximum:
select worker.*
from
(
select workerid, avg(materialgathered) as avg_gathered
from work
group by workerid
) as worker
inner join
(
select max(avg_gathered) as max_avg_gathered, min(avg_gathered) as min_avg_gathered
from
(
select avg(materialgathered) as avg_gathered
from work
group by workerid
) as per_worker
) as worked on worker.avg_gathered in (worked.max_avg_gathered, worked.min_avg_gathered)
order by worker.avg_gathered;
There are other ways to do this, for example with HAVING avg(materialgathered) IN (select min(avg_gathered)...) OR avg(materialgathered) IN (select max(avg_gathered)...) instead of a join. The join is efficient, though, because a single select yields both min and max.
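To try the joined query without a MySQL server, here is a minimal sketch that runs it against the sample data from the question. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it leaves out the materialID column because the sample rows do not include it. The alias per_worker is an illustrative name.

```python
import sqlite3

# SQLite is used here only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work (id INTEGER, workerid INTEGER, date TEXT, materialgathered INTEGER)"
)
conn.executemany(
    "INSERT INTO work VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2013-01-20", 25), (2, 1, "2013-01-21", 15),
        (3, 1, "2013-01-22", 17), (4, 1, "2013-01-25", 28),
        (5, 2, "2013-01-20", 23), (6, 2, "2013-01-21", 21),
        (7, 3, "2013-01-22", 17), (8, 3, "2013-01-24", 15),
        (9, 3, "2013-01-25", 19),
    ],
)

query = """
SELECT worker.workerid, worker.avg_gathered
FROM (
    SELECT workerid, AVG(materialgathered) AS avg_gathered
    FROM work
    GROUP BY workerid
) AS worker
INNER JOIN (
    SELECT MAX(avg_gathered) AS max_avg_gathered,
           MIN(avg_gathered) AS min_avg_gathered
    FROM (
        SELECT AVG(materialgathered) AS avg_gathered
        FROM work
        GROUP BY workerid
    ) AS per_worker
) AS worked
    ON worker.avg_gathered IN (worked.max_avg_gathered, worked.min_avg_gathered)
ORDER BY worker.avg_gathered
"""

# Least productive worker first, most productive last.
extremes = conn.execute(query).fetchall()
print(extremes)
```

With this data, worker 1 averages 21.25 and therefore falls between the two extremes, so only workers 3 and 2 are returned.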

Related

Self Join? Were Staff Who Worked the Previous Week Active 3 Weeks ago - MYSQL

I'm trying to add a column to a production hours dataset that will tell if a provider who worked last week was also working three weeks earlier. The current dataset looks something like this:
RowID | ProviderID | ClientID | DOS | DOS (Week) | Hours
1 | 1111111111 | 22222222 | 11/2/2020 | 11/1/2020 | 2.5
2 | 1111111111 | 33333333 | 11/5/2020 | 11/1/2020 | 1
3 | 1111111111 | 44444444 | 10/13/2020 | 10/11/2020 | 3
I'm trying to get an extra column 'Active 3 Weeks Prior' with y/n or 1/0 for values. For the above table, let's assume the provider started on 10/13/20. The new column would ideally populate like this:
RowID | ProviderID | ClientID | DOS | DOS (Week) | Hours | Active 3 weeks Prior
1 | 1111111111 | 22222222 | 11/2/2020 | 11/1/2020 | 2.5 | Yes
2 | 1111111111 | 33333333 | 11/5/2020 | 11/1/2020 | 1 | Yes
3 | 1111111111 | 44444444 | 10/13/2020 | 10/11/2020 | 3 | No
A couple extra tidbits: our org uses Sunday as the start of the week so DOS (Week) is the Sunday prior to the date of service. From what I've been reading so far, it seems like the solution here is some kind of self join, where the base production records are aggregated into weekly hours and compared with that same providerID's records for DOS (Week) - 21.
The trouble I'm having is: whether I'm on the right track in the first place with the self-join and how I would generate the y/n values based on the success or failure to find a matching value. Also, I suspect that joining based on a concatenate of ProviderID and DOS(Week) might be flawed? This is what I've been playing with so far.
Please let me know if I can clarify the question at all or am missing something very obvious. I truly appreciate any help, as I've been trying to figure out the right search terms to get a clue on the answer for a few days now.
If you are running MySQL 8.0, you can use window functions and a range specification:
select t.*,
(
max(providerid) over(
partition by providerid
order by dos
range between interval 3 week preceding and interval 3 week preceding
) is not null
) as active_3_weeks_before
from mytable t
It is not really clear from your explanation and data what you mean by "was also working three weeks earlier". For each row, the query checks whether another row exists with the same provider and a dos that is exactly 3 weeks before the dos of the current row. This can easily be adapted to some other requirement.
Edit: if you want to check for any record within the last 3 weeks, you would change the window range to:
range between interval 3 week preceding and interval 1 day preceding
And if you want this in MySQL < 8.0, where window functions are not available, then you would use a correlated subquery:
select t.*,
exists (
select 1
from mytable t1
where
t1.providerid = t.providerid
and t1.dos >= t.dos - interval 3 week
and t1.dos < t.dos
) as active_3_weeks_before
from mytable t
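The correlated-subquery variant can be checked against the three rows from the question. This is a rough sketch that assumes Python's sqlite3 module in place of MySQL, ISO-formatted dates, and a hypothetical row_id column. SQLite's date(dos, '-21 days') plays the role of MySQL's dos - interval 3 week.

```python
import sqlite3

# SQLite stand-in for MySQL; dates are stored as ISO strings so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (row_id INTEGER, providerid TEXT, dos TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "1111111111", "2020-11-02"),
        (2, "1111111111", "2020-11-05"),
        (3, "1111111111", "2020-10-13"),
    ],
)

query = """
SELECT t.row_id,
       t.dos,
       EXISTS (
           SELECT 1
           FROM mytable t1
           WHERE t1.providerid = t.providerid
             AND t1.dos >= date(t.dos, '-21 days')  -- at most 3 weeks earlier
             AND t1.dos < t.dos                     -- strictly before this row
       ) AS active_3_weeks_before
FROM mytable t
ORDER BY t.row_id
"""

flags = conn.execute(query).fetchall()
print(flags)
```

The flag comes back as 1 or 0, which matches the Yes/Yes/No pattern the question asks for.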

MySQL select avg reading every hour even if there is no reading

I'm having a hard time making a MySQL statement from a Postgres one for a project we are migrating. I won't give the exact use case since it's pretty involved, but I can create a simple comparable situation.
We have a graphing tool that needs somewhat raw output for our data in hourly intervals. In Postgres, the SQL would generate a series for the date and hour over a time span, then join a query against that for the average wherever that date and hour existed. We were able to get, for example, the average sales by hour, even if that number is 0.
Here's a table example:
Sales
datetime | sale
2017-12-05 08:34:00 | 10
2017-12-05 08:52:00 | 20
2017-12-05 09:15:00 | 5
2017-12-05 10:22:00 | 10
2017-12-05 10:49:00 | 10
Where something like
SELECT DATE_FORMAT(s.datetime,'%Y%m%d%H') as "byhour", AVG(s.sale) as "avg sales" FROM sales s GROUP BY byhour
would produce
byhour | avg sales
2017120508 | 10
2017120509 | 5
2017120510 | 10
I'd like something that gives me the last 24 hours, even the 0/NULL values like
byhour | avg sales
2017120501 | null
2017120502 | null
2017120503 | null
2017120504 | null
2017120505 | null
2017120506 | null
2017120507 | null
2017120508 | 10
2017120509 | 5
2017120510 | 10
...
2017120600 | null
Does anyone have any ideas how I could do this in MySQL?
Join the result to a table that you know contains all the desired hours,
something like this:
SELECT
*
FROM (
SELECT
DATE_FORMAT(h.datetime, '%Y%m%d%H') AS 'byhour'
FROM
table_that_has_hours h
GROUP BY byhour) hours LEFT OUTER JOIN (
SELECT
DATE_FORMAT(s.datetime, '%Y%m%d%H') AS 'byhour',
AVG(s.sale) AS 'avg sales'
FROM
sales s
GROUP BY byhour) your_stuff ON your_stuff.byhour = hours.byhour
If you don't have a table like that, you can create one
like this:
CREATE TABLE ref (h INT);
INSERT INTO ref (h)
VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
Then add each hour to today's date to build the keys, for example DATE_FORMAT(date(now()) + INTERVAL h HOUR, '%Y%m%d%H'), and left join your averages to those values.
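As a rough illustration of the reference-table idea, the sketch below builds the 24 hour keys from ref and left joins the hourly averages onto them. It assumes Python's sqlite3 module instead of MySQL, so strftime and printf replace DATE_FORMAT, and it pins the day to the sample date 2017-12-05 rather than using now() so the result is reproducible.

```python
import sqlite3

# SQLite stand-in for MySQL; `ref` holds one row per hour of the day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (h INTEGER)")
conn.executemany("INSERT INTO ref (h) VALUES (?)", [(h,) for h in range(24)])

conn.execute("CREATE TABLE sales (datetime TEXT, sale INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2017-12-05 08:34:00", 10),
        ("2017-12-05 08:52:00", 20),
        ("2017-12-05 09:15:00", 5),
        ("2017-12-05 10:22:00", 10),
        ("2017-12-05 10:49:00", 10),
    ],
)

query = """
SELECT hours.byhour, your_stuff.avg_sales
FROM (
    -- one key per hour, e.g. 2017120508
    SELECT ? || printf('%02d', h) AS byhour
    FROM ref
) AS hours
LEFT JOIN (
    SELECT strftime('%Y%m%d%H', s.datetime) AS byhour,
           AVG(s.sale) AS avg_sales
    FROM sales s
    GROUP BY byhour
) AS your_stuff
    ON your_stuff.byhour = hours.byhour
ORDER BY hours.byhour
"""

rows = conn.execute(query, ("20171205",)).fetchall()
by_hour = dict(rows)
print(rows)
```

Hours without sales come back with None (NULL) for the average, so the graphing side receives a row for every hour.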

Iterate through entries of column and calculate with them

I need to iterate through the entries of one specific column and calculate with the entries. My table looks like that:
FreeKB (Column to iterate) | FileSystem | Date | System |
---------------------------|------------|-----------|---------|
5000 | TestFS | 2017-03-28| TestSys |
7000 | TestFS | 2017-03-27| TestSys |
3000 | TestFS | 2017-03-26| TestSys |
10000 | TestFS | 2017-03-25| TestSys |
9000 | TestFS | 2017-03-24| TestSys |
8000 | TestFS | 2017-03-23| TestSys |
10000 | TestFS | 2017-03-22| TestSys |
11000 | TestFS | 2017-03-21| TestSys |
The question is: How do I iterate through all the entries of "FreeKB" and calculate with them? To be more specific: I want to calculate the median of all entries, where the amount of FreeKB is shrinking. I'm familiar with scripting and a little bit c++ but I'm a newbie to SQL.
Sorry if the answer seems obvious...
Greetings
Edit:
For the result, I want to go through the entries of the last 7 days for each FileSystem and System in the table, find where the amount of FreeKB shrinks, and calculate the median of those decreases. Example: from 2017-03-27 to 2017-03-28 the amount of FreeKB shrinks by 2000 KB, from the 25th to the 26th by 7000, and from the 22nd to the 23rd by 2000. I want to get the median of these numbers and estimate when the FileSystem might become full, for an e-mail.
You can use CROSS APPLY in SQL Server.
Here are some links to get you started:
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/
https://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx
Here is some sample code:
CREATE TABLE #test (FreeKb int,FileSystem varchar(50),[Date] Datetime)
INSERT INTO #test
SELECT 1001,'TestFS','2016/12/14' UNION
SELECT 1111,'TestFS','2017/01/01' UNION
SELECT 1223,'TestFS','2017/01/15' UNION
SELECT 1233,'TestFS','2017/01/02' UNION
SELECT 1321,'TestFS','2017/01/31' UNION
SELECT 1400,'TestFS','2016/12/12' UNION
SELECT 1456,'TestFS','2017/03/13'
SELECT a.*,b.newColumn FROM #test a
cross apply(
SELECT a.FreeKB/(SELECT count(FreeKB) from #test)as NewColumn
)b
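A different route from CROSS APPLY, sketched here only as one possible approach: pair each day with the previous day through a self join, keep the pairs where FreeKB went down, and take the median of those decreases. The sketch assumes Python's sqlite3 module rather than SQL Server, a hypothetical table name fs_usage, and Python's statistics.median for the final step. It does not apply the 7-day filter from the question.

```python
import sqlite3
from statistics import median

# SQLite stand-in; `fs_usage` is a hypothetical name for the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fs_usage (FreeKB INTEGER, FileSystem TEXT, Date TEXT, System TEXT)"
)
conn.executemany(
    "INSERT INTO fs_usage VALUES (?, ?, ?, ?)",
    [
        (5000, "TestFS", "2017-03-28", "TestSys"),
        (7000, "TestFS", "2017-03-27", "TestSys"),
        (3000, "TestFS", "2017-03-26", "TestSys"),
        (10000, "TestFS", "2017-03-25", "TestSys"),
        (9000, "TestFS", "2017-03-24", "TestSys"),
        (8000, "TestFS", "2017-03-23", "TestSys"),
        (10000, "TestFS", "2017-03-22", "TestSys"),
        (11000, "TestFS", "2017-03-21", "TestSys"),
    ],
)

query = """
SELECT cur.Date, prev.FreeKB - cur.FreeKB AS shrink_kb
FROM fs_usage AS cur
JOIN fs_usage AS prev
    ON prev.FileSystem = cur.FileSystem
   AND prev.System = cur.System
   AND prev.Date = date(cur.Date, '-1 day')   -- the day before
WHERE cur.FreeKB < prev.FreeKB                -- only days where free space shrank
ORDER BY cur.Date
"""

shrinks = [shrink_kb for _, shrink_kb in conn.execute(query)]
median_shrink = median(shrinks)
print(shrinks, median_shrink)
```

With all eight sample rows the query also picks up the drop from the 21st to the 22nd, so four decreases feed the median.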

Select a row for every date in the table, no matter the data

I have the following 3 tables in my database:
noobs: id, name, img_url, associations_id
noobs_has_points: noobs_id, points_id
points: id, amount, create_time (as UNIX timestamp)
I want to get a result for every day (such as FROM_UNIXTIME(points.create_time,'%Y-%m-%d')). And in that result I want the noobs.id and his amount of points so SUM(points.amount). So whether a noob has actually scored points on that day doesn't matter, if he did not I would want a row with 0 in there as the amount, so that for every day I get to see how many points each noob scored.
However, I have no idea how to get this result. I have tried some things with left/right (or unioned) joins but I don't get the result I want. Can anyone help me with this?
Example results:
day | points.amount | noobs.id
2015-04-11 | 3 | 1
2015-04-11 | 0 | 2 (no points scored, no entry in database)
2015-04-12 | 0 | 1 (no points scored, no entry in database)
2015-04-12 | 1 | 2
Some sample data from the three tables:
Noobs
id | name | img_url | associations_id
1 | Rien | NULL | 1
2 | Peter| NULL | 1
noobs_has_points
noobs_id | points_id
1 | 1
2 | 3
points
id | amount | create_time
1 | 3 | 1428779292
2 | 1 | 1428805351
Because there may be no data for a given day for a given noob, you need a way to generate date values. Unfortunately, MySQL doesn't have a built-in way to do this. You can code a range into the query with a series of unions as a subquery, but it's ugly and not scalable.
I recommend creating a table to hold date values:
create table dates(_date date not null primary key);
And populating it with lots of dates (say everything from 1970-2020).
Then you can code:
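select _date day, coalesce(sum(p.amount), 0) total, n.id
from dates d
cross join noobs n
left join noobs_has_points np on np.noobs_id = n.id
left join points p on p.id = np.points_id
and date(from_unixtime(p.create_time)) = _date
where _date between ? and ?
group by 1, 3
The cross join gives every noob a row for every date in the specified range, the left joins keep that row even when the noob has no points on that day, and coalesce turns the resulting NULL sum into 0. Because create_time is stored as a UNIX timestamp, it is converted with from_unixtime before being compared with the date.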
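The calendar-table approach can be tried on a small scale as follows. This is a sketch under several assumptions: Python's sqlite3 module stands in for MySQL, date(create_time, 'unixepoch') replaces from_unixtime and interprets the timestamps as UTC, and the link row (2, 2) is hypothetical, chosen so that the output matches the example results in the question.

```python
import sqlite3

# SQLite stand-in for MySQL; the dates table holds one row per calendar day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (_date TEXT NOT NULL PRIMARY KEY);
CREATE TABLE noobs (id INTEGER, name TEXT);
CREATE TABLE noobs_has_points (noobs_id INTEGER, points_id INTEGER);
CREATE TABLE points (id INTEGER, amount INTEGER, create_time INTEGER);

INSERT INTO dates VALUES ('2015-04-11'), ('2015-04-12');
INSERT INTO noobs VALUES (1, 'Rien'), (2, 'Peter');
-- (2, 2) is a hypothetical link so that Peter owns the second points row
INSERT INTO noobs_has_points VALUES (1, 1), (2, 2);
INSERT INTO points VALUES (1, 3, 1428779292), (2, 1, 1428805351);
""")

query = """
SELECT d._date AS day, n.id, COALESCE(SUM(p.amount), 0) AS total
FROM dates d
CROSS JOIN noobs n
LEFT JOIN noobs_has_points np ON np.noobs_id = n.id
LEFT JOIN points p
       ON p.id = np.points_id
      AND date(p.create_time, 'unixepoch') = d._date
WHERE d._date BETWEEN ? AND ?
GROUP BY d._date, n.id
ORDER BY d._date, n.id
"""

totals = conn.execute(query, ("2015-04-11", "2015-04-12")).fetchall()
print(totals)
```

Every noob gets one row per day in the range, with 0 for the days on which no points were scored.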

Need Help Thinking Through Semi-Complex Query on Two Tables

I have two tables, and I want to create a SELECT pulling a single record from one table based on multiple records from another table.
Perhaps it will be clearer if I just give a sort of example.
Table: dates. Fields: dosID, personID, name, dateIn
1 | 10 | john smith | 2013-09-05
2 | 10 | john smith | 2013-01-25
Table: cards. Fields: cardID, personID, cardColor, cardDate
1 | 10 | red | 2013-09-05
2 | 10 | orange | 2013-09-05
3 | 10 | black | 2013-09-05
4 | 10 | green | 2013-01-25
5 | 10 | orange | 2013-01-25
So what I want is to only select a record from the dates table if a person did not receive a "red" card. The closest I have come is something like:
SELECT name, dateIn FROM dates, cards
WHERE dates.personID = cards.personID AND cardColor != 'red' AND dateIn = cardDate;
But for this query the 2013-09-05 date-of-service would still be pulled out because of the "orange" and "black" cards given on the same day.
I have tried searching but I am not even sure how to properly describe this issue and so my Google-fu has failed me. Any help or suggestions would be very much appreciated.
The easiest to understand version would filter using NOT EXISTS:
SELECT name, dateIn
FROM dates
WHERE NOT EXISTS(
SELECT *
FROM cards
WHERE cards.personID = dates.personID
AND cards.cardDate = dates.dateIn
AND cards.cardColor = 'red'
)
See it on sqlfiddle.
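For anyone who wants to run the NOT EXISTS filter locally, here is a minimal sketch using the sample rows from the question. It assumes Python's sqlite3 module as a stand-in for MySQL; the query itself is unchanged apart from an added ORDER BY.

```python
import sqlite3

# SQLite stand-in for MySQL, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (dosID INTEGER, personID INTEGER, name TEXT, dateIn TEXT);
CREATE TABLE cards (cardID INTEGER, personID INTEGER, cardColor TEXT, cardDate TEXT);

INSERT INTO dates VALUES
    (1, 10, 'john smith', '2013-09-05'),
    (2, 10, 'john smith', '2013-01-25');
INSERT INTO cards VALUES
    (1, 10, 'red',    '2013-09-05'),
    (2, 10, 'orange', '2013-09-05'),
    (3, 10, 'black',  '2013-09-05'),
    (4, 10, 'green',  '2013-01-25'),
    (5, 10, 'orange', '2013-01-25');
""")

query = """
SELECT name, dateIn
FROM dates
WHERE NOT EXISTS (
    SELECT *
    FROM cards
    WHERE cards.personID = dates.personID
      AND cards.cardDate = dates.dateIn
      AND cards.cardColor = 'red'
)
ORDER BY dateIn
"""

# Only dates on which the person received no red card remain.
no_red = conn.execute(query).fetchall()
print(no_red)
```

The 2013-09-05 row is dropped because a red card exists for that person and date, regardless of the orange and black cards issued the same day.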
I'm a bit confused, you're getting the non-red 2013-09-05 rows back.
http://sqlfiddle.com/#!2/89260
I tried it using an inner join (as you wrote it), and an outer join. Same results.
EDIT:
Sorry, misunderstood your post.
eggyal's answer looks like a winner to me.