Collecting multiple columns of aggregate data with a join - mysql

I'm trying to figure out if the query I'd like to do is at all doable or feasible in SQL or if I need to collect raw data and process it in my application.
My schema looks like this:
applications
================
id INT
application_steps
=================
id INT
application_id INT
step_id INT
activated_at DATE
completed_at DATE
steps
=====
id INT
step_type_id INT
Ideally, with this data in application_steps:
| id | application_id | step_id | activated_at | completed_at |
| 1 | 1 | 1 | 2013-01-01 | 2013-01-02 |
| 2 | 1 | 2 | 2013-01-02 | 2013-01-02 |
| 3 | 1 | 3 | 2013-01-02 | 2013-01-10 |
| 4 | 1 | 4 | 2013-01-10 | 2013-01-11 |
| 5 | 2 | 1 | 2013-02-02 | 2013-02-02 |
| 6 | 2 | 2 | 2013-02-02 | 2013-02-07 |
| 7 | 2 | 4 | 2013-02-09 | 2013-02-11 |
I want to get this result:
| application_id | step_1_days | step_2_days | step_3_days | step_4_days |
| 1 | 1 | 0 | 8 | 1 |
| 2 | 0 | 5 | NULL | 2 |
Note that in reality there are many more steps and many more applications that I would be looking at.
As you can see, there is a has-many relation between applications and application_steps. It is also possible for a given step to not be in use for a particular application. I'd like to get the amount of time each step takes (using DATEDIFF(completed_at, activated_at)), all in one row (the column names don't matter). Is this at all possible?
Secondary question: To complicate things a bit further, I will also need a secondary query which joins application_steps with steps and only gets data for steps with a particular step_type_id. Assuming part one is possible, how can I extend it to filter efficiently?
NOTE: Efficiency is key here - this is for a yearly report, which equates to about 2500 applications with 70 different steps and 44,000 application_steps in production (not a lot of data, but potentially a lot when joins are factored in).

This should be a basic "pivoting" aggregation:
select application_id,
max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s
group by application_id;
You would have to repeat this for all 70 steps.
To do this only for a particular type of step:
select application_id,
max(case when step_id = 1 then datediff(completed_at, activated_at) end) as step_1_days,
max(case when step_id = 2 then datediff(completed_at, activated_at) end) as step_2_days,
max(case when step_id = 3 then datediff(completed_at, activated_at) end) as step_3_days,
max(case when step_id = 4 then datediff(completed_at, activated_at) end) as step_4_days
from application_steps s join
steps
on s.step_id = steps.id and
steps.step_type_id = XXX
group by application_id;
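Because the same max(case ...) line has to be repeated for every step, one option is to generate the column list from the step ids instead of typing 70 lines by hand. Below is a minimal Python sketch of that idea, run against the sample rows from the question. It uses SQLite only as a stand-in so it runs without a MySQL server, which means julianday() differences replace DATEDIFF. For MySQL you would pass days_expr="DATEDIFF(s.completed_at, s.activated_at)" and run the generated SQL there. The helper build_step_pivot and the step_type_id values 10 and 20 are hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL's DATEDIFF(completed_at, activated_at).
SQLITE_DAYS = "CAST(julianday(s.completed_at) - julianday(s.activated_at) AS INTEGER)"


def build_step_pivot(step_ids, filter_by_type=False, days_expr=SQLITE_DAYS):
    """Return a pivot query with one step_N_days column per step id (hypothetical helper)."""
    cols = ",\n  ".join(
        f"MAX(CASE WHEN s.step_id = {int(sid)} THEN {days_expr} END) AS step_{int(sid)}_days"
        for sid in step_ids
    )
    sql = f"SELECT s.application_id,\n  {cols}\nFROM application_steps s"
    if filter_by_type:
        # Inner join on steps keeps only steps of the requested type (bound as ?).
        sql += "\nJOIN steps st ON st.id = s.step_id AND st.step_type_id = ?"
    return sql + "\nGROUP BY s.application_id\nORDER BY s.application_id"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE steps (id INTEGER PRIMARY KEY, step_type_id INTEGER);
CREATE TABLE application_steps (id INTEGER PRIMARY KEY, application_id INTEGER,
  step_id INTEGER, activated_at TEXT, completed_at TEXT);
""")
# Hypothetical step types: steps 1-2 are type 10, steps 3-4 are type 20.
conn.executemany("INSERT INTO steps VALUES (?, ?)", [(1, 10), (2, 10), (3, 20), (4, 20)])
conn.executemany(
    "INSERT INTO application_steps VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2013-01-01", "2013-01-02"),
        (2, 1, 2, "2013-01-02", "2013-01-02"),
        (3, 1, 3, "2013-01-02", "2013-01-10"),
        (4, 1, 4, "2013-01-10", "2013-01-11"),
        (5, 2, 1, "2013-02-02", "2013-02-02"),
        (6, 2, 2, "2013-02-02", "2013-02-07"),
        (7, 2, 4, "2013-02-09", "2013-02-11"),
    ],
)

# All steps: read the ids from the steps table instead of hard-coding the columns.
all_ids = [r[0] for r in conn.execute("SELECT id FROM steps ORDER BY id")]
rows = conn.execute(build_step_pivot(all_ids)).fetchall()
print(rows)  # [(1, 1, 0, 8, 1), (2, 0, 5, None, 2)]

# One step type: generate columns only for that type's steps, then filter in the join.
type_ids = [r[0] for r in conn.execute(
    "SELECT id FROM steps WHERE step_type_id = ? ORDER BY id", (20,))]
typed_rows = conn.execute(build_step_pivot(type_ids, filter_by_type=True), (20,)).fetchall()
print(typed_rows)  # [(1, 8, 1), (2, None, 2)]
```

Step ids are forced through int() before being spliced into the SQL text, and the step type is bound as a parameter, so the generated query does not interpolate untrusted strings.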

Related

One row result with multiple join in MySQL

I have 3 tables like the following.
Table "mansioni":
id_mansione | desc_mansione
1 | production
2 | office
3 | transport
Table "dipendente": stores id, name and surname:
id_dip | nome_dip | cognome_dip
1 | piero | rossi
2 | marco | rossi
Table "dipendenti_iddip": stores the association between table "dipendente" and table "mansioni":
iddip_mansione | num_mansione | id_mansione
1 | 1 | 1
1 | 2 | 2
2 | 1 | 2
2 | 2 | 3
Now I need a query that gives me a result like this:
id_dip | nome_dip | cognome_dip | mansione1 | mansione2 | mansione3
1 | piero | rossi | production| office |
2 | marco | rossi | office | transport |
I arrived at the following query, but with it I can only see the "id_mansione" field and not the "desc_mansione" field:
select i.id_dip,
i.nome_dip,
i.cognome_dip,
max(case when t.num_mansione='1' then t.id_mansione end) Mansione1,
max(case when t.num_mansione='2' then t.id_mansione end) Mansione2,
max(case when t.num_mansione='3' then t.id_mansione end) Mansione3
from dipendente i
left join dipendenti_iddip t
on i.id_dip = t.iddip_mansione
group by i.id_dip, i.nome_dip, i.cognome_dip
How can I get this result?
Thanks.
Add a join on mansioni and replace t.id_mansione with m.desc_mansione:
select i.id_dip,
i.nome_dip,
i.cognome_dip,
max(case when t.num_mansione = '1' then m.desc_mansione end) Mansione1,
max(case when t.num_mansione = '2' then m.desc_mansione end) Mansione2,
max(case when t.num_mansione = '3' then m.desc_mansione end) Mansione3
from dipendente i
join dipendenti_iddip t
on i.id_dip = t.iddip_mansione
join mansioni m on m.id_mansione = t.id_mansione
group by i.id_dip, i.nome_dip, i.cognome_dip
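Here is a small sketch that runs the query in SQLite (used only as a stand-in for MySQL) on the sample data from the question. It keeps LEFT JOINs, as in the question's original query, so an employee without any mansione still shows up with NULLs. The third employee, luca bianchi, is a hypothetical row added only to show that; with plain JOINs, as in the answer above, that employee would be dropped from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mansioni (id_mansione INTEGER PRIMARY KEY, desc_mansione TEXT);
CREATE TABLE dipendente (id_dip INTEGER PRIMARY KEY, nome_dip TEXT, cognome_dip TEXT);
CREATE TABLE dipendenti_iddip (iddip_mansione INTEGER, num_mansione INTEGER, id_mansione INTEGER);
INSERT INTO mansioni VALUES (1, 'production'), (2, 'office'), (3, 'transport');
INSERT INTO dipendente VALUES (1, 'piero', 'rossi'), (2, 'marco', 'rossi'),
  (3, 'luca', 'bianchi');  -- hypothetical employee with no mansione
INSERT INTO dipendenti_iddip VALUES (1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 3);
""")

# Pivot the description (not the id) into mansione1..mansione3.
QUERY = """
SELECT i.id_dip, i.nome_dip, i.cognome_dip,
  MAX(CASE WHEN t.num_mansione = 1 THEN m.desc_mansione END) AS mansione1,
  MAX(CASE WHEN t.num_mansione = 2 THEN m.desc_mansione END) AS mansione2,
  MAX(CASE WHEN t.num_mansione = 3 THEN m.desc_mansione END) AS mansione3
FROM dipendente i
LEFT JOIN dipendenti_iddip t ON i.id_dip = t.iddip_mansione
LEFT JOIN mansioni m ON m.id_mansione = t.id_mansione
GROUP BY i.id_dip, i.nome_dip, i.cognome_dip
ORDER BY i.id_dip
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```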

Mysql adding new columns for the selection

I have the following table
+-----+------------------+-------------+
| id | name |month_1 |
+-----+------------------+-------------+
| 1 | anna | 15 |
| 2 | bin | 20 |
+-----+------------------+-------------+
When I run a SELECT, I want to add one more column.
For example, this gives me the month_1 column:
SELECT id,name, money as month_1 FROM test where month(day)='1';
I want to add a month_2 column as well, something like this (not valid SQL, just to show the idea):
SELECT id,name, money as month_1,money as month_2
FROM test
where month(day)='1', month(day)='2'
+-----+------------------+-------------+------------+
| id | name | month_1 |month_2 |
+-----+------------------+-------------+------------+
| 1 |anna | 15 | 10 |
| 2 | bin | 20 | 0 |
+-----+------------------+-------------+------------+
You can use conditional aggregation:
SELECT id,name,
SUM(CASE WHEN month(day) = 1 THEN money ELSE 0 END) as month_1,
SUM(CASE WHEN month(day) = 2 THEN money ELSE 0 END) as month_2
FROM test
GROUP BY id, name;
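Note that month(day) ignores the year, so January rows from every year end up in month_1. Depending on your data, you may want to filter on year(day) or include the year in the GROUP BY.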
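A quick check of this conditional aggregation, run in SQLite as a stand-in for MySQL (so strftime() replaces MONTH() and YEAR()). The rows are hypothetical, picked to reproduce the desired output, plus one extra 2013 row for bin that shows how month(day) on its own mixes years together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, money INTEGER, day TEXT)")
# Hypothetical rows; bin's 2013 row demonstrates the year pitfall.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [(1, "anna", 15, "2014-01-10"), (1, "anna", 10, "2014-02-05"),
     (2, "bin", 20, "2014-01-20"), (2, "bin", 99, "2013-01-15")],
)

# SQLite has no MONTH()/YEAR(); strftime() stands in for them here.
PIVOT = """
SELECT id, name,
  SUM(CASE WHEN CAST(strftime('%m', day) AS INTEGER) = 1 THEN money ELSE 0 END) AS month_1,
  SUM(CASE WHEN CAST(strftime('%m', day) AS INTEGER) = 2 THEN money ELSE 0 END) AS month_2
FROM test
{where}
GROUP BY id, name
ORDER BY id
"""
all_years = conn.execute(PIVOT.format(where="")).fetchall()
only_2014 = conn.execute(PIVOT.format(where="WHERE strftime('%Y', day) = '2014'")).fetchall()
print(all_years)  # [(1, 'anna', 15, 10), (2, 'bin', 119, 0)]
print(only_2014)  # [(1, 'anna', 15, 10), (2, 'bin', 20, 0)]
```

The ELSE 0 branch is what turns a missing month into 0 rather than NULL, matching the 0 for bin in month_2.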

How to do one big select from table in MySQL?

I need to show data from the DB as a table in a report file.
my_table looks like:
+----+-------+------+------+-------------------+-----------+-------+----+-------------------+
| id |entryID|userID|active| dateCreated |affiliateId|premium|free| endDate |
| 1 | 69856 | 1 | N |2014-03-22 13:54:49| 1 | N | N |2014-03-22 13:54:49|
| 2 | 63254 | 2 | Y |2014-03-21 13:35:15| 2 | Y | N | |
| 3 | 56324 | 3 | N |2014-03-21 11:11:22| 2 | Y | N |2014-02-22 16:44:46|
| 4 | 41256 | 4 | Y |2014-03-21 08:10:46| 1 | N | Y | |
| .. | ... | ... | ... | ... | ... | ... | .. | ... |
+----+-------+------+------+-------------------+-----------+-------+----+-------------------+
I need to create the table with data from my_table
| Date | № of Entries (in that date) | Total № of Entries | Premium | Free | Affiliate |
The final table in the file should look like this:
Report 17-07-2013:
+----------+--------------+-------+---------+------+-----------+
| Date | № of Entries | Total | Premium | Free | Affiliate |
|2013-07-17| 2 | 99845 | 2 | 0 | 0 |
|2013-07-18| 1 | 99843 | 0 | 1 | 0 |
|2013-07-22| 1 | 99842 | 1 | 0 | 1 |
|2013-07-23| 3 | 99841 | 2 | 1 | 2 |
|2013-07-24| 298 | 99838 | 32 | 273 | 25 |
|2013-07-25| 5526 | 99540 | 474 | 5058 | 126 |
|2013-07-26| 1686 | 94014 | 157 | 1532 | 56 |
|2013-07-27| 1673 | 92328 | 156 | 1517 | 97 |
|2013-07-28| 1461 | 90655 | 155 | 1310 | 83 |
| ... | ... | ... | ... | ... | ... |
+----------+--------------+-------+---------+------+-----------+
Should I run a separate SELECT for each column, or can this be done in a single SELECT? If a single SELECT is possible, how do I write it?
It should be modeled on an existing report, although some fields differ (like 'Number of Entries in that date').
Total number of Entries means: all entries from the beginning up to that specific date.
Number of Entries in that date means: all entries on that date.
In the final table each date appears only once, which is why the 'Number of Entries (in that date)' column counts all entries for that date.
Your expected result doesn't make it clear whether Total is a count or a sum, or whether Affiliate is a sum or a count.
Assuming Total is a count and Affiliate is a sum, here is a query you might use (written against MS SQL; this syntax also works in MySQL):
select cast(DateCreated as date) as DateCreated, count(EntryId) as Total,
sum(case when Premium='Y' then 1 else 0 end) as Premium,
sum(case when Premium='N' then 1 else 0 end) as Free,
sum(AffiliateId) as Affiliate
from sample
group by cast(DateCreated as date)
If I didn't understand you correctly, kindly advise. Hope it helps.
Working demo (SQLFiddle): http://sqlfiddle.com/#!9/20cc0/5
The added column entryID does not matter for us.
I don't really understand what you want for Total, or the criteria for affiliateID. This query should get you started.
SELECT
DATE(dateCreated) as "Date",
count(dateCreated) as "No of Entries",
99845 as Total,
sum( case when premium='Y' then 1 else 0 end ) as Premium,
sum( case when premium='N' then 1 else 0 end ) as Free,
sum( case when affiliateID IS NOT NULL then 1 else 0 end) as Affiliate
FROM my_table
GROUP BY DATE(dateCreated)
ORDER BY Date ASC
You mentioned that the final table "can be in a file or in the web page," not a new table in the DB.
It sounds like you may be new to this area so I just wanted to inform you that spitting out a report into a file for a website is highly unusual and typically only done when your data is completely separate from the website. Putting data from a database onto a website (like the query we made here) is very common and it's very likely you don't need to mess with any files.
select date(DateCreated),count(entryId) as Total,
sum(case when Premium='Y' then 1 else 0 end) as Premium,
sum(case when Premium='N' then 1 else 0 end) as Free,
sum( case when affiliateID IS NOT NULL then 1 else 0 end) as Affiliate
INTO OUTFILE '/tmp/myfile.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
from my_table
group by date(DateCreated) order by date(DateCreated);
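None of the queries above produce the running Total as the question defines it (all entries from the beginning up to that date). One option is to compute it in the application from the per-day counts. The Python sketch below does that with itertools.accumulate, using SQLite as a stand-in for MySQL and only three columns of the sample rows from the question. On MySQL 8.0 or later, a window function such as SUM(COUNT(*)) OVER (ORDER BY DATE(dateCreated)) should also be able to do this inside the query itself.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
# Only the columns this report needs; values taken from the sample rows in the question.
conn.execute("CREATE TABLE my_table (id INTEGER, dateCreated TEXT, premium TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "2014-03-22 13:54:49", "N"), (2, "2014-03-21 13:35:15", "Y"),
     (3, "2014-03-21 11:11:22", "Y"), (4, "2014-03-21 08:10:46", "N")],
)

# One row per calendar day, as in the answers above.
daily = conn.execute("""
SELECT date(dateCreated) AS day,
  COUNT(*) AS entries,
  SUM(CASE WHEN premium = 'Y' THEN 1 ELSE 0 END) AS premium
FROM my_table
GROUP BY date(dateCreated)
ORDER BY day
""").fetchall()

# Running total: all entries from the first day up to and including this day.
totals = accumulate(entries for _, entries, _ in daily)
report = [(day, entries, total, premium)
          for (day, entries, premium), total in zip(daily, totals)]
for line in report:
    print(line)
```

The rows must be ordered by date before accumulating, which is why the query ends with ORDER BY day.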

SQL Increment Column

I am trying to show one semester's aggregates in the first column, the next semester's aggregates in the second column, and the third semester's aggregates in the third column. Also, in the real tables I don't know how many status codes there are.
I have a semester table:
+----+----------+
| Id | Semester |
+----+----------+
| 1  | Current  |
| 2  | Next     |
| 3  | 2 Ahead  |
+----+----------+
I have a simple project table:
+----+-------+--------+--------+
| Id | Title | Status | termId |
+----+-------+--------+--------+
| 1  | A     | OK     | 1      |
| 2  | B     | Bad    | 1      |
| 3  | C     | OK     | 1      |
| 4  | D     | Bad    | 2      |
| 5  | E     | OK     | 2      |
| 6  | F     | Bad    | 3      |
| 7  | G     | OK     | 2      |
+----+-------+--------+--------+
This is the desired output:
+--------+--------------+-----------+-------------+
| Status | CurrentCount | NextCount | 2AheadCount |
+--------+--------------+-----------+-------------+
| OK     | 2            | 1         | 0           |
| Bad    | 1            | 1         | 1           |
+--------+--------------+-----------+-------------+
What would you recommend I do to be able to achieve this?
You can use conditional aggregation with group by:
select status,
sum(case when termId = 1 then 1 else 0 end) CurrentCount,
sum(case when termId = 2 then 1 else 0 end) NextCount,
sum(case when termId = 3 then 1 else 0 end) 2AheadCount
from project
group by status
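A runnable check of this query in SQLite (a stand-in for MySQL), using the project rows from the question. Because status is the GROUP BY key, new status codes simply become new rows, so you don't need to know them in advance; only the semesters are fixed columns. The 2AheadCount alias is double-quoted here so SQLite treats it as an identifier. With the rows as posted, NextCount for OK comes out as 2 (rows E and G), not the 1 shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER, title TEXT, status TEXT, termId INTEGER)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?, ?)",
    [(1, "A", "OK", 1), (2, "B", "Bad", 1), (3, "C", "OK", 1), (4, "D", "Bad", 2),
     (5, "E", "OK", 2), (6, "F", "Bad", 3), (7, "G", "OK", 2)],
)

# Status values are grouped, not hard-coded, so any number of status codes works.
rows = conn.execute("""
SELECT status,
  SUM(CASE WHEN termId = 1 THEN 1 ELSE 0 END) AS CurrentCount,
  SUM(CASE WHEN termId = 2 THEN 1 ELSE 0 END) AS NextCount,
  SUM(CASE WHEN termId = 3 THEN 1 ELSE 0 END) AS "2AheadCount"
FROM project
GROUP BY status
ORDER BY status
""").fetchall()
print(rows)  # [('Bad', 1, 1, 1), ('OK', 2, 2, 0)]
```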

How to pivot a column that is not unique in MySQL or SSIS?

I have a MySQL input table like the one below. The primary key consists of PID and MID.
PID | MID | VAL
---------------
1 | 1 | 50
1 | 2 | 51
1 | 3 | 52
1 | 4 | 53
2 | 1 | 25
2 | 2 | 26
3 | 1 | 11
3 | 1 | 12
3 | 2 | 13
And I need the format below, where the fixed set of 50 MIDs are the columns and all available PIDs are the rows. PID should be the primary key:
PID | MID1 | MID2 | MID3 | MID4 | MID5 | .... | MID50
---------------------------------------------------------------
1 | 50 | 51 | 52 | 53 | ..(null)..
2 | 25 | 26 | ..(null)..
3 | 12 | 13 | ..(null)..
Regarding SSIS: According to MSDN Pivot Article the PIVOT transformation "rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output" so I think this is not applicable in my case since MID is not unique.
I'm not an expert in SQL / MySQL, so I hope you can help me. I've seen the MySQL Pivot query here, which in my opinion goes in the right direction, but I cannot adapt that solution to my problem.
The solution might be a MySQL query or an SSIS transformation.
Something along these lines, continuing the pattern up to MID50:
SELECT
pid,
MAX(CASE mid WHEN 1 THEN val ELSE NULL END) as MID1,
MAX(CASE mid WHEN 2 THEN val ELSE NULL END) as MID2,
MAX(CASE mid WHEN 3 THEN val ELSE NULL END) as MID3,
MAX(CASE mid WHEN 4 THEN val ELSE NULL END) as MID4,
MAX(CASE mid WHEN 5 THEN val ELSE NULL END) as MID5,
MAX(CASE mid WHEN 6 THEN val ELSE NULL END) as MID6
FROM inputTable
GROUP BY pid;
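Writing out all 50 columns by hand is tedious, so here is a Python sketch (the helper mid_pivot_sql is hypothetical) that generates them and runs the query in SQLite as a stand-in for MySQL. The generated SQL uses only CASE and MAX, so it should also run unchanged in MySQL. The table has no primary key in this sketch because the posted sample repeats PID 3 / MID 1; MAX() picks 12 for that cell, which matches the desired output.

```python
import sqlite3


def mid_pivot_sql(n_mids=50):
    """Generate the MAX(CASE ...) column list for MID1..MIDn (hypothetical helper)."""
    cols = ",\n  ".join(
        f"MAX(CASE mid WHEN {m} THEN val ELSE NULL END) AS MID{m}"
        for m in range(1, n_mids + 1)
    )
    return f"SELECT pid,\n  {cols}\nFROM inputTable\nGROUP BY pid\nORDER BY pid;"


conn = sqlite3.connect(":memory:")
# No primary key: the posted sample repeats (PID 3, MID 1), which MAX() resolves to 12.
conn.execute("CREATE TABLE inputTable (pid INTEGER, mid INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO inputTable VALUES (?, ?, ?)",
    [(1, 1, 50), (1, 2, 51), (1, 3, 52), (1, 4, 53),
     (2, 1, 25), (2, 2, 26), (3, 1, 11), (3, 1, 12), (3, 2, 13)],
)

rows = conn.execute(mid_pivot_sql()).fetchall()
for row in rows:
    print(row[:6])  # pid plus MID1..MID5; MID6..MID50 are None for this data
```

Printing mid_pivot_sql() gives the full 50-column statement if you would rather paste it into MySQL directly.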