Find distinct combination of stations in table - mysql

I have a table with the sample data below and want to know the distinct combinations of stations present in it.
Table Name: train_route
FROM_STN TO_STN DISTANCE
BLR CHENNAI 800
DEL MUMBAI 1500
VNS DEL 1000
MUMBAI DEL 1497
CHENNAI BLR 798
Distances may differ between records for the same pair of stations. I want all the distinct combinations of stations present in the table, regardless of direction.
For example, for the sample above the desired output is:
FROM_STN TO_STN
BLR CHENNAI
DEL MUMBAI
VNS DEL
The actual table has billions of records. Can this be done using a self join?

select tr.* from
(
select from_stn as frs, to_stn as tos
from train_route
union
select to_stn, from_stn
from train_route) t
join train_route tr on t.frs = tr.from_stn and t.tos = tr.to_stn
You can use union to remove duplicates.

If you only care about the distinct combinations, and not which station is the origin or the destination, you can compare the two names lexically and swap them so that the alphabetically lower station always appears in the first column, then group by the result:
select
if(FROM_STN < TO_STN, FROM_STN, TO_STN) station1,
if(FROM_STN > TO_STN, FROM_STN, TO_STN) station2
from
train_route
group by
if(FROM_STN < TO_STN, FROM_STN, TO_STN),
if(FROM_STN > TO_STN, FROM_STN, TO_STN);
This would give you a result like:
| station1 | station2 |
|----------|----------|
| BLR | CHENNAI |
| DEL | MUMBAI |
| DEL | VNS |
Another solution that might perform better (depending on keys and indexes):
select distinct from_stn, to_stn
from
(
select from_stn, to_stn from train_route
union all
select to_stn, from_stn from train_route
) all_pairs
where from_stn < to_stn;
In the end I don't think there's any way around having to do a lexical comparison.
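As a quick check of the normalize-then-deduplicate idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs the query from Python. SQLite is only a stand-in for MySQL so that the example runs anywhere, and the CASE expressions are used because they are portable. In MySQL, `LEAST(from_stn, to_stn)` and `GREATEST(from_stn, to_stn)` express the same thing.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train_route (from_stn TEXT, to_stn TEXT, distance INTEGER)")
conn.executemany(
    "INSERT INTO train_route VALUES (?, ?, ?)",
    [
        ("BLR", "CHENNAI", 800),
        ("DEL", "MUMBAI", 1500),
        ("VNS", "DEL", 1000),
        ("MUMBAI", "DEL", 1497),
        ("CHENNAI", "BLR", 798),
    ],
)

# Put the alphabetically smaller station first, then drop duplicate pairs.
pairs = conn.execute(
    """
    SELECT DISTINCT
        CASE WHEN from_stn < to_stn THEN from_stn ELSE to_stn END AS station1,
        CASE WHEN from_stn < to_stn THEN to_stn ELSE from_stn END AS station2
    FROM train_route
    ORDER BY station1, station2
    """
).fetchall()

for station1, station2 in pairs:
    print(station1, station2)
```

On the sample data this yields the three pairs BLR/CHENNAI, DEL/MUMBAI and DEL/VNS. A single pass like this avoids the self join, which may matter on a table with billions of rows.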

Thanks for all the answers.
I have found a solution to my question and wanted to share it with you all.
select a.from_stn,a.to_stn
from train_route a
left join train_route b
on a.from_stn=b.to_stn and a.to_stn=b.from_stn
where a.from_stn<=coalesce(b.from_stn,a.from_stn);

Related

Combine two separate SQL queries in a single query

I have these two tables:
Name | Income
----------|----------
Alice | 200
Bob | 100
Charlie | 50
Dave | 500
Name | Outcome
----------|----------
Alice | 300
Bob | 40
Charlie | 100
Dave | 250
I can make this query to get all the people who have an income which is greater than 150 and order them
SELECT Name, Income
FROM table1
WHERE Income > 150
ORDER BY Income DESC
Similarly I can get all the people who have an outcome which is less than 200:
SELECT Name, Outcome
FROM table2
WHERE Outcome < 200
ORDER BY Outcome DESC
Is there a way to get the two views by writing a single query i.e. using only one ;?
EDIT: I'm sorry, I just realised I wasn't clear about what I want to get. This is more or less what I am trying to achieve:
Name | Income
----------|----------
Dave | 500
Alice | 200
Name | Outcome
----------|----------
Charlie | 100
Bob | 40
I know about JOIN but that would make only one table in a result. I can't use UNION either because Outcome and Income do have same datatype but they mean different things.
What you are showing is still two separate results, and one query gives you one result. If you want to combine the two queries, you get one query and one result. One method:
SELECT what, name, value
FROM
(
SELECT 'INCOME' as what, name, income as value, 1 as sortkey1, -income as sortkey2
FROM table1
WHERE income > 150
UNION ALL
SELECT 'OUTCOME' as what, name, outcome as value, 2 as sortkey1, -outcome as sortkey2
FROM table2
WHERE outcome < 200
) AS combined -- MySQL requires an alias on every derived table
ORDER BY sortkey1, sortkey2;
You can JOIN these two tables by Name and SELECT both Income and Outcome, e.g.:
SELECT t1.name, t1.Income, t2.Outcome
FROM table1 t1 JOIN table2 t2 ON t1.Name = t2.Name
WHERE t1.Income > 150 AND t2.Outcome < 200
ORDER BY t1.Income DESC, t2.Outcome DESC;
Update (as per the edits in the question)
You can't have one query resulting in two separate outputs. The closest you can get is a UNION with an extra column to distinguish between the outputs, e.g.:
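SELECT Name, Income AS value, 'Income' AS kind
FROM table1
WHERE Income > 150
UNION ALL
SELECT Name, Outcome, 'Outcome'
FROM table2
WHERE Outcome < 200
-- one ORDER BY at the end sorts the whole UNION; an unparenthesized ORDER BY per branch is a syntax error
ORDER BY kind, value DESC;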
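To illustrate the tagged-UNION idea, here is a minimal sketch that runs it against the question's sample data and then splits the single result back into two lists on the client side. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The column aliases `kind` and `value` and the `by_kind` dictionary are names made up for this example.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (Name TEXT, Income INTEGER);
    CREATE TABLE table2 (Name TEXT, Outcome INTEGER);
    INSERT INTO table1 VALUES ('Alice', 200), ('Bob', 100), ('Charlie', 50), ('Dave', 500);
    INSERT INTO table2 VALUES ('Alice', 300), ('Bob', 40), ('Charlie', 100), ('Dave', 250);
    """
)

# One query, one result set; the 'kind' column says which part a row belongs to.
rows = conn.execute(
    """
    SELECT 'Income' AS kind, Name, Income AS value FROM table1 WHERE Income > 150
    UNION ALL
    SELECT 'Outcome', Name, Outcome FROM table2 WHERE Outcome < 200
    ORDER BY kind, value DESC
    """
).fetchall()

# Rebuild the two "tables" in application code.
by_kind = {}
for kind, name, value in rows:
    by_kind.setdefault(kind, []).append((name, value))

for kind, items in by_kind.items():
    print(kind, items)
```

The database still returns a single result. Any separation into two displays has to happen in the application, as the loop above does.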

How to group by on a highest value

So, for example, I've got the following table:
ID COUNTRY VALUE
--------------------- -------------------- --------------------
1 India 12000
2 India 11000
3 UK 11000
4 India 15000
5 Canada 11000
I would like to group by country but show only the row with the highest value for each country. If I just use a GROUP BY query like:
SELECT * FROM countries GROUP BY country
I would get;
ID COUNTRY VALUE
--------------------- -------------------- --------------------
1 India 12000
3 UK 11000
5 Canada 11000
Here the value for India would be 12000. I would like the query to keep the row with the highest value in each country group, like:
ID COUNTRY VALUE
--------------------- -------------------- --------------------
3 UK 11000
4 India 15000
5 Canada 11000
So it's grouped on the highest value which is 15000.
SELECT s1.ID, s1.COUNTRY, s1.VALUE
FROM countries s1
LEFT JOIN countries s2
ON s1.VALUE < s2.VALUE
AND s1.COUNTRY = s2.COUNTRY
WHERE s2.COUNTRY IS NULL;
NOTE: Be careful with ties. If several rows share the highest value for a country, this query returns all of them.
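To see what the self-join above does with ties, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. Row 6 is a hypothetical extra row, not in the question's data, added so that India has two rows sharing the maximum value.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (id INTEGER, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        (1, "India", 12000),
        (2, "India", 11000),
        (3, "UK", 11000),
        (4, "India", 15000),
        (5, "Canada", 11000),
        (6, "India", 15000),  # hypothetical row that ties with id 4
    ],
)

# Keep a row only if no row of the same country has a strictly higher value.
top_rows = conn.execute(
    """
    SELECT s1.id, s1.country, s1.value
    FROM countries s1
    LEFT JOIN countries s2
      ON s1.value < s2.value
     AND s1.country = s2.country
    WHERE s2.country IS NULL
    ORDER BY s1.id
    """
).fetchall()

for row in top_rows:
    print(row)
```

In this run both India rows with value 15000 (ids 4 and 6) come back, so a further step such as taking MIN(id) per country is needed if exactly one row per country is required.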
You can use the MAX aggregate function.
select
country,
max(value) value
from countries
group by
country
Edit: The original solution was only correct due to the nature of the data. I've removed the ID from the first query, to correct the mistake. Here is another solution (based on @Juan Carlos Oropeza's work - thank you) that will return the ID and eliminate the ties.
select
min(x.id) id,
x.country,
x.value
from (
select
c.*
from countries c
left join countries c1 on c.value < c1.value and c.country = c1.country
where c1.country is null
) x
group by
x.country,
x.value
;
See the live example - I've modified the data to cover edge cases mentioned above.

one mysql query getting, per each row, the average of the previous three rows

I have something like this:
id | value
---------------
201311 | 10
201312 | 15
201401 | 20
201402 | 5
201403 | 17
and I need a result like this:
201311 | NULL or 0
201312 | 3.3 // 10/3
201401 | 8.3 // (15+10)/3
201402 | 15 // (20+15+10)/3
201403 | 13.3 // (5+20+15)/3
So far, I have got to the point where I can get the AVG of the previous three rows like this:
select AVG(c.value) FROM (select b.value from `table` as b where b.id < 201401 order by b.id DESC LIMIT 3) as c
by passing the id manually. I'm not able to do it for each id.
Any ideas would be much appreciated!
thanks a lot.
regards
I think you'll have to write a stored procedure, use a cursor, iterate through the table and populate a new table using the values calculated in your cursor loop. If you need help with writing out the cursor loop, just drop a comment and I can get you an example.
I've got to this now:
SELECT a.id, (select AVG(b.value) FROM `table` as b where b.id < a.id AND str_to_date(CONCAT(b.id,'01'), '%Y%m%d') >= DATE_SUB(str_to_date(CONCAT(a.id,'01'), '%Y%m%d'), INTERVAL 3 MONTH)) FROM `table` as a WHERE 1
But I'm quite sure there should be a better/cleaner solution.
select a.id,coalesce(b.value,0) from test a left outer join
(select a.id, sum(b.value)/3 as value from
(select @row:=@row+1 as rownum,id,value from test,(select @row:=0)r) a,
(select @row1:=@row1+1 as rownum,id,value from test,(select @row1:=0)r) b
where b.rownum in (a.rownum-1,a.rownum-2,a.rownum-3)
group by a.rownum) b
on a.id=b.id;
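An alternative not used in the answers above is a window function, which is available in MySQL 8.0 and later. The sketch below is only an illustration under that assumption. It runs the query through Python's sqlite3 module (SQLite 3.25+ accepts the same window syntax) on the question's sample data, in a table hypothetically named `monthly`, since `table` is a reserved word.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(201311, 10), (201312, 15), (201401, 20), (201402, 5), (201403, 17)],
)

# For each row, sum the up-to-three preceding rows (current row excluded)
# and divide by 3, as in the question's expected output.
rows = conn.execute(
    """
    SELECT id,
           SUM(value) OVER (
               ORDER BY id
               ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING
           ) / 3.0 AS avg_prev3
    FROM monthly
    ORDER BY id
    """
).fetchall()

for row_id, avg_prev3 in rows:
    print(row_id, avg_prev3)
```

The first row has no preceding rows, so its result is NULL (None in Python). Wrap the expression in COALESCE(..., 0) if 0 is preferred.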

JOINING two tables to fetch COUNT from one and SUM from the other

I am working on a project where users work on reports and enter details of their work in database. My database structure has two tables:
tbl_reports - this table contains all details of work performed
report_id user_id date country status
-----------------------------------------------------------------------
0001 abc 2014-05-04 USA checked
0002 abc 2014-05-04 USA checked
0003 abc 2014-05-05 India checked
0004 lmn 2014-05-04 USA checked
0005 lmn 2014-05-04 India checked
0006 xyz 2014-05-06 Taiwan checked
tbl_time - this table contains all details on time reported by the users, date and country wise
id user_id date country time (hrs)
----------------------------------------------------
01 abc 2014-05-04 USA 4
02 abc 2014-05-05 India 2
03 lmn 2014-05-04 USA 3
04 lmn 2014-05-04 India 2
05 opq 2014-05-05 Belgium 4
As you can see, users "abc" and "lmn" have tracked all their tasks appropriately, while user "xyz" has not tracked his time yet and user "opq" has tracked his time but has no records of the reports he has worked on.
Now out of this I want to extract details of this team GROUPING BY "date" and "country" as below:
date country total_report_count total_time_count
-----------------------------------------------------------------------
2014-05-04 India 1 2
2014-05-04 USA 3 7
2014-05-05 Belgium 0 4
2014-05-05 India 1 2
2014-05-06 Taiwan 1 0
This means that, irrespective of which user has tracked reports or time, I need to generate a team report of the work done in each country on each date: the report count and the total time tracked.
Now I was able to find total_time_count report using below code:
CREATE VIEW vw_teamreport AS
SELECT
tb1.date , tb1.country,
SUM(tb1.time) AS total_time_count
FROM tbl_time tb1
LEFT JOIN tbl_reports tb2
ON tb2.report_id IS NULL
GROUP BY tb1.date, tb1.country
ORDER BY tb1.date, tb1.country;
I need help completing this. I am using MySQL, where the FULL JOIN keyword is not supported, in case a full join is required.
Because there's no FULL JOIN you'll need a query to pull out all the distinct date/country combinations from the UNION of these two tables. Or, you'll need some other query to generate the full list of dates and countries. Call this query A.
You need to write two separate aggregating queries. One will aggregate the hours by date and country and the other will aggregate the reports by date and country. Call these queries B and C.
Then you need to do
SELECT whatever, whatever
FROM (
/*query A*/
) AS a
LEFT JOIN (
/*query B*/
) AS b ON a.date=b.date AND a.country=b.country
LEFT JOIN (
/*query C*/
) AS c ON a.date=c.date AND a.country=c.country
This will produce a correctly summarized report with all the rows you need, and NULLs where there is missing summary data.
Edit
Sorry, forgot about the nested query view restriction. You'll need to create four views, one for each subquery and one for the join query. So it will be:
CREATE VIEW dates_countries AS
SELECT DISTINCT `date`, country FROM tbl_time
UNION
SELECT DISTINCT `date`, country FROM tbl_reports;
CREATE VIEW time_totals AS
SELECT `date`, country, SUM(time) AS tot
FROM tbl_time
GROUP BY `date`, country;
CREATE VIEW report_totals AS
SELECT `date`, country, COUNT(*) AS tot
FROM tbl_reports
GROUP BY `date`, country;
And finally this view.
CREATE VIEW team_report AS
SELECT a.`date`, a.country,
c.tot AS total_report_count,
b.tot AS total_time_count
FROM dates_countries AS a
LEFT JOIN time_totals AS b ON a.`date` = b.`date` AND a.country = b.country
LEFT JOIN report_totals AS c ON a.`date` = c.`date` AND a.country = c.country;
You don't have much choice about this when you need a view.
You could divide it into sub-queries, each providing one part of the report data, like this:
CREATE VIEW vw_teamreport AS
SELECT tt.date, tt.country, t1.total_time_count, t2.total_report_count
FROM
(SELECT distinct tb1.date , tb1.country
FROM tbl_time tb1
UNION
SELECT tb1.date , tb1.country
FROM tbl_reports tb1
) tt
LEFT JOIN
(SELECT tb1.date , tb1.country, SUM(tb1.time) AS total_time_count
FROM tbl_time tb1
GROUP BY tb1.date, tb1.country) t1
ON tt.date = t1.date and tt.country = t1.country
LEFT JOIN
(SELECT tb1.date , tb1.country, COUNT(tb1.country) AS total_report_count
FROM tbl_reports tb1
GROUP BY tb1.date, tb1.country)t2
ON tt.date = t2.date and tt.country = t2.country
The first sub-query provides the union of all date and country combinations. The second and third sub-queries provide the report data.
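The union-of-keys plus two aggregates pattern can be checked against the question's sample data. The following is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The aliases `k`, `r` and `t` are made up for this example, and COALESCE is added so that missing counts show as 0, as in the desired output, rather than NULL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_reports (report_id TEXT, user_id TEXT, date TEXT, country TEXT, status TEXT)")
conn.execute("CREATE TABLE tbl_time (id TEXT, user_id TEXT, date TEXT, country TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO tbl_reports VALUES (?, ?, ?, ?, ?)",
    [
        ("0001", "abc", "2014-05-04", "USA", "checked"),
        ("0002", "abc", "2014-05-04", "USA", "checked"),
        ("0003", "abc", "2014-05-05", "India", "checked"),
        ("0004", "lmn", "2014-05-04", "USA", "checked"),
        ("0005", "lmn", "2014-05-04", "India", "checked"),
        ("0006", "xyz", "2014-05-06", "Taiwan", "checked"),
    ],
)
conn.executemany(
    "INSERT INTO tbl_time VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "abc", "2014-05-04", "USA", 4),
        ("02", "abc", "2014-05-05", "India", 2),
        ("03", "lmn", "2014-05-04", "USA", 3),
        ("04", "lmn", "2014-05-04", "India", 2),
        ("05", "opq", "2014-05-05", "Belgium", 4),
    ],
)

# k: every (date, country) seen in either table; r and t: one aggregate each.
report = conn.execute(
    """
    SELECT k.date, k.country,
           COALESCE(r.total_report_count, 0) AS total_report_count,
           COALESCE(t.total_time_count, 0)   AS total_time_count
    FROM (SELECT date, country FROM tbl_time
          UNION
          SELECT date, country FROM tbl_reports) AS k
    LEFT JOIN (SELECT date, country, COUNT(*) AS total_report_count
               FROM tbl_reports GROUP BY date, country) AS r
           ON r.date = k.date AND r.country = k.country
    LEFT JOIN (SELECT date, country, SUM(time) AS total_time_count
               FROM tbl_time GROUP BY date, country) AS t
           ON t.date = k.date AND t.country = k.country
    ORDER BY k.date, k.country
    """
).fetchall()

for row in report:
    print(row)
```

Aggregating each table separately before joining matters here: joining the raw tables first would multiply rows and inflate both the count and the sum.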

Get percentage value of GROUP BY results in MySQL

I'm working with survey data. Essentially, I want to count total responses for a question for a location, then use that total to create what percentage of all responses each particular response is. I then want to group results by location.
An ideal output would be similar to this:
Q1 | State | City | %
yes| MA | bos |10
no | MA | bos |40
m. | MA | bos |50
yes| MA | cam |20
no | MA | cam |20
m. | MA | cam |80
The problem I run into (I believe) is that GROUP BY works before my count statement, so I can't count all the responses. Below is an example of what I have to produce real numbers:
SELECT q1, state, city, COUNT(q1) FROM master GROUP BY state, city, q1
Not all questions have responses, so below is my attempt to get the percentage:
SELECT q1, state, city, count(q1)/(count(nullif(q1,0))) as percent FROM master group by state, city, q1
I believe using WITH or OVER(PARTITION BY...) would be a possible avenue, but I can't get either to work.
I think that the query needs to be phrased in two parts. One to get the count per State+City plus another to get the count per State+City+Q1. You then join these two queries together and do the calculation on the combined results. There might be a more elegant solution than this, but something along these lines perhaps might work. Apologies for any typos!
select t1.q1, t1.state, t1.city, ResponseCount, 100.0 * ResponseCount/CityCount as "%"
from
(select q1, state, city, count(q1) as ResponseCount
from master
group by state, city, q1) t1
join
(select state, city, count(*) as CityCount
from master
group by state, city) t2
on t2.State = t1.State and t2.City = t1.City
Hope this helps.
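Since the question mentions OVER(PARTITION BY ...), here is a hedged sketch of that variant. It assumes MySQL 8.0 or later, where window functions are available, and is run here through Python's sqlite3 module (SQLite 3.25+ accepts the same syntax). The survey rows are invented for illustration and are not the asker's data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 8.0+ (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (q1 TEXT, state TEXT, city TEXT)")

# Hypothetical responses: bos has 1 yes, 4 no, 5 m.; cam has 1 yes, 1 no, 3 m.
sample = (
    [("yes", "MA", "bos")] * 1 + [("no", "MA", "bos")] * 4 + [("m.", "MA", "bos")] * 5
    + [("yes", "MA", "cam")] * 1 + [("no", "MA", "cam")] * 1 + [("m.", "MA", "cam")] * 3
)
conn.executemany("INSERT INTO master VALUES (?, ?, ?)", sample)

# Inner query: responses per state+city+answer.
# Outer query: divide by the total for the same state+city.
percentages = conn.execute(
    """
    SELECT q1, state, city,
           100.0 * n / SUM(n) OVER (PARTITION BY state, city) AS pct
    FROM (SELECT q1, state, city, COUNT(q1) AS n
          FROM master
          GROUP BY state, city, q1) AS counts
    ORDER BY city, q1
    """
).fetchall()

for row in percentages:
    print(row)
```

This computes the same ratio as the two-query join above; the window function only replaces the second aggregate query and the join.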
GROUP BY state, city, COUNT(q1)