I have an SQL table with roughly the following structure:
Employee| date | department | Country | Designation
What I would like is to get results with the following structure:
count_emp_per_department | count_emp_per_country | count_emp_per_designation |
Currently I am using UNION ALL, constructing a query similar to this one:
SELECT country, NULL AS designation, count(1)
FROM employee
GROUP BY country
UNION ALL
SELECT NULL AS country, designation, count(1)
FROM employee
GROUP BY designation
Is this the most effective way to perform multiple aggregations and return all of them in a single result set in Hive?
Kindly share any other approach you know of that could improve performance.
I'm not sure whether this is a real requirement, as the output isn't that useful, but here is the table structure, some sample data, and a query.
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| emp | int | |
| dt | date | |
| dept | string | |
| country | string | |
| desig | string | |
+-----------+------------+----------+
+--------+-------------+---------+------------+----------+
| t.emp | t.dt | t.dept | t.country | t.desig |
+--------+-------------+---------+------------+----------+
| 1 | 2020-02-02 | human | usa | hr |
| 2 | 2020-02-02 | dir | usa | hr |
| 3 | 2020-02-02 | dir | canada | it |
+--------+-------------+---------+------------+----------+
with q1 as (select dept,count(*) as deptcount from t group by dept),
q2 as (select country,count(*) as countrycount from t group by country),
q3 as (select desig,count(*) as desigcount from t group by desig)
select * from q1, q2, q3;
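The output will look like this. Because q1, q2 and q3 are combined without a join condition, the result is their cross product (2 x 2 x 2 = 8 rows):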
+----------+---------------+-------------+------------------+-----------+----------------+
| q1.dept | q1.deptcount | q2.country | q2.countrycount | q3.desig | q3.desigcount |
+----------+---------------+-------------+------------------+-----------+----------------+
| dir | 2 | canada | 1 | hr | 2 |
| dir | 2 | usa | 2 | hr | 2 |
| dir | 2 | canada | 1 | it | 1 |
| dir | 2 | usa | 2 | it | 1 |
| human | 1 | canada | 1 | hr | 2 |
| human | 1 | usa | 2 | hr | 2 |
| human | 1 | canada | 1 | it | 1 |
| human | 1 | usa | 2 | it | 1 |
+----------+---------------+-------------+------------------+-----------+----------------+
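If the cross product is not what you are after, one alternative is a long format with one row per (dimension, value) pair. Below is a minimal sketch of that idea, run through Python's built-in sqlite3 module purely so it is self-contained. SQLite stands in for Hive here, and the aliases dimension, value and cnt are my own. Hive also supports GROUP BY ... GROUPING SETS, which may let you compute the same counts in one statement; it is not shown because SQLite lacks it.

```python
import sqlite3

# In-memory table with the sample rows from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (emp INTEGER, dt TEXT, dept TEXT, country TEXT, desig TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2020-02-02", "human", "usa", "hr"),
        (2, "2020-02-02", "dir", "usa", "hr"),
        (3, "2020-02-02", "dir", "canada", "it"),
    ],
)

# One row per (dimension, value) pair instead of a cross join of the counts.
# The aliases dimension/value/cnt are illustrative names, not from the post.
LONG_FORMAT_COUNTS = """
SELECT 'dept' AS dimension, dept AS value, COUNT(*) AS cnt FROM t GROUP BY dept
UNION ALL
SELECT 'country', country, COUNT(*) FROM t GROUP BY country
UNION ALL
SELECT 'desig', desig, COUNT(*) FROM t GROUP BY desig
ORDER BY 1, 2
"""

counts = conn.execute(LONG_FORMAT_COUNTS).fetchall()
for row in counts:
    print(row)
```

This returns 6 rows for the sample data rather than 8, and each count appears exactly once.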
I am trying to list the unique/distinct MIN(time) for each person in the 'Results table' while joining the 'Athletes table' but I am getting duplicates.
Here is some sample data (I am running MySQL 5.7):
Results Table
+----------+-----------+---------+----------+-------+-------------+------------+
| resultID | athleteID | eventID | ageGroup | time | venue | date |
+----------+-----------+---------+----------+-------+-------------+------------+
| 1 | 10 | 1 | MS | 10.20 | Tokyo | 06-06-2019 |
| 2 | 11 | 1 | MS | 10.24 | London | 03-08-2019 |
| 3 | 10 | 1 | MS | 10.20 | Los Angeles | 01-11-2019 |
| 4 | 13 | 1 | MS | 10.29 | Glasgow | 28-10-2019 |
| 5 | 14 | 1 | MS | 10.32 | Oslo | 16-07-2019 |
| ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
+----------+-----------+---------+----------+-------+-------------+------------+
Athletes Table
+-----------+-----------+----------+--------+-------------+
| athleteID | nameFirst | nameLast | gender | dateOfBirth |
+-----------+-----------+----------+--------+-------------+
| 10 | Bill | Smith | MS | 10-11-2000 |
| 11 | John | Brown | MS | 1-08-1999 |
| 12 | Steve | Jones | MS | 16-01-1997 |
| 13 | Alan | Green | MS | 21-07-2001 |
| 14 | Paul | Black | MS | 27-10-2000 |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
+-----------+-----------+----------+--------+-------------+
I have tried the following code, which appears to bring back the correct result set but returns duplicate rows. Bill Smith ran 10.20 twice, but I only need to show one of those results.
I have tried using DISTINCT in both SELECTs, but with no luck, so this is what I have:
SELECT *
FROM results
INNER JOIN (
SELECT athleteID, nameFirst, nameLast, MIN(time) as minTime
FROM results
INNER JOIN athletes USING(athleteID)
WHERE eventID = '1'
AND ageGroup IN('MS')
AND YEAR(results.date) = '2019'
GROUP BY athleteID
) AS child ON (results.athleteID = child.athleteID) AND (results.time = minTime)
HAVING YEAR(results.date) = '2019'
ORDER BY minTime ASC
I get this result
+-------+-----------+----------+-------------+------------+
| time | nameFirst | nameLast | venue | date |
+-------+-----------+----------+-------------+------------+
| 10.20 | Bill | Smith | Tokyo | 06-06-2019 |
| 10.20 | Bill | Smith | Los Angeles | 01-11-2019 |
| 10.24 | John | Brown | London | 03-08-2019 |
| 10.29 | Steve | Jones | Glasgow | 28-10-2019 |
| 10.32 | Alan | Green | Oslo | 16-07-2019 |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
+-------+-----------+----------+-------------+------------+
As you can see, the additional result for Bill Smith (10.20 - Los Angeles) is also showing up. I need this to be omitted and only show 1 result per athlete - as below.
Desired Result
+-------+-----------+----------+---------+------------+
| time | nameFirst | nameLast | venue | date |
+-------+-----------+----------+---------+------------+
| 10.20 | Bill | Smith | Tokyo | 06-06-2019 |
| 10.24 | John | Brown | London | 03-08-2019 |
| 10.29 | Steve | Jones | Glasgow | 28-10-2019 |
| 10.32 | Alan | Green | Oslo | 16-07-2019 |
+-------+-----------+----------+---------+------------+
Any suggestions as to what I could try?
Many thanks in advance.
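Bill Smith has two results with the same minimum time, so the join on minTime matches both of his rows. To keep one row per athlete, aggregate again in the outer select, for example by taking the earliest date. Note that the name columns come from the subquery (results has no name columns), and that this assumes date is stored as a DATE so that MIN() orders it correctly:
SELECT r.athleteID, child.nameFirst, child.nameLast, MIN(r.date) AS firstDate, child.minTime
FROM results r
INNER JOIN (
SELECT athleteID, nameFirst, nameLast
, MIN(time) AS minTime
FROM results
INNER JOIN athletes USING(athleteID)
WHERE eventID = '1'
AND ageGroup IN('MS')
AND YEAR(results.date) = '2019'
GROUP BY athleteID, nameFirst, nameLast
) AS child ON (r.athleteID = child.athleteID) AND (r.time = child.minTime)
WHERE r.eventID = '1'
AND YEAR(r.date) = '2019'
GROUP BY r.athleteID, child.nameFirst, child.nameLast, child.minTime
ORDER BY child.minTime ASC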
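If you also need the venue of the chosen result, as in the desired output, one option is to keep the row for which no better row exists for the same athlete, where "better" means a faster time, or the same time on an earlier date. A minimal sketch follows, run against SQLite through Python's sqlite3 module so that it is self-contained. It avoids window functions, which MySQL 5.7 does not have. The ISO date strings are my assumption (the question shows dd-mm-yyyy), and the names follow the Athletes table by athleteID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athleteID INTEGER PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE results (resultID INTEGER PRIMARY KEY, athleteID INTEGER, eventID INTEGER,
                      ageGroup TEXT, time REAL, venue TEXT, date TEXT);
INSERT INTO athletes VALUES
  (10, 'Bill', 'Smith'), (11, 'John', 'Brown'), (12, 'Steve', 'Jones'),
  (13, 'Alan', 'Green'), (14, 'Paul', 'Black');
-- Dates stored as ISO strings (an assumption) so that they compare correctly.
INSERT INTO results VALUES
  (1, 10, 1, 'MS', 10.20, 'Tokyo',       '2019-06-06'),
  (2, 11, 1, 'MS', 10.24, 'London',      '2019-08-03'),
  (3, 10, 1, 'MS', 10.20, 'Los Angeles', '2019-11-01'),
  (4, 13, 1, 'MS', 10.29, 'Glasgow',     '2019-10-28'),
  (5, 14, 1, 'MS', 10.32, 'Oslo',        '2019-07-16');
""")

# Keep a result only if the same athlete has no faster result, and no
# equally fast result on an earlier date, within the same filters.
BEST_PER_ATHLETE = """
SELECT r.time, a.nameFirst, a.nameLast, r.venue, r.date
FROM results AS r
INNER JOIN athletes AS a ON a.athleteID = r.athleteID
WHERE r.eventID = 1
  AND r.ageGroup = 'MS'
  AND r.date BETWEEN '2019-01-01' AND '2019-12-31'
  AND NOT EXISTS (
      SELECT 1
      FROM results AS b
      WHERE b.athleteID = r.athleteID
        AND b.eventID = r.eventID
        AND b.ageGroup = r.ageGroup
        AND b.date BETWEEN '2019-01-01' AND '2019-12-31'
        AND (b.time < r.time OR (b.time = r.time AND b.date < r.date))
  )
ORDER BY r.time
"""

best = conn.execute(BEST_PER_ATHLETE).fetchall()
for row in best:
    print(row)
```

With the sample rows this keeps Tokyo for Bill Smith and drops Los Angeles. If two tied results could also share a date, resultID would be needed as a final tie-breaker.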
I'm trying to make a report where I need to know the count of items from another table, like this:
+----------+--------+--------------------------------------------+
| Sale No. | Widget | Total Sold |
+----------+--------+--------------------------------------------+
| 123 | foo | Dcount(another table where widget = "foo") |
| 456 | bar | Dcount(another table where widget = "bar") |
+----------+--------+--------------------------------------------+
SELECT [Sale No.]
, Widget
, DCount("*", "whatever", "widget = '" & [Widget] & "'") AS [Total Sold]
FROM sometable
Unfortunately, this queries the database once for every record, which isn't efficient for a report that must be run daily.
Is there a way to query the other table just once instead of N times, either through VBA or some SQL I don't know, and keep the counts for each unique item in memory?
Here's a table that reflects my data more closely:
+----------+------------+---------+------------------+
| Employee | Department | Policy | Review Requested |
+----------+------------+---------+------------------+
| 123 | Sales | PlanABC | TRUE |
| 456 | Sales | PlanABC | TRUE |
| 789 | Accounting | PlanXYZ | FALSE |
| 101112 | Accounting | PlanXYZ | TRUE |
| 131415 | Sales | PlanXYZ | FALSE |
| 161718 | Admin | PlanJKL | TRUE |
+----------+------------+---------+------------------+
And this is the result I'm going for:
+------------+----------+---------+-----------------------+
| Department | Employee | Policy | Count of All Policies |
+------------+----------+---------+----------------------+
| Sales | 123 | PlanABC | 2 |
| Sales | 456 | PlanABC | 2 |
| Accounting | 101112 | PlanXYZ | 3 |
| Admin | 161718 | PlanJKL | 1 |
+------------+----------+---------+----------------------+
If your table is set up as below:
| Sale No | Widget |
|---------|--------|
| 1 | Foo |
| 2 | Bar |
| 3 | Foo |
| 4 | Foo |
| 5 | Bar |
| 6 | Foo |
| 7 | Foo |
| 8 | Bar |
| 9 | Bar |
| 10 | Foo |
You can't include Sale No in the query, because the rows would then be grouped on it as well and every count would be 1.
SELECT Widget
, COUNT(Widget) AS [Total Sold]
FROM sometable
GROUP BY Widget
Selecting only Widget and grouping on it returns:
| Widget | Total Sold |
|--------|------------|
| Bar | 4 |
| Foo | 6 |
If, on the other hand, your Sale No field is duplicated then you can get a count per Sale No.
| Sale No | Widget |
|---------|--------|
| 1 | Foo |
| 1 | Bar |
| 5 | Foo |
| 5 | Foo |
| 5 | Bar |
| 7 | Foo |
| 7 | Foo |
| 7 | Bar |
| 7 | Bar |
| 10 | Foo |
Here the Sale No is added and the query is grouped by all fields that are not being aggregated.
SELECT [Sale No]
, Widget
, COUNT(Widget) As [Total Sold]
FROM sometable
GROUP BY [Sale No]
, Widget
This would return this table:
| Sale No | Widget | Total Sold |
|---------|--------|------------|
| 1 | Bar | 1 |
| 1 | Foo | 1 |
| 5 | Bar | 1 |
| 5 | Foo | 2 |
| 7 | Bar | 2 |
| 7 | Foo | 2 |
| 10 | Foo | 1 |
Edit:
Based on the provided table this SQL should give the correct result:
SELECT T1.Department
, T1.Employee
, T1.Policy
, COUNT(T2.Policy)
FROM sometable T1 INNER JOIN sometable T2 ON T1.Policy = T2.Policy
GROUP BY T1.Department
, T1.Employee
, T1.Policy
Resulting table:
| Department | Employee | Policy | Expr1003 |
|------------|----------|---------|----------|
| Accounting | 789 | PlanXYZ | 3 |
| Accounting | 101112 | PlanXYZ | 3 |
| Admin | 161718 | PlanJKL | 1 |
| Sales | 123 | PlanABC | 2 |
| Sales | 456 | PlanABC | 2 |
| Sales | 131415 | PlanXYZ | 3 |
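To get the table from the question, which lists only the rows where Review Requested is TRUE while still counting every row for the policy, one option is to count once per policy in a derived table and join back to it. The sketch below runs against SQLite through Python's sqlite3 module rather than Access. The column name ReviewRequested, its 1/0 encoding and the alias PolicyCount are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable ("
    "Employee INTEGER, Department TEXT, Policy TEXT, ReviewRequested INTEGER)"
)
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?, ?)",
    [
        (123, "Sales", "PlanABC", 1),
        (456, "Sales", "PlanABC", 1),
        (789, "Accounting", "PlanXYZ", 0),
        (101112, "Accounting", "PlanXYZ", 1),
        (131415, "Sales", "PlanXYZ", 0),
        (161718, "Admin", "PlanJKL", 1),
    ],
)

# The derived table counts each policy once; the outer query then filters
# the rows to show without affecting those counts.
COUNT_PER_POLICY = """
SELECT t.Department, t.Employee, t.Policy, c.PolicyCount
FROM sometable AS t
INNER JOIN (
    SELECT Policy, COUNT(*) AS PolicyCount
    FROM sometable
    GROUP BY Policy
) AS c ON c.Policy = t.Policy
WHERE t.ReviewRequested = 1
ORDER BY t.Employee
"""

report = conn.execute(COUNT_PER_POLICY).fetchall()
for row in report:
    print(row)
```

The other table is aggregated a single time, instead of once per output row as with DCount.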
I have the following tables:
clients:
| id | name | code | zone |
--------------------------------
| 1 | client 1 | a1b1 | zone1|
| 2 | client 2 | a2b2 | zone2|
contacts:
| id_contact | first_name | last_name |
----------------------------------------
| 11 | first1 | last1 |
| 22 | first2 | last2 |
| 33 | first3 | last3 |
| 44 | first4 | last4 |
client_contacts:
| id_client | id_contact |
--------------------------
| 1 | 11 |
| 1 | 22 |
| 1 | 33 |
| 2 | 11 |
| 2 | 44 |
offers:
| id_offer | id_client | value |
--------------------------
| 111 | 1 | 100 |
| 222 | 1 | 200 |
| 333 | 1 | 300 |
| 444 | 2 | 400 |
I would like to obtain the following with a single, efficient SELECT:
| id_client | name | code | zone | contacts_pers | total_offer_value |
--------------------------------------------------------------------------------------------------
| 1 | client 1 | a1b1 | zone1 | first1 last1; first2 last2; first3 last3; | 600 |
| 2 | client 2 | a2b2 | zone2 | first1 last1; first4 last4; | 400 |
I know how to get the desired result using GROUP_CONCAT plus a stored function for total_offer_value, as in the query below. But how can I get it from a single efficient SELECT?
SELECT c.id, c.name, c.code, c.zone,
GROUP_CONCAT(DISTINCT CONCAT(co.first_name, " ", co.last_name) SEPARATOR ";") AS contact_pers,
func_total_offer_value(c.id) AS total_offer_value
FROM clients c
LEFT OUTER JOIN (client_contacts cc, contacts co) ON ( c.id = cc.id_client AND cc.id_contact = co.id_contact )
GROUP BY c.id
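One way to drop the stored function is to aggregate each child table in its own derived table before joining it to clients. Joining contacts and offers to clients directly would pair every contact with every offer (3 x 3 rows for client 1), which inflates the sum. Below is a minimal sketch run against SQLite through Python's sqlite3 module. SQLite's group_concat(expr, separator) stands in for MySQL's GROUP_CONCAT ... SEPARATOR, and the derived-table aliases ct and o are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, code TEXT, zone TEXT);
CREATE TABLE contacts (id_contact INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE client_contacts (id_client INTEGER, id_contact INTEGER);
CREATE TABLE offers (id_offer INTEGER PRIMARY KEY, id_client INTEGER, value INTEGER);
INSERT INTO clients VALUES (1, 'client 1', 'a1b1', 'zone1'), (2, 'client 2', 'a2b2', 'zone2');
INSERT INTO contacts VALUES (11, 'first1', 'last1'), (22, 'first2', 'last2'),
                            (33, 'first3', 'last3'), (44, 'first4', 'last4');
INSERT INTO client_contacts VALUES (1, 11), (1, 22), (1, 33), (2, 11), (2, 44);
INSERT INTO offers VALUES (111, 1, 100), (222, 1, 200), (333, 1, 300), (444, 2, 400);
""")

# Each child table is reduced to one row per client before the join, so
# contacts and offers cannot multiply each other's rows.
CLIENT_SUMMARY = """
SELECT c.id, c.name, c.code, c.zone,
       ct.contacts_pers,
       COALESCE(o.total_offer_value, 0) AS total_offer_value
FROM clients AS c
LEFT JOIN (
    SELECT cc.id_client,
           group_concat(co.first_name || ' ' || co.last_name, '; ') AS contacts_pers
    FROM client_contacts AS cc
    INNER JOIN contacts AS co ON co.id_contact = cc.id_contact
    GROUP BY cc.id_client
) AS ct ON ct.id_client = c.id
LEFT JOIN (
    SELECT id_client, SUM(value) AS total_offer_value
    FROM offers
    GROUP BY id_client
) AS o ON o.id_client = c.id
ORDER BY c.id
"""

summary = conn.execute(CLIENT_SUMMARY).fetchall()
for row in summary:
    print(row)
```

The order of names inside contacts_pers is not guaranteed here. In MySQL you could add ORDER BY inside GROUP_CONCAT if the order matters.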
I have the following table:
TABLE sales
| id | name | date | amount |
|----|------|------------|--------|
| 1 | Mike | 2016-12-05 | 67.15 |
| 2 | Mike | 2016-12-09 | 98.24 |
| 3 | John | 2016-12-12 | 12.98 |
| 4 | Mike | 2016-12-19 | 78.48 |
| 5 | Will | 2016-12-19 | 175.26 |
| 6 | John | 2016-12-22 | 14.26 |
| 7 | John | 2016-12-23 | 13.48 |
I am trying to create a view that groups by the name column and returns only the most recent amount for each name. It should look like this:
TABLE sales_view
| id | name | date | amount |
|----|------|------------|--------|
| 4 | Mike | 2016-12-19 | 78.48 |
| 5 | Will | 2016-12-19 | 175.26 |
| 7 | John | 2016-12-23 | 13.48 |
I'm not sure how to go about making this. I imagine I would need subqueries, but I know that SQL gets mad if you try to use them inside views.
You can compare the (name, date) tuple against a subquery that returns max(date) per name:
select * from sales
where (name, date) in ( select name, max(date)
from sales
group by name)
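Here is a minimal sketch that wraps the query in a view, run against SQLite through Python's sqlite3 module so that it is self-contained. The view name sales_view comes from the question. Whether your database accepts this subquery inside a view is something to check, since the question does not say which engine is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, date TEXT, amount REAL);
INSERT INTO sales VALUES
  (1, 'Mike', '2016-12-05', 67.15),
  (2, 'Mike', '2016-12-09', 98.24),
  (3, 'John', '2016-12-12', 12.98),
  (4, 'Mike', '2016-12-19', 78.48),
  (5, 'Will', '2016-12-19', 175.26),
  (6, 'John', '2016-12-22', 14.26),
  (7, 'John', '2016-12-23', 13.48);

-- The subquery sits in the WHERE clause of the view, not in its FROM clause.
CREATE VIEW sales_view AS
SELECT *
FROM sales
WHERE (name, date) IN (SELECT name, MAX(date) FROM sales GROUP BY name);
""")

latest = conn.execute(
    "SELECT id, name, date, amount FROM sales_view ORDER BY id"
).fetchall()
for row in latest:
    print(row)
```

If a name has two sales on its latest date, both rows come back, so a tie-breaker such as the highest id would be needed in that case.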
How to get count of combinations from database?
I have two database tables and want to get the count of each combination of passengers per driver. Does anybody know how to put this into a single database query, so that I don't need a separate db request for each trip?
Trips
| ID | Driver | Date |
|----|--------|------------|
| 1 | A | 2015-12-15 |
| 2 | A | 2015-12-16 |
| 3 | B | 2015-12-17 |
| 4 | A | 2015-12-18 |
| 5 | A | 2015-12-19 |
Passengers
| ID | PassengerID | TripID |
|----|-------------|--------|
| 1 | B | 1 |
| 2 | C | 1 |
| 3 | D | 1 |
| 4 | B | 2 |
| 5 | D | 2 |
| 6 | A | 3 |
| 7 | B | 4 |
| 8 | D | 4 |
| 9 | B | 5 |
| 10 | C | 5 |
Expected result
| Driver | B-C-D | B-D | A | B-C |
|--------|-------|-----|---|-----|
| A | 1 | 2 | - | 1 |
| B | - | - | 1 | - |
Alternative
| Driver | Passengers | Count |
|--------|------------|-------|
| A | B-C-D | 1 |
| A | B-D | 2 |
| A | B-C | 1 |
| B | A | 1 |
Does anybody have an idea?
Thanks a lot!
Try this:
SELECT Driver, Passengers, COUNT(*) AS `Count`
FROM (
SELECT t.ID, t.Driver,
GROUP_CONCAT(p.PassengerID
ORDER BY p.PassengerID
SEPARATOR '-') AS Passengers
FROM Trips AS t
INNER JOIN Passengers AS p ON t.ID = p.TripID
GROUP BY t.ID, t.Driver) AS t
GROUP BY Driver, Passengers
The above query will produce the alternative result set. The other result set, with one column per combination, can only be achieved using dynamic SQL, because the set of columns depends on the data.
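If dynamic SQL is not an option, the pivoted layout can also be built in client code from a single query. Below is a minimal sketch in Python with the built-in sqlite3 module. It fetches the trip/passenger pairs once, builds the combination string per trip, and pivots the counts into a nested dict. The names combo_counts and pivot are my own, and 0 is used where the question shows "-".

```python
import sqlite3
from collections import Counter
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Trips (ID INTEGER PRIMARY KEY, Driver TEXT, Date TEXT);
CREATE TABLE Passengers (ID INTEGER PRIMARY KEY, PassengerID TEXT, TripID INTEGER);
INSERT INTO Trips VALUES
  (1, 'A', '2015-12-15'), (2, 'A', '2015-12-16'), (3, 'B', '2015-12-17'),
  (4, 'A', '2015-12-18'), (5, 'A', '2015-12-19');
INSERT INTO Passengers VALUES
  (1, 'B', 1), (2, 'C', 1), (3, 'D', 1), (4, 'B', 2), (5, 'D', 2),
  (6, 'A', 3), (7, 'B', 4), (8, 'D', 4), (9, 'B', 5), (10, 'C', 5);
""")

# A single request: one row per (trip, passenger), sorted so that each trip's
# passengers are adjacent and in alphabetical order.
pairs = conn.execute("""
SELECT t.ID, t.Driver, p.PassengerID
FROM Trips AS t
INNER JOIN Passengers AS p ON t.ID = p.TripID
ORDER BY t.ID, p.PassengerID
""").fetchall()

# Count how often each driver carried each passenger combination.
combo_counts = Counter()
for (_trip_id, driver), group in groupby(pairs, key=lambda r: (r[0], r[1])):
    combo = "-".join(passenger for _, _, passenger in group)
    combo_counts[(driver, combo)] += 1

# Pivot: one entry per driver, one key per combination seen in the data.
combos = sorted({combo for _, combo in combo_counts})
drivers = sorted({driver for driver, _ in combo_counts})
pivot = {
    driver: {combo: combo_counts.get((driver, combo), 0) for combo in combos}
    for driver in drivers
}

for driver, row in pivot.items():
    print(driver, row)
```

The grouping could equally stay in SQL, as in the answer above, with only the pivot step done in code.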