Poor UNION ALL performance in MySQL

I have a database with rows like the following:
+------------+---------+------------+-------+
| continent | country | city | value |
+------------+---------+------------+-------+
| Asia | China | Beijing | 3 |
| ... | ... | ... | ... |
| N. America | USA | D.C | 7 |
| .... | .... | .... | .... |
In order to generate a treemap visualization, I need to work this into a table with the following shape:
+-----+------------+-------+
| uid | parent-uid | value |
+-----+------------+-------+
In this case, Asia is the "parent" for China, which is the "parent" for Beijing. So for those three you'd have something like:
+---------+--------+-----+
| Beijing | China | 3 |
| China | Asia | ... |
| Asia | global | ... |
+---------+--------+-----+
The "value" for China needs to be an aggregate of all child values. Similarly the value of Asia needs to be an aggregate of all child values.
To accomplish this purely in SQL I created the following three queries and combined them with UNION ALL:
# City-level:
SELECT
CONCAT(continent, "-", country, "-", city) as uid,
CONCAT(continent, "-", country) as parentuid,
value
FROM
table
UNION ALL
# Country-level
SELECT
CONCAT(continent, "-", country) as uid,
continent as parentuid,
SUM(value) as value
FROM
table
GROUP BY
country
UNION ALL
# Continent-level
SELECT
continent as uid,
"global" as parentuid,
SUM(value) as value
FROM
table
GROUP BY
continent
Each of the individual queries completes in milliseconds: the city-level, country-level, and continent-level queries all return results in < 0.01 seconds.
When I union them all together, it suddenly takes 8 seconds to get results!
I've tried Googling the issue, but everything just says "Use UNION ALL instead of UNION" (I already am).
I considered that it might not have enough RAM to build the temporary results table, so it's thrashing the disk, but I don't know how to increase the memory limit. I tried bumping innodb_buffer_pool_size to 1GB (1073741824), but it didn't help.

The first SELECT returns every row in the table. Getting the first rows back is very fast, but fetching all of them takes much longer (MySQL Workbench appends LIMIT 1000 to the end of the query by default).
To test whether fetching all rows is what takes the time, try the following query and tell us how long it takes:
select * from (
SELECT
CONCAT(continent, "-", country, "-", city) as uid,
CONCAT(continent, "-", country) as parentuid,
value
FROM
table
) t1;
If it takes almost 8 seconds, then the UNION ALL is not the problem. To improve performance, you must limit the number of rows with a WHERE clause.
I hope this helps.

I guess my question is: what's wrong with WITH ROLLUP?
SELECT
CONCAT_WS('-',continent,country,city) as uid,
CONCAT_WS('-',continent,COALESCE(country,'global')) as parentuid,
value
FROM (
SELECT continent, country, city, SUM(value) as value
FROM table
GROUP BY continent, country, city WITH ROLLUP
) t1
WHERE t1.continent IS NOT NULL;
I may not have the CONCAT_WS() calls correct, especially if you have cities or countries named '', but I have to think this would be faster. The WHERE clause is just there to remove the overall summary.
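As the caveat above suggests, the parentuid expression is the fragile part: on the rollup rows, where city or country is NULL, it yields 'Asia-China' for the China row and 'Asia-global' for the Asia row. Below is a minimal sketch of one way to spell out the three levels with CASE. It assumes that none of the underlying continent, country, or city values are themselves NULL, and it uses geo_values as a hypothetical stand-in for the real table name.

```sql
-- Sketch only: geo_values is a placeholder table name, and the source
-- columns are assumed to contain no NULLs of their own.
SELECT
  CONCAT_WS('-', continent, country, city) AS uid,   -- CONCAT_WS skips NULL arguments
  CASE
    WHEN country IS NULL THEN 'global'               -- continent-level rollup row
    WHEN city IS NULL THEN continent                 -- country-level rollup row
    ELSE CONCAT_WS('-', continent, country)          -- city-level row
  END AS parentuid,
  value
FROM (
  SELECT continent, country, city, SUM(value) AS value
  FROM geo_values
  GROUP BY continent, country, city WITH ROLLUP
) t1
WHERE t1.continent IS NOT NULL;                      -- drop the grand-total row
```

With the sample rows from the question, this should give Beijing the parent 'Asia-China', China the parent 'Asia', and Asia the parent 'global'.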
Here's the example for WITH ROLLUP from the MySQL doc to help explain what it does:
mysql> SELECT year, country, product, SUM(profit)
-> FROM sales
-> GROUP BY year, country, product WITH ROLLUP;
+------+---------+------------+-------------+
| year | country | product | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2000 | Finland | NULL | 1600 |
| 2000 | India | Calculator | 150 |
| 2000 | India | Computer | 1200 |
| 2000 | India | NULL | 1350 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2000 | USA | NULL | 1575 |
| 2000 | NULL | NULL | 4525 |
| 2001 | Finland | Phone | 10 |
| 2001 | Finland | NULL | 10 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 2700 |
| 2001 | USA | TV | 250 |
| 2001 | USA | NULL | 3000 |
| 2001 | NULL | NULL | 3010 |
| NULL | NULL | NULL | 7535 |
+------+---------+------------+-------------+

Related

Counting "subcolumns" with mySQL

Let me make myself clear. I'm studying MySQL and practicing the COUNT() function. I have a table called "City" with the columns ID, name, CountryCode, district, and Population. My first idea was to find out how many cities I have per country:
SELECT *, Count(name) as "total" FROM world.city GROUP BY countrycode;
It worked: an extra column was created with the number of cities in each country. I would now like to know how many countries I have by counting the number of distinct rows (I know that I have this information at the bottom of Workbench, but I would like it to appear in my query). I tried adding COUNT(CountryCode), but it didn't work as I expected: the number 4079 appeared, which is the total number of cities that I have. I figured out that my COUNT() is calculating the number of rows inside each country, not counting the number of distinct country codes. Is it possible to get this information?
(A mini-lesson for a Novice.)
The first thing to learn is that COUNT(*) is the usual way to use COUNT. And you get the number of rows. In contrast, COUNT(name) counts the number of rows with non-NULL name values.
Then comes the way to use DISTINCT. It is not a function. So COUNT(DISTINCT a,b) counts the number of different combinations of a and b. And COUNT(DISTINCT(a)) though it works 'fine' and 'correctly', the parens are redundant. So use COUNT(DISTINCT a).
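Applied to the question as asked, the DISTINCT form could look like the sketch below. It assumes the world.city table and countrycode column named in the question, and it returns a single number rather than one row per country.

```sql
-- Counts each country code once, however many cities share it
SELECT COUNT(DISTINCT countrycode) AS countries
FROM world.city;
```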
Don't use * with GROUP BY. That is, SELECT *, ... GROUP BY ... is improper. The usual way to say something like your query is
SELECT countrycode, COUNT(*) AS "total"
FROM world.city
GROUP BY countrycode;
For provinces in Canada (which I happen to have a table of):
SELECT province, COUNT(*) AS "total" FROM world.canada GROUP BY province;
+---------------------------+-------+
| province | total |
+---------------------------+-------+
| Alberta | 573 |
| British Columbia | 716 |
| Manitoba | 299 |
| New Brunswick | 210 |
| Newfoundland and Labrador | 474 |
| Northwest Territories | 94 |
| Nova Scotia | 331 |
| Nunavut | 107 |
| Ontario | 891 |
| Prince Edward Island | 57 |
| Quebec | 1045 |
| Saskatchewan | 573 |
| Yukon | 114 |
+---------------------------+-------+
Note that a few cities show up in multiple provinces:
SELECT COUNT(DISTINCT city), COUNT(*) FROM world.canada;
+----------------------+----------+
| COUNT(DISTINCT city) | COUNT(*) |
+----------------------+----------+
| 5248 | 5484 |
+----------------------+----------+
Munch on this; there are some more lessons to learn:
SELECT city, COUNT(*) AS ct, GROUP_CONCAT(DISTINCT state)
FROM world.us
GROUP BY city
ORDER BY COUNT(*) DESC
LIMIT 11;
+-------------+----+----------------------------------+
| city | ct | GROUP_CONCAT(DISTINCT state) |
+-------------+----+----------------------------------+
| Springfield | 11 | FL,IL,MA,MO,NJ,OH,OR,PA,TN,VA,VT |
| Clinton | 10 | CT,IA,MA,MD,MO,MS,OK,SC,TN,UT |
| Madison | 8 | AL,CT,IN,ME,MS,NJ,SD,WI |
| Lebanon | 8 | IN,ME,MO,NH,OH,OR,PA,TN |
| Auburn | 7 | AL,CA,IN,ME,NH,NY,WA |
| Burlington | 7 | IA,MA,NC,NJ,VT,WA,WI |
| Washington | 7 | DC,IL,IN,MO,NC,PA,UT |
| Farmington | 7 | ME,MI,MN,MO,NH,NM,UT |
| Canton | 6 | GA,IL,MA,MI,MS,OH |
| Monroe | 6 | GA,LA,MI,NC,WA,WI |
| Lancaster | 6 | CA,NY,OH,PA,SC,TX |
+-------------+----+----------------------------------+
As for the number of cities in a country, that belongs in the table Countries, not in the table Cities. Then use a JOIN when you want to put them together.
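One possible reading of that advice, sketched under the assumption that no per-country table with a stored count exists yet: compute the counts in a derived table and JOIN them back to the city rows. The alias names c and t are illustrative only.

```sql
-- Each city row plus the number of cities in its country
SELECT c.*, t.total
FROM world.city AS c
JOIN (
  SELECT countrycode, COUNT(*) AS total
  FROM world.city
  GROUP BY countrycode
) AS t ON t.countrycode = c.countrycode;
```

This keeps the per-country aggregate separate from the per-city rows, which avoids mixing SELECT * with GROUP BY.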

MySQL find averages based on multiple factors

I have table that does something like this
+--------------------------+--------+------+---------+
| | City | Year | Density |
+--------------------------+--------+------+---------+
| Project 1 | City A | 2008 | 500 |
+--------------------------+--------+------+---------+
| Project 2 | City B | 2012 | 800 |
+--------------------------+--------+------+---------+
| Project 3 | City C | 2012 | 400 |
+--------------------------+--------+------+---------+
| Project 4 | City A | 2008 | 600 |
+--------------------------+--------+------+---------+
| Project 5 | City C | 2013 | 700 |
+--------------------------+--------+------+---------+
| etc (c. 30,000 projects spread across 30 cities) |
+--------------------------+--------+------+---------+
(About 30,000 projects spread across 30 cities.)
I can write a query like:
SELECT Year, AVG(`Density`) as Density FROM table where City='A' GROUP BY Year
Which works fine for one city. Could anyone point me in the right direction as to how I write a single query that would calculate the average by year for each city? I’d anticipate a results table that looked something like this:
+------+--------+--------+--------+-------------+
| | City A | City B | City C | City D, etc |
+------+--------+--------+--------+-------------+
| 2005 | | | | |
+------+--------+--------+--------+-------------+
| 2006 | | | | |
+------+--------+--------+--------+-------------+
| 2008 | | | | |
+------+--------+--------+--------+-------------+
| 2009 | | | | |
+------+--------+--------+--------+-------------+
| 2010 | | | | |
+------+--------+--------+--------+-------------+
| etc | | | | |
+------+--------+--------+--------+-------------+
I have tried to use a subquery in the where clause (where in (select distinct City)) but that did not behave as I expected.
Or do I just have to do a separate line for each of the 30 cities by hand?
I am no expert with MySQL and can't see conceptually what I need to do. If anyone could give me any pointers I would be very grateful. Thanks.
You can group by multiple columns:
SELECT city, year, AVG(density) AS density
FROM table
GROUP BY city, year
This will return a separate row for each city/year combination. To get cities as columns, you'll need to pivot it. See MySQL pivot table
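A common way to do that pivot in MySQL is conditional aggregation, with one AVG(CASE ...) column per city. The sketch below assumes a hypothetical table name projects and the city labels shown in the question; each of the 30 cities would need its own column, written by hand or generated by a script.

```sql
-- AVG ignores NULLs, so each column averages only the rows for its own city
SELECT
  year,
  AVG(CASE WHEN city = 'City A' THEN density END) AS city_a,
  AVG(CASE WHEN city = 'City B' THEN density END) AS city_b,
  AVG(CASE WHEN city = 'City C' THEN density END) AS city_c
FROM projects
GROUP BY year
ORDER BY year;
```

A year in which a city has no projects shows NULL in that city's column.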

Conditionals and Aggregates across Multiple Tables

I have table that looks like the following:
`units`
+----+------+-------+---------------+-------+
| id | tech | jobID | city | units |
+----+------+-------+---------------+-------+
| 1 | 1234 | 8535 | San Jose | 3 |
| 2 | 1234 | 8253 | San Francisco | 4 |
| 3 | 1234 | 2457 | San Francisco | 5 |
| 4 | 1234 | 8351 | Mountain View | 8 |
+----+------+-------+---------------+-------+
and a view that uses this data to do some computations:
`total`
+----+--------+------+-------+
| id | name | tech | total |
+----+--------+------+-------+
| 1 | Dan | 1234 | 12 |
| 2 | Dan SF | 1234 | 12 |
+----+--------+------+-------+ ...
My problem is that I am trying to sum up the amount of units Dan completed in San Francisco and the amount of units he did elsewhere (need to specifically track how many units were completed in SF). However, I'm unsure of how to do this within my select query and if you look at my current total table, you'll see that both total values are simply summing all of the units regardless of city.
I want to get the following:
`total`
+----+--------+------+-------+
| id | name | tech | total |
+----+--------+------+-------+
| 1 | Dan | 1234 | 11 |
| 2 | Dan SF | 1234 | 9 |
+----+--------+------+-------+ ...
I need help writing my SELECT because I'm unsure of how to use CASE to get the desired result. I've tried the following:
SELECT otherTable.name AS name, units.tech AS tech,
(CASE WHEN City = 'SAN FRANCISCO' THEN SUM(units)
ELSE SUM(units)
) AS total
FROM units, otherTable
GROUP BY name
but clearly this won't work since I'm not differentiating between cities in the two aggregates.
Any help is greatly appreciated.
EDIT: The SELECT query for my current view (with join info) is as follows:
SELECT otherTable.name, units.tech, SUM(units.units)
FROM units
LEFT JOIN otherTable ON otherTable.tech = units.tech
GROUP BY name
As for otherTable, it simply associates each tech ID with a name:
`otherTable`
+----+--------+------+-----------+
| id | name | tech | otherInfo |
+----+--------+------+-----------+
| 1 | Dan | 1234 | ...... |
+----+--------+------+-----------+
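First off, it appears that your base query is wrong: your first attempt has no join condition between units and otherTable, so I have used the one from the query in your edit.
It seems strange to me that you would want it broken out into rows instead of columns, but you could do the following:
SELECT otherTable.name AS name, units.tech AS tech,
SUM(units.units) AS total
FROM units
LEFT JOIN otherTable ON otherTable.tech = units.tech
-- San Francisco is excluded here so that the totals match the desired output (11 and 9)
WHERE units.city <> 'San Francisco'
GROUP BY otherTable.name, units.tech
UNION ALL
-- MySQL treats || as logical OR by default, so CONCAT() builds the name
SELECT CONCAT(otherTable.name, ' SF') AS name, units.tech AS tech,
SUM(units.units) AS total
FROM units
LEFT JOIN otherTable ON otherTable.tech = units.tech
WHERE units.city = 'San Francisco'
GROUP BY otherTable.name, units.tech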
This would give you
+--------+------+-------+
| name | tech | total |
+--------+------+-------+
| Dan | 1234 | 11 |
| Dan SF | 1234 | 9 |
+--------+------+-------+
Or if you want separate columns, you could do this
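SELECT otherTable.name AS name, units.tech AS tech,
SUM(CASE WHEN units.city <> 'San Francisco' THEN units.units ELSE 0 END) AS total,
SUM(CASE WHEN units.city = 'San Francisco' THEN units.units ELSE 0 END) AS sf_total
FROM units
-- join condition taken from the query in the question's EDIT
LEFT JOIN otherTable ON otherTable.tech = units.tech
GROUP BY otherTable.name, units.tech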
This would give you
+--------+------+-------+----------+
| name | tech | total | sf_total |
+--------+------+-------+----------+
| Dan | 1234 | 11 | 9 |
+--------+------+-------+----------+

Mysql query - Count items grouping by year and including "sub-counts"

I have a table "events" like this
id | user_id | date | is_important
---------------------------------------------------
1 | 3 | 01/02/2012 | 0
1 | 3 | 01/02/2012 | 1
1 | 3 | 01/02/2011 | 1
1 | 3 | 01/02/2011 | 1
1 | 3 | 01/02/2011 | 0
Basically, what I need to get is this:
(for the user_id=3)
year | count | count_importants
--------------------------------------------
2012 | 2 | 1
2011 | 3 | 2
I've tried this:
SELECT YEAR(e1.date) as year,COUNT(e1.id) as count_total, aux.count_importants
FROM events e1
LEFT JOIN
(
SELECT YEAR(e2.date) as year2,COUNT(e2.id) as count_importants
FROM `events` e2
WHERE e2.user_id=18
AND e2.is_important = 1
GROUP BY year2
) AS aux ON aux.year2 = e1.year
WHERE e1.user_id=18
GROUP BY year
But mysql gives me an error
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'aux ON aux.year2 = e1.year WHERE e1.user_id=18 GROUP BY year LIMIT 0, 30' at line 10
And I've run out of ideas for making this query work. Is it possible to do this using only one query?
Thanks in advance
Edit: I think I over-complicated things. Can't you just do this in a simple query?
SELECT
YEAR(`date`) AS `year`,
COUNT(`id`) AS `count`,
SUM(`is_important`) AS `count_importants`
FROM `events`
WHERE user_id = 18
GROUP BY YEAR(`date`)
Here's the big solution that adds summaries :)
Consider using MySQL's GROUP BY ... WITH ROLLUP. It does much the same job as a normal GROUP BY, but adds rows for the summaries too.
In the example below, you see two records for Finland in 2000, for £1500 and £100, and then a row with a NULL product and the combined value of £1600. It also adds NULL rollup rows for each dimension grouped by.
From the manual:
SELECT year, country, product, SUM(profit)
FROM sales
GROUP BY year, country, product WITH ROLLUP
+------+---------+------------+-------------+
| year | country | product | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2000 | Finland | NULL | 1600 |
| 2000 | India | Calculator | 150 |
| 2000 | India | Computer | 1200 |
| 2000 | India | NULL | 1350 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2000 | USA | NULL | 1575 |
| 2000 | NULL | NULL | 4525 |
| 2001 | Finland | Phone | 10 |
| 2001 | Finland | NULL | 10 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 2700 |
| 2001 | USA | TV | 250 |
| 2001 | USA | NULL | 3000 |
| 2001 | NULL | NULL | 3010 |
| NULL | NULL | NULL | 7535 |
+------+---------+------------+-------------+
Here's an example that specifically matches your situation:
SELECT year(`date`) AS `year`, COUNT(`id`) AS `count`, SUM(`is_important`) AS `count_importants`
FROM new_table
GROUP BY year(`date`) WITH ROLLUP;
The alias year (defined as YEAR(e1.date) AS year) is not visible in the JOIN's ON clause. Use this condition instead:
...
LEFT JOIN
(
...
) AS aux ON aux.year2 = YEAR(e1.date) -- was: e1.year
...

Top 3 countries

My question is surely banal, but I can't work out an SQL query that lists the top 3 countries for a sports-event summary table.
To explain it better: in a sports event I have a lot of athletes from different countries, and I need to produce a summary table showing the countries that won the most medals.
Here is an example:
--------------------------------------------
|id | name | activity | country |
--------------------------------------------
| 1 | John | 100m | USA |
| 2 | Andy | 200m | CANADA |
| 3 | Frank | 400m | USA |
| 4 | Ian | 400m | GERMANY |
| 5 | Anthony | 100m | USA |
| 6 | Eric | 400m | CANADA |
| 7 | Mike | 200m | UK |
| 8 | Dave | 200m | GERMANY |
| 9 | Richard | 100m | USA |
| 10| Max | 100m | USA |
| 11| Randy | 100m | USA |
| 12| Maurice | 400m | CANADA |
| 13| Col | 100m | UK |
| 14| Jim | 400m | USA |
| 15| Adam | 200m | BRAZIL |
| 16| Ricky | 100m | UK |
| 17| Emily | 400m | USA |
| 18| Serge | 200m | UK |
| 19| Alex | 400m | FRANCE |
| 20| Enamuel | 100m | USA |
--------------------------------------------
The summary table i wish to obtain is the following:
Top 3 countries
--------------------------------------
| position | country | medals |
--------------------------------------
| 1 | USA | 9 |
| 2 | UK | 4 |
| 3 | CANADA | 3 |
--------------------------------------
How can I build the SQL query?
Thanks in advance for your kind answer.
Mattew
Without the position column, this is quite easy. Just do the following
SELECT Country,COUNT(*) AS medals
FROM MyTable
GROUP BY Country
ORDER BY COUNT(*) DESC
LIMIT 3;
There is some more complicated code for getting the "position" column out, but unless you need it, it probably isn't necessary, and you can just get those numbers using a counter on the processing code. If you're interested, the code would be something like this.
SELECT @rownum:=@rownum+1 AS Position,Country,Medals FROM
(
SELECT Country,COUNT(*) AS medals
FROM Medals
GROUP BY Country
ORDER BY COUNT(*) DESC
LIMIT 3
) AS Stats, (SELECT @rownum:=0) RowNum;
The above query has been tested and appears to be working as you need it to be.
CREATE TABLE IF NOT EXISTS top_three_countries
(position INT NOT NULL AUTO_INCREMENT PRIMARY KEY, country VARCHAR(30), medals INT);
TRUNCATE TABLE top_three_countries;
INSERT INTO top_three_countries (country, medals)
SELECT country, count(*) total
FROM medal
GROUP BY country
ORDER BY total DESC
LIMIT 3;
This will produce a summary table (top_three_countries) as you describe.
It's much simpler without the rank, if you can add that in your program's logic instead:
SELECT `country`, COUNT(*) total
FROM medal
GROUP BY country
ORDER BY total DESC
LIMIT 3
Looks like Kibbee beat me to it, but for a guaranteed GROUP BY compatible query you can wrap the above in a SELECT of its own:
SELECT @n:=@n+1 AS `rank`, country, total
FROM
(
SELECT `country`, COUNT(*) total
FROM medal
GROUP BY country
ORDER BY total DESC
LIMIT 3
) t1,
(SELECT @n:=0) t2
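On MySQL 8.0 or later, a window function can produce the position column without user variables. This is a minimal sketch under that version assumption, reusing the medal table name from the answers above. The pos alias and the tie-break on country are illustrative choices, not something the question specifies.

```sql
-- Requires MySQL 8.0+ (window functions)
SELECT
  ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, country) AS pos,  -- country breaks ties deterministically
  country,
  COUNT(*) AS medals
FROM medal
GROUP BY country
ORDER BY pos
LIMIT 3;
```

If tied countries should share a position, RANK() or DENSE_RANK() could be swapped in for ROW_NUMBER().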