I did a query (pretty simple, with 4 columns x 100 000 rows) and there are redundancies (rows appear several times).
I'd like to add a column that counts the number of times each row appears, and then remove the duplicates. That way I don't lose information, thanks to the "count" column, and at the same time I lighten the table by deleting duplicate rows.
For instance, if I have this table:
name | country | price
table | France | 12
desk | Italy | 8
table | France | 12
desk | Italy | 8
desk | Italy | 14
desk | Italy | 8
And the output should be like:
name | country | price | count
table | France | 12 | 2
desk | Italy | 8 | 3
desk | Italy | 14 | 1
To count how many times a record appears in a table, you can use the GROUP BY clause. COUNT(*) then returns the number of rows in each group, so the result depends on which columns you group by.
For instance, if you GROUP BY country, it will count how many records were found for each country.
Example: SELECT country, COUNT(*) FROM tablename GROUP BY country;
You can read up on GROUP BY here
This query should work for you:
select `name`, `country`, `price`, count(*) from MyTable
group by `name`, `country`, `price`
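To see the grouping query above produce the expected output, here is a minimal sketch that loads the sample rows from the question and runs the same GROUP BY. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, the question does not name a client), and the column alias occurrences and the ORDER BY are only there for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (name TEXT, country TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("table", "France", 12),
        ("desk", "Italy", 8),
        ("table", "France", 12),
        ("desk", "Italy", 8),
        ("desk", "Italy", 14),
        ("desk", "Italy", 8),
    ],
)

# Grouping by every column collapses identical rows into one row each,
# and COUNT(*) records how many copies there were.
deduplicated = conn.execute(
    """
    SELECT name, country, price, COUNT(*) AS occurrences
    FROM MyTable
    GROUP BY name, country, price
    ORDER BY name DESC, price
    """
).fetchall()

for row in deduplicated:
    print(row)
```

If the deduplicated rows need to be stored rather than just read, the same SELECT could feed a CREATE TABLE ... AS SELECT statement.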
I have a table with 3 fields, touristic places, the country they're in and the average rating by tourists for this place. I would like to compare different countries based on the average rating of their top touristic places. I use MySQL
It looks like this basically :
Eiffel Tower | France | 4,2
Trakoscan Castle | Croatia | 4,6
For example, how does the average rating of the 5 best touristic places in France compare with the average rating of the 5 best touristic places in Croatia? I know how to average all places for a country and compare that, but I don't know how to combine LIMIT and GROUP BY.
Thank you for your help.
You can use window functions to keep only the top 5 ratings per country, then aggregate.
Assuming that your table has columns country, place and rating, you would phrase the query as:
select country, avg(rating) avg_rating_top5
from (
    select t.*,
           row_number() over(partition by country order by rating desc) rn
    from mytable t
) t
where rn <= 5
group by country
Note that window functions are available only in MySQL 8.0 and later.
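As a rough check of the window-function approach, the sketch below runs the same query shape through Python's sqlite3 module (SQLite 3.25 or later is assumed, since older versions lack window functions; it stands in for MySQL 8.0). Only the Eiffel Tower and Trakoscan Castle rows come from the question; the other places and ratings are made up for illustration, and top_n is set to 2 so the small sample actually filters something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (place TEXT, country TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Eiffel Tower", "France", 4.2),        # from the question
        ("Trakoscan Castle", "Croatia", 4.6),   # from the question
        ("Hypothetical FR place 2", "France", 4.0),   # made-up row
        ("Hypothetical FR place 3", "France", 3.0),   # made-up row
        ("Hypothetical HR place 2", "Croatia", 4.4),  # made-up row
        ("Hypothetical HR place 3", "Croatia", 2.0),  # made-up row
    ],
)

top_n = 2  # the question asks for 5; 2 keeps the sample data small

# Rank places within each country by rating, keep the top_n,
# then average the surviving ratings per country.
rows = conn.execute(
    """
    SELECT country, AVG(rating) AS avg_rating_top
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY country ORDER BY rating DESC) AS rn
        FROM mytable t
    ) ranked
    WHERE rn <= ?
    GROUP BY country
    ORDER BY country
    """,
    (top_n,),
).fetchall()

avg_by_country = dict(rows)
for country, avg_rating in avg_by_country.items():
    print(country, round(avg_rating, 2))
```

The lowest-rated row in each country is excluded from the average, which is the point of filtering on rn before grouping.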
A homework question asks me to show the total number of houses with multiple presents. The list below shows which ones they are, but I cannot work out the query to show them as a total of 6. I am still new and learning MySQL, my apologies for the ignorance.
MySQL data
**address** **Number of presents per home**
2 Bay Road 2
2a Neptune Road 2
45 Bay Road 2
59 Black Street 2
65 Mainway Avenue 3
89 White Street 2
Query used:
SELECT address, SUM(goodBehaviour) AS `Number of Houses with Multiple presents`
FROM wishlist
GROUP BY address
HAVING SUM(goodBehaviour) >1;
I have tried a few other queries to total the Address column but have not been able to show my desired output. Thanks.
The problem is that you sum the goodBehaviour values, but what you need is to count the number of addresses that have more than one present.
If each address has just 1 record in your table (based on your sample data):
select count(address)
from wishlist
where goodBehaviour >1
If you can have multiple records for a single address, then sum the number of presents per address in a subquery, and in the outer query count the addresses whose total number of presents is more than 1:
select count(address)
from
(select address, sum(goodBehaviour) as presents
from wishlist
group by address) t
where t.presents>1
If you need the total number of houses, you can use your query as a subquery:
SELECT count(*) FROM (SELECT address, SUM(goodBehaviour) AS `Number of Houses with Multiple presents`
FROM wishlist
GROUP BY address
HAVING SUM(goodBehaviour) >1) x;
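Here is a small sketch of the subquery approach, run with Python's sqlite3 module as a stand-in for MySQL. The table layout is an assumption: one row per present with goodBehaviour set to 1, so summing it per address gives the number of presents. The six addresses come from the question; "1 Hypothetical Lane" is a made-up single-present house added to show that it is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wishlist (address TEXT, goodBehaviour INTEGER)")

# Presents per address as listed in the question, plus one made-up
# address with a single present that should not be counted.
presents_per_home = {
    "2 Bay Road": 2,
    "2a Neptune Road": 2,
    "45 Bay Road": 2,
    "59 Black Street": 2,
    "65 Mainway Avenue": 3,
    "89 White Street": 2,
    "1 Hypothetical Lane": 1,
}
for address, presents in presents_per_home.items():
    # Assumed layout: one row per present, each with goodBehaviour = 1.
    conn.executemany(
        "INSERT INTO wishlist VALUES (?, 1)", [(address,)] * presents
    )

# The inner query keeps addresses with more than one present;
# the outer query counts how many such addresses there are.
(houses_with_multiple,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT address
        FROM wishlist
        GROUP BY address
        HAVING SUM(goodBehaviour) > 1
    ) AS multi
    """
).fetchone()

print(houses_with_multiple)
```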
I have a table of products as shown below:
id name quantity
1 shoe 2
2 pen 1
3 shoe 1
4 glass 3
5 pen 4
6 shoe 2
I want to get the item that occurs most often in the table, along with the number of rows it occupies, i.e. how many times it is repeated in the table.
In the case of the above table, shoe occurs the highest number of times, i.e. 3 times. I need a MySQL query that lets me do this (return 3 in the above case).
Please take performance into consideration, since this query will be performed over a table with about 10 million records. Thank you!
SELECT name,count(*) FROM products GROUP BY name ORDER BY count(*) DESC limit 1
This may work
A basic GROUP BY will do:
select name, count(name), sum(quantity)
from products
group by name
order by count(name) desc
limit 1
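For a quick check against the sample data, this sketch runs the GROUP BY / ORDER BY / LIMIT 1 query through Python's sqlite3 module (standing in for MySQL). The index name idx_products_name is hypothetical; an index on name is only a guess at what might help on a 10-million-row table, not something measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, "shoe", 2),
        (2, "pen", 1),
        (3, "shoe", 1),
        (4, "glass", 3),
        (5, "pen", 4),
        (6, "shoe", 2),
    ],
)

# Hypothetical index: lets the engine group on name without reading
# every full row. Whether it helps in practice should be measured.
conn.execute("CREATE INDEX idx_products_name ON products (name)")

# Count rows per name, sort by that count, keep only the largest group.
most_common = conn.execute(
    """
    SELECT name, COUNT(*) AS occurrences
    FROM products
    GROUP BY name
    ORDER BY occurrences DESC
    LIMIT 1
    """
).fetchone()

print(most_common)
```

Note that if two names tie for the highest count, LIMIT 1 returns only one of them.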
I have a table of flights, which have an origin and destination city, represented as a foreign id.
A very simplified example of this table looks like:
id | origin | destination
023 1 3
044 3 2
332 2 1
509 1 3
493 1 4
I need to get the first time that a city shows up as an origin or a destination; a list of all the flights that contain a city that hasn't been flown to or from yet.
What I would like to get for the above example would be:
023: 1, 3
044: 2
493: 4
Flights 332 and 509 aren't in the output because they only visit cities that have already been visited.
Here's what I've tried:
(SELECT distinct(origin), distinct(destination) FROM flights ORDER BY id)
Doesn't work because you can't select more than one distinct column
SELECT (distinct(origin) FROM flights ORDER BY id) UNION (distinct (destination) FROM flights ORDER BY id)
Doesn't work because of syntax errors, but mainly because it doesn't take into account that a city should be unique in the origin and destination columns.
If there's not a quick way to do this in SQL I'm also happy to just iterate through and keep track of cities that have been visited (this app has literally one user, and he doesn't care about a few milliseconds of computation because he's over 80), but I'd love to know just so that I can learn more about SQL!
This does it:
SELECT id, GROUP_CONCAT(city ORDER BY city) cities
FROM (
SELECT city, min(id) id
FROM (
SELECT origin city, MIN(id) id
FROM flights
GROUP BY city
UNION
SELECT destination city, MIN(id) id
FROM flights
GROUP BY city) u
GROUP BY city) x
GROUP BY id
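The idea behind the query above (find the lowest flight id for each city, then list the cities per flight) can be sketched as follows with Python's sqlite3 module. This is a variant, not the same statement: GROUP_CONCAT with an ORDER BY inside is MySQL syntax that older SQLite versions do not accept, so the final grouping by flight is done in Python instead. Storing id as TEXT to keep the leading zeros is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is stored as TEXT so that "023" keeps its leading zero (assumption).
conn.execute("CREATE TABLE flights (id TEXT, origin INTEGER, destination INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        ("023", 1, 3),
        ("044", 3, 2),
        ("332", 2, 1),
        ("509", 1, 3),
        ("493", 1, 4),
    ],
)

# Stack origins and destinations into one "city" column, then take the
# smallest flight id per city: that is the first flight touching the city.
first_visits = conn.execute(
    """
    SELECT city, MIN(id) AS first_id
    FROM (
        SELECT origin AS city, id FROM flights
        UNION ALL
        SELECT destination AS city, id FROM flights
    ) AS visits
    GROUP BY city
    ORDER BY first_id, city
    """
).fetchall()

# Group the cities by the flight that first visited them.
new_cities_by_flight = {}
for city, first_id in first_visits:
    new_cities_by_flight.setdefault(first_id, []).append(city)

for flight_id, cities in new_cities_by_flight.items():
    print(f"{flight_id}: {', '.join(str(c) for c in cities)}")
```

As in the original query, "first" here means lowest id, so flight 493 counts as earlier than 509.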
My query is given below:
select vend_id,
COUNT(*) as num_prods
from Products
group by vend_id;
Please tell me how this part works: select vend_id, COUNT(vend_id) as opposed to select COUNT(vend_id)?
select COUNT(vend_id)
That will return the number of rows where the vendor ID is not null
select vend_id, COUNT(*) as num_prods
from Products
group by vend_id
That will group the rows by vendor ID and return, for each ID, how many rows it has.
An example:
ID name salary start_date city region
----------- ---------- ----------- ----------------------- ---------- ------
1 Jason 40420 1994-02-01 00:00:00.000 New York W
2 Robert 14420 1995-01-02 00:00:00.000 Vancouver N
3 Celia 24020 1996-12-03 00:00:00.000 Toronto W
4 Linda 40620 1997-11-04 00:00:00.000 New York N
5 David 80026 1998-10-05 00:00:00.000 Vancouver W
6 James 70060 1999-09-06 00:00:00.000 Toronto N
7 Alison 90620 2000-08-07 00:00:00.000 New York W
8 Chris 26020 2001-07-08 00:00:00.000 Vancouver N
If you run this query, you will get one row per city, and you can apply an aggregate function (in this case, COUNT) to the rows in each group. So, for each city, you will get the count of rows. You can also use other aggregate functions.
SELECT City, COUNT(*) as Employees
FROM Employee
GROUP BY City
The result is:
City Employees
--------- ---------
New York 3
Toronto 2
Vancouver 3
You can then compare the number of rows for each city.
When you simply select COUNT(vend_id) with no GROUP BY clause, you get one row with the total count of rows with a non-NULL vendor ID - that last bit is important and is one reason why you may prefer COUNT(*) so as to avoid "missing" rows. Some people may argue that COUNT(*) is somehow less efficient but that's true in no DBMS I've used. In any case, if you are using a brain-dead DBMS, you can always try COUNT(1).
When you group by vend_id, you get one row per vendor ID with the count being the number of rows for that ID.
In step-by-step detail (conceptually, though there are almost certainly efficiencies to be gained by optimising), the first query:
SELECT COUNT(vend_id) AS num_prods FROM products
Get a list of all rows in products.
Count the rows where vend_id is not NULL, then deliver one row containing that count in the single num_prods column.
For the grouping one:
SELECT vend_id, COUNT(vend_id) AS num_prods FROM products GROUP BY vend_id
Get a list of all rows in products.
For each value of vend_id:
Count the rows matching that vend_id where vend_id is not NULL, then deliver one row containing the vend_id in the first column and that count in the second num_prods column.
Note that those rows with a null vend_id do not contribute to the aggregate function (count in this case).
In the first query, that simply means they don't appear in the overall total.
In the second case, it means that the output row still exists but the count will be zero. That's another good reason to use COUNT(*) or COUNT(1).
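The NULL behaviour described above is easy to check with a small sketch. It uses Python's sqlite3 module as a stand-in DBMS, and the four product rows (including one with a NULL vend_id) are made up for illustration; COUNT(column) skipping NULLs while COUNT(*) counts every row is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_id INTEGER, vend_id TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (1, "A"),
        (2, "A"),
        (3, "B"),
        (4, None),  # made-up product with no vendor
    ],
)

# Without GROUP BY: one row for the whole table.
count_non_null, count_all = conn.execute(
    "SELECT COUNT(vend_id), COUNT(*) FROM products"
).fetchone()
print(count_non_null, count_all)

# With GROUP BY: one row per vend_id, including a group for NULL.
# COUNT(vend_id) is 0 for that group while COUNT(*) is 1.
per_vendor = {
    vend_id: (non_null, total)
    for vend_id, non_null, total in conn.execute(
        """
        SELECT vend_id, COUNT(vend_id), COUNT(*)
        FROM products
        GROUP BY vend_id
        """
    )
}
for vend_id, counts in per_vendor.items():
    print(vend_id, counts)
```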
select vend_id selects only the vend_id field, whereas select * selects all the fields.
select vend_id, COUNT(vend_id) and select COUNT(vend_id) give the same result for the count column as long as you use group by vend_id. When you use select vend_id, COUNT(vend_id), you must group by vend_id.