Filter duplicates combining multiple columns

Filter duplicates combining multiple columns - mysql

Example table
| name | year | latitude | longitude |
|--------------|------|----------|-----------|
| Cleveland | 1800 | 10 | 11 |
| Cleveland | 1810 | 10 | 11 |
| Medina | 1811 | 12 | 13 |
| Dayton | 1812 | 14 | 15 |
| Sandusky | 1105 | 50 | 50 |
| Mount Vernon | 1813 | 50 | 50 |
What I'm aiming to do
I want to select each unique combination of latitude and longitude, filtering out any duplicate pairs. I also need to filter out any records whose year is less than 1500.
This is the subset I'm trying to achieve:
| name | year | latitude | longitude |
|--------------|------|----------|-----------|
| Cleveland | 1800 | 10 | 11 |
| Medina | 1811 | 12 | 13 |
| Dayton | 1812 | 14 | 15 |
| Mount Vernon | 1813 | 50 | 50 |
Each record's year is greater than 1500 and there aren't any duplicate lat,long pairs.
What I've tried
I've tried to find a way to use DISTINCT. Nothing I've found has worked.
I also have tried using GROUP BY:
SELECT *
FROM users
GROUP BY latitude, longitude
HAVING year > 1500;
The issue with the above query is that it eliminates both of the following records, which contain the lat,long pair of 50,50:
| name | year | latitude | longitude |
|--------------|------|----------|-----------|
| Sandusky | 1105 | 50 | 50 |
| Mount Vernon | 1813 | 50 | 50 |
The group is eliminated because Sandusky's year is less than 1500. I don't want Sandusky's record, but I do want Mount Vernon.
I noticed that if the two records were switched like so:
| name | year | latitude | longitude |
|--------------|------|----------|-----------|
| Mount Vernon | 1813 | 50 | 50 |
| Sandusky | 1105 | 50 | 50 |
...then the group's year is set as 1813 and the group is not eliminated. I thought maybe sorting by year would fix it, but it didn't:
SELECT *
FROM users
GROUP BY latitude, longitude
HAVING year > 1500
ORDER BY year DESC;
Is what I'm attempting possible?

How about this?
SELECT `id`, `name`, MAX(users.year) as `year`, latitude, longitude
FROM users
WHERE year > 1500
GROUP BY latitude, longitude;
Results in:
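| id | name | year | latitude | longitude |
|----|--------------|------|----------|-----------|
| 7 | Columbus | 1978 | 7 | 8 |
| 1 | Cleveland | 1800 | 10 | 11 |
| 3 | Medina | 1811 | 12 | 13 |
| 4 | Dayton | 1812 | 14 | 15 |
| 6 | Mount Vernon | 1813 | 50 | 50 |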
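The only difference is where the filter goes. Because the WHERE clause comes before GROUP BY, the filtering happens BEFORE the grouping, so you get the desired result. HAVING, by contrast, is applied after grouping, to whichever year MySQL happened to keep for each group.
MAX(users.year) ensures that you always get the largest year in each group. If this doesn't matter to you, you can replace SELECT `id`, `name`, MAX(users.year) as `year`, latitude, longitude with SELECT *. Note that `id` and `name` are not aggregated, so MySQL takes them from an arbitrary row of each group, and the query is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled.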
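If you need every column to come from one real row per coordinate pair, one option is to join the grouped result back to the table. The following is a minimal sketch, not a drop-in MySQL solution: it uses Python's built-in sqlite3 module as a stand-in for MySQL, loads the example table from the question, and uses MIN(year) rather than MAX(year) so that the output matches the subset the question asks for. The alias first_year is made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (name TEXT, year INTEGER, latitude INTEGER, longitude INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("Cleveland", 1800, 10, 11),
        ("Cleveland", 1810, 10, 11),
        ("Medina", 1811, 12, 13),
        ("Dayton", 1812, 14, 15),
        ("Sandusky", 1105, 50, 50),
        ("Mount Vernon", 1813, 50, 50),
    ],
)

# WHERE drops the pre-1500 rows before grouping; the join then fetches the
# complete row holding the earliest remaining year for each coordinate pair.
QUERY = """
SELECT u.name, u.year, u.latitude, u.longitude
FROM users AS u
JOIN (
    SELECT latitude, longitude, MIN(year) AS first_year
    FROM users
    WHERE year > 1500
    GROUP BY latitude, longitude
) AS g
  ON  u.latitude = g.latitude
  AND u.longitude = g.longitude
  AND u.year = g.first_year
ORDER BY u.year
"""
unique_rows = conn.execute(QUERY).fetchall()
for row in unique_rows:
    print(row)
```

The derived table contains only grouped or aggregated columns, so the same SQL should also be acceptable to MySQL with ONLY_FULL_GROUP_BY enabled. If two rows share both the coordinates and the earliest year, both are returned.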

Maybe I didn't understand the problem, but it would be this simple:
select * from users u where u.year > 1500;
I don't know what you want to do when more than one record with the same coordinates has a year greater than 1500.

How about this, unless I misread the question? It assumes that you do not want to eliminate a record with a different name that has the same lat,long.
create table users
( id int auto_increment primary key,
name varchar(50) not null,
year int not null,
latitude int not null,
longitude int not null
);
truncate table users;
insert users (name,year,latitude,longitude) values
('Cleveland',1810,10,11),
('Medina',1811,12,13),
('Dayton',1812,14,15),
('Mount Vernon',1813,50,50),
('Sandusky',1105,50,50);
SELECT distinct name,year,latitude,longitude
FROM users
where year > 1500
ORDER BY year;
+--------------+------+----------+-----------+
| name | year | latitude | longitude |
+--------------+------+----------+-----------+
| Cleveland | 1810 | 10 | 11 |
| Medina | 1811 | 12 | 13 |
| Dayton | 1812 | 14 | 15 |
| Mount Vernon | 1813 | 50 | 50 |
+--------------+------+----------+-----------+

Related

Working with MySQL / MySQLi data - daily, weekly and monthly graphs

I'm developing a dashboard with graphs.
What's the problem?
Let's say that I have a table with the following structure:
+-------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-----+---------+-------+
| total | int | NO | | NULL | |
| new | int | NO | | NULL | |
| date | date | YES | | NULL | |
+-------+------+------+-----+---------+-------+
where total stands for Total Members and new for New Members (date is a date of course - in format: yyyy-mm-dd).
Example rows:
+-------+-------+------------+
| total | new | date |
+-------+-------+------------+
| 3450 | 21 | 2021-11-06 |
| 3650 | 200 | 2021-11-07 |
| 3694 | 34 | 2021-11-08 |
| 3520 | 26 | 2021-11-09 |
| 3399 | -321 | 2021-11-10 |
| 3430 | 31 | 2021-11-11 |
| 3450 | 20 | 2021-11-12 |
| 3410 | -40 | 2021-11-13 |
| 3923 | 513 | 2021-11-14 |
| 4019 | 96 | 2021-11-15 |
| 4119 | 100 | 2021-11-16 |
| 4000 | -119 | 2021-11-17 |
| 3000 | -1000 | 2021-11-18 |
| 3452 | 452 | 2021-11-19 |
| 3800 | 348 | 2021-11-20 |
| 3902 | 102 | 2021-11-21 |
| 4050 | 148 | 2021-11-22 |
+-------+-------+------------+
There are a few options: the dashboard user can select two dates and the type of graph (daily, weekly, monthly).
The Point
I need to take these two dates and get all the data from the database within the given period. But that's not all. The Daily, Weekly and Monthly options determine how the data is aggregated: for weekly, the graph shows the average number of new members and of total members for each week (so if I grab 7 days from the database, I need to average them), and likewise for every day or month in the period. So the final graph will show something like:
250 new 20 new 31 new
1000 total 1020 total 1051 total
Nov 7 Nov 14 Nov 21
etc...
More information:
Ubuntu: 21.04
MySQL: 8.0.27
PHP: 7.4.23
Apache: 2.4.46
Does anyone have any ideas?

I don't get where your numbers come from, but your query would look like this.
For the monthly version you need to group by MONTH, of course.
CREATE TABLE members (
`total` INTEGER,
`new` INTEGER,
`date` date
);
INSERT INTO members
(`total`, `new`, `date`)
VALUES
('3450', '21', '2021-11-06'),
('3650', '200', '2021-11-07'),
('3694', '34', '2021-11-08'),
('3520', '26', '2021-11-09'),
('3399', '-321', '2021-11-10'),
('3430', '31', '2021-11-11'),
('3450', '20', '2021-11-12'),
('3410', '-40', '2021-11-13'),
('3923', '513', '2021-11-14'),
('4019', '96', '2021-11-15'),
('4119', '100', '2021-11-16'),
('4000', '-119', '2021-11-17'),
('3000', '-1000', '2021-11-18'),
('3452', '452', '2021-11-19'),
('3800', '348', '2021-11-20'),
('3902', '102', '2021-11-21'),
('4050', '148', '2021-11-22');
SELECT `new`,sumtotal, `date` FROM members m
INNER JOIN (SELECT SUM(`new`) sumtotal, MIN(`date`) mindate FROM members GROUP BY WEEK(`date`)) t1
ON m.`date`= t1.mindate
WHERE m.`date` BETWEEN '2021-11-07' AND '2021-11-22'
new | sumtotal | date
--: | -------: | :---------
200 | -50 | 2021-11-07
513 | 390 | 2021-11-14
102 | 250 | 2021-11-21
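For the averages the question asks about, the same weekly grouping can be combined with AVG. The sketch below is only an illustration under several assumptions: it runs on Python's built-in sqlite3 rather than MySQL, the columns are renamed to total_members, new_members and day (hypothetical names, chosen to avoid quoting), and the week is assumed to start on Sunday, which is what WEEK() does in its default mode. SQLite's date modifiers compute the Sunday that starts each week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (total_members INTEGER, new_members INTEGER, day TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        (3450, 21, "2021-11-06"), (3650, 200, "2021-11-07"),
        (3694, 34, "2021-11-08"), (3520, 26, "2021-11-09"),
        (3399, -321, "2021-11-10"), (3430, 31, "2021-11-11"),
        (3450, 20, "2021-11-12"), (3410, -40, "2021-11-13"),
        (3923, 513, "2021-11-14"), (4019, 96, "2021-11-15"),
        (4119, 100, "2021-11-16"), (4000, -119, "2021-11-17"),
        (3000, -1000, "2021-11-18"), (3452, 452, "2021-11-19"),
        (3800, 348, "2021-11-20"), (3902, 102, "2021-11-21"),
        (4050, 148, "2021-11-22"),
    ],
)

# date(day, '-6 days', 'weekday 0') yields the Sunday on or before `day`,
# so every row of a Sunday-to-Saturday week gets the same week_start.
QUERY = """
SELECT date(day, '-6 days', 'weekday 0') AS week_start,
       SUM(new_members)   AS new_sum,
       AVG(new_members)   AS new_avg,
       AVG(total_members) AS total_avg
FROM members
WHERE day BETWEEN ? AND ?
GROUP BY week_start
ORDER BY week_start
"""
weekly = conn.execute(QUERY, ("2021-11-07", "2021-11-22")).fetchall()
for week_start, new_sum, new_avg, total_avg in weekly:
    print(week_start, new_sum, round(new_avg, 1), round(total_avg, 1))
```

The weekly sums agree with the sumtotal column above. For the monthly graph, the grouping expression would be the year and month of the date instead of the week start.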

select updated_dates which are not between started_date and stopped_date

I have two tables. One contains updated_date (a datetime, possibly more than one row) and the second contains start_date and stopped_date (one or more records).
I want to select the updated_date values that do not fall between any start_date and stopped_date.
Thanks in advance.
I saw the other question, "Check overlap of date ranges in MySQL", but that is not what I want.
user_location
+---+-----+-----+---------------------+
|id | lat | lon | updated_date |
+---+-----+-----+---------------------+
| 1 |16.45|75.45|2018-01-09 12:50:57 |
| 2 |16.85|75.15|2018-01-09 12:53:45 |
| 3 |16.78|75.25|2018-01-09 12:55:48 |
| 4 |16.43|75.35|2018-01-09 13:57:35 |
| 5 |16.48|75.47|2018-01-09 14:59:30 |
| 6 |16.49|75.49|2018-01-10 05:59:58 |
| 7 |16.50|75.50|2018-01-10 07:35:15 |
+---+-----+-----+---------------------+
location_blocked_datetime
+---+--------------------+---------------------+
|id | start_date | stopped_date |
+---+--------------------+---------------------+
| 1 |2018-01-09 05:55:48 | 2018-01-09 07:55:48 |
| 2 |2018-01-09 12:51:48 | 2018-01-09 12:56:48 |
| 3 |2018-01-10 04:30:48 | 2018-01-04 06:55:48 |
+---+--------------------+---------------------+
I want to select locations from the user_location table whose updated_date does not fall between start_date and stopped_date. The start_date and stopped_date values are not fixed, and there can be more than one record.
For example, if I want to select locations on 2018-01-09, the result should look like this:
+---+-----+-----+---------------------+
|id | lat | lon | updated_date |
+---+-----+-----+---------------------+
| 1 |16.45|75.45|2018-01-09 12:50:57 |
| 2 |16.43|75.35|2018-01-09 13:57:35 |
| 3 |16.48|75.47|2018-01-09 14:59:30 |
+---+-----+-----+---------------------+

Do I understand correctly that the tables are not actually connected, and that you want the records where updated_date is not between any start_date and stopped_date range?
If so, use NOT EXISTS like this:
SELECT
l.*
FROM
user_location l
WHERE DATE(l.updated_date) = '2018-01-09'
and not exists (select 1 from location_blocked_datetime
where l.updated_date between start_date and stopped_date)
| id | lat | lon | updated_date |
|----|-----|-----|---------------------|
| 1 | 16 | 75 | 2018-01-09 12:50:57 |
| 4 | 16 | 75 | 2018-01-09 13:57:35 |
| 5 | 16 | 75 | 2018-01-09 14:59:30 |
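The NOT EXISTS approach can be tried locally with the data from the question. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. It also replaces DATE(l.updated_date) = '2018-01-09' with a half-open range on the raw column, which is an assumption about what you might prefer (a range predicate can use an index on updated_date), not something the answer requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_location (id INTEGER, lat REAL, lon REAL, updated_date TEXT)"
)
conn.execute(
    "CREATE TABLE location_blocked_datetime "
    "(id INTEGER, start_date TEXT, stopped_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_location VALUES (?, ?, ?, ?)",
    [
        (1, 16.45, 75.45, "2018-01-09 12:50:57"),
        (2, 16.85, 75.15, "2018-01-09 12:53:45"),
        (3, 16.78, 75.25, "2018-01-09 12:55:48"),
        (4, 16.43, 75.35, "2018-01-09 13:57:35"),
        (5, 16.48, 75.47, "2018-01-09 14:59:30"),
        (6, 16.49, 75.49, "2018-01-10 05:59:58"),
        (7, 16.50, 75.50, "2018-01-10 07:35:15"),
    ],
)
conn.executemany(
    "INSERT INTO location_blocked_datetime VALUES (?, ?, ?)",
    [
        (1, "2018-01-09 05:55:48", "2018-01-09 07:55:48"),
        (2, "2018-01-09 12:51:48", "2018-01-09 12:56:48"),
        (3, "2018-01-10 04:30:48", "2018-01-04 06:55:48"),  # as given in the question
    ],
)

# Keep a location only if no blocked interval contains its timestamp.
# ISO-formatted timestamps compare correctly as plain strings in SQLite.
QUERY = """
SELECT l.id
FROM user_location AS l
WHERE l.updated_date >= '2018-01-09 00:00:00'
  AND l.updated_date <  '2018-01-10 00:00:00'
  AND NOT EXISTS (
      SELECT 1
      FROM location_blocked_datetime AS b
      WHERE l.updated_date BETWEEN b.start_date AND b.stopped_date
  )
ORDER BY l.id
"""
allowed_ids = [row[0] for row in conn.execute(QUERY)]
print(allowed_ids)
```

Rows 2 and 3 fall inside the second blocked interval and are dropped. The third interval ends before it starts, so it can never match anything.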

SQL Query to get most frequently visited place Laravel

I developed a project in Laravel 5.2. I have a database structure like this:
user_visited
id | user_id | latitude | longitude
01 | 1 | 140.5938388 | 36.3335513
02 | 1 | 140.2631739 | 36.3724621
03 | 1 | 140.0804782 | 36.083233
04 | 1 | 140.0855777 | 36.1048973
05 | 1 | 140.2215081 | 35.981243
06 | 1 | 140.577927 | 36.3114456
07 | 1 | 140.65826 | 36.6068145
08 | 1 | 140.109301 | 36.0865606
09 | 1 | 140.2055252 | 35.926693
10 | 1 | 139.7540075 | 36.1662458
11 | 1 | 140.2637594 | 36.241148
12 | 1 | 139.8043185 | 36.1115211
13 | 1 | 140.2183821 | 36.0601167
14 | 1 | 139.7540075 | 36.1662458
15 | 1 | 140.0309725 | 36.0381176
locations
id | Location name | Type | Address | Latitude | Longitude
31 | Murse Park | Theme Park | 552-18 | 140.6066128 | 36.3985857
32 | Dom Park | Theme Park | 552-12 | 140.6417064 | 36.5436575
33 | Football Park | Theme Park | 588-1 | 140.3690094 | 36.4195418
34 | Istanbul Park | Theme Park | 37 | 140.3330587 | 36.5449685
This is the user's location history, collected by a mobile app that records the location every n seconds.
From this history I want to know which places users visit most frequently. How can I do that?
The catch is that a user can visit the same location, but the longitude and latitude will not be exactly the same.
Maybe we must set radius for 500 mill or what?
How to query that?

Answering in case it helps someone.
For geolocation purposes, we can consider two places the same if their latitude and longitude agree to 5 or 6 decimal places.
Say the lat and long of a place are (140.5938388,36.3335513) and a user visits a spot with lat and long (140.59383723,36.33355213). Querying with a precision of 5 decimal places, i.e. (140.59383,36.33355), puts both in the same block, and counting the visits per place/location gives you the most frequently visited places.
Edited:
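SELECT locations.location_name, COUNT(*) AS visits
FROM locations
LEFT JOIN user_visited ON TRUNCATE(user_visited.latitude,5) = TRUNCATE(locations.latitude,5) AND TRUNCATE(user_visited.longitude,5) = TRUNCATE(locations.longitude,5)
WHERE user_visited.user_id = 1
GROUP BY locations.id, locations.location_name -- one row per location
ORDER BY visits DESC -- most frequently visited first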
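The truncation idea can also be tried without a database. The following is a rough sketch in plain Python, using a handful of rows copied from user_visited. The helper name bucket and the choice of 3 decimal places are assumptions made for the illustration: five decimal places is a cell of roughly a metre, which may be tighter than a phone's GPS jitter, while three decimal places is on the order of 100 m in latitude.

```python
from collections import Counter

# A few (latitude, longitude) samples for user_id 1, copied from user_visited.
visits = [
    (140.5938388, 36.3335513),
    (140.2631739, 36.3724621),
    (139.7540075, 36.1662458),
    (140.2637594, 36.241148),
    (139.8043185, 36.1115211),
    (139.7540075, 36.1662458),
]


def bucket(lat, lon, places=3):
    """Truncate both coordinates to `places` decimals and return an integer key."""
    scale = 10 ** places
    return (int(lat * scale), int(lon * scale))


# Count how many samples land in each truncated cell.
counts = Counter(bucket(lat, lon) for lat, lon in visits)
top_bucket, top_count = counts.most_common(1)[0]
print(top_bucket, top_count)
```

Truncation has a known weakness: two points a few metres apart can still land in different cells if they straddle a cell boundary, so a distance-based comparison may suit the 500-unit radius idea from the question better.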

Mysql query - Count items grouping by year and including "sub-counts"

I have a table "events" like this
id | user_id | date | is_important
---------------------------------------------------
1 | 3 | 01/02/2012 | 0
1 | 3 | 01/02/2012 | 1
1 | 3 | 01/02/2011 | 1
1 | 3 | 01/02/2011 | 1
1 | 3 | 01/02/2011 | 0
Basically, what I need to get is this:
(for the user_id=3)
year | count | count_importants
--------------------------------------------
2012 | 2 | 1
2011 | 3 | 2
I've tried this:
SELECT YEAR(e1.date) as year,COUNT(e1.id) as count_total, aux.count_importants
FROM events e1
LEFT JOIN
(
SELECT YEAR(e2.date) as year2,COUNT(e2.id) as count_importants
FROM `events` e2
WHERE e2.user_id=18
AND e2.is_important = 1
GROUP BY year2
) AS aux ON aux.year2 = e1.year
WHERE e1.user_id=18
GROUP BY year
But mysql gives me an error
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'aux ON aux.year2 = e1.year WHERE e1.user_id=18 GROUP BY year LIMIT 0, 30' at line 10
And I've run out of ideas for making this query work. Is it possible to do this using only one query?
Thanks in advance

Edit: I think I over-complicated things. Can't you just do this in a simple query?
SELECT
YEAR(`date`) AS `year`,
COUNT(`id`) AS `count`,
SUM(`is_important`) AS `count_importants`
FROM `events`
WHERE user_id = 18
GROUP BY YEAR(`date`)
Here's the big solution that adds summaries :)
Consider using MySQL GROUP BY ROLLUP. This will basically do a similar job to a normal GROUP BY, but will add rows for the summaries too.
In the example below, you see two records for Finland in 2000, for £1500 and £100, and then a row with the NULL product with the combined value of £1600. It also adds NULL rollup rows for each dimension grouped by.
From the manual:
SELECT year, country, product, SUM(profit)
FROM sales
GROUP BY year, country, product WITH ROLLUP
+------+---------+------------+-------------+
| year | country | product | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer | 1500 |
| 2000 | Finland | Phone | 100 |
| 2000 | Finland | NULL | 1600 |
| 2000 | India | Calculator | 150 |
| 2000 | India | Computer | 1200 |
| 2000 | India | NULL | 1350 |
| 2000 | USA | Calculator | 75 |
| 2000 | USA | Computer | 1500 |
| 2000 | USA | NULL | 1575 |
| 2000 | NULL | NULL | 4525 |
| 2001 | Finland | Phone | 10 |
| 2001 | Finland | NULL | 10 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 2700 |
| 2001 | USA | TV | 250 |
| 2001 | USA | NULL | 3000 |
| 2001 | NULL | NULL | 3010 |
| NULL | NULL | NULL | 7535 |
+------+---------+------------+-------------+
Here's an example that specifically matches your situation:
SELECT year(`date`) AS `year`, COUNT(`id`) AS `count`, SUM(`is_important`) AS `count_importants`
FROM new_table
GROUP BY year(`date`) WITH ROLLUP;
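The single-query idea (count the rows, and sum the 0/1 flag for the sub-count) can be checked against the sample data from the question. This is a minimal sketch under a few assumptions: it uses Python's built-in sqlite3 rather than MySQL, stores the dates as ISO strings in a column named event_date (a made-up name), gives the rows distinct ids, and uses strftime('%Y', ...) where MySQL would use YEAR(). WITH ROLLUP is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(id INTEGER, user_id INTEGER, event_date TEXT, is_important INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 3, "2012-02-01", 0),
        (2, 3, "2012-02-01", 1),
        (3, 3, "2011-02-01", 1),
        (4, 3, "2011-02-01", 1),
        (5, 3, "2011-02-01", 0),
    ],
)

# COUNT(*) counts every event of the year; SUM over the 0/1 flag counts
# only the important ones, so no self-join or subquery is needed.
QUERY = """
SELECT CAST(strftime('%Y', event_date) AS INTEGER) AS yr,
       COUNT(*)          AS count_total,
       SUM(is_important) AS count_importants
FROM events
WHERE user_id = ?
GROUP BY yr
ORDER BY yr DESC
"""
per_year = conn.execute(QUERY, (3,)).fetchall()
for row in per_year:
    print(row)
```

The SUM trick relies on is_important holding only 0 or 1. For a flag with other values, SUM(CASE WHEN ... THEN 1 ELSE 0 END) is the more general form.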

The alias year (defined by YEAR(e1.date) AS year) is not visible in the ON clause of the JOIN. Try this condition instead:
...
LEFT JOIN
(
...
) AS aux ON aux.year2 = YEAR(e1.date) -- e1.year --> YEAR(e1.date)
...

Excluding rows where the max value in one column is used to check the value in another?

Sorry for the obscure title, but I'm not sure how to sum it up.
I'm working with some static train schedule data supplied to me in several tables. I'm trying to show all the trains that stop at a specific station excluding those that end at the specified station. So for example, when listing all the trains that stop at NYPenn station, I don't want those trains terminating in NYPenn station.
The relevant tables are:
trips - lists all of the trips made each day. Each trip has a trip_id and consists of one or more stops. It also contains a trip_headsign column that shows the final destination of the train, but as text (not an ID).
+----------+------------+---------+-------------------------+--------------+----------+----------+
| route_id | service_id | trip_id | trip_headsign | direction_id | block_id | shape_id |
+----------+------------+---------+-------------------------+--------------+----------+----------+
| 1 | 1 | 1 | PRINCETON RAIL SHUTTLE | 1 | 603 | 1 |
| 1 | 2 | 2 | PRINCETON RAIL SHUTTLE | 1 | 603 | 2 |
+----------+------------+---------+-------------------------+--------------+----------+----------+
stop_times - lists every stop made by every train. All stops made on the same trip share a trip_id, so this is what I LEFT JOIN on. This table also has a column called stop_sequence, ranging from 1 to n, where n is the total number of stops for that trip (n ranges from 2 to 26). The train originates at stop_sequence=1.
+---------+--------------+----------------+---------+---------------+-------------+---------------+---------------------+
| trip_id | arrival_time | departure_time | stop_id | stop_sequence | pickup_type | drop_off_type | shape_dist_traveled |
+---------+--------------+----------------+---------+---------------+-------------+---------------+---------------------+
| 1 | 21:15:00 | 21:15:00 | 24070 | 1 | 0 | 0 | 0 |
| 1 | 21:25:00 | 21:25:00 | 41586 | 2 | 0 | 0 | 2.5727 |
+---------+--------------+----------------+---------+---------------+-------------+---------------+---------------------+
This particular train makes only two stops. The final stop (41586) is what's listed in the headsign column (notice it doesn't match the stop_name).
+---------------+---------+---------+-------------------------+----------+----------------+
| stop_sequence | stop_id | trip_id | trip_headsign | block_id | departure_time |
+---------------+---------+---------+-------------------------+----------+----------------+
| 1 | 24070 | 1 | PRINCETON RAIL SHUTTLE | 603 | 21:15:00 |
| 2 | 41586 | 1 | PRINCETON RAIL SHUTTLE | 603 | 21:25:00 |
+---------------+---------+---------+-------------------------+----------+----------------+
+---------+----------------------------+-----------+-----------+------------+---------+
| stop_id | stop_name | stop_desc | stop_lat | stop_lon | zone_id |
+---------+----------------------------+-----------+-----------+------------+---------+
| 41586 | PRINCETON RAILROAD STATION | | 40.343398 | -74.659872 | 336 |
+---------+----------------------------+-----------+-----------+------------+---------+
So, again, what I'm looking to do is show a list of all the trains that stop at a particular station EXCEPT those that terminate at the station in question. The query I wrote to do this is (in this case, for stop_id 105 which is NY Penn station):
select stop_sequence, trips.trip_id, trip_headsign, trips.block_id, departure_time
from rail_data.trips
left join rail_data.stop_times on trips.trip_id = stop_times.trip_id
where stop_id = '105'
order by departure_time asc;
This returns results like this:
+---------------+---------+-----------------------+----------+----------------+
| stop_sequence | trip_id | trip_headsign | block_id | departure_time |
+---------------+---------+-----------------------+----------+----------------+
| 18 | 1342 | NEW YORK PENN STATION | 6600 | 05:43:00 |
| 1 | 1402 | SUMMIT | 6305 | 06:07:00 |
| 16 | 1328 | NEW YORK PENN STATION | 6604 | 06:34:00 |
| 1 | 1391 | SUMMIT | 6307 | 06:41:00 |
| 19 | 1360 | NEW YORK PENN STATION | 6908 | 06:47:00 |
+---------------+---------+-----------------------+----------+----------------+
In this case, I only want the trains headed to SUMMIT to show up. But remember, I can't simply say where stop_sequence > 1, because I want to include trains for which this station is the second, third, etc. stop, just not the final stop.
Thanks in advance for the help!

You can query stop_times for rows where shape_dist_traveled = 0, take the corresponding trip_id, and then query the trips table with that id. So you have to add a WHERE clause:
where trips.trip_id in (select st.trip_id from stop_times st where st.shape_dist_traveled = 0)
P.S.: I assumed shape_dist_traveled gives the distance travelled by the train.
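A different approach, closer to the question's title, is to compare each stop's stop_sequence with the highest stop_sequence of the same trip and keep only stops that come before it. The sketch below is only an illustration of that idea, not the method from the answer above. It runs on Python's built-in sqlite3 rather than MySQL, and the data is invented: trip ids 1342 and 1402 and stop_id 105 come from the question, but trip 1500, the other stop ids, and the shortened stop sequences are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (trip_id INTEGER, trip_headsign TEXT)")
conn.execute(
    "CREATE TABLE stop_times "
    "(trip_id INTEGER, stop_id INTEGER, stop_sequence INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1342, "NEW YORK PENN STATION"), (1402, "SUMMIT"), (1500, "TRENTON")],
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [
        (1342, 200, 1, "05:00:00"),
        (1342, 105, 2, "05:43:00"),  # terminates at stop 105
        (1402, 105, 1, "06:07:00"),  # originates at stop 105
        (1402, 300, 2, "06:40:00"),
        (1500, 400, 1, "06:00:00"),
        (1500, 105, 2, "06:20:00"),  # passes through stop 105
        (1500, 500, 3, "07:00:00"),
    ],
)

# A stop is the terminus when its stop_sequence equals the trip's maximum,
# so keeping only stops below that maximum drops terminating trains.
QUERY = """
SELECT st.stop_sequence, t.trip_id, t.trip_headsign, st.departure_time
FROM trips AS t
JOIN stop_times AS st ON st.trip_id = t.trip_id
WHERE st.stop_id = ?
  AND st.stop_sequence < (
      SELECT MAX(s2.stop_sequence)
      FROM stop_times AS s2
      WHERE s2.trip_id = st.trip_id
  )
ORDER BY st.departure_time
"""
departures = conn.execute(QUERY, (105,)).fetchall()
for row in departures:
    print(row)
```

Trip 1342 is excluded because stop 105 is its last stop, while the originating and the through train are both kept.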
