SQL query to get duplicate records with different dates (MySQL)

I need to get records that differ only in their date field.
Table Sites:
field id
reference
created
Every day we add a lot of records, so I need a function that extracts all existing records that are duplicates of the rows just added, in order to send some notifications.
The condition I can't get right is that the difference between the records of the current day and the older data in the table should be between one and four days.
Is there a simple query to do this without using a transaction?

I'm not sure I totally understand what you mean by duplicate records, but here's a basic date query:
SELECT fieldId, reference, created, DATE(created) AS the_date
FROM Sites
-- a column alias cannot be referenced in WHERE, so the expression is repeated
WHERE DATE(created)
BETWEEN DATE( DATE_SUB( NOW() , INTERVAL 3 DAY ) )
AND DATE( NOW() )

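I'm making several assumptions, such as:
You don't want the "first" row returned
Duplicates don't carry the date forward (the next row after the initial 4 days is not a duplicate)
The 4 days means +4 days, so Day 5 is included
So, my code (written in SQL Server syntax, where DATEDIFF takes a unit as its first argument; MySQL's DATEDIFF takes only the two dates) is: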
with originals as (
select s1.*
from sites as s1
where 0 = (
select count(*)
from sites as s2
where s1.field_id = s2.field_id
and s1.reference = s2.reference
and s1.created <> s2.created
and DATEDIFF(DAY,s2.created, s1.created) between 1 and 4
)
)
select s1.*
from sites as s1
inner join originals as o
on s1.field_id = o.field_id
and s1.reference = o.reference
and s1.created <> o.created
where DATEDIFF(DAY,o.created, s1.created) between 1 and 4
order by 1,2,3;
Here it is in a fiddle: http://sqlfiddle.com/#!3/9b407/20
This could be simpler if some conditions are relaxed.

Thanks a lot to everyone who tried to help me.
I found this solution after a lot of testing:
SELECT `id`,`reference`,count(`config_id`) as c,`created` FROM `sites`
where datediff(date(current_date()),date(`created`)) < 4
group by `reference`
having c > 1
thanks a lot for your help
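
One way to read the requirement in the question is: find rows created today whose reference also appears on a row created one to four days earlier. The sketch below shows that reading as a self-join. It runs through Python's sqlite3 module so that it is self-contained; SQLite's julianday() stands in for MySQL's DATEDIFF(), and the sample rows and the fixed "today" value are assumptions made for illustration, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (id INTEGER PRIMARY KEY, reference TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO sites (reference, created) VALUES (?, ?)",
    [
        ("A", "2024-03-10"),  # added today
        ("A", "2024-03-08"),  # 2 days older: counts as a duplicate of today's A
        ("B", "2024-03-10"),  # added today, no older copy
        ("C", "2024-03-10"),  # added today
        ("C", "2024-03-01"),  # 9 days older: outside the 1 to 4 day window
    ],
)

today = "2024-03-10"  # hypothetical current date, fixed so the result is stable

# Pair each of today's rows with older rows that share the same reference
# and are between 1 and 4 days older.
rows = conn.execute(
    """
    SELECT new.reference, new.created, old.created
    FROM sites AS new
    JOIN sites AS old
      ON old.reference = new.reference
     AND julianday(new.created) - julianday(old.created) BETWEEN 1 AND 4
    WHERE new.created = ?
    ORDER BY new.reference
    """,
    (today,),
).fetchall()

print(rows)
```

In MySQL the join condition would presumably become DATEDIFF(new.created, old.created) BETWEEN 1 AND 4, with CURDATE() in place of the fixed date.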

Related

SQL - Show all dates in a given range and count how many posts there are on that date using a timestamp from database

I have created a query that will count the number of posts and group them by the date, however the result doesn't show the dates when there were no posts.
QUERY:
SELECT DATE(Date_Uploaded) AS ForDate
, COUNT(*) AS NumPosts
FROM Articles
WHERE `Status` = '4'
GROUP BY DATE(Date_Uploaded)
ORDER BY ForDate
For info, Status = 4 means the post is published on the site, and Date_Uploaded is the timestamp of publication.
This will, for example, return:
2020-09-10: 2
2020-09-14: 1
2020-09-25: 4
However, I want:
2020-09-10: 2
2020-09-11: 0
2020-09-12: 0
2020-09-13: 0
2020-09-14: 1
etc.
The reason I need my data like this is so I can use it with Google Charts to create a column chart that shows the number of posts over time. If the dates with no posts are not included, the chart will not space the missing dates correctly.
If there is a way to keep the same query but have Google Charts space the data appropriately, that would also be a great solution.
Thanks.
EDIT: The date range would be defined on the same page I intend to place the chart using the data

If you have data for all dates, but just not for status = 4, then you can use conditional aggregation:
SELECT DATE(Date_Uploaded) AS ForDate,
SUM(CASE WHEN Status = 4 THEN 1 ELSE 0 END) AS NumPosts
FROM Articles
GROUP BY DATE(Date_Uploaded)
ORDER BY ForDate;
Otherwise, you need to use a table that has dates (or numbers) or generate them and use LEFT JOIN. However, the exact syntax depends on the database.
EDIT:
In MySQL, you can use a recursive CTE to generate the dates:
with recursive dates as (
select date('2020-09-10') as date
union all
select date + interval 1 day
from dates
where date < '2020-09-25'
)
select d.date, count(a.date_uploaded)
from dates d left join
articles a
on a.date_uploaded >= d.date and
a.date_uploaded < d.date + interval 1 day and
a.status = 4
group by d.date;
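
As a rough, runnable illustration of the recursive-CTE approach, the sketch below generates the date series and LEFT JOINs the articles onto it. It uses Python's sqlite3 module rather than MySQL, so date(d, '+1 day') replaces MySQL's date + interval 1 day; the sample rows and the shortened date range are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (date_uploaded TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("2020-09-10 08:00:00", 4),
        ("2020-09-10 17:30:00", 4),
        ("2020-09-11 09:00:00", 1),  # not published, must not be counted
        ("2020-09-14 12:00:00", 4),
    ],
)

# The CTE produces one row per day; the LEFT JOIN keeps days without posts,
# and COUNT(a.date_uploaded) counts only the matched rows (0 when none match).
counts = conn.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT date('2020-09-10')
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2020-09-14'
    )
    SELECT dates.d, COUNT(a.date_uploaded)
    FROM dates
    LEFT JOIN articles AS a
      ON a.date_uploaded >= dates.d
     AND a.date_uploaded < date(dates.d, '+1 day')
     AND a.status = 4
    GROUP BY dates.d
    ORDER BY dates.d
    """
).fetchall()

for day, num_posts in counts:
    print(day, num_posts)
```

Keeping the status filter in the ON clause rather than in WHERE is what preserves the zero-count days.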

Dropdown with union query

I'm developing a booking system, and my booking form has a dropdown element that returns the start time slots that are still available.
When creating a new booking, the query I wrote works fine and all available start time slots are returned correctly.
QUERY :
WHERE {thistable}.id
IN (
SELECT id +3
FROM (
SELECT p1.book_date, t.*, count(p1.book_date) AS nbre
FROM fab_booking_taken AS p1
CROSS JOIN fab_booking_slots AS t
WHERE NOT ((t.heuredepart_resa < p1.book_end AND
t.heurearrivee_resa > p1.book_start))
AND DATE(p1.book_date)=DATE('{fab_booking___book_bookingdate}')
GROUP BY t.id) AS x
WHERE nbre =
(
SELECT count(p2.book_date)
FROM fab_booking_taken AS p2
WHERE p2.book_date = x.book_date
)
) ORDER BY id ASC
Please see video: booking creation
The problem is that when I use the same query while editing an existing booking, the available start time slots are returned, which is fine:
18:00
18:30
19:00
19:30
but not the time slot the customer has already chosen (and which is saved in the database), which in my example is 14:00.
Please see video: Editing booking with same query
The dropdown should be populated with the following options:
14:00
18:00
18:30
19:00
19:30
I tried to create a union query to get both the start time slot already chosen by the customer and the start time slots that are still available.
QUERY :
{thistable}.id
IN (
SELECT id + 3
FROM (
SELECT p1.book_date, t.*, count(p1.book_date) AS nbre
FROM fab_booking_taken AS p1
CROSS JOIN fab_booking_slots AS t
WHERE NOT ((t.heuredepart_resa < p1.book_end
AND t.heurearrivee_resa > p1.book_start))
AND p1.book_date = DATE_FORMAT('{fab_booking___book_bookingdate}', '%Y-%m-%d')
GROUP BY t.id
) as foobar2
UNION (
SELECT id + 3
FROM (
SELECT p1.book_date, t.*, count(p1.book_date) AS nbre
FROM fab_booking_taken AS p1
CROSS JOIN fab_booking_slots AS t
WHERE ( ( t.heuredepart_resa < p1.book_end
AND t.heurearrivee_resa > p1.book_start ) )
AND t.id = '{fab_booking___book_starttime}'
AND p1.book_date = DATE_FORMAT('{fab_booking___book_bookingdate}', '%Y-%m-%d')
GROUP BY t.id
) AS x
WHERE nbre = (
SELECT count(p2.book_date)
FROM fab_booking_taken AS p2
WHERE p2.book_date = x.book_date
)
)
)
The start time slot already chosen by the customer (14:00) is returned, but the other returned start time slots are not correct.
Please see video: Editing booking with union query
I'm stuck and have no clue how to solve this issue, so I would appreciate some help here.
Thanks
Relevant database tables:
fab_booking: contains the booking shown in the video
fab_booking_taken: contains the bookings that already exist on 25 11 2016 (id 347 is the booking concerned)
fab_booking_slots: contains all possible time slots
fab_heuredepart_resa: populates the dropdown element

I have to admit that it is daunting to try to untangle that query to understand the logic behind it but I think that the following query should return the results that you need.
{thistable}.id IN (
/*Finds booking slots where the booking slot does not overlap
with any of the existing bookings on that day,
or where the booking slot id is the same as the current slot.*/
SELECT t.id + 3
FROM fab_booking_slots AS t
WHERE t.id = '{fab_booking___book_starttime}'
OR NOT EXISTS (
Select 1
From fab_booking_taken AS p1
Where Date(p1.book_date) = Date('{fab_booking___book_bookingdate}')
And p1.book_end > t.heuredepart_resa
And p1.book_start < t.heurearrivee_resa
)
)
Order By id Asc;
I'm pretty sure that this is logically equivalent, and once expressed in a simplified form like this it's easier to see how you can get it to also return the additional time slot.
You should have a separate query to use when populating the time slots for a new booking that doesn't have an existing time slot, in which case you can just remove the single line t.id = '{fab_booking___book_starttime}' OR.
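
To make the overlap test concrete, here is a small sketch of the same NOT EXISTS logic run against SQLite from Python. The table and column names come from the question, but the slot times, the bookings, the 'HH:MM' text format and the helper name selectable_slots are assumptions for illustration, and the site-specific "+ 3" offset is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fab_booking_slots (
        id INTEGER PRIMARY KEY,
        heuredepart_resa TEXT,   -- slot start, 'HH:MM'
        heurearrivee_resa TEXT   -- slot end, 'HH:MM'
    );
    CREATE TABLE fab_booking_taken (
        book_date TEXT, book_start TEXT, book_end TEXT
    );
    INSERT INTO fab_booking_slots VALUES
        (1, '14:00', '16:00'), (2, '16:00', '18:00'), (3, '18:00', '20:00');
    INSERT INTO fab_booking_taken VALUES
        ('2016-11-25', '14:00', '16:00'),  -- the booking being edited
        ('2016-11-25', '16:00', '18:00');  -- somebody else's booking
    """
)


def selectable_slots(day, current_slot_id=None):
    """Return ids of slots that are free on `day`, plus the current slot if given."""
    rows = conn.execute(
        """
        SELECT t.id
        FROM fab_booking_slots AS t
        WHERE t.id = ?
           OR NOT EXISTS (
                SELECT 1
                FROM fab_booking_taken AS p1
                WHERE p1.book_date = ?
                  AND p1.book_end > t.heuredepart_resa
                  AND p1.book_start < t.heurearrivee_resa
           )
        ORDER BY t.id
        """,
        (current_slot_id, day),
    ).fetchall()
    return [r[0] for r in rows]


editing = selectable_slots("2016-11-25", current_slot_id=1)
creating = selectable_slots("2016-11-25")
print(editing, creating)
```

When no current slot is passed, the first condition compares against NULL and never matches, so only the free slots remain; that behaves like the separate "new booking" query suggested above.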

Combine Table Records per Day to Create 'Last Week' page

I apologize if this has been asked before.. I'm very new to developing and although I've tried searching a lot, I'm not really sure what to look for.
Anyway so I have a table which counts records being entered per day. It looks something like this (each record is represented by a letter) (assume today's date is 27/01/2013):
RECORD | COUNT | DATE
A      | 4     | 27/01/2013
B      | 7     | 27/01/2013
B      | 3     | 24/01/2013
C      | 8     | 22/01/2013
A      | 2     | 19/01/2013
Each new post is checked in the table and it updates the count if the record already exists on the current day, otherwise a new record is created.
For the page which prints the records which have been added 'TODAY', I have the MySQL query
SELECT * FROM `table` ORDER BY `date` DESC, `count` DESC LIMIT 1000
and use a PHP 'if' statement to print only the records where date('Y-m-d') equals the date in the table. So only the records entered that day, with their corresponding counts, are printed.
- the table above would produce the result:
1. B 7
2. A 4
What I would like is a page that prints the records entered in the last week. I know I can use BETWEEN DATE_SUB(NOW(), INTERVAL 1 WEEK) AND NOW() to select the records from the last week, but I need duplicate records to be combined and their counts added together, so the result for this table would look like this:
1. B 10
2. C 8
3. A 4
How would I go about combining those duplicate records and have a list of records ordered by count? Is this the best method to get a 'last week' record count, or is there another table structure which would be better?
Again I'm sorry if this a silly question or if my explanation was long-winded, but just some simple pointers will be really appreciated.

Try this
SELECT `record`, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > DATE_SUB(CURDATE(),INTERVAL 1 WEEK)
GROUP BY `record`
ORDER BY `count` DESC
You can also add LIMIT 1000 to the grouped result set if you need to.

Using GROUP BY will allow you to group related records together:
SELECT `record`
, SUM(`count`) AS `count`
FROM `table`
WHERE `date` > CURDATE() - INTERVAL 1 WEEK
GROUP BY `record`
ORDER BY `count` DESC
LIMIT 1000
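
The grouping can be checked against the sample table from the question with a short sketch. It runs on SQLite through Python's sqlite3 module, with "today" passed in as a parameter instead of CURDATE() so the result is reproducible; the table name daily_counts and the column names cnt and day are placeholders chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_counts (record TEXT, cnt INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO daily_counts VALUES (?, ?, ?)",
    [
        ("A", 4, "2013-01-27"),
        ("B", 7, "2013-01-27"),
        ("B", 3, "2013-01-24"),
        ("C", 8, "2013-01-22"),
        ("A", 2, "2013-01-19"),  # more than a week old, excluded
    ],
)


def last_week_totals(today):
    """Sum the counts per record over the 7 days up to `today`, largest first."""
    return conn.execute(
        """
        SELECT record, SUM(cnt) AS total
        FROM daily_counts
        WHERE day > date(?, '-7 days')
        GROUP BY record
        ORDER BY total DESC
        """,
        (today,),
    ).fetchall()


totals = last_week_totals("2013-01-27")
print(totals)
```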

MySQL GROUP BY DateTime +/- 3 seconds

Suppose I have a table with 3 columns:
id (PK, int)
timestamp (datetime)
title (text)
I have the following records:
1, 2010-01-01 15:00:00, Some Title
2, 2010-01-01 15:00:02, Some Title
3, 2010-01-02 15:00:00, Some Title
I need to do a GROUP BY records that are within 3 seconds of each other. For this table, rows 1 and 2 would be grouped together.
There is a similar question here: Mysql DateTime group by 15 mins
I also found this: http://www.artfulsoftware.com/infotree/queries.php#106
I don't know how to convert these methods into something that will work for seconds. The trouble with the method on the SO question is that it seems to me that it would only work for records falling within a bin of time that starts at a known point. For instance, if I were to get FLOOR() to work with seconds, at an interval of 5 seconds, a time of 15:00:04 would be grouped with 15:00:01, but not grouped with 15:00:06.
Does this make sense? Please let me know if further clarification is needed.
EDIT: For the set of numbers {1, 2, 3, 4, 5, 6, 7, 50, 51, 60}, it seems it might be best to group them as {1, 2, 3, 4, 5, 6, 7}, {50, 51}, {60}, so that each row's group depends on whether the row is within 3 seconds of the previous one. I know this changes things a bit; I'm sorry for being wishy-washy on this.
I am trying to fuzzy-match logs from different servers. Server #1 may log an item, "Item #1", and Server #2 will log that same item, "Item #1", within a few seconds of server #1. I need to do some aggregate functions on both log lines. Unfortunately, I only have title to go on, due to the nature of the server software.

I'm using Tom H.'s excellent idea but doing it a little differently here:
Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and find the rows that match the times.
Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:
SELECT DISTINCT a.Timestamp
FROM Table a
LEFT JOIN Table b
ON (b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
AND b.Timestamp < a.Timestamp)
WHERE b.Timestamp IS NULL
And then for each row, we can find the largest chain-starting timestamp that is less than or equal to our timestamp with Query #2:
SELECT Table.id, MAX(StartOfChains.TimeStamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartofChains
ON Table.Timestamp >= StartOfChains.TimeStamp
GROUP BY Table.id
Once we have that, we can GROUP BY it as you wanted.
SELECT COUNT(*) -- or whatever
FROM Table
JOIN ([query #2]) GroupingQuery
ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime
I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!

Now that I think that I understand your problem, based on your comment response to OMG Ponies, I think that I have a set-based solution. The idea is to first find the start of any chains based on the title. The start of a chain is going to be defined as any row where there is no match within three seconds prior to that row:
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
Now we can assume that any non-chain starters belong to the chain starter that appeared before them. Since MySQL versions before 8.0 don't support CTEs, you might want to throw the above results into a temporary table, as that would save you the multiple joins to the same subquery below.
SELECT
SQ1.my_id,
COUNT(*) -- You didn't say what you were trying to calculate, just that you needed to group them
FROM
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ1
INNER JOIN My_Table MT3 ON
MT3.title = SQ1.title AND
MT3.my_time >= SQ1.my_time
LEFT OUTER JOIN
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ2 ON
SQ2.title = SQ1.title AND
SQ2.my_time > SQ1.my_time AND
SQ2.my_time <= MT3.my_time
WHERE
SQ2.my_id IS NULL
GROUP BY
SQ1.my_id
This would look much simpler if you could use CTEs or if you used a temporary table. Using the temporary table might also help performance.
Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.
EDIT: Changed the queries to handle exact matches by timestamp.

Warning: Long answer. This should work, and is fairly neat, except for one step in the middle where you have to be willing to run an INSERT statement over and over until it doesn't do anything, since MySQL versions before 8.0 can't do recursive CTEs.
I'm going to use this data as the example instead of yours:
id Timestamp
1 1:00:00
2 1:00:03
3 1:00:06
4 1:00:10
Here is the first query to write:
SELECT a.id AS aid, b.id AS bid
FROM Table a
JOIN Table b
ON (b.Timestamp BETWEEN a.Timestamp - INTERVAL 3 SECOND
AND a.Timestamp + INTERVAL 3 SECOND)
It returns:
aid bid
1 1
1 2
2 1
2 2
2 3
3 2
3 3
4 4
Let's create a nice table to hold those things that won't allow duplicates:
CREATE TABLE
Adjacency
( aid INT(11)
, bid INT(11)
, PRIMARY KEY (aid, bid) -- important for later
)
Now the challenge is to find something like the transitive closure of that relation.
To do so, let's find the next level of links. By that I mean, since we have 1 2 and 2 3 in the Adjacency table, we should add 1 3:
INSERT IGNORE INTO Adjacency(aid,bid)
SELECT adj1.aid, adj2.bid
FROM Adjacency adj1
JOIN Adjacency adj2
ON (adj1.bid = adj2.aid)
This is the non-elegant part: You'll need to run the above INSERT statement over and over until it doesn't add any rows to the table. I don't know if there is a neat way to do that.
Once this is over, you will have a transitively-closed relation like this:
aid bid
1 1
1 2
1 3 --added
2 1
2 2
2 3
3 1 --added
3 2
3 3
4 4
And now for the punchline:
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
returns:
aid Neighbors
1 1,2,3
2 1,2,3
3 1,2,3
4 4
So
SELECT DISTINCT Neighbors
FROM (
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
) Groupings
returns
Neighbors
1,2,3
4
Whew!
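
The "run the INSERT until nothing changes" step can be automated from client code. The following is only a sketch of that loop, using Python's sqlite3 module in place of MySQL (SQLite spells INSERT IGNORE as INSERT OR IGNORE); the starting pairs are the ones listed in this answer, and the row-count comparison is just one assumed way of detecting that a pass added nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE adjacency (aid INTEGER, bid INTEGER, PRIMARY KEY (aid, bid))"
)
conn.executemany(
    "INSERT INTO adjacency VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3), (4, 4)],
)


def row_count():
    return conn.execute("SELECT COUNT(*) FROM adjacency").fetchone()[0]


# Repeat the self-join insert until a pass adds no new pairs.
while True:
    before = row_count()
    conn.execute(
        """
        INSERT OR IGNORE INTO adjacency (aid, bid)
        SELECT adj1.aid, adj2.bid
        FROM adjacency AS adj1
        JOIN adjacency AS adj2 ON adj1.bid = adj2.aid
        """
    )
    if row_count() == before:
        break

# Collect each id's neighbours, then keep the distinct neighbour sets.
neighbours = {}
for aid, bid in conn.execute("SELECT aid, bid FROM adjacency ORDER BY aid, bid"):
    neighbours.setdefault(aid, []).append(bid)
groups = sorted({tuple(bids) for bids in neighbours.values()})

print(groups)
```

Building the groups in Python rather than with GROUP_CONCAT sidesteps any question about the order in which the concatenated ids come back.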

I like @Chris Cunningham's answer, but here's another take on it.
First, my understanding of your problem statement (correct me if I'm wrong):
You want to look at your event log as a sequence, ordered by the time of the event, and partition it into groups, defining the boundary as being an interval of more than 3 seconds between two adjacent rows in the sequence.
I work mostly in SQL Server, so I'm using SQL Server syntax. It shouldn't be too difficult to translate into MySQL SQL.
So, first our event log table:
--
-- our event log table
--
create table dbo.eventLog
(
id int not null ,
dtLogged datetime not null ,
title varchar(200) not null ,
primary key nonclustered ( id ) ,
unique clustered ( dtLogged , id ) ,
)
Given the above understanding of the problem statement, the following query should give you the upper and lower bounds of your groups. It's a simple, nested select statement with 2 group by clauses to collapse things:
The innermost select defines the upper bound of each group. That upper boundary defines a group.
The outer select defines the lower bound of each group.
Every row in the table should fall into one of the groups so defined, and any given group may well consist of a single date/time value.
[edited: the upper bound is the lowest date/time value where the interval is more than 3 seconds]
select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
You could then pull rows from the event log and tag them with the group to which they belong thus:
select *
from ( select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
) period
join dbo.EventLog t on t.dtLogged >= period.dtFrom
and t.dtLogged <= coalesce( period.dtThru , t.dtLogged )
order by period.dtFrom , period.dtThru , t.dtLogged
Each row is tagged with its group via the dtFrom and dtThru columns returned. You could get fancy and assign an integral row number to each group if you want.
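
The partitioning rule stated at the top of this answer (a new group starts wherever two adjacent rows are more than 3 seconds apart) is easy to check outside the database. The sketch below is a plain-Python illustration of that rule, not a translation of the SQL above; the function name group_by_gap and the base timestamp are made up for the example, and the offsets are the ones from the question's edit.

```python
from datetime import datetime, timedelta


def group_by_gap(timestamps, max_gap=timedelta(seconds=3)):
    """Split timestamps into runs; a gap larger than max_gap starts a new run."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= max_gap:
            groups[-1].append(ts)  # close enough to the previous row
        else:
            groups.append([ts])  # first row, or the gap is too large
    return groups


base = datetime(2010, 1, 1, 15, 0, 0)  # arbitrary starting point
events = [base + timedelta(seconds=s) for s in (1, 2, 3, 4, 5, 6, 7, 50, 51, 60)]

groups = group_by_gap(events)
group_sizes = [len(g) for g in groups]
print(group_sizes)
```

Because each row is compared only with the row just before it, a long chain of closely spaced events stays in one group even when its first and last rows are far more than 3 seconds apart.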

Simple query:
SELECT * FROM time_history GROUP BY ROUND(UNIX_TIMESTAMP(time_stamp)/3);

How to get values for every day in a month

Data:
values date
14 1.1.2010
20 1.1.2010
10 2.1.2010
7 4.1.2010
...
A sample query for January 2010 should return 31 rows, one for every day, with the values for each day added together. Right now I could do this with 31 queries, but I would like it to work with one. Is it possible?
results:
1. 34
2. 10
3. 0
4. 7
...

This is actually surprisingly difficult to do in SQL. One way to do it is to have a long select statement with UNION ALLs to generate the numbers from 1 to 31. This demonstrates the principle but I stopped at 4 for clarity:
SELECT MonthDate.Date, COALESCE(SUM(`values`), 0) AS Total
FROM (
SELECT 1 AS Date UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
-- (5 through 27 omitted)
SELECT 28 UNION ALL
SELECT 29 UNION ALL
SELECT 30 UNION ALL
SELECT 31) AS MonthDate
LEFT JOIN Table1 AS T1
ON MonthDate.Date = DAY(T1.Date)
AND MONTH(T1.Date) = 1 AND YEAR(T1.Date) = 2010
WHERE MonthDate.Date <= DAY(LAST_DAY('2010-01-01'))
GROUP BY MonthDate.Date
It might be better to use a table to store these values and join with it instead.
Result:
1, 34
2, 10
3, 0
4, 7

Given that for some dates you have no data, you'll need to fill in the gaps. One approach to this is to have a calendar table prefilled with all dates you need, and join against that.
If you want the results to show day numbers as you have showing in your question, you could prepopulate these in your calendar too as labels.
You would join your data table date field to the date field of the calendar table, group by that field, and sum values. You might want to specify limits for the range of dates covered.
So you might have:
CREATE TABLE Calendar (
label varchar(20), -- MySQL requires a length; 20 is an arbitrary choice
cal_date date,
primary key ( cal_date )
)
Query:
SELECT
c.label,
COALESCE( SUM( d.`values` ), 0 )
FROM
Calendar c
-- LEFT JOIN keeps calendar days that have no data
LEFT JOIN
Data_table d
ON d.date_field = c.cal_date
WHERE
c.cal_date BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY
c.cal_date, c.label
ORDER BY
c.cal_date
Update:
I see you have datetimes rather than dates. You could just use the MySQL DATE() function in the join, but that would probably not be optimal. Another approach would be to have start and end times in the Calendar table defining a 'time bucket' for each day.

This works for me. It's a modification of a query I found on another site. The "INTERVAL 1 MONTH" clause ensures I get the current month's data, including zeros for days that have no hits. Change this to "INTERVAL 2 MONTH" to get last month's data, etc.
I have a table called "payload" with a column "timestamp". I'm then joining the timestamp column to the dynamically generated dates, casting it so that the dates match in the ON clause.
SELECT `calendarday`,COUNT(P.`timestamp`) AS `cnt` FROM
(SELECT @tmpdate := DATE_ADD(@tmpdate, INTERVAL 1 DAY) `calendarday`
FROM (SELECT @tmpdate :=
LAST_DAY(DATE_SUB(CURDATE(),INTERVAL 1 MONTH)))
AS `dynamic`, `payload`) AS `calendar`
LEFT JOIN `payload` P ON DATE(P.`timestamp`) = `calendarday`
GROUP BY `calendarday`

To dynamically get the dates within a date range using SQL you can do this (example in mysql):
Create a table to hold the numbers 0 through 9.
CREATE TABLE ints ( i tinyint(4) );
insert into ints (i)
values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Run a query like so:
select ((curdate() - interval 2 year) + interval (t.i * 100 + u.i * 10 + v.i) day) AS Date
from
ints t
join ints u
join ints v
having Date between '2015-01-01' and '2015-05-01'
order by t.i, u.i, v.i
This will generate all dates between Jan 1, 2015 and May 1, 2015.
Output
2015-01-01
2015-01-02
2015-01-03
2015-01-04
2015-01-05
2015-01-06
...
2015-05-01
The query joins the table ints 3 times and gets an incrementing number (0 through 999). It then adds this number as a day interval starting from a certain date, in this case a date 2 years ago. Any date range from 2 years ago and 1,000 days ahead can be obtained with the example above.
To generate a query that generates dates for more than 1,000 days simply join the ints table once more to allow for up to 10,000 days of range, and so forth.
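
Here is a small runnable variation on the digits-table idea, using Python's sqlite3 module instead of MySQL. Unlike the query above, it counts forward from the start of the requested range rather than from two years ago, so up to 1,000 consecutive days are available from any start date; the helper name date_range is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ints (i INTEGER)")
conn.executemany("INSERT INTO ints VALUES (?)", [(n,) for n in range(10)])


def date_range(start, end):
    """Return every date from start to end inclusive (at most 1,000 days)."""
    rows = conn.execute(
        """
        SELECT d FROM (
            -- three copies of ints give the offsets 0 through 999
            SELECT date(?, '+' || (t.i * 100 + u.i * 10 + v.i) || ' days') AS d
            FROM ints AS t CROSS JOIN ints AS u CROSS JOIN ints AS v
        )
        WHERE d <= ?
        ORDER BY d
        """,
        (start, end),
    ).fetchall()
    return [r[0] for r in rows]


dates = date_range("2015-01-01", "2015-05-01")
print(dates[0], dates[-1], len(dates))
```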

If I'm understanding the rather vague question correctly, you want to know the number of records for each date within a month. If that's true, here's how you can do it:
SELECT COUNT(value_column) FROM table WHERE date_column LIKE '2010-01-%' GROUP BY date_column
