Mysql select just one row per day in interval - mysql

I have a MySQL DB with a table that contains data like this.
http://sqlfiddle.com/#!9/f084c
CREATE TABLE `datos` (
`id` int(11) NOT NULL,
`fecha` datetime NOT NULL,
`temperatura` tinyint(4) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `datos` (`id`, `fecha`, `temperatura`) VALUES
(1874, '2017-05-20 01:56:40', 20),
(1875, '2017-05-20 01:56:51', 20),
(1876, '2017-06-18 23:32:49', 17),
(1877, '2017-06-18 23:34:50', 17),
(1878, '2017-06-18 23:36:51', 17),
(1879, '2017-06-18 23:38:52', 17),
(1880, '2017-06-18 23:46:02', 16),
(1881, '2017-06-18 23:47:12', 17),
(1882, '2017-06-22 01:06:27', 21);
I want to select just one value of a day during a 30 day interval, to have a result like this
2017-06-22 01:06:27 21
2017-06-18 23:47:12 17
2017-05-20 01:56:51 20
I have selected the entire interval using
SELECT * FROM `datos` WHERE fecha >= '2017-06-22 01:06:27' - INTERVAL 30 DAY
But I have not managed to select just one value per day instead of all of them.
I would appreciate the help.

Based on the desired result you listed, it looks like you want maximum fecha for each day.
select date(fecha), max(fecha)
from datos
group by date(fecha)
This results in the following:
2017-06-22 2017-06-22 01:06:27
2017-06-18 2017-06-18 23:47:12
2017-05-20 2017-05-20 01:56:51
By treating the above query as a table and joining it back to the datos table you can get the complete record: id, fecha, and temperatura which had the maximum fecha each day.
select d1.*
from datos d1,
(select date(fecha), max(fecha) as max_fecha
from datos
group by date(fecha) ) d2
where d1.fecha = d2.max_fecha
and d1.fecha >= '2017-06-22 01:06:27' - INTERVAL 30 DAY
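Resulting in the following (columns: id, fecha, temperatura). Row 1875 appears only if the 30-day filter is removed, because 2017-05-20 is more than 30 days before 2017-06-22 01:06:27:
1875 2017-05-20 01:56:51 20
1881 2017-06-18 23:47:12 17
1882 2017-06-22 01:06:27 21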
Note I initially solved in Oracle since I do not have access to a mysql database, but I believe I correctly altered the queries to work in mysql.
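For anyone who wants to check the join-back approach without a MySQL server, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is only a stand-in here: `datetime(..., '-30 days')` replaces MySQL's `- INTERVAL 30 DAY`, and moving the date filter into the subquery is my own variation, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datos (id INTEGER, fecha TEXT, temperatura INTEGER);
INSERT INTO datos (id, fecha, temperatura) VALUES
 (1874, '2017-05-20 01:56:40', 20),
 (1875, '2017-05-20 01:56:51', 20),
 (1876, '2017-06-18 23:32:49', 17),
 (1877, '2017-06-18 23:34:50', 17),
 (1878, '2017-06-18 23:36:51', 17),
 (1879, '2017-06-18 23:38:52', 17),
 (1880, '2017-06-18 23:46:02', 16),
 (1881, '2017-06-18 23:47:12', 17),
 (1882, '2017-06-22 01:06:27', 21);
""")

# Inner query: latest fecha per calendar day inside the 30-day window.
# Outer query: join back to fetch the full row for each of those timestamps.
rows = conn.execute("""
SELECT d1.id, d1.fecha, d1.temperatura
FROM datos d1
JOIN (SELECT MAX(fecha) AS max_fecha
      FROM datos
      WHERE fecha >= datetime('2017-06-22 01:06:27', '-30 days')
      GROUP BY date(fecha)) d2
  ON d1.fecha = d2.max_fecha
ORDER BY d1.fecha
""").fetchall()

for row in rows:
    print(row)
```

With the sample data only the rows for 2017-06-18 and 2017-06-22 come back, since the window starts at 2017-05-23 01:06:27.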

select a.*
from datos a
left outer join datos b
on substr(a.fecha,1,10) = substr(b.fecha,1,10) and a.id < b.id
where b.id is null
You can manipulate your conditions of matching to select whatever property of the rows you want to prioritize.
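To illustrate what "manipulating the matching conditions" can look like, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The helper `one_per_day` and the "coldest reading" rule are hypothetical examples of mine; the pattern is the same self left join with `b.id IS NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datos (id INTEGER, fecha TEXT, temperatura INTEGER);
INSERT INTO datos (id, fecha, temperatura) VALUES
 (1874, '2017-05-20 01:56:40', 20),
 (1875, '2017-05-20 01:56:51', 20),
 (1876, '2017-06-18 23:32:49', 17),
 (1877, '2017-06-18 23:34:50', 17),
 (1878, '2017-06-18 23:36:51', 17),
 (1879, '2017-06-18 23:38:52', 17),
 (1880, '2017-06-18 23:46:02', 16),
 (1881, '2017-06-18 23:47:12', 17),
 (1882, '2017-06-22 01:06:27', 21);
""")

def one_per_day(beats):
    """Return the id kept for each day.

    `beats` is a SQL condition that is true when row b should win over row a;
    a row survives only if no same-day row beats it.
    """
    sql = f"""
        SELECT a.id
        FROM datos a
        LEFT JOIN datos b
          ON substr(a.fecha, 1, 10) = substr(b.fecha, 1, 10)
         AND ({beats})
        WHERE b.id IS NULL
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(sql)]

# Highest id per day (the condition used in the answer above).
latest = one_per_day("a.id < b.id")

# Coldest reading per day, ties broken by the lowest id.
coldest = one_per_day(
    "b.temperatura < a.temperatura "
    "OR (b.temperatura = a.temperatura AND b.id < a.id)"
)

print(latest)
print(coldest)
```

The tie-breaker matters: without it, two rows with the same temperature on the same day would both survive the anti-join.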

Related

How to get summary data for every months in mysql

I want to count the number of items sold(item_count) every month for every item,
--
-- Table structure for table `sales`
--
CREATE TABLE `sales` (
`id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`date` date NOT NULL,
`item_count` int(11) NOT NULL,
`amount` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- Dumping data for table `sales`
--
INSERT INTO `sales` (`id`, `item_id`, `date`, `item_count`, `amount`) VALUES
(1, 1, '2018-01-15', 11, 110),
(2, 2, '2018-01-21', 5, 1000),
(3, 1, '2018-02-02', 7, 700),
(4, 2, '2018-02-11', 3, 3000);
I have tried this SQL, but it's not showing the data correctly.
SELECT `sales`.`item_id`,
(CASE WHEN MONTH(sales.date)=1 THEN sum(sales.item_count) ELSE NULL END) as JAN,
(case when MONTH(sales.date)=2 THEN sum(sales.item_count) ELSE NULL END) as FEB
FROM sales WHERE 1
GROUP BY sales.item_id
ORDER BY sales.item_id
This is my expected result,
item_id JAN FEB
1 11 7
2 5 3
I am getting this,
item_id JAN FEB
1 18 NULL
2 8 NULL
Here is an immediate fix to your query. You need to sum over a CASE expression, rather than the other way around.
SELECT
s.item_id,
SUM(CASE WHEN MONTH(s.date) = 1 THEN s.item_count END) AS JAN,
SUM(CASE WHEN MONTH(s.date) = 2 THEN s.item_count END) AS FEB
FROM sales s
GROUP BY
s.item_id
ORDER BY
s.item_id;
But the potential problem with this query is that in order to support more months, you need to add more columns. Also, if you want to cover multiple years, then this approach might not scale either. Assuming you only have a few items, here is another way to do this:
SELECT
DATE_FORMAT(date, '%Y-%m') AS ym,
SUM(CASE WHEN item_id = 1 THEN item_count END) AS item1_total,
SUM(CASE WHEN item_id = 2 THEN item_count END) AS item2_total
FROM sales
GROUP BY
DATE_FORMAT(date, '%Y-%m');
This would generate output looking something like:
ym item1_total item2_total
2018-01 11 5
2018-02 7 3
Which version you use depends on how many months your report requires versus how many items might appear in your data.
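As a quick way to check the conditional-aggregation fix, here is a minimal sketch using Python's sqlite3 module in place of MySQL. `strftime('%m', date)` stands in for `MONTH(date)`; the second query, which returns one row per month and item instead of pivoting, is an extra variation of mine for when neither the months nor the items are known in advance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER, item_id INTEGER, date TEXT,
                    item_count INTEGER, amount REAL);
INSERT INTO sales (id, item_id, date, item_count, amount) VALUES
 (1, 1, '2018-01-15', 11, 110),
 (2, 2, '2018-01-21', 5, 1000),
 (3, 1, '2018-02-02', 7, 700),
 (4, 2, '2018-02-11', 3, 3000);
""")

# SUM over a CASE expression: each row contributes only to its own month.
pivot = conn.execute("""
SELECT item_id,
       SUM(CASE WHEN strftime('%m', date) = '01' THEN item_count END) AS jan,
       SUM(CASE WHEN strftime('%m', date) = '02' THEN item_count END) AS feb
FROM sales
GROUP BY item_id
ORDER BY item_id
""").fetchall()

# Unpivoted alternative: one row per (month, item), no new column per month.
long_format = conn.execute("""
SELECT strftime('%Y-%m', date) AS ym, item_id, SUM(item_count) AS total
FROM sales
GROUP BY ym, item_id
ORDER BY ym, item_id
""").fetchall()

print(pivot)
print(long_format)
```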

SQL Query Issue - Picking the minimum time when there is a maximum number

SQL God...I need some help!
I have a data table that has a route_complete_percentage column and a created_at column.
I need two pieces of data:
the time stamp (within created_at column) when the route_complete_percentage is at its minimum but not zero
the time stamp (within created_at column) when the route_complete_percentage is at its maximum; it might be 100% or not, but it is whatever the highest value is.
Here is the kicker, there might be multiple time stamps for the highest route completion column. For example,
Example Table
I have multiple values when the route_completion_percentage is at its maximum, but I need the minimum time stamp value.
Here is the query so far...but the two time stamps are the same.
SELECT
A.fc,
A.plan_id,
A.route_id,
mintime.first_scan AS First_Batch_Scan,
min(route_complete_percentage),
maxtime.last_scan AS Last_Batch_Scan,
max(route_complete_percentage)
FROM
(SELECT
fc,
plan_id,
route_id,
route_complete_percentage,
CONCAT(plan_id, '-', route_id) AS JOINKEY
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2
WHERE
fc <> ''
AND order_id <> 'Can\'t find order'
AND source = 'scan'
AND created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)) A
LEFT JOIN
(SELECT
l.fc,
l.route_id,
l.plan_id,
CONCAT(l.plan_id, '-', l.route_id) AS JOINKEY,
CASE
WHEN MIN(route_complete_percentage) THEN CONVERT_TZ(l.created_at, 'UTC', s.time_zone)
END AS first_scan
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2 l
JOIN houdini_ops.O_SERVICE_AREA_ATTRIBUTES s ON l.fc = s.default_station_code
WHERE
l.fc <> ''
AND l.order_id <> 'Can\'t find order'
AND l.source = 'scan'
AND l.created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)
GROUP BY fc , plan_id , route_id) mintime ON A.JOINKEY = mintime.JOINKEY
LEFT JOIN
(SELECT
l.fc,
l.route_id,
l.plan_id,
CONCAT(l.plan_id, '-', l.route_id) AS JOINKEY,
CASE
WHEN MAX(route_complete_percentage) THEN CONVERT_TZ(l.created_at, 'UTC', s.time_zone)
END AS last_scan
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2 l
JOIN houdini_ops.O_SERVICE_AREA_ATTRIBUTES s ON l.fc = s.default_station_code
WHERE
l.fc <> ''
AND l.order_id <> 'Can\'t find order'
AND l.source = 'scan'
AND l.created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)
GROUP BY fc , plan_id , route_id) maxtime ON mintime.JOINKEY = maxtime.JOINKEY
GROUP BY fc , plan_id , route_id
I don't want to meddle with the rest of your query. Here is something that will do what it sounds like you need, written in T-SQL (SQL Server syntax) with sample data included. I interpreted the blank values in your sample data as nulls.
Basically, what you are looking for is the Minimum created_at value, inside each of the route_complete_percentage groups. So I treated route_complete_percentage as a group identifier. But you only care about two of the groups, so I identify those groups first in the cte, and use them to filter the aggregate query.
if object_id('tempdb.dbo.#Data') is not null drop table #Data
go
create table #Data (
route_complete_percentage int,
created_at datetime
)
insert into #Data (route_complete_percentage, created_at)
values
(0, '20170531 19:58'),
(1, null),
(2, null),
(3, null),
(4, null),
(5, null),
(6, null),
(7, null),
(80, null),
(90, null),
(100, '20170531 20:10'),
(100, '20170531 20:12'),
(100, '20170531 20:15')
;with cteMinMax(min_route_complete_percentage, max_route_complete_percentage) as (
select
min(route_complete_percentage),
max(route_complete_percentage)
from #Data D
-- This ensures the condition that you don't get the timestamp for 0
where D.route_complete_percentage > 0
)
select
route_complete_percentage,
min_created_at = min(created_at)
from #Data D
join cteMinMax MM on D.route_complete_percentage in (MM.min_route_complete_percentage, MM.max_route_complete_percentage)
group by route_complete_percentage
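Since the question uses MySQL functions while the code above is SQL Server syntax, here is a hedged sketch of the same idea in portable SQL, run through Python's sqlite3 module. The table name `scan_data` is a placeholder of mine, and the derived table replaces the CTE; the data is the sample from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scan_data (route_complete_percentage INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO scan_data VALUES (?, ?)",
    [
        (0, "2017-05-31 19:58:00"),
        (1, None), (2, None), (3, None), (4, None),
        (5, None), (6, None), (7, None), (80, None), (90, None),
        (100, "2017-05-31 20:10:00"),
        (100, "2017-05-31 20:12:00"),
        (100, "2017-05-31 20:15:00"),
    ],
)

# mm holds the lowest non-zero and the highest percentage; the outer query
# keeps only those two groups and takes the earliest timestamp in each.
rows = conn.execute("""
SELECT d.route_complete_percentage, MIN(d.created_at) AS min_created_at
FROM scan_data d
JOIN (SELECT MIN(route_complete_percentage) AS lo,
             MAX(route_complete_percentage) AS hi
      FROM scan_data
      WHERE route_complete_percentage > 0) mm
  ON d.route_complete_percentage IN (mm.lo, mm.hi)
GROUP BY d.route_complete_percentage
ORDER BY d.route_complete_percentage
""").fetchall()

print(rows)
```

The minimum group shows a null timestamp only because the sample data has no created_at value for percentage 1.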

How can I fix this Query - Mysql [duplicate]

I have a little problem with my query, here is my table:
CREATE TABLE IF NOT EXISTS `realizado` (
`cod` int(11) NOT NULL AUTO_INCREMENT,
`datedoc` date NOT NULL,
`bank` int(11) NOT NULL,
`bankValue` float NOT NULL,
PRIMARY KEY (`cod`));
INSERT INTO `realizado` (`cod`, `datedoc`, `bank`, `bankValue`) VALUES
(152, '2014-10-22', 22, 1000),
(153, '2014-10-22', 23, 2000),
(154, '2014-10-22', 24, 3000),
(200, '2014-10-23', 22, 950),
(201, '2014-10-25', 22, 100),
(202, '2014-10-25', 23, 2050),
(203, '2014-10-24', 22, 150),
(204, '2014-10-24', 24, 3800);
The problem is: I need to get the bankValue from a date and still group by bank, something like this:
SELECT bank, bankValue
FROM realizado
WHERE datedoc <= '2014/10/25'
GROUP BY bank
the closest I got is:
SELECT r.bank, (select bankValue from realizado r2 where max(r.cod) = r2.cod) as Value
FROM realizado as r
WHERE r.datedoc <= '2014/10/25'
GROUP BY r.bank
here's the SQL Fiddle if u like -> http://sqlfiddle.com/#!2/83e309/2
the result that I expect is ( 22 - 100 / 23 - 2050 / 24 - 3800 )
Here you go! (Thanks for setting up the sqlFiddle with DDL and bootstrap data :)
Working sqlFiddle: http://sqlfiddle.com/#!2/83e309/10
SELECT a.*
FROM realizado a
INNER JOIN
(
SELECT bank, MAX(datedoc) datedoc
FROM realizado
GROUP BY bank
) b ON a.bank = b.bank AND
a.datedoc = b.datedoc
You seem to want the latest value for the bank. If so, you can do:
select r.*
from realizado r
where not exists (select 1 from realizado r2 where r2.bank = r.bank and r2.datedoc > r.datedoc);
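Neither query above applies the date cutoff from the question (`datedoc <= '2014/10/25'`). Here is a minimal sketch of one way to add it, using Python's sqlite3 module as a stand-in for MySQL. The helper name `latest_per_bank` is hypothetical; the key assumption is that the cutoff has to be applied both to the outer row and inside the NOT EXISTS subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE realizado (cod INTEGER PRIMARY KEY, datedoc TEXT,
                        bank INTEGER, bankValue REAL);
INSERT INTO realizado (cod, datedoc, bank, bankValue) VALUES
 (152, '2014-10-22', 22, 1000),
 (153, '2014-10-22', 23, 2000),
 (154, '2014-10-22', 24, 3000),
 (200, '2014-10-23', 22, 950),
 (201, '2014-10-25', 22, 100),
 (202, '2014-10-25', 23, 2050),
 (203, '2014-10-24', 22, 150),
 (204, '2014-10-24', 24, 3800);
""")

def latest_per_bank(cutoff):
    """Latest bankValue per bank among rows dated on or before `cutoff`."""
    return conn.execute("""
        SELECT r.bank, r.bankValue
        FROM realizado r
        WHERE r.datedoc <= :cutoff
          AND NOT EXISTS (SELECT 1
                          FROM realizado r2
                          WHERE r2.bank = r.bank
                            AND r2.datedoc <= :cutoff
                            AND r2.datedoc > r.datedoc)
        ORDER BY r.bank
    """, {"cutoff": cutoff}).fetchall()

as_of_25 = latest_per_bank("2014-10-25")
as_of_23 = latest_per_bank("2014-10-23")
print(as_of_25)
print(as_of_23)
```

If two rows for the same bank share the latest date, both are returned; a tie-breaker on `cod` would be needed in that case.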

Aligning timestamps when not quite synchronized

I have 3 processes A, B and C as defined in the following series of tables:
http://sqlfiddle.com/#!2/48f54
CREATE TABLE processA
(date_time datetime, valueA int);
INSERT INTO processA
(date_time, valueA)
VALUES
('2013-1-8 22:10:00', 100),
('2013-1-8 22:15:00', 100),
('2013-1-8 22:30:00', 100),
('2013-1-8 22:35:00', 100),
('2013-1-8 22:40:00', 100),
('2013-1-8 22:45:00', 100),
('2013-1-8 22:50:00', 100),
('2013-1-8 23:05:00', 100),
('2013-1-8 23:10:00', 100),
('2013-1-8 23:20:00', 100),
('2013-1-8 23:25:00', 100),
('2013-1-8 23:35:00', 100),
('2013-1-8 23:40:00', 100),
('2013-1-9 00:05:00', 100),
('2013-1-9 00:10:00', 100);
CREATE TABLE processB
(date_time datetime, valueB decimal(4,2));
INSERT INTO processB
(date_time, valueB)
VALUES
('2013-1-08 21:46:00', 3),
('2013-1-08 22:11:00', 4),
('2013-1-08 22:31:00', 5),
('2013-1-08 22:36:00', 6),
('2013-1-08 22:41:00', 7),
('2013-1-08 23:06:00', 8),
('2013-1-08 23:20:00', 2),
('2013-1-08 23:46:00', 3),
('2013-1-09 00:34:00', 9);
CREATE TABLE processC
(date_time datetime, status varchar(4));
INSERT INTO processC
VALUES
('2013-1-08 18:00:00', 'yes'),
('2013-1-08 19:00:00', 'yes'),
('2013-1-08 20:00:00', 'yes'),
('2013-1-08 21:00:00', 'yes'),
('2013-1-08 22:00:00', 'yes'),
('2013-1-08 23:00:00', 'no'),
('2013-1-08 00:00:00', 'no'),
('2013-1-08 01:00:00', 'no');
As you can see the time at which readings occur for each of the processes is not the same.
ProcessA, IF it occurs, does so at 5 minute intervals
ProcessB, readings occur at unpredictable times but usually occur multiple times within the hour
ProcessC will always have an hourly value (yes or no).
Firstly, I want to convert processB so that there is a reading at every 5-minute interval, so the data aligns with processA and I can do a simple join of both tables at the 5-minute interval mark. For the conversion, the data at every 5 minutes should be set to the nearest processB observation available within a [-30,30) minute window. If values are equidistant then take the average. If none is available in the 30 minute window then set it to null.
Once I have that, I can do a simple join on %Y%m%d%H with ProcessC using something like the following to get a final table with all data aligned at the 5 minute interval mark:
date_format(date_time, '%Y%m%d%H') = date_format(date_time, '%Y%m%d%H')
If anyone has any pointers/guidance I would appreciate some direction. I appreciate it.
Sample output:
'2013-1-8 22:10:00', 100, 4, yes <--- closer to 22:11 than 21:46
'2013-1-8 22:15:00', 100, 4, yes <--- closer to 22:11 than 22:31
'2013-1-8 22:30:00', 100, 5, yes <--- closer to 22:31 than 22:11
'2013-1-8 22:35:00', 100, 6, yes <--- closer to 22:36 than 22:31
'2013-1-8 22:40:00', 100, 7, yes <--- closer to 22:41 than 22:36
'2013-1-8 22:45:00', 100, 7, yes <--- closer to 22:41 than 23:06
'2013-1-8 22:50:00', 100, 7, yes <--- closer to 22:41 than 23:06
'2013-1-8 23:05:00', 100, 8, yes <--- closer to 23:06 than 22:41
'2013-1-8 23:10:00', 100, 8, no <--- closer to 23:06 than 23:20
'2013-1-8 23:20:00', 100, 2, no <--- closer to 23:20 than 23:06
'2013-1-8 23:25:00', 100, 2, no <--- closer to 23:20 than 23:46
'2013-1-8 23:35:00', 100, 3, no <--- closer to 23:46 than 23:20
'2013-1-9 00:05:00', 100, 3, no <--- closer to 23:46 than 00:34
'2013-1-9 00:10:00', 100, 6, no <--- takes the avg of 3 and 9
The tricky part of this is the retrieval of the appropriate row or rows from processB that correspond to each row of processA as you figured out.
Let's take it step by step.
First, we need to be able to join processA and processB to retrieve the candidate timestamp pairs. Let's do it like this:
SELECT a.date_time a,
TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) timediff
FROM processA a
JOIN processB b
ON TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) >= -1800
AND TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) < 1800
This gets us the a and b times meeting the [-30, 30) criterion. There are a lot of rows in this result; but we can inspect it to make sure we've done the range comparison correctly. http://sqlfiddle.com/#!2/48f54/47/0
Now we need to generate the time window to search for each a record for your one or more matching b records. Like so.
SELECT a,
MIN(ABS(timediff)) windowsize
FROM (
SELECT a.date_time a,
TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) timediff
FROM processA a
JOIN processB b
ON TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) >= -1800
AND TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) < 1800
) d
GROUP BY a
This yields two columns: the first is the timestamp from a, and the second is the time range of the nearest b timestamp (or timestamps, if more than one are to be averaged) that are in range. This resultset doesn't have any row for a records that don't have b records near enough to consider. http://sqlfiddle.com/#!2/48f54/46/0
Finally, we need to retrieve and average the b record values for each a record. Here this is.
SELECT processA.date_time date_time,
processA.valueA valueA,
AVG(processB.valueB) valueB
FROM processA
LEFT JOIN (
SELECT a,
MIN(ABS(timediff)) windowsize
FROM (
SELECT a.date_time a,
TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) timediff
FROM processA a
JOIN processB b
ON TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) >= -1800
AND TIMESTAMPDIFF(SECOND, a.date_time, b.date_time) < 1800
) d
GROUP BY a
) j ON processA.date_time = j.a
LEFT JOIN processB ON ( processB.date_time >= j.a - INTERVAL j.windowsize SECOND
AND processB.date_time <= j.a + INTERVAL j.windowsize SECOND
AND processB.date_time < j.a + INTERVAL 1800 SECOND)
GROUP BY processA.date_time, processA.valueA
Notice there are a couple of strict comparisons here (< operators instead of <= operators). Those are there to accommodate your half-open [-30, 30) range. Here's the query. http://sqlfiddle.com/#!2/48f54/45/0
This final query joins together three tables: processA, our virtual table showing the search range for each timestamp, and processB. The last ON clause performs the actual range search. It's made slightly more complicated by the half-open range.
See how this goes? It's helpful to construct the query from the inside out.
Don't forget to put an index on processB.date_time.
I am taking the liberty of leaving the join of processC to this virtual table to you.
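For readers who want to see the remaining processC join, here is a rough sketch of the whole pipeline, run through Python's sqlite3 module rather than MySQL. Several things are assumptions of mine: timestamps are zero-padded so SQLite can parse them, `strftime('%s', ...)` differences replace `TIMESTAMPDIFF`, only three processA rows from the question are used, the midnight processC row is dated 2013-01-09, and the 03:00 row is a hypothetical reading added to show the null case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processA (date_time TEXT, valueA INTEGER);
CREATE TABLE processB (date_time TEXT, valueB REAL);
CREATE TABLE processC (date_time TEXT, status TEXT);
INSERT INTO processA VALUES
 ('2013-01-08 22:10:00', 100),
 ('2013-01-08 22:45:00', 100),
 ('2013-01-09 00:10:00', 100),
 ('2013-01-09 03:00:00', 100);  -- hypothetical row with no processB nearby
INSERT INTO processB VALUES
 ('2013-01-08 21:46:00', 3), ('2013-01-08 22:11:00', 4),
 ('2013-01-08 22:31:00', 5), ('2013-01-08 22:36:00', 6),
 ('2013-01-08 22:41:00', 7), ('2013-01-08 23:06:00', 8),
 ('2013-01-08 23:20:00', 2), ('2013-01-08 23:46:00', 3),
 ('2013-01-09 00:34:00', 9);
INSERT INTO processC VALUES
 ('2013-01-08 22:00:00', 'yes'),
 ('2013-01-08 23:00:00', 'no'),
 ('2013-01-09 00:00:00', 'no');  -- assumed date for the midnight reading
""")

rows = conn.execute("""
WITH pairs AS (          -- candidate (a, b) pairs inside the [-30, 30) window
    SELECT a_time, valueB, diff
    FROM (SELECT a.date_time AS a_time, b.valueB AS valueB,
                 CAST(strftime('%s', b.date_time) AS INTEGER)
               - CAST(strftime('%s', a.date_time) AS INTEGER) AS diff
          FROM processA a CROSS JOIN processB b) x
    WHERE diff >= -1800 AND diff < 1800
),
nearest AS (             -- distance to the closest b for each a
    SELECT a_time, MIN(ABS(diff)) AS w FROM pairs GROUP BY a_time
)
SELECT a.date_time, a.valueA, AVG(p.valueB) AS valueB, c.status
FROM processA a
LEFT JOIN nearest n ON n.a_time = a.date_time
LEFT JOIN pairs p ON p.a_time = a.date_time AND ABS(p.diff) = n.w
LEFT JOIN processC c
       ON strftime('%Y%m%d%H', c.date_time) = strftime('%Y%m%d%H', a.date_time)
GROUP BY a.date_time, a.valueA, c.status
ORDER BY a.date_time
""").fetchall()

for row in rows:
    print(row)
```

The 00:10 row averages the two equidistant readings (3 and 9), and the hypothetical 03:00 row comes back with nulls because nothing falls inside its window.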

MYSQL query - getting totals by month

http://sqlfiddle.com/#!2/6a6b1
The schema is given above. All I want to do is get the results as the total of sales per month. The user will enter a start date and end date and I can generate (in PHP) all the months and years for those dates. For example, if I want to know the total number of "sales" for 12 months, I know I can run 12 individual queries with start and end dates, but I want to run only one query where the result will look like:
Month numofsale
January - 2
Feb-1
March - 23
Apr - 10
and so on...
or just a list of sales without the months, I can then pair it to the array of months generated in the PHP ...any ideas...
Edit/schema and data pasted from sqlfiddle.com:
CREATE TABLE IF NOT EXISTS `lead_activity2` (
`lead_activity_id` int(11) NOT NULL AUTO_INCREMENT,
`sp_id` int(11) NOT NULL,
`act_date` datetime NOT NULL,
`act_name` varchar(255) NOT NULL,
PRIMARY KEY (`lead_activity_id`),
KEY `act_date` (`act_date`),
KEY `act_name` (`act_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
INSERT INTO `lead_activity2` (`lead_activity_id`, `sp_id`, `act_date`, `act_name`) VALUES
(1, 5, '2012-10-16 16:05:29', 'sale'),
(2, 5, '2012-10-16 16:05:29', 'search'),
(3, 5, '2012-10-16 16:05:29', 'sale'),
(4, 5, '2012-10-17 16:05:29', 'DNC'),
(5, 5, '2012-10-17 16:05:29', 'sale'),
(6, 5, '2012-09-16 16:05:30', 'SCB'),
(7, 5, '2012-09-16 16:05:30', 'sale'),
(8, 5, '2012-08-16 16:05:30', 'sale'),
(9, 5,'2012-08-16 16:05:30', 'sale'),
(10, 5, '2012-07-16 16:05:30', 'sale');
SELECT DATE_FORMAT(date, "%m-%Y") AS Month, SUM(numofsale)
FROM <table_name>
WHERE <where-cond>
GROUP BY DATE_FORMAT(date, "%m-%Y")
Check the following in your fiddle demo; it works for me (remove the placeholder condition from the WHERE clause for testing):
SELECT DATE_FORMAT(act_date, "%m-%Y") AS Month, COUNT(*)
FROM lead_activity2
WHERE <where-cond-here> AND act_name='sale'
GROUP BY DATE_FORMAT(act_date, "%m-%Y")
It returns the following result:
MONTH COUNT(*)
07-2012 1
08-2012 2
09-2012 1
10-2012 3
You can try query as given below
SELECT COUNT(*) AS `Total`,
DATE_FORMAT(`act_date`, '%M') AS `Month`,
MONTH(`act_date`) AS `Month_number`
FROM `lead_activity2`
WHERE `act_name` = 'sale'
AND `act_date` BETWEEN '2012-05-01' AND '2012-12-17'
GROUP BY MONTH(`act_date`), DATE_FORMAT(`act_date`, '%M')
Here 2012-05-01 and 2012-12-17 are the dates entered in the form. The query returns the number of sales for each month that has rows in the database. Note that it counts rows with COUNT(*); summing sp_id would add up IDs, not sales.
thanks
Try this query -
SELECT
MONTH(act_date) month, COUNT(*)
FROM
lead_activity2
WHERE
YEAR(act_date) = 2012 AND act_name = 'sale'
GROUP BY
month
Check WHERE condition if it is OK for you - act_name = 'sale'.
If you want to output month names, then use MONTHNAME() function instead of MONTH().
SELECT YEAR(act_date), MONTH(act_date), COUNT(*)
FROM lead_activity2
GROUP BY YEAR(act_date), MONTH(act_date)
For getting data by month or any other data based on column you have to add GROUP BY.
You can add many columns or calculated values to GROUP BY.
I assume that "num of sales" means count of rows.
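One thing a plain GROUP BY cannot do is return a row for a month with no sales. Since the question mentions generating the month list in application code, here is a small sketch of that pairing step, written in Python with sqlite3 instead of PHP and MySQL. The helper `month_range` and the chosen date range are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead_activity2 (act_date TEXT, act_name TEXT)")
conn.executemany(
    "INSERT INTO lead_activity2 VALUES (?, ?)",
    [
        ("2012-10-16 16:05:29", "sale"), ("2012-10-16 16:05:29", "search"),
        ("2012-10-16 16:05:29", "sale"), ("2012-10-17 16:05:29", "DNC"),
        ("2012-10-17 16:05:29", "sale"), ("2012-09-16 16:05:30", "SCB"),
        ("2012-09-16 16:05:30", "sale"), ("2012-08-16 16:05:30", "sale"),
        ("2012-08-16 16:05:30", "sale"), ("2012-07-16 16:05:30", "sale"),
    ],
)

def month_range(start, end):
    """Yield 'YYYY-MM' labels from start to end, both given as (year, month)."""
    year, month = start
    while (year, month) <= end:
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1

# One query for the whole range; the upper bound is exclusive.
counts = dict(conn.execute("""
    SELECT strftime('%Y-%m', act_date) AS ym, COUNT(*)
    FROM lead_activity2
    WHERE act_name = 'sale'
      AND act_date >= ? AND act_date < ?
    GROUP BY ym
""", ("2012-06-01", "2012-11-01")))

# Pair every month in the range with its count, defaulting to 0.
report = [(ym, counts.get(ym, 0)) for ym in month_range((2012, 6), (2012, 10))]
print(report)
```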
Sometimes you might want the month names as Jan, Feb, Mar ... Dec, possibly for a chart library like FusionCharts. Use %b for the abbreviated name (%M returns the full name, e.g. January):
SELECT DATE_FORMAT(date, "%b") AS Month, SUM(numofsale)
FROM <Table_name>
GROUP BY DATE_FORMAT(date, "%b")
Results would look like this on table
MONTH COUNT(*)
Jul 1
Aug 2
Sep 1
Oct 3
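Grouping on the month name alone sorts alphabetically and merges the same month from different years. A possible workaround, sketched here with Python's sqlite3 module as a stand-in for MySQL, is to group on the month number and attach the label afterwards; using `calendar.month_abbr` for the labels is my own choice and assumes the default English locale.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead_activity2 (act_date TEXT, act_name TEXT)")
conn.executemany(
    "INSERT INTO lead_activity2 VALUES (?, ?)",
    [
        ("2012-10-16 16:05:29", "sale"), ("2012-10-16 16:05:29", "search"),
        ("2012-10-16 16:05:29", "sale"), ("2012-10-17 16:05:29", "DNC"),
        ("2012-10-17 16:05:29", "sale"), ("2012-09-16 16:05:30", "SCB"),
        ("2012-09-16 16:05:30", "sale"), ("2012-08-16 16:05:30", "sale"),
        ("2012-08-16 16:05:30", "sale"), ("2012-07-16 16:05:30", "sale"),
    ],
)

# Group and sort on the numeric month so the chart stays chronological.
rows = conn.execute("""
    SELECT CAST(strftime('%m', act_date) AS INTEGER) AS month_no, COUNT(*)
    FROM lead_activity2
    WHERE act_name = 'sale'
    GROUP BY month_no
    ORDER BY month_no
""").fetchall()

# Attach the abbreviated label only at display time.
chart = [(calendar.month_abbr[month_no], total) for month_no, total in rows]
print(chart)
```

If the data spans more than one year, the year would need to be part of the grouping key as well.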