Count enumerated values? - mysql

If my table looks like this:
CREATE TABLE `daily_individual_tracking` (
`daily_individual_tracking_id` int(10) unsigned NOT NULL auto_increment,
`daily_individual_tracking_date` date NOT NULL default '0000-00-00',
`sales` enum('no','yes') NOT NULL COMMENT 'no',
`repairs` enum('no','yes') NOT NULL COMMENT 'no',
`shipping` enum('no','yes') NOT NULL COMMENT 'no',
PRIMARY KEY (`daily_individual_tracking_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
Basically, each of these fields can be either yes or no.
How can I count how many yes's there are for each column over a date range?
Thanks!!

You can either run three queries like this:
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE sales = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
Or if you want you can get all three at once like this:
SELECT (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE sales = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS sales_count, (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE repairs = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS repairs_count, (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE shipping = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS shipping_count
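Another way to do it is to use SUM instead of COUNT. In MySQL a comparison evaluates to 1 (true) or 0 (false), so SUM(sales = 'YES') adds up the matching rows, and all three counts are computed in a single pass. You could try this too to see how it affects the performance: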
SELECT
SUM(sales = 'YES') AS sales_count,
SUM(repairs = 'YES') AS repairs_count,
SUM(shipping = 'YES') AS shipping_count
FROM daily_individual_tracking
WHERE daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
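To see the single-pass SUM version in action without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Assumptions: SQLite has no ENUM type, so a TEXT column with a CHECK constraint replaces enum('no','yes'); the sample rows are made up; and the sketch compares against lowercase 'yes' because SQLite's default string comparison is case-sensitive (the MySQL queries above can use 'YES' because latin1's default collation is case-insensitive). Like MySQL, SQLite evaluates a comparison to 1 or 0.

```python
import sqlite3


def yes_counts(conn, start, end):
    """Count 'yes' values per column over a date range in one query."""
    # Each comparison yields 1 (true) or 0 (false), so SUM counts the matches.
    return conn.execute(
        """
        SELECT SUM(sales = 'yes')    AS sales_count,
               SUM(repairs = 'yes')  AS repairs_count,
               SUM(shipping = 'yes') AS shipping_count
        FROM daily_individual_tracking
        WHERE daily_individual_tracking_date BETWEEN ? AND ?
        """,
        (start, end),
    ).fetchone()


conn = sqlite3.connect(":memory:")
# No ENUM in SQLite: a CHECK constraint stands in for enum('no','yes').
conn.execute(
    """
    CREATE TABLE daily_individual_tracking (
        daily_individual_tracking_id INTEGER PRIMARY KEY AUTOINCREMENT,
        daily_individual_tracking_date TEXT NOT NULL,
        sales    TEXT NOT NULL CHECK (sales    IN ('no', 'yes')),
        repairs  TEXT NOT NULL CHECK (repairs  IN ('no', 'yes')),
        shipping TEXT NOT NULL CHECK (shipping IN ('no', 'yes'))
    )
    """
)
conn.executemany(
    "INSERT INTO daily_individual_tracking "
    "(daily_individual_tracking_date, sales, repairs, shipping) VALUES (?, ?, ?, ?)",
    [
        ("2010-01-05", "yes", "no", "yes"),
        ("2010-02-10", "yes", "yes", "no"),
        ("2010-03-15", "no", "no", "yes"),
        ("2010-04-01", "yes", "yes", "yes"),  # outside the range queried below
    ],
)
print(yes_counts(conn, "2010-01-01", "2010-03-31"))  # (2, 1, 2)
```

If no rows fall in the range, SUM returns NULL (None in Python) rather than 0; wrap each SUM in COALESCE(..., 0) if you need zeros.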

Related

SQL query to select all rows with max column value

CREATE TABLE `user_activity` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`type` enum('request','response') DEFAULT NULL,
`data` longtext NOT NULL,
`created_at` datetime DEFAULT NULL,
`source` varchar(255) DEFAULT NULL,
`task_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
);
I have this data:
Now I need to select all rows for user_id=527 where the created_at value is the maximum. So I need the last 3 rows in this image.
I wrote this query:
SELECT *
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND created_at = (SELECT Max(created_at)
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask',
'StopMonitoringUserTask' ));
This is very inefficient because I am running the exact same query again as an inner query except that it disregards created_at. What's the right way to do this?
I would use a correlated subquery:
SELECT ua.*
FROM user_activity ua
WHERE ua.user_id = 527 AND source = 'E1' AND
ua.task_name IN ('GetReportTask', 'StopMonitoringUserTask' ) AND
ua.created_at = (SELECT MAX(ua2.created_at)
FROM user_activity ua2
WHERE ua2.user_id = ua.user_id AND
ua2.source = ua.source AND
ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
);
Although this might seem inefficient, you can create an index on user_activity(user_id, source, task_name, created_at). With this index, the query should have decent performance.
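A minimal sketch of the correlated subquery together with the suggested composite index, run against SQLite via Python's sqlite3 module as a stand-in for MySQL. The index name idx_ua_lookup and the sample rows are hypothetical. The point it shows is that the equality test against MAX(created_at) returns every row tied at the latest timestamp, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_activity (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER,
        type TEXT CHECK (type IN ('request', 'response')),
        data TEXT NOT NULL,
        created_at TEXT,
        source TEXT,
        task_name TEXT
    )
    """
)
# Composite index from the answer: equality columns first, created_at last.
# The name idx_ua_lookup is hypothetical.
conn.execute(
    "CREATE INDEX idx_ua_lookup "
    "ON user_activity (user_id, source, task_name, created_at)"
)
conn.executemany(
    "INSERT INTO user_activity (user_id, type, data, created_at, source, task_name) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (527, "request", "a", "2019-01-01 10:00:00", "E1", "GetReportTask"),
        (527, "response", "b", "2019-01-02 09:00:00", "E1", "GetReportTask"),
        (527, "request", "c", "2019-01-02 09:00:00", "E1", "StopMonitoringUserTask"),
        (527, "response", "d", "2019-01-02 09:00:00", "E1", "StopMonitoringUserTask"),
        (527, "request", "e", "2019-01-03 00:00:00", "E2", "GetReportTask"),
        (528, "request", "f", "2019-01-05 00:00:00", "E1", "GetReportTask"),
    ],
)


def latest_rows(conn, user_id, source):
    """Return the ids of all rows tied at the latest created_at."""
    rows = conn.execute(
        """
        SELECT ua.id
        FROM user_activity ua
        WHERE ua.user_id = ? AND ua.source = ?
          AND ua.task_name IN ('GetReportTask', 'StopMonitoringUserTask')
          AND ua.created_at = (
              SELECT MAX(ua2.created_at)
              FROM user_activity ua2
              WHERE ua2.user_id = ua.user_id
                AND ua2.source = ua.source
                AND ua2.task_name IN ('GetReportTask', 'StopMonitoringUserTask'))
        ORDER BY ua.id
        """,
        (user_id, source),
    ).fetchall()
    return [r[0] for r in rows]


print(latest_rows(conn, 527, "E1"))  # [2, 3, 4]
```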
Order by created_at desc and limit your query to return 1 row. (Note that this returns only one row even when several rows share the latest created_at.)
SELECT *
FROM user_activity
WHERE user_id = 527
AND source = 'E1'
AND task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
ORDER BY created_at DESC
LIMIT 1;
I used EverSQL and applied my own changes to come up with this single-select query that uses a self-join:
SELECT *
FROM user_activity AS ua1
LEFT JOIN user_activity AS ua2
ON ua2.user_id = ua1.user_id
AND ua2.source = ua1.source
AND ua2.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND ua1.created_at < ua2.created_at
WHERE ua1.user_id = 527
AND ua1.source = 'E1'
AND ua1.task_name IN ( 'GetReportTask', 'StopMonitoringUserTask' )
AND ua2.created_at IS NULL;
However, I noticed that the response times of both queries were similar. I used EXPLAIN to look for performance differences, and from what I understood of its output, there are no noticeable differences because proper indexing is in place. So for readability and maintainability, I'll just use the nested query.
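One way to check that the self-join (anti-join) form and the nested-query form really return the same rows is to run both against a small data set. This sketch uses SQLite through Python's sqlite3 module instead of MySQL, keeps only the columns the queries touch, and uses made-up rows. Row 5 is newer but has a task name outside the list, so it shows why the ua2.task_name filter has to be inside the LEFT JOIN condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns the queries use; the other user_activity columns are omitted.
conn.execute(
    "CREATE TABLE user_activity (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, source TEXT, task_name TEXT)"
)
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?, ?, ?)",
    [
        (1, 527, "2019-01-01 10:00:00", "E1", "GetReportTask"),
        (2, 527, "2019-01-02 09:00:00", "E1", "GetReportTask"),
        (3, 527, "2019-01-02 09:00:00", "E1", "StopMonitoringUserTask"),
        (4, 527, "2019-01-02 09:00:00", "E1", "StopMonitoringUserTask"),
        (5, 527, "2019-01-03 00:00:00", "E1", "SomeOtherTask"),  # not in the task list
    ],
)

NESTED = """
SELECT id FROM user_activity
WHERE user_id = 527 AND source = 'E1'
  AND task_name IN ('GetReportTask', 'StopMonitoringUserTask')
  AND created_at = (SELECT MAX(created_at) FROM user_activity
                    WHERE user_id = 527 AND source = 'E1'
                      AND task_name IN ('GetReportTask', 'StopMonitoringUserTask'))
ORDER BY id
"""

# Anti-join: keep ua1 only when no qualifying ua2 row is strictly newer.
SELF_JOIN = """
SELECT ua1.id
FROM user_activity AS ua1
LEFT JOIN user_activity AS ua2
  ON ua2.user_id = ua1.user_id
 AND ua2.source = ua1.source
 AND ua2.task_name IN ('GetReportTask', 'StopMonitoringUserTask')
 AND ua1.created_at < ua2.created_at
WHERE ua1.user_id = 527 AND ua1.source = 'E1'
  AND ua1.task_name IN ('GetReportTask', 'StopMonitoringUserTask')
  AND ua2.created_at IS NULL
ORDER BY ua1.id
"""

nested_ids = [r[0] for r in conn.execute(NESTED)]
self_join_ids = [r[0] for r in conn.execute(SELF_JOIN)]
print(nested_ids, self_join_ids)  # [2, 3, 4] [2, 3, 4]
```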

Get rid of the subqueries for the sake of sorting grouped data

Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'From',
`city_to` int(11) NOT NULL COMMENT 'To',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Aircraft by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parsing date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - creation date, i.e. when the data was parsed
Task
Get the lowest ticket price and its departure date for each month. Be aware that it must be the current (relevant) minimum price, not just any minimum. If several dates share the minimum price, we need the first one.
My solution
I think that there's something not quite right.
I don't like using subqueries for grouping; how can I solve this problem?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
It takes 0.015 s without the cache; the table sizes can be seen from the AUTO_INCREMENT values.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
gets the lowest price for each month, but it can't tell you which date came 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
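The user-variable trick above boils down to: sort by (month, price, date), then keep the first row of each month. The same logic in plain Python may make the idea easier to follow; the tuple layout, fares, and airline names below are made up for illustration. Keep in mind that MySQL documents the evaluation order of user-variable assignments and reads within one SELECT as undefined, which is what makes the SQL idiom fragile.

```python
def first_lowest_per_month(rows):
    """Return (yyyy_mm, date, price, airline) for the cheapest, earliest fare per month.

    rows: iterable of (date 'YYYY-MM-DD', price, airline) tuples.
    """
    result = []
    prev_month = None  # plays the role of @prev
    # Same ordering as the SQL: month, then price ascending, then date.
    for date, price, airline in sorted(rows, key=lambda r: (r[0][:7], r[1], r[0])):
        month = date[:7]  # LEFT(date, 7)
        if month != prev_month:  # the "first" flag: first row of a new month
            result.append((month, date, price, airline))
        prev_month = month  # @prev := LEFT(date, 7)
    return result


fares = [
    ("2015-03-10", 120, "AirA"),
    ("2015-03-04", 90, "AirB"),
    ("2015-03-20", 90, "AirC"),
    ("2015-04-02", 150, "AirA"),
    ("2015-04-15", 110, "AirB"),
]
print(first_lowest_per_month(fares))
# [('2015-03', '2015-03-04', 90, 'AirB'), ('2015-04', '2015-04-15', 110, 'AirB')]
```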
You are right, your query is not correct.
Let's start with the innermost query: You group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) and MIN(price)), you get one arbitrarily chosen airline and one arbitrarily chosen price per date combination. These don't even have to belong to the same record in the table; the DBMS is free to choose any airline and any price matching the dates. Well, maybe the combination of pdata.date and pinfo.date is already unique, but then you wouldn't have to group by at all. So however we look at it, this isn't right.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say "give me a randomly picked price per pdata.date and pinfo.date, and from these give me a randomly picked price per pdata.date"; you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the results as a subquery (derived table) again, and a derived table is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
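The query above returns every record tied at the monthly minimum. Since the task asks for the first date when several dates share that price, one option is a final MIN(departure date) over the tied rows. Below is a minimal sketch of that idea in SQLite via Python's sqlite3 module, not a drop-in MySQL query. Assumptions: strftime('%Y-%m', ...) stands in for YEAR()/MONTH(), the {$id} placeholder becomes a bound parameter, the CTEs (which MySQL supports from 8.0) replace the repeated derived tables, the airline column is left out for brevity, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE aircrafts_in_parsed_info (
        id INTEGER PRIMARY KEY, status TEXT, type TEXT NOT NULL,
        date TEXT NOT NULL, aircrafts_in_id INTEGER);
    CREATE TABLE aircrafts_in_parsed_data (
        id INTEGER PRIMARY KEY, price INTEGER NOT NULL, airline TEXT NOT NULL,
        date TEXT NOT NULL, info_id INTEGER NOT NULL);
    INSERT INTO aircrafts_in_parsed_info VALUES
        (1, 'success', 'roundtrip', '2015-02-01 08:00:00', 7),
        (2, 'error',   'roundtrip', '2015-02-01 08:00:00', 7),
        (3, 'success', 'oneway',    '2015-02-01 08:00:00', 7),
        (4, 'success', 'roundtrip', '2015-02-01 08:00:00', 8);
    INSERT INTO aircrafts_in_parsed_data (price, airline, date, info_id) VALUES
        (100, 'A', '2015-03-10', 1),
        (80,  'B', '2015-03-20', 1),
        (80,  'C', '2015-03-05', 1),
        (50,  'D', '2015-03-01', 2),  -- status 'error': ignored
        (0,   'E', '2015-04-01', 1),  -- zero price: ignored
        (120, 'F', '2015-04-12', 1),
        (30,  'G', '2015-04-02', 3),  -- 'oneway': ignored
        (10,  'H', '2015-04-03', 4);  -- another route: ignored
    """
)


def monthly_cheapest(conn, route_id):
    """Per month: the minimum price and the first departure date at that price."""
    return conn.execute(
        """
        WITH relevant AS (
            SELECT d.date AS dep_date, d.price
            FROM aircrafts_in_parsed_data d
            WHERE d.price <> 0
              AND d.info_id IN (SELECT id FROM aircrafts_in_parsed_info
                                WHERE aircrafts_in_id = ?
                                  AND status = 'success' AND type = 'roundtrip')
        ),
        monthly_min AS (
            SELECT strftime('%Y-%m', dep_date) AS ym, MIN(price) AS min_price
            FROM relevant
            GROUP BY ym
        )
        SELECT m.ym, m.min_price, MIN(r.dep_date) AS first_date
        FROM monthly_min m
        JOIN relevant r
          ON strftime('%Y-%m', r.dep_date) = m.ym AND r.price = m.min_price
        GROUP BY m.ym, m.min_price
        ORDER BY m.ym DESC
        """,
        (route_id,),
    ).fetchall()


print(monthly_cheapest(conn, 7))
# [('2015-04', 120, '2015-04-12'), ('2015-03', 80, '2015-03-05')]
```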

Sum only if all grouped rows are not null, else return null

I have a table like this:
item_id quantity
1 2
1 3
2 NULL
2 4
3 NULL
3 NULL
And now I'm doing a SELECT like this:
SELECT
sum(`quantity`) AS `total_quantity`
FROM `items`
GROUP BY `item_id`
Now it returns 5, 4, and NULL respectively, but I want 5, NULL, and NULL.
I want that if there is a NULL value in the grouped rows, the sum should be NULL, and not the sum of the lines whose columns are not null. How can I achieve that?
Thanks!
You can use just a CASE expression to check whether any row of a group has a NULL quantity:
SELECT item_id,
CASE WHEN SUM(quantity IS NULL) > 0
THEN NULL
ELSE SUM(quantity)
END quantity
FROM items
GROUP BY item_id
using @Abhik Chakraborty's fiddle
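A minimal sketch of the CASE approach, run against SQLite through Python's sqlite3 module as a stand-in for MySQL, using the table data from the question. As in MySQL, quantity IS NULL evaluates to 1 or 0, so SUM(quantity IS NULL) > 0 means "this group has at least one NULL".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 2), (1, 3), (2, None), (2, 4), (3, None), (3, None)],
)


def strict_sums(conn):
    """Sum quantities per item, but return NULL if any quantity in the group is NULL."""
    return conn.execute(
        """
        SELECT item_id,
               CASE WHEN SUM(quantity IS NULL) > 0 THEN NULL
                    ELSE SUM(quantity)
               END AS quantity
        FROM items
        GROUP BY item_id
        ORDER BY item_id
        """
    ).fetchall()


print(strict_sums(conn))  # [(1, 5), (2, None), (3, None)]
```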
That's kind of a weird output; in most cases the request is to replace NULL with 0 or something else. However, here is a way to do it:
select
x.item_id,
max(x.quantity) as quantity from (
SELECT
t1.item_id,
@sm:= if(@prev_item = item_id, @sm_qty+quantity,quantity) as quantity,
@prev_item :=item_id,
@sm_qty:= quantity
from items t1,(select @prev_item:=null,@sm_qty:=0)x
order by item_id
)x
group by x.item_id;
http://www.sqlfiddle.com/#!9/ccb36/13
SELECT * FROM (
( -- Get all not null quantities
SELECT
`i1`.`item_id`,
sum(`i1`.`quantity`) AS `total_quantity`
FROM `items` AS `i1`
WHERE `i1`.`item_id` NOT IN ( SELECT `i2`.`item_id` FROM `items` AS `i2` WHERE `i2`.`quantity` IS NULL )
GROUP BY `item_id`
)
UNION ALL
( -- Get all null quantities
SELECT
`i3`.`item_id`,
null AS `total_quantity`
FROM `items` AS `i3`
WHERE `i3`.`item_id` IN ( SELECT `i4`.`item_id` FROM `items` AS `i4` WHERE `i4`.`quantity` IS NULL )
GROUP BY `i3`.`item_id`
)
) AS items_table
ORDER BY items_table.item_id

Load top 5 records per date

I have a table that holds the date-wise quiz scores of different users. I want to load the top 5 scorers for every date.
Table sample create statement:
CREATE TABLE `subscriber_score` (
`msisdn` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`date` date NOT NULL,
`score` int(11) NOT NULL DEFAULT '0',
`total_questions_sent` int(11) NOT NULL DEFAULT '0',
`total_correct_answers` int(11) NOT NULL DEFAULT '0',
`total_wrong_answers` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`msisdn`,`date`),
KEY `fk_subscriber_score_subscriber1` (`msisdn`),
CONSTRAINT `fk_subscriber_score_subscriber1` FOREIGN KEY (`msisdn`) REFERENCES `subscriber` (`msisdn`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query which I have tried:
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp2.score AS score,tmp2.date AS winning_date
FROM subscriber,
(SELECT msisdn,tmp.date,tmp.score
FROM subscriber_score,
(SELECT date,MAX(score) AS score
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10' GROUP BY date)
tmp
WHERE subscriber_score.date=tmp.date AND subscriber_score.score=tmp.score)
tmp2
WHERE subscriber.msisdn=tmp2.msisdn ORDER BY winning_date
Actual output: Only one top scorer for every date is shown.
Wanted output: the top 5 (or, say, 10) records for every date.
I think you can do this using variables to assign each row a row number, then filter the top 5 for each date.
SELECT s.name AS name,
s.gender AS gender,
s.msisdn,
ss.winning_date,
ss.score
FROM ( SELECT ss.msisdn,
ss.score,
@r:= CASE WHEN ss.Date = @d THEN @r + 1 ELSE 1 END AS RowNum,
@d:= ss.date AS winning_date
FROM subscriber_score AS ss
CROSS JOIN (SELECT @d:= '', @r:= 0) AS v
WHERE ss.date > '2014-10-10'
AND ss.date < '2014-11-10'
ORDER BY ss.Date, ss.Score DESC
) AS ss
INNER JOIN subscriber AS s
ON s.msisdn = ss.msisdn
WHERE ss.RowNum <= 5;
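A variable-free alternative for top-N per group, sketched here in SQLite via Python's sqlite3 module, is a correlated COUNT: keep a row if fewer than N rows on the same date outrank it. This is a different technique from the user-variable answer above, named here plainly; it also runs on MySQL versions without window functions, though it can be slow on large tables. Assumptions: only the columns needed are created, ties are broken by the smaller msisdn (an arbitrary choice), N is 2 so the sample stays small, and the subscribers and scores are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE subscriber_score (
        msisdn TEXT NOT NULL, date TEXT NOT NULL, score INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (msisdn, date));
    INSERT INTO subscriber VALUES
        ('100', 'Ann', 'F'), ('101', 'Bob', 'M'), ('102', 'Cid', 'M'), ('103', 'Dee', 'F');
    INSERT INTO subscriber_score VALUES
        ('100', '2014-10-20', 50), ('101', '2014-10-20', 70),
        ('102', '2014-10-20', 70), ('103', '2014-10-20', 10),
        ('100', '2014-10-21', 30), ('103', '2014-10-21', 40);
    """
)


def top_n_per_date(conn, start, end, n):
    """Keep a row if fewer than n rows on the same date outrank it."""
    return conn.execute(
        """
        SELECT s.msisdn, s.name, ss.date, ss.score
        FROM subscriber_score AS ss
        JOIN subscriber AS s ON s.msisdn = ss.msisdn
        WHERE ss.date > ? AND ss.date < ?
          AND (SELECT COUNT(*)
               FROM subscriber_score AS better
               WHERE better.date = ss.date
                 -- a higher score outranks; equal scores are broken by msisdn
                 AND (better.score > ss.score
                      OR (better.score = ss.score AND better.msisdn < ss.msisdn))
              ) < ?
        ORDER BY ss.date, ss.score DESC, ss.msisdn
        """,
        (start, end, n),
    ).fetchall()


print(top_n_per_date(conn, "2014-10-10", "2014-11-10", 2))
```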
Refer to this query; it's not complete, but I hope it helps:
SELECT SCORE
FROM subscriber_score
WHERE date='somedate'
ORDER BY SCORE DESC LIMIT 5
select bc.msisdn msisdn,bc.name name,bc.gender gender,ab.score score,ab.date winning_date
from (
select msisdn,date,score,
dense_rank() over (partition by date order by score desc) rnk
from subscriber_score
) ab,subscriber bc
where bc.msisdn=ab.msisdn and ab.rnk<=5
order by winning_date ;
This is how you can solve your problem in Oracle SQL. MySQL 8.0 and later also support DENSE_RANK() and other window functions.
Try the query below:
-- requires window functions (MySQL 8.0+)
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp.score AS score,tmp.date AS winning_date
FROM subscriber inner join
(select msisdn,date, score, ROW_NUMBER() OVER(PARTITION BY date ORDER BY score DESC) AS rn
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10')
tmp
on subscriber.msisdn=tmp.msisdn and tmp.rn<=5

where clause while joining select

I need to get the payment sum for the whole period and for the current month. There are two tables: categories and transactions.
transactions:
id int(11)
category varchar(32)
dttm_added datetime
minus float
minus_currency varchar(32)
categories:
id int(11)
key varchar(32)
name varchar(50)
type varchar(1)
Here is my query:
select `key`, `id`, `name`, minus_month,month_cur
from `categories` as ct
left join (
select `category` as tr_ct_m, date_format(`dttm_trans`, '%Y%m') as dat, sum(`minus`) as minus_month, `minus_currency` as month_cur from `transactions` where dat = date_format(now(), '%Y%m')
) as tr_m on tr_m.tr_ct_m = ct.key
where `type` = '-'
I need to apply the condition inside the subselect before joining, because after the sum of minus_month all dates turn into NULL.
Help, please!
Not sure if I understand you correctly; please try the query below:
select `ct`.`key`, `ct`.`id`, `ct`.`name`, tr_m.minus_month, tr_m.month_cur from
( select * from `categories` where `type` = '-') as ct
left join
( select `category` as tr_ct_m,
sum(`minus`) as minus_month, `minus_currency` as month_cur from `transactions`
where date_format(`dttm_trans`, '%Y%m') = date_format(now(), '%Y%m')
group by `category`, `minus_currency`
) as tr_m on tr_m.tr_ct_m = ct.`key`
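The key idea is to filter and aggregate the transactions inside the derived table, and only then LEFT JOIN it to the categories, so categories without payments this month still appear with NULL totals. A minimal sketch in SQLite via Python's sqlite3 module, not a drop-in MySQL query. Assumptions: strftime('%Y%m', ...) stands in for DATE_FORMAT(..., '%Y%m'); the month is passed as a parameter instead of being derived from now() so the result is reproducible; the datetime column is called dttm_trans as in the queries above (the schema lists it as dttm_added); and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, "key" TEXT, name TEXT, type TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, category TEXT,
                               dttm_trans TEXT, minus REAL, minus_currency TEXT);
    INSERT INTO categories VALUES
        (1, 'food', 'Food', '-'), (2, 'rent', 'Rent', '-'), (3, 'salary', 'Salary', '+');
    INSERT INTO transactions (category, dttm_trans, minus, minus_currency) VALUES
        ('food', '2015-06-03 12:00:00', 10.5, 'EUR'),
        ('food', '2015-06-20 18:30:00', 4.5, 'EUR'),
        ('food', '2015-05-31 09:00:00', 99.0, 'EUR'),
        ('rent', '2015-05-01 00:00:00', 500.0, 'EUR');
    """
)


def month_totals(conn, yyyymm):
    """Per expense category: the sum of payments in the given month (NULL if none)."""
    return conn.execute(
        """
        SELECT ct."key", ct.name, tr_m.minus_month, tr_m.month_cur
        FROM (SELECT * FROM categories WHERE type = '-') AS ct
        LEFT JOIN (
            -- filter and aggregate first, then join
            SELECT category AS tr_ct_m,
                   SUM(minus) AS minus_month,
                   minus_currency AS month_cur
            FROM transactions
            WHERE strftime('%Y%m', dttm_trans) = ?
            GROUP BY category, minus_currency
        ) AS tr_m ON tr_m.tr_ct_m = ct."key"
        ORDER BY ct.id
        """,
        (yyyymm,),
    ).fetchall()


print(month_totals(conn, "201506"))
# [('food', 'Food', 15.0, 'EUR'), ('rent', 'Rent', None, None)]
```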