Retrieve rows that have a first entry in 2014 in MySQL

I want to retrieve all rows from a table whose first entry falls on or after 01/01/2014 and no later than 31/12/2014.
Example of the table:
OID  FK_OID  Treatment  Trt_DATE
1    100     19304      2011-05-24
2    100     19304      2011-08-01
3    100     19306      2014-03-05
4    200     19305      2012-02-02
5    300     19308      2014-01-20
6    400     19308      2014-06-06
For example, I would like to pull all entries for people who STARTED treatment in 2014. So from the table above I would extract FK_OIDs 300 and 400, because their first entry is in 2014, but I would omit FK_OID 100 because they have two entries prior to 2014.
How do I go about this? I can extract all entries within a date range, but that brings back every entry in that range and doesn't omit anyone who has an entry prior to the start of the range; it just returns their first entry in 2014.
For those who need to see that I have tried something, see below.
I am not an experienced coder, and this is the best I can get with the knowledge I have.
SELECT
    mod,
    (SELECT NHSNum
     FROM person p
     WHERE p.oid = t.fk_oid) AS 'NHS'
FROM timeline t
WHERE trt_date BETWEEN '2014-01-01' AND '2014-12-31'
ORDER BY trt_date ASC
This returns every treatment in 2014, regardless of whether it is the first ever one for that person. I want to omit anyone who has had treatment before 01/01/2014, and I only want to return the first treatment per person. In other words, this code returns all treatments for all people in 2014; I only want each person's first treatment, and only if it is their first one ever.
Thanks.

create table aThing
( oid int auto_increment primary key,
fk_oid int not null,
treatment int not null,
trt_date date not null
);
insert aThing (fk_oid,treatment,trt_date) values
(100, 19304, '2011-05-24'),
(100, 19304, '2011-08-01'),
(100, 19306, '2014-03-05'),
(200, 19305, '2012-02-02'),
(300, 19308, '2014-01-20'),
(400, 19308, '2014-06-06');
select fk_oid,dt
from
( select fk_oid,min(trt_date) as dt
from aThing
group by fk_oid
) xDerived
where year(dt)=2014;
+--------+------------+
| fk_oid | dt         |
+--------+------------+
|    300 | 2014-01-20 |
|    400 | 2014-06-06 |
+--------+------------+
The inner, nested query becomes a derived table and is given the name xDerived. Even though it is just a result set, making it a derived table means it can be referred to by name; it is not a physical table, but a derived, or virtual, one.
That derived table is a very simple GROUP BY with an aggregate function. It says: for every fk_oid, bring back one row and only one row, with its minimum value of trt_date. So if you have 10 million rows in the table aThing but only 17 distinct values of fk_oid, it returns only 17 rows, each row being the minimum trt_date for its fk_oid.
The outer wrapper then just shows those two columns, with the year check added. The reason I had to structure it this way is that an alias for an aggregate column, such as dt, is not available in the WHERE clause of the query that defines it. By wrapping the aggregation in a derived table, the alias becomes an ordinary column of that derived table, and the outer query can then use it in its WHERE clause.
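To make that concrete, here is a minimal sketch (using the aThing table above) of the direct attempt that fails, plus a HAVING-based alternative that avoids the derived table; treat it as an illustration rather than part of the original answer.
-- Fails: dt aliases an aggregate, so it is not visible in WHERE
SELECT fk_oid, MIN(trt_date) AS dt
FROM aThing
WHERE YEAR(dt) = 2014        -- error: unknown column 'dt'
GROUP BY fk_oid;

-- Works without a derived table: filter the aggregate in HAVING,
-- where MySQL does allow SELECT aliases
SELECT fk_oid, MIN(trt_date) AS dt
FROM aThing
GROUP BY fk_oid
HAVING YEAR(dt) = 2014;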

There's probably a more elegant solution, but this seems to satisfy the requirement...
SELECT x.*
FROM my_table x
JOIN
( SELECT fk_oid
       , MIN(trt_date) AS min_date
    FROM my_table
   GROUP BY fk_oid
  HAVING min_date >= '2014-01-01'   -- first treatment on or after 01/01/2014
) a
  ON a.fk_oid = x.fk_oid
LEFT JOIN my_table b
  ON b.fk_oid = a.fk_oid
 AND b.trt_date > '2014-12-31'      -- any treatment after 2014 disqualifies that fk_oid
WHERE b.oid IS NULL;

Having a few years of experience with this now, I decided to revisit it. The solution I now use regularly is:
SELECT t1.column1, t1.column2
FROM MyTable AS t1
LEFT OUTER JOIN MyTable AS t2
    ON t1.fkoid = t2.fkoid
   AND (t1.date > t2.date
        OR (t1.date = t2.date AND t1.oid > t2.oid))
WHERE t2.fkoid IS NULL
  AND t1.date >= '2014-01-01'
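If the upper bound from the original question also matters (first treatment no later than 31/12/2014), a hedged variant of the same anti-join, with the question's timeline table and its oid, fk_oid and trt_date columns assumed, would be:
SELECT t1.*
FROM timeline AS t1
LEFT OUTER JOIN timeline AS t2
    ON t1.fk_oid = t2.fk_oid
   AND (t1.trt_date > t2.trt_date
        OR (t1.trt_date = t2.trt_date AND t1.oid > t2.oid))
-- keep only people whose very first treatment falls inside 2014
WHERE t2.fk_oid IS NULL
  AND t1.trt_date >= '2014-01-01'
  AND t1.trt_date <= '2014-12-31';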

Related

Looking for a low footprint solution to GROUP rows using HAVING to filter

Here is a table
id  date    name
1   180101  josh
2   180101  peter
3   180101  julia
4   180102  robert
5   180103  patrick
6   180104  josh
7   180104  adam
I need to get all the names that share a day with 'josh'. How can I achieve this without grouping the whole table? I need to keep it efficient (this is not my real table, I have just simplified my problem here; I have hundreds of thousands of records, and 99% of the rows have different dates, so rows groupable by date are rare).
So basically, if 'josh' is the target, I need to get 'josh, peter, julia, adam' (actually the first 10 distinct names sharing a date with josh).
SELECT
    COUNT(date) AS datecount,
    GROUP_CONCAT(DISTINCT name) AS names
FROM
    table
GROUP BY
    date
HAVING
    datecount > 1
    -- AND name IN ('josh') would work nicely for me, but I get an error because 'name' is not in the GROUP BY
LIMIT 10
Any ideas? As I mentioned, it needs to be fast, and most of the rows have unique dates.
Join the table with itself on date:
select distinct t1.name
from tbl t1
join tbl t2 using (date)
where t2.name = 'josh'
For the best performance you would have indexes on (name) and (date, name).
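For illustration, the suggested indexes could be created like this (the index names are made up; tbl is the table name used in the answer above):
CREATE INDEX idx_tbl_name ON tbl (name);
CREATE INDEX idx_tbl_date_name ON tbl (date, name);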

Mixing HAVING with CASE or analytic functions in MySQL (Qualify/Partition By?)

I have a SELECT query that returns some fields like this:
Date | Campaign_Name | Type | Count_People
Oct  | Cats          | 1    | 500
Oct  | Cats          | 2    | 50
Oct  | Dogs          | 1    | 80
Oct  | Dogs          | 2    | 50
The query uses aggregation, and I only want to include results where, for Type = 1, the corresponding Count_People is greater than 99.
Using the example table, I'd like only the Cats rows returned. Dogs type 1 is excluded because its count is below 100, and in that case the Dogs type 2 row should be excluded as well.
Put another way: if a campaign's type = 1 count is less than 100, remove all records for that campaign name.
I started out trying this:
HAVING CASE WHEN type = 1 THEN COUNT(DISTINCT Count_People) > 99 END
I used Teradata earlier in the year and remember working on a query that used the analytic QUALIFY ... PARTITION BY construct. I suspect something along those lines is what I need? I need to base the exclusion on aggregation before the query is run?
How would I do this in MySQL? Am I making sense?
Now that I understand the question, I think your best bet will be a subquery to determine which date/campaign combinations of a type=1 have a count_people greater than 99.
SELECT
    <table>.date,
    <table>.campaign_name,
    <table>.type,
    COUNT(DISTINCT count_people) AS count_people
FROM
(
    SELECT
        date,
        campaign_name
    FROM <table>
    WHERE type = 1
    GROUP BY 1, 2
    HAVING COUNT(DISTINCT count_people) > 99
) type1
LEFT OUTER JOIN <table> ON
    type1.campaign_name = <table>.campaign_name AND
    type1.date = <table>.date
WHERE <table>.type IN (1, 2)
GROUP BY 1, 2, 3
The subquery here only returns campaign/date combinations where the type = 1 rows have a count_people greater than 99. It uses a LEFT JOIN back to the table to ensure that only those campaign/date combinations make it into the result set.
The WHERE on the main query keeps the results to types 1 and 2 only, which you stated was already a filter in place (though not mentioned in the question, it was stated in a comment to a previous answer).
Based on your comments to the answer by @JNevill, I think you will have no option but to use subselects to pre-filter the record set you are dealing with, as working with HAVING limits you to the current record being evaluated - there is no way to compare against previous or subsequent records in the set that way.
So have a look at something like this:
SELECT
    full_data.date AS date,
    full_data.campaign_name AS campaign_name,
    full_data.type AS type,
    full_data.people_count AS people_count
FROM
(
    SELECT
        date,
        campaign_name,
        type,
        COUNT(people) AS people_count
    FROM table
    WHERE type IN (1,2)
    GROUP BY date, campaign_name, type
) AS full_data
LEFT JOIN
(
    SELECT
        date,
        campaign_name,
        COUNT(people) AS people_count
    FROM table
    WHERE type = 1
    GROUP BY date, campaign_name
    HAVING people_count < 100
) AS filter
ON
    full_data.date = filter.date
    AND full_data.campaign_name = filter.campaign_name
WHERE
    filter.date IS NULL
    AND filter.campaign_name IS NULL
The first subselect is basically your current query without any attempt at using HAVING to filter the results. The second subselect finds all date/campaign name combinations whose type = 1 people_count is under 100, and the LEFT JOIN with the IS NULL check then uses those combinations as an exclusion filter against the full data set.

Subquery with max value in a big table SQL

I'm trying to write a query that gets the date of each person's last work experience and also the date they left the company (in some cases that value is null because the person is still working at the company).
I have something like:
SELECT r.idcurriculum, r.startdate, r.lastdate FROM (
SELECT idcurriculum, max(startdate) as startdate
FROM workexperience
GROUP BY idcurriculum) as s
INNER JOIN workexperience r on (r.idcurriculum = s.idcurriculum)
The structure should come out something like this:
idcurriculum | startdate | lastdate
1234 | 2010-05-01| null
2532 | 2005-10-01| 2010-02-28
5234 | 2011-07-01| 2013-10-31
1025 | 2012-04-01| 2014-03-31
I tried running that query but I had to stop it because it was taking too long. The workexperience table weighs approx. 20 GB. I don't know if the query is wrong; I've only run it for 10 minutes.
Help will be much appreciated.
You might try rephrasing the query as:
select we.*
from workexperience we
where not exists (select 1
                  from workexperience we2
                  where we2.idcurriculum = we.idcurriculum and
                        we2.startdate > we.startdate
                 );
Important: for performance reasons you need a composite index on (idcurriculum, startdate):
create index idx_workexperience_idcurriculum_startdate on workexperience(idcurriculum, startdate);
The logic of the query is: "Get me all rows from workexperience where there is no row for the same idcurriculum that has a larger startdate". That is a fancy way of saying "get me the maximum".
With the group by, MySQL has to do an aggregation, which would typically involve sorting the data -- expensive on 20 Gbytes. With this method, it can look up the results using the index, which should be faster.
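As a quick sanity check (an illustration, not part of the original answer), EXPLAIN can confirm that the correlated subquery is resolved through the composite index rather than a full scan:
EXPLAIN
SELECT we.*
FROM workexperience we
WHERE NOT EXISTS (SELECT 1
                  FROM workexperience we2
                  WHERE we2.idcurriculum = we.idcurriculum
                    AND we2.startdate > we.startdate);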
As an alternative to Gordon's answer you could also write the query as:
SELECT we.*
FROM workexperience we
LEFT JOIN workexperience we2
    ON we2.idcurriculum = we.idcurriculum
    AND we2.startdate > we.startdate
WHERE we2.idcurriculum IS NULL;
You can run into problems when there are multiple rows sharing the maximum startdate within a group, however.
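One hedged way around that, assuming the table has a unique id column to break ties (not shown in the question), is the same tie-breaking pattern used earlier on this page:
SELECT we.*
FROM workexperience we
LEFT JOIN workexperience we2
    ON we2.idcurriculum = we.idcurriculum
    AND (we2.startdate > we.startdate
         OR (we2.startdate = we.startdate AND we2.id > we.id))  -- id assumed unique
WHERE we2.idcurriculum IS NULL;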

MySQL - Combining two select statements into one result with LIMIT efficiently

For a dating application, I have a few tables that I need to query into a single output with a LIMIT 10 across both queries combined. It seems difficult to do at the moment: querying them separately is not an issue, but a fixed split such as LIMIT 5 and LIMIT 5 won't work because the numbers are not exact (one query may return 0 rows while the other returns 10, depending on the scenario).
members table
member_id | member_name
------------------------
1 Herb
2 Karen
3 Megan
dating_requests
request_id | member1 | member2 | request_time
----------------------------------------------------
1 1 2 2012-12-21 12:51:45
dating_alerts
alert_id | alerter_id | alertee_id | type | alert_time
-------------------------------------------------------
5 3 2 platonic 2012-12-21 10:25:32
dating_alerts_status
status_id | alert_id | alertee_id | viewed | viewed_time
-----------------------------------------------------------
4 5 2 0 0000-00-00 00:00:00
Imagine you are Karen and have just logged in; you should see these 2 items:
1. Herb requested a date with you.
2. Megan wants a platonic relationship with you.
All in one query with a LIMIT of 10. Instead, here are the two queries that need to be combined:
1. Herb requested a date with you.
-> query = "SELECT dr.request_id, dr.member1, dr.member2, m.member_name
FROM dating_requests dr
JOIN members m ON dr.member1=m.member_id
WHERE dr.member2=:loggedin_id
ORDER BY dr.request_time LIMIT 5";
2. Megan wants a platonic relationship with you.
-> query = "SELECT da.alert_id, da.alerter_id, da.alertee_id, da.type,
da.alert_time, m.member_name
FROM dating_alerts da
JOIN dating_alerts_status das ON da.alert_id=das.alert_id
AND da.alertee_id=das.alertee_id
JOIN members m ON da.alerter_id=m.member_id
WHERE da.alertee_id=:loggedin_id AND da.type='platonic'
AND das.viewed='0' AND das.viewed_time<da.alert_time
ORDER BY da.alert_time LIMIT 5";
Again, sometimes both tables may be empty, one table may be empty, or both may be full (where LIMIT 10 kicks in), all ordered by time. Any ideas on how to get a query to perform this task efficiently? Thoughts, advice, chimes, optimizations are welcome.
You can combine multiple queries with UNION, but only if the queries have the same number of columns. Ideally the columns are the same, not only in data type, but also in their semantic meaning; however, MySQL doesn't care about the semantics and will handle differing datatypes by casting up to something more generic - so if necessary you could overload the columns to have different meanings from each table, then determine what meaning is appropriate in your higher level code (although I don't recommend doing it this way).
When the number of columns differs, or when you want to achieve a better/less overloaded alignment of data from two queries, you can insert dummy literal columns into your SELECT statements. For example:
SELECT t.cola, t.colb, NULL, t.colc, NULL FROM t;
You could even have some columns reserved for the first table and others for the second table, such that they are NULL elsewhere (but remember that the column names come from the first query, so you may wish to ensure they're all named there):
SELECT a, b, c, d, NULL AS e, NULL AS f, NULL AS g FROM t1
UNION ALL -- specify ALL because default is DISTINCT, which is wasted here
SELECT NULL, NULL, NULL, NULL, a, b, c FROM t2;
You could try aligning your two queries in this fashion, then combining them with a UNION operator; by applying LIMIT to the UNION, you're close to achieving your goal:
(SELECT ...)
UNION
(SELECT ...)
LIMIT 10;
The only issue that remains is that, as presented above, 10 or more records from the first table will "push out" any records from the second. However, we can utilise an ORDER BY in the outer query to solve this.
Putting it all together:
(
SELECT
dr.request_time AS event_time, m.member_name, -- shared columns
dr.request_id, dr.member1, dr.member2, -- request-only columns
NULL AS alert_id, NULL AS alerter_id, -- alert-only columns
NULL AS alertee_id, NULL AS type
FROM dating_requests dr JOIN members m ON dr.member1=m.member_id
WHERE dr.member2=:loggedin_id
ORDER BY event_time LIMIT 10 -- save ourselves performing excessive UNION
) UNION ALL (
SELECT
da.alert_time AS event_time, m.member_name, -- shared columns
NULL, NULL, NULL, -- request-only columns
da.alert_id, da.alerter_id, da.alertee_id, da.type -- alert-only columns
FROM
dating_alerts da
JOIN dating_alerts_status das USING (alert_id, alertee_id)
JOIN members m ON da.alerter_id=m.member_id
WHERE
da.alertee_id=:loggedin_id
AND da.type='platonic'
AND das.viewed='0'
AND das.viewed_time<da.alert_time
ORDER BY event_time LIMIT 10 -- save ourselves performing excessive UNION
)
ORDER BY event_time
LIMIT 10;
Of course, now it's up to you to determine what type of row you're dealing with as you read each record in the resultset (I suggest you test request_id and/or alert_id for NULL values; alternatively, one could add an additional column to the results that explicitly states which table each record originated from, but testing the ids should be equivalent provided those id columns are NOT NULL).
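As a small sketch of that "additional column" option (not part of the original answer, but built on the same query and placeholders as above), each branch of the UNION can tag its own rows so the application only has to inspect one field:
(
  SELECT 'request' AS event_type,
         dr.request_time AS event_time, m.member_name,
         dr.request_id, dr.member1, dr.member2,
         NULL AS alert_id, NULL AS alerter_id, NULL AS alertee_id, NULL AS type
  FROM dating_requests dr JOIN members m ON dr.member1 = m.member_id
  WHERE dr.member2 = :loggedin_id
  ORDER BY event_time LIMIT 10
) UNION ALL (
  SELECT 'alert' AS event_type,
         da.alert_time AS event_time, m.member_name,
         NULL, NULL, NULL,
         da.alert_id, da.alerter_id, da.alertee_id, da.type
  FROM dating_alerts da
  JOIN dating_alerts_status das USING (alert_id, alertee_id)
  JOIN members m ON da.alerter_id = m.member_id
  WHERE da.alertee_id = :loggedin_id AND da.type = 'platonic'
    AND das.viewed = '0' AND das.viewed_time < da.alert_time
  ORDER BY event_time LIMIT 10
)
ORDER BY event_time
LIMIT 10;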

Mysql subquery with joins

I have a table 'service' which contains details about serviced vehicles. It has an id and Vehicle_registrationNumber, which is a foreign key. Whenever a vehicle is serviced, a new record is made. So, for example, if I make a service for a car with registration ABCD, it creates a new row, and I set car_reg, date and the car's mileage in the service table (id is set to auto-increment), e.g. 12 | 20/01/2012 | ABCD | 1452; another service for the same car creates row 15 | 26/01/2012 | ABCD | 4782.
Now I want to check if the car needs a service (the last service was either 6 or more months ago, or the current mileage of the car is more than 1000 miles since the last service). To do that I need to know the date of the last service and the mileage of the car at the last service. So I want to create a subquery that returns one row for each car, and the row that I'm interested in is the newest one (either with the greatest id or the latest endDate). I also need to join it with other tables because I need this for my view (I use CodeIgniter but don't know if it's possible to write subqueries using CI's ActiveRecord class).
SELECT * FROM (
SELECT *
FROM (`service`)
JOIN `vehicle` ON `service`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
JOIN `branch_has_vehicle` ON `branch_has_vehicle`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
JOIN `branch` ON `branch`.`branchId` = `branch_has_vehicle`.`Branch_branchId`
GROUP BY `service`.`Vehicle_registrationNumber` )
AS temp
WHERE `vehicle`.`available` != 'false'
AND `service`.`endDate` <= '2011-07-20 20:43'
OR service.serviceMileage < vehicle.mileage - 10000
SELECT `service`.`Vehicle_registrationNumber`, Max(`service`.`endDate`) as lastService,
MAX(service.serviceMileage) as lastServiceMileage, vehicle.*
FROM `service`
INNER JOIN `vehicle`
ON `service`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
INNER JOIN `branch_has_vehicle`
ON `branch_has_vehicle`.`Vehicle_registrationNumber` = `vehicle`.`registrationNumber`
INNER JOIN `branch`
ON `branch`.`branchId` = `branch_has_vehicle`.`Branch_branchId`
WHERE vehicle.available != 'false'
GROUP BY `service`.`Vehicle_registrationNumber`
HAVING lastService<=DATE_SUB(CURDATE(),INTERVAL 6 MONTH)
OR lastServiceMileage < vehicle.mileage - 10000
;
I hope I have no typos in it.
If instead of using * in the subquery you specify the fields you need (which is always good practice anyway), most databases have a MAX() function that returns the maximum value within the group.
Actually, you don't even need the subquery. You can do the joins and use the MAX in the SELECT statement. Then you can do something like
SELECT ...., MAX(`service`.`endDate`) AS LAST_SERVICE
...
GROUP BY `service`.`Vehicle_registrationNumber`
Or am I missing something?