When will recursive query stop in this case? - mysql

Given this table description.
I have written a query to find Users who logged in for 5 or more consecutive days.
WITH RECURSIVE
rec_t AS
(SELECT id, login_date, 1 AS days FROM Logins
UNION ALL
SELECT l.id, l.login_date, rec_t.days+1 FROM rec_t
INNER JOIN Logins l
ON rec_t.id = l.id AND DATE_ADD(rec_t.login_date, INTERVAL 1 DAY) = l.login_date
)
SELECT * FROM Accounts
WHERE id IN
(SELECT DISTINCT id FROM rec_t WHERE days = 5)
ORDER BY id
Code explanation:
For every id and login date, join the CTE to the Logins row with the same id and a login_date one day later.
The "days" column increments by 1 each time such a match is found for the same user.
The Problem:
Although the query works fine, I don't know where I am telling the query to stop recursing. There is no WHERE in the recursive CTE definition. The inner join might end the recursion once there is no further login_date to match, but I am not certain that is the case.
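A recursive CTE stops when the recursive member produces no new rows, so here the inner join is what ends the recursion: once a row has no Logins row dated one day later, it contributes nothing further. The sketch below illustrates this under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, replaces DATE_ADD with SQLite's date(..., '+1 day'), and uses invented sample rows.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO Logins VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-02"), (1, "2020-01-03"),
        (1, "2020-01-04"), (1, "2020-01-05"),   # five consecutive days
        (2, "2020-01-01"), (2, "2020-01-03"),   # a gap, so no streak
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE rec_t(id, login_date, days) AS (
        SELECT id, login_date, 1 FROM Logins
        UNION ALL
        SELECT l.id, l.login_date, rec_t.days + 1
        FROM rec_t
        JOIN Logins l
          ON rec_t.id = l.id
         AND date(rec_t.login_date, '+1 day') = l.login_date
    )
    SELECT id, MAX(days) FROM rec_t GROUP BY id ORDER BY id
    """
).fetchall()

# Longest chain of consecutive days reached per user before the join ran dry.
longest_streak = dict(rows)
print(longest_streak)
```

User 1 reaches days = 5 and then the join finds no 2020-01-06 row, so the chain ends there; user 2 never gets past days = 1.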

Related

How to use SQL to count events in the first week

I'm trying to write a SQL query, which says how many logins each user made in their first week.
Assume, for the purpose of this question, that I have a table with at least user_id and login_date. I'm trying to produce an output table with user_id and num_logins_first_week
Use aggregation to get the first date for each user. Then join in the logins and aggregate:
select t.user_id, count(*) as num_logins_first_week
from t join
(select user_id, min(login_date) as first_login_date
from t
group by user_id
) tt
on tt.user_id = t.user_id and
t.login_date >= tt.first_login_date and
t.login_date < tt.first_login_date + interval 7 day
group by t.user_id;
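To check the window boundaries of the query above, here is a small sketch run on SQLite via Python's sqlite3 module. The assumptions are the sample rows and the use of SQLite's date(x, '+7 days') in place of MySQL's "+ interval 7 day".

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table t and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT, login_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("ann", "2021-03-01"), ("ann", "2021-03-04"),
        ("ann", "2021-03-07"), ("ann", "2021-03-08"),  # day 8 is outside week 1
        ("bob", "2021-03-10"), ("bob", "2021-03-16"),
        ("bob", "2021-03-17"),                         # also outside week 1
    ],
)

first_week = dict(conn.execute(
    """
    SELECT t.user_id, COUNT(*) AS num_logins_first_week
    FROM t
    JOIN (SELECT user_id, MIN(login_date) AS first_login_date
          FROM t
          GROUP BY user_id) tt
      ON tt.user_id = t.user_id
     AND t.login_date >= tt.first_login_date
     AND t.login_date < date(tt.first_login_date, '+7 days')
    GROUP BY t.user_id
    """
).fetchall())
print(first_week)
```

The strict "<" on the upper bound is what keeps a login exactly seven days after the first one out of the count.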

Mysql Query where max(time) less than today

I have two tables: the first ( job ) stores the data and the second ( job_locations ) stores the locations for each job. I'm trying to show the number of jobs whose job-location cutoff times are earlier than today.
The date column uses the DateTime type.
Unfortunately, the numbers I get when I test the following code are wrong.
My code
SELECT *
FROM `job`
left join job_location
on job_location.job_id = job.id
where job_location.cutoff_time < CURDATE()
group by job.id
Please help me to write the working Query.
I think you need to rephrase your query slightly. Select a count of jobs where the cutoff time is earlier than the start of today.
SELECT
j.id,
COUNT(CASE WHEN jl.cutoff_time < CURDATE() THEN 1 END) AS cnt
FROM job j
LEFT JOIN job_location jl
ON j.id = jl.job_id
GROUP BY
j.id;
Note that the left join is important here because it means that we won't drop any jobs having no matching criteria. Instead, those jobs would still appear in the result set, just with a zero count.
As a note, you can simplify the count (in MySQL). And, assuming that all jobs have at least one location, you don't need a JOIN at all. So:
SELECT jl.job_id, sum( jl.cutoff_time < CURDATE() )
FROM job_location jl
GROUP BY jl.job_id;
If this is not correct (and you need the JOIN), then the condition on the date should go in the ON clause:
SELECT j.id, COUNT(jl.job_id)
FROM job j LEFT JOIN
job_location jl
ON jl.job_id = j.id AND jl.cutoff_time < CURDATE()
GROUP BY j.id;
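The difference between filtering in WHERE and filtering inside the LEFT JOIN can be seen on a tiny data set. This is only a sketch: it uses SQLite through Python's sqlite3 module instead of MySQL, passes a fixed "today" value as a parameter in place of CURDATE(), and the rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job (id INTEGER PRIMARY KEY);
    CREATE TABLE job_location (job_id INTEGER, cutoff_time TEXT);
    INSERT INTO job VALUES (1), (2), (3);
    INSERT INTO job_location VALUES
        (1, '2022-05-01 09:00:00'),
        (1, '2022-05-09 18:00:00'),
        (1, '2022-05-10 08:00:00'),
        (2, '2022-05-12 12:00:00');
    """
)
today = "2022-05-10"  # fixed stand-in for CURDATE()

# Variant 1: condition inside the aggregate.
case_counts = dict(conn.execute(
    """
    SELECT j.id, COUNT(CASE WHEN jl.cutoff_time < ? THEN 1 END)
    FROM job j
    LEFT JOIN job_location jl ON j.id = jl.job_id
    GROUP BY j.id
    """, (today,)).fetchall())

# Variant 2: condition in the ON clause.
on_counts = dict(conn.execute(
    """
    SELECT j.id, COUNT(jl.job_id)
    FROM job j
    LEFT JOIN job_location jl
      ON jl.job_id = j.id AND jl.cutoff_time < ?
    GROUP BY j.id
    """, (today,)).fetchall())

# For contrast: condition in WHERE, which discards jobs with no match.
where_counts = dict(conn.execute(
    """
    SELECT j.id, COUNT(*)
    FROM job j
    LEFT JOIN job_location jl ON jl.job_id = j.id
    WHERE jl.cutoff_time < ?
    GROUP BY j.id
    """, (today,)).fetchall())

print(case_counts, on_counts, where_counts)
```

Both LEFT JOIN variants keep jobs 2 and 3 with a count of zero, while the WHERE version returns job 1 only.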

Can't address parent field in query with multiple subqueries

EDIT: Better explanation
I have a page with a job. The job has an ID and three skills (skill_ids), each with a skill requirement (a user must have at least this skill value to be qualified).
I click on the job to find candidates, so I have the job_id, the three skill_ids and the skill_id_requirements. So far I can do this with joins, as the first answer proposed: I find all users who have the three skills. The skills are saved in skill_ratings. This works as long as I only match on the skill_ids.
But now I want the value, and here I have my code where I compute the final value (called rating). The rating takes all given values into account, but isn't a simple average or the sum of them. That's why I need the long horrible code. In the long horrible code I usually insert a single user's ID. But here I need all user_ids that have the skills mentioned above, just to calculate whether they are qualified. This is dynamic.
I have a table in which I want to find people who are qualified for a position under some requirements. I work with a single table called skill_ratings but, as far as I can see, need to add some subqueries, and that is where the problem lies. There are many subqueries, and I've tried to reference a field of the parent query. That only seems to work in a first-level subquery of the parent query.
Here's my structure:
SELECT * FROM table t
WHERE EXISTS (SELECT * FROM table d WHERE x > 1
AND b=t.id
AND y <= (SELECT a FROM (MAIN SUBQUERY WITH CALCULATIONS)))
GROUP BY xyx
But the error I get is: #1054 - Unknown column 'skra.usr_id_get' in 'where clause'. skra is the parent table in this case.
I want to get the following (pseudo-sql):
SELECT all FROM table t AS x
WHERE EXISTS (
SELECT all FROM table t AS y
WHERE y.skill_id = 1
AND y.usr_id_get = t.usr_id_get
AND y.value <= (my algorithm)
)
The main subquery matters because I want to get a computed number. Elsewhere the code works because I was able to use predefined PHP variables for a user's ID. I can't do that here, as I need to find the users within the bounds of the WHERE clauses.
How can I solve this? Referencing a parent field in a subquery seems to be limited to a first-level subquery.
EDIT: Code
Code removed due to project status.
Error: #1054 - Unknown column 'c.usr_id_get' in 'where clause'
We want users that have certain skills of certain levels. For example all users that have skill 1 with at least level 20 and skill 2 with at least level 70.
Here is an algorithm:
First of all we must get the skill levels. A user has several skill ratings and the average rating per skill is the level.
Then we want a table of criteria (skill 1 / level 20, skill 2 / level 70 in our example).
We collect all user skill levels that match the criteria (EXISTS clause) and then
keep the users that match all skill levels (count(*) = <desired number of skills>).
The query:
select
sr.usr_id_get
from
(
select usr_id_get, skill_id, avg(value) as level
from skill_ratings
group by usr_id_get, skill_id
) sr
where exists
(
select *
from
(
select 1 as skill_id, 20 as level
union all
select 2 as skill_id, 70 as level
) criteria
where sr.skill_id = criteria.skill_id
and sr.level >= criteria.level
)
group by usr_id_get
having count(*) = 2;
You can also make criteria a real (temporary) table. Then your query stays the same, no matter how many skills are requested. You'd have
where exists
(
select *
from criteria
where sr.skill_id = criteria.skill_id
and sr.level >= criteria.level
)
group by usr_id_get
having count(*) = (select count(*) from criteria);
then.
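The criteria-table variant can be tried end to end with a few rows. The sketch below makes these assumptions: SQLite through Python's sqlite3 module stands in for MySQL, criteria is an ordinary table rather than a temporary one, and all ratings are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE skill_ratings (usr_id_get INTEGER, skill_id INTEGER, value REAL);
    CREATE TABLE criteria (skill_id INTEGER, level REAL);
    INSERT INTO skill_ratings VALUES
        (1, 1, 30), (1, 1, 20), (1, 2, 80),  -- user 1: averages 25 and 80
        (2, 1, 50), (2, 2, 60),              -- user 2: skill 2 is too low
        (3, 1, 25);                          -- user 3: skill 2 is missing
    INSERT INTO criteria VALUES (1, 20), (2, 70);
    """
)

qualified = [row[0] for row in conn.execute(
    """
    SELECT sr.usr_id_get
    FROM (SELECT usr_id_get, skill_id, AVG(value) AS level
          FROM skill_ratings
          GROUP BY usr_id_get, skill_id) sr
    WHERE EXISTS (SELECT *
                  FROM criteria c
                  WHERE sr.skill_id = c.skill_id
                    AND sr.level >= c.level)
    GROUP BY sr.usr_id_get
    HAVING COUNT(*) = (SELECT COUNT(*) FROM criteria)
    ORDER BY sr.usr_id_get
    """
)]
print(qualified)
```

Only user 1 satisfies both rows of criteria, so the HAVING clause keeps that user alone; adding a third criteria row would need no change to the query.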
This looks like it could be done with a simple JOIN:
SELECT T.*
FROM your_table T
JOIN other_table Y ON (
T.usr_id_get = Y.usr_id_get
AND T.skill_id = 1
AND Y.value <= [...]
)
If you need to perform some sort of calculations before the join, then you could join with a subquery:
SELECT T.*
FROM your_table T
JOIN (
SELECT *
FROM other_table Y
WHERE Y.skill_id = 1
AND Y.value = [...]
) Y USING(usr_id_get)
If I understand correctly, you have a user, say user 123, and a skill, say skill 99. Now you want to get the average rating for user 123 and skill 99 and then find all users with an equal or better average rating on that skill.
This is how to get the average ratings for skill 99 per user:
select usr_id_get, avg(value)
from skill_ratings
where skill_id = 99
group by usr_id_get;
This is how to get all users with an equal or better average rating for skill 99 than user 123:
select usr_id_get
from skill_ratings
where skill_id = 99
group by usr_id_get
having avg(value) >=
(select avg(value) from skill_ratings where skill_id = 99 and usr_id_get = 123);
Add to this whatever other criteria you need.

generate index columns for "ORDER BY x, y"

I use this query to summarize the contents of the table export_blocks, aggregated by user and date, and save it as a new table:
CREATE TABLE export_days
SELECT user_id, DATE(submitted) AS date_str
FROM export_blocks
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, submitted
How can I get, for each user_id, an incremental index for the dates of that user's records? The indices should start at 1 for each user, following the ORDER BY. I.e. I'd like to generate the date_index of the output below using SQL:
user_id date_str date_index
brian 2014-06-10 1
brian 2014-06-12 2
brian 2014-06-15 3
louis 2014-06-08 1
louis 2014-06-16 2
lucy 2013-11-15 1
(etc...)
I've been trying https://stackoverflow.com/a/5493480/1297830 but I cannot get it to work. It stops the counters prematurely, giving numbers that are too low for id_no and date_no.
Basing it on your sample query, you can do simple (dependent) subqueries to get the result;
SELECT id, date_str,
(SELECT COUNT(DISTINCT id)+1 FROM mytable WHERE id < a.id) id_no,
(SELECT COUNT(id)+1 FROM mytable WHERE id = a.id AND date_str < a.date_str) date_no
FROM mytable a
ORDER BY id;
...or you could do a couple of self joins;
SELECT a.id, a.date_str,
COUNT(DISTINCT b.id)+1 id_no,
COUNT(DISTINCT c.date_str)+1 date_no
FROM mytable a
LEFT JOIN mytable b ON a.id > b.id
LEFT JOIN mytable c ON a.id = c.id AND a.date_str > c.date_str
GROUP BY a.id, a.date_str
ORDER BY a.id, a.date_str;
An SQLfiddle showing both in action.
Sadly neither is really a very performant solution, but since MySQL lacks analytical (i.e. ranking) functions, the options are limited. Using user variables to do the ranking is also an option; however, they're notoriously tricky to use and aren't portable, so I'd go there only if performance demands it.
Based on Joachim's excellent answer I worked out the solution. It also works when there's multiple rows per day for each user.
CREATE TABLE export_days
SELECT
user_id, DATE(submitted) AS date_str,
(SELECT COUNT(DISTINCT DATE(submitted))+1 FROM export_blocks WHERE user_id = a.user_id AND DATE(submitted) < DATE(a.submitted)) date_no
FROM export_blocks a
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, submitted
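The counting idea can be checked on a few rows, including two submissions on the same day. This is a variation rather than the exact statement above: it runs on SQLite through Python's sqlite3 module, compares whole dates instead of timestamps, takes the distinct (user, date) pairs from a derived table, and uses invented data. On MySQL 8.0 and later, window functions such as ROW_NUMBER() are another option.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE export_blocks (user_id TEXT, submitted TEXT);
    INSERT INTO export_blocks VALUES
        ('brian', '2014-06-10 08:00:00'),
        ('brian', '2014-06-10 17:30:00'),   -- second block on the same day
        ('brian', '2014-06-12 09:15:00'),
        ('louis', '2014-06-08 11:00:00'),
        ('louis', '2014-06-16 12:00:00');
    """
)

export_days = conn.execute(
    """
    SELECT a.user_id, a.date_str,
           -- index = number of earlier distinct dates for this user, plus one
           (SELECT COUNT(DISTINCT date(b.submitted)) + 1
            FROM export_blocks b
            WHERE b.user_id = a.user_id
              AND date(b.submitted) < a.date_str) AS date_index
    FROM (SELECT DISTINCT user_id, date(submitted) AS date_str
          FROM export_blocks) a
    ORDER BY a.user_id, a.date_str
    """
).fetchall()

for row in export_days:
    print(row)
```

Comparing dates rather than raw timestamps keeps the two same-day rows for brian from pushing the second date's index up.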

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says this should work as expected. So it may be working now but may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
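The "join back to the per-person MAX(year)" pattern can be sketched on sample data. The assumptions here: SQLite through Python's sqlite3 module instead of MySQL, two chained LEFT JOINs instead of the nested join syntax used above (they should return the same rows for this data), an index name chosen for illustration, and invented rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    -- Hypothetical index name; the columns follow the suggestion above.
    CREATE INDEX visits_people_year ON visits (id_people, year);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO visits VALUES
        (1, 1, 2011, 'a'),
        (2, 1, 2013, 'b'),
        (3, 2, 2012, 'c');
    """
)

latest_visits = conn.execute(
    """
    SELECT p.id, p.name, v.year, v.note
    FROM people p
    LEFT JOIN (SELECT id_people, MAX(year) AS year
               FROM visits
               GROUP BY id_people) vm
           ON vm.id_people = p.id
    LEFT JOIN visits v
           ON v.id_people = vm.id_people
          AND v.year = vm.year
    ORDER BY p.id
    """
).fetchall()

for row in latest_visits:
    print(row)
```

Cy has no visits and still appears, with NULL (None in Python) for year and note.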
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
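A sketch of the overall query on data built to exercise the tie-break, assuming SQLite through Python's sqlite3 module in place of MySQL and invented rows. The innermost alias is renamed to latest here only to avoid reusing p. Ann has two visits in her latest year, plus a higher-numbered row for an older year.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                         year INTEGER, note TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO visits VALUES
        (1, 1, 2012, 'old'),
        (2, 1, 2013, 'first 2013'),
        (3, 1, 2013, 'second 2013'),
        (4, 2, 2011, 'only'),
        (5, 1, 2010, 'entered late');  -- highest id, but not the latest year
    """
)

one_row_per_person = conn.execute(
    """
    SELECT p.id, p.name, v.year, v.note
    FROM (
        SELECT v.id_people, MAX(v.id) AS id
        FROM (SELECT id_people, MAX(year) AS year
              FROM visits
              GROUP BY id_people) latest
        JOIN visits v
          ON latest.id_people = v.id_people
         AND latest.year = v.year
        GROUP BY v.id_people
    ) m
    JOIN people p ON m.id_people = p.id
    JOIN visits v ON m.id = v.id
    ORDER BY p.id
    """
).fetchall()

for row in one_row_per_person:
    print(row)
```

Visit 5 shows why a plain MAX(id) per person is not enough when ids are not in time order: it would pick the 2010 row, whereas the nested query restricts the choice to the latest year first.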
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.