MySQL query construction - mysql

I have a query problem. I have a table of agents that do things. I keep track of the things they do in an events table. I want to keep my agents busy but not too busy, so I need a query that returns the group of agents that have done no more than 10 events in the past 10 minutes and no more than 400 events in the past 24 hours. From this pool of available agents I can then choose one to give something to do.
So my agent table looks something like:
Agent table
AgentID  AgentName
1        Bob
2        Sue
Event table
EventID  AgentID  EventTimestamp
1        2        1319525462
2        1        1319525462
3        2        1319525462
Obviously these tables are just to give the form of the db. What I need generally and have not been able to figure out is how to select a group of agents from a join that returns a group of agents that have done no more than 10 events in the past 10 min and no more than 400 events in the past 24 hours. My actual tables are more complex, but I am just looking for a general principle on how to structure a query that would return the desired result. Thanks ahead of time for the help!
UPDATE:
Building on Benoit's answer, I came up with this:
SELECT DISTINCT username FROM campaign_agents
LEFT OUTER JOIN (SELECT count(event_index) myevents, field_event_agent_value FROM new_event WHERE field_event_time_value BETWEEN 1320206138 AND 1320292538 GROUP BY field_event_agent_value ) last_24_hours
ON last_24_hours.field_event_agent_value = campaign_agents.username
LEFT OUTER JOIN (SELECT count(event_index) myevents, field_event_agent_value FROM new_event WHERE field_event_time_value BETWEEN 1320291938 AND 1320292538 GROUP BY field_event_agent_value ) last_10_mins
ON last_10_mins.field_event_agent_value = campaign_agents.username
WHERE last_24_hours.myevents < 550 AND last_10_mins.myevents < 10
But it doesn't return the agents in the campaign_agents table who haven't done anything yet and are therefore not in the events table. Shouldn't a LEFT OUTER JOIN include everything in the first table, campaign_agents, even if there are no matches in the second table? Do I need to put an OR condition in the WHERE clause to somehow get them included?

You can try:
SELECT agent.agentid
FROM agent
INNER JOIN (SELECT count(eventid) events, agentid
            FROM event
            -- timestamps are Unix epoch seconds, as in the sample data; adapt this
            WHERE timestamp BETWEEN UNIX_TIMESTAMP() - 86400 AND UNIX_TIMESTAMP()
            GROUP BY agentid
           ) last_24_hours
   ON last_24_hours.agentid = agent.agentid
INNER JOIN (SELECT count(eventid) events, agentid
            FROM event
            WHERE timestamp BETWEEN UNIX_TIMESTAMP() - 600 AND UNIX_TIMESTAMP()
            GROUP BY agentid
           ) last_10_mins
   ON last_10_mins.agentid = agent.agentid
WHERE last_24_hours.events <= 400  -- "no more than 400"
  AND last_10_mins.events <= 10    -- "no more than 10"
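On the follow-up about agents with no events: an inner join drops them, and with a LEFT JOIN a condition such as `myevents < 10` in the WHERE clause drops them as well, because their count is NULL and a comparison with NULL is never true. One way around this is to treat a missing count as zero with COALESCE. Below is a minimal sketch of that idea, run against SQLite from Python so it is self-contained. The column name `ts`, the third agent "Ann" and the function `available_agents` are assumptions made for the illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical schema modelled on the sample tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (agentid INTEGER PRIMARY KEY, agentname TEXT);
    CREATE TABLE event (eventid INTEGER PRIMARY KEY, agentid INTEGER, ts INTEGER);
    INSERT INTO agent VALUES (1, 'Bob'), (2, 'Sue'), (3, 'Ann');
""")

now = 1320292538  # the upper bound used in the question's update
# Bob: one recent event. Sue: 11 events in the last 10 minutes. Ann: none.
conn.execute("INSERT INTO event (agentid, ts) VALUES (1, ?)", (now - 60,))
conn.executemany("INSERT INTO event (agentid, ts) VALUES (2, ?)",
                 [(now - i,) for i in range(11)])

AVAILABLE_SQL = """
SELECT a.agentid
FROM agent a
LEFT JOIN (SELECT agentid, COUNT(*) AS n FROM event
           WHERE ts BETWEEN :day_ago AND :now
           GROUP BY agentid) d ON d.agentid = a.agentid
LEFT JOIN (SELECT agentid, COUNT(*) AS n FROM event
           WHERE ts BETWEEN :ten_min_ago AND :now
           GROUP BY agentid) m ON m.agentid = a.agentid
-- COALESCE turns the NULL of an unmatched agent into a count of 0
WHERE COALESCE(d.n, 0) <= :max_day
  AND COALESCE(m.n, 0) <= :max_ten_min
ORDER BY a.agentid
"""

def available_agents(conn, now, max_ten_min=10, max_day=400):
    """Return ids of agents within both limits, including idle agents."""
    params = {"now": now, "day_ago": now - 86400, "ten_min_ago": now - 600,
              "max_day": max_day, "max_ten_min": max_ten_min}
    return [row[0] for row in conn.execute(AVAILABLE_SQL, params)]

available = available_agents(conn, now)
print(available)
```

The same COALESCE condition should carry over to the MySQL query in the update; only the table and column names differ.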

Related

Issue while using inner join with sum

I have two (2) tables:
ps_wk_mp_seller_transaction_history
And ps_wk_mp_seller_order_status
I want to totalize seller_amount only if current_state = 5
For that I have written this:
select sum(seller_amount) as seller_amount
from ps_wk_mp_seller_transaction_history th inner join
ps_wk_mp_seller_order_status os
on os.id_order = th.id_transaction and
os.current_state = 5 and
th.id_customer_seller = 2;
For id_customer_seller = 2 I get 4984.020000 (4950 + 34.02), which is correct.
But for id_customer_seller = 5 I get 25.848000 instead of NULL.
Can someone help me?
You can test it yourself; the code is here: https://github.com/kulturman/fakerepo
First of all, your records have a logic issue: two transactions share the same id_transaction but have different amounts and different states. Transaction 41 has a row in state 5, which is why the result for customer 5 is not NULL.
To fix this:
The transaction id must be unique for every transaction, so that each transaction's state and amount can be told apart.
The query should look like this:
select sum(seller_amount) as seller_amount
from ps_wk_mp_seller_transaction_history th
left join ps_wk_mp_seller_order_status os
on os.id_order=th.id_transaction
where
th.id_customer_seller = 2
and os.current_state=5
working example here
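To make the duplicate-key point concrete, here is a small sketch using SQLite from Python. The rows are hypothetical, chosen only so that the sums match the numbers in the question. The short table names `th` and `os` stand in for the two real tables, and the real data in the linked repository may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: id 41 is shared by two sellers and has two states.
    CREATE TABLE th (id_transaction INTEGER, id_customer_seller INTEGER,
                     seller_amount REAL);
    CREATE TABLE os (id_order INTEGER, current_state INTEGER);
    INSERT INTO th VALUES (40, 2, 4950.0), (41, 2, 34.02), (41, 5, 25.848);
    INSERT INTO os VALUES (40, 5), (41, 5), (41, 3);
""")

SUM_SQL = """
SELECT SUM(th.seller_amount)
FROM th
INNER JOIN os ON os.id_order = th.id_transaction
WHERE os.current_state = 5 AND th.id_customer_seller = ?
"""

def seller_total(seller_id):
    """Sum the seller's amounts over rows whose order has a state-5 row."""
    return conn.execute(SUM_SQL, (seller_id,)).fetchone()[0]

# Seller 5's row for id 41 pairs with the state-5 row of the same id,
# so the sum is 25.848 rather than NULL.
total_seller_5 = seller_total(5)

# Listing the ids that occur more than once shows where the ambiguity is.
duplicates = conn.execute("""
    SELECT id_transaction, COUNT(*) FROM th
    GROUP BY id_transaction HAVING COUNT(*) > 1
""").fetchall()
print(total_seller_5, duplicates)
```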

Moving average query MS Access

I am trying to calculate the moving average of my data. I have googled and found many examples on this site and others but am still stumped. For each record, I need to calculate the average of the previous 5 Flow values for that record's product.
My Table looks like the following:
TMDT                   Prod  Flow
8/21/2017 12:01:00 AM  A     100
8/20/2017 11:30:45 PM  A     150
8/20/2017 10:00:15 PM  A     200
8/19/2017 5:00:00 AM   B     600
8/17/2017 12:00:00 AM  A     300
8/16/2017 11:00:00 AM  A     200
8/15/2017 10:00:31 AM  A     50
I have been trying the following query:
SELECT b.TMDT, b.Flow, (SELECT AVG(Flow) as MovingAVG
FROM(SELECT TOP 5 *
FROM [mytable] a
WHERE Prod="A" AND [a.TMDT]< b.TMDT
ORDER BY a.TMDT DESC))
FROM mytable AS b;
When I try to run this query I get an input prompt for b.TMDT. Why is b.TMDT not being pulled from mytable?
Should I be using a different method altogether to calculate my moving averages?
I would like to add that I started with another method that works but is extremely slow. It runs fast enough for tables with 100 records or less. However, if the table has more than 100 records it feels like the query comes to a screeching halt.
Original method below.
I created two queries for each product code (There are 15 products): Q_ProdA_Rank and Q_ProdA_MovAvg
Q_ProdA_RanK (T_ProdA is a table with Product A's information):
SELECT a.TMDT, a.Flow, (Select count(*) from [T_ProdA]
where TMDT<=a.TMDT) AS Rank
FROM [T_ProdA] AS a
ORDER BY a.TMDT DESC;
Q_ProdA_MovAvg
SELECT b.TMDT, b.Flow, Round((Select sum(Flow) from [Q_PRodA_Rank] where
Rank between b.Rank-1 and (b.Rank-5))/IIf([Rank]<5,Rank-1,5),0) AS
MovingAvg
FROM [Q_ProdA_Rank] AS b;
The problem is that you're using a nested subquery, and as far as I know (can't find the right site for the documentation at the moment), variable scope in subqueries is limited to the direct parent of the subquery. This means that for your nested query, b.TMDT is outside of the variable scope.
Edit: As this is an interesting problem, and a properly-asked question, here is the full SQL answer. It's somewhat more complex than your try, but should run more efficiently
It contains a nested subquery that first lists the 5 previous flows for each TMDT and Prod, then averages them, and then joins the result to the actual query.
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A LEFT JOIN (
    SELECT JoinKeys.TMDT, JoinKeys.Prod, Avg(Top5.Flow) As MovingAverage
    FROM (
        SELECT JoinKeys.TMDT, JoinKeys.Prod, Top5.Flow
        FROM MyTable As JoinKeys INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
        WHERE Top5.TMDT In (
            SELECT TOP 5 A.TMDT FROM MyTable As A WHERE JoinKeys.Prod = A.Prod AND A.TMDT < JoinKeys.TMDT ORDER BY A.TMDT DESC
        )
    )
    GROUP BY JoinKeys.TMDT, JoinKeys.Prod
) AS B
ON A.Prod = B.JoinKeys.Prod AND A.TMDT = B.JoinKeys.TMDT
While in my previous version I advocated a VBA approach, this is probably more efficient, only more difficult to write and adjust.
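To check what the result should look like, here is a rough sketch of the same "average of the previous 5 flows per product" computation, run on the sample data with SQLite from Python. This is not Access SQL: it uses LIMIT instead of TOP, and SQLite does let the nested subquery see the outer alias `b`, so it is only meant as a reference for the expected numbers. The timestamps are rewritten in ISO format (an assumption made so that text comparison orders them correctly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (TMDT TEXT, Prod TEXT, Flow REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("2017-08-21 00:01:00", "A", 100),
    ("2017-08-20 23:30:45", "A", 150),
    ("2017-08-20 22:00:15", "A", 200),
    ("2017-08-19 05:00:00", "B", 600),
    ("2017-08-17 00:00:00", "A", 300),
    ("2017-08-16 11:00:00", "A", 200),
    ("2017-08-15 10:00:31", "A", 50),
])

MOVING_AVG_SQL = """
SELECT b.TMDT, b.Prod, b.Flow,
       (SELECT AVG(prev.Flow)
        FROM (SELECT a.Flow
              FROM mytable AS a
              WHERE a.Prod = b.Prod AND a.TMDT < b.TMDT
              ORDER BY a.TMDT DESC   -- newest first, so LIMIT keeps the 5 most recent
              LIMIT 5) AS prev) AS MovingAvg
FROM mytable AS b
ORDER BY b.Prod, b.TMDT
"""

# Map (TMDT, Prod) -> moving average; None where no earlier row exists.
moving = {(tmdt, prod): avg
          for tmdt, prod, flow, avg in conn.execute(MOVING_AVG_SQL)}
for key, avg in moving.items():
    print(key, avg)
```

Note the descending sort inside the innermost query: with an ascending sort, the limit would keep the five oldest earlier rows instead of the five most recent ones.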

Compare 2 fields of table A with 6 fields of table B

I've got two tables in my MySQL DB. One contains requiredSkill1, requiredSkillLevel1, requiredSkill2, requiredSkillLevel2, requiredSkill3 and requiredSkillLevel3.
The other table has X rows per user with the following columns: skill and level.
itemid  requiredSkill1  requiredSkillLevel1  requiredSkill2  requiredSkillLevel2  requiredSkill3  requiredSkillLevel3
2410    3319            4                    20211           1                    NULL            NULL
The other table:
userid  skill  level
21058   3412   4
21058   3435   2
21058   3312   4
Keep in mind, these are just examples.
I want every itemid which has matching values in requiredSkill{1-3} and requiredSkillLevel{1-3}.
Is this even possible with a single query, and would it still perform well? The user table contains up to 300 rows per user and the item table has a fixed size of 6000 rows. This will be used in a web application, so I can use Ajax to load ranges of items from the database to decrease loading time.
I don't have the data set up. A SQL Fiddle would be helpful, but I think you want to approach it like this:
SELECT itemid FROM items i
INNER JOIN users u1 ON u1.skill = i.requiredSkill1 AND u1.level >= i.requiredSkillLevel1
INNER JOIN users u2 ON u2.skill = i.requiredSkill2 AND u2.level >= i.requiredSkillLevel2 AND u1.userid = u2.userid
INNER JOIN users u3 ON u3.skill = i.requiredSkill3 AND u3.level >= i.requiredSkillLevel3 AND u3.userid = u1.userid
Someone will solve this for you if you post demo data.
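One thing to watch with the inner joins above: the sample item has NULL in requiredSkill3, and an inner join on a NULL skill never matches, so that item could never be returned. A possible variant treats an empty slot as "nothing required". Here is a minimal sketch in SQLite from Python. The second item (9001) and the helper `slot_condition` are made up for the illustration, and `users` is the table name assumed in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (itemid INTEGER PRIMARY KEY,
        requiredSkill1 INTEGER, requiredSkillLevel1 INTEGER,
        requiredSkill2 INTEGER, requiredSkillLevel2 INTEGER,
        requiredSkill3 INTEGER, requiredSkillLevel3 INTEGER);
    CREATE TABLE users (userid INTEGER, skill INTEGER, level INTEGER);
    -- 2410 is the row from the question; 9001 is a hypothetical item
    INSERT INTO items VALUES (2410, 3319, 4, 20211, 1, NULL, NULL),
                             (9001, 3412, 3, 3435, 2, NULL, NULL);
    INSERT INTO users VALUES (21058, 3412, 4), (21058, 3435, 2), (21058, 3312, 4);
""")

def slot_condition(n):
    """SQL condition: slot n is empty, or the user has that skill at the level."""
    return (f"(i.requiredSkill{n} IS NULL OR EXISTS ("
            f"SELECT 1 FROM users u WHERE u.userid = :uid "
            f"AND u.skill = i.requiredSkill{n} "
            f"AND u.level >= i.requiredSkillLevel{n}))")

USABLE_SQL = ("SELECT i.itemid FROM items i WHERE "
              + " AND ".join(slot_condition(n) for n in (1, 2, 3))
              + " ORDER BY i.itemid")

# Items whose every non-empty requirement is met by user 21058.
usable = [row[0] for row in conn.execute(USABLE_SQL, {"uid": 21058})]
print(usable)
```

With an index on (userid, skill) each EXISTS probe should be a cheap lookup, though that would need measuring on the real tables.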

Database design - one to one relation with repeat possible?

I have to design a simple relation
1 "Workout" has multiple "Interval"
1 "Interval" is present in only 1 "Workout", but it can be repeated multiple time in a "Workout"
I was thinking having this structure:
Interval
-id
-workout_id (foreign key)
-seq_workout
Workout
-id
-name
But with this structure, if an "Interval" is present many times in a "Workout", I have to insert multiple rows for this Interval in the table, each with a different "seq_workout" (the position of the interval in the workout). I find this bad because it uses DB space that could otherwise be saved.
I could use a third table (interval_workout_position)
where I put the Interval id, the Workout id, and the position of the interval in the Workout. That way I could put the same interval multiple times in a Workout.
Is there another solution? I find that using 3 tables may be overkill for this.
Basically I'm just trying to represent an ArrayList with repeats allowed (a Workout has a QList of Interval), for those familiar with Qt or with ArrayList in other languages.
Thank you!
This is a one-to-many relationship.
Your relational design is sound. The table design you have supports these requirements:
An Interval occurs in exactly 1 Workout.
1 Workout has zero, one or more Intervals.
An Interval has a position (sequence) within a Workout.
There is no need for a third table, unless you are introducing another entity.
Update:
The table definition shown in the question is different than the table design added in the comment.
If you want to model a many-to-many relationship, you would use three tables.
interval
    id
    message_en
    duration
    etc.
workout
    id
    name
workout_interval
    id
    workout_id (fk references workout.id)
    seq_workout
    interval_id (fk references interval.id)
    etc.
This model is a many-to-many.
The workout_interval table is the "relationship" between the Workout and the Interval.
With this model
An Interval can appear in zero, one or more Workout, and a Workout can have zero, one or more Interval.
Any attributes that are specific to a particular Interval within a Workout can be added to the relationship table. Attributes of just the Interval (like a label or whatever), which doesn't change, those would be on the Interval table.
For example, if you are tracking workout results, you would record the weight and number of reps completed on the workout_interval table.
Update 2:
Given that you want an Interval to be used only in a single Workout, the model in my previous update could be used, but it doesn't get you the constraint...
This model would provide that restriction. interval is a child of workout, and interval_seq is a child of interval.
workout
    id
    name
interval
    id
    workout_id (fk references workout.id)
    message_en
    etc.
interval_seq
    id
    interval_id (fk references interval.id)
    seq_ (position within workout sequence 1,2,3,...)
    etc.
To get the intervals within a workout in sequence, with repeats:
SELECT w.id
     , s.seq_
     , i.*
     , s.*
FROM workout w
JOIN interval i
  ON i.workout_id = w.id
JOIN interval_seq s
  ON s.interval_id = i.id
WHERE w.id = ?  -- id of the workout to list
ORDER BY w.id, s.seq_
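As a quick check of the workout / interval / interval_seq model, here is a small sketch in SQLite from Python. The workout name and interval messages are made-up sample data. Each interval is stored once and belongs to exactly one workout, while interval_seq holds one small row per position, so repeats do not duplicate the interval's details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workout (id INTEGER PRIMARY KEY, name TEXT);
    -- "interval" is quoted because it is a keyword in some databases
    CREATE TABLE "interval" (
        id INTEGER PRIMARY KEY,
        workout_id INTEGER NOT NULL REFERENCES workout(id),
        message_en TEXT);
    CREATE TABLE interval_seq (
        id INTEGER PRIMARY KEY,
        interval_id INTEGER NOT NULL REFERENCES "interval"(id),
        seq_ INTEGER NOT NULL);

    INSERT INTO workout VALUES (1, 'Hypothetical workout');
    INSERT INTO "interval" VALUES (10, 1, 'warm up'), (11, 1, 'sprint'), (12, 1, 'rest');
    -- sprint and rest are repeated: two positions each, one interval row each
    INSERT INTO interval_seq (interval_id, seq_)
        VALUES (10, 1), (11, 2), (12, 3), (11, 4), (12, 5);
""")

rows = conn.execute("""
    SELECT s.seq_, i.message_en
    FROM workout w
    JOIN "interval" i ON i.workout_id = w.id
    JOIN interval_seq s ON s.interval_id = i.id
    WHERE w.id = ?
    ORDER BY s.seq_
""", (1,)).fetchall()

sequence = [message for _, message in rows]
interval_rows = conn.execute('SELECT COUNT(*) FROM "interval"').fetchone()[0]
print(sequence, interval_rows)
```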
Just to say that I chose the solution with 3 tables from "spencer7593" :
-interval
-workout
-interval_workout (link between interval and workout)
It was the most natural way for me, I'm missing one small constraint (1 interval can only be in one workout) but I will be managing the insert in the table so I don't have to worry about that.
Here is an example of a query I run to get all of the workout intervals for a specific user in my database. The result is returned as an XML file by my REST web service, and my application uses it to build the workouts for that user:
Now I need to see whether JSON is better than XML for my application (I currently use QXmlStreamReader from Qt), and whether there is a better way to write that query (it is currently hard-coded in a PHP page and hard to maintain). Thanks for your help!
/* GET ALL INTERVAL WITH DETAILS IN SUBSCRIBED WORKOUT FOR USER=X */
SELECT
/* plan */
p.id plan_id, p.name_en plan_name_en, p.name_fr plan_name_fr,
/* workout type */
wt.id workout_type_id, wt.type_en workout_type_en, wt.type_fr workout_type_fr,
/* workout */
w.id workout_id, w.name_en workout_name_en, w.name_fr workout_name_fr,
w.descrip_en workout_descrip_en, w.descrip_fr workout_descrip_fr, w.creator workout_creator,
/* seq_workout */
iw.seq_workout,
/* intervalle */
i.id intervalle_id, i.duration intervalle_duration, i.msg_en intervalle_msg_en, i.msg_fr intervalle_msg_fr,
/* intervalle : power */
i.power_start intervalle_power_start, i.power_end intervalle_power_end, i.power_range intervalle_power_range, i.power_left intervalle_power_left,
/* intervalle : cadence */
i.cadence_start intervalle_cadence_start, i.cadence_end intervalle_cadence_end, i.cadence_range intervalle_cadence_range,
/* intervalle : hr */
i.hr_start intervalle_hr_start, i.hr_end intervalle_hr_end, i.hr_range intervalle_hr_range,
/* intervalle type */
it.id intervalle_type_id, it.type_en intervalle_type_en, it.type_fr intervalle_type_fr,
/* intervalle : step type */
isPower.id intervalle_steptype_power_id, isPower.type_en intervalle_steptype_power_type_en, isPower.type_fr intervalle_steptype_power_type_fr,
isCadence.id intervalle_steptype_cadence_id, isCadence.type_en intervalle_steptype_cadence_type_en, isCadence.type_fr intervalle_steptype_cadence_type_fr,
isHr.id intervalle_steptype_hr_id, isHr.type_en intervalle_steptype_hr_type_en, isHr.type_fr intervalle_steptype_hr_type_fr
FROM user u
INNER JOIN user_groupe ug
ON u.id = ug.user_id
INNER JOIN groupe g
ON ug.groupe_id = g.id
INNER JOIN groupe_plan gp
ON g.id = gp.groupe_id
INNER JOIN plan p
ON gp.plan_id = p.id
INNER JOIN workout w
ON p.id = w.plan_id
INNER JOIN workout_type wt
ON wt.id = w.workout_type_id
INNER JOIN intervalle_workout iw
ON w.id = iw.workout_id
INNER JOIN intervalle i
on i.id = iw.intervalle_id
INNER JOIN intervalle_type it
on i.intervalle_type_id = it.id
INNER JOIN intervalle_steptype isPower
on isPower.id = i.power_steptype_id
INNER JOIN intervalle_steptype isCadence
on isCadence.id = i.cadence_steptype_id
INNER JOIN intervalle_steptype isHr
on isHr.id = i.hr_steptype_id
WHERE u.id = 1
ORDER BY iw.workout_id, iw.seq_workout;

mysql update with a self referencing query

I have a table of surveys which contains (amongst others) the following columns
survey_id - unique id
user_id - the id of the person the survey relates to
created - datetime
ip_address - of the submission
ip_count - the number of duplicates
Due to the large record set, it's impractical to run this query on the fly, so I am trying to create an update statement which will periodically store a "cached" result in ip_count.
The purpose of ip_count is to show how many duplicate ip_address survey submissions have been received for the same user_id within a 12 month period (+/- 6 months of the created date).
Using the following dataset, this is the expected result.
survey_id  user_id  created    ip_address   ip_count  # counted duplicate survey_ids
1          1        01-Jan-12  123.132.123  1         # 2
2          1        01-Apr-12  123.132.123  2         # 1, 4
3          2        01-Jul-12  123.132.123  0         #
4          1        01-Aug-12  123.132.123  2         # 2, 6
6          1        01-Dec-12  123.132.123  1         # 4
This is the closest solution I have come up with so far, but the query fails to take the date restriction into account and I am struggling to come up with an alternative method.
UPDATE surveys
JOIN(
SELECT ip_address, created, user_id, COUNT(*) AS total
FROM surveys
WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
GROUP BY ip_address, user_id
) AS ipCount
ON (
ipCount.ip_address = surveys.ip_address
AND ipCount.user_id = surveys.user_id
AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
)
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
Thank you for your help in advance :)
A few (very) minor tweaks to what is shown above. Thank you again!
UPDATE surveys AS s
INNER JOIN (
SELECT x, count(*) c
FROM (
SELECT s1.id AS x, s2.id AS y
FROM surveys AS s1, surveys AS s2
WHERE s1.state IN (1, 3) # completed and verified
AND s1.id != s2.id # dont self join
AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
AND s1.ip_address = s2.ip_address
AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
AND s1.user_id = s2.user_id # where completed for the same user
) AS ipCount
GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c
I don't have your table with me, so it's hard for me to write SQL that definitely works, but I can take a shot at this and hopefully help you.
First I would need to take the cartesian product of surveys against itself and filter out the rows I don't want
select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)
The output of this should contain every pair of surveys that match (according to your rules) TWICE (once for each id in the 1st position and once for it to be in the 2nd position)
Then we can do a GROUP BY on the output of this to get a table that basically gives me the correct ip_count for each survey_id
(select x, count(*) c from (select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)) group by x)
So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join that against this and copy the values over
So that should look something like
UPDATE surveys SET s.ip_count = n.c from surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x
There is some pseudo code in there, but I think the general idea should work
I have never had to update a table based on the output of another query myself before.. Tried to guess the right syntax for doing this from this question - How do I UPDATE from a SELECT in SQL Server?
Also, if I needed to do something like this for my own work, I wouldn't attempt to do it in a single query. It would be a pain to maintain and might have memory/performance issues. It would be best to have a script traverse the table row by row, updating a single row in a transaction before moving on to the next row. Much slower, but simpler to understand and possibly lighter on your database.
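For reference, here is a small runnable sketch of the counting rule itself (same user_id, same ip_address, created within 6 months either side, excluding the row itself), using SQLite from Python on the five sample rows. It uses a correlated subquery inside the UPDATE, which SQLite accepts. MySQL rejects a subquery that reads the table being updated (error 1093), which is why the MySQL versions join against a derived table instead. The ISO dates and the SQLite date() modifiers are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, user_id INTEGER,
                          created TEXT, ip_address TEXT,
                          ip_count INTEGER DEFAULT 0);
    INSERT INTO surveys (survey_id, user_id, created, ip_address) VALUES
        (1, 1, '2012-01-01', '123.132.123'),
        (2, 1, '2012-04-01', '123.132.123'),
        (3, 2, '2012-07-01', '123.132.123'),
        (4, 1, '2012-08-01', '123.132.123'),
        (6, 1, '2012-12-01', '123.132.123');
""")

# For every survey, count the other surveys of the same user and IP whose
# created date lies within 6 months before or after this one.
conn.execute("""
    UPDATE surveys
    SET ip_count = (
        SELECT COUNT(*)
        FROM surveys AS s2
        WHERE s2.survey_id != surveys.survey_id
          AND s2.user_id = surveys.user_id
          AND s2.ip_address = surveys.ip_address
          AND s2.created BETWEEN date(surveys.created, '-6 months')
                             AND date(surveys.created, '+6 months'))
    WHERE ip_address IS NOT NULL
""")

ip_counts = dict(conn.execute("SELECT survey_id, ip_count FROM surveys"))
print(ip_counts)
```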