Get maximum number of consecutive duplicate records in mysql - mysql

Suppose there are 25 groups of programmers, with 5-100 programmers in each group. Each group of programmers is tasked with writing the query that this question refers to. In response to this task, many programmers in each group begin drinking heavily. Each group has a stocked bar consisting of:
Whiskey
Vodka
Beer
Water
Every time a programmer finishes a drink, a new row is added to the table including:
Time the drink was finished
Group ID
Programmer ID
Type of drink consumed
The program manager wants to be emailed every six hours with a list of programmers who have consumed 5 or more Beers in a row, within the last 6 hours, without having a shot of vodka/whiskey, or a glass of water. The total number of beers that each programmer has consumed without switching to another drink at least once needs to be included in the report.
If at least one drink other than a beer is consumed before reaching 5 beers, then that programmer will not go on the list.
There are no upper or lower bounds on the number of drinks that a programmer can consume in a 6-hour period.
There are no requirements on the type or order of drinks that any programmer can consume.
The MySQL database has a table 'drinks' with:
drinks_id INT(11) PK NN AI
group_id INT(11) NN
programmer_id INT(11) NN
type_of_drink VARCHAR(25) NN
time_finished DATETIME NN
(type of drink should probably be in another table and the drink_type_id used, but I'm going for simplicity here)
The core of what I'm looking for is the maximum count of consecutive rows with type_of_drink = 'beer' for every group/programmer combination during a specified period of time. I've exhausted my SQL skills trying to count the number of consecutive records which exist between two records with type_of_drink <> 'beer' and return the maximum value for each group/programmer combination. I can't seem to get it right, and that may not be the way to look at this problem in the first place.
Thanks in advance. I'll be happy to provide any additional information or requirements if needed.

SELECT group_id, programmer_id, MAX(how_many_beer_in_a_row) AS max_beers_in_a_row
FROM (
SELECT
group_id,
programmer_id,
@beercounter := IF(type_of_drink != 'beer', 0,
    IF(@prev_group <=> group_id AND @prev_programmer <=> programmer_id, @beercounter + 1, 1)) AS how_many_beer_in_a_row,
@prev_group := group_id,
@prev_programmer := programmer_id
FROM
drinks d
, (SELECT @beercounter := 0, @prev_group := NULL, @prev_programmer := NULL) vars
WHERE time_finished >= NOW() - INTERVAL 6 HOUR
ORDER BY group_id, programmer_id, time_finished
) sq
WHERE how_many_beer_in_a_row >= 5
GROUP BY group_id, programmer_id
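On MySQL 8+ (or any engine with window functions) the same "longest run of beers" count can be done without session variables, using the classic gaps-and-islands trick: the difference between an overall ROW_NUMBER and a per-drink-type ROW_NUMBER is constant within each unbroken run of the same drink. A minimal runnable sketch, using SQLite as a stand-in for MySQL 8 (the window-function syntax is the same for this query; the sample data is illustrative and the 6-hour filter is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drinks (
  drinks_id INTEGER PRIMARY KEY,
  group_id INT, programmer_id INT,
  type_of_drink TEXT, time_finished TEXT);
INSERT INTO drinks (group_id, programmer_id, type_of_drink, time_finished) VALUES
 (1, 1, 'beer',  '2023-01-01 10:00'),
 (1, 1, 'beer',  '2023-01-01 10:30'),
 (1, 1, 'vodka', '2023-01-01 11:00'),
 (1, 1, 'beer',  '2023-01-01 11:10'),
 (1, 2, 'beer',  '2023-01-01 10:05'),
 (1, 2, 'beer',  '2023-01-01 10:20'),
 (1, 2, 'beer',  '2023-01-01 10:40'),
 (1, 2, 'beer',  '2023-01-01 10:55'),
 (1, 2, 'beer',  '2023-01-01 11:15');
""")

# Gaps-and-islands: the ROW_NUMBER difference is constant within each
# unbroken run of the same drink, so grouping on it isolates the runs.
rows = conn.execute("""
SELECT group_id, programmer_id, MAX(run_length) AS max_beers_in_a_row
FROM (
  SELECT group_id, programmer_id, COUNT(*) AS run_length
  FROM (
    SELECT group_id, programmer_id, type_of_drink,
           ROW_NUMBER() OVER (PARTITION BY group_id, programmer_id
                              ORDER BY time_finished)
         - ROW_NUMBER() OVER (PARTITION BY group_id, programmer_id, type_of_drink
                              ORDER BY time_finished) AS island
    FROM drinks
  ) t
  WHERE type_of_drink = 'beer'
  GROUP BY group_id, programmer_id, island
) runs
GROUP BY group_id, programmer_id
HAVING MAX(run_length) >= 5
ORDER BY group_id, programmer_id
""").fetchall()
print(rows)  # [(1, 2, 5)] -- only programmer 2 has 5 beers in a row
```

In production you would add `WHERE time_finished >= NOW() - INTERVAL 6 HOUR` (MySQL syntax) to the innermost query, as in the variable-based version above.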

Related

candle creation on a monthly basis using the scores in mysql

I am trying to create a candle using the table below. It has a score and a month, and there can be as many as 4 scores in a month.
id | score | month
1 | 10 | 12
.. | .. | ..
And here is what I actually did,
select
score as open,
max(score) as high,
min(score) as low
from score_table
group by month
I am successful in getting the open, high and low.
My problem is getting the close, basically the fourth score of a month. I tried some solutions using joins, but unfortunately I got them wrong and couldn't get it right, which left me thoroughly confused. I'm not good at SQL and need help...
I see that when you group by month, the records just give you a high and a low with the same values.
What I changed is to get the month along with the high and the low.
There should be separate columns for the high, low and open, in a list form, to break the highs and lows up per time period (if you are only working on one candle it's fine, but with many candles over a time period there should be a row for each time period).
That data is quite hard to work with the way the table is set out, but you can construct something like this to make it easier for yourself:
id | Month | Open | High | Low |
That would be more ideal to work with, but nonetheless I changed the MySQL query a bit to reflect the data as per your description. I achieved it by combining two MySQL queries, getting the open from row 3:
select x.open, y.high, y.low
from (select score as open
      from score
      where id = 3) as x,
     (select max(score) as high,
             min(score) as low
      from score) as y;
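On MySQL 8+ (or SQLite 3.25+), window functions give the open and close directly without hard-coding a row id: number the rows within each month both ascending and descending, then pick the first of each. A sketch under the assumption that `id` reflects the insertion order of scores within a month (table name and sample data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score_table (id INTEGER PRIMARY KEY, score INT, month INT);
INSERT INTO score_table (score, month) VALUES
 (10, 12), (14, 12), (8, 12), (12, 12),
 (20, 11), (25, 11), (18, 11), (22, 11);
""")

# rn_asc = 1 marks the first score of the month (open),
# rn_desc = 1 marks the last score of the month (close).
rows = conn.execute("""
SELECT month,
       MAX(CASE WHEN rn_asc  = 1 THEN score END) AS "open",
       MAX(score)                                AS high,
       MIN(score)                                AS low,
       MAX(CASE WHEN rn_desc = 1 THEN score END) AS "close"
FROM (
  SELECT score, month,
         ROW_NUMBER() OVER (PARTITION BY month ORDER BY id ASC)  AS rn_asc,
         ROW_NUMBER() OVER (PARTITION BY month ORDER BY id DESC) AS rn_desc
  FROM score_table
) t
GROUP BY month
ORDER BY month
""").fetchall()
print(rows)  # [(11, 20, 25, 18, 22), (12, 10, 14, 8, 12)]
```

Each result row is (month, open, high, low, close), one candle per month, which matches the id | Month | Open | High | Low layout suggested above.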

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So Here's the Question: What mysql tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: I omitted the surrogate primary keys for Shift and SurveyResult that I would probably include.
To query, you join shifts and surveys, group on leader while taking the average, then join that back to Leader for the name.
The query might be something like this (but I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AS AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- depends on how you are recording time what this really needs to be
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
I would do the following structure:
leaders
id
name
leaders_timetable (can be multiple per leader)
id,
leader_id
shift_datetime (I assume it stores the date and hour here; minutes and seconds are always 0)
survey_scores
id,
visit_datetime
score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id
DATE_FORMAT here cuts the minutes and seconds from visit_datetime so that it can be matched against shift_datetime (note the argument order: the datetime comes first, then the format string with % specifiers). This is a MySQL function, so if you use a different database you'll need a different function.
Say you have a leader who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all surveys from this leader, sum the survey scores, and divide them by 5 (the total number of surveys that this leader has), you will have the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.
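Putting the second answer's schema together, here is a minimal end-to-end sketch. It uses SQLite, so `strftime('%Y-%m-%d %H:00:00', ...)` stands in for MySQL's `DATE_FORMAT(visit_datetime, '%Y-%m-%d %H:00:00')`; the sample data is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE leaders_timetable (
  id INTEGER PRIMARY KEY, leader_id INT, shift_datetime TEXT);
CREATE TABLE survey_scores (
  id INTEGER PRIMARY KEY, visit_datetime TEXT, score INT);
INSERT INTO leaders (name) VALUES ('George');
INSERT INTO leaders_timetable (leader_id, shift_datetime) VALUES
 (1, '2023-05-01 11:00:00'),
 (1, '2023-05-01 13:00:00');
INSERT INTO survey_scores (visit_datetime, score) VALUES
 ('2023-05-01 11:15:00', 4),
 ('2023-05-01 11:40:00', 2),
 ('2023-05-01 13:05:00', 3);
""")

# Truncate each visit to the top of its hour so it matches the shift row,
# then average per leader: (4 + 2 + 3) / 3 = 3.0.
rows = conn.execute("""
SELECT l.id, l.name, AVG(s.score) AS avg_score
FROM leaders l
JOIN leaders_timetable lt ON lt.leader_id = l.id
JOIN survey_scores s
  ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
GROUP BY l.id, l.name
""").fetchall()
print(rows)  # [(1, 'George', 3.0)]
```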

Schedule a snapshot of a MySQL Table

Is it possible to schedule a snapshot of a MySQL table?
The situation:
I have a table that collects votes (up and down) from a website. It registers the number of votes (each of which can be either a vote up, “+1”, or a vote down, “-1”) and records the score. For example, if one ID received 5 votes, 3 voting "up" (+1) and 2 voting "down" (-1), then the table would record that the number of votes was "5" and that the vote score was "1". It is recorded in a table that has the following column headers/fields:
ID | Score | nvotes
…where ID is the reference of the item being voted on; 'score' is the actual score that is calculated (from the number of “+1” and “-1” votes), and 'nvotes' is the number of votes that have been received
This is great for looking at the website at any point in time and seeing what the score is, and how many people have voted.
However, I now want to be able to chart the trend for the ID – to look back over time and see how the score has gone up and down over time.
Is there a facility in MySQL to be able to take a snapshot at the end of each day, recording where that particular ID was at the end of that day in terms of their score and the number of votes received, and store this in another table so that I can create charts and analysis over time?
Or, failing that, can anyone think of a better, more intelligent way of achieving what I need?
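One common approach is exactly what the question suggests: a history table plus a scheduled `INSERT ... SELECT` stamped with the snapshot date. In MySQL the job can be scheduled with the Event Scheduler (`CREATE EVENT ... ON SCHEDULE EVERY 1 DAY DO INSERT ...`) or from cron. A minimal sketch of the copy step itself, demonstrated with SQLite and illustrative table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (id INTEGER PRIMARY KEY, score INT, nvotes INT);
CREATE TABLE votes_history (
  id INT, score INT, nvotes INT, snapshot_date TEXT,
  PRIMARY KEY (id, snapshot_date));
INSERT INTO votes (id, score, nvotes) VALUES (1, 2, 5), (2, -1, 3);
""")

# The nightly job: copy the current state of every ID,
# stamped with the snapshot date.
conn.execute("""
INSERT INTO votes_history (id, score, nvotes, snapshot_date)
SELECT id, score, nvotes, DATE('now') FROM votes
""")

rows = conn.execute(
    "SELECT id, score, nvotes FROM votes_history ORDER BY id").fetchall()
print(rows)  # [(1, 2, 5), (2, -1, 3)]
```

Charting the trend for an ID is then a simple `SELECT snapshot_date, score, nvotes FROM votes_history WHERE id = ? ORDER BY snapshot_date`.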

MySQL: Optimizing query for calculation of financial BETA

In finance, a stock's beta is the covariance between the stock's daily returns and an index's daily returns, divided by the variance of the index's daily returns. I am trying to calculate beta for a set of stocks and a set of indices.
Here's my query for a 50 business day rolling window and I'd like you to help me optimize it for speed:
INSERT INTO betas (permno, index_id, DATE, beta)
(SELECT
permno, index_id, s.date, IF(
s.`seq` >= 50,
(SELECT
(AVG(s2.log_return*i2.log_return)-AVG(s2.log_return)*AVG(i2.log_return))/VAR_POP(i2.log_return) AS beta
FROM
stock_series s2
INNER JOIN `index_series` i2 ON i2.date=s2.date
WHERE i2.index_id=i.index_id AND s2.permno = s.permno
AND s2.`seq` BETWEEN s.`seq` - 49 AND s.`seq`
GROUP BY index_id,permno), NULL)
AS beta
FROM
stock_series s
INNER JOIN `index_series` i ON i.index_id IN ('SP500') AND i.date=s.date
)
ON DUPLICATE KEY
UPDATE beta= VALUES (beta)
Both main tables are already ordered by entity and date in ascending order, and they already include log daily returns as well as a "seq" column. Seq sequentially enumerates all daily rows company- (or index-) wise, i.e. seq starts over at 1 for every new stock or index in the table and counts up to the total number of rows for that entity. I created it to allow for the rolling window.
As of now, with 500 firms and 1 index, the query takes like forever to complete.
Let me know any optimization that comes to your mind, like views, stored procs, temp tables, and if you find any inconsistencies, of course.
EDIT: Indexes:
stock_series has PRIMARY KEY (permno,date) and UNIQUE KEY (permno,seq),
index_series has PRIMARY KEY (index_id,date)
EXPLAIN EXTENDED results for ONE company (by including a WHERE s.permno=... restriction at the end):
EXPLAIN EXTENDED results for ALL ~500 companies:
Here is what the pros do: do NOT calculate that in the database. Pull the data, calculate, and re-insert. Where I work now they have a huge grid doing that stuff in the end-of-day run. Yes, grid, as in a significant number of machines. We are talking about producing gigabytes of CSV files that then get reloaded into the database: beta, gamma, and PnL on trades with 120,000 different elements. Databases are NOT optimized for this.
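The pull-compute-reinsert approach needs only the population covariance and variance, mirroring the `AVG(x*y) - AVG(x)*AVG(y)` and `VAR_POP` terms of the original query. A minimal, hypothetical Python version of the per-window calculation (the `beta` helper and sample returns are illustrative):

```python
def beta(stock_returns, index_returns):
    """Population covariance of the two return series divided by the
    population variance of the index returns."""
    n = len(index_returns)
    mean_s = sum(stock_returns) / n
    mean_i = sum(index_returns) / n
    cov = sum((s - mean_s) * (i - mean_i)
              for s, i in zip(stock_returns, index_returns)) / n
    var = sum((i - mean_i) ** 2 for i in index_returns) / n
    return cov / var

# A stock that moves exactly twice the index has beta 2.
idx = [0.01, -0.02, 0.015, 0.005, -0.01]
stk = [2 * r for r in idx]
print(beta(stk, idx))  # 2.0
```

Each 50-row (stock, index) window would be fed through `beta()` in the application, and the results bulk-inserted back into the `betas` table.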

MySQL query puzzle - finding what WOULD have been the most recent date

I've looked all over and haven't yet found an intelligent way to handle this, though I feel sure one is possible:
One table of historical data has quarterly information:
CREATE TABLE Quarterly (
unique_ID INT UNSIGNED NOT NULL,
date_posted DATE NOT NULL,
datasource TINYINT UNSIGNED NOT NULL,
data FLOAT NOT NULL,
PRIMARY KEY (unique_ID));
Another table of historical data (which is very large) contains daily information:
CREATE TABLE Daily (
unique_ID INT UNSIGNED NOT NULL,
date_posted DATE NOT NULL,
datasource TINYINT UNSIGNED NOT NULL,
data FLOAT NOT NULL,
qtr_ID INT UNSIGNED,
PRIMARY KEY (unique_ID));
The qtr_ID field is not part of the feed of daily data that populated the database - instead, I need to retroactively populate the qtr_ID field in the Daily table with the Quarterly.unique_ID row ID, using what would have been the most recent quarterly data on that Daily.date_posted for that data source.
For example, if the quarterly data is
101 2009-03-31 1 4.5
102 2009-06-30 1 4.4
103 2009-03-31 2 7.6
104 2009-06-30 2 7.7
105 2009-09-30 1 4.7
and the daily data is
1001 2009-07-14 1 3.5 ??
1002 2009-07-15 1 3.4 &&
1003 2009-07-14 2 2.3 ^^
then we would want the ?? qtr_ID field to be assigned '102' as the most recent quarter for that data source on that date, and && would also be '102', and ^^ would be '104'.
The challenges include that both tables (particularly the daily table) are actually very large, they can't be normalized to get rid of the repetitive dates or otherwise optimized, and for certain daily entries there is no preceding quarterly entry.
I have tried a variety of joins, using datediff (where the challenge is finding the minimum value of datediff greater than zero), and other attempts but nothing is working for me - usually my syntax is breaking somewhere. Any ideas welcome - I'll execute any basic ideas or concepts and report back.
Just subquery for the quarter id using something like:
(
SELECT unique_ID
FROM Quarterly
WHERE
datasource = ?
AND date_posted <= ?
ORDER BY
unique_ID DESC
LIMIT 1
)
Of course, this probably won't give you the best performance, and it assumes that rows are added to Quarterly in date order (otherwise ORDER BY date_posted DESC). However, it should solve your problem.
You would use this subquery on your INSERT or UPDATE statements as the value of your qtr_ID field for your Daily table.
The following appears to work exactly as intended but it surely is ugly (with three calls to the same DATEDIFF!!), perhaps by seeing a working query someone might be able to further reduce it or improve it:
UPDATE Daily SET qtr_ID = (select unique_ID from Quarterly
WHERE Quarterly.datasource = Daily.datasource AND
DATEDIFF(Daily.date_posted, Quarterly.date_posted) =
(SELECT MIN(DATEDIFF(Daily.date_posted, Quarterly.date_posted)) from Quarterly
WHERE Quarterly.datasource = Daily.datasource AND
DATEDIFF(Daily.date_posted, Quarterly.date_posted) > 0));
After more work on this query, I ended up with enormous performance improvements over the original concept. The most important improvement was to create indices in both the Daily and Quarterly tables - in Daily I created indices on (datasource, date_posted) and (date_posted, datasource) USING BTREE and on (datasource) USING HASH, and in Quarterly I did the same thing. This is overkill but it made sure I had an option that the query engine could use. That reduced the query time to less than 1% of what it had been. (!!)
Then, I learned that given my particular circumstances I could use MAX() instead of ORDER BY and LIMIT so I use a call to MAX() to get the appropriate unique_ID. That reduced the query time by about 20%.
Finally, I learned that with the InnoDB storage engine I could segment the chunk of the Daily table that I was updating with any one query, which allowed me to multi-thread the queries with a little elbow-grease and scripting. The parallel processing worked well and every thread reduced the query time linearly.
So, the basic query that is performing literally 1000 times better than my own first attempt is:
UPDATE Daily
SET qtr_ID =
(
SELECT MAX(unique_ID)
FROM Quarterly
WHERE Daily.datasource = Quarterly.datasource AND
Daily.date_posted > Quarterly.date_posted
)
WHERE unique_ID > ScriptVarLowerBound AND
unique_ID <= ScriptVarHigherBound
;
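To sanity-check the final query against the sample data from the question (the batch-bounding `unique_ID` predicate with the script variables is dropped here), a runnable sketch using SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Quarterly (unique_ID INT PRIMARY KEY, date_posted TEXT,
                        datasource INT, data REAL);
CREATE TABLE Daily (unique_ID INT PRIMARY KEY, date_posted TEXT,
                    datasource INT, data REAL, qtr_ID INT);
INSERT INTO Quarterly VALUES
 (101, '2009-03-31', 1, 4.5), (102, '2009-06-30', 1, 4.4),
 (103, '2009-03-31', 2, 7.6), (104, '2009-06-30', 2, 7.7),
 (105, '2009-09-30', 1, 4.7);
INSERT INTO Daily (unique_ID, date_posted, datasource, data) VALUES
 (1001, '2009-07-14', 1, 3.5), (1002, '2009-07-15', 1, 3.4),
 (1003, '2009-07-14', 2, 2.3);
""")

# MAX(unique_ID) picks the latest preceding quarter; this relies on the
# question's stated circumstance that quarterly IDs ascend with date.
conn.execute("""
UPDATE Daily SET qtr_ID = (
  SELECT MAX(unique_ID) FROM Quarterly
  WHERE Daily.datasource = Quarterly.datasource
    AND Daily.date_posted > Quarterly.date_posted)
""")

rows = conn.execute(
    "SELECT unique_ID, qtr_ID FROM Daily ORDER BY unique_ID").fetchall()
print(rows)  # [(1001, 102), (1002, 102), (1003, 104)]
```

The output matches the expected assignments from the question: ?? and && resolve to 102, and ^^ resolves to 104.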