MySQL subselects and counts based on counts

Here is a model of my table structure, which has three tables.
possibilities
----------------
|pid| category |
|--------------|
|1  | animal   |
|2  | vegetable|
|3  | mineral  |
----------------

realities
----------------------------
|rid | pid | item | status |
|--------------------------|
|1   | 1   | dog  | 1 (yes)|
|2   | 1   | fox  | 1      |
|3   | 1   | cat  | 1      |
|4   | 2   | apple| 2 (no) |
|5   | 1   | mouse| 1      |
|7   | 1   | bat  | 2      |
----------------------------

measurements
-------------------------------
|mid | rid | meas | date      |
|-----------------------------|
|1   | 1   | 3    | 2012-01-01|
|2   | 3   | 2    | 2012-01-05|
|3   | 1   | 13   | 2012-02-02|
|4   | 3   | 24   | 2012-02-15|
|5   | 2   | 5    | 2012-02-16|
|6   | 6   | 4    | 2012-02-17|
-------------------------------
What I'm after is a result that will show me a series of counts based on measurement ranges for a particular entry from the "possibilities" table where the status of the related "realities" is 1 (meaning it's currently being tracked), BUT the only measurement that is relevant is the most recent one.
Here is an example result I'm looking for using animal as the possibility.
-----------------------
| 0-9 | 10-19 | 20-29 |
|---------------------|
| 2 | 1 | 1 |
-----------------------
So, in this example, the apple row was not counted because it isn't an animal, nor was bat because its status was set to no (meaning don't measure), and only the most recent measurements were used to determine the count.
I currently have a workaround in my real-world use, but it doesn't follow good database normalization. In my realities table I have a current_meas column that gets updated when a new measurement is taken and entered in the measurements table. Then I only need to use the first two tables, and I have a single SELECT statement with a bunch of embedded SUM statements that each use an IF to test whether the value falls in a range (0-9, for example). It gives me exactly what I want; however, my app has evolved to the point where this convenience has become a problem in other areas.
So, the question is, is there a more elegant way to do this in one statement? Subselects? temporary tables? Getting the counts is the heart of what the app is about.
This is a PHP5, MySQL5, jQuery 1.8 based webapp, in case that gives me some more options. Thanks in advance. I love the stack and hope to help back as much as it has helped me.

Here's one approach.
Create a temporary table that holds the most recent measurement for each reality:
CREATE TEMPORARY TABLE RecentMeasurements
SELECT m.* FROM measurements m
INNER JOIN (SELECT rid, MAX(date) AS max_date FROM measurements GROUP BY rid) x
ON x.rid = m.rid AND x.max_date = m.date;
Then run your query against it:
SELECT r.*, rm.meas -- plus your counting logic
FROM realities r
INNER JOIN RecentMeasurements rm ON rm.rid = r.rid
WHERE r.status = 1 AND r.pid = 1;

Here is what I ended up doing based on the two answers suggested.
First I created a temporary table that generates a table of
realities that are based on one possibility (animals) and whose
status is 1 (yes).
Second I created a temporary table that generates a table of the
individual realities from the first temp table, and finds the most
recent measurement for each one.
From this second table I do a select that gives me the breakdown of
counts in ranges.
When I tried it with just one temp table the query would take 5-10 seconds per possibility. In my real-world use I currently have 30 possibilities (a script loops through each one and generates these temp tables and selects), well over 1,000 realities (600 active on any given day, 100 added per month), and over 21,000 measurements (20-30 added daily). That just wasn't working for me. So breaking it up into smaller pools to draw from reduced it to the whole report running in under 3-4 seconds.
Here is the MySQL stuff with my real-world table and column names.
//Delete the temporary tables in advance
$delete_np_prod = 'DROP TABLE IF EXISTS np_infreppool';
mysql_query($delete_np_prod) or die ("Drop NP Prod Error " . mysql_error ());
$delete_np_max = 'DROP TABLE IF EXISTS np_maxbrixes';
mysql_query($delete_np_max) or die ("Drop NP Max Error " . mysql_error ());
//Make a temporary table to hold the totes of this product at North Plains that are active
$create_np_prod_pool_statement = 'CREATE TEMPORARY TABLE np_infreppool
SELECT inf_row_id FROM infusion WHERE formid = ' . $active_formids["formid"] . ' AND location = 1 AND status = 1';
mysql_query($create_np_prod_pool_statement) or die ("Prod Error " . mysql_error ());
//Make a temporary table to hold the tote with its most recent brix value attached to it.
$create_np_maxbrix_pool_statement = 'CREATE TEMPORARY TABLE np_maxbrixes
SELECT b.inf_row_id AS inf_row_id, b.brix AS brix from brix b, np_infreppool pool WHERE b.inf_row_id = pool.inf_row_id AND b.capture_date = (SELECT max(capture_date) FROM brix WHERE inf_row_id = pool.inf_row_id )';
mysql_query($create_np_maxbrix_pool_statement) or die ("Brix Error " . mysql_error ());
//Get the counts for selected form from NP
$get_report_np = "SELECT
SUM(IF(brix BETWEEN 0 AND 4,1,0)) as '0-4',
SUM(IF(brix BETWEEN 5 AND 9,1,0)) as '5-9',
SUM(IF(brix BETWEEN 10 AND 14,1,0)) as '10-14',
SUM(IF(brix BETWEEN 15 AND 19,1,0)) as '15-19',
SUM(IF(brix BETWEEN 20 AND 24,1,0)) as '20-24',
SUM(IF(brix BETWEEN 25 AND 29,1,0)) as '25-29',
SUM(IF(brix BETWEEN 30 AND 34,1,0)) as '30-34',
SUM(IF(brix BETWEEN 35 AND 39,1,0)) as '35-39',
SUM(IF(brix BETWEEN 40 AND 44,1,0)) as '40-44',
SUM(IF(brix BETWEEN 45 AND 49,1,0)) as '45-49',
SUM(IF(brix BETWEEN 50 AND 54,1,0)) as '50-54',
SUM(IF(brix BETWEEN 55 AND 59,1,0)) as '55-59',
SUM(IF(brix BETWEEN 60 AND 64,1,0)) as '60-64',
SUM(IF(brix BETWEEN 65 AND 69,1,0)) as '65-69',
SUM(IF(brix >=70, 1, 0)) as 'Over 70'
FROM np_maxbrixes";
$do_get_report_np = mysql_query($get_report_np);
$got_report_np = mysql_fetch_array($do_get_report_np);
UPDATE
I got it to work in a single SELECT statement without using temporary tables and it works faster. Using my sample schema above, here is how it looks.
SELECT
SUM(IF(m.meas BETWEEN 0 AND 4,1,0)) as '0-4',
SUM(IF(m.meas BETWEEN 5 AND 9,1,0)) as '5-9',
SUM(IF(m.meas BETWEEN 10 AND 14,1,0)) as '10-14',
SUM(IF(m.meas BETWEEN 15 AND 19,1,0)) as '15-19',
SUM(IF(m.meas BETWEEN 20 AND 24,1,0)) as '20-24',
SUM(IF(m.meas BETWEEN 25 AND 29,1,0)) as '25-29',
SUM(IF(m.meas BETWEEN 30 AND 34,1,0)) as '30-34',
SUM(IF(m.meas BETWEEN 35 AND 39,1,0)) as '35-39',
SUM(IF(m.meas BETWEEN 40 AND 44,1,0)) as '40-44',
SUM(IF(m.meas BETWEEN 45 AND 49,1,0)) as '45-49',
SUM(IF(m.meas BETWEEN 50 AND 54,1,0)) as '50-54',
SUM(IF(m.meas BETWEEN 55 AND 59,1,0)) as '55-59',
SUM(IF(m.meas BETWEEN 60 AND 64,1,0)) as '60-64',
SUM(IF(m.meas BETWEEN 65 AND 69,1,0)) as '65-69',
SUM(IF(m.meas >=70, 1, 0)) as 'Over 70'
FROM measurements m, realities r
WHERE r.status = 1 AND r.pid = " . $_GET['pid'] . " AND r.rid = m.rid AND m.date = (SELECT max(date) FROM measurements WHERE rid = r.rid)
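
For anyone who wants to try the single-SELECT idea against the sample rows from the question, here is a minimal sketch. It makes several assumptions: Python's built-in sqlite3 module stands in for MySQL, CASE replaces MySQL's IF(), only the three buckets from the question's example are shown, and the pid is passed as a bound parameter instead of being concatenated from $_GET. With the sample rows exactly as posted, mouse (rid 5) has no measurement and measurement 6 points at a rid that isn't in realities, so each bucket comes out as 1.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE possibilities (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE realities (rid INTEGER PRIMARY KEY, pid INTEGER, item TEXT, status INTEGER);
CREATE TABLE measurements (mid INTEGER PRIMARY KEY, rid INTEGER, meas INTEGER, date TEXT);

INSERT INTO possibilities VALUES (1,'animal'),(2,'vegetable'),(3,'mineral');
INSERT INTO realities VALUES (1,1,'dog',1),(2,1,'fox',1),(3,1,'cat',1),
                             (4,2,'apple',2),(5,1,'mouse',1),(7,1,'bat',2);
INSERT INTO measurements VALUES (1,1,3,'2012-01-01'),(2,3,2,'2012-01-05'),
                                (3,1,13,'2012-02-02'),(4,3,24,'2012-02-15'),
                                (5,2,5,'2012-02-16'),(6,6,4,'2012-02-17');
""")

# Count active realities of one possibility, bucketed by their latest measurement.
query = """
SELECT
  SUM(CASE WHEN m.meas BETWEEN 0  AND 9  THEN 1 ELSE 0 END) AS "0-9",
  SUM(CASE WHEN m.meas BETWEEN 10 AND 19 THEN 1 ELSE 0 END) AS "10-19",
  SUM(CASE WHEN m.meas BETWEEN 20 AND 29 THEN 1 ELSE 0 END) AS "20-29"
FROM realities r
JOIN measurements m ON m.rid = r.rid
WHERE r.status = 1
  AND r.pid = ?
  AND m.date = (SELECT MAX(date) FROM measurements WHERE rid = r.rid)
"""

counts = conn.execute(query, (1,)).fetchone()  # pid 1 = animal
print(counts)
```

The bound parameter (the `?`) is a design choice worth carrying over to the PHP side as well, since building the statement from raw request input leaves it open to SQL injection.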

Related

How to optimize an update query for multiple rows using MySQL and PHP

I have a table that has around 80.000 records. It has 4 columns:
| id | code | size | qty |
+--------+--------+-------+------+
| 1 | 4735 | M | 5 |
| 2 | 8452 | L | 2 |
...
| 81456 | 9145 | XS | 13 |
The code column is unique.
I have to update the qty twice a day.
For that I'm using this query:
UPDATE stock SET qty = CASE id
WHEN 1 THEN 10
WHEN 2 THEN 8
...
WHEN 2500 THEN 20
END
WHERE id IN (1,2,...,2500);
I am splitting the query to update 2500 stocks at a time using PHP.
Here is (in seconds) how much it takes for each 2500 stocks to update:
[0]7.11
[1]11.30
[2]19.86
[3]27.01
[4]36.25
[5]44.21
[6]51.44
[7]61.03
[8]71.53
[9]81.14
[10]89.12
[11]99.99
[12]111.46
[13]121.86
[14]131.19
[15]136.94
[END]137
As you can see, it takes between 5 and 9 seconds to update 2500 products, which I think is quite a lot.
What can I change to speed things up?
Thank you!
Because the times seem to be getting longer the further along you get, I'd expect you need an index on the id field, as it looks suspiciously like it's doing a full table scan. You can create the index something like this
CREATE INDEX my_first_index ON stock (id);
(I am having to add this as an answer because I can't make comments, I know it is more of a comment!!)
** EDIT **
I re-read and see your issue is bigger. I still think there is a chance that putting an index on id would fix it, but a better solution would be to have a new table for the id to quantity mappings; let's call it qty_mapping
| id | qty |
+--------+------+
| 1 | 10 |
| 2 | 8 |
...
| 2500 | 20 |
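Make sure id is indexed in both tables, and then you can change your update to
update stock set qty = (select qm.qty from qty_mapping qm where qm.id = stock.id) where stock.id in (select id from qty_mapping)
The where clause keeps rows that have no entry in qty_mapping from being set to NULL. With the indexes in place, this should update the whole 80,000 records far faster than the CASE batches.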
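
Here is a rough sketch of the mapping-table idea, under a few assumptions: Python's sqlite3 module stands in for MySQL, the three stock rows are the ones shown in the question, and new_quantities is a hypothetical list of (id, qty) pairs that your PHP import would supply. The update is restricted to ids present in qty_mapping so that untouched rows keep their quantity.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (id INTEGER PRIMARY KEY, code INTEGER UNIQUE, size TEXT, qty INTEGER);
CREATE TABLE qty_mapping (id INTEGER PRIMARY KEY, qty INTEGER);
INSERT INTO stock VALUES (1,4735,'M',5),(2,8452,'L',2),(81456,9145,'XS',13);
""")

# Hypothetical batch of new quantities; loaded in one bulk insert.
new_quantities = [(1, 10), (2, 8)]
conn.executemany("INSERT INTO qty_mapping (id, qty) VALUES (?, ?)", new_quantities)

# One set-based UPDATE instead of a long CASE expression per batch.
conn.execute("""
UPDATE stock
SET qty = (SELECT qm.qty FROM qty_mapping qm WHERE qm.id = stock.id)
WHERE id IN (SELECT id FROM qty_mapping)
""")

rows = conn.execute("SELECT id, qty FROM stock ORDER BY id").fetchall()
print(rows)
```

Row 81456 is not in the mapping, so it keeps its original qty of 13.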

MySQL - Get values from previous rows

I am trying to reconstruct data that has a tree structure.
Example - Country / City:
1) USA
1.1) New York
1.2) Chicago
2) France
2.1) Paris
2.2) Lyon
3) China
In my database it looks like this:
| Element | Level | Row |
|:--------:|:-----:|:---:|
| USA | 1 | 1 |
| New York | 2 | 2 |
| Chicago | 2 | 3 |
| France | 1 | 4 |
| Paris | 2 | 5 |
| Lyon | 2 | 6 |
| China | 1 | 7 |
Based on the sequence (row) of my entries I can reconstruct the tree structure. For each row I look for the nearest previous row that has Level-1.
max(pre.Row) / pre.Row < cur.Row / pre.Level = cur.Level-1
Following code is working and it returns the right results. My problem is that the table is 7 million rows large and therefore it takes a lot of time. It is like comparing 7 million times 7 million rows...
SELECT cur.`Row`, (
SELECT max(pre.`Row`)
FROM `abc`.`def` AS pre
WHERE pre.`Row` < cur.`Row`
AND pre.`Level`=cur.`Level`-1
) AS prev_row
FROM `abc`.`def` AS cur
;
Is there a faster way to implement this?
Maybe with loops or user variables? I could imagine that you actually start from the current row and then test whether the previous row meets the conditions, otherwise look at the row before that, and so on. This would reduce the operations to 7 million times ~5. I have never worked with loops, so I have no clue if this is possible in SQL. Any ideas?
Here's my try with 3 levels; you can add more levels if you need them. I'm not sure why ELT() returns odd values that look encoded, but CAST(... AS UNSIGNED) gets you prev_row just as your query does.
SELECT Row,
CAST(ELT(level-1,@level_1,@level_2,@level_3) as UNSIGNED) as prev_row,
@level_1 := IF(`level` = 1, row, @level_1),
@level_2 := IF(`level` = 2, row, @level_2),
@level_3 := IF(`level` = 3, row, @level_3)
FROM `def`
ORDER BY Row ASC
http://sqlfiddle.com/#!9/719b2/22
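
The idea behind the user variables is a single ordered pass that remembers the last row seen at each level. A minimal sketch of that same logic outside SQL, in Python, using the sample rows from the question (the function name parent_rows is made up for illustration):

```python
def parent_rows(rows):
    """For each (row, level) pair, find the nearest previous row at level - 1.

    One pass over the rows in Row order, remembering the most recent
    row number seen at every level (the role the user variables play).
    """
    last_row_at_level = {}
    result = []
    for row, level in sorted(rows):
        # Look up the parent first, then record the current row for its own level.
        result.append((row, last_row_at_level.get(level - 1)))
        last_row_at_level[level] = row
    return result


# (Row, Level) pairs from the example table in the question.
sample = [(1, 1), (2, 2), (3, 2), (4, 1), (5, 2), (6, 2), (7, 1)]
parents = parent_rows(sample)
print(parents)
```

Because each row is visited once, the work grows linearly with the table size rather than with its square, and the dictionary handles any number of levels without adding a variable per level.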

Best way to gain performance and do fast sql queries?

I use MySQL for my database and I do some processing on the database side to make things easier for my application.
My queries used to be very fast, but now that the database holds a lot of data they have become very slow.
My application mainly computes statistics and has to fetch data from many related tables.
Here is an example:
tbl_game
+-------------------------------------+
| id | winner | duration| endedAt |
|--------+--------+---------+---------|
| 1 | 1 | 1200 |timestamp|
| 2 | 0 | 1200 |timestamp|
| 3 | 1 | 1200 |timestamp|
| 4 | 1 | 1200 |timestamp|
+-------------------------------------+
winner is either 0 or 1 for the team who won the game
duration is the number of seconds a game took
tbl_game_player
+-------------------------------------------------+
| gameId | playerId | playerSlot | frags | deaths |
|--------+----------+------------+-------+--------|
| 1 | 100 | 1 | 24 | 50 |
| 1 | 150 | 2 | 32 | 52 |
| 1 | 101 | 3 | 26 | 62 |
| 1 | 109 | 4 | 48 | 13 |
| 1 | 123 | 5 | 24 | 52 |
| 1 | 135 | 6 | 30 | 30 |
| 1 | 166 | 7 | 28 | 48 |
| 1 | 178 | 8 | 52 | 96 |
| 1 | 190 | 9 | 12 | 75 |
| 1 | 106 | 10 | 68 | 25 |
+-------------------------------------------------+
The details are only for the first game with id 1
1 game has 10 player slots where slot 1-5 = team 0 and 6-10 = team 1
There are more details in my real table this is just to give an overview.
So I need to calculate the statistics of each player in all the games. I created a view to accomplish this, and it works fine when I have little data.
Here is an example:
+--------------------------------------------------------------------------+
| gameId | playerId | frags | deaths | actions | team | percent | isWinner |
|--------+----------+-------+--------+---------+------+---------+----------|
actions = frags + deaths
percent = (actions / sum(actions of players in the same team)) * 100
team is calculated using playerSlot in 1,2,3,4,5 or 6,7,8,9,10
isWinner is calculated by the team and winner
This is just one algorithm, and I have many others to perform. My database has over 1 million records and the queries are very slow.
Here is the query for the above:
SELECT
tgp.gameId,
tgp.playerId,
tgp.frags,
tgp.deaths,
tgp.frags + tgp.deaths AS actions,
IF(playerSlot in (1,2,3,4,5), 0, 1) AS team,
((SELECT actions) / tgpx.totalActions) * 100 AS percent,
IF((SELECT team) = tg.winner, 1, 0) AS isWinner
FROM tbl_game_player tgp
INNER JOIN tbl_game tg on tgp.gameId = tg.id
INNER JOIN (
SELECT
gameId,
SUM(frags) AS totalFrags,
SUM(deaths) AS totalDeaths,
SUM(frags) + SUM(deaths) as totalActions,
IF(playerSlot in (1,2,3,4,5), 0, 1) as team
FROM tbl_game_player
GROUP BY gameId, team
) tgpx on tgp.gameId = tgpx.gameId and team = tgpx.team
It's quite obvious that indexes don't help you here¹, because you want all data from the two tables. You even want the data from tbl_game_player twice, once aggregated, once not aggregated. So there are millions of records to read and join. Your query is fine, and I see no way to improve it really.
¹ Of course you should always have indexes on primary and foreign keys, so the DBMS can make use of them in joins. (E.g. there should be an index on tbl_game_player(gameId)).
So your options lie outside the query:
Hardware (obviously).
Add a computed column for the team to tbl_game_player, so at least you save its evaluation when querying.
Partitions. One partition per team, so the aggregates can be calculated separately.
Pre-computed data: Add a table tbl_game_team holding the sums; fill it with triggers. Thus you don't have to compute the aggregates in your query.
Data warehouse table: Make a table holding the complete result. Fill it with triggers or at intervals.
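
To make the pre-computed data option a bit more concrete, here is a small sketch. The assumptions: Python's sqlite3 module stands in for MySQL, tbl_game_team is built once with CREATE TABLE ... AS SELECT rather than maintained by triggers, and its column layout (gameId, team, totalActions) is only a guess at what such a table might hold. The data is game 1 from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_game (id INTEGER PRIMARY KEY, winner INTEGER, duration INTEGER);
CREATE TABLE tbl_game_player (gameId INTEGER, playerId INTEGER, playerSlot INTEGER,
                              frags INTEGER, deaths INTEGER);
INSERT INTO tbl_game VALUES (1, 1, 1200);
INSERT INTO tbl_game_player VALUES
  (1,100,1,24,50),(1,150,2,32,52),(1,101,3,26,62),(1,109,4,48,13),(1,123,5,24,52),
  (1,135,6,30,30),(1,166,7,28,48),(1,178,8,52,96),(1,190,9,12,75),(1,106,10,68,25);

-- Hypothetical summary table: one row per game and team.
CREATE TABLE tbl_game_team AS
SELECT gameId,
       CASE WHEN playerSlot <= 5 THEN 0 ELSE 1 END AS team,
       SUM(frags + deaths) AS totalActions
FROM tbl_game_player
GROUP BY gameId, team;
""")

totals = dict(conn.execute("SELECT team, totalActions FROM tbl_game_team").fetchall())

# The per-player query now joins the small summary table instead of re-aggregating.
stats = conn.execute("""
SELECT p.playerId,
       p.frags + p.deaths AS actions,
       t.team,
       100.0 * (p.frags + p.deaths) / t.totalActions AS percent,
       CASE WHEN t.team = g.winner THEN 1 ELSE 0 END AS isWinner
FROM tbl_game_player p
JOIN tbl_game g ON g.id = p.gameId
JOIN tbl_game_team t
  ON t.gameId = p.gameId
 AND t.team = CASE WHEN p.playerSlot <= 5 THEN 0 ELSE 1 END
ORDER BY p.playerSlot
""").fetchall()

print(totals)
print(stats[0])
```

In a real setup the summary rows would have to be kept current (by triggers or a scheduled refresh, as listed above); the sketch only shows how the read side gets simpler.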
Setting up indexes would speed up your queries. Queries can still take a while to run if they return a lot of results, but this is definitely a start.
For large databases, a MySQL index can help a lot with speed problems, because it lets the server find data more quickly and efficiently. You can learn more about creating indexes here: http://www.w3schools.com/sql/sql_create_index.asp

How to generalize a sequential COUNT() of chronological data without loops or cursors?

I have read all the arguments: Tell SQL what you want, not how to get it. Use set-based approaches instead of procedural logic. Avoid cursors and loops at all costs.
Unfortunately, I have been racking my brain for weeks and I can't figure out how to come up with a set-based approach to generating an iterative COUNT for sequential subsets of chronologically ordered data.
Here is the specific application of the problem I am working on.
I do football-related research using a database that comprises many years of play-by-play data, which is of course arranged chronologically by year, game, and play. The database is loaded onto a web server running MySQL 5.0.
The fields I need for this particular problem are contained in the core table. Here is some sample data from the relevant part of the table:
GID | PID | OFF | DEF | QTR | MIN | SEC | PTSO | PTSD
--------------------------------------------------------
121 | 2455 | ARI | CHI | 2 | 4 | 30 | 17 | 10
121 | 2456 | ARI | CHI | 2 | 4 | 15 | 17 | 10
121 | 2457 | ARI | CHI | 2 | 3 | 53 | 17 | 10
121 | 2458 | ARI | CHI | 2 | 3 | 31 | 20 | 10
The columns represent, respectively: unique game identifier, unique play identifier, which team is on offense for that play, which team is on defense for that play, the quarter and time the play occurred, and the offense's and defense's scores going into the play. In other words, in (hypothetical) game 121, the Arizona Cardinals scored a field goal on play 2457 (i.e., going into play 2458).
What I want to do is go through several years' worth of data game by game, second by second, and count the number of times any possible score differential occurred at any given elapsed time. The following query arranges the data by seconds elapsed and score differential:
SELECT core.GID, core.PID, core.QTR, core.MIN, core.SEC, core.PTSO, core.PTSD,
((core.QTR - 1) * 900 + (900-(core.MIN * 60 + core.SEC))) AS secEl,
core.PTSO - core.PTSD AS oDif, (core.PTSO - core.PTSD) * -1 AS dDif
FROM core
ORDER BY secEl ASC, oDif ASC;
The result looks something like this:
GID | PID | OFF | DEF | QTR | MIN | SEC | PTSO | PTSD | secEl | oDif | dDif
---------------------------------------------------------------------------------
616 | 100022 | CHI | MIN | 1 | 15 | 00 | 0 | 0 | 0 | 0 | 0
617 | 100169 | HOU | DAL | 1 | 15 | 00 | 0 | 0 | 0 | 0 | 0
618 | 100224 | PHI | SEA | 1 | 15 | 00 | 0 | 0 | 0 | 0 | 0
619 | 100303 | JAX | NYJ | 1 | 15 | 00 | 0 | 0 | 0 | 0 | 0
Although that looks pretty, my goal is not to sort the data chronologically. Rather, I want to step sequentially through every one of the 4,500 possible seconds (four 15-minute quarters plus one 15-minute overtime period) in an NFL game and count the number of times every score differential has ever occurred in each one of those seconds.
In other words, I don't want to count just the number of times a team has been up by, say, 21 points at 1,800 seconds elapsed (i.e., the start of the second quarter) between 2002 and 2013. I want to count the number of times a team has been up by 21 points at any point in a game. On top of that, I want to do this for every score differential that has ever occurred (i.e., -50, -49, -48, ..., 0, 1, 2, ... 48, 49, 50, ...) for every second of every game.
This would be relatively easy to accomplish with a series of nested loops, but it wouldn't be the most reusable of code.
What I want to do is construct set logic that will COUNT the instances of each score differential that has occurred at every second of time elapsed without using loops or cursors. The results would be tabulated as follows:
secondsElapsed | scoreDif | Occurrences
-----------------------------------------
10 | -1 | 12
10 | 0 | 125517
10 | 1 | 0
10 | 2 | 3
Here is a sample query for getting the total number of instances of a specific score differential (+21) at a specific time point (3,000 seconds elapsed):
SELECT ((core.QTR - 1) * 900 + (900-(core.MIN * 60 + core.SEC))) AS timeElapsed,
(core.PTSO - core.PTSD) AS diff, COUNT(core.PTSO - core.PTSD) AS occurrences
FROM core
WHERE ((core.QTR - 1) * 900 + (900-(core.MIN * 60 + core.SEC))) = 3000
AND ABS(core.PTSO - core.PTSD) = 21
That query returns the following results:
timeElapsed | diff | occurrences
----------------------------------
3000 | 21 | 5
Now I want to generalize this query to count the instances of every differential at every second elapsed.
Your description is rather confusing but if you want to "COUNT all of the possible score differentials for every possible second without using loops or cursors" then I would do something like:
1) Build a work table (either a temporary table# or a Table data type#) and fill it with the time increments you want e.g.
QTR | MIN | SEC |
1 | 00 | 01
1 | 00 | 02
..
1 | 01 | 59
1 | 02 | 00
1 | 02 | 01
1 | 02 | 02
..
4 | 15 | 59
2) You then use this as the basis of your query. Cross join a list of the games you are interested in with the work table to give you a table of every game and every time increment in that game.
3) Left join your query above onto the result of (2).
With this result set you can then look at a whole game and sum or count as necessary without having to loop.
Not sure if this will cure your problem, but you could try using row_number over a partition...
SELECT ROW_NUMBER() OVER (PARTITION BY <column> ORDER BY <column>) AS aColumn, aColumn FROM aTable
I did it using a sub-query and two variables: one to define the time point and another to define the point difference.
The query then returns the diff, then the number of times the offensive side had it, followed by the defensive side and the total times.
SET @Diff = 7;
SET @Seconds = 1530;
SELECT ABS(core.PTSO - core.PTSD) AS diff, SUM(CASE WHEN core.PTSO - core.PTSD <= 0 THEN 1 ELSE 0 END) OffensiveTimes, SUM(CASE WHEN core.PTSO - core.PTSD >= 0 THEN 1 ELSE 0 END) DefensiveTimes, SUM(1) TotalTimes
FROM (SELECT core.GID, core.PID, core.QTR, core.MIN, core.SEC, core.PTSO, core.PTSD,
((core.QTR - 1) * 900 + (900-(core.MIN * 60 + core.SEC))) AS secEl,
core.PTSO - core.PTSD AS oDif, (core.PTSO - core.PTSD) * -1 AS dDif
FROM core
) core
WHERE secEl = @Seconds AND ABS(core.PTSO - core.PTSD) = @Diff
GROUP BY ABS(core.PTSO - core.PTSD);
This returns this for the small dataset you gave
7 diff, 0 OffensiveTimes, 1 DefensiveTimes, 1 Times
Hope that was what you were looking for :)
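
One way to generalize the single-point query from the question is to drop the WHERE filter and group by the computed elapsed time and differential instead. This is only a sketch under some assumptions: Python's sqlite3 module stands in for MySQL, the data is the four sample plays from the question, and a play is counted only at the second it starts. Combinations that never occur produce no row at all, so getting explicit zeros would still need something like the work table of time increments described above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core ("GID" INTEGER, "PID" INTEGER, "OFF" TEXT, "DEF" TEXT,
                   "QTR" INTEGER, "MIN" INTEGER, "SEC" INTEGER,
                   "PTSO" INTEGER, "PTSD" INTEGER);
INSERT INTO core VALUES
  (121, 2455, 'ARI', 'CHI', 2, 4, 30, 17, 10),
  (121, 2456, 'ARI', 'CHI', 2, 4, 15, 17, 10),
  (121, 2457, 'ARI', 'CHI', 2, 3, 53, 17, 10),
  (121, 2458, 'ARI', 'CHI', 2, 3, 31, 20, 10);
""")

# Group by elapsed seconds and score differential instead of filtering on one pair.
occurrences = conn.execute("""
SELECT (core."QTR" - 1) * 900 + (900 - (core."MIN" * 60 + core."SEC")) AS secEl,
       core."PTSO" - core."PTSD" AS scoreDif,
       COUNT(*) AS occurrences
FROM core
GROUP BY secEl, scoreDif
ORDER BY secEl, scoreDif
""").fetchall()

for row in occurrences:
    print(row)
```

With only four plays every combination occurs once; on the full table the same statement would return one row per (secEl, scoreDif) pair that actually appears.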

MYSQL JOIN two tables

We want to build a query for a MySQL database that has 12 tables, one per month (JAN - DEC), each with 32 columns (JAN1, JAN2, JAN3, ... JAN31). The database is used to check hotel availability. If we select a tour for three days (29JAN-1FEB), the query has to check the records in two tables, one for JAN and the other for FEB. Each day column stores a number (5, 10, 2, 0, 5, etc.) showing the rooms available. We have successfully built a query for a single month, but we are unable to create a MySQL query for two months, because we only want to show rooms that are available, i.e. where the value is greater than 0.
$num = mysql_query("SELECT DISTINCT id,table_type,JAN29,room_type FROM JAN Where table_type='disp' AND JAN29!=0 ");
The above query works fine for me; we just want the same query across two tables, returning only positive values (greater than 0).
Please help us solve this problem.
Thanks
Rod
ID | JAN1 | JAN2 | JAN3 | JAN31|
34 | 5 | 3 | 3 | 4 |
56 | 4 | 3 | 9 | 3 |
28 | 0 | 7 | 0 | 9 |