I'm new to MySQL and I'm having trouble with the problem below.
Let's call this table runners_table:
+---------+-----------+
| race_id | runner_id |
+---------+-----------+
| 10 | A |
| 10 | E |
| 10 | V |
| 23 | G |
| 23 | J |
| 23 | A |
| 67 | E |
| 67 | G |
| 67 | X |
+---------+-----------+
And I want to add a new column like this:
+---------+-----------+--------------+
| race_id | runner_id | prev_race_id |
+---------+-----------+--------------+
| 10 | A | - |
| 10 | E | - |
| 10 | V | - |
| 23 | G | - |
| 23 | J | - |
| 23 | A | 10 |
| 67 | E | 10 |
| 67 | G | 23 |
| 67 | X | - |
+---------+-----------+--------------+
Where prev_race_id looks back and gets the previous race_id for the same runner_id.
To illustrate;
What I'd like to do is, say a runner took part in 6 races out of 10 races that year, in the row relating to his 5th race I want to know the race_id of his 4th race.
I guess I could make a new table for every runner that was a record of the races they took part in, but this would stretch to hundreds of tables... there must be a better way.
Is it possible to do this?
I think the easiest way is using a correlated subquery. Assuming race_id is in numerical order:
select r.*,
(select r2.race_id
from runners_table r2
where r2.race_id < r.race_id and r2.runner_id = r.runner_id
order by r2.race_id desc
limit 1
) as prev_race_id
from runners_table r;
If some other column determines the previous record, then the where and order by would change.
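To check the correlated subquery against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module (chosen only so the example is self-contained; the SQL itself is unchanged and should behave the same in MySQL). Rows with no earlier race come back as NULL (None in Python) rather than "-".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runners_table (race_id INTEGER, runner_id TEXT)")
conn.executemany(
    "INSERT INTO runners_table VALUES (?, ?)",
    [(10, "A"), (10, "E"), (10, "V"),
     (23, "G"), (23, "J"), (23, "A"),
     (67, "E"), (67, "G"), (67, "X")],
)

# Same correlated subquery as above: for each row, find the highest
# race_id below the current one for the same runner.
prev_rows = conn.execute(
    """
    SELECT r.race_id, r.runner_id,
           (SELECT r2.race_id
              FROM runners_table r2
             WHERE r2.race_id < r.race_id
               AND r2.runner_id = r.runner_id
             ORDER BY r2.race_id DESC
             LIMIT 1) AS prev_race_id
      FROM runners_table r
     ORDER BY r.race_id, r.runner_id
    """
).fetchall()

for row in prev_rows:
    print(row)
```

On a large table, an index on (runner_id, race_id) would probably help the subquery, though that is worth measuring rather than assuming.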
You can create three tables: one for races, one for runners, and one for the relationship between them. If you store the date of every race in the relationship table, you can get the previous one easily.
CREATE TABLE Race(
race_id INT,
data VARCHAR(100)
);
CREATE TABLE Runner(
runner_id INT,
data VARCHAR(100)
);
CREATE TABLE Race_Runner(
race_id INT,
runner_id INT,
fecha DATETIME
);
If you want to know the previous race for a specific runner, try this query:
select race_id
from Race_Runner
where runner_id = ? -- the runner you want
and fecha < (select max(fecha) from Race_Runner where runner_id = ?) -- skip that runner's latest race
order by fecha desc
limit 1
The following query returns, for each row, the runner and the date of that runner's previous race (LAG requires MySQL 8.0 or later):
select runner_id,
LAG(fecha) over (partition by runner_id order by fecha)
as previous_date_race
from Race_Runner
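A small sketch of the window-function approach, again run through Python's sqlite3 so it is self-contained (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The race dates are made up for illustration. With ascending order inside the window, LAG looks back to the runner's earlier race, and it can return the previous race_id as well as the date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Race_Runner (race_id INTEGER, runner_id TEXT, fecha TEXT)"
)
# Hypothetical dates: only the ordering matters here.
conn.executemany(
    "INSERT INTO Race_Runner VALUES (?, ?, ?)",
    [(10, "A", "2020-01-05"), (10, "E", "2020-01-05"),
     (23, "A", "2020-02-09"), (23, "G", "2020-02-09"),
     (67, "E", "2020-03-15"), (67, "G", "2020-03-15")],
)

# LAG(...) OVER (PARTITION BY runner ORDER BY date) reads the value
# from the same runner's previous row in date order.
lag_rows = conn.execute(
    """
    SELECT race_id, runner_id,
           LAG(race_id) OVER (PARTITION BY runner_id ORDER BY fecha) AS prev_race_id,
           LAG(fecha)   OVER (PARTITION BY runner_id ORDER BY fecha) AS prev_fecha
      FROM Race_Runner
     ORDER BY fecha, runner_id
    """
).fetchall()

for row in lag_rows:
    print(row)
```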
I need to perform a SELECT SUM() where the expression being summed is a formula stored in a field selected by another query.
Example:
table_A (the "formula" field contains, in each cell, an arithmetic expression involving columns from table B):
+------------+--------------+------------+
| Product_id | related_prod | formula |
+------------+--------------+------------+
| U1 | C2 | col2-col1 |
| U2 | C3 | col3-col2 |
| U3 | C4 | col3-col1 |
+------------+--------------+------------+
table_B:
+------------+---------+------------+----------+------+------+------+
| Product_id | year_id | company_id | month_id | col1 | col2 | col3 |
+------------+---------+------------+----------+------+------+------+
| C2 | 2017 | 1 | 2 | 100 | 200 | 300 |
| C3 | 2017 | 1 | 2 | 400 | 500 | 600 |
| C4 | 2017 | 1 | 2 | 700 | 800 | 900 |
+------------+---------+------------+----------+------+------+------+
I do, then, the following query:
SELECT
SUM(totals.relaz) as final_sum,
totals.relaz as 'col',
totals.prod as 'prod',
totals.cons as 'cons',
m.company_id, m.month_id, m.year_id FROM `table_B` m,
( SELECT formula as relaz,
related_prod as prod,
p.product_id as cons FROM table_A p )
AS totals
WHERE m.product_id=totals.prod
GROUP BY m.company_id, m.year_id, m.month_id, m.product_id, totals.cons
After the select I'd expect that, considering only product 'U1' for example, the corresponding row would be
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 100 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
Instead, what I get is
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 0 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
i.e. the final_sum field is always 0, even though the 'col' field contains the correct expression.
What am I doing wrong?
Thank you in advance
Alex
You are trying to sum a string column (table_A.formula), which results in 0. MySQL/MariaDB will not convert the string into column references and evaluate the formula it contains.
Also, every selected column that is not inside an aggregate function should be listed in the GROUP BY.
To get the result you want, use:
SELECT
SUM(CASE
WHEN a.formula = 'col2-col1' THEN b.col2-b.col1
WHEN a.formula = 'col3-col1' THEN b.col3-b.col1
WHEN a.formula = 'col3-col2' THEN b.col3-b.col2
END
) AS final_sum,
a.formula as 'col',
a.related_prod as 'prod',
a.Product_id as 'cons',
b.company_id,
b.month_id,
b.year_id
FROM table_B b
JOIN table_A a on a.related_prod=b.Product_id
GROUP BY a.formula, a.related_prod, a.Product_id, b.company_id, b.month_id, b.year_id
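To see both behaviours side by side, here is a rough sketch with the sample tables loaded into an in-memory SQLite database through Python's sqlite3 (used only to keep the example self-contained; SQLite, like MySQL, treats the non-numeric text as 0 when summing). The CASE query is the one above, trimmed to the columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (Product_id TEXT, related_prod TEXT, formula TEXT);
CREATE TABLE table_B (Product_id TEXT, year_id INTEGER, company_id INTEGER,
                      month_id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER);
INSERT INTO table_A VALUES
  ('U1', 'C2', 'col2-col1'), ('U2', 'C3', 'col3-col2'), ('U3', 'C4', 'col3-col1');
INSERT INTO table_B VALUES
  ('C2', 2017, 1, 2, 100, 200, 300),
  ('C3', 2017, 1, 2, 400, 500, 600),
  ('C4', 2017, 1, 2, 700, 800, 900);
""")

# Summing the formula text itself: the strings are not evaluated.
naive_sum = conn.execute("SELECT SUM(formula) FROM table_A").fetchone()[0]

# Mapping each known formula string to a real expression with CASE.
case_rows = conn.execute("""
    SELECT a.Product_id AS cons, a.related_prod AS prod, a.formula AS col,
           SUM(CASE a.formula
                 WHEN 'col2-col1' THEN b.col2 - b.col1
                 WHEN 'col3-col1' THEN b.col3 - b.col1
                 WHEN 'col3-col2' THEN b.col3 - b.col2
               END) AS final_sum
      FROM table_B b
      JOIN table_A a ON a.related_prod = b.Product_id
     GROUP BY a.Product_id, a.related_prod, a.formula,
              b.company_id, b.month_id, b.year_id
     ORDER BY a.Product_id
""").fetchall()

print(naive_sum)
for row in case_rows:
    print(row)
```

The obvious drawback is that every formula has to be listed in the CASE, so a formula that is not listed yields NULL for its rows.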
It may be possible to build a stored routine that fetches the string col2-col1, inserts it (using CONCAT) into a SQL string, and then PREPAREs and EXECUTEs that string.
That is, build the SQL dynamically, perhaps along the lines of the query in @slaakso's answer.
It would be messy.
I have needed something like this; I chose to do eval() in PHP, which was the client language. I use it for evaluating VARIABLES and GLOBAL STATUS. Example: Table_open_cache_misses / Uptime gives the "misses per second", which, if high, indicates the need for increasing the setting table_open_cache.
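For the client-side route, a rough Python sketch (not the PHP code mentioned above) could parse the formula and allow only column names, numbers, and + - * /, rather than calling eval() on arbitrary text. The function name eval_formula and the row dict are made up for illustration.

```python
import ast
import operator

# Only these binary operators are accepted in a formula.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_formula(formula, row):
    """Evaluate an arithmetic formula such as 'col2-col1' against a dict of column values."""

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return row[node.id]  # column reference, looked up in the row
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value  # numeric literal
        # Anything else (function calls, attribute access, ...) is rejected.
        raise ValueError(f"unsupported expression: {formula!r}")

    return walk(ast.parse(formula, mode="eval").body)


# Row C2 from table_B in the question.
row_c2 = {"col1": 100, "col2": 200, "col3": 300}
print(eval_formula("col2-col1", row_c2))
```

Rejecting everything except a short whitelist of node types is the point of the design: the formulas live in a table, so they should be treated as untrusted input.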
Context:
I'm attempting to take a series of market transactions, and determine the amount of money actually moving per item type. This is pretty much my first attempt at MySql, so the query is ugly, but the following nearly works:
SELECT types.typename,
averages.type,
averages.price,
movement.sold,
( averages.price * movement.sold ) AS value
FROM (SELECT type,
Round(Avg(price)) AS price
FROM orders
GROUP BY type) AS averages
INNER JOIN (SELECT type,
( startingvolume - currentvolume ) AS sold
FROM (SELECT type,
Sum(volume) AS currentVolume,
Sum(volumeentered) startingVolume
FROM orders
GROUP BY type) AS movement
WHERE ( startingvolume - currentvolume ) > 10000
ORDER BY sold) AS movement
ON averages.type = movement.type
INNER JOIN invtypes AS types
ON types.typeid = averages.type
ORDER BY value DESC
LIMIT 10 ;
+------------------------------------+-------+---------+------------+------------------+
| typeName | type | price | sold | value |
+------------------------------------+-------+---------+------------+------------------+
| Dirt | 34 | 1904767 | 2670581874 | 5086836224393358 |
| Light Wood | 2629 | 42999 | 2756595 | 118530828405 |
| Dark Wood | 24509 | 47344 | 1107771 | 52446310224 |
| Stone | 21922 | 18386 | 1505884 | 27687183224 |
| Grass | 238 | 5643 | 4554470 | 25700874210 |
| Paper | 3814 | 25635 | 861006 | 22071888810 |
| Iron | 3699 | 320270 | 58833 | 18842444910 |
| Ink | 16275 | 8552 | 2200545 | 18819060840 |
| Loam | 2679 | 5759 | 2608771 | 15023912189 |
| Copper | 672 | 904612 | 14989 | 13559229268 |
+------------------------------------+-------+---------+------------+------------------+
The problem with the data above is that the raw market data is unavoidably corrupted by outliers, as you can see below:
select type, price from orders where type = 34 order by price desc limit 10;
+------+-----------+
| type | price |
+------+-----------+
| 34 | 200000000 |
| 34 | 15.99 |
| 34 | 12.06 |
| 34 | 10 |
| 34 | 7.67 |
| 34 | 7.5 |
| 34 | 7.3 |
| 34 | 7.17 |
| 34 | 7.1 |
| 34 | 7.06 |
+------+-----------+
Core problem:
99% of the market data is clean, but the outliers destroy the average, and MySql doesn't seem to have a median function. I've found several examples of how to find the median of an entire column, but I need the median per-item.
How would I determine a per-item median instead of a per-item mean, or efficiently clean the data of these outliers prior to running the primary query?
Note:
I've tried omitting results via std, but prices of items range from $17 to $10B, while deviation remains relatively low, regardless of price range.
I won't touch your original query because it is very complex, but one option would be to use a subquery to remove any statistical outliers. For example, if you wanted to remove any row from the orders table whose price is more than, say, two standard deviations away from the mean for its type, you could use:
SELECT t1.type,
t1.price
FROM orders t1
INNER JOIN
(
SELECT type,
AVG(price) AS AVG,
STD(price) AS STD
FROM orders
GROUP BY type
) t2
ON t1.type = t2.type
WHERE ABS(t1.price - t2.AVG) <= 2 * t2.STD -- keep only values within 2 standard deviations
-- of the mean for that type
Demo here:
SQLFiddle
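Since the question also asks for a per-item median, here is one possible sketch using window functions (available in MySQL 8.0+ and in SQLite 3.25+). It is run through Python's sqlite3 to stay self-contained. The type 34 prices are the ones listed in the question; the second item's prices are made up. Note that (cnt + 1) / 2 relies on SQLite's integer division; in MySQL the integer-division operator is DIV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (type INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(34, p) for p in (200000000, 15.99, 12.06, 10, 7.67,
                       7.5, 7.3, 7.17, 7.1, 7.06)]
    + [(238, p) for p in (5600, 5650, 5700)],  # hypothetical second item
)

# Number the rows of each type by price, then average the middle row
# (odd count) or the two middle rows (even count).
median_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT type, price,
               ROW_NUMBER() OVER (PARTITION BY type ORDER BY price) AS rn,
               COUNT(*)     OVER (PARTITION BY type)                AS cnt
          FROM orders
    )
    SELECT type, AVG(price) AS median_price
      FROM ranked
     WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)  -- integer division in SQLite
     GROUP BY type
     ORDER BY type
    """
).fetchall()

for row in median_rows:
    print(row)
```

The single 200000000 order no longer dominates the result for type 34, which is the property the question was after.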
I am working on a product sample inventory system where I track the movement of the products. Each product can have a status of "IN", "OUT", or "REMOVED". Each row of the table represents a new entry, where ID, status and date are unique. Each product also has a serial number.
I need help with a SQL query that will return all products that are currently "OUT". If I simply run SELECT * FROM table WHERE status = "OUT", it returns all products that ever had status OUT.
Every time a product comes in or goes out, I duplicate the last row for that product, change the status, and update the date; the new row gets a new ID automatically.
Here is the table that I have:
id | serial_number | product | color | date | status
------------------------------------------------------------
1 | K0T4N | XYZ | silver | 2016-07-01 | IN
2 | X56Z7 | ABC | silver | 2016-07-01 | IN
3 | 96T4F | PQR | silver | 2016-07-01 | IN
4 | K0T4N | XYZ | silver | 2016-07-02 | OUT
5 | 96T4F | PQR | silver | 2016-07-03 | OUT
6 | F0P22 | DEF | silver | 2016-07-04 | OUT
7 | X56Z7 | ABC | silver | 2016-07-05 | OUT
8 | F0P22 | DEF | silver | 2016-07-06 | IN
9 | K0T4N | XYZ | silver | 2016-07-07 | IN
10 | X56Z7 | ABC | silver | 2016-07-08 | IN
11 | X56Z7 | ABC | silver | 2016-07-09 | REMOVED
12 | K0T4N | XYZ | silver | 2016-07-10 | OUT
13 | 96T4F | PQR | silver | 2016-07-11 | IN
14 | F0P22 | DEF | silver | 2016-07-12 | OUT
This query will give you the latest record for each serial_number:
SELECT a.* FROM your_table a
LEFT JOIN your_table b ON a.serial_number = b.serial_number AND a.id < b.id
WHERE b.serial_number IS NULL
The query below will give your expected result:
SELECT a.* FROM your_table a
LEFT JOIN your_table b ON a.serial_number = b.serial_number AND a.id < b.id
WHERE b.serial_number IS NULL AND a.status LIKE 'OUT'
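A quick way to check the self-join against the table from the question is to load it into an in-memory database. This sketch uses Python's sqlite3 purely for convenience; the query is the one above, with = instead of LIKE and an ORDER BY added for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, serial_number TEXT, "
    "product TEXT, color TEXT, date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, 'silver', ?, ?)",
    [(1, "K0T4N", "XYZ", "2016-07-01", "IN"),
     (2, "X56Z7", "ABC", "2016-07-01", "IN"),
     (3, "96T4F", "PQR", "2016-07-01", "IN"),
     (4, "K0T4N", "XYZ", "2016-07-02", "OUT"),
     (5, "96T4F", "PQR", "2016-07-03", "OUT"),
     (6, "F0P22", "DEF", "2016-07-04", "OUT"),
     (7, "X56Z7", "ABC", "2016-07-05", "OUT"),
     (8, "F0P22", "DEF", "2016-07-06", "IN"),
     (9, "K0T4N", "XYZ", "2016-07-07", "IN"),
     (10, "X56Z7", "ABC", "2016-07-08", "IN"),
     (11, "X56Z7", "ABC", "2016-07-09", "REMOVED"),
     (12, "K0T4N", "XYZ", "2016-07-10", "OUT"),
     (13, "96T4F", "PQR", "2016-07-11", "IN"),
     (14, "F0P22", "DEF", "2016-07-12", "OUT")],
)

# A row is the latest for its serial_number when no row with a higher id
# exists for the same serial_number (the LEFT JOIN finds no match).
currently_out = conn.execute(
    """
    SELECT a.id, a.serial_number, a.status
      FROM your_table a
      LEFT JOIN your_table b
        ON a.serial_number = b.serial_number AND a.id < b.id
     WHERE b.serial_number IS NULL AND a.status = 'OUT'
     ORDER BY a.id
    """
).fetchall()

print(currently_out)
```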
There are two good ways to do this. Which way is best, in terms of performance, can depend on various factors, so try both.
SELECT
t.*
FROM `table` t
LEFT OUTER JOIN `table` later_t
ON later_t.serial_number = t.serial_number
AND later_t.date > t.date
WHERE later_t.id IS NULL
AND t.status = 'OUT'
Which column you check from later_t for IS NULL does not matter, so long as that column is declared NOT NULL in the table definition.
The other logically equivalent method is:
SELECT
t.*
FROM `table` t
INNER JOIN (
SELECT
serial_number,
MAX(date) AS date
FROM `table`
GROUP BY serial_number
) latest_t
ON latest_t.serial_number = t.serial_number
AND latest_t.date = t.date
WHERE t.status = 'OUT'
For each of these queries, I strongly suggest the following index:
ALTER TABLE `table`
ADD INDEX `LatestSerialStatus` (serial_number,date)
I use this type of query a lot in my own work, and have the above index as the primary key on tables. In such cases, performance is extremely fast for this type of query.
See also the documentation on this query type.
I have one problem that I can't resolve.
I have 2 tables.
Table 1:
ID | Time
1 | 08:12:54
2 | 08:15:40
3 | 09:30:01
4 | 10:15:15
5 | 10:56:12
6 | 11:00:03
Table 2:
ID | Name| Previous | Current
1 | Queue | null | 11
2 | Queue | 11 | 19
3 | Queue | 19 | 11
3 | List | null | 11
4 | Queue | 11 | 16
4 | List | null | 11
5 | Queue | null | 15
6 | Queue | 15 | 19
The result wanted:
NumberQueue | Start | End
11 | 08:12:54 | 08:15:40
19 | 08:15:40 | 09:30:01
11 | 09:30:01 | 10:15:15
15 | 10:56:12 | 11:00:03
...
...
The Previous and Current fields hold Queue numbers: Previous is the Queue before the change and Current is the new Queue. For each Queue, I want to know the start time and the end time.
I want one query that can present this result. Help me. :(
Regards.
SELECT t1outer.ID, t1outer.Time AS start, (
SELECT Time FROM Table1 AS t1inner
WHERE t1inner.ID > t1outer.ID
ORDER BY ID ASC LIMIT 1
) AS end, Table2.Previous, Table2.Current
FROM Table1 AS t1outer
LEFT JOIN Table2 USING (ID);
This select statement should provide the information you need:
SELECT Current AS Number, t1out.Time AS Start, (
SELECT Time FROM Table1 AS t1in
WHERE t1in.ID > t1out.ID
ORDER BY ID ASC LIMIT 1
) AS End FROM Table2
LEFT JOIN Table1 AS t1out USING (ID)
WHERE Table2.Name = 'Queue';
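Here is a rough run of that second query against the sample data, using Python's sqlite3 to keep it self-contained. Two things are adjusted only for SQLite's benefit: the Time, Previous and Current column names are double-quoted, and the output aliases are StartTime and EndTime because END is a keyword there. Be aware that the result includes a row for Queue 16 (ID 4), which the wanted output in the question does not list, and that the last Queue has no end time yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE Table1 (ID INTEGER, "Time" TEXT);
CREATE TABLE Table2 (ID INTEGER, Name TEXT, "Previous" INTEGER, "Current" INTEGER);
''')
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "08:12:54"), (2, "08:15:40"), (3, "09:30:01"),
     (4, "10:15:15"), (5, "10:56:12"), (6, "11:00:03")],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?, ?, ?)",
    [(1, "Queue", None, 11), (2, "Queue", 11, 19), (3, "Queue", 19, 11),
     (3, "List", None, 11), (4, "Queue", 11, 16), (4, "List", None, 11),
     (5, "Queue", None, 15), (6, "Queue", 15, 19)],
)

# Start is the time of the row itself; End is the time of the next ID.
queue_rows = conn.execute('''
    SELECT t2."Current" AS NumberQueue,
           t1out."Time" AS StartTime,
           (SELECT t1in."Time"
              FROM Table1 AS t1in
             WHERE t1in.ID > t1out.ID
             ORDER BY t1in.ID ASC
             LIMIT 1) AS EndTime
      FROM Table2 AS t2
      LEFT JOIN Table1 AS t1out ON t1out.ID = t2.ID
     WHERE t2.Name = 'Queue'
     ORDER BY t2.ID
''').fetchall()

for row in queue_rows:
    print(row)
```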
Using the table below, how would I get columns for a 5-period moving average, a 10-period moving average, and a 5-period exponential moving average?
+--------+------------+
| price | data_date |
+--------+------------+
| 122.29 | 2009-10-08 |
| 122.78 | 2009-10-07 |
| 121.35 | 2009-10-06 |
| 119.75 | 2009-10-05 |
| 119.02 | 2009-10-02 |
| 117.90 | 2009-10-01 |
| 119.61 | 2009-09-30 |
| 118.81 | 2009-09-29 |
| 119.33 | 2009-09-28 |
| 121.08 | 2009-09-25 |
+--------+------------+
The 5-row moving average in your example won't work. The LIMIT operator applies to the return set, not the rows being considered for the aggregates, so changing it makes no difference to the aggregate values.
SELECT AVG(a.price) FROM (SELECT price FROM t1 WHERE data_date <= ? ORDER BY data_date DESC LIMIT 5) AS a;
Replace ? with the date whose MA you need.
SELECT t1.data_date,
( SELECT SUM(t2.price) / COUNT(t2.price) as MA5 FROM mytable AS t2 WHERE DATEDIFF(t1.data_date, t2.data_date) BETWEEN 0 AND 6 )
FROM mytable AS t1 ORDER BY t1.data_date;
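Change 6 to 13 for the 10-day MA. Note that DATEDIFF counts calendar days, so this assumes every 7-day window contains exactly 5 rows (no holidays or other gaps).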
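If window functions are available (MySQL 8.0+, SQLite 3.25+), a row-based frame avoids the calendar-day arithmetic. The sketch below runs through Python's sqlite3 with the prices from the question; the table name t1 is borrowed from the query above. The exponential moving average is computed in plain Python with the common smoothing factor alpha = 2 / (N + 1), seeded with the first price; that seeding choice is an assumption, and other conventions exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (price REAL, data_date TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(122.29, "2009-10-08"), (122.78, "2009-10-07"), (121.35, "2009-10-06"),
     (119.75, "2009-10-05"), (119.02, "2009-10-02"), (117.90, "2009-10-01"),
     (119.61, "2009-09-30"), (118.81, "2009-09-29"), (119.33, "2009-09-28"),
     (121.08, "2009-09-25")],
)

# ROWS BETWEEN 4 PRECEDING AND CURRENT ROW = the current row plus the
# four rows before it in date order. n shows how many rows were actually
# available, so early rows with fewer than 5 can be ignored if desired.
ma_rows = conn.execute(
    """
    SELECT data_date, price,
           AVG(price) OVER (ORDER BY data_date
                            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS ma5,
           COUNT(*)   OVER (ORDER BY data_date
                            ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS n
      FROM t1
     ORDER BY data_date
    """
).fetchall()


def ema(prices, period):
    """Exponential moving average, seeded with the first price."""
    alpha = 2 / (period + 1)
    out = []
    for p in prices:
        out.append(p if not out else alpha * p + (1 - alpha) * out[-1])
    return out


ema5 = ema([row[1] for row in ma_rows], 5)

for (data_date, price, ma5, n), e in zip(ma_rows, ema5):
    print(data_date, price, round(ma5, 3), n, round(e, 3))
```

For the 10-period average, the frame would be ROWS BETWEEN 9 PRECEDING AND CURRENT ROW.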