I have data which is formed like this:
+----------+-------+-------+
| DAY | VALUE | Name |
+----------+-------+-------+
| 01/01/14 | 1030 | BOB
| 01/02/14 | 1020 | BOB
| 01/03/14 | 1080 | BOB
| 01/04/14 | 1090 | BOB
| 01/05/14 | 1040 | BOB
| 01/08/14 | 1030 | BOB
| 01/11/14 | 4030 | BOB
| 01/12/14 | 5000 | BOB
| 01/13/14 | 6000 | BOB
| 01/14/14 | 1096 | BOB
| 01/14/14 | 1200 | MIKE
| 01/15/14 | 1040 | MIKE
| 01/16/14 | 1600 | MIKE
| 01/17/14 | 1070 | MIKE
| 01/18/14 | 1340 | MIKE
| 01/19/14 | 1060 | MIKE
| 01/01/14 | 6000 | JANE
| 01/02/14 | 1700 | JANE
| 01/03/14 | 1070 | JANE
| 01/04/14 | 8000 | JANE
+----------+-------+------+
For each name there needs to be a row for the dates between 01/01/14 to 02/01/14 (1 month). As you can see Bob, Mike, and Jane (although in my real database there are thousands of names) are all missing dates between this time period. I would like to somehow insert the missing rows by interpolation of some sort. For example Bob is missing 01/06/14 and 01/07/14. I would like it to interpolate by adding these two dates and then the values to be the average of the two field between so these two missing fields would both have the value ((1040+1030)/2) = 1035. If there is no data before like for MIKE (starts at 01/14/14) I would like all the new rows to have 01/14/14 value now. I have tried various different techniques such as using coalesce command, cursors, but can't get it to work. Also I am not set on having these EXACT values, if there is some sort of math library which can interpolate I would be open to this as well. Thanks.
You have two problems, generating the rows and interpolating the values. You can generate the rows with this SQL:
select d.day, n.name, t.value
from (select distinct name from table t) n cross join
(select distinct day from table t) d left outer join
table t
on t.name = n.name and t.day = d.day;
Doing the interpolation is troublesome. You can do this using variables and multiple sorting. Here is logic:
select day, name, value, prev_value,
#value as next_value,
#value := if(#name = name and value is not null, value, #value),
#name := name
from (select d.day, n.name, t.value,
#value as prev_value,
#value := if(#name = name and value is not null, value, #value),
#name := name
from (select distinct name from table t) n cross join
(select distinct day from table t) d left outer join
table t
on t.name = n.name and t.day = d.day cross join
(select #name := '', #value := NULL) vars
order by n.name, d.day
) t cross join
(select #name := '', #value := NULL) vars
order by n.name, d.day desc;
This will probably work for you, but it is depending on MySQL evaluating the expressions in order in each select (for the assignment of variables). You can make the syntax more complicated to fix this, but that would hide the logic. You can now implement the logic that you want:
select day, name,
(case when value is not null then value
when prev_value is not null and next_value is not null
then (prev_value + next_value) / 2
when prev_value is null then next_value
else prev_value
end) as value
from (<previous query here>) t;
Related
I have an abstract problem which can be simplified as the following problem: Assume that we have two tables persons and names that look as follows:
SELECT * FROM persons;
+----+-------+--------+
| id | name | fan_of |
+----+-------+--------+
| 1 | alice | 2 |
| 2 | bob | 4 |
| 3 | carol | 1 |
| 4 | dave | 3 |
| 5 | bob | 2 |
+----+-------+--------+
and
SELECT * FROM names;
+----+-------+--------+
| id | name | active |
+----+-------+--------+
| 1 | alice | 1 |
| 2 | bob | 1 |
| 3 | carol | 0 |
| 4 | dave | 1 |
+----+-------+--------+
Every person (a row in the persons) table is a fan of itself or another person (represented by that other persons id in the fan_of column). The names table contains names that can be active or inactive.
For a given offset k, I want to SELECT the persons (rows of persons) that have the k+1-th active name as their name or that have one of these people as their fans. For example, if the offset is 1, the second active name is bob and hence I want to select all people with the name bob plus the people that have one of these bobs as their fans, which is in this example the row of persons with id=4. This means that I want to have the result:
+----+------+--------+
| id | name | fan_of |
+----+------+--------+
| 2 | bob | 4 |
| 4 | dave | 3 |
| 5 | bob | 2 |
+----+------+--------+
What I have so far is the following query:
1 SELECT * FROM persons WHERE
2 EXISTS (
3 SELECT * FROM (
4 SELECT * FROM names WHERE active=true LIMIT 1 OFFSET 1
5 ) AS selectedname WHERE (selectedname.name=persons.name)
6 )
7 OR
8 EXISTS (
9 SELECT * FROM(
10 SELECT * FROM persons WHERE EXISTS (
11 SELECT * FROM (
12 SELECT * FROM names WHERE active=true LIMIT 1 OFFSET 1
13 ) AS selectedname WHERE (selectedname.name=persons.name)
14 )
15 ) AS personswiththatname WHERE persons.id=personswiththatname.fan_of
16 );
It gives me the desired result from above but please note that it is inefficient because the lines 3-5 and 11-13 are the same.
I have the following two questions:
What can be done to avoid this inefficiency?
I actually need to distinguish between those rows that came from the
name condition (here the rows with name=bob) and those that came
from the fan_of condition (here the row with name=dave). This
could be done in the application code but then I would need another
database query before to find out the k+1-th active name and this might
be slow (please correct me if this is the better solution). I would
rather prefer an additional column z that helps me to distinguish
like
+----+------+--------+---+
| id | name | fan_of | z |
+----+------+--------+---+
| 2 | bob | 4 | 1 |
| 4 | dave | 3 | 0 |
| 5 | bob | 2 | 1 |
+----+------+--------+---+
How can such an output be achieved?
It looks like I can get the minimum you want to achieve using parameters (should this be an option).
It's not pretty, but I can't see a simple way of achieving what you're asking for, so this is what I have so far....(set #offset to suit 'k')
SET #offset = 1;
SET #name = (SELECT name FROM (select name, #rank := #rank +1 as Rank from names n, (SELECT #rank := 0) r where active !=0) as activeRanked where activeRanked.rank = (1 + #offset));
select
a.*
From persons a
where (a.name = #name) OR (a.id IN (SELECT fan_of from persons where name = #name));
If you still don't have an answer by the time I've had food, I'll look at part 2.
(hopefully I've read your brief correctly)
P.S. I've kept the #name SQL in a single line as it seems to read better in this context.
Edit: Here's a pretty messy but functional indicator of source, using your example. Z = 1 is where the row is from the name, '0' is from fan_of
SET #offset = 1;
SET #name = (SELECT name FROM (select name, #rank := #rank +1 as Rank from names n, (SELECT #rank := 0) r where active !=0) as activeRanked where activeRanked.rank = (1 + #offset));
select
a.*,'1' as z
From persons a
where (a.name = #name)
union
select
a.*,'0' as z
From persons a
where (a.id IN (SELECT fan_of from persons where name = #name));
Distinct ID Query:
SET #offset = 1;
SET #name = (SELECT name FROM (select name, #rank := #rank +1 as Rank from names n, (SELECT #rank := 0) r where active !=0) as activeRanked where activeRanked.rank = (1 + #offset));
SELECT id, name, fan_of, z FROM
(select
distinct a.id,
a.name,
a.fan_of,
1 as z
From persons a
where (a.name = #name)
union
select
distinct a.id,
a.name,
a.fan_of,
0 as z
From persons a
where (a.id IN (SELECT fan_of from persons where name = #name))
ORDER BY z desc) qry
GROUP BY id;
This produces:
+----+------+--------+---+
| id | name | fan_of | z |
+----+------+--------+---+
| 2 | bob | 4 | 1 |
| 5 | bob | 2 | 1 |
| 4 | dave | 3 | 0 |
+----+------+--------+---+
I have same problem like in this question Count rows until value in column changes mysql
But although the issue was resolved I not understand this query. Because I have low reputation points I can't leave comment there and must open new question.
My example:
mysql> select * from example;
+----+-------+------+
| id | name | succ |
+----+-------+------+
| 1 | peter | 1 |
| 2 | bob | 1 |
| 3 | peter | 0 |
| 4 | peter | 0 |
| 5 | nick | 1 |
| 6 | bob | 0 |
| 7 | peter | 1 |
| 8 | bob | 0 |
| 9 | peter | 1 |
| 10 | peter | 1 |
+----+-------+------+
10 rows in set (0.00 sec)
I want to count successive true values for peter (descending id, and results must be 3), I know how to set query like this :
mysql> select count(succ)
from example
where id > (select max(id) from example where succ = 0);
+-------------+
| count(succ) |
+-------------+
| 2 |
+-------------+
1 row in set (0.00 sec)
But how to get results just for peter, and if it is possible to get results grouped by name, like this:
+--------+------+
|name | succ |
+--------+------+
| peter | 3 |
| bob | 0 |
| nick | 1 |
+--------+------+
Use variables to count consecutive successes (re-starting when seeing a failure), and join with a query which selects highest id per name (somewhat similar to McNets's answer)
SELECT a.name, a.count FROM (
SELECT e1.id, e1.name, e1.succ,
#count_success := IF (#prev_name = e1.name AND e1.succ = 1, #count_success + 1, e1.succ) AS `count`,
#prev_name := e1.name AS `prev_name`
FROM `example` e1, (SELECT #count_success :=0, #prev_name := NULL) init
ORDER BY e1.name, e1.id ) `a`
JOIN (SELECT MAX(id) AS `max` FROM `example` GROUP BY `name`) `b` ON a.id = b.max
One way to solve this is with a self join. Using an outer join you exclude the rows that have a matching row with a higher id value and succ = 0, and then count the rows with succ = 1 using SUM() and CASE.
Here's the query for your example:
select e1.name,
sum(case when e1.succ = 1 then 1 else 0 end) as succ
from example e1
left outer join example e2 on e2.id > e1.id
and e2.name = e1.name
and e2.succ = 0
where e2.id is null
group by e1.name
if you need the count of records
select name, count(succ) from example group by name
or if you need the sum of the succ of every person you can use
select name, sum(succ) from example group by name
I have a simple table and I need to identified groups of four rows (the groups aren't consecutives), but each rows of each row has a +1 in the value. For example:
----------------------
| language | id |
----------------------
| C | 16 |
| C++ | 17 |
| Java | 18 |
| Python | 19 |
| HTML | 65 |
| JavaScript | 66 |
| PHP | 67 |
| Perl | 68 |
----------------------
I want to add a column that indicates the group or set, how is possible to get this output using MySQL?:
----------------------------
| language | id | set |
----------------------------
| C | 16 | 1 |
| C++ | 17 | 1 |
| Java | 18 | 1 |
| Python | 19 | 1 |
| HTML | 65 | 2 |
| JavaScript | 66 | 2 |
| PHP | 67 | 2 |
| Perl | 68 | 2 |
----------------------------
Note that in this examples is only 2 sets (it could be 1 or more sets) and they didn't start in 16 (such values are not knowledged, but the restriction is that each id value of each row has this form n, n+1, n+2 and n+3).
I've been investigating about Gaps & Islands problem but didn't figure how to solve it by using their solutions. Also I search on stackoverflow but the closest question that I found was How to find gaps in sequential numbering in mysql?
Thanks
SELECT language,id,g
FROM (
SELECT language,id,
CASE WHEN id=#lastid+1 THEN #n ELSE #n:=#n+1 END AS g,
#lastid := id As b
FROM
t, (SELECT #n:=0) r
ORDER BY
id
) s
EDIT
In case you want just 4 per group add a row number variable:
SELECT language,id,g,rn
FROM (
SELECT language,id,
CASE WHEN id=#lastid+1 THEN #n ELSE #n:=#n+1 END AS g,
#rn := IF(#lastid+1 = id, #rn + 1, 1) AS rn,
#lastid := id As dt
FROM
t, (SELECT #n:=0) r
ORDER BY
id
) s
Where rn <=4
FIDDLE
select language,
#n:=if(#m+1=id, #n, #n+1) `set`,
(#m:=id) id
from t1,
(select #n:=0) n,
(select #m:=0) m
Demo on sqlfiddle
You can use the following query:
SELECT l.*, s.rn
FROM languages AS l
INNER JOIN (
SELECT minID, #rn2:=#rn2+1 AS rn
FROM (
SELECT MIN(id) AS minID
FROM (
SELECT id,
id - IF (true, #rn1:=#rn1+1, 0) AS grp
FROM languages
CROSS JOIN (SELECT #rn1:=0) AS var1
ORDER BY id) t
GROUP BY grp
HAVING COUNT(grp) = 4 ) u
CROSS JOIN (SELECT #rn2:=0) AS var2
) s ON l.id BETWEEN minID AND minID + 3
The above query identifies islands of exactly 4 consecutive records and returns there records only. It is easily modifiable to account for a different number of consecutive records.
Please also note the usage of IF conditional: it guarantees that #rn1 is first initialized and then used in order to calculate grp field.
Demo here
Assuming that we have the following MySQL table:
ID | Name | Last_Name | Location |
1 | Alex | Griff | DT |
2 | John | Doe | York |
3 | Pat | Benat | DT |
4 | Jack | Darny | DT |
5 | Duff | Hill | York |
I want to create an sql statement that selects randomly one row of each location and store them in a new table.
For example:
2 | John | Doe | York |
3 | Pat | Benat | DT |
OR
4 | Jack | Darny | DT |
5 | Duff | Hill | York |
I would like to execute this on SQL since it's much faster than doing it on a Java program and using HashMap<K,V> and then storing the values again in another table.
This page has the solution. I simply modified the query to your table definition, as follows:
SELECT tmp.ID, tmp.Name, tmp.Last_Name, tmp.Location
FROM profiles
LEFT JOIN (SELECT * FROM profiles ORDER BY RAND()) tmp ON (profiles.Location = tmp.Location)
GROUP BY tmp.Location
ORDER BY profiles.Location;
SQL Fiddle demo
If you want a random sample for each location, you have several options. I think the easiest is a variable approach that will work well if your table is not super big.
select t.*
from (select t.*,
#rn := if(#location = location, #rn + 1, 1) as rn,
#location := location
from table t cross join
(select #location := '', #rn := 0) vars
order by location, rand()
) t
where rn = 1;
This assigns a sequential number to the locations and then chooses the first one.
Trying to sort rows from lowest to highest continually, or rather repeatedly using MySql. For example: if a column has the following values: 1,3,2,4,2,1,4,3,5, then it should end up like this 1,2,3,4,5,1,2,3,4. So it goes from lowest to highest, but tries to sort again from lowest to highest multiple times.
For large sets, the semi-JOIN operation (the approach in the answer from Strawberry) may create an unwieldy resultset. (Then again, MySQL may have some optimizations in there.)
Another alternative available in MySQL is to use "user variables", like this:
SELECT r.mycol
FROM ( SELECT IF(q.mycol=#prev,#seq := #seq + 1,#seq := 1) AS seq
, #prev := q.mycol AS mycol
FROM mytable q
JOIN (SELECT #prev := NULL, #seq := NULL) p
ORDER BY q.mycol
) r
ORDER BY r.seq, r.mycol
Let me unpack that a bit, and explain what it's doing, starting with the inner query (inline view aliased as r.) We're telling MySQL to get the column (mycol) containing the values you want to sort, e.g. 1,3,2,4,2,1,4,3,5 and we're telling MySQL to order these in ascending sequence: 1,1,2,2,3,3,4,4,5.
The "trick" now is to use a MySQL user variable, so that we can compare the mycol value from the current row to the mycol value from the previous row, and we use that to assign an ascending sequence value, from 1..n on each distinct value.
With that resultset, we can tell MySQL to order by that assigned sequence value first, and then by the value from mycol.
If there is a unique id on each row, then a correlated subquery can be used to get an equivalent result (although this approach is very unlikely to perform as well on large sets)
SELECT r.mycol
FROM mytable r
ORDER
BY ( SELECT COUNT(1)
FROM mytable q
WHERE q.mycol = r.mycol
AND q.id <= r.id
)
, r.mycol
Here's the setup for the test case:
CREATE TABLE mytable (id INT, mycol INT);
INSERT INTO mytable (id, mycol) VALUES
(1,1),(2,3),(3,2),(4,4),(5,2),(6,1),(7,4),(8,3),(9,5);
there is no order two time just this
ORDER BY column ASC
Let's pretend that the PK is a unique integer. Consider the following...
CREATE TABLE seq(id INT NOT NULL PRIMARY KEY,val INT);
INSERT INTO seq VALUES (8,1),(4,2),(1,3),(2,4),(7,0),(6,1),(3,2),(5,5);
SELECT * FROM seq ORDER BY val;
+----+------+
| id | val |
+----+------+
| 7 | 0 |
| 6 | 1 |
| 8 | 1 |
| 3 | 2 |
| 4 | 2 |
| 1 | 3 |
| 2 | 4 |
| 5 | 5 |
+----+------+
SELECT x.*
, COUNT(*) rank
FROM seq x
JOIN seq y
ON y.val = x.val
AND y.id <= x.id
GROUP
BY id
ORDER
BY rank
, val;
+----+------+------+
| id | val | rank |
+----+------+------+
| 7 | 0 | 1 |
| 6 | 1 | 1 |
| 3 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 4 | 1 |
| 5 | 5 | 1 |
| 8 | 1 | 2 |
| 4 | 2 | 2 |
+----+------+------+