Compacting tables after normalisation - mysql

I recently increased the level of normalisation in my database, going from something like this:
+--------------------------------------+
| state_changes |
+----+-------+-----------+------+------+
| ID | Name | Timestamp | Val1 | Val2 |
+----+-------+-----------+------+------+
| 0 | John | 17:19:01 | A | X |
| 1 | Bob | 17:19:02 | E | W |
| 2 | John | 17:19:05 | E | Y |
| 3 | John | 17:19:06 | B | Y |
| 4 | John | 17:19:12 | C | Z |
| 5 | John | 17:19:15 | A | Z |
+----+-------+-----------+------+------+
To something more like this:
+-------------------------------+ +-------------------------------+
| state_changes_1 | | state_changes_2 |
+----+-------+-----------+------+ +----+-------+-----------+------+
| ID | Name | Timestamp | Val1 | | ID | Name | Timestamp | Val2 |
+----+-------+-----------+------+ +----+-------+-----------+------+
| 0 | John | 17:19:01 | A | | 0 | John | 17:19:01 | X |
| 1 | Bob | 17:19:02 | E | | 1 | Bob | 17:19:02 | W |
| 2 | John | 17:19:05 | E | | 2 | John | 17:19:05 | Y |
| 3 | John | 17:19:06 | B | | 3 | John | 17:19:06 | Y |
| 4 | John | 17:19:12 | C | | 4 | John | 17:19:12 | Z |
| 5 | John | 17:19:15 | A | | 5 | John | 17:19:15 | Z |
+----+-------+-----------+------+ +----+-------+-----------+------+
How can I now write a query to "compact" the two resulting tables by removing rows whose values are duplicated?
I want to ignore the ID field when considering row uniqueness;
I want to ignore the Timestamp when considering row uniqueness;
But rows must be consecutive (under a Name, Timestamp ordering) to be considered duplicates.
The result, in this example, should be:
+-------------------------------+ +-------------------------------+
| state_changes_1 | | state_changes_2 |
+----+-------+-----------+------+ +----+-------+-----------+------+
| ID | Name | Timestamp | Val1 | | ID | Name | Timestamp | Val2 |
+----+-------+-----------+------+ +----+-------+-----------+------+
| 0 | John | 17:19:01 | A | | 0 | John | 17:19:01 | X |
| 1 | Bob | 17:19:02 | E | | 1 | Bob | 17:19:02 | W |
| 3 | John | 17:19:06 | B | | 2 | John | 17:19:05 | Y |
| 4 | John | 17:19:12 | C | | 4 | John | 17:19:12 | Z |
| 5 | John | 17:19:15 | A | +----+-------+-----------+------+
+----+-------+-----------+------+
My tables have several billion rows so I'm looking for something that takes efficiency into consideration; that said, I'm a realistic sort of person so I'm happy for the query to take an hour or two to run (including index rebuilds) if needs be.

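I tried this on MySQL 5.1.58 and it seems to work with your test data.
SET @name = NULL;
SET @val1 = NULL;
-- Walk the rows in (Name, Timestamp) order and set Val1 to NULL wherever it
-- repeats the previously kept value for the same Name.
UPDATE state_changes_1
SET Val1 = IF(Name=@name AND Val1=@val1, NULL, (@val1:=Val1)),
Name = (@name:=Name)
ORDER BY Name, `Timestamp`;
-- Remove the rows that were marked as duplicates.
DELETE FROM state_changes_1 WHERE Val1 IS NULL;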

Your problem is that the concept of a 'sequential' (consecutive) duplicate doesn't exist in relational algebra, so you won't be able to express it in plain SQL. You can easily get the latest timestamp of each state with
SELECT id, name, MAX(timestamp) ts, state FROM states
GROUP BY id, name, state
ORDER BY ts
However, you could do what you want by dumping your table into a text file and running a simple script over it in whichever language you are comfortable with (Perl, Ruby, Python, etc.). Even on a million-row table that can be done quite quickly.
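As a rough illustration of the scripting approach, here is a minimal Python sketch. It assumes the rows have already been loaded as (ID, Name, Timestamp, Val2) tuples; the function name compact and the in-memory list are made up for the example, and a real dump of billions of rows would need to be streamed from a file already sorted by Name and Timestamp rather than sorted in memory.

```python
from itertools import groupby

# Sample rows of state_changes_2 from the question: (ID, Name, Timestamp, Val2).
rows = [
    (0, "John", "17:19:01", "X"),
    (1, "Bob", "17:19:02", "W"),
    (2, "John", "17:19:05", "Y"),
    (3, "John", "17:19:06", "Y"),
    (4, "John", "17:19:12", "Z"),
    (5, "John", "17:19:15", "Z"),
]


def compact(rows):
    """Keep only the first row of each run of equal values per Name."""
    # Order by Name, then Timestamp, as the question requires.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    kept = []
    # groupby only merges adjacent items, so each group is one consecutive run.
    for _, run in groupby(ordered, key=lambda r: (r[1], r[3])):
        kept.append(next(run))
    return sorted(kept)  # back to ID order for display


compacted = compact(rows)
for row in compacted:
    print(row)
```

On the sample data this keeps IDs 0, 1, 2 and 4, which matches the expected state_changes_2 table in the question.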

Related

MySQL: Rows concatenation

Maybe this is a really simple question; thanks in advance.
What I currently have:
+-----+---+---+---+---+
| sid | a | b | c | d |
+-----+---+---+---+---+
| 123 | | | | 4 |
| 123 | | 2 | | |
| 123 | | | 3 | |
| 123 | 1 | | | |
| 456 | | 5 | | |
| 456 | | | 6 | |
| 789 | | | | 8 |
| 789 | 7 | | | |
+-----+---+---+---+---+
What I am trying to get:
+-----+------+------+------+------+
| sid | a | b | c | d |
+-----+------+------+------+------+
| 123 | 1 | 2 | 3 | 4 |
| 456 | | 5 | 6 | |
| 789 | 7 | | | 8 |
+-----+------+------+------+------+
How can such "row concatenation" be done in MySQL?
You can do this with the MAX() aggregate function and a GROUP BY clause in your query.
SELECT sid, MAX(a), MAX(b), MAX(c), MAX(d)
FROM `table`
GROUP BY sid
I used MAX() because aggregate functions ignore NULL values, so each column returns the non-NULL value in its group, if there is one.
More explanation here: MySQL Documentation
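To see the effect without a MySQL server, here is a small sketch that runs the same query against an in-memory SQLite database from Python. It assumes the empty cells are NULL (not empty strings), and the table name t is made up for the example; SQLite's MAX() also skips NULLs, so the behaviour shown should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (sid INTEGER, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
# The sparse rows from the question; None is stored as NULL.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (123, None, None, None, 4),
        (123, None, 2, None, None),
        (123, None, None, 3, None),
        (123, 1, None, None, None),
        (456, None, 5, None, None),
        (456, None, None, 6, None),
        (789, None, None, None, 8),
        (789, 7, None, None, None),
    ],
)

# MAX() skips NULLs, so each sid collapses to one row of non-NULL values.
merged = conn.execute(
    "SELECT sid, MAX(a), MAX(b), MAX(c), MAX(d) FROM t GROUP BY sid ORDER BY sid"
).fetchall()
for row in merged:
    print(row)
```

Columns that are NULL in every row of a group (such as a and d for sid 456) stay NULL in the result.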

Make sequence autoincrement order number by variable in Mysql

In my example I have this table :
| name | number |
-------------------
| abc | |
| bca | |
| sad | |
| tyu | |
| hjh | |
| lpk | |
| ass | |
| drc | |
| dfg | |
Then I get a variable holding a number, for example:
$order = 3, and I want a query that updates the table above to look like this:
| name | number |
--------------------
| abc | 1 |
| bca | 2 |
| sad | 3 |
| tyu | 1 |
| hjh | 2 |
| lpk | 3 |
| ass | 1 |
| drc | 2 |
| dfg | 3 |
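How do I do that in a MySQL query?
Thanks in advance, guys.
-- Assumes `number` already holds a running 1, 2, 3, ... sequence;
-- the cycled value is written to number2.
SET @order=3;
UPDATE Table1 SET number2=MOD(number-1,@order)+1;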
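The arithmetic behind MOD(number-1, order)+1 can be checked with a few lines of Python. This is only an illustration of the formula; the helper name cycle_number is made up, and the row positions are taken to be 1-based.

```python
def cycle_number(position, order):
    """Map a 1-based row position onto the repeating sequence 1..order."""
    # Shift to 0-based, wrap with modulo, then shift back to 1-based.
    return (position - 1) % order + 1


names = ["abc", "bca", "sad", "tyu", "hjh", "lpk", "ass", "drc", "dfg"]
order = 3
numbered = [(name, cycle_number(i, order)) for i, name in enumerate(names, start=1)]
for name, number in numbered:
    print(name, number)
```

Subtracting 1 before the modulo and adding it back afterwards is what makes the sequence run 1, 2, 3 instead of 1, 2, 0.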

Find last value from one out of 3 row groups

I want to find a specific value from this table: for each ID2, the last ElEnd value among the rows that have ItemNum 2:
ID | ID2 | Item1 | Item2 | Item3 | Element | ItemNum | ElStart | ElEnd
===================================================================
1 | 1 | rock | n | roll | r | 1 | 23.212 | 23.222
2 | 1 | rock | n | roll | o | 1 | 23.222 | 23.256
3 | 1 | rock | n | roll | c | 1 | 23.256 | 23.277
4 | 1 | rock | n | roll | k | 1 | 23.277 | 23.290
5 | 1 | rock | n | roll | n | 2 | 23.290 | 23.321
6 | 1 | rock | n | roll | r | 3 | 23.321 | 23.331
7 | 1 | rock | n | roll | o | 3 | 23.331 | 23.434
8 | 1 | rock | n | roll | l | 3 | 23.434 | 23.456
9 | 1 | rock | n | roll | l | 3 | 23.456 | 23.567
10 | 2 | a | tiny | rock | a | 1 | 23.567 | 23.678
11 | 2 | a | tiny | rock | t | 2 | 23.678 | 23.789
12 | 2 | a | tiny | rock | i | 2 | 23.789 | 23.890
13 | 2 | a | tiny | rock | n | 2 | 23.890 | 23.901
14 | 2 | a | tiny | rock | y | 2 | 23.901 | 24.123
15 | 2 | a | tiny | rock | r | 3 | 24.123 | 24.234
16 | 2 | a | tiny | rock | o | 3 | 24.234 | 24.345
17 | 2 | a | tiny | rock | c | 3 | 24.345 | 24.456
18 | 2 | a | tiny | rock | k | 3 | 24.456 | 24.567
So in the case of this example table, I want to select 23.321 and 24.123. I later want to use these values in an UPDATE to copy them to a new column Item2ElementEnd.
I've tried a number of queries that use subselects or UNION, but none of them were efficient; they all ran so slowly that I had to stop them (my table has about 600,000 entries).
This is a query which gives me the wrong value (ElEnd for ItemNum 3 rather than 2):
select ID2, Item2, max(ElEnd)
from t1
group by ID2;
This is an example query which didn't work because it was running WAY too slowly (I had to abort):
select Item2, ElStart, ElEnd
from t1
where ItemNum = "2"
and ElStart = (select max(ElStart) from t1 as f where f.Item2 = t1.Item2);
How can I do this most efficiently?
I have now found a (surprisingly simple) solution using this query:
select ID, ID2, item2, max(ElEnd), ItemNum
from t1
WHERE ItemNum = 2
group by ID2, ItemNum;
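Here is a reduced version of that query run against an in-memory SQLite database from Python, as a sanity check on a subset of the sample rows. The column subset is an assumption made to keep the example short, and only ID2 and the aggregate are selected, since the ID value returned for a group by the query above is not well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (ID INTEGER, ID2 INTEGER, ItemNum INTEGER, ElStart REAL, ElEnd REAL)"
)
# A subset of the sample rows: (ID, ID2, ItemNum, ElStart, ElEnd).
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (4, 1, 1, 23.277, 23.290),
        (5, 1, 2, 23.290, 23.321),
        (6, 1, 3, 23.321, 23.331),
        (10, 2, 1, 23.567, 23.678),
        (11, 2, 2, 23.678, 23.789),
        (13, 2, 2, 23.890, 23.901),
        (14, 2, 2, 23.901, 24.123),
        (15, 2, 3, 24.123, 24.234),
    ],
)

# Filter to ItemNum 2 first, then take the largest ElEnd per ID2.
result = conn.execute(
    "SELECT ID2, MAX(ElEnd) FROM t1 WHERE ItemNum = 2 GROUP BY ID2 ORDER BY ID2"
).fetchall()
for row in result:
    print(row)
```

Filtering before grouping is the point: the aggregate only ever sees the ItemNum 2 rows, so no correlated subquery is needed.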

SQL Select rows with max value from joined table

I have these 3 tables:
lecturers:
+-------------+---------+
| id-lecturer | name |
+-------------+---------+
| 1 | Johnson |
| 2 | Smith |
| ... | ... |
| ... | ... |
+-------------+---------+
subjects:
+------------+---------+
| id-subject | name |
+------------+---------+
| 1 | Math |
| 2 | Physics |
| ... | ... |
| ... | ... |
+------------+---------+
exams:
+---------+-------------+------------+------------+
| id-exam | id-lecturer | id-subject | date |
+---------+-------------+------------+------------+
| 1 | 5 | 1 | 1990-05-05 |
| 2 | 7 | 1 | ... |
| 3 | 5 | 3 | ... |
| ... | ... | ... | ... |
+---------+-------------+------------+------------+
When I run the first SELECT:
SELECT e.`id-lecturer`, e.`id-subject`, COUNT(e.`id-lecturer`) AS `exams-num`
FROM exams e
JOIN subjects s ON e.`id-subject`=s.`id-subject`
JOIN lecturers l ON e.`id-lecturer`=l.`id-lecturer`
GROUP BY e.`id-lecturer`, e.`id-subject`
I get the right answer. It shows something like this:
+-------------+------------+-----------+
| id-lecturer | id-subject | exams-num |
+-------------+------------+-----------+
| 0001 | 1 | 4 |
| 0001 | 3 | 1 |
| 0001 | 4 | 1 |
| 0001 | 5 | 1 |
| 0002 | 1 | 2 |
| 0002 | 2 | 1 |
| 0002 | 4 | 1 |
| 0002 | 6 | 3 |
+-------------+------------+-----------+
Now I want to show only the max number for every lecturer. My code is:
SELECT it.`id-lecturer`, it.`id-subject`, MAX(it.`exams-num`) AS `exams-number`
FROM (
SELECT e.`id-lecturer`, e.`id-subject`, COUNT(e.`id-lecturer`) AS `exams-num`
FROM exams e
JOIN subjects s ON e.`id-subject`=s.`id-subject`
JOIN lecturers l ON e.`id-lecturer`=l.`id-lecturer`
GROUP BY e.`id-lecturer`, e.`id-subject`) it
GROUP BY it.`id-lecturer`
output:
+-------------+------------+--------------+
| id-lecturer | id-subject | exams-number |
+-------------+------------+--------------+
| 0001 | 1 | 4 |
| 0002 | 1 | 3 |
| 0003 | 1 | 2 |
| 0004 | 1 | 5 |
| 0005 | 2 | 1 |
+-------------+------------+--------------+
I get the correct max value for each lecturer, but the subject id doesn't match; it always takes the id from the first row of the group. How can I make these two fields match correctly in every row?
I guess you can simply reuse the same query as a derived table and aggregate over it, like below.
SELECT t.Lecturer_id, MAX(t.`exams-num`)
FROM (
SELECT e.`id-lecturer` AS Lecturer_id, e.`id-subject` AS Subject_id,
COUNT(e.`id-lecturer`) AS `exams-num`
FROM exams e
JOIN subjects s ON e.`id-subject`=s.`id-subject`
JOIN lecturers l ON e.`id-lecturer`=l.`id-lecturer`
GROUP BY e.`id-lecturer`, e.`id-subject`) AS t
GROUP BY t.Lecturer_id
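The query above returns the maximum per lecturer but not the subject it belongs to. One common way to get the matching subject (not necessarily what this answer intended) is to join the per-pair counts back to the per-lecturer maximum. The sketch below tries this on an in-memory SQLite database from Python; the table counts and its columns are hypothetical and simply stand in for the output of the first query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the first query's output: (lecturer, subject, exams per pair).
conn.execute("CREATE TABLE counts (lecturer INTEGER, subject INTEGER, n INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        (1, 1, 4), (1, 3, 1), (1, 4, 1), (1, 5, 1),
        (2, 1, 2), (2, 2, 1), (2, 4, 1), (2, 6, 3),
    ],
)

# Join each row to its lecturer's maximum and keep only the rows that reach it.
best = conn.execute(
    """
    SELECT c.lecturer, c.subject, c.n
    FROM counts c
    JOIN (SELECT lecturer, MAX(n) AS max_n FROM counts GROUP BY lecturer) m
      ON m.lecturer = c.lecturer AND m.max_n = c.n
    ORDER BY c.lecturer, c.subject
    """
).fetchall()
for row in best:
    print(row)
```

If a lecturer has two subjects tied for the maximum, this returns both rows, so an extra tie-breaking rule would be needed when exactly one row per lecturer is required.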

Is there a way in SQL (MySQL) to do a "round robin" ORDER BY on a particular field?

Is there a way in SQL (MySQL) to do a "round robin" ORDER BY on a particular field?
As an example, I would like to take a table such as this one:
+-------+------+
| group | name |
+-------+------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | D |
| 2 | E |
| 2 | F |
| 3 | G |
| 3 | H |
| 3 | I |
+-------+------+
And run a query that produces results in this order:
+-------+------+
| group | name |
+-------+------+
| 1 | A |
| 2 | D |
| 3 | G |
| 1 | B |
| 2 | E |
| 3 | H |
| 1 | C |
| 2 | F |
| 3 | I |
+-------+------+
Note that the table may have many rows, so I can't do the ordering in the application. (I'd obviously have a LIMIT clause as well in the query).
I'd try something like:
SET @counter = 0;
SELECT (@counter:=@counter+1)%3 AS rr, grp, name FROM `table` ORDER BY rr, grp
What you can do is create a temporary column in which you create sets to give you something like this:
+-------+------+-----+
| group | name | tmp |
+-------+------+-----+
| 1 | A | 1 |
| 1 | B | 2 |
| 1 | C | 3 |
| 2 | D | 1 |
| 2 | E | 2 |
| 2 | F | 3 |
| 3 | G | 1 |
| 3 | H | 2 |
| 3 | I | 3 |
+-------+------+-----+
To learn how to create the sets, have a look at this question/answer.
Then it's a simple
ORDER BY tmp, `group`, name
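The idea of the tmp column can be sketched in Python: number each row within its group, then sort on that number before the group. This is only an illustration of the ordering logic, with a made-up function name round_robin; it is not a replacement for doing the ordering in the database.

```python
from collections import defaultdict

rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "D"), (2, "E"), (2, "F"),
    (3, "G"), (3, "H"), (3, "I"),
]


def round_robin(rows):
    """Order rows by their position within the group, then by group."""
    seen = defaultdict(int)
    ranked = []
    for group, name in sorted(rows):
        seen[group] += 1  # running position inside this group (the tmp column)
        ranked.append((seen[group], group, name))
    # Equivalent to ORDER BY tmp, group, name.
    return [(group, name) for _, group, name in sorted(ranked)]


ordered = round_robin(rows)
for group, name in ordered:
    print(group, name)
```

Because the position is counted per group, this ordering also works when the groups have different sizes, which a plain running counter modulo 3 would not guarantee.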
You can use MySQL variables to do this.
SELECT grp, name, @row:=@row+1 FROM `table`, (SELECT @row:=0) r ORDER BY (@row % 3);
+------+------+--------------+
| grp | name | @row:=@row+1 |
+------+------+--------------+
| 1 | A | 1 |
| 2 | D | 4 |
| 3 | G | 7 |
| 1 | B | 2 |
| 2 | E | 5 |
| 3 | H | 8 |
| 1 | C | 3 |
| 2 | F | 6 |
| 3 | I | 9 |
+------+------+--------------+