Randomize Primary Keys Based on Existing Values - mysql

I have an older database for which (for a rather questionable and obscure reason I'd rather not go into here) I want to randomize or shuffle the primary keys.
The tables in the MySQL database currently use auto-increment fields as primary keys.
There are not many relations, and those that exist are not defined as foreign keys. The relationships do not need to be preserved.
All I'm looking for is to take the current values of the primary keys and replace each with a random value drawn from that same set, like:
ID := new(ID)
where the new() function returns a value from the set of all old IDs as a 1:1 mapping. E.g.
2 := 3
3 := 2
But not
2 := 3
3 := 3
Is there a way to change the data in the database with (ideally) a single SQL query per table?
Edit: I do not have very strict requirements. Assume exclusive access to the database if it helps, including changing constraints on the primary key back and forth, e.g. altering the table, doing the operation, then altering the table back to the previous schema. It is also possible to add another column for the new (or old) PK value.
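For illustration, that extra-column variant could be sketched roughly like this (an untested sketch; new_id is a placeholder name, and it follows the user-variable row-numbering pattern used in the answers below):
ALTER TABLE tableX ADD COLUMN new_id INT NULL;

UPDATE tableX AS t
JOIN (
    SELECT o.id AS old_id, n.id AS new_id
    FROM ( SELECT id, @r1 := @r1 + 1 AS rn
           FROM (SELECT id FROM tableX ORDER BY id) AS s1, (SELECT @r1 := 0) AS v1 ) AS o
    JOIN ( SELECT id, @r2 := @r2 + 1 AS rn
           FROM (SELECT id FROM tableX ORDER BY RAND()) AS s2, (SELECT @r2 := 0) AS v2 ) AS n
      ON o.rn = n.rn
) AS m ON m.old_id = t.id
SET t.new_id = m.new_id ;

-- final swap (sketch only): remove AUTO_INCREMENT, drop the old PK and id column,
-- rename new_id to id, then re-add the primary key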

Just a sketch of the procedure. Create two helper tables:
CREATE TABLE temp_old
( ai INT NOT NULL AUTO_INCREMENT
, id INT NOT NULL
, PRIMARY KEY (ai)
, INDEX old_idx (id, ai)
) ENGINE = InnoDB ;
CREATE TABLE temp_new
( ai INT NOT NULL AUTO_INCREMENT
, id INT NOT NULL
, PRIMARY KEY (ai)
, INDEX new_idx (id, ai)
) ENGINE = InnoDB ;
Copy the id values in different orders to the two tables (randomly in the 2nd table):
INSERT INTO temp_old
(id)
SELECT id
FROM tableX
ORDER BY id ;
INSERT INTO temp_new
(id)
SELECT id
FROM tableX
ORDER BY RAND() ;
Then we drop the primary key:
ALTER TABLE tableX
DROP PRIMARY KEY ;
so that we can run the actual UPDATE statement:
UPDATE tableX AS t
JOIN temp_old AS o
ON o.id = t.id
JOIN temp_new AS n
ON n.ai = o.ai
SET t.id = n.id ;
Then recreate the primary key and drop the temp tables:
ALTER TABLE tableX
ADD PRIMARY KEY (id) ;
DROP TABLE temp_old ;
DROP TABLE temp_new ;
Tested in SQL-Fiddle

Here's a technique that creates a list of your ids in table order, along with a sequential number starting from 1; it also creates a list of your ids in random order, along with a sequential number starting from 1. It then updates the ids by matching on that sequential number.
There are known issues with the performance of ORDER BY RAND() (and with its randomness).
If your keys are already sequential starting from 1, you can simplify this (see the sketch after the query below).
Update
Test as t
Inner Join (
Select
@rownum2 := @rownum2 + 1 as rank,
t2.id
From
Test t2,
(Select @rownum2 := 0) a1
) as o on t.id = o.id
Inner Join (
Select
@rownum := @rownum + 1 as rank,
t3.id
From
(Select id from Test order by Rand()) t3,
(Select @rownum := 0) a2
) as n on o.rank = n.rank
Set
t.id = n.id
http://sqlfiddle.com/#!2/3f354/1
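If the ids really are a gapless 1..N sequence, the first numbered list is unnecessary and the randomly ordered list can be joined straight back on the id itself. A minimal sketch of that simplification, in the same style (untested):
Update
    Test as t
    Inner Join (
        Select
            @rownum := @rownum + 1 as rn,
            t2.id
        From
            (Select id from Test order by Rand()) t2,
            (Select @rownum := 0) a1
    ) as n on t.id = n.rn
Set
    t.id = n.id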

You could create a stored procedure that creates a temporary table containing all of the ids, then loops over each record, replacing its id with an id from the temp table and removing that id from the temp table. I don't believe there is a way to do what you are talking about in a single query, though.
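The answer gives no code, but one possible sketch of that loop (untested; it assumes a table tableX whose primary key id is a signed INT with positive values and no foreign keys pointing at it) might look like this:
DELIMITER //
CREATE PROCEDURE shuffle_ids()
BEGIN
    DECLARE v_left INT;
    DECLARE v_old INT;
    DECLARE v_new INT;

    -- working copy of the ids to walk through, and a pool to draw new ids from
    CREATE TEMPORARY TABLE todo AS SELECT id FROM tableX;
    CREATE TEMPORARY TABLE pool AS SELECT id FROM tableX;
    SELECT COUNT(*) INTO v_left FROM todo;

    WHILE v_left > 0 DO
        SELECT id INTO v_old FROM todo LIMIT 1;
        SELECT id INTO v_new FROM pool ORDER BY RAND() LIMIT 1;

        -- park the new value as a negative number so it cannot collide
        -- with a not-yet-renumbered row
        UPDATE tableX SET id = -v_new WHERE id = v_old;

        DELETE FROM todo WHERE id = v_old;
        DELETE FROM pool WHERE id = v_new;
        SET v_left = v_left - 1;
    END WHILE;

    -- flip the parked values back to positive
    UPDATE tableX SET id = -id WHERE id < 0;

    DROP TEMPORARY TABLE todo, pool;
END //
DELIMITER ;

CALL shuffle_ids();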

Related

Can I control which JOINed row gets used in an UPDATE?

Once upon a time, I had a table like this:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`AlarmId` INT UNSIGNED,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
AlarmId was permitted to be NULL.
Now, because I want to expand from zero-or-one alarm per event to zero-or-more alarms per event, in a software update I'm changing instances of my database to have this instead:
CREATE TABLE `Events` (
`EvtId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
-- Other fields omitted for brevity
PRIMARY KEY (`EvtId`)
);
CREATE TABLE `EventAlarms` (
`EvtId` INT UNSIGNED NOT NULL,
`AlarmId` INT UNSIGNED NOT NULL,
PRIMARY KEY (`EvtId`, `AlarmId`),
CONSTRAINT `fk_evt` FOREIGN KEY (`EvtId`) REFERENCES `Events` (`EvtId`)
ON DELETE CASCADE ON UPDATE CASCADE
);
So far so good.
The data is easy to migrate, too:
INSERT INTO `EventAlarms`
SELECT `EvtId`, `AlarmId` FROM `Events` WHERE `AlarmId` IS NOT NULL;
ALTER TABLE `Events` DROP COLUMN `AlarmId`;
Thing is, my system requires that a downgrade also be possible. I accept that downgrades will sometimes be lossy in terms of data, and that's okay. However, they do need to work where possible, and result in the older database structure while making a best effort to keep as much original data as is reasonably possible.
In this case, that means going from zero-or-more alarms per event, to zero-or-one alarm per event. I could do it like this:
ALTER TABLE `Events` ADD COLUMN `AlarmId` INT UNSIGNED;
UPDATE `Events`
LEFT JOIN `EventAlarms` USING(`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`;
DROP TABLE `EventAlarms`;
… which is kind of fine, since I don't really care which one gets kept (it's best-effort, remember). However, as warned, this is not good for replication as the result may be unpredictable:
> SHOW WARNINGS;
Unsafe statement written to the binary log using statement format since
BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-
increment column after selecting from another table are unsafe because the
order in which rows are retrieved determines what (if any) rows will be
written. This order cannot be predicted and may differ on master and the
slave.
Is there a way to somehow "order" or "limit" the join in the update, or shall I just skip this whole enterprise and stop trying to be clever? If the latter, how can I leave the downgraded AlarmId as NULL iff there were multiple rows in the new table between which we cannot safely distinguish? I do want to migrate the AlarmId if there is only one.
As a downgrade is a "one-time" maintenance operation, it doesn't have to be exactly real-time, but speed would be nice. Both tables could potentially have thousands of rows.
(MariaDB 5.5.56 on CentOS 7, but must also work on whatever ships with CentOS 6.)
First, we can perform a bit of analysis, with a self-join:
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `N`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON (`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`
The result will look something like this:
EvtId N
--------------
370 1
371 1
372 4
379 1
380 1
382 16
383 1
384 1
Now you can, if you like, drop all the rows representing events that map to more than one alarm (which you suggest as a fallback solution; I think this makes sense, though you could modify the below to leave one of them in place if you really wanted).
Instead of actually DELETEing anything, though, it's easier to introduce a new table, populated using the self-joining query shown above:
CREATE TEMPORARY TABLE `_migrate` (
`EvtId` INT UNSIGNED,
`n` INT UNSIGNED,
PRIMARY KEY (`EvtId`),
KEY `idx_n` (`n`)
);
INSERT INTO `_migrate`
SELECT `A`.`EvtId`, COUNT(`B`.`EvtId`) AS `n`
FROM `EventAlarms` AS `A`
LEFT JOIN `EventAlarms` AS `B` ON(`A`.`EvtId` = `B`.`EvtId`)
GROUP BY `B`.`EvtId`;
Then your update becomes:
UPDATE `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
SET `Events`.`AlarmId` = `EventAlarms`.`AlarmId`
WHERE `EventAlarms`.`AlarmId` IS NOT NULL
And, finally, clean up after yourself:
DROP TABLE `_migrate`;
DROP TABLE `EventAlarms`;
MySQL still kicks out the same warning as before, but since we know that at most one value will be pulled from the source tables, we can basically just ignore it.
It should even be reasonably efficient, as we can tell from the equivalent EXPLAIN SELECT:
EXPLAIN SELECT `Events`.`EvtId` FROM `Events`
LEFT JOIN `_migrate` ON (`Events`.`EvtId` = `_migrate`.`EvtId` AND `_migrate`.`n` = 1)
LEFT JOIN `EventAlarms` ON (`_migrate`.`EvtId` = `EventAlarms`.`EvtId`)
WHERE `EventAlarms`.`AlarmId` IS NOT NULL
id select_type table type possible_keys key key_len ref rows Extra
---------------------------------------------------------------------------------------------------------------------
1 SIMPLE _migrate ref PRIMARY,idx_n idx_n 5 const 6 Using index
1 SIMPLE EventAlarms ref PRIMARY,fk_AlarmId PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
1 SIMPLE Events eq_ref PRIMARY PRIMARY 8 db._migrate.EvtId 1 Using where; Using index
Use a subquery and user variables to select just one row per EvtId from EventAlarms.
In your UPDATE, instead of joining EventAlarms directly, use:
( SELECT `EvtId`, `AlarmId`
  FROM ( SELECT `EvtId`, `AlarmId`,
                @rn := IF( @EvtId = `EvtId`,
                           @rn + 1,
                           IF( @EvtId := `EvtId`, 1, 1 )
                         ) AS rn
         FROM `EventAlarms`
         CROSS JOIN ( SELECT @EvtId := 0, @rn := 0 ) AS vars
         ORDER BY `EvtId`, `AlarmId`
       ) AS t
  WHERE rn = 1
) AS SingleEventAlarms
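Assembled into the downgrade UPDATE from the question, that would look roughly like this (untested, with the same caveats about user-variable evaluation order):
UPDATE `Events`
LEFT JOIN ( SELECT `EvtId`, `AlarmId`
            FROM ( SELECT `EvtId`, `AlarmId`,
                          @rn := IF( @EvtId = `EvtId`,
                                     @rn + 1,
                                     IF( @EvtId := `EvtId`, 1, 1 )
                                   ) AS rn
                   FROM `EventAlarms`
                   CROSS JOIN ( SELECT @EvtId := 0, @rn := 0 ) AS vars
                   ORDER BY `EvtId`, `AlarmId`
                 ) AS t
            WHERE rn = 1
          ) AS SingleEventAlarms USING (`EvtId`)
SET `Events`.`AlarmId` = `SingleEventAlarms`.`AlarmId`;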

Delete Duplicates from large mysql Address DB

I know deleting duplicates from MySQL is often discussed here, but none of the solutions works in my case.
So, I have a DB with address data roughly like this:
ID; Anrede; Vorname; Nachname; Strasse; Hausnummer; PLZ; Ort; Nummer_Art; Vorwahl; Rufnummer
ID is the primary key and unique.
And I have entries like this, for example:
1;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Mobile;012345;67890
2;Herr;Michael;Müller;Testweg;1;55555;Testhausen;Fixed;045678;877656
The different phone numbers are not the problem, because they are not relevant for me. So I just want to delete the duplicates based on last name, street and zip code; in this case that means ID 1 or ID 2, and which of the two doesn't matter.
I actually tried it like this with DELETE:
DELETE db
FROM Import_Daten db,
Import_Daten dbl
WHERE db.id > dbl.id AND
db.Lastname = dbl.Lastname AND
db.Strasse = dbl.Strasse AND
db.PLZ = dbl.PLZ;
And insert into a copy table:
INSERT INTO Import_Daten_1
SELECT MIN(db.id),
db.Anrede,
db.Firstname,
db.Lastname,
db.Branche,
db.Strasse,
db.Hausnummer,
db.Ortsteil,
db.Land,
db.PLZ,
db.Ort,
db.Kontaktart,
db.Vorwahl,
db.Durchwahl
FROM Import_Daten db,
Import_Daten dbl
WHERE db.lastname = dbl.lastname AND
db.Strasse = dbl.Strasse And
db.PLZ = dbl.PLZ;
The complete table contains over 10 million rows; the size is actually my problem. MySQL runs on a MAMP server on a MacBook with 1.5 GHz and 4 GB RAM, so it is not really fast. SQL statements are run in phpMyAdmin; I have no other system options at the moment.
You can write a stored procedure that each time selects a different chunk of data (for example by row number or id range between two values) and deletes only from that range. This way you will slowly, bit by bit, delete your duplicates.
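The answer doesn't include code; a chunked delete could be sketched like this (untested; the 10,000-row chunk size is arbitrary and the column names are the ones used in the question's DELETE):
DELIMITER //
CREATE PROCEDURE delete_duplicates_chunked()
BEGIN
    DECLARE v_from INT DEFAULT 0;
    DECLARE v_max INT;
    SELECT MAX(id) INTO v_max FROM Import_Daten;

    WHILE v_from <= v_max DO
        -- delete the higher-id duplicates whose id falls in the current chunk
        DELETE db
        FROM Import_Daten db
        JOIN Import_Daten dbl
          ON  db.Lastname = dbl.Lastname
          AND db.Strasse  = dbl.Strasse
          AND db.PLZ      = dbl.PLZ
          AND db.id       > dbl.id
        WHERE db.id BETWEEN v_from AND v_from + 9999;

        SET v_from = v_from + 10000;
    END WHILE;
END //
DELIMITER ;

CALL delete_duplicates_chunked();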
A more effective two-table solution can look like the following.
We store only the data we really need for the delete, and only the fields that contain the duplicate information.
Let's assume we are looking for duplicate data in the Lastname, Branche and Hausnummer fields.
Create a table to hold the duplicate data and populate it with the rows we need to act on (I assume all fields have the VARCHAR(255) type). Drop any leftover copy from a previous run first:
DROP TABLE IF EXISTS data_to_delete;
CREATE TABLE data_to_delete (
id BIGINT COMMENT 'this field will contain ID of row that we will not delete',
cnt INT,
Lastname VARCHAR(255),
Branche VARCHAR(255),
Hausnummer VARCHAR(255)
) AS SELECT
min(t1.id) AS id,
count(*) AS cnt,
t1.Lastname,
t1.Branche,
t1.Hausnummer
FROM Import_Daten AS t1
GROUP BY t1.Lastname, t1.Branche, t1.Hausnummer
HAVING count(*)>1 ;
Now let's delete the duplicate data, leaving only one record of each duplicate set:
DELETE Import_Daten
FROM Import_Daten LEFT JOIN data_to_delete
ON Import_Daten.Lastname=data_to_delete.Lastname
AND Import_Daten.Branche=data_to_delete.Branche
AND Import_Daten.Hausnummer = data_to_delete.Hausnummer
WHERE Import_Daten.id != data_to_delete.id;
DROP TABLE data_to_delete;
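Given the 10-million-row table, it may also be worth creating a composite index on the three comparison columns before running the join above; this is an assumption on my part, not part of the original answer:
CREATE INDEX idx_dup_check ON Import_Daten (Lastname, Branche, Hausnummer);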
You can add a new column e.g. uq and make it UNIQUE.
ALTER TABLE Import_Daten
ADD COLUMN `uq` BINARY(16) NULL,
ADD UNIQUE INDEX `uq_UNIQUE` (`uq` ASC);
When this is done you can execute an UPDATE query like this
UPDATE IGNORE Import_Daten
SET
uq = UNHEX(
MD5(
CONCAT(
Import_Daten.Lastname,
Import_Daten.Street,
Import_Daten.Zipcode
)
)
)
WHERE
uq IS NULL;
Once all entries are updated and the query is executed again, all remaining duplicates will still have uq = NULL and can be removed.
The result then is:
0 row(s) affected, 1 warning(s): 1062 Duplicate entry...
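At that point the remaining NULL rows are exactly the duplicates, so the cleanup could be as simple as this (a sketch; run it only once the UPDATE no longer affects any rows):
DELETE FROM Import_Daten WHERE uq IS NULL;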
For newly added rows, always create the uq hash, and consider using it as the primary key once all entries are unique.

Database structure to store the User 's preference of layout order

There are a few options in the webpage layout
Latest news
Recommend news
Followed news
History news
Most Viewed news
The user can select the order of the layout, e.g. they can move the most-viewed news to the top.
So I am considering how to store this choice in a table in a way that is convenient for development.
The number of choices is fixed: only these 5.
The user will frequently update the order.
I was thinking of:
create a user_choice table
user_id
latest (nullable , integer)
recommend (nullable , integer)
follow (nullable , integer)
history (nullable , integer)
most_view (nullable , integer)
So, whenever a user registers, create a record in the table, and whenever they update their preference, change the row. This approach seems feasible, but it is not very straightforward to re-order the layout in the program.
So, are there any better structure ideas?
Thanks for helping
I would create a table layout_order with user_id, layout_id and order_id; this way it is easy to add more layouts without needing to add more columns to your table.
When you create a new user, you assign a default order.
user_id layout_id order_id
1 1 1
1 2 2
1 3 3
1 4 4
1 5 5
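For illustration, the table behind that sample data might be defined like this, with the INSERT seeding the default order for a new user (the names are assumptions based on the answer):
CREATE TABLE layout_order (
    user_id   INT NOT NULL,
    layout_id INT NOT NULL,
    order_id  INT NOT NULL,
    PRIMARY KEY (user_id, layout_id)
);

-- default order for a freshly registered user (here: user 1)
INSERT INTO layout_order (user_id, layout_id, order_id)
VALUES (1, 1, 1), (1, 2, 2), (1, 3, 3), (1, 4, 4), (1, 5, 5);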
Here is an example of the UPDATE. You need @user_id, @layout_id and @order_id for that layout.
Here I use a variable to create a new rank with a special ORDER BY.
SqlFiddle Demo: you can check what just the SELECT inside the JOIN returns.
SET @layout_id = 5;
SET @order_id = 2;
SET @user_id = 1;
UPDATE layout_order L
JOIN (SELECT l.*, @rownumber := @rownumber + 1 AS rank
      FROM layout_order l
      CROSS JOIN (select @rownumber := 0) r
      WHERE user_id = @user_id
      ORDER BY CASE
                 WHEN layout_id = @layout_id THEN @order_id -- here is the variable
                 WHEN order_id < @order_id THEN order_id    -- order doesn't change
                 WHEN order_id >= @order_id THEN order_id + 1
               END
     ) t
  ON L.user_id = t.user_id
 AND L.layout_id = t.layout_id
SET L.order_id = t.rank;
I would move in this direction:
create table user
( -- your pre-existing user table, this is a stub
id int auto_increment primary key
-- the rest
);
create table sortChoices
( -- codes and descriptions of them
code int auto_increment primary key,
description varchar(100) not null
);
create table user_sortChoices_Junction
( -- intersect or junction table for user / sortChoices
userId int not null,
code int not null,
theOrder int not null,
primary key (userId,code), -- prevents dupes
constraint `fk_2user` foreign key (userId) references user(id),
constraint `fk_2sc` foreign key (code) references sortChoices(code)
);
The junction table drives it, it is flexible, and those who come along later don't lock themselves into the thinking that 'there will only ever be 5'.
Plus, there is the data normalization issue for those who prefer CSV values. Here is a write-up I did on that, which ties in with junction tables.
So, this is as much for those who come later as it is for the OP's question.
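Reading a user's layout back out of the junction table is then a simple join, something like this sketch; adding a new layout section later only requires a new row in sortChoices:
select sc.code, sc.description, j.theOrder
from user_sortChoices_Junction j
join sortChoices sc on sc.code = j.code
where j.userId = 1        -- the user being rendered
order by j.theOrder;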

Update multiple rows in table with values from a temporary table

I'm trying to write a database migration script to add a column to a table that contains existing data, and then populate that column with appropriate data.
I'm doing the migration in a few steps. I've created a temporary table that contains a single column with ids like this:
new_column
==========
1000
1001
1002
1003
...
I now want to update my existing table so that each row in the temporary table above is used to update each row in my existing table. The existing table looks like this:
old_column_1 | old_column_2 | new_column
========================================
1 | 100 | null
2 | 101 | null
3 | 102 | null
...
I've tried a few variations of this sort of update -
select min(t.new_column)
from temp t
where t.new_column not in (select new_column from existing_table);
But I can't seem to get the syntax right...
Your problem is more complicated than you think. There's nothing reliable to join on. So, either you write a stored procedure which uses a cursor to loop through both tables, updating the existing table row by row (which can quickly become a performance nightmare, so I wouldn't recommend it), or you use this slightly complicated query:
CREATE TABLE temp
(id int auto_increment primary key, `new_column` int)
;
INSERT INTO temp
(`new_column`)
VALUES
(1000),
(1001),
(1002),
(1003)
;
CREATE TABLE existing
(`old_column_1` int, `old_column_2` int, `new_column` varchar(4))
;
INSERT INTO existing
(`old_column_1`, `old_column_2`, `new_column`)
VALUES
(1, 100, NULL),
(2, 101, NULL),
(3, 102, NULL)
;
update
existing e
inner join (
select * from (
select
t.*
from temp t
)t
inner join
(
select
e.old_column_1, e.old_column_2,
@rownum := @rownum + 1 as rn
from existing e
, (select @rownum := 0) vars
)e on t.id = e.rn
) sq on sq.old_column_1 = e.old_column_1 and sq.old_column_2 = e.old_column_2
set e.new_column = sq.new_column;
see it working live in an sqlfiddle
I added an auto_increment column to your temporary table. Either you do it this way, or you simulate a row number like I did here:
select
e.old_column_1, e.old_column_2,
@rownum := @rownum + 1 as rn
from existing e
, (select @rownum := 0) vars
If you want to influence which row gets which row number, you can use ORDER BY whatever_column ASC|DESC in there.
So, what the query basically does is create a row number in your existing table and join on this column and the auto_increment column in the temporary table. Then I join this subquery back to the existing table, so that we can easily copy the column from the temporary table to the existing table.
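For example, to assign the row numbers by old_column_2 in descending order rather than by physical table order, the row-numbering part could be wrapped the way other answers in this thread do it (a sketch; old_column_2 is just an example column):
select
  e.old_column_1, e.old_column_2,
  @rownum := @rownum + 1 as rn
from (select * from existing order by old_column_2 desc) e
, (select @rownum := 0) vars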

Is cross-table indexing possible?

Consider a structure where you have a many-to-one (or one-to-many) relationship with a condition (where, order by, etc.) on both tables. For example:
CREATE TABLE tableTwo (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
eventTime DATETIME NOT NULL,
INDEX (eventTime)
) ENGINE=InnoDB;
CREATE TABLE tableOne (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
tableTwoId INT UNSIGNED NOT NULL,
objectId INT UNSIGNED NOT NULL,
INDEX (objectID),
FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;
and for an example query:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where objectId = '..'
order by eventTime;
Let's say you index tableOne.objectId and tableTwo.eventTime. If you then explain on the above query, it will show "Using filesort". Essentially, it first applies the tableOne.objectId index, but it can't apply the tableTwo.eventTime index because that index is for the entirety of tableTwo (not the limited result set), and thus it must do a manual sort.
Thus, is there a way to do a cross-table index so it wouldn't have to filesort each time results are retrieved? Something like:
create index ind_t1oi_t2et on tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
(t1.objectId, t2.eventTime);
Also, I've looked into creating a view and indexing that, but indexing is not supported for views.
The solution I've been leaning towards if cross-table indexing isn't possible is replicating the conditional data in one table. In this case that means eventTime would be replicated in tableOne and a multi-column index would be set up on tableOne.objectId and tableOne.eventTime (essentially manually creating the index). However, I thought I'd seek out other people's experience first to see if that was the best way.
Thanks much!
Update:
Here are some procedures for loading test data and comparing results:
drop procedure if exists populate_table_two;
delimiter #
create procedure populate_table_two(IN numRows int)
begin
declare v_counter int unsigned default 0;
while v_counter < numRows do
insert into tableTwo (eventTime)
values (CURRENT_TIMESTAMP - interval 0 + floor(0 + rand()*1000) minute);
set v_counter=v_counter+1;
end while;
end #
delimiter ;
drop procedure if exists populate_table_one;
delimiter #
create procedure populate_table_one
(IN numRows int, IN maxTableTwoId int, IN maxObjectId int)
begin
declare v_counter int unsigned default 0;
while v_counter < numRows do
insert into tableOne (tableTwoId, objectId)
values (floor(1 +(rand() * maxTableTwoId)),
floor(1 +(rand() * maxObjectId)));
set v_counter=v_counter+1;
end while;
end #
delimiter ;
You can use these as follows to populate 10,000 rows in tableTwo and 20,000 rows in tableOne (with random references to tableTwo and random objectIds between 1 and 5), which took 26.2 and 70.77 seconds respectively to run for me:
call populate_table_two(10000);
call populate_table_one(20000, 10000, 5);
Update 2 (Tested Triggering SQL):
Below is the tried and tested SQL based on daniHp's triggering method. This keeps the dateTime in sync in tableOne when tableOne rows are added or tableTwo is updated. This method should also work for many-to-many relationships if the condition columns are copied to the joining table. In my testing with 300,000 rows in tableOne and 200,000 rows in tableTwo, the old query with similar limits took 0.12 sec while the new query still shows 0.00 seconds. So there is a clear improvement, and this method should perform well into the millions of rows and beyond.
alter table tableOne add column tableTwo_eventTime datetime;
create index ind_t1_oid_t2et on tableOne (objectId, tableTwo_eventTime);
drop TRIGGER if exists t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER t1_copy_t2_eventTime
BEFORE INSERT ON tableOne
for each row
begin
set NEW.tableTwo_eventTime = (select eventTime
from tableTwo t2
where t2.id = NEW.tableTwoId);
end #
delimiter ;
drop TRIGGER if exists upd_t1_copy_t2_eventTime;
delimiter #
CREATE TRIGGER upd_t1_copy_t2_eventTime
BEFORE UPDATE ON tableTwo
for each row
begin
update tableOne
set tableTwo_eventTime = NEW.eventTime
where tableTwoId = NEW.id;
end #
delimiter ;
And the updated query:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where t1.objectId = 1
order by t1.tableTwo_eventTime desc limit 0,10;
As you know, SQLServer achieves this with indexed views:
indexed views provide additional performance benefits that cannot be
achieved using standard indexes. Indexed views can increase query
performance in the following ways:
Aggregations can be precomputed and stored in the index to minimize
expensive computations during query execution.
Tables can be prejoined and the resulting data set stored.
Combinations of joins or aggregations can be stored.
In SQLServer, to take advantage of this technique, you must query over the view and not over the tables. That means that you should know about the view and indexes.
MySQL does not have indexed views, but you can simulate the behavior with table + triggers + indexes.
Instead of creating a view, you must create an indexed table, a trigger to keep the data table up to date, and then you must query your new table instead of your normalized tables.
You must evaluate if the overhead of write operations offsets the improvement in read operations.
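Sketched concretely (the table and trigger names here are invented, and the UPDATE/DELETE triggers are omitted for brevity), that simulation could look like a summary table that pre-joins the two tables and carries the index the hot query needs, kept in sync by an insert trigger:
CREATE TABLE summary_one_two (
  tableOneId INT UNSIGNED PRIMARY KEY,
  objectId   INT UNSIGNED NOT NULL,
  eventTime  DATETIME NOT NULL,
  INDEX ind_obj_time (objectId, eventTime)
) ENGINE=InnoDB;

delimiter #
CREATE TRIGGER summary_one_two_ins
AFTER INSERT ON tableOne
for each row
begin
  INSERT INTO summary_one_two (tableOneId, objectId, eventTime)
  SELECT NEW.id, NEW.objectId, t2.eventTime
  FROM tableTwo t2
  WHERE t2.id = NEW.tableTwoId;
end #
delimiter ;

-- the hot query then reads the summary table instead of the join:
-- select * from summary_one_two where objectId = 1 order by eventTime;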
Edited:
Note that it is not always necessary to create a new table. For example, in a 1:N (master-detail) relationship, a trigger can keep a copy of a field from the 'master' table in the 'detail' table. In your case:
CREATE TABLE tableOne (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
tableTwoId INT UNSIGNED NOT NULL,
objectId INT UNSIGNED NOT NULL,
desnormalized_eventTime DATETIME NOT NULL,
INDEX (objectID),
FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id)
) ENGINE=InnoDB;
CREATE TRIGGER tableOne_desnormalized_eventTime
BEFORE INSERT ON tableOne
for each row
begin
  -- copy the eventTime of the referenced tableTwo row into the new tableOne row
  SET NEW.desnormalized_eventTime =
      (select eventTime
       from tableTwo
       where tableTwo.id = NEW.tableTwoId);
end;
Notice that this is a before insert trigger.
Now, the query is rewritten as follows:
select * from tableOne t1
inner join tableTwo t2 on t1.tableTwoId = t2.id
where t1.objectId = '..'
order by t1.desnormalized_eventTime;
Disclaimer: not tested.
Cross-table indexing is not possible in MySQL except via the now-defunct Akiban(?) Engine.
I have a rule: "Do not normalize 'continuous' values such as INTs, FLOATs, DATETIMEs, etc." The cost of the JOIN when you need to sort or range-test on the continuous value will kill performance.
DATETIME takes 5 bytes; INT takes 4, so any 'space' argument toward normalizing a datetime is rather weak. It is rare that you would need to 'normalize' a datetime on the off chance that all uses of a particular value were to change.
Maybe I'm wrong, but if this were my application I would not duplicate the data unless I needed to order by two columns in two different tables and this were a hot query (i.e. it runs many times). But since there is no clear-cut solution to avoid the filesort, what about this little trick (forcing the optimizer to use the index on the ORDER BY column eventTime)?
select * from tableOne t1
inner join tableTwo t2 use index (eventTime) on t1.tableTwoId = t2.id and t2.eventTime > 0
where t1.objectId = 1
order by t2.eventTime desc limit 0,10;
Notice the use index (eventTime) hint and the t2.eventTime > 0 condition.
Its EXPLAIN shows that the optimizer has used the index on eventTime instead of a filesort:
1 SIMPLE t2 range eventTime eventTime 5 5000 Using where; Using index
1 SIMPLE t1 ref objectId,tableTwoId tableTwoId 4 tests.t2.id 1 Using where