How to optimise change history data for MySQL

The previous table this data was stored in approached 3-4 GB, but the data wasn't compressed before/after storage. I'm not a DBA, so I'm a little out of my depth on a good strategy.
The table is to log changes to a particular model in my application (user profiles), but with one tricky requirement: we should be able to fetch the state of a profile at any given date.
Data (single table):
id, username, email, first_name, last_name, website, avatar_url, address, city, zip, phone
The only two requirements:
be able to fetch a list of changes for a given model
be able to fetch state of model on a given date
Previously, all of the profile data was stored for a single change, even if only one column was changed. But to get a 'snapshot' for a particular date was easy enough.
My first couple of solutions in optimising the data structure:
(1) only store changed columns. This would drastically reduce the data stored, but would make it quite complicated to get a snapshot. I'd have to merge all changes up to a given date (could be thousands), then apply them to a model. But that model couldn't be a fresh model, since only changed data is stored. To do this, I'd first have to copy over all data from the current profiles table, then, to get a snapshot, apply the changes to those base models.
(2) store whole of data, but convert to a compressed format like gzip or binary or whatnot. This would remove ability to query the data other than to obtain changes. I couldn't, for example, fetch all changes where email = ''. I would essentially have a single column with converted data, storing the whole of the profile.
Then, I would want to use relevant MySQL table options, like ARCHIVE, to further reduce space.
So my question is, are there any other options which you feel are a better approach than 1/2 above, and, if not, which would be better?

First of all, I wouldn't worry at all about a 3 GB table (unless it grew to this size in a very short period of time). MySQL can take it. Space shouldn't be a concern; keep in mind that a 500 GB hard disk costs about 4 man-hours (in my country).
That being said, in order to lower your storage requirements, create one table for each field of the table you want to monitor. Assuming a profile table like this:
CREATE TABLE profile (
    profile_id INT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(50) -- and so on
);
... create two history tables:
CREATE TABLE profile_history_username (
    profile_id INT NOT NULL,
    username VARCHAR(50) NOT NULL, -- same type as profile.username
    changedAt DATETIME NOT NULL,
    PRIMARY KEY (profile_id, changedAt),
    CONSTRAINT profile_history_username_fk
        FOREIGN KEY profile_history_username_fkx (profile_id)
        REFERENCES profile(profile_id)
);
CREATE TABLE profile_history_email (
    profile_id INT NOT NULL,
    email VARCHAR(50) NOT NULL, -- same type as profile.email
    changedAt DATETIME NOT NULL,
    PRIMARY KEY (profile_id, changedAt),
    CONSTRAINT profile_history_email_fk
        FOREIGN KEY profile_history_email_fkx (profile_id)
        REFERENCES profile(profile_id)
);
Every time you change one or more fields in profile, log the change in each relevant history table:
START TRANSACTION;
-- lock the relevant rows
SELECT @now := NOW()
FROM profile
JOIN profile_history_email USING (profile_id)
WHERE profile_id = [a profile_id]
FOR UPDATE;
-- update main table, log change
UPDATE profile SET email = [new email] WHERE profile_id = [a profile_id];
INSERT INTO profile_history_email VALUES ([a profile_id], [new email], @now);
COMMIT;
You may also want to set appropriate AFTER triggers on profile so as to populate the history tables automatically.
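For example, a minimal sketch of such a trigger for the email field, assuming the tables above (the NULL-safe <=> comparison avoids logging no-op updates; DELIMITER is mysql client syntax):
DELIMITER $$
CREATE TRIGGER profile_au_email
AFTER UPDATE ON profile
FOR EACH ROW
BEGIN
    -- log only when the value actually changed
    IF NOT (NEW.email <=> OLD.email) THEN
        INSERT INTO profile_history_email (profile_id, email, changedAt)
        VALUES (NEW.profile_id, NEW.email, NOW());
    END IF;
END$$
DELIMITER ;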
Retrieving history information should be straightforward. In order to get the state of a profile at a given point in time, use this query:
SELECT
(
SELECT username FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_username
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS username,
(
SELECT email FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt = (
SELECT MAX(changedAt) FROM profile_history_email
WHERE profile_id = [a profile_id] AND changedAt <= [snapshot date]
)
) AS email;
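To cover the other requirement - listing the changes for a given model - a plain scan of each history table works; for example, for email:
SELECT changedAt, email
FROM profile_history_email
WHERE profile_id = [a profile_id]
ORDER BY changedAt;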

You can't compress the data without having to uncompress it in order to search it - which is going to severely damage the performance. If the data really is changing that often (i.e. more than an average of 20 times per record) then it would be more efficient for storage and retrieval to structure it as a series of changes:
Consider:
CREATE TABLE profile (
    id INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
);
CREATE TABLE profile_data (
    profile_id INT NOT NULL,
    attr ENUM('username', 'email', 'first_name'
        , 'last_name', 'website', 'avatar_url'
        , 'address', 'city', 'zip', 'phone') NOT NULL,
    value VARCHAR(255),
    starttime DATETIME DEFAULT CURRENT_TIMESTAMP,
    endtime DATETIME,
    PRIMARY KEY (profile_id, attr, starttime),
    INDEX (profile_id),
    FOREIGN KEY (profile_id) REFERENCES profile(id)
);
When you add a new value for an existing record, set the endtime on the superseded record.
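In SQL, that close-out plus insert might look like this sketch (42, 'email', and the address are example values):
-- close the currently open row for this attribute
UPDATE profile_data
SET endtime = NOW()
WHERE profile_id = 42 AND attr = 'email' AND endtime IS NULL;
-- then insert the new value as the open row
INSERT INTO profile_data (profile_id, attr, value, starttime)
VALUES (42, 'email', 'new@example.com', NOW());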
Then to get the value at a date $T:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
    ON p.id = d.profile_id
WHERE $T >= starttime
    AND $T <= IF(endtime IS NULL, $T, endtime);
Alternatively, just have a start time, and:
SELECT p.id, attr, value
FROM profile p
INNER JOIN profile_data d
    ON p.id = d.profile_id
WHERE $T >= starttime
    AND NOT EXISTS (SELECT 1
        FROM profile_data d2
        WHERE d2.profile_id = d.profile_id
            AND d2.attr = d.attr
            AND d2.starttime > d.starttime
            AND d2.starttime <= $T);
(which can be made faster still with the max-concat trick, sketched below).
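A sketch of that trick: because a DATETIME concatenates to a fixed 19-character string, the newest value per (profile_id, attr) can be peeled off a single MAX() (this assumes values are not NULL):
SELECT profile_id, attr,
    SUBSTRING(MAX(CONCAT(starttime, '|', value)), 21) AS value
FROM profile_data
WHERE starttime <= '2013-07-01' -- $T
GROUP BY profile_id, attr;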
But if the data is not changing with that frequency then keep it in the current structure.

You need a slowly changing dimension:
I will do this only for e-mail and telephone so you get the idea. Pay attention to the fact that I use two keys: one unique within the table, and one unique to the user it concerns. That is, the table key identifies the record, and the user key identifies the user:
table_id, user_id, email, telephone, created_at, inactive_at, is_current
1, 1, mario@yahoo.it, 123456, 2012-01-02, 2013-04-01, no
2, 2, erik@telecom.de, 123457, 2012-01-03, 2013-02-28, no
3, 3, vanessa@o2.de, 1234568, 2012-01-03, null, yes
4, 2, erik@telecom.de, 123459, 2013-02-28, null, yes
5, 1, super.mario@yahoo.it, 654321, 2013-04-01, 2013-04-02, no
6, 1, super.mario@yahoo.it, 123456, 2013-04-02, null, yes
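The DDL behind that sample data might look like this (a sketch; types and lengths are assumptions):
CREATE TABLE FooTable (
    table_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    email VARCHAR(100) NOT NULL,
    telephone VARCHAR(20) NOT NULL,
    created_at DATE NOT NULL,
    inactive_at DATE NULL,
    is_current ENUM('yes', 'no') NOT NULL DEFAULT 'yes'
);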
most recent state of the database
select * from FooTable where inactive_at is null
or
select * from FooTable where is_current = 'yes'
All changes to mario (mario is user_id 1)
select * from FooTable where user_id = 1;
All changes between 1 Jan 2013 and 1 May 2013
select * from FooTable where created_at between '2013-01-01' and '2013-05-01';
and if you need to compare with the old versions (with the help of a stored procedure, Java, or PHP code... you choose):
select * from FooTable where inactive_at between '2013-01-01' and '2013-05-01';
if you want, you can write a fancier SQL statement:
select f1.table_id, f1.user_id,
    case when f1.email = f2.email then 'NO_CHANGE' else concat(f1.email, ' -> ', f2.email) end,
    case when f1.telephone = f2.telephone then 'NO_CHANGE' else concat(f1.telephone, ' -> ', f2.telephone) end
from FooTable f1 inner join FooTable f2
    on (f1.user_id = f2.user_id)
where f2.created_at in
    (select max(f3.created_at) from FooTable f3
     where f3.user_id = f1.user_id and f3.created_at < f1.created_at)
and f1.created_at between '2013-01-01' and '2013-05-01';
As you can see, a juicy query, comparing each user row with the previous row for the same user...
the state of the database on 2013-03-01
select * from FooTable
where created_at <= '2013-03-01'
and (inactive_at is null or inactive_at > '2013-03-01');
I think this is the easiest way of implementing what you want... you could build a fully normalised multi-table relational model, but then it would be a pain in the arse to query it.
Your database is not big enough to worry about; I work every day with one even bigger. Now tell me: is the money you save on a new server worth the time you spend on a super-complex relational model?
BTW if the data changes too fast, this approach cannot be used...
BONUS - optimization:
create indexes on created_at, inactive_at, user_id, and the pair - see the sketch after this list
perform partitioning (both horizontal and vertical)
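A sketch of those indexes, reading "the pair" as (user_id, created_at) - an assumption, as are the index names:
CREATE INDEX foo_created ON FooTable (created_at);
CREATE INDEX foo_inactive ON FooTable (inactive_at);
CREATE INDEX foo_user_created ON FooTable (user_id, created_at);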

If you put all occurring changes in different tables, then when you later require an instance on some date, you can join them and compare dates. For example, if you want the instance at 1st of July, run a query with the condition "date equal to or less than 1st of July", order it descending, and limit the count to 1; that way the joins will produce exactly the instance as it was at 1st of July. In this manner you can even figure out the most frequently updated module.
Also, if you want to keep all the data flat, try range partitioning on the basis of month (see the sketch below); that way MySQL will handle it pretty easily.
Note: by date I mean storing the unix timestamp of the date - it's much easier to compare.
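A sketch of month-based range partitioning on a timestamp (the change_log table and its columns are hypothetical; MySQL requires the partition key to be part of every unique key, hence the composite primary key):
CREATE TABLE change_log (
    id BIGINT NOT NULL AUTO_INCREMENT,
    profile_id INT NOT NULL,
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, changed_at)
)
PARTITION BY RANGE (UNIX_TIMESTAMP(changed_at)) (
    PARTITION p201306 VALUES LESS THAN (UNIX_TIMESTAMP('2013-07-01 00:00:00')),
    PARTITION p201307 VALUES LESS THAN (UNIX_TIMESTAMP('2013-08-01 00:00:00')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);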

I'll offer one more solution just for variety.
Schema
CREATE TABLE PROFILE (
    id INT PRIMARY KEY,
    username VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE PROFILE_ATTRIBUTE (
    id INT AUTO_INCREMENT PRIMARY KEY,
    profile_id INT NOT NULL,
    attribute_name VARCHAR(50) NOT NULL,
    attribute_value VARCHAR(255) NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    replaced_at DATETIME NULL,
    FOREIGN KEY (profile_id) REFERENCES PROFILE (id)
);
For all attributes you are tracking, simply add PROFILE_ATTRIBUTE records when they are updated, and mark the previous attribute record with the DATETIME it was replaced at.
Select Current Profile
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.replaced_at IS NULL
Select Profile At Date
SELECT *
FROM PROFILE p
LEFT JOIN PROFILE_ATTRIBUTE pa
ON p.id = pa.profile_id
WHERE p.username = 'username'
AND pa.created_at < '2013-07-01'
AND '2013-07-01' <= IFNULL(pa.replaced_at, NOW())
When Updating Attributes
Insert the new attribute
Update the previous attribute's replaced_at value
It would probably be important that the created_at for a new attribute match the replaced_at for the corresponding old attribute. This would be so that there is an unbroken timeline of attribute values for a given attribute name.
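In SQL, that update might look like the following sketch (profile 1, 'city', and 'Berlin' are example values; a single @now keeps the two timestamps identical so the timeline stays unbroken):
-- capture one timestamp for both statements
SET @now = NOW();
-- close out the old value
UPDATE PROFILE_ATTRIBUTE
SET replaced_at = @now
WHERE profile_id = 1 AND attribute_name = 'city' AND replaced_at IS NULL;
-- insert the new value
INSERT INTO PROFILE_ATTRIBUTE (profile_id, attribute_name, attribute_value, created_at)
VALUES (1, 'city', 'Berlin', @now);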
Advantages
Simple two-table architecture (I personally don't like a table-per-field approach)
Can add additional attributes with no schema changes
Easily mapped into ORM systems, assuming an application lives on top of this database
Could easily see the history for a certain attribute_name over time.
Disadvantages
Integrity is not enforced. For example, the schema doesn't restrict on multiple NULL replaced_at records with the same attribute_name... perhaps this could be enforced with a two-column UNIQUE constraint
Let's say you add a new field in the future. Existing profiles would not select a value for the new field until they save a value to it. This is opposed to the value coming back as NULL if it were a column. This may or may not be an issue.
If you use this approach, be sure you have indexes on the created_at and replaced_at columns.
There may be other advantages or disadvantages. If commenters have input, I'll update this answer with more information.

Related

complex SQL query - one table

I am new to SQL.
I was wondering if there is a way to form a complex (I think) query of a certain form, regarding a single table - or a simple query for the same effect.
Let's say I have a table of voice actor candidates, with different attributes (columns) - name and characteristics.
Let's say I have two different actor evaluators (Stewie and Griffin), and every candidate was evaluated by at least one of them (one, or both). The evaluators evaluate the actors, and the table is built.
The rows in the table are per-evaluation, not per-person, meaning that some candidates have two separate rows, one from each evaluation.
The evaluator's name is also an attribute, a column.
Can I make a query that will choose all candidates that were evaluated by both evaluators? (and let's say show all these rows, an even number then)
(There is no attribute "evaluated by both" - that's the core)
I think it should find all rows with evaluator Stewie, then search the entire table for rows with the corresponding candidates' names, and get those with evaluator Griffin.
Summary
A table with people - names and characteristics. One or two rows per person. Each row was filled according to a different observer. There is an attribute "Is Nice". How to find all people that were observed by two observers, one marked "Yes" and one "No" under "Is Nice"?
Update
It will take me some time to check all the answers (as not enough experience yet), and I will update what worked for me.
Can I make a query that will choose all candidates that were evaluated
by both evaluators?
(and let's say show all these rows, an even number then)
There are multiple ways to do this. You can check the existence of other evaluator's evaluation, using EXISTS:
SELECT *
FROM Candidate AS C1
WHERE EXISTS (SELECT *
    FROM Candidate AS C2
    WHERE C1.id = C2.id
        AND C1.evaluator != C2.evaluator)
Or, you could join the table to itself: (The checks for evaluators should be changed as appropriate)
SELECT C1.candidateName
FROM Candidate AS C1
JOIN Candidate AS C2 USING (id)
WHERE C1.evaluator = 'Stewie' AND C2.evaluator = 'Griffin'
How to find all people that were observed by two observers, one marked
"Yes" and one "No" under "Is Nice"?
For this one, you add another condition to the queries above, that checks if one evaluation was "Yes" and the other one was "No".
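For example, extending the self-join version (a sketch; the isnice column and the 'Yes'/'No' placement are assumptions about your data, and swapping the two values covers the opposite ordering):
SELECT C1.candidateName
FROM Candidate AS C1
JOIN Candidate AS C2 USING (id)
WHERE C1.evaluator = 'Stewie' AND C2.evaluator = 'Griffin'
    AND C1.isnice = 'Yes' AND C2.isnice = 'No';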
You seem to want GROUP BY and HAVING. Since a person cannot have more than two rows, and there are only two distinct possible values for isnice (yes or no), we can phrase the query as:
select name
from people
group by name
having max(isnice) <> min(isnice)
This filters for names that have (at least) two different values in isnice. Given the above assumptions, this is sufficient to ensure that the person was evaluated more than once, and that isnice has two different values.
So, I read the problem very carefully, and came up with my own solution.
Please verify the code below if this is what you were really asking for?
--Create Candidates Table
CREATE TABLE tbl_candidates
(
c_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
c_name VARCHAR(30)
)
--Create Evaluators Table
CREATE TABLE tbl_evaluators
(
e_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
e_name VARCHAR(30)
)
--Create Evaluations Table
CREATE TABLE tbl_evaluations
(
ee_id INT PRIMARY KEY NOT NULL IDENTITY(1,1),
ee_title VARCHAR(30) NOT NULL,
ee_remarks VARCHAR(30) NOT NULL,
ee_date DATE NOT NULL,
c_id INT NOT NULL FOREIGN KEY REFERENCES tbl_candidates(c_id),
e_id1 INT NOT NULL FOREIGN KEY REFERENCES tbl_evaluators(e_id),
e_id2 INT FOREIGN KEY REFERENCES tbl_evaluators(e_id),
IsNice VARCHAR(4)
)
--Populate data & check to verify
INSERT INTO tbl_candidates (c_name) VALUES ('Sam') , ('Smith')
SELECT * FROM tbl_candidates
INSERT INTO tbl_evaluators (e_name) VALUES ('Stewie'),('Griffin')
SELECT * FROM tbl_evaluators
INSERT INTO tbl_evaluations
(ee_title,ee_remarks,ee_date,c_id,e_id1,e_id2,IsNice)
VALUES
('Some Title','Some Comment','2020-6-12',1,1,NULL,'No'),
('Some Title','Some Comment','2020-6-12',2,1,2,'Yes'),
('Some Title','Some Comment','2020-6-12',2,2,NULL,'No')
-- finally, display the combined data to verify it matches our input
select * from tbl_evaluations
select ee_id, ee_title, c_name, ee_remarks, e1.e_name, e2.e_name, ee_date, IsNice
from tbl_evaluations ee
left join tbl_candidates c on c.c_id = ee.c_id
left join tbl_evaluators e1 on e1.e_id = ee.e_id1
left join tbl_evaluators e2 on e2.e_id = ee.e_id2
This is surely not the best way to write it, but my first thought is
SELECT * FROM evaluations
WHERE PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='No')
AND PrName IN (
SELECT PrName
FROM evaluations
WHERE IsNice ='Yes')

Uncaught mysqli_sql_exception: Subquery returns more than 1 row

I have two tables one with names and telnumbers the second with calls
addressbook name(VARCHAR) number(VARCHAR)
calls date(DATE) number(VARCHAR) name(VARCHAR)
I want to update the names column in the calls table with the entries in the addressbook for the respective numbers:
UPDATE calls
SET name = ( SELECT name FROM addressbook WHERE number = calls.number )
WHERE DATE = "2020.01.01"
ORDER BY DATE
And I get Uncaught mysqli_sql_exception: Subquery returns more than 1 row, but there are no duplicates in the addressbook - I checked it several times.
The only way your update statement can fail with
Subquery returns more than 1 row
is if there is at least one calls row whose number appears more than once in addressbook. You can find them with this query:
select number, count(*)
from addressbook
group by number
having count(*) > 1;
Let's say you have these two rows in addressbook:
name number
------ ------
fred 123
barney 123
And let's say this is the row in calls:
date name number
---------- ---- ------
2020.01.01 null 123
When you execute Stefano's update statement, the limit clause is not deterministic because there's no associated order by clause in the subquery. Nor is there any attribute common to calls and addressbook that would make it meaningful. The order by clause on the update is irrelevant. Therefore, you cannot guarantee which name will be assigned to the calls row. This is the point I was trying to make in my comment to Stefano's answer.
If the design of the system is to allow a number to be owned by multiple people over time (which they are of course), then your schema is not complete. And if that's true, then addressbook needs an effective date for the owner of the number.
If the design of the system is not to allow a number to be owned by multiple people over time, then you must delete the duplicate rows.
In either case, you need to do two things:
employ declarative referential integrity constraints so you don't run afoul again
stop updating calls: either insert (not update) the name or remove the column entirely
If I were to implement the tables of a telephony system, I would start with something like this:
create table PERSON (
PERSON_ID integer not null primary key,
NAME varchar(100) not null /*lots of other columns*/);
create table PERSON_PHONE (
PERSON_PHONE_ID integer not null primary key,
PERSON_ID integer not null,
PHONE_NUM varchar(30) not null,
CONTRACTED date not null, /*lots of other columns*/
unique (PERSON_ID, PHONE_NUM, CONTRACTED),
foreign key (PERSON_ID) references PERSON(PERSON_ID));
create table PHONE_CALL (
START_DATE date not null,
END_DATE date not null,
PERSON_PHONE_ID integer not null,
primary key (PERSON_PHONE_ID, START_DATE),
foreign key (PERSON_PHONE_ID) references PERSON_PHONE(PERSON_PHONE_ID));
It is true that sometimes, for the sake of making queries finish faster using fewer resources, people will sometimes denormalize a schema to decrease the number of join operations that would otherwise be required. Denormalization requires careful consideration.
The error is self-explanatory: the subquery returns more than one row. A quick workaround is:
SELECT name FROM addressbook WHERE number = calls.number LIMIT 1
If this resolves the error, then the subquery was indeed returning more than one row. If you want to avoid LIMIT 1, you should review your query, adding more constraints or defining a primary key for the addressbook table, and continue to use your subquery as it is. This is on you.

Whats the equivalent of MySQL Set datatype in Oracle?

In MySQL, we can use Set datatype to select multiple values for each column of any specific row as follows:
CREATE TABLE `staff` ( `StaffID` int(5) NOT NULL AUTO_INCREMENT,
`Availability` set('Mon','Tue','Wed','Thur','Fri','SatAM','SatPM','Sun') DEFAULT NULL,
PRIMARY KEY (`StaffID`))
Then I can do,
INSERT INTO `staff` (`Availability`) VALUES ('Tue,Wed,SatPM')
I searched everywhere but couldn't find any suitable datatype to store multiple selected options like this in Oracle. The Oracle documentation seems to show no alternative to the MySQL set, but I can't believe they don't have one! Could anyone help me out in clarifying this?
In MySQL, the set data type is a named bit-mapped data type. Oracle does not have one, though you could emulate it if you really wanted to by using a constrained integer data type:
CREATE TABLE staff (
staffid INTEGER GENERATED BY DEFAULT AS IDENTITY ( START WITH 1 NOCACHE ORDER ) NOT NULL PRIMARY KEY
, availability INTEGER CHECK ( availability BETWEEN 0 AND 255 )
);
You'll have to remember or build a function to map between the bitmap values and days.
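For example, under the bit assignment sketched here (an assumption: Mon=1, Tue=2, Wed=4, Thu=8, Fri=16, SatAM=32, SatPM=64, Sun=128):
-- 'Tue,Wed,SatPM' = 2 + 4 + 64 = 70
INSERT INTO staff (availability) VALUES (70);

-- everyone available on Wednesday (bit value 4)
SELECT staffid FROM staff WHERE BITAND(availability, 4) > 0;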
Another option is to use a lookup table:
create table availability as
with t1(id, mon, tue, wed, thu, fri, satam, satpm, sun) as (
select 0, 0, 0, 0, 0, 0, 0, 0, 0 from dual
union all
select id+1
, sign(bitand(id+1,1))
, sign(bitand(id+1,2))
, sign(bitand(id+1,4))
, sign(bitand(id+1,8))
, sign(bitand(id+1,16))
, sign(bitand(id+1,32))
, sign(bitand(id+1,64))
, sign(bitand(id+1,128))
from t1 where id+1 <256
)
select * from t1;
alter table availability add primary key (id);
CREATE TABLE staff (
staffid INTEGER GENERATED BY DEFAULT AS IDENTITY ( START WITH 1 NOCACHE ORDER ) NOT NULL PRIMARY KEY
, availability INTEGER
, CONSTRAINT staff_avail_fk FOREIGN KEY ( availability ) REFERENCES availability ( id )
);
Sentinel's answer (which I upvoted) addresses the question as literally asked.
However, I'm going to take a step back and, addressing the question as "how do I represent this type of data in Oracle", advise that a better solution to that problem is to not emulate the Set datatype.
The Set datatype is arguably a violation of first normal form. That isn't necessarily a moral judgement about its usage; such solutions make certain queries easier and/or more efficient, and others more difficult and/or less efficient. But the normal forms are built around many, many years of experience with RDBMS. For a wide range of applications, adhering to the first three normal forms is a good way to avoid problems on balance.
A normalized solution which can be implemented in Oracle (without jumping through hoops to emulate bit-field enumerations) might include a Staff table with an ID, an AvailabilityCode table with a row for each individual value in the set, and StaffAvailability, a cross-reference between the two with a row for each AvailabilityCode value you want to associate with a Staff id.
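A sketch of that normalized shape (table, column names, and types are illustrative):
CREATE TABLE staff (
    staff_id INTEGER GENERATED BY DEFAULT AS IDENTITY NOT NULL PRIMARY KEY
);
CREATE TABLE availability_code (
    code VARCHAR2(5) NOT NULL PRIMARY KEY -- 'Mon', 'Tue', ..., 'SatPM', 'Sun'
);
CREATE TABLE staff_availability (
    staff_id INTEGER NOT NULL REFERENCES staff ( staff_id ),
    code VARCHAR2(5) NOT NULL REFERENCES availability_code ( code ),
    PRIMARY KEY ( staff_id, code )
);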

Realtime Performant Tag Search in MySQL or Redis

Problem Description:
A tag (tags) can be associated with arbitrary objects through a junction table (tagged_as). For a specific object type (specific_object), select the union or intersection of all of the objects associated with a series of tags, order the results by a numeric column on the object and limit the results for pagination purposes.
Contrived Schema:
CREATE TABLE tags (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE specific_object(
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
vote_sum INT NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
CREATE TABLE tagged_as(
id INT NOT NULL AUTO_INCREMENT,
tag_id INT NOT NULL,
content_type_id INT NOT NULL,
object_id INT NOT NULL,
PRIMARY KEY (id)
);
For the purposes of this example, I am omitting many other columns in the specific_object table.
Table Row Counts:
tags: 12,297
tagged_as: 46,642,064
specific_object: 2,444,944
Naive MySQL Solution:
SELECT
specific_object.*
FROM
specific_object
JOIN
tagged_as
ON
specific_object.id = tagged_as.object_id
AND
tagged_as.content_type_id = <SPECIFIC_OBJECT_CONTENT_TYPE_ID>
WHERE
tagged_as.tag_id = <TAG_ONE_ID>
AND
tagged_as.tag_id = <TAG_TWO_ID>
...
ORDER BY specific_object.vote_sum DESC
LIMIT 50
The problem with this solution is that MySQL cannot utilize an index to resolve the ORDER BY clause because the "key used to fetch the rows is not the same as the one used in the ORDER BY" (http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html). Execution time: 20+ seconds
Naive Redis Solution:
for each tagged as: SADD tag:<TAG_ID> <SPECIFIC_OBJECT_ID> (bare ids, so nothing needs stripping later)
specific_object_ids = SUNION tag:<TAG_ONE_ID> tag:<TAG_TWO_ID> ... (union)
specific_object_ids = SINTER tag:<TAG_ONE_ID> tag:<TAG_TWO_ID> ... (intersection)
SELECT * FROM specific_object WHERE id IN (<specific_object_ids>) ORDER BY vote_sum DESC
The problem with this solution is that the ORDER BY still has to be done by MySQL. Also, a tag could potentially be associated with hundreds of thousands of specific objects, which is a lot of data to move around. Execution time: 20+ seconds for larger tags
Possible Solutions I Haven't Tried Yet
Denormalize
Perhaps move the vote_sum column into the tagged_as table, removing the need for the join in order to do the ORDER BY (see the sketch below). This might have the same issue as the naive solution.
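For instance (a sketch; it addresses the single-tag case, and the index name and example ids are assumptions):
ALTER TABLE tagged_as ADD COLUMN vote_sum INT NOT NULL DEFAULT 0;
CREATE INDEX tag_type_votes ON tagged_as (tag_id, content_type_id, vote_sum);

-- the ORDER BY can now be satisfied by walking the index backwards
SELECT object_id
FROM tagged_as
WHERE tag_id = 123 AND content_type_id = 1
ORDER BY vote_sum DESC
LIMIT 50;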
Redis Sorted Sets
for each specific object: SET specific_object_weight:<ID> <VOTE_SUM>
for each tagged as: SADD tag:<TAG_ID> <SPECIFIC_OBJECT_ID>
SINTERSTORE result:<timestamp> tag:<TAG_ONE_ID> tag:<TAG_TWO_ID> ...
specific_object_ids = SORT result:<timestamp> BY specific_object_weight:* LIMIT 0 50 DESC
DEL result:<timestamp>
SELECT * FROM specific_object WHERE id IN (<specific_object_ids>)
Move all of the sorting into Redis. This adds extra complexity because now you have to maintain the vote_sum values in Redis as well. Not sure if this would be fast enough.
Question:
Are either of the possible solutions viable? Are there other solutions or different technologies that would help? I am open to pretty significant changes to solve this problem.
When the problem has been the performance of a DESC sort, what I've done in the past to solve it is to store the value of -1 * vote_sum in a separate column, and then ORDER BY that column ASC. I've been able to get MySQL to use an index to do the sort on that column.
You could either store a redundant column (both vote_sum and neg_vote_sum), or just store the negative value and multiply it by -1 whenever you need to return it as a positive value.
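A sketch of that approach (the neg_vote_sum column and index name are assumptions):
ALTER TABLE specific_object ADD COLUMN neg_vote_sum INT NOT NULL DEFAULT 0;
UPDATE specific_object SET neg_vote_sum = -1 * vote_sum;
CREATE INDEX neg_votes ON specific_object (neg_vote_sum);

-- the DESC sort becomes an ascending index scan
SELECT * FROM specific_object ORDER BY neg_vote_sum LIMIT 50;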
But I'm suspicious that the sort operation is the real source of your performance issue. As a test, how does the performance of the statement compare when you do an ORDER BY vote_sum ASC?

SQL insert only table, how to select only newest entries

I've created an insert-only table for the purpose of speed and maintaining a history. Its structure is very generic, and is as follows:
CREATE TABLE `kvtable` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`user_id` bigint(20) unsigned NOT NULL,
`property` varchar(32) NOT NULL,
`value` longblob NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
It's simply a key/value table with a user_id assigned to it. This approach has its advantages as not all users have the same properties, so fields aren't wasted in a table. Also, it allows for a rolling log of changes, since I can see every change to a particular property ever made by a user.
Now, since no deletes or updates ever occur in this table, I can assume that the greatest id will always be the newest entry.
However, I want to select multiple properties at once, for example 'address1', 'address2', 'city', 'state', and I want each to be the entry of its type with the highest id.
So, if they have changed their 'state' property 8 times, and 'city' property 4 times, then I'd only want a SELECT to return the latest of each (1 state and 1 city).
I'm not sure this can even be done efficiently with this type of a table, so I'm open to different table approaches.
Please, let me know if I need to produce anymore information or clarify my question better.
===
I tried the following, but there could be 3 rows of 'address1' changes after the last 'address2' change. Perhaps using a GROUP BY would work?
SELECT property, value FROM kvtable WHERE user_id = 1 AND (property = 'address1' OR property = 'address2') ORDER BY id
Assuming your ids are incremental integers and you have not manually specified them out of order, you can do this with a few MAX() aggregates in a subquery. The point of the subquery is to return the latest entry per property name, per user. That is joined against the whole table to pull in the associated property values. Essentially, the subquery discards all rows which don't have a max(id) per group.
SELECT kvtable.*
FROM kvtable
JOIN (
    SELECT
        MAX(id) AS id,
        user_id,
        property
    FROM kvtable
    /* optionally limit by user_id */
    WHERE user_id = <someuser>
    GROUP BY user_id, property
) maxids ON kvtable.id = maxids.id