Voting issue - storing votes as serialized data instead of multiple rows - MySQL

My question is: what is the best way to store a user's vote (including the IP and answer id)? Should each vote go into a new MySQL row, or should there be just one field in the answer row holding all votes as "serialized" data?
And if the latter, what column type should be used to store this serialized data?

It is almost always a bad idea to store multiple values in one column: it becomes difficult to parse out the values you need, and the string operations necessary to extract a part can make a column index unusable.
Make a normalized table which stores one row per answer, per user. If an individual answer is itself a single data point, it belongs as its own row.
If you are tracking users by IP:
CREATE TABLE votes (
voteid INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
userip VARCHAR(15) NOT NULL,
answerid INT NOT NULL
);
Plus, this gives you the benefit of being able to query your data in ways like:
/* Get vote count per user */
SELECT userip, COUNT(*) FROM votes GROUP BY userip;
/* Get users who have voted 3 or more times */
SELECT userip FROM votes GROUP BY userip HAVING COUNT(*) >= 3;
To accomplish the same thing with a serialized column, you would need to query it into application code, parse out the delimiters, and then perform your analysis. To implement the second (count >= 3) in application code requires re-implementing lots of the things the database is already very good at, like sorting, grouping, and counting.

You should create each distinct item as its own column.
This makes it much less complex to access individual columns. The queries will be less complex for you to write, in addition to generally being more efficient for MySQL.
It also allows you to specify different type and length requirements for each column. For example, you might have one Integer column, and one Character column, each with different storage requirements and implications for the type.
I recommend storing the IP address in an integer column, using the INET_ATON() and INET_NTOA() functions.
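For example, a sketch of the votes table from the first answer with the IP column switched to an unsigned integer (the sample address and answer id are just placeholders):
CREATE TABLE votes (
voteid INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
userip INT UNSIGNED NOT NULL, /* IPv4 address stored as an integer */
answerid INT NOT NULL
);
/* Convert the dotted string to an integer on insert */
INSERT INTO votes (userip, answerid) VALUES (INET_ATON('192.168.0.10'), 42);
/* Convert back to the dotted form when reading */
SELECT INET_NTOA(userip) AS userip, answerid FROM votes;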

Related

A rather complicated auto increment

This is one idea more complicated than a one-to-many connection. I have a bunch of tables like photos, posts, users etc. that can be commented on. My comments table contains 3 fields that help identify the comment:
item-id - the id of the item the comment belongs to
table - the table in which item-id resides (saved as an integer, but displayed as a name below to avoid confusion)
id - the id of the comment, relative to the item-id
A sample for better understanding:
id | item-id | table
1  | 1       | photos
2  | 1       | photos
1  | 1       | posts
2  | 1       | posts
1  | 2       | posts
1  | 1       | users
Now the problem is with inserts. I find it hard to determine the current last id. Given the table above, if a user is to comment on a photo with item-id = 1, then the new comment needs to have an id of 3. The only way I could think of is to run a sub-query on insert, but I'm not a big fan of sub-queries. Is there some mechanism built into MySQL that can help me achieve this, or any other easy and robust way?
From your comment:
I've come up with this because of the fear of unique ids running out. I know that the maximum integer value MySQL can store is 1×10^19 or something, which is a ridiculously large number, but not infinite. And don't numbers that huge take up more space?
MySQL's signed INT type can go up to 2^31 - 1. An unsigned INT can go up to 2^32 - 1, which is 4,294,967,295.
You're right this is not infinite, but 4.2 billion is pretty high and easily able to handle most needs.
You can also use a signed or unsigned BIGINT, which is 8 bytes, twice the size of an INT; but if you need values larger than an INT can hold, you have to pay for that storage.
Unsigned BIGINT goes up to 2^64 - 1, or 18,446,744,073,709,551,615. You're really, really, really unlikely to exhaust these values in your lifetime, even if you re-load your entire database multiple times per hour.
Re your comment.
Yes, most data types are fixed-size, meaning they use the same number of bytes on every row, regardless of the value you store in it on any given row. The reason for this is that you could change the value later, and if MySQL had to find more space to grow a small numeric value into a large numeric value, it would lead to other kinds of performance problems.
See http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html for more info on the number of bytes MySQL uses for each data type.
The exception is some string data types (VARCHAR, VARBINARY, TEXT, BLOB), which use a variable amount of space per row depending on the lengths of the strings you actually store.
But there are no numeric or date/time data types in MySQL that vary in size.
Another comment: you should ask yourself how much time & effort you're spending on optimizing this, and whether it would be more economical to just get a bigger disk. It's true the extra 4 bytes per row per integer adds up if you have a large database, but you'd need to store billions of rows before it really matters.
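For a rough sense of scale: an extra 4 bytes on each of 1 billion rows comes to only about 4 GB of additional storage, before indexes.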
One thing you should consider: why is this important to you? The purpose of an ID is to be a unique identifier. Sure, it can represent order in that it's monotonically increasing, but is there any reason it specifically has to go from 1 to 2 to 3 for each (item-id, table) pair? Would it be that harmful if it were instead 1, 6, 20?
If you're using PHP you'll still receive that data in the same order, and in PHP it'll be very easy to know which is 1, 2 and 3.
MyISAM allows you to do this easily:
For MyISAM and BDB tables you can specify AUTO_INCREMENT on a
secondary column in a multiple-column index.
However, it's limited to two columns, so you still need to normalize this to remove one of the columns.
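A minimal sketch of that feature, modelled on the example in the MySQL manual (the table and column names here are illustrative, not the comment schema from the question):
CREATE TABLE animals (
grp ENUM('fish','mammal','bird') NOT NULL,
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name CHAR(30) NOT NULL,
PRIMARY KEY (grp, id)
) ENGINE=MyISAM;
INSERT INTO animals (grp, name) VALUES ('mammal','dog'), ('mammal','cat'), ('bird','penguin');
/* id restarts at 1 within each grp value */
SELECT * FROM animals ORDER BY grp, id;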
Otherwise, you can insert the next users (item 1) row like this:
INSERT INTO table1 (id, `item-id`, `table`)
SELECT MAX(id) + 1, 1, 'users' FROM table1 WHERE `item-id` = 1 AND `table` = 'users'
To extend it a little, the IFNULL part allows you to use the same clause for inserting the first row.
INSERT INTO table1 (id, `item-id`, `table`)
SELECT IFNULL(MAX(id), 0) + 1, 2, 'users' FROM table1 WHERE `item-id` = 2 AND `table` = 'users'
In this case, you would probably have a multi-column primary key, consisting of all three columns.

Best way to store sort order/priority?

I'm using MySQL. I have a table where I need to be able to manually set the priority/sort order of the rows. I had originally thought of assigning each row an arbitrary order (1, 2, 3, etc.), then just "swapping" the order with the row being moved, but I don't think this is the best way to do it.
After doing some reading at related questions on here (like this one), a lot of people have said to assign a value to the priority column based off the id column (id * 1000). And to rearrange the rows, you would divide/subtract the difference between the columns. I don't quite understand how this works.
This is the layout of the table I need to sort.
CREATE TABLE liability_detail (
id int NOT NULL AUTO_INCREMENT,
analysis_id int NOT NULL, -- many-to-one relationship with the analysis table
-- other columns of various datatypes
sequence int DEFAULT 0,
PRIMARY KEY (id)
);
I'd like to setup an easy way to manage the priority of rows so I can easily sort them without having to write a lot of code to manage everything.
I ended up following the advice in this question: https://stackoverflow.com/a/6804302/731052
I set the sort-order column = id * 1000 so I could get unique orders. So far this works very well and I haven't had any problems with it.
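A minimal sketch of how that works against the liability_detail table above (the specific ids and sequence values are just placeholders):
/* seed the sort column so every row gets a unique, widely spaced value */
UPDATE liability_detail SET sequence = id * 1000;
/* to move a row between two neighbours, give it any value between their sequence values,
   e.g. move the row with id = 7 between rows whose sequence values are 2000 and 3000 */
UPDATE liability_detail SET sequence = 2500 WHERE id = 7;
/* read rows back in the manually chosen order */
SELECT * FROM liability_detail ORDER BY sequence;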

Best solution for saving boolean values and saving cpu and memory on searches

What is the best way to store boolean values in the database if you want better query performance and to waste as little memory as possible on SELECT statements?
For example:
I have a table with 36 fields, 30 of which hold boolean values (zero or one), and I need to search for records where certain boolean fields are true.
SELECT * FROM `myTable`
WHERE
`field_5th` = 1
AND `field_12th` = 1
AND `field_20` = 1
AND `field_8` = 1
Is there any solution?
If you want to store boolean values or flags there are basically three options:
Individual columns
This is reflected in your example above. The advantage is that you will be able to put indexes on the flags you intend to use most often for lookups. The disadvantage is that this will take up more space (since the minimum column size that can be allocated is 1 byte).
However, if your column names are really going to be field_20, field_21, etc., then this is absolutely NOT the way to go. Numbered columns are a sign you should use either of the other two methods.
Bitmasks
As was suggested above you can store multiple values in a single integer column. A BIGINT column would give you up to 64 possible flags.
Setting individual flags would look something like:
UPDATE table SET flags = flags | b'100';
UPDATE table SET flags = flags | b'10000';
Then the field would look something like: 10100
That would represent having two flag values set. To query for any particular flag value set, you would do
SELECT flags FROM table WHERE flags & b'100';
The advantage of this is that your flags are very compact space-wise. The disadvantage is that you can't place indexes on the field which would help improve the performance of searching for specific flags.
One-to-many relationship
This is where you create another table, and each row there would have the id of the row it's linked to, and the flag:
CREATE TABLE main (
main_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE flag (
main_id INT UNSIGNED NOT NULL,
name VARCHAR(16)
);
Then you would insert multiple rows into the flag table.
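A sketch of populating and querying that flag table (the index name is made up; the flag names are borrowed from the question):
CREATE INDEX idx_flag_name ON flag (name, main_id);
INSERT INTO flag (main_id, name) VALUES (1, 'field_5th'), (1, 'field_12th'), (2, 'field_5th');
/* find rows that have a particular flag set */
SELECT m.main_id
FROM main m
JOIN flag f ON f.main_id = m.main_id
WHERE f.name = 'field_5th';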
The advantage is that you can use indexes for lookups, and you can have any number of flags per row without changing your schema. This works best for sparse values, where most rows do not have a value set. If every row needs all flags defined, then this isn't very efficient.
For a performance comparison you can read a blog post I wrote on the topic:
Set Performance Compare
Also when you ask which is "Best" that's a very subjective question. Best at what? It all really depends on what your data looks like and what your requirements are and how you want to query it.
Keep in mind that if you want to do a query like:
SELECT * FROM table WHERE some_flag=true
Indexes will only help you if few rows have that value set. If most of the rows in the table have some_flag=true, then MySQL will ignore the index and do a full table scan instead.
How many rows of data are you querying over? You can store the boolean values in a single integer column and use bit operations to test for them. It's not indexable, but the storage is very compact. With separate TINYINT fields and indexes, MySQL would pick one index to use and scan from there.

Can I optimize my database by splitting one big table into many small ones?

Assume that I have one big table with three columns: "user_name", "user_property", "value_of_property". Let's also assume that I have a lot of users (say 100 000) and a lot of properties (say 10 000). Then the table is going to be huge (1 billion rows).
When I extract information from the table I always need information about a particular user, so I use, for example, WHERE user_name = 'Albert Gates'. That means that every time, the MySQL server needs to scan 1 billion rows to find the ones that contain "Albert Gates" as user_name.
Would it not be wise to split the big table into many small ones corresponding to fixed users?
No, I don't think that is a good idea. A better approach is to add an index on the user_name column - and perhaps another index on (user_name, user_property) for looking up a single property. Then the database does not need to scan all the rows - it just needs to find the appropriate entry in the index, which is stored in a B-Tree, making it easy to find a record in a very small amount of time.
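For example (the table name big_table is just a placeholder for your table):
ALTER TABLE big_table ADD INDEX idx_user (user_name);
-- or a compound index that also covers single-property lookups
ALTER TABLE big_table ADD INDEX idx_user_property (user_name, user_property);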
If your application is still slow even after correctly indexing it can sometimes be a good idea to partition your largest tables.
One other thing you could consider is normalizing your database so that the user_name is stored in a separate table and an integer foreign key is used in its place. This can reduce storage requirements and can increase performance. The same may apply to user_property.
You should normalise your design as follows:
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists properties;
create table properties
(
property_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists user_property_values;
create table user_property_values
(
user_id int unsigned not null,
property_id smallint unsigned not null,
value varchar(255) not null,
primary key (user_id, property_id),
key (property_id)
)
engine=innodb;
insert into users (username) values ('f00'),('bar'),('alpha'),('beta');
insert into properties (name) values ('age'),('gender');
insert into user_property_values values
(1,1,'30'),(1,2,'Male'),
(2,1,'24'),(2,2,'Female'),
(3,1,'18'),
(4,1,'26'),(4,2,'Male');
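For example, fetching all of one user's properties from this schema could look like this (a sketch using the sample data above):
select u.username, p.name as property, v.value
from users u
join user_property_values v on v.user_id = u.user_id
join properties p on p.property_id = v.property_id
where u.username = 'f00';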
From a performance perspective the innodb clustered index works wonders in this similar example (COLD run):
select count(*) from product
count(*)
========
1,000,000 (1M)
select count(*) from category
count(*)
========
250,000 (500K)
select count(*) from product_category
count(*)
========
125,431,192 (125M)
select
c.*,
p.*
from
product_category pc
inner join category c on pc.cat_id = c.cat_id
inner join product p on pc.prod_id = p.prod_id
where
pc.cat_id = 1001;
0:00:00.030: Query OK (0.03 secs)
Properly indexing your database will be the number one way of improving performance. I once had a query take half an hour (on a large dataset, but nonetheless). Then we came to find out that the tables had no indexes. Once indexed, the query took less than 10 seconds.
Why do you need to have this table structure? My fundamental problem is that you are going to have to cast the data in value_of_property every time you want to use it. That is bad in my opinion - also, storing numbers as text is crazy given that it's all binary anyway. For instance, how are you going to have required fields? Or fields that need to have constraints based on other fields, e.g. start and end dates?
Why not simply have the properties as fields rather than some many to many relationship?
Have one flat table. When your business rules begin to show that properties should be grouped, then you can consider moving them out into other tables and having several 1:0-1 relationships with the users table. But this is not normalization, and it will degrade performance slightly due to the extra join (however, the self-documenting nature of the table names will greatly aid any developers).
One way I regularly see database performance get totally castrated is by having a generic
Id, Property Type, Property Name, Property Value table.
This is really lazy and exceptionally flexible, but it totally kills performance. In fact, on a new job where performance is bad, I actually ask if they have a table with this structure - it invariably becomes the center point of the database and is slow. The whole point of relational database design is that the relations are determined ahead of time. This is simply a technique that aims to speed up development at a huge cost to application speed. It also puts a huge reliance on business logic in the application layer to behave - which is not defensive at all. Eventually you find that you want to use properties in a key relationship, which leads to all kinds of casting on the join, which further degrades performance.
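A hypothetical illustration of that last point - with a generic property table, comparing two date-valued properties forces casts inside the join (all table and column names below are made up):
select e.id
from entity e
join entity_property sp on sp.entity_id = e.id and sp.property_name = 'start_date'
join entity_property ep on ep.entity_id = e.id and ep.property_name = 'end_date'
where cast(sp.property_value as date) > cast(ep.property_value as date);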
If data has a 1:1 relationship with an entity then it should be a field on the same table. If your table gets to more than 30 fields wide then consider moving them into another table, but don't call it normalisation, because it isn't. It is a technique to help developers group fields together at the cost of performance, in an attempt to aid understanding.
I don't know if MySQL has an equivalent, but SQL Server 2008 has sparse columns - null values take no space.
Sparse column datatypes
I'm not saying an EAV approach is always wrong, but I think using a relational database for this approach is probably not the best choice.

How many fields should be indexed and how should I create them?

I've got a table in a MySQL database that has the following fields:
ID | GENDER | BIRTHYEAR | POSTCODE
Users can search the table using any of the fields in any combination (e.g., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;)
From the MySQL docs, indexes use leftmost prefix matching. So if I create an index on all 4 columns, it won't use the index if the ID field isn't part of the query. Do I need to create an index for every possible combination of fields (ID; ID/GENDER; ID/BIRTHYEAR; etc.), or will creating one index over all fields be sufficient?
If it makes any difference, there are upwards of 3 million records in this table.
In this situation I typically log the search criteria, the number of results returned, and the time taken to perform the search. Just because you're creating the flexibility to search by any field doesn't mean your users will make use of this flexibility. I'd normally create indexes on sensible combinations and then, once I've determined the usage patterns, drop the rarely used indexes or create new, previously unanticipated ones.
I'm not sure if MySQL supports statistics or histograms for skewed data, but the index on gender may or may not work. If MySQL supports statistics, then these will indicate the selectivity of an index. In a general population, an index on a field with a 50/50 split won't help. If your sample data is computer programmers and the data is 95% males, then a search for females would use the index.
Use EXPLAIN.
(I'd say, use Postgres, too, lol).
It seems recent versions of MySQL can use several indexes in the same query; they call this Index Merge. In this case, one index per column will be enough.
Gender is a special case: since its selectivity is about 50%, you don't need an index on it; it would be counterproductive.
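A sketch of the single-column indexes that Index Merge could combine (the table name my_table is a placeholder for your table):
CREATE INDEX idx_birthyear ON my_table (BIRTHYEAR);
CREATE INDEX idx_postcode ON my_table (POSTCODE);
/* check whether MySQL actually merges the indexes for a combined search */
EXPLAIN SELECT * FROM my_table WHERE BIRTHYEAR = 1973 AND POSTCODE IN (1000, 2000);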
Creating indexes on single fields is useful, but it would be most useful if your data were of VARCHAR type and each record had a different value; since birthyear and postcode are numbers, they are already well indexed.
You can index birthyear because it should be different for many of the records (though at most around 120 distinct birth years in total, I guess).
Gender in my opinion doesn't need an index.
You can find out what field combinations are most likely to give different results and index those, like: birthyear - postcode, id - birthyear, id - postcode.