Best way to store huge log data - MySQL

I need advice on the optimal approach to storing statistical data. There is a Django project with a MySQL database of 30,000 online games.
Each game has three statistical parameters:
number of views,
number of plays,
number of likes
Now I need to store historical data for these three parameters on a daily basis, so I was thinking of creating a single table with five columns:
gameid, number of views, plays, likes, date (day-month-year).
So in the end, every game will get one row per day: after one day this table will have 30,000 rows, after 10 days 300,000, and after a year 10,950,000 rows. I'm not a big specialist in DBA stuff, but this tells me it will quickly become a performance problem, not to mention what will happen in 5 years' time.
The data collected in this table is needed for simple graphs
(daily, weekly, monthly, custom range).
Maybe you have better ideas on how to store this data? Maybe NoSQL would be more suitable in this case? I really need your advice on this.

Partitioning in PostgreSQL works great for big logs. First create the parent table:
create table game_history_log (
    gameid   integer,
    views    integer,
    plays    integer,
    likes    integer,
    log_date date
);
Now create the partitions. In this case one per month, about 900k rows each, would be good:
create table game_history_log_201210 (
    check (log_date between '2012-10-01' and '2012-10-31')
) inherits (game_history_log);

create table game_history_log_201211 (
    check (log_date between '2012-11-01' and '2012-11-30')
) inherits (game_history_log);
Notice the check constraints in each partition. If you try to insert in the wrong partition:
insert into game_history_log_201210 (
    gameid, views, plays, likes, log_date
) values (1, 2, 3, 4, '2012-09-30');
ERROR: new row for relation "game_history_log_201210" violates check constraint "game_history_log_201210_log_date_check"
DETAIL: Failing row contains (1, 2, 3, 4, 2012-09-30).
One of the advantages of partitioning is that queries will only search the relevant partition, drastically and consistently reducing the amount of data scanned regardless of how many years of data there are. Here is the EXPLAIN for a search on a certain date:
explain
select *
from game_history_log
where log_date = date '2012-10-02';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..30.38 rows=9 width=20)
   ->  Append  (cost=0.00..30.38 rows=9 width=20)
         ->  Seq Scan on game_history_log  (cost=0.00..0.00 rows=1 width=20)
               Filter: (log_date = '2012-10-02'::date)
         ->  Seq Scan on game_history_log_201210 game_history_log  (cost=0.00..30.38 rows=8 width=20)
               Filter: (log_date = '2012-10-02'::date)
Notice that, apart from the parent table, it only scanned the correct partition. Obviously you can also have indexes on the partitions to avoid a sequential scan.
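For example, a minimal sketch (partition and index names follow the examples above) of adding a per-partition index on log_date:
-- one index per partition, so date lookups can use an index instead of a seq scan
create index game_history_log_201210_log_date_idx
    on game_history_log_201210 (log_date);
create index game_history_log_201211_log_date_idx
    on game_history_log_201211 (log_date);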
See the PostgreSQL documentation on Inheritance and Partitioning.

11M rows isn't excessive, but indexing in general, and the clustering of your primary key, will matter more (on InnoDB). I suggest (game_id, date) as the primary key so that queries for all data about a certain game hit sequential rows. Also, you may want to keep a separate table of just the current values for ranking games, etc., when only the latest figures are needed.
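A minimal sketch of that layout (table and column names are illustrative, not the question's exact schema):
CREATE TABLE game_history (
    game_id  INT UNSIGNED NOT NULL,
    log_date DATE NOT NULL,
    views    INT UNSIGNED NOT NULL,
    plays    INT UNSIGNED NOT NULL,
    likes    INT UNSIGNED NOT NULL,
    PRIMARY KEY (game_id, log_date)   -- clusters each game's history together on disk
) ENGINE=InnoDB;

-- separate table holding only the latest figures, for rankings etc.
CREATE TABLE game_current (
    game_id INT UNSIGNED NOT NULL PRIMARY KEY,
    views   INT UNSIGNED NOT NULL,
    plays   INT UNSIGNED NOT NULL,
    likes   INT UNSIGNED NOT NULL
) ENGINE=InnoDB;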

There are no performance problems in MySQL with 10 million rows of data. You can simply apply partitioning by game id (requires at least version 5.5).
I have a MySQL DB with data like this and currently there are no problems with 980 million rows.
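A sketch of what that could look like, assuming the daily stats table is named game_history and has game_id in its primary key (MySQL requires the partitioning column to be part of every unique key):
ALTER TABLE game_history
    PARTITION BY HASH(game_id)
    PARTITIONS 16;   -- all rows for a given game land in the same partition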

I would advise not using a relational database at all.
Statistics are the sort of thing that changes rapidly, because new data is constantly arriving.
I believe something like HBase would fit better, because appending new records is faster there.

Instead of keeping every row forever, keep recent data at high precision, middle-aged data at medium precision, and long-term data at low precision. This is the approach taken by rrdtool, which might be better suited for this than MySQL.
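The same tiering can be approximated in plain SQL. A rough sketch, assuming a hypothetical game_history_weekly rollup table next to the daily table: periodically average rows older than 90 days into weekly rows, then delete the daily detail.
INSERT INTO game_history_weekly (game_id, week_start, views, plays, likes)
SELECT game_id,
       log_date - INTERVAL WEEKDAY(log_date) DAY AS week_start,   -- Monday of that week
       ROUND(AVG(views)), ROUND(AVG(plays)), ROUND(AVG(likes))
FROM game_history
WHERE log_date < CURDATE() - INTERVAL 90 DAY
GROUP BY game_id, week_start;

DELETE FROM game_history
WHERE log_date < CURDATE() - INTERVAL 90 DAY;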


How to design a database/table which adds a lot of rows every minute

I am in a situation where I need to store data for 1900+ cryptocurrencies every minute; I use MySQL InnoDB.
Currently, the table looks like this
coins_minute_id | coins_minute_coin_fk | coins_minute_usd | coins_minute_btc | coins_minute_datetime | coins_minute_timestamp
coins_minute_id        = AUTO_INCREMENT id
coins_minute_coin_fk   = MEDIUMINT UNSIGNED
coins_minute_usd       = DECIMAL(20,6)
coins_minute_btc       = DECIMAL(20,8)
coins_minute_datetime  = DATETIME
coins_minute_timestamp = TIMESTAMP
The table grew incredibly fast in almost no time; every minute 1900+ rows are added.
The data will be used for historical price display as a D3.js line graph for each cryptocurrency.
My question is: how do I best optimize this database? I have thought of only collecting the data every 5 minutes instead of every minute, but it will still add up to a lot of data in no time. I have also wondered whether it would be better to create a separate table for each cryptocurrency. Does anyone who loves to design databases know some other smart and clever way to do this?
Kind regards
(From Comment)
SELECT coins_minute_coin_fk, coins_minute_usd
FROM coins_minutes
WHERE coins_minute_datetime >= DATE_ADD(NOW(),INTERVAL -1 DAY)
AND coins_minute_coin_fk <= 1000
ORDER BY coins_minute_coin_fk ASC
Get rid of the coins_minute_ prefix; it clutters the SQL without providing any useful info.
Don't store the time twice -- there are simple functions to convert between DATETIME and TIMESTAMP. Why do you have both a 'created' and an 'updated' timestamp? Are you doing UPDATE statements? If so, the code is more complicated than simply "inserting", and you need a unique key to know which row to update.
Provide SHOW CREATE TABLE; it is more descriptive than what you provided.
30 inserts/second is easily handled. 300/sec may have issues.
Do not PARTITION the table without some real reason to do so. The common valid reason is that you want to delete 'old' data periodically. If you are deleting after 3 months, I would build the table with PARTITION BY RANGE(TO_DAYS(...)) and use weekly partitions. More discussion: http://mysql.rjweb.org/doc.php/partitionmaint
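A sketch of what that could look like for this table (names shortened as suggested above; the weekly boundaries shown are only illustrative):
CREATE TABLE coins (
    coin_fk MEDIUMINT UNSIGNED NOT NULL,
    dt      DATETIME NOT NULL,
    usd     DECIMAL(20,6) NOT NULL,
    btc     DECIMAL(20,8) NOT NULL,
    PRIMARY KEY (coin_fk, dt)              -- dt must be in the PK for this partitioning
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p2018w01 VALUES LESS THAN (TO_DAYS('2018-01-08')),
    PARTITION p2018w02 VALUES LESS THAN (TO_DAYS('2018-01-15')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE
);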
Show us the queries. A schema cannot be optimized without knowing how it will be accessed.
"Batch" inserts are much faster than single-row INSERT statements. This can be in the form of INSERT INTO x (a,b) VALUES (1,2), (11,22), ... or LOAD DATA INFILE. The latter is very good if you already have a CSV file.
Does your data come from a single source? Or 1900 different sources?
MySQL and MariaDB are probably identical for your task. (Again, need to see queries.) PDO is fine for either; no recoding needed.
After seeing the queries, we can discuss what PRIMARY KEY to have and what secondary INDEX(es) to have.
1 minute vs 5 minutes? Do you mean that you would gather only one-fifth as many rows in the latter case? We can discuss this after the rest of the details are brought out.
That query does not make sense in multiple ways. Why stop at "1000"? The output is quite large; what client cares about that much data? The ordering is indeterminate, since the datetime is not guaranteed to be in order. Why select the usd without the datetime? Please provide a rational query; then I can help you with INDEX(es).

MySQL: Splitting a large table into partitions or separate tables?

I have a MySQL database with over 20 tables, but one of them is significantly large because it collects measurement data from different sensors. Its size is around 145 GB on disk and it contains over 1 billion records. All this data is also replicated to another MySQL server.
I'd like to separate the data into smaller "shards", so my question is which of the below solutions would be better. I'd use the record's "timestamp" for dividing the data by years. Almost all SELECT queries executed on this table contain the "timestamp" field in the WHERE part of the query.
So below are the solutions that I cannot decide on:
Using the MySQL partitioning and dividing the data by years (e.g. partition1 - 2010, partition2 - 2011 etc.)
Creating separate tables and dividing the data by years (e.g. measuring_2010, measuring_2011 etc. tables)
Are there any other (newer) possible options that I'm not aware of?
I know that in the first case MySQL itself would get the data from the 'shards', and in the second case I'd have to write a kind of wrapper and do it myself. Is there any other way, for the second case, to make all the separate tables appear as 'one big table' to fetch data from?
I know this question has already been asked in the past, but maybe somebody has come up with some new solution (that I'm not aware of), or the best-practice solution has changed by now. :)
Thanks a lot for your help.
Edit:
The schema is something similar to this:
device_id (INT)
timestamp (DATETIME)
sensor_1_temp (FLOAT)
sensor_2_temp (FLOAT)
etc. (30 more for instance)
All sensor temperatures are written at the same moment, once a minute. Note that around 30 different sensor measurements are written in each row. This data is mostly used for displaying graphs and some other statistical purposes.
Well, if you are hoping for a new answer, that means you have probably read my answers, and I sound like a broken record. See Partitioning blog for the few use cases where partitioning can help performance. Yours does not sound like any of the 4 cases.
Shrink device_id. INT is 4 bytes; do you really have millions of devices? TINYINT UNSIGNED is 1 byte and a range of 0..255. SMALLINT UNSIGNED is 2 bytes and a range of 0..64K. That will shrink the table a little.
If your real question is about how to manage so much data, then let's "think outside the box". Read on.
Graphing... What date ranges are you graphing?
The 'last' hour/day/week/month/year?
An arbitrary hour/day/week/month/year?
An arbitrary range, not tied to day/week/month/year boundaries?
What are you graphing?
Average value over a day?
Max/min over a day?
Candlesticks (etc) for day or week or whatever?
Regardless of the case, you should build (and incrementally maintain) a summary table. A row would contain summary info for one hour. I would suggest:
CREATE TABLE Summary (
    device_id SMALLINT UNSIGNED NOT NULL,
    sensor_id TINYINT UNSIGNED NOT NULL,
    hr        TIMESTAMP NOT NULL,
    avg_val   FLOAT NOT NULL,
    min_val   FLOAT NOT NULL,
    max_val   FLOAT NOT NULL,
    PRIMARY KEY (device_id, sensor_id, hr)
) ENGINE=InnoDB;
The one Summary table might be 9GB (for current amount of data).
SELECT hr,
       avg_val,
       min_val,
       max_val
    FROM Summary
    WHERE device_id = ?
      AND sensor_id = ?
      AND hr >= ?
      AND hr <  ? + INTERVAL 20 DAY;
This would give you the hi/lo/avg values for 480 hours; enough to graph? Grabbing 480 rows from the summary table is a lot faster than grabbing 60*480 rows from the raw-data table.
Getting similar data for a year would probably choke a graphing package, so it may be worth building a summary of the summary -- with resolution of a day. It would be about 0.4GB.
There are a few different ways to build the summary table(s); we can discuss that after you have pondered its beauty and read the Summary tables blog. It may be that gathering one hour's worth of data, then augmenting the summary table, is the best way. That would be somewhat like the flip-flop discussed in my Staging table blog.
And, if you had the hourly summaries, do you really need the minute-by-minute data? Consider throwing it away, or keeping only, say, the last month of it. That leads to using partitioning, but only for its benefit in deleting old data, as discussed in "Case 1" of the Partitioning blog. That is, you would have daily partitions, using DROP and REORGANIZE every night to shift the time range of the "Fact" table. This would decrease your 145GB footprint without losing much data. New footprint: about 12GB (hourly summary + last 30 days' minute-by-minute detail).
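A sketch of that nightly rotation, assuming the raw "Fact" table is named measurements and is RANGE-partitioned by day (partition names are illustrative):
-- drop the oldest day's detail
ALTER TABLE measurements DROP PARTITION p20180101;

-- carve tomorrow's partition out of the catch-all 'future' partition
ALTER TABLE measurements REORGANIZE PARTITION pfuture INTO (
    PARTITION p20180202 VALUES LESS THAN (TO_DAYS('2018-02-03')),
    PARTITION pfuture   VALUES LESS THAN MAXVALUE
);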
PS: The Summary Table blog shows how to get standard deviation.
You haven't said much about how you use/query the data or what the schema looks like, but I'll try to make something up.
One way you can split your table is by entity (different sensors are different entities). That's useful if different sensors require different columns, so you don't need to force them all into one schema that fits all of them (least common multiple). It's not good, though, if sensors are added or removed dynamically, since you would have to add tables at runtime.
Another approach is to split the table based on time. This is the case if, after some time, data can be "historized": it is no longer used for the actual business logic but only for statistical purposes.
Both approaches can also be combined. Furthermore be sure that the table is properly indexed according to your query needs.
I strongly discourage any approach that regularly requires adding a table after some time, or anything similar. As always, I wouldn't split anything before there's a performance issue.
Edit:
I would clearly restructure the table to the following and not split it at all:
device_id (INT)
timestamp (DATETIME)
sensor_id (INT) -- could be unique or not. if sensor_id is not unique make a
-- composite key from device_id and sensor_id given that you
-- need it for queries
sensor_temp (FLOAT)
If data grows fast and you're expecting to generate terabytes of data soon you are better off with a NoSQL approach. But that's a different story.

In MySql, is it worthwhile creating more than one multi-column indexes on the same set of columns?

I am new to SQL, and certainly to MySQL.
I have created a table from streaming market data, named trade, that looks like:
date | time |instrument|price |quantity
----------|-----------------------|----------|-------|--------
2017-09-08|2017-09-08 13:16:30.919|12899586 |54.15 |8000
2017-09-08|2017-09-08 13:16:30.919|13793026 |1177.75|750
2017-09-08|2017-09-08 13:16:30.919|1346049 |1690.8 |1
2017-09-08|2017-09-08 13:16:30.919|261889 |110.85 |50
This table is huge (150 million rows per date).
To retrieve data efficiently, I have created an index date_time_inst on (date, time, instrument), because most of my queries will select a specific date or date range and then a time range.
But that does not help speed up a query like:
select * from trade where date="2017-09-08", instrument=261889
So, I am considering creating another index date_inst_time (date, instrument, time). Will that help speed up queries where I wish to get the time-series of one or a few instruments out of the thousands?
Should I worry too much about the additional database write time due to the index update?
I get data every second, and it takes about 100 ms to process and store it in the database. As long as I continue to take less than 1 second I am fine.
To get the most efficient query you need to query on a clustered index. According to the documentation this is automatically set on the primary key and cannot be set on any other columns.
I would suggest ditching the date column and creating a composite primary key on time and instrument.
A couple of recommendations:
There is no need to store date and time separately if time corresponds to the time of day of the same date. You can instead have a single datetime column and store timestamps in it.
You can then have one index on the datetime and instrument columns, which will make the queries run faster (see the sketch after this list).
With so many inserts and a fixed SELECT pattern (i.e. always by date first, followed by instrument), I would suggest looking into columnar databases (like Cassandra). You will get faster writes and reads for such a structure.
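A minimal sketch of the first two recommendations, assuming the existing time column already holds the full datetime (the index name is illustrative):
ALTER TABLE trade
    DROP COLUMN `date`,
    ADD INDEX time_inst (`time`, instrument);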
First, your use case sounds like two indexes would be useful: (date, instrument) and (date, time).
Given your volume of data, you may want to consider partitioning the data. This involves storing different "shards" of data in different files. One place to start is with the documentation.
From your description, you would want to partition by date, although instrument is another candidate.
Another approach would be a clustered index with date as the first column in the index. This assumes that the data is inserted "in order", to reduce movement of the data on inserts.
You are dealing with a large quantity of data. MySQL should be able to handle the volume. But, you may need to dive into more advanced functionality, such as partitioning and clustered indexes to get the functionality you need.
Typo?
I assume you meant
select * from trade where date="2017-09-08" AND instrument=261889
                                            ^^^
Optimal index for such is
INDEX(instrument, date)
And, contrary to other Comments/Answers, it is better to have the date last, especially if you want more than one day.
Splitting date and time
It is usually a bad idea to split date and time. It is also usually a bad idea to have redundant data; in this case, the date is repeated. Instead, use
WHERE `time` >= "2017-09-08"
AND `time` < "2017-09-08" + INTERVAL 1 DAY
and get rid of the date column. Note: This pattern works for DATE, DATETIME, DATETIME(3), etc, without messing up with the midnight at the end of the range.
Data volume?
150M rows? 10 new rows per second? That means you have about 5 years' data? A steady 10/sec insertion rate is rarely a problem.
Need to see SHOW CREATE TABLE. If there are a lot of indexes, then there could be a problem. Need to see the datatypes to look for shrinking the size.
Will you be purging 'old' data? If so, we need to talk about partitioning for that specific purpose.
How many "instruments"? How much RAM? Need to discuss the ramifications of an index starting with instrument.
The query
Is that the main SELECT you use? Is it always 1 day? One instrument? How many rows are typically returned?
Depending on the PRIMARY KEY and whatever index is used, fetching 100 rows could take anywhere from 10ms to 1000ms. Is this issue important?
Millisecond resolution
It is usually folly to think that any time resolution is not going to have duplicates.
Is there an AUTO_INCREMENT already?
SPACE IS CHEAP. Indexes take time when creating and inserting (once), but save time when retrieving (many, many times).
My experience is to create indexes covering all the relevant fields in the various orders they are queried in. This way, MySQL can choose the best index for your query.
So if you have 3 relevant fields
INDEX 1 (field1,field2,field3)
INDEX 2 (field1,field3)
INDEX 3 (field2,field3)
INDEX 4 (field3)
The first index will be used when all fields are present. The others are for shorter WHERE conditions.
Unless you know that some combinations will never be used, this will give MySQL the best chance to optimize your query. I'm also assuming that field1 is the biggest driver of the data.

What are some optimization techniques for MySQL table with 300+ million records?

I am looking at storing some JMX data from JVMs on many servers for about 90 days. This data would be statistics like heap size and thread count. This will mean that one of the tables will have around 388 million records.
From this data I am building some graphs so you can compare the stats retrieved from the MBeans. This means I will be grabbing some data at intervals using timestamps.
So the real question is: is there any way to optimize the table or query so you can perform these queries in a reasonable amount of time?
Thanks,
Josh
There are several things you can do:
Build your indexes to match the queries you are running. Run EXPLAIN to see the types of queries that are run and make sure that they all use an index where possible.
Partition your table. Partitioning is a technique for splitting a large table into several smaller ones by a specific (aggregate) key. MySQL supports this natively from version 5.1.
If necessary, build summary tables that cache the costlier parts of your queries, then run your queries against the summary tables. Similarly, temporary in-memory tables can be used to store a simplified view of your table as a pre-processing stage (see the sketch after this list).
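For example, a hypothetical hourly summary table for the JMX stats described in the question (all names are illustrative):
CREATE TABLE jmx_hourly (
    server_id   SMALLINT UNSIGNED NOT NULL,
    hr          DATETIME NOT NULL,          -- truncated to the hour
    avg_heap    FLOAT NOT NULL,
    max_heap    FLOAT NOT NULL,
    avg_threads FLOAT NOT NULL,
    max_threads FLOAT NOT NULL,
    PRIMARY KEY (server_id, hr)
) ENGINE=InnoDB;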
3 suggestions:
index
index
index
p.s. for timestamps you may run into performance issues; depending on how MySQL handles DATETIME and TIMESTAMP internally, it may be better to store timestamps as integers (seconds since 1970, or whatever).
Well, for a start, I would suggest you use "offline" processing to produce 'graph ready' data (for most of the common cases) rather than trying to query the raw data on demand.
If you are using MySQL 5.1 you can use the new features,
but be warned that they contain a lot of bugs.
First you should use indexes.
If this is not enough, you can try to split the tables by using partitioning.
If this also won't work, you can also try load balancing.
A few suggestions.
You're probably going to run aggregate queries on this stuff, so after (or while) you load the data into your tables, you should pre-aggregate the data: for instance, pre-compute totals by hour, by user, by week, whatever (you get the idea), and store that in cache tables that you use for your reporting graphs. If you can shrink your dataset by an order of magnitude, good for you!
This means I will be grabbing some data at an interval using timestamps.
So this means you only use data from the last X days ?
Deleting old data from tables can be horribly slow if you have a few tens of millions of rows to delete; partitioning is great for that (just drop the old partition). It also groups all records from the same time period close together on disk, so it's a lot more cache-efficient.
Now if you use MySQL, I strongly suggest using MyISAM tables. You don't get crash-proofness or transactions, and locking is dumb, but the size of the table is much smaller than with InnoDB, which means it can fit in RAM, which means much quicker access.
Since big aggregates can involve lots of rather sequential disk IO, a fast IO system like RAID10 (or SSD) is a plus.
Is there anyway to optimize the table or query so you can perform these queries
in a reasonable amount of time?
That depends on the table and the queries; I can't give any advice without knowing more.
If you need complicated reporting queries with big aggregates and joins, remember that MySQL does not support any fancy JOIN algorithms, hash aggregates, or much else that is really useful here; basically the only thing it can do is a nested-loop index scan, which is good on a cached table and absolutely atrocious in other cases where random access is involved.
I suggest you test with Postgres. For big aggregates its smarter optimizer does work well.
Example :
CREATE TABLE t (id INTEGER PRIMARY KEY AUTO_INCREMENT, category INT NOT NULL, counter INT NOT NULL) ENGINE=MyISAM;
INSERT INTO t (category, counter) SELECT n%10, n&255 FROM serie;
(serie contains 16M rows with n = 1 .. 16000000)
MySQL   Postgres
58 s    100 s     INSERT
75 s    51 s      CREATE INDEX on (category, id)   (useless)
9.3 s   5 s       SELECT category, sum(counter) FROM t GROUP BY category;
1.7 s   0.5 s     SELECT category, sum(counter) FROM t WHERE id > 15000000 GROUP BY category;
On a simple query like this pg is about 2-3x faster (the difference would be much larger if complex joins were involved).
EXPLAIN Your SELECT Queries
LIMIT 1 When Getting a Unique Row
SELECT * FROM user WHERE state = 'Alabama'           -- wrong
SELECT 1 FROM user WHERE state = 'Alabama' LIMIT 1   -- better
Index the Search Fields
Indexes are not just for the primary keys or the unique keys. If there are any columns in your table that you will search by, you should almost always index them.
Index and Use Same Column Types for Joins
If your application contains many JOIN queries, you need to make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.
Do Not ORDER BY RAND()
If you really need random rows out of your results, there are much better ways of doing it. Granted, it takes additional code, but you will prevent a bottleneck that gets exponentially worse as your data grows. The problem is that MySQL has to perform the RAND() operation (which takes processing power) on every single row in the table before sorting it and giving you just 1 row.
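One common alternative is to pick a random id first and then read a single row from there. A sketch, assuming the table has an AUTO_INCREMENT id column without large gaps (gaps skew the distribution slightly):
SELECT u.*
FROM user AS u
JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM user)) AS rand_id) AS r
  ON u.id >= r.rand_id
ORDER BY u.id
LIMIT 1;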
Use ENUM over VARCHAR
ENUM type columns are very fast and compact. Internally they are stored like TINYINT, yet they can contain and display string values.
Use NOT NULL If You Can
Unless you have a very specific reason to use a NULL value, you should always set your columns as NOT NULL.
"NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte."
Store IP Addresses as UNSIGNED INT
In your queries you can use INET_ATON() to convert an IP to an integer, and INET_NTOA() for the reverse. There are also similar functions in PHP, called ip2long() and long2ip().
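For example (the ip column is hypothetical; INET_ATON()/INET_NTOA() are built-in MySQL functions):
ALTER TABLE user ADD COLUMN ip INT UNSIGNED;

SELECT INET_ATON('192.168.0.1');   -- 3232235521
SELECT INET_NTOA(3232235521);      -- '192.168.0.1'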