How to make a large mysql database capable of fast operations - mysql

I'm going to make a MySQL database that is potentially going to hold thousands of records. Before I create it, I would like to know the best way to approach this. I mean: how can I make the database faster, and when does a MySQL database start to slow down when fetching data?
Are there any free solutions to this?
Thanks for reading :)

Thousands of records is absolutely nothing for a DB.
Make proper indexes on the columns you put conditions on in your queries. Example database table:
Persons
------------------------------
id int auto_increment
name varchar(200)
gender char(1)
Now imagine that the table contains thousands of Person records. You want to select all data of a single person by the person's name.
The query would be
select * from persons
where name = 'John'
If you put an index on the name column then the query will perform much faster.
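A minimal sketch of adding that index (idx_name is just an assumed name for it):
alter table persons add index idx_name (name);

-- EXPLAIN should now show the query using idx_name
-- instead of scanning every row:
explain select * from persons where name = 'John';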

Related

10000 rows - selecting according to where clause - performance

I have a database table with 10000 rows in it and I'd like to select a few thousand items using something like the following:
SELECT id FROM models WHERE category_id = 2
EDIT: The id column is the primary index of the table. Also, the table has another index on category_id.
My question is what would be the impact on performance? Would the query run slow? Should I consider splitting my models table into separate tables (one table for each category)?
This is what database engines are designed for. Just make sure you have an index on the column that you're using in the WHERE clause.
You can try this to get the first 100 records:
SELECT id FROM models WHERE category_id = 2 LIMIT 100
You can also create an index on that column for fast retrieval of the results:
ALTER TABLE `models` ADD INDEX `category_id` (`category_id`)
EDIT:
If you have an index created on your column then you don't have to worry about the performance; database engines are smart enough to take care of that.
My question is what would be the impact on performance? Would the query run slow? Should I consider splitting my models table into separate tables?
No, you don't have to split your tables; that would not gain you any performance.
I'm fairly new to SQL; however, I would first index the column.
I agree with R.T.'s solution. In addition, I can recommend the link below:
https://indexanalysis.codeplex.com/
Download the SQL code. It's a stored procedure that has helped me a lot when I want to analyze the impact of indexes or check what status they have in my database.
Please check it out.

MySQL: Index for fast DISTINCT queries?

What index(es) do I need to set to get results as fast as possible for DISTINCT queries on a certain column?
Example table columns:
id INTEGER
name VARCHAR(32)
groupname VARCHAR(16)
Every so often I need to get a list of all groups:
SELECT DISTINCT groupname FROM data ORDER BY groupname
The table can have > 200k entries, but only about a dozen groups. I would prefer not to use a separate table for the group names, because the data gets imported often from a CSV file.
In this case, an index on groupname should get you the best possible results.
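A minimal sketch, assuming the table is named data as in the query (the index name is made up):
ALTER TABLE data ADD INDEX idx_groupname (groupname);

-- With the index in place, MySQL can walk the index in sorted order
-- instead of scanning all 200k rows:
EXPLAIN SELECT DISTINCT groupname FROM data ORDER BY groupname;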
If that's not good enough, here are a couple more options to consider. First, you could cache the results of that query so that you only run it when you absolutely have to. Second, you could create a separate table to store the groupname values and populate it via an insert trigger (this would avoid having to change your CSV import process).
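A rough sketch of that second option, assuming the table is named data as in the query (the groups table and the trigger name are made up):
CREATE TABLE groups (groupname VARCHAR(16) PRIMARY KEY);

DELIMITER //
CREATE TRIGGER data_groups_ai AFTER INSERT ON data
FOR EACH ROW
BEGIN
    -- INSERT IGNORE skips the write when the group already exists
    INSERT IGNORE INTO groups (groupname) VALUES (NEW.groupname);
END//
DELIMITER ;

Listing all groups then becomes a trivial read of a dozen rows:
SELECT groupname FROM groups ORDER BY groupname;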
Indexing on groupname will solve your issue. If you are very concerned about the performance of your queries while inserting/updating, then instead of indexing the whole column, try "column prefix indexing".
Just adding indexes on varchar columns can slow down your inserts/updates, since the index has to be maintained on every write. For more information, read about the B-tree indexing algorithm.
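A sketch of the prefix-index variant mentioned above (the length 8 is just an assumption; pick one long enough that the prefixes stay distinct):
ALTER TABLE data ADD INDEX idx_groupname_prefix (groupname(8));
-- The index stores only the first 8 characters of each value, so it is
-- smaller and cheaper to maintain on writes; the trade-off is that MySQL
-- cannot resolve ORDER BY or DISTINCT from the prefix alone.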

Question on how to improve my database structure and what fields should I index?

I am creating a simple comparison script, and I have some questions about the database structure. Firstly, the database will be huge; I am expecting more than 1 million entries in products.
Secondly, there will be a search form whose search term will be matched against (%$term%) the name field, displaying the product's related info and the shop's info.
Below you can see the structure of my products table.
id int(10) NOT NULL
name varchar(50) NOT NULL
link varchar(50) NOT NULL
description varchar(50) NOT NULL
image varchar(50) NOT NULL
price varchar(50) NOT NULL
My questions are:
Do you suggest I index a field? Users will not be able to insert or update products; the only query will be a SELECT to display the results, and I will often update the products from XML feeds to pick up product changes.
I have to store the shop info like name, shipping, link, image... This gives me two options: a) create a new table named shops and join the two tables via a new shopID field in products that looks up the id in shops and displays the info, or b) add this info (name, shipping, ...) as extra fields in products for every single product? (I think the answer is obvious, but I need your suggestion.)
Are there any other things I should keep in mind, or change?
I am not an advanced programmer and what I learn comes from the internet, so maybe the questions are too obvious for you, but for me they are the ticket to learning.
Thank you for your answers.
Indexes are required to fetch records very fast, so yes, they're recommended. But what kind of index would you like to use? The MyISAM engine offers a "regular" string index that you can use with a LIKE clause (e.g. LIKE 'hello%'), but it restricts you from using a wildcard at the beginning of the search phrase. In addition, MyISAM has a FULLTEXT index that allows you to search for words in the whole string, not just at the beginning of the string. So you could create a FULLTEXT index on the columns description and name - but two FULLTEXT indexes seem redundant in this case. Maybe you could join those columns and separate the values with a token or a character? If so, you'll need to create only one FULLTEXT index on the joined column, which can save a lot of fragmentation and disk space.

One of the cons of the MyISAM engine is that when writing to it (UPDATE/DELETE queries), it locks the entire table. So, if the table is written to many times a minute, other queries will probably hang. That's why you should see whether the InnoDB engine suits your needs - it allows concurrent read/write operations on the table.
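A sketch of the FULLTEXT route (column names taken from the question; note that a single FULLTEXT index can also span both columns directly, which avoids the manual joining described above):
ALTER TABLE products ADD FULLTEXT idx_search (name, description);

-- MATCH ... AGAINST uses the FULLTEXT index; the column list must
-- match the indexed columns exactly:
SELECT id, name, price
FROM products
WHERE MATCH (name, description) AGAINST ('some term');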
That's probably a good idea, since having an index on the price column seems essential, and FULLTEXT indexes don't work together with other indexes.
I'd say: Use InnoDB and Sphinx, and have a primary index on id & a regular index on price.
The most important thing for you to understand is that when writing code against specific software, you must be well familiar with that software and its caveats. You should read High Performance MySQL - extremely recommended.
Edit:
If you want to add an index to the products table, you can do that with
ALTER TABLE /* etc */ when the table is empty or contains a small amount of data. If the table already holds a lot of data, then it's recommended to create another table similar to products, alter that new table, and populate it with the data from the old products table, e.g.:
CREATE TABLE `products_new` LIKE `products`;
ALTER TABLE `products_new` ADD FULLTEXT (`name`);
LOCK TABLES `products` READ, `products_new` WRITE;
INSERT INTO `products_new` SELECT * FROM `products`;
LOCK TABLES `products` WRITE, `products_new` WRITE;
ALTER TABLE `products` RENAME TO `products_bad`;
ALTER TABLE `products_new` RENAME TO `products`;
/* The following doesn't work:
RENAME TABLE `products` TO `products_bad`, `products_new` TO `products`;
See: http://bugs.mysql.com/bug.php?id=22246
*/
DROP TABLE `products_bad`;
Nikolai,
The ID should be a primary key. That automatically puts an index on ID, and will speed up any queries that need to get specific products.
The shops table should be a second table, but you should also have a third table that joins products with shops. At its most basic, it would have two fields, shop_id and product_id. This lets you have a single product in multiple shops. These two fields should be foreign keys to the products table and the shops table, as sketched below.
If you are ever thinking about having a different price for a product per shop, then the product_store join table should also contain the price, although the base price could be stored in the products table.
Price should be a decimal, so that you can do calculations on the price field.
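A hypothetical sketch of that layout (column sizes and the DECIMAL precision are assumptions, and it presumes products uses InnoDB with id as its primary key):
CREATE TABLE shops (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    shipping VARCHAR(50),
    link VARCHAR(50),
    image VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE product_store (
    product_id INT NOT NULL,
    shop_id INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,  -- per-shop price
    PRIMARY KEY (product_id, shop_id),
    FOREIGN KEY (product_id) REFERENCES products (id),
    FOREIGN KEY (shop_id) REFERENCES shops (id)
) ENGINE=InnoDB;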
1) You should generally index fields that are commonly used. However, since your search on name uses a wildcard at the start, an index will have no effect on that query (see the sketch after point 2).
2) Creating a shops table and linking to this would be better.
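To illustrate the wildcard point from 1), assuming an index on name:
-- Can use an index on name (prefix match):
SELECT * FROM products WHERE name LIKE 'term%';

-- Cannot use that index (the leading wildcard forces a full scan):
SELECT * FROM products WHERE name LIKE '%term%';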
Price, for sure, because something tells me you will search over this field and do orderings on it.
"Premature optimization is a root of all evil" (c) Donald Knuth. So, I suggest to normalize your tables, so YES - create table for shops. Once your applicated grown big, and you faced to highloads, you will be able to denormalize your database to avoid JOINS (one way to optimize your voracious application)
Get back to stackoverflow with your problem ;-)
Generally you should index fields that will be used intensively. But a wildcard at the start of your search means an index won't help much.
Better to use another table with a foreign key.
Also, shouldn't the "id" field in your products table be defined as a PRIMARY KEY?
Here are my suggestions:
To be able to search for %term% you need full-text search; a plain index will not do you any good when the search term starts with a wildcard.
Yes, you should put an index on the id column (and probably make it auto increment), since that seems to be the unique column in the table. Other than that, there's no point in us suggesting any other indexes, since we don't know which queries you are going to run.
Yes, create another table for shops; otherwise you will have data that is not normalized, for shop name and so on (there might be rare cases that "require" denormalization, such as optimization, but you have not reached that point yet). Non-normalized data will cause problems - in your specific case, what will you do when a shop needs to change its name? You will have to update all matching rows in the products table.
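To make the rename problem concrete (shop_name stands for a hypothetical denormalized column in products):
-- Normalized: renaming a shop touches exactly one row:
UPDATE shops SET name = 'New Name' WHERE id = 42;

-- Denormalized: every product row carrying the old name must change:
UPDATE products SET shop_name = 'New Name' WHERE shop_name = 'Old Name';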
There are many things you should keep in mind, but they're out of scope for this answer. I suggest that you get to work and learn as you go, because learning by doing is a great way to become a better developer. Then, when you hit a specific problem, search for it/post it here on stackoverflow.

Can I optimize my database by splitting one big table into many small ones?

Assume that I have one big table with three columns: "user_name", "user_property", "value_of_property". Let's also assume that I have a lot of users (say 100,000) and a lot of properties (say 10,000). Then the table is going to be huge (1 billion rows).
When I extract information from the table I always need information about a particular user. So I use, for example, where user_name='Albert Gates'. So, every time the MySQL server needs to analyze 1 billion rows to find those which contain "Albert Gates" as user_name.
Would it not be wise to split the big table into many small ones corresponding to fixed users?
No, I don't think that is a good idea. A better approach is to add an index on the user_name column - and perhaps another index on (user_name, user_property) for looking up a single property. Then the database does not need to scan all the rows - it just needs to find the appropriate entry in the index, which is stored in a B-tree, making it possible to locate a record in a very small amount of time.
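A minimal sketch, using a made-up name big_table for the table in the question:
-- The composite index also serves lookups on user_name alone,
-- because user_name is its leftmost column:
ALTER TABLE big_table ADD INDEX idx_user_prop (user_name, user_property);

EXPLAIN SELECT value_of_property
FROM big_table
WHERE user_name = 'Albert Gates';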
If your application is still slow even after correct indexing, it can sometimes be a good idea to partition your largest tables.
One other thing you could consider is normalizing your database so that the user_name is stored in a separate table and an integer foreign key is used in its place. This can reduce storage requirements and increase performance. The same may apply to user_property.
you should normalise your design as follows:
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists properties;
create table properties
(
property_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists user_property_values;
create table user_property_values
(
user_id int unsigned not null,
property_id smallint unsigned not null,
value varchar(255) not null,
primary key (user_id, property_id),
key (property_id)
)
engine=innodb;
insert into users (username) values ('f00'),('bar'),('alpha'),('beta');
insert into properties (name) values ('age'),('gender');
insert into user_property_values values
(1,1,'30'),(1,2,'Male'),
(2,1,'24'),(2,2,'Female'),
(3,1,'18'),
(4,1,'26'),(4,2,'Male');
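A typical lookup against this design then resolves through a handful of index entries instead of a billion-row scan, e.g.:
select u.username, p.name as property_name, upv.value
from users u
join user_property_values upv on upv.user_id = u.user_id
join properties p on p.property_id = upv.property_id
where u.username = 'f00';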
From a performance perspective the innodb clustered index works wonders in this similar example (COLD run):
select count(*) from product
count(*)
========
1,000,000 (1M)
select count(*) from category
count(*)
========
250,000 (250K)
select count(*) from product_category
count(*)
========
125,431,192 (125M)
select
c.*,
p.*
from
product_category pc
inner join category c on pc.cat_id = c.cat_id
inner join product p on pc.prod_id = p.prod_id
where
pc.cat_id = 1001;
0:00:00.030: Query OK (0.03 secs)
Properly indexing your database is the number 1 way of improving performance. I once had a query take half an hour (on a large dataset, but nonetheless). Then we came to find out that the tables had no indexes. Once indexed, the query took less than 10 seconds.
Why do you need to have this table structure? My fundamental problem is that you are going to have to cast the data in value_of_property every time you want to use it. That is bad in my opinion - also, storing numbers as text is crazy given that it's all binary anyway. For instance, how are you going to have required fields? Or fields that need to have constraints based on other fields, e.g. start and end dates?
Why not simply have the properties as fields rather than as some many-to-many relationship?
Have one flat table. When your business rules begin to show that properties should be grouped, then you can consider moving them out into other tables and having several 1:0-1 relationships with the users table. But this is not normalization, and it will degrade performance slightly due to the extra join (however, the self-documenting nature of the table names will greatly aid any developers).
One way I regularly see database performance get totally castrated is by having a generic
Id, Property Type, Property Name, Property Value table.
This is really lazy and exceptionally flexible, but it totally kills performance. In fact, on a new job where performance is bad, I actually ask if they have a table with this structure - it invariably becomes the center point of the database and is slow. The whole point of relational database design is that the relations are determined ahead of time. This is simply a technique that aims to speed up development at a huge cost to application speed. It also puts a huge reliance on business logic in the application layer to behave - which is not defensive at all. Eventually you find that you want to use properties in a key relationship, which leads to all kinds of casting on the join, which further degrades performance.
If data has a 1:1 relationship with an entity, then it should be a field on the same table. If your table gets to more than 30 fields wide, then consider moving them into another table - but don't call it normalisation, because it isn't. It is a technique to help developers group fields together, at the cost of performance, in an attempt to aid understanding.
I don't know if MySQL has an equivalent, but SQL Server 2008 has sparse columns - null values take no space.
Sparse column datatypes
I'm not saying an EAV approach is always wrong, but I think using a relational database for this approach is probably not the best choice.

mysql join is very slow

I have two tables (MyISAM):
create table A (email varchar(50));
create table B (email varchar(50), key `email` (email));
Table A has 130K records
Table B has 20K records
Why does this SQL statement take a very long time (more than two minutes; then I aborted the query with Ctrl+C)?
Statement is:
select count(*) from user A, tmp B where A.email=B.email;
Thanks
I'd guess that the query optimizer has nothing to go on. Why don't you try defining indexes on the email columns?
In general, joining on strings is more expensive than joining on shorter data types like int.
You could speed up this query by making sure both email columns are indexed.
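For example, using the A/B names from the CREATE statements (B already has its key; the index name on A is made up):
alter table A add index idx_email (email);

-- EXPLAIN should now show an index lookup ("ref") for the join
-- instead of repeated full scans:
explain select count(*) from A join B on A.email = B.email;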
If table A has an int ID field, then table B should store that ID instead of storing the email string again. That would decrease the DB size and, along with indexes, would give much faster query speeds than a string ever would.
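A rough sketch of that refactor (column and index names are assumptions):
-- Give A a surrogate integer key:
alter table A add id int unsigned not null auto_increment primary key;

-- Have B reference the id instead of duplicating the email string:
alter table B add user_id int unsigned;
update B join A on A.email = B.email set B.user_id = A.id;
alter table B drop column email, add index idx_user_id (user_id);

-- Joining on a 4-byte int is much cheaper than on varchar(50):
select count(*) from A join B on B.user_id = A.id;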