I do not fully understand indexes and would like some clarification.
I have a table named posts, which over time might become very big.
Each post belongs to a category and a language, through two columns: category_id and lang.
If I create indexes on the columns category_id and lang, does this mean that the posts table will be "organized" or "classified" in MySQL into blocks of category_id and lang, giving better performance when I specify a category_id and/or a lang in my query?
Which type of index should I create, then?
I hope I'm clear enough here...
What an index does is create a kind of "shadow" table, consisting of only the indexed values (kept sorted, with a pointer back to each row), so MySQL only has to look through the index to find what you're looking for.
If you're doing a query with a WHERE clause like this:
WHERE zipcode = 555 AND phone = 12345678
You will need a composite index on (zipcode, phone).
If the query is only:
WHERE zipcode = 555
You only need an index on zipcode.
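Applied to the posts table from the question, the idea can be sketched as below. This is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module as a stand-in for MySQL (where you would use EXPLAIN instead of EXPLAIN QUERY PLAN), and the index name idx_posts_category_lang and the title column are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; the index name and the title column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " category_id INTEGER NOT NULL,"
    " lang TEXT NOT NULL,"
    " title TEXT)"
)
# One composite index serves filters on category_id alone
# and on category_id together with lang.
conn.execute("CREATE INDEX idx_posts_category_lang ON posts (category_id, lang)")


def plan(sql, params):
    """Return the human-readable query plan lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


both = plan("SELECT title FROM posts WHERE category_id = ? AND lang = ?", (3, "en"))
category_only = plan("SELECT title FROM posts WHERE category_id = ?", (3,))
lang_only = plan("SELECT title FROM posts WHERE lang = ?", ("en",))
print(both, category_only, lang_only, sep="\n")
```

The first two plans should mention the composite index, while the lang-only filter should fall back to a table scan, because lang is not the leftmost column of the index. If you often filter by lang alone, a second index on lang would be needed.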
I am building a database for an app and I am testing performance issues on a larger data set. I generated about 250,000 location records. Each location can be assigned to many categories and a category can be assigned to many locations. My data-set has 2-4 categories assigned to each location.
I want to allow the user to search for locations by filtering which categories should be allowed using a wild card search. So maybe I want to match all categories with the word "red" in it. So if I type red, now it shows all locations which have a category title that has "red" in it. In addition, I would like to wildcard search the location title with that same string.
I wrote up a query which works, but performance is awful on large data sets. Essentially I am using inner queries, which is fine if my limit is set and I find results quickly (around .05ms). If I don't find any results right away, it looks like it goes through the whole database and the query takes around 9-10 seconds.
Here is a simplified layout of my database:
locations: id | title | address
categories: id | title
locations_categories: id | location_id | category_id
Here is the query I currently am using:
SELECT `id`,`title`,`address`
FROM (`locations`)
WHERE title LIKE '%string%'
AND id IN (
SELECT location_id
FROM locations_categories
JOIN categories ON categories.id = locations_categories.category_id
WHERE categories.title LIKE '%string%')
First of all, your main query just uses the values from the subquery, so it can be rewritten as:
SELECT location_id
FROM locations_categories
JOIN categories ON categories.id = locations_categories.category_id
WHERE categories.title LIKE '%string%'
But I'd propose splitting this query into three, since JOINs can be slow on big datasets. The first one gets the necessary category IDs (with paging):
SELECT id
FROM categories
WHERE title LIKE '%string%' LIMIT <start>, <step>
Then you can get locations_categories:
SELECT location_id FROM locations_categories WHERE category_id IN (...)
And you'll use the location IDs you've got to retrieve corresponding records:
SELECT * FROM locations WHERE id IN (...)
These 3 queries combined will be much faster than your original one.
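The three steps could be wired together roughly like this. It is a sketch under assumptions, not a benchmark: it uses Python's sqlite3 module with SQLite standing in for MySQL, the sample rows are invented, and the helper names ids_in and locations_by_category are hypothetical. Note that the paging applies to categories, not to locations, and that very long IN lists may hit a driver or server limit.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows below are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (id INTEGER PRIMARY KEY, title TEXT, address TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE locations_categories (id INTEGER PRIMARY KEY,
    location_id INTEGER, category_id INTEGER);
INSERT INTO locations VALUES (1, 'Corner Cafe', '1 Main St'),
                             (2, 'Book Shop', '2 Oak Ave'),
                             (3, 'Garage', '3 Elm Rd');
INSERT INTO categories VALUES (1, 'red brick'), (2, 'blue door'), (3, 'dark red');
INSERT INTO locations_categories VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 3, 3);
""")


def ids_in(sql_prefix, ids):
    """Run sql_prefix + ' IN (?, ?, ...)' with one placeholder per id."""
    if not ids:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    marks = ", ".join("?" for _ in ids)
    return conn.execute(f"{sql_prefix} IN ({marks})", list(ids)).fetchall()


def locations_by_category(term, start=0, step=50):
    # Step 1: matching category IDs, one page at a time.
    cats = [r[0] for r in conn.execute(
        "SELECT id FROM categories WHERE title LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        (f"%{term}%", step, start))]
    # Step 2: location IDs linked to those categories (deduplicated).
    loc_ids = sorted({r[0] for r in ids_in(
        "SELECT location_id FROM locations_categories WHERE category_id", cats)})
    # Step 3: the location records themselves, sorted by id for stable output.
    return sorted(ids_in("SELECT id, title, address FROM locations WHERE id", loc_ids))


print(locations_by_category("red"))
```

Whether this actually beats a single JOIN depends on the data and the indexes, so it is worth timing both on your own data set.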
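Also, make sure your title column is indexed, since it can be the bottleneck. But because you have a wildcard at the start of the search term, a regular index cannot be used for the LIKE; you'll have to use a FULLTEXT index here (note that it matches whole words, not arbitrary substrings).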
Your explain plan will confirm (or disprove) this but I suspect that your issue is that the leading % in the clauses
WHERE categories.title LIKE '%string%'
and
WHERE title LIKE '%string%'
forces full table scans. Addressing this often requires some knowledge of the domain and application in question.
The simple approach is to only search for 'starts with'. Others include full-text searching, function-based indexes, or having a 'grouping table' that presorts and lists the relevant records for known searches.
I am sure this is a basic question, but I am new to SQL. For my user profile I want to display location = "Hollywood, CA - USA" if a user lives in Hollywood. So I assume the user table will have one column like current_city holding an ID, say 1232, which is a FK to the city table, where city_name for this PK is Hollywood. I would then join to the state table and the country table to find the names CA and USA, as the city lookup table will only store the IDs (like CA = 21 and USA = 345).
Is this the best way to design the tables? Or should I add two columns, city_id and city_name, to the user table, and also add country_id, country_name, state_id and state_name to the city table? That way I save on trips to other parent tables just to fetch the names for the IDs.
This is only a sample use case, but I have lots of lookup ID tables, so I will apply the same principle to all tables once I know how to do it best. My requirements are scalability and performance, so whatever works best for these is what I would like.
The first way you described is almost always better.
Having both the city_id and city_name (or any pair of that kind) in the users table is not best practice since it may cause data discrepancies - a wrong update may result in a city_id that does not match the city_name and then the system behavior becomes unexpected.
As said, your first suggestion would be the common and usually the best way to do this. If the table keys are designed properly, so that all select statements can use them efficiently, this would also give the best performance.
For example, having just the city_name in the users table would make it a little quicker to find and show the city for one user, but it would make much less sense for other queries, like finding all users in city X.
You can find a nice series of articles for beginners about DB normalization here:
http://databases.about.com/od/specificproducts/a/2nf.htm. This article has an example which is very much like what you are trying to achieve, and the related articles will probably help you design many other tables in your DB.
Good luck!
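For what it's worth, here is a small sketch of the first (normalized) design and the lookup it implies. It assumes table and column names loosely based on the question (users, city, state, country, current_city) and a made-up user 'alice', and it runs on SQLite via Python's sqlite3 module rather than MySQL. In MySQL you would build the string with CONCAT(...) instead of the || operator.

```python
import sqlite3

# SQLite stands in for MySQL; names and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE state (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    country_id INTEGER NOT NULL REFERENCES country(id));
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                   state_id INTEGER NOT NULL REFERENCES state(id));
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    current_city INTEGER REFERENCES city(id));
INSERT INTO country VALUES (345, 'USA');
INSERT INTO state VALUES (21, 'CA', 345);
INSERT INTO city VALUES (1232, 'Hollywood', 21);
INSERT INTO users VALUES (1, 'alice', 1232);
""")


def location_for(user_id):
    """Build the display string by joining through the lookup tables."""
    row = conn.execute(
        "SELECT city.name || ', ' || state.name || ' - ' || country.name"
        " FROM users"
        " JOIN city ON city.id = users.current_city"
        " JOIN state ON state.id = city.state_id"
        " JOIN country ON country.id = state.country_id"
        " WHERE users.id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


print(location_for(1))  # Hollywood, CA - USA
```

Each join here is a primary-key lookup, so the extra "trips" to the parent tables are cheap, and each name is stored in exactly one place.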
Ok, I have a database with a table for storing classified posts; each post belongs to a different city. For the purpose of this example we will call this table posts. This table has columns:
id (INT, +AI),
cityid (TEXT),
postcat (TEXT),
user (TEXT),
datatime (DATETIME),
title (TEXT),
desc (TEXT),
location (TEXT)
an example of that data would be:
'12039',
'fayetteville-nc',
'user#gmail.com',
'December 28th, 2010 - 11:55 PM',
'post title',
'post description',
'spring lake'
id is auto-incremented, and cityid is in text format (this is where I think I will be losing performance once the database is large)...
Originally I planned on having a different table for each city and now since a user has to have the option of searching and posting through multiple cities, I think I need them all in one table. Everything was perfect when I had one city per table, where I could:
SELECT *
FROM `posts`
WHERE MATCH (`title`, `desc`, `location`)
AGAINST ('searchtext' IN BOOLEAN MODE)
AND `postcat` LIKE 'searchcatagory'
But then I ran into problems when trying to search multiple cities at one time, or when listing all of a user's posts for them to delete or edit.
So it looks like I have to have one table with all the posts, and also match another FULLTEXT field: cityid. I am guessing I need full-text because if a user chooses an entire state and my cityid is "fayetteville-nc", I would need to match cityid against "-nc". This is only an assumption and I would love another way. This database could easily reach over a million rows within 6 months, and a full-text search against 4 columns is probably going to be slow.
My question is, is there a better way to do this more efficiently? The database has nothing in it now, except for some test posts made by me. So I can completely redesign the table structure if necessary. I am open to any and all suggestions, even if it is just a more efficient way to perform my query.
Yes, one table for all posts sounds sensible. It would also be normal design for the posts table to have a city_id, referring to the id in a city table. Each city would also have a state_id, referring to the id in a state table, and similarly each state would have a country_id referring to the id in a country table. So you could write:
SELECT $columns
FROM posts JOIN city ON city.id = posts.city_id
WHERE city.tag = 'fayetteville-nc'
Once you've brought the cities into a separate table, it might make more sense for you to do the city-to-city_id resolving up front. This fairly naturally happens if you have a city chosen from a dropdown, for instance. But if you're entering free text into a search field, you may want to do it differently.
You can also search for all posts in a given state (or set of states) as:
SELECT $columns
FROM posts
JOIN city ON city.id = posts.city_id
JOIN state ON state.id = city.state_id
WHERE state.tag = 'NC'
If you're going to go more fancy or international, you may need a more flexible way of arranging locations into a hierarchy (e.g. you may want city districts, counties, multinational regions, intranational regions (Midwest, East Coast etc)) but stay easy for now :)
Is there any way to create a functioning index for this query and get rid of the "filesort"?
SELECT id, title FROM recipes use index (topcat) where
(topcat='$cid' or topcat2='$cid' or topcat3='$cid')
and approved='1' ORDER BY id DESC limit 0,10;
I created index "topcat" (columns: topcat1+topcat2+topcat3+approved+id) but I still get "Using where; Using filesort".
I could create one more column, say "all_topcats", to store the topcat numbers as a list (1,5,7) and then run a query like "... where $cid IN ()...". But the problem is that in this case the "all_topcats" column would be varchar while the "approved" and "id" columns are int, so the index would not be used anyway.
Any ideas? Thanks.
You might improve performance for that query if you reordered the columns in the index:
approved, topcat1, topcat2, topcat3, id
It would be useful to know what the table looks like and why you have three columns named like that. It might be easier to organise a good query if you had a subsidiary table to store the topcat values, with a link back to the main table, but without knowing why you have it set up like that it's hard to know whether that would be sensible.
Can you post the CREATE TABLE?
Edit in response to user message
Your table doesn't sound like it's well-designed. The following design would be better: Add two new tables, Category and Category_Recipe (a cross-referencing table). Category will contain a list of your categories and Category_Recipe will contain two columns, one a foreign key to Category and one a foreign key to the existing Recipe table. A row of Category_Recipe is a statement "this recipe is in this category". You will then be able to very simply write a query that will search for recipes in a given category. You also have the ability to put a recipe in arbitrarily many categories, rather than being limited to 3. Look up "database normalisation" and "foreign keys".
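A rough sketch of that design follows, with invented sample data. It assumes lowercase table names (category, category_recipe, recipes), a composite primary key on the cross-reference table, and SQLite via Python's sqlite3 module standing in for MySQL; the helper latest_in_category is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                      approved INTEGER NOT NULL DEFAULT 0);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category_recipe (
    category_id INTEGER NOT NULL REFERENCES category(id),
    recipe_id INTEGER NOT NULL REFERENCES recipes(id),
    PRIMARY KEY (category_id, recipe_id));
INSERT INTO recipes VALUES (1, 'Pancakes', 1), (2, 'Omelette', 1),
                           (3, 'Waffles', 0), (4, 'Crepes', 1);
INSERT INTO category VALUES (1, 'Breakfast'), (5, 'Dessert'), (7, 'Eggs');
INSERT INTO category_recipe VALUES (1, 1), (5, 1), (1, 2), (7, 2),
                                   (1, 3), (1, 4), (5, 4), (7, 4);
""")


def latest_in_category(category_id, limit=10):
    """Newest approved recipes in one category, replacing the three-way OR."""
    return conn.execute(
        "SELECT r.id, r.title FROM category_recipe cr"
        " JOIN recipes r ON r.id = cr.recipe_id"
        " WHERE cr.category_id = ? AND r.approved = 1"
        " ORDER BY r.id DESC LIMIT ?",
        (category_id, limit),
    ).fetchall()


print(latest_in_category(1))
```

The filter becomes a single equality on category_id instead of an OR across three columns, which is the shape an index can serve. Whether the filesort disappears in MySQL still depends on the plan it picks, so check with EXPLAIN.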
I need help with this problem.
In a MySQL table I have a field:
Field : artist_list
Values : 1,5,3,401
I need to find all records for artist uid 401
I do this
SELECT uid FROM tbl WHERE artist_list IN ('401');
This returns only the records where the artist_list value is exactly '401'; if the value is 11,401, the query does not match.
Any ideas?
(I can't use the LIKE method, because if the artist uid is 3 it would also match 30, 33, 3333...)
Short Term Solution
Use the FIND_IN_SET function:
SELECT uid
FROM tbl
WHERE FIND_IN_SET('401', artist_list) > 0
Long Term Solution
Normalize your data. This appears to be a many-to-many relationship already involving two tables. The comma-separated list needs to be turned into a table of its own:
ARTIST_LIST
artist_id (primary key, foreign key to ARTIST)
uid (primary key, foreign key to TBL)
Your database organization is a problem; you need to normalize it. Rather than having one row with a comma-separated list of values, you should do one value per row:
uid artist
1 401
1 11
1 5
2 5
2 4
2 2
Then you can query:
SELECT uid
FROM table
WHERE artist = 401
You should also look into database normalization because what you have is just going to cause more and more problems in the future.
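If it helps, a one-off migration from the comma-separated column to one row per value might look like the sketch below. It assumes a new table called tbl_artist (a hypothetical name), invented sample rows, and SQLite via Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; tbl_artist and the sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (uid INTEGER PRIMARY KEY, artist_list TEXT NOT NULL);
CREATE TABLE tbl_artist (uid INTEGER NOT NULL REFERENCES tbl(uid),
                         artist_id INTEGER NOT NULL,
                         PRIMARY KEY (uid, artist_id));
CREATE INDEX idx_tbl_artist_artist ON tbl_artist (artist_id);
INSERT INTO tbl VALUES (1, '1,5,3,401'), (2, '11,401'),
                       (3, '30,33,3333'), (4, '401');
""")

# One-off migration: split each list and store one (uid, artist_id) row per value.
for uid, artist_list in conn.execute("SELECT uid, artist_list FROM tbl").fetchall():
    ids = {int(part) for part in artist_list.split(",") if part.strip()}
    conn.executemany(
        "INSERT INTO tbl_artist VALUES (?, ?)",
        [(uid, artist_id) for artist_id in sorted(ids)],
    )


def uids_for_artist(artist_id):
    """Exact match on an indexed integer column; no string tricks needed."""
    rows = conn.execute(
        "SELECT uid FROM tbl_artist WHERE artist_id = ? ORDER BY uid", (artist_id,)
    )
    return [row[0] for row in rows]


print(uids_for_artist(401))  # [1, 2, 4]
print(uids_for_artist(3))    # [1]
```

Searching for artist 3 no longer matches 30, 33 or 3333, and the lookup can use the index on artist_id.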
SELECT uid
FROM tbl
WHERE CONCAT(',', artist_list, ',') LIKE '%,401,%'
Although it would make more sense to normalise your data properly in the first place. Then your query would become trivial and have much better performance.