How slow is the LIKE query on MySQL? (Custom fields related) - mysql

Apologies if this is redundant, and it probably is, I gave it a look but couldn't find a question here that fell in with what I wanted to know.
Basically we have a table with about ~50000 rows, and it's expected to grow much bigger than that. We need to be able to allow admin users to add in custom data to an item based on its category, and users can just pick which fields defined by the administrators they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields to item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values which links values with fields, which is how we handled things in .NET. The project is using CAKEPHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XMLish formatted data. This is just for the values of the custom fields.
No problems if I want to fetch the item by its id as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user input related issues aside) be practical if I wanted to fetch items made of plastic in this case? Like how slow would that be?
Thanks.
Edit: I was afraid of that as realistically this thing will be around 400k rows for that one table at launch, thanks guys.

Any LIKE query that starts with % will not use any indexes you have on the column, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.

Related

Cache, Database, Over 400k Listing

In my MySQL database I have a table of products which contains almost 625k rows. The table has 162 columns.
Now there is a search box on my home page where you can search for anything and, if your search term is matched from any of my product titles, it give you a list of 15 products. This is similar to Amazon and other e-commerce websites.
What I did so far was to create a JSON file with all the product ID's and title names. When user inputs a minimum of 3 chars into the search field, an AJAX request is made and gets the list. But my issue is that the JSON file is almost 12MB in size, and the ajax calls it whenever user write's a char or removes a char. It was working fine until I was on local Machine and now as soon as I made it live it doesn't work for users, having lower then 5 MBPS internet connection. So I am looking for some advice, how do I create it fast as Amazon. I mean the search with auto suggestion from 625K products.
I am really sorry, but there is nothing more to give as an advice here then "go do some reading on database design and schema normalization".
If you have 162 columns in a table you will never be able to do an efficient search. The database (especially MySQL) will not hold the table in memory and indexes will not help either. Yes, you can throw it all into an ElasticSearch instance and it will fix some of your problems. But, honestly, this solution does not clean up the mess you have.
You should have a table with relevant information (titles, names, etc.) in one column (or also a numeric column for prices, etc). This metadata should reference the main table, the column should be fulltext-indexed. This way you ask for matches, filter results and JOIN relevant lines from the main table. This will work quickly with very little resources used.

Asking opinion about table structure

I'm working on a project to make a digital form of this paper
this paper (can't post image)
and the data will displayed on a Web in a simple table view. There will be NO altering, deleting, updating. It's just displaying (via SELECT * of course) the data inputted.
The data will be inserted via android app and stored in a single table which has 30 columns in mysql.
and the question is, is it a good idea if i use a single table? because i think there will be no complex operation in the sql.
and the other question is, am i violating some rules for this method?
I need your opinion. thanks.
It's totally ok to use only one table, if that suits your needs. What you can do to make the database a little bit 'smarter' is add new tables for attributes in your paper that will be repeated. So, for example, the Soil Type could be another table where there are two columns, ID and Description, and you will use it as a foreign key in each record in the main table. You need this if you want your database to be in 3NF.
To sum up, yes you can have one table if that's all you need. However, adding more tables might help save some space and make your database more flexible. It's up to you to decide! :)

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please donĀ“t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

More rows or more tables in a db design?

I'm pretty sure I already know the answer, but would like some confirmation...
We received 220 text files of providers. Each file is a different category of provider. In total there are 3.2 million records.
My inclination is to create a category table and a provider table that links to category by an ID, then index any other columns that may be searched on like state, or even last name.
The other option is to have one table per category, but I think other than the smaller row size there are a lot of disadvantages to this approach.
It's a PHP/MySQL implementation.
Anyone think the separate table option is better for any reason?
Thanks,
D.
Go with two table approach -- categories and providers.
This will enable you to
easily adding new categories
easily reverse search Categories based on a column such as state of provider.
It make sense from data-structure point of view as well. One type of data in one table.
I agree with your original thought, and with Nishant's answer. In addition to his points, it also normalizes the data, and allows easy updates if a category changes names for some reason.

MySQL - Best method of saving and loading items

So on my older work, I had always used the 'text' data type to store items, like so:
0=4151:54;1=995:5000;2=521:1;
So basically: slot=item:amount;
I've been looking into finding the best ways of storing information in a sql database, and everywhere i go, it says that using text is a big performance hit.
I was thinking of doing something else, like having a table with the following columns:
id, owner_id, slot_id, item_id, amount
Where as now i can just insert a row for each item a character allocates. But i have no clue how to save them, since the slot's item can change, etc. A character has 28 inventory slots, and 500 bank slots, should i insert them all at registration? or is there a smarter way to save the items
Yes use that structure. Using text to store relational data defeats the purpose of a relational database.
I don't see what you mean by insert them all at registration. Can you not insert them as you need to?
Edit
Based on your previous comment I would recommend only inserting a slot as it is needed (if I understand your problem). It may be an idea to keep the ID of the slot in the application, if need be.
If I understand you correctly, and that the slot's item can change, then you want to further abstract the mapping between item_id and the item:
entry_tbl.item_id->item_rel_realitems_tbl.real_id->items_tbl
This way, all entries with an itemid point to a table that maps those ids to a mutable item. When you UPDATE an item in 'items_tbl' then the mapping automatically updates the entry_tbl.
Another JOIN is needed however. I would also use stored procedures in any case to abstract the mechanism from semantics.
I am not sure I understand the wording of your question however.