In my MySQL database I have a products table that contains almost 625k rows and has 162 columns.
There is a search box on my home page where you can search for anything, and if your search term matches any of my product titles, you get a list of 15 products, similar to Amazon and other e-commerce websites.
What I did so far was create a JSON file with all the product IDs and titles. When the user types at least 3 characters into the search field, an AJAX request fetches the list. But my issue is that the JSON file is almost 12 MB in size, and an AJAX call fires every time the user types or removes a character. It worked fine on my local machine, but as soon as I went live it stopped working for users with connections slower than 5 Mbps. So I am looking for advice: how do I make search as fast as Amazon's, with auto-suggestion over 625k products?
I am really sorry, but there is no better advice to give here than "go do some reading on database design and schema normalization".
If you have 162 columns in a table, you will never be able to search it efficiently. The database (especially MySQL) will not hold the table in memory, and indexes will not help either. Yes, you can throw it all into an Elasticsearch instance and that will fix some of your problems, but honestly, that solution does not clean up the mess you have.
You should have a separate table with the relevant information (titles, names, etc.) in one column (plus, if needed, a numeric column for prices and the like). This metadata table should reference the main table, and the text column should be fulltext-indexed. This way you ask for matches, filter the results, and JOIN only the relevant rows from the main table. It will work quickly with very few resources used; see the sketch below.
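A minimal sketch of that layout, assuming the main table is called products with an integer primary key id; all names here are illustrative, not taken from the question:

CREATE TABLE product_search (
    product_id INT UNSIGNED NOT NULL,
    title      VARCHAR(255) NOT NULL,
    price      DECIMAL(10,2) NULL,
    PRIMARY KEY (product_id),
    FULLTEXT KEY ft_title (title)
) ENGINE=InnoDB;

-- Autosuggest: match against the small fulltext index first,
-- then JOIN the wide main table only for the 15 best hits.
SELECT p.*
FROM product_search s
JOIN products p ON p.id = s.product_id
WHERE MATCH(s.title) AGAINST ('plast*' IN BOOLEAN MODE)
LIMIT 15;

The AJAX endpoint then runs this query per keystroke and returns only 15 small rows, instead of shipping a 12 MB JSON file to the client.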
For example, here's the problem I faced... I have three tables: Products, Districtrates, and Deliverycharges. In my app, a product's delivery charge is calculated from a pre-defined rate in the Districtrates table. If we want, we can also add a custom rate that overrides the pre-defined one. Each product can cover all 25 districts or only some of them. So here's my solution:
Create the three tables as mentioned above. The Districtrates table will only have 25 records, one for each of the 25 districts in my country. For each product, I will add up to 25 records to the Deliverycharges table with the productID, the deliveryrateID, and a custom rate value if available. Some products might cover fewer than 25 districts (only the ones available for that product).
I could even store this as a simple hash in one cell of the products table, like this: {district1: nil, district2: 234, district4: 543} (that's Ruby syntax). Here, if a value is nil, we can take the default value from the Districtrates table. In this version too, the hash would hold all 25 districts! But the table approach above is easier to work with. The only problem is that it adds nearly 25 records per product.
So my question is: is this a good thing? This is only one scenario; there are more where we could use a simple array or hash in a cell rather than creating a table. Creating a table is easy to maintain, but is it the right way?
One of the main points of using a relational database is the ability to query (and update) the data in it using SQL.
That only works if you put the data in a form that the database actually understands. Traditionally, this means defining a table schema.
There are now extensions that let the database work with "semi-structured" data (such as XML/JSON/JSONB), but you should only go there when the data really does not fit the relational model; otherwise you are giving up a lot of features and performance.
If you put a serialized Ruby hash into a text column, you will have no way to use it from SQL: no proper searching, no indexing, and no efficient updates of these delivery rates (see the sketch below).
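For comparison, a minimal sketch of the normalized design discussed above; table and column names are illustrative:

CREATE TABLE district_rates (
    district_id  INT UNSIGNED PRIMARY KEY,
    default_rate DECIMAL(10,2) NOT NULL
);

CREATE TABLE delivery_charges (
    product_id  INT UNSIGNED NOT NULL,
    district_id INT UNSIGNED NOT NULL,
    custom_rate DECIMAL(10,2) NULL, -- NULL means "fall back to the default"
    PRIMARY KEY (product_id, district_id),
    FOREIGN KEY (district_id) REFERENCES district_rates (district_id)
);

-- Effective rate for product 42 in every district it covers:
SELECT c.district_id,
       COALESCE(c.custom_rate, d.default_rate) AS effective_rate
FROM delivery_charges c
JOIN district_rates d ON d.district_id = c.district_id
WHERE c.product_id = 42;

The COALESCE does exactly what the nil in the Ruby hash was meant to do, but the rates stay searchable, indexable, and updatable from SQL.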
I'm currently developing an application for Windows, Linux, and Mac. The purpose of the application is to let multiple users create Projects based on a single Article. Every Article has up to 15 different Fields/Options (could be more in the future). The Fields of an Article should be changeable, so I need to be able to add, edit, or remove them.
Fields I want to store:
Numbers
Texts (mostly options [one word], sometimes comments [a few sentences])
Path/Links to Files
What I want to do with the DB:
load all projects of a user at login
add, edit, remove, and delete single projects
set a lock on projects (multiple people operate one user account at the same time, so they must not be able to edit the same project simultaneously; once someone starts editing, the project should stay locked until he saves, cancels, or times out)
What is the best way to manage this kind of Data?
Should I create a table for each user, with only an ID column and one column where the values of all the fields are merged into one big string?
Should I create a table for every Project, with a column for every Field/Option and one for the user/owner?
Or are there any other possibilities?
If you don't know in advance what you are going to store, then I doubt whether a relational database is the best option for you. Maybe a document store/NoSQL database is a better decision, because you can just store documents (usually JSON objects) that can have all kinds of additional fields.
A couple of such databases to look at are MongoDB, Cassandra, and Elasticsearch, but you can find a big list on Wikipedia.
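If you would rather stay in MySQL, its native JSON column type (available since 5.7) is a middle ground between a rigid schema and a document store; a rough sketch with hypothetical names:

CREATE TABLE projects (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    owner_id  INT UNSIGNED NOT NULL,
    locked_by INT UNSIGNED NULL, -- who holds the edit lock, if anyone
    locked_at DATETIME NULL,     -- lets you expire stale locks
    fields    JSON NOT NULL      -- e.g. {"weight": 12, "color": "red", "manual": "/docs/a.pdf"}
);

-- Load all projects of a user at login:
SELECT * FROM projects WHERE owner_id = 7;

-- Filter on a dynamic field:
SELECT id FROM projects WHERE fields->>'$.color' = 'red';

The fixed columns (owner, lock) stay relational and indexable, while the variable Fields/Options live in the JSON document.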
Apologies if this is redundant, and it probably is; I had a look but couldn't find a question here that matched what I wanted to know.
Basically we have a table with about 50,000 rows, and it's expected to grow much bigger than that. We need to allow admin users to add custom data to an item based on its category, and users can simply pick which of the admin-defined fields they want to add info to.
Initially I had gone with an item_categories_fields table that pairs up entries from item_fields with item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values, which links values with fields; this is how we handled things in .NET. The project is using CakePHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm thinking, however, of just adding an item_custom_fields table that is essentially the item_id and a text field that stores XML-ish formatted data, just for the values of the custom fields.
There's no problem if I want to fetch an item by its id, as the required data is stored in the items table, but what if I wanted to search on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user-input issues aside) be practical if I wanted to fetch items made of plastic in this case? How slow would that be?
Thanks.
Edit: I was afraid of that; realistically this table will hold around 400k rows at launch. Thanks, guys.
Any LIKE pattern that starts with % cannot use an index on the column, so the query will scan the whole table to find the result.
The response time depends heavily on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster; a sketch of that search is below.
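For reference, a sketch of what that indexed search could look like with the item_fields / item_field_values design; the exact columns aren't shown in the question, so these are assumptions:

-- Assumes item_field_values carries (item_id, field_id, value)
-- and has an index on (field_id, value).
SELECT i.*
FROM items i
JOIN item_field_values v ON v.item_id = i.id
JOIN item_fields f       ON f.id = v.field_id
WHERE f.name = 'material'
  AND v.value = 'Plastic';

That turns the search into a couple of index lookups instead of a full scan over 400k text blobs.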
I'm pretty sure I already know the answer, but would like some confirmation...
We received 220 text files of providers. Each file is a different category of provider. In total there are 3.2 million records.
My inclination is to create a category table and a provider table that links to the category by an ID, then index any other columns that may be searched on, like state or even last name.
The other option is to have one table per category, but I think that other than the smaller row size this approach has a lot of disadvantages.
It's a PHP/MySQL implementation.
Anyone think the separate table option is better for any reason?
Thanks,
D.
Go with the two-table approach: categories and providers.
This will enable you to:
easily add new categories
easily reverse-search categories based on a provider column such as state
It makes sense from a data-structure point of view as well: one type of data in one table. A sketch is below.
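A minimal sketch of that two-table design; names and indexed columns are illustrative:

CREATE TABLE categories (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE
);

CREATE TABLE providers (
    id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    last_name   VARCHAR(100),
    state       CHAR(2),
    FOREIGN KEY (category_id) REFERENCES categories (id),
    KEY idx_state (state),
    KEY idx_last_name (last_name)
);

-- Reverse search: which categories have providers in a given state?
SELECT DISTINCT c.name
FROM categories c
JOIN providers p ON p.category_id = c.id
WHERE p.state = 'NY';

Adding a new category is then a single INSERT instead of a new table.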
I agree with your original thought, and with Nishant's answer. In addition to his points, it also normalizes the data and allows easy updates if a category changes its name for some reason.
I'm planning a database in which a couple of tables contain a lot of address information: city, zip code, email address, phone #, fax #, and so on (about 11 columns' worth). One of them is an organizations table containing up to 2 addresses (a legal contact and the contact that should actually be used), and every user also has the same information tied to him.
We are also going to have to run some geolocation queries on those addresses (like finding every address within X kilometers of another address).
I have a bunch of options, each with its own problem:
I could put all the information inside every table, but that would make for tables with a very large number of columns, which I'd have problems indexing, and if I ever change my address format it will take a while to fix.
I could put all the information into an array and serialize it, then store the serialized data in one field. That has the same problem as the previous method, with slightly fewer columns but much less accessibility through MySQL queries.
I could create a separate table with address information and link it to the other tables either by
putting an address_id column in the users and organizations tables
putting related_id and related_table columns in the addresses table
That should keep things tidier, but it might create some unforeseen problems with excessive joining or whatever.
Personally I think solution 3.2 is the best, but I'm not too confident about it, so I'm asking for opinions.
Option 2 is definitely out, as it would put the filtering logic into your code instead of letting the DBMS handle it.
Option 1 or 3 will depend on your needs.
If you need fast access to all the data, and you usually access both addresses along with the organization information, then you might consider option 1. But this will make querying difficult (i.e. slow) in MySQL if the table gets too big.
Option 3 is good, provided you index the tables correctly.
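For what it's worth, a sketch of option 3.2 (one addresses table with a polymorphic link) together with the kind of geolocation query mentioned; all names, and the use of MySQL's ST_Distance_Sphere (5.7+), are my assumptions:

CREATE TABLE addresses (
    id            INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    related_table ENUM('users','organizations') NOT NULL,
    related_id    INT UNSIGNED NOT NULL,
    city          VARCHAR(100),
    zip_code      VARCHAR(20),
    lat           DECIMAL(9,6),
    lng           DECIMAL(9,6),
    KEY idx_related (related_table, related_id)
);

-- Both addresses of one organization:
SELECT * FROM addresses
WHERE related_table = 'organizations' AND related_id = 7;

-- Every address within 10 km of a given point (coordinates made up):
SELECT id FROM addresses
WHERE ST_Distance_Sphere(POINT(lng, lat), POINT(-73.99, 40.73)) <= 10000;

One trade-off of 3.2 to keep in mind: the polymorphic related_table/related_id pair cannot be enforced with a real FOREIGN KEY, unlike option 3.1's address_id.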