Better way to organize lots of columns and data? - mysql

I'm creating a real-estate website and I was wondering whether there is a better way to organize my columns or tables. I'm not sure what the best approach would be; I currently have a lot of columns and I'm worried about performance issues.
The columns are as follows
5 for things like property id, add date, duration, owner/user id.
35 columns for things like title, description, price, energy rating, location, etc.
40 columns for features like swimming-pool, central heating, river front, garage, well, etc.
15 for image locations which are stored on server
15 for the image descriptions
Is 110+ columns bad practice in MySQL? Everything is lightning fast, but I'm on localhost at the moment. Won't the monstrous size of the table slow queries, especially once I have a couple hundred properties?
Am I OK with my current setup? What would best practice be? How do e-commerce websites that have many feature options go about this?

It is not a good practice since the data can be stored in separate tables. What would help you most would be to create an ERD to visualize how you can organize your tables. Even if you do not understand the ins and outs of ERDs, you can still use it to at least organize your thoughts.
It seems that you already have your tables separated based on the bullet points that you made within your question. One thing that I would add to your bullets is maybe breaking down your features into categories and creating a table for each.
For example, swimming pools and riverfront can be placed in a table called
LandscapeFeatures or OutdoorFeatures.

Most likely, the property features would be better stored in a separate table, with one row per property feature, rather than as columns in your main table. I understand this as a many-to-many relationship between a property and its features, which suggests two more tables:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
Such a structure is much more flexible and easier to query than having one column per feature. As examples:
easy to add features by creating new rows in the features table (while in the old structure you had to create a new column)
easy to aggregate the features, and answer a question like: count how many features each property has
As for images, they should have their own table too. If an image may belong to several properties, then it's a many-to-many relationship, and you can follow the above pattern. If each image belongs to a single property, one more table is enough:
properties (property_id (pk), date_added, title, description)
features (feature_id (pk), description)
property_features (property_id (fk), feature_id (fk))
images (image_id, location, description, property_id (fk))
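The four tables above can be sketched as runnable code. This is only an illustration: SQLite (through Python's built-in sqlite3 module) stands in for MySQL, the column types are guesses, and the sample rows are made up. The last query answers the "count how many features each property has" question.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names follow the answer above,
# column types and sample rows are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    date_added  TEXT NOT NULL,
    title       TEXT NOT NULL,
    description TEXT
);
CREATE TABLE features (
    feature_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE property_features (
    property_id INTEGER NOT NULL REFERENCES properties(property_id),
    feature_id  INTEGER NOT NULL REFERENCES features(feature_id),
    PRIMARY KEY (property_id, feature_id)
);
CREATE TABLE images (
    image_id    INTEGER PRIMARY KEY,
    location    TEXT NOT NULL,
    description TEXT,
    property_id INTEGER NOT NULL REFERENCES properties(property_id)
);
""")

conn.executemany("INSERT INTO properties VALUES (?, ?, ?, ?)", [
    (1, "2024-01-05", "Riverside cottage", "Two bedrooms"),
    (2, "2024-01-09", "City flat", "One bedroom"),
])
# Adding a feature is a new row, not a new column.
conn.executemany("INSERT INTO features VALUES (?, ?)", [
    (1, "swimming pool"), (2, "garage"), (3, "river front"),
])
conn.executemany("INSERT INTO property_features VALUES (?, ?)", [
    (1, 2), (1, 3), (2, 2),
])

# Count how many features each property has.
feature_counts = conn.execute("""
    SELECT p.property_id, COUNT(pf.feature_id)
    FROM properties AS p
    LEFT JOIN property_features AS pf ON pf.property_id = p.property_id
    GROUP BY p.property_id
    ORDER BY p.property_id
""").fetchall()
print(feature_counts)
```

The LEFT JOIN keeps properties that have no features at all, which an inner join would silently drop from the count.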

One table:
Columns for the dozen or so values that you are most likely to search on.
Devise several composite indexes that involve those columns, starting with the more commonly searched columns.
Devise a TEXT column and put "words" in it for a FULLTEXT index. If this is home sales, consider words like "swimming pool septic tank gazebo Eichler". This will help with certain "boolean" type queries. (If you like this idea, let's discuss how to make use of filtering with indexes and/or fulltext; it gets tricky.)
Put the rest into a JSON (or TEXT) column. Do not plan on searching it; instead, bring the row(s) into your app code for further filtering after searching by the actual INDEXes.
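A rough sketch of the "search by indexed columns, then filter the rest in app code" idea follows. SQLite via Python's sqlite3 stands in for MySQL, the FULLTEXT part is left out, and the column names (price, bedrooms, city, extras) and sample rows are assumptions made only for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    price       INTEGER NOT NULL,
    bedrooms    INTEGER NOT NULL,
    city        TEXT NOT NULL,
    extras      TEXT NOT NULL   -- JSON document, never searched in SQL
);
-- Composite index on the commonly searched columns (hypothetical choice).
CREATE INDEX idx_city_beds_price ON properties (city, bedrooms, price);
""")

rows = [
    (1, 250000, 3, "Cork", {"well": True, "energy_rating": "B2"}),
    (2, 310000, 3, "Cork", {"well": False, "energy_rating": "A3"}),
    (3, 180000, 2, "Galway", {"well": True}),
]
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?, ?, ?)",
    [(pid, price, beds, city, json.dumps(extras))
     for pid, price, beds, city, extras in rows],
)

# Step 1: narrow the candidates using only the indexed columns.
candidates = conn.execute(
    "SELECT property_id, extras FROM properties "
    "WHERE city = ? AND bedrooms = ? AND price <= ?",
    ("Cork", 3, 350000),
).fetchall()

# Step 2: filter on the rarely searched attributes in application code.
with_well = [pid for pid, extras in candidates if json.loads(extras).get("well")]
print(with_well)
```

This only pays off when step 1 already cuts the result down to a modest number of rows; if most searches hinge on something stored in the JSON, that attribute probably deserves a real column.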

Related

Relationship database design - object specific many to many, do I solve with self join table or new table

Being new to relational database design, I am trying to clarify one piece of information to properly design this database. Although I am using Filemaker as the platform, I believe this is a universal question.
Using the logic of ideally having all one to many relationships, and using separate tables or join tables to solve these.
I have a database with multiple products, made by multiple brands, in multiple product categories. I also want this to be as scalable as possible when it comes to reporting, being able to slice and dice the data in as many ways as possible since the needs of the users are constantly changing.
So when I ask the question "Does each Brand have multiple products" I get a yes, and "Does each product have multiple brands" the answer is no. So this is a one to many relationship, but it also seems that a self-join table might give me everything that I need.
This methodology also seems to go down a rabbit hole for other "product related" information such as product category, each product is tied to one product category, but only one product category is related to a product.
So I see 2 possibilities, make three tables and join them with primary and foreign keys, one for Brand, one for Product Category, and one for Products.
Or the second possibility is to create one table that has the brand and product category and product info all in one table (since they are all product related) and simply do self-joins and other query based tables to give me the future reporting requirements that will be changing over time.
I am looking for input from experiences that might point me in the right direction.
Thanks in advance!
Could you ever want to store additional information about a brand (company URL, phone number, etc.) or about a product category (description, etc.)?
If the answer is yes, you definitely want to use three tables. If you don't, you'll be repeating all that information for every single item that belongs to the same brand or same category.
If the answer is no, there is still an advantage to using three tables - it will prevent typos or other spelling inconsistencies from getting into your database. For example, it would prevent you from writing a brand as "Coca Cola" for some items and as "Coca-Cola" for other items. These inconsistencies get harder and harder to find and correct as your database grows. By having each brand only listed once in its own table, it will always be written the same way.
The disadvantage of multiple tables is the SQL for your queries is more complicated. There's definitely a tradeoff, but when in doubt, normalize into multiple tables. You'll learn when it's better to de-normalize with more experience.
I am not sure where you see room for a self-join here. It seems to me you are saying: I have a table of products; each product has one brand and one (?) category. If that's the case then you need either three tables:
Brands -< Products >- Categories
or - in Filemaker only - you can replace either or both the Brands and the Categories tables with a value list (assuming you won't be renaming brands/categories and at the expense of some reporting capabilities). So really it depends on what type of information you want to get out in the end.
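For readers working in SQL rather than Filemaker, the three-table layout (Brands -< Products >- Categories) might look like the sketch below. SQLite via Python's sqlite3 is used so it runs anywhere; all table names, column names and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE brands (
    brand_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    brand_id    INTEGER NOT NULL REFERENCES brands(brand_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id)
);
""")
conn.executemany("INSERT INTO brands VALUES (?, ?)", [(1, "Coca-Cola"), (2, "Acme")])
conn.executemany("INSERT INTO categories VALUES (?, ?)", [(1, "Drinks"), (2, "Snacks")])
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, "Cola 330ml", 1, 1),
    (2, "Cola 1l", 1, 1),
    (3, "Salted crisps", 2, 2),
])

# Report: number of products per brand and category.
report = conn.execute("""
    SELECT b.name, c.name, COUNT(*)
    FROM products AS p
    JOIN brands AS b ON b.brand_id = p.brand_id
    JOIN categories AS c ON c.category_id = p.category_id
    GROUP BY b.name, c.name
    ORDER BY b.name, c.name
""").fetchall()
print(report)

# A product that points at a non-existent brand is rejected by the foreign key.
try:
    conn.execute("INSERT INTO products VALUES (4, 'Mystery item', 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because each brand name is stored once, a spelling variant cannot creep in through the products table; the most a product can do is point at the wrong brand row.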
If you truly want your solution to be scalable you need to parse and partition your data now. Otherwise you will be faced with the re-structuring of the solution down the road when the solution grows in size. You will also be faced with parsing and relocating the data to new tables. Since you've also included the SQL and MySQL tags if you plan on connecting Filemaker to an external data source then you will definitely need to up your game structurally.
Building everything in one table is essentially using Filemaker to do Excel work and it won't cut it if you are connecting to SQL, MySQL, etc.
Self join tables are a great tool. However, they should really only be used for calculating small data points and should not be used as pivot points or foundations for your reporting features. It can grow out of control as time goes on and you need to keep your backend clean.
Use summary and sub-summary reporting features to slice product based data.
For retail and general product management solutions, whether it's Filemaker, SQL, or whatever, the "Brand" or "Vendor" is its own table. Then you would have a "Products" table (the match key being the "Brand ID").
The "Product Category" field should be a field in the "Products" table. You can manage the category values by building a standard value list or building a value list based on a "Product Category" table. The second scenario is better for long term administration.

MySQL multiple column relationships between 2 tables

I have a problem with a table that has 4 columns containing terms that describe the product. I want to make these terms editable in my app (and allow adding more), and there are obviously 4 groups of them. I created a single table that holds all these terms together, but the product table would then need 4 relationships to the ID of the terms table.
Is this a good solution?
The main reason I don't want to make 4 different tables for the terms is because there aren't many of them and as the app progresses we might have even more different term groups, thus adding many small tables cluttering the database.
Any suggestion?
Update #1: Here is my current schema http://i.imgur.com/q2a1ldk.png
Have a product table and a terms table (product_id, terms_name, terms_description), which will allow you to add as many or as few terms for each product as you want. You then just retrieve all terms from the terms table for a particular product id.
You could try a mapping table:
apputamenti(id, ...)
term_map (apputamenti_id, term_id)
terms (id, text, type)
So you can add as many terms as you want.
Or if you want to specify the mapping with one more field, change:
term_map (apputamenti_id, term_id, map_type)
so you can use an enum for map_type, like enum(tipologia, feedback, target) or whatever your original fields were.
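A minimal sketch of the second variant (the mapping table carrying map_type) is below. It runs on SQLite through Python's sqlite3, which has no ENUM type, so a CHECK constraint plays that role here; in MySQL you would likely use ENUM as suggested above. The title column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE apputamenti (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL            -- hypothetical column
);
CREATE TABLE terms (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE term_map (
    apputamenti_id INTEGER NOT NULL REFERENCES apputamenti(id),
    term_id        INTEGER NOT NULL REFERENCES terms(id),
    -- CHECK stands in for MySQL's ENUM('tipologia', 'feedback', 'target')
    map_type       TEXT NOT NULL
                   CHECK (map_type IN ('tipologia', 'feedback', 'target')),
    PRIMARY KEY (apputamenti_id, term_id, map_type)
);
""")
conn.execute("INSERT INTO apputamenti VALUES (1, 'First entry')")
conn.executemany("INSERT INTO terms VALUES (?, ?)", [
    (1, "seminar"), (2, "positive"), (3, "students"),
])
conn.executemany("INSERT INTO term_map VALUES (?, ?, ?)", [
    (1, 1, "tipologia"), (1, 2, "feedback"), (1, 3, "target"),
])

# All terms for one row, grouped by the kind of relationship.
terms_for_entry = conn.execute("""
    SELECT m.map_type, t.text
    FROM term_map AS m
    JOIN terms AS t ON t.id = m.term_id
    WHERE m.apputamenti_id = ?
    ORDER BY m.map_type
""", (1,)).fetchall()
print(terms_for_entry)
```

A new term group then means one more allowed map_type value rather than one more column or table.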

Many highly similar objects in the same database table

Hello, stackoverflow community!
I am working on a rather large database-driven web application. The underlying database is growing in complexity as more components are being added, but so far I've had absolutely no trouble normalizing the data quite nicely.
However, this final component implies a table that can hold products.
Each product has a category, and depending on the category, has different fields.
Making a table for each product category doesn't seem right, as there are currently five types, and they still have quite a lot of fields in common. (but in weird ways - a few general fields such as description and price are common to all 5 categories, but some attributes are shared between 1 and 2, others 3,4,5 and so on).
I'm trying to steer away from the EAV model for obvious performance reasons.
The thing is that according to what product type the user wants to enter into the database there is a somewhat (but not completely) different field structure - all of them have a name and general description, but other attributes such as "area covered" can be applied only to certain categories such as seeds and pesticides, but not fuel, which would have a diesel/gasoline boolean and a bunch of other fuel-related attributes.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
My current idea would be to have the product table contain all the fields from all the possible categories, and then just have another table to describe which category from the product table has which fields.
product: id | type | name | description | price | composition | area covered | etc.
fields: id | name (contains a list of the fields in the above table)
product-fields: id | product_type | field_id (links a bunch of fields to the product table based on the product type)
I reckon this wouldn't be too slow, easy to search (no need to actually join the other tables, just perform the search on the main product table based on some inputs) and it would facilitate things like form generation and data validation with just one lightweight additional query/join (fetch a product from the db and join a concatenated list of the fields actually used in a string - split that and display the proper form fields based on what it contains, i.e. the fields actually associated with that product).
Thanks for your trouble!
Andrei Bârsan
EAV can actually be quite good at storing data and fetching that data back again when you know the key. It also excels in its ability to add fields without changing the schema. But where it's quite poor is when you need the equivalent of WHERE field1 = x AND field2 = y.
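To make the contrast concrete, here is a small sketch of an EAV table, using SQLite through Python's sqlite3 with made-up table, attribute names and rows. Fetching everything for a known key is one simple query, while the equivalent of two WHERE conditions needs one self-join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL,
    attr_name  TEXT NOT NULL,
    attr_value TEXT NOT NULL,
    PRIMARY KEY (product_id, attr_name)
);
""")
conn.executemany("INSERT INTO product_attributes VALUES (?, ?, ?)", [
    (1, "type", "fuel"), (1, "grade", "diesel"),
    (2, "type", "fuel"), (2, "grade", "gasoline"),
    (3, "type", "seed"), (3, "area_covered", "50"),
])

# Easy case: fetch all attributes when the key is known.
attrs = dict(conn.execute(
    "SELECT attr_name, attr_value FROM product_attributes WHERE product_id = ?",
    (1,),
).fetchall())
print(attrs)

# Hard case: "WHERE type = 'fuel' AND grade = 'diesel'" needs a self-join,
# and every further condition adds another join.
diesel_fuels = conn.execute("""
    SELECT a.product_id
    FROM product_attributes AS a
    JOIN product_attributes AS b ON b.product_id = a.product_id
    WHERE a.attr_name = 'type'  AND a.attr_value = 'fuel'
      AND b.attr_name = 'grade' AND b.attr_value = 'diesel'
""").fetchall()
print(diesel_fuels)
```

Note also that every value is stored as text here, so numeric comparisons such as "area covered at least 40" need casting on top of the joins.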
So while I agree the data behaviour is important (how many products share the same fields, etc), the use of that data is also important.
Which fields need searching, which fields are always just data storage, etc
In most cases I'd suggest keeping all fields that need searching, in combination with each other, in the same table.
In practice this often leads to a single table solution.
New fields require schema changes, new indexes, etc
Potential for sparsely populated data, using more space than is 'required'
Allows simple queries, simple indexing and often the fastest queries
Often, though not always, the space overhead is marginal
Where the sparse-data overheads reach a critical point, I would then head towards additional tables grouped by what fields they contain. More specifically, I would not create tables by product. This is on the dual assumption that most/all fields will be shared across at least some products, and that those fields will need searching.
This gives a schema more like...
Main_table ( PK, Product_Type, Field1, Field2, Field3 )
Geo_table ( PK, county, longitude, latitude )
Value ( PK, cost, sale_price, tax )
etc
You may also have a meta-data table describing which product types have which fields, etc.
What this schema allows is a more densely populated set of tables, which can be easily indexed and so quickly searched, while minimising table clutter and joins by grouping related fields.
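One possible reading of this layout, sketched with SQLite through Python's sqlite3: a main table for the fields every product has, side tables grouped by the fields they hold (not by product type), and a small meta-data table recording which product types use which side table. The names coverage_table and type_fields, the column types and the rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (
    pk           INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL,
    name         TEXT NOT NULL
);
-- Side table grouped by field, shared by every type that has an area covered.
CREATE TABLE coverage_table (
    pk           INTEGER PRIMARY KEY REFERENCES main_table(pk),
    area_covered REAL NOT NULL
);
CREATE INDEX idx_coverage_area ON coverage_table (area_covered);
-- Meta-data: which product types have rows in which side table.
CREATE TABLE type_fields (
    product_type TEXT NOT NULL,
    table_name   TEXT NOT NULL,
    PRIMARY KEY (product_type, table_name)
);
""")
conn.executemany("INSERT INTO main_table VALUES (?, ?, ?)", [
    (1, "seed", "Wheat seed"),
    (2, "pesticide", "Aphid spray"),
    (3, "fuel", "Tractor diesel"),
])
conn.executemany("INSERT INTO coverage_table VALUES (?, ?)", [(1, 2.5), (2, 10.0)])
conn.executemany("INSERT INTO type_fields VALUES (?, ?)", [
    ("seed", "coverage_table"), ("pesticide", "coverage_table"),
])

# One join searches the shared field across seeds and pesticides alike.
big_coverage = conn.execute("""
    SELECT m.name
    FROM main_table AS m
    JOIN coverage_table AS c ON c.pk = m.pk
    WHERE c.area_covered >= 5
    ORDER BY m.name
""").fetchall()
print(big_coverage)

# The meta-data table tells the app which extra form fields a type needs.
seed_tables = [row[0] for row in conn.execute(
    "SELECT table_name FROM type_fields WHERE product_type = ?", ("seed",))]
print(seed_tables)
```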
In the end, there isn't a true answer, it's all a balancing act. My general rule of thumb is to stay with a single table until I actually have a real and pressing reason not to, not just a theoretical one.
In my experience, unless you are writing a complete framework that can render fully described fields (we are talking about a lot of metadata describing each field), it is not worth separating field definitions from the main object. Modern frameworks (like Grails) make adding a new column to a domain/model class and table virtually painless.
If your common field overlap is about 80% between all object types, I would put them all in one table and use the Table per Hierarchy inheritance model, where a discriminator field helps you tell your object types apart. On the other hand, if you have 20% overlap of common fields, then go with the Table per Class inheritance model, with a base class and table containing the common fields and the other joined tables hanging off the base.
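The Table per Hierarchy idea can be sketched without any ORM: one table, a discriminator column, and nullable columns for the type-specific fields. The example below uses SQLite through Python's sqlite3 and plain dataclasses instead of Grails; the class names, columns and rows are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


@dataclass
class Seed(Product):
    area_covered: float


@dataclass
class Fuel(Product):
    is_diesel: bool


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE products (
    id           INTEGER PRIMARY KEY,
    product_type TEXT NOT NULL CHECK (product_type IN ('seed', 'fuel')),
    name         TEXT NOT NULL,
    price        REAL NOT NULL,
    area_covered REAL,      -- only used by seeds
    is_diesel    INTEGER    -- only used by fuel
)
""")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "seed", "Wheat seed", 40.0, 2.5, None),
    (2, "fuel", "Tractor diesel", 1.6, None, 1),
])


def load(row):
    """Build the right class for a row based on the discriminator column."""
    _id, product_type, name, price, area_covered, is_diesel = row
    if product_type == "seed":
        return Seed(name, price, area_covered)
    return Fuel(name, price, bool(is_diesel))


products = [load(row) for row in conn.execute("SELECT * FROM products ORDER BY id")]
print(products)
```

With high field overlap most of the nullable columns are filled for most rows, which is why this layout is suggested for the 80% case.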
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
This is called a SuperType - SubType relationship. It works very well if most of your queries are one of two types:
If you will be querying mostly the SuperType table and only drilling down into the SubType table infrequently.
If you will be querying the database after being filtered to a specific SubType.

creating user profiles, each with personal mysql data, using php

I'm trying to figure out the best practices for storing user data on a php/mysql site.
let's say the website will host a service of saving people's input for items they have in their house.
I have set up tables that includes: kitchen, bathroom, bedroom, etc.
Sally adds her 6 kitchen items.
John adds his 3 kitchen items.
etc.
I'm just wondering what the common practice may be for storing other user information in the MySQL database. I've taken a class on databases, so I'm thinking of linking relationally by foreign key: John with his items in the lists, and Sally too.
Does that sound about right, or is there a better way? I can see the list getting really large quite quickly.
Would it be possible to set up a different table for each user? Is that possible, or would it be silly?
I would not set up a table for each user.
Definitely go relational. I am not sure I follow you completely around "john with his items.." and so on. So I interpret this as
user table
room table
item table
relational user->item (id, user_id, item_id, room_id) OR:
relational item->room
So you can pull a user, list the rooms they have related to them, then list the items in that room. Additionally, like this you do not need a new item entry for common things like tables, stoves, spatulas, etc.
Your list could get large, but if you scale properly and plan a back end based update migration when you absolutely need to (like millions of users) then you should be fine. Consider how many relations sites like facebook and ebay have to maintain. Large relations are normal for databases so I wouldn't let a couple million rows scare you.
I would use three tables:
rooms (id, room), to store values kitchen, bathroom, bedroom, etc.
users
items: assuming you have a common structure for your current kitchen, bathroom, bedroom tables, one table could replace all of them. This table should also contain two foreign keys, user_id and room_id.
With that structure, you can easily retrieve and filter your data.
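As a rough illustration of the three-table layout, here is a sketch using SQLite through Python's sqlite3 in place of MySQL. The table names follow the answer above; the name columns, the index and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE rooms (id INTEGER PRIMARY KEY, room TEXT NOT NULL UNIQUE);
CREATE TABLE items (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id),
    room_id INTEGER NOT NULL REFERENCES rooms(id)
);
-- Lookups are almost always "this user's items in this room".
CREATE INDEX idx_items_user_room ON items (user_id, room_id);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Sally"), (2, "John")])
conn.executemany("INSERT INTO rooms VALUES (?, ?)", [(1, "kitchen"), (2, "bathroom")])
conn.executemany("INSERT INTO items (name, user_id, room_id) VALUES (?, ?, ?)", [
    ("kettle", 1, 1), ("toaster", 1, 1), ("mirror", 1, 2),
    ("spatula", 2, 1),
])

# Sally's kitchen items.
sally_kitchen = [row[0] for row in conn.execute("""
    SELECT i.name
    FROM items AS i
    JOIN users AS u ON u.id = i.user_id
    JOIN rooms AS r ON r.id = i.room_id
    WHERE u.name = ? AND r.room = ?
    ORDER BY i.name
""", ("Sally", "kitchen"))]
print(sally_kitchen)

# Number of items per user, across all rooms.
items_per_user = conn.execute("""
    SELECT u.name, COUNT(*)
    FROM items AS i
    JOIN users AS u ON u.id = i.user_id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()
print(items_per_user)
```

All users share the same three tables; a new user is a new row in users, not a new table.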

Is it good practice to consolidate small static tables in a database?

I am developing a database to store test data. Each piece of data has 11 tags of metadata. Currently I have a separate table for each of the metadata options. I have seen a few questions on here regarding best practices for numerous small tables, but I thought I'd pose the question for my own project because I didn't get a clear answer from the other questions asked.
Here is my table list, with the fields in each table:
Source Type - id, name, description
For Flight - id, name, description
Site - id, name, abrv, description
Stand - id, site (FK site table), name, abrv, description
Sensor Type - id, name, channels, description
Vehicle - id, name, abrv, description
Zone - id, vehicle (FK vehicle table), name, abrv, description
Event Type - id, name, description
Event - id, event type (FK to event type table), name, description
Analysis - id, name, description
Bandwidth - id, name, description
You can see the fields are more or less the same in each of these tables. There are three tables that reference another table.
Would it be better to have just one large table called something like Meta with the following fields:
Meta: id, metavalue, name, abrv, FK, value, description
where metavalue = one of the above table names
and FK = a reference to another row in the Meta table in place of a foreign key?
I am new to databases and multiple tables seems most intuitive, but one table makes the programming easier.
So questions are:
Is it good practice to reduce the number of tables and put all static values in one table.
Is it bad to have a self referencing table.
FYI I am making this web database using django and mysql on a windows server with NTFS formatting.
Tips and best practices appreciated.
thanks.
"Would it be better to have just one large table" - emphatically and categorically, NO!
This anti-pattern is sometimes referred to as "The one table to rule them all"!
Ten Common Database Design Mistakes: One table to hold all domain values.
Using the data in a query is much easier
Data can be validated using foreign key constraints very naturally, something not feasible for the other solution unless you implement ranges of keys for every table – a terrible mess to maintain.
If it turns out that you need to keep more information about a ShipViaCarrier than just the code, 'UPS', and description, 'United Parcel Service', then it is as simple as adding a column or two. You could even expand the table to be a full blown representation of the businesses that are carriers for the item.
All of the smaller domain tables will fit on a single page of disk. This ensures a single read (and likely a single page in cache). If the other case, you might have your domain table spread across many pages, unless you cluster on the referring table name, which then could cause it to be more costly to use a non-clustered index if you have many values.
You can still have one editor for all rows, as most domain tables will likely have the same base structure/usage. And while you would lose the ability to query all domain values in one query easily, why would you want to? (A union query could easily be created of the tables easily if needed, but this would seem an unlikely need.)
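The foreign key point in the quote can be demonstrated with a short sketch. It uses SQLite through Python's sqlite3 rather than MySQL, reduces the question's tables to a few columns, and invents the sample rows. A stand that points at a vehicle id is rejected when site and stand are separate tables, but the same mistake is accepted by a self-referencing Meta table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Separate domain tables (columns trimmed for the sketch).
CREATE TABLE site    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE vehicle (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stand (
    id   INTEGER PRIMARY KEY,
    site INTEGER NOT NULL REFERENCES site(id),
    name TEXT NOT NULL
);
-- The single-table alternative with a self-referencing FK.
CREATE TABLE meta (
    id        INTEGER PRIMARY KEY,
    metavalue TEXT NOT NULL,
    name      TEXT NOT NULL,
    fk        INTEGER REFERENCES meta(id)
);
""")
conn.execute("INSERT INTO site VALUES (1, 'North range')")
conn.execute("INSERT INTO vehicle VALUES (7, 'Test vehicle A')")

# Separate tables: a stand pointing at id 7 (a vehicle, not a site) is refused.
try:
    conn.execute("INSERT INTO stand VALUES (1, 7, 'Stand 1')")
    separate_tables_caught_it = False
except sqlite3.IntegrityError:
    separate_tables_caught_it = True
print(separate_tables_caught_it)

# Single table: the same mistake goes through, because the FK can only
# check that *some* meta row with id 7 exists, not that it is a site.
conn.execute("INSERT INTO meta VALUES (7, 'vehicle', 'Test vehicle A', NULL)")
conn.execute("INSERT INTO meta VALUES (8, 'stand', 'Stand 1', 7)")
meta_rows = conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
print(meta_rows)
```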
Most of these look like they won't do anything but expand codes into descriptions. Do you even need the tables? Just define a bunch of constants, or codes, and then have a dictionary of long descriptions for the codes.
The field in the referring table just stores the code. eg: "SRC_FOO", "EVT_BANG" etc.
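In application code, that suggestion could be as small as the sketch below. The two codes come from the examples above; the descriptions and the describe helper are placeholders invented for illustration.

```python
# Codes are stored as-is in the referring table; the long descriptions
# live in application code. The description texts are placeholders.
DESCRIPTIONS = {
    "SRC_FOO": "Foo source (placeholder description)",
    "EVT_BANG": "Bang event (placeholder description)",
}


def describe(code: str) -> str:
    """Expand a stored code into its long description, rejecting unknown codes."""
    try:
        return DESCRIPTIONS[code]
    except KeyError:
        raise ValueError(f"unknown code: {code!r}") from None


print(describe("SRC_FOO"))
```

The trade-off is that the database itself no longer knows which codes are valid, so validation has to happen in the application before each insert.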
This is also often known as the One True Lookup Table (OTLT) - see my old blog entry OTLT and EAV: the two big design mistakes all beginners make.