I have 30 tables, each representing a different neighborhood.
Each table holds real estate listings with columns such as "Price", "Number Of Rooms", "Square Feet", etc.
The end user would be able to choose as many neighborhoods as they like, with the
option to filter results (e.g. "At least 5 Rooms", "Below 250k") and
sort them by "Lowest Price", "Time Submitted", and so on.
Now, I'm a programmer, not a DBMS guy. I've searched the web, but I feel that building the query one step at a time would be the wrong approach without some guidance on what to avoid.
I would love to hear and learn from the StackOverflow community about the best approaches to this one. Please help me sort this out.
EDIT: I'm currently using MyISAM
You should not have 30 tables. Normalize your schema:
NEIGHBORHOOD
ID, Name
PROPERTY
ID, NeighborhoodID, Name, Price,
Rooms, SquareFeet
Then you can join these together:
SELECT n.Name AS Neighborhood, p.Name AS Property, Price, Rooms, SquareFeet
FROM Property AS p
INNER JOIN Neighborhood AS n ON n.ID = p.NeighborhoodID
WHERE p.NeighborhoodID = X
Then you may need indexes on the tables as the data grows.
You should start by modifying your database model. Creating 30 tables to store the same kind of data (real estate information) is not adequate. Try to put all the data in a single table, adding a column that indicates the neighborhood. This column could point to another table holding the name, description, etc. of the neighborhood. Then you can query a single table to search across all neighborhoods and optionally filter on the neighborhoods the user wants to search.
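A minimal sketch of what that single-table search could look like, using Python's built-in sqlite3 only so the example is runnable (the SQL itself is ordinary and should carry over to MySQL). The Neighborhood/Property tables follow the schema suggested above; the sample rows and the search() helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Neighborhood (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Property (
    ID INTEGER PRIMARY KEY,
    NeighborhoodID INTEGER REFERENCES Neighborhood(ID),
    Name TEXT, Price INTEGER, Rooms INTEGER, SquareFeet INTEGER
);
-- Made-up sample data.
INSERT INTO Neighborhood VALUES (1, 'Riverside'), (2, 'Old Town'), (3, 'Hillcrest');
INSERT INTO Property VALUES
    (1, 1, 'Elm St 4',    240000, 5, 1800),
    (2, 1, 'Elm St 9',    310000, 6, 2400),
    (3, 2, 'Market Sq 2', 199000, 5, 1500),
    (4, 2, 'Market Sq 7', 150000, 3,  900),
    (5, 3, 'Hill Rd 1',   180000, 5, 1700);
""")

def search(conn, neighborhood_ids, min_rooms, max_price):
    """Hypothetical helper: filter by chosen neighborhoods, rooms and price."""
    # One placeholder per selected neighborhood; values are bound, not interpolated.
    placeholders = ", ".join("?" for _ in neighborhood_ids)
    sql = f"""
        SELECT n.Name, p.Name, p.Price, p.Rooms
        FROM Property AS p
        INNER JOIN Neighborhood AS n ON n.ID = p.NeighborhoodID
        WHERE p.NeighborhoodID IN ({placeholders})
          AND p.Rooms >= ?
          AND p.Price < ?
        ORDER BY p.Price ASC
    """
    return conn.execute(sql, [*neighborhood_ids, min_rooms, max_price]).fetchall()

# "At least 5 Rooms", "Below 250k", sorted by "Lowest Price", in two neighborhoods.
results = search(conn, [1, 2], min_rooms=5, max_price=250000)
```

Each extra filter is one more condition in the WHERE clause and each sort option is a different ORDER BY, so the query shape stays the same no matter how many neighborhoods the user picks.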
The best way is to change your db model, get rid of the 30 tables, and put everything in one table. With your current model, I don't see any other way than to create a huge union (you can put it into a view, and query this view).
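If the 30 tables really have to stay, the union-in-a-view fallback might look roughly like this. It is only a sketch with two hypothetical per-neighborhood tables (riverside, old_town) and invented rows, run through Python's sqlite3 so it can be tried as-is; the real view would need one SELECT per table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two stand-ins for the 30 per-neighborhood tables (names are hypothetical).
CREATE TABLE riverside (Price INTEGER, Rooms INTEGER);
CREATE TABLE old_town  (Price INTEGER, Rooms INTEGER);
INSERT INTO riverside VALUES (240000, 5), (310000, 6);
INSERT INTO old_town  VALUES (199000, 5), (150000, 3);

-- The view tags every row with the neighborhood it came from.
CREATE VIEW all_listings AS
    SELECT 'Riverside' AS Neighborhood, Price, Rooms FROM riverside
    UNION ALL
    SELECT 'Old Town'  AS Neighborhood, Price, Rooms FROM old_town;
""")

# Filters and sorting are then written once, against the view.
rows = conn.execute("""
    SELECT Neighborhood, Price, Rooms
    FROM all_listings
    WHERE Rooms >= 5 AND Price < 250000
    ORDER BY Price
""").fetchall()
```

The view has to be edited every time a neighborhood is added, which is one of the reasons the single-table model is usually the easier one to live with.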
I need help with an SQL query (using phpMyAdmin) to join these tables and create a CLUB MEMBERSHIP list for a particular club. I also need to indicate whether each member is a club president, vice president, etc., or just an ordinary member:
CLUBS: CLUBID,PRESIDENTID(memberID),VICEPRESIDENTID(memberID),TREASURER(memberID),
SECRETARY(MemberID)
MEMBERS_CLUB:
MEMBERID,CLUBID
MEMBERS:
MEMBERID, NAME,ADDRESS
There are probably half a dozen ways to solve this, and you certainly could come up with a way to make this table structure work for you, but it's probably not going to be pretty. Part of making this work well is determining what information you need to get out as well as store in the database. In this structure, it's very hard to get the member's officer status from their name, so we can improve that by changing your structure. On the other hand, if all you ever needed was a list of officers for each club, your current structure would be okay.
You could add a "member status" field to MEMBERS_CLUB (and remove the four corresponding columns from CLUBS). Each member gets a row for each club or position they hold.
SELECT `MEMBERS`.`NAME`, `MEMBERS_CLUB`.`STATUS`, `CLUBS`.`CLUBID`
FROM `MEMBERS`, `MEMBERS_CLUB`, `CLUBS`
WHERE `MEMBERS`.`MEMBERID` = `MEMBERS_CLUB`.`MEMBERID`
AND `MEMBERS_CLUB`.`CLUBID` = `CLUBS`.`CLUBID`
which is close but gives us duplicates if there are duplicates in the table, for instance if you have two entries for Bob, one listing him as president and one listing him as secretary. By using GROUP_CONCAT() we can accomplish exactly what you are looking for while dealing properly with duplicated names:
SELECT `MEMBERS`.`NAME`, GROUP_CONCAT(`MEMBERS_CLUB`.`STATUS`), `CLUBS`.`CLUBID`
FROM `MEMBERS`, `MEMBERS_CLUB`, `CLUBS`
WHERE `MEMBERS`.`MEMBERID` = `MEMBERS_CLUB`.`MEMBERID`
AND `MEMBERS_CLUB`.`CLUBID` = `CLUBS`.`CLUBID`
GROUP BY `MEMBERS`.`MEMBERID`, `MEMBERS`.`NAME`, `CLUBS`.`CLUBID`
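To try the idea end to end, here is a small sketch that assumes the suggested STATUS column on MEMBERS_CLUB. It runs against SQLite via Python's sqlite3 (which also has GROUP_CONCAT) purely so it is self-contained; the sample members and the membership_list() helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MEMBERS (MEMBERID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
CREATE TABLE CLUBS (CLUBID INTEGER PRIMARY KEY);
-- STATUS is the assumed new column; one row per club position held.
CREATE TABLE MEMBERS_CLUB (MEMBERID INTEGER, CLUBID INTEGER, STATUS TEXT);
INSERT INTO MEMBERS VALUES (1, 'Bob', '1 Main St'), (2, 'Alice', '2 Oak Ave');
INSERT INTO CLUBS VALUES (10), (20);
INSERT INTO MEMBERS_CLUB VALUES
    (1, 10, 'president'),
    (1, 10, 'secretary'),
    (2, 10, 'member'),
    (2, 20, 'president');
""")

def membership_list(conn, club_id):
    """Hypothetical helper: one row per member of the club, statuses combined."""
    return conn.execute("""
        SELECT m.NAME, GROUP_CONCAT(mc.STATUS) AS STATUSES
        FROM MEMBERS AS m
        INNER JOIN MEMBERS_CLUB AS mc ON mc.MEMBERID = m.MEMBERID
        WHERE mc.CLUBID = ?
        GROUP BY m.MEMBERID, m.NAME
        ORDER BY m.NAME
    """, (club_id,)).fetchall()

club_10 = membership_list(conn, 10)
```

Grouping on MEMBERID rather than only on NAME keeps two different members who happen to share a name on separate rows.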
I want to summarize rows from one end of a relationship tree with a table on the other side. Is "correlate" the correct term? Really just knowing the terms would help me solve this problem.
I am using MySQL and am extending an existing DB structure - though would have the liberty to rearrange data if needed. I'm getting better at creating "filtering" queries using JOINs, and I'm sure this next piece will be straight-forward once I understand it (without performing tons of queries : )
I made a simplified schema (and theme!) for this example, but the idea is the same.
Say there are many DietPlans, each of which is related to a bunch of MenuItems, and each MenuItem has an ItemType (such as 'Healthy', 'Fast', 'Normal', etc.). On the other side of DietPlan there are Persons, who each store how many DailyCalories they consume, and another table, MenuAllocations, where a Person stores what percentage of their daily intake comes from which MenuItem.
As examples of scale, there could be 1000 MenuItems, with 50 of those associated with each of 200 DietPlans. Also, each DietPlan might have 10,000 Persons, who will each have 5-10 MenuAllocations of various types.
What I'd like to do feels complex to me. I want to create a dashboard for each DietPlan (there could be many), gathering data from the Persons of that DietPlan, and tabulating the number of calories for each item type.
The math is simple: tblPerson.dailyCalories * tblMenuAllocations.percent. But I want to do that for each Person in the DietPlan, for each ItemType.
I understand the JOINs required to 'filter' from tblItemType around to tblMenuAllocation and think it would be similar to this:
SELECT *
FROM tblMenuAllocation
INNER JOIN tblPerson
on tblMenuAllocation.personId = tblPerson.personId
INNER JOIN tblDietPlan
on tblPerson.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblMenuItem
on tblMenuItem.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblItemType
on tblMenuItem.itemTypeId = tblItemType.itemTypeId
WHERE tblItemType.itemTypeId = 2
It feels like one query for each tblItemType, which could be a LOT of Person and MenuAllocation data to sort through, and doing that many consecutive queries feels like I'm missing something. Also, I think math can be handled in the query to sum values, but I've never done that. Where can I begin?
EDIT: The final results would be something like this:
----------------------------------------------
ItemId | ItemDesc | TotalCalories
----------------------------------------------
1      | Healthy  |   450,876
2      | Fast     | 1,987,948
3      | Vegan    |   349,123
etc.
I would be willing to accept some manipulation of data outside the query, but the Person's specific dailyCalories is very important to the tblMenuAllocation.percent calculation. Some tblMenuAllocation rows might be of the same ItemType!
I think you are looking for these topics: Aggregate Functions and Group By Modifiers.
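As a rough illustration of those two topics applied to the dashboard, the sketch below sums dailyCalories * percent per item type in a single query. The column names (menuItemId, itemDesc, percent stored as 0-100) and the link from tblMenuAllocation to tblMenuItem are assumptions, since the question does not spell out the schema, and the data is invented. It uses Python's sqlite3 only to be runnable; SUM and GROUP BY work the same way in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed, simplified schema; percent is stored as 0-100.
CREATE TABLE tblItemType (itemTypeId INTEGER PRIMARY KEY, itemDesc TEXT);
CREATE TABLE tblMenuItem (menuItemId INTEGER PRIMARY KEY, itemTypeId INTEGER);
CREATE TABLE tblPerson (personId INTEGER PRIMARY KEY, dietPlanId INTEGER, dailyCalories INTEGER);
CREATE TABLE tblMenuAllocation (personId INTEGER, menuItemId INTEGER, percent INTEGER);

INSERT INTO tblItemType VALUES (1, 'Healthy'), (2, 'Fast');
INSERT INTO tblMenuItem VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO tblPerson VALUES (1, 1, 2000), (2, 1, 2500), (3, 2, 1800);
INSERT INTO tblMenuAllocation VALUES
    (1, 1, 50), (1, 2, 30), (1, 3, 20),
    (2, 1, 40), (2, 2, 60),
    (3, 2, 100);
""")

def calories_by_item_type(conn, diet_plan_id):
    """Hypothetical helper: one row per item type for a single diet plan."""
    return conn.execute("""
        SELECT t.itemTypeId, t.itemDesc,
               SUM(p.dailyCalories * a.percent / 100.0) AS TotalCalories
        FROM tblMenuAllocation AS a
        INNER JOIN tblPerson   AS p ON p.personId   = a.personId
        INNER JOIN tblMenuItem AS m ON m.menuItemId = a.menuItemId
        INNER JOIN tblItemType AS t ON t.itemTypeId = m.itemTypeId
        WHERE p.dietPlanId = ?
        GROUP BY t.itemTypeId, t.itemDesc
        ORDER BY t.itemTypeId
    """, (diet_plan_id,)).fetchall()

plan_1 = calories_by_item_type(conn, 1)
```

The per-row math happens inside SUM(), and GROUP BY collapses all allocations of the same item type into one total, so there is no need for one query per item type.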
On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table led by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc.), or a bit string (has_activities_of_type, where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are:
See this. The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name, and activity_details_table_name. First, query in the following way:
SELECT activity.*, activity_types.activity_details_table_name
FROM activity
INNER JOIN activity_types
USING (activity_type_id)
This query gives you the name of the table to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
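Here is one way the lookup-table idea could be wired up so that the details are fetched with one query per details table instead of one per activity. It is a sketch under assumptions: the activity_id key, the columns of the two details tables and the load_details() helper are all invented, and Python's sqlite3 is used only to make it runnable.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE activity_types (
    activity_type_id INTEGER PRIMARY KEY,
    activity_name TEXT,
    activity_details_table_name TEXT
);
CREATE TABLE activity (activity_id INTEGER PRIMARY KEY, activity_type_id INTEGER);
-- Two stand-ins for the ~20 details tables (columns are made up).
CREATE TABLE activity_details_work (activity_id INTEGER, hours REAL);
CREATE TABLE activity_details_sickleave (activity_id INTEGER, reason TEXT);
INSERT INTO activity_types VALUES
    (1, 'work', 'activity_details_work'),
    (2, 'sick leave', 'activity_details_sickleave');
INSERT INTO activity VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO activity_details_work VALUES (1, 8.0), (3, 4.5);
INSERT INTO activity_details_sickleave VALUES (2, 'flu');
""")

def load_details(conn):
    """Hypothetical helper: map activity_id -> its details row."""
    # First query: every activity plus the name of its details table.
    ids_by_table = defaultdict(list)
    for row in conn.execute("""
        SELECT a.activity_id, t.activity_details_table_name
        FROM activity AS a
        INNER JOIN activity_types AS t USING (activity_type_id)
    """):
        ids_by_table[row["activity_details_table_name"]].append(row["activity_id"])

    # Then one query per details table, not one per activity.
    details = {}
    for table, ids in ids_by_table.items():
        placeholders = ", ".join("?" for _ in ids)
        # Table names cannot be bound as parameters. This one comes from
        # activity_types, which is assumed to be trusted, never user input.
        query = f'SELECT * FROM "{table}" WHERE activity_id IN ({placeholders})'
        for row in conn.execute(query, ids):
            details[row["activity_id"]] = dict(row)
    return details

details = load_details(conn)
```

With 20 activity types this costs at most 21 queries regardless of how many thousands of activities are loaded, and each result row only carries the columns of its own details table.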
I'm working on a web application where I need to do some research before I implement the database. I hope you can help me make some good decisions before I start to code.
Today I have a database that, among other things, contains about two million contacts in a table
Contact:
cid, name, phone, address, etc...
Users of the application can search the contact table based on different criteria, and get a list of contacts.
Users are stored in a separate database table
User: uid, name, email, etc...
Now I want to let users store a search result as a selection. The selection has to be a list of cids representing every contact in the search result the user got. Once the selection is stored, a user can open it and add notes, statuses, etc. to the different contacts in the selection.
My first thought is to make a selection table and a selection-contact mapping table like this:
Selection: sid, name, description, uid, etc
SelectionContactMap: sid, cid, status, note, etc...
With an average selection size between 1 000 and 100 000 contacts, and several thousand users storing many selections, I see that the SelectionContactMap table is going to grow very big very fast.
The database is MySQL and the application is written in PHP. I'm on a limited budget, so I cannot throw unlimited hardware at the task.
Am I on the wrong track here?
Do you have any suggestions to solve this the best possible way?
Other database?
MySQL-specific suggestions, table type, etc.?
Other database design?
Any comments and suggestions are appreciated.
Thanks in advance :)
-- Tor Inge
Question: What happens if the results of the query change - eg: a selected contact no longer has the chosen attribute or a new contact gets added?
If the answer is "The result set should be updated" - then you want to store the criteria in the database, not the results themselves.
If you need to cache the results for a period of time, this may be better handled by the application, not the database.
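If storing the criteria turns out to be the right answer, a sketch of it could look like the following. The criteria column, the JSON encoding, the SEARCHABLE whitelist and both helper functions are assumptions for illustration, not something from the question; Python's sqlite3 is used only so the example runs on its own.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- criteria holds the saved search as JSON instead of a list of cids.
CREATE TABLE Selection (sid INTEGER PRIMARY KEY, uid INTEGER, name TEXT, criteria TEXT);
INSERT INTO Contact VALUES (1, 'Ann', 'Oslo'), (2, 'Ben', 'Bergen'), (3, 'Cid', 'Oslo');
""")

# Only whitelisted column names may ever be placed into the SQL text.
SEARCHABLE = {"name", "city"}

def check_columns(criteria):
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported search columns: {sorted(unknown)}")

def save_selection(conn, uid, name, criteria):
    """Store the search criteria (a dict of column -> value), not the results."""
    check_columns(criteria)
    cur = conn.execute(
        "INSERT INTO Selection (uid, name, criteria) VALUES (?, ?, ?)",
        (uid, name, json.dumps(criteria)),
    )
    return cur.lastrowid

def open_selection(conn, sid):
    """Re-run the stored criteria so the result set is always current."""
    (raw,) = conn.execute("SELECT criteria FROM Selection WHERE sid = ?", (sid,)).fetchone()
    criteria = json.loads(raw)
    check_columns(criteria)
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1 = 1"
    sql = f"SELECT cid, name FROM Contact WHERE {where} ORDER BY cid"
    return conn.execute(sql, list(criteria.values())).fetchall()

sid = save_selection(conn, uid=7, name="Oslo contacts", criteria={"city": "Oslo"})
before = open_selection(conn, sid)
conn.execute("INSERT INTO Contact VALUES (4, 'Dag', 'Oslo')")
after = open_selection(conn, sid)  # picks up the new contact automatically
```

A selection then costs one row no matter how many contacts it matches. Notes and statuses on individual contacts would still need rows keyed on (sid, cid), but only for the contacts a user actually annotates.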
Suppose I have a table called cities (id, name) and another called people (id, name, city_id).
Now, you can add as many cities as you want, without any kind of limit, so, in this situation, wouldn't it be better to have the city column inside people and when I want to get the different cities I can just call DISTINCT?
Is the complexity I save better than the flexibility I lose?
Are the joins I save better than the distinct I have to use?
If I set the city column as an index, would it be as fast as looking up the ID in another table?
I must say that in the application I'm working on I have around 5 tables of this kind, so it's, like, a lot of joins.
DISTINCT will work reasonably well if all your users have different cities. However, the more people you have from the same city, the higher the overhead of not separating the data. The best-case optimization would be a unique scan on the index for the cities column.
Putting the city column in the people table can also slow certain access requests to the people table.
The added complexity can make it easier to maintain the data quality of your cities table. Assuming the appropriate metadata is available, it is reasonably simple to validate new cities which have been added. This allows you to clean up issues like having the cities 'Newyork', 'new york', 'New York', and 'new York'.
The question of which approach will be faster really depends on the application. If city is always required when accessing a person's record, putting the city in person record might be faster. If you frequently need a list of cities, then having them in a separate table is better.
Having the extra tables may make your queries a little more complex. However, you will likely gain significant data quality. The flexibility you get by putting the city column in the people table will likely be at the expense of data quality.
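The data quality point is easy to see with a tiny made-up example. The sketch below builds both variants side by side (people_flat is a hypothetical name for the version with the city stored as text) and lists the cities each one reports, using Python's sqlite3 so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant 1: free-text city stored on each person.
CREATE TABLE people_flat (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO people_flat VALUES
    (1, 'Ann', 'New York'), (2, 'Ben', 'new york'),
    (3, 'Cid', 'Newyork'),  (4, 'Dee', 'Boston');

-- Variant 2: one row per city, referenced by id.
CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT,
    city_id INTEGER REFERENCES cities(id)
);
INSERT INTO cities VALUES (1, 'New York'), (2, 'Boston');
INSERT INTO people VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cid', 1), (4, 'Dee', 2);
""")

# DISTINCT has to scan people and reports every spelling as its own city.
flat_cities = [row[0] for row in conn.execute("SELECT DISTINCT city FROM people_flat")]

# The lookup table is read directly and has exactly one row per real city.
normalized_cities = [row[0] for row in conn.execute("SELECT name FROM cities ORDER BY name")]
```

Here the flat variant reports four "cities" for people who live in only two, and fixing it means updating every affected person row rather than a single row in cities.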
Your database designer appears to have done a good job.
You can always cache the cities table. If you use a string, you lose object orientation. Let's say you want an abbreviation for cities, or a city has_many zip_codes, or a city has_many area_codes. Now you will have a pain upgrading your app.
You can always do something like this to lose the extra query...
def city_name
  Rails.cache.fetch("city-#{city_id}-name", :expires_in => 1.days) do
    city.name
  end
end
Also, you should probably use belongs_to rather than has_one.