I'm designing a database with the following attribute dependencies:
Approach 1: A <- B <- C <-D
Approach 2: A <- B, {A,B} <- C, {A,B,C} <- D
With the first approach, attribute D is dependent on attribute C, C on B and B on A.
With the second approach, attribute D can be obtained directly from A.
I need your help deciding which approach is better. Thanks
EDIT
Sample tables for approach 1
country_info (state_info, city_info and village_info follow the same layout)
-------------
id | country_id | name
TABLE PAIRS
country_state
id | state_id | country_id
state_division
id | division_id | state_id
village_division
id | village_id | division_id
Now, I have the id of a village and I want to know the name of the country it belongs to. I will have to look up the division and then the state before arriving at the country.
With the second approach, the village table will have the division_id, state_id and the country_id.
Thanks!
If village is the "main" object, which will be used very often (and whose relations to other tables will also be used often), then the second approach will reduce the number of lines of code and improve performance (e.g. when filtering villages by country).
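To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (country, state, division, village, village_flat) and the sample rows are assumptions made for illustration; the pair tables from the question are simplified to a parent id column on each level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Approach 1: each level only knows its direct parent.
    CREATE TABLE country  (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE state    (state_id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    CREATE TABLE division (division_id INTEGER PRIMARY KEY, name TEXT, state_id INTEGER);
    CREATE TABLE village  (village_id INTEGER PRIMARY KEY, name TEXT, division_id INTEGER);
    -- Approach 2: the village row also carries every ancestor id.
    CREATE TABLE village_flat (village_id INTEGER PRIMARY KEY, name TEXT,
                               division_id INTEGER, state_id INTEGER, country_id INTEGER);

    INSERT INTO country  VALUES (1, 'Country X');
    INSERT INTO state    VALUES (10, 'State Y', 1);
    INSERT INTO division VALUES (100, 'Division Z', 10);
    INSERT INTO village  VALUES (1000, 'Village V', 100);
    INSERT INTO village_flat VALUES (1000, 'Village V', 100, 10, 1);
""")

# Approach 1: walk village -> division -> state -> country.
chained = conn.execute("""
    SELECT c.name
    FROM village v
    JOIN division d ON d.division_id = v.division_id
    JOIN state s    ON s.state_id = d.state_id
    JOIN country c  ON c.country_id = s.country_id
    WHERE v.village_id = ?""", (1000,)).fetchone()[0]

# Approach 2: one join, because the village row already holds country_id.
direct = conn.execute("""
    SELECT c.name
    FROM village_flat v
    JOIN country c ON c.country_id = v.country_id
    WHERE v.village_id = ?""", (1000,)).fetchone()[0]

print(chained, direct)
```

Both lookups return the same country. The second needs fewer joins, but the redundant ids in village_flat have to be kept consistent whenever a division or state is moved.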
KISS.
Table 1: A business/person/etc has an address and a City.
Table 2: The City also includes the Village, State, Province, Country_code, Postal_code, whatever.
Normalizing each layer is overkill.
If you have half a dozen tables, imagine the number of JOINs needed to get all the parts of the address!
Hi, I am re-examining the design of my MySQL database for efficiency.
Currently I have 3 tables:
table country :
country id, country name
table state :
state id, state name, country id
table city :
city id, city name, state id
I am thinking whether it is better to have ...
country name instead of country id in table state
state name instead of state id in table city
This is because everywhere in my code I have to run extra queries to convert country id, state id and city id from numbers to names (e.g. 1 to USA). Wouldn't it be better to just reference them by name (fewer queries)?
The whole world has roughly
260 country/regions
5000 states
many many cities
Design varies based on what you need.
1 - For the purpose of tiny storage:
country(id,country)
state(id,state,country_id)
city(id,city,state_id)
2 - For the purpose of quick query:
city(id,city,state,country)
3 - For the purpose of middle way:
country(code,country)
state(code,country) -- you might merge country and state into one table based on code
city(state_code,city)
You might be interested in having a look at the ISO codes:
https://en.wikipedia.org/wiki/ISO_3166-1 eg US
https://en.wikipedia.org/wiki/ISO_3166-2 eg US-NY
As a result, the ISO state code contains the ISO country code.
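Because the country part is embedded in the subdivision code, it can be recovered without a lookup table. A tiny sketch in Python (the helper name country_of is made up for this example):

```python
def country_of(iso_3166_2_code: str) -> str:
    """Return the ISO 3166-1 alpha-2 country part of an ISO 3166-2 code."""
    # ISO 3166-2 codes have the form "<country>-<subdivision>", e.g. "US-NY".
    country, _, _subdivision = iso_3166_2_code.partition("-")
    return country

print(country_of("US-NY"))
```

This is why a state table keyed on the ISO 3166-2 code does not strictly need a separate country_id column.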
UPDATE as per more info from you:
If you are designing property websites for the USA:
1 - You do not need a country table; most likely all properties are within the USA.
2 - There are fewer than 60 states within the USA, so you can use an enum to store states. Nearly everyone will understand that NY = New York, so you do not need a state table.
3 - You do need a city table, as you will use city_id in more than 10,000,000 property records.
usa_cities(
id int PK,
state enum('NY', 'LA', ...),
city varchar,
...
)
properties(
id int PK,
city_id int,
....
)
As the properties table is usually very big, you might skip the state table and denormalize the design, so that queries need fewer joins:
properties (
id int PK,
state enum('NY', 'LA', ...),
city varchar,
...
)
You might be able to use an enum for city as well. I'm not sure how many cities there are in the USA, but at first thought I would not encourage it.
If you want fewer queries, there is a technique called denormalization.
You can weigh what is most important and choose what fits your needs.
For more about the meaning of denormalization, see Techopedia and Wikipedia.
I have a Perl program that queries a MySQL database to bring back results based upon which "report" option a user has selected from a web page.
One of the reports is all occupants of a student housing building who have applied for a parking permit, but who have not yet been given one.
When the students apply for a permit, it records the specifics about their car (make, model, year, color, etc.) in a single table row. Each apartment can have up to three students, and each student may apply for a permit. So an apartment might have 0 permits, or 1, 2, or 3 permits, depending upon how many of them have cars.
What I'd like to be able to do, is execute a MySQL query that will find out how many occupants in each apartment have applied for a parking permit, and then based on the results of that query, find out how many permits have been issued. If the number of permits issued is less than the number of applications, that apartment number should be returned in the result set. It doesn't have to name the specific occupant, just the fact that the apartment has at least one occupant who has applied for a permit, but not yet received one.
So I have two tables, one is called occupant_info and it contains all kinds of info about the occupant, but the relevant fields are:
counter (a unique row id)
parking_permit_1_number
parking_permit_2_number
parking_permit_3_number
When a parking permit has been assigned, it is recorded in the appropriate parking_permit_#_number field (if it's occupant number one's permit, it would be recorded in parking_permit_1_number, etc.).
The second table is called, parking_permits, and contains all of the car/owner specifics (make, model, year, owner, owner address, etc.). It also contains a field which references the counter from the occupant_info table.
So an example would be:
occupant_info table
counter | parking_permit_1_number | parking_permit_2_number | parking_permit_3_number
--------|-------------------------|-------------------------|------------------------
1 | 12345 | | 98765
2 | 43920 | |
3 | 30239 | | 34233
parking_permits table
counter | counter_from_occupant_info | permit_1_name | permit_2_name | permit_3_name
--------|----------------------------|---------------|-----------------|-------------------
1 |2 | David Jones | James Cameron | Michael Smerconish
2 |3 | Bill Epps | Hillary Clinton | Donald Trump
3 |1 | Joanne Miller | | Sridevi Gupta
I want a query that will first look at how many occupants in an apartment have applied for a permit. This is determined by counting the names in the parking_permits table. In that table, row 1 has three names, row 2 has three names, and row 3 has two names. The query should then look at the occupant_info table, and for each counter_from_occupant_info from the parking_permits table, see if the same number of parking permits have been issued. This can be determined by comparing the number of non-blank parking_permit_#_number fields.
Using the data above, the query would see the following :
parking_permit table row 1
Has counter_from_occupant_info equal to "2"
Has three names
The row in occupant_info with counter = "2" has only one permit number issued,
so counter_from_occupant_info 2 from parking_permits should be in the result set.
parking_permit table row 2
Has counter_from_occupant_info equal to "3"
Has three names
The row in occupant_info with counter = "3" has only two permit numbers issued,
so counter_from_occupant_info 3 from parking_permits should be in the result set.
parking_permit table row 3
Has counter_from_occupant_info equal to "1"
Has two names
The row in occupant_info with counter = "1" has two permit numbers issued,
so this row should *not* be in the result set.
I've thought about using if, then, case, when, type logic to do this in one query, but frankly can't wrap my head around how to do so.
I was thinking something like:
SELECT
CASE WHEN ( SELECT counter_from_occupant_info
FROM parking_permits
WHERE parking_permit_1_name != ""
AND parking_permit_2_name != ""
AND parking_permit_3_name != "" ) THEN
IF ( SELECT parking_permit_1_number,
parking_permit_2_number,
parking_permit_3_number
FROM occupant_info
WHERE counter = ***somehow reference counter from above case statement--I don't know how to do this***
But then my head explodes and I realize I don't know what the heck I'm doing.
Any help would be appreciated. :-)
Doug
You have a few problems:
Your occupants table schema is bad. There's worse out there, but it looks like someone that doesn't understand how a database works built this table.
Your permits table is also bad. Same reason.
You have no idea what you are doing (kidding... kidding...)
Problem 1:
Your occupants table should probably be two tables. Because an occupant could have 0-3 permits (possibly more, I can't tell from the sample data) then you need a table for your occupant's attributes (name, height, gender, age, primary smell, favorite color, first rent date, I dunno).
Occupants
OccupantID | favorite TV Show | number of limbs | first name | last name | aptBuilding
And... another table for Relationship between the occupant and the permit:
Occupant_permits
OccupantID | Permit ID | status
Now... an occupant can have as many permits as you can stuff into that table and the relationship between them has a status "Applied for", or "Granted" or "Revoked" or what have you.
Problem 2
Your permit info table is doing double duty as well. It holds the information about a permit (its name) as well as the relationship to the occupant. Since we already have a relationship to the occupant with the "Occupant_permits" table above, we just need a permits table to hold attributes of a permit:
Permits
Permit ID | Permit Name | Description | etc..
Problem 3
Now that you have a correct schema where objects are in their own table (Occupant, Permit, Occupant and Permit Relationship) your query to get a list of apartments that have at least one occupant that has applied, but not yet received a permit would be:
SELECT DISTINCT
o.aptBuilding
FROM
Occupants AS o
INNER JOIN Occupant_permits AS op
ON o.OccupantID = op.OccupantID
INNER JOIN Permits AS p
ON op.PermitID = p.PermitID
WHERE
op.status = 'Applied for'
That's nice and simple and you aren't relying on CASE or UNION or count comparison or any fancy stuff. Just nice straight joins and a simple WHERE clause. This will be fast to query and there's no funny business.
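For anyone who wants to try the normalized layout, here is a small self-contained sketch using Python's sqlite3 module. The sample occupants, apartment labels and permit ids are invented for illustration, the status strings follow the ones suggested above, and the Permits table is left out because this particular question only needs the relationship table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Occupants (OccupantID INTEGER PRIMARY KEY, first_name TEXT, aptBuilding TEXT);
    CREATE TABLE Occupant_permits (OccupantID INTEGER, PermitID INTEGER, status TEXT);

    -- Hypothetical sample data: two occupants share apartment A-101.
    INSERT INTO Occupants VALUES (1, 'Ann', 'A-101'), (2, 'Bob', 'A-101'), (3, 'Cy', 'B-202');
    INSERT INTO Occupant_permits VALUES
        (1, 500, 'Granted'),
        (2, NULL, 'Applied for'),   -- still waiting for a permit
        (3, 501, 'Granted');
""")

# Apartments with at least one occupant who applied but has no permit yet.
rows = conn.execute("""
    SELECT DISTINCT o.aptBuilding
    FROM Occupants AS o
    JOIN Occupant_permits AS op ON op.OccupantID = o.OccupantID
    WHERE op.status = 'Applied for'
    ORDER BY o.aptBuilding""").fetchall()
waiting = [r[0] for r in rows]
print(waiting)
```

Only A-101 comes back, because Bob's application is still open; no counting or comparing of columns is needed.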
Because your schema isn't great, in order to get something similar you'll need to make use of either UNION queries to stack your many permit_N_ fields into a single field and run something similar to the above query, or you'll have to use a fair amount of CASE/IF statements:
-- Assumes unused fields are NULL; if they are stored as empty strings, test <> '' instead.
SELECT DISTINCT p.pCounter
FROM
(
    SELECT
        counter AS oCounter,
        -- number of permits already issued
        CASE WHEN parking_permit_1_number IS NOT NULL THEN 1 ELSE 0 END
        +
        CASE WHEN parking_permit_2_number IS NOT NULL THEN 1 ELSE 0 END
        +
        CASE WHEN parking_permit_3_number IS NOT NULL THEN 1 ELSE 0 END AS permitCount
    FROM occupant_info
) AS o
LEFT OUTER JOIN
(
    SELECT
        counter_from_occupant_info AS pCounter,
        -- number of occupants who applied
        CASE WHEN parking_permit_1_name IS NOT NULL THEN 1 ELSE 0 END
        +
        CASE WHEN parking_permit_2_name IS NOT NULL THEN 1 ELSE 0 END
        +
        CASE WHEN parking_permit_3_name IS NOT NULL THEN 1 ELSE 0 END AS permitPermitCount
    FROM parking_permits
) AS p ON o.oCounter = p.pCounter
WHERE p.permitPermitCount > o.permitCount
I'm not 100% convinced that is exactly what you are looking for since your schema is confusing where you have multiple objects in a single table and everything is pivoted, but... it should get you in the ball park.
This will be much slower too. There's intermediate result sets, CASE statements, and math, so don't expect MySQL to spit this out in milliseconds.
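The count comparison can be checked against the sample rows from the question. The sketch below uses Python's sqlite3 module; it assumes blank cells are stored as NULL and uses the column names shown in the question's sample tables (permit_N_name), so adjust both to match the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE occupant_info (
        counter INTEGER PRIMARY KEY,
        parking_permit_1_number INTEGER,
        parking_permit_2_number INTEGER,
        parking_permit_3_number INTEGER);
    CREATE TABLE parking_permits (
        counter INTEGER PRIMARY KEY,
        counter_from_occupant_info INTEGER,
        permit_1_name TEXT, permit_2_name TEXT, permit_3_name TEXT);

    -- Sample rows from the question; blank cells stored as NULL (assumption).
    INSERT INTO occupant_info VALUES
        (1, 12345, NULL, 98765), (2, 43920, NULL, NULL), (3, 30239, NULL, 34233);
    INSERT INTO parking_permits VALUES
        (1, 2, 'David Jones', 'James Cameron', 'Michael Smerconish'),
        (2, 3, 'Bill Epps', 'Hillary Clinton', 'Donald Trump'),
        (3, 1, 'Joanne Miller', NULL, 'Sridevi Gupta');
""")

def filled(columns):
    """Build an SQL expression that counts how many of the columns are non-NULL."""
    return " + ".join(f"CASE WHEN {c} IS NOT NULL THEN 1 ELSE 0 END" for c in columns)

applied = filled(f"p.permit_{n}_name" for n in (1, 2, 3))
issued = filled(f"o.parking_permit_{n}_number" for n in (1, 2, 3))

# Keep the occupant_info rows with more applications than issued permits.
rows = conn.execute(f"""
    SELECT p.counter_from_occupant_info
    FROM parking_permits AS p
    JOIN occupant_info AS o ON o.counter = p.counter_from_occupant_info
    WHERE ({applied}) > ({issued})
    ORDER BY p.counter_from_occupant_info""").fetchall()
pending = [r[0] for r in rows]
print(pending)
```

With the question's data this yields occupant_info rows 2 and 3, matching the walkthrough in the question; row 1 has two names and two permits, so it is left out.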
I have a bunch of county demographic data stored in a database. I need to be able to get the average of the data across the state that a certain county belongs to.
For example, I need to be able to get the average of all counties whose state_id matches the state_id of the county with a county_id of 1. Essentially, if a county was in Virginia, I would need the average of all of the counties in Virginia. I'm having trouble setting up this query, and I was hoping that you guys could give me some help. Here's what I have written, but it only returns one row from the database because it links the county_id of the two tables together.
SELECT AVG(demographic_data.percent_white) as avg_percent_white
FROM demographic_data,counties, states
WHERE counties.county_id = demographic_data.county_id AND counties.state_id = states.state_id
Here's my basic database layout:
counties
------------------------
county_id | county_name
states
---------------------
state_id | state_name
demographic_data
-----------------------------------------
percent_white | percent_black | county_id
Your query is returning one row, because there's an aggregate and no GROUP BY. If you want an average of all counties within a state, we'd expect only one row.
To get a "statewide" average, of all counties within a state, here's one way to do it:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
JOIN counties o
ON o.state_id = a.state_id
WHERE o.county_id = 42
Note that there's no need to join to the state table. You just need all counties that have a matching state_id. The query above is using two references to the counties table. The reference aliased as "a" is for all the counties within a state, the reference aliased as "o" is to get the state_id for a particular county.
If you already had the state_id, you wouldn't need a second reference:
SELECT AVG(d.percent_white) AS avg_percent_white
FROM demographic_data d
JOIN counties a
ON a.county_id = d.county_id
WHERE a.state_id = 11
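Here is a runnable sketch of the two-reference query using Python's sqlite3 module. It assumes the counties table has a state_id column (the question's query uses one even though the layout listing omits it), and the county names and percentages are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counties (county_id INTEGER PRIMARY KEY, county_name TEXT, state_id INTEGER);
    CREATE TABLE demographic_data (percent_white REAL, percent_black REAL, county_id INTEGER);

    -- Hypothetical data: counties 1 and 2 share state 11, county 3 is in state 12.
    INSERT INTO counties VALUES (1, 'County A', 11), (2, 'County B', 11), (3, 'County C', 12);
    INSERT INTO demographic_data VALUES (50.0, 30.0, 1), (70.0, 20.0, 2), (10.0, 80.0, 3);
""")

def state_average_for(county_id):
    """Average percent_white over every county in the same state as county_id."""
    row = conn.execute("""
        SELECT AVG(d.percent_white)
        FROM demographic_data AS d
        JOIN counties AS a ON a.county_id = d.county_id   -- all counties in the state
        JOIN counties AS o ON o.state_id = a.state_id     -- the county we start from
        WHERE o.county_id = ?""", (county_id,)).fetchone()
    return row[0]

print(state_average_for(1), state_average_for(3))
```

Asking for county 1 averages counties 1 and 2; asking for county 3 averages only county 3.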
FOLLOWUP
Q What if I wanted to bring in another table.. Let's call it demographic_data_2 that was also linked via the county_id
A I made the assumption that the demographic_data table had one row per county_id. If the same holds true for the second table, then a simple JOIN operation will do.
JOIN demographic_data_2 c
ON c.county_id = d.county_id
With that table joined in, you could add an appropriate aggregate expression in the SELECT list (e.g. SUM, MIN, MAX, AVG).
The trouble spots are typically "missing" and "duplicate" data... when there isn't a row for every county_id in that second table, or there's more than one row for a particular county_id, that leads to rows not included in the aggregate, or getting double counted in the aggregate.
We note that the aggregate returned in the original query is an "average of averages". It's an average of the values for each county.
Consider:
bucket count_red count_blue count_total percent_red
------ --------- ---------- ----------- -----------
1 480 4 1000 48
2 60 1 200 30
Note that there's a difference between an "average of averages", and calculating an average using totals.
SELECT AVG(percent_red) AS avg_percent_red
, 100 * SUM(count_red)/SUM(count_total) AS tot_percent_red
avg_percent_red tot_percent_red
--------------- ---------------
39 45
Both values are valid; we just don't want to misinterpret or misrepresent either value.
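The two figures can be reproduced from the bucket table with a few lines of Python (the variable names are chosen for this example):

```python
# (count_red, count_total) for buckets 1 and 2 from the table above.
buckets = [(480, 1000), (60, 200)]

# Per-bucket percentages, then their plain average: the "average of averages".
percents = [100 * red / total for red, total in buckets]
avg_percent_red = sum(percents) / len(percents)

# Percentage computed from the totals, which weights each bucket by its size.
tot_percent_red = 100 * sum(red for red, _ in buckets) / sum(total for _, total in buckets)

print(avg_percent_red, tot_percent_red)
```

The larger bucket pulls the totals-based figure toward its own 48 percent, while the average of averages treats both buckets as equally important.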
I have three tables in a MySQL database:
stores (PK stores_id)
states (PK states_id)
join_stores_states (PK join_id, FK stores_id, FK states_id)
The "stores" table has a single row for every business. The join_stores_states table links an individual business to each state it's in. So, some businesses have stores in 3 states, so they have 3 rows in join_stores_states, and others have stores in 1 state, so they have just 1 row in join_stores_states.
I'm trying to figure out how to write a query that will list each business in one row, but still show all the states it's in.
Here's what I have so far, which is obviously giving me every row out of join_stores_states:
SELECT states.*, stores.*, join_stores_states.*
FROM join_stores_states
JOIN stores
ON join_stores_states.stores_id=stores.stores_id
JOIN states
ON join_stores_states.states_id=states.states_id
Loosely, this is what it's giving me:
store 1 | alabama
store 1 | florida
store 1 | kansas
store 2 | montana
store 3 | georgia
store 3 | vermont
This is more of what I want to see:
store 1 | alabama, florida, kansas
store 2 | montana
store 3 | georgia, vermont
Suggestions as to which query methods to try would be just as appreciated as a working query.
If you need the list of states as a string, you can use MySQL's GROUP_CONCAT function (or equivalent, if you are using another SQL dialect), as in the example below. If you want to do any kind of further processing of the states separately, I would prefer you run the query as you did, and then collect the resultset into a more complex structure (hashtable of arrays, as a simplest measure, but more complex OO designs are certainly possible) in the client by iterating over the resulting rows.
SELECT stores.name,
GROUP_CONCAT(states.name ORDER BY states.name ASC SEPARATOR ', ') AS state_names
FROM join_stores_states
JOIN stores
ON join_stores_states.stores_id=stores.stores_id
JOIN states
ON join_stores_states.states_id=states.states_id
GROUP BY stores.name
Also, even if you only need the concatenated string and not a data structure, some databases might not have an aggregate concatenation function, in which case you will have to do the client processing anyway. In pseudocode, since you did not specify a language either:
perform query
stores = empty hash
for each row from query results:
get the store object from the hash by name
if the name isn't in the hash:
put an empty store object into the hash under the name
add the state name to the store object's states array
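As one possible rendering of the pseudocode above, here is a Python sketch. The rows list stands in for the result set of the original query (store name, state name), and the variable names are assumptions for this example.

```python
from collections import defaultdict

# Stand-in for the rows returned by the un-aggregated query.
rows = [
    ("store 1", "alabama"), ("store 1", "florida"), ("store 1", "kansas"),
    ("store 2", "montana"),
    ("store 3", "georgia"), ("store 3", "vermont"),
]

# Hash of arrays: store name -> list of state names.
states_by_store = defaultdict(list)
for store, state in rows:
    states_by_store[store].append(state)

# One output line per store, in first-seen order.
lines = [f"{store} | {', '.join(states)}" for store, states in states_by_store.items()]
print("\n".join(lines))
```

Keeping the states in a list leaves them available for further processing; the joined string is only built at display time.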
Firstly I'd like to start by apologizing for the potentially misleading title... I am finding it difficult to describe what I am trying to do here.
With the current project I'm working on, we have setup a 'dynamic' database structure with MySQL that looks something like this.
item_details ( Describes the item_data )
fieldID | fieldValue | fieldCaption
1 | addr1 | Address Line 1
2 | country | Country
item_data
itemID | fieldID | fieldValue
12345 | 1 | Some Random Address
12345 | 2 | United Kingdom
So as you can see, if for example I wanted to lookup the address for the item 12345 I would simply do the statement.
SELECT fieldValue FROM item_data WHERE fieldID=1 and itemID=12345;
But here is where I am stuck... the database is relatively large, with around 80k rows, and I am trying to create a set of search functions within PHP.
I would like to be able to perform a query on the result set of a query as quickly as possible...
For example, search for an address within a certain country, i.e. search the fieldValue of the rows that have the same itemIDs as the results of this query:
SELECT itemID FROM item_data WHERE fieldID=2 AND fieldValue='United Kingdom';
Sorry If I am unclear, I have been struggling with this for the past couple of days...
Cheers
You can do this in a couple of ways. One is to use multiple joins to the item_data table with the fieldID limited to whatever it is you want to get.
SELECT *
FROM
Item i
INNER JOIN item_data country
ON i.itemID = country.itemID
AND country.fieldID = 2
INNER JOIN item_data address
ON i.itemID = address.itemID
AND address.fieldID = 1
WHERE
country.fieldValue = 'United Kingdom'
AND address.fieldValue = 'Whatever'
As an aside, this structure is often referred to as an Entity-Attribute-Value or EAV database.
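A self-contained sketch of the multiple-join idea, using Python's sqlite3 module. It joins item_data to itself rather than to a separate Item table, the second item and the search_addresses helper are invented for illustration, and the address match uses LIKE as one plausible way to search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_data (itemID INTEGER, fieldID INTEGER, fieldValue TEXT);
    INSERT INTO item_data VALUES
        (12345, 1, 'Some Random Address'), (12345, 2, 'United Kingdom'),
        (67890, 1, 'Another Address'),     (67890, 2, 'France');
""")

def search_addresses(country, address_pattern):
    """Find addresses (fieldID 1) of items whose country (fieldID 2) matches."""
    return conn.execute("""
        SELECT address.itemID, address.fieldValue
        FROM item_data AS country
        JOIN item_data AS address
          ON address.itemID = country.itemID AND address.fieldID = 1
        WHERE country.fieldID = 2
          AND country.fieldValue = ?
          AND address.fieldValue LIKE ?""", (country, address_pattern)).fetchall()

print(search_addresses("United Kingdom", "%Random%"))
```

Each extra search criterion adds one more aliased reference to item_data, which is the usual cost of querying an EAV layout.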
Sorry in advance if this sounds patronizing, but (as you suggested) I'm not quite clear what you are asking for.
If you are looking for one query to do the whole thing, you could simply nest them. For your example, pretend there is a table named CACHED with the results of your UK query, and write the query you want against that, but replace CACHED with your UK query.
If the idea is that you have ALREADY done this UK query and want to (re-)use its results, you could save the results to a table in the DB (which may not be practical if there are a large number of queries executed), or save the list of IDs as text and paste that into the subsequent query (...WHERE ID in (...) ... ), which might be OK if your 'cached' query gives you a manageable fraction of the original table.
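Both options can be sketched with Python's sqlite3 module. The sample rows beyond item 12345 are invented, and uk_query is just a name for the query from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_data (itemID INTEGER, fieldID INTEGER, fieldValue TEXT);
    INSERT INTO item_data VALUES
        (12345, 1, 'Some Random Address'), (12345, 2, 'United Kingdom'),
        (67890, 1, 'Another Address'),     (67890, 2, 'France');
""")

uk_query = "SELECT itemID FROM item_data WHERE fieldID = 2 AND fieldValue = 'United Kingdom'"

# Option 1: nest the UK query where the pretend CACHED table would go.
nested = conn.execute(
    f"SELECT fieldValue FROM item_data WHERE fieldID = 1 AND itemID IN ({uk_query})"
).fetchall()

# Option 2: run the UK query once, keep the ids, and reuse them in a later query.
uk_ids = [row[0] for row in conn.execute(uk_query)]
placeholders = ", ".join("?" for _ in uk_ids)
reused = conn.execute(
    f"SELECT fieldValue FROM item_data WHERE fieldID = 1 AND itemID IN ({placeholders})",
    uk_ids,
).fetchall()

print(nested, reused)
```

Passing the saved ids as bound parameters avoids pasting raw text into the SQL string, though a very long id list may run into the driver's parameter limit.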