I'm debating between using a CASE statement and a lookup table to replace text with table2.columnB when table1.columnB = table2.columnA. I'd rather use a lookup table because it's easier to manage.
Our database pulls all the customer order information from our online store. It receives all the state names in full and I need to replace all instances of U.S. states with their 2-character abbreviation. (e.g. Texas -> TX)
How would I use a lookup table with this query for State?
Here's my query: http://sqlfiddle.com/#!9/e44aa3/12/0
Thank you in advance!
To answer your question about how to add the lookup table to your code: you must add this join:
LEFT JOIN `state_abbreviations` AS `sa` ON `sa`.`shipping_zone` = `o`.`shipping_zone`
and change this line:
`o`.`shipping_zone` AS `State`
with:
COALESCE(`sa`.`zone_abbr`, `o`.`shipping_zone`) AS `State`
so you get the abbreviation returned.
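If you don't already have that lookup table, a minimal sketch might look like this (the column names shipping_zone and zone_abbr are taken from the join above; the types and sample rows are assumptions):

CREATE TABLE `state_abbreviations` (
  `shipping_zone` VARCHAR(100) NOT NULL PRIMARY KEY,  -- full state name, e.g. 'Texas'
  `zone_abbr` CHAR(2) NOT NULL                        -- 2-character abbreviation, e.g. 'TX'
);

INSERT INTO `state_abbreviations` (`shipping_zone`, `zone_abbr`) VALUES
  ('Texas', 'TX'),
  ('North Carolina', 'NC'),
  ('Ohio', 'OH');  -- ...and so on, one row per state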
See the demo.
Results:
Order ID | Name        | State | Qty | Option | Size | Product | Ref
12345    | Mason Sklut | NC    | 1   | R      | L    | Tee     | R / Tee L
12346    | John Doe    | OH    | 2   | Bl     | S    | Hood    | 2x Bl / Hood S
Using a CASE expression is certainly an option. However, it does not scale well: there are 50 US states (plus DC and the territories), so you would need to write 50+ WHEN branches, like:
case state
when 'North Carolina' then 'NC'
when 'Ohio' then 'OH'
when ...
end
Creating a mapping table seems like a better idea. It is also a good way to enforce referential integrity (i.e. ensure that the names being used really are state names).
That would look like:
create table states (
code varchar(2) not null primary key,
name varchar(100) not null
);
In your original table, you want to have a column that stores the state code, with a foreign key constraint that references states(code) (you may also store the state name, but this looks like a less efficient option in terms of storage).
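A sketch of that change, assuming the orders table is called mytable as in the query below (existing rows need valid codes before the constraint is added):

alter table mytable add column state_code varchar(2);
-- backfill state_code from the existing full state names here
alter table mytable
    add constraint fk_mytable_state
    foreign key (state_code) references states (code);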
You can do the mapping in your queries with a join:
select t.*, s.name state_name
from mytable t
inner join states s on s.code = t.state_code
Related
I am trying to join two tables, but the columns the join needs to use do not contain identical values, so I have to use CONCAT. The problem is that the query takes a very long time to run. Here is the example:
I have two tables:
Table: MasterEmployee
Fields: varchar(20) id, varchar(20) name, Int age, varchar(20) status
Table: Employee
Fields: varchar(20) id, varchar(20) designation, varchar(20) name, varchar(20) status
I have a constant prefix: 08080.
The postfix has a constant length of 1 character, but its value is random.
id in Employee = 08080 + {id in MasterEmployee} + {1-char random value}
Sample data:
MasterEmployee:
999, John, 24, approved
888, Leo, 26, pending
Employee:
080809991, developer, John, approved
080808885, Tester, Leo, approved
Here is the query that I am using:
select * from Employee e inner join MasterEmployee me
on e.id like concat('%',me.id,'%')
where e.status='approved' and me.status='approved';
Is there a better way to do this? I need to run the same kind of query over a very large dataset.
It would certainly be better to use the static prefix 08080 so that the DBMS can use an index. It won't use an index with LIKE and a leading wildcard:
SELECT * FROM Employee e INNER JOIN MasterEmployee me
ON e.id LIKE CONCAT('08080', me.id, '_')
AND e.status = me.status
WHERE e.status = 'approved';
Note that I added status to the JOIN condition since you want Employee.status to match MasterEmployee.status.
Also, since you only have one postfix character you can use the single-character wildcard _ instead of %.
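That only pays off if the join column is indexed. As a sketch (the index name is an assumption, and whether the optimizer actually uses it for the per-row prefix match should be confirmed with EXPLAIN):

CREATE INDEX idx_employee_id ON Employee (id);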
It's not CONCAT that's the issue; scalar operations are extremely cheap. The problem is using LIKE the way you are: anything of the form field LIKE '%...' automatically skips the index, resulting in a scan operation -- for what I think are obvious reasons.
If you have to keep this code as is, then that's that; there's nothing you can do, and you'll have to accept the large performance hit. If at all possible, though, I'd rethink either your database schema or the way you query it.
Edit: Rereading it, what you want is to concatenate the prefix so your query takes the form field like '08080...'. This will make use of any indices you might have.
So, each table has an id column, generally INT(11) AUTO_INCREMENT NOT NULL.
When I was in school I mostly just named it id.
However, now that I encounter more complex database designs, I find that if I use "id" I have to do more work in the SELECT query.
For example, there are tables "customer" and "customer_group".
So I simply get the customer and his customer_group info like this:
SELECT *
FROM customer
JOIN customer_group
ON customer.group_id = customer_group.id
Notice that there will be two id columns returned; if I'd like to differentiate them, I need to do:
SELECT customer.id AS cid
,customer_group.id AS cgid
,customer.NAME
,.......
FROM customer
JOIN customer_group
ON customer.group_id = customer_group.id
That makes the work very tedious and the queries long. So, I wonder what the practice is for naming the id column: should I name it table_name + "id"? Thanks
We can do something like this using aliasing, to make it less tedious and more readable:
SELECT c.id AS cid
,cg.id AS cgid
,c.NAME
,.......
FROM customer c
JOIN customer_group cg
ON c.group_id = cg.id
As for naming the ID column, the best practice is opinion-based or driven by an implementation standard; the benefits of each approach vary by situation.
Generally, the ID of a table X should be ID, but when it is used in another table, say table Y, as a foreign key, it becomes X_ID.
This makes it clear in table Y that X_ID refers to (comes from) table X's ID.
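Applied to the customer example, that convention would look roughly like this (a sketch; column types are assumptions):

CREATE TABLE customer_group (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE customer (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    customer_group_id INT NOT NULL,  -- <referenced table>_id when used as a foreign key
    FOREIGN KEY (customer_group_id) REFERENCES customer_group (id)
);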
Naming conventions start out as a convenience for developers, then become a methodology, and later a standard.
But they are also affected by changes in the development environment.
If it's a commercial project, I suggest choosing a naming convention that follows the established standard.
If it's a personal project, choose whatever suits your habits.
I store destinations a user is willing to ship a product to in a varchar field like this:
"userId" "destinations" "product"
"1" "US,SE,DE" "apples"
"2" "US,SE" "books"
"3" "US" "mushrooms"
"1" "SE,DE" "figs"
"2" "UK" "Golf Balls"
I was hoping this query would return all rows where US was present. Instead it returns only a single row.
select * from destinations where destinations IN('US');
How do I get this right? Am I using the wrong column type, or is it my query that's failing?
Current Results
US
Expected Results
US,SE,DE
US,SE
US
Try with FIND_IN_SET
select * from destinations where FIND_IN_SET('US',destinations);
Unfortunately, the way you've structured your table, you'll have to check for a pattern match for "US" in your string at the beginning, middle, or end.
One way you can do that is using LIKE, as follows:
SELECT *
FROM destinations
WHERE destinations LIKE ('%US%');
Another way is using REGEXP:
SELECT *
FROM destinations
WHERE destinations REGEXP '.*US.*';
Yet another is using FIND_IN_SET, as explained by Sadkhasan.
CAVEAT
None of these will offer great performance or data integrity, though. And they will all COMPOUND their performance problems when you add criteria to your search.
E.g. using FIND_IN_SET, proposed by Sadkhasan, you would have to do something like:
SELECT * FROM destinations
WHERE FIND_IN_SET('US',destinations)
OR FIND_IN_SET('CA',destinations)
OR FIND_IN_SET('ET',destinations);
Using REGEXP is a little better, though REGEXP is innately slow:
SELECT *
FROM destinations
WHERE destinations REGEXP '.*(US|CA|ET).*';
SO WHAT NOW?
Your best bet would be switching to a 3NF design with destinations applying to products by splitting into 2 tables that you can join, e.g.:
CREATE TABLE products (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
userId INT NOT NULL REFERENCES users(id),
name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
Then you would add what's called a composite key table, each row containing a productId and a single country, with one row per country.
CREATE TABLE product_destinations (
productId INT NOT NULL REFERENCES products(id),
country VARCHAR(2) NOT NULL,
PRIMARY KEY (productId, country)
) ENGINE=InnoDB;
Data in this table would look like:
productId | country
----------|--------
1 | US
1 | CA
1 | ET
2 | US
2 | GB
Then you could structure a query like this:
SELECT p.*
FROM products AS p
INNER JOIN product_destinations AS d
ON p.id = d.productId
WHERE d.country IN ('US', 'CA', 'ET')
GROUP BY p.id;
It's important to add the GROUP BY (or DISTINCT in the SELECT clause), as a single product may ship to multiple countries, resulting in multiple matching rows; aggregation reduces those to a single result per product id.
An added bonus is that you don't have to UPDATE your countries column and do string operations to determine whether the country is already there. You can let the database do that for you with a plain INSERT, preventing the locking issues that would further compound your problems.
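For example, adding a new destination for a product is a single insert, and the composite primary key rejects duplicates (the values here are made up):

INSERT INTO product_destinations (productId, country)
VALUES (1, 'DE');

If the pair already exists, the statement fails with a duplicate-key error (or is skipped if you use INSERT IGNORE).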
You can use this if your destinations are stored as two-character country codes.
SELECT * FROM destinations WHERE destinations LIKE ('%US%')
to add other country
SELECT * FROM destinations WHERE destinations LIKE ('%US%')
AND destinations LIKE ('%SE%')
^^^ --> use AND or OR depending on the result you want.
I'm wondering if this is even possible.
I want to join 2 tables based on the data of table 1.
Example: table 1 has a column food whose data is "hotdog".
And I have a table called hotdog.
Is it possible to do a JOIN like:
SELECT * FROM table1 t join t.food on id = foodid
I know it doesn't work, but is it even possible? Is there a workaround?
Thanks in advance.
No, you can't join to a different table per row in table1, not even with dynamic SQL as @Cade Roux suggests.
You could join to the hotdog table for rows where food is 'hotdog' and join to other tables for other specific values of food.
SELECT * FROM table1 JOIN hotdog ON id = foodid WHERE food = 'hotdog'
UNION
SELECT * FROM table1 JOIN apples ON id = foodid WHERE food = 'apples'
UNION
SELECT * FROM table1 JOIN soups ON id = foodid WHERE food = 'soup'
UNION
...
This requires that you know all the distinct values of food, and that all the respective food tables have compatible columns so you can UNION them together.
What you're doing is called polymorphic associations. That is, the foreign key in table1 references rows in multiple "parent" tables, depending on the value in another column of table1. This is a common design mistake of relational database programmers.
For alternative solutions, see my answers to:
Possible to do a MySQL foreign key to one of two possible tables?
Why can you not have a foreign key in a polymorphic association?
I also cover solutions for polymorphic associations in my presentation Practical Object Oriented Models In SQL, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Only with dynamic SQL. It is also possible to left join many different tables and use CASE based on type, but the tables would all have to be known in advance.
It would be easier to recommend an appropriate design if we knew more about what you are trying to achieve, what your design currently looks like and why you've chosen that particular table design in the first place.
-- Say you have a table of foods:
id INT
foodtype VARCHAR(50) (right now it just contains 'hotdog' or 'hamburger')
name VARCHAR(50)
-- Then hotdogs:
id INT
length INT
width INT
-- Then hamburgers:
id INT
radius INT
thickness INT
Normally I would recommend some system for constraining only one auxiliary table to exist, but for simplicity, I'm leaving that out.
SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
FROM foods f
LEFT JOIN hotdogs hd
ON hd.id = f.id
AND f.foodtype = 'hotdog'
LEFT JOIN hamburgers hb
ON hb.id = f.id
AND f.foodtype = 'hamburger'
Now you will see that such a thing can be code-generated (or even, for a very slow prototype, built as dynamic SQL on the fly) from SELECT DISTINCT foodtype FROM foods, given certain assumptions about table names and access to the table metadata.
The problem is that ultimately whoever consumes the result of this query will have to be aware of new columns showing up whenever a new table is added.
So the question moves back to your client/consumer of the data - how is it going to handle the different types? And what does it mean for different types to be in the same set? And if it needs to be aware of the different types, what's the drawback of just writing different queries for each type or changing a manual query when new types are added given the relative impact of such a change anyway?