I have a Products table, which maps to a bridging table that contains all of the items a product consists of, so Product A could consist of several items or just one (one-to-many).
The product_sub_item_bridge looks like this,
+------------+----------------------------+
| product_id | client_sub_product_item_id |
+------------+----------------------------+
| 137 | 332 |
| 138 | 333 |
| 139 | 334 |
| 140 | 332 |
| 140 | 335 |
+------------+----------------------------+
So say a client orders product 140, items 332 and 335 will be inserted into a table called client_sub_products which houses the relationship to the order and the items themselves that are stored in the client_sub_product_items table.
What I would like to do now is get all of the client_sub_products, group them by client_order_id, maybe GROUP_CONCAT() the ids, and somehow join the Products table onto the result via the bridging table, so that I get a list containing the COUNT() of all the Products that are made up of exactly those client_sub_product_items. Like so...
+--------------+---------------------+
| product_name | count(product_name) |
+--------------+---------------------+
| Product A | 15 |
| Product B | 25 |
+--------------+---------------------+
Here is what I have thus far,
SELECT GROUP_CONCAT(`client_sub_products`.`client_sub_product_item_id`)
FROM `client_sub_products`
LEFT JOIN `client_sub_product_items`
  ON `client_sub_product_items`.`id` = `client_sub_products`.`client_sub_product_item_id`
GROUP BY `client_sub_products`.`client_order_id`
ORDER BY `client_sub_products`.`client_order_id` ASC;
I can't seem to get past the bridging table. I am not sure how to join client_sub_product_items onto Products through it, because some products have more than one client_sub_product_item related to them, and that is where I confuse myself.
I hope I have explained myself adequately and not just confused everyone. Please let me know if I should try to clarify anything mentioned above.
Why associate the subproducts with the client if the subproducts are not orderable entities? Why not just assign the products to the client, since it would be easy to derive the subproducts for each client from that? As it stands right now, you have no definitive way to determine whether subproduct 332 from your example is related to product 137 or 140.
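One way around that ambiguity is to match each order's whole item set against each product's whole item set (relational division). The following is only a sketch under stated assumptions: it uses Python's sqlite3 module so it can be run without a MySQL server, the products table with id and product_name columns is assumed, the product names and order ids are hypothetical, and it presumes each order holds exactly one product with no repeated items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE product_sub_item_bridge (product_id INTEGER, client_sub_product_item_id INTEGER);
CREATE TABLE client_sub_products (client_order_id INTEGER, client_sub_product_item_id INTEGER);

-- Product names are hypothetical; the bridge rows are the ones from the question.
INSERT INTO products VALUES (137, 'Product A'), (138, 'Product B'),
                            (139, 'Product C'), (140, 'Product D');
INSERT INTO product_sub_item_bridge VALUES (137, 332), (138, 333), (139, 334),
                                           (140, 332), (140, 335);
-- Hypothetical orders: 1 and 4 hold items {332, 335}, 2 holds {332}, 3 holds {333}.
INSERT INTO client_sub_products VALUES (1, 332), (1, 335), (2, 332),
                                       (3, 333), (4, 332), (4, 335);
""")

# An order matches a product when the number of shared items equals both
# the product's item count and the order's item count (an exact set match).
QUERY = """
SELECT p.product_name, COUNT(*) AS order_count
FROM (
    SELECT o.client_order_id, b.product_id
    FROM client_sub_products o
    JOIN product_sub_item_bridge b
      ON b.client_sub_product_item_id = o.client_sub_product_item_id
    GROUP BY o.client_order_id, b.product_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM product_sub_item_bridge b2
                       WHERE b2.product_id = b.product_id)
       AND COUNT(*) = (SELECT COUNT(*) FROM client_sub_products o2
                       WHERE o2.client_order_id = o.client_order_id)
) matched
JOIN products p ON p.id = matched.product_id
GROUP BY p.id, p.product_name
ORDER BY p.id
"""

product_counts = conn.execute(QUERY).fetchall()
print(product_counts)
```

Order 2 contains only item 332, so it counts towards product 137 and not 140. If an order could contain several products at once, this exact-match approach would no longer be enough, which is the point the comment above makes.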
My question title is confusing, sorry about that.
I have one application that saves data in the database in XML'ish format, referencing keys and values.
The problem is that I have only one column with several values that corresponds to a certain key.
I need to have certain keys as columns but I am failing miserably to achieve that:
Below a sample of the table I have
xml_type | xml_key | xml_content_key | xml_content_value
----------------------------------------------------------------------------------------------------
Archiv::144 | 144 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 151
Archiv::144 | 144 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 5714141614
Archiv::144 | 144 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 145
So, I can run this and have all the carriers:
select xml_content_key as Carrier, xml_content_value as 'Carrier Result'
from xml_storage xs where xml_content_key LIKE '%[1]{\'Version\'}[1]{\'Carrier\'}[1]{\'Content\'}%'
But how would I get other keys from the column xml_content_key to be shown as columns?
I have tried nested selects but got "Returns more than one value", a join would not apply since this is on a single table.
In short, I would like to run a query to gather a few keys from column xml_content_key and have each in a new column.
Thank you.
Without knowing the schema of either your table or your XML document I'll have to make some assumptions. But I think this isn't too hard. First I'll write out the assumptions I'm making. Please correct me if these assumptions are wrong.
It seems like you have a table in which xml_content_key is what should be the column name, and xml_key is what should be the row identifier. You only showed a very limited sample in your question, but my assumption would suggest that more data might look like this.
xml_type | xml_key | xml_content_key | xml_content_value
----------------------------------------------------------------------------------------------------
Archiv::144 | 144 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 151
Archiv::144 | 144 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 5714141614
Archiv::144 | 144 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 145
Archiv::144 | 145 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 123
Archiv::144 | 145 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 4567891234
Archiv::144 | 145 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 567
Archiv::144 | 146 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 891
Archiv::144 | 146 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 2345678912
Archiv::144 | 146 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 345
And I think you're trying to write a query to reorganize it like this.
+---------+---------+------------+-------------------+
| xml_key | Carrier | CarrierID | CustomerInterface |
+---------+---------+------------+-------------------+
| 144 | 151 | 5714141614 | 145 |
| 145 | 123 | 4567891234 | 567 |
| 146 | 891 | 2345678912 | 345 |
+---------+---------+------------+-------------------+
If I'm wrong about this part then there's no point in reading on. But if I'm right so far, then I'd like to highlight a quote from your question.
a join would not apply since this is on a single table.
You have been missing out on a great feature of SQL: self joins are extremely useful in cases like this.
It appears that there are three "content keys" (or columns) for each xml_key (or row). We will join together all the xml_content_key's that share the same xml_key, so that each row will describe a single xml_key. By the way, I'm assuming your table is named xml_storage.
SELECT xs1.xml_key AS 'xml_key',
xs1.xml_content_value AS 'Carrier',
xs2.xml_content_value AS 'CarrierID',
xs3.xml_content_value AS 'CustomerInterface'
FROM xml_storage xs1
INNER JOIN xml_storage xs2 ON xs2.xml_key = xs1.xml_key
INNER JOIN xml_storage xs3 ON xs3.xml_key = xs1.xml_key
WHERE xs1.xml_content_key LIKE "%[1]{'Version'}[1]{'Carrier'}[1]{'Content'}%"
AND xs2.xml_content_key LIKE "%[1]{'Version'}[1]{'CarrierID'}[1]{'Content'}%"
AND xs3.xml_content_key LIKE "%[1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'}%"
The basic idea here is that we separate the table into three tables and then put them back together. We put the Carriers in xs1, the CarrierIDs in xs2, and the CustomerInterfaces in xs3. Then we join these back together, putting all of the content associated with a particular xml_key on the same row.
You will probably need to alter this to fit your actual schema. In particular, this query assumes that you have exactly one Carrier, CarrierID, and CustomerInterface per unique xml_key. I am confident that this general approach will work if your data is anything like I've been assuming, but imperfect data would necessitate a more robust query than the example I've given here.
If you can share more details about your particular schema, I would be happy to edit my suggested query to fit your situation.
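For data that is not perfectly regular, a different technique from the self join above, conditional aggregation, may be more forgiving: a missing key simply yields NULL instead of dropping the whole row. This is a minimal sketch, not the approach described above. It runs on Python's sqlite3 module rather than MySQL, matches the keys by exact equality instead of LIKE, and the row for xml_key 145 deliberately omits the CarrierID entry to show the effect.

```python
import sqlite3

CARRIER = "[1]{'Version'}[1]{'Carrier'}[1]{'Content'}"
CARRIER_ID = "[1]{'Version'}[1]{'CarrierID'}[1]{'Content'}"
CUSTOMER_IF = "[1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'}"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_storage ("
    "xml_type TEXT, xml_key INTEGER, xml_content_key TEXT, xml_content_value TEXT)"
)
rows = [
    ("Archiv::144", 144, CARRIER, "151"),
    ("Archiv::144", 144, CARRIER_ID, "5714141614"),
    ("Archiv::144", 144, CUSTOMER_IF, "145"),
    # Assumed imperfect data: key 145 has no CarrierID row.
    ("Archiv::144", 145, CARRIER, "123"),
    ("Archiv::144", 145, CUSTOMER_IF, "567"),
]
conn.executemany("INSERT INTO xml_storage VALUES (?, ?, ?, ?)", rows)

# One output row per xml_key; each CASE picks out the value for one content key.
# MAX ignores NULLs, so it returns the single matching value, or NULL if none.
PIVOT = """
SELECT xml_key,
       MAX(CASE WHEN xml_content_key = ? THEN xml_content_value END) AS Carrier,
       MAX(CASE WHEN xml_content_key = ? THEN xml_content_value END) AS CarrierID,
       MAX(CASE WHEN xml_content_key = ? THEN xml_content_value END) AS CustomerInterface
FROM xml_storage
GROUP BY xml_key
ORDER BY xml_key
"""

pivoted = conn.execute(PIVOT, (CARRIER, CARRIER_ID, CUSTOMER_IF)).fetchall()
for row in pivoted:
    print(row)
```

The same SELECT should work in MySQL with the key strings written as literals. If a key can occur more than once per xml_key, MAX will silently pick one value, so that case would need separate handling.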
I'm doing a kind of point-of-sale system whose MySQL database has (among other things) a table with items for sale, a table with sales, and a table with purchases (a purchase being my ad-hoc notation for any single item bought in a sale; if the same person buys three items at once, for example, that's one sale consisting of three purchases). All these tables have logical IDs, viz. item_id, sale_id, purchase_id, and are easily joined with simple pivotal tables.
I am now trying to add a discount feature; basically your garden-variety supermarket discount: buy these particular items and pay X instead of paying the full sum of the regular item prices. These 'package deals' have their own table and are linked to the items table with a simple pivotal table containing deal_id and item_id.
My problem is getting to the point of figuring out when this is to be applied. To give some example data:
items
+---------+--------+---------+
| item_id | title | price |
+---------+--------+---------+
| 12 | Shoe | 10 |
| 76 | Coat | 23 |
| 82 | Whip | 19 |
+---------+--------+---------+
sales
+---------+-----------+
| sale_id | timestamp |
+---------+-----------+
| 2973 | 144995839 |
| 3092 | 144996173 |
+---------+-----------+
purchases
+-------------+-------------+---------+----------+---------+
| purchase_id | no_of_items | item_id | at_price | sale_id |
+-------------+-------------+---------+----------+---------+
| 12993 | 1 | 12 | 10 | 2973 |
| 12994 | 1 | 76 | 23 | 2973 |
| 12996 | 1 | 82 | 19 | 2973 |
| 13053 | 1 | 12 | 10 | 3092 |
| 13054 | 1 | 82 | 19 | 3092 |
+-------------+-------------+---------+----------+---------+
package_deals
+---------+-------+
| deal_id | price |
+---------+-------+
| 1 | 40 |
+---------+-------+
deals_items
+---------+---------+
| deal_id | item_id |
+---------+---------+
| 1 | 12 |
| 1 | 76 |
| 1 | 82 |
+---------+---------+
As is hopefully obvious from that, we have a shoe that costs $10 (let's just assume we use dollars as our currency here, doesn't matter), a coat that costs $23, and a whip that costs $19. We also have a package deal: if you buy a shoe, a coat, and a whip together, you get the whole thing for $40.
Of the two sales given, one (2973) has purchased all three things and will get the discount, while the other (3092) has purchased only the shoe and the whip and won't get the discount.
In order to find out whether or not to apply the package-deal discount, I of course have to find out whether all the item_ids in a package deal are present in the purchases table for a given sale_id.
How do I do this?
I thought I should be able to do something like this:
SELECT deal_id, item_id, purchase_id
FROM package_deals
LEFT JOIN deals_items
USING (deal_id)
LEFT JOIN purchases
USING (item_id)
WHERE
sale_id = 2973
AND item_id IS NULL
GROUP BY deal_id
In my head, that retrieved all rows from the package_deal table where at least one of the item_ids associated with the package deal in question does not have a corresponding match in the purchases table for the sale_id given. This would then have told me which packages don't apply; i.e., it would return zero rows for purchase 2973 (since none of the items associated with package deal 1 are absent from the purchases table filtered on sale_id = 2973) and one row for 3092 (since one of the items associated with package deal one—namely the coat, item_id 76—is absent from the purchases table filtered on sale_id = 3092).
Obviously, it doesn't do what I naïvely thought it would—rather, it just always returns zero rows, no matter what.
It doesn't really matter much to me whether the resulting set gives me one row for each package deal that should apply, or one for each package deal that shouldn't apply—but how do I get it to show me either in a single query?
Is it even possible?
The problem with your query above is that sale_id is also NULL in the missing row that you're interested in, due to the LEFT JOIN.
This query will return the deal_id for any deals that DO NOT apply to a given order:
SELECT DISTINCT
pd.deal_id
FROM package_deals pd
JOIN deals_items di on pd.deal_id = di.deal_id
WHERE di.item_id NOT IN (SELECT item_id FROM purchases WHERE sale_id = 3092)
From that it's easy to work out the ones that do apply. Note that for a fully functioning system, you'd still need to take the purchase quantities into account, e.g. if the customer had bought 2 of two of the items in the deal, but only 1 of the third, etc.
A SQL fiddle demonstrating the query is here: http://sqlfiddle.com/#!9/f2ae4/8
Note that I've written my joins using the ON syntax, as I'm simply more familiar with it than with USING. I expect USING would work too if you prefer it.
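To get the deals that DO apply, the NOT IN test can be wrapped in NOT EXISTS: a deal applies when none of its items is missing from the sale. Here is a small sketch of that inversion, run against the sample data from the question with Python's sqlite3 module instead of MySQL. The helper name applicable_deals is made up for illustration, and quantities are still ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, no_of_items INTEGER,
                        item_id INTEGER, at_price INTEGER, sale_id INTEGER);
CREATE TABLE package_deals (deal_id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE deals_items (deal_id INTEGER, item_id INTEGER);

INSERT INTO purchases VALUES (12993, 1, 12, 10, 2973), (12994, 1, 76, 23, 2973),
                             (12996, 1, 82, 19, 2973), (13053, 1, 12, 10, 3092),
                             (13054, 1, 82, 19, 3092);
INSERT INTO package_deals VALUES (1, 40);
INSERT INTO deals_items VALUES (1, 12), (1, 76), (1, 82);
""")

# A deal applies when no item of the deal is absent from the sale's purchases.
APPLICABLE = """
SELECT pd.deal_id
FROM package_deals pd
WHERE NOT EXISTS (
    SELECT 1
    FROM deals_items di
    WHERE di.deal_id = pd.deal_id
      AND di.item_id NOT IN (SELECT item_id FROM purchases WHERE sale_id = ?)
)
ORDER BY pd.deal_id
"""

def applicable_deals(sale_id):
    """Return the deal_ids whose items are all present in the given sale."""
    return [row[0] for row in conn.execute(APPLICABLE, (sale_id,))]

print(applicable_deals(2973))
print(applicable_deals(3092))
```

One caveat: a deal with no rows in deals_items would count as applicable to every sale, so such deals may need to be excluded explicitly.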
Okay, so I have a soccer website I'm building. When a user signs up, they get a team and 6 different stadiums to choose from. So I have a teams table:
+---------+-----------+------------+
| user_id | team_name | stadium_id |
+---------+-----------+------------+
| 1 | barca | 2 |
+---------+-----------+------------+
Then I decided to make the stadiums its own table
+----+----------+----------+--------+
| id | name | capacity | price |
+----+----------+----------+--------+
| 1 | Maracana | 50000 | 90000 |
| 2 | Morombi | 80000 | 150000 |
+----+----------+----------+--------+
To get the team's arena name, I would have to get the arena_id from the teams table and then fetch the arena name with that id. I don't think this is very efficient, so I gave it some thought, and I think the best solution is adding a pivot table like so:
+----+----------+---------+
| id | arena_id | team_id |
+----+----------+---------+
| 1 | 2 | 1 |
| 2 | 1 | 2 |
+----+----------+---------+
I usually think of pivot tables as tables for many to many relationships not one to one relationships. Is using a pivot table in this instance the best solution or should I just leave the implementation I'm currently using?
You don't need to use a pivot-table for this. It can either be a One-To-One or a One-To-Many relationship. It's a One-To-One if every user/team does only relate to one stadium (no stadium can be used by two teams). In a One-To-Many relationship multiple teams/users could use the same stadium, which might become necessary if you have thousands of users and start running out of stadiums.
A JOIN statement would be efficient and sufficient here.
SELECT s.name, t.team_name -- Get the team's and stadium's name
FROM team t -- From the team table
JOIN stadium s -- Join it with the stadium table
ON (t.stadium_id = s.id) -- Join ON the stadium_id
This will return the team name and stadium name of every registered team.
You might need to adjust the query, but you should be able to get the gist of it after reading the MySQL reference I linked above.
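To see the join at work on the rows from the question, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The table names teams and stadiums are assumptions (the question only describes them loosely), and the variable name team_stadiums is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stadiums (id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER, price INTEGER);
CREATE TABLE teams (user_id INTEGER PRIMARY KEY, team_name TEXT,
                    stadium_id INTEGER REFERENCES stadiums (id));

INSERT INTO stadiums VALUES (1, 'Maracana', 50000, 90000), (2, 'Morombi', 80000, 150000);
INSERT INTO teams VALUES (1, 'barca', 2);
""")

# A single query resolves the stadium name; no pivot table and no second lookup.
team_stadiums = conn.execute("""
    SELECT t.team_name, s.name
    FROM teams t
    JOIN stadiums s ON t.stadium_id = s.id
    ORDER BY t.team_name
""").fetchall()

print(team_stadiums)
```

Because stadiums.id is the primary key, the lookup for each team is an index lookup, which is why the foreign key column alone is usually enough here.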
I apologise in advance if this seems simple. My assignment is due in 2 hours and I don't have time for further research, as I have another pending assignment to submit tonight. I only know the basic MySQL commands, not these types of queries. This is one of the final questions left unanswered and it is driving me nuts, even though I have already read the JOIN documentation. Please help.
Say I have 4 tables
_______________ _______________ _______________ _______________
| customers | | orders | | ordered_items | | products |
|_______________| |_______________| |_______________| |_______________|
|(pk)customer_id| | (pk)order_id | | (pk)id | |(pk)product_id |
| first_name | |(fk)customer_id| | (fk)order_id | | name |
| last_name | | date | |(fk)product_id | | description |
| contact_no | | | | quantity | | price |
|_______________| |_______________| |_______________| |_______________|
How would I be able to query all the products ordered by a given customer (e.g. customer_id = '5')?
I only know basic SQL, like straightforward queries on 1 table and joins across 2 different tables, but since these are 4 different tables with different relations to one another, how would I be able to get all the products ordered by a particular customer id?
It is something like: get all the products from ordered_items where order_id = (* orders by customer_id = 5).
But what would be an optimised, best-practice way of doing this type of query?
You only need to join 3 tables: orders, ordered_items, and products:
SELECT DISTINCT products.*
FROM products
JOIN ordered_items USING (product_id)
JOIN orders USING (order_id)
WHERE orders.customer_id = 5
As many have mentioned, you would do yourself a big favor by learning about table JOINS. There isn't much difference in the syntax between joining 2 tables to joining 4 or more.
SQLFiddle is a highly recommended resource for practicing and sharing your queries.
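As a runnable illustration of that three-table join, here is a sketch on Python's sqlite3 module rather than MySQL. The table and column names follow the diagram in the question, every row below is hypothetical, and the function name products_for_customer is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE ordered_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                            product_id INTEGER, quantity INTEGER);

-- Hypothetical rows: customer 5 placed orders 1 and 2, customer 6 placed order 3.
INSERT INTO orders VALUES (1, 5), (2, 5), (3, 6);
INSERT INTO products VALUES (10, 'pen', 2), (11, 'notebook', 5), (12, 'stapler', 9);
INSERT INTO ordered_items VALUES (1, 1, 10, 3), (2, 1, 11, 1), (3, 2, 10, 2), (4, 3, 12, 1);
""")

def products_for_customer(customer_id):
    """Return (product_id, name) for every distinct product the customer ordered."""
    return conn.execute("""
        SELECT DISTINCT p.product_id, p.name
        FROM products p
        JOIN ordered_items oi ON oi.product_id = p.product_id
        JOIN orders o ON o.order_id = oi.order_id
        WHERE o.customer_id = ?
        ORDER BY p.product_id
    """, (customer_id,)).fetchall()

print(products_for_customer(5))
```

The customers table is not joined because orders already carries customer_id. DISTINCT collapses the pen, which customer 5 ordered twice, into a single row.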
This is a comment because you appear to be new to SQL. You need to learn basic syntax for queries (which is why you are getting downvoted).
But you also ask about form. The data structure is actually pretty well laid out. I do have two comments. First, you should be consistent about how you name the id columns. For Ordered_Items, the id should be ordered_item_id.
Second, you should avoid using SQL special words for columns names and table names. Instead of date, use OrderDate.
Just after some opinions on the best way to achieve the following outcome:
I would like to store in my MySQL database products which can be voted on by users (each vote is worth +1). I also want to be able to see how many times in total a user has voted.
To my simple mind, the following table structure would be ideal:
table: product table: user table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product | | id | username | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1 | bananas | | 1 | matthew | | 1 | 1 | 2 |
| 2 | apples | | 2 | mark | | 2 | 2 | 2 |
| .. | .. | | .. | .. | | .. | .. | .. |
This way I can do a COUNT of the user_product_vote table for each product or user.
For example, when I want to look up bananas and the number of votes to show on a web page I could perform the following query:
SELECT p.product AS product, COUNT(v.id) AS votes
FROM product p
LEFT JOIN user_product_vote v ON p.id = v.product_id
WHERE p.id = 1
GROUP BY p.id, p.product
If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.
A more simple approach would be to have a 'votes' column in the product table that is incremented each time a vote is added.
table: product
+----+-------------+-------+
| id | product | votes |
+----+-------------+-------+
| 1 | bananas | 2 |
| 2 | apples | 5 |
| .. | .. | .. |
While this is more resource friendly - I lose data (eg. I can no longer prevent a person from voting twice as there is no record of their voting activity).
My questions are:
i) am I being overly worried about server resources and should just stick with the three table option? (ie. do I need to have more faith in the ability of the database to handle large queries)
ii) is there a more efficient way of achieving the outcome without losing information?
You can never be too worried about resources. When you first start building an application, you should always have resources, space, speed etc. in mind. If your site's traffic grows dramatically and you never built with resources in mind, you will start running into problems.
As for the vote system, personally I would keep the votes like so:
table: product table: user table: user_product_vote
+----+-------------+ +----+-------------+ +----+------------+---------+
| id | product | | id | username | | id | product_id | user_id |
+----+-------------+ +----+-------------+ +----+------------+---------+
| 1 | bananas | | 1 | matthew | | 1 | 1 | 2 |
| 2 | apples | | 2 | mark | | 2 | 2 | 2 |
| .. | .. | | .. | .. | | .. | .. | .. |
Reasons:
Firstly, user_product_vote does not contain text, blobs etc. It is purely integer, so it takes up fewer resources anyway.
Secondly, you have more of a doorway to new entities within your application, such as total votes in the last 24 hours, highest rated product over the past 24 hours, etc.
Take this example for instance:
table: user_product_vote
+----+------------+---------+-----------+------+
| id | product_id | user_id | vote_type | time |
+----+------------+---------+-----------+------+
| 1 | 1 | 2 | product |224.. |
| 2 | 2 | 2 | page |218.. |
| .. | .. | .. | .. | .. |
And a simple query:
SELECT COUNT(id) as total FROM user_product_vote WHERE vote_type = 'product' AND time BETWEEN(....) ORDER BY time DESC LIMIT 20
Another thing: if a user voted at 1AM and then tried to vote again at 2PM, you can easily check when they last voted and whether they should be allowed to vote again.
There are so many opportunities that you will be missing if you stick with your incremental example.
In regards to your count(), no matter how much you optimize your queries it would not really make a difference on a large scale.
With an extremely large user base, your resource usage will be looked at from a different perspective, such as load balancers, server settings, Apache, caching etc. There's only so much you can do with your queries.
If my site became hugely successful (we can all dream) and I had thousands of users voting on thousands of products, I fear that performing such a COUNT with every page view would be highly inefficient in terms of server resources.
Don't waste your time solving imaginary problems. MySQL is perfectly able to process thousands of records in fractions of a second; this is what databases are for. A clean and simple database and code structure is far more important than the mythical "optimization" that no one needs.
Why not mix and match both? Simply keep the final counts in the product and user tables, so that you don't have to count every time, and keep the votes table as well, so that there is no double voting.
Edit:
To explain it a bit further, the product and user tables will each have a column called "votes". Every time the insert into user_product_vote is successful, increment the relevant user and product records. This avoids duplicate votes, and you won't have to run the count query every time either.
Edit:
Also, I am assuming that you have created a unique index on product_id and user_id. In that case any duplication attempt will automatically fail and you won't have to check the table before inserting. You will just need to make sure the insert query ran and that you got a valid value for the "id" in the form of insert_id.
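The insert-then-increment idea could look roughly like the sketch below. It uses Python's sqlite3 module rather than MySQL, the votes columns and the function name cast_vote are assumptions introduced for illustration, and the duplicate check relies on the unique constraint failing instead of on inspecting insert_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, product TEXT,
                      votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT,
                   votes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user_product_vote (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    UNIQUE (product_id, user_id)  -- one vote per user per product
);
INSERT INTO product (id, product) VALUES (1, 'bananas'), (2, 'apples');
INSERT INTO user (id, username) VALUES (1, 'matthew'), (2, 'mark');
""")

def cast_vote(conn, product_id, user_id):
    """Record a vote and bump both counters; return False on a duplicate vote."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "INSERT INTO user_product_vote (product_id, user_id) VALUES (?, ?)",
                (product_id, user_id),
            )
            conn.execute("UPDATE product SET votes = votes + 1 WHERE id = ?", (product_id,))
            conn.execute("UPDATE user SET votes = votes + 1 WHERE id = ?", (user_id,))
        return True
    except sqlite3.IntegrityError:
        # The unique constraint rejected a second vote; the counters are untouched.
        return False

results = [cast_vote(conn, 1, 2), cast_vote(conn, 1, 2), cast_vote(conn, 2, 2)]
product_votes = conn.execute("SELECT product, votes FROM product ORDER BY id").fetchall()
user_votes = conn.execute("SELECT votes FROM user WHERE id = 2").fetchone()[0]
print(results, product_votes, user_votes)
```

Running the insert and both increments in one transaction keeps the cached counters consistent with the vote rows, at the cost of the row updates on product and user for every vote.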
You have to balance the desire for your site to perform quickly (in which the second schema would be best) and the ability to count votes for specific users and prevent double voting (for which I would choose the first schema). Because you are only using integer columns for the user_product_vote table, I don't see how performance could suffer too much. Many-to-many relationships are common, as you have implemented with user_product_vote. If you do want to count votes for specific users and prevent double voting, a user_product_vote is the only clean way I can think of implementing it, as any other could result in sparse records, duplicate records, and all kinds of bad things.
You don't want to update the product table directly with an aggregate every time someone votes - this will lock product rows which will then affect other queries which are using products.
Assuming that not all product queries need to include the votes column, you could keep a separate productvotes table which would retain the running totals, and keep your userproductvote table as a means to enforce your user voting per product business rules / and auditing.