Mysql - XML like table with multiple values in same column - mysql

My question title is confusing, sorry about that.
I have an application that saves data in the database in an XML-like format, referencing keys and values.
The problem is that a single column holds several values, each corresponding to a certain key.
I need certain keys to appear as columns, but I am failing miserably to achieve that.
Below is a sample of the table I have:
xml_type | xml_key | xml_content_key | xml_content_value
----------------------------------------------------------------------------------------------------
Archiv::144 | 144 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 151
Archiv::144 | 144 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 5714141614
Archiv::144 | 144 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 145
So, I can run this and have all the carriers:
select xml_content_key as Carrier, xml_content_value as 'Carrier Result'
from xml_storage xs where xml_content_key LIKE '%[1]{\'Version\'}[1]{\'Carrier\'}[1]{\'Content\'}%'
But how would I get other keys from the xml_content_key column to be shown as columns?
I have tried nested selects but got "Returns more than one value"; a join would not apply since this is on a single table.
In short, I would like to run a query that gathers a few keys from the xml_content_key column and shows each one in a new column.
Thank you.

Without knowing the schema of either your table or your XML document I'll have to make some assumptions. But I think this isn't too hard. First I'll write out the assumptions I'm making. Please correct me if these assumptions are wrong.
It seems like you have a table in which xml_content_key is what should be the column name, and xml_key is what should be the row identifier. You only showed a very limited sample in your question, but my assumption would suggest that more data might look like this.
xml_type | xml_key | xml_content_key | xml_content_value
----------------------------------------------------------------------------------------------------
Archiv::144 | 144 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 151
Archiv::144 | 144 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 5714141614
Archiv::144 | 144 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 145
Archiv::144 | 145 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 123
Archiv::144 | 145 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 4567891234
Archiv::144 | 145 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 567
Archiv::144 | 146 | [1]{'Version'}[1]{'Carrier'}[1]{'Content'} | 891
Archiv::144 | 146 | [1]{'Version'}[1]{'CarrierID'}[1]{'Content'} | 2345678912
Archiv::144 | 146 | [1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'} | 345
And I think you're trying to write a query to reorganize it like this.
+---------+---------+------------+-------------------+
| xml_key | Carrier | CarrierID | CustomerInterface |
+---------+---------+------------+-------------------+
| 144 | 151 | 5714141614 | 145 |
| 145 | 123 | 4567891234 | 567 |
| 146 | 891 | 2345678912 | 345 |
+---------+---------+------------+-------------------+
If I'm wrong about this part then there's no point in reading on. But if I'm right so far, then I'd like to highlight a quote from your question.
a join would not apply since this is on a single table.
You have been missing out on a great feature of SQL: self joins are extremely useful in cases like this.
It appears that there are three "content keys" (or columns) for each xml_key (or row). We will join together all the xml_content_key rows that share the same xml_key, so that each result row describes a single xml_key. By the way, I'm assuming your table is named xml_storage.
SELECT xs1.xml_key AS 'xml_key',
       xs1.xml_content_value AS 'Carrier',
       xs2.xml_content_value AS 'CarrierID',
       xs3.xml_content_value AS 'CustomerInterface'
FROM xml_storage xs1
INNER JOIN xml_storage xs2 ON xs2.xml_key = xs1.xml_key
INNER JOIN xml_storage xs3 ON xs3.xml_key = xs1.xml_key
WHERE xs1.xml_content_key LIKE "%[1]{'Version'}[1]{'Carrier'}[1]{'Content'}%"
  AND xs2.xml_content_key LIKE "%[1]{'Version'}[1]{'CarrierID'}[1]{'Content'}%"
  AND xs3.xml_content_key LIKE "%[1]{'Version'}[1]{'CustomerInterface'}[1]{'Content'}%"
The basic idea here is that we separate the table into three tables and then put them back together. We put the Carriers in xs1, the CarrierIDs in xs2, and the CustomerInterfaces in xs3. Then we join these back together, putting all of the content associated with a particular xml_key on the same row.
You will probably need to alter this to fit your actual schema. In particular, this query assumes that you have exactly one Carrier, CarrierID, and CustomerInterface per unique xml_key. I am confident that this general approach will work if your data is anything like I've been assuming, but imperfect data would necessitate a more robust query than the example I've given here.
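To make the self join easy to try, here is a minimal sketch that runs it against the assumed sample rows. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, and it matches the content keys with = instead of LIKE to keep the example short; the helper content_key is a hypothetical name introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table name xml_storage and the
# sample rows follow the assumptions stated in this answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xml_storage ("
    " xml_type TEXT, xml_key INTEGER,"
    " xml_content_key TEXT, xml_content_value TEXT)"
)

def content_key(name):
    # Hypothetical helper: builds e.g. [1]{'Version'}[1]{'Carrier'}[1]{'Content'}
    return "[1]{'Version'}[1]{'%s'}[1]{'Content'}" % name

names = ("Carrier", "CarrierID", "CustomerInterface")
sample = {
    144: ("151", "5714141614", "145"),
    145: ("123", "4567891234", "567"),
    146: ("891", "2345678912", "345"),
}
for xml_key, values in sample.items():
    for name, value in zip(names, values):
        conn.execute(
            "INSERT INTO xml_storage VALUES (?, ?, ?, ?)",
            ("Archiv::144", xml_key, content_key(name), value),
        )

# One alias of the same table per output column, all joined on xml_key.
query = """
SELECT xs1.xml_key,
       xs1.xml_content_value AS Carrier,
       xs2.xml_content_value AS CarrierID,
       xs3.xml_content_value AS CustomerInterface
FROM xml_storage xs1
INNER JOIN xml_storage xs2 ON xs2.xml_key = xs1.xml_key
INNER JOIN xml_storage xs3 ON xs3.xml_key = xs1.xml_key
WHERE xs1.xml_content_key = ?
  AND xs2.xml_content_key = ?
  AND xs3.xml_content_key = ?
ORDER BY xs1.xml_key
"""
rows = conn.execute(query, [content_key(n) for n in names]).fetchall()
for row in rows:
    print(row)
```

Each printed tuple is one xml_key with its three values side by side. If a key is missing for some xml_key, the inner joins drop that row entirely, which is the "imperfect data" case mentioned above.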
If you can share more details about your particular schema, I would be happy to edit my suggested query to fit your situation.

Related

MySQL: How to find a value that may exist across multiple columns

I currently have a table that stores rankings of music sales in a shop:
|-date-------|-rank_1---|-rank_2---|-...
| 2015-06-30 | 112 | 145 | ...
| 2015-07-31 | 145 | 147 | ...
| ...
| ...
Each number in the rank_# column is a foreign key that references an album in a separate table:
|-album_id---|-album_name----|-...
| 112 | An Album | ...
| 145 | Another Album | ...
| ...
I want to implement a feature where I can search for an album and see its ranking across the dates. However, the album_id can show up in any of the rank_# columns and I'd like to know if there was any way that I could "invert" the tables so I get a result like:
SELECT * FROM table WHERE ....
=> |-date-------|-column-----|
| 2015-06-30 | rank_2 |
| 2015-07-31 | rank_1 |
| ...
Now, the brute-force method I can think of is just to loop through the table and look at each cell in the table, but seeing as how the table is quite large, I was wondering if there was a more efficient method of doing this.
Thanks for the help everyone! It seems like I was structuring the tables incorrectly and instead should have set it up like:
|-date-------|-album_id-|-rank--|
| 2015-06-30 | 112 | 1 |
| 2015-07-31 | 145 | 2 |
| ...
| ...
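A minimal sketch of that restructured layout, assuming a table named rankings with one row per date and album. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the rows are derived from the two sample dates shown earlier in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "date" and "rank" are quoted because they can clash with SQL keywords.
conn.execute(
    'CREATE TABLE rankings ("date" TEXT, album_id INTEGER, "rank" INTEGER)'
)
conn.executemany(
    "INSERT INTO rankings VALUES (?, ?, ?)",
    [
        ("2015-06-30", 112, 1),
        ("2015-06-30", 145, 2),
        ("2015-07-31", 145, 1),
        ("2015-07-31", 147, 2),
    ],
)

# The ranking history of one album is now a plain filter on album_id.
history = conn.execute(
    'SELECT "date", "rank" FROM rankings WHERE album_id = ? ORDER BY "date"',
    (145,),
).fetchall()
print(history)
```

With this shape an index on album_id would let the database find an album's history without scanning every rank_# column.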
Wolfgang, here's my approach:
Albums Table:
album_id;album_name
1;ska
2;psychobilly
3;punk
4;nu-metal
Rankings Table:
id;date;rank_1;rank_2
1;2016-08-01;1;2
2;2016-08-02;2;1
4;2016-08-03;2;4
Suggested Query:
select r.date, a.album_id, a.album_name, r.rank_1, r.rank_2,
       case
         when a.album_id = r.rank_1 then "rank_1"
         when a.album_id = r.rank_2 then "rank_2"
       end as "rank"
from albums a
inner join rankings r on (r.rank_1 = a.album_id or r.rank_2 = a.album_id)
where a.album_name like '%psychobilly%'
Results:
date;album_id;album_name;rank_1;rank_2;rank
2016-08-01;2;psychobilly;1;2;rank_2
2016-08-02;2;psychobilly;2;1;rank_1
2016-08-03;2;psychobilly;2;4;rank_1
Explanation:
The last column of the result shows which ranking column ("rank_1" or "rank_2") holds the album's id on that date, which gives the album's position in the ranking.
Feel free to try, this may help you...

Table design for a dictionary that can have words with many different spellings

I'm working on a small, personal dictionary database in Microsoft Access (the 2013 version). There are a lot of words in English that have two or even more spellings. Realistically speaking though, there are not that many words with three, let alone four, spellings. Nevertheless, they do exist. Examples include aerie/aery/eyrie/eyry (a word with four spellings) and ketchup/catsup/catchup (a word with three spellings). Not to mention that English is rife with words that have two spellings. Everybody knows that (the differences between the American and British spelling systems come immediately to mind). So, I need to design my tables in such a way that there are no significant flaws with the design. I'm going to explain step by step what the database should look like and introduce the problems I have found with my current design along the way. So, here we go.
All words, obviously, should be stored in the same table. And I'm not going to include irrelevant aspects of the design such as other columns that might be part of the table (in reality, the database is much more complex). Let's focus on the most important parts. Here's what the Words table with some pre-filled sample data will look like:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 1 | ketchup |
| 2 | catsup |
| 3 | catchup |
| 4 | moneyed |
| 5 | monied |
| 6 | delicious |
+---------+-----------+
To keep track of a group of words that are the same, but just have different spellings, it is probably wise to choose one of them as the main word and the other ones as its child words. Here's the diagram to show you how I envision that (here, ketchup and moneyed are main words, all the others child words):
All this information will be placed in a new table which we shall call the Alternative Spellings table (The columns word_id and alt_spell_word_id are going to be part of the table's compound primary key):
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 1 | 2 |
| 1 | 3 |
| 4 | 5 |
+---------+-------------------+
Here's how all this looks in Access's Relationships panel (notice that I have enforced referential integrity between the word_id column of the Words table and the word_id column of the Alternative Spellings table and checked off the Cascade Delete Related Records option):
Although straight-forwardly simple, that's the only design I've been able to come up with so far. And I think that will basically do it. This is as simple as it gets. The problem with this design, however, is threefold:
1: This is not a serious problem, but I'd still like to hear your thoughts anyway. Every time I look up a word to see it in the Word Details form, I have to go through the entire Alternative Spellings table to see if it has other spellings associated with it or if it is a child word. So, I'd have to search both the word_id and alt_spell_word_id columns. And this process will be taking place for each and every word in the database every time I want to check its details. One possible solution is to create an additional Boolean column in the Words table that keeps track of whether a word has alternative spellings. This will indicate whether we should scan the Alternative Spellings table at all when opening the word in the Word Details form. Here's what this would look like:
+---------+-----------+------------------+
| word_id | word | has_alt_spelling |
+---------+-----------+------------------+
| 101 | ketchup | yes |
| 102 | catsup | no |
| 103 | catchup | no |
| 104 | moneyed | yes |
| 105 | monied | no |
| 106 | delicious | no |
+---------+-----------+------------------+
I think that's a good design, but, as I said, I'd very much like to hear what you've got to say about this: a problem/not a problem? Your solution?
2: The other problem, which is of more serious nature, has to do with primary keys. word_id and alt_spell_word_id should be part of a compound primary key, of course. We don't want duplicate rows in the table. We all understand that. Not a problem. But here's what happens when we try to enforce referential integrity between the Words table and Alternative Spellings table (see the screenshot above). Everything is fine except that now we can associate a word with the id of a nonexistent word and the database is not going to complain because, for example, the last record in word_id has 4 in it, which is true, we do have a record with the id of 4 in the Words table, but there is no way to impose any kind of constraint on the alt_spell_word_id column. We can put any kind of nonsense in there:
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 1 | 2 |
| 1 | 3 |
| 4 | 5 |
| 4 | 34564 |
+---------+-------------------+
I think that breaks the referential integrity of the database schema and thus is a serious problem. What kind of solution would you like to offer?
3: Another problem with this design is that if we want to delete a certain word from the Words table, the deletion will cascade through the Alternative Spellings table and delete all related records there, which is perfectly fine, but here's the catch: since we agreed that different words in the database can actually be just one word with different spellings, they all should be deleted along with the main word. But that's not going to happen as things stand at the moment. For instance, if I were to delete ketchup in the Words table, all related records in the Alternative Spellings table would be deleted. Fine. But we'd really get two dangling records, catchup and catsup—they can't exist on their own because they are part of the group where ketchup is the main word, but now it has been deleted:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 2 | catsup |
| 3 | catchup |
| 4 | moneyed |
| 5 | monied |
| 6 | delicious |
+---------+-----------+
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 4 | 5 |
+---------+-------------------+
Here's the actual database (simplified version) if you want to play with it.
Thank you all in advance.
1) For 1, if you add indexes to the database, it is probably not a big concern (since your look-ups of a word then joining to get the alternate words will be fast). However, if a child word can have only one parent, then you do not need an additional table:
The word table can just be:
+---------+-----------+------------------+
| word_id | word | parent_word_id |
+---------+-----------+------------------+
| 101 | ketchup | |
| 102 | catsup | 101 |
| 103 | catchup | 101 |
| 104 | moneyed | |
| 105 | monied | 104 |
| 106 | delicious | |
+---------+-----------+------------------+
A query for a word and its children would then be:
select wordGroup.word
from word w join word wordGroup on
(w.word_id = wordGroup.parent_word_id
or wordGroup.word_id = w.word_id)
where w.word = {your_word};
A query for a word and its associated words, regardless of whether the given word is the parent or a child, would be:
select wordGroup.word
from word w join word wordGroup on
(wordGroup.word_id = w.word_id
or wordGroup.parent_word_id = w.word_id
or wordGroup.word_id = w.parent_word_id
or wordGroup.parent_word_id = w.parent_word_id)
where w.word = {your_word};
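As a quick way to experiment with the single-table layout, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Access. The function name spellings is hypothetical, and the join condition is one possible way to reach the whole group from either the main word or a child word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word ("
    " word_id INTEGER PRIMARY KEY,"
    " word TEXT NOT NULL,"
    " parent_word_id INTEGER REFERENCES word(word_id))"
)
conn.executemany(
    "INSERT INTO word VALUES (?, ?, ?)",
    [
        (101, "ketchup", None),
        (102, "catsup", 101),
        (103, "catchup", 101),
        (104, "moneyed", None),
        (105, "monied", 104),
        (106, "delicious", None),
    ],
)

def spellings(word):
    # Hypothetical helper: returns every spelling in the same group as `word`.
    # A group member is the word itself, its children, its parent, or a
    # sibling sharing the same parent. Comparisons against a NULL parent
    # never match, so main words and standalone words are handled too.
    rows = conn.execute(
        """
        SELECT wordGroup.word
        FROM word w JOIN word wordGroup ON
             wordGroup.word_id = w.word_id
          OR wordGroup.parent_word_id = w.word_id
          OR wordGroup.word_id = w.parent_word_id
          OR wordGroup.parent_word_id = w.parent_word_id
        WHERE w.word = ?
        ORDER BY wordGroup.word_id
        """,
        (word,),
    )
    return [r[0] for r in rows]

print(spellings("ketchup"))
print(spellings("catsup"))
print(spellings("delicious"))
```

Looking up the main word or any child returns the same group, and a word with no alternative spellings returns only itself.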
2) The right way of doing this is to place a foreign key constraint (referential constraint) on the tables. In my example for 1, the parent_word_id would have a referential constraint back to word(word_id). For your example, alt_spell_word_id would have a referential constraint back to the word table and the word_id. You could then place a unique constraint on the combination of word_id and alt_spell_word_id. See (on Access constraints): https://msdn.microsoft.com/en-us/library/bb177889(v=office.12).aspx
3) I think deletion of a primary word has a meaning problem in your design. What does it mean to delete the primary word and keep the grouping? In theory, you would have to do a series of operations: 1-decide on a new primary word; 2-delete the old one. This would be true of almost any design including a primary word.
Another option is to not have a primary word but to have groups. This alters the db design from a one-to-many relationship between primary word and other words to a many-to-many between words. In this case, deletion is easy because you just cascade all associations to the word out of the word_groups table.
The resulting tables would be:
word:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 101 | ketchup |
| 102 | catsup |
| 103 | catchup |
| 104 | moneyed |
| 105 | monied |
| 106 | delicious |
+---------+-----------+
word_groups:
+---------+-----------------+
| word_id | sibling_word_id |
+---------+-----------------+
| 101 | 102 |
| 101 | 103 |
| 102 | 101 |
| 102 | 103 |
| 103 | 101 |
| 103 | 102 |
| 104 | 105 |
| 105 | 104 |
+---------+-----------------+
Foreign key constraints protect referential integrity while indexes will make look-ups fast.
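The group layout with its constraints can be sketched as follows. This uses Python's built-in sqlite3 rather than Access, so the DDL syntax is an assumption that would need adapting; the point is only to show both columns of word_groups referencing word(word_id) with cascading deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE word (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE word_groups ("
    " word_id INTEGER NOT NULL REFERENCES word(word_id) ON DELETE CASCADE,"
    " sibling_word_id INTEGER NOT NULL REFERENCES word(word_id) ON DELETE CASCADE,"
    " PRIMARY KEY (word_id, sibling_word_id))"
)
conn.executemany(
    "INSERT INTO word VALUES (?, ?)",
    [(101, "ketchup"), (102, "catsup"), (103, "catchup"),
     (104, "moneyed"), (105, "monied"), (106, "delicious")],
)
conn.executemany(
    "INSERT INTO word_groups VALUES (?, ?)",
    [(101, 102), (101, 103), (102, 101), (102, 103),
     (103, 101), (103, 102), (104, 105), (105, 104)],
)

# A sibling id that does not exist in word is rejected by the foreign key.
try:
    conn.execute("INSERT INTO word_groups VALUES (104, 34564)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting ketchup removes every pair that mentions it, in either column,
# while catsup and catchup stay linked to each other.
conn.execute("DELETE FROM word WHERE word_id = 101")
remaining = conn.execute(
    "SELECT word_id, sibling_word_id FROM word_groups"
    " ORDER BY word_id, sibling_word_id"
).fetchall()
print(rejected, remaining)
```

The composite primary key doubles as the index for look-ups by word_id and prevents duplicate pairs.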
I think that I would use a model in which another table defines word_spelling_groups, so that every word that can mean the same as "ketchup" has an entry in this table with the same word_spelling_group value as "ketchup".
An advantage of this would be that a word can be a member of multiple spelling groups, in case it had alternative spellings only in the context of a particular meaning (I struggle for an example).
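One possible reading of this model, sketched with Python's built-in sqlite3 as a stand-in for Access. The column layout of word_spelling_groups and the group numbers 1 and 2 are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
# Assumed layout: one row per (word, group) membership, so a word may
# belong to more than one spelling group.
conn.execute(
    "CREATE TABLE word_spelling_groups ("
    " word_id INTEGER NOT NULL REFERENCES word(word_id),"
    " word_spelling_group INTEGER NOT NULL,"
    " PRIMARY KEY (word_id, word_spelling_group))"
)
conn.executemany(
    "INSERT INTO word VALUES (?, ?)",
    [(101, "ketchup"), (102, "catsup"), (103, "catchup"),
     (104, "moneyed"), (105, "monied"), (106, "delicious")],
)
conn.executemany(
    "INSERT INTO word_spelling_groups VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 1), (104, 2), (105, 2)],
)

# Other spellings of a word: everything sharing a group with it.
others = [
    r[0]
    for r in conn.execute(
        """
        SELECT DISTINCT w2.word
        FROM word w1
        JOIN word_spelling_groups g1 ON g1.word_id = w1.word_id
        JOIN word_spelling_groups g2
          ON g2.word_spelling_group = g1.word_spelling_group
        JOIN word w2 ON w2.word_id = g2.word_id
        WHERE w1.word = ? AND w2.word_id <> w1.word_id
        ORDER BY w2.word_id
        """,
        ("catsup",),
    )
]
print(others)
```

No word is singled out as the main one here, so deleting any word only removes its own membership rows.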

Sum query for MySQL where field contain certain values

I need help with a query. I have a table like this:
| ID | codehwos |
| --- | ----------- |
| 1 | 16,17,15,26 |
| 2 | 15,32,12,23 |
| 3 | 53,15,21,26 |
I need an output like this:
| codehwos | number_of_this_code |
| -------- | ---------------------- |
| 15 | 3 |
| 17 | 1 |
| 26 | 2 |
I want to count how many times each code is used across the rows.
Can anyone write a query that does this for all the codes at once?
Thanks
You have a very poor data format. You should not store lists in strings and never store lists of numbers in strings. SQL has a great data structure for storing lists. Hint: it is called a "table" not a "string".
That said, sometimes one is stuck with other people's really poor design choices. We wouldn't make them ourselves, but we still need to get something done. Assuming you have a list of codes, you can do what you want with:
select c.code, count(*)
from codes c join
table t
on find_in_set(c.code, t.codehwos) > 0
group by c.code;
If you have any influence over the data structure, then advocate for a junction table, the right way to store this data in a relational database.
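For comparison, here is a minimal sketch of the junction-table layout, using Python's built-in sqlite3 as a stand-in for MySQL. The table name row_codes is hypothetical; the comma-separated strings are split once on load, and after that counting is a plain GROUP BY.

```python
import sqlite3

# The three rows from the question, still in their comma-separated form.
source_rows = [(1, "16,17,15,26"), (2, "15,32,12,23"), (3, "53,15,21,26")]

conn = sqlite3.connect(":memory:")
# Hypothetical junction table: one row per (id, code) pair.
conn.execute(
    "CREATE TABLE row_codes ("
    " id INTEGER NOT NULL, code INTEGER NOT NULL,"
    " PRIMARY KEY (id, code))"
)
for row_id, codehwos in source_rows:
    conn.executemany(
        "INSERT INTO row_codes VALUES (?, ?)",
        [(row_id, int(code)) for code in codehwos.split(",")],
    )

# Number of rows in which each code appears.
counts = dict(
    conn.execute(
        "SELECT code, COUNT(*) FROM row_codes GROUP BY code ORDER BY code"
    ).fetchall()
)
print(counts)
```

In this layout no separate list of codes is needed, because the codes are already individual rows.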

Join through bridging table with one to many relationship

I have a Products table, which maps to a bridging table that contains all of the items a product is made up of, so Product A could comprise several items or just one (one-to-many).
The product_sub_item_bridge looks like this,
+------------+----------------------------+
| product_id | client_sub_product_item_id |
+------------+----------------------------+
| 137 | 332 |
| 138 | 333 |
| 139 | 334 |
| 140 | 332 |
| 140 | 335 |
+------------+----------------------------+
So say a client orders product 140, items 332 and 335 will be inserted into a table called client_sub_products which houses the relationship to the order and the items themselves that are stored in the client_sub_product_items table.
What I would like to do now is get all of the client_sub_products, group them by the client_order_id and maybe GROUP_CONCAT() the ids, and somehow join the Products table onto it via the bridging table, so that I can get a list containing the COUNT() for all of the Products that are comprised of those exact client_sub_product_items. Like so...
+--------------+---------------------+
| product_name | count(product_name) |
+--------------+---------------------+
| Product A | 15 |
| Product B | 25 |
+--------------+---------------------+
Here is what I have thus far,
SELECT GROUP_CONCAT(`client_sub_products`.`client_sub_product_item_id`)
FROM `client_sub_products`
LEFT JOIN `client_sub_product_items`
  ON `client_sub_product_items`.`id` = `client_sub_products`.`client_sub_product_item_id`
GROUP BY `client_sub_products`.`client_order_id`
ORDER BY `client_sub_products`.`client_order_id` ASC;
I can't seem to get past the bridging table, I am not sure how I can join the client_sub_product_items onto the Products through the bridging table, because there are products that have more than one client_sub_product_item related to it, I seem to be confusing myself there.
I hope I have explained myself adequately, and not just confused everyone... please let me know if I should try clarify anything mentioned above.
Why associate the subproducts to the client if the subproducts are not orderable entities? Why not just assign the products to the client, as it would be easy to derive the subproducts for each client from that? As it stands right now you have no definitive way to determine whether subproduct 332 from your example is related to product 137 or 140.
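A rough sketch of that alternative, using Python's built-in sqlite3 as a stand-in for MySQL. The table client_order_products, the sample orders, and the mapping of product names to ids are all hypothetical; only the bridge rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT)")
conn.execute(
    "CREATE TABLE product_sub_item_bridge ("
    " product_id INTEGER, client_sub_product_item_id INTEGER)"
)
# Hypothetical table: the ordered product is stored directly per order.
conn.execute(
    "CREATE TABLE client_order_products ("
    " client_order_id INTEGER, product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(137, "Product A"), (140, "Product B")],  # names assigned for illustration
)
conn.executemany(
    "INSERT INTO product_sub_item_bridge VALUES (?, ?)",
    [(137, 332), (138, 333), (139, 334), (140, 332), (140, 335)],
)
conn.executemany(
    "INSERT INTO client_order_products VALUES (?, ?)",
    [(1, 140), (2, 140), (3, 137)],  # made-up orders
)

# Orders per product: no need to reverse-engineer products from item sets.
product_counts = conn.execute(
    """
    SELECT p.product_name, COUNT(*)
    FROM client_order_products op
    JOIN products p ON p.id = op.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()

# The items of an order can still be derived through the bridge table.
order_items = [
    r[0]
    for r in conn.execute(
        """
        SELECT b.client_sub_product_item_id
        FROM client_order_products op
        JOIN product_sub_item_bridge b ON b.product_id = op.product_id
        WHERE op.client_order_id = ?
        ORDER BY b.client_sub_product_item_id
        """,
        (1,),
    )
]
print(product_counts, order_items)
```

Because the product id is stored with the order, item 332 is no longer ambiguous between products 137 and 140.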

Is it worth normalizing?

I am studying databases and I have encountered this question. Suppose I have, for example, the table product_supply, which contains Invoice_Id (pk), Product_Id (pk), Date_Of_Supply, Quantity and Value_Of_Product.
| Invoice_ID | Product_ID | Date_Of_Supply | Quantity | Value_Of_Product |
-------------------------------------------------------------------------
| AA111111111| 5001 | 08-07-2013 | 50 | 200$ |
| AA111111111| 5002 | 08-07-2013 | 20 | 300$ |
| BB222222222| 5003 | 10-09-2013 | 70 | 50$ |
| CC333333333| 5004 | 15-10-2013 | 100 | 40$ |
| CC333333333| 5005 | 15-10-2013 | 70 | 25$ |
| CC333333333| 5006 | 15-10-2013 | 100 | 30$ |
As we can see, the table is already in 1NF. My question is: in terms of normalization, is it wise to normalize this table to 2NF and have another table, for example supply_date, with Invoice_ID (pk) and Date_Of_Supply, or is keeping the table above OK?
| Invoice_ID | Date_Of_Supply |
-------------------------------
|AA111111111 | 08-07-2013 |
|BB222222222 | 10-09-2013 |
|CC333333333 | 15-10-2013 |
It's definitely worth normalizing. If you need to modify a supply date, with 1NF, you need to update several records; with 2NF, you only need to update one record. Also, note the redundancy of data in 1NF, where the supply date is stored multiple times for each invoice id. Not only does it waste space, it makes it harder to process a query like "list all invoices that were supplied between dates X and Y".
EDIT
As Robert Harvey points out in his comments (which it took me a while to understand because I was being thick for some reason), if you already have a table that has a single row for each Invoice_ID (say, an "invoice table"), then you should probably add a column for Date_Of_Supply to that table rather than create a new table.
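The update argument can be sketched with Python's built-in sqlite3. The table name invoice and the replacement date 09-07-2013 are assumptions for the illustration, and Value_Of_Product is stored as a plain number without the $ sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The date depends on Invoice_ID alone, so it lives in a per-invoice table.
conn.execute(
    "CREATE TABLE invoice (Invoice_ID TEXT PRIMARY KEY, Date_Of_Supply TEXT)"
)
conn.execute(
    "CREATE TABLE product_supply ("
    " Invoice_ID TEXT REFERENCES invoice(Invoice_ID),"
    " Product_ID INTEGER,"
    " Quantity INTEGER,"
    " Value_Of_Product INTEGER,"
    " PRIMARY KEY (Invoice_ID, Product_ID))"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [("AA111111111", "08-07-2013"),
     ("BB222222222", "10-09-2013"),
     ("CC333333333", "15-10-2013")],
)
conn.executemany(
    "INSERT INTO product_supply VALUES (?, ?, ?, ?)",
    [("AA111111111", 5001, 50, 200), ("AA111111111", 5002, 20, 300),
     ("BB222222222", 5003, 70, 50), ("CC333333333", 5004, 100, 40),
     ("CC333333333", 5005, 70, 25), ("CC333333333", 5006, 100, 30)],
)

# Changing a supply date touches exactly one row ...
updated = conn.execute(
    "UPDATE invoice SET Date_Of_Supply = ? WHERE Invoice_ID = ?",
    ("09-07-2013", "AA111111111"),
).rowcount

# ... and every line of that invoice sees the new date through the join.
lines = conn.execute(
    """
    SELECT ps.Product_ID, i.Date_Of_Supply
    FROM product_supply ps
    JOIN invoice i ON i.Invoice_ID = ps.Invoice_ID
    WHERE ps.Invoice_ID = ?
    ORDER BY ps.Product_ID
    """,
    ("AA111111111",),
).fetchall()
print(updated, lines)
```

In the single-table version the same change would need one update per product line of the invoice.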
Changing the table to second normal form involves removing redundancies in the first normal form table. The first question is to determine whether there are even any redundancies.
If a redundancy exists, then we should be able to create a second table which does NOT involve the primary key (Invoice_ID) of the first one. Based on the non PK columns in the first table (namely Product_ID, Date_Of_Supply, Quantity, and Value_Of_Product), it is not clear that any of these are dependent on each other.
As a general rule of thumb, if you have a table where all non PK columns are dependent solely on the PK column of that table, it is already in 2NF.