How to split CSVs from one column to rows in a new table in MSSQL 2008 R2 - sql-server-2008

Imagine the following (very bad) table design in MSSQL2008R2:
Table "Posts":
| Id (PK, int) | DatasourceId (PK, int) | QuotedPostIds (nvarchar(255)) | [...]
| 1 | 1 | | [...]
| 2 | 1 | 1 | [...]
| 2 | 2 | 1 | [...]
[...]
| 102322 | 2 | 123;45345;4356;76757 | [...]
So, the column QuotedPostIds contains a semicolon-separated list of self-referencing PostIds (Kids, don't do that at home!). Since this design is ugly as hell, I'd like to extract the values from the QuotedPostIds column to a new n:m relationship table like this:
Desired new table "QuotedPosts":
| QuotingPostId (int) | QuotedPostId (int) | DatasourceId (int) |
| 2 | 1 | 1 |
| 2 | 1 | 2 |
[...]
| 102322 | 123 | 2 |
| 102322 | 45345 | 2 |
| 102322 | 4356 | 2 |
| 102322 | 76757 | 2 |
The primary key for this table could either be a combination of QuotingPostId, QuotedPostId and DatasourceID or an additional artificial key generated by the database.
It is worth noticing that the current Posts table contains about 6,300,000 rows but only about 285,000 of those have a value set in the QuotedPostIds column. Therefore, it might be a good idea to pre-filter those rows. In any case, I'd like to perform the normalization using internal MSSQL functionality only, if possible.
I already read other posts on this topic, which mostly dealt with split functions, but I could find out neither how exactly to create the new table while also copying the appropriate value from the DatasourceId column, nor how to filter the rows to touch accordingly.
Thank you!
Edit: I thought it through and finally solved the problem using an external C# program instead of internal MSSQL functionality. Since it seems that it could have been done using Mikael Eriksson's suggestion, I will mark his post as an answer.

From comments you say you have a string split function that you don't know how to use with a table.
The answer is to use cross apply, something like this:
select P.Id,
S.Value
from Posts as P
cross apply dbo.Split(';', P.QuotedPostIds) as S
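For comparison, the external-program route that the asker ended up taking can be sketched in Python, with an in-memory SQLite database standing in for MSSQL. Table and column names follow the question; the `split_ids` helper and the handful of sample rows are assumptions made for illustration. Rows with an empty QuotedPostIds are filtered out in the query before anything is split, and DatasourceId is carried over to the new table.

```python
import sqlite3


def split_ids(raw):
    """Split a semicolon-separated id list, skipping empty fragments (hypothetical helper)."""
    return [int(part) for part in raw.split(";") if part.strip()]


conn = sqlite3.connect(":memory:")  # SQLite stands in for MSSQL in this sketch
conn.executescript("""
    CREATE TABLE Posts (
        Id INTEGER NOT NULL,
        DatasourceId INTEGER NOT NULL,
        QuotedPostIds TEXT,
        PRIMARY KEY (Id, DatasourceId)
    );
    CREATE TABLE QuotedPosts (
        QuotingPostId INTEGER NOT NULL,
        QuotedPostId INTEGER NOT NULL,
        DatasourceId INTEGER NOT NULL,
        PRIMARY KEY (QuotingPostId, QuotedPostId, DatasourceId)
    );
""")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?, ?)",
    [(1, 1, ""), (2, 1, "1"), (2, 2, "1"), (102322, 2, "123;45345;4356;76757")],
)

# Pre-filter: only rows that actually carry a list are read and split.
rows = conn.execute(
    "SELECT Id, DatasourceId, QuotedPostIds FROM Posts "
    "WHERE QuotedPostIds IS NOT NULL AND QuotedPostIds <> ''"
).fetchall()

# One output row per (quoting post, quoted post), keeping the datasource.
pairs = [
    (post_id, quoted_id, datasource_id)
    for post_id, datasource_id, raw in rows
    for quoted_id in split_ids(raw)
]
conn.executemany("INSERT INTO QuotedPosts VALUES (?, ?, ?)", pairs)

quoted_posts = conn.execute(
    "SELECT QuotingPostId, QuotedPostId, DatasourceId FROM QuotedPosts "
    "ORDER BY QuotingPostId, DatasourceId, QuotedPostId"
).fetchall()
print(quoted_posts)
```

The composite primary key on QuotedPosts is one of the two key options the question mentions; an artificial key would work just as well for this sketch.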

Related

MS Access help needed forming a specific report

I have a table with a column for agent names and a column for each of the skills those agents could possibly have. Each skill the agent is assigned shows a 1 in the field under that skill.
Columns look like this:
+---------+----------+----------+----------+
| Name | 'Skill1' | 'Skill2' | 'Skill3' |
+---------+----------+----------+----------+
| John | 1 | | 1 |
| Sam | 1 | 1 | |
| Roberta | 1 | | 1 |
+---------+----------+----------+----------+
I would like to make a query that returns a list of all agent names that have a 1 for each particular skill. The query would return something like this:
+-----------+
| Skill 1 |
+-----------+
| John |
| Sam |
| Roberta |
+-----------+
Additionally I would like to be able to query a single name and retrieve all skills that agent has (all rows the Name column has a 1 in) like this:
+-----------+
| John |
+-----------+
| Skill 1 |
| Skill 3 |
+-----------+
I've done this in Excel using an index but I'm new to Access and not sure how to complete this task.
Thanks in advance.
One of the reasons that you are finding this task difficult is because your database is not normalised and so due to the way that your database is structured, you are working against MS Access, not with it.
Consequently, whilst a solution is still possible with the current data, the resulting queries will be painful to construct and will either be full of multiple messy iif statements, or several union queries performing the same operations over & over again, one for each 'skill'.
Then, if you ever wish to add another Skill to the database, all of your queries have to be rewritten!
Whereas, if your database was normalised (as Gustav has suggested in the comments), the task would be a simple one-liner; and what's more, if you add a new skill later on, your queries will automatically output the results as if the skill had always been there.
Your data has a many-to-many relationship: an agent may have many skills, and a skill may be known by many agents.
As such, the most appropriate way to represent this relationship is using a junction table.
Hence, you would have a table of Agents such as:
tblAgents
+-----+-----------+----------+------------+
| ID | FirstName | LastName | DOB |
+-----+-----------+----------+------------+
| 1 | John | Smith | 1970-01-01 |
| ... | ... | ... | ... |
+-----+-----------+----------+------------+
This would only contain information unique to each agent, i.e. minimising the repeated information between records in the table.
You would then have a table of possible Skills, such as:
tblSkills
+-----+---------+---------------------+
| ID | Name | Description |
+-----+---------+---------------------+
| 1 | Skill 1 | Skill 1 Description |
| 2 | Skill 2 | Skill 2 Description |
| ... | ... | ... |
+-----+---------+---------------------+
Finally, you would have a junction table linking Agents to Skills, e.g.:
tblAgentSkills
+----+----------+----------+
| ID | Agent_ID | Skill_ID |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+----------+----------+
Now, say you want to find out which agents have Skill 1, the query is simple:
select Agent_ID from tblAgentSkills where Skill_ID = 1
What if you want to find out the skills known by an agent? Equally as simple:
select Skill_ID from tblAgentSkills where Agent_ID = 1
Of course, these queries will merely return the ID fields as present in the junction table - but since the ID uniquely identifies a record in the tblAgents or tblSkills tables, such ID is all you need to retrieve any other required information:
select
tblAgents.FirstName,
tblAgents.LastName
from
tblAgentSkills inner join tblAgents on
tblAgentSkills.Agent_ID = tblAgents.ID
where
tblAgentSkills.Skill_ID = 1
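As a rough end-to-end check of the junction-table design, the sketch below builds the three tables in an in-memory SQLite database (standing in for Access) and runs both lookups with joins so that names come back instead of IDs. The junction rows are the ones shown above; the last names for agents 2 and 3 are placeholders invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Access in this sketch
conn.executescript("""
    CREATE TABLE tblAgents (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tblSkills (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblAgentSkills (
        ID INTEGER PRIMARY KEY,
        Agent_ID INTEGER NOT NULL REFERENCES tblAgents(ID),
        Skill_ID INTEGER NOT NULL REFERENCES tblSkills(ID)
    );
    -- Last names other than Smith are placeholders invented for this sketch.
    INSERT INTO tblAgents VALUES (1, 'John', 'Smith'), (2, 'Sam', 'Doe'), (3, 'Roberta', 'Roe');
    INSERT INTO tblSkills VALUES (1, 'Skill 1'), (2, 'Skill 2');
    INSERT INTO tblAgentSkills VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")

# Which agents have Skill 1? Join the junction table back to tblAgents.
agents_with_skill = [
    row[0]
    for row in conn.execute(
        "SELECT a.FirstName FROM tblAgentSkills AS s "
        "INNER JOIN tblAgents AS a ON s.Agent_ID = a.ID "
        "WHERE s.Skill_ID = ? ORDER BY a.ID",
        (1,),
    )
]

# Which skills does agent 1 have? Same junction table, joined to tblSkills.
skills_of_agent = [
    row[0]
    for row in conn.execute(
        "SELECT k.Name FROM tblAgentSkills AS s "
        "INNER JOIN tblSkills AS k ON s.Skill_ID = k.ID "
        "WHERE s.Agent_ID = ? ORDER BY k.ID",
        (1,),
    )
]

print(agents_with_skill)
print(skills_of_agent)
```

Adding a new skill is then a matter of inserting rows into tblSkills and tblAgentSkills; neither query text changes.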
To get all agents with skill1, open the query designer and create the following query:
This will generate the following SQL:
SELECT Skills.AgentName
FROM Skills
WHERE (((Skills.Skill1)=1));
If you adjust the names you can also paste this query into the sql pane of the designer to get the query you want.
To get all the skills an agent has I chose a parameterized query. Open the query designer and create a new query:
When you run this query it will ask you for the name of the agent. Make sure to type the agent name exactly. Here is the resulting sql:
SELECT Skills.AgentName, Skills.Skill1, Skills.Skill2, Skills.Skill3
FROM Skills
WHERE (((Skills.AgentName)=[Agent]));
If you continue working with this query, I would improve the table design by breaking your table into a skills table, an agents table, and a skills&agents table. Then link the skills and agents tables to the skills&agents table in a many-to-many relationship. The query to get all of an agent's skills would then look like this in the designer:

MySQL - At what point should more than one table be used?

Edit for future viewers: Aside from the accepted answer which helped me I found some really good info here .
I've got a database with a single table for displaying inventory on a website (RVs). It stores the typical info: year, make, model, etc. I originally made it with 6 extra columns for storing "special features", but I don't like having such a hard limit on what options can be listed. Since I've never messed with more than a single table my gut instinct was to just add 24 or so more columns to cover everything, but something in my head told me that there might be a better way. So when do I decide N columns is too many? The data in these columns will commonly not be unique.
(Sorry for crappy diagram)
Current table design:
-----------------------------------------------------------------------
| id | year | make | model | price | ft_1 | ft_2 | ft_3 | ft_4 | ft_5 |
-----------------------------------------------------------------------
| | | | | | | | | | |
-----------------------------------------------------------------------
Possible better design:
table #1
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
table #2
---------------------------------------------
| unique_id(?) | feature | unit_ref |
---------------------------------------------
| 0 | "Diesel Pusher" | 2,6,14 |
---------------------------------------------
I feel like a bonus of the second table might be that I could more easily populate a dropdown containing all the previously entered features to speed up adding new units to inventory.
Is this the right way to go about it, or should I just add more columns and be content?
Thanks.
Believe it or not, your best option would likely be to add a third table.
Since each record in your rvs table can be linked to multiple rows in the features table, and each feature can correspond to multiple rvs, you have a many-to-many relationship which is inherently difficult to maintain in a relational dbms. By adding a third "intersection" table you convert it to a one-to-many-to-one relationship which can be enforced declaratively by the dbms.
Your table structure would then become something like
rvs
------------------------------------
| id | year | make | model | price |
------------------------------------
| | | | | |
------------------------------------
features
--------------------------
| id | feature |
--------------------------
| 1192 | "Diesel Pusher" |
--------------------------
rv_features
----------------------
| rv_id | feature_id |
----------------------
| | |
----------------------
How do you make use of this? Suppose you want to record the fact that the 2016 Travelmore CampMaster has a 25kW diesel generator. You would first add a record to rvs like
--------------------------------------------------
| id | year | make | model | price |
--------------------------------------------------
| 0231 | 2016 | Travelmore | CampMaster | 750000 |
| 2101 | 2016 | Travelmore | Domestant | 650000 |
--------------------------------------------------
(Note the value in the id column is entirely arbitrary; its sole purpose is to serve as the primary key which uniquely identifies the record. It can encode meaningful information, but it must be something that will not change throughout the life of the record it identifies.)
You then add (or already have) the generator in the features table:
--------------------------------
| id | feature |
--------------------------------
| 1192 | Diesel Pusher 450hp |
| 3209 | diesel generator 25kW |
--------------------------------
Finally, you associate the rv to the feature with a record in rv_features:
----------------------
| rv_id | feature_id |
----------------------
| 0231 | 3209 |
| 0231 | 1192 |
| 2101 | 3209 |
----------------------
(I've added a few other records to each table for context.)
Now, to retrieve the features of the 2016 CampMaster, you use the following SQL query:
SELECT r.year, r.make, r.model, f.feature
FROM rvs r, features f, rv_features rf
WHERE r.id = rf.rv_id
AND rf.feature_id = f.id
AND r.id = '0231';
to get
----------------------------------------------------------
| year | make | model | feature |
----------------------------------------------------------
| 2016 | Travelmore | CampMaster | diesel generator 25kW |
| 2016 | Travelmore | CampMaster | Diesel Pusher 450hp |
----------------------------------------------------------
To see the rvs with a 25kW generator, change the query to
SELECT r.year, r.make, r.model, f.feature
FROM rvs r, features f, rv_features rf
WHERE r.id = rf.rv_id
AND rf.feature_id = f.id
AND f.id = '3209';
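The two lookups can be tried out with the sample rows above. This is only a sketch: an in-memory SQLite database stands in for MySQL, the ids are stored as text to keep the leading zero of 0231, and the queries are written with explicit JOIN syntax, which is equivalent to the comma-separated FROM list used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.executescript("""
    CREATE TABLE rvs (id TEXT PRIMARY KEY, year INTEGER, make TEXT, model TEXT, price INTEGER);
    CREATE TABLE features (id TEXT PRIMARY KEY, feature TEXT);
    CREATE TABLE rv_features (
        rv_id TEXT NOT NULL REFERENCES rvs(id),
        feature_id TEXT NOT NULL REFERENCES features(id),
        PRIMARY KEY (rv_id, feature_id)
    );
    INSERT INTO rvs VALUES ('0231', 2016, 'Travelmore', 'CampMaster', 750000),
                           ('2101', 2016, 'Travelmore', 'Domestant', 650000);
    INSERT INTO features VALUES ('1192', 'Diesel Pusher 450hp'),
                                ('3209', 'diesel generator 25kW');
    INSERT INTO rv_features VALUES ('0231', '3209'), ('0231', '1192'), ('2101', '3209');
""")

# All features of one RV: rvs -> rv_features -> features.
features_of_rv = [
    row[0]
    for row in conn.execute(
        "SELECT f.feature FROM rvs AS r "
        "JOIN rv_features AS rf ON r.id = rf.rv_id "
        "JOIN features AS f ON rf.feature_id = f.id "
        "WHERE r.id = ? ORDER BY f.id",
        ("0231",),
    )
]

# All RVs that have one feature: same joins, filtered on the feature side.
rvs_with_feature = [
    row[0]
    for row in conn.execute(
        "SELECT r.model FROM rvs AS r "
        "JOIN rv_features AS rf ON r.id = rf.rv_id "
        "JOIN features AS f ON rf.feature_id = f.id "
        "WHERE f.id = ? ORDER BY r.id",
        ("3209",),
    )
]

print(features_of_rv)
print(rvs_with_feature)
```

A plain SELECT on the features table would also give the list of previously entered features for the dropdown mentioned in the question.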
Sherantha's link to A Quick-Start Tutorial on Relational Database Design actually looks like a good intro to table design and normalization; you might find it useful.
There is a thing called "third normal form". Roughly, it says that every column that is not part of the key should depend only on the key, so values that repeat across rows get moved into their own tables. Taken to the extreme, that would mean a table for year, a table for make, a table for models, etc., and a table where you combine all these ids into one connected dataset.
But this is not always practical. I think the best approach is something in between: use separate tables for entries that repeat very often, but there doesn't need to be an extra table with unique ids for price; that would be overkill, I think.
Based on your scenario, if you believe the number of feature columns will stay the same, then there is no need for a second table. If there is any possibility that the features will grow in the future, then you should break up your table into two (RVS & Features). Then create a third table that links RVS & Features, since there seems to be a many-to-many relationship. So I suggest you use three tables.
I think it is better for you to be more familiar with relational database design. This is a short but great article I have found earlier.

Sum query for MySQL where field contain certain values

I need help with a query. I have a table like this:
| ID | codehwos |
| --- | ----------- |
| 1 | 16,17,15,26 |
| 2 | 15,32,12,23 |
| 3 | 53,15,21,26 |
I need an output like this:
| codehwos | number_of_this_code |
| -------- | ---------------------- |
| 15 | 3 |
| 17 | 1 |
| 26 | 2 |
I want to count how many times each code is used across all rows.
Can anyone write a query that does this for all the codes at once?
Thanks
You have a very poor data format. You should not store lists in strings and never store lists of numbers in strings. SQL has a great data structure for storing lists. Hint: it is called a "table" not a "string".
That said, sometimes one is stuck with other people's really poor design choices. We wouldn't make them ourselves, but we still need to get something done. Assuming you have a list of codes, you can do what you want with:
select c.code, count(*)
from codes c join
table t
on find_in_set(c.code, t.codehwos) > 0
group by c.code;
If you have any influence over the data structure, then advocate for a junction table, the right way to store this data in a relational database.
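To illustrate the junction-table alternative, here is a minimal sketch using an in-memory SQLite database in place of MySQL. The table name `row_codes` and its columns are made up for this example; the packed strings are the three rows from the question. Once each code sits in its own row, the count is a plain GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
# Hypothetical junction table: one row per (original row id, code).
conn.execute(
    "CREATE TABLE row_codes ("
    "row_id INTEGER NOT NULL, code INTEGER NOT NULL, PRIMARY KEY (row_id, code))"
)

packed_rows = [(1, "16,17,15,26"), (2, "15,32,12,23"), (3, "53,15,21,26")]
conn.executemany(
    "INSERT INTO row_codes VALUES (?, ?)",
    [(row_id, int(code)) for row_id, packed in packed_rows for code in packed.split(",")],
)

# Each result row is (code, number_of_rows_using_it).
counts = dict(conn.execute("SELECT code, COUNT(*) FROM row_codes GROUP BY code"))
print(counts[15], counts[17], counts[26])
```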

One to one relationship for same relationship to different tables

I am creating a database in which I have an artefact that can be associated with either a project, production or performance. I will call the relationship 'comes_from'. This relationship can be a project or a more specific version of a project such as a production or performance.
I don't want to have separate foreign keys on my artefact for each possible value of the 'comes_from' relationship as it feels wrong to have multiple attributes for the same relationship. The only way I can think of doing this is having a separate table that stores the comes_from relationship containing the id of the referenced project or more specific version along with the table the item is located in.
artefact table
+-------------+------------+
| artefact_id | comes_from | -- Foreign key to comes_from
+-------------+------------+
| 1 | 7 |
| 2 | 8 |
+-------------+------------+
comes_from table
+---------------+-----------------+---------------------------------+
| comes_from_id | comes_from (FK) | comes_from_table (FK table) |
+---------------+-----------------+---------------------------------+
| 7 | 19 | project |
| 8 | 13 | performance |
| 9 | 21 | production |
+---------------+-----------------+---------------------------------+
project table
+-------------+
| project_id |
+-------------+
| 19 |
| 20 |
+-------------+
performance table
+-----------------+
| performance_id |
+-----------------+
| 13 |
| 14 |
+-----------------+
production table
+---------------+
| production_id |
+---------------+
| 21 |
| 22 |
+---------------+
Is there a better way to do this? I am not sure I can even resolve this relationship in a SQL query, and it may cause issues when I use Doctrine as an ORM on top of this database.
Your solution is good, the "comes_from_table" column could be a simple VARCHAR or INT indexed field acting as a discriminator field. However, I would remove the "comes_from" column from the "artefact" table and the "comes_from_id" column and use directly the "artefact_id" column to reference artefacts in the relationship table.
Regarding Doctrine there shouldn't be any problem, I did something similar in the past using Symfony2 and Doctrine2 for an entity called Tags where a Tag could either belong to a contact or to a contact spouse. I also created a function in the repository file where I could pass the "tag_type" as a parameter so that I could get either the contact or the contact spouse tags.
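As for resolving the relationship in plain SQL, one possible shape is a LEFT JOIN per target table, each guarded by the discriminator value. The sketch below uses an in-memory SQLite database and the ids from the question, with the relationship table keyed directly by artefact_id as suggested above; the CHECK constraint and the `target_id` alias are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY);
    CREATE TABLE performance (performance_id INTEGER PRIMARY KEY);
    CREATE TABLE production (production_id INTEGER PRIMARY KEY);
    CREATE TABLE artefact (artefact_id INTEGER PRIMARY KEY);
    -- artefact_id as primary key keeps the relationship one-to-one.
    CREATE TABLE comes_from (
        artefact_id INTEGER PRIMARY KEY REFERENCES artefact(artefact_id),
        comes_from INTEGER NOT NULL,
        comes_from_table TEXT NOT NULL
            CHECK (comes_from_table IN ('project', 'performance', 'production'))
    );
    INSERT INTO project VALUES (19), (20);
    INSERT INTO performance VALUES (13), (14);
    INSERT INTO production VALUES (21), (22);
    INSERT INTO artefact VALUES (1), (2);
    INSERT INTO comes_from VALUES (1, 19, 'project'), (2, 13, 'performance');
""")

# Only the join whose discriminator matches can produce a non-NULL id.
resolved = conn.execute("""
    SELECT c.artefact_id,
           c.comes_from_table,
           COALESCE(pj.project_id, pf.performance_id, pd.production_id) AS target_id
    FROM comes_from AS c
    LEFT JOIN project AS pj
           ON c.comes_from_table = 'project' AND pj.project_id = c.comes_from
    LEFT JOIN performance AS pf
           ON c.comes_from_table = 'performance' AND pf.performance_id = c.comes_from
    LEFT JOIN production AS pd
           ON c.comes_from_table = 'production' AND pd.production_id = c.comes_from
    ORDER BY c.artefact_id
""").fetchall()
print(resolved)
```

Because the comes_from column can point into any of three tables, the database cannot declare a foreign key on it; in this sketch a dangling reference would simply show up as a NULL `target_id`.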

mysql: how to split list field

I have a table which only contains id and a field whose data is a list of data. e.g.
--------------
| id | data |
| 1 | a,b,c,d|
| 2 | a,b,k,m|
---------------
I guess it's not a good design to put a list of data in a single field, so I want to know how I can redesign it.
In my opinion, you need two tables (i.e. Master and Transaction tables) only when some details are going to be the same for every record of an id and some are going to change. In your case, if nothing else related to your id field is going to be the same, you can carry on with one table with the following structure.
--------------
| id | data |
| 1 | a |
| 1 | b |
| 1 | c |
| 1 | d |
| 2 | a |
| 2 | b |
| 2 | k |
| 2 | m |
---------------
BUT if there is anything else related to the id field that is going to be the same for all records with the same id, you will have to use two tables, as in the following case.
There are 3 fields: id, name and data,
and your current table looks something like
--------------------------
| id | name | data |
| 1 | testname | a,b,c,d|
| 2 | remy | a,b,k,m|
--------------------------
your new table structure should look like.
table 1 Master
-----------------
| id | name |
| 1 | testname |
| 2 | remy |
-----------------
Table 2 Transaction
--------------
| id | data |
| 1 | a |
| 1 | b |
| 1 | c |
| 1 | d |
| 2 | a |
| 2 | b |
| 2 | k |
| 2 | m |
---------------
For better database management we might need to normalize the data.
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. You can find more on below links
3 Normal Forms Database Tutorial
Database normalization
If you have only those two fields in your table then you should have only 1 table as below
id | data
with composite primary key as PRIMARY KEY(id,data) so that there won't be any duplicate data for the respective ID.
The data would be like this
id | data
1 | a
1 | b
1 | c
1 | d
2 | a
2 | b
2 | k
2 | m
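A small sketch of this single-table layout, using an in-memory SQLite database rather than MySQL: the packed rows from the question are split into one row per value, and the composite primary key then rejects a repeated (id, data) pair. The table name `items` is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
conn.execute(
    "CREATE TABLE items (id INTEGER NOT NULL, data TEXT NOT NULL, PRIMARY KEY (id, data))"
)

# The original rows, with the list still packed into one field.
original = [(1, "a,b,c,d"), (2, "a,b,k,m")]
rows = [(row_id, value) for row_id, packed in original for value in packed.split(",")]
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# Inserting an existing (id, data) pair violates the composite primary key.
try:
    conn.execute("INSERT INTO items VALUES (1, 'a')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(row_count, duplicate_rejected)
```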
You will need another table, related to the first in a ONE to MANY relationship.
For example, you could have another table datamapping with data and ID columns, where the ID column is a FOREIGN KEY to the ID column of the data table.
So according to your example there would be 4 entries for ID = 1 in the datamapping table.
You will need two tables with a foreign key.
Table 1
id
Table 2
id
data
So the data looks like:
Table 1:
id
1
2
3
Table 2:
id | data
1 | a
1 | b
1 | c
1 | d
2 | a
2 | b
2 | k
2 | m
You are correct, this is not a good database design. The data field violates the principle of atomicity and therefore the 1NF, which can lead to problems in maintaining and querying the data.
To normalize your design, split the original table in two. There are 2 basic strategies for doing so: using a non-identifying relationship and using an identifying relationship.
NOTE: If you only have id in the parent table, and no other FKs on it, and parent cannot exist without at least one child (i.e. data could not have been empty in the original design), you can dispense with the parent table altogether.