Normalizing data from a pair of CSV tables - mysql

I'm trying to normalize some data, and can't seem to come up with a solution. What I have is a table like this:
weight position1 position2 position3
1 10 20 30
2 25 35 45
3 17 05 22
and one like this:
location position
6 1
7 1
8 2
9 2
10 2
11 3
12 3
How do I normalize the above so that given a location and a weight, I can find the value for a given position?
I can use Perl, Python, Excel, MySQL or pretty much any tool on the block to do the actual reshuffling of the data; where I'm having a problem is in coming up with a reasonable schema.
The desired outcome here is something like
if location == 11 -> position is 3
therefore,
if weight == 2 -> the value is 45

What you need to do is "unpivot" your first table into this:
weight position value
1 1 10
1 2 20
1 3 30
2 1 25
2 2 35
2 3 45
3 1 17
3 2 05
3 3 22
The first two columns should contain unique pairs of values, i.e. (weight, position) is the primary key. If you have other information that only depends on weight, you would need another table for that. Same for positions.
Converting to the new model
If you already have the tables, then you can create the first table (t1) with this statement:
create table t1_new
select weight, 1 as position, position1 as value
from t1
union all
select weight, 2 as position, position2 as value
from t1
union all
select weight, 3 as position, position3 as value
from t1;
Then, after verification of the result, drop t1, and rename t1_new to t1.
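One way to verify the result before dropping anything is to run the same UNION ALL statement against a scratch copy. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite requires CREATE TABLE ... AS SELECT, whereas MySQL also accepts the statement without AS). The values are stored as TEXT so that the leading zero in 05 survives.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (weight INTEGER, position1 TEXT, position2 TEXT, position3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [(1, "10", "20", "30"), (2, "25", "35", "45"), (3, "17", "05", "22")],
)

# One SELECT per original position column, stacked with UNION ALL.
conn.execute("""
    CREATE TABLE t1_new AS
    SELECT weight, 1 AS position, position1 AS value FROM t1
    UNION ALL
    SELECT weight, 2 AS position, position2 AS value FROM t1
    UNION ALL
    SELECT weight, 3 AS position, position3 AS value FROM t1
""")

rows = conn.execute(
    "SELECT weight, position, value FROM t1_new ORDER BY weight, position"
).fetchall()
for row in rows:
    print(row)
```

The printed rows should match the unpivoted table above, one row per (weight, position) pair.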
Querying from the new model
To query the value for a given location and weight from these tables, you should use a join on position:
select t1.value
from t1
inner join t2 on t2.position = t1.position
where t2.location = 11
and t1.weight = 2;
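The join can be checked the same way. This sketch again uses sqlite3 in place of MySQL and loads the normalized t1 and the location table t2 with the data from the question; lookup is a hypothetical helper name, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized value table: one row per (weight, position) pair.
conn.execute("""CREATE TABLE t1 (weight INTEGER, position INTEGER, value TEXT,
                                 PRIMARY KEY (weight, position))""")
wide = {1: ("10", "20", "30"), 2: ("25", "35", "45"), 3: ("17", "05", "22")}
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(w, p + 1, v) for w, vals in wide.items() for p, v in enumerate(vals)])
# Location -> position table from the question.
conn.execute("CREATE TABLE t2 (location INTEGER PRIMARY KEY, position INTEGER)")
conn.executemany("INSERT INTO t2 VALUES (?, ?)",
                 [(6, 1), (7, 1), (8, 2), (9, 2), (10, 2), (11, 3), (12, 3)])

def lookup(location, weight):
    # Resolve location -> position via t2, then read the value for that weight.
    row = conn.execute("""
        SELECT t1.value FROM t1
        INNER JOIN t2 ON t2.position = t1.position
        WHERE t2.location = ? AND t1.weight = ?
    """, (location, weight)).fetchone()
    return row[0] if row else None

print(lookup(11, 2))  # 45
```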

Related

Create multiple results from single row joining either 2 or 3 tables based off conditional of table1 results?

First, let me say that yes, this is a horrible way to store data; second, it isn't my fault :) I am trying to integrate with a 3rd-party database to extract info which is stored in 3 tables, which really should have been two, with table 2 in a many-to-one relationship. Since that isn't the case, I have a puzzle to share.
Table one contains rows in which multiple values can be stored. Each row has codeid1-codeid20. These columns may contain a value, or a 0 (they are never null). They also have a corresponding codetype1-codetype20 which will be either 0 or 1.
If codetype1 is equal to 0, we go to table 2 and select description from the matching table1.codeid1=table2.id. If codetype1 equals 1, we now have to look at table3 and find where table1.codeid1=table3.id and then match table3.table2id=table2.id and return the description.
Here is the data structure:
table1
codeid1,codeid2,codeid3,...codeid20 ... codetype1,codetype2,codetype3,.....codetype20
18 13 1 33 0 0 1 1
13 21 45 0 0 1 0 0
table2
id, description
13 Item 13 Description
15 Item 15 Description
17 Item 17 Description
18 Item 18 Description
21 Item 21 Description
28 Item 28 Description
45 Item 45 Description
table3
id, table2id
1 15
33 17
21 28
The results I would be looking for would look like this:
rowid, description
1 Item 18 Description
1 Item 13 Description
1 Item 15 Description
1 Item 17 Description
2 Item 13 Description
2 Item 28 Description
2 Item 45 Description
I started working on this with someone last night, but I had missed part of the complexity of my situation by not including table3. Like I said, fun puzzle... This gives me the relationship between the first 2 tables, but I am unsure how I can work in a 3rd table.
SELECT table1.rowid, table2.description
FROM table2
INNER JOIN table1
ON table2.id=table1.codeid1
OR table2.id=table1.codeid2
...
The database is a Faircom C-Tree DB over an ODBC connection, which is generally compatible with MySQL statements including UNION, WITH, INTERSECT, EXISTS, JOIN... There is no PIVOT function.
https://docs.faircom.com/doc/sqlref/sqlref.pdf
Perhaps it will work with OR rather than IN:
where exists (select 1
from table1 as t1
where t2.id = t1.codeid1 or
t2.id = t1.codeid2 or
. . .
);
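One way to bring table3 in is to apply the same unpivot idea as in the normalization answer above: turn each (codeidN, codetypeN) pair into its own row with UNION ALL, then resolve the description through table3 only when the type is 1. This is an illustration under several assumptions, not a tested Faircom query: it runs on Python's sqlite3, keeps only 4 of the 20 pairs, adds an explicit row_id column, and relies on CASE, which the question does not confirm the Faircom dialect supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: 4 codeid/codetype pairs instead of 20, plus an explicit row_id.
conn.execute("""CREATE TABLE table1 (row_id INTEGER PRIMARY KEY,
    codeid1 INTEGER, codeid2 INTEGER, codeid3 INTEGER, codeid4 INTEGER,
    codetype1 INTEGER, codetype2 INTEGER, codetype3 INTEGER, codetype4 INTEGER)""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 18, 13, 1, 33, 0, 0, 1, 1),
    (2, 13, 21, 45, 0, 0, 1, 0, 0),
])
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(i, f"Item {i} Description") for i in (13, 15, 17, 18, 21, 28, 45)])
conn.execute("CREATE TABLE table3 (id INTEGER PRIMARY KEY, table2id INTEGER)")
conn.executemany("INSERT INTO table3 VALUES (?, ?)", [(1, 15), (33, 17), (21, 28)])

# Unpivot: one row per (codeidN, codetypeN) pair; slot preserves column order.
pairs = " UNION ALL ".join(
    f"SELECT row_id, {n} AS slot, codeid{n} AS codeid, codetype{n} AS codetype FROM table1"
    for n in range(1, 5)
)
query = f"""
    SELECT c.row_id, t2.description
    FROM ({pairs}) AS c
    LEFT JOIN table3 AS t3 ON c.codetype = 1 AND t3.id = c.codeid
    JOIN table2 AS t2
      ON t2.id = CASE WHEN c.codetype = 1 THEN t3.table2id ELSE c.codeid END
    WHERE c.codeid <> 0
    ORDER BY c.row_id, c.slot
"""
results = conn.execute(query).fetchall()
for r in results:
    print(r)
```

Type-0 codes join table2 directly; type-1 codes go through table3 first, and zero codes are filtered out.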

Duplicating rows in one select MySql query

First, I would like to greet all users and apologize for my English :).
I'm a new user on this forum.
I have a question about MySQL queries.
I have a table Items with, let's say, 2 columns: itemsID and ItemsQty.
itemsID ItemsQty
11 2
12 3
13 3
15 5
16 1
I need to select each itemsID duplicated as many times as indicated in column ItemsQty:
itemsID ItemsQty
11 2
11 2
12 3
12 3
12 3
13 3
13 3
13 3
15 5
15 5
15 5
15 5
15 5
16 1
I tried this query:
SELECT items.itemsID, items.itemsQty
FROM base.items
LEFT OUTER JOIN
(
SELECT items.itemsQty AS Qty FROM base.items
) AS Numbers ON items.itemsQty <=Numbers.Qty
ORDER BY items.itemsID;
but it doesn't work correctly.
Thanks in advance for help.
SQL answer - Option 1
You need another table called numbers containing the numbers from 1 up to the maximum value of ItemsQty
Table: NUMBERS
1
2
3
4
5
......
max value of ItemsQty
Then the following SELECT statement will work
SELECT ItemsID, ItemsQty
FROM originaltable
JOIN numbers
ON originaltable.ItemsQty >= numbers.number
ORDER BY ItemsID, number
See this fiddle. You should always set up a fiddle like this when you can; it makes everyone's life easier!
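The numbers-table approach can be reproduced end to end as below. It uses sqlite3 instead of MySQL (an assumption) and fills numbers with 1 up to the current MAX(ItemsQty) rather than a hand-picked upper bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE originaltable (ItemsID INTEGER, ItemsQty INTEGER)")
conn.executemany("INSERT INTO originaltable VALUES (?, ?)",
                 [(11, 2), (12, 3), (13, 3), (15, 5), (16, 1)])

# Helper table holding 1..MAX(ItemsQty); each item joins to ItemsQty of these rows.
max_qty = conn.execute("SELECT MAX(ItemsQty) FROM originaltable").fetchone()[0]
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, max_qty + 1)])

rows = conn.execute("""
    SELECT ItemsID, ItemsQty
    FROM originaltable
    JOIN numbers ON originaltable.ItemsQty >= numbers.number
    ORDER BY ItemsID, number
""").fetchall()
print(rows)
```

The join condition ItemsQty >= number matches exactly ItemsQty rows of numbers for each item, which is what produces the duplicates.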
code answer - option 2
MySQL probably won't do what you want 'cleanly' without a second table (although some clever person might know how)
What is wrong with doing it with script?
Just run a SELECT itemsID, ItemsQty FROM table
Then, when looping through the result, do this (pseudocode, as no language was specified):
newArray = array(); // new array
While Rows Returned from database{ //loop all rows returned
loop number of times in column 'ItemsQty'{
newArray -> add 'ItemsID'
}
}//end of while loop
This will give you a new array
0 => 11
1 => 11
2 => 12
3 => 12
4 => 12
5 => 13
etc.
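For reference, the pseudocode maps directly onto Python. The rows list is hypothetical sample data standing in for whatever your database driver returns from the SELECT.

```python
# Hypothetical result of "SELECT itemsID, ItemsQty FROM table" as (id, qty) tuples.
rows = [(11, 2), (12, 3), (13, 3), (15, 5), (16, 1)]

new_array = []
for items_id, items_qty in rows:
    # Append the ID once per unit of quantity.
    new_array.extend([items_id] * items_qty)

print(new_array[:6])  # [11, 11, 12, 12, 12, 13]
```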
SELECT DISTINCT items.itemsID, items.itemsQty
FROM base.items
LEFT OUTER JOIN (SELECT items.itemsQty AS Qty FROM base.items) AS Numbers
ON items.itemsQty <= Numbers.Qty
ORDER BY items.itemsID;
Use DISTINCT to remove duplicates. Read more here - http://dev.mysql.com/doc/refman/5.0/en/select.html
It seems I understood your question differently from everyone else, so I hope this answers it. What I would basically do is:
Create a new table for those changes.
Create a MySQL procedure which, given a row in the original table, adds new rows to the new table - http://dev.mysql.com/doc/refman/5.6/en/loop.html
Run this procedure for each row in the original table.
Try this to get distinct values from both columns:
SELECT DISTINCT itemsID FROM items
UNION
SELECT DISTINCT itemsQty FROM items

MySQL: Exclude rows that do not satisfy the condition list

So here is my data:
ID C1 C2 C3
6 Digit 2 6,8,10,12
12 Digit 3 15
15 127 Digit 2 6,7,8,9,10,11,12,13
68 140,141 Digit 11 85,86,87,88,167,168,158,159
73 1 Digit 11 85,86,87,88,169,170
76 Digit 11 85,86,87,91,164,165,166,167,168
99 Digit 11 20,27,85,86,87
106 Digit 1 1,2
111 Digit 11 85,86,87,88
112 Digit 11 85,86,87,88
135 Digit 11 85,86,87
and my condition string is (2,6,15,37,42,52,62,65,79,85,94,100,104,107,113,124,131)
Now, I want to exclude rows 3, 4 and 5 if the values 127, 140, 141 and 1 are not in the condition list. I tried NOT IN, but to no avail. I think I might be missing something basic, but I just can't get it.
It's better not to store multiple values in a column if possible. Then it's easier to do queries like this.
You cannot use "IN" or "NOT IN" because they are looking for a list of separate items. But C3 is just one item that happens to have commas in it.
Try this:
SELECT * FROM
(SELECT ID, C1, C2, CONCAT('|',REPLACE(C3,',','|'),'|') as C3 FROM `table` WHERE `C3` <> '') as t1
WHERE t1.C3 NOT LIKE "%|127|%" AND t1.C3 NOT LIKE "%|140|%" AND t1.C3 NOT LIKE "%|141|%" AND t1.C3 NOT LIKE "%|1|%"
You could avoid the "|" and just concat "," to the start and end.
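A quick way to see why the pattern needs % wildcards on both sides, and why the delimiters matter, is the sqlite3 sketch below. SQLite has no CONCAT, so it uses || instead (in MySQL the wrapped expression would be CONCAT('|', REPLACE(C1, ',', '|'), '|')). The table name data and the helper kept_ids are hypothetical, and only a subset of the sample rows is loaded. In the sample data the values 127, 140, 141 and 1 sit in C1 rather than C3, so the filter is run on both columns for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table "data" with a subset of the question's rows ('' = empty C1).
conn.execute("CREATE TABLE data (ID INTEGER, C1 TEXT, C3 TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", [
    (6, "", "6,8,10,12"),
    (12, "", "15"),
    (15, "127", "6,7,8,9,10,11,12,13"),
    (68, "140,141", "85,86,87,88,167,168,158,159"),
    (73, "1", "85,86,87,88,169,170"),
    (106, "", "1,2"),
])
excluded = ["127", "140", "141", "1"]

def kept_ids(column):
    # "140,141" becomes "|140|141|", so the pattern "%|1|%" cannot match inside "|141|".
    wrapped = f"('|' || REPLACE({column}, ',', '|') || '|')"
    conditions = " AND ".join(f"{wrapped} NOT LIKE ?" for _ in excluded)
    patterns = [f"%|{v}|%" for v in excluded]
    sql = f"SELECT ID FROM data WHERE {conditions} ORDER BY ID"
    return [row[0] for row in conn.execute(sql, patterns)]

print(kept_ids("C1"))  # [6, 12, 106]
print(kept_ids("C3"))  # [6, 12, 15, 68, 73]
```

Without the leading and trailing %, NOT LIKE "|127|" only rejects a column whose entire content is "|127|", so no row would ever be excluded.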
Or you could fix your database schema so that it actually acts like a Normalized Relational Database.
Every column that contains multiple values should be separated out into its own table.
There should be no column C3 in your table above. Instead, you should have a table, some_other_data:
At this point, I see that C3=6 is related to more than one record in the main table. Therefore, you actually need a third, linking table, in addition to some_other_data. See below.
`some_other_data`
id
6
8
10
12
15
`main_table_to_some_other_data_link`
some_other_data_id | main_table_id
6 6
8 6
10 6
12 6
15 12
6 15
etc. You can see that the linking table can contain duplicates of either value. But your other two tables would have completely unique ids.
I think you're trying to solve the wrong problem.
(I'm assuming you can change your table structure. If you can't, someone else will need to address your question.)
The long lists of comma-separated data are a flag that they have a one-to-many relationship with ID.
For example, make the data in C3 its own table:
ID MainID C3
================
1 6 6
2 6 8
3 6 10
4 6 12
5 12 15
6 15 6
7 15 7
8 15 8
9 15 9
10 15 10
11 15 11
12 15 12
13 15 13
// and so forth //
So ID is the primary key of the new table, MainID is the foreign key that refers to the record in your primary table, and C3 is the data in C3.
Each separate value of C3 now has its own record.
Now, you're in a position to use something like
Select * from MainTable
Inner Join NewTable
On MainTable.ID = NewTable.MainID
Where NewTable.C3 Not In (2,6,15,37,42,52,62,65,79,85,94,100,104,107,113,124,131);
If you can, pulling out the one-to-many relationships into their own tables will make things easier for you.
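With the data moved into a child table, the query above can be tried directly. The sketch below loads a subset of the sample rows (IDs 6, 12 and 15) into sqlite3 as a stand-in for MySQL. Note that the join returns one row per C3 value outside the list rather than whole main rows; if the goal is instead to keep only main rows whose values all fall inside the list, a NOT EXISTS variant (an addition, not part of the answer) does that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MainTable VALUES (?)", [(6,), (12,), (15,)])
conn.execute("CREATE TABLE NewTable (ID INTEGER PRIMARY KEY, MainID INTEGER, C3 INTEGER)")
child = [(6, v) for v in (6, 8, 10, 12)] + [(12, 15)] + [(15, v) for v in range(6, 14)]
conn.executemany("INSERT INTO NewTable (MainID, C3) VALUES (?, ?)", child)

condition = (2, 6, 15, 37, 42, 52, 62, 65, 79, 85, 94, 100, 104, 107, 113, 124, 131)
in_list = ",".join(str(v) for v in condition)  # constant ints, safe to inline

# The answer's join: one row per child value that is outside the list.
outside = conn.execute(f"""
    SELECT MainTable.ID, NewTable.C3 FROM MainTable
    INNER JOIN NewTable ON MainTable.ID = NewTable.MainID
    WHERE NewTable.C3 NOT IN ({in_list})
    ORDER BY MainTable.ID, NewTable.C3
""").fetchall()

# Variant: main rows that have no C3 value outside the list.
all_inside = [r[0] for r in conn.execute(f"""
    SELECT ID FROM MainTable AS m
    WHERE NOT EXISTS (SELECT 1 FROM NewTable AS n
                      WHERE n.MainID = m.ID AND n.C3 NOT IN ({in_list}))
    ORDER BY ID
""")]
print(outside)
print(all_inside)  # [12]
```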

MySQL delete except one

I have the following table structure:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
5 2 136
6 2 128
7 2 10
8 2 1
9 3 561
10 3 560
11 3 10
12 3 1
13 4 561
14 4 560
15 4 10
16 4 1
17 5 234
18 5 120
19 5 1
20 6 234
21 6 120
22 6 1
Here are the details:
The table associates each idproperty with different geographic locations. For example:
idgbs
1 refers to United States
10 refers to Alabama with parentid 1 (United States)
128 refers to Alabama Gulf Coast with parentid 10 (Alabama)
136 refers to Dauphin Island with parentid 128 (Alabama Gulf Coast)
So, the structure is:
United States > Alabama > Alabama Gulf Coast > Dauphin Island
I want to delete the entries of every idproperty EXCEPT the first one with a given set of idgbs (e.g. 136, 128, 10, 1), i.e. leave at least 1 property in every GBS and delete the others.
Also, sometimes there are 4 levels of geographic entries and sometimes 3.
Please share the logic & SQL query to delete all entries except one in every unique GBS.
GBS 1, 10, 128, 136 is one unique, so database should only contain 1 property id with these GBS.
After the query, the table would look like this:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
9 3 561
10 3 560
11 3 10
12 3 1
17 5 234
18 5 120
19 5 1
Rephrasing the question:
I want to keep properties in every root level GBS i.e. there should be only ONE property in Dauphin Island.
Whew... I think I understand what you are after now. I couldn't let this one go ;-)
I had to realize that in the question, you wanted property 2 deleted, because it shared a hierarchy with property 1. Once I realized that, I got the following idea. Basically, we join to an aggregated version of self twice: the first one tells us what our "gbs hierarchy path" is, and the second one matches any previous properties with the same hierarchy. Rows which find that there are no "previous" properties that share their hierarchy are spared, the rest with that hierarchy are deleted. It's possible that this could be further tweaked, but I wanted to share this now. I have tested it with the data you showed, and I got the results you posted.
DELETE
each_row.*
FROM property_gbs AS each_row
JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS mypath
USING(idproperty)
LEFT JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS previous_property
ON mypath.idgbs_path = previous_property.idgbs_path
AND previous_property.idproperty < each_row.idproperty
WHERE previous_property.idproperty
Note that the last line is not a typo, we are just checking if there is a previous property with the same path. If there is, then delete the currently-evaluated row.
Cheers!
note for clarification
The thought here is to associate every row with its hierarchy, even if it's a row which represents somewhere in the middle of the hierarchy (such as row {2, 1, 128} in the question). With the first join to the aggregate, each row now "knows" what its path is (so that row would get "136/128/10/1"). We can then use that value in the second join to find other properties with the same path, but only if they have a LOWER property id. This allows us to check for the existence of a lower-ID property with the same "path", and delete any row which represents a property which does have such a "lower-order path-sibling".
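The path-matching logic can be traced outside the database. The Python sketch below mirrors the two aggregated joins on the question's data: it builds each property's idgbs path in descending order (like GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/")) and marks a row for deletion when a lower idproperty has the same path. It only illustrates the logic; it is not a replacement for the DELETE.

```python
from collections import defaultdict

# The question's rows as (id, idproperty, idgbs); ids are assigned in order 1..22.
gbs_by_property = {1: [136, 128, 10, 1], 2: [136, 128, 10, 1],
                   3: [561, 560, 10, 1], 4: [561, 560, 10, 1],
                   5: [234, 120, 1], 6: [234, 120, 1]}
rows = []
for prop, gbs_list in gbs_by_property.items():
    for gbs in gbs_list:
        rows.append((len(rows) + 1, prop, gbs))

# First aggregate: each property's path, highest idgbs first, joined with "/".
collected = defaultdict(list)
for _, prop, gbs in rows:
    collected[prop].append(gbs)
path = {prop: "/".join(str(g) for g in sorted(gs, reverse=True))
        for prop, gs in collected.items()}

# Second aggregate: delete a row if any lower idproperty shares its path.
to_delete = {row_id for row_id, prop, _ in rows
             if any(other < prop and path[other] == path[prop] for other in path)}
kept = [row_id for row_id, _, _ in rows if row_id not in to_delete]
print(kept)  # [1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19]
```

The kept ids match the table the question expects after the delete.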
I'm really not sure. But try this.
DELETE a1 FROM table a1, table a2
WHERE a1.id > a2.id
AND a1.idgbs = a2.idgbs
AND a1.idgbs <> 1
if you want to keep the row with the lowest id.
This one was difficult #dang but I enjoyed the challenge.
;With [CTE] as (Select id ,idproperty ,idgbs ,Row_Number() Over(Partition By idgbs order by idproperty Asc) as RN From [TableGBS])
,[CTE2] as (Select * From [CTE] Where RN > 1)
,[CTE3] as (Select idproperty ,count(*) as [Count] From [CTE2] Group by idproperty)
Delete from [TableGBS] Where id in (Select a.id From [CTE] as a Left Join [CTE3] as b on a.idproperty = b.idproperty Where RN > 1 And [Count] > 2);
Since I don't think you can run a DELETE statement in SQL Fiddle, here are the rows it will delete, shown in a SELECT statement: http://sqlfiddle.com/#!3/08108/40
Edit: I use MySQL linked to Microsoft SQL Server Management Studio, so this might not work for you.
This technique joins the table against an aggregated version of itself, essentially matching each row in the table with the lowest idproperty for its idgbs, and deletes the row if it does not share that idproperty (i.e. the row with the lowest idproperty, when joined to itself, will not be deleted, but the rest of the rows with that idgbs will).
DELETE
each_row.*
FROM `table` AS each_row
JOIN (SELECT MIN(idproperty) AS idproperty, idgbs FROM `table` GROUP BY idgbs) AS lowest_id
USING(idgbs)
WHERE each_row.idproperty != lowest_id.idproperty;

MySQL: Matching inexact values using "ON"

I'm way out of my league here...
I have a mapping table (table1) to assign particular values (value) to a whole number (map_nu). My second table (table2), is a collection of averages (avg) for each user (user_id).
(I couldn't figure out how to properly make a markdown table, please feel free to edit!)
table1:
(value)(Map_nu)
----
1 1
1.045 2
1.09 3
1.135 4
1.18 5
1.225 6
1.27 7
1.315 8
1.36 9
1.405 10
table2:
(user_id)(avg)
----
1 1.111
2 1.2
3 1.33333
4 1
5 1.389
6 1.42
7 1.07
The value Map_nu is a special number that each user gets assigned according to their average. I need to find a way to match the averages from table2 to the closest value in table1. I only need to match to the second digit past the decimal, so I've added the TRUNCATE function:
SELECT table2.user_id, map_nu
FROM `table1`
JOIN table2 ON TRUNCATE(table1.value,2)=TRUNCATE(table2.avg,2)
I still miss the values that don't match the averages exactly. Is there a way to pick the nearest truncated value, or even to round to the second decimal? Rounding up/down won't matter as long as it's applied to all values the same way.
I am trying to have the following result (if rounded up):
(user_id)(Map_nu)
----
1 4
2 6
3 6
4 1
5 10
6 11
7 3
Thanks!
I think you might have to do this in 2 separate queries. There is no 'nearest' operator in SQL, so you can either calculate it in your software, or you could use
select map_nu from table1 ORDER BY abs(value - $avg) LIMIT 1
inside a loop. However, that cannot be used as a join condition, as it requires ORDER BY and LIMIT, which are not valid in joins.
Another way of looking at it: your map_nu and value seem to be deterministic in relation to each other, value = 1 + ((map_nu - 1) * 0.045), so maybe you could make use of that fact and calculate an integer based on that equation, assuming that relationship holds true for all values of map_nu.
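If the linear relationship holds, map_nu can be computed directly by inverting the formula and rounding to the nearest step, which avoids the lookup entirely. A minimal Python sketch, assuming value = 1 + (map_nu - 1) * 0.045 for every row; map_nu_for is a hypothetical name, and averages falling exactly halfway between two steps are at the mercy of floating-point rounding. The same expression could be written inline in MySQL as ROUND((avg - 1) / 0.045) + 1.

```python
def map_nu_for(avg):
    # Invert value = 1 + (map_nu - 1) * 0.045 and round to the nearest step.
    return round((avg - 1) / 0.045) + 1

averages = {1: 1.111, 2: 1.2, 3: 1.33333, 4: 1, 5: 1.389, 6: 1.42, 7: 1.07}
result = {user: map_nu_for(avg) for user, avg in averages.items()}
print(result)  # {1: 3, 2: 5, 3: 8, 4: 1, 5: 10, 6: 10, 7: 3}
```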
This is an awkward database design. What is the data representing and what are you trying to solve? There might be a better way.
Maybe do something like...
SELECT a.user_id, b.map_nu, abs(a.avg - b.value)
FROM
table2 a
join table1 b
left join table1 c on abs(a.avg - b.value) > abs(a.avg - c.value)
where c.value is null
order by a.user_id
This doesn't produce exactly the output you were expecting (it doesn't do any rounding), though you should be able to tweak it from there. The query above produces the output below with the data you've provided:
user_id map_nu abs(a.avg - b.value)
------- ------ --------------------
1 3 0.0209999999999999
2 5 0.02
3 8 0.01833
4 1 0
5 10 0.016
6 10 0.0149999999999999
7 3 0.02
Beware if you're dealing with large tables, though. Check the EXPLAIN output of the query above to judge whether it's practical to run it within MySQL or better done outside it.
Note 2: This will produce duplicate rows if there are avg values that are equidistant from two value entries in table1 (e.g. if the values for map_nu 11 and 12 are 2 and 3 and someone gets an avg of 2.5). Your question doesn't specify what to do in that case, so you might want to take that into account.
It's taking a little extra work, but I figure the easiest way to get my results will be to map all values to the second decimal place in table1:
1 1
1.01 1
1.02 1
1.03 1
1.04 1
1.05 2
1.06 2
1.07 2
1.08 2
1.09 3
1.1 3
1.11 3
1.12 3
1.13 3
1.14 4
...
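Rather than typing the 0.01-step table by hand, it can be generated from table1. Judging from the listing, each step gets the map_nu of the largest table1 value that does not exceed it (1.04 maps to 1, 1.05 to 2). The Python sketch below reproduces that rule with bisect_right over exact Decimal values; the upper bound of 1.45 is an assumption.

```python
from bisect import bisect_right
from decimal import Decimal

# table1 values as exact decimals, in map_nu order (map_nu = index + 1).
steps = [Decimal(s) for s in ("1", "1.045", "1.09", "1.135", "1.18",
                              "1.225", "1.27", "1.315", "1.36", "1.405")]

def map_nu_floor(v):
    # Number of steps <= v, i.e. the map_nu of the largest value not above v.
    return bisect_right(steps, v)

# Every 0.01 step from 1.00 to 1.45 (upper bound is an assumption).
mapping = {Decimal(c) / 100: map_nu_floor(Decimal(c) / 100) for c in range(100, 146)}
for v in sorted(mapping)[:15]:
    print(v, mapping[v])
```

Because each step takes the lower table1 value, the averages would then be joined on TRUNCATE(avg, 2) rather than rounded, to stay consistent with the table.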
Thanks for the suggestions! Sorry I couldn't present the question more clearly.