I have a table that stores different sorts of things. Some columns are filled in by only some of the rows, because the other rows don't need them. Is it bad to have some cells empty? Should I store them all in different tables and have a main table holding the columns that all objects need? And then, when I want to select something, do joins instead?
It is not bad to have some empty cells. Mostly you need multiple tables only when there is a one-to-many relationship between the tables.
From the little information you have provided, this might be a case of storing information about different types of objects in one table, like cars, people, houses etc.
And forgive me in advance for my overly simplistic example, could not think of a better one.
People have attributes like name, family name, birth date. Cars have attributes like brand, color, number of doors, engine volume etc. Houses have area, number of bedrooms, garden (yes or no) etc.
When such a table would be populated with entries, it would look like this:
id name attr1 attr2 attr3 attr4 attr5 attr6
1 car1 1 1 0 0 0 0
2 car2 1 1 0 0 0 0
3 man1 0 0 1 1 0 0
4 man2 0 0 1 1 0 0
5 house1 0 0 0 0 1 1
6 house2 0 0 0 0 1 1
1 means cell has some data, 0 means cell is empty.
If your table looks like this, you might be better off splitting your table into more tables, as database normalization suggests.
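As a rough illustration of that split, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (objects, cars, houses, brand, doors, area, bedrooms) and the sample values are made up for the example and are not taken from the question; the idea is only that shared columns live in a main table and type-specific columns live in their own tables, joined on the id.

```python
import sqlite3

# In-memory database; all table/column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Main table: columns every object needs.
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One table per object type, holding only the columns that type uses.
    CREATE TABLE cars (
        object_id INTEGER PRIMARY KEY REFERENCES objects(id),
        brand TEXT,
        doors INTEGER
    );
    CREATE TABLE houses (
        object_id INTEGER PRIMARY KEY REFERENCES objects(id),
        area REAL,
        bedrooms INTEGER
    );
    INSERT INTO objects VALUES (1, 'car1'), (5, 'house1');
    INSERT INTO cars VALUES (1, 'brandA', 4);
    INSERT INTO houses VALUES (5, 120.0, 3);
""")

# Selecting cars now takes a join, but no row carries empty house columns.
car_rows = conn.execute("""
    SELECT o.id, o.name, c.brand, c.doors
    FROM objects AS o
    JOIN cars AS c ON c.object_id = o.id
""").fetchall()
print(car_rows)
```

Whether this is worth it depends on how many type-specific columns there are; with only a few empty cells, a single table is usually simpler.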
Sorry for the beginner question.
I have an Outputs table:
ID  value
0   x
1   y
2   z
And an Inputs table that is linked to the Outputs through the outputsID:
ID  outputsID  name
0   0          A
1   1          B
2   1          C
3   2          B
4   2          C
Assuming that multiple outputs have at least one shared input (in this example, input IDs 1 and 3 have the same name, as do input IDs 2 and 4), is there a way to avoid the duplication of entries in my Inputs table (input IDs 3 and 4)?
The 'normal' answer to your question is no. Rows 1 and 2 address output 1, and rows 3 and 4 address output 2. They aren't duplicates, and each reflects something distinct.
So if you are a beginner, I would say you shouldn't want to get rid of these rows.
That said, there are some more advanced techniques. For example, you could have the OutputsID column be an array with multiple values. This is harder, more complex, and non-standard.
I am not a very experienced database designer and would like to make this
table design better.
ID Title ParentID GroupID Price
1 single product 0 0 12.00 // single
2 main product 0 0 44.00 // parent
3 sub product 2 0 4.00 // child
4 product set A 0 A 49.00 // complete price (ignore part price)
5 set part A1 0 A 22.00
6 set part A2 0 A 6.00
7 set part A3 0 A 31.00
8 product set B 0 B 0 // sum price (22 + 6 + 31 = 59)
9 set part B1 0 B 22.00
10 set part B2 0 B 6.00
11 set part B3 0 B 31.00
So there are four different products in the basket (and counting
basket products with SQL is a problem ;)). The SQL is not very
straightforward, and I need a lot of logic to handle the result.
I know that I can implement parent/child products with GroupID,
but parent/child products are displayed differently in the frontend.
I need to know whether something is a set or a parent/child product...
Does anyone have an idea how to do this better?
Thank you very much & best regards
Let's say I have the following data:
id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
I would like to remove the duplicate observations in Stata.
For example
id disease
1 1
2 1
3 0
4 0
For group id=1, keep observation 2
For group id=2, keep observation 2
For group id=3, keep observation 1 (because it has only 1 obs)
For group id=4, keep observation 1 (or any of them but one obs)
I am trying Stata duplicates command,
duplicates tag id if disease==0, generate(info)
drop if info==1
but it's not working as I required.
It is no surprise that duplicates does not do what you are wanting, as it does not fit your problem. For example, the observation with id == 2, disease == 0 is not a duplicate of any other observation. More generally, duplicates does not purport to be a general-purpose command for dropping observations you don't want.
Your criteria appear to be
Keep one observation for each id.
If id has any observation with value of 1, that is to be kept.
A solution to that is
bysort id (disease) : keep if _n == _N
That keeps the last observation for each distinct id: after sorting within id on disease, observations with the disease are necessarily at the end of each group.
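For readers who don't use Stata, the same rule (one observation per id, preferring one with disease == 1) can be sketched in Python. This is only an illustration of the logic on the example data from the question, not a substitute for the bysort line; the variable names are made up.

```python
# Example data from the question, as (id, disease) pairs.
rows = [(1, 0), (1, 1), (1, 0), (2, 0), (2, 1), (3, 0), (4, 0), (4, 0)]

# For each id, keep the largest disease value seen. This mirrors
# "sort within id on disease, then keep the last observation".
kept = {}
for id_, disease in rows:
    kept[id_] = max(kept.get(id_, 0), disease)

result = sorted(kept.items())
print(result)
```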
I'm trying to write a query to process a single table that looks like this:
record_id item_id part_id part_length
----------- ------- -------- ------------
1 0 0 123.12
2 0 0 123.09
3 0 1 231.24
4 0 1 239.14
5 1 0 45.91
6 1 0 46.12
7 1 1 62.24
8 1 1 59.40
which is basically a table of inaccurate length measurements of some parts of some items recorded multiple times (not twice, actually each part has 100s of measurements). With a single select, I want to get a result like this:
record_id item_id part_id unit part_length_ratio
----------- ------- -------- ----- ----------------
1 0 0 1 123.12 / 231.24
2 0 0 1 123.09 / 239.14
3 0 1 0 231.24 / 123.12
4 0 1 0 239.14 / 123.09
5 1 0 1 45.91 / 62.24
6 1 0 1 46.12 / 59.40
7 1 1 0 62.24 / 45.91
8 1 1 0 59.40 / 46.12
which is basically selecting each part of an item as the unit and calculates the ratio of the length of other parts of the same item to this unit while matching the measurement times. I wrote a script which computes this kind of table but would like to do it with sql. I can understand if you fail to understand the question :)
for each item i
    for each part unit of i
        for each part other of i
            if unit != other
                print i.id other.part_id unit.part_id other.length / unit.length
As I said in a comment, tables are unordered sets: there is no first or second row...
... unless you use the id column to explicitly order the rows.
However, can you guarantee that there will always be exactly two samples for each case, and that the lower ID always matches the first sample? This appears to be quite fragile: in real life, there will probably be cases where a test is performed twice, is missing, or is done late. Not to mention concurrent access to your DB.
Can't you simply add a "sample number" column?
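To make the suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical sample_no column (not present in the original table) that says which measurement round a row belongs to, and a hypothetical table name measurements; the sample_no values below are assigned by hand for the example. With that column, a self-join pairs each part with the other parts of the same item in the same round.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        record_id   INTEGER PRIMARY KEY,
        item_id     INTEGER,
        part_id     INTEGER,
        sample_no   INTEGER,  -- hypothetical column: measurement round
        part_length REAL
    )
""")
rows = [
    (1, 0, 0, 1, 123.12), (2, 0, 0, 2, 123.09),
    (3, 0, 1, 1, 231.24), (4, 0, 1, 2, 239.14),
    (5, 1, 0, 1, 45.91),  (6, 1, 0, 2, 46.12),
    (7, 1, 1, 1, 62.24),  (8, 1, 1, 2, 59.40),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?)", rows)

# o = the part being measured, u = the part used as the unit.
query = """
    SELECT o.record_id, o.item_id, o.part_id, u.part_id AS unit,
           o.part_length / u.part_length AS part_length_ratio
    FROM measurements AS o
    JOIN measurements AS u
      ON u.item_id = o.item_id
     AND u.sample_no = o.sample_no
     AND u.part_id <> o.part_id
    ORDER BY o.record_id
"""
ratios = conn.execute(query).fetchall()
for row in ratios:
    print(row)
```

The join no longer depends on row order or on record_id being assigned in any particular sequence; a missing or repeated test simply produces fewer or more matched rows for that sample_no.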
I'm trying to construct a multiplexer gate. It has two inputs and one selector. I got as far as
the truth table.
A | B | Sel | Out
0 0 1 0
0 1 1 0
1 0 1 1
1 1 1 1
0 0 0 0
0 1 0 1
1 0 0 0
1 1 0 1
And this is where my method fails. I've constructed simpler gates such as AND and OR. Those were so simple I didn't need an articulate method. I went to Wikipedia to see if I could get
a method. Instead I only discovered which gates I need to construct the circuit. For my goals, this misses the point. More important to me is the method that arrives at the answer, rather than the answer itself. I know I need to use DeMorgan's Laws, but fall down when trying to come up with specifics. Any hints would be most welcome.
Just to elaborate on Keith's answer, here's the Karnaugh map for your truth table:
           AB
        00  01  11  10
       ________________
sel 0 |  0   1   1   0
    1 |  0   0   1   1
This is created by grouping A and B, and then making a matrix of the outputs for any given input. Note that the column headings do not count in binary; rather, they follow a Gray code, with only one bit changing between adjacent columns.
Now that's done, you can write an equation that ORs together terms that cover all the 1s in the Karnaugh map.
On the Karnaugh map, it's pretty easy to see terms that cover multiple 1s. For example, the term B.sel' (B and not sel) covers both the 1's in the top row.
That combined with A.sel for the 1's in the bottom row gives the equation
output = B.sel' + A.sel
This works out at 4 gates, including the NOT.
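The equation can be checked against the truth table from the question with a short Python sketch. The function name mux and the dictionary layout are just choices made for this example.

```python
def mux(a, b, sel):
    """output = B.sel' + A.sel, using 0/1 integers as logic levels."""
    not_sel = 1 - sel                  # the NOT gate
    return (b & not_sel) | (a & sel)   # two ANDs feeding one OR

# Truth table from the question: (A, B, Sel) -> Out
truth_table = {
    (0, 0, 1): 0, (0, 1, 1): 0, (1, 0, 1): 1, (1, 1, 1): 1,
    (0, 0, 0): 0, (0, 1, 0): 1, (1, 0, 0): 0, (1, 1, 0): 1,
}

# Collect any input combination where the equation disagrees with the table.
mismatches = [inputs for inputs, out in truth_table.items()
              if mux(*inputs) != out]
print(mismatches)
```

An empty list means the two-term expression reproduces every row of the table.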
You can make a Karnaugh Map, which will help you pick the gates you need to implement your function.