Should I normalize or not? If yes how? - mysql

Currently I have a table with a column containing CSVs. I am not sure whether to normalize the whole table or not. The problem is that this column, configuration, may contain 50 or more different values. For example, in the table shown below it's 18, 20, but for other data in the same column it may be 0, 20, 21, 22, 23, 25, 26, 27, 40, 52, 54, 55 and so on. The values within a row are unique, however; they will never repeat.
I do not know the maximum number of values (it may vary), which is why I kept it as CSV. I am currently having trouble normalizing it, or rather I am not sure whether I should even normalize it. Any help here?
id     tester_type  device_id  board_id  configuration
75946  UFLEX        997        220
44570  UFLEX        450        220       18,20
44569  UFLEX        449        220       18,20
44568  UFLEX        448        220       18,20
44567  UFLEX        447        220       18
Note: Configuration column does also contain empty values or empty spaces.

I do have to query against it so I guess I have to normalize it.
Yes, you do :)
If I do create the table, does that mean I have to create one for every possible configuration value?
An example of a normalised structure would be:
join table
==========
test_id  configuration_id  (spanning unique constraint)
-------  ----------------
44570    18
44570    20
44569    18
44569    20
44568    18
44568    20
44567    18
configurations table
====================
configuration_id
----------------
18
20
If you're using InnoDB, each column of the join table is also a foreign key to its respective parent table.
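As a rough sketch of how the existing CSV column could be migrated into that structure, the snippet below splits each configuration string and fills the two tables. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name test_configurations and the helper split_configuration are made up for illustration. The foreign key from test_id to the parent test table is left out for brevity.

```python
import sqlite3

# Rows copied from the question: (id, configuration CSV). 75946 has no configuration.
rows = [
    (75946, ""),
    (44570, "18,20"),
    (44569, "18,20"),
    (44568, "18,20"),
    (44567, "18"),
]

def split_configuration(csv_value):
    """Return the distinct integer values in a CSV string, ignoring blanks."""
    return sorted({int(part) for part in csv_value.split(",") if part.strip()})

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE configurations (configuration_id INTEGER PRIMARY KEY);
    CREATE TABLE test_configurations (          -- hypothetical name for the join table
        test_id          INTEGER NOT NULL,
        configuration_id INTEGER NOT NULL REFERENCES configurations (configuration_id),
        PRIMARY KEY (test_id, configuration_id)  -- the spanning unique constraint
    );
""")

for test_id, csv_value in rows:
    for configuration_id in split_configuration(csv_value):
        # One parent row per distinct value, one join row per (test, value) pair.
        conn.execute("INSERT OR IGNORE INTO configurations VALUES (?)", (configuration_id,))
        conn.execute("INSERT INTO test_configurations VALUES (?, ?)", (test_id, configuration_id))

# Querying against a configuration value is now a plain equality lookup.
tests_with_20 = [r[0] for r in conn.execute(
    "SELECT test_id FROM test_configurations WHERE configuration_id = 20 ORDER BY test_id")]
print(tests_with_20)
```

Note that the configurations table gets one row per distinct value, not one table or column per value, so new values only add rows.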

I disagree with both the "must" and the "must not" normalize stances. My 2 cents:
Do not normalize "continuous" values such as prices, numbers, dates, floats, etc.
Do not normalize values that are unique or nearly so.
Do not normalize fields that are narrow. For example, don't replace a 2-letter country code with a 4-byte country_id.
"Normalize for simplicity": Do normalize things that are used in multiple tables and are subject to change. Sometimes names, addresses, company names, etc fall into this category. This is so you can change the value in exactly one place, not lots of places.
"Normalize for space": Do normalize things that would save a significant amount of overall space for the dataset. (This applies to gigabyte tables much more so than to kilobyte tables.)
Normalize, but don't "over-normalize". You will figure out what I mean when you have over-normalized and a nasty JOIN can't be optimized.
If you would like further specific advice, let's see SHOW CREATE TABLE and sample values for any un-obvious columns.

Related

Store all possible combinations of a specific number range

Suppose I have 1000 numbers, 1 through 1000, and a user can have any combination of them (e.g. 4, 25, 353, ...).
How can I efficiently store that combination in a MySQL DB?
What I thought: I can use powers of 2 and store each number as a bit in a really large int, like:
1 -> 01
2 -> 10
4 -> 100
etc.
So if I happen to get the number 6 (110) I know the user has the combination of numbers 2, 4 (2 | 4 = 6) .
So we can have 2^1000 combinations, which takes 125 bytes. But that is not efficient at all, since BIGINT has only 8 bytes and I can't store that in MySQL without using VARCHARs etc. Node.js can't handle a number that big either (and neither can I), with 2^53-1 being the max.
Why I am asking this question: can I do the above with base 10 instead of 2 and minimize the max bytes that the int can be? That was silly, and I think switching to base 10 or any base other than 2 changes nothing.
Edit: Additional thoughts:
One possible solution is to split them into sets of 16-digit numbers, convert those to strings, concatenate them with a delimiter, and store that instead of numbers. (Potentially replace runs of 1's or 0's with a certain character to make it even smaller. I have a feeling that falls into the field of compression, but nothing better has come to my mind.)
Based on your question, I am assuming you are optimizing for space.
If most users have many numbers from the set, then 125 bytes the way you described is the best you can do. You can store that in a BINARY(125) column, though. In Node.js you could just use a Buffer (you could use a plain string, but you should use a Buffer) to operate on the 125-byte bit field.
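To make the bit-field idea concrete, here is a minimal sketch in Python rather than Node.js (a Buffer would be handled the same way, byte by byte). The function names pack, unpack and contains are invented for this example, and the bit order within each byte (least significant bit first) is an arbitrary choice.

```python
SET_SIZE = 1000                      # numbers 1..1000, as in the question
FIELD_BYTES = (SET_SIZE + 7) // 8    # 125 bytes, fits a BINARY(125) column

def pack(numbers):
    """Pack numbers in 1..SET_SIZE into a fixed-width bit field."""
    field = bytearray(FIELD_BYTES)
    for n in numbers:
        if not 1 <= n <= SET_SIZE:
            raise ValueError(f"{n} is outside 1..{SET_SIZE}")
        index = n - 1
        field[index // 8] |= 1 << (index % 8)   # set the bit for this number
    return bytes(field)

def contains(field, n):
    """Return True if the bit for number n is set."""
    index = n - 1
    return bool(field[index // 8] & (1 << (index % 8)))

def unpack(field):
    """Return the sorted list of numbers whose bit is set."""
    return [n for n in range(1, SET_SIZE + 1) if contains(field, n)]

packed = pack([4, 25, 353])
print(len(packed), unpack(packed))
```

Because every byte is handled on its own, no integer ever exceeds 255, which sidesteps the 2^53-1 limit mentioned in the question.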
If most users have only a few elements in the set then it will take less space to have a separate table with two columns such as:
user_id | has_element (SMALLINT)
---------------------
1 | 4
1 | 25
1 | 353
2 | 7
2 | 25
2 | 512
2 | 756
2 | 877
This will also make queries cleaner and more efficient for doing simple queries like SELECT user_id FROM user_elements WHERE has_element = 25;. You should probably add an index on has_element if you do queries like that to make them many times more efficient than storing a bitfield in a column.
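The two-column layout and its index can be tried out quickly. The sketch below uses Python's sqlite3 module in place of MySQL, and the index name is made up. The break-even figure at the end is a back-of-the-envelope estimate that assumes a 4-byte INT user_id and a 2-byte SMALLINT and ignores row and index overhead, so the real crossover point will be lower.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_elements (
        user_id     INTEGER NOT NULL,
        has_element INTEGER NOT NULL,   -- SMALLINT in MySQL
        PRIMARY KEY (user_id, has_element)
    );
    -- Secondary index so lookups by element do not scan the whole table.
    CREATE INDEX idx_user_elements_has_element ON user_elements (has_element);
""")
conn.executemany("INSERT INTO user_elements VALUES (?, ?)", [
    (1, 4), (1, 25), (1, 353),
    (2, 7), (2, 25), (2, 512), (2, 756), (2, 877),
])

users_with_25 = [r[0] for r in conn.execute(
    "SELECT user_id FROM user_elements WHERE has_element = 25 ORDER BY user_id")]
print(users_with_25)

# Rough space comparison: data bytes per row versus the fixed 125-byte bit field.
INT_BYTES, SMALLINT_BYTES, BITFIELD_BYTES = 4, 2, 125
break_even_elements = BITFIELD_BYTES // (INT_BYTES + SMALLINT_BYTES)
print(break_even_elements)
```

Under those assumptions, a user with more than about 20 elements already costs more in the two-column table than in the bit field.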

Finding Recurring Number Combinations in Column of Numbers

I have searched and found discussions and solutions to similar problems, but not quite or as complex as I'm trying to figure out.
I have an access table which consists of two columns Draw Number and Number Drawn as shown below. Draw Number is repeated 20 times, to correspond to the 20 numbers that are drawn in each particular draw.
I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc.
I've created parameter queries which allow me to search for different number combinations from 2 to 10 numbers, and they work OK, returning the number of occurrences of a combination of numbers that I input through a simple UI. But the goal is to figure out programmatically what the optimum combination of numbers is.
Hope this makes sense. By the way, there are 36 million or so rows in the table. The parameter queries work quite well, however: it takes just over a second to return results for each number added. So, a two-number query = 2 second wait, three numbers = 3 second wait, etc.
I've been thinking about a loop of some type but don't know how to get started? Processing time isn't an issue; can take a day if required!
This is written in VBA and has an assortment of queries, temp tables, etc to get the job done.
The text says Access, but the tags say MySql, which is it? – RBarryYoung 21 hours ago
This part confuses me: I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc. – Newd 21 hours ago
^What do you mean five numbers? Nowhere in your sample data do I see 12341. Please explain using the data you have, and give expected results using that data. – McAdam331 21 hours ago
drosberg - clarification:
thanks for the response. It is an Access application, but as a first-time poster Stackoverflow recommends tags?
By five numbers I mean the most frequently occurring group of five numbers (I used five as an example, could be groups of 2 to 10 numbers) which occur in each draw, where a draw consists of 20 drawn numbers from a total of 80 numbers. So the data that I posted was intended as an example. The sample provided only has 50, 51 in common. I can plug 50 and 51 into the parameter query and it will tell me that this combination occurs 60,000 times (or whatever), but perhaps 50 and 57 occurs 65,000 times.
If I were to do this manually, and assuming I'm looking for the most frequent 5-number combination, I would enter the following in the parameter query:
1,2,3,4,1 group = 30,000 occurrences
1,2,3,4,2 group = 31,000 occurrences
1,2,3,4,3 group = 31,050 occurrences
1,2,3,4,4 group = 29,050 occurrences
etc.
but I would have to do this for every combination of 5 numbers that can be derived from the numbers 1 through 80. I'm hoping to have a program do the work!
thanks
don
DRAW NUMBER NUMBER DRAWN
1 1
1 28
1 19
1 3
1 38
1 46
1 43
1 29
1 13
1 22
1 20
1 11
1 50
1 51
1 53
1 54
1 57
1 64
1 76
1 78
2 29
2 14
2 2
2 1
2 35
2 40
2 39
2 30
2 10
2 27
2 21
2 6
2 42
2 50
2 51
2 53
2 54
2 61
2 65
2 69
I wrote a post a while ago about generating permutations with and without repetition using Excel. Perhaps you can use it.
https://michiel.wordpress.com/2015/03/29/permutations-with-repetition-using-excel/
Here's how it works. I am using strings, but you can easily modify that for numbers (since you say you need 5).
You can use the MID function to grab a single char from a string, and generate permutations from it.
=MID(Pattern,MOD([N]/[P],Length)+1,1)
N refers to the column N
P refers to the horizontal row (1,4,16). You can generate these with a formula like =4^.
After putting in the code, you can make a list of all permutations in Excel and in the cell next to it generate a sql query that you can perform as well from VBA.
Example: Looking up Access database in Excel
Or find a commercial tool like http://thingiequery.com/
I don't know if there's any open source tools for it.
I'm thinking that you should consider:
Say there are 100 balls.
Set up a table with one row for each "Draw number" and 100 columns, one for every possible number, each of type boolean.
When you look to see which draws had number 23 you just add a
WHERE Column23 = true.
For numbers 23 and 56
WHERE Column23 = true AND Column56 = true
This should massively simplify and speed up your SQL.
You set up a table with every possible combination of numbers.
You run SQL to find the counts.
Harvey
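For comparison, here is a different approach from the ones suggested above: instead of testing candidate combinations one at a time, walk through each draw once, enumerate its k-number subsets, and tally them. This is only a sketch in Python (the application in the question is VBA), and the function name most_common_combinations is made up. The sample data is the two draws posted in the question.

```python
from collections import Counter
from itertools import combinations

def most_common_combinations(rows, size, top=1):
    """rows: iterable of (draw_number, number_drawn) pairs, as in the table.
    Returns the `top` most frequent `size`-number combinations with their counts."""
    draws = {}
    for draw_number, number_drawn in rows:
        draws.setdefault(draw_number, set()).add(number_drawn)

    counts = Counter()
    for numbers in draws.values():
        # sorted() gives each combination one canonical tuple regardless of draw order.
        counts.update(combinations(sorted(numbers), size))
    return counts.most_common(top)

draw_1 = [1, 28, 19, 3, 38, 46, 43, 29, 13, 22, 20, 11, 50, 51, 53, 54, 57, 64, 76, 78]
draw_2 = [29, 14, 2, 1, 35, 40, 39, 30, 10, 27, 21, 6, 42, 50, 51, 53, 54, 61, 65, 69]
rows = [(1, n) for n in draw_1] + [(2, n) for n in draw_2]

result = most_common_combinations(rows, size=6)
print(result)
```

On scale: each 20-number draw contains C(20,5) = 15,504 five-number subsets, and 36 million rows correspond to about 1.8 million draws, so a full run would make roughly 28 billion tally updates over at most C(80,5) = 24,040,016 distinct combinations. That is likely to be slow and memory-hungry in plain Python, so treat this as a statement of the method rather than a tuned implementation.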

Access Calculated Field

I am having difficulty trying to make a calculated field that I need. So here is what I am trying to do:
I have a query that combines the information from three tables. The most important fields for the application are as follows:
Family Income  Age  Patient
15,000         18   Yes
28,000         25   No
30,000         1    Yes
From here I want to make a calculated field that gives the correct program the patient was enrolled in, based on these fields, i.e.:
Program      Minimum Income  Maximum Income  Minimum Age  Maximum Age  Patient
Children's   0               20,000          1            19           Yes
Adult        0               12,000          19           65           No
Non Patient  0               20,000          1            19           No
Adult 2      12,000          50,000          19           65           No
Etc.
to create:
Family Income  Age  Patient  Program
15,000         18   Yes      Children's
28,000         25   No       Adult 2
30,000         1    Yes      Children's 2
I know I can use IIf to hard-code it into the field, but then it will be really difficult for other people to update the information as the guidelines change. Is it possible to store the information in a table and look it up from there, or will I need to use IIf?
Any ideas? Is it possible to dynamically create the IIf in SQL using VBA while pulling the information from the table?
EDIT:::
Thank you for your response and for formatting my tables, I still have no idea how you changed it, but it looks amazing!
I tried to add the SQL you added down below, but I was not able to make it work. I'm not sure if I made a mistake, so I included the SQL of my query. The query currently returns 0 rows, so I think I messed something up. (The real query is embarrassing... I'm sorry for that.) Unfortunately, I have done everything in my power to avoid SQL, and now I am paying the price.
SELECT qry_CombinedIndividual.qry_PrimaryApplicant.[Application Date],
qry_CombinedIndividual.qry_PrimaryApplicant.[Eligibility Rep],
qry_CombinedIndividual.qry_PrimaryApplicant.Name,
qry_CombinedIndividual.qry_PrimaryApplicant.Clinic,
qry_CombinedIndividual.qry_PrimaryApplicant.Outreach,
qry_CombinedIndividual.qry_PrimaryApplicant.[Content Type ID],
qry_CombinedIndividual.qry_PrimaryApplicant.[Application Status],
qry_CombinedIndividual.qry_PrimaryApplicant.Renewal,
qry_CombinedIndividual.qry_Enrolled.EthnicityEnr,
qry_CombinedIndividual.qry_Enrolled.GenderEnr, qry_CombinedIndividual.AgeAtApp,
qry_CombinedIndividual.[Percent FPL], tbl_ChildrensMedical.MinPercentFPL,
tbl_ChildrensMedical.MaxPercentFPL, tbl_ChildrensMedical.MinAge,
tbl_ChildrensMedical.MaxAge, tbl_ChildrensMedical.Program
FROM qry_CombinedIndividual
INNER JOIN tbl_ChildrensMedical ON qry_CombinedIndividual.qry_Enrolled.Patient = tbl_ChildrensMedical.Patient
WHERE (((qry_CombinedIndividual.AgeAtApp)>=[tbl_ChildrensMedical].[MinAge]
And (qry_CombinedIndividual.AgeAtApp)<[tbl_ChildrensMedical].[MinAge])
AND ((qry_CombinedIndividual.[Percent FPL])>=[tbl_ChildrensMedical].[MinPercentFPL]
And (qry_CombinedIndividual.[Percent FPL])<[tbl_ChildrensMedical].[MaxPercentFPL]));
Also there are many different programs. Here is the real Children's Table (eventually I would like to add adults if possible)
*Note: the actual table uses FPL (which takes family size into account, but is used the same way as income). I am again at a total loss as to how you formatted the table.
Program             Patient  MinPercentFPL  MaxPercentFPL  MinAge  MaxAge
SCHIP (No Premium)  No       0              210            1       19
SCHIP (Tier 1)      No       210            260            1       19
SCHIP (Tier 2)      No       260            312            1       19
Newborn             No       0              300            0       1
Newborn (Patient)   Yes      0              300            0       1
Children's Medical  Yes      0              200            1       19
CHIP (20 Premium)   Yes      200            250            1       19
CHIP (30 Premium)   Yes      250            300            1       19
Do I have the correct implementation for the table I have? Or should I be changing something. I can also send more information/sample data if that would help.
Thank you again!
I just created some tables with your sample data and used the following SQL. Your 3rd 'patient' doesn't match any of the ranges (Age 1, Income $30K)
SELECT tblPatient.PatName, tblPatient.FamInc, tblPatient.Age, tblPatient.Patient,
tblPatientRange.Program, tblPatientRange.MinInc, tblPatientRange.MaxInc, tblPatientRange.MinAge,
tblPatientRange.MaxAge, tblPatientRange.Patient
FROM tblPatient INNER JOIN tblPatientRange ON tblPatient.Patient = tblPatientRange.Patient
WHERE (((tblPatient.FamInc)>=[tblPatientRange]![MinInc] And (tblPatient.FamInc)<=[tblPatientRange]![MaxInc])
AND ((tblPatient.Age)>=[tblPatientRange]![MinAge] And (tblPatient.Age)<=[tblPatientRange]![MaxAge]));
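One thing worth checking in the query posted in the question: its age condition compares AgeAtApp against MinAge on both sides (>= MinAge and < MinAge), which no row can satisfy, and that alone would explain an empty result. The sketch below shows the same table-driven range lookup against the real Children's table from the question. It uses Python's sqlite3 module instead of Access, the function name find_program is invented, and it assumes half-open ranges (>= minimum, < maximum) as in the asker's query rather than the inclusive <= used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tbl_ChildrensMedical (
    Program TEXT, Patient TEXT,
    MinPercentFPL INTEGER, MaxPercentFPL INTEGER,
    MinAge INTEGER, MaxAge INTEGER)""")
conn.executemany("INSERT INTO tbl_ChildrensMedical VALUES (?, ?, ?, ?, ?, ?)", [
    ("SCHIP (No Premium)", "No", 0, 210, 1, 19),
    ("SCHIP (Tier 1)", "No", 210, 260, 1, 19),
    ("SCHIP (Tier 2)", "No", 260, 312, 1, 19),
    ("Newborn", "No", 0, 300, 0, 1),
    ("Newborn (Patient)", "Yes", 0, 300, 0, 1),
    ("Children's Medical", "Yes", 0, 200, 1, 19),
    ("CHIP (20 Premium)", "Yes", 200, 250, 1, 19),
    ("CHIP (30 Premium)", "Yes", 250, 300, 1, 19),
])

def find_program(patient, percent_fpl, age):
    """Look up the program whose FPL and age ranges contain the given values."""
    row = conn.execute("""
        SELECT Program FROM tbl_ChildrensMedical
        WHERE Patient = ?
          AND ? >= MinPercentFPL AND ? < MaxPercentFPL
          AND ? >= MinAge        AND ? < MaxAge
    """, (patient, percent_fpl, percent_fpl, age, age)).fetchone()
    return row[0] if row else None   # None when no range matches

print(find_program("Yes", 150, 10))
print(find_program("No", 230, 5))
```

Because the thresholds live in the table, changing a guideline means editing a row instead of rewriting a nested IIf.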

Updating the queue priority automatically

I have this table, which decides the order of the articles displayed in my web portal:
Table- ARTICLE_POSITION
`article_id` int(12) NOT NULL,
`article_position` int(11) NOT NULL
Say this is the sample data in the table:
article_id article_position
56 1
58 2
443 3
88 4
5667 5
322 6
To show the relevant articles, I use a query that sorts them by article_position and display them on the webpage. The problem is that when I move an article to a different position, say Article 5667 from position 5 to position 1, I have to update the position of every article lying between 1 and 5 using an UPDATE query.
Final table contents:
article_id article_position
5667 1*
56 2*
58 3*
443 4*
88 5*
322 6
( * position updated )
This update query becomes really time-consuming and inefficient when the database is large. Is there any other way of doing it?
If you want to avoid re-ordering all of your articles, try using another datatype for article_position other than an int. Go with something like a timestamp. Then your ordering query can present the articles in order of newest timestamp to oldest. If you need to move one article to the top, just assign a newer timestamp to its article_position element. This should solve the problem of having to reorder all of the article_position elements in your original example.
I'm making the assumption that you're looking for a solution to insert at any point in the article_position order. I can't think of a solution where you would never have to update article_position for all article_id's, but you could offset the need to do it EVERY time an article_position changed. Rather than incrementing each article_position by 1, add some padding for insertion through an increased position increment (5, 10, 25, 100, etc...). This would leave room for changing an existing article's article_position without having to update all of the other articles.
To demonstrate with your example:
article_id article_position
56 5
58 10
443 15
88 20
5667 25
322 30
After re-order (* = updated article_position)
article_id article_position
5667 2*
56 5
58 10
443 15
88 20
322 30
Eventually, you would need to update all of the article_position's to keep from running out of insertion space between some articles. But this could be done at some planned maintenance interval rather than every time an article_position changes.
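The padding scheme can be sketched as follows. This is an illustration in Python that works on an in-memory dict of article_id to article_position; the function names are made up, and in practice each changed entry would become one UPDATE statement. The gap of 5 matches the example above.

```python
GAP = 5  # spacing between neighbouring positions, as in the example

def position_between(before, after):
    """Return a free integer strictly between two positions, or None if the gap is used up."""
    middle = (before + after) // 2
    return middle if before < middle < after else None

def renumber(positions):
    """Rewrite all positions as GAP, 2*GAP, ... keeping the order (the maintenance step)."""
    ordered = sorted(positions, key=positions.get)
    return {article_id: (i + 1) * GAP for i, article_id in enumerate(ordered)}

def move_before(positions, article_id, target_id):
    """Move article_id directly in front of target_id, renumbering only when needed."""
    others = {a: p for a, p in positions.items() if a != article_id}
    target_pos = others[target_id]
    previous = max((p for p in others.values() if p < target_pos), default=0)
    new_pos = position_between(previous, target_pos)
    if new_pos is None:
        # No room left between the neighbours: fall back to a full renumbering.
        others = renumber(others)
        target_pos = others[target_id]
        new_pos = position_between(target_pos - GAP, target_pos)
    others[article_id] = new_pos
    return others

positions = {56: 5, 58: 10, 443: 15, 88: 20, 5667: 25, 322: 30}
moved = move_before(positions, 5667, 56)
print(sorted(moved, key=moved.get))
```

In the common case only the moved article's row changes; the full renumbering runs only when two neighbours have no free integer left between them.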

What would be the best practice to store multiple 2-digit datasets in MySQL server

Let's say I want to store several datasets, i.e.
78 94 33 22 14 55 18 10 11
44 59 69 79 39 49 29 19 39
Later on, I would like to be able to run queries that determine the frequency of a certain number. What would be the best way to do this? What table structure would make for a fast query?
Please be as specific as you can.
To get the counts, you can run a query such as:
SELECT value, COUNT(*) FROM table_of_values GROUP BY value
Placing an index on the single integer value column is pretty much all you can do to speed that up.
You could of course also just keep a table with every two-digit value and a count. You will have to pre-fill the table with zero counts for every value.
Then increment the count instead of inserting:
UPDATE table_of_values SET count = count + 1 WHERE value = (whatever)
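Both variants can be compared side by side. The sketch below uses Python's sqlite3 module in place of MySQL and loads the two datasets from the question. The dataset_id column, the index name, and the separate value_counts table name are additions for illustration (the answer above reuses the name table_of_values for both variants).

```python
import sqlite3

datasets = [
    [78, 94, 33, 22, 14, 55, 18, 10, 11],
    [44, 59, 69, 79, 39, 49, 29, 19, 39],
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: one row per stored value, indexed on the value.
    CREATE TABLE table_of_values (dataset_id INTEGER NOT NULL, value INTEGER NOT NULL);
    CREATE INDEX idx_table_of_values_value ON table_of_values (value);
    -- Variant 2: one row per possible value with a running count.
    CREATE TABLE value_counts (value INTEGER PRIMARY KEY, count INTEGER NOT NULL DEFAULT 0);
""")
# Pre-fill zero counts for every two-digit value.
conn.executemany("INSERT INTO value_counts (value) VALUES (?)", [(v,) for v in range(10, 100)])

for dataset_id, values in enumerate(datasets, start=1):
    for value in values:
        conn.execute("INSERT INTO table_of_values VALUES (?, ?)", (dataset_id, value))
        conn.execute("UPDATE value_counts SET count = count + 1 WHERE value = ?", (value,))

grouped = dict(conn.execute("SELECT value, COUNT(*) FROM table_of_values GROUP BY value"))
counted = dict(conn.execute("SELECT value, count FROM value_counts WHERE count > 0"))
print(grouped[39], counted[39])
```

The counter table answers frequency queries with a single-row lookup but loses the information about which dataset a value came from, so keeping both tables, as done here, is one option if that information matters.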