I have searched and found discussions and solutions to similar problems, but not quite or as complex as I'm trying to figure out.
I have an access table which consists of two columns Draw Number and Number Drawn as shown below. Draw Number is repeated 20 times, to correspond to the 20 numbers that are drawn in each particular draw.
I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc.
I've created parameter queries which allow me to search for different number combinations from 2 to 10 numbers, and they work OK returning the number of occurrences of a combination of numbers that I input through a simple UI. But the goal is to figure out pragmatically what the optimum combination of numbers.
Hope this makes sense. And by the way, there are 36 million or so rows in the table. The para queries work quite well however; it takes just over a second to return results for each number added. So, query two numbers = 2 second wait, three numbers = 3 second wait, etc.
I've been thinking about a loop of some type but don't know how to get started? Processing time isn't an issue; can take a day if required!
This is written in VBA and has an assortment of queries, temp tables, etc to get the job done.
The text says Access, but the tags say MySql, which is it? – RBarryYoung 21 hours ago
This part confuses me: I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc. – Newd 21 hours ago
^What do you mean five numbers? No where in your sample data do I see 12341. Please explain using the data you have, and give expected results using that data. – McAdam331 21 hours ago
drosberg - clarification:
thanks for the response. It is an Access application, but as a first-time poster Stackoverflow recommends tags?
By five numbers I mean the most frequently occurring group of five numbers (I used five as an example, could be groups of 2 to 10 numbers) which occur in each draw, where a draw consists of 20 drawn numbers from a total of 80 numbers. So the data that I posted was intended as an example. The sample provided only has 50, 51 in common. I can plug 50 and 51 into the parameter query and it will tell me that this combination occurs 60,000 times (or whatever), but perhaps 50 and 57 occurs 65,000 times.
If i was to do this manually, and assuming I'm looking for the most frequent 5 number combination I would enter the following in the parameter query: 1,2,3,4,1 group = 30,000 occurrences 1,2,3,4,2 group = 31,000 occurrences 1,2,3,4,3 group = 31,050 occurrences 1,2,3,4,4 group = 29,050 occurrences etc........... etc...........
but I would have to do this for every combination of 5 numbers that can be derived from the numbers 1 thru 80. I'm hoping to have program do the work!!
thanks
don
DRAW NUMBER NUMBER DRAWN
1 1
1 28
1 19
1 3
1 38
1 46
1 43
1 29
1 13
1 22
1 20
1 11
1 50
1 51
1 53
1 54
1 57
1 64
1 76
1 78
2 29
2 14
2 2
2 1
2 35
2 40
2 39
2 30
2 10
2 27
2 21
2 6
2 42
2 50
2 51
2 53
2 54
2 61
2 65
2 69
I wrote a post a while ago about generating permutations with and without repetition using Excel. Perhaps you can use it.
https://michiel.wordpress.com/2015/03/29/permutations-with-repetition-using-excel/
Here's how it works. I am using strings, but you can easily modify that for numbers (since you say you need 5).
You can use the MID function to grab a single char from a string, and generate permutations from it.
=MID(Pattern,MOD([N]/[P],Length)+1,1)
N revers to the column N
P refers to the horizontal row (1,4,16). You can generate these with a formula like =4^.
After putting in the code, you can make a list of all permutations in Excel and in the cell next to it generate a sql query that you can perform as well from VBA.
Example: Looking up Access database in Excel
Or find a commercial tool like http://thingiequery.com/
I don't know if there's any open source tools for it.
I'm thinking that you should consider:
Say there are 100 balls.
Setting up a table to have one row for each "Draw number" with 100 columns one for every possible number each column has type boolean.
When you look to see which draws had number 23 you just add a
WHERE Column23 = true.
For numbers 23 and 56
WHERE Column23 = true AND Column56 = true
This should massivel simplify and speed up your SQL.
You set up a table with every possible combination of numbers.
You run SQL to find the counts.
Harvey
Related
I'm not sure if SSRS is dumb, or I am (I'm leaning towards both).
I have a dataset that (as a result of joins etc) has some columns with the same values duplicated across every row (fairly standard database stuff):
rid cnt bid flg1 flg2
-------------------------------
4 2882 1 17 3
5 2784 1 17 3
6 1293 1 17 3
18 9288 2 4 9
20 762 2 4 9
Reporting based on cnt is straightforward enough. I can also make a tablix that shows the following:
bid flg1 flg2
------------------
1 17 3
2 4 9
(Where the tablix is grouped by Fields!bid.Value and the columns are just Fields!flg1.Value and Fields!flg2.Value respectively.)
What I can't figure out is how to display the sum of these values -- specifically I want to show that the sum of flg1 is 21 and the sum of flg2 is 12 -- not the sum of every row in the dataset (counting each value more than once).
(Note that I'm not looking for a sum of distinct values, as they may not be unique. I want a sum of one value from each bid group, because it's from a table join so they will always have the same value.)
If possible, I'd also like to be able to do a similar calculation at the top level of the report (not in any tablix); although I'd settle for hiding the detail row if that's the only way.
Obviously, Sum(Fields!flg1.Value) isn't the answer, as this either returns 51 (if on the first row inside the group) or 59 (if outside it).
I also tried Sum(Fields!flg1.Value, "bid") but this wasn't considered a valid scope.
I also tried Sum(First(Fields!flg1.Value, "bid")) but apparently you're not allowed to sum first values for some weird reason (and may have had the same scope problem anyway).
Using Sum(Max(Fields!flg1.Value, "bid")) does work, but feels wrong. Is there a better way to do this?
(Related: is there a good way to save the result of that calculation so that I can later also show a Sum of those totals without an even hairier expression?)
There are two basic ways to do this.
Do what you have already done (Sum(Max(Fields!flg1.Value, "bid")))
Sum the rendered values. To do this check the name of the cell containing the data you want (check it's properties) and then use something like =SUM(ReportItems!flg1.Value) where flg1 is the name of the textbox, which is not necessarily always the same name as the field.
Is there a function to reduce the amount of redundant data from one column to match the number of cells in a second column?
I have logged data from two sensors that sent values at different rates. in 8 hours, I collected 11857 values for the first sensor and 8130 for the second one.
I need to compress the first column by deleting data to match the number of cells on the second column, so I can display synchronized values on a chart.
It is not a matter of cutting 3727 cells from the head or tail of the first column, but to delete cells in a proportional way.
I've tried using de Modulus function, but it does not give me the right amount of compression; e.g., by running =MOD(A1,3) and then filtering cells containing '0' value and deleting those rows, I get 7905, which is close to 8130 but still, the data is shifted out.
Edit:
I found a method that requires several steps:
Copy the sensors' data into two columns
Get the number of cells for both columns using COUNTA
Get the ratio between the smaller count over the bigger count
In a new column, create an index for the rows using =INT(ROW()*ratio)
Remove duplicate rows using the index column as the reference with Data > Remove Duplicates
It works, but it will be much faster if there was a ready-made function that will run over the provided data columns and copy the values into two new columns
I tested this solution in LibreOffice Calc. The functions used are basic enough to be found in Excel as well.
Here's a sample with data from 2 sensors, s1 and s2, similar to yours:
Row s1 s2
1 2 3
2 4 6
3 6 9
4 8 12
5 10 15
6 12 18
7 14 21
8 16
9 18
10 20
11 22
What I did was match the data from s1 samples with those from s2 that relatively match the position of the first, so instead of ending up with a number of rows with no s2 values, I padded non-existent s2 values with the last sample taken for any given period of time (column s2a)
Row s1 s2 s2a
1 2 3 3
2 4 6 6
3 6 9 6
4 8 12 9
5 10 15 12
6 12 18 12
7 14 21 15
8 16 18
9 18 18
10 20 21
11 22 21
Assuming that s1 is column A and s2 is column B in the spreadsheet, the function you want on each cell of the new column is:
=INDIRECT( ADDRESS( CEILING( ROW()* COUNT(B:B)/COUNT(A:A)),2))
Let's go from bottom to top:
COUNT(B:B)/COUNT(A:A) - this is the ratio. 0.63' above. It indicates that each sample in any given row in s1 will be found at that row x 0.63 in column s2.
Ceiling - Spreadsheets don't start at row 0, so the first one HAS to be 1. I experimented with Int(), but if the ratio were less than 0.5 we would end up with a 0, which we don't want.
Address - Returns a string with the address of a cell given its row,column coordinates (e.g. Address(3, 2) = "B3" and Address(3,2,2) as used here, will yield an absolute column or "$B3").
Indirect - Returns the contents of a cell whose address is passed as a string (e.g. Address("x5") will return whatever value is stored in cell X5).
Alex
here are records and we want move id #1 between #3 & #4
id title sort
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
method one :
get #3 sort number and plus 1 to it and update #1 sort with that so we have
id title sort
1 a 4
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
and then plus 1 to #4 sort and any records after that
and we have
id title sort
1 a 4
2 b 2
3 c 3
4 d 5
5 e 6
6 f 7
and after sort
id title sort
2 b 2
3 c 3
1 a 4
4 d 5
6 e 6
6 f 7
it works fine but imagine we have 2,000,000 records and all records must be update...
method two
get sum sort of #3 and #4 and divide on 2 => (3+4)/2=3.5
and just put it for #1 sort
id title sort
2 b 2
3 c 3
1 a 3.5
4 d 4
5 e 5
6 f 6
it is work fine too but imagine thousand of this operation make big floats like 3.99999999999 and after a while its horrible
is there any mysql/mariadb trick or method for do this ?
Your "drop it half-way between items" method may be the best.
Let's go with BIGINT UNSIGNED since it gives you 64 bits in 8 bytes. Less good: DOUBLE would give you 53 bits in 8 bytes, and some funny business with exponents. DECIMAL gives you more bits at a cost of more bytes, while not eliminating the need for the following code.
You know which row to put it "after" based on user input?
Discover the row after by using ORDER BY ... ASC LIMIT 1.
Average the two values; check to see if the avg is equal either of them -- if so, you have a bad case.
Digression... 2M rows. Start with 2K, 4K, 6K, etc as the sort values (2M*2K = 4G, the limit of BIGINT UNSIGNED.)
This says you can squeeze 2K items between any adjacent pair. However, in the worst case of repeatedly inserting exactly after the first value, you get only 11 inserts before hitting the wall. 11 ~= log2(2000). That is, the re-sort is may be quick, but up to 1 time in 11, it will be costly.
(Please don't quibble between 2K meaning 2000 vs 2048; it does not matter to the algorithm.)
So, what to do when there is no room to insert a new sort value? Rebuilding the numbers would lock the table (of 2M rows) for "too long", so let's try to avoid that.
How about this:
Grab the 10 rows before and after (2 SELECTs with ORDER BY and LIMIT). Fix those sort values so that they are evenly spread out.
Possibly no issue with hitting the start or end of the table; it would be less than 20 rows. And there is a silent 0 and 4G-1 boundaries.
If the 20 rows are not enough, then broaden the span.
Do all this (including the original, simple, half-way code) in a transaction.
Use FOR UPDATE on all(?) SELECTs so that other threads are blocked.
Check for deadlocks. If encountered, start over completely. (The second try will probably find that the half-way attempt works fine -- because some other thread finished spreading the sort values out.)
Timing:
The half-way case, even with transaction, will probably take a millisecond or so.
The more complex case won't take much longer, in spite of locking and updating 20 rows.
You could probably handle 1K actions per second.
I am having difficulty trying to make a calculated field that I need. So here is what I am trying to do:
I have a query that combines the information based on three tables. The most important fields that for the application are as follows:
Family Income Age Patient
15,000 18 Yes
28,000 25 No
30,000 1 Yes
From here I want to make a calculated field that gives the correct program the patient was enrolled in. based on these fields ie:
Program Minimum Income Maximum Income Minimum Age Maximum Age Patient
Children's 0 20,000 1 19 Yes
Adult 0 12,000 19 65 No
Non Patient 0 20,000 1 19 No
Adult 2 12,000 50,000 19 65 No
Etc.
to create:
Family Income Age Patient Program
15,000 18 Yes Children's
28,000 25 No Adult 2
30,000 1 Yes Children's 2
I know I can use IIf to hard code it in to the field, but then it will be really difficult for other people to update the information as the guidelines change. Is it possible to have the information stored in a table? and use the information on the table form etc, or will I need to use IIf
Any Ideas? is it possible to dynamically create the IIf in SQL using VBA while pulling the information from the table?
EDIT:::
Thank you for your response and for formatting my tables, I still have no idea how you changed it, but it looks amazing!
I tried to add the SQL you added down below, but I was not able to make it work. I'm not sure if I made a mistake so I included the SQL of my Query. The query currently returns 0 values, so I think I messed something up. (The real Query is embarassing...I'm sorry for that). Unfortunately, I have done everything in my power to avoid SQL, and now I am paying the price.
SELECT qry_CombinedIndividual.qry_PrimaryApplicant.[Application Date],
qry_CombinedIndividual.qry_PrimaryApplicant.[Eligibility Rep],
qry_CombinedIndividual.qry_PrimaryApplicant.Name,
qry_CombinedIndividual.qry_PrimaryApplicant.Clinic,
qry_CombinedIndividual.qry_PrimaryApplicant.Outreach,
qry_CombinedIndividual.qry_PrimaryApplicant.[Content Type ID],
qry_CombinedIndividual.qry_PrimaryApplicant.[Application Status],
qry_CombinedIndividual.qry_PrimaryApplicant.Renewal,
qry_CombinedIndividual.qry_Enrolled.EthnicityEnr,
qry_CombinedIndividual.qry_Enrolled.GenderEnr, qry_CombinedIndividual.AgeAtApp,
qry_CombinedIndividual.[Percent FPL], tbl_ChildrensMedical.MinPercentFPL,
tbl_ChildrensMedical.MaxPercentFPL, tbl_ChildrensMedical.MinAge,
tbl_ChildrensMedical.MaxAge, tbl_ChildrensMedical.Program
FROM qry_CombinedIndividual
INNER JOIN tbl_ChildrensMedical ON qry_CombinedIndividual.qry_Enrolled.Patient = tbl_ChildrensMedical.Patient
WHERE (((qry_CombinedIndividual.AgeAtApp)>=[tbl_ChildrensMedical].[MinAge]
And (qry_CombinedIndividual.AgeAtApp)<[tbl_ChildrensMedical].[MinAge])
AND ((qry_CombinedIndividual.[Percent FPL])>=[tbl_ChildrensMedical].[MinPercentFPL]
And (qry_CombinedIndividual.[Percent FPL])<[tbl_ChildrensMedical].[MaxPercentFPL]));
Also there are many different programs. Here is the real Children's Table (eventually I would like to add adults if possible)
*Note the actual table uses FPL (which takes family size into account, but is used the same as income). I am again at a total loss as to how you formated the table.
Program Patient MinPercentFPL MaxPercentFPL MinAge MaxAge
SCHIP (No Premium) No 0 210 1 19
SCHIP (Tier 1) No 210 260 1 19
SCHIP (Tier 2) No 260 312 1 19
Newborn No 0 300 0 1
Newborn (Patient) Yes 0 300 0 1
Children's Medical Yes 0 200 1 19
CHIP (20 Premium) Yes 200 250 1 19
CHIP (30 Premium) Yes 250 300 1 19
Do I have the correct implementation for the table I have? Or should I be changing something. I can also send more information/sample data if that would help.
Thank you again!
I just created some tables with your sample data and used the following SQL. Your 3rd 'patient' doesn't match any of the ranges (Age 1, Income $30K)
SELECT tblPatient.PatName, tblPatient.FamInc, tblPatient.Age, tblPatient.Patient,
tblPatientRange.Program, tblPatientRange.MinInc, tblPatientRange.MaxInc, tblPatientRange.MinAge,
tblPatientRange.MaxAge, tblPatientRange.Patient
FROM tblPatient INNER JOIN tblPatientRange ON tblPatient.Patient = tblPatientRange.Patient
WHERE (((tblPatient.FamInc)>=[tblPatientRange]![MinInc] And (tblPatient.FamInc)<=[tblPatientRange]![MaxInc])
AND ((tblPatient.Age)>=[tblPatientRange]![MinAge] And (tblPatient.Age)<=[tblPatientRange]![MaxAge]));
Let say i want to store several dataset ie
78 94 33 22 14 55 18 10 11
44 59 69 79 39 49 29 19 39
And later on i would like to be able run queries that will determine the frequency of certain number. What would be the best way to this? What would be table structure to make a fast query.
Please be specific as you can be.
To get the counts, you can run a query such as:
SELECT value, COUNT(*) from table_of_values GROUP BY value
Placing an index on the single integer value column is pretty much all you can do to speed that up.
You could of course also just keep a table with every two-digit value and a count. You will have to pre-fill the table with zero counts for every value.
Then increment the count instead of inserting:
UPDATE table_of_values SET count = count + 1 WHERE value = (whatever)