How to reduce redundant cells in a column containing logged data - mysql

Is there a function to reduce the amount of redundant data from one column to match the number of cells in a second column?
I have logged data from two sensors that sent values at different rates. In 8 hours, I collected 11857 values for the first sensor and 8130 for the second one.
I need to compress the first column by deleting data so that it matches the number of cells in the second column, so I can display synchronized values on a chart.
It is not a matter of cutting 3727 cells from the head or tail of the first column, but to delete cells in a proportional way.
I've tried using the MOD function, but it does not give me the right amount of compression; e.g., by running =MOD(A1,3), filtering the cells containing a '0' value and deleting those rows, I get 7905, which is close to 8130, but the data still ends up shifted.
Edit:
I found a method that requires several steps:
Copy the sensors' data into two columns
Get the number of cells for both columns using COUNTA
Get the ratio between the smaller count over the bigger count
In a new column, create an index for the rows using =INT(ROW()*ratio)
Remove duplicate rows using the index column as the reference with Data > Remove Duplicates
It works, but it would be much faster if there were a ready-made function that ran over the provided data columns and copied the matched values into two new columns. (A rough sketch of such a helper follows below.)
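For reference, the same steps can be wrapped up outside the spreadsheet. This is only an illustrative sketch in Python (not a spreadsheet function; the function name is made up), assuming the two sensors' values are already in two lists:

# Downsample the longer series so it has roughly as many points as the shorter
# one, keeping values spread proportionally rather than cut from one end.
# Mirrors the INT(ROW()*ratio) + remove-duplicates trick described above.
def downsample_to_match(long_series, short_series):
    ratio = len(short_series) / len(long_series)
    kept = []
    last_index = -1
    for row, value in enumerate(long_series, start=1):
        index = int(row * ratio)      # same index -> "duplicate" -> skip the row
        if index != last_index:
            kept.append(value)
            last_index = index
    return kept

s1 = list(range(11857))               # stand-ins for the logged values
s2 = list(range(8130))
print(len(downsample_to_match(s1, s2)))   # ~8130 values, spread proportionally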

I tested this solution in LibreOffice Calc. The functions used are basic enough to be found in Excel as well.
Here's a sample with data from 2 sensors, s1 and s2, similar to yours:
Row s1 s2
1 2 3
2 4 6
3 6 9
4 8 12
5 10 15
6 12 18
7 14 21
8 16
9 18
10 20
11 22
What I did was match each s1 sample with the s2 sample that relatively matches its position, so instead of ending up with a number of rows with no s2 value, I padded the missing s2 values with the last sample taken for that period of time (column s2a):
Row s1 s2 s2a
1 2 3 3
2 4 6 6
3 6 9 6
4 8 12 9
5 10 15 12
6 12 18 12
7 14 21 15
8 16 18
9 18 18
10 20 21
11 22 21
Assuming that s1 is column A and s2 is column B in the spreadsheet, the function you want on each cell of the new column is:
=INDIRECT( ADDRESS( CEILING( ROW()* COUNT(B:B)/COUNT(A:A)),2))
Let's go through it from the inside out:
COUNT(B:B)/COUNT(A:A) - this is the ratio, 7/11 ≈ 0.64 in the example above. It means that the s2 sample matching any given row of s1 is found at roughly that row number × the ratio in column B.
CEILING - Spreadsheet rows don't start at 0, so the first computed row HAS to be 1. I experimented with INT(), but with a ratio below 1 the first row would come out as 0, which we don't want.
ADDRESS - Returns a string with the address of a cell given its row and column coordinates (e.g. ADDRESS(3,2) yields "$B$3"; an optional third argument controls whether the row and column references are absolute or relative).
INDIRECT - Returns the contents of the cell whose address is passed as a string (e.g. INDIRECT("X5") will return whatever value is stored in cell X5).
Alex
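For readers who want the same matching outside of a spreadsheet, here is a small illustrative sketch (Python, not part of the original answer) that reproduces the padded s2a column from the example data:

import math

# For every row of the longer series s1, pick the s2 sample whose relative
# position matches: row CEILING(i * count(s2)/count(s1)), exactly like the
# INDIRECT/ADDRESS formula above.
def pad_to_match(s1, s2):
    ratio = len(s2) / len(s1)
    return [s2[math.ceil(i * ratio) - 1] for i in range(1, len(s1) + 1)]

s1 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
s2 = [3, 6, 9, 12, 15, 18, 21]
print(pad_to_match(s1, s2))   # [3, 6, 6, 9, 12, 12, 15, 18, 18, 21, 21]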

Related

MySQL: put a record between two records (ordering)

Here are some records, and we want to move id #1 between #3 and #4:
id title sort
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
Method one:
Get #3's sort value, add 1 to it, and update #1's sort with that, so we have:
id title sort
1 a 4
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
Then add 1 to #4's sort and to every record after it, and we have:
id title sort
1 a 4
2 b 2
3 c 3
4 d 5
5 e 6
6 f 7
and after sorting:
id title sort
2 b 2
3 c 3
1 a 4
4 d 5
5 e 6
6 f 7
It works fine, but imagine we have 2,000,000 records and all of them must be updated...
Method two:
Take the sort values of #3 and #4 and divide their sum by 2 => (3+4)/2 = 3.5,
and just use that as #1's sort:
id title sort
2 b 2
3 c 3
1 a 3.5
4 d 4
5 e 5
6 f 6
This works fine too, but imagine thousands of these operations: they produce long floats like 3.99999999999, and after a while it gets horrible.
Is there any MySQL/MariaDB trick or method for doing this?
Your "drop it half-way between items" method may be the best.
Let's go with BIGINT UNSIGNED since it gives you 64 bits in 8 bytes. Less good: DOUBLE would give you 53 bits in 8 bytes, and some funny business with exponents. DECIMAL gives you more bits at a cost of more bytes, while not eliminating the need for the following code.
You know which row to put it "after" based on user input?
Discover the row after by using ORDER BY ... ASC LIMIT 1.
Average the two values; check whether the average equals either of them -- if so, you have a bad case (no room left).
Digression... 2M rows. Start with 2K, 4K, 6K, etc. as the sort values. (2M × 2K = 4G, which already fills a 32-bit INT UNSIGNED; BIGINT UNSIGNED leaves far more headroom.)
This says you can squeeze about 2K items between any adjacent pair. However, in the worst case of repeatedly inserting exactly after the same value, you get only about 11 inserts before hitting the wall (11 ≈ log2(2000)). That is, the re-sort may be quick, but up to 1 time in 11 it will be costly.
(Please don't quibble between 2K meaning 2000 vs 2048; it does not matter to the algorithm.)
So, what to do when there is no room to insert a new sort value? Rebuilding the numbers would lock the table (of 2M rows) for "too long", so let's try to avoid that.
How about this (a rough sketch in code follows the list):
Grab the 10 rows before and after (2 SELECTs with ORDER BY and LIMIT). Fix those sort values so that they are evenly spread out.
Possibly no issue with hitting the start or end of the table; it would involve fewer than 20 rows. And there are silent boundaries at 0 and at the top of the range (4G-1 in this example).
If the 20 rows are not enough, then broaden the span.
Do all this (including the original, simple, half-way code) in a transaction.
Use FOR UPDATE on all(?) SELECTs so that other threads are blocked.
Check for deadlocks. If encountered, start over completely. (The second try will probably find that the half-way attempt works fine -- because some other thread finished spreading the sort values out.)
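As a toy illustration of that flow (Python, operating on an in-memory list instead of the real table; GAP, the window width and the function name are all assumptions, and in MySQL each step would be an UPDATE inside the transaction described above):

GAP = 2048                                   # assumed initial spacing between sort values

def insert_after(rows, prev_pos, new_id):
    """rows: list of [id, sort] kept in sort order; place new_id right after prev_pos."""
    if prev_pos == len(rows) - 1:            # inserting at the very end: just extend
        rows.append([new_id, rows[-1][1] + GAP])
        return
    lo, hi = rows[prev_pos][1], rows[prev_pos + 1][1]
    if hi - lo < 2:                          # the "bad case": no room for a half-way value
        spread = 10
        while True:                          # grab ~10 rows either side, broaden if needed
            start = max(0, prev_pos - spread)
            end = min(len(rows) - 1, prev_pos + 1 + spread)
            span = rows[end][1] - rows[start][1]
            if span >= 2 * (end - start) or (start == 0 and end == len(rows) - 1):
                break
            spread *= 2
        step = max(span // (end - start), 2)
        for k in range(start, end + 1):      # spread the window's sort values out evenly
            rows[k][1] = rows[start][1] + (k - start) * step
        lo, hi = rows[prev_pos][1], rows[prev_pos + 1][1]
    rows.insert(prev_pos + 1, [new_id, (lo + hi) // 2])

rows = [[i, i * GAP] for i in range(1, 7)]   # ids 1..6, sorts 2048, 4096, ..., 12288
insert_after(rows, 2, 99)                    # drop a new id 99 between the 3rd and 4th rows
print(rows)                                  # id 99 lands with sort (6144 + 8192) // 2 = 7168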
Timing:
The half-way case, even with transaction, will probably take a millisecond or so.
The more complex case won't take much longer, in spite of locking and updating 20 rows.
You could probably handle 1K actions per second.

Finding Recurring Number Combinations in a Column of Numbers

I have searched and found discussions and solutions to similar problems, but nothing quite like, or as complex as, what I'm trying to figure out.
I have an Access table which consists of two columns, Draw Number and Number Drawn, as shown below. Each Draw Number is repeated 20 times, corresponding to the 20 numbers that are drawn in each draw.
I'm trying to figure out a way to determine the most frequently occurring combination of numbers (5 numbers, say) across all of the draws in the 20-number sets. So for instance, 1,2,3,4,1 occurs n times, 1,2,3,4,2 occurs n times, 1,2,3,4,3 occurs n times, etc.
I've created parameter queries which allow me to search for different number combinations of 2 to 10 numbers, and they work OK, returning the number of occurrences of a combination of numbers that I input through a simple UI. But the goal is to figure out programmatically what the optimum combination of numbers is.
Hope this makes sense. And by the way, there are 36 million or so rows in the table. The parameter queries work quite well, however; it takes just over a second to return results for each number added. So, querying two numbers = 2-second wait, three numbers = 3-second wait, etc.
I've been thinking about a loop of some type but don't know how to get started. Processing time isn't an issue; it can take a day if required!
This is written in VBA and uses an assortment of queries, temp tables, etc. to get the job done.
The text says Access, but the tags say MySql, which is it? – RBarryYoung 21 hours ago
This part confuses me: I'm trying to figure a way to determine the most frequent occurring combination of numbers (5 numbers) for all of the draws in each of the 20 number sets. So for instance, 12341 occurs n x, 12342 occurs nx, 12343 occurs n x, etc. – Newd 21 hours ago
^What do you mean by five numbers? Nowhere in your sample data do I see 12341. Please explain using the data you have, and give expected results using that data. – McAdam331 21 hours ago
drosberg - clarification:
Thanks for the response. It is an Access application; I went with the tags Stack Overflow recommended to a first-time poster.
By five numbers I mean the most frequently occurring group of five numbers (I used five as an example; it could be groups of 2 to 10 numbers) occurring across the draws, where a draw consists of 20 numbers drawn from a total of 80. The data I posted was intended only as an example; the sample provided has just 50 and 51 in common. I can plug 50 and 51 into the parameter query and it will tell me that this combination occurs 60,000 times (or whatever), but perhaps 50 and 57 occurs 65,000 times.
If I were to do this manually, and assuming I'm looking for the most frequent 5-number combination, I would enter the following in the parameter query:
1,2,3,4,1 group = 30,000 occurrences
1,2,3,4,2 group = 31,000 occurrences
1,2,3,4,3 group = 31,050 occurrences
1,2,3,4,4 group = 29,050 occurrences
etc.
but I would have to do this for every combination of 5 numbers that can be derived from the numbers 1 through 80. I'm hoping to have the program do the work!
thanks
don
DRAW NUMBER NUMBER DRAWN
1 1
1 28
1 19
1 3
1 38
1 46
1 43
1 29
1 13
1 22
1 20
1 11
1 50
1 51
1 53
1 54
1 57
1 64
1 76
1 78
2 29
2 14
2 2
2 1
2 35
2 40
2 39
2 30
2 10
2 27
2 21
2 6
2 42
2 50
2 51
2 53
2 54
2 61
2 65
2 69
I wrote a post a while ago about generating permutations with and without repetition using Excel. Perhaps you can use it.
https://michiel.wordpress.com/2015/03/29/permutations-with-repetition-using-excel/
Here's how it works. I am using strings, but you can easily modify that for numbers (since you say you need 5).
You can use the MID function to grab a single char from a string, and generate permutations from it.
=MID(Pattern,MOD([N]/[P],Length)+1,1)
N refers to the value in column N.
P refers to the horizontal row of multipliers (1, 4, 16). You can generate these with a power-of-4 formula (4^0, 4^1, 4^2, ...).
After putting in the formula, you can make a list of all permutations in Excel, and in the cell next to each one generate a SQL query that you can then run from VBA.
Example: Looking up Access database in Excel
Or find a commercial tool like http://thingiequery.com/
I don't know if there's any open source tools for it.
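Since the goal is to have a program do the work, here is one way the generate-and-count idea could be sketched (Python rather than Excel/VBA, purely illustrative; draws is assumed to be a dict of draw number -> list of 20 drawn numbers pulled from the table):

from itertools import combinations
from collections import Counter

def most_common_groups(draws, group_size=5, top=10):
    # Count every group_size-number combination that occurs inside a draw;
    # sorting each draw makes (1,2,3) and (3,2,1) count as the same group.
    counts = Counter()
    for numbers in draws.values():
        counts.update(combinations(sorted(numbers), group_size))
    return counts.most_common(top)

draws = {
    1: [1, 28, 19, 3, 38, 46, 43, 29, 13, 22, 20, 11, 50, 51, 53, 54, 57, 64, 76, 78],
    2: [29, 14, 2, 1, 35, 40, 39, 30, 10, 27, 21, 6, 42, 50, 51, 53, 54, 61, 65, 69],
}
print(most_common_groups(draws, group_size=2, top=3))

Note that with thousands of draws and 5-number groups the counter grows into the tens of millions of entries, so memory (or switching to per-combination database counting) becomes the limiting factor rather than CPU time.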
I'm thinking that you should consider the following. Say there are 100 balls.
Set up a table with one row for each "Draw number" and 100 columns, one for every possible number, each column of type boolean.
When you look to see which draws had number 23 you just add a
WHERE Column23 = true.
For numbers 23 and 56
WHERE Column23 = true AND Column56 = true
This should massively simplify and speed up your SQL.
You set up a table with every possible combination of numbers.
You run SQL to find the counts.
Harvey
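To make the one-column-per-number idea concrete, a count query for any combination could be generated like this (an illustrative sketch in Python; the table and column names are made up):

def combo_count_sql(numbers, table="draws_wide"):
    # One boolean column per possible ball, e.g. Column23 = true.
    conditions = " AND ".join(f"Column{n} = true" for n in sorted(numbers))
    return f"SELECT COUNT(*) FROM {table} WHERE {conditions};"

print(combo_count_sql([50, 51, 57]))
# SELECT COUNT(*) FROM draws_wide WHERE Column50 = true AND Column51 = true AND Column57 = true;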

SSRS - Sum each cell value into next column

SSRS question - Is there a way to sum each cell value into the next column? Here's what I'm trying to achieve: Column B displays the running sum of Column A up to that row.
Col A  Col B
1      1
2      3
3      6
4      10
5      15
6      21
7      28
8      36
9      45
You want running totals. Everything you need is here.
Basically it will take each value from a data set and sum it up with the total from all previous values.
Some basic syntax: =RunningValue(Fields!A.Value,Sum,"yourDataSet")
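Outside of SSRS, a running total is just a cumulative sum; for instance (an illustrative sketch in Python, using the sample data above):

from itertools import accumulate

col_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
col_b = list(accumulate(col_a))     # running total of Col A up to each row
print(col_b)                        # [1, 3, 6, 10, 15, 21, 28, 36, 45]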

How to apply a formula for removing data noise in R?

I am working on NGSIM traffic data: a text file with 18 columns and 1180598 rows. I want to smooth the position data in the column 'Local Y'. I know there are built-in functions for data smoothing in R, but none of them seems to match the formula I am required to apply. The data in the text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The above columns are just an example taken out of the original file. Here you can see 3 vehicles, with vehicle IDs 2, 4 and 5, but in fact there are 2169 vehicles with different IDs. The Total_Frames column tells us how many rows each vehicle ID occupies; for example, in the table above vehicle ID 2 is repeated 5 times, hence '5' in the Total_Frames column. The following is the formula I am required to apply to remove data noise (smoothing) from the column 'Local Y':
Smoothed Position Value(i) =
    ( Σ from k = i-D to i+D of Local_Y(k) * exp(-|i-k| / delta) )
  / ( Σ from k = i-D to i+D of exp(-|i-k| / delta) )
where,
i = index #
delta = 5
D = 15
I have tried using the built-in functions I know of, but they don't smooth the data as required. My question is: is there any built-in function in R which can do the data smoothing according to the given formula, or which could take this formula as an argument? I need to apply the formula to every value in Local Y, using the 15 values before and the 15 values after it (from i-D to i+D) for the same vehicle ID. Can anyone give me an idea of how to approach the problem? Thanks in advance.
You can place your formula in a function and then use R's apply family of functions to apply it to the elements of the 'Local Y' column of your data frame.
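For concreteness, here is a sketch of that kernel-weighted average (written in Python rather than R, purely to illustrate the formula; an R function built around the same per-vehicle loop would look very similar):

import math

def smooth_positions(local_y, delta=5.0, D=15):
    # Exponentially weighted moving average:
    # smoothed[i] = sum(y[k] * exp(-|i-k|/delta)) / sum(exp(-|i-k|/delta)),
    # with k running from i-D to i+D, clipped to the series boundaries.
    n = len(local_y)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - D), min(n - 1, i + D)
        weights = [math.exp(-abs(i - k) / delta) for k in range(lo, hi + 1)]
        values = local_y[lo:hi + 1]
        smoothed.append(sum(w * y for w, y in zip(weights, values)) / sum(weights))
    return smoothed

# Apply this per vehicle so one vehicle's window never bleeds into the next:
# for each vehicle ID, call smooth_positions() on the list of its Local Y values.
print(smooth_positions([35.381, 39.381, 43.381, 47.38, 51.381]))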

How to push data down a row in SQL results

I would like help with SQL query code to push the data in a specific column down by one row.
For example in a random table like the following,
x column y column
6 6
9 4
89 30
34 15
the results should be "pushed" down a row, meaning
x column y column
6 null or 0 (preferably)
9 6
89 4
34 30
SQL tables have no inherent concept of ordering. Hence, the concept of "next row" does not make sense.
Your example has no column that specifies the order for the rows. There is no definition of next. So, what you want to do cannot be done.
I am not aware of a simple way to do this with the table formatted the way you show it. If you added two consecutively numbered integer fields holding row number and row number + 1, you could join the table to itself and get that information.
After taking a backup of your table:
Make a PHP function that will (a rough sketch of the same idea follows the list):
- Load all values of Y into an array
- Set Y = 0 (MySQL UPDATE)
- Load the values back from the PHP array into MySQL, shifted down by one row
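As a rough illustration of that load-shift-write idea (sketched in Python over an in-memory list; the real version still needs an explicit ordering column, as the other answers point out):

def shift_down(y_values, filler=0):
    # Push every value down one row; the first row gets the filler value.
    return [filler] + y_values[:-1]

y = [6, 4, 30, 15]
print(shift_down(y))   # [0, 6, 4, 30]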