I'm trying to de-duplicate an output CSV column by column, stack the results in a new CSV while also deduplicating them again - duplicates

I've been working with Wireshark for a while now, and I've written scripts using tshark to pull specific types of data out and organize it in CSV files in Linux. Now, I'm trying to pull out all of the ports in a given day's collect and only show the unique ports used in that day, regardless of whether they were source or destination. An example of what I'm trying to have as a result:
Source
Destination
Output
1
2
1
14
4
2
22
2
14
14
6
4
1
4
22
22
8
6
8

Related

How to reduce redundant cells in a column containing logged data

Is there a function to reduce the amount of redundant data from one column to match the number of cells in a second column?
I have logged data from two sensors that sent values at different rates. in 8 hours, I collected 11857 values for the first sensor and 8130 for the second one.
I need to compress the first column by deleting data to match the number of cells on the second column, so I can display synchronized values on a chart.
It is not a matter of cutting 3727 cells from the head or tail of the first column, but to delete cells in a proportional way.
I've tried using de Modulus function, but it does not give me the right amount of compression; e.g., by running =MOD(A1,3) and then filtering cells containing '0' value and deleting those rows, I get 7905, which is close to 8130 but still, the data is shifted out.
Edit:
I found a method that requires several steps:
Copy the sensors' data into two columns
Get the number of cells for both columns using COUNTA
Get the ratio between the smaller count over the bigger count
In a new column, create an index for the rows using =INT(ROW()*ratio)
Remove duplicate rows using the index column as the reference with Data > Remove Duplicates
It works, but it will be much faster if there was a ready-made function that will run over the provided data columns and copy the values into two new columns
I tested this solution in LibreOffice Calc. The functions used are basic enough to be found in Excel as well.
Here's a sample with data from 2 sensors, s1 and s2, similar to yours:
Row s1 s2
1 2 3
2 4 6
3 6 9
4 8 12
5 10 15
6 12 18
7 14 21
8 16
9 18
10 20
11 22
What I did was match the data from s1 samples with those from s2 that relatively match the position of the first, so instead of ending up with a number of rows with no s2 values, I padded non-existent s2 values with the last sample taken for any given period of time (column s2a)
Row s1 s2 s2a
1 2 3 3
2 4 6 6
3 6 9 6
4 8 12 9
5 10 15 12
6 12 18 12
7 14 21 15
8 16 18
9 18 18
10 20 21
11 22 21
Assuming that s1 is column A and s2 is column B in the spreadsheet, the function you want on each cell of the new column is:
=INDIRECT( ADDRESS( CEILING( ROW()* COUNT(B:B)/COUNT(A:A)),2))
Let's go from bottom to top:
COUNT(B:B)/COUNT(A:A) - this is the ratio. 0.63' above. It indicates that each sample in any given row in s1 will be found at that row x 0.63 in column s2.
Ceiling - Spreadsheets don't start at row 0, so the first one HAS to be 1. I experimented with Int(), but if the ratio were less than 0.5 we would end up with a 0, which we don't want.
Address - Returns a string with the address of a cell given its row,column coordinates (e.g. Address(3, 2) = "B3" and Address(3,2,2) as used here, will yield an absolute column or "$B3").
Indirect - Returns the contents of a cell whose address is passed as a string (e.g. Address("x5") will return whatever value is stored in cell X5).
Alex

How to create complex JSON config maps in q?

Is there a good way in q to input somewhat large complicated nested dictionaries which represent/will be converted to json? I'm trying to control the echarts javascript library which basically just renders charts based on json config options. What I'm doing now is:
opt.title.text:"my chart"
opt.xAxis.data:til 100
opt.series.data:100?5
opt.series.type:`line
toClient[opt] /serializes and sends to browser
but is there an obvious way to get rid of the intermediate assignment? Is making a function to take key-path/value pairs and turn them into a dictionary the way to go or is there a better way to go about this?
Or is this something that should be avoided in q, and instead just manually set write q to set specific options and handle the json object map in the javascript client?
Not sure if this is really what you are looking for, but you can create the nested dictionary structure directly if that's what you're after?
q)`title`xAxis`series!(enlist[`text]!enlist"my chart";enlist[`data]!enlist til 100;`data`type!(100?5;`line))
title | (,`text)!,"my chart"
xAxis | (,`data)!,0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ..
series| `data`type!(0 1 1 3 3 3 2 2 4 1 3 3 1 4 0 4 4 4 2 4 3 3 4 0 4 0 0 1 0..

Microsoft Access returns multiple rows, but I only want to display one

I'm using Microsoft Access 2007 to query two separate SharePoint sources.
The first has most of the data I need. The unique ID number for each row in the first has a corresponding column in the second. The ID from the first can occur multiple times in the second. (It's a mapping between two different databases of defects.)first.
What I want to do is this: find all the ID's from table one that occur in the second, and list ID's from the second for each item that corresponds with the first. For starters, I want something a bit like this:
Table 1 ID Table 2 ID's
5 9, 13, 23
10 11, 15
20 8
But there's also more data from Table 1 I want to display for each item.
What I'm getting is this:
Table 1 ID Table 2 ID Table 1 Data
5 9 Row 5 Additional Data
5 13 Row 5 Additional Data
5 23 Row 5 Additional Data
10 11 Row 10 Additional Data
10 15 Row 10 Additional Data
20 8 Row 20 Additional Data
What I want is something like this:
Table 1 ID Table 2 ID's Table 1 Data
5 9, 13, 23 Row 5 Additional Data
10 11, 15 Row 10 Additional Data
20 8 Row 20 Additional Data
Or perhaps:
Table 1 ID Table 2 ID's
5 9, 13, 23
Row 5 Additional Data
10 11, 15
Row 10 Additional Data
20 8
Row 20 Additional Data
How can I create a report like that?
Comma-separated list from multiple records
Grouping of multiple data rows into a comma-separated list is not a built-in feature of Access. There are various ways to do this, but I most often see links to Allen Browne's tutorial.
Multi-line row details
The difference between your last two examples is just a matter of formatting a Form or Report in Design View. A Report (here capitalized) in Access is a specific type of object for generating custom, formatted views of your data, often for printing or read-only viewing. A Form is a dynamic, on-screen view of your data. I suspect that your use of "report" is of a more general sense.
First of all, there is no way to make multiple lines using the default Datasheet View of tables and queries. To get multiple lines per row of data, you need to create a Form or Report object in Access. In Design View, you can move the data controls around the detail area to produce multiple lines for each data row. I suggest searching for tutorials on the web for creating Access Forms and Reports.
See Guide to designing reports.

How to apply a formula for removing data noise in R?

I am working on NGSim Traffic data, having 18 columns and 1180598 rows in a text file. I want to smooth the position data, in the column 'Local Y'. I know there are built-in functions for data smoothing in R but none of them seem to match with the formula I am required to apply. The data in text file looks something like this:
Index VehicleID Total_Frames Local Y
1 2 5 35.381
2 2 5 39.381
3 2 5 43.381
4 2 5 47.38
5 2 5 51.381
6 4 8 504.828
7 4 8 508.325
8 4 8 512.841
9 4 8 516.338
10 4 8 520.854
11 4 8 524.592
12 4 8 528.682
13 4 8 532.901
14 5 7 39.154
15 5 7 43.153
16 5 7 47.154
17 5 7 51.154
18 5 7 55.153
19 5 7 59.154
20 5 7 63.154
The above data columns are just example taken out of original file. Here you can see 3 vehicles, with vehicle IDs = 2, 4 and 5 but in fact there are 2169 vehicles with different IDS. The column Total_Frames tell us how many times vehicle Id of each vehicle is repeated in the first column, for example in the table above, vehicle ID 2 is repeated 5 times, hence '5' in Total_Frames column. Following is the formula I am required to apply to remove data noise (smoothing) from column 'Local Y':
Smoothed Position Value = (1/(Summation of [EXP^-abs(i-k)/delta] from k=i-D to i+D)) * ( (Summation of (Local Y) *[EXP^-abs(i-k)/delta] from k=i-D to i+D))
where,
i = index #
delta = 5
D = 15
I have tried using the built-in functions, which I know of, but they don't smooth the data as required. My question is: Is there any built-in function in R which can do the data smoothing in the way of given formula or which could take this formula as an argument? I need to apply the formula to every value in Local Y which has 15 values before and 15 values after them (i-D and i+D) for same vehicle Id. Can anyone give me any idea how to approach the problem? Thanks in advance.
You can place your formula in a function and then use the apply function of R to apply it to the elements in your "Local Y" column of the dataframe

Add together grouped rows into one value

I've got an issue where I've been presented data in this format from SQL, and have directly imported that into SSRS 2008.
I do have access to the stored procedure for this report, however I don't want to change it as a few other reports rely on it.
Project HoursSpent Cost
1 5 45
1 8 10
1 7 25
1 5 25
2 1 15
2 3 10
2 5 15
2 6 10
3 6 10
3 4 5
3 4 10
3 2 5
I've been struggling all morning to understand how/when I should be implementing the SUM() function with this.
I have tried already to SUM() the rows, but it still outputs the above result.
Should I be adding any extra groups?
Ideally, I need to have the following output:
Project HoursSpent Cost
1 25 105
2 15 40
3 16 30
EDIT: Here is my current structure:
"LineName" is a group for each project
You have to add a row group on "Project" since you want to sum up the data per each project.