Use libpcap to capture multiple interfaces to same file - libpcap

I would like to use libpcap to capture on multiple specific interfaces (not 'any') to the same file
I have the following code (error handling and some args removed):
static gpointer pkt_tracing_thread(gpointer data)
{
while (1)
{
pcap_dispatch(g_capture_device1, .., dump_file1);
pcap_dispatch(g_capture_device2, .., dump_file2);
}
}
fp1 = calloc(1, sizeof(struct bpf_program));
fp2 = calloc(1, sizeof(struct bpf_program));
cap_dev1 = pcap_open_live(interface1,...
cap_dev2 = pcap_open_live(interface2,...
pcap_compile(cap_dev1, fp1, ...
pcap_compile(cap_dev2, fp2, ...
pcap_setfilter(cap_dev1, fp1);
pcap_setfilter(cap_dev2, fp2);
dump_file1 = pcap_dump_open(g_capture_device1, filename);
dump_file2 = pcap_dump_open(g_capture_device2, filename);
g_thread_create_full(pkt_tracing_thread, (gpointer)fp1, ...
g_thread_create_full(pkt_tracing_thread, (gpointer)fp2, ...
This does not work. What I see in filename is just packets on one of the interfaces. I'm guessing there could be threading issues in the above code.
I've read https://seclists.org/tcpdump/2012/q2/18 but I'm still not clear.
I've read that libpcap does not support writing in pcapng format, which would be required for the above to work, although I'm not clear about why.
Is there any way to capture multiple interfaces and write them to the same file?

Is there any way to capture multiple interfaces and write them to the same file?
Yes, but 1) you have to open the output file only once, with one call to pcap_dump_open() (otherwise, as with your program, you may have two threads writing to the same file independently and stepping on each other) and 2) you would need to have some form of mutex to prevent both threads from writing to the file at the same time.
Also, you should have one thread reading only from one capture device and the other thread reading from the other capture device, rather than having both threads reading from both devices.

As user9065877, you have to open the output file only once and write to it only from one thread at a time.
However, since you'd be serializing everything anyway, you may prefer to ask libpcap for pollable file descriptors for the interfaces and poll in a round-robin fashion for packets, using a single thread and no mutexes.

Related

Psychopy: how to avoid to store variables in the csv file?

When I run my PsychoPy experiment, PsychoPy saves a CSV file that contains my trials and the values of my variables.
Among these, there are some variables I would like to NOT be included. There are some variables which I decided to include in the CSV, but many others which automatically felt in it.
is there a way to manually force (from the code block) the exclusion of some variables in the CSV?
is there a way to decide the order of the saved columns/variables in the CSV?
It is not really important and I know I could just create myself an output file without using the one of PsychoPy, or I can easily clean it afterwards but I was just curious.
PsychoPy spits out all the variables it thinks you could need. If you want to drop some of them, that is a task for the analysis stage, and is easily done in any processing pipeline. Unless you are analysing data in a spreadsheet (which you really shouldn't), the number of columns in the output file shouldn't really be an issue. The philosophy is that you shouldn't back yourself into a corner by discarding data at the recording stage - what about the reviewer who asks about the influence of a variable that you didn't think was important?
If you are using the Builder interface, the saving of onset & offset times for each component is optional, and is controlled in the "data" tab of each component dialog.
The order of variables is also not under direct control of the user, but again, can be easily manipulated at the analysis stage.
As you note, you can of course write code to save custom output files of your own design.
there is a special block called session_variable_order: [var1, var2, var3] in experiment_config.yaml file, which you probably should be using; also, you should consider these methods:
from psychopy import data
data.ExperimentHandler.saveAsWideText(fileName = 'exp_handler.csv', delim='\t', sortColumns = False, encoding = 'utf-8')
data.TrialHandler.saveAsText(fileName = 'trial_handler.txt', delim=',', encoding = 'utf-8', dataOut = ('n', 'all_mean', 'all_raw'), summarised = False)
notice the sortColumns and dataOut params

How to run Julia function on specific processor using remotecall(), when the function itself does not have return

I tried to use remotecall() in julia to distribute work to specific processor. The function I like to run does not have any return but it will output something. I can not make it work as there is no output file after running the code.
This is the test code I am creating:
using DelimitedFiles
addprocs(4) # add 4 processors
#everywhere function test(x) # Define the function
print("hi")
writedlm(string("test",string(x),".csv"), [x], ',')
end
remotecall(test, 2, 2) # To run the function on process 2
remotecall(test, 3, 3) # To run the function on process 3
This is the output I am getting:
Future(3, 1, 67, nothing)
And there is no output file (csv), or "hi" shown
I wonder if anyone can help me with this or I did anything wrong. I am fairly new to julia and have never used parallel processing.
The background is I need to run a big simulation (A big function with bunch of includes, but no direct return outputs) lots of times, and I like to split the work to different processors.
Thanks a lot
If you want to use a module function in a worker, you need to import that module locally in that worker first, just like you have to do it in your 'root' process. Therefore your using DelimitedFiles directive needs to occur "#everywhere" first, rather than just on the 'root' process. In other words:
#everywhere using DelimitedFiles
Btw, I am assuming you're using a relatively recent version of Julia and simply forgot to add the using Distributed directive in your example.
Furthermore, when you perform a remote call, what you get back is a "Future" object, which is a way of allowing you to obtain the 'future results of that computation' from that worker, once they're finished. To get the results of that 'future computation', use fetch.
This is all very simplistic and general information, since you haven't provided a specific example that can be copy / pasted and answered specifically. Have a look at the relevant section in the manual, it's fairly clearly written: https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1

Read raw Genicam H.264 data to avlib

I try to get familiar with libav in order to process a raw H.264 stream from a GenICam supporting camera.
I'd like to receive the raw data via the GenICam provided interfaces (API), and then forward that data into libav in order to produce a container file that then is streamed to a playing device like VLC or (later) to an own implemented display.
So far, I played around with the GenICam sample code, which transferres the raw H.264 data into a "sample.h264" file. This file, I have put through the command line tool ffmpeg, in order to produce an mp4 container file that I can open and watch in VLC
command: ffmpeg -i "sample.h264" -c:v copy -f mp4 "out.mp4"
Currently, I dig through examples and documentations for each H.264, ffmpeg, libav and video processing in general. I have to admit, as total beginner, it confuses me a lot.
I'm at the point where I think I have found the according libav functions that would help my undertaking:
I think, basically, I need the functions avcodec_send_packet() and avcodec_receive_packet() (since avcodec_decode_video2() is deprecated).
Before that, I set up an avCodedContext structure and open (or combine?!?) it with the H.264 codec (AV_CODEC_ID_H264).
So far, my code looks like this (omitting error checking and other stuff):
...
AVCodecContext* avCodecContext = nullptr;
AVCodec *avCodec = nullptr;
AVPacket *avPacket = av_packet_alloc();
AVFrame *avFrame = nullptr;
...
avCodec = avcodec_find_decoder(AV_CODEC_ID_H264);
avCodecContext = avcodec_alloc_context3(avCodec);
avcodec_open2 ( avCodecContext, avCodec, NULL );
av_init_packet(avPacket);
...
while(receivingRawDataFromCamera)
{
...
// receive raw data via GenICam
DSGetBufferInfo<void*>(hDS, sBuffer.BufferHandle, BUFFER_INFO_BASE, NULL, pPtr)
// libav action
avPacket->data =static_cast<uint8_t*>(pPtr);
avErr = avcodec_send_packet(avCodecContext, avPacket);
avFrame = av_frame_alloc();
avErr = avcodec_receive_frame( avCodecContext, avFrame);
// pack frame in container? (not implemented yet)
..
}
The result of the code above is, that both calls to send_packet() and receive_frame() return error codes (-22 and -11), which I'm not able to decrypt via av_strerror() (it only says, these are error codes 22 and 11).
Edit: Maybe as an additional information for those who wonder if
avPacket->data = static_cast<uint8_t*>(pPtr);
is a valid operation...
After the very first call to this operation, the content of avPacket->data is
{0x0, 0x0, 0x0, 0x1, 0x67, 0x64, 0x0, 0x28, 0xad, 0x84, 0x5,
0x45, 0x62, 0xb8, 0xac, 0x54, 0x74, 0x20, 0x2a, 0x2b, 0x15, 0xc5,
0x62}
which somehow looks as something to be expected becaus of the NAL marker and number in the beginning?
I don't know, since I'm really a total beginner....
The question now is, am I on the right path? What is missing and what do the codes 22 and 11 mean?
The next question would be, what to do afterwards, in order to get a container that I can stream (realtime) to a player?
Thanks in advance,
Maik
At least for the initally asked question I found the solution for myself:
In order to get rid of the errors on calling the functions
avcodec_send_packet(avCodecContext, avPacket);
...
avcodec_receive_frame( avCodecContext, avFrame);
I had to manually fill some parameters of 'avCodecContext' and 'avPacket':
avCodecContext->bit_rate = 8000000;
avCodecContext->width = 1920;
avCodecContext->height = 1080;
avCodecContext->time_base.num = 1;
avCodecContext->time_base.den = 25;
...
avPacket->data = static_cast<uint8_t*>(pPtr);
avPacket->size = datasize;
avPacket->pts = frameid;
whereas 'datasize' and 'frameid' are received via GenICam, and may not be the appropriate parameters for the fields, but at least I do not get any errors anymore.
Since this answers my initial question on how I get the raw data into the structures of libav, I think, the question is answered.
The discussion and suggestions with/from Vencat in the commenst section lead to additional questions I have, but which should be discussed in a new question, I guess.

GCP dataflow - processing JSON takes too long

I am trying to process json files in a bucket and write the results into a bucket:
DataflowPipelineOptions options = PipelineOptionsFactory.create()
.as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject("the-project");
options.setStagingLocation("gs://some-bucket/temp/");
Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://some-bucket/2016/04/28/*/*.json"))
.apply(ParDo.named("SanitizeJson").of(new DoFn<String, String>() {
#Override
public void processElement(ProcessContext c) {
try {
JsonFactory factory = JacksonFactory.getDefaultInstance();
String json = c.element();
SomeClass e = factory.fromString(json, SomeClass.class);
// manipulate the object a bit...
c.output(factory.toString(e));
} catch (Exception err) {
LOG.error("Failed to process element: " + c.element(), err);
}
}
}))
.apply(TextIO.Write.to("gs://some-bucket/output/"));
p.run();
I have around 50,000 files under the path gs://some-bucket/2016/04/28/ (in sub-directories).
My question is: does it make sense that this takes more than an hour to complete? Doing something similar on a Spark cluster in amazon takes about 15-20 minutes. I suspect that I might be doing something inefficiently.
EDIT:
In my Spark job I aggregate all the results in a DataFrame and only then write the output, all at once. I noticed that my pipeline here writes each file separately, I assume that is why it's taking much longer. Is there a way to change this behavior?
Your jobs are hitting a couple of performance issues in Dataflow, caused by the fact that it is more optimized for executing work in larger increments, while your job is processing lots of very small files. As a result, some aspects of the job's execution end up dominated by per-file overhead. Here's some details and suggestions.
The job is limited rather by writing output than by reading input (though reading input is also a significant part). You can significantly cut that overhead by specifying withNumShards on your TextIO.Write, depending on how many files you want in the output. E.g. 100 could be a reasonable value. By default you're getting an unspecified number of files which in this case, given current behavior of the Dataflow optimizer, matches number of input files: usually it is a good idea because it allows us to not materialize the intermediate data, but in this case it's not a good idea because the input files are so small and per-file overhead is more important.
I recommend to set maxNumWorkers to a value like e.g. 12 - currently the second job is autoscaling to an excessively large number of workers. This is caused by Dataflow's autoscaling currently being geared toward jobs that process data in larger increments - it currently doesn't take into account per-file overhead and behaves not so well in your case.
The second job is also hitting a bug because of which it fails to finalize the written output. We're investigating, however setting maxNumWorkers should also make it complete successfully.
To put it shortly:
set maxNumWorkers=12
set TextIO.Write.to("...").withNumShards(100)
and it should run much better.

How to use the 7z DLL to compress and append many small chunks of data to a file

I would like to use the 7z DLL to append small amounts of data to one compressed file. At the moment my best guess would be uncompress the 7z file, append the data, and recompress it. Obviously, this is not a good solution performance wise, if the size of the 7z file becomes large (say 1 gb) and I want do save a new chunk every second. How could I do this in a better way?
I could use any compression format supported by the 7z DLL.
Have a look at the Python LZMA bindings (LZMA is the compression algorithm name of 7z), you should do what you want without ctypes stuff.
EDIT
To be confirmed, but a quick look at py7zlib.py shows only support for reading 7z files, not writing. However in the src dir there's a pylzma_compressfile.c, so maybe there's something to do.
EDIT 2
The pylzma.compressfile function seems to be there, so fine.
This is NOT MY ANSWER.
How can I use a DLL file from Python?
I think ctypes is the way to go.
The following example of ctypes is from actual code I've written (in Python 2.5). This has been, by far, the easiest way I've found for doing what you ask.
import ctypes
# Load DLL into memory.
hllDll = ctypes.WinDLL ("c:\\PComm\\ehlapi32.dll")
# Set up prototype and parameters for the desired function call.
# HLLAPI
hllApiProto = ctypes.WINFUNCTYPE (ctypes.c_int,ctypes.c_void_p,
ctypes.c_void_p, ctypes.c_void_p, ctypes.c_void_p)
hllApiParams = (1, "p1", 0), (1, "p2", 0), (1, "p3",0), (1, "p4",0),
# Actually map the call ("HLLAPI(...)") to a Python name.
hllApi = hllApiProto (("HLLAPI", hllDll), hllApiParams)
# This is how you can actually call the DLL function.
# Set up the variables and call the Python name with them.
p1 = ctypes.c_int (1)
p2 = ctypes.c_char_p (sessionVar)
p3 = ctypes.c_int (1)
p4 = ctypes.c_int (0)
hllApi (ctypes.byref (p1), p2, ctypes.byref (p3), ctypes.byref (p4))
The ctypes stuff has all the C-type data types (int, char, short, void*, ...) and can pass by value or reference. It can also return specific data types although my example doesn't do that (the HLL API returns values by modifying a variable passed by reference).