ffprobe frame information in decode or presentation order? - ffprobe

When you do a command like:
ffprobe -show_frames inputfile
In what order does it give you the frames? Presentation order and decode order will be different... I am thinking that, depending on which order it gives you, people who try to measure GOP size by grepping for pict_type and counting matches may be getting slightly different results?
Thanks

Presentation order; see the coded_picture_number progression - for streams with B-frames, the sequence won't be monotonic.
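You can verify this yourself by printing just the picture type and the coded picture number for each frame. One way to do it (on older ffprobe builds that still expose coded_picture_number; the field was removed in recent FFmpeg releases):

ffprobe -v error -select_streams v:0 \
  -show_entries frame=pict_type,coded_picture_number \
  -of csv inputfile

On a stream with B-frames, the coded_picture_number column will jump around while the frames themselves are listed in presentation order.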


jq: groupby and nested json arrays

Let's say I have: [[1,2], [3,9], [4,2], [], []]
I would like to know the scripts to get:
The number of nested lists which are non-empty/empty, i.e. I want to get: [3,2]
The number of nested lists which do/do not contain the number 3, i.e. I want to get: [1,4]
The number of nested lists for which the sum of the elements is/isn't less than 4, i.e. I want to get: [3,2]
i.e. basic examples of partitioning nested data.
Since stackoverflow.com is not a coding service, I'll confine this response to the first question, with the hope that it will convince you that learning jq is worth the effort.
Let's begin by refining the question about the counts of the lists "which are/are not empty" to emphasize that the first number in the answer should correspond to the number of empty lists (2), and the second number to the rest (3). That is, the required answer should be [2,3].
Solution using built-in filters
The next step might be to ask whether group_by can be used. If the ordering did not matter, we could simply write:
group_by(length==0) | map(length)
This returns [3,2], which is not quite what we want. It's now worth checking the documentation about what group_by is supposed to do. On checking the details at https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions,
we see that by design group_by does indeed sort by the grouping value.
Since in jq, false < true, we could fix our first attempt by writing:
group_by(length > 0) | map(length)
This returns [2,3] as required. That's nice, but since group_by is doing so much work when all we really need is a way to count, it's clear we should be able to come up with a more efficient (and hopefully less opaque) solution.
An efficient solution
At its core the problem boils down to counting, so let's define a generic tabulate filter for producing the counts of distinct string values. Here's a def that will suffice for present purposes:
# Produce a JSON object recording the counts of distinct
# values in the given stream, which is assumed to consist
# solely of strings.
def tabulate(stream):
  reduce stream as $s ({}; .[$s] += 1);
An efficient solution can now be written down in just two lines:
tabulate(.[] | length==0 | tostring )
| [.["true", "false"]]
QED
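For concreteness, here is the whole thing run from the shell against the sample input (the here-string is just one way to feed it):

jq 'def tabulate(stream): reduce stream as $s ({}; .[$s] += 1);
    tabulate(.[] | length==0 | tostring) | [.["true", "false"]]' \
  <<< '[[1,2], [3,9], [4,2], [], []]'
# output: [2,3]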
p.s.
The function named tabulate above is sometimes called bow (for "bag of words"). In some ways, that would be a better name, especially as it would make sense to reserve the name tabulate for similar functionality that would work for arbitrary streams.

How can I limit the result set of gulp.src to give only X number of files?

In a directory I have the following files:
1.whatIneed
2.whatIneed
3.whatIneed
4.whatIneed
5.whatIneed
6.whatIneed
7.whatIneed
8.whatIneed
9.whatIneed
10.whatIneed
I know that gulp.src will get all of those files, which is fine.
gulp.src("./files/*.whatIneed")
I additionally pipe gulp-sort into this to reverse the order. gSort is the constant I defined for require("gulp-sort");
gulp.src("./files/*.whatIneed").pipe(gSort({asc: false}))
This does what I need and reverses the order of the file list.
10.whatIneed
9.whatIneed
8.whatIneed
7.whatIneed
6.whatIneed
5.whatIneed
4.whatIneed
3.whatIneed
2.whatIneed
1.whatIneed
This is where I am stuck. I now want to limit that list to only the top X results, e.g.:
10.whatIneed
9.whatIneed
Any help would be appreciated!
I was able to get this solved with additional trial and error, along with some friends helping out.
By using glob, I was able to turn the file list into a native JS array, then slice and reverse accordingly.
glob.sync(files_needed_path).slice(-3).reverse();
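One caveat: glob.sync returns paths in lexicographic order, so "10.whatIneed" sorts before "2.whatIneed"; with numbered file names a numeric sort is needed before slicing. A minimal sketch of the whole pipeline (the glob pattern and the ./out destination are just placeholders):

const glob = require("glob");
const gulp = require("gulp");
const path = require("path");

// Sort by the leading number, keep the two highest (highest first),
// and hand the resulting array straight to gulp.src.
const files = glob.sync("./files/*.whatIneed")
  .sort((a, b) => parseInt(path.basename(a), 10) - parseInt(path.basename(b), 10))
  .slice(-2)
  .reverse();

gulp.src(files).pipe(gulp.dest("./out"));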

Complex Gremlin queries to output nodes/edges

I am trying to implement a query and graph visualisation framework that allows a user to enter a Gremlin query, returning a D3 graph of results. The D3 graph is built from a JSON document, which is created using separate vertices and edges outputs from the Gremlin query. For simple queries such as:
g.V.filter{it.attr_a == "foo"}
this works fine. However, when I try to perform a more complicated query such as the following:
g.E.filter{it.attr_a == 'foo'}.groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k,e->e.size()<=3}
- Find all instances of *value*
- Grouped by unique *attr_b*
- Where *attr_a* = foo
- And *attr_b* is paired with no more than 2 other instances of *value*
The query works, but instead of separate vertices and edges the output is of the following form:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
I would like to know if there is a way for Gremlin to output the results as a list of nodes and edges so I can display the results as a graph. I am aware that I could edit my D3 code to take in this new output, but there are currently no restrictions on the type/complexity of the query, so the key/value pairs will not necessarily be the same every time.
Thanks.
You've hit what I consider one of the key problems with visualizing Gremlin results. They can be anything. Gremlin results might not just be a list of vertices and edges. There is no way to really control this that I can think of. At the end of the day, you can really only visualize results that match a pattern that D3 expects. I'd start by trying to detect that pattern and visualize only in those cases (simply display non-recognized patterns as JSON perhaps).
Consider your specific example, which produces results like this:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
What would you want D3 to visualize there? The vertices/edges that were traversed over to get that result? If so, you might be stuck. Gremlin doesn't give you a way to introspect the pipeline to see what's passing through it. In other words, unless the user explicitly gathers vertices and edges within the pipeline that were touched you won't have access to them. It would be nice to be able to "spy" on a pipeline in that way, but at the moment it doesn't do that. There's been internal discussion within TinkerPop to create a new kind of pipeline implementation that would help with that, but at the moment, it doesn't exist.
So, without the "spying" capability, I think your only workarounds would be to:
- Detect a vertex/edge list on the client side and only render those with D3. This would force users to always write Gremlin that returns data in such a format if they want visualization; put it in the users' hands.
- Supply server-side bindings for a list of vertices/edges into which users could explicitly side-effect their vertices/edges when the results don't conform to what your visualization engine expects (see the sketch below). Again, this would force users to write their Gremlin appropriately for your needs if they want visualization.
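A minimal sketch of that second option in TinkerPop 2 Gremlin-Groovy, using sideEffect to collect what the traversal touches (the touched binding and the [outV, edge, inV] triple are illustrative assumptions, not a fixed API):

touched = []   // server-side binding your visualizer reads afterwards
// record each edge the filter passes, together with its two endpoints
g.E.filter{it.attr_a == 'foo'}.
  sideEffect{ touched << [it.outV.next(), it, it.inV.next()] }.
  groupBy{it.attr_b}{it.outV.value}.
  cap.next().findAll{k, e -> e.size() <= 3}
// touched now holds the traversed elements regardless of the shape
// of the query's final result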

How to connect SentiWordNet to RapidMiner?

SentiWordNet is a text file. In RapidMiner, 'Open WordNet Dictionary' can only be used to access exe files. How can I extract the sentiment scores from SentiWordNet for further processing?
Thanks in Advance.
Of course you can. With a little bit of code you can pull the SentiWordNet scores out of the text file.
The catch is that the same word may have several different meanings (senses), each with its own score.
To handle this, you can simply take the average score across senses, or do word-sense disambiguation.
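As a rough illustration, here is a minimal Java sketch that parses the SentiWordNet 3.0 text file (tab-separated lines: POS, ID, PosScore, NegScore, SynsetTerms, Gloss; comment lines start with #) and averages PosScore minus NegScore across all senses of each word. The class and method names are made up for the example:

import java.io.*;
import java.util.*;

public class SentiWordNetLoader {
    // Returns word -> average (PosScore - NegScore) across its senses.
    public static Map<String, Double> load(String path) throws IOException {
        Map<String, List<Double>> senses = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("#")) continue;            // skip header/comments
                String[] f = line.split("\t");
                if (f.length < 5) continue;
                double score = Double.parseDouble(f[2]) - Double.parseDouble(f[3]);
                for (String term : f[4].split(" ")) {          // terms look like "able#1"
                    String word = term.split("#")[0];
                    senses.computeIfAbsent(word, k -> new ArrayList<>()).add(score);
                }
            }
        }
        Map<String, Double> avg = new HashMap<>();
        for (Map.Entry<String, List<Double>> e : senses.entrySet()) {
            double sum = 0;
            for (double s : e.getValue()) sum += s;
            avg.put(e.getKey(), sum / e.getValue().size());
        }
        return avg;
    }
}

The resulting word-to-score map can then be exported (e.g. as CSV) and read into RapidMiner for further processing.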

Where do I get "junk" data to help test my code?

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles -- my program needs to accept between 2 and 1,000,000 of them. Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?
The problem with testing software is not the source of the data, but the test set. I mean, can you test an int sum(int a, int b) method by just inputting random numbers to it? No, you need to know what to expect. This is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell this is the real result?
More than finding random numbers to feed your program, you must somehow know the expected results of your computation in advance in order to compare against them.
There are a few ways to do it. What I suggest is to pick a random number generator (amphetamachine +1) and use the data both on your code and on a program that you already know is good, e.g. MATLAB for your purposes. After computing your statistics with both, compare the results and see whether your code is correct or needs some debugging.
By the way, I voluntarily altered the result of the above sum...
What about just generating a random double?
// C# example: generate 100,000 uniform doubles in [0, 1)
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
    double number = r.NextDouble();
    // do something with the value
}
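Since the assignment is in C, the same idea with the C standard library looks roughly like this (a sketch; the count and the [0, 1) range are placeholders for whatever your program expects):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));  /* seed once */
    for (int i = 0; i < 100000; i++) {
        /* uniform double in [0, 1) */
        double number = (double)rand() / ((double)RAND_MAX + 1.0);
        printf("%f\n", number);
    }
    return 0;
}

Redirect the output to a file and you have a ready-made input list.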
Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024
http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily, though. Loop a random number of times (capped at 1,000,000), generating random doubles within the range you expect. Feed that in and away you go.
Generating your own test data in this case is probably the best option.
You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.