Duplicating every frame in ffmpeg - duplicates

I'm planning on filming a stop-motion at 12 fps and need to duplicate every frame once. I would rather not do it in camera (to save my mirror) - is it possible to do this in ffmpeg? (I've seen lots of posts on removing duplicates, but not on inserting them.)
Thanks
Gillian
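One approach that should work: read the frames in at 12 fps and ask ffmpeg for a 24 fps output, which makes it duplicate each frame once. A minimal sketch, assuming the shots are exported as a numbered image sequence; the filename pattern, codec settings and output name are placeholders:
# Read the stills at 12 fps and encode at 24 fps; ffmpeg duplicates each frame once to fill the higher rate.
ffmpeg -framerate 12 -i frame_%04d.jpg -r 24 -c:v libx264 -pix_fmt yuv420p out.mp4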

Related

How to optimally store the viewing time of an episode in the database?

How do I optimally store the viewing time of an episode in the database? What I do now is, as soon as the user starts an episode:
1. First, check whether a viewing record for that user already exists in the table.
2. If it does, update the time column via Ajax every second.
3. If it does not, insert a record and then keep updating the viewing time in the table the same way.
Note: this system currently works well, but my question is: if, for example, a thousand people are watching episodes at the same time, won't the database run into serious problems? Each viewer sends four queries per second, so a thousand viewers means four thousand queries per second. What is the best solution? Won't this system fail?
A consideration is how accurate you want the viewing time to be.
You could for instance, write the viewing time to the session (only) and then persist it to the database every minute, or when the viewer navigates to a new page.
But updating every second seems unrealistic, since it means per-second requests hitting your server from every viewer.
So consider what is reasonable. If it is just for your own stats, then maybe 10-second increments are enough. If it is so that the user can leave and return to the exact second they left, then again, possible, but probably unreasonable. I don't expect this from YouTube or Netflix - there is normally some amount of overlap, which actually helps me remember where I got to.
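If you do persist periodically, a single upsert per flush keeps it to one query. A sketch, assuming a hypothetical watch_time table keyed on (user_id, episode_id); the values are examples only:
-- Hypothetical table: one row per (user, episode).
CREATE TABLE watch_time (
  user_id    INT UNSIGNED NOT NULL,
  episode_id INT UNSIGNED NOT NULL,
  seconds    INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, episode_id)
);
-- One query per flush (e.g. once a minute) instead of a SELECT plus UPDATE/INSERT pair every second.
INSERT INTO watch_time (user_id, episode_id, seconds)
VALUES (42, 7, 95)
ON DUPLICATE KEY UPDATE seconds = VALUES(seconds);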

Apache Camel problems aggregating large (1mil record) CSV files

My questions are: (1) is there a better strategy for solving my problem? (2) Is it possible to tweak/improve my solution so that it works reliably without splitting the aggregation? (3, the less important one) How can I debug this more intelligently? Figuring out what the aggregator is doing is difficult because it only fails on giant batches that are hard to debug because of their size. Answers to any of these would be very useful, most importantly the first two.
I think the problem is that I'm not expressing to Camel correctly that I need it to treat the incoming CSV file as a single lump, and that I don't want the aggregator to stop until all the records have been aggregated.
I'm writing a route to digest a million-line CSV file, split it, then aggregate the data on some key primary fields, then write the aggregated records to a table.
Unfortunately the primary key constraints of the table (which also correspond to the aggregation keys) are getting violated, implying that the aggregator is not waiting for the whole input to finish.
It works fine for small files of a few thousand records, but at the sizes it will actually face in production (1,000,000 records) it fails.
Firstly it fails with a Java heap space error on the split after the CSV unmarshal. I fixed that with .streaming(), but this impacts the aggregator, which now 'completes' too early.
to illustrate:
A 1
A 2
B 2
--- aggregator split ---
B 1
A 2
--> A(3),B(2) ... then A(2),B(1) = constraint violation, because there are two lots of A rows etc.,
when what I want is A(5),B(3)
With sample files of 100, 1,000, etc. records it works fine and correctly, but when it processes 1,000,000 records, which is the real size it needs to handle, the split() first hits an out-of-memory error (Java heap space).
I felt that simply increasing the heap size would be a short-term fix that just pushes the problem back until the next upper limit of records comes through, so I got around it by using .streaming() on the split.
Unfortunately, the aggregator is now being drip-fed the records instead of getting them in one big lump, and it seems to be completing early and doing a second aggregation, which is violating my primary key constraint:
from("file://inbox")
    .unmarshal().bindy(/* bindy CSV config */)
    .split().body().streaming()
        .setHeader("X", /* expression building a string of the primary-key fields */)
        .aggregate(header("X"), /* ... aggregation strategy ... */)
            .completionTimeout(15000)
// etc.
I think part of the problem is that I'm depending on the streaming split never pausing for longer than a fixed timeout, which just isn't foolproof - e.g. a system task could reasonably cause that. Also, every time I increase this timeout it makes debugging and testing this stuff take longer and longer.
Probably a better solution would be to read the size of the incoming CSV file and not allow the aggregator to complete until every record has been processed. I have no idea how I'd express this in Camel, however.
Very possibly I just have a fundamental misunderstanding of the strategy for how I should be approaching/describing this problem. There may be a much better (simpler) approach that I don't know about.
There's also such a large number of records going in that I can't realistically debug them by hand to get an idea of what's happening (and I suspect that doing so also trips the timeout on the aggregator).
You can split the file on a line-by-line basis first, and then convert each line to a CSV record. That way you can run the splitter in streaming mode, keep memory consumption low, and still read a file with a million records.
There are some blog links on http://camel.apache.org/articles about splitting big files in Camel. They cover XML, but are relevant to splitting big CSV files as well.
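A minimal sketch of that idea, assuming a Bindy-annotated MyCsvRecord class, an existing MyAggregationStrategy, and a placeholder key expression and output endpoint (all of these are stand-ins; the exact body type after unmarshalling a single line can vary between Camel versions, so the key expression especially is only illustrative):
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.dataformat.BindyType;
public class CsvRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file://inbox")
            // Stream the file line by line so the whole CSV is never held in memory.
            .split(body().tokenize("\n")).streaming()
                // Bind the single line to a bean (MyCsvRecord is a placeholder Bindy class).
                .unmarshal().bindy(BindyType.Csv, MyCsvRecord.class)
                // Placeholder: build the correlation key from the primary-key fields.
                .setHeader("X", simple("${body.keyField}"))
                .aggregate(header("X"), new MyAggregationStrategy())
                    // Completion still needs care: a timeout alone can fire early under load.
                    .completionTimeout(15000)
                .to("direct:writeAggregatedRecord");
    }
}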

Regularly Updating Mysql records based on importance

I have a database table of items, let's call them games. Each game has a release date.
I run a script that selects a game at random and updates various bits of information, such as price, from my source data. This script is on a cron to fire at regular intervals throughout the day.
There are 20,000-odd game records and growing, so obviously keeping some of these games up to date is more important than keeping others up to date. This is mostly based on the release date, but could include data from other fields too.
Is there any way I can get my batch processing script to select a record based on this importance, without having to run through all results until each one has been updated and then start at the top?
So the frequency of updating the more important games would be higher than the less important ones?
As @Usman mentioned, you need to define a way of measuring importance that works properly. Then my suggestion would be to have your script update two records each time it runs: choose one of them at random from among the "important" records, and the other at random from among all records.
That way you would not reduce your probability of updating any given record, and at the same time you'd increase the probability of updating the important ones.
But, you know, even if you ran your random update script once a second, there's no guarantee you'd get to all 20,000 records daily. The fan of the game you don't update for a week might become annoyed that your data was stale. It might be better to update things on a fixed schedule, or when you get new data for them, rather than randomly.
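A sketch of that selection in MySQL, assuming a hypothetical games table with a precomputed importance score column (ORDER BY RAND() is fine at 20,000 rows, but worth revisiting as the table grows):
-- One pick from the "important" games (importance is a hypothetical precomputed score).
SELECT id FROM games
WHERE importance >= 0.8
ORDER BY RAND()
LIMIT 1;
-- And one pick from the whole table, so no record is ever starved of updates.
SELECT id FROM games
ORDER BY RAND()
LIMIT 1;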

Storing the order of videos that are in a playlist

I am working on a simple video-database with a playlist feature. In such a playlist, videos can be placed in a user-specified order.
So I thought I'd assign a number_in_playlist to each video_id. The problem with this is that if, say, video 19 is later moved to a position between videos 2 and 3, then the number_in_playlist of all the videos in between needs to be updated as well.
That strongly reminds me of the array vs. linked list trade-off. So I thought a linked list would solve the problem, i.e. storing something like previous_video_id_in_playlist and next_video_id_in_playlist for each video record. However, in that case I am not sure how to fetch (in order) all videos that are in the playlist.
This must be a problem that others have encountered before, so I was wondering if there is a standard recommended solution?
PS: I am using MySQL and I very much prefer short, fast queries (which I think speaks against the linked list solution?)
If you make your playlist.number_in_playlist column a double, then you can start by sequencing your videos with whole numbers. When an item in a playlist is moved to a new position, you set its new number_in_playlist value to the (probably fractional) number that is halfway between the preceding and following videos. This lets you move videos around for a very long time before you ever have to worry about resequencing your whole playlist.
The trigger for resequencing is when the newly calculated value is equal to one of its end points (i.e. the same value as the preceding or following video). For practical purposes this will happen very, very rarely, unless your users spend more time resequencing videos than watching them.
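A sketch of the move under that scheme, assuming a hypothetical playlist_videos table (2 and 3 stand in for the neighbours' current positions):
-- Hypothetical table: number_in_playlist is a DOUBLE so new values fit between neighbours.
CREATE TABLE playlist_videos (
  playlist_id        INT UNSIGNED NOT NULL,
  video_id           INT UNSIGNED NOT NULL,
  number_in_playlist DOUBLE NOT NULL,
  PRIMARY KEY (playlist_id, video_id)
);
-- Move video 19 between the videos currently at positions 2 and 3: take the midpoint.
UPDATE playlist_videos
SET number_in_playlist = (2 + 3) / 2
WHERE playlist_id = 1 AND video_id = 19;
-- Reading the playlist in order stays a single cheap query:
SELECT video_id
FROM playlist_videos
WHERE playlist_id = 1
ORDER BY number_in_playlist;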
can you not do something like:
SELECT *
FROM videos
WHERE playlist_id = 1
ORDER BY next_video_id_in_playlist ASC
Is the list usually not too long? Is write performance not a problem? In that case I'd just use the number_in_playlist solution; on every write, basically all the numbers need to be updated.
Linked lists in a relational database smell like they will cause unforeseeable problems, like cycles caused by bugs.

Getting top line metrics FAST from a large MySQL DB?

I'm painfully aware there probably isn't a magic bullet to this, but it's becoming a problem. Each user has hundreds of thousands of rows of metrics data across 3 tables, this is updated on a second by second basis.
When a user logs in, I want to quickly deliver them top line stats for a number of their assets (i.e. alongside each asset in navi they have top level stats).
I've tried a number of ideas, but please - if someone has some advice or experience in this area it'd be great. Stuff tried or looked into so far:
1. Produce static versions of the top-line stats every hour or so. This is intensive across all users and all assets, so I'm not sure how it can be done regularly.
2. Call the stats via AJAX, so they can be processed and filled in once the page has loaded (getting top-level stats right now can take up to 10 seconds for a larger user). This could also cache the stats in the session to save redoing the queries on each page load.
3. Query at 30-minute intervals, i.e. you log on, it queries, and then it hopefully hits the query cache every time the page is loaded (only 1/2 seconds) until the next 30-minute interval.
The first one seems to have the most legs, but I'm not sure how to do it, given that only a small number of users will need those stats - it seems awfully expensive to do it for everyone all the time.
Your options 1 and 3 are what is known as a materialized view. MySQL doesn't currently support materialized views natively, but the concept can still be achieved by hand (the link provides examples).
Hundreds of thousands of records isn't that much. Good indexes and the use of analytic queries will get you quite far. Sadly that isn't implemented in full in MySQL either, but there are workarounds, as indicated in the link provided.
It really depends on the top-line stats: do you want real-time data down to the second, or are 10-, 20- or even 30-minute intervals acceptable? Using the event scheduler you can schedule the creation/update of reporting tables containing summarized data that is much faster to query. That data is then available in fractions of a second because all the heavy lifting has already been done, and your focus can then be on indexing these tables to improve performance without worrying about the impact on production tables.
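A sketch of that reporting-table approach; the asset_stats_summary and metrics tables and their columns are hypothetical, and the event scheduler must be enabled:
-- Hypothetical summary table, rebuilt periodically from the raw metrics.
CREATE TABLE asset_stats_summary (
  asset_id    INT UNSIGNED NOT NULL PRIMARY KEY,
  total_views BIGINT UNSIGNED NOT NULL
);
-- Requires: SET GLOBAL event_scheduler = ON;
CREATE EVENT refresh_asset_stats
ON SCHEDULE EVERY 30 MINUTE
DO
  REPLACE INTO asset_stats_summary (asset_id, total_views)
  SELECT asset_id, COUNT(*) FROM metrics GROUP BY asset_id;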
You are in the data warehousing domain with this setup. That means not all of the usual normal-form rules apply, so my approach would be to use triggers to fill a separate stats table.
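A sketch of that trigger idea, assuming the same hypothetical metrics and asset_stats_summary tables as above:
-- Keep a running total up to date as raw metric rows arrive (names are hypothetical).
DELIMITER //
CREATE TRIGGER metrics_after_insert
AFTER INSERT ON metrics
FOR EACH ROW
BEGIN
  INSERT INTO asset_stats_summary (asset_id, total_views)
  VALUES (NEW.asset_id, 1)
  ON DUPLICATE KEY UPDATE total_views = total_views + 1;
END//
DELIMITER ;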