Documentation for Apple's ithmb format?

I am looking for documentation of the ithmb format used by Apple for photos stored on an Apple iPod. I would be happy with source code or a description. The only "documentation" I can find is pre-compiled executables that crack out the JPEGs.
Does anyone know how to do this?

According to the makers of FileJuicer (which can perform this conversion on a Mac),
The ithmb files are compressed using 16 bits per pixel in YUV format for television and they are therefore a bit smaller than the converted TIFF files.
It appears that the ithmb format varies from device to device (iPod Photo, iPhone, etc.), so you'll probably have to manipulate things based on which device the file came from.
FileJuicer ithmb Format Page:
http://echoone.com/filejuicer/formats/ithmb
Wikipedia YUV Page:
http://en.wikipedia.org/wiki/YUV
FOURCC YUV/RGB File Information:
http://www.fourcc.org/fccyvrgb.php
http://www.fourcc.org/yuv.php
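For what it's worth, if the thumbnails really are packed 16-bit YUV (for example a UYVY-style 4:2:2 layout), the decode step could look roughly like the TypeScript sketch below. The byte order, dimensions, and padding are assumptions that vary per device, so treat it as a starting point to verify against real files rather than a reference implementation.

    // Hedged sketch: decode packed 16-bpp YUV (UYVY-style 4:2:2) to 24-bit RGB.
    // Byte order, image dimensions, padding, and even the exact pixel format
    // differ between iPod models, so every assumption here needs checking.
    function uyvyToRgb(data: Uint8Array, width: number, height: number): Uint8ClampedArray {
      const rgb = new Uint8ClampedArray(width * height * 3); // clamps/rounds on write
      let p = 0;
      for (let i = 0; i < (width * height) / 2; i++) {
        // Each 4-byte group carries U, Y0, V, Y1 for two horizontal pixels.
        const u = data[i * 4] - 128;
        const y0 = data[i * 4 + 1];
        const v = data[i * 4 + 2] - 128;
        const y1 = data[i * 4 + 3];
        for (const y of [y0, y1]) {
          rgb[p++] = y + 1.402 * v;             // R (BT.601-style, full range)
          rgb[p++] = y - 0.344 * u - 0.714 * v; // G
          rgb[p++] = y + 1.772 * u;             // B
        }
      }
      return rgb;
    }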

There was a discussion on the iLounge forum a while ago,
http://forums.ilounge.com/showthread.php?p=650968
and the ipodlinux wiki has some information about the iTunesDB (including the photo parts)
http://www.ipodlinux.org/wiki/ITunesDB


Running a deep learning TTS in 2019 using (DeepVoice | WaveNet | etc)

I'm attempting to convert a series of sentences in a txt file to WAV files in as clear a voice as possible.
According to a 2019 survey there are many recent advancements using deep learning techniques.
That's great news, because the built-in or commonly used text-to-speech engines sound very robotic (macOS's "say" command, espeak, etc.).
The problem is, the GitHub pages or Colab notebook links are focused on how to train a new model or set up a Docker instance, and don't seem to include a minimal
git clone ...
./speak "How are you doing?" -o hayd.wav
Do you know how to install and run any of the 2019 engines from that article to speak a sentence?
I'll update if/when I find one that works.
I don't know about any of the others in the list, but for WaveNet you can use Google's API. Your code sends the text to Google, and they return the audio. There are client libraries available for C#, Go, Java, Node.js, PHP, Python, and Ruby. If you want to do it from another language you could use the REST API. For WaveNet, the first 1 million characters per month are free. After that it is $16 per 1 million characters. See their pricing page.
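For example, with the Node.js client library (@google-cloud/text-to-speech), a minimal script looks roughly like the sketch below. You need a Google Cloud project with the Text-to-Speech API enabled and GOOGLE_APPLICATION_CREDENTIALS pointing at a service-account key; the voice name and file names are just examples.

    // Minimal WaveNet sketch using Google's Node.js Text-to-Speech client.
    import textToSpeech from '@google-cloud/text-to-speech';
    import { writeFile } from 'fs/promises';

    async function speak(text: string, outFile: string): Promise<void> {
      const client = new textToSpeech.TextToSpeechClient();
      const [response] = await client.synthesizeSpeech({
        input: { text },
        voice: { languageCode: 'en-US', name: 'en-US-Wavenet-D' }, // a WaveNet voice
        audioConfig: { audioEncoding: 'LINEAR16' },                // PCM with a WAV header
      });
      await writeFile(outFile, response.audioContent as Uint8Array);
    }

    speak('How are you doing?', 'hayd.wav').catch(console.error);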
If your project is a relatively small one-off and you are not bothered about doing it programmatically (it wasn't clear from the question), then you could just use their online demo page and use a browser add-on (e.g. Video DownloadHelper or one of many others) to download the results as audio files. Alternatively you could use the API on the command line.
The quality of WaveNet is, in my opinion, excellent, and a vast improvement over the previous generations of text-to-speech algorithms. You can almost believe the voices are real at times.

Is there a way to offer multiple video qualities (resolutions) without uploading multiple videos in HTML5 video player?

I'm trying to add a few videos to my website using HTML5. My videos are all 1080p, but I want to give people the option to watch in a lower quality if needed. Can I do this without having to upload multiple videos (one for each quality) and without using a server-side language?
I've been searching extensively for this. I haven't found anyone saying it can't be done, but no one has said it can either. I am using Blogger as my host, which is why I can't use server-side languages.
Thank you.
without using a server-side language?
Yes, of course. The client can choose what version of the video to download.
Can I do this without having to upload multiple videos (one for each quality)
Not practically, no. You need to transcode that video and upload those different versions.
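Once those renditions exist as separate files on your host, the switching itself can happen entirely in the browser. A rough TypeScript sketch (file names are made up):

    // Hypothetical renditions, all uploaded as static files.
    const qualities: Record<string, string> = {
      '1080p': 'clip-1080.mp4',
      '720p':  'clip-720.mp4',
      '480p':  'clip-480.mp4',
    };

    function setQuality(video: HTMLVideoElement, label: string): void {
      const time = video.currentTime;       // remember playback position
      const wasPlaying = !video.paused;
      video.src = qualities[label];         // swap to the selected rendition
      video.addEventListener('loadedmetadata', () => {
        video.currentTime = time;           // seek back once the new file is ready
        if (wasPlaying) void video.play();
      }, { once: true });
    }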
I haven't found anyone saying it can't be done
A couple things to consider... first is that a video file can contain many streams. I don't know what your aversion is to multiple files, but yes it is possible to have several bitrates of video in a single container. A single MP4, for example, could easily contain a 768 kbps video, a 2 Mbps video, and an 8 Mbps video, while having a single 256 kbps audio track.
To play such a file, a client (implemented with Media Source Extensions and the Fetch API) would need to know how to parse the container and make ranged requests for specific chunks out of the file. To my knowledge, no such client exists as there's little point to it when you can simply use DASH and/or HLS. The browser certainly doesn't do this work for you.
Some video codecs, like H.264, support the concept of scaling. The idea here is that rather than having multiple encodings, there's just one where additional data enhances the previous video that was sent. There is significant overhead with this mechanism, and even more work you'd have to do. Not only does your code now need to understand the container, but now it has to handle the codec in use as well... and it needs to do it efficiently.
To summarize, is it possible to use one file? Technically, yes. Is there any benefit? None. Is there anything off-the-shelf for this? No.
Edit: I see now your comment that the issue is one of storage space. You should really put that information in your question so you can get a useful answer.
It's common for YouTube and others to transcode things ahead of time. This is particularly useful for videos that get a ton of traffic, as the segments can be stored on the CDN, with nodes closer to the clients. Yes, it's also possible to transcode on-demand as well. You need fast hardware for this.
No.
I can't fathom how this could ever be possible. Do you have an angle in mind?
Clients can either download all or part(s) of a file. But to do this you would have to somehow download only select pixels of each frame. Even if you had knowledge of which byte-ranges of each frame were which pixels, the overhead involved in requesting each byte-range would be greater than the size of the full 1080p video.
What is your aversion to hosting multiple qualities? Is it about storage space, or complexity/time of conversion?

Is it possible to check the bitrate of a Twilio video stream?

I am developing a video chat application using twilio. I would like to check the bitrate of a video stream playing in the browser to study how the bitrate will be affected at different bandwidths. How can I do this?
Twilio developer evangelist here.
You can measure various data about the incoming and outgoing streams using the WebRTC getStats API. There's a really good article that walks through the available stats that you should read to understand it. I would try to write more about it here, but reading the spec and checking out that article will be more accurate and useful to you.
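As a rough illustration, the sketch below uses the standard RTCPeerConnection.getStats() (not anything Twilio-specific) and derives the incoming video bitrate from the change in bytesReceived between two samples:

    // Sum bytesReceived over all inbound video RTP streams.
    async function inboundVideoBytes(pc: RTCPeerConnection): Promise<number> {
      let bytes = 0;
      (await pc.getStats()).forEach((report) => {
        if (report.type === 'inbound-rtp' && report.kind === 'video') {
          bytes += report.bytesReceived ?? 0;
        }
      });
      return bytes;
    }

    // Sample twice and convert the delta to kilobits per second.
    async function measureBitrateKbps(pc: RTCPeerConnection, intervalMs = 1000): Promise<number> {
      const before = await inboundVideoBytes(pc);
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
      const after = await inboundVideoBytes(pc);
      return ((after - before) * 8) / intervalMs; // bits per millisecond == kbit/s
    }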
Hope this helps.
Many videos will actually have a variable bit rate, so you can either get an average by simply dividing the file size by the duration, or use a tool like VLC, which will show you the bit rate changing over time (on a Mac it shows the numbers, but I believe on Windows it shows a graph).
If you are more interested in the download bandwidth itself, you can use developer tools in Chrome to see the bit rate.
If you open developer tools and go to the network tab you should see a waterfall column.
Hover over the timeline here in a row that corresponds to your video download and you can see all the details about the request and response including the time it took. The time combined with the size which is also in the row will show you the actual achieved download bit rate.
You can try this with a YouTube video, for example.

How is soundcloud player programmed?

This may be too broad a question, but how is SoundCloud actually programmed?
To be more specific,
What language was used to program it?
How does it display the frequency data?
If a user uploads a file in a format other than MP3, is it converted to MP3 or played as is? If the former, how does the conversion work?
How does it appear "graphically" in a browser the way it does? Is that also an HTML5 thing, which I don't know anything about?
I'm a big fan of SoundCloud and can't stop wondering how all of this works!
Please help me out :)
SoundCloud developer here,
The API and the current website are built with Rails. For information about the architecture/infrastructure and how it evolved over the last 5 years, check out Evolution of SoundCloud's Architecture. The "next" version of the website (still in private beta) is built entirely with Javascript, and just uses the API to get its data. There's a lot more detail available in Building The Next SoundCloud.
I'm not sure exactly what language/libraries are used to process the audio, but many audio libraries do provide frequency data, and we just extract that.
Users can upload AIFF, WAVE (WAV), FLAC, OGG, MP2, MP3, AAC, AMR or WMA files. The originals are kept exactly as is for the download option, but for streaming on the site, they're converted to 128kbps MP3 files. Again, I'm not sure of the software/libraries, but I'm pretty sure it'd be ffmpeg.
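If it is indeed ffmpeg, the streaming transcode step would boil down to something like this sketch (paths are made up):

    // Hypothetical transcode step: any supported upload -> 128 kbps MP3 for streaming.
    import { execFile } from 'child_process';

    function transcodeToStreamMp3(input: string, output: string): Promise<void> {
      return new Promise((resolve, reject) => {
        execFile('ffmpeg', ['-i', input, '-codec:a', 'libmp3lame', '-b:a', '128k', output],
          (err) => (err ? reject(err) : resolve()));
      });
    }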
For displaying the waveforms: on the back-end, when the audio files are processed at upload time, the waveform data is saved into a PNG file. On the current version of the website, we simply load that file. On Next, the PNG is processed to get the original data back out, and then it is drawn to a canvas at the exact dimensions needed (which keeps the image crisp). We're currently experimenting with getting waveform data in a JSON format to speed up this process.
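As a rough sketch of that drawing step, given some array of normalised peak values (the real waveform data format is internal, so the shape of peaks here is hypothetical):

    // Draw waveform peaks (values 0..1) onto a canvas at its exact pixel size.
    function drawWaveform(canvas: HTMLCanvasElement, peaks: number[]): void {
      const ctx = canvas.getContext('2d');
      if (!ctx) return;
      const { width, height } = canvas;
      ctx.clearRect(0, 0, width, height);
      ctx.fillStyle = '#999';
      const barWidth = width / peaks.length;
      peaks.forEach((peak, i) => {
        const barHeight = peak * height;
        // Centre each bar vertically, like the classic SoundCloud waveform.
        ctx.fillRect(i * barWidth, (height - barHeight) / 2, Math.max(1, barWidth - 1), barHeight);
      });
    }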
I am copying the following info posted by David Noël somewhere else in 2010.
Web tier: Varnish, nginx, haproxy, thin
Data Management: Cassandra, MongoDB, mySQL master/slave cluster, memcached
Web framework: Ruby on Rails
CDN: Akamai and Edgecast
Transcoding/storage: AWS EC2/S3

Quick question regarding CSS sprites and memory usage

Well, it's more to do with images and memory in general. If I use the same image multiple times on a page, will each image be consolidated in memory? Or will each image use a separate amount of memory?
I'm concerned about this because I'm building a skinning system for a Windows Desktop Gadget, and I'm looking at spriting the images in the default skin so that I can keep the file system looking clean. At the same time I want to try and keep the memory footprint to a minimum. If I end up with a single file containing 100 images and re-use that image 100 times across the gadget I don't want to have performance issues.
Cheers.
What about testing it? Create a simple application with and without spriting, and monitor your Windows memory usage to see which approach is better.
I'm telling you to test it because of this interesting post from Vladimir, which is even endorsed by Mozilla's "use sprites wisely" entry:
(...) where this image is used as a sprite. Note that this is a 1299x15,000 PNG. It compresses quite well — the actual download size is around 26K - but browsers don't render compressed image data. When this image is downloaded and decompressed, it will use almost 75MB in memory (1299 * 15000 * 4).
(At the end of Vladimir's post there are some other great references to check)
Since I don't know how Windows renders its gadgets (and whether it handles compressed image data), it's difficult, IMHO, to say exactly which approach is better without testing.
EDIT: The official Windows Desktop blog (not updated since 2007) says the HTML runtime used for Windows Gadgets is MSHTML, so I think a test is really needed to know how your application would handle the CSS sprites.
However, if you read some of the official Windows Desktop Gadgets and Windows Sidebar documentation, there's an interesting point relevant to your decision about CSS sprites in "The GIMAGE Protocol" section:
This protocol is useful for adding images to the gadget DOM more efficiently than the standard HTML img tag. This efficiency results from improved thumbnail handling and image caching (it will attempt to use thumbnails from the Windows cache if the requested size is smaller than 256 pixels by 256 pixels) when compared with requesting an image using the file:// or http:// protocols. An added benefit of the gimage protocol is that any file other than a standard image file can be specified as a source, and the icon associated with that file's type is displayed.
I would try to use this protocol instead of CSS sprites and do some testing too.
If none of this information helps you, I would try asking at the official Windows Desktop Gadgets forums.
Good luck!
The image will show up one time in the cache (as long as the URL is the same and there's no query string appended to the file name). Spriting is the way to go.
Web browsers identify cacheable resources by their ETag response header. If it is absent or differs between requests, then the image may be downloaded and stored in the cache multiple times. If you (actually, the web server) supply one unique, consistent ETag header for each unique resource, then any decent web browser is smart enough to keep one copy in cache and reuse it for as long as its Expires header allows.
Any decent web server will supply the ETag header automatically for static resources; it is often autogenerated from a combination of the local filename, the file length, and the last-modified timestamp. But often they don't add the Expires header, so you need to add it yourself. Judging by your post history here at Stack Overflow, I safely assume that you're familiar with Apache HTTPD as a web server, so I'd suggest having a look at the mod_expires documentation to learn how to configure it optimally.
In a nutshell, serve the sprite image along with an ETag and a far future Expires header and it'll be okay.
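If you're not on Apache, the same idea applies anywhere you can set response headers. Purely for illustration, here is a minimal Node.js/TypeScript sketch (paths and values are made up) that serves a sprite with a stable ETag and a far-future Expires header:

    import { createServer } from 'http';
    import { createReadStream, statSync } from 'fs';

    const SPRITE = 'skin/sprite.png'; // hypothetical path

    createServer((req, res) => {
      const { size, mtimeMs } = statSync(SPRITE);
      const etag = `"${size}-${Math.floor(mtimeMs)}"`; // same recipe many servers use
      if (req.headers['if-none-match'] === etag) {
        res.writeHead(304);               // client already has this exact version
        res.end();
        return;
      }
      res.writeHead(200, {
        'Content-Type': 'image/png',
        'ETag': etag,
        'Expires': new Date(Date.now() + 365 * 24 * 3600 * 1000).toUTCString(), // ~1 year
      });
      createReadStream(SPRITE).pipe(res);
    }).listen(8080);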