Running a deep learning TTS in 2019 using (DeepVoice | WaveNet | etc) - deep-learning

I'm attempting to convert a series of sentences in a txt file to WAV files in as clear a voice as possible.
According to a 2019 survey there are many recent advancements using deep learning techniques.
Which is great news, because the built-in or commonly used text-to-speech engines sound very robotic. (OSX's "say" command, espeak, etc).
The problem is, the github pages or collab notebook links are focused on how to train a new model, or set up a docker instance, and don't seem to include a minimum
git clone ...
./speak "How are you doing?" -o hayd.wav
Do you know how to install and run any of the 2019 engines from that article to speak a sentence?
I'll update if/when I find one that works.

I don't know about any of the others in the list, but for WaveNet you can use Google's API. Your code sends the text to Google, and they return the audio. There are client libraries available for C#, Go, Java, Node.js, PHP, Python, and Ruby. If you want to do it from another language you could use the REST API. For WaveNet, the first 1 million characters per month are free. After that it is $16 per 1 million characters. See their pricing page.
If your project is a relatively small one-off and you are not bothered about doing it programatically (it wasn't clear from the question), then you could just use their online demo page and use a browser add-on (e.g. Video DownloadHelper or one of many others) to download the results as audio files. Alternatively you could use the API on the command line.
The quality of WaveNet is in my opinion, excellent, and is a vast improvement over the previous generations of text-to-speech algorithms. You can almost believe the voices are real at times.

Related

Running R code from website (without paid hosting)

There are many related questions, but all of them are about Shiny R, and that requires paid hosting to be always available (since free options such as shinyapps.io have limits). So I'm wondering whether there is any alternative solution for running R code from a website hosted, for example, at GitHub.
To be more clear, I want to use an R script to interactively display a few plots and some derived information, based on some basic settings given by a user. To give a super simple example:
var_from_gui = 7 # input in HTML, user e.g. clicks OK
print(paste("input plus five is:", var_from_gui + 5)) # info displayed on website
plot(c(1, 2, 5) * var_from_gui) # image to be displayed on website
Firstly, I assume this is very possible in Shiny R - is that correct?
Secondly, is this possible in another way that allows me to run this via e.g. GitHub pages? (Actually I can also use this more comprehensive university server, but I don't suppose it helps with this case.)
I'm aware of htmlwidgets too, but, as far as I understand, that only allows very limited interaction such as filtering, and not things like drawing plots based on user input.
One option I found and seems to fit well is OpenCPU, but what's discouraging is the apparent lack of activity (no recent questions/answers/posts etc.) and hardly any useful tutorials or overviews, which also makes it hard to assess whether it's worth trying.
For up to 5 small apps with little traffic you could use the free plan on https://www.shinyapps.io/
very easy to deploy, because its a RStudio service
You can host your R functions on the public OpenCPU server, for free.
I have done that for my own applications and it works well. None of the limitations that you listed in your question. Also tried Shiny but, as you mentioned, not flexible enough for what I was trying to achieve.
OpenCPU is really a great tool, although not well supported by the community (not sure why, looking at the great value it brings)
I followed the docs here to get it up and running. Setup is a bit tedious but fairly well documented.
Once live, I found this server very reliable - your R functions are continuously available, with very low latency (much faster than a Shiny server from my experience)
You are also asking for "a solution for running R code from a website hosted, for example, at GitHub" - OpenCPU does handle CD/CI (Continuous Integration) from your custom GitHub repo through a webhook mechanism.
I also implemented such webhook for my apps, so can confirm it works smoothly. Just follow the well-written provided documentation here.
By now I guess I can answer my own question – though Marc's answer seems also useful in general (and prompted to write my own answer).
In essence, shinyapps.io worked for me perfectly fine. For a small and not too often used application the free plan is easily enough. What's more, even in the unlikely case that the website goes down due to excessive usage, R users have the possibility to easily run the Shiny app from their own computer (provided that they have R installed).
And of course, the example given in the question is very possible to implement in Shiny R: typically the code is executed via the eventReactive function, and, for the "trigger" button, one can use actionButton.

Can I make CNC Editor with HTML5?

I would like to make my own CNC Editor.
I am looking for some guidance. I don’t know if it is even possible with HTML5. But it would be great if I can. If possible, please list what I should research and learn.
I don’t need it to be online accessible, I will only have it on my computer. I will be accessing it via local network from multiple different computers. I don’t want it accessing the internet, because it’s not always available.
Desired Features:
⁃ Read and Write files with different extensions (all files used are easily opened in notepad)
⁃ Store and retrieve data from a simple database file.
⁃ Make calculations
⁃ Have a text Editor window
⁃ Have a Display area for simple vector graphics depending on data loaded and provided by user.
It is possible but requires a lot of work. I would say that these are technologies you would need to master in order to pull this off:
Node.js (use express.js) - for storing and retrieving files from database and for reading/writing local files with extensions you want (server-side)
Vue.js or Angular.js or React - for building frontend interface to manipulate your vector graphics. It can also do calculations and It's good with svgs and that kind of stuff.
Electron.js (not mandatory) - it wraps it in native-app like experience. This framework actually gives you ability to write desktop apps for any os and arch.
So as I said, It would be a lot of work but its possible in the end.
Funny coincidence is that my brother is planning to build CNC machine so I might be doing this as well in next couple of months. Feel free to contact me if you need any further help!
UPDATE: You cant do it with just HTML5. It would be like trying to make wooden space shuttle.

Can HTML5 communicate with peripherals like scanners and credit card readers?

My company writes software that installs on client machines to perform point-of-sale transactions. The software interfaces with a variety of external peripherals (receipt printers, bar code scanners, credit-card readers, etc). We do this with a WinForms app that we created in Visual Studio using the Microsoft OPOS library, which in turn communicates with our cloud server.
There are obvious inefficiencies in this model, primarily with updates. I'm researching other ways to communicate with these peripherals over the web, preferably via web browser. So far as I can tell, Java is one of the only technologies out there that can do what we're looking for (via applet), and I assume Adobe Flash can as well (via the Air platform). These are viable, but not preferable because we want to run our software on web-enabled mobile devices.
Does anybody have suggestions for other ways to communicate with external peripherals over the web?
UPDATE (Jan 16th, 2019): The Credential Management API has been announced. It's currently only supported on Chrome and Opera but it's looking promising. Google Developers wrote an article elaborating on the spec.
UPDATE (Dec 28th, 2016): Another couple years gone, and another update. This one will be more focused on two new developments than anything else. See the new "WebUSB & Web BlueTooth" section under "Full Device API". But the answer remains the same.
UPDATE (Nov 3rd, 2014): It's been just over two years since the original post was made, but the answer remains mostly the same for now. We are, however, closer to your original goal in several areas.
ORIGINAL ANSWER:
There would be a number of ways to go about this.
Background
The HTML5 specification has entered into the "Recommendation" state. This means that HTML5 is pretty much set for what it looks like. However, I will be using HTML5 in the same way that every marketing person in the world has decided is best. That is, I will not be talking about HTML. Well, I will, in so far as you will utilize it from an HTML page, but not really. What I'll actually be discussing is JavaScript (JS) and that's a horse of a different color. But for all intents and purposes, we're putting it all under the same heading as HTML5, which has been decided to mean "shiny and new" now.
Also, the items which I am discussing will vary in support. Some are very browser dependent projects (like Chromium specific implementations), and some are more standards driven projects that may not have browsers implementing or experimenting with them yet. I'll try to distinguish between the two as I go along.
Full Device API
Status: Incoming, but not ready
Being able to access devices from the browser is making slow but steady progress. Right now, many modern browsers have access to some of the more common devices like the camera or gamepads, but they are all high level APIs. Browser vendors, the standards groups, and lots of companies involved with the web are all trying to make webapps just as powerful as your local applications.
But the APIs you are looking for are still in progress and a ways off. For your particular case, and for the more general case of connecting your webapp to most devices, we're still a few years away from something we can use. If you want to see what awesome things are coming up in that field, here are just a select few items that may help you directly:
Web Near Field Communication (NFC) API
This one unfortunately may be dead in the water for now. But it looks like originally some folks at the W3C (mostly Intel it looks like) were looking at adding a NFC API to the web.
Media Capture Streams
The WebRTC group is working on programmatic access to media streams like the camera which would allow to integrate things like barcode scanning or other features. This has reached CR status and is available in browsers, but is less helpful on its own.
Web Bluetooth
If you had bluetooth capable tools, this API would help you connect with them from computers and devices that were able to listen and connect. The primary driver for this at the moment seems like it is the Chrome team, including an experimental implementation, but I wouldn't consider it anywhere ready to use yet (See "WebUSB & Web BlueTooth" section).
WebUSB
This would allow full access to low level USB information including listing devices and interacting with them. Same as Web BlueTooth, this seems to be current Chrome pet project, but I also wouldn't rely on it (See "WebUSB & Web BlueTooth" section).
Network Service Discovery
If you have other devices or items on the network which broadcast and use HTTP, this API would allow you to discover and interact with these services. No browser implementation, but it is in a working draft for the W3C.
Originally, Mozilla was pushing a number of these forward because of Boot2Gecko (or Firefox OS). However, with that project officially cancelled, we aren't seeing much progress from them in these areas now.
Members of the Chrome team, however, seem to have decided to dive in and start not only working towards these, but putting them live in browsers. Which leads us to...
WebUSB & Web BlueTooth
Like sausages, it's better to not know how Web Standards are made
-Abraham Lincoln (probably)
There's been a little bit of buzz in this area as it looks like the Chrome team snuck in these as experimental features and developed their own specification for it. Which is great! Just maybe not in the way that you were hoping for.
Each browser vendor and W3C contributor group has their own style and makes contributions towards the specs in their own way. The result is usually a fairly decent specification that the browsers have agreed upon. But getting from nothing to something is... messy. Real messy. And is quite a process a lot of times. It doesn't always result in a good spec (yeah, I'm talking about you Florian compromise...) but even when it does, it takes a while.
However, It seems like Google developed this version of the spec all on their own. And, in my experience, Google's approach to the specs is always a little... well... setting my personal opinions aside we'll say "gung-ho". They tend to just dive right into the deep end. And that seems to be what they've done here.
I highly doubt these specs or implementations will look anything like this when they become standards. And there's nothing wrong with that. That's part of the process. But I wouldn't go relying on this implementation or developing any code or products against it. This is an unprecedented feature on the web and all the browser vendors are gonna want a big say in this.
That said, this is actually good. One of the things Google often does (for better or worse) with situations like this is forces the conversation and it can push things along. And having a feature shipped in the browser, even an experimental feature, can turn up the heat on the demand for it. So we may see more progress in this area soon.
PhoneGap Apache Cordova. You know, for your phone
Status: Not fully featured and phone only
Apache Cordova, previously Adobe PhoneGap, is a way to write your program in HTML, CSS, and JS that allows you to access lower level functionality on things like phones, and compile across devices. This would be a way to implement your program, but it would be a phone application, not necessarily a desktop one. An option to consider, and something I figured I would mention.
Cordova implements a few of the above features already, but doesn't have some of the more powerful ones like NFC or BlueTooth.
The Native-App solution (for Windows 8)
Status: Possible, but OS specific and desktop app
Windows 8 offers the ability to build applications in HTML and JS. This would allow you to easily access lower level functionality on the OS via their API. From the looks of it, it is pretty extensive and you can do a lot. You mentioned cross OS support, however, and this obviously limits you to one OS.
It's so Flash-y!
Status: Dying/Dead, not possible as a web app
Flash won't have direct access to the system through the web. You could create an AIR application, but that will sort of defeat the purpose of having it web based. In addition, Flash support on mobile, and on the web it would seem, is on the decline.
NodeJS
Status: Can be a bit of a pain and only possible as a desktop app
NodeJS and JS applications have sort of been a hot topic the last couple years. I didn't discuss it in my original post because I felt it wasn't quite there yet. However, things have progressed and it is much closer to being ready for this sort of thing, and has the support and power of a growing user base. That said, for your particular case, I wouldn't recommend using it. It would have to be local on the users machine, and because of how NodeJS (and similar engines) are at the moment, it would require a lot of extra configuration and setup that would complicate things a bit.
So you could build an app using HTML, CSS and JS with NodeJS or similar engines and have low level access to what you need, but it has to be local, and it would take more work than I'm sure you want to do every time you'd like to install it for a customer.
... Now where was I?
So where does that leave us? Well, simple: if you want a single language/set of code as your code base, HTML/CSS/JS aren't a great option... yet. But they could be some day. For now, your options are limited to what you feel is best for your customer. Java is a stable option you listed, but obviously comes with its own drawbacks. As the web develops, I think we'll see a lot of really cool things coming out of the new functionality, but we've got a ways to go still.
More reading:
Brian.IO: Beyond HTML5
HTML5 Apps on Windows 8
Wikipedia list of projects built using JS
This is possible, but it would have to be done indirectly. In theory, you could write a socket-server in a low level language, which gets I/O, and sends the I/O through the socket (relaying, I guess). HTML5 uses WebSockets, or some equivalent to communicate with this socket-server.
Now it can be achieved with WebUSB API.
It is available in Chrome since version 54.
It is a W3C editor's draft so we can expect (hope) that it will be adopted by other browser vendors...
I've been thinking about this a lot lately... have a POS app mostly written in VB6, considering what to do next. HTML5 is an option and I was thinking I'd use VSPE to get the serial stuff into the JS.
http://www.eterlogic.com/Products.VSPE.html
Love this product! Works very well for getting serial traffic where you need it, so I think it would work well, at least as a proof-of-concept to get you going. You'll want to use a combination of "connector" types along with the "tcpclient" and "tcpserver".
Just for the record, a method that works well in 2016 (since chrome 26), but is to be withdrawn over the next 2 years is to deploy your html5 as a chrome app and use chrome.usb (or chrome.serial or chrome.bluetooth).
I am currently using chrome.usb and planning to migrate to a web app using WebUSB API (see Supersharp's answer), which I hope will be adopted by the time Google discontinue chrome apps 🤞.

What are the current options for basic web-based video editing?

I'm wondering what options exist as of April 2011 for allowing a user to, having uploaded a piece of video to our server:
Trim the footage by indicating a start and end point
Capture specific frames to use as a thumbnail / snapshot for the video.
Transcoding is no problem - we have various wrappers around ffmpeg that are doing this just fine - but in terms of providing a web interface, I'm not sure where to begin.
HTML5, Flash or Silverlight options - or even Java applets, I guess - would all be worth considering. Open source is always a bonus.
Our company does this, http://www.forbidden.co.uk/ Based in Wimbledon UK. We mainly use a java frontend to cloud-based backend. But there is an Android version too.
Our core business is editing for professional TV productions. But we have various sorts of possible integration, and do things like thumbnailing and frame extraction into facebook etc.
Not open source I'm afraid, but a lot of our infrastructure is, linux based ingester etc.

Why is Google's "face recognition" feature available only in Picasa WEB and not Picasa for the PC?

I friend asked me this today.
Picasa Web has a cool (and frightening :-) feature where it will recognize all the faces in your photo album.
But the PC (desktop) version doesn't have this.
Several reasons I can think of:
They just haven't gotten around to writing the PC version of the code.
They are licensing that feature and it costs a lot more (or isn't available) on the PC.
Takes a lot of processing power (this seems odd b/c MY PC cycles are free to Google, but they have to pay for for cycles consumed on their server.
Any other thoughts?
I'm certain it'll make it out in coming releases but Google is a funny company when it comes to its own competing/complementing services. One thing is for sure, only somebody on the Picasa team could give an accurate answer.
But we could hypothesise several things...
They don't want their code reverse-engineered.
(As you say), they aren't licensed to redist
It's blocked in the dev version by other new features that aren't complete yet
They don't want to release it because they want people to use PicasaWeb as a social photo network.
I don't think processing power is an issue. If they're running it in bulk on their own servers for free, a modern desktop could probably run it without issue.
From my limited contact with face recognition software, it's probably the redistribution issue. When I dealt with it, face recognition was its own little world with extremely high per-CPU licensing costs and tremendous paranoia about code getting loose.
I'm not so sure it's not a processing issue. It took Google's massive servers 30 minutes to run through all my photos. I can only imagine that same task would have taken days on my local machine.
Actually, its in, just in limited functionality when you do a search, there's an icon to find only photos with faces. The experimental passport feature also works that way.
So the answer is:
Not the same base (APIs) available or used and not the same language so its not directly portable.
Not the same software and there are no stated goals to make both apps feature equivalent.
Programmers are limited and their time is too. They make choices as to what implement now.
No idea if this is the case for Picasa, but there's another case where licensing could be the issue. If the server-side code is using code with a restrictive license with DRM (GPL, for example) which restricts how you can distribute modules using the code. Running that module on a web server, where the user only gets the output, is legal under such licenses. If that code was distributed, there would be many legal requirements attached which would likely be very undesirable for commercial software companies, including google. This is one very good reason to have some capabilities only accessible through web services.
This was also the case with Riya (who was arguably the first to market with reliable facial recognition for consumer photo collections).
The biggest reasons are likely:
Processing Time (they can't control
how fast your CPU is and therefore
they can't control the experience).
Facial recognition is very likely to
be process intensive (this was Riya's
stated reason for not doing it
client-side)
The recognition process requires a
LARGE volume of data for processing
that is only accessible on the
server? (In other words, the process needs to spin through millions of faces, not just the faces that you have on your hard drive?)