Cloud Speech-to-Text bad sample rate hertz - json

Attempting to transcribe audio to text through Cloud Shell as outlined in https://codelabs.developers.google.com/codelabs/cloud-speech-intro/index.html?index=..%2F..%2Findex#0-
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}
This works.
Using the same config, loading the brooklyn.flac file obtained from the above codelabs document (the downloaded file is actually a .wav) into a bucket, and calling that address in the "audio" string returns the following error:
{
  "error": {
    "code": 400,
    "message": "Invalid recognition 'config': bad sample rate hertz.",
    "status": "INVALID_ARGUMENT"
  }
}
The same error occurs with other files encoded per the requirements outlined in https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#AudioEncoding (16-bit, 16 kHz, mono, WAV or FLAC encoding).
In addition, using "sampleRateHertz" and "ENCODING_UNSPECIFIED" per the above AudioEncoding reference also returns invalid argument errors.
Have searched the boards with keywords "config': bad sample rate hertz." with no luck.
What is strange is that the FLAC file called in the codelabs doc is a .wav when downloaded and doesn't work when moved to my bucket.
Any ideas welcome- thanks!

The codelabs document is a bit confusing. They actually transcribe the FLAC file at gs://cloud-samples-tests/speech/brooklyn.flac (you can download the FLAC file at https://storage.cloud.google.com/speech-demo/brooklyn.flac), but for the preview they suggest a different WAV file at https://storage.cloud.google.com/speech-demo/brooklyn.wav. This is because not many browsers can play FLAC, but most can play WAV. The WAV file is only for preview, not for transcription.
If you put the WAV file into your bucket, you need to adjust the parameters according to the WAV format. Or you can still download the FLAC file and use it as they recommend.
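For example, if you upload the WAV file to your own bucket, a request body along these lines should work for a 16-bit, 16 kHz, mono WAV (LINEAR16) file; the bucket and object name below are placeholders for your own upload:
{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://your-bucket/brooklyn.wav" <--- placeholder for your own bucket/object
  }
}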

Related

Google Drive Rest API - How to check if file has changed

Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
I have been struggling with this for a bit. Here are the two things I have tried:
1. File version number
I upload a plain text file to Google Drive (simple upload, update endpoint), and save the version from the file metadata returned after a successful upload.
Then I poll the Drive API (get endpoint) occasionally to check if the version has changed.
The trouble is that within a second or two of uploading the file, the version gets bumped up again.
There are no changes to the file content. The file has not been opened, viewed, or even downloaded anywhere else. Still, the version number increases from what it was after the upload.
To my code this version number change indicates that the remote file has been changed in Drive, so it downloads the new version. Every time!
2. The Changes endpoints
As an alternative, I tried using the Changes API.
After I upload the file, I get a page token using changes.getStartPageToken or changes.list.
Later I use this page token to poll the Changes API for changes, and filter the changes for the fileId of the uploaded file. I use these options when polling for changes:
{
  "includeRemoved": false,
  "restrictToMyDrive": true,
  "spaces": "drive"
}
Here again, there is the same problem as with the version number. The page token returned immediately after uploading the file changes again within a second or two. The new page token shows the uploaded file having been changed.
Again, there is no change to the content of the file. It hasn't been opened, updated, downloaded anywhere else. It isn't shared with anyone else.
Yet, a few seconds after uploading, the file reappears in the changes list.
As a result, the local code redownloads the file from Drive, assuming remote changes.
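(For context, the polling sequence described above is roughly the following; this is only a sketch against the Drive v3 REST endpoints, with a placeholder access token and no error handling.)
// Sketch only: poll changes.list with the options shown above and check
// whether the uploaded file appears in the change list.
// `pageToken`, `fileId` and `accessToken` are placeholders for your own values.
async function pollChanges(pageToken, fileId, accessToken) {
  const params = new URLSearchParams({
    pageToken: pageToken,
    includeRemoved: 'false',
    restrictToMyDrive: 'true',
    spaces: 'drive'
  });
  const res = await fetch('https://www.googleapis.com/drive/v3/changes?' + params, {
    headers: { Authorization: 'Bearer ' + accessToken }
  });
  const data = await res.json();
  // Did any change entry refer to the uploaded file?
  const changed = (data.changes || []).some(c => c.fileId === fileId);
  // Store newStartPageToken (or nextPageToken) for the next polling cycle.
  return { changed, nextToken: data.newStartPageToken || data.nextPageToken };
}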
Possible workaround
As a hacky workaround, I could wait a few seconds after the file upload before getting the new file version / changes page token. This may take care of the delayed version increment issue.
However, there is no documentation of what is causing this phantom change in version number (or changes.list). So, I have no sure way of knowing:
How long a wait is safe enough to get a 'settled' version number without losing possible changes by other users/apps?
Whether the new (delayed) version number will be stable, or may change again at any time for no reason?
Is there a reliable way, short of comparing full contents, of checking if a file was updated/changed in Drive?
You can try using the md5Checksum property of the File resource object, if your file is not a Google Docs file (i.e. it is a binary file). You should be able to use that to track changes to the contents of your binary files.
You might also be able to use the Revisions API.
The Revisions resource object also has an md5Checksum property.
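A rough sketch of that check (Drive API v3; assumes you have an OAuth access token and stored the md5Checksum returned after your upload):
// Sketch only: compare the stored checksum against the current one.
// `fileId`, `storedMd5` and `accessToken` are placeholders.
async function hasContentChanged(fileId, storedMd5, accessToken) {
  const res = await fetch(
    'https://www.googleapis.com/drive/v3/files/' + fileId + '?fields=md5Checksum',
    { headers: { Authorization: 'Bearer ' + accessToken } }
  );
  const file = await res.json();
  // md5Checksum is only populated for binary (non-Google-Docs) files.
  return file.md5Checksum !== storedMd5;
}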
As a workaround, how about using the Drive Activity API? I think there are several possible answers for your situation, so please think of this as just one of them.
When the Drive Activity API is used, the activity information about the target file can be retrieved. For example, from ActionDetail you can see whether the target file was edited, renamed, deleted and so on.
The sample endpoint and request body are as follows.
Endpoint:
POST https://driveactivity.googleapis.com/v2/activity:query?fields=activities%2CnextPageToken
Request body:
{"itemName": "items/### fileId of target file ###"}
Response:
A sample response is shown below. From this you can see that the file with the given fileId and filename was edited at the timestamp.
{
  "activities": [
    {
      "primaryActionDetail": {
        "edit": {} <--- If the target file was edited, this property is added.
      },
      "actors": [
        {
          "user": {
            "knownUser": {
              "personName": "people/### userId who edited the target file ###",
              "isCurrentUser": true
            }
          }
        }
      ],
      "actions": [
        {
          "detail": {
            "edit": {}
          }
        }
      ],
      "targets": [
        {
          "driveItem": {
            "name": "items/### fileId of target file ###",
            "title": "### filename of target file ###",
            "file": {},
            "mimeType": "### mimeType of target file ###",
            "owner": {
              "user": {
                "knownUser": {
                  "personName": "people/### owner's userId ###",
                  "isCurrentUser": true
                }
              }
            }
          }
        }
      ],
      "timestamp": "2000-01-01T00:00:0.000Z"
    }
  ],
  "nextPageToken": "###"
}
Note:
When you use this API in your environment, please enable the Drive Activity API in the API console and include https://www.googleapis.com/auth/drive.activity.readonly in the scopes.
When I used this API, I felt that the response was fast; if the response is slow when you use it, I apologize.
References:
Google Drive Activity API
ActionDetail
If this was not what you want, I apologize.
What you are seeing is the eventual consistency feature of the Google Drive filesystem. If you think about search, it doesn't matter how quickly a search index is updated, only that it is eventually updated and is very efficient for reading. Google Drive works on the same premise.
Drive acknowledges your updates as quickly as possible, long before those updates have propagated to all worldwide copies of your file. Derived data (e.g. timestamps and, I think I recall, md5sums) is also calculated after the update has "completed".
The solution largely depends on how problematic the redundant syncs are to your app.
The delay of a few seconds is enough to deal with the vast majority of phantom updates.
You could switch to the v2 API and use etags.
You could implement your own version number using custom properties. So every time you sync up, you increment your own version number. You only sync down if the application version number has changed.
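A sketch of that idea using appProperties (Drive API v3; the property name "appVersion" and the token handling are just placeholders):
// Sketch only: store an app-managed version number on the file.
async function bumpAppVersion(fileId, newVersion, accessToken) {
  await fetch('https://www.googleapis.com/drive/v3/files/' + fileId, {
    method: 'PATCH',
    headers: {
      Authorization: 'Bearer ' + accessToken,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ appProperties: { appVersion: String(newVersion) } })
  });
}
// On sync: request the file with fields=appProperties and only download
// when appProperties.appVersion differs from your locally stored value.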

Forge conversion to obj only returning svf

I'm following the step-by-step Extract Geometry tutorial, and everything seems to work fine, except that when I check the manifest after posting the job, it always returns the manifest for the initial conversion to SVF.
The tutorial specifically states that you must convert to SVF first. This takes a few seconds to a few minutes, starting at 0% and going up to 100%. I await completion, and when I post the second job with the following payload (verifying that the payload is as requested)
let objPayload = {
  "input": {
    "urn": job.urn  // urn retrieved from the file upload / svf conversion
  },
  "output": {
    "formats": [
      {
        "type": "obj",
        "advanced": {
          "modelGuid": metaData[0].guid,
          "objectIds": [-1]
        }
      }
    ]
  }
}
(where metaData[0].guid is the guid provided by Step 1's call to /modelderivative/v2/designdata/${urn}/metadata),
then the job actually starts at about 99%. It sometimes takes a few moments to complete, but when it does, the call to retrieve the manifest returns the previous manifest, where the output format is marked as "svf".
The POST Job page states that
Derivatives are stored in a manifest that is updated each time this endpoint is used on a source file.
So I would expect the returned manifest to be updated to include the requested 'obj'. But it is not.
What am I missing here?
As Cyrille pointed out, the translate job only works consistently when translating to SVF. If translating to OBJ, you can only do so from specific formats, listed in this table.
At the time of this writing, if you request a job outside that table (e.g. IFC->OBJ), it will still accept your job and simply not do it. So if you're following the "Extract Geometry" tutorial, when you request the manifest, it is still pointing to the original SVF translation.
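One way to guard against that, sketched below, is to query the Model Derivative formats endpoint before posting the job and check whether your source file's extension can actually be translated to OBJ (the token handling is a placeholder):
// Sketch only: check whether OBJ output supports the given source format.
async function objSupportsSource(sourceExtension, accessToken) {
  const res = await fetch(
    'https://developer.api.autodesk.com/modelderivative/v2/designdata/formats',
    { headers: { Authorization: 'Bearer ' + accessToken } }
  );
  const { formats } = await res.json();
  // formats.obj lists the input formats that can be translated to OBJ.
  return (formats.obj || []).includes(sourceExtension.toLowerCase());
}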

Google Speech Recognition API Result returns only metadata

I'm using this google app script for asynchronous speech recognition. It works perfectly fine with files under one minute but the result I get for a longer file (~12 mins) is this:
[18-11-18 08:19:52:104 EST] {
  "name": "5822702390902833748",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "startTime": "2018-11-18T13:19:21.769945Z",
    "lastUpdateTime": "2018-11-18T13:19:21.950214Z"
  }
}
With no "transcript".
I've upgraded my Google Cloud account to a paid subscription.
And this is the encoding of the file:
File Size  : 15.0M
Bit Rate   : 162k
Encoding   : FLAC
Info       : Processed by SoX
Channels   : 1 @ 16-bit
Samplerate : 16000Hz
Replaygain : off
Duration   : 00:12:20.65
What am I missing?
It sounds like there is nothing you are missing; you just need to wait until the operation is done.
Basically, if you don't use a Speech client library, you should implement an operation checker to get the operation metadata, which will include the progress of your long-running recognize request.
More info can be found here: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations
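For example, from Apps Script you could poll the operation until done is true. A sketch (assumes an OAuth token with the cloud-platform scope and the operation name from the initial response):
// Sketch only: check a long-running Speech operation from Apps Script.
function checkOperation(operationName, accessToken) {
  var url = 'https://speech.googleapis.com/v1/operations/' + operationName;
  var response = UrlFetchApp.fetch(url, {
    headers: { Authorization: 'Bearer ' + accessToken }
  });
  var operation = JSON.parse(response.getContentText());
  if (operation.done) {
    // The transcript lives in operation.response once the job has finished.
    Logger.log(JSON.stringify(operation.response));
  } else {
    // Still running; metadata.progressPercent shows how far along it is.
    Logger.log('Progress: ' + (operation.metadata.progressPercent || 0) + '%');
  }
  return operation;
}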

GoogleDrive metadata for copied images

I'm getting images from Google Drive via the files.list method; the response looks like this:
items: [{
  "title": "canon_eos_30D.CR2",
  "fileExtension": "cr2",
  "imageMediaMetadata": {
    "width": 3504,
    "height": 2336
  }
}]
Everything is OK, but after I copy this image via the Google Drive web interface, the response looks like this:
items: [{
  "title": "Copy of canon_eos_30D.CR2",
  "fileExtension": "cr2",
  "imageMediaMetadata": {
    "width": 0,
    "height": 0
  }
}]
imageMediaMetadata is not copied! (It doesn't matter whether the file is JPG or CR2.) When I instead copy the image on my machine and sync it via the client, everything is OK.
It looks like imageMediaMetadata is parsed during image import, and this is a Google Drive bug.
Is there any way to get this info to work around this bug, or any way to force metadata reparsing while the bug exists?
P.S.: FYI, if I add a properties field filter to the files.list method, these broken metadata fields are excluded from the response.
This is a known issue with how image files are copied in the Drive UI. Copying image files via the Drive API doesn't have this issue. A workaround, if an unpleasant one, is to download and re-upload the contents of the file, which will cause it to be re-scanned and the metadata to be populated.
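A rough sketch of that workaround using the Drive v3 REST endpoints (adjust if you are on v2; accessToken and fileId are placeholders, and error handling is omitted):
// Sketch only: download the copied file's contents and upload them back,
// which forces Drive to re-scan the image and repopulate imageMediaMetadata.
async function reuploadToRefreshMetadata(fileId, accessToken) {
  const headers = { Authorization: 'Bearer ' + accessToken };
  // Download the raw contents.
  const contents = await fetch(
    'https://www.googleapis.com/drive/v3/files/' + fileId + '?alt=media',
    { headers }
  ).then(r => r.blob());
  // Upload the same contents back to the same file.
  await fetch(
    'https://www.googleapis.com/upload/drive/v3/files/' + fileId + '?uploadType=media',
    { method: 'PATCH', headers, body: contents }
  );
}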

write a file using FileSystem API

I am trying to create a file using the FileSystem API. I googled and found this code:
function onFs(fs) {
  // Create log.txt in the filesystem root; fail if it already exists.
  fs.root.getFile('log.txt', {create: true, exclusive: true},
    function(fileEntry) {
      // Read the file's metadata (size, modification time).
      fileEntry.getMetadata(function(md) {
      }, onError);
    },
    onError
  );
}
// Request a 1 MB temporary filesystem; onFs receives the FileSystem object.
// (In Chrome this API is prefixed: window.webkitRequestFileSystem / window.TEMPORARY.)
window.requestFileSystem(TEMPORARY, 1024*1024 /*1MB*/, onFs, onError);
Can anyone explain what the fs argument passed to the function is?
Please point me to a good example.
fs is a JavaScript object that allows you to make "system-like" calls against a virtual filesystem.
So, for instance, you can use the fs object to create or get a reference to a file in the virtual filesystem with fs.root.getFile(...). The third argument of the .getFile(...) method (in your case, the following lines from your snippet) happens to be a callback for successfully obtaining a file reference:
function(fileEntry) {
  fileEntry.getMetadata(function(md) {
  }, onError);
}
That file reference (in your case it is called fileEntry) has various methods, such as .createWriter(...) for writing to files, .file(...) for reading files, and .remove(...) for removing files. Your code calls .getMetadata(...), which returns the file's size and modification date.
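Since you want to write a file, here is a short sketch of the write step using .createWriter(...) once you have the fileEntry (same onError callback as in your snippet):
// Sketch only: write some text into the file obtained above.
fileEntry.createWriter(function(fileWriter) {
  fileWriter.onwriteend = function() {
    console.log('Write completed.');
  };
  fileWriter.onerror = onError;
  // Wrap the text in a Blob and hand it to the writer.
  var blob = new Blob(['Hello from the FileSystem API'], {type: 'text/plain'});
  fileWriter.write(blob);
}, onError);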
For more specifics, as well as some good examples of the HTML5 FileSystem API, you may find the following article helpful: Exploring the File-System API
The location of the files differs by browser, operating system, and storage type (persistent vs. temporary), but the following link has proven quite useful as well: Chrome persistent storage locations