"resourcesExceeded" error when creating a table from a .avro file in BigQuery - csv

I have uploaded a .avro file to Google Cloud Storage which is about 100MB. It was converted from an 800MB .csv file.
When trying to create a table from this file in the BigQuery web interface, I get the following error after a few seconds:
script: Resources exceeded during query execution: UDF out of memory. (error code: resourcesExceeded)
Job ID audiboxes:bquijob_4462680b_15607de51b9
I checked the BigQuery Quota Policy and I think my file does not exceed it.
Is there a workaround, or do I need to split my original .csv in order to get multiple, smaller .avro files?
Thanks in advance!

This error means that the parser used more memory than allowed. We are working on fixing this issue. In the meantime, if you used compression in the Avro files, try removing it. Using a smaller data block size will also help.
And yes, splitting into smaller Avro files (around 10MB or less) will help too, but the two approaches above are easier if they work for you.
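If you control the step that converts the CSV to Avro, you can also rewrite an existing Avro file without compression and with small data blocks. A minimal sketch, assuming Python with the fastavro package; the file names are placeholders:

import fastavro

# Rewrite an Avro file with no compression and small data blocks.
with open("input.avro", "rb") as src, open("uncompressed.avro", "wb") as dst:
    reader = fastavro.reader(src)
    fastavro.writer(
        dst,
        reader.writer_schema,   # reuse the schema from the original file
        reader,                 # stream records instead of loading them all into memory
        codec="null",           # no compression
        sync_interval=16000,    # approximate bytes per data block; keep this small
    )

The same idea applies to whatever tool produced the Avro in the first place: write with compression disabled and a modest block size.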

Related

Azure Synapse Dedicated Pool COPY INTO function fails due to base64-encoded image in CSV file

I am using Azure Synapse Link for Dynamics 365. It automatically exports data from Dynamics 365 in CSV format into blob storage/data lake. I use the COPY INTO function to load the data into a Dedicated Pool instance. However, the contact model has recently started failing.
I investigated the issue and found that the cause was a field containing an image encoded as base64 text. I only copy selected fields from the CSV files, and this is not one of them, but it still causes the copy to fail. I manually updated the CSV file to exclude this data from the one row where it was found and it worked fine.
The error message associated with the error is:
The column is too long in the data file for row 1328, column 32.
This is supposed to be an automated process so I do not want to be manually editing CSV files when this occurs. Are there any parameters that I can add to the COPY INTO function to prevent this error? I tried using MAXERRORS but that made no difference.
The only other thing that I could think of is to write a script (maybe an Azure Function?) that checks the file for this issue and corrects it. Maybe there is a simpler approach though?
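If you do end up pre-processing the files, the check-and-correct script you describe does not have to be elaborate. A minimal sketch in Python, with placeholder file names and a hypothetical per-field length limit, that blanks out any oversized field (such as the base64-encoded image) before COPY INTO runs:

import csv

MAX_FIELD_LEN = 8000       # hypothetical limit; match it to your table's column sizes
SRC = "contact.csv"        # placeholder input path
DST = "contact_clean.csv"  # placeholder output path

with open(SRC, newline="", encoding="utf-8") as src, \
     open(DST, "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Blank out any field that exceeds the limit, e.g. a base64-encoded image.
        writer.writerow(value if len(value) <= MAX_FIELD_LEN else "" for value in row)

The same logic could run inside the Azure Function you mention, triggered whenever a new CSV lands in the data lake.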

How to load CSV file with thousands of rows in the Knowledge Feature of Dialogflow?

Whenever I load my CSV file containing 3568 rows into the Knowledge (Beta) feature of Dialogflow, an error appears informing me of an Operation Timeout. I thought it might be due to the number of rows in my CSV, so I created a separate file containing only 20 rows and it was processed.
Is there a way to change the timeout so that I can load my CSV file in one go? I am trying to avoid the solution of creating a new CSV for every 20 rows just to load it into the tool. Thanks much, I hope for your response soon.
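If splitting turns out to be unavoidable, the chunking can at least be automated rather than done by hand. A minimal sketch using Python's standard csv module, with a placeholder file name and the 20-row size that was observed to work; it assumes the first line is a header that should be repeated in every chunk (drop that part if your file has none):

import csv

CHUNK_ROWS = 20            # the size that imported successfully
SRC = "knowledge.csv"      # placeholder path

def write_chunk(index, header, rows):
    with open(f"knowledge_part_{index:03d}.csv", "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)   # repeat the header in every chunk
        writer.writerows(rows)

with open(SRC, newline="", encoding="utf-8") as src:
    reader = csv.reader(src)
    header = next(reader)
    chunk, index = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_ROWS:
            write_chunk(index, header, chunk)
            chunk, index = [], index + 1
    if chunk:
        write_chunk(index, header, chunk)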

Firebase: Exporting JSON Unable to export The size of data exported at a single location cannot exceed 256 MB

I used to download a node of the Firebase Realtime Database every day to monitor some outputs by exporting the .JSON file for that node. The JSON file itself is about 8MB.
Recently, I started receiving an error:
"Exporting JSON Unable to export The size of data exported at a single location cannot exceed 256 MB.Navigate to a smaller part of the database or use backups. Read more about limits"
Can someone please explain why I keep getting this error, since the JSON file I exported just yesterday was only 8.1 MB?
I probably solved it! I disabled a CORS add-on in Chrome and suddenly the export worked :)
To work around this, you can use Postman's Import feature, because downloading a large JSON file from the Firebase dashboard in a browser sometimes fails partway through. You can paste a traditional cURL command into it and just save the response once it arrives. To avoid authentication complexity, you can temporarily set the database rule to read: true until the download completes, though you need to keep the security implications in mind. Postman may also freeze its UI while trying to preview the JSON, but you don't need to be bothered by that.
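If you would rather script the download than go through Postman, the same REST endpoint a cURL command would hit can be fetched directly. A minimal sketch, assuming Python with the requests package and a placeholder database URL and node path; appending .json to the node path reads it over the Realtime Database REST API, and while the rules are temporarily set to read: true no auth parameter is needed:

import requests

# Placeholder database URL and node; append ".json" to read the node via REST.
url = "https://your-project-id.firebaseio.com/your/node.json"

with requests.get(url, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    with open("node.json", "wb") as out:
        # Stream the response to disk so a large export does not sit in memory.
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)

Remember to restore the database rules as soon as the download finishes.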

Managing a large SPSS (*.sav) file (4.2 GB)

I have received an SPSS file from a survey fielded by another company that allegedly contains only ~1500 respondents, but the file size has somehow ballooned to 4.2GB. My hunch is that the file comes from a global survey, and the 1500 records that were selected are from the US only, so there are a series of blank variables, along with metadata for those variables, included in this file, possibly in multiple languages/alphabets.
I only need a subset of this data, and can likely work with it if I remove the metadata, but my issue has been that I can't get the damn thing open to cut down on the number of variables. I have been using the tools at my disposal to try the following workarounds, though I'm sure there are better options:
Opening the file using PSPP (freeware SPSS) - this causes PSPP to stop responding
Using the R command read.spss (from the foreign package) to write a .csv - this claims that the file has a duplicate variable name and won't proceed further
Using the R command spss.system.file to write a .csv - when I tried this, R spent a long time thinking as it attempted to run and had been going for a couple of hours with no apparent success.
Using the PSPP text conversion tool (https://pspp.benpfaff.org/) to create either a dictionary or a .csv file - both of these options crash after the file has completed uploading.
I've gone back to the other company to try to have them work on reducing the file size; however, I wasn't sure if anyone else had any ideas for doing either of the following:
Open the file using another program/converter that could turn it into a .csv or other similarly skinny file format
Use another program to at least read only the variable names included in the file so that I can provide the other company with the specific variables I need
The following command from PSPP should do what you need:
$ pspp-convert originalFile.sav output.csv
In case it doesn't, please provide the terminal error message.
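If PSPP keeps struggling, another way to get just the variable names (your second request) is to read only the file's metadata, skipping the data rows entirely. A minimal sketch, assuming Python with the pyreadstat package and a placeholder file name:

import pyreadstat

# metadataonly=True reads the data dictionary without loading any cases.
_, meta = pyreadstat.read_sav("survey.sav", metadataonly=True)

print(len(meta.column_names), "variables")
for name, label in zip(meta.column_names, meta.column_labels):
    print(name, "-", label)

That list should be enough to tell the other company exactly which variables you need.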

CKAN: Upload to datastore failed; Resource too large to download

When I try to upload a large CSV file to the CKAN datastore, it fails and shows the following message:
Error: Resource too large to download: 5158278929 > max (10485760).
I changed the maximum resource upload size in megabytes to
ckan.max_resource_size = 5120
in
/etc/ckan/production.ini
What else do I need to change to upload a large CSV to CKAN?
That error message comes from the DataPusher, not from CKAN itself: https://github.com/ckan/datapusher/blob/master/datapusher/jobs.py#L250. Unfortunately it looks like the DataPusher's maximum file size is hard-coded to 10MB: https://github.com/ckan/datapusher/blob/master/datapusher/jobs.py#L28. Pushing larger files into the DataStore is not supported.
Two possible workarounds might be:
Use the DataStore API to add the data yourself (see the sketch after this list).
Change the MAX_CONTENT_LENGTH on the line in the DataPusher source code that I linked to above, to something bigger.
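For the first workaround, the rows can be pushed in batches through the DataStore API so the DataPusher's 10MB limit never comes into play. A minimal sketch, assuming Python with the requests package, placeholder CKAN URL, API key, resource ID and file name, and that the DataStore table for the resource already exists (for example created beforehand with datastore_create):

import csv
import requests

CKAN_URL = "https://your-ckan-site"   # placeholder CKAN site URL
API_KEY = "your-api-key"              # placeholder API key
RESOURCE_ID = "your-resource-id"      # placeholder resource id
BATCH_SIZE = 10000                    # rows per API call

def push(records):
    # datastore_upsert with method "insert" appends rows to the existing table.
    resp = requests.post(
        f"{CKAN_URL}/api/3/action/datastore_upsert",
        headers={"Authorization": API_KEY},
        json={"resource_id": RESOURCE_ID, "method": "insert", "records": records},
    )
    resp.raise_for_status()

with open("large.csv", newline="", encoding="utf-8") as f:
    batch = []
    for row in csv.DictReader(f):
        batch.append(row)
        if len(batch) == BATCH_SIZE:
            push(batch)
            batch = []
    if batch:
        push(batch)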