Postgres export via COPY script doesn't like JSON with quotes on import

I am trying to migrate some JSON blocks stored inside a Postgres database using the scripts below. They work fine in most cases but fail on a few data sets (I think the ones where I have double quotes in the JSON). Basically, I am trying to promote individual approved blocks of JSON to the prod database using an export script that copies the data out to a file and an import script that locates the destination JSON node in the destination database and rewrites that node with my new data appended.
I think the data sets that fail are the ones where the JSON data contains double quotes. An example of how the JSON with quotes is exported is shown here. I think the issue is the way the COPY function writes out (escapes) this data. My reasons for this guess: 1) only the records that fail seem to be ones with quotes in the data, and 2) a few JSON tools failed when parsing the JSON in the area where the quotes were escaped.
"description": "See the chapter \\"Key fields\\" for a general description of keys.
How can I adjust my scripts to be more tolerant of the kind of data stored in the JSON?
Export
psql -h demo_ip -c "\COPY (select f_values from view_export_f where id = '$id' and f_kind='$f_kind') TO '/tmp/demo.json'"
Import
psql -h prod_ip -ef "import.psql" -v ex="$ex"
import.psql
\set json_block `cat /tmp/demo.json`
UPDATE data
SET value = jsonb_set(value, '{json_location}', value->'json_location' || (:'json_block'), true) WHERE id = (:'ex');
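If the culprit is COPY's text-format escaping (COPY doubles backslashes on output, so the \" that jsonb prints for an embedded quote lands in the file as \\"), one way around it could be to export with psql's unaligned, tuples-only output instead of \COPY, which writes the value verbatim. A sketch, reusing the same query and variables:
# -A (unaligned) and -t (tuples only) print the jsonb text exactly as the
# server renders it, with no COPY-style backslash escaping of the output.
psql -h demo_ip -A -t \
  -c "select f_values from view_export_f where id = '$id' and f_kind = '$f_kind'" \
  -o /tmp/demo.json
The import script should then be able to stay as it is, since /tmp/demo.json would now hold JSON that parses cleanly.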

Related

How to export a GA360 table from BigQuery to Snowflake through GCS as a JSON file without data loss?

I am exporting a GA360 table from BigQuery to Snowflake in JSON format using the bq CLI command. I am losing some fields when I load it as a table in Snowflake. I use the COPY command to load my JSON data from a GCS external stage in Snowflake into Snowflake tables. But I am missing some fields that are part of a nested array. I even tried compressing the file when I export to GCS, but I still lose data. Can someone suggest how I can do this? I don't want to flatten the table in BigQuery and transfer that. My daily table size ranges from a minimum of 1.5 GB to a maximum of 4 GB.
bq extract \
--project_id=myproject \
--destination_format=NEWLINE_DELIMITED_JSON \
--compression GZIP \
datasetid.ga_sessions_20191001 \
gs://test_bucket/ga_sessions_20191001-*.json
I have set up my integration, file format, and stage in Snowflake. I am copying data from this bucket to a table that has one variant field. The row count matches BigQuery, but some fields are missing.
I am guessing this is due to the Snowflake limit where each variant column must be under 16 MB. Is there some way I can compress each variant field to be under 16 MB?
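One way to check whether the 16 MB variant limit is actually the cause (rather than fields being dropped during the export or load) is to measure the serialized size of each loaded row. A sketch, with a hypothetical table and column name:
-- my_ga_table/src are placeholder names; substitute your own table and variant column.
-- If no row comes anywhere near 16 MB, the variant size limit is probably not what is dropping fields.
SELECT MAX(LENGTH(TO_JSON(src))) AS max_row_chars,
       AVG(LENGTH(TO_JSON(src))) AS avg_row_chars
FROM my_ga_table;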
I had no problem exporting GA360, and getting the full objects into Snowflake.
First I exported the demo table bigquery-public-data.google_analytics_sample.ga_sessions_20170801 into GCS, JSON formatted.
Then I loaded it into Snowflake:
create or replace table ga_demo2(src variant);
COPY INTO ga_demo2
FROM 'gcs://[...]/ga_sessions000000000000'
FILE_FORMAT=(TYPE='JSON');
And then to find the transactionIds:
SELECT src:visitId, hit.value:transaction.transactionId
FROM ga_demo2, lateral flatten(input => src:hits) hit
WHERE src:visitId='1501621191'
LIMIT 10
Cool things to notice:
I read the GCS files easily from Snowflake deployed in AWS.
JSON manipulation in Snowflake is really cool.
See https://hoffa.medium.com/funnel-analytics-with-sql-match-recognize-on-snowflake-8bd576d9b7b1 for more.

How to import a JSON file into PostgreSQL: COPY 1

I'm new to PostgreSQL. I'm trying to import a JSON file into a PostgreSQL table. I created an empty table:
covid19=# CREATE TABLE temp_cov(
covid19(# data jsonb
covid19(# );
and tried to copy my data from the JSON file into this table with this command on the command line:
cat output.json | psql -h localhost -p 5432 covid19 -U postgres -c "COPY temp_cov (data) FROM STDIN;"
The output was just "COPY 1". When I open my table in psql with
SELECT * FROM temp_cov;
the command goes on without end and I can't make sense of the output.
Unfortunately, I couldn't find an answer or a solution to a similar problem. Thank you in advance for your advice.
Also, my JSON file has already been modified to "not pretty" form, and it has more than 11k lines.
Your data is there. psql is sending the row to the pager (likely more?), and the pager can't deal with it very usably because it is too big. You can turn off the pager (\pset pager off inside psql) or set the pager to a better program (PAGER=less or PSQL_PAGER=less as environment variables), but really none of those is going to be all that useful for viewing giant JSON data.
You have your data in PostgreSQL, now what do you want to do with it? Just looking at it within psql's pager is unlikely to be interesting.
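If the goal is just to spot-check the load, one option is to turn the pager off and query inside the JSON instead of dumping whole rows. A sketch (the key names are made up):
\pset pager off
-- Pretty-print a single document to confirm the import worked:
SELECT jsonb_pretty(data) FROM temp_cov LIMIT 1;
-- Or pull out individual keys rather than the whole blob
-- ('country' and 'cases' are hypothetical key names):
SELECT data->>'country', data->'cases' FROM temp_cov LIMIT 10;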

Let Google BigQuery infer schema from csv string file

I want to upload CSV data into BigQuery. When the data has mixed types (like string and int), BigQuery is able to infer the column names from the headers, because the headers are all strings whereas the other lines contain integers.
BigQuery infers headers by comparing the first row of the file with other rows in the data set. If the first line contains only strings, and the other lines do not, BigQuery assumes that the first row is a header row.
https://cloud.google.com/bigquery/docs/schema-detect
The problem is when your data is all strings ...
You can specify --skip_leading_rows, but BigQuery still does not use the first row as the names of your columns.
I know I can specify the column names manually, but I would prefer not to, as I have a lot of tables. Is there another solution?
If your data is all of "string" type and the first row of your CSV file contains the metadata, then I guess it is easy to write a quick script that parses the first line of your CSV and generates a similar "create table" command:
bq mk --schema name:STRING,street:STRING,city:STRING... -t mydataset.myNewTable
Use that command to create a new (empty) table, and then load your CSV file into that new table (using --skip_leading_rows as you mentioned).
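For example, the follow-up load might look something like this (the bucket and file names are placeholders; the column names come from the table created above):
# Skip the header row, since the column names are already defined on the table.
bq load --source_format=CSV --skip_leading_rows=1 \
  mydataset.myNewTable \
  gs://mybucket/myData.csv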
14/02/2018: Update thanks to Felipe's comment:
The above can be simplified this way:
bq mk --schema `head -1 myData.csv` -t mydataset.myNewTable
It's not possible with the current API. You can file a feature request in the public BigQuery tracker: https://issuetracker.google.com/issues/new?component=187149&template=0.
As a workaround, you can add a single non-string value at the end of the second line of your file and then set the allowJaggedRows option in the load configuration. The downside is that you'll get an extra column in your table. If having an extra column is not acceptable, you can use a query instead of a load and SELECT * EXCEPT the added extra column, but queries are not free.
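A rough sketch of that workaround with the bq CLI (the table names and the extra column name are hypothetical, and this assumes the file has already been edited as described):
# Load with schema autodetection; --allow_jagged_rows tolerates rows that
# are missing the trailing column added to line 2.
bq load --source_format=CSV --autodetect --allow_jagged_rows \
  mydataset.raw_table gs://mybucket/myData.csv
# Rewrite into a clean table, dropping the padding column (this query is not free).
bq query --use_legacy_sql=false \
  --destination_table=mydataset.clean_table \
  'SELECT * EXCEPT(extra_col) FROM mydataset.raw_table'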

Export SQLite3 to CSV with text representation of GUID/BLOB

I would like to export each table of my SQLite3 database to CSV files for further manipulation with Python, and after that I want to export the CSV files into a different database format (PSQL). The ID column in SQLite3 is of type GUID, hence gibberish when I export tables to CSV as text:
l_yQ��rG�M�2�"�o
I know that there is a way to turn it into a readable format, since the SQLite Manager add-on for Firefox does this automatically, sadly without any reference to how or which query is used:
X'35B17880847326409E61DB91CC7B552E'
I know that QUOTE (GUID) displays the desired hexadecimal string, but I don't know how to dump it to the CSV instead of the BLOB.
I found out what my error was - not why it doesn't work, but how to get around it.
So I tried to export my tables as stated in https://www.sqlite.org/cli.html, namely with the multiline commands, which didn't work:
sqlite3 'path_to_db'
.headers on
.mode csv
.output outfile.csv
SELECT statement
and so on.
I was testing a few things, and since I'm lazy while testing, I used the single-line variant, which got the job done:
sqlite3 -header -csv 'path_to_db' "SELECT QUOTE (ID) AS Hex_ID, * FROM Table" > 'output_file.csv'
Of course it would be better to specify all column names instead of using *, but this suffices as an example.
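To cover every table without retyping the command, a small shell loop along these lines could work (this assumes each table has a GUID-typed ID column; adjust the SELECT per table otherwise):
# Loop over all tables and export each one with the hex ID column first.
for t in $(sqlite3 'path_to_db' ".tables"); do
  sqlite3 -header -csv 'path_to_db' \
    "SELECT QUOTE(ID) AS Hex_ID, * FROM $t" > "${t}.csv"
done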

From MySQL to Mongo data export: how to preserve creation date?

I'm trying to export MySQL table data to MongoDB by creating a set of "create" statements in Rails.
My issue is this: in my original table I have "created_at" and "updated_at" fields and I would like to keep the original values even when I export the data to my new MongoDB document. But after I create a new row in Mongo, even if I tell it to set "created_at" = [my original date], Mongo sets it to the current datetime.
How can I avoid this? This is my MongoMapper model:
class MongoFeedEvent
include MongoMapper::Document
key :event_type, String
key :type_id, Integer
key :data, String
timestamps!
end
You're probably better off dumping your MySQL table as JSON and then using mongoimport to import that JSON; this will be a lot faster than doing it row by row through MongoMapper and it will bypass your problem completely as a happy side effect.
There's a gem that will help you dump your MySQL database to JSON called mysql2xxxx:
How to export a MySQL database to JSON?
I haven't used it but the author seems to hang out on SO so you should be able to get help with it if necessary. Or, write a quick one-off script to dump your data to JSON.
Once you have your JSON, you can import it with mongoimport and move on to more interesting problems.
Also, mongoimport understands CSV and mysqldump can write CSV directly:
The mysqldump command can also generate output in CSV, other delimited text, or XML format.
So skip MongoMapper and row-by-row copying completely for the data transfer. Dump your data to CSV or JSON and then import that all at once.
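For example, the import itself might look roughly like this (database, collection, and file names are placeholders). mongoimport inserts the documents as given, so it won't overwrite created_at, though dates dumped as plain strings will arrive as strings rather than BSON dates:
# JSON route: one document per line in the dump file.
mongoimport --db myapp_production --collection mongo_feed_events \
  --file feed_events.json
# CSV route: use the header line for field names.
mongoimport --db myapp_production --collection mongo_feed_events \
  --type csv --headerline --file feed_events.csv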