Load JSON files in Google Cloud Storage into a BigQuery table - json

I am trying to do this with the client library, using Python.
The problem I am facing is that the timestamps in the JSON files are in Unix epoch format, and BigQuery's schema detection does not recognize that (according to the documentation).
So I wonder what to do.
I thought about changing the JSON format manually before I load it into the BigQuery table.
Or maybe there is an automatic conversion on the BigQuery side?
I have searched across the internet and could not find anything useful yet.
Thanks in advance for any support.

You have two solutions:
Either you update the format before the BigQuery integration
Or you update the format after the BigQuery integration
Before
Before means updating your JSON (manually or by script), or having the process that loads the JSON into BigQuery (such as Dataflow) update it.
I personally don't like this; file handling is never fun or efficient.
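If you do go this route, here is a minimal Python sketch of the "before" idea, assuming newline-delimited JSON and a hypothetical event_ts field holding epoch seconds (adjust the field name and units to your data):

import json
from datetime import datetime, timezone

def convert_line(line):
    record = json.loads(line)
    # Assumes epoch seconds; divide by 1000 first if the values are milliseconds.
    record["event_ts"] = datetime.fromtimestamp(
        record["event_ts"], tz=timezone.utc
    ).strftime("%Y-%m-%d %H:%M:%S")  # a format BigQuery detects as TIMESTAMP
    return json.dumps(record)

with open("events.json") as src, open("events_fixed.json", "w") as dst:
    for line in src:
        if line.strip():
            dst.write(convert_line(line) + "\n")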
After
In this case, you let BigQuery load your JSON file into a temporary table, keeping your Unix timestamp as a number or a string. Then you run a query against this temporary table, convert the field to the correct timestamp format, and insert the data into the final table.
This way is smoother and easier (a simple SQL query to write). However, it incurs the cost of reading all the loaded data (in order to write it again).
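A rough sketch of the "after" approach with the Python client library (the bucket, dataset, table, and column names are placeholders); it loads into a staging table and then converts the epoch column with TIMESTAMP_SECONDS (use TIMESTAMP_MILLIS for millisecond values):

from google.cloud import bigquery

client = bigquery.Client()

# 1) Load the JSON files into a staging table, leaving the epoch value as an integer.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    write_disposition="WRITE_TRUNCATE",
)
client.load_table_from_uri(
    "gs://my-bucket/events/*.json",
    "my_project.my_dataset.events_staging",
    job_config=job_config,
).result()

# 2) Convert the epoch column while copying into the final table.
client.query("""
    INSERT INTO `my_project.my_dataset.events` (event_id, event_ts)
    SELECT event_id, TIMESTAMP_SECONDS(event_ts)
    FROM `my_project.my_dataset.events_staging`
""").result()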

Related

Azure Data Factory - copy task using REST API is only returning first row upon execution

I have a copy task in ADF that is pulling data from a REST API into an Azure SQL Database. I've created the mappings, and pulled in a collection reference as follows:
[Screenshots: preview of the JSON data, source, sink, mappings, and output]
You will notice it's only outputting one row (the first row) when running the copy task. I know this usually happens when you are pulling from a nested JSON array, in which case setting the collection reference should resolve it so the copy pulls from the array - but I can't for the life of me get it to pull multiple records, even after setting the collection.
There's a trick to this: import the schemas, then put the name of the array in the collection reference, then import the schemas again, and it works.
[Screenshot from Azure Data Factory]
Because of an Azure Data Factory design limitation, pulling JSON data and inserting it directly into Azure SQL Database isn't a good approach. Even after using the collection reference you might not get the desired results.
The recommended approach is to store the output of the REST API as a JSON file in Azure Blob Storage with a Copy Data activity. Then you can use that file as the source and do the transformation in a Data Flow. You can also use a Lookup activity to get the JSON data and invoke a stored procedure to store the data in Azure SQL Database (this way is cheaper and its performance is better).
Use the flatten transformation to take array values inside hierarchical structures such as JSON and unroll them into individual rows. This process is known as denormalization.
Refer to this third-party tutorial for more details.
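The flatten transformation itself is configured in the ADF UI, but as an illustration of what the denormalization step produces, here is a small Python sketch over a made-up payload with a nested openIntervals array:

payload = {
    "storeId": 42,
    "hours": {"monday": {"openIntervals": [
        {"startTime": "08:00", "endTime": "12:00"},
        {"startTime": "13:00", "endTime": "17:00"},
    ]}},
}

# One flat row per element of the nested array, ready for a tabular sink.
rows = [
    {"storeId": payload["storeId"], "startTime": i["startTime"], "endTime": i["endTime"]}
    for i in payload["hours"]["monday"]["openIntervals"]
]
print(rows)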
I had this issue and noticed that the default column names for the JSON branches were really long, so in my target CSV the header row got truncated after a point. I was able to get ADF working just by renaming them in the mapping section.
For example, I had:
['hours']['monday']['openIntervals'][0]['endTime'] in the source and changed it to MondayCloseTime in the destination.
It just started working. You can also turn off the header on the output for a quick test before rewriting all the column names, as that also got it working for me.
I assume it writes out the truncated header row at the same time as the first row of data and then tries to use that header row afterwards, but since it doesn't match what it's expecting it just ends. It's a bit annoying that it doesn't give an error or anything, but this worked for me.

Redshift/S3 - Copy the contents of a Redshift table to S3 as JSON?

It's straightforward to copy JSON data on S3 into a Redshift table using the standard Redshift COPY command.
However, I'm also looking for the inverse operation: to copy the data contained within an existing Redshift table to JSON that is stored in S3, so that a subsequent Redshift COPY command can recreate the Redshift table exactly as it was originally.
I know about the Redshift UNLOAD command, but it doesn't seem to offer any option to store the data in S3 directly in JSON format.
I know that I can write per-table utilities to parse and reformat the output of UNLOAD for each table, but I'm looking for a generic solution which allows me to do this Redshift-to-S3-JSON extract on any specified Redshift table.
I couldn't find any existing utilities that will do this. Did I miss something?
Thank you in advance.
I think the only way is to unload to CSV and write a simple Lambda function that turns the input CSV into JSON, taking the CSV header as keys and the values of every row as values.
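A minimal sketch of that Lambda idea in Python (the file names are placeholders, and it assumes the unloaded file is comma-delimited and includes a header row):

import csv
import json

def csv_to_json_lines(src_path, dst_path):
    with open(src_path, newline="") as src, open(dst_path, "w") as dst:
        # DictReader uses the header row as keys; every row becomes one JSON object.
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")

csv_to_json_lines("unloaded_slice_0000.csv", "table.json")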
There is no built-in way to do this yet, so you might have to hack your query with some hardcoding:
https://sikandar89dubey.wordpress.com/2015/12/23/how-to-dump-data-from-redshift-to-json/

What is the best way to store a pretty large JSON object in MySQL

I'm building a Laravel app whose core features are driven by rather large JSON objects (the largest ones are between 1000 and 1500 lines).
I know there are better database choices than MySQL for storing files and blocks of data, but for various reasons I will need to use MySQL for the application.
So my question is, how do I store my JSON objects most effectively in MySQL? I will not need to run any queries on the column that holds the data; there will be other columns for identifying it. Something like this:
id, title, created-at, updated-at, JSON-blobthingy
Any ideas?
You could use the JSON data type if you have MySQL version 5.7.8 or above.
You could store the JSON file on the server, and simply reference its location via MySQL.
You could also use one of the TEXT types.
The best answer I can give is to use MySQL 5.7. This version supports the new JSON column type, which handles large JSON very well.
https://dev.mysql.com/doc/refman/5.7/en/json.html
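A hedged sketch of the JSON-column approach from Python (pymysql, the connection details, and the table layout are assumptions; any DB-API driver works the same way):

import json
import pymysql  # assumed driver

conn = pymysql.connect(host="localhost", user="app", password="secret", database="mydb")
with conn.cursor() as cur:
    # The JSON type needs MySQL 5.7.8+; the server validates the document on insert.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id INT AUTO_INCREMENT PRIMARY KEY,
            title VARCHAR(255),
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            data JSON
        )
    """)
    cur.execute(
        "INSERT INTO documents (title, data) VALUES (%s, %s)",
        ("example", json.dumps({"key": "value", "items": [1, 2, 3]})),
    )
conn.commit()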
You could compress the data before inserting it if you don't need it to be searchable. I'm using the zlib library for that.
You can simply use the LONGBLOB type, which can handle up to 4 GB of data, for the column holding the large JSON object; you can insert, update, and read this column normally, as if it were text or anything else.

Can data from an SQL query be inserted into Elasticsearch?

I'm new to Elasticsearch. I have learnt how to issue different queries and get search results, with the understanding that each document is stored in JSON format. Is it possible to insert records that were obtained from an SQL query on a relational database? If it is possible, how is it done? By converting each record into JSON format?
You need to build an index in Elasticsearch similar to the way you've got your tables in the RDBMS. This can be done in a lot of ways, and it really depends on what data you need to access via Elasticsearch. You shouldn't just dump your complete RDBMS data into ES.
If you search around you may find bulk data importers/synchronisers/rivers (deprecated) for your RDBMS to ES; some of these can run in the background and keep the indexes in ES up to date with your RDBMS.
You can also write your own code which updates ES whenever any data is changed in your RDBMS. Look into the Elasticsearch client API for your platform: https://www.elastic.co/guide/en/elasticsearch/client/index.html
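For example, a minimal Python sketch (the table, index, and connection details are made up, and sqlite3 stands in for whatever relational source you have):

import sqlite3
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch("http://localhost:9200")

conn = sqlite3.connect("app.db")
conn.row_factory = sqlite3.Row
rows = conn.execute("SELECT id, name, city FROM customers")

# Each SQL row becomes one JSON document in the "customers" index.
actions = (
    {"_index": "customers", "_id": row["id"], "_source": dict(row)}
    for row in rows
)
bulk(es, actions)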

Excel SQL Server Data Connection

Perhaps someone could provide some insight into a problem I have.
I have a SQL Server database which receives information every hour and is updated from a stored procedure using a Bulk Insert. This all works fine, however the end result is to pull this information into Excel.
Establishing the data connection worked fine as well, until I attempted some calculations. The imported data is all formatted as text, and Excel's number formats weren't working, so I decided to look at the table in the database.
All the columns are set to varchar for the Bulk Insert to work, so I changed a few to numeric. After refreshing in Excel, the calculations worked.
After repeated attempts I've not been able to get the Bulk Insert to work; even after generating a format file with bcp, it still returned errors on the insert (could not convert varchar to numeric). After some further searching, it turned out to be failing only on one numeric column which is generally empty.
Is there another way, other than importing the data with VBA and converting it that way, or adding zero to every imported value so Excel converts it?
Any suggestions are welcome.
Thanks!
Thanks for the replies. I had considered using =VALUE() in Excel but wanted to avoid the additional formulas.
I was eventually able to resolve my problem by generating a format file for the Bulk Insert using the bcp utility. Getting it to generate a file proved tricky enough, so below is an example of how I generated it.
At an elevated cmd:
C:\>bcp databasename.dbo.tablename format nul -c -x -f outputformatfile.xml -t, -S localhost\SQLINSTANCE -T
This generated an XML format file for the specific table (-c uses character data, -x and -f write an XML format file to the given path, -t, sets a comma field terminator, -S names the server instance, and -T uses a trusted connection). As my table had two additional columns which weren't in the source data, I edited the XML and removed them; they were uniqueid and getdate columns.
Then I changed the Bulk Insert statement so it used the format file:
BULK INSERT [database].[dbo].[tablename]
FROM 'C:\bulkinsertdata.txt'
WITH (FORMATFILE='C:\outputformatfile.xml',FIRSTROW=3)
Using this method I was able to use the numeric and int data types successfully. Back in Excel, when the data connection was refreshed it was able to determine the correct data types.
Hope that helps someone!