[
{"link":"https://twitter.com/GreenAddress/status/550793651186855937",
"pDate":"2015 01 1",
"title":"GreenAddress",
"description": "btcarchitect coinkite blockchain circlebits coinbase bitgo some maybe some are oracle cosigners which require lesszero trust"},
{"link":"https://twitter.com/Bit_Swift/status/550765718581411840",
"pDate":"2015 01 1",
"title":"Bitswift™",
"description": "swiftstealth offers you privacy in bitswift v2 swiftstealth enables stealth address use on the bitswift blockchain swift"},
{"link":"https://twitter.com/allenday/status/550741133500772352",
"pDate":"2015 01 1",
"title":"Allen Day, PhD",
"description": "all in one article bitcoin blockchain 3dprinting drones and deeplearninghttp simondlr compost101071618938adecentralizedaivia simondlr"}
]
This is my test.json file, and my MySQL db table is here.
I can import a text file in CSV format, but I have no idea how to import a JSON text file into MySQL.
I tried [create table test ( data json);] and
[insert into test values ( '{json type}');], but when I import data in CSV format, LOAD DATA INFILE 'test.txt' makes it possible,
so I wonder whether JSON has the same functionality.
Thanks for any advice.
MySQL does have a JSON data type. However, it will not work with your file and current table structure, because it requires the field value to be valid JSON. Loading your data will require a little bit of programming work. Depending on your current ability, you will need to write code that does the following (a minimal sketch follows after these steps):
Open a database connection
Read the JSON and loop through each value
Store each value using the following INSERT query:
INSERT INTO news(link, date, title, description) VALUES($link, $pDate, $title, $description);
Finally, depending on your language and database library, close the database connection.
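For example, in Python with the PyMySQL package the steps above could look roughly like this (an untested sketch; the connection details and the news table columns are assumptions you will need to adjust to your schema):

import json
import pymysql

# 1. Open a database connection
conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="mydb")

# 2. Read the JSON file and loop through each value
with open("test.json", encoding="utf-8") as f:
    records = json.load(f)

# 3. Store each value using a parameterised INSERT
with conn.cursor() as cur:
    for rec in records:
        cur.execute(
            "INSERT INTO news (link, date, title, description) "
            "VALUES (%s, %s, %s, %s)",
            (rec["link"], rec["pDate"], rec["title"], rec["description"]),
        )
conn.commit()

# 4. Close the database connection
conn.close()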
I have the following problem. We are using Azure SQL Database for data processing. Instead of running the import wizard every time, we would like to load the data automatically through the APIs of our accounting platforms. (API documentation links: https://hellocashapi.docs.apiary.io/#introduction/authentication , https://www.zoho.com/books/api/v3/)
Basically my task is to get the data from these platforms through the API, create the table in our Azure SQL Database, and insert the data there.
Can anyone recommend a platform to resolve this issue, or point me to documentation that shows how to do it?
Thank you.
If you can put the JSON into a SQL variable like this:
DECLARE @json NVARCHAR(MAX) = N'[
{
"Order": {
"Number":"SO43659",
"Date":"2011-05-31T00:00:00"
},
"AccountNumber":"AW29825",
"Item": {
"Price":2024.9940,
"Quantity":1
}
},
{
"Order": {
"Number":"SO43661",
"Date":"2011-06-01T00:00:00"
},
"AccountNumber":"AW73565",
"Item": {
"Price":2024.9940,
"Quantity":3
}
}
]';
Then you can create a table using the WITH clause
SELECT * INTO TableName1
FROM OPENJSON ( @json )
WITH (
Number varchar(200) '$.Order.Number',
Date datetime '$.Order.Date',
Customer varchar(200) '$.AccountNumber',
Quantity int '$.Item.Quantity',
[Order] nvarchar(MAX) AS JSON
)
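To connect this back to the API part of the question: the JSON text has to reach SQL Server somehow, for example from a small script that calls the API and passes the response as a parameter to OPENJSON. A rough, untested sketch in Python (the endpoint, authentication, driver string, table name and WITH-clause mapping are all placeholders, not the real hellocash/Zoho schema):

import json
import requests
import pyodbc

# Hypothetical API call; substitute the hellocash or Zoho Books endpoint and auth.
response = requests.get("https://example.com/api/v3/invoices",
                        headers={"Authorization": "Bearer <token>"})
response.raise_for_status()
json_text = json.dumps(response.json())   # JSON text to shred with OPENJSON

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdb;"
    "UID=youruser;PWD=yourpassword"
)
cursor = conn.cursor()

# Shred the JSON on the SQL side; adjust the columns and paths to the real payload.
cursor.execute("""
    INSERT INTO dbo.Invoices (InvoiceNumber, InvoiceDate, Total)
    SELECT InvoiceNumber, InvoiceDate, Total
    FROM OPENJSON(?)
    WITH (
        InvoiceNumber varchar(200)   '$.invoice_number',
        InvoiceDate   datetime       '$.date',
        Total         decimal(18,2)  '$.total'
    )
""", json_text)
conn.commit()
conn.close()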
Firstly, not every API is supported as a source in Data Factory.
Please reference this document: Azure Data Factory connector overview
Data Factory doesn't support the hellocash API. That means you can't do that with Data Factory.
Secondly, Data Factory now supports creating a destination table automatically.
Reference: Copy Activity in Azure Data Factory supports creating a destination table automatically.
Summary:
Load data faster with new support from the Copy Activity feature of Azure Data Factory. Now, if you're trying to copy data from any supported source into SQL database/data warehouse and find that the destination table doesn't exist, Copy Activity will create it automatically. After the data ingestion, review and adjust the sink table schema as needed.
This feature is supported with:
Azure SQL Database
Azure SQL Database Managed Instance
Azure SQL Data Warehouse
SQL Server
Hope this helps.
I have a Cassandra DB with data that has a TTL of X hours for every column value, and this needs to be pushed to an Elasticsearch cluster in real time.
I have seen past posts on StackOverflow that advise using tools such as Logstash, or pushing data directly from the application layer.
However, how can one preserve the TTL of the imported data once it is copied into ES version >= 5.0?
There was once a field called _ttl, but it was deprecated in ES 2.0 and removed in ES 5.0.
As of ES 5, there are two main ways of preserving the TTL of your data. First, make sure to create a TTL field in your ES documents, set to the creation date of your row in Cassandra plus the TTL in seconds. So if in Cassandra you have a record like this:
INSERT INTO keyspace.table (userid, creation_date, name)
VALUES (3715e600-2eb0-11e2-81c1-0800200c9a66, '2017-05-24', 'Mary')
USING TTL 86400;
Then you should export the following document to ES:
{
"userid": "3715e600-2eb0-11e2-81c1-0800200c9a66",
"name": "mary",
"creation_date": "2017-05-24T00:00:00.000Z",
"ttl_date": "2017-05-25T00:00:00.000Z"
}
Then you can either:
A. Use a cron job that regularly performs a delete-by-query based on your ttl_date field, i.e. call the following command from your cron:
curl -XPOST localhost:9200/your_index/_delete_by_query -d '{
"query": {
"range": {
"ttl_date": {
"lt": "now"
}
}
}
}'
B. Or use time-based indices and insert each document into an index matching its ttl_date field. For instance, the above document would be inserted into the index named your_index-2017-05-25. Then, with the Curator tool, you can easily delete indices that have expired.
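As an illustration of both points, here is a rough, untested sketch using the official elasticsearch-py client (the field names, index prefix and doc type are just the ones from the example above) that computes ttl_date from a Cassandra row and routes the document to a time-based index:

from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Values as read from the Cassandra row, e.g. via
# SELECT userid, name, creation_date, TTL(name) FROM keyspace.table;
creation_date = datetime(2017, 5, 24)
ttl_seconds = 86400
ttl_date = creation_date + timedelta(seconds=ttl_seconds)

doc = {
    "userid": "3715e600-2eb0-11e2-81c1-0800200c9a66",
    "name": "mary",
    "creation_date": creation_date.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    "ttl_date": ttl_date.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
}

# Option B: route the document to an index named after its ttl_date so that
# whole indices can be dropped (e.g. with Curator) once they have expired.
index_name = "your_index-" + ttl_date.strftime("%Y-%m-%d")
es.index(index=index_name, doc_type="user", id=doc["userid"], body=doc)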
SnappyData v0.5
The issue I am having is that my JDBC Connection's Table metadata and Pulse Web App do not see the table I created below.
I create a table in SnappyData using the shell and a csv file.
Data is here (roads.csv):
"roadId","name"
"1","Road 1"
"2","Road 2"
"3","Road 3"
"4","Road 4"
"5","Road 5"
"6","Road 6"
"7","Road 7"
"8","Road 8"
"9","Road 9"
"10","Road 10"
==========================================================
snappy> CREATE TABLE STAGING_ROADS
(road_id string, name string)
USING com.databricks.spark.csv
OPTIONS(path '/home/ubuntu/data/example/roads.csv', header 'true');
snappy> select * from STAGING_ROADS
Returns 10 rows.
I have a SnappyData JDBC connection (DBVisualizer & SquirrelSQL show the same).
I cannot see that table in the "TABLES" list from the metadata.
However, if I do a "select * from STAGING_ROADS", it returns 10 rows of CLOBs, which, by the way, are completely unusable:
road_id | name
=====================
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
CLOB CLOB
Second, the Pulse web app does not register the table I created from the snappy> shell. However, if I run a CREATE TABLE command from the JDBC client, it shows up there fine.
Am I doing something incorrectly? How can I get metadata about the tables I create in the snappy> shell to show up in JDBC and Pulse as well?
The issue I am having is that my JDBC Connection's Table metadata and Pulse Web App do not see the table I created below.
This is a known issue (https://jira.snappydata.io/browse/SNAP-303). The JDBC metadata shows only the items in the store and not the external table. While the metadata issue is being tracked, Pulse webapp will not be able to see such external tables since it is designed to monitor the snappydata store.
A note: the "CREATE TABLE" DDL has been changed to "CREATE EXTERNAL TABLE" (https://github.com/SnappyDataInc/snappydata/pull/311) for sources outside of the store, to make things clearer.
How can I get metadata about the tables I create in snappy> shell to show up in JDBC and Pulse as well?
It will show up for internal SnappyData sources: column and row tables. For other providers in USING, they will not show up as mentioned.
CSV tables are usually useful only for loading data into column or row tables, as in the example provided by @jagsr.
I don't think creating a table using SQL with spark-csv as the data source has been tested. Here is a related JIRA: https://jira.snappydata.io/browse/SNAP-416.
We have been suggesting that folks use a Spark job to load the data in parallel. You can do this using the spark-shell as well:
val stagingRoadsDataFrame = snappyContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")       // Use the first line of the file as the header
    .option("inferSchema", "true")  // Automatically infer data types
    .load(someFile)

// Save the DataFrame as a row table (props holds any table options)
stagingRoadsDataFrame.write.format("row").options(props).saveAsTable("staging_roads")
That said, could you try the following (perhaps it might work):
CREATE TABLE STAGING_ROADS (road_id varchar(100), name varchar(500))
Note that there is no 'string' data type in SQL. By default, with no knowledge of the maximum length, we convert it to a CLOB. We are working to resolve this issue too.
I am trying to insert the contents of a JSON document into a MySQL database using Mule ESB. The JSON looks like:
{
"id":106636,
"client_id":9999,
"comments":"Credit",
"salesman_name":"Salvador Dali",
"cart_items":[
{"citem_id":1066819,"quantity":3},
{"citem_id":1066820,"quantity":10}
]
}
In Mule I want to insert all the data in one step, like:
Insert INTO order_header(id,client_id,comments,salesman_name)
Insert INTO order_detail(id,citem_id,quantity)
Insert INTO order_detail(id,citem_id,quantity)
Currently I have come this far in Mule:
[screenshot of the MuleSoft flow]
Use the Bulk Execute operation of the Database Connector.
You will insert into multiple tables.
For example:
Query text:
INSERT INTO order_header (id, client_id, comments, salesman_name) VALUES (#[payload.id], #[payload.client_id], #[payload.comments], #[payload.salesman_name]);
INSERT INTO order_detail (id, citem_id, quantity) VALUES (#[payload.id], #[payload.cart_items[0].citem_id], #[payload.cart_items[0].quantity]);
etc.
There is an excellent article here: http://www.dotnetfunda.com/articles/show/2078/parse-json-keys-to-insert-records-into-postgresql-database-using-mule
It should be of help. You may need to modify it, as you need to write the order_header data first, then use a collection splitter for the order_detail rows, and wrap the whole thing in a transaction.
OK. Since you have already converted the JSON into an object in the flow, you can refer to individual values by their object reference, like obj.id, obj.client_id, etc.
Get a database connector next.
Configure your MySQL database in "Connector Configuration".
Operation: Choose "Bulk execute"
In "Query text" : Write multiple INSERT queries and pass appropriate values from Object (converted from JSON). Remember to separate multiple queries with semicolon (;) in Query text.
That's it !! Let me know if you face any issue. Hope it works for you..
Greenplum Database version:
PostgreSQL 8.2.15 (Greenplum Database 4.2.3.0 build 1)
SQL Server Database version:
Microsoft SQL Server 2008 R2 (SP1)
Our current approach:
1) Export each table to a flat file from SQL Server
2) Load the data into Greenplum with pgAdmin III using PSQL Console's psql.exe utility
Benefits...
Speed: OK, but is there anything faster? We load millions of rows of data in minutes
Automation: OK, we call this utility from an SSIS package using a Shell script in VB
Pitfalls...
Reliability: ETL is dependent on the file server to hold the flat files
Security: Lots of potentially sensitive data on the file server
Error handling: It's a problem. psql.exe never raises an error that we can catch even if it does error out and loads no data or a partial file
What else we have tried...
.Net Providers\Odbc Data Provider: We have configured a System DSN using the DataDirect 6.0 Greenplum Wire Protocol. Good performance for a DELETE, but terribly slow for an INSERT.
For reference, this is the aforementioned VB script in SSIS...
Public Sub Main()
Dim v_shell
Dim v_psql As String
v_psql = "C:\Program Files\pgAdmin III\1.10\psql.exe -d "MyGPDatabase" -h "MyGPHost" -p "5432" -U "MyServiceAccount" -f \\MyFileLocation\SSIS_load\sql_files\load_MyTable.sql"
v_shell = Shell(v_psql, AppWinStyle.NormalFocus, True)
End Sub
This is the contents of the "load_MyTable.sql" file...
\copy MyTable from '\\MyFileLocation\SSIS_load\txt_files\MyTable.txt' with delimiter as ';' csv header quote as '"'
If you're getting your data load done in minutes, then the current method is probably good enough. However, if you find yourself having to load larger volumes of data (terabyte scale, for instance), the usual preferred method for bulk-loading into Greenplum is via gpfdist and corresponding EXTERNAL TABLE definitions. gpload is a decent wrapper that provides abstraction over much of this process and is driven by YAML control files.
The general idea is that gpfdist instance(s) are spun up at the location(s) where your data is staged, preferably as CSV text files, and then the EXTERNAL TABLE definition within Greenplum is made aware of the URIs for the gpfdist instances. From the admin guide, a sample definition of such an external table could look like this:
CREATE READABLE EXTERNAL TABLE students (
name varchar(20), address varchar(30), age int)
LOCATION ('gpfdist://<host>:<portNum>/file/path/')
FORMAT 'CUSTOM' (formatter=fixedwidth_in,
name=20, address=30, age=4,
preserve_blanks='on',null='NULL');
The above example expects to read text files whose fields from left to right are a 20-character (at most) string, a 30-character string, and an integer. To actually load this data into a staging table inside GP:
CREATE TABLE staging_table AS SELECT * FROM students;
For large volumes of data, this should be the most efficient method since all segment hosts are engaged in the parallel load. Do keep in mind that the simplistic approach above will probably result in a randomly distributed table, which may not be desirable. You'd have to customize your table definitions to specify a distribution key.
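If it helps to see a CSV-oriented variant of this, here is a rough, untested sketch in Python with psycopg2 (the host names, table and file names, gpfdist port and the distribution key column are placeholders): it creates an external table matching the semicolon-delimited, quoted, header-bearing files produced by the current SSIS export, then loads them in parallel into a staging table with an explicit distribution key.

import psycopg2

# Connect to the Greenplum master (placeholder connection details).
conn = psycopg2.connect(host="MyGPHost", port=5432, dbname="MyGPDatabase",
                        user="MyServiceAccount", password="secret")
conn.autocommit = True
cur = conn.cursor()

# External table pointing at a gpfdist instance serving the staged files.
# The CSV options mirror the \copy command above (header, quote '"', delimiter ';').
cur.execute("""
    CREATE READABLE EXTERNAL TABLE ext_mytable (LIKE mytable)
    LOCATION ('gpfdist://MyEtlHost:8081/MyTable.txt')
    FORMAT 'CSV' (HEADER QUOTE '"' DELIMITER ';')
""")

# Parallel load into a staging table, specifying a distribution key
# instead of accepting a randomly distributed table.
cur.execute("""
    CREATE TABLE staging_mytable AS
    SELECT * FROM ext_mytable
    DISTRIBUTED BY (id)
""")

cur.close()
conn.close()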