Pretty much what the title says :)
At the moment I'm using Python to turn the JSON data into a plain-text tab-separated file, and then mysqlimport to pull that into my MySQL tables. Does anyone know a nicer / more direct way?
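For context, here's roughly what my current conversion step looks like (a minimal sketch; it assumes the JSON is a flat array of objects, and the file names and columns are made up):

import csv
import json

# Load a JSON array of flat objects, e.g. [{"name": "x", "price": 1.5}, ...]
with open("data.json") as f:
    rows = json.load(f)

# Fix the column order so the TSV matches the MySQL table definition
columns = ["name", "price"]

with open("data.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    for row in rows:
        # \N is what mysqlimport reads as NULL for a missing value
        writer.writerow([row.get(col, r"\N") for col in columns])

Then mysqlimport loads the file into the table named after it (here, data), since mysqlimport derives the table name from the file name.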
Check these posts from Ayende; they might be of some help:
http://ayende.com/Blog/archive/2010/03/05/actual-scenario-testing-with-raven.aspx
http://ayende.com/Blog/category/564.aspx
http://ayende.com/Blog/archive/2010/03/23/first-look-at-raven-db.aspx
I'm just investigating right now, so ideas that come close are welcome.
I have a simple JSON file that I would like to insert into an HBase table.
My JSON file has the following format:
{
    "word1": {
        "doc_01": 4,
        "doc_02": 7
    },
    "word2": {
        "doc_06": 1,
        "doc_02": 3,
        "doc_12": 8
    }
}
The HBase table is called inverted_index, and it has one column family, matches.
I would like to use the keys (word1, word2, etc.) as row keys, with their values inserted into the column family matches.
I know that Hive supports JSON parsing, and I've already tried it, but it only works when I know the keys in the JSON beforehand so I can access the records.
My problem is that I don't know what or how many words my JSON file contains, or how many matches each word will have (it can't be empty, though).
My question: is this even doable using Hive only? If so, kindly provide some pointers on which Hive queries/functions to use to achieve that.
If not, is there any alternative way to tackle this? Thanks in advance.
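If it turns out not to be doable in Hive alone, one alternative I could fall back on is a small script against the HBase Thrift gateway. A minimal sketch using the happybase Python client (my assumption, not something from the question; the host and file names are placeholders):

import json

import happybase  # third-party HBase Thrift client (an assumption)

connection = happybase.Connection("hbase-thrift-host")  # placeholder host
table = connection.table("inverted_index")

with open("inverted_index.json") as f:  # placeholder file name
    data = json.load(f)

# Each top-level key (word1, word2, ...) becomes a row key; each inner
# doc id becomes a qualifier in the 'matches' column family.
for word, matches in data.items():
    table.put(
        word.encode("utf-8"),
        {("matches:" + doc).encode("utf-8"): str(count).encode("utf-8")
         for doc, count in matches.items()},
    )

connection.close()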
I am reading data from a flat file and storing it in an OLE DB destination.
In the flat file, dates appear in various formats:
17/02/2014,
28-Apr-14,
30.06.14
I have used a Derived Column transformation to check for empty columns and replace them with null. As far as I have seen, SSIS and the database accept the 17/02/2014 format, while 28-Apr-14 and 30.06.14 are rejected.
I want to convert 28-Apr-14 and 30.06.14 to a valid format that the DB accepts.
I have researched a bit and read that a Script task can do it, but I am not sure how to write the code for this.
Could you please advise on the best way to do this?
Any suggestions/help is much appreciated.
Thanks
Rao
Check out my answer on the following question. This should work for you as well.
SSIS clean up date from csv file
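For the parsing logic itself (in SSIS a Script component would typically be C# or VB.NET, but the idea is language-neutral: try each known format in turn), here is a minimal sketch in Python, assuming the three formats above are the only ones that occur:

from datetime import datetime

# The three formats seen in the flat file
FORMATS = ("%d/%m/%Y", "%d-%b-%y", "%d.%m.%y")

def parse_date(value):
    """Return a date for any known format, or None for an empty column."""
    value = value.strip()
    if not value:
        return None  # mirrors the Derived Column null replacement
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError("Unrecognised date format: %r" % value)

for raw in ("17/02/2014", "28-Apr-14", "30.06.14"):
    print(parse_date(raw).isoformat())  # all normalise to YYYY-MM-DD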
I am downloading CSV files which are comma-separated. The problem I'm having is that the commas are screwing up my import into a database table (SQL Server). For example, I have a column with the header hotel_name, but some of the names look like the following:
HOTEL_NAME
hilton
cambridge,the
The problem is that a hotel name containing a comma spills into the adjacent column. I'm wondering if converting from CSV to a pipe-delimited format will work.
The problem I'm having is that I'm not sure how to get started. I've tried following the PowerShell documentation but get basic errors. I think this is because I'm new to PowerShell and not understanding something. Can someone please post a script that changes a comma-separated file to a pipe-delimited file?
Sorry if this is confusing; I'm finding the formatting on Stack Overflow to be a bit crazy.
Taken from Dealing with commas in a CSV file
Use " to wrap data that contains a comma.
For example
Server000,"Microsoft(R) Windows(R) Server 2003, Enterprise Edition"
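Since a real CSV parser already understands those quotes, the comma-to-pipe conversion is mostly a re-serialisation. The question asked for PowerShell; as a language-neutral sketch of the same idea (file names are placeholders), in Python:

import csv

# csv.reader honours quoted fields, so "cambridge,the" stays in one
# column; csv.writer re-emits the rows pipe-delimited.
with open("hotels.csv", newline="") as src, \
     open("hotels.psv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="|")
    writer.writerows(reader)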
Motivation: I want to load the data into Apache Drill. I understand that Drill can handle JSON input, but I want to see how it performs on Parquet data.
Is there any way to do this without first loading the data into Hive, etc., and then using one of the Parquet connectors to generate an output file?
Kite has support for importing JSON to both Avro and Parquet formats via its command-line utility, kite-dataset.
First, you would infer the schema of your JSON:
kite-dataset json-schema sample-file.json -o schema.avsc
Then you can use that file to create a Parquet Hive table:
kite-dataset create mytable --schema schema.avsc --format parquet
And finally, you can load your JSON into the dataset:
kite-dataset json-import sample-file.json mytable
You can also import an entire directory stored in HDFS. In that case, Kite will use an MR job to do the import.
You can actually use Drill itself to create a Parquet file from the output of any query.
create table student_parquet as select * from `student.json`;
The above line should be good enough. Drill interprets the types based on the data in the fields. You can substitute your own query and create a Parquet file.
To complete @rahul's answer: you can use Drill to do this, but I needed to add more to the query to get it working out of the box with Drill.
create table dfs.tmp.`filename.parquet` as select * from dfs.`/tmp/filename.json` t
I needed to give it the storage plugin (dfs). The "root" config can read from the whole disk but is not writable, while the tmp config (dfs.tmp) is writable and writes to /tmp, so I wrote there.
But the problem is that if the JSON is nested or perhaps contains unusual characters, I would get a cryptic
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: java.lang.IndexOutOfBoundsException:
If I have a structure that looks like members: {id: 123, name: "joe"}, I would have to change the select to
select members.id as members_id, members.name as members_name
or
select members.id as `members.id`, members.name as `members.name`
to get it to work.
I assume the reason is that Parquet is a "column" store, so you need columns; JSON isn't columnar by default, so you need to convert it.
The problem is that I have to know my JSON schema, and I have to build the select to include all the possibilities. I'd be happy if someone knows a better way to do this.
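In the meantime, one workaround is to generate the select list from a representative JSON record instead of writing it by hand. A rough sketch, assuming only dict nesting (no arrays) and placeholder file/table names:

import json

def flatten_columns(obj, prefix=""):
    """Yield Drill column expressions like t.members.id as `members.id`."""
    for key, value in obj.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            for col in flatten_columns(value, path):
                yield col
        else:
            yield "t." + path + " as `" + path + "`"

with open("sample-record.json") as f:  # one representative record
    sample = json.load(f)

select_list = ",\n       ".join(flatten_columns(sample))
print("create table dfs.tmp.`filename.parquet` as\n"
      "select " + select_list + "\n"
      "from dfs.`/tmp/filename.json` t")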