Trouble following Encrypted Big-Query tutorial document - json

I wanted to try out the Encrypted BigQuery client (ebq) for Google BigQuery, and I've been having some trouble.
I'm following the instructions outlined in this PDF:
https://docs.google.com/file/d/0B-WB8hYCrhZ6cmxfWFpBci1lOVE/edit
I get to the point where I'm running this command:
ebq load --master_key_filename="key_file" testdataset.cars cars.csv cars.schema
And I'm getting an error string which ends with:
raise ValueError("No JSON object could be decoded")
I've tried a few different formats for my .csv and .schema files but none have worked. Here are my latest versions.
cars.schema:
[{"name": "Year", "type": "integer", "mode": "required", "encrypt": "none"}
{"name": "Make", "type": "string", "mode": "required", "encrypt": "pseudonym"}
{"name": "Model", "type": "string", "mode": "required", "encrypt": "probabilistic_searchwords"}
{"name": "Description", "type": "string", "mode": "nullable", "encrypt": "searchwords"}
{"name": "Website", "type": "string", "mode": "nullable", "encrypt": "searchwords","searchwords_separator": "/"}
{"name": "Price", "type": "float", "mode": "required", "encrypt": "probabilistic"}
{"name": "Invoice_Price", "type": "integer", "mode": "required", "encrypt": "homomorphic"}
{"name": "Holdback_Percentage", "type": "float", "mode": "required", "encrypt":"homomorphic"}]
cars.csv:
1997,Ford,E350, "ac\xc4a\x87, abs, moon","www.ford.com",3000.00,2000,1.2
1999,Chevy,"Venture ""Extended Edition""","","www.cheverolet.com",4900.00,3800,2.3
1999,Chevy,"Venture ""Extended Edition, Very Large""","","www.chevrolet.com",5000.00,4300,1.9
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof,loaded","www.chrysler.com/jeep/grand­cherokee",4799.00,3950,2.4

I believe the issue may be that the --master_key_filename argument needs to come before the load command:
ebq --master_key_filename="key_file" load testdataset.cars cars.csv cars.schema
If that doesn't work, can you send the output you get after adding --apilog=- as the first argument?
Also, there is an example script file of running ebq here:
https://code.google.com/p/bigquery-e2e/source/browse/#git%2Fsamples%2Fch13
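Separately from the argument order, the error text "No JSON object could be decoded" is what Python 2's json module raises when its input fails to parse, and the cars.schema shown in the question has no commas between the field objects. Whether that is what ebq tripped over here is not confirmed, but it is cheap to rule out. A minimal sketch (the helper name check_schema is made up for illustration, and the two-field schemas are shortened stand-ins for the real file):

```python
import json


def check_schema(text):
    """Return (ok, message) saying whether text parses as a JSON list of fields."""
    try:
        fields = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        return False, "not valid JSON: %s" % err
    if not isinstance(fields, list):
        return False, "top-level value should be a list of fields"
    return True, "%d fields parsed" % len(fields)


# Two field objects with no comma between them, like the schema in the question.
broken = '''[{"name": "Year", "type": "integer"}
{"name": "Make", "type": "string"}]'''

# The same two fields separated by a comma.
fixed = '''[{"name": "Year", "type": "integer"},
{"name": "Make", "type": "string"}]'''

print(check_schema(broken))
print(check_schema(fixed))
```

If the real cars.schema fails this check, adding a comma after each field object except the last should make it parse.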

Related

What is causing a csv load error in weka?

I'm receiving the following error when trying to open a CSV file in Weka version 3.8.5:
File not recognized as an 'CSV data files' file Reason: wrong number
of values. Read 2, expected 12, read Token [EOL], Line 2 Problem
encountered on Line:2
I have read solutions to similar errors on this site and can't seem to find what is wrong with my particular file. However, as a very new Weka user, I may just be misunderstanding the issue. Can someone take a look at the sample CSV data below and let me know if you see what I am not understanding or missing?
LossMonth,LossYear,ClaimNumber,PolicyNumber,ClaimBranch,Agency,LocationCounty,CATCode,CauseCode,IncurredLoss,CurrentReserves,"
City",State,ZIPCODE,"
COLLISIONTYPECD","
CLOSEDDT",DaystoCLose,"
FATALITYCNT","
FATALITYIND","
FAULTRATINGIND","
AUTOGLASSIND","
DEERLOSSIND","
WEATHERRELATEDIND","
POLICYTIERCD",ClaimStatus,AgencyHandled,VEHICLEYEAR,DRIVERRELATIONTOINSUREDDESC,TOTALLOSSIND,INSURANCESCORE,Age
10,2016,4125858,20169200,4,113,73,1,comp,2525,0,PADUCAH,KY,42001,x,42692,18,0,0,0,0,0,1,70,1,0,2004,Other third party,0,703,73
1,2018,4265645,20137828,13,106,37,1,hail,3164,0,BAGDAD,KY,40003,x,43214,88,0,0,0,0,0,0,50,1,0,2010,Named Insured,1,799,63
12,2016,4136759,20322058,5,105,105,1,hail,2547,0,GEORGETOWN,KY,40324,x,42713,2,0,0,0,0,0,0,10,1,0,2010,Named Insured,0,999,68
1,2016,4033032,20175699,13,106,106,1,comp,15327,0,SIMPSONVILLE,KY,40067,x,42469,73,0,0,0,0,0,1,80,1,0,2000,Named Insured,1,999,34
9,2016,4116782,20133146,2,115,115,1,wind,7529,0,SPRINGFIELD,KY,40069,x,42649,8,0,0,0,0,0,0,10,1,0,2003,Named Insured,0,783,47
2,2016,4038442,20170355,7,148,10,1,hail,3631,0,ASHLAND,KY,41101,x,42417,1,0,0,0,0,0,0,50,1,0,2010,Named Insured,0,778,42
2,2016,4039439,20218265,7,45,10,1,hail,3579,0,FLATWOODS,KY,41139,x,42444,25,0,0,0,0,0,0,40,1,0,2013,Named Insured,0,820,52
2,2016,4039440,20218265,7,45,10,1,hail,570,0,FLATWOODS,KY,41139,x,42422,3,0,0,0,0,0,0,40,1,0,2012,Named Insured,0,820,52
3,2018,4275810,20126522,15,40,40,1,hail,3747,0,LANCASTER,KY,40444,x,43216,55,0,0,0,0,0,0,10,1,0,2009,Named Insured,1,999,74
5,2016,4071936,20461965,15,40,40,1,hail,525,0,LANCASTER,KY,40444,x,42521,7,0,0,0,0,0,0,50,1,0,2006,Named Insured,0,999,68
3,2016,4046685,20226270,7,35,35,1,hail,3558,0,FLEMINGSBURG,KY,41041,x,42447,2,0,0,0,0,0,0,80,1,0,2012,Named Insured,0,842,69
4,2016,4055942,20439287,7,35,35,1,hail,2551,0,EWING,KY,41039,x,42475,1,0,0,0,0,0,0,70,1,0,2006,Named Insured,0,867,48
1,2016,4026514,20394097,7,148,10,1,hail,1350,0,ASHLAND,KY,41101,x,42376,3,0,0,0,0,0,0,40,1,0,2007,Named Insured,0,637,65
3,2016,4047152,20212062,15,141,76,1,hail,1739,0,BEREA,KY,40403,x,42473,27,0,0,0,0,0,0,80,1,0,2008,Named Insured,0,777,77
2,2016,4035512,20103029,15,40,40,1,hail,2008,0,LANCASTER,KY,40444,x,42405,1,0,0,0,0,0,0,0,1,0,2000,Named Insured,1,885,72
1,2016,4030456,20385643,15,120,40,1,hail,1497,0,LANCASTER,KY,40444,x,42450,62,0,0,0,0,0,0,20,1,0,2013,Named Insured,0,839,65
4,2016,4053299,20251610,5,69,11,1,hail,1535,0,DANVILLE,KY,40422,x,42514,48,0,0,0,0,0,0,100,1,0,2013,Insured,0,999,64
6,2016,4076264,20337992,17,140,1,1,hail,1799,0,MILLTOWN,KY,42728,x,42529,2,0,0,0,0,0,0,50,1,0,2002,Named Insured,0,999,84
8,2017,4217498,20596983,8,86,86,1,hail,660,0,TOMPKINSVILLE,KY,42167,x,42954,0,0,0,0,0,0,0,100,1,0,2012,Named Insured,0,999,45
1,2016,4026053,20511114,4,113,113,1,hail,1310,0,STURGIS,KY,42459,x,42376,3,0,0,0,0,0,0,100,1,0,2003,Named Insured,0,694,44
1,2016,4026766,20656586,4,113,113,1,hail,2360,0,MORGANFIELD,KY,42437,x,42383,9,0,0,0,0,0,0,20,1,0,2010,Named Insured,0,999,89
1,2016,4027473,20085251,6,42,42,1,hail,1699,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2008,Named Insured,0,747,50
1,2016,4029284,20167051,17,109,109,1,wind,3133,0,CAMPBELLSVILLE,KY,42718,x,42387,5,0,0,0,0,0,0,10,1,0,1993,Named Insured,0,886,78
1,2016,4031937,20326278,3,81,12,1,comp,3385,0,FOSTER,KY,41043,x,42402,8,0,0,0,0,0,1,40,1,0,2003,Named Insured,0,723,79
1,2016,4027931,20339366,8,107,107,1,wind,5858,0,FRANKLIN,KY,42134,x,42447,70,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,940,80
1,2016,4028456,20453076,15,87,87,1,comp,2056,0,JEFFERSONVILLE,KY,40337,x,42387,7,0,0,0,0,0,1,100,1,0,2013,Named Insured,0,999,51
1,2016,4028597,20051661,4,113,113,1,hail,5320,0,WAVERLY,KY,42462,x,42712,332,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,717,58
3,2016,4046687,20018268,6,42,42,1,hail,2736,0,MAYFIELD,KY,42066,x,42450,5,0,0,0,0,0,0,110,1,0,2012,Named Insured,0,735,73
9,2016,4116499,20128172,3,96,59,1,glss,320,0,TAYLOR MILL,KY,41015,x,42660,20,0,0,0,0,0,1,0,1,0,1997,Spouse,0,923,81
1,2016,4026247,20086164,4,113,113,1,hail,1611,0,MORGANFIELD,KY,42437,x,42376,3,0,0,0,0,0,0,10,1,0,2013,Named Insured,0,902,61
1,2016,4027222,20033936,6,79,79,1,glss,105,0,CALVERT CITY,KY,42029,x,42389,14,0,0,0,0,0,1,110,1,0,2001,Named Insured,0,772,57
1,2016,4028311,20059964,4,75,75,1,comp,1040,0,SACRAMENTO,KY,42372,x,42382,2,0,0,0,0,0,1,10,1,0,1996,Named Insured,0,999,64
1,2016,4029164,20541039,6,42,42,1,wind,1495,0,SEDALIA,KY,42079,x,42382,0,0,0,0,0,0,0,0,1,0,2008,Named Insured,0,756,67
1,2016,4027475,20085251,6,42,42,1,hail,940,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2013,Named Insured,0,747,50
1,2016,4030356,20007300,4,117,117,1,hail,6550,0,DIXON,KY,42409,x,42436,49,0,0,0,0,0,0,40,1,0,2009,Named Insured,0,864,34
Weka's CSVLoader cannot handle rows that span multiple lines (despite quoting). Once all your rows (header and data) are one per line, you should be fine.
The common-csv (unofficial) Weka package should be able to handle rows spanning multiple lines.
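One way to get every row onto a single line is to re-write the file with a CSV parser that does understand quoted line breaks. A minimal sketch using Python's csv module (the function name flatten_csv is made up for illustration; note that it also collapses any other run of whitespace inside a field to a single space):

```python
import csv
import io


def flatten_csv(text):
    """Re-write CSV text so that no field contains a line break."""
    # newline="" leaves line endings untouched so the csv module can
    # recognise line breaks that sit inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # split() + join() drops leading/trailing whitespace (including the
        # embedded newline) and collapses inner whitespace runs to one space.
        writer.writerow([" ".join(field.split()) for field in row])
    return out.getvalue()


sample = 'CurrentReserves,"\nCity",State\n0,PADUCAH,KY\n'
print(flatten_csv(sample))
```

For a real file, read it with open(path, newline="") and write the result to a new file before loading that file in Weka.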

JSON Errors when trying to create a transfer_config in Google BigQuery CLI

I am trying to create a transfer job on the Windows command line with
bq mk --transfer_config --data_source=amazon_s3
--target_dataset=Usage --display_name='s3_transfer_installs_global_in_v0_test'
--params='{"data_path_template":"mybucket", "destination_table_name_template":"in_table", "file_format":"CSV", "max_bad_records":"0", "skip_leading_rows":"1", "allow_jagged_rows":"false", "allow_quoted_newlines":"true", "access_key_id":"dfadfadf", "secret_access_key":"sdfsfsdfsdf"}'
but I keep getting variations of the error
Too many positional args, still have ['"allow_quoted_newlines":"true","access_key_id":',...
Output from --apilog was also not enlightening.
My JSON validates, but maybe some escape characters are still needed?
Any help is very much appreciated; I have been shuffling around quotation marks and backslashes for two hours now...
I got the same error as you when running your command.
Replacing the double quotes with single quotes inside the --params value, and wrapping the whole value in double quotes, seems to work. Try the following:
bq mk --transfer_config --data_source=amazon_s3 --target_dataset=Usage --display_name='s3_transfer_installs_global_in_v0_test' --params="{'data_path_template':'mybucket', 'destination_table_name_template':'in_table', 'file_format':'CSV', 'max_bad_records':'0', 'skip_leading_rows':'1', 'allow_jagged_rows':'false', 'allow_quoted_newlines':'true', 'access_key_id':'dfadfadf', 'secret_access_key':'sdfsfsdfsdf'}"
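I also ran the original command in Windows PowerShell, and it worked without any changes.
I think the problem is with Windows cmd, which does not treat single quotes as quoting characters, so the JSON is most likely being split at its spaces into several arguments.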
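Another way to sidestep shell quoting is to build the argument list in a script and let the JSON be generated rather than typed. The sketch below only builds and prints the arguments; the function name build_transfer_args is made up, the parameter values are placeholders, and whether the bq launcher on Windows passes the argument through unchanged has not been verified here. subprocess.list2cmdline is a helper in Python's subprocess module that shows how a list would be quoted for a Windows command line.

```python
import json
import subprocess


def build_transfer_args(params):
    """Return the argv list for 'bq mk --transfer_config' with JSON params."""
    return [
        "bq", "mk", "--transfer_config",
        "--data_source=amazon_s3",
        "--target_dataset=Usage",
        "--display_name=s3_transfer_installs_global_in_v0_test",
        # json.dumps produces valid JSON, so no hand-written quoting is needed.
        "--params=" + json.dumps(params),
    ]


params = {"data_path_template": "mybucket", "file_format": "CSV"}
args = build_transfer_args(params)

# Show how the list would be rendered as one Windows command line:
# the argument is wrapped in double quotes and inner quotes become \".
print(subprocess.list2cmdline(args))
```

Passing the list itself to subprocess.run(args) avoids going through a shell, which is where the quoting rules differ between cmd and PowerShell.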

Golang CSV read : extraneous " in field error

I am using a simple program to read a CSV file. I noticed that when the CSV is created with Excel or on a Windows-based computer, the Go library fails to read it. Even when I use the cat command, it shows only the last line on the terminal. It always results in this error: extraneous " in field.
I researched a bit and found that it is somehow related to carriage return differences between operating systems.
But what I really want to ask is how to make a generic CSV reader. I tried reading the same CSV using pandas and it was read successfully, but I have not been able to achieve this with my Go code.
Also, a screenshot of the correct CSV is here
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid CSV, so Go returns an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: After your update and a little bit of searching, I have to apologize: the carriage return does matter and seems to be the main culprit here. Go seems to handle the \r\n Windows variant fine, but not a bare \r. In that case, you can wrap the bytes.Reader in a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note that the example is just that, an example; it does not handle all the possible cases where \r might be a legitimate byte.
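The same replacement can also be done as a one-off conversion of the file before the Go program reads it. This is a sketch in Python rather than Go, and not the code behind the playground link; the function name normalize_newlines is made up. It rewrites only a bare \r and leaves \r\n pairs alone, and it carries the same caveat: a \r inside a quoted field is replaced too.

```python
import re


def normalize_newlines(data):
    """Replace every bare carriage return in data (bytes) with a line feed."""
    # \r(?!\n) matches a carriage return that is NOT followed by \n,
    # so Windows-style \r\n line endings are left as they are.
    return re.sub(rb"\r(?!\n)", b"\n", data)


sample = b"a,b\rc,d\r\ne,f\n"
print(normalize_newlines(sample))
```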

How to load OSM (GeoJSON) data to ArangoDB?

How can I load OSM data into ArangoDB?
I downloaded a data set named luxembourg-latest.osm.pbf from OSM, then converted it to JSON with osmtogeojson, and then tried to load the resulting GeoJSON into ArangoDB with the following command: arangoimp --file out.json --collection lux1 --server.database geodb and got a huge list of errors:
...
2017-03-17T12:44:28Z [7712] WARNING at position 719386: invalid JSON type (expecting object, probably parse error), offending context: ],
2017-03-17T12:44:28Z [7712] WARNING at position 719387: invalid JSON type (expecting object, probably parse error), offending context: [
2017-03-17T12:44:28Z [7712] WARNING at position 719388: invalid JSON type (expecting object, probably parse error), offending context: 5.867441,
...
What am I doing wrong?
Update: it seems that the converter should be run with the --ndjson option (osmtogeojson --ndjson), which produces the items not as a single JSON document but line by line.
As @dmitry-bubnenkov already found out, --ndjson is required to produce the right input for ArangoImp.
One has to know here that ArangoImp expects a subset of JSON known as JSONL (since it doesn't parse the JSON on its own).
Thus, each line of the JSON file is expected to become one JSON document in the collection after the import. To maximize performance and simplify the implementation, the JSON is not completely parsed before it is sent to the server.
It tries to chop the JSON into chunks with the maximum request size that the server permits. It leans on the JSONL line endings to isolate possible chunks.
However, the server does expect valid JSON. Sending a chopped part with possibly incomplete JSON documents to the server leads to parse errors on the server, which is the error message you saw in your output.
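If the file has already been converted without --ndjson, it can be turned into one document per line afterwards. A minimal sketch, assuming out.json is a single GeoJSON FeatureCollection whose "features" member holds the items (the function name feature_collection_to_jsonl is made up for illustration):

```python
import json


def feature_collection_to_jsonl(text):
    """Turn a GeoJSON FeatureCollection string into JSONL, one feature per line."""
    collection = json.loads(text)
    # Compact separators keep each feature on a single line with no padding;
    # json.dumps never emits raw newlines unless an indent is requested.
    return "".join(
        json.dumps(feature, separators=(",", ":")) + "\n"
        for feature in collection["features"]
    )


sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"id": 1},
         "geometry": {"type": "Point", "coordinates": [5.867441, 49.5]}},
        {"type": "Feature", "properties": {"id": 2},
         "geometry": {"type": "Point", "coordinates": [6.1, 49.6]}},
    ],
}, indent=2)

print(feature_collection_to_jsonl(sample))
```

Each output line is then a complete JSON object, which matches the line-by-line input described above.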

load csv file into BQ - too many positional args

I tried loading a sample data file (CSV) into BQ. Since the CSV has a header, I wanted to skip the first row. The following is the command I used:
project_id1> load prodtest.prod_det_test gs://bucketname/Prod_det.csv prodno:integer,prodname:string,instock:integer --skip_leading_rows=1
Issue: Too many positional args, still have['--skip_leading_rows=1']. Please suggest how to resolve this issue?
This should work:
bq load --skip_leading_rows=1 prodtest.prod_det_test gs://bucketname/Prod_det.csv prodno:integer,prodname:string,instock:integer
The -- flags must come before the positional arguments (table, source file, and schema).
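When the command is assembled by a script, the ordering can be made explicit. A small sketch that only builds the argument list (the function name bq_load_args is made up for illustration, and it covers just the one flag used in this question):

```python
def bq_load_args(table, source, schema, skip_leading_rows=0):
    """Return the argv list for 'bq load' with flags ahead of positional args."""
    args = ["bq", "load"]
    if skip_leading_rows:
        # Flags go directly after the command name ...
        args.append("--skip_leading_rows=%d" % skip_leading_rows)
    # ... and the positional arguments follow in a fixed order.
    args += [table, source, schema]
    return args


print(bq_load_args(
    "prodtest.prod_det_test",
    "gs://bucketname/Prod_det.csv",
    "prodno:integer,prodname:string,instock:integer",
    skip_leading_rows=1,
))
```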