How can I get ImageDatasetImportDataOp to update labels?
In a Vertex AI pipeline I am updating an image dataset, thus:
ds_op = gcc_aip.ImageDatasetImportDataOp(
    project=project,
    dataset=get_dataset_id_op.outputs['dataset'],
    gcs_source=DATASET_PATH,
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)
I have tried adding images, updating the CSV file with their paths and labels, and uploading it to GCS. When I then run the pipeline, the images are added to the dataset, but their labels are ignored and they are classed as Unlabeled. What am I doing wrong? TIA!
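For reference, the rows in the import CSV follow the usual single-label classification pattern of one image per line with the label in the second column (bucket and file names below are placeholders):

gs://my-bucket/images/img_0001.jpg,cat
gs://my-bucket/images/img_0002.jpg,dog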
UPDATE: I am trying to use data_item_labels ("JsonObject: Labels that will be applied to newly imported DataItems.") but I don't know what format is expected. I have tried JSON, CSV, JSON Lines, etc., but keep getting errors like:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
UPDATE 2: I finally figured out that I should be passing a JSON object rather than a file URI, but I have tried everything I can think of and I either get JSON errors or "Invalid data_item_labels.".
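One shape consistent with the docstring (a JSON object serialized to a string, applied uniformly to every newly imported item) would be something like the sketch below. The label key and value are made up for illustration, and this is an untested assumption, not a confirmed fix:

import json

# Hypothetical label dict; data_item_labels applies the same labels
# to every newly imported DataItem, so per-image labels still have to
# come from the import file itself.
labels = json.dumps({"ml_use": "training"})

ds_op = gcc_aip.ImageDatasetImportDataOp(
    project=project,
    dataset=get_dataset_id_op.outputs['dataset'],
    gcs_source=DATASET_PATH,
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
    data_item_labels=labels,
)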
Related
Expecting value: line 1 column 1 (char 0) problem
I want to open a JSON file for object detection (YOLOv5 or YOLOv7). I have about 510,000 images and JSON labeling data, but I can't solve this problem (I have tried googling, with no luck so far). Jupyter Notebook does not print this error, so I think the major cause is either a problem with the JSON file or a Colab memory issue. How do I solve this problem? Please help me.
You are possibly using the open function wrongly. Now don't quote me on this, but if you are trying to read data from a JSON file you need to open it in read mode with 'r', like so:

with open(filename, 'r') as f:
    data = json.load(f)

'r' stands for read.
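A minimal, self-contained version of that pattern, with the decode error surfaced explicitly (the file name is a placeholder):

import json

filename = "labels.json"  # placeholder path

try:
    with open(filename, "r", encoding="utf-8") as f:
        data = json.load(f)
except json.JSONDecodeError as e:
    # "Expecting value: line 1 column 1 (char 0)" usually means the file
    # is empty or does not start with valid JSON (e.g. it is CSV or JSONL).
    print(f"Not valid JSON: {e}")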
What is causing a CSV load error in Weka?
I'm receiving the following error when trying to open a CSV file in Weka version 3.8.5:

File not recognized as an 'CSV data files' file. Reason: wrong number of values. Read 2, expected 12, read Token[EOL], Line 2. Problem encountered on Line: 2

I have read solutions to similar errors on this site and can't seem to find what is wrong with my particular file. However, as a newbie Weka user, it may just be my misunderstanding of the issue. Can someone take a look at the sample CSV data below and let me know if you see what I am not understanding or missing?

LossMonth,LossYear,ClaimNumber,PolicyNumber,ClaimBranch,Agency,LocationCounty,CATCode,CauseCode,IncurredLoss,CurrentReserves," City",State,ZIPCODE," COLLISIONTYPECD"," CLOSEDDT",DaystoCLose," FATALITYCNT"," FATALITYIND"," FAULTRATINGIND"," AUTOGLASSIND"," DEERLOSSIND"," WEATHERRELATEDIND"," POLICYTIERCD",ClaimStatus,AgencyHandled,VEHICLEYEAR,DRIVERRELATIONTOINSUREDDESC,TOTALLOSSIND,INSURANCESCORE,Age
10,2016,4125858,20169200,4,113,73,1,comp,2525,0,PADUCAH,KY,42001,x,42692,18,0,0,0,0,0,1,70,1,0,2004,Other third party,0,703,73
1,2018,4265645,20137828,13,106,37,1,hail,3164,0,BAGDAD,KY,40003,x,43214,88,0,0,0,0,0,0,50,1,0,2010,Named Insured,1,799,63
12,2016,4136759,20322058,5,105,105,1,hail,2547,0,GEORGETOWN,KY,40324,x,42713,2,0,0,0,0,0,0,10,1,0,2010,Named Insured,0,999,68
1,2016,4033032,20175699,13,106,106,1,comp,15327,0,SIMPSONVILLE,KY,40067,x,42469,73,0,0,0,0,0,1,80,1,0,2000,Named Insured,1,999,34
9,2016,4116782,20133146,2,115,115,1,wind,7529,0,SPRINGFIELD,KY,40069,x,42649,8,0,0,0,0,0,0,10,1,0,2003,Named Insured,0,783,47
2,2016,4038442,20170355,7,148,10,1,hail,3631,0,ASHLAND,KY,41101,x,42417,1,0,0,0,0,0,0,50,1,0,2010,Named Insured,0,778,42
2,2016,4039439,20218265,7,45,10,1,hail,3579,0,FLATWOODS,KY,41139,x,42444,25,0,0,0,0,0,0,40,1,0,2013,Named Insured,0,820,52
2,2016,4039440,20218265,7,45,10,1,hail,570,0,FLATWOODS,KY,41139,x,42422,3,0,0,0,0,0,0,40,1,0,2012,Named Insured,0,820,52
3,2018,4275810,20126522,15,40,40,1,hail,3747,0,LANCASTER,KY,40444,x,43216,55,0,0,0,0,0,0,10,1,0,2009,Named Insured,1,999,74
5,2016,4071936,20461965,15,40,40,1,hail,525,0,LANCASTER,KY,40444,x,42521,7,0,0,0,0,0,0,50,1,0,2006,Named Insured,0,999,68
3,2016,4046685,20226270,7,35,35,1,hail,3558,0,FLEMINGSBURG,KY,41041,x,42447,2,0,0,0,0,0,0,80,1,0,2012,Named Insured,0,842,69
4,2016,4055942,20439287,7,35,35,1,hail,2551,0,EWING,KY,41039,x,42475,1,0,0,0,0,0,0,70,1,0,2006,Named Insured,0,867,48
1,2016,4026514,20394097,7,148,10,1,hail,1350,0,ASHLAND,KY,41101,x,42376,3,0,0,0,0,0,0,40,1,0,2007,Named Insured,0,637,65
3,2016,4047152,20212062,15,141,76,1,hail,1739,0,BEREA,KY,40403,x,42473,27,0,0,0,0,0,0,80,1,0,2008,Named Insured,0,777,77
2,2016,4035512,20103029,15,40,40,1,hail,2008,0,LANCASTER,KY,40444,x,42405,1,0,0,0,0,0,0,0,1,0,2000,Named Insured,1,885,72
1,2016,4030456,20385643,15,120,40,1,hail,1497,0,LANCASTER,KY,40444,x,42450,62,0,0,0,0,0,0,20,1,0,2013,Named Insured,0,839,65
4,2016,4053299,20251610,5,69,11,1,hail,1535,0,DANVILLE,KY,40422,x,42514,48,0,0,0,0,0,0,100,1,0,2013,Insured,0,999,64
6,2016,4076264,20337992,17,140,1,1,hail,1799,0,MILLTOWN,KY,42728,x,42529,2,0,0,0,0,0,0,50,1,0,2002,Named Insured,0,999,84
8,2017,4217498,20596983,8,86,86,1,hail,660,0,TOMPKINSVILLE,KY,42167,x,42954,0,0,0,0,0,0,0,100,1,0,2012,Named Insured,0,999,45
1,2016,4026053,20511114,4,113,113,1,hail,1310,0,STURGIS,KY,42459,x,42376,3,0,0,0,0,0,0,100,1,0,2003,Named Insured,0,694,44
1,2016,4026766,20656586,4,113,113,1,hail,2360,0,MORGANFIELD,KY,42437,x,42383,9,0,0,0,0,0,0,20,1,0,2010,Named Insured,0,999,89
1,2016,4027473,20085251,6,42,42,1,hail,1699,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2008,Named Insured,0,747,50
1,2016,4029284,20167051,17,109,109,1,wind,3133,0,CAMPBELLSVILLE,KY,42718,x,42387,5,0,0,0,0,0,0,10,1,0,1993,Named Insured,0,886,78
1,2016,4031937,20326278,3,81,12,1,comp,3385,0,FOSTER,KY,41043,x,42402,8,0,0,0,0,0,1,40,1,0,2003,Named Insured,0,723,79
1,2016,4027931,20339366,8,107,107,1,wind,5858,0,FRANKLIN,KY,42134,x,42447,70,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,940,80
1,2016,4028456,20453076,15,87,87,1,comp,2056,0,JEFFERSONVILLE,KY,40337,x,42387,7,0,0,0,0,0,1,100,1,0,2013,Named Insured,0,999,51
1,2016,4028597,20051661,4,113,113,1,hail,5320,0,WAVERLY,KY,42462,x,42712,332,0,0,0,0,0,0,20,1,0,2014,Named Insured,0,717,58
3,2016,4046687,20018268,6,42,42,1,hail,2736,0,MAYFIELD,KY,42066,x,42450,5,0,0,0,0,0,0,110,1,0,2012,Named Insured,0,735,73
9,2016,4116499,20128172,3,96,59,1,glss,320,0,TAYLOR MILL,KY,41015,x,42660,20,0,0,0,0,0,1,0,1,0,1997,Spouse,0,923,81
1,2016,4026247,20086164,4,113,113,1,hail,1611,0,MORGANFIELD,KY,42437,x,42376,3,0,0,0,0,0,0,10,1,0,2013,Named Insured,0,902,61
1,2016,4027222,20033936,6,79,79,1,glss,105,0,CALVERT CITY,KY,42029,x,42389,14,0,0,0,0,0,1,110,1,0,2001,Named Insured,0,772,57
1,2016,4028311,20059964,4,75,75,1,comp,1040,0,SACRAMENTO,KY,42372,x,42382,2,0,0,0,0,0,1,10,1,0,1996,Named Insured,0,999,64
1,2016,4029164,20541039,6,42,42,1,wind,1495,0,SEDALIA,KY,42079,x,42382,0,0,0,0,0,0,0,0,1,0,2008,Named Insured,0,756,67
1,2016,4027475,20085251,6,42,42,1,hail,940,0,MAYFIELD,KY,42066,x,42381,5,0,0,0,0,0,0,90,1,0,2013,Named Insured,0,747,50
1,2016,4030356,20007300,4,117,117,1,hail,6550,0,DIXON,KY,42409,x,42436,49,0,0,0,0,0,0,40,1,0,2009,Named Insured,0,864,34
Weka's CSVLoader cannot handle rows that span multiple lines (despite quoting). Once all your rows (header and data) are one per line, you should be fine. The common-csv (unofficial) Weka package should be able to handle rows spanning multiple lines.
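If editing the file by hand is impractical, a small preprocessing script can collapse quoted multi-line records into single physical lines before loading into Weka. A minimal sketch (file names are placeholders); Python's csv module parses newlines embedded inside quoted fields, so reading and rewriting with those newlines stripped yields one record per line:

import csv

# Collapse records that span multiple physical lines into single lines.
# Input/output file names are placeholders.
with open("claims_raw.csv", newline="") as src, \
        open("claims_flat.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for record in csv.reader(src):  # handles newlines inside quoted fields
        # Replace any line breaks embedded within individual field values.
        writer.writerow([field.replace("\n", " ").replace("\r", " ")
                         for field in record])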
Can't display CSV file in PySpark (ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling)
I'm getting an error while displaying a CSV file through PySpark. I've attached the PySpark code and CSV file that I used.

from pyspark.sql import *

spark.conf.set("fs.azure.account.key.xxocxxxxxxx", "xxxxx")
time_on_site_tablepath = "wasbs://dwpocblob@dwadfpoc.blob.core.windows.net/time_on_site.csv"
time_on_site = spark.read.format("csv").options(header='true', inferSchema='true').load(time_on_site_tablepath)
display(time_on_site.head(50))

The error is shown below:

ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

The inferred schema of time_on_site (pyspark.sql.dataframe.DataFrame) is:

next_eventdate:timestamp
barcode:integer
eventdate:timestamp
sno:integer
eventaction:string
next_action:string
next_deviceid:integer
next_device:string
type_flag:string
site:string
location:string
flag_perimeter:integer
deviceid:integer
device:string
tran_text:string
flag:integer
timespent_sec:integer
gg:integer

The CSV file data is attached below:

next_eventdate,barcode,eventdate,sno,eventaction,next_action,next_deviceid,next_device,type_flag,site,location,flag_perimeter,deviceid,device,tran_text,flag,timespent_sec,gg
2018-03-16 05:23:34.000,1998296,2018-03-14 18:50:29.000,1,IN,OUT,2,AGATE-R02-AP-Vehicle_Exit,,NULL,NULL,1,1,AGATE-R01-AP-Vehicle_Entry,Access Granted,0,124385,0
2018-03-17 07:22:16.000,1998296,2018-03-16 18:41:09.000,3,IN,OUT,2,AGATE-R02-AP-Vehicle_Exit,,NULL,NULL,1,1,AGATE-R01-AP-Vehicle_Entry,Access Granted,0,45667,0
2018-03-19 07:23:55.000,1998296,2018-03-17 18:36:17.000,6,IN,OUT,2,AGATE-R02-AP-Vehicle_Exit,,NULL,NULL,1,1,AGATE-R01-AP-Vehicle_Entry,Access Granted,1,132458,1
2018-03-21 07:25:04.000,1998296,2018-03-19 18:23:26.000,8,IN,OUT,2,AGATE-R02-AP-Vehicle_Exit,,NULL,NULL,1,1,AGATE-R01-AP-Vehicle_Entry,Access Granted,0,133298,0
2018-03-24 07:33:38.000,1998296,2018-03-23 18:39:04.000,10,IN,OUT,2,AGATE-R02-AP-Vehicle_Exit,,NULL,NULL,1,1,AGATE-R01-AP-Vehicle_Entry,Access Granted,0,46474,0

What could be done to load the CSV file successfully?
There is no issue in your syntax; it's working fine. The issue is in the data of your CSV file: the column named type_flag has only None (null) values, so Spark cannot infer its datatype. There are two options.

You can display the data without using head():

display(time_on_site)

If you want to use head(), then you need to replace the null values first; here I replaced them with the empty string (''):

time_on_site = time_on_site.fillna('')
display(time_on_site.head(50))
For some reason, probably a bug, even if you provide a schema on the spark.read.schema(my_schema).csv('path') call, you get the same error on a display(df.head()) call. display(df) works, though, but it gave me a WTF moment.
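For reference, a minimal sketch of supplying an explicit schema as mentioned above (only an illustrative subset of the question's columns, and assuming an existing spark session as in Databricks):

from pyspark.sql.types import (StructType, StructField, TimestampType,
                               IntegerType, StringType)

# Illustrative subset of the columns from the question's CSV.
my_schema = StructType([
    StructField("next_eventdate", TimestampType(), True),
    StructField("barcode", IntegerType(), True),
    StructField("eventaction", StringType(), True),
    StructField("type_flag", StringType(), True),  # all-null column typed explicitly
])

# With an explicit schema, no rows need to be sampled for type inference.
df = spark.read.schema(my_schema).option("header", "true").csv("wasbs://...")  # path elided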
Python loading multiple JSON values on one line
I have a file (not a valid JSON file) that looks similar to this:

[[0,0,0],[0,0,0],[0,0,0]]["testing", "foo", "bar"]

These are two (or more) non-delimited, valid JSON values that I need to load in from STDIN. I tried just using the following (in Python 3.7):

for line in sys.stdin:
    stripped = line.strip()
    if not stripped:
        break
    x = loads(stripped)

But that gave the error:

json.decoder.JSONDecodeError: Extra data: line 1 column 118 (char 117)

which makes sense, as it can only load one JSON value at a time. How would I go about loading in multiple of these values from STDIN when they are not delimited? Is there a way to check if the JSON loader successfully completed a load and then start another one from the same line?
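No answer is quoted here, but for what it's worth, json.JSONDecoder.raw_decode does this kind of incremental decoding: it returns the parsed value together with the index where parsing stopped, so the next parse can resume from there. A sketch:

import json
import sys

decoder = json.JSONDecoder()

def iter_json_values(text):
    """Yield consecutive JSON values from a string with no delimiters."""
    idx = 0
    while idx < len(text):
        # raw_decode returns the value and the index just past it,
        # so parsing can resume where the previous value ended.
        value, end = decoder.raw_decode(text, idx)
        yield value
        # Skip any whitespace between values before the next parse.
        while end < len(text) and text[end].isspace():
            end += 1
        idx = end

for line in sys.stdin:
    for value in iter_json_values(line.strip()):
        print(value)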
Read a log file in R
I'm trying to read a log file in R. It looks like an extract from a JSON file to me, but when trying to read it using jsonlite I get the following error message:

Error: parse error: trailing garbage

Here is how my log file looks:

{"date":"2017-05-11T04:37:15.587Z","userId":"admin","module":"Quote","action":"CreateQuote","identifier":"-.admin1002"},
{"date":"2017-05-11T05:12:24.939Z","userId":"a145fhyy","module":"Quote","action":"Call","identifier":"RunUY"},
{"date":"2017-05-11T05:12:28.174Z","userId":"a145fhyy","license":"named","usage":"External","module":"Catalog","action":"OpenCatalog","identifier":"wks.klu"},

As you can see, the column name is given directly in front of the content on each line (e.g. "date": or "action":), and some lines skip some columns while adding others. What I want as output is a table with 7 columns, with the corresponding data filled into each:

date userId license usage module action identifier

Does anyone have a suggestion about how to get there? Thanks a lot in advance.

EDIT: Thanks everyone for your answers. Here are some precisions about my issue: the data I gave as an example is an extract of one of my log files. I've got a lot of them that I need to read as one unique table. I haven't added any commas or anything to it.

@r2evans I've tried the following:

Log3 <- read.table("/Projects/data/analytics.log.agregated.2017-05-11.log")
jsonlite::stream_in(textConnection(gsub(",$", "", Log3)))

It returns the following error:

Error: lexical error: invalid char in json text.
          c(17, 18, 19, 20, 21, 22, 23, 2
     (right here) ------^

I'm not sure how to use sed -e 's/,$//g' infile > outfile and Sys.which("sed"); that's something I'm not familiar with. I'm looking into it, but if you have any more precisions to give me about its usage, that would be great.
I have saved your example as a file "test.json" and was able to read and parse it like this (read_file comes from readr, loaded alongside jsonlite):

library(readr)    # for read_file()
library(jsonlite)

rf <- read_file("test.json")
rfr <- gsub("\\},", "\\}", rf)    # drop the trailing comma after each object
data <- stream_in(textConnection(rfr))

It parses and simplifies into a neat data frame exactly like you want. What I do is look for "}," rather than ",$", because the very last comma is not (necessarily) followed by a newline character(s). However, this might not be the best solution for very large files; for those you may need to first look for a way to modify the text file itself by getting rid of the commas. Or, if that's possible, ask the people who exported this file to export it in a normal ndjson format :-)