I am trying to evaluate InfluxDB as a real time, time series data visualization tool. I have an account with InfluDB and I have created a bucket for data storage. I now want to upload a csv file into the bucket via the click to upload feature but I keep getting errors associated with incorrect annotations. The last error I received was:
'Failed to upload the selected CSV: error in csv.from(): failed to read metadata: failed to read header row: wrong number of fields'
I have tried to decipher their docs and examples on how to annotate a csv file and have tried many different combinations of #datatype, #group and #default but nothing works.
This is the latest attempt that generated the error above.
#datatype,string,string,double,dateTime
#group,true,true,false,false
#default,,,,
_measurement,station,_value,_time
device,MBL,-0.814075542,1.65E+18
device,MBL,-0.837942395,1.65E+18
device,MBL,-0.862699339,1.65E+18
device,MBL,-0.891686336,1.65E+18
device,MBL,-0.891492408,1.65E+18
device,MBL,-0.933193098,1.65E+18
device,MBL,-0.933193098,1.65E+18
device,MBL,-0.976859072,1.65E+18
device,MBL,-0.981019863,1.65E+18
device,MBL,-1.011647128,1.65E+18
device,MBL,-1.017813258,1.65E+18
Any thoughts would be greatly appreciated. Thanks.
From the sample data above, I assume "device" is the name of a measurement and "MBL" is a tag whose name is station. Hence, there is 1 measurement and 1 tag, 1 field and a timestamp.
And you are mixing data types and line protocol elements when using annotated CSV. You could try following version:
#datatype,measurement,tag,double,dateTime
#default device,MBL,
thisIsYouMeasurementName,station,thisIsYourFieldKeyName,time
device,MBL,-0.814075542,1652669077000000000
device,MBL,-0.837942395,1652669077000000001
device,MBL,-0.862699339,1652669077000000002
device,MBL,-0.891686336,1652669077000000003
device,MBL,-0.891492408,1652669077000000004
device,MBL,-0.933193098,1652669077000000005
device,MBL,-0.933193098,1652669077000000006
device,MBL,-0.976859072,1652669077000000007
device,MBL,-0.981019863,1652669077000000008
device,MBL,-1.011647128,1652669077000000009
device,MBL,-1.017813258,1652669077000000010
Note that time column should avoid using "1.65E+18". See more details here.
I am working on a school project for data mining, where we were given CSV data from kaggle (this is how the data looks (2 lines out of 6970)):
4,1970,Female,150,DomesticPartnersKids,Bachelor's Degree,Democrat,,Yes,No,No,No,Yes,Public,No,Yes,No,Yes,No,No,Yes,Science,Study first,Yes,Yes,No,No,Receiving,No,No,Pragmatist,No,No,Cool headed,Standard hours,No,Happy,Yes,Yes,Yes,No,A.M.,No,End,Yes,No,Me,Yes,Yes,No,Yes,No,Mysterious,No,No,,,,,,,,,,Mac,Yes,Cautious,No,Umm...,No,Space,Yes,In-person,No,Yes,Yes,No,Yay people!,Yes,Yes,Yes,Yes,Yes,No,Yes,,,,,,,,,,,,,,,,,No,No,No,Only-child,Yes,No,No
5,1997,Male,75,Single,High School Diploma,Republican,,Yes,Yes,No,,Yes,Private,No,No,No,Yes,No,No,Yes,Science,Study first,,Yes,No,Yes,Receiving,No,Yes,Pragmatist,No,Yes,Cool headed,Odd hours,No,Right,Yes,No,No,Yes,A.M.,Yes,Start,Yes,Yes,Circumstances,No,Yes,No,Yes,Yes,Mysterious,No,No,Tunes,Technology,Yes,Yes,Yes,Yes,No,Supportive,No,PC,No,Cautious,No,Umm...,No,Space,No,In-person,No,No,Yes,Yes,Grrr people,Yes,No,No,No,No,No,No,Yes,No,No,Yes,No,Own,Pessimist,Mom,No,No,No,No,Nope,Yes,No,No,No,Yes,No,Yes,No,Yes,No
and we have to get this to an .arff format for use in weka. I manualy typed the header(107 attributes)
#ATTRIBUTE user_id NUMERIC
#ATTRIBUTE yob NUMERIC
#ATTRIBUTE gender {Male,Female}
#ATTRIBUTE income {150,100,75,50,25,10}
#ATTRIBUTE householdstatus {MarriedKids,Married,DomesticPartnersKids,DomesticPartners,Single,SingleKids}
#ATTRIBUTE educationlevel {Bachelor's Degree,High School Diploma,Current K-12,Current Undergraduate,Master's Degree,Associate's Degree,Doctoral Degree}
#ATTRIBUTE party {Democrat,Republican}
#ATTRIBUTE Q124742 {Yes,No}
#ATTRIBUTE Q124122 {Yes,No}
and I get this error :
} expected at end of enumeration read token eol
Then I tried to use the weka converter but it gave me an error
Wrong number of values.Read 2,expected 1,read Token[EOL],line 4 Problem encountered at line:3
Here's what I did:
From Kaggle, I downloaded train.csv (5568 instances, highest ID numbeer 6960).
I didn't use the converter -- just loaded it into the Weka Explorer as a CSV file. Some problems and their solution:
Line 3: First instance of "Bachelor's Degree". It did NOT like that single quote ("line 3, read 7, expected 108"). Got rid of all single quotes (using a global replace in a text editor). Then I tried to load it into Weka again.
The file doesn't have a CR (the Enter key on the keyboard) at the end of the last line, which caused an error ("null on line 5569"). I added one, again in a text editor. Then I loaded it into Weka, and took a look at the variables.
YOB (Year of Birth) is missing for about 300 instances, with "NA" filled in. So, it didn't evaluate as either string or numeric. Edited these to be empty cells instead. Then I loaded it into Weka.
And, of course, moved Party to be the class variable (at the end). I did this in Weka.
Saved this as train.arff
Loaded it back in, and it seems to work OK. I generated 51% accuracy with a OneR classifier, but you wouldn't expect a OneR classifier to work well here. I'm sure you can do better.
Note I didn't do any manual typing of headers. That must have taken a while!
Good luck!
File "/home/malikarumi/Projects/cannon/local/lib/python2.7/site-packages/django/db/models/fields/__init__.py", line 2390, in get_db_prep_value
value = uuid.UUID(value)
File "/usr/lib/python2.7/uuid.py", line 134, in __init__
raise ValueError('badly formed hexadecimal UUID string')
ValueError: Problem installing fixture '/home/malikarumi/Projects/cannon/jamf/essell/fixtures/test22byhand.json': badly formed hexadecimal UUID string
I've found the following links so far:
https://github.com/dcramer/django-uuidfield/issues/40
https://github.com/dcramer/django-uuidfield/commit/caae1bc4e45445a06dd11bb22da6a9f07395f78a
Django UUIDField modelfield causes error in Django admin: badly formed hexadecimal UUID string
Django Primary Key: badly formed hexadecimal UUID string
I counted my uuidfield value. It is len=36, because it has dashes in it. At least the string representation I can see is that way. So I replaced it with the same alphanumeric without dashes, as suggested as a test by the bugfix, but I still got the same result.
I checked the model, but there is no max length on any uuid field, nor on the fk link back to the uuid. There's nothing on the fk to suggest it is, or should be limited to, chars, ints, uuids, etc.
Then I found this: http://arthurpemberton.com/2015/04/fixing-uuid-is-not-json-serializable which I hacked into /python2.7/site-packages/django/core/serializers/python.py. The blogger had put it into models.py. But I got the same error, before realizing it was NOT coming from serializers/python.py, as it was yesterday, but from /usr/lib/python2.7/uuid.py, line 134, in init. the relevant portions of that code are:
if hex is not None:
hex = hex.replace('urn:', '').replace('uuid:', '')
hex = hex.strip('{}').replace('-', '')
if len(hex) != 32:
raise ValueError('badly formed hexadecimal UUID string')
int = long(hex, 16)
Rather than try to hack more core code, given that the indication is the problem is json, not Python, I left this alone for now.
Finally, I looked at this:
https://code.djangoproject.com/ticket/24012
It is stated a couple of times here that Django's "UUIDField generates UUIDs in Python". Now here is some history. I created one row, a single instance of Model A into Django with a fixture that had no uuid and no datefield and had no issues. (The uuidfield is on an abstract model, so it is created when the object is created). I did that because I needed the uuid of that Model A instance for a fk field in Model B, which is the one I am struggling with now. I did that by copy pasting the Model A uuid into the fk field on Model B in a csv file which I then converted to json in order to use it as a fixture.
Is it possible that the uuid ran into problems in this copy paste maneuver, before the conversion to json?
If not, that means even though it was an acceptable Python object when it was created, going thru the json conversion messed it up, correct?
If that's the case, what is a workaround?
Can the Arthur Pemberton code be made to work somewhere else in this process?
If I leave the uuid off, I can probably make this work, but then I have to go back and put the all the fk uuid's in manually. Is there a better solution? Maybe a bulk insert of that field alone?
This may be a recurring issue for me, because I am also using Scrapy, which supports but does not require json. None of my scraped items will come with uuid, but how do I automate adding their fk's into my process in order to get them into Django?
Or is all of this a good reason to forget uuids altogether?
Thanks.
EDIT/UPDATE per #rolf:
Since I just discovered that the django shell differs more than I realized (the shell can find settings, the regular interpreter can't) I decided to run this once in each one, but the results were the same.
(cannon)malikarumi#Tetuoan2:~/Projects/cannon/jamf$ python manage.py shell
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
IPython 4.0.3 -- An enhanced Interactive Python.
In [1]: uuid.UUID(a82857b6-e336-4c6c-8499-47601770b39d)
File "<ipython-input-1-e282858da374>", line 1
uuid.UUID(a82857b6-e336-4c6c-8499-47601770b39d)
^
SyntaxError: invalid syntax
In [2]: uuid.UUID(a0a69415-6627-43db-8c7a-b57d0c4cefe2)
File "<ipython-input-2-befebf1573ba>", line 1
uuid.UUID(a0a69415-6627-43db-8c7a-b57d0c4cefe2)
^
SyntaxError: invalid syntax
In [3]: uuid.UUID(e6e11b06-ea3b-4e98-a31f-9a83447ad884)
File "<ipython-input-3-a59ea095e61a>", line 1
uuid.UUID(e6e11b06-ea3b-4e98-a31f-9a83447ad884)
^
SyntaxError: invalid syntax
In [4]: uuid.UUID(bd116432-65d7-4612-abfe-9a99dcaf5cad)
File "<ipython-input-4-c4a04434aa3c>", line 1
uuid.UUID(bd116432-65d7-4612-abfe-9a99dcaf5cad)
^
SyntaxError: invalid syntax
Now that I have posted this, I notice that even Stack Overflow treats these uuid differently, i.e., the way they are colored, if that's relevant and meaningful here.
But now that we know this, what do we do with / about it?
2nd Update
This morning I thought, what about a uuid that had never been anywhere but in Django? So here's what I did:
In [5]: e.uuid
Out[5]: UUID('61877565-5fe5-4175-9f2b-d24704df0b74')
In [6]: uuid.UUID(61877565-5fe5-4175-9f2b-d24704df0b74)
File "<ipython-input-6-56137f5f4eb6>", line 1
uuid.UUID(61877565-5fe5-4175-9f2b-d24704df0b74)
^
SyntaxError: invalid syntax
In [7]: uuid.UUID('61877565-5fe5-4175-9f2b-d24704df0b74')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-7-3b4d3e5bd156> in <module>()
----> 1 uuid.UUID('61877565-5fe5-4175-9f2b-d24704df0b74')
NameError: name 'uuid' is not defined
This is apparently because I left the quote around the alphanumeric, but why that would generate a uuid not defined error, instead of 'string type' or some such error is beyond me.
In [8]: uuid.UUID(61877565-5fe5-4175-9f2b-d24704df0b74)
File "<ipython-input-8-56137f5f4eb6>", line 1
uuid.UUID(61877565-5fe5-4175-9f2b-d24704df0b74)
^
SyntaxError: invalid syntax
The first time I keyed in the characters by hand. I decided to repeat the test by copying and pasting, but as you can see, it made no difference. If there was something weird about the way only the 5 that the caret is pointing to was generated, we might be on to something, but if so, why do I get the same error in the same place when I typed it in by hand myself?
This no longer seems like a json issue to me, since – as far as I know – json has never touched this uuid, unless it did somehow in the internal workings of Django.
Instead, there is either
1. something wrong with the way uuid.UUID generates uuids, or
2. the way it generates them on my system, (Ubuntu 15.10, Django 1.9.1, Python 2.7.10) or
3. the way it reads and evaluates them when they come back, like in uuid.UUID() or being input outside the internal, automatic uuid generation process.
But that also means people using uuid.UUID() to generate uuids will never know there is an issue unless they do what I did, which is try to bring them in from outside. I remember reading somewhere that all uuids are supposed to be compatible. So, unless someone here has a better insight, I think we might be up for a bug report. But is it a Python bug, a Django bug, or both?
Your syntax is wrong:
uuid.UUID('61877565-5fe5-4175-9f2b-d24704df0b74') # note the quotes