TensorFlow tf.decode_csv() not recognizing end of line character - csv

I am trying to read in a CSV file in TensorFlow.
import tensorflow as tf

record_defaults = [[0.0], [0.0]]
data = tf.decode_csv(r"C:\Users\USER.NAME\Desktop\tmp.txt", record_defaults=record_defaults)
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
sess.run(tf.global_variables_initializer())
print(sess.run(data))
sess.close()
Where tmp.txt is a simple CSV:
1.0,4.0
-.3,1.2
Note that I am running Windows, and Notepad++ shows that my lines end with '\r\n' (CRLF).
I get the following error when running the above code, which suggests to me that TensorFlow isn't recognizing the end-of-line character:
InvalidArgumentError Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in
_do_call(self, fn, *args)
1021 try:
-> 1022 return fn(*args)
1023 except errors.OpError as e:
C:\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1003 feed_dict, fetch_list, target_list,
-> 1004 status, run_metadata)
1005
C:\Anaconda3\Lib\contextlib.py in __exit__(self, type, value, traceback)
65 try:
---> 66 next(self.gen)
67 except StopIteration:
C:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in raise_exception_on_not_ok_status()
465 compat.as_text(pywrap_tensorflow.TF_Message(status)),
--> 466 pywrap_tensorflow.TF_GetCode(status))
467 finally:
InvalidArgumentError: Expect 2 fields but have 1 in record 0
[[Node: DecodeCSV = DecodeCSV[OUT_TYPE=[DT_FLOAT, DT_FLOAT], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](DecodeCSV/records, DecodeCSV/record_defaults_0, DecodeCSV/record_defaults_1)]]
The error persists even when I change the delimiter to a space or tab.
I've searched across Google/StackOverflow but haven't been able to find a similar error. Any help is appreciated. Thank you!

Convert your file to Unix format. I am assuming you are working on Windows. Either way, in Notepad++, change the file's line endings as follows:
From the "Edit" menu, select "EOL Conversion" -> "UNIX/OSX Format".
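If you prefer to do the conversion in code rather than in Notepad++, here is a minimal sketch in Python (using the path from the question; adjust it to your own file):
# Read the file as raw bytes so Python does not translate line endings,
# then replace CRLF with LF and write the bytes back.
path = r"C:\Users\USER.NAME\Desktop\tmp.txt"
with open(path, "rb") as f:
    content = f.read()
with open(path, "wb") as f:
    f.write(content.replace(b"\r\n", b"\n"))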

Related

Error while using cenpy library in python

I am working on a project where I need to use census data for a couple of towns in MA. For that, I am using the cenpy library to pull ACS data, but I got a KeyError. The same error happens even when I try the example code described for Chicago. Here is the example code I use and the error I see:
from cenpy import products

chicago = products.ACS(2017).from_place('Chicago, IL', level='tract',
                                        variables=['B00002*', 'B01002H_001E'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~\anaconda3\envs\oxe\lib\site-packages\cenpy\tiger.py:192, in ESRILayer.query(self, raw, strict, **kwargs)
191 try:
--> 192 features = datadict["features"]
193 except KeyError:
KeyError: 'features'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 chicago = products.ACS(2017).from_place('Chicago, IL', level='tract',
2 variables=['B00002*', 'B01002H_001E'])
File ~\anaconda3\envs\oxe\lib\site-packages\cenpy\products.py:791, in ACS.from_place(self, place, variables, level, return_geometry, place_type, strict_within, return_bounds, replace_missing)
788 variables = self._preprocess_variables(variables)
789 variables.append("GEO_ID")
--> 791 geoms, variables, *rest = super(ACS, self).from_place(
792 place,
793 variables=variables,
794 level=level,
795 return_geometry=return_geometry,
796 place_type=place_type,
797 strict_within=strict_within,
798 return_bounds=return_bounds,
799 replace_missing=replace_missing,
800 )
801 variables["GEOID"] = variables.GEO_ID.str.split("US").apply(lambda x: x[1])
802 return_table = geoms[["GEOID", "geometry"]].merge(
803 variables.drop("GEO_ID", axis=1), how="left", on="GEOID"
804 )
File ~\anaconda3\envs\oxe\lib\site-packages\cenpy\products.py:200, in _Product.from_place(self, place, variables, place_type, level, return_geometry, geometry_precision, strict_within, return_bounds, replace_missing)
197 else:
199 placer = "STATE={} AND PLACE={}".format(placerow.STATEFP, placerow.TARGETFP)
--> 200 env = env_layer.query(where=placer)
202 print(
203 "Matched: {} to {} "
204 "within layer {}".format(
(...)
208 )
209 )
211 geoms, data = self._from_bbox(
212 env.to_crs(epsg=4326).total_bounds,
213 variables=variables,
(...)
219 replace_missing=replace_missing,
220 )
File ~\anaconda3\envs\oxe\lib\site-packages\cenpy\tiger.py:198, in ESRILayer.query(self, raw, strict, **kwargs)
196 if details is []:
197 details = "Mapserver provided no detailed error"
--> 198 raise KeyError(
199 (
200 r"Response from API is malformed. You may have "
201 r"submitted too many queries, formatted the request incorrectly, "
202 r"or experienced significant network connectivity issues."
203 r" Check to make sure that your inputs, like placenames, are spelled"
204 r" correctly, and that your geographies match the level at which you"
205 r" intend to query. The original error from the Census is:\n"
206 r"(API ERROR {}:{}({}))".format(code, msg, details)
207 )
208 )
209 todf = []
210 for i, feature in enumerate(features):
KeyError: 'Response from API is malformed. You may have submitted too many queries, formatted the request incorrectly, or experienced significant network connectivity issues. Check to make sure that your inputs, like placenames, are spelled correctly, and that your geographies match the level at which you intend to query. The original error from the Census is:\\n(API ERROR 400:Unable to complete operation.([]))'

gym package not identifying ten-armed-bandits-v0 env

Environment:
Python: 3.9
OS: Windows 10
When I try to create the ten-armed bandits environment using the following code, the error below is thrown, and I am not sure of the reason.
import gym
import gym_armed_bandits
env = gym.make('ten-armed-bandits-v0')
The error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File D:\00_PythonEnvironments\01_RL\lib\site-packages\gym\envs\registration.py:158, in EnvRegistry.spec(self, path)
157 try:
--> 158 return self.env_specs[id]
159 except KeyError:
160 # Parse the env name and check to see if it matches the non-version
161 # part of a valid env (could also check the exact number here)
KeyError: 'ten-armed-bandits-v0'
During handling of the above exception, another exception occurred:
UnregisteredEnv Traceback (most recent call last)
Input In [6], in <module>
----> 1 env = gym.make('ten-armed-bandits-v0')
File D:\00_PythonEnvironments\01_RL\lib\site-packages\gym\envs\registration.py:235, in make(id, **kwargs)
234 def make(id, **kwargs):
--> 235 return registry.make(id, **kwargs)
File D:\00_PythonEnvironments\01_RL\lib\site-packages\gym\envs\registration.py:128, in EnvRegistry.make(self, path, **kwargs)
126 else:
127 logger.info("Making new env: %s", path)
--> 128 spec = self.spec(path)
129 env = spec.make(**kwargs)
130 return env
File D:\00_PythonEnvironments\01_RL\lib\site-packages\gym\envs\registration.py:203, in EnvRegistry.spec(self, path)
197 raise error.UnregisteredEnv(
198 "Toytext environment {} has been moved out of Gym. Install it via `pip install gym-legacy-toytext` and add `import gym_toytext` before using it.".format(
199 id
200 )
201 )
202 else:
--> 203 raise error.UnregisteredEnv("No registered env with id: {}".format(id))
UnregisteredEnv: No registered env with id: ten-armed-bandits-v0
When I check the environments available, I am able to see it there.
from gym import envs
print(envs.registry.all())
dict_values([EnvSpec(CartPole-v0), EnvSpec(CartPole-v1), EnvSpec(MountainCar-v0), EnvSpec(MountainCarContinuous-v0), EnvSpec(Pendulum-v1), EnvSpec(Acrobot-v1), EnvSpec(LunarLander-v2), EnvSpec(LunarLanderContinuous-v2), EnvSpec(BipedalWalker-v3), EnvSpec(BipedalWalkerHardcore-v3), EnvSpec(CarRacing-v0), EnvSpec(Blackjack-v1), EnvSpec(FrozenLake-v1), EnvSpec(FrozenLake8x8-v1), EnvSpec(CliffWalking-v0), EnvSpec(Taxi-v3), EnvSpec(Reacher-v2), EnvSpec(Pusher-v2), EnvSpec(Thrower-v2), EnvSpec(Striker-v2), EnvSpec(InvertedPendulum-v2), EnvSpec(InvertedDoublePendulum-v2), EnvSpec(HalfCheetah-v2), EnvSpec(HalfCheetah-v3), EnvSpec(Hopper-v2), EnvSpec(Hopper-v3), EnvSpec(Swimmer-v2), EnvSpec(Swimmer-v3), EnvSpec(Walker2d-v2), EnvSpec(Walker2d-v3), EnvSpec(Ant-v2), EnvSpec(Ant-v3), EnvSpec(Humanoid-v2), EnvSpec(Humanoid-v3), EnvSpec(HumanoidStandup-v2), EnvSpec(FetchSlide-v1), EnvSpec(FetchPickAndPlace-v1), EnvSpec(FetchReach-v1), EnvSpec(FetchPush-v1), EnvSpec(HandReach-v0), EnvSpec(HandManipulateBlockRotateZ-v0), EnvSpec(HandManipulateBlockRotateZTouchSensors-v0), EnvSpec(HandManipulateBlockRotateZTouchSensors-v1), EnvSpec(HandManipulateBlockRotateParallel-v0), EnvSpec(HandManipulateBlockRotateParallelTouchSensors-v0), EnvSpec(HandManipulateBlockRotateParallelTouchSensors-v1), EnvSpec(HandManipulateBlockRotateXYZ-v0), EnvSpec(HandManipulateBlockRotateXYZTouchSensors-v0), EnvSpec(HandManipulateBlockRotateXYZTouchSensors-v1), EnvSpec(HandManipulateBlockFull-v0), EnvSpec(HandManipulateBlock-v0), EnvSpec(HandManipulateBlockTouchSensors-v0), EnvSpec(HandManipulateBlockTouchSensors-v1), EnvSpec(HandManipulateEggRotate-v0), EnvSpec(HandManipulateEggRotateTouchSensors-v0), EnvSpec(HandManipulateEggRotateTouchSensors-v1), EnvSpec(HandManipulateEggFull-v0), EnvSpec(HandManipulateEgg-v0), EnvSpec(HandManipulateEggTouchSensors-v0), EnvSpec(HandManipulateEggTouchSensors-v1), EnvSpec(HandManipulatePenRotate-v0), EnvSpec(HandManipulatePenRotateTouchSensors-v0), EnvSpec(HandManipulatePenRotateTouchSensors-v1), EnvSpec(HandManipulatePenFull-v0), EnvSpec(HandManipulatePen-v0), EnvSpec(HandManipulatePenTouchSensors-v0), EnvSpec(HandManipulatePenTouchSensors-v1), EnvSpec(FetchSlideDense-v1), EnvSpec(FetchPickAndPlaceDense-v1), EnvSpec(FetchReachDense-v1), EnvSpec(FetchPushDense-v1), EnvSpec(HandReachDense-v0), EnvSpec(HandManipulateBlockRotateZDense-v0), EnvSpec(HandManipulateBlockRotateZTouchSensorsDense-v0), EnvSpec(HandManipulateBlockRotateZTouchSensorsDense-v1), EnvSpec(HandManipulateBlockRotateParallelDense-v0), EnvSpec(HandManipulateBlockRotateParallelTouchSensorsDense-v0), EnvSpec(HandManipulateBlockRotateParallelTouchSensorsDense-v1), EnvSpec(HandManipulateBlockRotateXYZDense-v0), EnvSpec(HandManipulateBlockRotateXYZTouchSensorsDense-v0), EnvSpec(HandManipulateBlockRotateXYZTouchSensorsDense-v1), EnvSpec(HandManipulateBlockFullDense-v0), EnvSpec(HandManipulateBlockDense-v0), EnvSpec(HandManipulateBlockTouchSensorsDense-v0), EnvSpec(HandManipulateBlockTouchSensorsDense-v1), EnvSpec(HandManipulateEggRotateDense-v0), EnvSpec(HandManipulateEggRotateTouchSensorsDense-v0), EnvSpec(HandManipulateEggRotateTouchSensorsDense-v1), EnvSpec(HandManipulateEggFullDense-v0), EnvSpec(HandManipulateEggDense-v0), EnvSpec(HandManipulateEggTouchSensorsDense-v0), EnvSpec(HandManipulateEggTouchSensorsDense-v1), EnvSpec(HandManipulatePenRotateDense-v0), EnvSpec(HandManipulatePenRotateTouchSensorsDense-v0), EnvSpec(HandManipulatePenRotateTouchSensorsDense-v1), EnvSpec(HandManipulatePenFullDense-v0), EnvSpec(HandManipulatePenDense-v0), 
EnvSpec(HandManipulatePenTouchSensorsDense-v0), EnvSpec(HandManipulatePenTouchSensorsDense-v1), EnvSpec(CubeCrash-v0), EnvSpec(CubeCrashSparse-v0), EnvSpec(CubeCrashScreenBecomesBlack-v0), EnvSpec(MemorizeDigits-v0), EnvSpec(three-armed-bandits-v0), EnvSpec(five-armed-bandits-v0), EnvSpec(ten-armed-bandits-v0), EnvSpec(MultiarmedBandits-v0)])
It could be a problem with your Python version: the k-armed-bandits library was made 4 years ago, when Python 3.9 didn't exist. Besides this, the configuration files in the repo indicate that the Python version is 2.7 (not 3.9).
If you create an environment with Python 2.7 and follow the setup instructions, it works correctly on Windows:
git clone gym_armed_bandits
cd gym_armed_bandits
pip install -e .
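If you use conda, a minimal sketch of creating that Python 2.7 environment first (the environment name is illustrative):
conda create -n bandits27 python=2.7
conda activate bandits27
Then run the clone and install steps above inside it.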

How to use HuggingFace nlp library's GLUE for CoLA

I've been trying to use the HuggingFace nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence. But I'm getting an error and am stuck, unable to proceed.
What I've tried so far (reference and prediction are two text sentences):
!pip install transformers
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
reference="Security has been beefed across the country as a 2 day nation wide curfew came into effect."
prediction="Security has been tightened across the country as a 2-day nationwide curfew came into effect."
import nlp
glue_metric = nlp.load_metric('glue',name="cola")
#Using BertTokenizer
encoded_reference=tokenizer.encode(reference, add_special_tokens=False)
encoded_prediction=tokenizer.encode(prediction, add_special_tokens=False)
glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
The error I'm getting:
ValueError Traceback (most recent call last)
<ipython-input-9-4c3a3ce7b583> in <module>()
----> 1 glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
6 frames
/usr/local/lib/python3.6/dist-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
198 predictions = self.data["predictions"]
199 references = self.data["references"]
--> 200 output = self._compute(predictions=predictions, references=references, **metrics_kwargs)
201 return output
202
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in _compute(self, predictions, references)
101 return pearson_and_spearman(predictions, references)
102 elif self.config_name in ["mrpc", "qqp"]:
--> 103 return acc_and_f1(predictions, references)
104 elif self.config_name in ["sst2", "mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]:
105 return {"accuracy": simple_accuracy(predictions, references)}
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in acc_and_f1(preds, labels)
60 def acc_and_f1(preds, labels):
61 acc = simple_accuracy(preds, labels)
---> 62 f1 = f1_score(y_true=labels, y_pred=preds)
63 return {
64 "accuracy": acc,
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
1097 pos_label=pos_label, average=average,
1098 sample_weight=sample_weight,
-> 1099 zero_division=zero_division)
1100
1101
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight, zero_division)
1224 warn_for=('f-score',),
1225 sample_weight=sample_weight,
-> 1226 zero_division=zero_division)
1227 return f
1228
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
1482 raise ValueError("beta should be >=0 in the F-beta score")
1483 labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1484 pos_label)
1485
1486 # Calculate tp_sum, pred_sum, true_sum ###
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1314 raise ValueError("Target is %s but average='binary'. Please "
1315 "choose another average setting, one of %r."
-> 1316 % (y_type, average_options))
1317 elif pos_label not in (None, 1):
1318 warnings.warn("Note that pos_label (set to %r) is ignored when "
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
However, I'm able to get results (pearson and spearmanr) for 'stsb' with the same approach as given above.
Some help and a workaround for this (cola) is really appreciated. Thank you.
In general, if you are seeing this error with HuggingFace, you are trying to use the f-score as a metric on a text classification problem with more than 2 classes. Pick a different metric, like "accuracy".
For this specific question:
Despite what you entered, it is trying to compute the f-score. From the example notebook, you should set the metric name as:
metric_name = "pearson" if task == "stsb" else "matthews_correlation" if task == "cola" else "accuracy"

Stanford NER Tagger and NLTK - not working [OSError: Java command failed ]

Trying to run the Stanford NER Tagger and NLTK from a Jupyter notebook.
I am continuously getting
OSError: Java command failed
I have already tried the hack at
https://gist.github.com/alvations/e1df0ba227e542955a8a
and thread
Stanford Parser and NLTK
I am using
NLTK==3.3
Ubuntu==16.04LTS
Here is my Python code:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tag.stanford import StanfordNERTagger

Sample_text = "Google, headquartered in Mountain View, unveiled the new Android phone"
sentences = sent_tokenize(Sample_text)
tokenized_sentences = [word_tokenize(sentence) for sentence in sentences]

PATH_TO_GZ = '/home/root/english.all.3class.caseless.distsim.crf.ser.gz'
PATH_TO_JAR = '/home/root/stanford-ner.jar'

sn_3class = StanfordNERTagger(PATH_TO_GZ,
                              path_to_jar=PATH_TO_JAR,
                              encoding='utf-8')

annotations = [sn_3class.tag(sent) for sent in tokenized_sentences]
I got these files using following commands:
wget http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-postagger-full-2015-04-20.zip
wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
# Extract the zip file.
unzip stanford-ner-2015-04-20.zip
unzip stanford-parser-full-2015-04-20.zip
unzip stanford-postagger-full-2015-04-20.zip
I am getting the following error:
CRFClassifier invoked on Thu May 31 15:56:19 IST 2018 with arguments:
-loadClassifier /home/root/english.all.3class.caseless.distsim.crf.ser.gz -textFile /tmp/tmpMDEpL3 -outputFormat slashTags -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerOptions "tokenizeNLs=false" -encoding utf-8
tokenizerFactory=edu.stanford.nlp.process.WhitespaceTokenizer
Unknown property: |tokenizerFactory|
tokenizerOptions="tokenizeNLs=false"
Unknown property: |tokenizerOptions|
loadClassifier=/home/root/english.all.3class.caseless.distsim.crf.ser.gz
encoding=utf-8
Unknown property: |encoding|
textFile=/tmp/tmpMDEpL3
outputFormat=slashTags
Loading classifier from /home/root/english.all.3class.caseless.distsim.crf.ser.gz ... Error deserializing /home/root/english.all.3class.caseless.distsim.crf.ser.gz
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ledu.stanford.nlp.util.Index;
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1380)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1331)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2315)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to [Ledu.stanford.nlp.util.Index;
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2164)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1249)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1366)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1377)
... 2 more
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-15-5621d0f8177d> in <module>()
----> 1 ne_annot_sent_3c = [sn_3class.tag(sent) for sent in tokenized_sentences]
/home/root1/.virtualenv/demos/local/lib/python2.7/site-packages/nltk/tag/stanford.pyc in tag(self, tokens)
79 def tag(self, tokens):
80 # This function should return list of tuple rather than list of list
---> 81 return sum(self.tag_sents([tokens]), [])
82
83 def tag_sents(self, sentences):
/home/root1/.virtualenv/demos/local/lib/python2.7/site-packages/nltk/tag/stanford.pyc in tag_sents(self, sentences)
102 # Run the tagger and get the output
103 stanpos_output, _stderr = java(cmd, classpath=self._stanford_jar,
--> 104 stdout=PIPE, stderr=PIPE)
105 stanpos_output = stanpos_output.decode(encoding)
106
/home/root1/.virtualenv/demos/local/lib/python2.7/site-packages/nltk/__init__.pyc in java(cmd, classpath, stdin, stdout, stderr, blocking)
134 if p.returncode != 0:
135 print(_decode_stdoutdata(stderr))
--> 136 raise OSError('Java command failed : ' + str(cmd))
137
138 return (stdout, stderr)
OSError: Java command failed : [u'/usr/bin/java', '-mx1000m', '-cp', '/home/root/stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/home/root/english.all.3class.caseless.distsim.crf.ser.gz', '-textFile', '/tmp/tmpMDEpL3', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf-8']
Download Stanford Named Entity Recognizer version 3.9.1: see the 'Download' section of the Stanford NLP website.
Unzip it and move the two files "stanford-ner.jar" and "english.all.3class.distsim.crf.ser.gz" to your folder.
Open a Jupyter notebook or an IPython prompt in that folder and run the following Python code:
import nltk
from nltk.tag.stanford import StanfordNERTagger

sentence = u"Twenty miles east of Reno, Nev., " \
           u"where packs of wild mustangs roam free through " \
           u"the parched landscape, Tesla Gigafactory 1 " \
           u"sprawls near Interstate 80."

jar = './stanford-ner.jar'
model = './english.all.3class.distsim.crf.ser.gz'

ner_tagger = StanfordNERTagger(model, jar, encoding='utf8')
words = nltk.word_tokenize(sentence)

# Run NER tagger on words
print(ner_tagger.tag(words))
I tested this on NLTK==3.3 and Ubuntu==16.04 LTS.

Error reading a JSON file into Pandas

I am trying to read a JSON file into Pandas. It's a relatively large file (41k records), mostly text.
{"sanders": [{"date": "February 8, 2016 Monday", "source": "Federal News
Service", "subsource": "MSNBC \"MSNBC Live\" Interview with Sen. Bernie
Sanders (I-VT), Democratic", "quotes": ["Well, it's not very progressive to
take millions of dollars from Wall Street as well.", "That's a very good
question, and I wish I could give her a definitive answer. QUOTE SHORTENED FOR
SPACE"]}, {"date": "February 7, 2016 Sunday", "source": "CBS News Transcripts", "subsource": "SHOW: CBS FACE THE NATION 10:30 AM EST", "quotes":
["Well, John -- John, I think that`s a media narrative that goes around and
around and around. I don`t accept that media narrative.", "Well, that`s what
she said about Barack Obama in 2008. "]},
I tried:
import pandas as pd

quotes = pd.read_json("/quotes.json")
I expected it to read in cleanly because it was a file created in Python. However, I got this error:
ValueError Traceback (most recent call last)
<ipython-input-19-c1acfdf0dbc6> in <module>()
----> 1 quotes = pd.read_json("/Users/kate/Documents/99Antennas/Client\
Files/Fusion/data/quotes.json")
/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in
read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates,
keep_default_dates, numpy, precise_float, date_unit)
208 obj = FrameParser(json, orient, dtype, convert_axes,
convert_dates,
209 keep_default_dates, numpy, precise_float,
--> 210 date_unit).parse()
211
212 if typ == 'series' or obj is None:
/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
276
277 else:
--> 278 self._parse_no_numpy()
279
280 if self.obj is None:
/Users/kate/venv/lib/python2.7/site-packages/pandas/io/json.pyc in _
parse_no_numpy(self)
493 if orient == "columns":
494 self.obj = DataFrame(
--> 495 loads(json, precise_float=self.precise_float),
dtype=None)
496 elif orient == "split":
497 decoded = dict((str(k), v)
ValueError: Expected object or value
After reading the documentation and stackoverflow, I also tried adding convert_dates=False to the parameters, but that did not correct the problem. I would welcome suggestions as to how to handle this error.
Try removing the forward slash in the filename. If you run this Python code from the same directory where the file is sitting, it should work.
quotes = pd.read_json("quotes.json")
SPKoder mentioned the forward slash. I was looking for an answer when I realized I hadn't added a / when combining the filename and path (i.e. c:/path/herefile.json instead of c:/path/here/file.json). Anyway, the error I received was ...
ValueError: Expected object or value
Not a very intuitive error message, but that is what causes it.
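To avoid that mistake altogether, a minimal sketch using os.path.join (the directory and filename here are illustrative, not the actual paths from the answers above):
import os
import pandas as pd

# Joining the directory and filename explicitly guarantees the
# separator is present, preventing paths like c:/path/herefile.json.
data_dir = "c:/path/here"
quotes = pd.read_json(os.path.join(data_dir, "file.json"))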