Find Dict Values from csv.DictReader - csv

I'm trying to take a csv file and turn it into a dictionary, via csv.DictReader. After doing this, I want to modify one of the columns of the dictionary, and then write the data into a tsv file. I'm dealing with words and word frequencies in a text.
I've tried using the dict.values() method to obtain the dictionary values, but I get an error message saying "AttributeError: DictReader instance has no attribute 'values'"
Below is my code:
#calculate frequencies of each word in Jane Austen's "Pride and Prejudice"
import csv
#open file with words and counts for the book, and turn into dictionary
fob = open("P&P.csv", "r")
words = csv.DictReader(fob)
dict = words
#open a file to write the words and frequencies to
fob = open("AustenWords.tsv", "w")
#set total word count
wordcount = 120697
for row in words:
    values = dict.values()
    print values
Basically, I have the total count of each word in the text (e.g. "a","1937"), and I want to find the percentage of the total word count that each word accounts for (so for "a", the percentage would be 1937/120697). Right now my code doesn't include that calculation, but once I obtain the values of each row, I'm hoping to write a row to the new file with the word and the calculated percentage. If anyone has a better way (or any way!) to do this, I would greatly appreciate any input.
Thanks

To answer the basic question - "why am I getting this error" - when you call csv.DictReader(), the return value is an iterator, not a dictionary.
Each ROW in the iterator is a dictionary, which you can then use in your script:
for row in words:
    values = row.values()
    print values
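To go all the way to the output the question describes, here is a sketch (written for Python 3, with two in-memory sample rows standing in for P&P.csv; the "word" and "count" column headers are assumptions, so use whatever header row your file actually has):

```python
import csv
import io

WORDCOUNT = 120697  # total word count from the question

# Sample data standing in for P&P.csv; headers are assumed.
sample = "word,count\na,1937\nthe,4331\n"

reader = csv.DictReader(io.StringIO(sample))
out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
for row in reader:                       # each row is a plain dict
    pct = int(row["count"]) / WORDCOUNT  # fraction of the total count
    writer.writerow([row["word"], pct])

tsv = out.getvalue()
```

Swapping the StringIO objects for open("P&P.csv", newline="") and open("AustenWords.tsv", "w", newline="") gives the file-based version.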

Thank goodness for Matt Dunnam's answer (I'd reply to it but I don't see how to). csv.DictReader objects are, quite counter-intuitively, NOT dictionary objects (although I'm beginning to see some usefulness in why not). As he says, a csv.DictReader object is an iterator (at my intro level of Python, I think of this as something like a list). Each entry in that object (which is not a dictionary) is a dictionary.
So, csv.DictReader returns something like a list of dictionaries, which is not the same as returning one dictionary object, despite the name.
What is nice, so far, is that csv.DictReader did preserve my key values in the first row, and placed them correctly in each of the many dictionary objects that are a part of the iterable object it actually returned (again, it does not return a dictionary object!).
I've wasted about an hour banging my head on this; the documentation is not clear enough, although now that I understand what type of object csv.DictReader returns, it reads a lot more clearly. I think the documentation says something like "returns an iterable object", but if you assume it returns a dictionary, and you don't know whether dictionaries are iterable, it's easy to read that as "returns a dictionary object".
The documentation should say "This does not return a dictionary object, but instead returns an iterable object containing a dictionary object for each entry" or some such thing. As a Python newbie who hasn't coded in 20 years, I keep running into documentation that is written by and for experts and is too dense for beginners.
I'm glad it's there and that people have given their time to it, but it could be made easier for beginners while not reducing its worth to expert pythonistas.

Related

Way to extract columns from a CSV and place them into a dictionary

So basically I'm at a wall with an assignment and it's beginning to really frustrate me. Essentially I have a CSV file, and my goal is to count the number of times each string appears. Column 1 has a string, and column 2 has an integer connected to it. I ultimately need this formatted into a dictionary. Where I am stuck is: how do I do this without using imported libraries? I am only allowed to iterate through the file using for loops. Would my best bet be indexing each line, creating a string from it, and counting how many times that string appears? Any insight would be appreciated.
If you don't want to use any library (and assuming you are using Python) you can use a dict comprehension, like this:
with open("data.csv") as file:
    csv_as_dict = {line.strip().split(",")[0]: line.strip().split(",")[1]
                   for line in file}
(Indexing a line directly, as in line[0], would give you single characters; each line has to be split on the comma first.)
Note: The question is possibly a duplicate of Creating a dictionary from a csv file?.
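For the counting/summing part of the question, a minimal sketch with no imports at all (the sample lines stand in for the file's contents, and the "name,number" layout is assumed from the question):

```python
# Sum the integer in column 2 for each repeated string in column 1,
# using only a plain dict and a for loop (no imported libraries).
lines = ["a,1", "b,2", "a,3"]  # stand-in for the lines of the CSV file

counts = {}
for line in lines:
    name, value = line.strip().split(",")
    counts[name] = counts.get(name, 0) + int(value)
```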

Using lapply or for loop on JSON parsed text to calculate mean

I have a json file that has a multi-layered list (already parsed text). Buried within the list, there is a layer that includes several calculations that I need to average. I have code to do this for each line individually, but that is not very time efficient.
mean(json_usage$usage_history[[1]]$used[[1]]$lift)
This returns an average for the numbers in the lift layer of the list for the 1st row. As mentioned, this isn't time efficient when you have a dataset with multiple rows. Unfortunately, I haven't had much success in using either a loop or lapply to do this on the entire dataset.
This is what happens when I try the for loop:
for(i in json_usage$usage_history[[i]]$used[[1]]$lift){
    json_usage$mean_lift <- mean(json_usage$usage_history[[i]]$used[[1]]$lift)
}
Error in json_usage$affinity_usage_history[[i]] :
subscript out of bounds
This is what happens when I try lapply:
mean_lift <- lapply(lift_list, mean(lift_list$used$lift))
Error in match.fun(FUN) :
'mean(lift_list$used$lift)' is not a function, character or symbol
In addition: Warning message:
In mean.default(lift_list$used$lift) :
argument is not numeric or logical: returning NA
I am new to R, so I know I am likely doing it wrong, but I haven't found any examples of what I'm trying to do. I'm running out of ideas and growing increasingly frustrated. Please help!
Thank you!
The jsonlite package has a very useful function called flatten that you can use to convert the nested lists that commonly appear when parsing JSON data to a more usable dataframe. That should make it simpler to do the calculations you need.
Documentation is here: https://cran.r-project.org/web/packages/jsonlite/jsonlite.pdf
For an answer to a vaguely similar question I asked (though my issue was with NA data within JSON results), see here: Converting nested list with missing values to data frame in R

Why does this Dict contain so many #undefs? How to ignore them?

Using Julia, I am trying to read and interpret JSON data, but I get many #undefs. How to obtain an array which excludes the undefs?
using JSON
source = "http://api.herostats.io/heroes/1"
download(source, "1.json")
hdict = JSON.parsefile("1.json")
#Why does hdict have so many #undefs?
hdict.vals
hdict.keys
#And how to remove them?
Julia sometimes lets you do some silly things if you're not careful. In this case, you're viewing the internals of the dictionary (hash map) by accessing hdict.keys and hdict.vals, and accessing the underlying arrays that hold the items.
Try:
values(hdict)
keys(hdict)

Parsing CSV of files paths line-by-line [Logic Request]

I have a tricky data set to parse through and have not been able to formulate a processing method for it; several failed ideas have just left me more confused...
Sample data in CSV format - before processing
C:\User1\Videos\videos
10
C:\User1\Videos\videos
22
C:\User2\Videos\videos
8
C:\User2\Videos\videos
67
C:\User3\Videos\videos
18
C:\User3\Videos\videos
12
C:\User4\Videos\videos
90
I'm trying to combine the lengths of the video files in each user's video directory and output a list of each user and the total runtime of all their files.
Result - after processing
C:\User1\Videos\videos
32
C:\User2\Videos\videos
75
C:\User3\Videos\videos
30
C:\User4\Videos\videos
90
I'm looking for pseudocode or any advice really as to how I can achieve this result. I have been unsuccessful in trying to use nested loops and am having a hard time conceptualizing other solutions. I am writing this in VBScript for convenience with file processing in Windows.
Thanks so much for the help in advance, I appreciate any advice you can offer.
First, this is a line delimited format with two lines per record.
1: directory
2: video length
Second, you need only a single loop to read each line of the file and process the data.
Steps
1. Dim a dic variable and Set dic = CreateObject("Scripting.Dictionary").
2. Dim variables for the file path, the user key, and the length value.
3. Loop, reading lines from the file.
4. Inside the loop, read the first line of each record and identify the user. VBScript can split strings: if the idea is to aggregate all lengths under User1 no matter what the remaining subfolders are, split the path and use the first element as the user key (you can check that the second element is Videos to filter, or use more elements as the key); or, as in your example result, use the full path string as the key for exact matching. Store the user key in a local variable.
5. Inside the loop, read the second line, parse the length from it, and store it in a local variable.
6. Check whether the key already exists in the dictionary. If so, get the value for the key, add the new length to it, and store the sum back: dic.Item(userkey) = sum. If not, just add the pair: dic.Item(userkey) = length.
7. Repeat from step 4 until the end of the file.
8. List the items from the dictionary by getting its keys and printing each key with its value.
The value stored in the dictionary could be an object if you need to track more information. Don't forget error handling.
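The same steps, sketched in Python rather than VBScript just to illustrate the dictionary-aggregation logic (the sample records are taken from the question's input):

```python
# Two lines per record: a path, then a length on the next line.
lines = [
    r"C:\User1\Videos\videos", "10",
    r"C:\User1\Videos\videos", "22",
    r"C:\User2\Videos\videos", "8",
    r"C:\User2\Videos\videos", "67",
]

totals = {}
# Pair each path (even lines) with the length that follows it (odd lines),
# and accumulate the sum per path key.
for path, length in zip(lines[0::2], lines[1::2]):
    totals[path] = totals.get(path, 0) + int(length)
```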

printing JSON values in Python

I've searched a ton of articles here and elsewhere via Google, plus read a few related docs over at docs.python.org, and I'm still stuck.
I am getting a response from an API like below:
{"status":"active","perms":1,"data":{"40332895":{"user_values":{"delta":-203,"value":53.32,"order":42509}}}}
I have no problem printing 'status' or 'data'. However, all I can grab are the names of the user_values keys, not the data inside them.
Been at it for way too long and was hoping someone could point me in the right direction. I'm fairly new to Python, so if I need to change how I am doing this because it is bad practice, or there is an easier way to get the results I am looking for, please let me know.
Code:
import json
import urllib2
url = urllib2.urlopen('http://URLofAPI.com')
json_url = json.load(url)
api_result = json_url
for doc in api_result['data']['40332895']['user_values']:
    print doc
outputs:
delta
value
order
what I really want to get is the value of them (i.e.: '-203', '53.32', '42509').
I am basically trying to save that data into a list/dict (individually or separately), then print it with other data. I have tried all sorts of things and cannot manage it. I am sure it's probably something super easy that I'm missing, but it's driving me nuts. :)
Also, I was really expecting the below to give me '42509', but I get an error:
for doc in api_result['data']['40332895']['user_values']['order']
Thanks in advance!
You're asking for the keys of the user_values dictionary, and getting them. Try this:
uv = api_result['data']['40332895']['user_values']
for doc in uv:
    print uv[doc]
In your example api_result['data']['40332895']['user_values'] is a dictionary.
If you iterate over a dictionary you will get the keys. This is the case in your original example and in mgkrebbs' answer.
However if you iterate over the .iteritems() (or .items()) of the dictionary you get the (key, value) pairs in a tuple:
uv = api_result['data']['40332895']['user_values']
for key, value in uv.iteritems():
    print key, value
If you only need the values, you iterate over .itervalues()
uv = api_result['data']['40332895']['user_values']
for value in uv.itervalues():
    print value
Or if you only need the values as a list:
my_list = api_result['data']['40332895']['user_values'].values()
The difference between .itervalues() and .values() is that the former gives you an iterable (an object which returns one value at a time, but does not create the structure in memory), while the latter gives you a list.
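The snippets above are Python 2. In Python 3, iteritems() and itervalues() are gone; items() and values() return lightweight view objects, which you iterate the same way (the payload below is the one from the question):

```python
import json

payload = ('{"status":"active","perms":1,"data":{"40332895":'
           '{"user_values":{"delta":-203,"value":53.32,"order":42509}}}}')
api_result = json.loads(payload)

uv = api_result['data']['40332895']['user_values']
for key, value in uv.items():   # view of (key, value) pairs
    print(key, value)

vals = list(uv.values())        # materialize the values as a list
```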