D3 Loading in CSV file then using only specific columns - csv

I've had a hard time getting two columns from a CSV file with which I am planning on building a basic bar chart. I was planning on getting 2 arrays (one for each column) within one array that I would use as such to build a bar chart. Just getting started with D3, as you can tell.
Currently loading the data in gets me an array of objects, which is then a mess to get two columns of key-value pairs. I'm not sure my thinking is correct...
I see this similar question:
d3 - load two specific columns from csv file
But how would I use selections and enter() to accomplish my goal?

You can't load just 2 columns of a bigger CSV, but you can load the whole thing and extract the columns you want.
Say your csv is like this:
col1,col2,col3,col4
aaa1,aaa2,aaa3,aaa4
bbb1,bbb2,bbb3,bbb4
ccc1,ccc2,ccc3,ccc4
And you load it with
d3.csv('my.csv', function(err, data) {
console.log(data)
/*
output:
[
{ col1:'aaa1', col2:'aaa2', col3:'aaa3', col4:'aaa4' },
{ col1:'bbb1', col2:'bbb2', col3:'bbb3', col4:'bbb4' },
{ col1:'ccc1', col2:'ccc2', col3:'ccc3', col4:'ccc4' }
]
*/
})
If you only want col2 and col3 (and you don't want to simply leave the other columns' data in there, which shouldn't be an issue anyway), you can do this:
var cols2and3 = data.map(function(d) {
return {
col2: d.col2,
col3: d.col3
}
});
console.log(cols2and3)
/*
output:
[
{ col2:'aaa2', col3:'aaa3' },
{ col2:'bbb2', col3:'bbb3' },
{ col2:'ccc2', col3:'ccc3' }
]
*/
I.e. the above code produced a new array of objects with only two props per object.
If you want just an array of values per column — not objects with both columns' values — you can:
var col2data = data.map(function(d) { return d.col2 });
var col3data = data.map(function(d) { return d.col3 });
console.log(col2data) // outputs: ['aaa2', 'bbb2', 'ccc2']
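To get the "two arrays within one array" layout mentioned in the question, the per-column extraction can be wrapped in a small helper. This is only a minimal sketch: the rows are hard-coded in the shape shown in the output above rather than loaded from a file, and the helper name pluck is hypothetical, not part of D3.

```javascript
// Rows in the shape shown above: one object per CSV row, all values strings.
var data = [
  { col1: 'aaa1', col2: 'aaa2', col3: 'aaa3', col4: 'aaa4' },
  { col1: 'bbb1', col2: 'bbb2', col3: 'bbb3', col4: 'bbb4' },
  { col1: 'ccc1', col2: 'ccc2', col3: 'ccc3', col4: 'ccc4' }
];

// Hypothetical helper: collect the values of one column into an array.
function pluck(rows, key) {
  return rows.map(function (d) { return d[key]; });
}

// One outer array holding one inner array per wanted column.
var columns = ['col2', 'col3'].map(function (key) {
  return pluck(data, key);
});

console.log(columns);
// [ [ 'aaa2', 'bbb2', 'ccc2' ], [ 'aaa3', 'bbb3', 'ccc3' ] ]
```

For a bar chart built with selections and enter(), it is often simpler to keep the array of objects and pass it to selection.data() directly, since each bar then has both values available on its datum.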


How can I load the following JSON (deeply nested) to a DataFrame?

A sample of the JSON is as shown below:
{
"AN": {
"dates": {
"2020-03-26": {
"delta": {
"confirmed": 1
},
"total": {
"confirmed": 1
}
}
}
},
"KA": {
"dates": {
"2020-03-09": {
"delta": {
"confirmed": 1
},
"total": {
"confirmed": 1
}
},
"2020-03-10": {
"delta": {
"confirmed": 3
},
"total": {
"confirmed": 4
}
}
}
}
}
I would like to load it into a DataFrame, such that the state names (AN, KA) are represented as Row names, and the dates and nested entries are present as Columns.
Any tips to achieve this would be very much appreciated. [I am aware of json_normalize, however I haven't figured out how to work it out yet.]
The output I am expecting, is roughly as shown below:
Can you update your post with the DataFrame you have in mind? It'll be easier to understand what you want.
Also, sometimes it's better to reshape your data if you can't make it work in its current form.
Update:
Following your update here's what you can do.
You need to reshape your data. As I said, when you can't achieve what you want, it is best to look at the problem from another point of view. For instance (judging from the sample you shared), the 'dates' key is redundant: the keys below it are already dates, and there are no other keys at the same level.
One way to achieve what you want is to use a MultiIndex, which helps you group your data the way you want. To use it, you can for instance create all the indices you need and store the associated values in a dictionary.
Example :
If the only index you have is ('2020-03-26', 'delta', 'confirmed') you should have values = {'AN' : [1], 'KA':None}
Then you only need to create your DataFrame and transpose it.
I gave it a quick try and came up with a piece of code that should work. If you're looking for performance I don't think this will do the trick.
import pandas as pd

# d is the sample you shared
index = [[], [], []]
values = {}
# Get all the dates (sorted, without duplicates across countries)
dates = sorted({date for c in d for date in d[c]['dates']})
for country in d:
    # For each country we create an array containing all 6 values for each date
    # (missing values as None)
    values[country] = []
    for date in dates:
        if date in d[country]['dates']:
            for method in ['delta', 'total']:
                # Entries recorded for this country, date and method
                value = d[country]['dates'][date][method]
                for step in ['confirmed', 'recovered', 'tested']:
                    # Incrementing indices
                    index[0].append(date)
                    index[1].append(method)
                    index[2].append(step)
                    # .get() returns None when the step is missing
                    values[country].append(value.get(step))
        # When country does not have a date fill with None
        else:
            for method in ['delta', 'total']:
                for step in ['confirmed', 'recovered', 'tested']:
                    index[0].append(date)
                    index[1].append(method)
                    index[2].append(step)
                    values[country].append(None)
# Removing duplicates introduced because we added n_countries times
# the indices
# 3 is the number of steps
# 2 is the number of methods
number_of_rows = 3 * 2 * len(dates)
index[0] = index[0][:number_of_rows]
index[1] = index[1][:number_of_rows]
index[2] = index[2][:number_of_rows]
df = pd.DataFrame(values, index=index).T
Here is what I have for the transposed data frame of my output :
Hope this can help you
You clearly need to reshape your JSON data before loading it into a DataFrame.
Have you tried loading your JSON as a dict?
dataframe = pd.DataFrame.from_dict(JsonDict, orient="index")
The “orient” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.

Iterating over a list in JSON using TypeScript

I'm having a problem iterating over my json in TypeScript. I'm having trouble with one specific json field, the tribe. For some reason I can't iterate over that one. In the debugger, I'm expecting the Orc to show up but instead I get a 0. Why is this? How do I iterate correctly over my tribe data?
// Maps a profession or tribe group name to a bucket of characters
let professionMap = new Map<string, Character[]>()
let tribeMap = new Map<string, Character[]>()
let herolistJson = require('./data/HeroList.json')
for (let hero of herolistJson){
// Certain characters can have more than one tribe
// !!!!! The trouble begins here, tribe is 0???
for (let tribe in hero.tribe){
let tribeBucket = tribeMap.get(tribe) as Character[]
// If the hero does not already exist in this tribe bucket, add it
if(tribeBucket.find(x => x.name == hero.name) === undefined )
{
tribeBucket.push(new Character(hero.name, hero.tribe, hero.profession, hero.cost))
}
}
}
My json file looks like this
[
{
"name": "Axe",
"tribe": ["Orc"],
"profession": "Warrior",
"cost": 1
},
{
"name": "Enchantress",
"tribe": ["Beast"],
"profession": "Druid",
"cost": 1
}
]
for...in iterates over the keys of an object, not the values. The keys of an array are its indices, which is why you see 0 instead of Orc. If you use for...of instead, you'll use the newer iterator protocol, and an Array's iterator provides values instead of keys.
for (let tribe of /* in */ hero.tribe) {
Note that this won't work in IE 11, but it will work in most other browsers as well as in many JS environments that are ES2015 compatible. kangax/compat-table has a partial list.
Change the "in" to "of" in second loop.
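The difference between the two loop forms can be seen on a single entry from the question's JSON. This is a minimal sketch in plain JavaScript (it should behave the same way when compiled as TypeScript); the variable names keysSeen and valuesSeen are made up for illustration.

```javascript
// One hero in the shape used by HeroList.json in the question.
var hero = { name: 'Axe', tribe: ['Orc'], profession: 'Warrior', cost: 1 };

// for...in walks the array's keys, which are its indices as strings.
var keysSeen = [];
for (var key in hero.tribe) {
  keysSeen.push(key);
}

// for...of walks the array's values.
var valuesSeen = [];
for (var tribe of hero.tribe) {
  valuesSeen.push(tribe);
}

console.log(keysSeen);   // [ '0' ]
console.log(valuesSeen); // [ 'Orc' ]
```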

Get corresponding time and value from json using Django (Python 2)

The following is a sample of the JSON that I'm fetching from the source. I need to get certain values from it such as time, value, value type. The time and value are in different arrays and on different depth.
So, my aim is to get the time and value like for value 1, I should get the timestamp 1515831588000, 2 for 1515838788000 and 3 for 1515845987000.
Please mind that this is just a sample JSON so ignore any mistakes that I may have made while writing this array.
{
"feed":{
"component":[
{
"stream":[
{
"statistic":[
{
"type":"DOUBLE",
"data":[
1,
2,
3
]
}
],
"time":[
1515831588000,
1515838788000,
1515845987000
]
}
]
}
]
},
"message":"",
"success":true
}
Here is a function, that I have written to get the values but the timestamp that I'm getting on the final step is incorrect. Please help to resolve this issue.
get_feed_data() is the function that is giving the above JSON.
# Fetch the component UIDs from the database
component_uids = ComponentDetail.objects.values('uid')
# Make sure we have component UIDs to begin with
if component_uids:
for component_uid in component_uids:
# Fetch stream UIDs for each component from the database
stream_uids = StreamDetail.objects.values('uid').filter(comp_id=ComponentDetail.objects.get(uid=component_uid['uid']))
for stream_uid in stream_uids:
feed_data = json.loads(get_feed_data(component_uid['uid'], stream_uid['uid']))
sd = StreamDetail.objects.get(uid=stream_uid['uid'])
for component in feed_data['feed']['component']:
for stream in component['stream']:
t = {}
d = {}
stats = {}
for time_value in stream['time']:
t = time_value
for stats in stream['statistic']:
for data_value in stats['data']:
print({
'uid': sd, 'value_type': stats['type'], 'timestamp': t, 'value': data_value
})
I finally got it done using lists. I simply append the data to a list and later read it back in a loop by index, fetching the corresponding time and value.

Azure tables unable to store flattened JSON

I am using the npm flat package, and arrays/objects are flattened, but object/array keys are surrounded by '' , like in 'task_status.0.data' using the object below.
These specific fields do not get stored into AzureTables - other fields go through, but these are silently ignored. How would I fix this?
var obj1 = {
"studentId": "abc",
"task_status": [
{
"status":"Current",
"date":516760078
},
{
"status":"Late",
"date":1516414446
}
],
"student_plan": "n"
}
Here is how I am using it - simplified code example: Again, it successfully gets written to the table, but does not write the properties that were flattened (see further below):
var flatten = require('flat')
newObj1 = flatten(obj1);
var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);
In the above example, the flattened object would look like:
var obj1 = {
studentId: "abc",
'task_status.0.status': 'Current',
'task_status.0.date': 516760078,
'task_status.1.status': 'Late',
'task_status.1.date': 1516414446,
student_plan: "n"
}
Then I would add PartitionKey and RowKey.
All the task_status fields would silently fail to be inserted.
EDIT: This does not have anything to do with the actual flattening process - I just checked a perfectly good JSON object, with keys that had 'x.y.z' in it, i.e. AzureTables doesn't seem to accept these column names....which almost completely destroys the value proposition of storing schema-less data, without significant rework.
A '.' in a column name is not supported. You can use a custom delimiter to flatten your objects instead.
For example:
newObj1 = flatten(obj1, {delimiter: '__'});
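To see what property names a custom delimiter produces, here is a minimal self-contained sketch. The function flattenWith is a hypothetical stand-in written for this illustration, not the flat package's implementation, and it ignores edge cases such as empty arrays or empty objects (they are simply dropped).

```javascript
// Hypothetical stand-in for flatten(obj, {delimiter: '__'}) from the flat package.
function flattenWith(obj, delimiter, prefix, out) {
  out = out || {};
  Object.keys(obj).forEach(function (key) {
    var path = prefix ? prefix + delimiter + key : key;
    var value = obj[key];
    if (value !== null && typeof value === 'object') {
      // Recurse into nested objects and arrays (array keys are their indices).
      flattenWith(value, delimiter, path, out);
    } else {
      out[path] = value;
    }
  });
  return out;
}

var obj1 = {
  studentId: 'abc',
  task_status: [
    { status: 'Current', date: 516760078 },
    { status: 'Late', date: 1516414446 }
  ],
  student_plan: 'n'
};

var flatObj = flattenWith(obj1, '__');
console.log(Object.keys(flatObj));
// [ 'studentId', 'task_status__0__status', 'task_status__0__date',
//   'task_status__1__status', 'task_status__1__date', 'student_plan' ]
```

None of the resulting property names contains a '.', so the restriction described above no longer applies to them.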

d3.js: How to import csv data with float and string types

Very new to d3 and javascript, but have done a good four or so hours trying to resolve this issue and haven't found a solution yet.
My .csv dataset I'd like to use has both string and float numbers (see sample set here: http://individual.utoronto.ca/lannajin/d3/pcadummyset.csv), which I'm currently having issues importing.
d3.csv("http://individual.utoronto.ca/lannajin/d3/pcadummyset.csv")
.row(function (d) {
return {
metric: d.metric,
Var: d.Var,
param: d.param,
PC1: +d.PC1, // convert "PC1" column to number
PC2: +d.PC2, // convert "PC2" column to number
Type: d.Type
};
})
.get(function (error, rows) {
console.log(rows);
});
Where I've tried variations (for line 7 & 8) as:
PC1: d.parseFloat("PC1"),
PC1: parseFloat(d.PC1),
PC1: d.parseFloat.PC1,
etc.
The main problem is that my float values (PC1 and PC2) are in the form 1.0e-02 rather than 0.01, which, as far as I can tell, means I can't simply use the "+" operator to convert my data to float. Instead, I have to use the parseFloat function, which I am unfortunately not sure how to use.
When I change the structure to the solution offered by Importing CSV without headers into D3 - data with both numbers and strings,
data = d3.csv.parseRows("http://individual.utoronto.ca/lannajin/d3/pcadummyset.csv").map(function(row) {
return row.map(function(value, index) {
if(index == 2) {
return value;
} else {
return +value;
}
});
});
It's clear that the data is being read, but not plotted since the PC1 and PC2 values are probably being read in as strings. I tried to integrate the parseFloat function, but it still won't plot my values.
Help please!
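As a quick check on the conversion concern above: in JavaScript, both the unary "+" operator and parseFloat accept exponent notation such as 1.0e-02. The sketch below applies a row-conversion function shaped like the .row accessor in the question to hand-written rows; the column names come from the question, but the values are made up and are not taken from the linked file.

```javascript
// Hypothetical rows in the shape a CSV loader yields: every value is a string.
var rawRows = [
  { metric: 'm1', Var: 'v1', param: 'p1', PC1: '1.0e-02', PC2: '-2.5e-01', Type: 'A' },
  { metric: 'm2', Var: 'v2', param: 'p2', PC1: '3.2e+00', PC2: '4.0e-03', Type: 'B' }
];

// Same idea as the .row accessor: keep strings, convert PC1 and PC2 to numbers.
function convertRow(d) {
  return {
    metric: d.metric,
    Var: d.Var,
    param: d.param,
    PC1: +d.PC1,            // unary plus handles exponent notation
    PC2: parseFloat(d.PC2), // parseFloat(string) does too
    Type: d.Type
  };
}

var rows = rawRows.map(convertRow);
console.log(rows[0].PC1);        // 0.01
console.log(rows[0].PC2);        // -0.25
console.log(typeof rows[1].PC1); // 'number'
```

If the values still do not plot after a conversion like this, the cause is probably elsewhere, for example in how the file is loaded or how the scales are set up.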