Sort JSON by values in MATLAB - json

I would like to sort the values in my JSON file by "createdAt" and use them in the plot function. This column stores date values, so I've converted it to datetime. I've applied the sort function, but when I look at the output, the sort does not seem to have been applied.
data = loadjson('C:/data/default.json');
count_data = sum(cellfun(@(x) numel(x),data.Location)); %returns 21
for i=1:count_data
createdAt= cellfun( @(cellElem) cellElem.createdAt, data.Location ,'UniformOutput',false);
createdAtDate= datetime(createdAt(i),'InputFormat','dd-MM-yyyy HH:mm:ss','Format', 'dd-MM-yyyy n HH:mm:ss');
[~,X] = sort(createdAtDate,'descend');
out=data(X);
end
for i=1:count_data
x = cellfun( @(cellElem) cellElem.createdAt, out.Location,'UniformOutput',false);
disp(x);
end
My JSON file:
"Location": [
{
"id": "0b5965e5-c509-4522-a525-8ef5a49dadaf",
"measureId": "5a6e9b79-dbb1-4482-acc1-d538f68ef01f",
"locationX": 0.9039769252518151,
"locationY": 0.2640594070404616,
"createdAt": "06-01-2021 19:38:44"
},
{
"id": "18714a2f-a8b3-4dc6-8a5b-114497fa9671",
"measureId": "671f52bc-a066-494a-9dce-6e9ccfac6c1d",
"locationX": 1.5592001730078755,
"locationY": 0.5207689756815629,
"createdAt": "06-01-2021 19:35:24"
},
Thanks in advance.

You need to extract all of the data you need first, and then sort it, for example:
x = cellfun( @(cellElem) cellElem.locationX, data.Location );
y = cellfun( @(cellElem) cellElem.locationY, data.Location );
% Get date strings
d = cellfun( @(cellElem) cellElem.createdAt, data.Location, 'UniformOutput', false)
% Convert to datetime
d = datetime( d, 'InputFormat', 'dd-MM-yyyy HH:mm:ss' );
% Get the sort order
[~,idx] = sort( d );
% Sort other arrays
x = x(idx);
y = y(idx);
Another option would be to use tables
x = cellfun( @(cellElem) cellElem.locationX, data.Location );
y = cellfun( @(cellElem) cellElem.locationY, data.Location );
% Get dates
d = cellfun( @(cellElem) cellElem.createdAt, data.Location, 'UniformOutput', false)
d = datetime( d, 'InputFormat', 'dd-MM-yyyy HH:mm:ss' );
% Create table
t = table( x(:), y(:), d(:), 'VariableNames', {'locationX','locationY','createdAt'} );
% Sortrows
t = sortrows( t, 'createdAt' );
You have to use a table rather than a matrix here (although sortrows can accept either) because of the mixed data types across the columns.
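The same extract-convert-sort idea can be sketched outside MATLAB. The Python snippet below is only an illustration: the two records are copied from the question's JSON excerpt, and `parse_created_at` is a hypothetical helper name, not part of any library.

```python
from datetime import datetime

# Records mirroring the "Location" entries shown in the question.
locations = [
    {"locationX": 0.9039769252518151, "locationY": 0.2640594070404616,
     "createdAt": "06-01-2021 19:38:44"},
    {"locationX": 1.5592001730078755, "locationY": 0.5207689756815629,
     "createdAt": "06-01-2021 19:35:24"},
]


def parse_created_at(record):
    # Day-month-year layout, matching 'dd-MM-yyyy HH:mm:ss' in the MATLAB code.
    return datetime.strptime(record["createdAt"], "%d-%m-%Y %H:%M:%S")


# Sort whole records by the parsed date, then pull out the columns to plot.
ordered = sorted(locations, key=parse_created_at)
x = [r["locationX"] for r in ordered]
y = [r["locationY"] for r in ordered]
print([r["createdAt"] for r in ordered])
```

Sorting the records as a unit keeps x, y and the dates aligned, which is what the index vector `idx` achieves in the MATLAB version.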


How to calculate a probability vector and an observation count vector for a range of bins?

I want to test the hypothesis that a sample of 30 observations follows a Poisson distribution.
#GNU Octave
X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; #30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; #each bin can be single value or multiple values
I am trying to use Pearson's chi-square statistic here and have coded the function below. I want one vector to hold the Poisson probability for each bin and another to hold the observation count for each bin. I feel the loop is rather redundant and ugly. Could you please let me know how I can refactor the function to remove the loop and make the whole calculation cleaner and more vectorized?
function result= poissonGoodnessOfFit(bins, observed)
assert(iscell(bins), "bins should be a cell array");
assert(all(cellfun("ismatrix", bins)) == 1, "bin entries either scalars or matrices");
assert(ismatrix(observed) && rows(observed) == 1, "observed data should be a 1xn matrix");
lambda_head = mean(observed); #poisson lambda parameter estimate
k = length(bins); #number of bin groups
n = length(observed); #number of observations
poisson_probability = []; #variable for poisson probability for each bin
observations = []; #variable for observation counts for each bin
for i=1:k
if isscalar(bins{1,i}) #this bin contains a single value
poisson_probability(1,i) = poisspdf(bins{1, i}, lambda_head);
observations(1, i) = histc(observed, bins{1, i});
else #this bin contains a range of values
inner_bins = bins{1, i}; #retrieve the range
inner_bins_k = length(inner_bins); #number of values inside
inner_poisson_probability = []; #variable to store individual probability of each value inside this bin
inner_observations = []; #variable to store observation counts of each value inside this bin
for j=1:inner_bins_k
inner_poisson_probability(1,j) = poisspdf(inner_bins(1, j), lambda_head);
inner_observations(1, j) = histc(observed, inner_bins(1, j));
endfor
poisson_probability(1, i) = sum(inner_poisson_probability, 2); #assign over the sum of all inner probabilities
observations(1, i) = sum(inner_observations, 2); #assign over the sum of all inner observation counts
endif
endfor
expected = n .* poisson_probability; #expected observations if indeed poisson using lambda_head
chisq = sum((observations - expected).^2 ./ expected, 2); #Pearson Chi-Square statistics
pvalue = 1 - chi2cdf(chisq, k-1-1);
result = struct("actual", observations, "expected", expected, "chi2", chisq, "pvalue", pvalue);
return;
endfunction
There are a couple of things worth noting in the code.
First, the 'scalar' case in your if block is actually identical to your 'range' case, since a scalar is simply a range of 1 element. So no special treatment is needed for it.
Second, you don't need to create such explicit subranges; your bin groups seem to be amenable to being used as indices into a larger result (as long as you add 1 to convert from 0-indexed to 1-indexed indices).
Therefore my approach would be to calculate the expected and observed numbers over the entire domain of interest (as inferred from your bin groups), and then use the bin groups themselves as 1-indices to obtain the desired subgroups, summing accordingly.
Here's some example code, written in the Octave/MATLAB-compatible subset of both languages:
function Result = poissonGoodnessOfFit( BinGroups, Observations )
% POISSONGOODNESSOFFIT( BinGroups, Observations) calculates the [... etc, etc.]
pkg load statistics; % only needed in octave; for matlab buy statistics toolbox.
assert( iscell( BinGroups ), 'Bins should be a cell array' );
assert( all( cellfun( @ismatrix, BinGroups ) ) == 1, 'Bin entries either scalars or matrices' );
assert( ismatrix( Observations ) && size( Observations, 1 ) == 1, 'Observed data should be a 1xn matrix' );
% Define helpful variables
RangeMin = min( cellfun( @min, BinGroups ) );
RangeMax = max( cellfun( @max, BinGroups ) );
Domain = RangeMin : RangeMax;
LambdaEstimate = mean( Observations );
NBinGroups = length( BinGroups );
NObservations = length( Observations );
% Get expected and observed numbers per 'bin' (i.e. discrete value) over the *entire* domain.
Expected_Domain = NObservations * poisspdf( Domain, LambdaEstimate );
Observed_Domain = histc( Observations, Domain );
% Apply BinGroup values as indices
Expected_byBinGroup = cellfun( @(c) sum( Expected_Domain(c+1) ), BinGroups );
Observed_byBinGroup = cellfun( @(c) sum( Observed_Domain(c+1) ), BinGroups );
% Perform a Chi-Square test on the Bin-wise Expected and Observed outputs
O = Observed_byBinGroup; E = Expected_byBinGroup ; df = NBinGroups - 1 - 1;
ChiSquareTestStatistic = sum( (O - E) .^ 2 ./ E );
PValue = 1 - chi2cdf( ChiSquareTestStatistic, df );
Result = struct( 'actual', O, 'expected', E, 'chi2', ChiSquareTestStatistic, 'pvalue', PValue );
end
Running with your example gives:
X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; % 30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; % each bin can be single value or multiple values
Result = poissonGoodnessOfFit( bins, X )
% Result =
% scalar structure containing the fields:
% actual = 6 5 8 6 5
% expected = 1.2643 4.0037 13.0304 8.6522 3.0493
% chi2 = 21.989
% pvalue = 0.000065574
A general comment about the code: it is always preferable to write self-explanatory code, rather than code that does not make sense by itself in the absence of a comment. Comments should generally be used to explain the 'why' rather than the 'how'.
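For readers who want to check the binning step independently, here is a rough Python sketch of the same idea: compute the expected and observed counts per discrete value over the whole domain, then sum them per bin group. It uses only the standard library, so the p-value is left out (there is no chi-square CDF in the standard library); `poisson_pmf` and `binned_chi_square` are hypothetical names chosen for this illustration.

```python
import math


def poisson_pmf(k, lam):
    # Poisson probability mass function P(X = k).
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binned_chi_square(bin_groups, observed):
    n = len(observed)
    lam = sum(observed) / n  # estimate of the Poisson parameter
    lo = min(min(g) for g in bin_groups)
    hi = max(max(g) for g in bin_groups)
    domain = range(lo, hi + 1)
    # Per-value numbers over the entire domain, keyed by the value itself,
    # so no shift from 0-based to 1-based indices is needed here.
    expected_domain = {k: n * poisson_pmf(k, lam) for k in domain}
    observed_domain = {k: observed.count(k) for k in domain}
    # Sum the per-value numbers within each bin group.
    expected = [sum(expected_domain[k] for k in g) for g in bin_groups]
    actual = [sum(observed_domain[k] for k in g) for g in bin_groups]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
    return actual, expected, chi2


X = [8, 0, 0, 1, 3, 4, 0, 2, 12, 5, 1, 8, 0, 2, 0, 1, 9, 3, 4, 5,
     3, 3, 4, 7, 4, 0, 1, 2, 1, 2]
bins = [[0], [1], [2, 3], [4, 5], list(range(6, 21))]
actual, expected, chi2 = binned_chi_square(bins, X)
print(actual, [round(e, 4) for e in expected], round(chi2, 3))
```

The counts and the statistic should agree with the `actual`, `expected` and `chi2` fields printed by the Octave function above.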

withCount() - get difference of 2 counts

I have a table that has a column is_up_vote = true|false
I am trying to write a query using withCount() that counts the true values and the false values.
select COUNT(is_up_vote) from comment_votes WHERE is_up_vote = true returns 17
select COUNT(is_up_vote) from comment_votes WHERE is_up_vote = false returns 15
However, I couldn't figure out how to get the difference between them, so that the total comes out as 2.
What I tried is:
Model::withCount(['votes' => function($q) {
$q->selectRaw(
'(SUM (COUNT(is_up_vote) WHERE is_up_vote = true) - (COUNT(is_up_vote) WHERE is_up_vote = false) )'
);
}]);
But this returns 17+15 = 32
without SUM(), it's also returning 32.
$q->selectRaw(
'( (COUNT(is_up_vote) WHERE is_up_vote = true) - (COUNT(is_up_vote) WHERE is_up_vote = false) )'
);
What am I doing wrong?
Edit:
If I try one side, it ignores the where and still returns 32, so the where is not getting called (where it does in sql)
return $query->withCount(['votes' => function($q) {
$q->selectRaw('(COUNT(is_up_vote) WHERE is_up_vote = true)');
}]);
The manual SQL query returns 17:
select COUNT(is_up_vote) from comment_votes WHERE is_up_vote = true
Edit 2:
$query->withCount([
'votes as up_votes_count' => function($q) {
$q->where('is_up_vote', true);
},
'votes as down_votes_count' => function($q) {
$q->where('is_up_vote', false);
},
]);
This is the only thing that I could make work, but it needs an extra step to get the total, so I didn't really like this approach. I'm sure someone more proficient with queries can come up with a direct query.
withCount may not be able to handle this case. What about using addSelect to calculate the votes count?
return Model::addSelect(['votes_count' => Vote::selectRaw('( (COUNT(is_up_vote) WHERE is_up_vote = true) - (COUNT(is_up_vote) WHERE is_up_vote = false) )')
->whereColumn('parent_id', 'parents.id')
])->get();
I'm not sure whether the sub-query will work as written, but you get the idea.
Following the suggestion in the comment by @Jefferson Pessanha, just using Model::selectRaw('(abs(2 * sum(is_up_vote) - count(*)))')->get(); would be enough:
create table comment_votes
(
is_up_vote tinyint(1) default 1 null
);
insert into comment_votes values (true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true),(true);
insert into comment_votes values (false),(false),(false),(false),(false),(false),(false),(false),(false),(false),(false),(false),(false),(false),(false);
select COUNT(is_up_vote) from comment_votes WHERE is_up_vote = true;
returns 17
select COUNT(is_up_vote) from comment_votes WHERE is_up_vote = false;
returns 15
select (abs(2 * sum(is_up_vote) - count(*))) from comment_votes;
returns 2
Explanation:
trueValues + falseValues = total
falseValues = total - trueValues
trueValues - falseValues = x
trueValues - (total - trueValues) = x
2 * trueValues - total = x
trueValues = sum(is_up_vote) [only trues stored as 1 are counted]
total = count(*)
The abs function converts the result to a positive number in case it is negative.
Isn't it correct?
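The arithmetic above can be checked quickly with an in-memory SQLite database from Python. This is only a sanity check of the formula under the assumption that true is stored as 1 and false as 0; it uses SQLite rather than MySQL and says nothing about how Eloquent builds the query.

```python
import sqlite3

# In-memory table with 17 up-votes (1) and 15 down-votes (0), as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table comment_votes (is_up_vote integer)")
conn.executemany(
    "insert into comment_votes values (?)",
    [(1,)] * 17 + [(0,)] * 15,
)

# sum(is_up_vote) counts the trues, count(*) is the total.
ups, total, diff = conn.execute(
    "select sum(is_up_vote), count(*), abs(2 * sum(is_up_vote) - count(*)) "
    "from comment_votes"
).fetchone()
conn.close()
print(ups, total, diff)
```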
Try this.
select
sum( if(is_up_vote= true, 1, 0) ) as uptrue,
sum( if(is_up_vote= false, 1, 0) ) as upfalse,
sum( if(is_up_vote= true, 1, -1) ) as updiff
from comment_votes
Edit 1:
Instead of checking if it is true or false, try this.
select
sum( if(is_up_vote, 1, 0) ) as uptrue,
sum( if(is_up_vote, 0, 1) ) as upfalse,
sum( if(is_up_vote, 1, -1) ) as updiff
from comment_votes
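A conditional sum like this keeps the sign of the difference, unlike a version wrapped in abs. The Python/SQLite sketch below illustrates that with made-up data (2 up-votes, 5 down-votes); it uses the standard CASE expression in place of MySQL's if(), which is an adaptation for SQLite and not the query from the answer.

```python
import sqlite3

# Made-up sample: 2 up-votes and 5 down-votes.
conn = sqlite3.connect(":memory:")
conn.execute("create table comment_votes (is_up_vote integer)")
conn.executemany(
    "insert into comment_votes values (?)",
    [(1,)] * 2 + [(0,)] * 5,
)

# CASE WHEN plays the role of if(condition, a, b) here.
uptrue, upfalse, updiff = conn.execute(
    "select sum(case when is_up_vote then 1 else 0 end), "
    "sum(case when is_up_vote then 0 else 1 end), "
    "sum(case when is_up_vote then 1 else -1 end) "
    "from comment_votes"
).fetchone()
conn.close()
print(uptrue, upfalse, updiff)
```

With more down-votes than up-votes, `updiff` comes out negative, which may or may not be what you want for a vote score.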

ValueError: not enough values to unpack (expected 3, got 2) Python

I'm trying to print out a dictionary in the following format. I want to do this because I need to graph some trends in D3 using the JSON format. For this trend, I am counting the number of murders in each state within each decade (1980s to 2010s).
I am able to output the file and everything but since I am trying to create a graph, the format of the data in the JSON file needs to be very specific in terms of labeling each key, value pair in the dictionary in the output.
import pandas as pd

xl = pd.ExcelFile('Wyoming.xlsx')
df = xl.parse('Sheet1')
year = df['Year']
state = df['State']
freq = dict()
for i in range(0, len(df)):
    currYear = year.iloc[i]
    if(currYear >= 1980 and currYear < 1990):
        currDecade = 1980
    elif(currYear >= 1990 and currYear < 2000):
        currDecade = 1990
    elif(currYear >= 2000 and currYear < 2010):
        currDecade = 2000
    elif(currYear >= 2010):
        currDecade = 2010
    currState = state.iloc[i]
    if currDecade in freq:
        if currState in freq[currDecade]:
            freq[currDecade][currState] += 1
        else:
            key = {currState: 1}
            freq[currDecade].update(key)
    else:
        key = {currDecade:{currState: 1}}
        freq.update(key)
#print(freq)
freq1 = [{'Decade': d, 'State': [{'State': s, 'Freq': f}]} for d, s, f in freq.items()]
print(freq1)
I am getting the error "ValueError: not enough values to unpack (expected 3, got 2)"
I expect the output to be as given below.
[{"Decade": "1980", "State": [{"State": "California", "Freq": 29591}, {"State": "Massachusetts", "Freq": 1742}, ...}]
dict.items() yields tuples with only two elements, the key and the value, so a nested dict has to be unpacked in two steps:
freq1 = []
for decade, states in freq.items():
    entry = {'Decade': decade, 'State': []}
    for state, count in states.items():
        entry['State'].append({'State': state, 'Freq': count})
    freq1.append(entry)
print(freq1)
I think the code is more readable this way. However if you still prefer the one-line list comprehension solution, here it is:
freq1 = [{'Decade': d, 'State': [{'State': s, 'Freq': f} for s, f in ss.items()]} for d, ss in freq.items()]
The culprit is for d, s, f in freq.items(), since freq.items() returns an iterable over (key, value) pairs in freq. Since you have nested dicts, try this:
freq1 = [{'Decade': d, 'State': [{'State': s, 'Freq': f} for s, f in sdict.items()]}
for d, sdict in freq.items()
]
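To see the two-level unpacking in isolation, here is a small self-contained run. The counts in `freq` are made up for illustration, and the decade is converted with `str()` only because the expected output in the question shows it as a string.

```python
import json

# Made-up nested counts: decade -> state -> number of records.
freq = {
    1980: {"California": 3, "Wyoming": 1},
    1990: {"Wyoming": 2},
}

# Outer loop unpacks (decade, inner dict); inner loop unpacks (state, count).
freq1 = [
    {
        "Decade": str(decade),
        "State": [{"State": s, "Freq": c} for s, c in states.items()],
    }
    for decade, states in freq.items()
]
print(json.dumps(freq1))
```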

dictionary value is dict but printing as string in json dump

I have a script that is working fine except for this tiny issue. My script is looping over list items and appending a json string over a loop and then doing json dump to file.
My json string:
main_json = {"customer": {"main_address": "","billing_address": "","invoice_reference": "","product": []}}
main loop:
for row in result:
account_id = ACCOUNTID_DATA_CACHE.get(row['customer.main_address.customer_id'])
if account_id is None or account_id != row['customer.main_address.customer_id']:
if main_json:
results.append(main_json)
main_json = {"customer": {"main_address": "","billing_address": "","invoice_reference": "","product": []}}
main_address = {}
billing_address = {}
for key,value in row.items():
if key.startswith('customer.main_address'):
main_address[key.split(".")[2]] = value
if key.startswith('customer.billing_address'):
billing_address[key.split(".")[2]] = value
billing_address_copy = billing_address.copy()
for mkey,mvalue in main_address.items():
for bkey,bvalue in billing_address_copy.items():
if str(bvalue) == str(mvalue):
bvalue = ''
billing_address_copy[bkey] = bvalue
if all(value == '' for value in billing_address_copy.values()) is True:
main_json['customer']['billing_address'] = ''
else:
main_json['customer']['billing_address'] = billing_address
main_json['customer']['main_address'] = main_address
product = parse_products(row)
main_json['customer']['product'].append(product)
...
def parse_products(row):
product = {}
x = {}
for key,value in row.items():
if key.startswith('customer.product'):
product[key.split(".")[2]] = value
if key.startswith('customer.product.custom_attributes'):
x['domain'] = value
print(x)
product[key.split(".")[2]] = x
if key == 'start_date' or 'renewal_date':
value = str(value)
product[key] = value
return product
In the part below, how do I make sure that the value is not a string when dumped?
if key.startswith('customer.product.custom_attributes'):
x['domain'] = value
print(x)
product[key.split(".")[2]] = x
Because in the output I'm getting:
{
"custom_attributes": "{'domain': 'somedomain.com'}",
"description": "some_description",
"discount": "0.00"}
When what I really want is:
{
"custom_attributes": {"domain": "somedomain.com"},
"description": "some_description",
"discount": "0.00"}
EDIT: how i'm dumping:
with open('out.json', 'w') as jsonout:
json.dump(main_json, jsonout, sort_keys=True, indent=4)
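Well, this if is flawed and always true:
if key == 'start_date' or 'renewal_date':
Python evaluates it as (key == 'start_date') or 'renewal_date', and a non-empty string is always truthy, so you are converting every value to str(). Use if key in ('start_date', 'renewal_date'): instead.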
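A short, self-contained demonstration of both effects follows: the always-true condition, and what `json.dumps` does with a dict versus the `str()` of a dict. The helper names `is_date_key_buggy` and `is_date_key_fixed` are hypothetical and exist only for this sketch.

```python
import json


def is_date_key_buggy(key):
    # Parsed as (key == 'start_date') or 'renewal_date'; the right-hand
    # operand is a non-empty string, so the result is always truthy.
    return bool(key == 'start_date' or 'renewal_date')


def is_date_key_fixed(key):
    # Membership test compares the key against both names.
    return key in ('start_date', 'renewal_date')


product = {"custom_attributes": {"domain": "somedomain.com"}}
stringified = {k: str(v) for k, v in product.items()}

print(is_date_key_buggy("description"), is_date_key_fixed("description"))
print(json.dumps(product))      # nested JSON object
print(json.dumps(stringified))  # the dict's repr as a quoted string
```

Once a dict has gone through `str()`, `json.dump` can only see a string, which is why the value shows up quoted in the output file.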

Sort lua table based on nested json value

We have a key-value pair in redis consisting of a key with a JSON object as a value with various information;
"node:service:i-01fe0d69c343734" :
"{\"port\":\"32781\",
\"version\":\"3.0.2\",
\"host-instance-id\":\"i-01fe0d69c2243b366\",
\"last-checkin\":\"1492702508\",
\"addr\":\"10.0.0.0\",
\"host-instance-type\":\"m3.large\"}"
Is it possible to sort the table based on the last-checkin time of the value?
Here is my solution to your problem, using the quicksort algorithm, after a small correction to your input (as I understood it):
-----------------------------------------------------
local json = require("json")
function quicksort(t, sortname, start, endi)
start, endi = start or 1, endi or #t
sortname = sortname or 1
if(endi - start < 1) then return t end
local pivot = start
for i = start + 1, endi do
if t[i][sortname] <= t[pivot][sortname] then
local temp = t[pivot + 1]
t[pivot + 1] = t[pivot]
if(i == pivot + 1) then
t[pivot] = temp
else
t[pivot] = t[i]
t[i] = temp
end
pivot = pivot + 1
end
end
t = quicksort(t, sortname, start, pivot - 1)
return quicksort(t, sortname, pivot + 1, endi)
end
---------------------------------------------------------
-- I manually added the delimiter ","
-- and each "node:service..." name must be different
str = [[
{
"node:service:i-01fe0d69c343731" :
"{\"port\":\"32781\",
\"version\":\"3.0.2\",
\"host-instance-id\":\"i-01fe0d69c2243b366\",
\"last-checkin\":\"1492702506\",
\"addr\":\"10.0.0.0\",
\"host-instance-type\":\"m3.large\"}"
,
"node:service:i-01fe0d69c343732" :
"{\"port\":\"32781\",
\"version\":\"3.0.2\",
\"host-instance-id\":\"i-01fe0d69c2243b366\",
\"last-checkin\":\"1492702508\",
\"addr\":\"10.0.0.0\",
\"host-instance-type\":\"m3.large\"}"
,
"node:service:i-01fe0d69c343733" :
"{\"port\":\"32781\",
\"version\":\"3.0.2\",
\"host-instance-id\":\"i-01fe0d69c2243b366\",
\"last-checkin\":\"1492702507\",
\"addr\":\"10.0.0.0\",
\"host-instance-type\":\"m3.large\"}"
,
"node:service:i-01fe0d69c343734" :
"{\"port\":\"32781\",
\"version\":\"3.0.2\",
\"host-instance-id\":\"i-01fe0d69c2243b366\",
\"last-checkin\":\"1492702501\",
\"addr\":\"10.0.0.0\",
\"host-instance-type\":\"m3.large\"}"
}
]]
-- remove unnecessary \
str = str:gsub('"{','{'):gsub('}"','}'):gsub('\\"','"')
local t_res= json.decode(str)
-- prepare table before sorting
local t_indexed = {}
for k,v in pairs(t_res) do
v["node-service"] = k
t_indexed[#t_indexed+1] = v
end
-- this quicksort implementation only works on indexed (array-like) tables
local t_sort= quicksort(t_indexed, "last-checkin")
for k,v in ipairs(t_sort) do
print( k , v["node-service"] , v["port"], v["version"], v["host-instance-id"], v["last-checkin"] , v["addr"], v["host-instance-type"] )
end
console:
1 node:service:i-01fe0d69c343734 32781 3.0.2 i-01fe0d69c2243b366 1492702501 10.0.0.0 m3.large
2 node:service:i-01fe0d69c343731 32781 3.0.2 i-01fe0d69c2243b366 1492702506 10.0.0.0 m3.large
3 node:service:i-01fe0d69c343733 32781 3.0.2 i-01fe0d69c2243b366 1492702507 10.0.0.0 m3.large
4 node:service:i-01fe0d69c343732 32781 3.0.2 i-01fe0d69c2243b366 1492702508 10.0.0.0 m3.large
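For comparison, the same decode-then-sort idea is sketched below in Python with the built-in sort instead of a hand-written quicksort. The `entries` dict stands in for the key/value pairs read from Redis (no Redis client is used here), and the JSON values are shortened to two fields to keep the example small.

```python
import json

# Stand-in for the Redis key/value pairs; values are JSON strings.
entries = {
    "node:service:i-01fe0d69c343731": '{"port":"32781","last-checkin":"1492702506"}',
    "node:service:i-01fe0d69c343732": '{"port":"32781","last-checkin":"1492702508"}',
    "node:service:i-01fe0d69c343733": '{"port":"32781","last-checkin":"1492702507"}',
    "node:service:i-01fe0d69c343734": '{"port":"32781","last-checkin":"1492702501"}',
}

# Decode each value and keep the original key alongside the decoded fields.
decoded = [{**json.loads(raw), "node-service": key} for key, raw in entries.items()]

# Compare as integers so timestamps of different lengths still order correctly.
by_checkin = sorted(decoded, key=lambda e: int(e["last-checkin"]))
for position, entry in enumerate(by_checkin, start=1):
    print(position, entry["node-service"], entry["last-checkin"])
```

Converting `last-checkin` to a number before comparing avoids relying on string comparison, which only matches numeric order when all timestamps have the same number of digits.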