JSON as input in a mapreduce

I have a JSON file that contains fields such as machine_id, category, and so on. The category field holds machine states such as "alarm" and "failure". Using rmr2, I would simply like to count how many times each machine_id has been reported.
For example, if I have the following:
machine_id, state
48, alarm
39, failure
48, utilization
I would like to see this result:
48,2
39,1
What I did:
I wrote a simple mapreduce job to read the values of the JSON file and used its output as the input to a second mapreduce job. The code is:
mp = function(k, v) {
  machine_id = v$machine_id
  keyval(machine_id, 1)
}
rd = function(k, v) keyval(k, length(v))
mapreduce(
  input = mapreduce(input = '\user\cloudera\sample.json', input.format = "json", map = function(k, v) keyval(k, v)),
  map = mp,
  reduce = rd)
Unfortunately, it returns only the last two values of the JSON file. It seems that it does not read the entire file. I would appreciate any help.
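For reference, the intended result can be reproduced outside Hadoop. The following is a minimal plain-Python sketch of the same map and reduce steps, not rmr2 code; the `records` list and the function names `map_step` and `reduce_step` are made up for illustration and stand in for the parsed JSON file and the two R functions.

```python
from collections import defaultdict

# Hypothetical stand-in for the parsed JSON records (not the real file).
records = [
    {"machine_id": 48, "state": "alarm"},
    {"machine_id": 39, "state": "failure"},
    {"machine_id": 48, "state": "utilization"},
]

def map_step(record):
    # Emit (machine_id, 1) for each record, like keyval(machine_id, 1).
    return record["machine_id"], 1

def reduce_step(pairs):
    # Group the emitted values by key and count them, like length(v).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: len(values) for key, values in grouped.items()}

counts = reduce_step(map_step(r) for r in records)
print(counts)
```

Comparing the real job's output against a small local check like this can help tell whether the problem lies in the counting logic or in how the input file is read.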

Related

How can I Download to CSV in Neo4j

I've been trying to export certain data from my graph, and it returns this error:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected List<Node> but was Node (line 2, column 27 (offset: 77))"CALL apoc.export.csv.data(c,[], "contrib.csv",{})"
This is the query I ran:
MATCH (c:Contrib) WHERE c.nationality CONTAINS "|" CALL apoc.export.csv.data(c,[], "contrib.csv",{}) YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
What went wrong ? :(
Thanks

The signature of apoc.export.csv.data is:
apoc.export.csv.data(nodes,rels,file,config)
It exports the given nodes and relationships as CSV to the provided file.
The nodes argument is a collection of nodes rather than a single node, so collect the matched nodes before the call.
OLD: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
CALL apoc.export.csv.data(c,[], "contrib.csv",{})
NEW: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
WITH collect(c) as contribs
CALL apoc.export.csv.data(contribs, [], "contrib.csv", {})

Select a random item from json imported dictionary

I'm trying to select a random player from an imported json file.
data = json.loads(source)
randPlayer = data['areas']['homes']
randP = random.choice(randPlayer)
print(randP)
Here is the code I tried. Basically, 'homes' holds the player names and I want to select one at random.
Err Output
Source Code Example:
{'Player1': {'lvl': 192}, 'Player2': {'lvl': 182}}

This should work, since list(randPlayer) gives the dictionary's keys as a list:
randP = random.choice(list(randPlayer))
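The reason is that `random.choice` picks an element by integer position, and indexing a dictionary with an integer is treated as a key lookup. A minimal sketch, assuming `homes` is shaped like the sample in the question (the variable names here are illustrative):

```python
import random

# Sample data shaped like the 'homes' dictionary from the question.
homes = {"Player1": {"lvl": 192}, "Player2": {"lvl": 182}}

# list(homes) yields the keys, which random.choice can index by position.
name = random.choice(list(homes))
print(name, homes[name])

# To pick a (name, stats) pair directly, choose from the items instead.
name, stats = random.choice(list(homes.items()))
print(name, stats["lvl"])
```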

This is a good example that I found on another site. I have checked it and it gives the expected result, so I am posting it for anyone else who needs it.
Example code
import random
weight_dict = {
"Kelly": 50,
"Red": 68,
"Jhon": 70,
"Emma" :40
}
key = random.choice(list(weight_dict))
print("Random key-value pair from dictionary is", key, "-", weight_dict[key])
Sample output
Random key-value pair from dictionary is Jhon - 70

How to order a json file list?

My goal is to get specific data from many Khan Academy profiles by using their API.
My problem is that the lists in their JSON responses are not always in the same order; the order can vary from one profile to another.
Here is my code:
from urllib.request import urlopen
import json
# here is a list with two json file links:
profiles=['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959','https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']
# for each json file, take some specific data out
for profile in profiles:
    print(profile)
    with urlopen(profile) as response:
        source = response.read()
    data = json.loads(source)
    votes = data[1]['renderData']['discussionData']['statistics']['votes']
    print(votes)
I expected something like this:
https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
100
https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
41
Instead I got an error:
https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
100
https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
Traceback (most recent call last):
File "bitch.py", line 12, in <module>
votes = data[1]['renderData']['discussionData']['statistics']['votes']
KeyError: 'discussionData'
As we can see:
Link A works fine: https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
Link B does not work: https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959
That is because the list in this JSON file is not in the same order as in link A.
My question is: why? And how can I write my script to take these variations in order into account?
It probably has something to do with .sort(), but I am missing something.
I should also mention that I am using Python 3.7.2.
In link A, the desired data is in the second item of the list.
In link B, the desired data is in the third item of the list.

You could use an if to test whether 'votes' appears in the dictionary at the current index:
import requests
urls = ['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959',
        'https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']
for url in urls:
    r = requests.get(url).json()
    result = [item['renderData']['discussionData']['statistics']['votes'] for item in r if 'votes' in str(item)]
    print(result)

Catching exceptions in Python carries little overhead compared with some other languages, so I would recommend the "easier to ask forgiveness than permission" approach. It should be slightly faster than searching a string for the word votes, because the lookup fails as soon as a key is missing.
import requests
urls = ['https://www.khanacademy.org/api/internal/user/kaid_329989584305166460858587/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959',
        'https://www.khanacademy.org/api/internal/user/kaid_901866966302088310331512/profile/widgets?lang=en&_=190424-1429-bcf153233dc9_1556201931959']
for url in urls:
    response = requests.get(url).json()
    result = []
    for item in response:
        try:
            result.append(item['renderData']['discussionData']['statistics']['votes'])
        except KeyError:
            pass # Could not find votes
    print(result)
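The same idea can be wrapped in a small helper and checked without a network call. This is a sketch only: `find_votes` is a hypothetical name, and the `profile_a` and `profile_b` lists are invented sample data that mimic the nesting described in the question, not real API responses.

```python
def find_votes(widgets):
    """Return the first 'votes' value found in a list of widget dicts, or None."""
    for item in widgets:
        try:
            return item["renderData"]["discussionData"]["statistics"]["votes"]
        except (KeyError, TypeError):
            # This widget has no discussion statistics; try the next one.
            continue
    return None

# Invented sample: the discussion widget sits at a different index each time.
profile_a = [
    {"renderData": {}},
    {"renderData": {"discussionData": {"statistics": {"votes": 100}}}},
]
profile_b = [
    {"renderData": {}},
    {"renderData": {"badgeData": {}}},
    {"renderData": {"discussionData": {"statistics": {"votes": 41}}}},
]

print(find_votes(profile_a))
print(find_votes(profile_b))
```

Returning `None` when nothing matches lets the caller decide how to handle a profile with no discussion statistics, instead of failing on the first unexpected layout.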

Reference data join on stream analytics input not giving output

I'm trying to set up a rule in an Azure Stream Analytics job using reference data and an input stream that comes from an event hub.
This is my reference data JSON packet in blob storage:
{
    "ruleId": 1234,
    "Tag": "TAG1",
    "metricName": "velocity",
    "alertName": "velocity over 500",
    "operator": "AVGGREATEROREQUAL",
    "value": 500
}
And here is the transformation query in the stream analytics job:
WITH
transformedInput AS
(
    SELECT
        metric = GetArrayElement(DeviceInputStream.data,0),
        masterTag = rules.Tag,
        ruleId = rules.ruleId,
        alertName = rules.alertName,
        ruleOperator = rules.operator,
        ruleValue = rules.value
    FROM
        DeviceInputStream
        timestamp by EventProcessedUtcTime
    JOIN
        rules
        ON DeviceInputStream.masterTag = rules.Tag
)
--rule output--
SELECT
    System.Timestamp as time,
    transformedInput.Tag as Tag,
    transformedInput.ruleId as ruleId,
    transformedInput.alertName as alert,
    AVG(metric.velocity) as avg
INTO
    alertruleblob
FROM
    transformedInput
GROUP BY
    transformedInput.masterTag,
    transformedInput.ruleId,
    transformedInput.alertName,
    ruleOperator,
    ruleValue,
    TumblingWindow(second, 6)
HAVING
    ruleOperator = 'AVGGREATEROREQUAL' AND avg(metric.velocity) >= ruleValue
This is not yielding any results. However, when I test with sample input and reference data, I get the expected results; it just doesn't seem to work with the streaming data. My use case: if the average velocity is greater than 500 over a 6-second window, store that result in another blob storage. The velocity has been greater than 500 for some time, but I'm not getting any results.
What am I doing wrong?

The query was working all along. I just had to specify the full path of the reference blob, including the file name, in the reference input's path pattern in Stream Analytics. I had been referencing only the blob container, without the actual file. Once I changed the path pattern to "filename.json", I got the results. It was a stupid mistake.

Reading CSV file and generating Dictionaries

I have a CSV file that looks like this:
Hit39, Hit24, Hit9
Hit8, Hit39, Hit21
Hit46, Hit47, Hit20
Hit24, Hit 53, Hit46
I want to read the file and create a dictionary on a first-come, first-served basis,
like Hit39: 1, Hit24: 2, and so on.
But notice that Hit39 appears again in row 2, column 2. When the reader reaches it, it should not add it to the dictionary again; it should move on with the next number.
Once a value has been visited, it should not be included again if it appears later.

Using Python, and as a best guess until the question is clarified: treat the file as one long list and assign an incrementing number to each unique value.
import csv
from itertools import count
mydict = {}
counter = count(1)
with open('infile.csv') as fin:
    for row in csv.reader(fin, skipinitialspace=True):
        for col in row:
            mydict[col] = mydict.get(col, next(counter))
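One detail worth noting: `next(counter)` is evaluated on every call, even when `col` is already in the dictionary, so the numbering above skips a value each time a repeat is seen. If consecutive numbers are wanted (an assumption, since the question is ambiguous), a sketch that uses `setdefault` with the dictionary's own length avoids the gaps. The inline `sample` text is a stand-in for the file.

```python
import csv
import io

# Inline stand-in for the CSV file from the question.
sample = """Hit39, Hit24, Hit9
Hit8, Hit39, Hit21
Hit46, Hit47, Hit20
"""

mydict = {}
for row in csv.reader(io.StringIO(sample), skipinitialspace=True):
    for col in row:
        # Only new values get a number; repeats leave the dict unchanged.
        mydict.setdefault(col, len(mydict) + 1)

print(mydict)
```

With this variant the repeated Hit39 in the second row keeps its original number, and Hit21 receives the next unused one.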

I assume you are using Python, since it is a popular language that has dictionaries.
import csv
with open("filename.csv", newline="") as f:
    reader = csv.reader(f)
    # Map the first column of each row to its 1-based line number.
    d = dict((line[0], 1 + lineno) for lineno, line in enumerate(reader))
print(d)
