Customized JSON output in Pig

I need customized JSON output.
(I have two files: a text file and a schema file.)
abc.txt -
100002030,Tom,peter,eng,block 3, lane 5,california,10021
100003031,Tom,john,doc,block 2, lane 2,california,10021
100004032,Tom,jim,eng,block 1, lane 1,california,10021
100005033,Tom,trek,doc,block 2, lane 2,california,10021
100006034,Tom,peter,eng,block 6, lane 6,california,10021
abc_schema.txt (field name and position)
rollno 1
firstname 2
lastname 3
qualification 4
address1 5
address2 6
city 7
Zipcode 8
Rules:
Keep only the first 6 characters of rollno.
Club together address1 | address2 | city.
Label the combined field Address.
Expected output:
{"rollno":"100002","firstname":"Tom","lastname":"peter","qualification":"eng","Address":"block 3 lane 5 california","zipcode":"10021"}
{"rollno":"100003","firstname":"Tom","lastname":"john","qualification":"doc","Address":"block 2 lane 2 california","zipcode":"10021"}
{"rollno":"100004","firstname":"Tom","lastname":"jim","qualification":"eng","Address":"block 1 lane 1 california","zipcode":"10021"}
{"rollno":"100005","firstname":"Tom","lastname":"trek","qualification":"doc","Address":"block 2 lane 2 california","zipcode":"10021"}
{"rollno":"100006","firstname":"Tom","lastname":"peter","qualification":"eng","Address":"block 6 lane 6 california","zipcode":"10021"}
I do not wish to hardcode the field names but rather read them from the schema file; the idea is to have reusable code, something like looping over the schema file and the text file.

A = LOAD 'abc.txt' USING PigStorage(',') AS (rollno:chararray, firstname:chararray, lastname:chararray, qualification:chararray, add1:chararray, add2:chararray, city:chararray, zipcode:chararray);
-- TRIM removes the space left after the comma split; multi-argument CONCAT needs Pig 0.14+ (nest CONCAT calls on older versions)
B = FOREACH A GENERATE SUBSTRING(rollno, 0, 6) AS rollno, firstname, lastname, qualification, CONCAT(add1, ' ', TRIM(add2), ' ', city) AS Address, zipcode;
STORE B INTO 'first_table.json' USING JsonStorage();
Hope this helps.
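Pig itself has no loop that reads the schema file at runtime, so here is a hedged sketch of the schema-driven idea in plain Python (file names are from the question; load_schema and to_json_lines are my own illustrative helpers, not part of Pig or any library):

```python
import json

def load_schema(path):
    # Each line of the schema file is "<fieldname> <position>" (1-based).
    schema = {}
    with open(path) as f:
        for line in f:
            name, pos = line.split()
            schema[name] = int(pos) - 1
    return schema

def to_json_lines(data_path, schema):
    # Apply the rules from the question: trim rollno to 6 characters
    # and club address1/address2/city into a single Address field.
    out = []
    with open(data_path) as f:
        for line in f:
            fields = [x.strip() for x in line.rstrip("\n").split(",")]
            rec = {name: fields[i] for name, i in schema.items()}
            rec["rollno"] = rec["rollno"][:6]
            rec["Address"] = " ".join(
                [rec.pop("address1"), rec.pop("address2"), rec.pop("city")]
            )
            out.append(json.dumps(rec))
    return out
```

The same per-field loop could be packaged as a Pig UDF if the job must stay inside Pig.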


How do I extract keys from a JSON object?

My splunk instance queries a database once an hour for data about products, and gets a JSON string back that is structured like this:
{"counts":
{"green":413,
"red":257,
"total":670,
"product_list":
{ "urn:product:1":{
"name":"M & Ms" ,
"total":332 ,
"green":293 ,
"red":39 } ,
"urn:product:2":{
"name":"Christmas Ornaments" ,
"total":2 ,
"green":0 ,
"red":2 } ,
"urn:product:3":{
"name":"Traffic Lights" ,
"total":1 ,
"green":0 ,
"red":1 } ,
"urn:product:4":{
"name":"Stop Signs" ,
"total":2 ,
"green":0 ,
"red":2 },
...
}
}
}
I have a query that alerts when the counts.green drops by 10% over 24 hours:
index=database_catalog source=RedGreenData | head 1
| spath path=counts.green output=green_now
| table green_now
| join host
[| search index=database_catalog source=RedGreenData latest=-1d | head 1 | spath path=counts.green output=green_yesterday
| table green_yesterday]
| where green_yesterday > 0
| eval delta=(green_yesterday - green_now)/green_yesterday * 100
| where delta > 10
While I'm an experienced developer in C, C++, Java, SQL, JavaScript, and several others, I'm fairly new to Splunk's Search Processing Language, and references and tutorials seem pretty light, at least the ones I've found.
My next story is to at least expose all the individual products, and identify which ones have a 10% drop over 24 hours.
I thought a reasonable learning exercise would be to extract the names of all the products, and eventually turn that into a table with name, product code (e.g. urn:product:4), green count today, green count 24 hours ago, and then filter that on a 10% drop for all products where yesterday's count is positive. And I'm stuck. The references to {} are all for a JSON array [], not a JSON object with keys and values.
I'd love to get a table out that looks something like this:
ID
Name
Green
Red
Total
urn:product:1
M & Ms
293
39
332
urn:product:2
Christmas Ornaments
0
2
2
urn:product:3
Traffic Lights
0
1
1
urn:product:4
Stop Signs
0
2
2
How do I do that?
I think this produces the output you want:
| spath
| table counts.product_list.*
| transpose
| rex field=column "counts.product_list.(?<ID>[^.]*).(?<fieldname>.*)"
| fields - column
| xyseries ID fieldname "row 1"
| table ID name green red total
Use transpose to get the field names as data.
Use rex to extract the ID and the field name.
Use xyseries to pivot the data into the output.
Here is a run-anywhere example using your source data:
| makeresults
| eval _raw="
{\"counts\":
{\"green\":413,
\"red\":257,
\"total\":670,
\"product_list\":
{ \"urn:product:1\":{
\"name\":\"M & Ms\" ,
\"total\":332 ,
\"green\":293 ,
\"red\":39 } ,
\"urn:product:2\":{
\"name\":\"Christmas Ornaments\" ,
\"total\":2 ,
\"green\":0 ,
\"red\":2 } ,
\"urn:product:3\":{
\"name\":\"Traffic Lights\" ,
\"total\":1 ,
\"green\":0 ,
\"red\":1 } ,
\"urn:product:4\":{
\"name\":\"Stop Signs\" ,
\"total\":2 ,
\"green\":0 ,
\"red\":2 }
}
}
}"
| spath
| table counts.product_list.*
| transpose
| rex field=column "counts.product_list.(?<ID>[^.]*).(?<fieldname>.*)"
| fields - column
| xyseries ID fieldname "row 1"
| table ID name green red total
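Outside Splunk, the same idea — iterating over the keys of the product_list object rather than indexing an array — can be sketched in plain Python as a sanity check (data abbreviated from the question):

```python
import json

payload = json.loads("""
{"counts": {"green": 413, "red": 257, "total": 670,
  "product_list": {
    "urn:product:1": {"name": "M & Ms", "total": 332, "green": 293, "red": 39},
    "urn:product:2": {"name": "Christmas Ornaments", "total": 2, "green": 0, "red": 2}
  }}}
""")

# dict.items() yields each (key, object) pair, i.e. the ID plus its fields.
rows = [
    (pid, p["name"], p["green"], p["red"], p["total"])
    for pid, p in payload["counts"]["product_list"].items()
]
for row in rows:
    print(row)
```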

SELECT CASE WHEN in MySQL

I have 2 tables.
The first table is "consumer":
id_consumer | name
1           | Roy
2           | Dori
3           | Rico
The second table is "consumer_address":
id_consumer | address         | status
1           | Street Avenue   | 1
1           | Park Hill       | 0
2           | Highwalk Street | 1
2           | Albion Place    | 0
Conditions:
name comes from the "consumer" table.
address comes from "consumer_address", but I want only the address where consumer_address.status = 1.
When a consumer has no row in "consumer_address", the fields should be NULL.
The final table should look like this:
id_consumer | name | address         | status
1           | Roy  | Street Avenue   | 1
2           | Dori | Highwalk Street | 1
3           | Rico | NULL            | NULL
I have a query, but it does not work. This is my query:
SELECT
id_consumer,
name,
CASE WHEN (`consumer_address`.`status` = 1) THEN `consumer_address`.`address` ELSE NULL END as "Address",
CASE WHEN (`consumer_address`.`status` = 1) THEN `consumer_address`.`status` ELSE NULL END as "Status"
FROM consumer
JOIN consumer_address ON consumer_address.id_consumer = consumer.id_consumer
Thanks
Very simple solution:
SELECT
`id_consumer`,
`name`,
`consumer_address`.`address`,
`consumer_address`.`status`
FROM consumer
LEFT JOIN consumer_address ON
`consumer_address`.`id_consumer` = `consumer`.`id_consumer` AND
`consumer_address`.`status` = 1
Instead of using CASE WHEN just include the status in the JOIN.
Additionally, to keep consumer 3, you need a LEFT JOIN.
SELECT
id_consumer,
name,
`consumer_address`.`address`,
`consumer_address`.`status`
FROM
consumer
LEFT JOIN
consumer_address
ON consumer_address.id_consumer = consumer.id_consumer
AND consumer_address.status = 1
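To double-check the behavior, here is a small run-anywhere sketch using Python's sqlite3 module (table and row values are taken from the question; SQLite's LEFT JOIN semantics match MySQL's here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE consumer (id_consumer INTEGER, name TEXT);
CREATE TABLE consumer_address (id_consumer INTEGER, address TEXT, status INTEGER);
INSERT INTO consumer VALUES (1, 'Roy'), (2, 'Dori'), (3, 'Rico');
INSERT INTO consumer_address VALUES
    (1, 'Street Avenue', 1), (1, 'Park Hill', 0),
    (2, 'Highwalk Street', 1), (2, 'Albion Place', 0);
""")

# Putting status = 1 in the join condition (not in WHERE) keeps Rico,
# who has no matching address row, with NULLs in the output.
rows = conn.execute("""
    SELECT c.id_consumer, c.name, a.address, a.status
    FROM consumer c
    LEFT JOIN consumer_address a
        ON a.id_consumer = c.id_consumer AND a.status = 1
    ORDER BY c.id_consumer
""").fetchall()
for row in rows:
    print(row)
```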

Undefined columns selected using panelvar package

Has anyone used panelvar in R?
I'm currently using the panelvar package in R, and I'm getting this error:
Error in `[.data.frame`(data, , c(colnames(data)[panel_identifier], required_vars)) :
undefined columns selected
And my syntax currently is:
model1<-pvargmm(
dependent_vars = c("Change.."),
lags = 2,
exog_vars = c("Price"),
transformation = "fd",
data = base1,
panel_identifier = c("id", "t"),
steps = c("twostep"),
system_instruments = FALSE,
max_instr_dependent_vars = 99,
min_instr_dependent_vars = 2L,
collapse = FALSE)
I don't know why my panel_identifier is not working; it's pretty similar to the example given by the panelvar package, yet it doesn't work. I want to point out that base1 is in data.frame format. Any ideas? Also, my data is structured like this:
head(base1)
id t country DDMMYY month month_text day Date_txt year Price Open
1 1 1296 China 1-4-2020 4 Apr 1 Apr 01 2020 12588.24 12614.82
2 1 1295 China 31-3-2020 3 Mar 31 Mar 31 2020 12614.82 12597.61
High Low Vol. Change..
1 12775.83 12570.32 NA -0.0021
2 12737.28 12583.05 NA 0.0014
Thanks in advance!
Check the documentation of the package and the SSRN paper. For me it helped to ensure that all input formats are identical (you can check this with the str(base1) command). For example, they write:
library(panelvar)
data("Dahlberg")
ex1_dahlberg_data <-
pvargmm(dependent_vars = .......
When I look at it I get
>str(Dahlberg)
'data.frame': 2385 obs. of 5 variables:
$ id : Factor w/ 265 levels "114","115","120",..: 1 1 1 1 1 1 1 1 1 2 ...
$ year : Factor w/ 9 levels "1979","1980",..: 1 2 3 4 5 6 7 8 9 1 ...
$ expenditures: num 0.023 0.0266 0.0273 0.0289 0.0226 ...
$ revenues : num 0.0182 0.0209 0.0211 0.0234 0.018 ...
$ grants : num 0.00544 0.00573 0.00566 0.00589 0.00559 ...
For example, the input data must be a plain data.frame (in my case it carried additional class attributes such as tibble or data.table). I resolved it by casting with as.data.frame().

How to get an API response from a CSV file using Bottle in Python

Below is a sample of the CSV file (sample.csv) I am using:
ID Name Address Ph.no Category
1 Person 1 Address 1 1234568789 Category1
2 Person 2 Address 2 1234568790 Category2
3 Person 3 Address 3 1234568791 Category3
4 Person 4 Address 4 1234568792 Category4
5 Person 5 Address 5 1234568793 Category1
6 Person 6 Address 6 1234568794 Category2
7 Person 7 Address 7 1234568795 Category3
8 Person 8 Address 8 1234568796 Category2
9 Person 9 Address 9 1234568797 Category1
Using the Bottle framework, I want to build a RESTful web service to query this CSV file. The request will be "/getDetails?category=x". The response should be a table of records (tuples) of ID, Name, Address, and Ph.no belonging to that category.
This is one of the ways to do it:
from bottle import get, run, request, response
from json import dumps
import csv

# API endpoint: /getDetails?category=x
@get('/getDetails')
def get_details():
    # The category arrives as a query-string parameter, so read
    # request.query rather than request.forms.
    category = request.query.get('category')
    response.content_type = 'application/json'
    output = []
    # Read and traverse the CSV file, keeping rows in the category
    with open("sample.csv", "r") as csv_file:
        reader = csv.reader(csv_file)
        for row in reader:
            if row[4] == category:
                output.append(row)
    return dumps(output)

run(host='localhost', port=8080)

Transform a CSV of Ids into a CSV of Names

I need to transform a CSV of IDs into a CSV of names.
I have:
FOLDER:
ID | NAME
1  | A
2  | AB
3  | B
4  | BC
5  | BCD

FILE:
ID | NAME | PATH
1  | fX   | 1
2  | fZ   | 1,2
3  | fY   | 3,4
4  | fW   | 3,4,5
I get info about the FILEs and their sizes from the FILEDATA table:
select FILE.NAME, FILE.PATH, FILEDATA.SIZE
from FILEDATA inner join FILE on FILEDATA.fileid = FILE.id
WHERE FILEDATA.PropName = 'Size'
Currently I get:
fX 1 23805
fZ 1,2 27205
fY 3,4 23608
fW 3,4,5 21501
I need to replace the IDs with the FOLDER names:
fX A 23805
fZ A/AB 27205
fY B/BC 23608
fW B/BC/BCD 21501
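No answer is attached here, but one way to sketch the ID-to-name translation is outside SQL, in Python (the folders lookup holds the FOLDER rows from the question; path_to_names is my own illustrative helper — doing this in pure SQL would need a per-depth self-join or a recursive CTE):

```python
# FOLDER rows from the question, as a lookup table.
folders = {1: "A", 2: "AB", 3: "B", 4: "BC", 5: "BCD"}

def path_to_names(path):
    # "3,4,5" -> "B/BC/BCD": map each ID through the lookup, join with "/".
    return "/".join(folders[int(i)] for i in path.split(","))

for name, path, size in [("fX", "1", 23805), ("fZ", "1,2", 27205),
                         ("fY", "3,4", 23608), ("fW", "3,4,5", 21501)]:
    print(name, path_to_names(path), size)
```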