JMeter While Controller with CSV issue

I've got an issue with the While Controller in JMeter when using a CSV reader, which hopefully someone can shed some light on for me.
So I have a CSV file which contains a bunch of currency codes and their respective names (see below). I'm then using the CSV values in a subsequent HTTP Request.
ALL,Albanian Lek
DZD,Algerian Dinar
ARS,Argentine Peso
AMD,Armenian Dram
AUD,Australian Dollar
AZN,Azerbaijan New Manat
BSD,Bahamian Dollar
BHD,Bahraini Dinar
BDT,Bangladeshi Taka
BZD,Belize Dollar
BOB,Bolivian Boliviano
BRL,Brazilian Real
GBP,British Pound
BND,Brunei Dollar
XOF,CFA Franc BCEAO
XAF,CFA Franc BEAC
XPF,CFP Franc
KHR,Cambodian Riel
CAD,Canadian Dollar
CLP,Chilean Peso
CNY,Chinese Yuan Renminbi
COP,Colombian Peso
KMF,Comoros Franc
CRC,Costa Rican Colon
HRK,Croatian Kuna
CZK,Czech Koruna
DKK,Danish Krone
DJF,Djibouti Franc
DOP,Dominican R. Peso
XCD,East Caribbean Dollar
EGP,Egyptian Pound
EEK,Estonian Kroon
EUR,Euro
FJD,Fiji Dollar
HNL,Honduran Lempira
HKD,Hong Kong Dollar
HUF,Hungarian Forint
ISK,Iceland Krona
INR,Indian Rupee
IDR,Indonesian Rupiah
ILS,Israeli New Shekel
JPY,Japanese Yen
JOD,Jordanian Dinar
KZT,Kazakhstan Tenge
KES,Kenyan Shilling
KWD,Kuwaiti Dinar
KGS,Kyrgyzstanian Som
LAK,Lao Kip
LVL,Latvian Lats
LBP,Lebanese Pound
LTL,Lithuanian Litas
MYR,Malaysian Ringgit
MRO,Mauritanian Ouguiya
MUR,Mauritius Rupee
MXN,Mexican Peso
MNT,Mongolian Tugrik
MAD,Moroccan Dirham
NAD,Namibia Dollar
NPR,Nepalese Rupee
NZD,New Zealand Dollar
NIO,Nicaraguan Cordoba Oro
NOK,Norwegian Kroner
OMR,Omani Rial
PKR,Pakistan Rupee
PGK,Papua New Guinea Kina
PYG,Paraguay Guarani
PEN,Peruvian Nuevo Sol
PHP,Philippine Peso
PLN,Polish Zloty
QAR,Qatari Rial
RON,Romanian New Lei
RUB,Russian Rouble
RWF,Rwandan Franc
WST,Samoan Tala
SAR,Saudi Riyal
SGD,Singapore Dollar
SOS,Somali Shilling
ZAR,South African Rand
KRW,South-Korean Won
LKR,Sri Lanka Rupee
SZL,Swaziland Lilangeni
SEK,Swedish Krona
CHF,Swiss Franc
TWD,Taiwan Dollar
TZS,Tanzanian Shilling
THB,Thai Baht
TND,Tunisian Dinar
TRY,Turkish Lira
USD,US Dollar
UGX,Uganda Shilling
UAH,Ukraine Hryvnia
UYU,Uruguayan Peso
AED,Utd. Arab Emir. Dirham
VUV,Vanuatu Vatu
VEF,Venezuelan Bolivar
VND,Vietnamese Dong
ZMK,Zambian Kwacha
In my While Controller I have the following conditional check:
${__javaScript("EOF" != "${currencyCode}")}
(Note: Stack Overflow doesn't seem to display the angle brackets, but they are there in my JMeter, i.e. the comparison is against "<EOF>".)
So what I'm seeing is that the loop is going through the CSV and the required HTTP Request is being sent. However, once the last value in the CSV ("ZMK" in this case) is reached, an extra request is made with a value of <EOF>.
So it looks like the While Controller is doing one more loop than it should.
I've checked the CSV file in Vim and every other editor I could think of, and there are no blank lines or tabs at the end of the file.
If I make my Loop condition
${__javaScript("ZMK" != "${currencyCode}")}
everything works fine. But the CSV could grow, so I don't want to hardcode anything.
Just to give you a better picture, here is the overall structure of my test:
+ Thread Group
++ Simple Controller
+++ While Loop with condition ${__javaScript("<EOF>" != "${currencyCode}")}
++++ CSV Reader (Recycle on EOF = FALSE, STOP THREAD ON EOF = FALSE, SHARING = Current Thread)
++++ HTTP Request using value from CSV file

Try putting your CSV reader all the way at the top of the test plan. It is below your While Controller now, which is causing this issue.

Try totally removing your While Controller and providing a reasonable number of loops in the Thread Group (or alternatively replace it with a Loop Controller).
Also, you should make your CSV Data Set Config a child of the HTTP Request. See the Using CSV DATA SET CONFIG guide for more details on how to implement CSV reading logic in JMeter.
Another option is using Beanshell to retrieve all currency codes and display names and store them in JMeter variables.
The following Beanshell code obtains all currencies available in Java and stores them as JMeter variables:
Set<Currency> currencies = Currency.getAvailableCurrencies();
for (Currency currency : currencies) {
    vars.put(currency.getCurrencyCode(), currency.getDisplayName());
}

${__javaScript("${currencyCode}" != "EOF",)}
Worked for me.

Put your CSV reader at the top of the test plan as a child of a While Controller with the following condition (function or variable): ${__javaScript("${currencyCode}" != "<EOF>",)}.
Then you have to increase the number of threads in your Thread Group to the number of rows in your CSV file; this way it will loop over each line in the CSV file with each thread.


Improving the recall of a Custom Named Entity Recognition (NER) in Spacy

This is the second part of another question I posted. However, they are different enough to be separate questions, but could be related.
Previous question
Building a Custom Named Entity Recognition with Spacy, using random text as a sample
I have built a custom Named Entity Recognition (NER) model using the method described in the previous question. I just copied the method to build the NER from the spaCy website (under "Named Entity Recognizer" at https://spacy.io/usage/training#ner).
The custom NER works, sorta. If I sentence-tokenize the text and lemmatize the words (so "strawberries" becomes "strawberry"), it can pick up an entity. However, it stops there. It sometimes picks up two entities, but very rarely.
Is there anything I can do to improve its accuracy?
Here is the code (I have TRAIN_DATA in this format, but for food items):
TRAIN_DATA = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]
The data is in the object train_food
import random
import warnings

import spacy
import nltk
from spacy.util import minibatch, compounding

nlp = spacy.blank("en")

# Create the built-in NER pipeline component and add it to the pipeline
if "ner" not in nlp.pipe_names:
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner, last=True)
else:
    ner = nlp.get_pipe("ner")

## Testing for food
for _, annotations in train_food:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])

# get names of other pipes to disable them during training
pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
model = "en"
n_iter = 20

# only train NER
with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
    # show warnings for misaligned entity spans once
    warnings.filterwarnings("once", category=UserWarning, module="spacy")
    # reset and initialize the weights randomly - but only if we're
    # training a new model
    nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(train_food)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(train_food, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(
                texts,        # batch of texts
                annotations,  # batch of annotations
                drop=0.5,     # dropout - make it harder to memorise data
                losses=losses,
            )
        print("Losses", losses)

text = "mike went to the supermarket today. he went and bought a potatoes, carrots, towels, garlic, soap, perfume, a fridge, a tomato, tomatoes and tuna."
text = "mike went to the supermarket today. he went and bought a potatoes, carrots, towels, garlic, soap, perfume, a fridge, a tomato, tomatoes and tuna."
After this, and using text as a sample, I ran this code:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def text_processor(text):
    text = text.lower()
    tokens = nltk.word_tokenize(text)
    ls = []
    for x in tokens:
        p = lemmatizer.lemmatize(x)
        ls.append(p)
    new_text = " ".join(ls)
    return new_text

def ner(text):
    new_text = text_processor(text)
    tokenizer = nltk.PunktSentenceTokenizer()
    sentences = tokenizer.tokenize(new_text)
    for sentence in sentences:
        doc = nlp(sentence)
        for ent in doc.ents:
            print(ent.text, ent.label_)

ner(text)
This results in
potato FOOD
carrot FOOD
Running the following code
ner("mike went to the supermarket today. he went and bought garlic and tuna")
Results in
garlic FOOD
Ideally, I want the NER to pick up potato, carrot and garlic. Is there anything I can do?
Thank you
Kah
While you are training your model, you can try some text preprocessing and augmentation techniques such as:
1. Lower-casing all of the words
2. Replacing words with their synonyms
3. Removing stop words
4. Rewriting sentences (this can be done automatically using back-translation, i.e. translating into Arabic, then translating back into English)
Also, consider using stronger models such as:
http://nlp.stanford.edu:8080/corenlp
https://huggingface.co/models
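For point 1 (lower-casing), note that it can be applied to the training data directly without invalidating the annotations: lower-casing ASCII text does not change string length, so the (start, end) entity offsets stay correct. A minimal sketch; TRAIN_DATA follows the format shown in the question:

```python
TRAIN_DATA = [
    ("Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
    ("Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
]

# lower-case the text, keep the annotations untouched
TRAIN_DATA_LOWER = [(text.lower(), ann) for text, ann in TRAIN_DATA]
print(TRAIN_DATA_LOWER[0][0])  # -> uber blew through $1 million a week
```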

How to decode OBD-2 data from Hyundai Ioniq EV

I'm trying to read OBD-2 data from a Hyundai Ioniq Electric (28 kWh version), using a Raspberry Pi and a Bluetooth ELM327 interface. Connection and data transfer work fine.
For example, sending 2105<cr><lf> gives the following response (<cr> is the value 0x0D = 13):
7F2112<cr>7F2112<cr>7F2112<cr>02D<cr>0:6105FFFFFFFF<cr>7F2112<cr>1:00000000001616<cr>2:161616161621FA<cr>3:26480001501616<cr>4:03E82403E80FC0<cr>5:003A0000000000<cr>6:00000000000000<cr><cr>>
The value C0 in 4:03E82403E80FC0 seems to be the state-of-charge (SOC) display value:
0xC0 -> 192 -> 192 / 2 = 96 %
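That arithmetic is easy to verify (a quick Python sanity check, using the byte value from the response above):

```python
soc_raw = int("C0", 16)    # 0xC0 = 192
soc_percent = soc_raw / 2  # the display SOC is stored as twice the percentage
print(soc_percent)         # -> 96.0
```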
There are some tables for decoding available (see https://github.com/JejuSoul/OBD-PIDs-for-HKMC-EVs/tree/master/Ioniq%20EV%20-%2028kWh), but how to use these tables?
For example sending 2101<cr><lf> gives the response:
02C<cr>
0:6101FFFFF800<cr>
01E<cr>
0:6101000003FF<cr>
03D<cr>
0:6101FFFFFFFF<cr>
016<cr>
0:6101FFE00000<cr>
1:0002D402CD03F0<cr>
1:0838010A015C2F<cr>
7F2112<cr>
1:B4256026480000<cr>
1:0921921A061B03<cr>
2:000582003401BD<cr>
2:0000000A002702<cr>
2:000F4816161616<cr>
2:00000000276234<cr>
3:04B84100000000<cr>
3:5B04692F180018<cr>
3:01200000000000<cr>
3:1616160016CB3F<cr>
4:00220000600000<cr>
4:00D0FF00000000<cr>
4:CB0100007A0002<cr>
5:000001F3026A02<cr>
5:5D4000025D4600<cr>
6:D2000000000000<cr>
6:00DECA0000D8E6<cr>
7:008A2FEB090002<cr>
8:0000000003E800<cr>
<cr>
>
Please note, that the line feed was added behind every carriage return (<cr>) for better readability and is not part of the original data response.
How can I decode temperature, currents, ... from these data?
I have found the mistake myself. The ELM327 data sheet (http://elmelectronics.com/DSheets/ELM327DS.pdf) explains the AT commands in detail.
The problem in this case was the mixing of CAN responses from multiple ECUs, caused by the AT H0 command (headers off) in the initialization phase (not described in the question). See also ELM327DS.pdf, page 44 (Multiple Responses).
When using AT H1 on startup, the responses can be decoded without problem.
Initialization (with AT H1 = headers on)
AT D\r\n
AT Z\r\n
AT L0\r\n
AT E0\r\n
AT S0\r\n
AT H1\r\n
AT SP 0\r\n
Afterwards, communication with the ECUs:
Response on first command 0100\r\n:
SEARCHING...\r7EB06410080000001\r7EC06410080000001\r\r>
Response on second command 2101\r\n:
7EE037F2112\r7ED102C6101FFFFF800\r7EA10166101FFE00000\r7EC103D6101FFFFFFFF\r7EB101E6101000003FF\r7EA2109211024062703\r7EC214626482648A3FF\r7ED2100907D87E15592\r7EB210838011D88B132\r7ED2202A1A7024C0134\r7EA2200000000546900\r7EC22C00D9E1C1B1B1B\r7EB220000000A000802\r7EA2307200000000000\r7ED23050343102000C8\r7EC231B1B1C001BB50F\r7EB233C04B8320000D0\r7EC24B5010000810002\r7ED24047400C8760017\r7EB24FF300000000000\r7ED25001401F387F46A\r7EC256AC100026CB100\r7EC2600E3C50000DE69\r7ED263F001300000000\r7EC27008CC38209015C\r7EC280000000003E800\r\r>
Response on third command 2105\r\n:
7EE037F2112\r7ED037F2112\r7EA037F2112\r7EC102D6105FFFFFFFF\r7EB037F2112\r7EC2100000000001B1C\r7EC221C1B1B1B1B2648\r7EC2326480001641A1B\r7EC2403E80803E80147\r7EC25003A0000000000\r7EC2600000000000000\r\r>
Now every response starts with the ID of the sending ECU. Pay attention only to responses starting with 7EC.
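With headers on, splitting a raw response into per-ECU frames can be sketched as follows (a minimal Python illustration using the 2105 response above; the helper name frames_from is my own):

```python
def frames_from(raw, ecu="7EC"):
    """Split an ELM327 response on <cr> and keep only the frames of one ECU.

    With headers on (AT H1), each line starts with the 3-character ECU id,
    followed by a 2-character frame sequence number and the payload.
    """
    frames = {}
    for line in raw.strip("\r>").split("\r"):
        if line.startswith(ecu) and len(line) > 5:
            frames[line[3:5]] = line[5:]   # sequence number -> payload
    return frames

# the 2105 response from above; the short 7F-error lines are filtered out
raw_2105 = ("7EE037F2112\r7ED037F2112\r7EA037F2112\r7EC102D6105FFFFFFFF\r"
            "7EB037F2112\r7EC2100000000001B1C\r7EC221C1B1B1B1B2648\r"
            "7EC2326480001641A1B\r7EC2403E80803E80147\r7EC25003A0000000000\r"
            "7EC2600000000000000\r\r>")
print(frames_from(raw_2105)["21"])   # -> 00000000001B1C
```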
Example:
Looking for the battery current in amps: in the document Spreadsheet_IoniqEV_BMS_2101_2105.xls you find the battery current at:
response 21 for 2101: last byte = high byte of battery current
response 22 for 2101: first byte = low byte of battery current
So look at the response of 2101\r\n and search for 7EC21 and 7EC22. You will find:
7EC214626482648A3FF: take the last byte for the battery current high byte -> FF
7EC22C00D9E1C1B1B1B: take the first byte after 7EC22 for the battery current low byte -> C0
The battery current value is 0xFFC0.
This value is two's-complement encoded:
0xFFC0 = 65472 -> 65472 - 65536 = -64 -> -6.4 A
Result: the battery is being charged with 6.4 A.
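Putting the two bytes together can be sketched in a few lines (Python; the byte positions follow the spreadsheet as described above, and the 0.1 A scale factor is taken from the -64 -> -6.4 A step):

```python
def battery_current_amps(frame21, frame22):
    """Decode the battery current from the 2101 response payloads.

    High byte: last byte of frame 21; low byte: first byte of frame 22.
    The 16-bit value is two's-complement encoded, 0.1 A per unit.
    """
    raw = int(frame21[-2:] + frame22[:2], 16)   # "FF" + "C0" -> 0xFFC0
    if raw >= 0x8000:                           # undo two's complement
        raw -= 0x10000
    return raw / 10.0

print(battery_current_amps("4626482648A3FF", "C00D9E1C1B1B1B"))  # -> -6.4
```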
For a coding example see:
https://github.com/greenenergyprojects/obd2-gateway, file src/obd2/obd2.ts

Escape quotes inside quoted fields when parsing CSV in Flink

In Flink, parsing a CSV file using readCsvFile raises an exception when encountering a field containing quotes, like "Fazenda São José ""OB"" Airport":
org.apache.flink.api.common.io.ParseException: Line could not be parsed: '191,"SDOB","small_airport","Fazenda São José ""OB"" Airport",-21.425199508666992,-46.75429916381836,2585,"SA","BR","BR-SP","Tapiratiba","no","SDOB",,"SDOB",,,'
I've found in this mailing list thread and this JIRA issue that quoting inside the field should be realized through the \ character, but I don't have control over the data to modify it. Is there a way to work around this?
I've also tried using ignoreInvalidLines() (the least preferable solution), but it gave me the following error:
08:49:05,737 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner - Assigning remote split to host localhost
08:49:05,765 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN DataSource (at main(Job.java:53) (org.apache.flink.api.java.io.TupleCsvInputFormat)) -> Map (Map at main(Job.java:54)) -> Combine(SUM(1), at main(Job.java:56) (2/8)
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.flink.api.common.io.GenericCsvInputFormat.skipFields(GenericCsvInputFormat.java:443)
at org.apache.flink.api.common.io.GenericCsvInputFormat.parseRecord(GenericCsvInputFormat.java:412)
at org.apache.flink.api.java.io.CsvInputFormat.readRecord(CsvInputFormat.java:111)
at org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:454)
at org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:79)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:176)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
Here is my code:
DataSet<Tuple2<String, Integer>> csvInput = env.readCsvFile("resources/airports.csv")
.ignoreFirstLine()
.ignoreInvalidLines()
.parseQuotedStrings('"')
.includeFields("100000001")
.types(String.class, String.class)
.map((Tuple2<String, String> value) -> new Tuple2<>(value.f1, 1))
.groupBy(0)
.sum(1);
If you cannot change the input data, then you should turn off parseQuotedStrings(). The parser will then simply look for the next field delimiter and return everything in between as a string (including the quotation marks). You can then remove the leading and trailing quotation marks in a subsequent map operation.
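As a side note, the doubled quotes in the failing line are the standard CSV (RFC 4180) escape for a quote inside a quoted field, so the data itself is well-formed; for example, Python's csv module parses it as expected (used here purely as a sanity check, not as part of the Flink job):

```python
import csv
import io

line = ('191,"SDOB","small_airport","Fazenda São José ""OB"" Airport",'
        '-21.425199508666992,-46.75429916381836,2585,"SA","BR","BR-SP",'
        '"Tapiratiba","no","SDOB",,"SDOB",,,')
row = next(csv.reader(io.StringIO(line)))
print(row[3])  # -> Fazenda São José "OB" Airport
```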

Load only a few values from complex JSON object in Pig Latin

I have a complex JSON file that looks like this: http://pastebin.com/4UfadbqS
I would like to load only several values from these JSON objects using Pig Latin. I tried doing that like this:
mydata = LOAD 'data.json'
USING JsonLoader('id:chararray, created_at:chararray,
user: {(language:chararray)}');
STORE mydata
INTO 'output';
But it seems that Pig Latin is just taking the first three values from the JSON and saving them (it does not recognize the column name as a key). Is there a way to achieve this, or should I just list ALL the values from the JSON in Pig and filter them afterwards?
There are a few problems with the above approach:
1. JsonLoader always expects the full schema of your input, but you gave only three fields.
2. JsonLoader always expects the entire input as a single line, but your input spans multiple lines.
3. JsonLoader does not support nested schemas, but your input contains one.
To solve all of these problems you have to use the third-party elephant-bird library.
Download the jar files (elephant-bird-pig-4.1.jar and elephant-bird-hadoop-compat-4.1.jar) from this link:
http://www.java2s.com/Code/Jar/e/elephant.htm and try the approach below.
I copied your entire input and formatted as a single line as below.
input.json
{"filter_level":"medium","retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":488927960280211456,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"created_at":"Tue Jul 15 06:08:04 +0000 2014","favorite_count":0,"place":null,"coordinates":null,"text":"RT #BulleyBufton: #MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 http\u2026","contributors":null,"retweeted_stt
atus":{"filter_level":"low","contributors":null,"text":"#MinaANDMaya PLEASE RT /VOTE BULLEY. Last day to help me win my old rescue #HilbraesDogs £5k https://t.co/Y8g47fLYY1 httpp
://t.co/DDco9wVXtP","geo":null,"retweeted":false,"in_reply_to_screen_name":"MinaANDMaya","possibly_sensitive":false,"truncated":false,"lang":"en","entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[93,116],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","indices":[117,139],"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"http://t.co/DDco9wVXtP"}],"user_mentions":[{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[0,12],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[75,88],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"in_reply_to_status_id_str":null,"id":488926827394904064,"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","in_reply_to_user_id_str":"132204038","favorited":false,"in_reply_to_status_id":null,"retweet_count":6,"created_at":"Tue Jul 15 06:03:34 +0000 2014","in_reply_to_user_id":132204038,"favorite_count":3,"id_str":"488926827394904064","place":null,"user":{"location":"CHICAGO , USA","default_profile":false,"statuses_count":8868,"profile_background_tile":true,"lang":"en","profile_link_color":"AD54E8","profile_banner_url":"https://pbs.twimg.com/profile_banners/225136520/1403608773","id":225136520,"following":null,"favourites_count":5082,"protected":false,"profile_text_color":"3D1957","verified":false,"description":"I'm Bulley, I'm proof that there is 
always hope.\r\nI was in rescue kennels in UK for 9yrs. #ada_bscakes took me in.\r\nWe've moved to America to start a new life.","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"BULLEY","profile_background_color":"0A0A0A","created_at":"Fri Dec 10 19:55:17 +0000 2010","default_profile_image":false,"followers_count":3421,"profile_image_url_https":"https://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/378800000166829702/isbjd7O4.jpeg","follow_request_sent":null,"url":null,"utc_offset":-39600,"time_zone":"International Date Line West","notifications":null,"profile_use_background_image":true,"friends_count":3702,"profile_sidebar_fill_color":"7AC3EE","screen_name":"BulleyBufton","id_str":"225136520","profile_image_url":"http://pbs.twimg.com/profile_images/486614595457789952/gtcLac9w_normal.jpeg","listed_count":29,"is_translator":false},"coordinates":null},"geo":null,"entities":{"trends":[],"symbols":[],"urls":[{"expanded_url":"https://www.animalfriendsquote.co.uk/fb-worldcup/","indices":[111,134],"display_url":"animalfriendsquote.co.uk/fb-worldcup/","url":"https://t.co/Y8g47fLYY1"}],"hashtags":[],"media":[{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":455},"large":{"w":706,"resize":"fit","h":946},"medium":{"w":600,"resize":"fit","h":803}},"id":488926730481332224,"media_url_https":"https://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","media_url":"http://pbs.twimg.com/media/BskERVuIcAAJZGu.jpg","expanded_url":"http://twitter.com/BulleyBufton/status/488926827394904064/photo/1","source_status_id_str":"488926827394904064","indices":[139,140],"source_status_id":488926827394904064,"id_str":"488926730481332224","type":"photo","display_url":"pic.twitter.com/DDco9wVXtP","url":"htt
p://t.co/DDco9wVXtP"}],"user_mentions":[{"id":225136520,"name":"BULLEY","indices":[3,16],"screen_name":"BulleyBufton","id_str":"225136520"},{"id":132204038,"name":"Mina*Bad Yoga Kitty*","indices":[18,30],"screen_name":"MinaANDMaya","id_str":"132204038"},{"id":2308374684,"name":"Julianna Kaminski","indices":[93,106],"screen_name":"HilbraesDogs","id_str":"2308374684"}]},"source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android<\/a>","favorited":false,"in_reply_to_user_id":null,"retweet_count":0,"id_str":"488927960280211456","user":{"location":"","default_profile":false,"statuses_count":1370,"profile_background_tile":true,"lang":"zh-tw","profile_link_color":"038544","profile_banner_url":"https://pbs.twimg.com/profile_banners/2272804116/1404662156","id":2272804116,"following":null,"favourites_count":2000,"protected":false,"profile_text_color":"333333","verified":false,"description":"No More Sorrow","contributors_enabled":false,"profile_sidebar_border_color":"000000","name":"Winnie","profile_background_color":"14DBBA","created_at":"Thu Jan 02 10:13:01 +0000 2014","default_profile_image":false,"followers_count":311,"profile_image_url_https":"https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","geo_enabled":false,"profile_background_image_url":"http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_use_background_image":true,"friends_count":455,"profile_sidebar_fill_color":"DDEEF6","screen_name":"winnie341881","id_str":"2272804116","profile_image_url":"http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg","listed_count":0,"is_translator":false}}
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'id' AS ID, myMap#'created_at' AS createdAT, myMap#'user' AS User;
DUMP B;
Output:
(488927960280211456,Tue Jul 15 06:08:04 +0000 2014,[location#,default_profile#false,profile_background_tile#true,statuses_count#1370,lang#zh-tw,profile_link_color#038544,profile_banner_url#https://pbs.twimg.com/profile_banners/2272804116/1404662156,id#2272804116,following#,protected#false,favourites_count#2000,profile_text_color#333333,contributors_enabled#false,description#No More Sorrow,verified#false,name#Winnie,profile_sidebar_border_color#000000,profile_background_color#14DBBA,created_at#Thu Jan 02 10:13:01 +0000 2014,default_profile_image#false,followers_count#311,geo_enabled#false,profile_image_url_https#https://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,profile_background_image_url#http://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,profile_background_image_url_https#https://pbs.twimg.com/profile_background_images/431815421189029888/YrRNpUfd.jpeg,follow_request_sent#,url#,utc_offset#,time_zone#,notifications#,friends_count#455,profile_use_background_image#true,profile_sidebar_fill_color#DDEEF6,screen_name#winnie341881,id_str#2272804116,profile_image_url#http://pbs.twimg.com/profile_images/478106512083017728/4ao_8JjE_normal.jpeg,is_translator#false,listed_count#0])
In the elephant-bird library all the values are stored as key/value pairs (i.e. the MAP datatype), so it is easy to extract the required fields from the loaded data.
In the above Pig script I have extracted the values of 'id', 'created_at' and 'user' as per your need.
Suppose you want to extract some fields from the 'user' data (e.g. 'friends_count' and 'followers_count'); in that case you need to project the 'user' field and extract the required data. Sample code below.
PigScript:
REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';
REGISTER '/tmp/elephant-bird-pig-4.1.jar';
A = LOAD 'input.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
B = FOREACH A GENERATE myMap#'user' AS User;
C = FOREACH B GENERATE User#'friends_count', User#'followers_count';
DUMP C;
Output:
(455,311)

Get list of all values in a key:value list

So my input looks like
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":""}
{"selling":"0","quantity":"100","price":"80000","date":"1401383271","rs_name":"sammvarnish","contact":"PM","notes":"Seers Bank W321 :)"}
{"selling":"0","quantity":"100","price":"70000","date":"1401383168","rs_name":"pwnoramaa","contact":"PM","notes":""}
and the output I want must look like
0,2,80000,1401384212,overhault,PM,""
0,100,80000,1401383271,sammvarnish,PM,"Seers Bank W321 :)"
0,100,70000,1401383168,pwnoramaa,PM,""
What's the best way to do this in bash?
EDIT: changed my needs.
The new output I want is, for
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":"testnote"}
as input,
rs name: \t overhault
quantity: \t 2
price: \t 80000
date: \t 29-05 19:23
contact: \t PM
notes: \t testnote
Where \t is a tab character (like in echo "\t").
As you can see, this one is a tad bit more complicated.
For example, it changes the order, and requires the UNIX timestamp to be converted to an alternative format.
I'll use any tool you can offer me as long as you explain clearly how I can use it from a bash script. The input will consist of three such lines, delimited by newline characters, and it must print the output with an empty line between each of the results.
Don't do this with regular expressions/bash; there are JSON parsers for this kind of task. Simple Python example:
import json
data = json.loads('{"selling":"0","quantity":"2"}')
data = ','.join(data.values())
print(data)
I strongly suggest you just use a simple script like this which you make executable and then call.
EDIT: here's a version which preserves the order:
import json
data = json.loads('{"selling":"0","quantity":"2", "price":"80000"}')
orderedkeys = ['selling', 'quantity', 'price']
values = [data[key] for key in orderedkeys]
values = ','.join(values)
print(values)
output:
0,2,80000
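The edited requirement (labelled, tab-separated output with the timestamp reformatted) extends the same idea; a sketch, printing the fields in the requested order and converting the UNIX timestamp with datetime (shown here in UTC, which gives 17:23 for the sample; use the local timezone if the 19:23 from the question should match exactly):

```python
import json
from datetime import datetime, timezone

line = ('{"selling":"0","quantity":"2","price":"80000","date":"1401384212",'
        '"rs_name":"overhault","contact":"PM","notes":"testnote"}')
data = json.loads(line)

date = datetime.fromtimestamp(int(data["date"]), tz=timezone.utc)
for label, value in [("rs name", data["rs_name"]),
                     ("quantity", data["quantity"]),
                     ("price", data["price"]),
                     ("date", date.strftime("%d-%m %H:%M")),
                     ("contact", data["contact"]),
                     ("notes", data["notes"])]:
    print(label + ":\t" + value)
```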