Carriage return in StringTokenizer in Java

If I only specify carriage return (\r) in the String Tokenizer like this:
StringTokenizer st1 = new StringTokenizer(line,"\r");
where 'line' is the input string.
When I provide the following text as input:
Hello
Bello
Cello
i.e. with two carriage returns (I press Enter after Hello and after Bello).
The output of System.out.println(st1.countTokens()); is 3.
Is there an explanation?

When you split a string using a separator, then, provided that your separator occurs n times, the number of elements after the split will be n+1. Look at this visual example, using comma as separator:
text1,text2,text3,text4
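It yields 4 results.
Here is another example:
text1,text2,text3,
A plain split yields 4 results here as well, the last being an empty string. (Java's StringTokenizer never returns empty tokens, so it would report 3 for this particular input.)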
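The n+1 rule above can be checked quickly. The sketch below uses Python's str.split rather than Java's StringTokenizer, purely as an illustration; the comma examples come from the answer, and the last string assumes the question's input contains exactly two \r characters.

```python
# A separator that occurs n times splits a string into n + 1 pieces.
four = "text1,text2,text3,text4".split(",")
print(len(four))  # 3 commas, so 4 pieces

# A trailing separator still counts: the last piece is an empty string.
trailing = "text1,text2,text3,".split(",")
print(trailing)

# The question's input: two carriage returns, so three pieces.
line = "Hello\rBello\rCello"
pieces = line.split("\r")
print(len(pieces))
```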

Related

How to parse invalid JSON containing an invalid number

I work with a legacy customer who sends me webhook events. Sometimes their system sends me a value that looks like this
[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]
I am using Python's json.loads to parse the data sent to me. Here ts is an invalid number, and Python raises json.decoder.JSONDecodeError whenever I try to parse this string.
It is okay with me to get None in ts field if I can not parse it.
What would be a smart (& possibly generic) way to solve this problem?
This may not be so generic, but you can try using yaml to load:
import yaml
s = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074","ts":16XX445656000}]'
yaml.safe_load(s)
Output:
[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
'ecc': '0X6',
'cph': 'X1X4X77074',
'ts': '16XX445656000'}]
If the problem is always in the ts key, and this value is always a string of numbers and letters, you could just remove it before trying to parse:
import json
import re
jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
jstr_sanitized = re.sub(r',?\s*\"ts\":[A-Z0-9]+', "", jstr)
jobj = json.loads(jstr_sanitized)
# [{'id': 'LXKhRA3RHtaVBhnczVRJLdr', 'ecc': '0X6', 'cph': 'X1X4X77074'}]
Regex explanation:
,?\s*\"ts\":[A-Z0-9]+
,? Zero or one commas
\s* Any number of whitespace characters
\"ts\": Literally "ts":
[A-Z0-9]+ One or more uppercase letters or numbers
Alternatively, you could catch the JSONDecodeError and look at its pos attribute for the offending character. Then, you could either remove just that character and try again, or look for the next space, comma, or bracket and remove characters until that point before you try again.
import json
jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
while True:
    try:
        jobj = json.loads(jstr)
        break
    except json.JSONDecodeError as ex:
        # Drop the single offending character and try again
        jstr = jstr[:ex.pos] + jstr[ex.pos+1:]
This mangles the output so that the ts key is now a valid integer (after removing the Xs) but since you don't care about that anyway, it should be fine:
[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
'ecc': '0X6',
'cph': 'X1X4X77074',
'ts': 16445656000}]
Because every retry re-parses the initial valid part, this is probably not a great idea for a huge JSON string or one with many places that could throw an error, but it should be fine for the kind of example you have shown.
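The other variant mentioned above (removing everything up to the next comma or bracket) can be adapted to give None for the broken field, which is what the question asks for. This is only a sketch: loads_lenient and max_repairs are made-up names, and it assumes the broken token is a bare value (not inside a quoted string) that sits after a ':', ',' or '['.

```python
import json
import re

def loads_lenient(text, max_repairs=100):
    """Parse JSON, replacing each unparseable bare token with null."""
    for _ in range(max_repairs):
        try:
            return json.loads(text)
        except json.JSONDecodeError as ex:
            # The bad token starts just after the nearest ':', ',' or '['
            # that precedes the error position.
            start = max(text.rfind(ch, 0, ex.pos) for ch in ":,[") + 1
            # It ends at the next ',', '}' or ']' (or at the end of the text).
            m = re.search(r"[,}\]]", text[ex.pos:])
            end = ex.pos + m.start() if m else len(text)
            text = text[:start] + "null" + text[end:]
    raise ValueError("could not repair the JSON text")

jstr = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]'
result = loads_lenient(jstr)
print(result[0]["ts"])
```

As with the loop above, each repair re-parses the text from the start, so this suits small payloads such as webhook events rather than very large documents.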

JSON - Regex to identify a pattern in JSON

I'm new to Python3 and I am working with large JSON objects. I have a large JSON object which has extra chars coming in between two JSON objects, in between the braces.
For example:
{"id":"121324343", "name":"foobar"}3$£_$£rvcfddkgga£($(>..bu&^783 { "id":"343554353", "name":"ABCXYZ"}'
These extra chars could be anything alphanumeric, special chars or ASCII. They appear in this large JSON multiple times and can be of any length. I'm trying to use regex to identify that pattern to remove them, but regex doesn't seem to work. Here is the regex I used:
(^}\n[a-zA-Z0-9]+{$)
Is there a way of identifying such a pattern using regex in Python?
You can select the dictionary data based on named capture groups. As a bonus, this will also ignore any { or } within the extra chars.
The following pattern works on the provided data:
"\"id\"\:\"(?P<id>\d+?)\"[,\s]+\"name\"\:\"(?P<name>[ \w]+)\""
Example
import re
from pprint import pprint
string = \
"""
{"id":"121324343", "name":"foobar"}3$£_$£rvcfdd{}kgga£($(>..bu&^783 { "id":"343554353", "name":"ABC XYZ"}'
"""
# Raw string, so the regex escapes (\d, \s, \w) are not treated as string escapes
pattern = re.compile(pattern=r'"id":"(?P<id>\d+?)"[,\s]+"name":"(?P<name>[ \w]+)"')
pprint([match.groupdict() for match in pattern.finditer(string=string)])
Output
[{'id': '121324343', 'name': 'foobar'}, {'id': '343554353', 'name': 'ABC XYZ'}]
Test it out yourself: https://regex101.com/r/82BqbE/1
Notes
For this example I assume the following:
id only contains integer digits.
name is a string that can contain the following characters [a-zA-Z0-9_ ]. (this includes white spaces and underscores).
Assuming the whole JSON is on a single line and there are no } or { characters inside the fields themselves, this should be enough:
In [1]: import re
In [2]: x = """{"id":"121324343", "name":"foobar"}3$£_$£rvcfddkgga£($(>..bu&^783 { "id":"343554353", "name":"ABCXYZ"}"""
In [3]: print(re.sub(r'(?<=})[^}{]+(?={)', "\n", x))
{"id":"121324343", "name":"foobar"}
{ "id":"343554353", "name":"ABCXYZ"}
You can check the regex here https://regex101.com/r/leIoqE/1
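Once the junk between the objects has been replaced with newlines, each line can be handed to json.loads. A minimal sketch under the same assumptions as the substitution above (one object per line afterwards, no braces inside field values); the variable names are only for illustration.

```python
import json
import re

raw = '{"id":"121324343", "name":"foobar"}3$£_$£rvcfddkgga£($(>..bu&^783 { "id":"343554353", "name":"ABCXYZ"}'

# Replace the junk between a closing and an opening brace with a newline,
# then parse each resulting line as its own JSON object.
cleaned = re.sub(r'(?<=})[^}{]+(?={)', "\n", raw)
records = [json.loads(line) for line in cleaned.splitlines()]
print(records)
```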

How do I search for a string in this JSON with Python

My JSON file looks something like:
{
"generator": {
"name": "Xfer Records Serum",
....
},
"generator": {
"name": "Lennar Digital Sylenth1",
....
}
}
I ask the user for a search term, and the input is searched for in the name key only. All matching results are returned, so if I input only 's', both of the entries above would be returned. Also, please explain how to return the names of all objects that are generators. The simpler the method, the better. I use the json library, but if another library is required, that is not a problem.
Before switching to JSON I tried XML but it did not work.
If your goal is just to search all name properties, this will do the trick:
import re
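def search_names(term, lines):
    # Matches lines such as:   "name": "Some Value",
    # re.escape keeps regex metacharacters in the search term literal.
    name_search = re.compile(r'\s*"name"\s*:\s*"(.*' + re.escape(term) + r'.*)",?$', re.I)
    matches = (name_search.search(line) for line in lines)
    return [m.group(1) for m in matches if m]

with open('path/to/your.json') as f:
    lines = f.readlines()
print(search_names('s', lines))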
which would return both names you listed in your example.
The search_names() function builds a regular expression that matches any line containing "name": " (with any amount of whitespace), followed by your search term with any other characters around it, and ending with " plus an optional comma at the end of the line. It applies that expression to each line of the file, filters out the non-matching lines, and returns the value of the name property (the contents of the capture group) for each match.
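Since the question mentions the json library, here is a hedged alternative that parses the file instead of scanning lines. It assumes the file is valid JSON (the sample as posted contains .... elisions, so it would not parse as is). Because the sample repeats the "generator" key, a plain json.loads would keep only the last one; the object_pairs_hook parameter sees every key/value pair before duplicates are collapsed. The name search_names_json is made up for this sketch.

```python
import json

def search_names_json(term, text):
    """Return every "name" value in the JSON text that contains term."""
    names = []

    def collect(pairs):
        # Called once per JSON object with all of its (key, value) pairs,
        # including pairs whose key is repeated.
        for key, value in pairs:
            if key == "name" and isinstance(value, str):
                names.append(value)
        return dict(pairs)

    json.loads(text, object_pairs_hook=collect)
    term = term.lower()
    return [name for name in names if term in name.lower()]

doc = '{"generator": {"name": "Xfer Records Serum"}, "generator": {"name": "Lennar Digital Sylenth1"}}'
print(search_names_json("s", doc))
```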

Get list of all values in a key:value list

So my input looks like
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":""}
{"selling":"0","quantity":"100","price":"80000","date":"1401383271","rs_name":"sammvarnish","contact":"PM","notes":"Seers Bank W321 :)"}
{"selling":"0","quantity":"100","price":"70000","date":"1401383168","rs_name":"pwnoramaa","contact":"PM","notes":""}
and the output I want must look like
0,2,80000,1401384212,overhault,PM,""
0,100,80000,1401383271,sammvarnish,PM,"Seers Bank W321 :)"
0,100,70000,1401383168,pwnoramaa,PM,""
What's the best way to do this in bash?
EDIT: changed my needs.
The new output I want is, for
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":"testnote"}
as input,
rs name: \t overhault
quantity: \t 2
price: \t 80000
date: \t 29-05 19:23
contact: \t PM
notes: \t testnote
Where \t is a tab character (like in echo "\t").
As you can see, this one is a tad bit more complicated.
For example, it changes the order, and requires the UNIX timestamp to be converted to an alternative format.
I'll use any tool you can offer me as long as you explain clearly how I can use it from a bash script. The input will consist of three of such lines, delimited by a newline character, and it must print the output with an empty line between each of the results.
Don't do this with regular expressions/bash, there are JSON parsers for this kind of task. Simple Python example:
import json
data = json.loads('{"selling":"0","quantity":"2"}')
data = ','.join(data.values())
print(data)
I strongly suggest you just use a simple script like this which you make executable and then call.
EDIT: here's a version which preserves the order:
import json
data = json.loads('{"selling":"0","quantity":"2", "price":"80000"}')
orderedkeys = ['selling', 'quantity', 'price']
values = [data[key] for key in orderedkeys]
values = ','.join(values)
print(values)
output:
0,2,80000
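The edited question also asks for a tab-separated layout with a reordered field list and a formatted date. The sketch below is one possible way to do that, not part of the original answer: FIELDS, format_record and format_all are made-up names, and the timestamp is rendered in UTC by default (the asker's 19:23 suggests a UTC+2 local time, so pass a different tz if needed).

```python
import json
from datetime import datetime, timezone

# (label, JSON key) pairs in the order requested in the question.
FIELDS = [("rs name", "rs_name"), ("quantity", "quantity"), ("price", "price"),
          ("date", "date"), ("contact", "contact"), ("notes", "notes")]

def format_record(line, tz=timezone.utc):
    """Format one JSON line as 'label: <tab> value' rows."""
    data = json.loads(line)
    # Convert the UNIX timestamp to day-month hour:minute.
    when = datetime.fromtimestamp(int(data["date"]), tz)
    data["date"] = when.strftime("%d-%m %H:%M")
    return "\n".join(f"{label}: \t {data[key]}" for label, key in FIELDS)

def format_all(text):
    """Format every non-empty line, separating results with an empty line."""
    return "\n\n".join(format_record(l) for l in text.splitlines() if l.strip())

sample = '{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":"testnote"}'
print(format_record(sample))
```

To use it from a bash script, one option is to save the code to a file, replace the sample lines with a call that prints format_all(sys.stdin.read()) (after import sys), and pipe the three input lines into it with python3.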

Formatting a Text in SSRS

I have a text of the form
1;#aa2;#dde4;#sdfsa6;#hjjs
I want to remove digit and ;# from the above string and keep the string as
aa
dde
sdfsa
hjjs
Is there a way, like we do in C#, to check whether the string contains <digit>;# and replace it with a line break or a blank space?
I was trying to split on ;# as
=(Split(Fields!ows_Room.Value,";#")).GetValue(1)
but then the output is only aa2.
You are getting only aa2 because GetValue(1) returns just the array element at index 1.
Change your expression to
= Join(Split(Fields!ows_Room.Value,";#")," ")
If you want the output like
aa2
dde4
sdfsa6
hjjs
use this expression
= Join(Split(Fields!ows_Room.Value,";#"),VBCRLF)
Try the following expression.
=Join(Split((System.Text.RegularExpressions.Regex.Replace(Fields!ows_Room.Value, "[0-9]", "").Trim(";").Trim("#")),";#")," ")
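The steps in that last expression (remove the digits, trim the leading ; and #, split on ;#) can be traced outside SSRS. The following is a Python illustration of the same logic, not an SSRS expression; the variable names are made up. Note that removing every digit also strips digits that are part of a name.

```python
import re

value = "1;#aa2;#dde4;#sdfsa6;#hjjs"

no_digits = re.sub(r"[0-9]", "", value)      # ";#aa;#dde;#sdfsa;#hjjs"
trimmed = no_digits.strip(";").strip("#")    # "aa;#dde;#sdfsa;#hjjs"
parts = trimmed.split(";#")

# One name per line, as requested in the question.
print("\n".join(parts))
```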