I have a text of the form
1;#aa2;#dde4;#sdfsa6;#hjjs
I want to remove the digits and the ;# separators from the above string, and keep it as
aa
dde
sdfsa
hjjs
Is there a way, like in C#, to check whether the string contains <digit>;# and replace it with nothing or a blank space?
I was trying to split on ;# as
=(Split(Fields!ows_Room.Value,";#")).GetValue(1)
but then the output is only aa2.
You are getting only aa2 because GetValue(1) returns just the single element at index 1 of the split array.
Change your expression to
= Join(Split(Fields!ows_Room.Value,";#"), " ")
If you want the output like
aa2
dde4
sdfsa6
hjjs
use this expression
= Join(Split(Fields!ows_Room.Value,";#"),VBCRLF)
Give the following expression a try.
=Join(Split((System.Text.RegularExpressions.Regex.Replace(Fields!ows_Room.Value, "[0-9]", "").Trim(";").Trim("#")),";#"), " ")
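The strip-digits-then-split idea is easy to check outside SSRS as well. A minimal Python sketch (the sample string is taken from the question):

```python
import re

s = "1;#aa2;#dde4;#sdfsa6;#hjjs"
# Remove all digits, split on ";#", and drop any empty pieces.
parts = [p for p in re.sub(r"[0-9]", "", s).split(";#") if p]
print("\n".join(parts))
```

This prints aa, dde, sdfsa, hjjs on separate lines, matching the desired output.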
I want to check whether any string from an array occurs in a dataset, and print the rows where those strings occur.
rareTitles = {"Capt", "Col", "Countess", "Don", "Dr", "Jonkheer", "Lady",
"Major", "Mlle", "Mme", "Ms", "Rev", "Sir"}
dataset[rareTitles in (dataset['Title'])]
I am getting following error:
TypeError: unhashable type: 'set'
First of all, I think the comparison should go the other way around: you look for rows where dataset['Title'] contains a string from rareTitles.
You can use the str attribute of a pandas Series, which lets you use string methods, like contains. Since this method also accepts a regular-expression pattern, you can pass it something like 'Capt|Col...'. To join all elements of a set into such a pattern, use the str.join() method.
So the solution would be
dataset[dataset['Title'].str.contains('|'.join(rareTitles))]
Link to documentation: pandas.Series.str.contains
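As a quick self-contained check, here is a sketch with a hypothetical dataset (the Title column is made up; rareTitles comes from the question):

```python
import pandas as pd

rareTitles = {"Capt", "Col", "Countess", "Don", "Dr", "Jonkheer", "Lady",
              "Major", "Mlle", "Mme", "Ms", "Rev", "Sir"}

# Hypothetical stand-in for the real dataset.
dataset = pd.DataFrame({"Title": ["Mr", "Capt", "Mrs", "Rev", "Miss"]})

rare_rows = dataset[dataset["Title"].str.contains("|".join(rareTitles))]
print(rare_rows)
```

One caveat: contains does substring matching, so "Don" would also match "Donald". If that matters, wrap the pattern in word boundaries, e.g. r"\b(?:" + "|".join(rareTitles) + r")\b".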
I have a field with varying strings of concatenated text that I need to delimit. I need the phrase and the count of how many times that phrase appeared split into two separate fields, repeating the same process for every additional phrase.
Example of table field text:
"some text":2; some:other NEAR text:1;
Desired Results:
[Field 1]: "Some Text", [Field 2]: 2, [Field 3]: some:other NEAR text, [Field 4]: 1
The problem I am having is that when I use ":" and ";" to delimit the field with the Len, InStr, InStrRev, Left, Right and Mid functions, it splits the "some:other NEAR text" string into "some" and "other NEAR text". Is there a way around this, or should I go about it another way? Any help is appreciated.
Is this a one-time fix of bad data to parse into discrete fields? You should show your attempted code.
Assuming every record has a value in the example structure, try the following (x represents your concatenated data field):
Field1: Left(x, InStr(x, ":")-1)
Field2: Val(Mid(Left(x, InStr(x, ";")),InStrRev(Left(x, InStr(x, ";")),":")+1))
Field3: Mid(x, InStr(x, ";")+2, Len(Mid(x, InStr(x, ";")+2))-Len(Mid(x,InStrRev(x,":"))))
Field4: Val(Mid(x,InStrRev(x,":")+1))
Otherwise, you might have to build a custom VBA function.
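If a one-off cleanup outside Access is acceptable, the same anchor-on-the-last-colon idea sketches easily in Python (the sample string comes from the question; the output is simplified to a flat list rather than separate fields):

```python
s = '"some text":2; some:other NEAR text:1;'

fields = []
for chunk in s.split(";"):
    chunk = chunk.strip()
    if not chunk:
        continue
    # Split on the LAST colon, so colons inside the phrase survive.
    phrase, count = chunk.rsplit(":", 1)
    fields.append(phrase.strip('"'))
    fields.append(int(count))
print(fields)
```

Splitting on the last colon of each chunk is exactly what the InStrRev-based expressions above do: the count always follows the final ":".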
I've tried to thoroughly research this question before asking it. I'm trying to plot the ratio of two lists that are contained in a dictionary.
import csv
from numpy import asarray

line_ids = ['blah1', 'blah2', 'blah3', 'blah4']
elines = {}
for i in range(0, len(line_ids)):
    data = []
    with open('../output/' + line_ids[i] + '.csv', 'rb') as f:
        csvReader = csv.reader(f, delimiter='\t')
        for row in csvReader:
            data.append(row)
    elines[line_ids[i]] = asarray(data)
printing elines['blah1'] from the dictionary gives
[['4.6976281459143071e-40' '3.0049306872382702e-39'
'1.9820026838968144e-38' '1.6041105541709449e-37']
['1.542746402089586e-35' '9.8686046391594954e-35' '6.5092653777796069e-34'
'5.2672534967984846e-33']
['5.1441760072407447e-31' '3.2907875381847918e-30'
'2.1708144830971927e-29' '1.7560195950953601e-28']
['1.7569718535756951e-26' '1.1242245080095899e-25'
'7.4206530085692796e-25' '6.0042313952458629e-24']
['6.2845797115752487e-22' '4.0257542124265526e-21' '2.66528748586604e-20'
'2.1666107897620966e-19']
['2.5547831152324016e-17' '1.6442300355147718e-16'
'1.1022166700730511e-15' '9.1504695119123154e-15']
['1.5754213462395474e-12' '1.0263716591948211e-11'
'7.0896658599931989e-11' '6.1192748118049791e-10']
['2.1154710788925884e-07' '1.3897595085341154e-06'
'9.7645963829243462e-06' '8.3998195937762357e-05']
['0.048187475948250416' '0.31578185949368143' '2.1989098794898618'
'18.120232380010545']
['13029.442003642062' '84972.769876238017' '583770.26053237868'
'4613639.5426874915']
['3726334731.7746887' '24202150828.792419' '164441556532.18036'
'1258809063091.2998']
['1095752351035507.6' '7094645944427608.0' '47806778370222816.0'
'3.5753379508453267e+17']
['3.2816291840091796e+20' '2.1198307401280088e+21'
'1.4197379061068677e+22' '1.0439706837407766e+23']
['9.9600859087036886e+25' '6.4228680979461599e+26'
'4.2823746950774039e+27' '3.1104137359015335e+28']
['3.0534668934520022e+31' '1.9665558653862894e+32'
'1.3068413234720059e+33' '9.406936707924414e+33']
['9.4324018968341618e+36' '6.0691075771818466e+37'
'4.0232499374256741e+38' '2.8769493880535716e+39']]
When I try to divide two lists, I get the following when running the script
print divide(elines['blah1'][0],elines['blah2'][0])
NotImplemented
I thought it might have to do with the numbers being treated as strings within the list, so I tried converting them with the float() function, but I get an error saying only length-1 arrays can be converted to Python scalars. Ideally, I'd like to plot Column 1 of blah1 vs. Column 1 of blah2, Column 2 of blah1 vs. Column 2 of blah2, etc.
Any help would be greatly appreciated. Thanks!
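No answer is recorded for this one, but the errors point at the fix: csv.reader yields strings, so the arrays hold strings and NumPy cannot divide them (hence NotImplemented), while float() fails because it converts single values, not whole arrays. Converting each array with astype(float) is one way to sketch it (the values below are made-up stand-ins for elines['blah1'][0] and elines['blah2'][0]):

```python
import numpy as np

a = np.asarray(['2.0', '4.0', '8.0'])   # string-typed, as csv.reader produces
b = np.asarray(['1.0', '2.0', '4.0'])

# astype(float) converts the whole array element-wise, then division works.
ratio = a.astype(float) / b.astype(float)
print(ratio)
```

The same conversion applied per column would allow the column-vs-column plots described above.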
I want to import a lot of information from a CSV file into Elasticsearch.
My issue is that I don't know how to use an equivalent of substring to select information from a CSV column.
In my case I have a date field (YYYYMMDD) and I want to turn it into (YYYY-MM-DD).
I use filter, mutate, gsub like:
filter {
  mutate {
    gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "[0123456789][0123456789][0123456789][0123456789]-[0123456789][0123456789]-[0123456789][0123456789]"]
  }
}
But my result is wrong.
I can identify my string, but I don't know how to extract parts of it.
My target is to have something like:
gsub => ["date", "[0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789][0123456789]", "%{date}(0..3)-%{date}(4..5)-%{date}(6..7)"]
%{date}(0..3): select the first four characters of the CSV column date
You can use the ruby filter plugin to do the conversion. As you say, you will have a date field, so we can use it directly in ruby:
filter {
  ruby {
    code => "
      date = Time.strptime(event['date'], '%Y%m%d')
      event['date_new'] = date.strftime('%Y-%m-%d')
    "
  }
}
The date_new field is in the format you want.
First, you can use a regexp range to match a sequence, so rather than [0123456789], you can do [0-9]. If you know there will be 4 numbers, you can do [0-9]{4}.
Second, you want to "capture" parts of your input string and reorder them in the output. For that, you need capture groups:
([0-9]{4})([0-9]{2})([0-9]{2})
where parens define the groups. Then you can reference those on the right side of your gsub:
\1-\2-\3
\1 is the first capture group, etc.
You might also consider getting these three fields when you do the grok{}, and then putting them together again later (perhaps with add_field).
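The capture-group rewrite works the same way in any regex engine; here is a quick Python sketch of the pattern above (the sample date value is made up):

```python
import re

date = "20160229"  # hypothetical YYYYMMDD value
# Three capture groups, referenced as \1, \2, \3 in the replacement.
new_date = re.sub(r"([0-9]{4})([0-9]{2})([0-9]{2})", r"\1-\2-\3", date)
print(new_date)
```

This prints 2016-02-29, which is the same rewrite the gsub replacement performs in Logstash.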
If I only specify carriage return (\r) in the String Tokenizer like this:
StringTokenizer st1 = new StringTokenizer(line,"\r");
where 'line' is the input string.
When I provide the following text as input:
Hello
Bello
Cello
i.e. with two carriage returns (I press 'Enter' after Hello and Bello).
But the output of System.out.println(st1.countTokens()) is 3.
Is there an explanation?
When you split a string on a separator that occurs n times, the number of elements after the split will be n + 1. Look at this visual example, using a comma as separator:
text1,text2,text3,text4
It will yield 4 results
Look at another example:
text1,text2,text3,
It will yield 4 results as well, the last being an empty string.
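The n separators → n + 1 pieces rule is easy to verify with a plain split (note that Java's StringTokenizer, unlike String.split, silently drops empty tokens, which is why a trailing separator would not change its count):

```python
tokens = "Hello\rBello\rCello".split("\r")
print(len(tokens))   # two \r separators give three pieces

trailing = "text1,text2,text3,".split(",")
print(trailing)      # four pieces; the last one is an empty string
```

So the 3 reported by countTokens() is exactly what two carriage returns should produce: three tokens, one more than the number of separators.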