grepl to search strings containing more than 2 patterns that include parentheses in R

Good afternoon,
I want to search for strings that contain the pattern "1)2" two times.
I tried:
grepl('(1.+\).+2){2}', variable, fixed=TRUE)
Thank you very much in advance!

How about this:
> grepl(".*1\\)2.*1\\)2.*", c("bla1)2bla", "bla1)2blabla1)2bla", "blablabla"))
[1] FALSE TRUE FALSE
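If you'd rather not spell the pattern out twice, a repetition quantifier or an explicit occurrence count works as well; a minimal sketch (the test strings are just made-up data):
> x <- c("bla1)2bla", "bla1)2blabla1)2bla", "blablabla")
> grepl("(1\\)2.*){2}", x)    # repeat the escaped pattern with a quantifier
[1] FALSE TRUE FALSE
> lengths(regmatches(x, gregexpr("1)2", x, fixed = TRUE))) >= 2    # or count literal occurrences
[1] FALSE TRUE FALSE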

Related

Extract string from csv file after reading in Prolog

Good evening,
I am trying to read, in Prolog, a csv file containing all the countries in the world. Executing this code:
read_KB(R) :- csv_read_file("countries.csv",R).
I get a list of Terms of this type:
R = [row('Afghanistan;'), row('Albania;'), row('Algeria;'), row('Andorra;'), row('Angola;'), row('Antigua and Barbuda;'), row('Argentina;'), row('Armenia;'), row(...)|...].
I would like to extract only the names of each country in form of a String and put all of them into a list of Strings.
I tried to handle only the first row by executing this:
read_KB(L) :-
    csv_read_file("/Users/dylan/Desktop/country.csv", R),
    give(R, L).

give([X|T], X).
I obtain only a single term of the form row('Afghanistan;').
You can use maplist/3:
read_KB(Names) :-
    csv_read_file('countries.csv', Rows, [separator(0';)]),
    maplist([row(Name,_), Name] >> true, Rows, Names).
The answer given by @slago can be simplified by using arg/3 instead of a lambda expression, making it slightly more efficient:
read_KB(Names) :-
    csv_read_file('countries.csv', Rows, [separator(0';)]),
    maplist(arg(1), Rows, Names).
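A quick illustration of arg/3 paired with maplist/3 at the toplevel (the row terms below are just sample data, not the real file contents):
?- maplist(arg(1), [row('Afghanistan', ''), row('Albania', '')], Names).
Names = ['Afghanistan', 'Albania'].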

How can I verify that an input string contains numbers (0-9) and multiple dots in Tcl

I get three different entries, "10576.53012.46344.35174", "10" and "Doc-15", in a foreach loop. Out of these three entries, I want 10576.53012.46344.35174. How can I verify that the current string contains multiple dots and numbers?
I'm new to Tcl; any suggestions are welcome.
This is the sort of task that is a pretty good fit for regular expressions.
The string 10576.53012.46344.35174 is matched by an RE like this: ^\d{5}(?:\.\d{5}){3}$, though you might want something a little less strict (e.g., allowing more flexibility in the number of digits per group, here 5, or in the number of groups following a ., here 3).
You test if a string matches a regular expression with the regexp command:
if {[regexp {^\d{5}(?:\.\d{5}){3}$} $theVarWithTheString]} {
puts "the regular expression matched $theVarWithTheString"
}
An alternative approach is to split the string by . and check that each group is what you want:
set goodBits 0
set badString 0
foreach group [split $theVarWithTheString "."] {
    if {![string is integer -strict $group]} {
        set badString 1
        break
    }
    incr goodBits
}
if {!$badString && $goodBits == 4} {
    puts "the string was OK"
}
I greatly prefer the regular expression approach myself (with occasional help from string is as appropriate). Writing non-RE validators can be challenging and tends to require a lot of code.
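To tie this back to the question, here is a sketch of filtering the three entries inside a foreach loop; the regular expression is deliberately looser than the one above, and the entry values are just the examples from the question:
foreach entry {10576.53012.46344.35174 10 Doc-15} {
    # keep only strings made of digit groups separated by at least two dots
    if {[regexp {^\d+(?:\.\d+){2,}$} $entry]} {
        puts "matched: $entry"
    }
}
# prints: matched: 10576.53012.46344.35174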

Select different values from different tags with XPath

I am trying to get an XPath expression for this element:
I am expecting the output to be:
Brands
Lazada
in one query, not in multiple queries.
The best I found so far is:
//li[text()='Brands'] and //em[text()='Lazada']
but this returns
1
An expression like node and node returns a boolean (true if both nodes are found, false if at least one is not). Your tool apparently renders that boolean as 1 (true) or 0 (false).
To get the required text, you can try
normalize-space(//ul[@id="ui-id-1"])
or
concat(//ul[@id="ui-id-1"]/li[1]/text(), " ", //ul[@id="ui-id-1"]//em/text())

How to parse a string in MySQL

So the problem is I have a column that contains a snapshot:
<p>
<t8>xx</t8>
<s7>321</s7>
<s1>6</s1>
<s2>27</s2>
<s4>73</s4>
<t1>noemail@noemail.com</t1>
<t2>xxxxx</t2>
<t3>xxxxxx</t3>
<t11>xxxxxxxx</t11>
<t6>xxxxxxxx</t6>
<t7>12345</t7>
<t9>1234567890</t9>
</p>
I need to parse this string in MySQL so that I can count the number of times noemail.com occurs. I am not familiar with parsing, so please explain as best you can.
You can do it by removing the searched substring and comparing the lengths. For example:
set @str = '<p>
<t8>xx</t8>
<s7>321</s7>
<s1>6</s1>
<s2>27</s2>
<s4>73</s4>
<t1>noemail@noemail.com</t1>
<t2>xxxxx</t2>
<t3>xxxxxx</t3>
<t11>xxxxxxxx</t11>
<t6>xxxxxxxx</t6>
<t7>12345</t7>
<t9>1234567890</t9>
</p>';
set @find = 'noemail@noemail.com';
select (length(@str) - length(replace(@str, @find, '')))/length(@find) AS NumberOfTimesEmailAppears;
I think there is sadly no more elegant solution (note that a database system is not designed to parse strings; that is mostly the job of a scripting language).
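If the snapshot lives in a table column rather than a user variable, the same length trick can be summed over the rows; the table and column names below are just placeholders:
select sum((length(snapshot) - length(replace(snapshot, 'noemail.com', '')))
           / length('noemail.com')) as total_occurrences
from my_table;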

How to delete the \n\t\t\t in the results from website data collection?

I want to retrieve the product names from the website, so I wrote the code below, but the result includes some extraneous characters such as \n\t\t\t. Can someone help me delete this stuff?
code:
# retrieve name
reddoturl <- 'http://red-dot.de/pd/online-exhibition/?lang=en&c=163&a=0&y=2013&i=0&oes='
library(XML)
doc <- htmlParse(reddoturl)
# review data
reviews <- xpathSApply(doc, '//div[@class="work_contaienterner_headline"]', xmlValue)
results:
[1] "VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
I worry a bit about removing all tabs but this would do it:
> reviews <- "VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
> reviews <- gsub( "\\\t", "", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\n\nDocument Camera\n\n"
Read ?regex and understand that the extra backslashes are needed because both R and regex use "\" as an escape, so there are two levels of character parsing on the way to a pattern. That's not the case in the replacement argument, though, so you don't need to use doubled escapes there. So if you then wanted to replace those "\n\n"s with just one "\n" you could use:
> reviews <- gsub( "\\\n\\\n", "\n", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\nDocument Camera\n"
The go-to functions for "find and replace" operations on strings in R are sub (to replace just the first instance) and gsub (to replace all instances). These functions look for a pattern, given as a regular expression, in the string and replace it with a fixed string of text.
For example:
s <- "VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
gsub('\t|\n', '', s)
[1] "VZ-C6 / VZ-C3DDocument Camera"
The pipe operator (|) in the pattern above, \t|\n, ensures that either \t or \n is matched, and the second argument '' says to replace matches with an empty string (i.e. nothing).
While s above contains just a single element, gsub and sub are vectorised and so will also work on an entire vector of arbitrary length.
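If you want to keep a separator between the name and the description while still dropping the runs of whitespace, one option is to collapse each run to a single space and then trim the ends; a sketch, using the same s as above:
trimws(gsub("[\t\n]+", " ", s))   # collapse whitespace runs to one space, then trim the ends
[1] "VZ-C6 / VZ-C3D Document Camera"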