Regex to match numbers and periods within quotes - json

I will admit I am new to regex, but I can't figure this one out.
I am trying to regex the "tag_name" contents to grab only the numbers and periods from the GitHub API so I can manage and track versioning through my Python app. They appear like this when accessing the API / JSON:
"tag_name": "jenkins-docker-packaging-2.235.1",
When I use the regex:
\"tag_name\":\s\"(\S+(\S.*))\",
I would like to grab the numbers and periods only, but the inner group only matches the 1 at the end, like this:
Screen capture of result
This has me stumped, because I thought the (\S.*) group would capture any character while being greedy.
Any help greatly appreciated.

Anchor on "tag_name" at the start of the line, then capture the run of digits and periods ([0-9.]) that follows the final - in the tag:
^"tag_name".*\-([0-9.]+)
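As for why the original pattern leaves only the 1: \S+ is greedy, so it takes as much of the tag as it can and gives back just one character, which is all that (\S.*) needs in order to succeed. Since the question mentions a Python app, here is a rough sketch of two ways to get the version there. The payload string and the helper name extract_version are made up for illustration; a real API response has many more fields.

```python
import json
import re

# Hypothetical, heavily shortened response body (assumption for this sketch).
payload = '{"tag_name": "jenkins-docker-packaging-2.235.1", "draft": false}'

# Dotted number at the very end of the tag, right after a hyphen.
VERSION_AT_END = re.compile(r'-([0-9]+(?:\.[0-9]+)*)$')


def extract_version(tag_name):
    """Return the trailing dotted version of a tag, or None if there is none."""
    match = VERSION_AT_END.search(tag_name)
    return match.group(1) if match else None


# Option 1: let the JSON parser find the field, then regex only the value.
tag = json.loads(payload)["tag_name"]
version = extract_version(tag)
print(version)

# Option 2: regex over the raw text, without the ^ anchor so that
# indented JSON lines still match.
RAW_PATTERN = re.compile(r'"tag_name":\s*"[^"]*-([0-9.]+)"')
raw_match = RAW_PATTERN.search(payload)
print(raw_match.group(1))
```

Option 1 tends to be more robust, because it does not depend on how the JSON happens to be spaced or ordered.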

Related

How to create RegEx with SubMatches of the same Match that capture 2 different types of output?

I'm trying to get my Jira data via JSON REST API into Excel, i.e. using VBA, and I'm parsing JSON output using RegEx. There are plenty of useful tutorials on the web, and after a couple of days I do have more or less working solution I'm happy with, except one minor obstacle. Long story short:
Among many issue fields I need the friendly Assignee name, but some issues in my projects may be Unassigned, which results in TWO VERY different kinds of JSON output:
Unassigned issue:
..."assignee":null,"updated"...
Assigned issue:
"assignee":{
"self":...
<Lots of NOT needed fields here>
...
},
"displayName":"Doe, John", <-- That's what I need, name only part
"active":...
<Lots of NOT needed fields here>
...
},
"updated"...
Well, I suppose that something like:
"assignee".*?"displayName":"(.*?)"|"assignee":(.*?),"updated"
will handle the job by producing TWO possible Matches, but... Is there a way to create RegEx where ANY of output options will result in SubMatches of ONE Match?
I'm a total newbie to RegEx, so sorry if the wording of my question is silly due to incorrectly used terms. Anyway, I hope the sample part is more or less clear, and I'll be extremely grateful for useful suggestions.
After an hour of tryouts on regex101 I ended up with the following RegEx:
"assignee":(null|.*?"displayName":"(.*?)","active")
It is probably ugly and could be improved, but it DOES the job, and it does NOT disturb the indexes of subsequent Matches in the collection, so the rest of the code keeps working as it does now.
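To illustrate how that single pattern behaves for both kinds of output, here is a small sketch in Python rather than VBA. The two sample strings and the function name assignee_name are invented for the demonstration. The re.DOTALL flag is an addition on the Python side so that pretty-printed JSON also works; the VBScript RegExp dot does not match newlines.

```python
import re

# Same pattern as above: group 1 is either "null" or the whole assignee
# fragment, group 2 is the display name (unset when the issue is unassigned).
ASSIGNEE = re.compile(
    r'"assignee":(null|.*?"displayName":"(.*?)","active")',
    re.DOTALL,
)


def assignee_name(issue_json):
    """Return the display name, or None for an unassigned issue."""
    match = ASSIGNEE.search(issue_json)
    if match is None:
        return None
    return match.group(2)  # None when the first alternative ("null") matched


# Made-up, shortened samples of the two shapes described in the question.
unassigned = '{"assignee":null,"updated":"2020-06-01"}'
assigned = (
    '{"assignee":{"self":"https://jira.example.invalid/user?u=jdoe",'
    '"avatarUrls":{"16x16":"a.png"},'
    '"displayName":"Doe, John","active":true},"updated":"2020-06-01"}'
)

print(assignee_name(unassigned))
print(assignee_name(assigned))
```

Either way there is exactly one match per issue, so the positions of later matches in the collection do not shift.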

Is there a proper way to highlight # hashtags and @ users in text?

I'm studying Rails and trying to write a Twitter-like app.
Is there an easy way to highlight hashtags starting with # and usernames starting with @, and make URLs out of them?
I couldn't find a proper gem. Or should I make it my own?
For example: https://twitter.com/Xaput/status/383695262796873728
More details would be helpful.
You could catch them with regex.
Learn regular expressions. They're worth your time.
http://rubular.com/
http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ
You didn't tag this with HTML, but that sounds like a large aspect of what you're asking.
Use String#scan
You can use Ruby's String#scan method to find all matching expressions within a Tweet. For example:
str = 'Foo #bar! Baz @quux. #foobar1'
hashtags = str.scan(/#[_\p{Alpha}][\p{Alnum}_]+/)
#=> ["#bar", "#foobar1"]
users = str.scan(/(?<=[\A\s\\.])@[_\p{Alnum}][^\p{Punct}\p{Blank}]{,14}/)
#=> ["@quux"]
This will catch the majority of valid usernames and hashtags, but there may be edge cases where the expected results are ambiguous (e.g. non-mentions like \@foo, or weird-but-valid emails like foo.@example.com). In such cases, you will need to adapt the regular expression or perform some additional validation on the results. Your mileage may vary.
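The question also asks about turning the matches into URLs. A minimal sketch of that step follows, written in Python rather than Ruby. The /hashtags/ and /users/ paths and the function name linkify are assumptions, and the patterns are deliberately simpler than the ones above: ASCII letters only for the first hashtag character, and at most 15 word characters for a username.

```python
import html
import re

# '#' not preceded by a word character or '&' (so HTML entities are left alone).
HASHTAG = re.compile(r'(?<![\w&])#([A-Za-z_]\w*)')
# '@' not preceded by a word character, '.' or '\' (skips emails and \@foo).
MENTION = re.compile(r'(?<![\w.\\])@(\w{1,15})')


def linkify(text):
    """Escape the text for HTML, then wrap hashtags and mentions in links."""
    safe = html.escape(text, quote=False)
    safe = HASHTAG.sub(r'<a href="/hashtags/\1">#\1</a>', safe)
    safe = MENTION.sub(r'<a href="/users/\1">@\1</a>', safe)
    return safe


print(linkify('Foo #bar! Baz @quux. #foobar1'))
```

Escaping before substituting keeps user-supplied markup out of the output, and the captured names contain only word characters, so they are safe to place inside the href.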
References
http://www.hashtags.org/platforms/twitter/what-characters-can-a-hashtag-include/
https://support.twitter.com/articles/101299-why-can-t-i-register-certain-usernames

Passing a command line argument (as a string) into my Perl script

I'm extremely new at Perl and trying to prove I can pick it up quickly. What I was asked to do is add a string as an argument on my command line, and then feed that into my script. From there it is supposed to search a MySQL table I've made for matches in one column, and spit the contents of another column into an array. It was suggested I use Getopt::Std, but I'm uncertain how exactly to do that, and whether that's the best technique.
For example: I have a MySQL table with car manufacturers and car models. I want to run perl myscript.pl Ford, and then have it shoot me back an array with
Mustang
Escape
Focus
But I'm uncertain how to get that string input in the first place. Would Getopt::Std be best? If so, how would it be written? I'm picking this up quickly, but I've been at it less than a week, so the simpler the explanation, the better.
Edit: Basically I was confused why it was suggested I should use Getopt::Std for this. It seems to be completely inappropriate for what I'm trying to do.
Getopt::Std is overkill for this. Your command line arguments are in @ARGV. If you haven't been able to work that out after a week, then you need better references for Perl.
The first argument will be in $ARGV[0], the second in $ARGV[1] , and so on.
You should check the DBI module. Google for some tutorial out there.
Then try to write your script and post more specific questions with some code if you need more help.
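For readers who want to see the whole flow in one place (read the first argument, run a parameterized query, collect one column into a list), here is a rough sketch in Python rather than Perl. The table name cars, its columns, and the sample rows are invented, and sqlite3 stands in for MySQL only so that the example runs without a server; with Perl and DBI the shape would be similar.

```python
import sqlite3
import sys


def build_demo_db():
    """Create an in-memory stand-in for the MySQL table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (make TEXT, model TEXT)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?)",
        [("Ford", "Mustang"), ("Ford", "Escape"),
         ("Ford", "Focus"), ("Honda", "Civic")],
    )
    return conn


def models_for(conn, make):
    """Return every model whose manufacturer column equals `make`."""
    rows = conn.execute(
        "SELECT model FROM cars WHERE make = ? ORDER BY rowid", (make,)
    )
    return [row[0] for row in rows]


def main(argv):
    # argv[0] is the script name; argv[1] is the first real argument,
    # the counterpart of $ARGV[0] in Perl.
    if len(argv) != 2:
        sys.exit("usage: myscript.py MAKE")
    return models_for(build_demo_db(), argv[1])


# Simulates running: python myscript.py Ford
print(main(["myscript.py", "Ford"]))
```

The placeholder in the query matters: passing the argument as a bound parameter, rather than pasting it into the SQL string, avoids quoting problems and SQL injection.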

Perl::Mechanize: running a simple crawler with a loop [multiple queries]

I am currently working out a way to parse the data of a page: http://www.foundationfinder.ch/
I would love to do it in Perl, and I am wondering which is the best way to do the job.
I guess I am facing a steep learning curve. ;) This task will give me some nice Perl lessons. At the moment it goes a bit over my head... ;-)
So here is a sample-page:
... and since I think I can find all 790 result pages within a certain range, between Id=0 and Id=100000, I thought I could do it with a loop:
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html
I thought I could go the Perl way, but I am not sure. I was trying to use LWP::UserAgent on the same URLs [see below] with different query arguments, and I am wondering whether LWP::UserAgent provides a way to loop through the query arguments. I am not sure that LWP::UserAgent has a method for that. I have sometimes heard that it is easier to use Mechanize. But is it really easier?
BTW, if I went the PHP way I could do it with cURL, couldn't I?
Here is my approach. I tried to figure it out and dug deeper into the man pages and how-tos. We can have a loop that constructs the URLs and use cURL repeatedly.
As noted above: here we have some resultpages;
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
Alternatively we can add a request_prepare handler that computes and adds the query arguments before we send out the request.
Again, the aim: I want to parse the data and afterwards store it in a local MySQL database.
Should I define an extern_uid?
And go like this:
for my $i (0..10000) {
$ua->get('http://www.foundationfinder.ch/ShowDetails.php?Id=', id => 21, extern_uid => $i);
# process reply
}
Well, now I am stuck and need help. Can I do the job like this?
regards
zero
Don't do it like this. Use Live HTTP Headers (a Firefox plugin) or an equivalent to see what the JavaScript does behind the scenes while you select what you need to get to that page (with the table).
To get the data from the table, use HTML::TableExtract, or HTML::TreeBuilder::XPath if you want to use XPath.
If you do want to iterate over the queries, just build the URL from another variable:
my $url = 'http://www.foundationfinder.ch/ShowDetails.php?Id=' . $q . '&InterfaceLanguage=&Type=Html';
and increment $q as you go; make sure the page is valid before trying to load it with get.
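The loop described here (build the URL from a counter, skip ids that do not load) could look roughly like the following, sketched in Python rather than Perl. The helper names detail_url, http_get and fetch_pages are made up, and the fetch function is passed in as a parameter so that the loop can be tried out without touching the network.

```python
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://www.foundationfinder.ch/ShowDetails.php"


def detail_url(record_id, language="1"):
    """Build the detail-page URL for one Id."""
    query = urlencode(
        {"Id": record_id, "InterfaceLanguage": language, "Type": "Html"}
    )
    return f"{BASE}?{query}"


def http_get(url, timeout=10):
    """Download one page and return its text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_pages(ids, fetch=http_get):
    """Yield (id, html) for every id whose page loads and is not empty."""
    for record_id in ids:
        try:
            body = fetch(detail_url(record_id))
        except URLError:  # also covers HTTPError, which subclasses it
            continue
        if body:
            yield record_id, body


print(detail_url(927))
```

A real crawl over tens of thousands of ids should also pause between requests and respect the site's terms. Parsing the returned HTML is a separate step.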

Parsing HTML content into a MySQL database using a parser

I want to be able to parse specific content from a website into a MySQL database. For example, on site http://allrecipes.com/Recipe/Fluffy-Pancakes-2/Detail.aspx I want to parse into my database (which has a table with columns RecipeName, Ingredients 1-10).
So basically my database will contain the name and all the ingredients for that recipe. There is no need to edit the content, simply parse them in as is (i.e. 3/4 cup milk), since I am using character columns in my database.
How exactly do I go about doing this? I was looking at pre-built parsers and it seems it's tough to find one that's easy to use, since I am fairly new to programming. Of course, I can manually enter the values, but I want to parse them in.
Would it be possible to just parse this content and write a file that has a RecipeName, Ingredient string which I can then parse into my database? Or should I just do it directly into the database? I am also unsure how to connect a database directly to a parser, but I might be able to find some information online.
Basically, I am looking for help on how to exactly go about doing this since I am not very well versed in programming and this seems to be a lot more complicated than it might be.
I am using Java as my main language right now, although I can't say I am very good at it. But I should be able to understand the basic concepts.
Any suggestions on what parser to use or how to do this?
Thanks!
This is how I would do it in PHP. This is almost certainly NOT the most efficient way to do it, nor has it been debugged.
function parseHTML($rawHTML){
$startPosition = strpos($rawHTML,'<div class="ingredients"'); //Find the position of the beginning of the ingredients list, return the character number.
if ($startPosition === false) { return ''; } //No ingredients list on this page
$endPosition = strpos($rawHTML,'</div>',$startPosition); //Find the position of the end of the ingredients list, begin searching from the beginning of the list (found in step 1)
if ($endPosition === false) { return ''; } //The list is never closed, so give up
$relevantPart = substr($rawHTML,$startPosition,$endPosition - $startPosition); //Isolate the ingredients list (the third argument of substr is a length, not an end position)
$parsedString = strip_tags($relevantPart); //Strip the HTML tags off of the ingredients list
return $parsedString;
}
Still to be done: You say you have a MySQL database with 10 separate ingredients columns. This code outputs everything as one big string. You would have to change the strip_tags($relevantPart) call to strip_tags($relevantPart,"<li>"). That would let the <li> tags through. Then, you would have to loop through every <li> tag, performing a similar operation to this one. It shouldn't be too hard, but I don't feel comfortable writing it with no functioning PHP server.
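The remaining step, collecting each <li> inside the ingredients block as a separate value, can be sketched with an HTML parser instead of string positions. The version below uses Python's standard html.parser. The class name IngredientParser, the helper parse_ingredients, and the sample markup are assumptions; the real page structure may differ from the div class="ingredients" that the PHP above expects.

```python
from html.parser import HTMLParser


class IngredientParser(HTMLParser):
    """Collect the text of every <li> inside <div class="ingredients">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # div nesting depth inside the ingredients block
        self.in_item = False  # True while inside an <li> of that block
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth or "ingredients" in classes:
                self.depth += 1
        elif tag == "li" and self.depth:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def parse_ingredients(raw_html):
    """Return the ingredient strings with whitespace normalized."""
    parser = IngredientParser()
    parser.feed(raw_html)
    return [" ".join(item.split()) for item in parser.items if item.strip()]


# Made-up markup in the shape the PHP function assumes.
sample = (
    '<h1>Fluffy Pancakes</h1>'
    '<div class="ingredients"><ul>'
    '<li>3/4 cup milk</li><li> 1 cup <b>flour</b> </li>'
    '</ul></div>'
    '<ul><li>not an ingredient</li></ul>'
)
print(parse_ingredients(sample))
```

Each element of the returned list can then be bound to one of the Ingredients columns in a parameterized INSERT.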