How to create RegEx with SubMatches of the same Match that capture 2 different types of output?

I'm trying to get my Jira data via the JSON REST API into Excel, i.e. using VBA, and I'm parsing the JSON output using RegEx. There are plenty of useful tutorials on the web, and after a couple of days I have a more or less working solution I'm happy with, except for one minor obstacle. Long story short:
Among many issue fields I need the friendly Assignee name, but some issues in my projects may be Unassigned, which obviously results in TWO VERY different kinds of JSON output:
Unassigned issue:
..."assignee":null,"updated"...
Assigned issue:
"assignee":{
"self":...
<Lots of NOT needed fields here>
...
},
"displayName":"Doe, John", <-- That's what I need, name only part
"active":...
<Lots of NOT needed fields here>
...
},
"updated"...
Well, I suppose that something like:
"assignee".*?"displayName":"(.*?)"|"assignee":(.*?),"updated"
will handle the job by producing TWO possible Matches, but... Is there a way to create a RegEx where ANY of the output options will result in SubMatches of ONE Match?
I'm a total newbie to RegEx, so sorry if the wording of my question is silly due to incorrectly used terms. Anyway, I hope the sample part is more or less clear, and I'll be extremely grateful for useful suggestions.

After an hour of tryouts on regex101 I ended up with the following RegEx:
"assignee":(null|.*?"displayName":"(.*?)","active")
Probably it's ugly and may be improved - but it DOES the job, and does NOT ruin the indexes of subsequent Matches in the collection in the process, so the rest of the code keeps working as it is now.
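In case it helps anyone reading along, here is a minimal VBA sketch of how that pattern could be applied with VBScript.RegExp (the variable names and jsonText are only illustrative, not part of the original code):
Dim re As Object, matches As Object, m As Object
Dim assigneeName As String

Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = """assignee"":(null|.*?""displayName"":""(.*?)"",""active"")"

Set matches = re.Execute(jsonText)            ' jsonText is assumed to hold the raw JSON response
If matches.Count > 0 Then
    Set m = matches(0)
    If m.SubMatches(0) = "null" Then          ' unassigned issue: group 1 captured the literal null
        assigneeName = "Unassigned"
    Else                                      ' assigned issue: group 2 holds the display name
        assigneeName = m.SubMatches(1)
    End If
End If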

Related

Trying to build a regex for if then statement in vscode extension syntax

Long time lurker, first time poster.
I'm trying to write a vscode syntax for TeraTerm (ttl), which has been quite the learning experience, and I have hit a few walls. The biggest problem I have is writing a regex for ttl's if/then/endif. Normally I enjoy the satisfaction of figuring things out for myself, but every time I mess up on this I end up locking things up pretty hard. ttl's format for if/then is:
if expression then
statement
elseif expression then
statement
endif
A single if/then is easy to work out in tmlanguage, but putting a regex together for nested if/thens has resulted in several ctrl-alt-deletes. I've gotten pretty good at lookaheads/behinds within a single line, however multiple lines have been a royal pain. This is kinda how I've been doing it:
"if-endif":{
"patterns": [
{
"begin":"(?im)(?<ifthen>(^\\s*if\\s+(\\w+|\\s+|=)+\\s+then$))(?=(g<ifthen>).*^\\s*endif$)*.*^\\s*endif$)",
"beginCaptures":{"1":{"name":"keyword.control.ifthen.ttl"}},
"end":"(?i)(?<=(ditto above lookahead)*)(^\\s*endif$))",
"endCaptures":{"1":{"name":"keyword.control.endif.ttl"}}
"patterns":[{"include":"#if-endif"},{"include":"#alltheotherthings"}]
}
]
}
Somehow I need to get the if/then and its matching endif while not accidentally grabbing an endif that belongs to a different if. My biggest issue is that I really have no idea how to deal with multiple lines in a regex apart from including (?m).
My other problem I'm not sure is even possible within the syntax: I'd like a way to see if a variable has already been defined earlier in the script, so I can mark it up appropriately if I'm trying to use a string/integer in the wrong place, or if it hasn't been defined. However, I'm not aware of any way to compare one capture group to another inside a regex. If anyone knows of a good example or documentation I'd be overjoyed. The primary docs I've been using for this are MS's vscode stuff and macromates.
Any pointers/help/links would be greatly appreciated. This is pretty much the last regex holding me up, as everything else is a single-line affair.
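For reference, nested constructs in TextMate grammars are usually handled by letting a begin/end rule include itself, so each inner if/then consumes its own endif before the outer rule's end pattern is tried; the heavy lookaheads may not be needed at all. A minimal sketch of that shape (scope names copied from above, the patterns themselves untested against a real ttl grammar):
"if-endif": {
    "begin": "(?i)^\\s*if\\b.*\\bthen\\s*$",
    "beginCaptures": { "0": { "name": "keyword.control.ifthen.ttl" } },
    "end": "(?i)^\\s*endif\\s*$",
    "endCaptures": { "0": { "name": "keyword.control.endif.ttl" } },
    "patterns": [ { "include": "#if-endif" }, { "include": "#alltheotherthings" } ]
}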

Composition in REST and consistence of the inserted data

How do I properly design REST if I have a composition? I have a TestResult entity, which has TestCaseResults entities. Both support the full set of REST methods. The important fact about this (which I believe differs from many examples I found on the web) is that a TestResult is not consistent if it doesn't have all of its TestCaseResults. How do I properly design this in REST?
Let's say I create them as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, he needs to POST to api\testresults, then retrieve the URL api\testresults\1\testcaseresults via a link from the response, and POST all of the test case results to it. This means that at some point in time the test result is not consistent, until the client finishes the operation. Basically, there is no concept of a transaction here.
Let's say I create only the api\testresults resource, and embed an array of test case results inside, like this:
{
    "Name": "Test A",
    "Results": [
        {
            "Measured": "BB",
            ...
        },
        ...
    ]
    ...
}
Then it is easier to insert, but it is still hard to work with. A simple GET to api\testresults\1\ will retrieve a test result with a large number of test case results. A GET to api\testresults\ will retrieve much more! The structure of this becomes complex. Furthermore, in the real world I have a few entities like TestCaseResults that belong to TestResults, so there will be a few such arrays, and each could have 100-200 elements.
I could try to combine the approaches: embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could provide the TestResult without its TestCaseResults, only with a link pointing to that resource, but on POST I could accept an array of TestCaseResults embedded (not sure, though, that it is allowed to have different return types for POST and GET in REST). But now there are two approaches for inserting information, it is confusing, and I'm still not sure it solves anything.
Your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
As JSON does not have a fixed structure, you can add query parameters to your URL to control whether the results are embedded or not.
api\testresults\1?with_results=true would mean that your caller wants to see the test case results in addition to the test result.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, which would be reused in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include only the first 10 results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, as it is the dedicated endpoint.
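To make that concrete, a response to api\testresults\1?with_results=true&per_page=10 could look roughly like this (the field and link names here are only illustrative, not prescribed by anything):
{
    "Name": "Test A",
    "Results": [
        { "Measured": "BB" },
        { "Measured": "CC" }
    ],
    "Links": {
        "self": "api/testresults/1",
        "next_results": "api/testresults/1/testcaseresults?per_page=10&page=2"
    }
}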
Cheers
Note: if you want a flexible API that still returns JSON data, you can take a look at GraphQL, the trendy approach.

Passing a command line argument (as a string) into my Perl script

I'm extremely new at Perl and trying to prove I can pick it up quickly. What I was asked to do is add a string as an argument on my command line, and then feed that into my script. From there it is supposed to search a MySQL table I've made for matches in one column, and spit the contents of another column into an array. It was suggested I use Getopt::Std, but I'm uncertain how exactly to do that, and whether that's the best technique.
For example: I have a MySQL table with car manufacturers and car models. I want to run perl myscript.pl Ford, and then have it shoot me back an array with
Mustang
Escape
Focus
But I'm uncertain how to get that string input in the first place. Would Getopt::Std be best? If so, how would it be written? I'm picking this up quickly, but I've been at it less than a week, so the simpler the explanation, the better.
Edit: Basically I was confused why it was suggested I should use Getopt::Std for this. It seems to be completely inappropriate for what I'm trying to do.
Getopt::Std is overkill for this. Your command line arguments are in @ARGV. If you haven't been able to work that out after a week, then you need better references for Perl.
The first argument will be in $ARGV[0], the second in $ARGV[1], and so on.
You should check out the DBI module. Google for a tutorial.
Then try to write your script and post more specific questions, with some code, if you need more help.
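To tie those two pieces together, a bare-bones sketch could look like the following (database, table and column names are invented for the example; adjust them to your schema):
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# The manufacturer comes straight from the command line: perl myscript.pl Ford
my $maker = $ARGV[0] or die "Usage: $0 <manufacturer>\n";

# Connection details are placeholders
my $dbh = DBI->connect('DBI:mysql:database=cars;host=localhost',
                       'user', 'password', { RaiseError => 1 });

# The ? placeholder keeps the user input safe from SQL injection
my $models = $dbh->selectcol_arrayref(
    'SELECT model FROM models WHERE manufacturer = ?', undef, $maker);

print "$_\n" for @$models;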

Extract all words from text field in mysql

I have a table that contains text fields. In those fields I store text. There are around 20 to 50 sentences in each field, depending on the row. I am making an auto-complete HTML object with HTML and PHP, and I would like to start typing the beginning of a word and have the database return sentences containing those words (like the Microsoft Office 2007/2010 navigation pane).
I need MySQL to return those words or sentences as separate results, so I can manipulate them further.
Example:
------------------------------------------------------------------------
| id | title  | content                                                 |
------------------------------------------------------------------------
| 1  | test 1 | PHP is a very nice language and has nice features.      |
| 2  | test 2 | Spain is a nice country to visit and has nice language. |
| 3  | test 3 | Perl isn't as nice a language as PHP.                   |
------------------------------------------------------------------------
I need the MySQL query to return the following as separate results:
1,"nice language"
1,"nice features"
2,"nice country"
2,"nice language"
3,"nice a language"
Here is my SQL query:
SELECT id, SUBSTR(content, POSITION('nice' IN content), 50)
FROM entries
WHERE MATCH (title, content) AGAINST ('nice' WITH QUERY EXPANSION)
New Answer
What the OP is asking actually has nothing to do with PHP and JavaScript - his question concerns doing string manipulation directly within MySQL.
String manipulation isn't really the main focus of a DBMS. When dealing with "words" in a fluid text sense, there's a lot of logic required to determine where the next word boundary is, and you don't want your database doing this really. Plus, any queries written to do this will probably be incredibly difficult to read.
It depends exactly what you are doing, but it's quite likely that a DB only approach will be slower because there will be more function calls: SQL functions are pretty limited.
And for re-usability and best practice, what if you wanted to change your database in the future to say MongoDB? You'd need to re-write the whole damned awkward query.
No, my suggestion would be to pull the whole value using standard MySQL into PHP, throw it into PCRE, very simple regex, job done. It's better to show what you're actually doing in your PHP code as it's more "intention revealing".
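As a rough illustration of that split, using the sample data above (the connection details and the exact regex are only examples):
<?php
// Fetch candidate rows with plain SQL...
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$stmt = $pdo->prepare('SELECT id, content FROM entries WHERE content LIKE ?');
$stmt->execute(['%nice%']);

// ...and let PCRE pull out the "nice <word>" pairs in PHP
foreach ($stmt as $row) {
    if (preg_match_all('/\bnice\s+\w+/i', $row['content'], $m)) {
        foreach ($m[0] as $phrase) {
            echo $row['id'] . ',"' . $phrase . '"' . PHP_EOL;
        }
    }
}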
At least 33% of a developer's work is picking the right tool for the job. PHP is the right tool in this example.
Original Answer
You have included the tags php and javascript, so I'm guessing (although your question needs more clarification on this) that you obviously want this 'autocomplete' running client-side. So as a result, you have to get your data from server-side to client-side first.
Twitter Bootstrap has something really cool called Typeahead. This uses JavaScript to perform what (I think) you require: the example on that page shows how you can type a country and it'll auto-complete it for you.
How do you get this working? Include the required JavaScript file first, and then write your HTML.
Here's some from the source code of the bootstrap page so you can see how it works:
<input type="text" data-provide="typeahead" data-items="4" data-source='["Alabama","Alaska","Arizona","Arkansas","California"]'>
Can you see how the data-source attribute is the one that gives the typeahead the information you want? You want to connect to MySQL, grab your data, and shove these into the data-source array for the JavaScript to work with, as above.
So, on your page load, you connect to MySQL and you pull all the relevant strings you would like to be "auto-complete-able" from the Database. You then put these as new Data attributes for the typeahead, and that's pretty much it!
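A hedged sketch of that page-load step (the table and column names are just taken from the example above; query your own data instead):
<?php
// Pull the auto-complete candidates out of MySQL...
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$titles = $pdo->query('SELECT title FROM entries')->fetchAll(PDO::FETCH_COLUMN);

// ...and hand them to typeahead through the data-source attribute
$json = htmlspecialchars(json_encode($titles), ENT_QUOTES);
?>
<input type="text" data-provide="typeahead" data-items="4" data-source='<?php echo $json; ?>'>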
--
Edit: There's a fork of twitter bootstrap's typeahead that allows AJAX calls, so you could use this to perform the data retrieval asynchronously (if you can figure it out, I'd recommend this approach).

Tesseract-Job: how to parse an image in order to get the information out of it

Good morning.
First of all: this is the most impressive community I ever saw!
Well, for several days I have mused about the three-fold job of
a. getting
b. parsing
c. storing a number of pages.
Two days ago I thought that getting the pages would be the major task. No, this isn't the case - I guess that the parser job will be the heroic task. Each of the pages that are intended to be parsed is a PNG image.
So the question is - after getting all of them, how to parse them? This seems to be the issue. I guess that there are some Perl modules out there that can help with doing this...
Well, I think that this job can only be done with some OCR embedded! Question: is there a Perl module that can be used here to support this task?
BTW: see the result pages.
BTW: as I thought I can find all 790 result pages within a certain range between
Id=0 and Id=100000, I thought that I can go the way with a loop:
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html
I thought I could go the Perl way, but I am not very sure:
I was trying to use LWP::UserAgent on the same URLs [see above]
with different query arguments, and I am wondering if LWP::UserAgent provides a
way for us to loop through the query arguments? I am not sure that LWP::UserAgent has a method for us to do that. Well, I sometimes heard that it is easier to use Mechanize. But is it really easier!?
But, to be frank: the first task, GETTING all the pages, is not very difficult - if we compare it with the parsing... How can that be done!?
Any ideas or suggestions?
I look forward to hearing from you...
zero
You do not need a Perl module for the OCR itself, you only need the system function.
use File::Slurp;   # read_file comes from File::Slurp

# Tesseract appends .txt to the output base name, so pass "foo" and read foo.txt
system qw[ tesseract.exe foo.png foo ];
my $text = read_file('foo.txt');
You may need to preprocess the images to help Tesseract, for example using ImageMagick:
system qw[ convert.exe -resize 200% image.jpg foo.png ];
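Putting the fetch, preprocess and OCR steps together, a rough end-to-end sketch might look like this (the Id list comes from the URLs above; the file names, ImageMagick options and the assumption that each response body is the PNG described in the question are all illustrative):
use strict;
use warnings;
use LWP::UserAgent;
use File::Slurp;

my $ua = LWP::UserAgent->new;

for my $id (927, 949, 10579, 11233, 20011) {
    my $url = "http://www.foundationfinder.ch/ShowDetails.php?Id=$id&InterfaceLanguage=1&Type=Html";
    my $res = $ua->get($url);
    next unless $res->is_success;

    # Assumes the body is the PNG page described in the question
    write_file("page_$id.png", { binmode => ':raw' }, $res->content);

    # Upscale first so Tesseract has more pixels to work with
    system 'convert', "page_$id.png", '-resize', '200%', "page_${id}_big.png";

    # Tesseract appends .txt to the output base name
    system 'tesseract', "page_${id}_big.png", "page_$id";

    my $text = read_file("page_$id.txt");
    print "--- $id ---\n$text\n";
}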