Extract Json Data with screaming frog - json

I'm using Screaming Frog as a way to extract data from a Json generated from an URL.
The Json generated is this form :
{"ville":[{"codePostal":"13009","ville":"VAUFREGE","popin":"ouverturePopin","zoneLivraison":"1300913982","url":""},{"codePostal":"13009","ville":"LES BAUMETTES","popin":"ouverturePopin","zoneLivraison":"1300913989","url":""},{"codePostal":"13009","ville":"MARSEILLE 9EME ARRON","popin":"ouverturePopin","zoneLivraison":"1300913209","url":""}]}
I'm using this regex in Custom > Extraction in Screaming Frog as a way to extract the values of "codePostal".
"codePostal":".*?"
Problem is it doesn't extract anything.
When I test my regex in regex101, it seems correct.
Do you have any clue about what is wrong ?
Thanks.
Regards.

Have you tried to save the output to understand what ScreamingFrog sees? It doesn't matter - not at the beginning - whether your RegEx works.
That said, don't forget that SF is a Java based tool hence it is the engine used by the reg ex, so make sure you test your regular expressions with the correct dialect.

You need to specify group extractors enclosed in parentheses. For instance in your example, you need to have ("codePostal":".*?") as extractor.
In addition if you simply want to extract the value, you could use the following instead.
"codePostal":"(.*?)"

It's not a problem with your Regular Expression. It seems to be that the problem is with the Content Type. ScreamingFrog isn't properly reading application/JSON content types for scraping. Hopefully they will fix this bug.

Related

Are you able to subtract in JSON?

Here is my JSON code.
{
"user_email": "{User.Email}",
"activity_date": "{Lead.LastAction.Date}",
"record_id": "{Lead.Id}-{Lead.LastAction.Date}",
"action_type": "{Lead.LastAction}",
"milestone": "{Lead.Milestone}",
"date_added": "{Lead.Date}"
}
Is it possible to add calculations in the code?
For example, can I add a line where the date_added is subtracted from activity_date?
No: JSON is a way to transport JS Objects.
You can do that while you format the JSON in your native language ( for example in PHP or JS serverside), basically creating the JSON object with the result of the calculation.
In JSON just by itself you cannot do that, it's just a data format, it's totally passive, like a text file. (If you happen to use JSONP, then the story would be a bit different, it might be possible, but using JSONP to do such things would step into area of 'hack/exploit' and it probably should not be used in that way:) )
However, I see you are using not only JSON - there is some extra markup like {User.Email}. This is totally outside JSON spec so clearly you are using some form text-templating engine. These can be quite intelligent at times. Check that path, see which one you are using, see what are its features, maybe you can write a custom function or expression to do that subtraction for you. Maybe, just maybe, it's as easy as
"inactivity_period": "{Lead.LastAction.Date - Lead.Date}"
or
"inactivity_period": "{myFunctionThatIWrote(Lead.LastAction.Date, Lead.Date)}"
but that all depends on the templating engine.

Regular expression to extract a value from a given string

I need a regular expression to extract a value from a given key/value pair. It's not for a specific language. A working example in https://regex101.com/ would be great.
Here's what I get:
{"task_id":"12323232-323-23-321"}
and here's what I expect:
12323232-323-23-321
I know, it looks easy, but drives me crazy.
The perfect solution would be:
"return the value for task_id"
.
Thanks in advance
Adam
Don't know why would you want to use regex in this case since you're dealing with json. It's unlikely that the language you're using would not have a support / library for json, which would allow you to extract the task_id.
Getting back to regex you could try capturing a group.
:"(.*?)"

regex to extract html value

im trying to write small scraper script from google search, im write the program, bat have small problem i need regex for extract data-href value from google search, please help me :
exemple html code of google search :
data-href="www.buxmob.net/index.php?id=577">
data-href="www.webopedia.com/TERM/K/keyword.html">
data-href="moz.com/beginners-guide-to-seo/keyword-research">
need only the url present in this value, only this :
hxxp://www.webopedia.com/TERM/K/keyword.html
hxxp://moz.com/beginners-guide-to-seo/keyword-research
hxxp://www.buxmob.net/index.php?id=577
thanks you
All the examples you gave can be matched with
(?:data-href=")(.*?)(?:">)
See demo at http://regex101.com/r/rB4nS1
That does NOT mean it's a good idea to try to parse (general) html with regex - but sometimes, when the response is well formed and well known, you get away with it.
Note that you mentioned you wanted hxxp:// in front of the string - that is not the job of the regular expression, but belongs with the language you use to implement the expression. The above is a "non greedy match starting after the string data-href=" and ending at the next ">

What is the correct way to post a colon in a querystring parameter?

I am posting a form and one of the field values has a ":" and that is causing an issue
is there any correct way to be able to post this string;
http://www.mysite.com/MyController/MyAction?field1=Japan:Tokyo&field2=USA:NewYork
You can use percent encoded version of colon "%3A"
Lots of libraries will have a method to process this, for example in ASP.NET if you do a UrlEncode that should get changed to a %3A, when you need to use it just do a UrlDecode on the string.
http://www.mysite.com/MyController/MyAction?field1=Japan%3ATokyo&field2=USA%3ANewYork
If you are not using any library that has this type of functionality then you can easily build your own little parsing function that will replace common characters with their HTML character code equivalent.

Is there a Go Language equivalent to Perls' Dumper() method in Data::Dumper?

I've looked at the very similarly titled post (Is there a C equivalent to Perls' Dumper() method in Data::Dumper?), regarding a C equivalent to Data::Dumper::Dumper();. I have a similar question for the Go language.
I'm a Perl Zealot by trade, and am a progamming hobbyist, and make use of Data::Dumper and similar offspring literally hundreds of times a day. I've taken up learning Go, because it looks like a fun and interesting language, something that will get me out of the Perl rut I'm in, while opening my eyes to new ways of doing stuffz... One of the things I really want is something like:
fmt.Println(dump.Dumper(decoded_json))
to see the resulting data structure, like Data::Dumper would turn the JSON into an Array of Hashes. Seeing this in Go, will help me to understand how to construct and work with the data. Something like this would be considered a major lightbulb moment in my learning of Go.
Contrary to the statements made in the C counterpart post, I believe we can write this, and since I'll be passing Dumper to Println, after compilation what ever JSON string or XML page I pass in and decode. I should be able to see the result of the decoding, in a Dumper like state... So, does any more know of anything like this that exists? or maybe some pointers to getting something like this done?
Hi and welcome to go I'm former perl hacker myself.
As to your question the encoding/json package is probably the closest you will find to a go data pretty printer. I'm not sure you really need it though. One of the reasons Data::Dumper was awesome in perl is because many times you really didn't know the structure of the data you were consuming without visually inspecting it. With go though everything is a specific type and every specific type has a specific structure. If you want to know what the data will look like then you probably just need to look at it's definition.
Some other tools you should look at include:
fmt.Println("%#v", data) will print the data in go-syntax form.
fmt.Println("%T", data) will print the data's type in go-syntax
form.
More fmt format string options are documented here: http://golang.org/pkg/fmt/
I found a couple packages to help visualize data in Go.
My personal favourite - https://github.com/davecgh/go-spew
There's also - https://github.com/tonnerre/golang-pretty
I'm not familiar with Perl and Dumper, but from what I understand of your post and the related C post (and the very name of the function!), it outputs the content of the data structure.
You can do this using the %v verb of the fmt package. I assume your JSON data is decoded into a struct or a map. Using fmt.Printf("%v", json_obj) will output the values, while %+v will add field names (for a struct - no difference if its a map, %v will output both keys and values), and %#v will output type information too.