im kinda new to programming but i found that python didnt have the speed i needed so i switched to go, im building a scraper and i need to convert a what looks like to be a ASCII formated string to json but i cant find any good documentation on how to do this in go.
the string i need converted looks something like this: debug%22%3Afalse%2C%22pageOpts%22%3A%7B%22noBidIfUnsold%22%3Atrue%2C%22keywords%22%3A%7B%22no-sno-finn-object_type%22%3A%22private%22%2C%22no-sno-finn-car_make%22%3A%22796%22%2C%22aa-sch-publisher%22%3A%22finn%22%2C%22aa-sch-inventory_type%22%3A%22classified%22%2C%22aa-sch-country_code%22%3A%22no%22%2C%22no-sno-finn-section%22%3A%22car%22%2C%22no-sno-finn-ad_owner%22%3A%22false%22%2C%22no-sno-publishergroup%22%3A%22schibsted%22%2C%22aa-sch-supply_type%22%3A%22web_desktop%22%2C%22no-sno-finn-subsection%22%3A%22car_used%22%2C%22aa-sch-page_type%22%3A%22object%22%7D
Thanks in advance!
As mentioned by a commenter, your string is URL encoded and can be decoded using url.QueryUnescape(...):
package main
import (
"fmt"
"net/url"
)
func main() {
querystr := "debug%22%3Afalse%2C%22pageOpts%22%3A%7B%22noBidIfUnsold%22%3Atrue%2C%22keywords%22%3A%7B%22no-sno-finn-object_type%22%3A%22private%22%2C%22no-sno-finn-car_make%22%3A%22796%22%2C%22aa-sch-publisher%22%3A%22finn%22%2C%22aa-sch-inventory_type%22%3A%22classified%22%2C%22aa-sch-country_code%22%3A%22no%22%2C%22no-sno-finn-section%22%3A%22car%22%2C%22no-sno-finn-ad_owner%22%3A%22false%22%2C%22no-sno-publishergroup%22%3A%22schibsted%22%2C%22aa-sch-supply_type%22%3A%22web_desktop%22%2C%22no-sno-finn-subsection%22%3A%22car_used%22%2C%22aa-sch-page_type%22%3A%22object%22%7D"
// Parse the URL encoded string.
plainstr, err := url.QueryUnescape(querystr)
if err != nil {
panic(err)
}
fmt.Println(plainstr)
// debug":false,"pageOpts":{"noBidIfUnsold":true,"keywords":{"no-sno-finn-object_type":"private","no-sno-finn-car_make":"796","aa-sch-publisher":"finn","aa-sch-inventory_type":"classified","aa-sch-country_code":"no","no-sno-finn-section":"car","no-sno-finn-ad_owner":"false","no-sno-publishergroup":"schibsted","aa-sch-supply_type":"web_desktop","no-sno-finn-subsection":"car_used","aa-sch-page_type":"object"}
}
Your example string appears to be incomplete but eventually it can be decoded into a struct or map using json.Unmarshal(...).
Related
In powershell, if I make a REST call and receive any kind of json response, I can easily $json | ConvertFrom-Json into a proper object so I can make modifications, render specific values, whatever.
It seems like in Go I have to either define a struct, or "dynamically" convert using a map[string]interface{}.
The issue with a struct is that I am writing a rest handler for a platform that, depending on the endpoint, serves wildly different JSON responses, like most REST APIs. I don't want to define a struct for all of the dozens of possible responses.
The problem with map[string]interface{} is that it pollutes the data by generating a string with a bunch of ridiculous 'map' prefixes and unwanted [brackets].
ala: [map[current_user_role:admin id:1]]
Is there a way to convert a JSON response like:
{
"current_user_role": "admin",
"id": 1
}
To return a basic:
current_user_role: admin
id: 1
... WITHOUT defining a struct?
Your approach of using a map is right if you don't wish to specify the structure of the data you're receiving. You don't like how it is output from fmt.Println, but I suspect you're confusing the output format with the data representation. Printing them out in the format you find acceptable takes a couple of lines of code, and is not as convenient as in python or powershell, which you may find annoying.
Here's a working example (playground link):
package main
import (
"encoding/json"
"fmt"
"log"
)
var data = []byte(`{
"current_user_role": "admin",
"id": 1
}`)
func main() {
var a map[string]interface{}
if err := json.Unmarshal(data, &a); err != nil {
log.Fatal(err)
}
for k, v := range a {
fmt.Printf("%s: %v\n", k, v)
}
}
I have an issue printing the JSON from the POST.
I use gorilla/mux for routing
r := mux.NewRouter()
r.HandleFunc("/test", Point).Methods("POST")
http.ListenAndServe(":80", r)`
and in Point function I have
func Point(w http.ResponseWriter, r *http.Request) {
var callback Decoder
json.NewDecoder(r.Body).Decode(&callback)
}
But I can use this method only when I know the structure and I want to figure out how to log.Print the whole JSON as a string. I tried
func Point(w http.ResponseWriter, r *http.Request) {
r.ParseForm()
log.Println(r.Form)
}
But it prints an empty map. Please help to figure out this.
In the assumption that you are building a standard API endpoint which receives some JSON and you would like to do something with it you should approach it the following way.
Edit:
As mentioned in the comments when you use the ioutil.ReadAll()
function as in this example it will read everything that is sent in the
post request into the application memory. It's a good idea to check
this in a production application (e.g limiting payload size).
1.) Make a struct which will hold the data coming from an API post request in GoLang
2.) Convert your request body into a byte array []byte
3.) Unmarshal your []byte into a single instance of your struct made earlier.
I'll put an example below:
1. Make a struct which you'll put your JSON into.
Let's take an example of a simple blog post.
The JSON object looks like this and has a slug, a title, and description
{
"slug": "test-slug",
"title": "This is the title",
"body": "This is the body of the page"
}
So your struct would look like this:
type Page struct {
Slug string `json:"slug"`
Title string `json:"title"`
Body string `json:"body"`
}
2 - 3. Get the body of your request and convert it to byte[] Then take that string and Unmarshal it into an instance of your struct.
The data of a post request is the request 'Body'.
In Golang the request in almost all cases (unless you're using something fancy outside of the defaults) will be an http.Request object. This is the 'r' which you normally have in your normal code and it holds the 'Body' of our POST request.
import (
"encoding/json"
"github.com/go-chi/chi" // you can remove
"github.com/go-chi/render" // you can remove but be sure to remove where it is used as well below.
"io/ioutil"
"net/http"
)
func GiveMeAPage(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("A page"))
}
So what we are going to do here is convert an io.ReadCloser, which is what the http.Request.Body is, to a []byte as the Unmarshal function takes a []byte type. I've commented inline below for you.
func Create(w http.ResponseWriter, r *http.Request) {
var p Page //Create an instance of our struct
//Read all the data in r.Body from a byte[], convert it to a string, and assign store it in 's'.
s, err := ioutil.ReadAll(r.Body)
if err != nil {
panic(err) // This would normally be a normal Error http response but I've put this here so it's easy for you to test.
}
// use the built in Unmarshal function to put the string we got above into the empty page we created at the top. Notice the &p. The & is important, if you don't understand it go and do the 'Tour of Go' again.
err = json.Unmarshal(s, &p)
if err != nil {
panic(err) // This would normally be a normal Error http response but I've put this here so it's easy for you to test.
}
// From here you have a proper Page object which is filled. Do what you want with it.
render.JSON(w, r, p) // This is me using a useful helper function from go-chi which 'Marshals' a struct to a json string and returns it to using the http.ResponseWriter.
}
As a side note. Please don't use Decoder to parse JSON unless you are using JSON streams. You are not here, and it's unlikely you will
for a while. You can read about why that is
here
If you just want the raw JSON data without parsing it, http.Request.Body implements io.Reader, so you can just Read from it. For example with ioutil.ReadAll.
Something like (untested):
func Point(w http.ResponseWriter, r *http.Request) {
data, err := ioutil.ReadAll(r.Body)
// check error
// do whatever you want with the raw data (you can `json.Unmarshal` it too)
}
I have JSON from an API that I want to save to MongoDB using the mgo package. My JSON looks like this:
{
"something": "value"
"collection": [
{ "obj1": "value" }
// ... (Variable number of objects here)
]
}
To save this data I've created a new type in my Go application that looks like this:
type MyData struct {
Something string
Collection []string // This actually contains more than strings but I'll be happy if I can just get strings saved
}
mongoSess, err := mgo.Dial("localhost:27017")
if err != nil {
panic(err)
}
defer mongoSess.Close()
c := mongoSess.DB("mydatabase").C("mycollection")
insertErr := c.Insert(&MyData{something, collection})
This code works but the problem is that it isn't saving anything in my collection field which should be an array of JSON objects. Nothing at all. I get the keys in my database and they are the right type but they have no data. Here's what the Mongo output is:
{ "_id" : ObjectId("5520c535a236d8a9a215d096"), "something" : "value", "collection" : [ ] }
Can anyone spot what it is I'm doing wrong? I'm obviously new to Go and having trouble with types.
Solution
The answers here really helped me a lot. I'm at fault for not properly explaining things as the answers sent me on the right track but didn't solve the issue directly. Here's what the actual solution is.
package main
import (
"encoding/json"
"github.com/bitly/go-simplejson"
"gopkg.in/mgo.v2"
//"gopkg.in/mgo.v2/bson"
// Other packages are used as well
)
type MyData struct {
Something int
Collection []interface{}
}
func main() {
// I'm using SimpleJson for parsing JSON
collection, cerr := jsonFromExternalApi.Get("collection").Array()
if cerr != nil {
logger.Debug.Fatalln(cerr)
}
// Save response to Mongo (leaving out all the connection code)
c := mongoSess.DB("mydb").C("mycollection")
insertErr := c.Insert(&Producer{npn, licenses })
}
The issue was that SimpleJSON was returning the array of objects from my API call as a []interface{}. I was not aware I could simply declare part of a struct to be an interface so instead of just correcting what Go's compiler was telling me was wrong I was making it way harder than it should have been.
Coming from loosely typed scripting languages, stuff like this really trips me up and sometimes its hard to see the benefit but hopefully this helps someone out one day.
Looks you have the wrong data structure.
insertErr := c.Insert(&MyData{something, collection})
something => string and collection => slice
You code should be like this:
insertErr := c.Insert(&MyData{"something", []string{"obj1", "value"}})
Here is the working code.
[{Something:something Collection:[obj1 value]} {Something:something Collection:[obj1 value]} {Something:something Collection:[obj1 value]} {Something:something Collection:[obj1 value]}]
For further reference, mgo has great documentation and you can find sample code and details about running mongodb queries here.
Here you are not defining proper type to Collection in your MyData struct.
Collection []string //This is what you are defining.
But from Json you are not getting string array,you are getting map[string]interface{}
This is the reason you are not filling Mydata struct properly
Correct MyData struct will be
type MyData struct {
Something string
Collection map[string]string // This actually contains more than strings but I'll be happy if I can just get strings saved
}
I'm now learning Go myself and am stuck in getting and parsing HTML/XML. In Python, I usually write the following code when I do web scraping:
from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()
, then I can get raw HTML/XML in a form of either string or bytes and proceed to work with it. In Go, how can I cope with it? What I hope to get is raw HTML data which is stored either in string or []byte (though it can be easily converted, that I don't mind which to get at all). I consider using gokogiri package to do web scraping in Go (not sure I'll indeed end up with using it!), but it looks like it requires raw HTML text before doing any work with it...
So how can I acquire such object?
Or is there any better way to do web scraping work in Go?
Thanks.
From the Go http.Get Example:
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
)
func main() {
res, err := http.Get("http://www.google.com/robots.txt")
if err != nil {
log.Fatal(err)
}
robots, err := ioutil.ReadAll(res.Body)
res.Body.Close()
if err != nil {
log.Fatal(err)
}
fmt.Printf("%s", robots)
}
Will return the contents of http://www.google.com/robots.txt into the string variable robots.
For XML parsing look into the Go encoding/xml package.
I am using Sockjs with Go, but when the JavaScript client send json to the server it escapes it, and send's it as a []byte. I'm trying to figure out how to parse the json, so that i can read the data. but I get this error.
json: cannot unmarshal string into Go value of type main.Msg
How can I fix this? html.UnescapeString() has no effect.
val, err := session.ReadMessage()
if err != nil {
break
}
var msg Msg
err = json.Unmarshal(val, &msg)
fmt.Printf("%v", val)
fmt.Printf("%v", err)
type Msg struct {
Channel string
Name string
Msg string
}
//Output
"{\"channel\":\"buu\",\"name\":\"john\", \"msg\":\"doe\"}"
json: cannot unmarshal string into Go value of type main.Msg
You might want to use strconv.Unquote on your JSON string first :)
Here's an example, kindly provided by #gregghz:
package main
import (
"encoding/json"
"fmt"
"strconv"
)
type Msg struct {
Channel string
Name string
Msg string
}
func main() {
var msg Msg
var val []byte = []byte(`"{\"channel\":\"buu\",\"name\":\"john\", \"msg\":\"doe\"}"`)
s, _ := strconv.Unquote(string(val))
err := json.Unmarshal([]byte(s), &msg)
fmt.Println(s)
fmt.Println(err)
fmt.Println(msg.Channel, msg.Name, msg.Msg)
}
You need to fix this in the code that is generating the JSON.
When it turns out formatted like that, it is being JSON encoded twice. Fix that code that is doing the generating so that it only happens once.
Here's some JavaScript that shows what's going on.
// Start with an object
var object = {"channel":"buu","name":"john", "msg":"doe"};
// serialize once
var json = JSON.stringify(object); // {"channel":"buu","name":"john","msg":"doe"}
// serialize twice
json = JSON.stringify(json); // "{\"channel\":\"buu\",\"name\":\"john\",\"msg\":\"doe\"}"
Sometimes, strconv.Unquote doesn't work.
Heres an example shows the problem and my solution.
(The playground link: https://play.golang.org/p/Ap0cdBgiA05)
Thanks for #Crazy Train's "encodes twice" idea, I just decoded it twice ...
package main
import (
"encoding/json"
"fmt"
"strconv"
)
type Wrapper struct {
Data string
}
type Msg struct {
Photo string
}
func main() {
var wrapper Wrapper
var original = `"{\"photo\":\"https:\/\/www.abc.net\/v\/t1.0-1\/p320x320\/123.jpg\"}"`
_, err := strconv.Unquote(original)
fmt.Println(err)
var val []byte = []byte("{\"data\":"+original+"}")
fmt.Println(string(val))
err = json.Unmarshal([]byte(val), &wrapper)
fmt.Println(wrapper.Data)
var msg Msg
err = json.Unmarshal([]byte(wrapper.Data), &msg)
fmt.Println(msg.Photo)
}
As Crazy Train pointed out, it appears that your input is doubly escaped, thus causing the issue. One way to fix this is to make sure that the function session.ReadMessasge() returns proper output that is escaped appropriately. However, if that's not possible, you can always do what x3ro suggested and use the golang function strconv.Unquote.
Here's a playground example of it in action:
http://play.golang.org/p/GTishI0rwe
The data shown in the problem is stringified for some purposes, in some cases you can even have \n in your string representing break of line in your json.
Let's understand the easiest way to unmarshal/deserialize this kind of data using the following example:
Next line shows the data you get from your sources and want to derserialize
stringifiedData := "{\r\n \"a\": \"b\",\r\n \"c\": \"d\"\r\n}"
Now, remove all new lines first
stringifiedData = strings.ReplaceAll(stringifiedData, "\n", "")
Then remove all the extra quotes that are present in your string
stringifiedData = strings.ReplaceAll(stringifiedData, "\\"", "\"")
Let's now convert the string into a byte array
dataInBytes := []byte(stringifiedData)
Before doing unmarshal, let's define structure of our data
jsonObject := &struct{
A string `json:"a"`
C string `json:"c"`
}
Now, you can dersialize your values into jsonObject
json.Unmarshal(dataInBytes, jsonObject)}
You got into infamous escaped string pitfall from JavaScript. Quite often people face (as I did) the same issue in Go, when serializing JSON string with json.Marshal, e.g.:
in := `{"firstName":"John","lastName":"Dow"}`
bytes, err := json.Marshal(in)
json.Marshal escapes double quotes, producing the same issue when you try to unmarshal bytes into struct.
If you faced the issue in Go, have a look at How To Correctly Serialize JSON String In Golang post which describes the issue in details with solution to it.