Combining map values into one JSON?

I am in the process of learning Go and I'm trying to make a program that takes websites stored in a column of a CSV file, then queries http://ip-api.com to find out what country each site's IP address originates from.
However the issue I am running into is my JSON is showing up like this:
[{"country":"Singapore"}]
[{"country":"United States"},{"country":"United States"}]
[{"country":"Singapore"},{"country":"Singapore"},{"country":"Singapore"}]
[{"country":"Ireland"},{"country":"Ireland"},{"country":"Ireland"},{"country":"Ireland"}]
But I want it to show up like this
{"country": "Singapore",
"country": "United States"
"country": "Ireland"
}
My CSV File looks like this
www.google.com
www.bing.com
www.pokemon.com
www.yahoo.com
And here is my code
package main

import (
    "encoding/csv"
    "encoding/json"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "os"
)

func closeFile(f *os.File) {
    err := f.Close()
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }
}

func main() {
    m := make(map[string]string)
    result := []map[string]string{}
    csvFile, err := os.Open("test.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer closeFile(csvFile)
    reader := csv.NewReader(csvFile)
    for {
        line, err := reader.Read()
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatal(err)
        }
        response, err := http.Get(fmt.Sprintf("http://ip-api.com/json/%s?fields=org", line[0]))
        if err != nil {
            fmt.Println(err)
            defer response.Body.Close()
        } else {
            data, _ := ioutil.ReadAll(response.Body)
            err := json.Unmarshal(data, &m)
            if err != nil {
                panic(err)
            }
            result = append(result, m)
            rest, _ := json.Marshal(result)
            fmt.Println(string(rest))
        }
    }
}
I feel like the issue is that I'm missing a for ... range loop to collect everything before printing, but I would love any feedback to sort this issue out.

It's because Go maps have pointer semantics; they are not values:
Map types are reference types, like pointers or slices.
The map is created with make once, before the loop starts, and then appended to the list on every iteration, so the list ends up holding the same underlying map multiple times.
Unmarshal then doesn't create a new map; it reuses the same one, overwriting the previously retrieved result.
So, the fix: recreate the map on every iteration. Just move the m := make(map[string]string) line inside the loop, right before the API call.
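A minimal sketch of that change, with the rest of the loop body elided:

    for {
        line, err := reader.Read()
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatal(err)
        }
        // Fresh map per iteration: each json.Unmarshal now fills its own map,
        // so result no longer holds the same map over and over.
        m := make(map[string]string)
        // ... do the http.Get, json.Unmarshal(data, &m), and
        // result = append(result, m) exactly as before ...
    }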


How to compare JSON with varying order?

I'm attempting to implement testing with golden files; however, the JSON my function generates varies in order while maintaining the same values. I've implemented the comparison method used here:
How to compare two JSON requests?
But it's order-dependent. And as stated here by brad:
JSON objects are unordered, just like Go maps. If you're depending on the order that a specific implementation serializes your JSON objects in, you have a bug.
I've written some sample code that simulates my predicament:
package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "math/rand"
    "os"
    "reflect"
    "time"
)

type example struct {
    Name     string
    Earnings float64
}

func main() {
    slice := GetSlice()
    gfile, err := ioutil.ReadFile("testdata/example.golden")
    if err != nil {
        fmt.Println(err)
        fmt.Println("Failed reading golden file")
    }
    testJSON, err := json.Marshal(slice)
    if err != nil {
        fmt.Println(err)
        fmt.Println("Error marshalling slice")
    }
    equal, err := JSONBytesEqual(gfile, testJSON)
    if err != nil {
        fmt.Println(err)
        fmt.Println("Error comparing JSON")
    }
    if !equal {
        fmt.Println("Results don't match JSON")
    } else {
        fmt.Println("Success!")
    }
}

func GetSlice() []example {
    t := []example{
        {"Penny", 50.0},
        {"Sheldon", 70.0},
        {"Raj", 20.0},
        {"Bernadette", 200.0},
        {"Amy", 250.0},
        {"Howard", 1.0},
    }
    rand.Seed(time.Now().UnixNano())
    rand.Shuffle(len(t), func(i, j int) { t[i], t[j] = t[j], t[i] })
    return t
}

func JSONBytesEqual(a, b []byte) (bool, error) {
    var j, j2 interface{}
    if err := json.Unmarshal(a, &j); err != nil {
        return false, err
    }
    if err := json.Unmarshal(b, &j2); err != nil {
        return false, err
    }
    return reflect.DeepEqual(j2, j), nil
}

func WriteTestSliceToFile(arr []example, filename string) {
    file, err := os.OpenFile(filename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        fmt.Printf("failed creating file: %s\n", err)
    }
    datawriter := bufio.NewWriter(file)
    marshalledStruct, err := json.Marshal(arr)
    if err != nil {
        fmt.Println("Error marshalling json")
        fmt.Println(err)
    }
    _, err = datawriter.Write(marshalledStruct)
    if err != nil {
        fmt.Println("Error writing to file")
        fmt.Println(err)
    }
    datawriter.Flush()
    file.Close()
}
JSON arrays are ordered. The json.Marshal function preserves order when encoding a slice to a JSON array.
JSON objects are not ordered. The json.Marshal function writes object members in sorted key order as described in the documentation.
The bradfitz comment about JSON object ordering is not relevant to this question:
The application in the question is working with a JSON array, not a JSON object.
The package was updated to write object fields in sorted key order a couple of years after Brad's comment.
To compare slices while ignoring order, sort the two slices before comparing. This can be done before encoding to JSON or after decoding from JSON.
sort.Slice(slice, func(i, j int) bool {
    if slice[i].Name != slice[j].Name {
        return slice[i].Name < slice[j].Name
    }
    return slice[i].Earnings < slice[j].Earnings
})
For unit testing, you could use assert.JSONEq from Testify. If you need to do it programmatically, you could follow the code of the JSONEq function.
https://github.com/stretchr/testify/blob/master/assert/assertions.go#L1551
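For illustration, a minimal test using it (the JSON literals here are made up for the example):

    package example_test

    import (
        "testing"

        "github.com/stretchr/testify/assert"
    )

    func TestGoldenJSON(t *testing.T) {
        golden := `{"Name":"Penny","Earnings":50}`
        actual := `{"Earnings":50,"Name":"Penny"}`
        // JSONEq unmarshals both strings and compares the decoded values,
        // so object key order doesn't matter. Array element order still does.
        assert.JSONEq(t, golden, actual)
    }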

Read external JSON file

I am trying to read the following JSON file:
{
"a":1,
"b":2,
"c":3
}
I have tried the code below, but I found that I had to write each field of the JSON file into a struct, and I really don't want to mirror my whole JSON file in my Go code.
import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

type Data struct {
    A string `json:"a"`
    B string `json:"b"`
    C string `json:"c"`
}

func main() {
    file, _ := ioutil.ReadFile("/path/to/file.json")
    data := Data{}
    if err := json.Unmarshal(file, &data); err != nil {
        panic(err)
    }
    for _, letter := range data.Letter {
        fmt.Println(letter)
    }
}
Is there a way to bypass this, like json.load(file) in Python?
If you only want to support integer values, you could unmarshal your data into a map[string]int. Note that the order of a map is not defined, so the below program's output is non-deterministic for the input.
package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
)

func main() {
    file, _ := ioutil.ReadFile("/path/to/file.json")
    var data map[string]int
    if err := json.Unmarshal(file, &data); err != nil {
        panic(err)
    }
    for letter := range data {
        fmt.Println(letter)
    }
}
You can unmarshal any JSON data in this way:
var data interface{}
if err := json.Unmarshal(..., &data); err != nil {
    // handle error
}
Though this way you have to handle all the reflection-related work yourself, since you don't know in advance what type the root value is, nor what its fields are.
Even worse, your data might not be a map at all; it can be any valid JSON data type such as array, string, integer, etc.
Here's a playground link: https://play.golang.org/p/DiceOv4sATO
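For example, a sketch of what that handling might look like (raw stands for the file contents read earlier):

    var data interface{}
    if err := json.Unmarshal(raw, &data); err != nil {
        panic(err)
    }
    switch v := data.(type) {
    case map[string]interface{}:
        // JSON objects decode to map[string]interface{},
        // and JSON numbers decode to float64.
        for key, val := range v {
            fmt.Println(key, val)
        }
    case []interface{}:
        fmt.Println("array with", len(v), "elements")
    default:
        fmt.Printf("some other type: %T\n", v)
    }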
It's impossible to do anything as simple as in Python, because Go is strictly typed, so it's necessary to pass your target into the unmarshal function.
What you've written could otherwise be shortened slightly, to something like this:
func UnmarshalJSONFile(path string, i interface{}) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return json.NewDecoder(f).Decode(i)
}
But then to use it, you would do this:
func main() {
    data := Data{}
    if err := UnmarshalJSONFile("/path/to/file.json", &data); err != nil {
        panic(err)
    }
}
But you can see that UnmarshalJSONFile is so simple it hardly warrants a standard library function.

How to output results to CSV of a concurrent web scraper in Go?

I'm new to Go and am trying to take advantage of its concurrency to build a basic scraper that extracts the title, meta description, and meta keywords from URLs.
I am able to print the results to the terminal with the concurrency, but I can't figure out how to write the output to CSV. I've tried many variations that I could think of with my limited knowledge of Go, and many end up breaking the concurrency, so I'm losing my mind a bit.
My code and URL input file are below. Thanks in advance for any tips!
// file name: metascraper.go
package main

import (
    // import standard libraries
    "encoding/csv"
    "fmt"
    "io"
    "log"
    "os"
    "time"

    // import third party libraries
    "github.com/PuerkitoBio/goquery"
)

func csvParsing() {
    file, err := os.Open("data/sample.csv")
    checkError("Cannot open file ", err)
    if err != nil {
        // err is printable
        // elements passed are separated by space automatically
        fmt.Println("Error:", err)
        return
    }
    // automatically call Close() at the end of current method
    defer file.Close()

    reader := csv.NewReader(file)
    // options are available at:
    // http://golang.org/src/pkg/encoding/csv/reader.go?s=3213:3671#L94
    reader.Comma = ';'
    lineCount := 0
    fileWrite, err := os.Create("data/result.csv")
    checkError("Cannot create file", err)
    defer fileWrite.Close()
    writer := csv.NewWriter(fileWrite)
    defer writer.Flush()

    for {
        // read just one record
        record, err := reader.Read()
        // end-of-file is fitted into err
        if err == io.EOF {
            break
        } else if err != nil {
            fmt.Println("Error:", err)
            return
        }
        go func(url string) {
            doc, err := goquery.NewDocument(url)
            if err != nil {
                checkError("No URL", err)
            }
            metaDescription := make(chan string, 1)
            pageTitle := make(chan string, 1)
            go func() {
                // use CSS selector found with the browser inspector
                // for each, use index and item
                pageTitle <- doc.Find("title").Contents().Text()
                doc.Find("meta").Each(func(index int, item *goquery.Selection) {
                    if item.AttrOr("name", "") == "description" {
                        metaDescription <- item.AttrOr("content", "")
                    }
                })
            }()
            select {
            case res := <-metaDescription:
                resTitle := <-pageTitle
                fmt.Println(res)
                fmt.Println(resTitle)
                // Have been trying to output to CSV here but it's not working
                // writer.Write([]string{url, resTitle, res})
                // err := writer.WriteString(`res`)
                // checkError("Cannot write to file", err)
            case <-time.After(time.Second * 2):
                fmt.Println("timeout 2")
            }
        }(record[0])
        fmt.Println()
        lineCount++
    }
}

func main() {
    csvParsing()
    // Pause before the program finishes so we can see the output
    var input string
    fmt.Scanln(&input)
}

func checkError(message string, err error) {
    if err != nil {
        log.Fatal(message, err)
    }
}
The data/sample.csv input file with URLs:
http://jonathanmh.com
http://keshavmalani.com
http://google.com
http://bing.com
http://facebook.com
In the code you supplied, you had commented the following code:
// Have been trying to output to CSV here but it's not working
err = writer.Write([]string{url, resTitle, res})
checkError("Cannot write to file", err)
This code is correct, except you have one issue.
Earlier in the function, you have the following code:
fileWrite, err := os.Create("data/result.csv")
checkError("Cannot create file", err)
defer fileWrite.Close()
This code causes fileWrite to be closed as soon as your csvParsing() func exits, which can happen before your goroutines have run.
Because fileWrite has been closed by the defer, your concurrent function is unable to write to it.
Solution:
You'll need to use defer fileWrite.Close() inside your concurrent func, or something similar, so that you do not close fileWrite before you have written to it.
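As one possible shape for "something similar", here is a sketch that instead keeps the csv.Writer in a single goroutine and funnels results to it over a channel, so nothing gets closed or flushed early (scrape is a hypothetical helper that fetches a page and returns its title and description):

    type row struct{ url, title, desc string }

    // writeResults fans out one goroutine per URL and serializes all CSV
    // writes through a single range loop: only this goroutine touches writer.
    func writeResults(urls []string, writer *csv.Writer) {
        results := make(chan row)
        var wg sync.WaitGroup
        for _, u := range urls {
            wg.Add(1)
            go func(u string) {
                defer wg.Done()
                title, desc := scrape(u) // hypothetical: fetch and parse the page
                results <- row{u, title, desc}
            }(u)
        }
        // Close the channel once every worker has sent its result.
        go func() {
            wg.Wait()
            close(results)
        }()
        for r := range results {
            if err := writer.Write([]string{r.url, r.title, r.desc}); err != nil {
                log.Println("Cannot write to file:", err)
            }
        }
        writer.Flush()
    }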

Efficient read and write CSV in Go

The Go code below reads in a 10,000-record CSV (of timestamp times and float values), runs some operations on the data, and then writes the original values to another CSV along with an additional column for score. However, it is terribly slow (i.e. hours, but most of that is calculateStuff()), and I'm curious whether there are any inefficiencies in the CSV reading/writing I can take care of.
package main

import (
    "encoding/csv"
    "log"
    "os"
    "strconv"
)

func ReadCSV(filepath string) ([][]string, error) {
    csvfile, err := os.Open(filepath)
    if err != nil {
        return nil, err
    }
    defer csvfile.Close()
    reader := csv.NewReader(csvfile)
    fields, err := reader.ReadAll()
    return fields, err
}

func main() {
    // load data csv
    records, err := ReadCSV("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    // write results to a new csv
    outfile, err := os.Create("./where/to/write/resultsfile.csv")
    if err != nil {
        log.Fatal("Unable to open output")
    }
    defer outfile.Close()
    writer := csv.NewWriter(outfile)
    for i, record := range records {
        time := record[0]
        value := record[1]
        // skip header row
        if i == 0 {
            writer.Write([]string{time, value, "score"})
            continue
        }
        // get float values
        floatValue, err := strconv.ParseFloat(value, 64)
        if err != nil {
            log.Fatalf("Record: %v, Error: %v", value, err)
        }
        // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
        score := calculateStuff(floatValue)
        valueString := strconv.FormatFloat(floatValue, 'f', 8, 64)
        scoreString := strconv.FormatFloat(score, 'f', 8, 64)
        // fmt.Printf("Result: %v\n", []string{time, valueString, scoreString})
        writer.Write([]string{time, valueString, scoreString})
    }
    writer.Flush()
}
I'm looking for help making this CSV read/write template code as fast as possible. For the scope of this question we need not worry about the calculateStuff method.
You're loading the whole file into memory first and then processing it, which can be slow with a big file.
You need to loop, calling .Read and processing one line at a time.
func processCSV(rc io.Reader) (ch chan []string) {
    ch = make(chan []string, 10)
    go func() {
        r := csv.NewReader(rc)
        if _, err := r.Read(); err != nil { // read header
            log.Fatal(err)
        }
        defer close(ch)
        for {
            rec, err := r.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                log.Fatal(err)
            }
            ch <- rec
        }
    }()
    return
}
Note: it's roughly based on Dave C's comment.
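A sketch of how you might consume it (file path as in the question):

    f, err := os.Open("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    for rec := range processCSV(f) {
        // each rec arrives as soon as it is parsed; memory use stays flat
        _ = rec
    }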
This is essentially Dave C's answer, from the comments section:
package main

import (
    "encoding/csv"
    "io"
    "log"
    "os"
    "strconv"
)

func main() {
    // setup reader
    csvIn, err := os.Open("./path/to/datafile.csv")
    if err != nil {
        log.Fatal(err)
    }
    r := csv.NewReader(csvIn)

    // setup writer
    csvOut, err := os.Create("./where/to/write/resultsfile.csv")
    if err != nil {
        log.Fatal("Unable to open output")
    }
    w := csv.NewWriter(csvOut)
    defer csvOut.Close()

    // handle header
    rec, err := r.Read()
    if err != nil {
        log.Fatal(err)
    }
    rec = append(rec, "score")
    if err = w.Write(rec); err != nil {
        log.Fatal(err)
    }

    for {
        rec, err = r.Read()
        if err != nil {
            if err == io.EOF {
                break
            }
            log.Fatal(err)
        }
        // get float value
        value := rec[1]
        floatValue, err := strconv.ParseFloat(value, 64)
        if err != nil {
            log.Fatalf("Record, error: %v, %v", value, err)
        }
        // calculate scores; THIS EXTERNAL METHOD CANNOT BE CHANGED
        score := calculateStuff(floatValue)
        scoreString := strconv.FormatFloat(score, 'f', 8, 64)
        rec = append(rec, scoreString)
        if err = w.Write(rec); err != nil {
            log.Fatal(err)
        }
    }
    w.Flush()
}
Note that, of course, the logic is all jammed into main(); it would be better to split it into several functions, but that's beyond the scope of this question.
encoding/csv is indeed very slow on big files, as it performs a lot of allocations. Since your format is so simple, I recommend using strings.Split instead, which is much faster.
If even that is not fast enough, you can consider implementing the parsing yourself using strings.IndexByte, which is implemented in assembly: http://golang.org/src/strings/strings_decl.go?s=274:310#L1
Having said that, you should also reconsider using ReadAll if the file is larger than your memory.
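A sketch of the strings.Split approach with bufio (csvfile is the opened *os.File from the question; this is only safe when no field contains a quoted comma or an embedded newline, which encoding/csv would otherwise handle for you):

    scanner := bufio.NewScanner(csvfile)
    for scanner.Scan() {
        // Naive split: one allocation-light pass per line.
        fields := strings.Split(scanner.Text(), ",")
        _ = fields // fields[0] is the timestamp, fields[1] the value
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }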

Go JSON with simplejson

Trying to use the JSON lib from "github.com/bitly/go-simplejson"
url = "http://api.stackoverflow.com/1.1/tags?pagesize=100&page=1"
res, err := http.Get(url)
body, err := ioutil.ReadAll(res.Body)
fmt.Printf("%s\n", string(body)) //WORKS
js, err := simplejson.NewJson(body)
total,_ := js.Get("total").String()
fmt.Printf("Total:%s"+total )
But it doesn't seem to work!?
I'm trying to access the total and tag fields.
You have a few mistakes:
If you check the JSON response, you'll notice that the total field is not a string; that's why you should use the MustInt() method, not String(), when accessing the field.
The Printf() invocation was wrong. You should pass a format "template", and then arguments matching the number of "placeholders".
By the way, I strongly recommend you check err != nil everywhere; that'll help you a lot.
Here is the working example:
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"

    "github.com/bitly/go-simplejson"
)

func main() {
    url := "http://api.stackoverflow.com/1.1/tags?pagesize=100&page=1"
    res, err := http.Get(url)
    if err != nil {
        log.Fatalln(err)
    }
    defer res.Body.Close()
    body, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatalln(err)
    }
    // fmt.Printf("%s\n", string(body))
    js, err := simplejson.NewJson(body)
    if err != nil {
        log.Fatalln(err)
    }
    total := js.Get("total").MustInt()
    fmt.Printf("Total: %d\n", total)
}
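If you'd rather get an explicit error than MustInt's silent zero default, go-simplejson also has an error-returning accessor; a small sketch:

    total, err := js.Get("total").Int()
    if err != nil {
        log.Fatalln(err) // field missing or not a number
    }
    fmt.Printf("Total: %d\n", total)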