Importing a JSON file into a fresh MongoDB

I just want to ask how I can import this example JSON file into a fresh MongoDB instance. I expect each session object to become a separate document in the collection. I tried:
mongoimport --db foo --collection myCollections < dataBuys.json
2015-05-07T21:19:15.828+0300 connected to: localhost
2015-05-07T21:19:18.831+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:21.826+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:24.828+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:27.828+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:28.849+0300 warning: attempting to insert document with size 124.6 MB (exceeds 16.0 MB limit)
2015-05-07T21:19:28.986+0300 error inserting documents: write tcp 127.0.0.1:27017: broken pipe
2015-05-07T21:19:28.986+0300 imported 0 documents
and this:
mongoimport -d mydb -c mycollection --jsonArray < dataBuys.json
2015-05-07T21:20:02.139+0300 connected to: localhost
2015-05-07T21:20:02.139+0300 Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
2015-05-07T21:20:02.139+0300 imported 0 documents
The file I want to import has the following format. Its size is 170 MB for this one and 2.97 GB for the other one.
{
    "Sessions": {
        "420374" : {
            "Purchases" : [
                {
                    "Price" : "12462",
                    "Quantity" : "1",
                    "Timestamp" : "2014-04-06T18:44:58.314Z",
                    "ItemId" : "214537888"
                },
                {
                    "Price" : "10471",
                    "Quantity" : "1",
                    "Timestamp" : "2014-04-06T18:44:58.325Z",
                    "ItemId" : "214537850"
                }
            ]
        },
        "281626" : {
            "Purchases" : [
                {
                    "Price" : "1883",
                    "Quantity" : "1",
                    "Timestamp" : "2014-04-06T09:40:13.032Z",
                    "ItemId" : "214535653"
                }
            ]
        },
        "420368" : {
            "Purchases" : [
                {
                    "Price" : "6073",
                    "Quantity" : "1",
                    "Timestamp" : "2014-04-04T06:13:28.848Z",
                    "ItemId" : "214530572"
                },
                {
                    "Price" : "2617",
                    "Quantity" : "1",
                    "Timestamp" : "2014-04-04T06:13:28.858Z",
                    "ItemId" : "214835025"
                }
            ]
        }
    }
}
Do I have to reformat the JSON? Is it possible to make it work as-is?

The first error message says:
warning: attempting to insert document with size 124.6 MB (exceeds 16.0 MB limit)
This implies you are trying to insert a single document that is 124.6 MB in size.
A JSON document starts with an opening brace character "{" and ends with a closing brace character "}". The error message implies that you have 124.6 MB between such characters.
I think you need to examine your input file and verify that each session object is defined as a separate document - in other words, that each one starts and ends with a brace.
I suspect the problem is that the session objects are in fact embedded in a master document - a sort of container document. This makes mongoimport try to map the master container document to a single entry in its collection - and not the session objects as you require.
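One way to do that reshaping - a minimal sketch in Python, assuming the whole file fits in memory (the 2.97 GB file would need a streaming parser such as ijson instead of json.load); dataBuys_flat.json is just a hypothetical output name:

import json

# Load the container document and emit one document per session,
# in the newline-delimited form mongoimport expects by default.
# Assumption: the file fits in memory; the 2.97 GB file would
# need a streaming parser instead of json.load.
with open("dataBuys.json") as src:
    sessions = json.load(src)["Sessions"]

with open("dataBuys_flat.json", "w") as dst:  # hypothetical output name
    for session_id, session in sessions.items():
        doc = {"_id": session_id, "Purchases": session["Purchases"]}
        dst.write(json.dumps(doc) + "\n")

The flattened file should then import one document per session:
mongoimport --db foo --collection myCollections < dataBuys_flat.json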

First of all, to verify that your GeoJSON file is accurate, you could use GeoJSONLint, QGIS, and so on.
After that, to import your data into your collection, use mongoimport:
mongoimport --db MY_DATABASE_NAME -c MY_COLLECTION_NAME --type json --file "MY_GEOJSON_FILENAME"
Replace the three placeholders above with your own names. Obviously, make sure that your current directory contains the file.
Profit! :)

Related

Promtail: how to trim the non-JSON part from a log

I have a multiline log that consists of a correct JSON part (one or more lines) followed by a stack trace.
Is it possible to parse the first part of the log as JSON, and for the stack trace make a new label ("stackTrace", for example) and put all the lines after the first part there?
Unfortunately, the logs can contain a different number of JSON fields, so parsing them with a regex is unlikely to work.
{ "timestamp" : "2022-03-28 14:33:00,000", "logger" : "appLog", "level" : "ERROR", "thread" : "ktor-8080", "url" : "/path","method" : "POST","httpStatusCode" : 400,"callId" : "f7a22bfb1466","errorMessage" : "Unexpected JSON token at offset 184: Encountered an unknown key 'a'. Use 'ignoreUnknownKeys = true' in 'Json {}' builder to ignore unknown keys. JSON input: { \"entityId\" : \"TGT-8c8d950036bf\", \"processCode\" : \"test\", \"tokenType\" : \"SSO_CCOM\", \"ttlMills\" : 600000, \"a\" : \"a\" }" }
com.example.info.core.WebApplicationException: Unexpected JSON token at offset 184: Encountered an unknown key 'a'.
Use 'ignoreUnknownKeys = true' in 'Json {}' builder to ignore unknown keys.
JSON input: {
"entityId" : "TGT-8c8d950036bf",
"processCode" : "test",
"tokenType" : "SSO_CCOM",
"ttlMills" : 600000,
"a" : "a"
}
at com.example.info.signtoken.SignTokenApi$signTokenModule$2$1$1.invokeSuspend(SignTokenApi.kt:94)
at com.example.info.signtoken.SignTokenApi$signTokenModule$2$1$1.invoke(SignTokenApi.kt)
at com.example.info.signtoken.SignTokenApi$signTokenModule$2$1$1.invoke(SignTokenApi.kt)
at io.ktor.util.pipeline.SuspendFunctionGun.loop(SuspendFunctionGun.kt:248)
at io.ktor.util.pipeline.SuspendFunctionGun.proceed(SuspendFunctionGun.kt:116)
at io.ktor.util.pipeline.SuspendFunctionGun.execute(SuspendFunctionGun.kt:136)
at io.ktor.util.pipeline.Pipeline.execute(Pipeline.kt:78)
at io.ktor.routing.Routing.executeResult(Routing.kt:155)
at io.ktor.routing.Routing.interceptor(Routing.kt:39)
at io.ktor.routing.Routing$Feature$install$1.invokeSuspend(Routing.kt:107)
at io.ktor.routing.Routing$Feature$install$1.invoke(Routing.kt)
at io.ktor.routing.Routing$Feature$install$1.invoke(Routing.kt)
UPDATE:
I've made a Promtail pipeline like so:
scrape_configs:
  - job_name: Test_AppLog
    static_configs:
      - targets:
          - ${HOSTNAME}
        labels:
          job: INFO-Test_AppLog
          host: ${HOSTNAME}
          __path__: /home/adm_web/app.log
    pipeline_stages:
      - multiline:
          firstline: ^\{\s?\"timestamp\"
          max_lines: 128
          max_wait_time: 1s
      - match:
          selector: '{job="INFO-Test_AppLog"}'
          stages:
            - regex:
                expression: '(?P<log>^\{ ?\"timestamp\".*\}[\s])(?s)(?P<stacktrace>.*)'
            - labels:
                log:
                stacktrace:
            - json:
                expressions:
                  logger: logger
                  url: url
                  method: method
                  statusCode: httpStatusCode
                  sla: sla
                source: log
But in fact, the json stage does not work; the result in Grafana is only two fields - log and stacktrace.
Any help would be appreciated.
If the format is consistently like this, maybe the easiest way is to analyze the whole log string, find the index of the last "}" symbol, then split the string at that index + 1; the JSON part should be the first element of the resulting pair.
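A rough sketch of that idea in Python. Note that in the sample above the stack trace itself contains braces (the 'Json {}' text and the pretty-printed JSON input), so this sketch scans for the point where the leading object's braces balance instead of taking the last '}':

# Split a log entry into its leading JSON object and the remainder.
# Scans for the point where the first object's braces balance; braces
# inside JSON strings are ignored via simple string/escape tracking.
def split_entry(entry):
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(entry):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return entry[:i + 1], entry[i + 1:].lstrip()
    return entry, ""  # no balanced object found

json_part, stack_trace = split_entry('{"msg": "Json {} builder"} at Foo.kt:94')
print(json_part)    # {"msg": "Json {} builder"}
print(stack_trace)  # at Foo.kt:94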

Retrieving a file with mongofiles leads to a JSON error

I am trying to retrieve an XML file from my MongoDB with mongofiles, and I get a JSON parsing error. Here is an excerpt from my terminal:
$ mongofiles -d anhalytics get_id 'ObjectId("5e7f56d30800611b17fc66b1")'
2020-09-15T16:55:33.205+0200 connected to: mongodb://localhost/
2020-09-15T16:55:33.205+0200 Failed: error parsing id as Extended JSON: invalid JSON number. Position: 18
I am using MongoDB server version 4.2.9.
Here is the record of the target file:
{
    "_id" : ObjectId("5e7f56d30800611b17fc66b1"),
    "filename" : "5e7f56d30800611b17fc66b0.tei.xml",
    "aliases" : null,
    "chunkSize" : NumberLong(261120),
    "uploadDate" : ISODate("2020-03-28T13:53:23.708Z"),
    "length" : NumberLong(35405),
    "contentType" : null,
    "md5" : "eeafae907c44b207071ccb6036148808"
}
Any idea why I am getting this error? Thanks!
The message error parsing id as Extended JSON indicates that the mongofiles tool had trouble parsing the id string that was provided on the command line.
That is done in parseOrCreateId function here: https://github.com/mongodb/mongo-tools/blob/master/mongofiles/mongofiles.go#L330
That function wraps the value from the command line in another string like {"_id":"%s"}, so the value actually passed to the bson.UnmarshalExtJSON function would have been
"{\"_id\":\"ObjectId(\"5e7f56d30800611b17fc66b1\")\"}"
Position 18 of that string, as called out in the error message, is the quotation mark immediately preceding the hex string.
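Given that wrapping, one way to sidestep the id parsing entirely is to fetch by filename with the get command; passing the id in canonical Extended JSON form ({"$oid": ...}) is also worth trying, though whether it parses depends on exactly how the tool wraps the value (both are suggestions, not verified against this tool version):

mongofiles -d anhalytics get 5e7f56d30800611b17fc66b0.tei.xml
mongofiles -d anhalytics get_id '{"$oid":"5e7f56d30800611b17fc66b1"}'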

Lua JSON schema validator

I have been looking for over 4 days now, but I haven't been able to find much support or example code for a Lua-based JSON schema validator. Mainly I have been dealing with:
ljsonschema (https://github.com/jdesgats/ljsonschema)
rjson (https://luarocks.org/modules/romaboy/rjson)
But neither of the above has been straightforward to use.
After dealing with issues on LuaRocks, I finally got ljsonschema working, but the schema syntax it expects looks different from a normal JSON structure - for example, equals signs in place of colons, no double quotes around key names, etc.
ljsonschema supports:
{
  type = 'object',
  properties = {
    foo = { type = 'string' },
    bar = { type = 'number' },
  },
}
I require:
{
  "type" : "object",
  "properties" : {
    "foo" : { "type" : "string" },
    "bar" : { "type" : "number" }
  }
}
With rjson there is an issue with the installation location itself. Though the installation goes fine, it is never able to find the .so file while running the Lua code. Plus, there is not much development support that I could find.
Please help point me in the right direction, in case I am missing something.
I have the JSON schema and a sample JSON document; I just need Lua code to write a program around them.
This is to write a custom JSON Validation Plugin for Kong CE.
UPDATED:
I would like the below code to work with ljsonschema:
local jsonschema = require 'jsonschema'
-- Note: do cache the result of schema compilation as this is a quite
-- expensive process
local myvalidator = jsonschema.generate_validator{
  "type" : "object",
  "properties" : {
    "foo" : { "type" : "string" },
    "bar" : { "type" : "number" }
  }
}
print(myvalidator { "foo":"hello", "bar":42 })
But I get the error: '}' expected (to close '{' at line 5) near ':'
It looks like the arguments to generate_validator and myvalidator are Lua tables, not raw JSON strings. You'll want to parse the JSON first:
> jsonschema = require 'jsonschema'
> dkjson = require('dkjson')
> schema = [[
>> { "type" : "object",
>> "properties" : {
>> "foo" : { "type" : "string" },
>> "bar" : { "type" : "number" }}}
>> ]]
> s = dkjson.decode(schema)
> myvalidator = jsonschema.generate_validator(s)
>
> json = '{ "foo": "bar", "bar": 42 }'
> print(myvalidator(json))
false wrong type: expected object, got string
> print(myvalidator(dkjson.decode(json)))
true
OK, I think rapidjson turned out to be helpful:
Refer to the link.
Here is sample working code:
local rapidjson = require('rapidjson')

-- Read an entire file into a string.
function readAll(file)
  local f = assert(io.open(file, "rb"))
  local content = f:read("*all")
  f:close()
  return content
end

local jsonContent = readAll("sampleJson.txt")
local sampleSchema = readAll("sampleSchema.txt")

-- Compile the schema once, then validate the document against it.
local sd = rapidjson.SchemaDocument(sampleSchema)
local validator = rapidjson.SchemaValidator(sd)
local d = rapidjson.Document(jsonContent)

local ok, message = validator:validate(d)
if ok then
  print("json OK")
else
  print(message)
end

UnicodeDecodeError from urllib2 output from webpage with no non-unicode characters

I am trying to read data from an API webpage using urllib2 in Python 2.7. I am using the following lines to read the page:
url = 'https://api.edamam.com/api/nutrition-data?app_id=<my_app_id>&app_key=<my_app_key>&ingr=1cheeseburger'
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
These lines give me this error (the error is on the last line in the above code):
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte
I understand that this error means there are bytes in json_obj that are not valid UTF-8, but I am not sure why that is the case, because the same URL opens in a browser, and the first few lines of the page look like the following:
{
    "uri" : "http://www.edamam.com/ontologies/edamam.owl#recipe_2a58ff3e1fec41d79da72f0be446baaa",
    "calories" : 312,
    "totalWeight" : 119.0,
    "dietLabels" : [ "BALANCED" ],
    "healthLabels" : [ "PEANUT_FREE", "TREE_NUT_FREE", "ALCOHOL_FREE" ],
    "cautions" : [ ],
    "totalNutrients" : {
        "ENERC_KCAL" : {
            "label" : "Energy",
            "quantity" : 312.96999999999997,
            "unit" : "kcal"
        },
As you can see, there are no non-unicode characters on this webpage, so I don't really follow what is going on.
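One way to narrow this down - a small sketch that inspects what the server actually returned, on the assumption (not verified here) that the response may be compressed or declared in a non-UTF-8 charset, which a browser would handle transparently:

import urllib2

url = 'https://api.edamam.com/api/nutrition-data?app_id=<my_app_id>&app_key=<my_app_key>&ingr=1cheeseburger'
resp = urllib2.urlopen(url)
print(resp.info().getheader('Content-Type'))      # declared charset, if any
print(resp.info().getheader('Content-Encoding'))  # e.g. gzip
raw = resp.read()
print(repr(raw[:20]))  # look at the raw leading bytes directly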

Python 3: output JSON values to a file line by line only if other fields are greater than a value

I have retrieved remote JSON using urllib.request in Python 3 and would like to dump the IP addresses only, line by line (i.e. ip:127.0.0.1 becomes 127.0.0.1, with the next IP on the next line), if they match certain criteria. The other key values include a score (one integer value per category) and a category (one or more string values are possible).
I want to check whether the score is higher than, say, 10, AND the category matches one of a list of one or more values. If an entry fits those parameters, I just need its IP address added, line by line, to a text file.
Here is how I retrieve the json:
ip_fetch = urllib.request.urlopen('https://testonly.com/ip.json').read().decode('utf8')
I have the json module loaded, but don't know where to go from here.
Example of the JSON data I'm working with (with more than one category):
"127.0.0.1" : {
"Test" : "10",
"Prod" : "20"
},
I wrote a simple example that should show you how to iterate through JSON objects and how to write to a file:
import json

# 'test' is the JSON string defined below.
j = json.loads(test)

threshold = 10
validCategories = ["Test"]

f = open("test.txt", 'w')
for ip, categories in j.items():
    addToList = False
    for category, rank in categories.items():
        # Keep this IP if any valid category meets the threshold.
        if category in validCategories and int(rank) >= threshold:
            addToList = True
    if addToList:
        f.write("{}\n".format(ip))
f.close()
I hope that helps you get started. For testing I used the following JSON string:
test = """
{
    "127.0.0.1" : {
        "Test" : "10",
        "Prod" : "20"
    },
    "127.0.0.2" : {
        "Test" : "5",
        "Prod" : "20"
    },
    "127.0.0.3" : {
        "Test" : "5",
        "Prod" : "5",
        "Test2": "20"
    }
}
"""