this is the query:
mongoexport --host our.dbhost.com --port 27017 --username peter -p clark --collection sent_mails --db dbname --query '{trigger_id:ObjectId( "50c62e97b9fe6a000200000c"), updated_at: {$lt : ISODate("2013-02-28"), $gte : ISODate("2013-02-01") }}'
when I run this command I get:
assertion: 10340 Failure parsing JSON string near: , updated_
Any ideas? (I want all records that match the trigger_id that were updated in February.)
As explained in this issue: Mongoexport using $gt and $lt constraints on a date range, you have to use Unix timestamps for date queries in mongoexport.
The timestamps have to be in milliseconds.
Invoking this in the bash shell would look like this:
let "date_from=`date --utc --date "2013-02-01" +%s` * 1000"
let "date_to=`date --utc --date "2013-03-01" +%s` * 1000"
mongoexport -d test -c xx --query "{updated_at:{\$gte:new Date($date_from),\$lt:new Date($date_to)}}"> xx.json
> connected to: 127.0.0.1
> exported 1 records
The xx collection contains:
> db.xx.find().pretty()
{
"_id" : ObjectId("5158f670c2293fc7aadd811e"),
"trigger_id" : ObjectId("50c62e97b9fe6a000200000c"),
"updated_at" : ISODate("2013-02-11T00:00:00Z")
}
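Applying the same approach to the original query, keeping the ObjectId filter (which already parsed in the failing attempt) and swapping the ISODate calls for new Date() with the millisecond values computed above, would look roughly like this:
mongoexport --host our.dbhost.com --port 27017 --username peter -p clark --db dbname --collection sent_mails --query "{trigger_id: ObjectId(\"50c62e97b9fe6a000200000c\"), updated_at: {\$gte: new Date($date_from), \$lt: new Date($date_to)}}"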
My schema:
./geomesa-accumulo describe-schema -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Describing attributes of feature 'SignalBuilder'
geo | Point (Spatio-temporally indexed) (Spatially indexed)
time | Date (Spatio-temporally indexed) (Attribute indexed)
cam | String (Attribute indexed) (Attribute indexed)
imei | String
dir | Double
alt | Double
vlc | Double
sl | Integer
ds | Integer
dir_y | Double
poi_azimuth_x | Double
poi_azimuth_y | Double
User data:
geomesa.attr.splits | 4
geomesa.feature.expiry | time(30 days)
geomesa.index.dtg | time
geomesa.indices | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time
geomesa.stats.enable | true
geomesa.table.partition | time
geomesa.z.splits | 4
geomesa.z3.interval | week
When I try to get a count via the stats methods, it returns 11:
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"
Estimated count: 11
but without cache:
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'" --no-cache
INFO Running stat query...
Count: 1436
Why don't the stats methods work properly, returning only an estimated value?
In Redis everything is fine; the problem only occurs with Accumulo.
Question update:
I tried to recalculate the statistics:
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Running stat analysis for feature type SignalBuilder...
INFO Stats analyzed:
Total features: 11527
Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10634
Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:16:03.000Z ] cardinality: 3779
Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"
Estimated count: 14
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'" --no-cache
INFO Running stat query...
Count: 2675
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Running stat analysis for feature type SignalBuilder...
INFO Stats analyzed:
Total features: 11767
Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10942
Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:17:41.000Z ] cardinality: 3841
Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1"
Estimated count: Unknown
Re-run with --no-cache to get an exact count
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1" --no-cache
INFO Running stat query...
Count: 11872
But it does not help. Geo-events continue to arrive in GeoMesa, but the stats still don't work.
Maybe I'm not using stats-count properly. stats-top-k does show the gathered statistics.
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam like '3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-top-k -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
Top values for 'geo':
unavailable
Top values for 'time':
unavailable
Top values for 'cam':
7c0cf8bc-e7e3-4023-8a00-a5f17bda3001 (2925)
9f471340-dd70-4eca-a8dc-14553a4e708a (2924)
f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 (2922)
bfe55ad1-5b0a-405d-9ca9-3bed6aca9313 (2921)
3fe961e1-91dd-4931-b82e-d04fcaf24c3e (2920)
5798a065-d51e-47a1-b04b-ab48df9f1324 (2)
Top values for 'imei':
unavailable
Top values for 'dir':
unavailable
Top values for 'alt':
unavailable
Top values for 'vlc':
unavailable
Top values for 'sl':
unavailable
Top values for 'ds':
unavailable
Top values for 'dir_y':
unavailable
Top values for 'poi_azimuth_x':
unavailable
Top values for 'poi_azimuth_y':
unavailable
Or maybe the problem is in Accumulo. When I try to get data from the Accumulo table, it returns:
root#accumulo> scan -t myNamespace.geomesa_SignalBuilder_z3_geo_time_v7_02717
2022-02-09 17:55:12,909 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
2022-02-09 17:55:12,909 [shell.Shell] ERROR: Could not load the specified formatter. Using the DefaultFormatter
2022-02-09 17:55:12,929 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xAF "5798a065-d51e-47a1-b04b-ab48df9f1324-1643555638706 d: [] \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01#CT\x9C\xD3\xE0\xBDE#Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xCD\xB25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1#f#\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xBD!\x065798a065-d51e-47a1-b04b-ab48df9f1324-1643555648706 d: [] \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01#CT\x9C\xD3\xE0\xBDE#Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xF4\xC25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1#f#\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
Stats are gathered during ingestion, but are only written on a "best effort" basis (for example, if your ingest dies, stats may not be written). There are also code paths that don't update stats, for example if you disable them via system property or if you ingest through a bulk map/reduce job. In your particular case, it's hard to say why your stats don't match your data without a detailed description of everything you did to ingest it. However, if you want to re-calculate the cached statistics, you can always run the stats-analyze CLI command.
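As a quick sanity check after re-calculating, you can compare the cached estimate against an exact scan for the same filter (these are the same commands from the question, repeated only to show the verification pattern):
./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'" --no-cache
If the two counts still diverge after stats-analyze completes without errors, that is useful detail to include in a ticket.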
If you can re-create the issue, please feel free to file a ticket in the GeoMesa JIRA with the steps to re-create.
I extracted the data with the cbq command below, which was successful:
cbq -u Administrator -p Administrator -e "http://localhost:8093" --script='SELECT * FROM `sample` WHERE customer.id="12345"' -q | jq '.results' > temp.json;
However, when I try to import the same data in JSON format into the target cluster using the command below, I get an error:
cbimport json -c http://{target-cluster}:8091 -u Administrator -p Administrator -b sample -d file://C:\Users\{myusername}\Desktop\temp.json -f list -g %docId%
JSON import failed: 0 documents were imported, 0 documents failed to be imported
JSON import failed: input json is invalid: ReadArray: expect [ or , or ] or n, but found {, error found in #1 byte of ...|{
"requ|..., bigger context ...|{
"requestID": "2fc34542-4387-4643-8ae3-914e316|...],```
The exported temp.json looks like this:
{
  "requestID": "6ef38b8a-8e70-4c3d-b3b4-b73518a09c62",
  "signature": {
    "*": "*"
  },
  "results": [
    {
      "{Bucket-name}": {my-data}
    }
  ],
  "status": "success",
  "metrics": {
    "elapsedTime": "4.517031ms",
    "executionTime": "4.365976ms",
    "resultCount": 1,
    "resultSize": 24926
  }
}
It looks like the file extracted by the cbq command contains control fields such as requestID, metrics, status, etc., and the JSON is pretty-printed. If I manually remove them (keeping only {my-data}), put the result in a JSON file, and un-pretty-print it, then the import works. But I want to automate this in a single run. Is there a way to do it with the cbq command?
I haven't found any other utility, or a way to use a WHERE condition with cbexport, to do this in Couchbase; documents exported using cbexport can easily be imported using cbimport.
For the cbq command, you can use the --quiet option to disable the startup connection messages and --pretty=false to disable pretty-printing. Then, to extract just the documents in cbimport's JSON lines format, I used jq.
This worked for me -- selecting documents from travel-sample._default._default (for the jq filter, where I have _default, you would put the Bucket-name, based on your example):
cbq --quiet --pretty=false -u Administrator -p password --script='select * from `travel-sample`._default._default' | jq --compact-output '.results|.[]|._default' > docs.json
Then, importing into test-bucket1:
cbimport json -c localhost -u Administrator -p password -b test-bucket1 -d file://./docs.json -f lines -g %type%_%id%
cbq documentation: https://docs.couchbase.com/server/current/tools/cbq-shell.html
cbimport documentation: https://docs.couchbase.com/server/current/tools/cbimport-json.html
jq documentation:
https://stedolan.github.io/jq/manual/#Basicfilters
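Adapting this to the example in the question (assuming the bucket is named sample, the documents sit in the default collection so the jq filter uses .sample, and keeping the -g %docId% generator from the question, which presumes each document contains a docId field), the whole extract-and-import step could be scripted in one pass roughly like this:
cbq --quiet --pretty=false -u Administrator -p Administrator -e "http://localhost:8093" \
  --script='SELECT * FROM `sample` WHERE customer.id="12345"' \
  | jq --compact-output '.results|.[]|.sample' > temp.json
cbimport json -c http://{target-cluster}:8091 -u Administrator -p Administrator \
  -b sample -d file://temp.json -f lines -g %docId%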
I've been given an odd requirement to store an Excel spreadsheet in one JSON document within Couchbase. cbimport is saying that my document is not valid JSON, when it is, so I believe something else is wrong.
My document looks something like this:
[{ "sets": [
{
"cluster" : "M1M",
"type" : "SET",
"shortName" : "MARTIN MARIETTA MATERIALS",
"clusterName" : "MARTIN MARIETTA",
"setNum" : "10000163"
},
{
"shortName" : "STERLING INC",
"type" : "SET",
"cluster" : "SJW",
"setNum" : "10001427",
"clusterName" : "STERLING JEWELERS"
},
...
]}]
And my cbimport command looks like this:
cbimport json --cluster localhost --bucket documentBucket \
--dataset file://set_numbers.json --username Administrator \
--password password --format lines -e errors.log -l debug.log \
--generate-key 1
I've tried --format lines as well as list. Both fail. What am I doing wrong?
I wrote your sample to a JSON file called set_numbers.json and tried it locally with --format list:
cbimport json --cluster localhost --bucket documentBucket \
--dataset file://set_numbers.json --username Administrator \
--password password --format list --generate-key 1
It imported successfully into a single document.
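With --format list, cbimport splits the top-level JSON array into one document per element; since your sample is a one-element array, everything lands in a single document keyed by the literal generator text 1 (--format lines fails here because it expects one complete JSON document per line, and your file is pretty-printed across many lines). If you want a more descriptive key, the generator can be plain static text, for example (a variant of the same command, with an assumed key name set_numbers):
cbimport json --cluster localhost --bucket documentBucket \
--dataset file://set_numbers.json --username Administrator \
--password password --format list --generate-key set_numbers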
Use cbimport to upload JSON data:
cbimport json -c couchbase://127.0.0.1 -b data -d file://data.json -u Administrator -p password -f list -g "%id%" -t 4
What is the right format of query argument of mongoexport utility?
While running the following command in the command line:
mongoexport -h localhost:27000 -d dbName -c collName -q "{'time': { $gt: new Date('2014-01-28T12:00:00Z')}}" -o output.js
I'm getting the following error:
connected to: localhost:27000 assertion: 16619 code FailedToParse:
FailedToParse: Expecting '}' or ',': offset:37
Reading the Mongo Export query arg and JSONDocument docs hasn't helped me understand the expected format of the query argument.
Running the same query in mongo shell succeeds.
If:
>new Date ("2014-01-28T12:00:00Z").getTime()
1390910400000
You will have to construct your query as follows:
-q "{sendToServerTime: {\$gt: {\$date : 1390910400000}}}"
The problem is your new Date() command. That is not valid JSON. Try this:
mongoexport -h localhost:27000 -d DeploymentJan01 -c sensorsData -q '{sendToServerTime: { $gt: "2014-01-28T12:00:00Z"}}' -o output.js
I want to use JSON to batch upsert to a mongo collection.
$ mongoexport -d myDB -c myCollection
connected to: 127.0.0.1
{ "_id" : "john", "age" : 27 }
But using the syntax I would use in the mongo shell yields:
0$ echo '{_id:"john", {$set:{gender:"male"}}' | mongoimport --upsert --upsertFields _id -d myDB -c myCollection
connected to: 127.0.0.1
Fri Jul 27 15:01:32 Assertion: 10340:Failure parsing JSON string near: , {$set:{g
0x581a52 0x528554 0xa9f2e3 0xaa1593 0xa980cd 0xa9c062 0x3e7ca1ec5d 0x4fe239
...
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3e7ca1ec5d] mongoimport(__gxx_personality_v0+0x3c9) [0x4fe239]
exception:Failure parsing JSON string near: , {$set:{g
imported 0 objects
encountered 1 error
When I try it without the curly brackets, it yields no error but doesn't change the collection:
0$ echo '{_id:"john", $set:{gender:"male"}}' | mongoimport --upsert --upsertFields _id -d myDB -c myCollection
connected to: 127.0.0.1
imported 1 objects
0$ mongoexport -d myDB -c myCollection
connected to: 127.0.0.1
{ "_id" : "john", "age" : 27 }
exported 1 records
I've searched everywhere but can't find an example using JSON. Please help!
To the best of my knowledge, MongoImport doesn't evaluate commands.
Just to add to Andre's answer.
Mongoimport takes a single file that contains one JSON/CSV/TSV document per line and inserts it. You can import from standard input, but not a command like the one above. You can use mongoimport to perform an upsert as per here.
You can run mongoimport with the stoponError option, which will force mongoimport to stop when it encounters an error.
Here's the complete manual for mongoimport and, as a FYI, mongoimport doesn't reliably preserve all rich BSON data types.
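So if you want to stay with mongoimport, pipe a complete replacement document (no $set) and let the upsert match on _id; a rough sketch based on the commands above (note that existing fields such as age must be included, because the upsert replaces the whole document rather than merging it):
echo '{"_id": "john", "age": 27, "gender": "male"}' | mongoimport --upsert --upsertFields _id -d myDB -c myCollection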
Mongoimport does not take modifiers such as your $set. You will need to use the mongo --eval command to update.
mongo myDB --eval 'db.myCollection.update({_id: "john"}, {$set:{gender:"male"}}, upsert=true)'
Hope this helps.