This is my first time using MongoDB. I have a products.json file in the format of
{"products":[ {} , {} , {} ] }
and when I imported it into MongoDB with the command:
mongoimport --db databaseName --collection collectionName --file products.json
I got the whole products.json in my collection as a single document with a single ObjectId, not a document with its own id for every {} object in my array. Obviously my JSON format is incorrect, and I would really appreciate your help with this since I am a beginner.
From what you described, what you need is to import with --collection products and have the products in your JSON file laid out as follows:
{}
{}
{}
Or you can modify the file content as follows:
[{},{},{}]
and add --jsonArray to the mongoimport command to load the array objects as separate documents.
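For example, with the second format the import command might look something like this (a sketch; substitute your own database and collection names):
mongoimport --db databaseName --collection products --jsonArray --file products.json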
I have a JSON file (converted from mongodump BSON) which I would like to insert into MongoDB using pymongo.
The approach I am using is something like:
with open('duplicate_docs.json') as f:
    lines = f.readlines()

for line in lines:
    record = json.loads(line)
    db.insert_one(record)
However, the JSON is in the form:
{ "_id" : ObjectId( "54ccc3f469702d45ca450200"), \"id\":\"54713efd69702d78d1420500\",\"name\":\"response"}
As you can see there are escape characters (\) around the JSON keys, and I am not able to load this as JSON.
What is the best way to fix a JSON string like this so it can be inserted into MongoDB?
Thank you.
As an alternative approach, if you take the actual output of mongodump, you can insert it straight in with the bson.json_util loads() function.
from pymongo import MongoClient
from bson.json_util import loads

db = MongoClient()['mydatabase']

with open('c:/temp/duplicate_docs.json', mode='w') as f:
    f.write('{"_id":{"$oid":"54ccc3f469702d45ca450200"},"id":"54713efd69702d78d1420500","name":"response"}')

with open('c:/temp/duplicate_docs.json') as f:
    lines = f.readlines()

for line in lines:
    record = loads(line)
    db.docs.insert_one(record)
Why not use mongoexport to dump to JSON rather than BSON:
mongoexport --port 27017 --db <database> --collection <collection> --out output.json
and then use
mongoimport --port 27017 --db <database> --collection <collection> --file output.json
For example, I have a file customers.json which is an array of strictly formed objects, and it's pretty plain (no nested objects), like this (importantly, it already includes ids):
[
{
"id": 23635,
"name": "Jerry Green",
"comment": "Imported from facebook."
},
{
"id": 23636,
"name": "John Wayne",
"comment": "Imported from facebook."
}
]
And I want to import them all into a table customers in my Postgres DB.
I found some pretty involved approaches where you first import the JSON into a table like imported_json with a json-typed column named data holding the objects, and then use SQL to extract those values and insert them into a real table.
But is there a simple way of importing JSON into Postgres without touching SQL?
You can feed the JSON into a SQL statement that extracts the information and inserts that into the table. If the JSON attributes have exactly the same names as the table columns you can do something like this:
with customer_json (doc) as (
values
('[
{
"id": 23635,
"name": "Jerry Green",
"comment": "Imported from facebook."
},
{
"id": 23636,
"name": "John Wayne",
"comment": "Imported from facebook."
}
]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
set name = excluded.name,
comment = excluded.comment;
New customers will be inserted, existing ones will be updated. The "magic" part is the json_populate_recordset(null::customer, doc) which generates a relational representation of the JSON objects.
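To illustrate what that function produces, you can run it on its own against a sample array and see the relational rows that feed the insert (a sketch using the same sample data, assuming the customer table below exists):
select p.*
from json_populate_recordset(null::customer,
       '[{"id": 23635, "name": "Jerry Green", "comment": "Imported from facebook."}]'::json) as p;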
The above assumes a table definition like this:
create table customer
(
id integer primary key,
name text not null,
comment text
);
If the data is provided as a file, you need to first put that file into some table in the database. Something like this:
create unlogged table customer_import (doc json);
Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):
\copy customer_import from 'customers.json' ....
Then you can use the above statement, just remove the CTE and use the staging table:
insert into customer (id, name, comment)
select p.*
from customer_import l
cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update
set name = excluded.name,
comment = excluded.comment;
It turns out there's an easy way to import a multi-line JSON object into a JSON column in a postgres database using the command line psql tool, without needing to explicitly embed the JSON into the SQL statement. The technique is documented in the postgresql docs, but it's a bit hidden.
The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as:
{
"dog": "cat",
"frog": "frat"
}
We can use the following SQL to load it into a temporary table:
sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;
which gives the result:
j
────────────────────────────────
{"dog": "cat", "frog": "frat"}
(1 row)
You can also perform operations on the data directly:
sql> select :'content'::jsonb -> 'dog';
?column?
──────────
"cat"
(1 row)
Under the covers this is just embedding the JSON in the SQL, but it's a lot neater to let psql perform the interpolation itself.
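To tie this back to the customers.json file from the question, a minimal sketch (assuming the customer table from the earlier answer already exists) would be:
\set content `cat customers.json`
insert into customer (id, name, comment)
select p.*
from json_populate_recordset(null::customer, :'content'::json) as p;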
In near-big-data cases, the most efficient way to import JSON from a file without using any external tool appears to be not to import a single JSON document from the file, but rather a single-column CSV: a list of one-line JSON objects:
data.json.csv:
{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}
then, under psql:
create table t ( j jsonb );
\copy t from 'd:\path\data.json.csv' csv quote e'\x01' delimiter e'\x02'
One record per JSON line will be added to table t.
\copy ... from was made for CSV and, as such, loads data line by line. As a result, reading one JSON document per line, rather than a single JSON array that has to be split later, uses no intermediate table and achieves high throughput.
Moreover, you are less likely to hit the maximum input line-size limitation that arises if your input JSON file is too big.
I would thus first convert your input into a single-column CSV and then import it using the copy command.
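For instance, if your input is a single JSON array like the customers.json above, jq (assuming it is available) can produce that one-JSON-per-line file:
jq -c '.[]' customers.json > data.json.csv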
You can use spyql.
Running the following command would generate INSERT statements that you can pipe into psql:
$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql"
INSERT INTO "customer"("id","name","comment") VALUES (23635,'Jerry Green','Imported from facebook.'),(23636,'John Wayne','Imported from facebook.');
jq is used to transform the json array into json lines (1 json object per line) and then spyql takes care of converting json lines into INSERT statements.
To import the data into PostgreSQL:
$ jq -c .[] customers.json | spyql -Otable=customer "SELECT json->id, json->name, json->comment FROM json TO sql" | psql -U your_user_name -h your_host your_database
Disclaimer: I am the author of spyql.
If you want to do it from a command line ...
NOTE: This isn't a direct answer to your question, as this will require you to convert your JSON to SQL. You will probably have to deal with JSON 'null' when converting anyway. You could use a view or materialized view to make that problem invisible-ish, though.
Here is a script I've used for importing JSON into PostgreSQL (WSL Ubuntu), which basically requires that you mix psql meta commands and SQL in the same command line. Note the use of the somewhat obscure script command, which allocates a pseudo-tty:
$ more update.sh
#!/bin/bash
wget <filename>.json
echo '\set content `cat $(ls -t <redacted>.json.* | head -1)` \\ delete from <table>; insert into <table> values(:'"'content'); refresh materialized view <view>; " | PGPASSWORD=<passwd> psql -h <host> -U <user> -d <database>
$
(Copied from my answer at Shell script to execute pgsql commands in files)
Another option is to use sling. See this blog post which covers loading JSON files into PG. You could simply pipe your json file like this:
$ export POSTGRES='postgresql://...'
$ sling conns list
+------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+------------+------------------+-----------------+
| POSTGRES | DB - PostgreSQL | env variable |
+------------+------------------+-----------------+
$ cat /tmp/records.json | sling run --tgt-conn POSTGRES --tgt-object public.records --mode full-refresh
11:09AM INF connecting to target database (postgres)
11:09AM INF reading from stream (stdin)
11:09AM INF writing to target database [mode: full-refresh]
11:09AM INF streaming data
11:09AM INF dropped table public.records
11:09AM INF created table public.records
11:09AM INF inserted 500 rows in 0 secs [1,556 r/s]
11:09AM INF execution succeeded
Using debug mode would show a DDL of create table if not exists public.records ("data" jsonb). If you would like to flatten your JSON, sling does that as well by adding the --src-options 'flatten: true' option:
$ cat /tmp/records.json | sling run --src-options 'flatten: true' --tgt-conn POSTGRES --tgt-object public.records --mode full-refresh
The DDL in that case would be something like:
create table if not exists public.records ("_id" varchar(255),
"age" integer,
"balance" varchar(255),
"company__about" text,
"company__address" varchar(255),
"company__email" varchar(255),
"company__latitude" numeric,
"company__longitude" numeric,
"company__name" varchar(255),
"company__phone" varchar(255),
"company__registered" varchar(255),
"isactive" bool,
"name" varchar(255),
"picture" varchar(255),
"tags" jsonb)
FYI, I am the author of sling.
I have a file.csv with a structure similar to this:
loremipsum; machine, metal
As I understand it, a successful import would look like:
{
text: "loremipsum", << string
tags: ["machine","metal"] << object with two fields
}
The best result I get is:
{
text: "loremipsum", << string
tags: "machine, metal" << string
}
If I understand it correctly, please tell me how to do a successful import. Thanks.
Edit: the "tags" field should actually contain ~16 URLs, so please tell me how they should be stored correctly.
Ideally, the command below should be used to import a CSV file into MongoDB (maybe you are already using it):
mongoimport --db users --type csv --headerline --file /filePath/fileName.csv
I think your problem is with the array-type data (if I understood correctly).
You first need to add one document to the collection by hand and export it as a CSV file. This will give you the format the above command expects in order to import your data correctly. Then arrange your data as per the exported CSV file.
This answer explains it very well.
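For instance, with a recent mongoimport that supports --useArrayIndexFields, a CSV laid out like this (the field and collection names here are assumptions based on your example) should import tags as an array:
text,tags.0,tags.1
loremipsum,machine,metal
mongoimport --db users --collection docs --type csv --headerline --useArrayIndexFields --file /filePath/fileName.csv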
I had this data in Excel
I wanted it like this in MongoDB:
{
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
}
I replaced the headings with cars.0, cars.1, cars.2, like this.
I used the mongoimport tool and ran this command:
mongoimport.exe --uri "mongodb+srv://localhost/<MYDBName>" --username dbuser --collection <collectionName> --drop --type csv --useArrayIndexFields --headerline --file 1.csv
Here my CSV file is 1.csv.
I'd like to import a JSON file containing an array of JSON objects into a MySQL table.
Using LOAD DATA LOCAL INFILE '/var/lib/mysql-files/big_json.json' INTO TABLE test(json); gives the error message: Invalid JSON text: "Invalid value." at position 1 in value for column.
My json looks like the following:
[
{
"id": "6defd952",
"title": "foo"
},
{
"id": "98ee3d8b",
"title": "bar"
}
]
How can I parameterize the load command to iterate through the array?
For whatever reason, MySQL errors out when you try and load pretty-printed JSON, both in 5.7 and in 8.0+, so you'll need the JSON to be in compact form, i.e.:
[{"id":"6defd952","title":"foo"},{"id":"98ee3d8b","title":"bar"}]
This can be accomplished by piping the JSON through jq and using the -c flag to re-format the text into compact form:
Windows:
type input.json | jq -c . > infile.json
Mac/Linux:
cat input.json | jq -c . > infile.json
If you're reading from stdout instead of a file, just replace the cat/type portion and pipe to jq in the same manner. The resulting "infile.json" should then work for the LOAD DATA INFILE statement.
If you wanted to then iterate through the loaded array, use JSON_EXTRACT(JSON_FIELD, CONCAT('$[', i, ']')) with ordinal variable i to traverse the root document (array) in a single query.
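A rough sketch of that iteration (assuming MySQL 8.0+ and the test(json) table from the question; the derived table of indexes is just for illustration):
SELECT JSON_EXTRACT(t.json, CONCAT('$[', n.i, ']')) AS item
FROM test t
JOIN (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2) AS n
  ON n.i < JSON_LENGTH(t.json);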