Get list of all values in a key:value list - json

So my input looks like
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":""}
{"selling":"0","quantity":"100","price":"80000","date":"1401383271","rs_name":"sammvarnish","contact":"PM","notes":"Seers Bank W321 :)"}
{"selling":"0","quantity":"100","price":"70000","date":"1401383168","rs_name":"pwnoramaa","contact":"PM","notes":""}
and the output I want must look like
0,2,80000,1401384212,overhault,PM,""
0,100,80000,1401383271,sammvarnish,PM,"Seers Bank W321 :)"
0,100,70000,1401383168,pwnoramaa,PM,""
What's the best way to do this in bash?
EDIT: changed my needs.
The new output I want is, for
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":"testnote"}
as input,
rs name: \t overhault
quantity: \t 2
price: \t 80000
date: \t 29-05 19:23
contact: \t PM
notes: \t testnote
Where \t is a tab character (like in echo "\t").
As you can see, this one is a tad bit more complicated.
For example, it changes the order, and requires the UNIX timestamp to be converted to an alternative format.
I'll use any tool you can offer me as long as you explain clearly how I can use it from a bash script. The input will consist of three such lines, delimited by newline characters, and it must print the output with an empty line between each of the results.

Don't do this with regular expressions or bash; there are JSON parsers for this kind of task. A simple Python example:
import json

data = json.loads('{"selling":"0","quantity":"2"}')
values = ','.join(data.values())  # note: plain dicts may not preserve key order on older Pythons
print(values)
I strongly suggest you just use a simple script like this which you make executable and then call.
EDIT: here's a version which preserves the order:
import json
data = json.loads('{"selling":"0","quantity":"2", "price":"80000"}')
orderedkeys = ['selling', 'quantity', 'price']
values = [data[key] for key in orderedkeys]
values = ','.join(values)
print(values)
output:
0,2,80000
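For the edited requirement (labelled, tab-separated fields in a different order, with the timestamp reformatted and an empty line between records), here is a minimal sketch that reads the JSON lines from stdin; the filename convert.py and the exact date format are assumptions based on your example output:
#!/usr/bin/env python
import json
import sys
from datetime import datetime

first = True
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    record = json.loads(line)
    if not first:
        print("")  # empty line between results
    first = False
    # Convert the UNIX timestamp to day-month hour:minute (local time).
    date = datetime.fromtimestamp(int(record["date"])).strftime("%d-%m %H:%M")
    print("rs name:\t" + record["rs_name"])
    print("quantity:\t" + record["quantity"])
    print("price:\t" + record["price"])
    print("date:\t" + date)
    print("contact:\t" + record["contact"])
    print("notes:\t" + record["notes"])
Make it executable with chmod +x convert.py and call it from bash as ./convert.py < input.txt.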


How to parse invalid JSON containing an invalid number

I work with a legacy customer who sends me webhook events. Sometimes their system sends me a value that looks like this
[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]
I am using Python's json.loads to parse the data sent to me. Here ts is an invalid number, and Python raises json.decoder.JSONDecodeError whenever I try to parse this string.
It is okay with me to get None in the ts field if I can not parse it.
What would be a smart (& possibly generic) way to solve this problem?
This may not be so generic, but you can try using yaml to load it; the invalid number is not valid JSON, but it is a valid YAML scalar, so YAML simply reads it as a string:
import yaml
s = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074","ts":16XX445656000}]'
yaml.safe_load(s)
Output:
[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
'ecc': '0X6',
'cph': 'X1X4X77074',
'ts': '16XX445656000'}]
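If you want ts to end up as None whenever it cannot be parsed as a number (as stated in the question), you can post-process what yaml gives you; a small sketch on top of the approach above:
import yaml

s = '[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074","ts":16XX445656000}]'
records = yaml.safe_load(s)
for rec in records:
    try:
        rec["ts"] = int(rec["ts"])
    except (TypeError, ValueError):
        rec["ts"] = None  # unparseable number -> None, as requested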
If the problem is always in the ts key, and this value is always a string of numbers and letters, you could just remove it before trying to parse:
import json
import re
jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
jstr_sanitized = re.sub(r',?\s*\"ts\":[A-Z0-9]+', "", jstr)
jobj = json.loads(jstr_sanitized)
# [{'id': 'LXKhRA3RHtaVBhnczVRJLdr', 'ecc': '0X6', 'cph': 'X1X4X77074'}]
Regex explanation (try online):
,?\s*\"ts\":[A-Z0-9]+
,? Zero or one commas
\s* Any number of whitespace characters
\"ts\": Literally "ts":
[A-Z0-9]+ One or more uppercase letters or numbers
Alternatively, you could catch the JSONDecodeError and look at its pos attribute for the offending character. Then, you could either remove just that character and try again, or look for the next space, comma, or bracket and remove characters until that point before you try again.
import json

jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
while True:
    try:
        jobj = json.loads(jstr)
        break
    except json.JSONDecodeError as ex:
        # remove the offending character and try again
        jstr = jstr[:ex.pos] + jstr[ex.pos+1:]
This mangles the output so that the ts key is now a valid integer (after removing the Xs), but since you don't care about that anyway, it should be fine:
[{'id': 'LXKhRA3RHtaVBhnczVRJLdr',
'ecc': '0X6',
'cph': 'X1X4X77074',
'ts': 16445656000}]
Since you'd end up repeatedly re-parsing the initial valid part, this is probably not a great idea if you have a huge JSON string or there are lots of places that could throw an error, but it should be fine for the kind of example you have shown.
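For completeness, here is a hedged sketch of the "remove up to the next delimiter" variant mentioned above; it needs fewer retries because it drops the whole offending run per iteration (note that it truncates ts to 16 rather than 16445656000, which is fine if you discard the field anyway):
import json
import re

jstr = """[{"id":"LXKhRA3RHtaVBhnczVRJLdr","ecc":"0X6","cph":"X1X4X77074", "ts":16XX445656000}]"""
while True:
    try:
        jobj = json.loads(jstr)
        break
    except json.JSONDecodeError as ex:
        # Drop everything from the offending position up to the
        # next space, comma, or closing bracket, then retry.
        m = re.search(r'[ ,\]}]', jstr[ex.pos:])
        end = ex.pos + m.start() if m else ex.pos + 1
        jstr = jstr[:ex.pos] + jstr[end:]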

JSON - Regex to identify a pattern in JSON

I'm new to Python 3 and I am working with large JSON objects. I have a large JSON object which has extra chars coming in between two JSON objects, i.e. between the closing and opening braces.
For example:
{"id":"121324343", "name":"foobar"}3$£_$£rvcfddkgga£($(>..bu&^783 { "id":"343554353", "name":"ABCXYZ"}'
These extra chars could be anything: alphanumerics, special chars, or other ASCII. They appear in this large JSON multiple times and can be of any length. I'm trying to use a regex to identify that pattern and remove them, but the regex doesn't seem to work. Here is the regex I used:
(^}\n[a-zA-Z0-9]+{$)
Is there a way of identifying such a pattern using regex in Python?
You can select the dictionary data based on named capture groups. As a bonus, this will also ignore any { or } within the extra chars.
The following pattern works on the provided data:
"\"id\"\:\"(?P<id>\d+?)\"[,\s]+\"name\"\:\"(?P<name>[ \w]+)\""
Example
import re
from pprint import pprint
string = \
"""
{"id":"121324343", "name":"foobar"}3$£_$£rvcfdd{}kgga£($(>..bu&^783 { "id":"343554353", "name":"ABC XYZ"}'
"""
pattern = re.compile(pattern=r'"id":"(?P<id>\d+?)"[,\s]+"name":"(?P<name>[ \w]+)"')
pprint([match.groupdict() for match in pattern.finditer(string=string)])
Output
[{'id': '121324343', 'name': 'foobar'}, {'id': '343554353', 'name': 'ABC XYZ'}]
Test it out yourself: https://regex101.com/r/82BqbE/1
Notes
For this example I assume the following:
id only contains integer digits.
name is a string that can contain the following characters [a-zA-Z0-9_ ]. (this includes white spaces and underscores).
Assuming the whole JSON is on a single line, and there are no }{ inside the fields themselves, this should be enough:
In [1]: import re
In [2]: x = """{"id":"121324343", "name":"foobar"}3$£_$£rvcfddkgga£($(>..bu&^783 { "id":"343554353", "name":"ABCXYZ"}"""
In [3]: print(re.sub(r'(?<=})[^}{]+(?={)', "\n", x))
{"id":"121324343", "name":"foobar"}
{ "id":"343554353", "name":"ABCXYZ"}
You can check the regex here https://regex101.com/r/leIoqE/1

How to target all attributes within an HTML tag [duplicate]

Example:
This is just\na simple sentence.
I want to match every character between This is and sentence. Line breaks should be ignored. I can't figure out the correct syntax.
For example
(?<=This is)(.*)(?=sentence)
Regexr
I used the lookbehind (?<=) and lookahead (?=) so that "This is" and "sentence" are not included in the match, but this is up to your use case; you can also simply write This is(.*)sentence.
The important thing here is that you activate the "dotall" mode of your regex engine, so that the . matches the newline. How you do this depends on your regex engine.
The next question is whether to use .* or .*?. The first one is greedy and will match till the last "sentence" in your string; the second one is lazy and will match till the next "sentence" in your string.
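A quick Python illustration of that difference (re.S is Python's dotall flag; the sample string contains two occurrences of "sentence"):
import re

s = "This is my first sentence. This is my second sentence."
print(re.search(r"(?<=This is)(.*)(?=sentence)", s, re.S).group(1))
# greedy: ' my first sentence. This is my second ' (up to the LAST "sentence")
print(re.search(r"(?<=This is)(.*?)(?=sentence)", s, re.S).group(1))
# lazy: ' my first ' (up to the NEXT "sentence")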
Update
Regexr
This is(?s)(.*)sentence
Where the (?s) turns on the dotall modifier, making the . match newline characters.
Update 2:
(?<=is \()(.*?)(?=\s*\))
matches your example "This is (a simple) sentence". See here on Regexr
Lazy Quantifier Needed
Resurrecting this question because the regex in the accepted answer doesn't seem quite correct to me. Why? Because
(?<=This is)(.*)(?=sentence)
will match " my first sentence. This is my second " in "This is my first sentence. This is my second sentence."
See demo.
You need a lazy quantifier between the two lookarounds. Adding a ? makes the star lazy.
This matches what you want:
(?<=This is).*?(?=sentence)
See demo. I removed the capture group, which was not needed.
DOTALL Mode to Match Across Line Breaks
Note that in the demo the "dot matches line breaks" mode (a.k.a. dotall) is set (see how to turn on DOTALL in various languages). In many regex flavors, you can set it with the inline modifier (?s), turning the expression into:
(?s)(?<=This is).*?(?=sentence)
Reference
The Many Degrees of Regex Greed
Repetition with Star and Plus
Try This is[\s\S]*?sentence, which works in JavaScript.
This:
This is (.*?) sentence
works in JavaScript.
Use this: (?<=beginningstringname)(.*\n?)(?=endstringname)
This worked for me (I'm using VS Code):
for:
This is just\na simple sentence
Use:
This .+ sentence
You can simply use this: This is .*? sentence
RegEx to match everything between two strings using the Java approach.
List<String> results = new ArrayList<>(); //For storing results
String example = "Code will save the world";
Let's use Pattern and Matcher objects to apply the regex (.*?).
Pattern p = Pattern.compile("Code (.*?) world"); //java.util.regex.Pattern;
Matcher m = p.matcher(example); //java.util.regex.Matcher;
Since Matcher might contain more than one match, we need to loop over the results and store them.
while (m.find()) { //Loop through all matches
    results.add(m.group(1)); //Get captured value and store it in the collection.
}
This example will contain only "will save the", but in a bigger text it would probably find more matches.
In case of JavaScript you can use [^] to match any character including newlines.
Using the /s flag with a dot . to match any character also works, but is applied to the whole pattern and JavaScript does not support inline modifiers to turn on/off the flag.
To match as least as possible characters, you can make the quantifier non greedy by appending a question mark, and use a capture group to extract the part in between.
This is([^]*?)sentence
See a regex101 demo.
As a side note, to not match partial words you can use word boundaries like \bThis and sentence\b
const s = "This is just\na simple sentence";
const regex = /This is([^]*?)sentence/;
const m = s.match(regex);
if (m) {
  console.log(m[1]);
}
The lookaround variant in JavaScript is (?<=This is)[^]*?(?=sentence) and you could check Lookbehind in JS regular expressions for the support.
Also see Important Notes About Lookbehind.
const s = "This is just\na simple sentence";
const regex = /(?<=This is)[^]*?(?=sentence)/;
const m = s.match(regex);
if (m) {
  console.log(m[0]);
}
In case anyone is looking for an example of this within a Jenkins context: it parses the build.log, and if it finds a match it fails the build with the matched text.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
node {
    stage("parse") {
        def file = readFile 'build.log'
        def regex = ~"(?s)(firstStringToUse(.*)secondStringToUse)"
        Matcher match = regex.matcher(file)
        if (match.find()) {
            capturedText = match.group(1)
            error(capturedText)
        }
    }
}
Is there a way to deal with repeated instances of this split in a block of text? For instance: "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence.". To match each instance instead of the entire string, use the code below:
import re

data = "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence."
pattern = re.compile(r'(?s)This is .*? sentence')  # (?s) must lead the pattern on Python 3.11+
for match_instance in re.finditer(pattern, data):
    do_something(match_instance.group())
I landed here while searching for a regex to convert the print "string" syntax in old Python 2 scripts to print("string") for Python 3. It works well; otherwise use 2to3.py for additional conversions. Here is my solution for others:
Try it out on Regexr.com (doesn't work in NP++ for some reason):
find: (?<=print)( ')(.*)(')
replace: ('$2')
for variables:
(?<=print)( )(.*)(\n)
('$2')\n
for label and variable:
(?<=print)( ')(.*)(',)(.*)(\n)
('$2',$4)\n
How to replace all print "string" in Python2 with print("string") for Python3?
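If you would rather apply the same replacement outside an editor, a rough Python sketch of the first find/replace pair above using re.sub (it only covers the plain print 'string' form; 2to3 remains the robust option):
import re

src = "print 'hello'\nprint 'world'\n"
# Same pattern as the editor version; \2 plays the role of $2.
converted = re.sub(r"(?<=print)( ')(.*)(')", r"('\2')", src)
print(converted)
# print('hello')
# print('world')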
Here is how I did it:
This was easier for me than trying to figure out the specific regex necessary.
int indexPictureData = result.IndexOf("-PictureData:");
int indexIdentity = result.IndexOf("-Identity:");
string returnValue = result.Remove(indexPictureData + 13);
returnValue = returnValue + " [bytecoderemoved] " + result.Remove(0, indexIdentity);
For a quick search in Vim, you could use, at the Vim command prompt:
/This is.*\_.*sentence
I had this string:
headers:
  Date:
    schema:
      type: string
      example: Tue, 23 Aug 2022 11:36:23 GMT
  Content-Type:
    schema:
      type: string
      example: application/json; charset=utf-8
  Transfer-Encoding:
    schema:
      type: string
      example: chunked
  Connection:
    schema:
      type: string
      example: keep-alive
  Content-Encoding:
    schema:
      type: string
      example: gzip
  Vary:
    schema:
      type: string
      example: Accept-Encoding
  Server:
    schema:
      type: number
      example: Microsoft-IIS/10.0
  X-Powered-By:
    schema:
      type: string
      example: ASP.NET
  Access-Control-Allow-Origin:
    schema:
      type: string
      example: '*'
  Access-Control-Allow-Credentials:
    schema:
      type: boolean
      example: 'true'
  Access-Control-Allow-Headers:
    schema:
      type: string
      example: '*'
  Access-Control-Max-Age:
    schema:
      type: string
      example: '-1'
  Access-Control-Allow-Methods:
    schema:
      type: string
      example: GET, PUT, POST, DELETE
  X-Content-Type-Options:
    schema:
      type: string
      example: nosniff
  X-XSS-Protection:
    schema:
      type: string
      example: 1; mode=block
content:
  application/json:
and I wanted to remove everything from the word headers: up to content, so I wrote this regex: (headers:)[^]*?(content)
It worked as expected, matching every occurrence of that block.
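The [^] trick is JavaScript-specific; in Python the same removal can be sketched with the DOTALL flag and a lazy quantifier (the lookahead keeps the content: line itself):
import re

s = """headers:
  Date:
    schema:
      type: string
content:
  application/json:"""

print(re.sub(r"headers:.*?(?=content:)", "", s, flags=re.DOTALL))
# content:
#   application/json: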
Sublime Text 3x
In Sublime Text, you simply write the two words you are interested in keeping; for example, in your case it is
"This is" and "sentence"
and you write .* in between
i.e. This is .* sentence
and this should do you well

How to convert a JSON file to an SQLite database

If I have some sample data, how do I put it into SQLite (preferably fully automated)?
{"uri":"/","user_agent":"example1"}
{"uri":"/foobar","user_agent":"example1"}
{"uri":"/","user_agent":"example2"}
{"uri":"/foobar","user_agent":"example3"}
I found the easiest way to do this is by using jq and CSV as an intermediary format.
Getting the CSV
First write your data to a file.
I will assume data.json here.
Then construct the header using jq:
% head -1 data.json | jq -r 'keys | @csv'
"uri","user_agent"
The head -1 is because we only want one line.
jq's -r makes the output a plain string instead of a JSON string wrapping the CSV.
We then call the internal function keys to get the keys of the input as an array.
This we send to the @csv formatter, which outputs a single string with the headers in quoted CSV format.
We then need to construct the data.
% jq -r 'map(tostring) | @csv' < data.json
"/","example1"
"/foobar","example1"
"/","example2"
"/foobar","example3"
We now take the whole input and deconstruct the associative array (map) using .[], and then put it back into a simple array […].
This basically converts our dictionary to an array of values.
Sent to the @csv formatter, we again get some CSV.
Putting it all together we get a single one-liner in the form of:
% (head -1 data.json | jq -r 'keys | @csv' && jq -r 'map(tostring) | @csv' < data.json) > data.csv
If you need to convert the data on the fly, i.e. without a file, try this:
% cat data.json | (read -r first && jq -r '(keys | @csv),(map(tostring) | @csv)' <<<"${first}" && jq -r 'map(tostring) | @csv')
Loading it into SQLite
Open an SQLite database:
sqlite3 somedb.sqlite
Now in the interactive shell do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):
.mode csv
.import data.csv my_table
Now close the shell and open it again for a clean environment.
You can now easily SELECT from the database and do whatever you want to.
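As an alternative that skips the CSV intermediary entirely, here is a rough Python-only sketch using just the standard library; it assumes data.json holds one JSON object per line with the same keys in each (as in the question):
import json
import sqlite3

# Read newline-delimited JSON objects.
with open("data.json") as f:
    rows = [json.loads(line) for line in f if line.strip()]

keys = list(rows[0].keys())
con = sqlite3.connect("somedb.sqlite")
# Column names come straight from the JSON keys; fine for trusted input only.
con.execute("CREATE TABLE IF NOT EXISTS my_table (%s)" % ", ".join(keys))
con.executemany(
    "INSERT INTO my_table VALUES (%s)" % ", ".join("?" * len(keys)),
    [tuple(row[k] for k in keys) for row in rows],
)
con.commit()
con.close()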
Edits
Edit:
As pointed out (thanks @Leo), the original question did show newline-delimited JSON objects, each of which on its own conforms to RFC 4627, but not all of them together in that format.
jq can handle a single JSON array of objects much the same way though by preprocessing the file using jq '.[]' <input.json >preprocessed.json.
If you happen to be dealing with JSON text sequences (rfc7464) luckily jq has got your back too with the --seq parameter.
Edit 2:
Both the newline-separated JSON and the JSON text sequences have one important advantage: they reduce the memory requirement to O(1), meaning your total memory requirement depends only on your longest line of input. Putting the entire input in a single array instead requires one of the following: either your parser can handle late errors (i.e. a syntax error after the first 100k elements), which generally isn't the case to my knowledge; or it has to parse the entire file twice (first validating syntax, then parsing, discarding previous elements in the process, as is the case with jq --stream), which also happens rarely to my knowledge; or it tries to parse the whole input at once and return the result in one step (think of receiving a Python dict which contains the entirety of your, say, 50G input data plus overhead), which is usually memory-backed, hence raising your memory footprint by just about your total data size.
Edit 3:
If you hit any obstacles, try using keys_unsorted instead of keys.
I haven't tested that myself (I kind of assume my columns were already sorted); however, @Kyle Barron reports that this was needed.
Edit 4:
As pointed out by youngminz in the comment below, the original command fails when working with non-{number,string} values like nested lists.
The command has been updated (with a slightly adapted version from the comment: map(), unlike map_values(), converts an object to an array of its values, the same as [.[]], making the map more readable).
Keys remain unaffected; if you really have complex types as keys (which may not even conform to JSON, but I'm too lazy to look it up right now), you can do the same for the key-related mappings.
A way to do this without CSV or a 3rd-party tool is to use the JSON1 extension of SQLite combined with the readfile extension that is provided in the sqlite3 CLI tool. As well as being a "more direct" solution overall, this has the advantage of handling JSON NULL values more consistently than CSV, which would otherwise import them as empty strings.
If the input file is a well-formed JSON file, e.g. the example given as an array:
[
  {"uri":"/","user_agent":"example1"},
  {"uri":"/foobar","user_agent":"example1"},
  {"uri":"/","user_agent":"example2"},
  {"uri":"/foobar","user_agent":"example3"}
]
Then this can be read into the corresponding my_table table as follows. Open the SQLite database file my_db.db using the sqlite3 CLI:
sqlite3 my_db.db
then create my_table using:
CREATE TABLE my_table(uri TEXT, user_agent TEXT);
Finally, the JSON data in my_data.json can be inserted into the table with the CLI command:
INSERT INTO my_table SELECT
  json_extract(value, '$.uri'),
  json_extract(value, '$.user_agent')
FROM json_each(readfile('my_data.json'));
If the initial JSON file is newline separated JSON elements, then this can be converted first using jq using:
jq -s . <my_data_raw.json >my_data.json
It's likely there is a way to do this directly in SQLite using JSON1, but I didn't pursue that given that I was already using jq to massage the data prior to import to SQLite.
sqlitebiter appears to provide a Python solution:
A CLI tool to convert CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/TSV/Google-Sheets to a SQLite database file. http://sqlitebiter.rtfd.io/
docs:
http://sqlitebiter.readthedocs.io/en/latest/
project:
https://github.com/thombashi/sqlitebiter
last update approximately 3 months ago
last issue closed approximately 1 month ago, none open
noted today, 2018-03-14
You can use spyql.
spyql reads the json files (with 1 json object per line) and generates INSERT statements that you can pipe into sqlite:
$ spyql -Otable=my_table "SELECT json->uri, json->user_agent FROM json TO sql" < sample3.json | sqlite3 my.db
This assumes that you already created an empty table in the sqlite database my.db.
Disclaimer: I am the author of spyql.
To work with a file of newline-delimited JSON objects, including \n escapes in the data:
Add a header column name and ensure the JSON is compact (one line per record).
cat <(echo '"line"') source.json | jq -c '.' > source.fauxcsv
Import the JSON and the header as a "csv" into a temporary table, with a column separator \t that won't occur in the JSON. Then create the real table via SQLite's JSON functions.
sqlite3 file.db \
  -cmd '.separator \t \n' \
  -cmd '.import --schema temp source.fauxcsv temp_json_lines' <<-'EOSQL'
INSERT INTO records SELECT
  json_extract(line, '$.rid'),
  coalesce(json_extract(line, '$.created_at'), strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
  json_extract(line, '$.name')
FROM temp_json_lines;
EOSQL
If (as in the original question) the JSON data comes in the form of JSONLines (that is, one JSON entity per line), and if it is desired to create a table with one of these entities per row, then sqlite3 can be used to import the data by setting .mode=line, e.g. as follows:
create table input (
raw JSON
);
.mode=line
.import input.json input
This approach is worth knowing not least because it can easily be adapted to handle cases where the data is not already in JSONLines format. For example, if input.json contains a single very long JSON array, we could use a tool such as jq or gojq to "splat" it:
.mode=line
.import "|jq -c .[] input.json" input
Similarly, if input.json contains a single object with many keys, and if it is desired to create a table of corresponding single-key objects:
.mode=line
.import "|jq -c 'to_entries[] | {(.key): .value}'" input
If the original data is a single very large JSON array or JSON object, jq's streaming parser could be used to save memory. In this context, it may be worth mentioning two CLI tools with minimal memory requirements: my own jm (based on JSON Machine), and jm.py (based on ijson). E.g., to "splat" each array in a file containing one or more JSON arrays:
.mode=line
.import "|jm input.json" input
With the JSON data safely in an SQLite table, it is (thanks to SQLite's support for JSON) now quite straightforward to create indices, populate other tables, etc., etc.
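For instance, a hedged sketch of that last step through Python's sqlite3 binding (the requests table and idx_uri index names are made up for the example; it assumes the raw rows were imported into input as above):
import sqlite3

con = sqlite3.connect("my_db.db")
# Populate a typed table from the raw JSON rows...
con.execute("""
    CREATE TABLE IF NOT EXISTS requests AS
    SELECT json_extract(raw, '$.uri') AS uri,
           json_extract(raw, '$.user_agent') AS user_agent
    FROM input
""")
# ...and index an extracted field directly on the raw table.
con.execute(
    "CREATE INDEX IF NOT EXISTS idx_uri ON input (json_extract(raw, '$.uri'))"
)
con.commit()
con.close()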
Here is the first answer compiled into a Deno script:
// just for convenience (pathExists)
import {} from "https://deno.land/x/simple_shell@0.9.0/src/stringUtils.ts";
/**
 * @description
 * convert a json db to csv and then to sqlite
 *
 * @note
 * `sqliteTableConstructor` is a string that is used to create the table; if it is specified, the csv file *should not* contain a header row.
 * if it's not specified, then the csv file *must* contain a header row, so it can be used to infer the column names.
*/
const jsonToSqlite = async (
  {
    jsonDbPath,
    jsonToCsvFn,
    sqliteDbPath,
    sqliteTableConstructor,
    tableName,
  }: {
    jsonDbPath: string;
    sqliteDbPath: string;
    tableName: string;
    sqliteTableConstructor?: string;
    // deno-lint-ignore no-explicit-any
    jsonToCsvFn: (jsonDb: any) => string;
  },
) => {
  // convert it into csv
  const csvDbPath = `${jsonDbPath.replace(".json", "")}.csv`;
  if (csvDbPath.pathExists()) {
    console.log(`${csvDbPath} already exists`);
  } else {
    const db = JSON.parse(await Deno.readTextFile(jsonDbPath));
    const csv = jsonToCsvFn(db);
    await Deno.writeTextFile(csvDbPath, csv);
  }
  // convert it to sqlite
  if (sqliteDbPath.pathExists()) {
    console.log(`${sqliteDbPath} already exists`);
  } else {
    const sqlite3 = Deno.spawnChild("sqlite3", {
      args: [sqliteDbPath],
      stdin: "piped",
      stderr: "null", // required to make sqlite3 work
    });
    await sqlite3.stdin.getWriter().write(
      new TextEncoder().encode(
        ".mode csv\n" +
          (sqliteTableConstructor ? `${sqliteTableConstructor};\n` : "") +
          `.import ${csvDbPath} ${tableName}\n` +
          ".exit\n",
      ),
    );
    await sqlite3.status;
  }
};
Example of usage:
await jsonToSqlite(
  {
    jsonDbPath: "./static/db/db.json",
    sqliteDbPath: "./static/db/db.sqlite",
    tableName: "radio_table",
    sqliteTableConstructor:
      "CREATE TABLE radio_table(name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
    jsonToCsvFn: (
      db: StationDBType[],
    ) => {
      const sanitize = (str: string) =>
        str.trim().replaceAll("\n", " ").replaceAll(",", " ");
      return db.filter((s) => s.name.trim() && s.url.trim())
        .map(
          (station) => {
            return (
              sanitize(station.name) + "," +
              sanitize(station.country) + "," +
              sanitize(station.language) + "," +
              station.votes + "," +
              sanitize(station.url) + "," +
              sanitize(station.favicon)
            );
          },
        ).join("\n");
    },
  },
);
Edit1:
Importing CSV into SQLite by default sets all column types to string. In this edit I allow the user to create the table first (via an optional constructor) before importing the CSV into it; this way the exact column types can be specified.
Edit2:
Turns out that with Deno and sqlite-deno you don't need to use CSV as an intermediate or shell out to sqlite3; here is an example of how to achieve this:
The next code will create a new SQLite db from the JSON one.
import { DB } from "https://deno.land/x/sqlite@v3.2.1/mod.ts";

export interface StationDBType {
  name: string;
  country: string;
  language: string;
  votes: number;
  url: string;
  favicon: string;
}

export const db = new DB("new.sql");
db.query(
  "create TABLE radio_table (name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
);

const jsonDb: StationDBType[] = JSON.parse(
  await Deno.readTextFile("static/db/compressed_db.json"),
);

const sanitize = (s: string) => s.replaceAll('"', "").replaceAll("'", "");

db.query(
  `insert into radio_table values ${
    jsonDb.map((station) =>
      "('" +
      sanitize(station.name) +
      "','" +
      sanitize(station.country) +
      "','" +
      sanitize(station.language) +
      "'," +
      station.votes +
      ",'" +
      sanitize(station.url) +
      "','" +
      sanitize(station.favicon) +
      "')"
    ).join(",")
  }`,
);

db.close();

Removing \n, \\n and other unwanted characters from a JSON unicode dictionary with Python

I've tried a couple of different solutions to fix my problem with some "funny" newlines within my JSON dictionary and none of them works, so I thought I'd make a post. The dictionary is obtained by scraping a website.
I have a json dictionary:
my_dict = {
    u"Danish title": u"Avanceret",
    u"Course type": u"MScTechnol",
    u"Type of": u"assessmen",
    u"Date": u"\nof exami",
    u"Evaluation": u"7 step sca",
    u"Learning objectives": u"\nA studen",
    u"Participants restrictions": u"Minimum 10",
    u"Aid": u"No Aid",
    u"Duration of Course": u"13 weeks",
    u"name": u"Advanced u",
    u"Department": u"31\n",
    u"Mandatory Prerequisites": u"31545",
    u"General course objectives": u"\nThe cour",
    u"Responsible": u"\nMartin C",
    u"Location": u"Campus Lyn",
    u"Scope and form": u"Lectures, ",
    u"Point( ECTS )": u"10",
    u"Language": u"English",
    u"number": u"31548",
    u"Content": u"\nThe cour",
    u"Schedule": u"F4 (Tues 1"
}
I have stripped the value content to [:10] to reduce clutter, but some of the values have a length of 300 characters. It might not be portrayed well here, but some of the values have a lot of newline characters in them, and I've tried a lot of different solutions to remove them, such as str.strip and str.replace, but without success because my 'values' are unicode. And by values I mean key, value in my_dict.items().
How do I remove all the newlines appearing in my dictionary? (With the values in focus, as some of the newlines are trailing, some are leading, and others are in the middle of the content: e.g. \nI have a\ngood\n idea\n.)
EDIT
I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    print key, value
If you're trying to remove all \n, or any junk character apart from numbers or letters, then use a regex:
import re

for key in my_dict.keys():
    my_dict[key] = my_dict[key].replace('\n', '')
    my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict
If you wish to keep anything apart from those then add it on to the character class inside the regex
To remove '\n', try this:
for key, value in my_dict.items():
    my_dict[key] = ''.join(value.split('\n'))
You need to put the updated value back into your dictionary (similar to a "by value vs. by reference" situation ;) )...
To remove the "\n", this one-liner may be more "pythonic":
new_test = { k: v.replace("\n", "") for k, v in test.iteritems() }
To do what you are trying to do in your loop, try something like:
new_test = { k: str(v[:10]).replace("\n", " ") for k, v in test.iteritems() }
In your code, value takes the new value, but you never write it back...
So for example, this would work (but be slower; also, you would be changing the values inside the loop, which should not cause problems, but which the interpreter might not like...):
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    # now put it back into the dictionary...
    test[key] = value
    print key, value