Export Redis hashes to CSV

This answer doesn't work for me
I run this command to find the keys that I want:
SCAN 0 MATCH "test_user:*"
so I got a (very long) list of hashes that I want to export to CSV.
I tried
SCAN 0 MATCH "test_user:*" > list.csv
or simply
SCAN 0 MATCH "test_user:*" > list.txt
but I always get a syntax error in response.
Any idea?

The only way I found is this (creating a shell script):
redis-cli --scan --pattern "test_user:*" |\
grep -e "^test_user:[^:]*$" |\
awk '{print "hmget " $0 " id display_name reputation location"}' |\
redis-cli --csv > test_user.csv
It works very well: the scan matches the pattern (and the grep lets you use a regex for better accuracy). Then awk builds an hmget command for each key, and a second redis-cli runs those commands and prints the output as CSV thanks to the --csv option.
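For illustration (with a made-up key test_user:1001), the awk stage feeds the second redis-cli lines like:
hmget test_user:1001 id display_name reputation location
This also explains the syntax errors in the question: if the SCAN commands were typed at the interactive redis-cli prompt, the > and the filename are passed to the server as extra arguments to SCAN; redirection to a file only works at the shell, as in the pipeline above.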
https://rdbtools.com/blog/redis-export-hashes-as-csv-using-cli/

Related

How to format a TXT file into a structured CSV file in bash?

I wanted to get some information about my CPU temperatures on my Linux server (openSUSE Leap 15.2). So I wrote a script which collects data every 20 seconds and writes it into a text file. I have already removed all the garbage data (like the "CPU Temp" labels) that I don't need.
Now I have a file like this:
47
1400
75
3800
The first two lines are one reading of the CPU temperature in C and the fan speed in RPM, respectively. The next two lines are another reading of the same measurements.
In the end I want this structure:
47,1400
75,3800
My question is: can a Bash script do this for me? I tried something with sed and awk, but nothing worked perfectly for me. Furthermore, I want a CSV file so I can make a graph, but I think it isn't a problem to convert a text file into a CSV file.
You could use paste; each - operand reads one line from standard input, so two dashes pair up consecutive lines, joined by the -d, delimiter:
paste -d, - - < file.txt
With pr (-t: no headers, -a: fill columns across, -2: two columns, -s,: comma separator):
pr -ta2s, file.txt
With ed:
ed -s file.txt <<-'EOF'
g/./s/$/,/\
;.+1j
,p
Q
EOF
You can use awk:
awk 'NR%2{printf "%s,",$0;next;}1' file.txt > file.csv
Another awk:
$ awk -v OFS=, '{printf "%s%s",$0,(NR%2?OFS:ORS)}' file
Output:
47,1400
75,3800
Explained:
$ awk -v OFS=, '{ # set output field delimiter to a comma
printf "%s%s", # using printf to control newline in output
$0, # output line
(NR%2?OFS:ORS) # and either a comma or a newline
}' file
Since you asked if a bash script can do this, here's a solution in pure bash. ;o]
c=0
while read -r line; do
  if (( c++ % 2 )); then
    echo "$line"
  else
    printf "%s," "$line"
  fi
done < file
Take a look at 'paste'. This will join multiple lines of text together into a single line and should work for what you want.
echo "${DATA}"
Name
SANISGA01CI
5WWR031
P59CSADB01
CPDEV02
echo "${DATA}"|paste -sd ',' -
Name,SANISGA01CI,5WWR031,P59CSADB01,CPDEV02
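Note that -s serializes all input lines into a single row; for the two-readings-per-record layout in this question, the pairwise form shown earlier (paste -d, - - < file.txt) is the variant you want.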

Search in large csv files

The problem
I have thousands of csv files in a folder. Every file has 128,000 entries with four columns in each line.
From time to time (twice a day) I need to compare a list (10,000 entries) against all the csv files. If one of the entries is identical to the third or fourth column of one of the csv rows, I need to write the whole row to an extra file.
Possible solutions
Grep
#!/bin/bash
getArray() {
    array=()
    while IFS= read -r line
    do
        array+=("$line")
    done < "$1"
}
getArray "entries.log"
for e in "${array[@]}"
do
    echo "$e"
    /bin/grep $e ./csv/* >> found
done
This seems to work, but it takes forever. After almost 48 hours the script had checked only 48 of the roughly 10,000 entries.
MySQL
The next try was to import all the csv files into a MySQL database, but there I ran into problems with my table at around 50,000,000 entries.
So I wrote a script which created a new table after 49,000,000 entries, and that way I was able to import all the csv files.
I tried to create an index on the second column, but it always failed (timeout). Creating the index before the import wasn't possible either; it slowed the import down to days instead of only a few hours.
The select statement was horrible, but it worked. It was much faster than the grep solution, but still too slow.
My question
What else can I try to search within the csv files?
To speed things up I have already copied all the csv files to an SSD, but I hope there are other ways.
Here are some improvements to your script (the first one is unlikely to offer a meaningful benefit on its own):
Use the built-in mapfile to slurp a file into an array:
mapfile -t array < entries.log
More importantly, use grep with a file of patterns and the appropriate flags.
I assume you want to match items in entries.log as fixed strings, not as regex patterns.
I also assume you want to match whole words.
grep -Fwf entries.log ./csv/*
This means you don't have to grep the thousands of csv files thousands of times (once for each item in entries.log). This change alone should give you a real, meaningful performance improvement.
This also removes the need to read entries.log into an array at all.
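To collect the matching rows into one file as the original script does, something like this should do (a sketch, assuming the csv files live under ./csv/):
grep -Fwf entries.log ./csv/* > found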
Here is an awk approach, assuming all the csv files change between runs; otherwise it would be wise to keep track of the already-checked files. But first, some test material:
$ mkdir test # the csvs go here
$ cat > test/file1 # has a match in 3rd
not not this not
$ cat > test/file2 # no match
not not not not
$ cat > test/file3 # has a match in 4th
not not not that
$ cat > list # these we look for
this
that
Then the script:
$ awk 'NR==FNR{a[$1];next} ($3 in a) || ($4 in a){print >> "out"}' list test/*
$ cat out
not not this not
not not not that
Explained:
$ awk ' # awk
NR==FNR { # process the list file
a[$1] # hash list entries to a
next # next list item
}
($3 in a) || ($4 in a) { # if 3rd or 4th field entry in hash
print >> "out" # append whole record to file "out"
}' list test/* # first list then the rest of the files
The script hashes all the list entries into a and then reads through the csv files, looking up the 3rd and 4th fields in the hash and outputting the whole record when there is a match.
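One caveat: the test files above are whitespace-separated, while the real files are comma-separated, so for those you would presumably add -F, so that $3 and $4 refer to the actual columns, e.g. (assuming one entry per line in entries.log):
awk -F, 'NR==FNR{a[$1];next} ($3 in a) || ($4 in a){print >> "out"}' entries.log ./csv/*.csv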
If you test it, let me know how long it ran.
You can build a patterns file and then use xargs and grep -Ef to search for all patterns in batches of csv files, rather than one pattern at a time as in your current solution:
# prepare patterns file
while read -r line; do
  printf '%s\n' "^[^,]+,[^,]+,$line,[^,]+$"   # find value in third column
  printf '%s\n' "^[^,]+,[^,]+,[^,]+,$line$"   # find value in fourth column
done < entries.log > patterns.dat
find /path/to/csv -type f -name '*.csv' -print0 | xargs -0 grep -hEf patterns.dat > found.dat
find ... - emits a NUL-delimited list of all csv files found
xargs -0 ... - passes the file list to grep, in batches

extra characters at end of command in shell

I am trying to run the following script:
sed -E -n '/"data"/,/}/{/[{}]/d;s/^[[:space:]]*"([^"]+)":[[:space:]]*"([^"]+)".*$/\1|\2/g;p}' /tmp/data.json | while IFS="|" read -r item val;do item="${item^^}"; item="${val}"; export "${item}"; echo ${item}; done
This basically exports data from inside a JSON file as environment variables.
That is, the key data holds a list (of varying length) of key-value pairs, and the keys themselves are not fixed. I want to read every key in that list and export its value. For example, I want these commands to be executed as part of the shell script:
export HELLO1
export SAMPLEKEY
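(For illustration only -- the actual file isn't shown in the question -- a hypothetical /tmp/data.json of roughly this shape would produce those two exports:)
{
  "data": {
    "hello1": "world1",
    "samplekey": "samplevalue"
  }
}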
However, when I run this, it gives the error: sed: 1: "/"data"/,/}/{/[{}]/d;s/ ...": extra characters at the end of p command. What might be the reason for this?
Rather than trying to use sed to parse .json files (which can rapidly grow beyond reasonable sed parsing), instead use a tool made for parsing json (like jq -- json query). You can easily obtain the keys for values under data, and then parse with your shell tools.
(note: your questions should be tagged bash since you use the parameter expansion for character-case which is a bashism, e.g. ${item^^})
Using jq, you could do something like the following:
jq '.data' /tmp/data.json | tail -n+2 | head -n-1 |
while read -r line; do line=${line#*\"}; line=${line%%\"*}; \
printf "export %s " ${line^^}; done; echo ""
Which results in the output:
export HELLO1 export SAMPLEKEY
(There is probably an even cleaner way to do this with jq -- and indeed there is.)
You can have jq output the keys for data one per line with:
jq -r '.data | to_entries[] | (.key|ascii_upcase)' /tmp/data.json
This lets you shorten the command that generates export in front of the keys to:
while read -r key; do \
printf "export %s " $key; \
done < <(jq -r '.data | to_entries[] | (.key|ascii_upcase)' /tmp/data.json); \
echo ""
(note: to affect your actual environment, you would need to export the values as part of your shell startup)
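For that, a minimal sketch (assuming every value under .data is a plain string, and using the hypothetical file above) could emit KEY=value pairs and export them directly:
while IFS='=' read -r key val; do
    export "$key=$val"          # export each pair into the current shell
done < <(jq -r '.data | to_entries[] | "\(.key|ascii_upcase)=\(.value)"' /tmp/data.json)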

Split JSON into multiple files

I have json file exported from mongodb which looks like:
{"_id":"99919","city":"THORNE BAY"}
{"_id":"99921","city":"CRAIG"}
{"_id":"99922","city":"HYDABURG"}
{"_id":"99923","city":"HYDER"}
There are about 30000 lines, and I want to split each line into its own .json file. (I'm trying to transfer my data onto a Couchbase cluster.)
I tried doing this:
cat cities.json | jq -c -M '.' | \
while read line; do echo $line > .chunks/cities_$(date +%s%N).json; done
but I found that it seems to drop loads of lines: running this command only gave me 50-odd files when I was expecting 30000-odd!
Is there a logical way to make this not drop any data, using whatever tool would suit?
Assuming you don't care about the exact filenames, if you want to split input into multiple files, just use split.
jq -c . < cities.json | split -l 1 --additional-suffix=.json - .chunks/cities_
In general, to split any text file into separate per-line files using any awk on any UNIX system, it is simply:
awk '{close(f); f=".chunks/cities_"NR".json"; print > f}' cities.json
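This writes .chunks/cities_1.json, .chunks/cities_2.json, and so on, one record per file; note that awk won't create the .chunks directory for you, so it has to exist first.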

Linux command line to zip files based on a MySQL resultset

I have a table where some filenames are stored.
I would like to find all the files having those names under a specific folder and zip them all.
On disk the structure is similar to this:
/folder/sub1/file1
/folder/sub1/file2
/folder/sub2/file1 <- same name as under sub1
/folder/sub2/file2
So I am looking for something similar to:
mysql -e "select file from table" | find /folder -type f -name <the value of file from mysql result set> | zip <all files found by all find commands>
Thanks.
A couple of additions to your command:
Firstly, you want to use mysql in batch mode, so you do this:
mysql -Be "select file from table"
It gives you a single-column table with no borders, so you get rid of the header row by piping the output to tail, starting at the second line:
tail -n +2
Then you pipe that to xargs, but before you do, hack it a bit with concat (you'll see why in a sec):
mysql -Be "select concat(' -o -name ', file) from table"
NOW you pipe it to xargs:
xargs find /folder -false
This starts with a test that is always false (i.e. a no-op on its own), and xargs appends a whole pile of clauses like -o -name somename.file; each -o performs a boolean OR (with the initial false, and then with all the other file names), so find ultimately returns the list of files whose names match any entry.
...which you finally pipe to zip, with another xargs:
xargs zip files.zip
Again, this puts the file names as arguments to zip.
Here's the total line:
mysql -Be "select concat(' -o -name ', file) from table" | tail -n +2 | xargs find /folder -false | xargs zip files.zip
Bear in mind that this assumes you have no spaces in your filenames. If you do, that'll add a bit of complexity: You can work around that by using -print0 and -0 in find and xargs respectively, although zip will have a harder time with that so you'd need to add another intermediate stage (or use zip -r).
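A minimal sketch of that NUL-safe variant (running one find per filename instead of building a single -o -name expression; the mysql -N flag suppresses the header row so tail isn't needed; filenames containing newlines would still break this):
mysql -NBe "select file from table" |
while IFS= read -r name; do
    find /folder -type f -name "$name" -print0    # one find per filename, NUL-delimited output
done |
xargs -0 zip files.zip
If xargs splits the list across several zip invocations, each one simply adds to the same archive, so the result is still a single files.zip.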