I need to transform a CSV of IDs into a CSV of names.
I have a FOLDER table:
ID NAME
1  A
2  AB
3  B
4  BC
5  BCD
and a FILE table, where PATH is a comma-separated list of FOLDER IDs:
ID NAME PATH
1  fX   1
2  fZ   1,2
3  fY   3,4
4  fW   3,4,5
I get info about FILEs and their sizes from the FILEDATA table:
select FILE.NAME, FILE.PATH, FILEDATA.SIZE
from FILEDATA inner join FILE on FILEDATA.fileid = FILE.id
WHERE FILEDATA.PropName = 'Size'
Currently I get:
fX 1 23805
fZ 1,2 27205
fY 3,4 23608
fW 3,4,5 21501
I need to replace the IDs in PATH with the FOLDER names:
fX A 23805
fZ A/AB 27205
fY B/BC 23608
fW B/BC/BCD 21501
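One way to do the replacement is a small post-processing step over the query results. A minimal sketch in Python, with the FOLDER lookup hardcoded for illustration (in practice it would come from a SELECT on the FOLDER table):
folders = {1: 'A', 2: 'AB', 3: 'B', 4: 'BC', 5: 'BCD'}  # id -> name, per the FOLDER table
rows = [
    ('fX', '1', 23805),
    ('fZ', '1,2', 27205),
    ('fY', '3,4', 23608),
    ('fW', '3,4,5', 21501),
]
for name, path, size in rows:
    # translate each comma-separated folder id in PATH into its name
    named = '/'.join(folders[int(i)] for i in path.split(','))
    print(name, named, size)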
I have a TSV file with a varying number of columns per row:
1 123 123 a b c
1 123 b c
1 345 345 a b c
I would like to extract only the rows with 6 columns:
1 123 123 a b c
1 345 345 a b c
How can I do that in bash (awk, sed or something else)?
Using Awk
$ awk -F'\t' 'NF==6' file
1 123 123 a b c
1 345 345 a b c
FYI, most of the existing solutions have one potential pitfall:
$ printf '1\t2\t3\t4\t5\t\n' | awk -F'\t' '{ print "NF ==", NF }'
NF == 6
If the input line happens to have a trailing tab (\t), awk still reports an NF count of 6. Whether such a line has 5 columns or 6 in the logical sense is open to interpretation.
Using GNU sed
Let file.txt content be:
1 123 123 a b c
1 123 b c
1 345 345 a b c
1 777 777 a b c d
then
sed -n '/^[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*\t[^\t]*$/p' file.txt
gives the output:
1 123 123 a b c
1 345 345 a b c
Explanation: -n turns off default printing; the sole action is to print (p) lines matching the pattern, which is anchored at the beginning (^) and end ($) and consists of 6 columns of non-TAB characters separated by single TABs. This code uses only very basic sed features, but as you might observe it is longer than the AWK solution and not as easy to adjust for a different N.
(tested in GNU sed 4.2.2)
This might work for you (GNU sed):
sed -nE 's/\S+/&/6p' file
This will print lines with 6 or more fields.
sed -nE 's/\S+/&/6;T;s//&/7;t;p' file
This will print lines with only 6 fields.
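For completeness, the "something else" route: a minimal Python sketch (the file name 'file' follows the awk example) that also sidesteps the trailing-tab pitfall noted above:
with open('file') as f:
    for line in f:
        cols = line.rstrip('\n').split('\t')
        # a trailing tab produces an empty last field; treat such lines as not having 6 columns
        if len(cols) == 6 and cols[-1] != '':
            print(line, end='')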
I have a list of rows to skip (say [1,5,10]) and when I pass it to pandas read_csv, it ignores those rows. But I also need to save the skipped rows to a different text file.
I went through the pandas read_csv documentation and a few other articles, but I have no idea how to save them to a text file.
Example :
Input file :
a,b,c
# Some Junk to Skip 1
4,5,6
# Some junk to skip 2
9,20,9
2,3,4
5,6,7
Code :
skiprows = [1, 3]
df = pandas.read_csv(file, skiprows=skiprows)
Now I want output.txt to contain:
# Some junk to skip 1
# Some junk to skip 2
Thanks in advance!
def write_skiprows(infile, skiprows, outfile='skiprows.csv'):
    # copy only the skipped line numbers from infile to outfile
    maxrow = max(skiprows)
    with open(infile, 'r') as f, open(outfile, 'w') as o:
        for i, line in enumerate(f):
            if i in skiprows:
                o.write(line)
            if i == maxrow:
                return  # stop once the last skipped row has been written
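For the question's sample (file names are assumptions):
write_skiprows('input.csv', [1, 3], 'output.txt')  # output.txt ends up with the two junk lines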
Try this:
import pandas as pd

df = pd.read_csv('input.csv')
skiprows = [1, 3, 6]
df, df_skiprow = df.drop(skiprows), df.iloc[skiprows]
df_skiprow.to_csv('skiprows.csv', index=False)
Input:
a b
0 1 c1
1 2 c2
2 3 c3
3 4 c4
4 5 c5
5 6 c6
6 7 c7
7 8 c8
8 9 c9
9 10 c10
Output:
df
a b
0 1 c1
2 3 c3
4 5 c5
5 6 c6
7 8 c8
8 9 c9
9 10 c10
df_skiprow
a b
1 2 c2
3 4 c4
6 7 c7
Explanation:
Read the whole file.
Split it into df (kept rows) and df_skiprow (skipped rows).
Write the skipped rows to a separate CSV file.
Note that the skip indices here are DataFrame row labels rather than file line numbers, so this approach only applies when every row, including the ones to skip, parses as regular CSV.
I currently have a CSV like this:
A B C
1 10 {"a":"one","b":"two","c":"three"}
1 10 {"a":"four","b":"five","c":"six"}
1 10 {"a":"seven","b":"eight","c":"nine"}
1 10 {"a":"ten","b":"eleven","c":"twelve"}
2 10 {"a":"thirteen","b":"fourteen","c":"fifteen"}
2 10 {"a":"sixteen","b":"seventeen","c":"eighteen"}
2 10 {"a":"nineteen","b":"twenty","c":"twenty-one"}
3 10 {"a":"twenty-two","b":"twenty-three","c":"twenty-four"}
3 10 {"a":"twenty-five","b":"twenty-six","c":"twenty-seven"}
3 10 {"a":"twenty-eight","b":"twenty-nine","c":"thirty"}
3 10 {"a":"thirty-one","b":"thirty-two","c":"thirty-three"}
I want to group by column A, ignore column B, and take only the "b" field in C, and get an output like:
A C
1 ['two','five','eight','eleven']
2 ['fourteen','seventeen','twenty']
3 ['twenty-three','twenty-six','twenty-nine','thirty-two']
Can I do this? I have pandas if that will be useful! Also I would like the output file to be tab delimited.
Try this:
import pandas as pd
import json
# read file that looks exactly as given above
df = pd.read_csv("file.csv", delim_whitespace=True)
# drop the 'B' column
del df['B']
# 'C' starts life as a JSON string. parse it and keep just the 'b' value
df['C'] = df['C'].map(lambda x: json.loads(x)['b'])
# 'C' now holds just the 'b' values. group these together:
df = df.groupby('A').C.apply(list)
print(df)
This returns:
A
1 [two, five, eight, eleven]
2 [fourteen, seventeen, twenty]
3 [twenty-three, twenty-six, twenty-nine, thirty...
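To also get the tab-delimited output file the question asks for, the grouped result can be written out with a tab separator (the output file name is an assumption):
df.to_csv('output.tsv', sep='\t', header=True)  # writes A<TAB>C rows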
IIUC (this assumes C has already been parsed into dicts, e.g. with json.loads as above):
df.groupby('A').C.apply(lambda x: [y['b'] for y in x])
A
1 [two, five, eight, eleven]
2 [fourteen, seventeen, twenty]
3 [twenty-three, twenty-six, twenty-nine, thirty...
Name: C, dtype: object
Let's say I have the following table in my MySQL database:
id item value
1 Plump A
2 Apple B
3 Banana F
4 Peach K
5 Orange B
6 Cherry U
And I have this table on my computer:
id value
1 B
2 F
3 L
4 A
5 B
6 A
I want to import the table from my computer and replace each value in the value column with the imported value where the ids match, without changing the values in item.
That means I need this in my MySQL database at the end:
id item value
1 Plump B
2 Apple F
3 Banana L
4 Peach A
5 Orange B
6 Cherry A
How can I do that?
I ended up building the statements in Apple Numbers, concatenating the fixed UPDATE text around each id and value pair.
Then I copied and pasted the result into something like Coda and search-replaced the tab characters.
Result is:
UPDATE my_table SET value = "B" WHERE id = 1;
UPDATE my_table SET value = "F" WHERE id = 2;
UPDATE my_table SET value = "L" WHERE id = 3;
UPDATE my_table SET value = "A" WHERE id = 4;
UPDATE my_table SET value = "B" WHERE id = 5;
UPDATE my_table SET value = "A" WHERE id = 6;
Now I just have to execute these SQL statements in my database. Done!
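The same statements could be generated with a short script instead of a spreadsheet. A minimal Python sketch, assuming the local table is saved as values.csv with id and value columns (both the file name and column layout are assumptions):
import csv

with open('values.csv', newline='') as f:
    for id_, value in csv.reader(f):
        # quoting kept as in the statements above; values containing quotes would need escaping
        print(f'UPDATE my_table SET value = "{value}" WHERE id = {id_};')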
I need customized JSON output. I have two files: a text file and a schema file.
abc.txt:
100002030,Tom,peter,eng,block 3, lane 5,california,10021
100003031,Tom,john,doc,block 2, lane 2,california,10021
100004032,Tom,jim,eng,block 1, lane 1,california,10021
100005033,Tom,trek,doc,block 2, lane 2,california,10021
100006034,Tom,peter,eng,block 6, lane 6,california,10021
abc_schema.txt (field name and position):
rollno 1
firstname 2
lastname 3
qualification 4
address1 5
address2 6
city 7
Zipcode 8
Rules:
Take only the first 6 characters of rollno.
Club address1 | address2 | city together.
Name the combined field Address.
Expected output:
{"rollno":"100002","firstname":"Tom","lastname:"peter","qualification":"eng","Address":"block 3 lane 5 california","zipcode":"10021"}
{"rollno":"100002","firstname":"Tom","lastname:"john","qualification":"doc","Address":"block 2 lane 2 california","zipcode":"10021"}
{"rollno":"100004","firstname":"Tom","lastname:"jim","qualification":"eng","Address":"block 1 lane 1 california","zipcode":"10021"}
{"rollno":"100005","firstname":"Tom","lastname:"trek","qualification":"doc","Address":"block 2 lane 2 california","zipcode":"10021"}
{"rollno":"100006","firstname":"Tom","lastname:"peter","qualification":"eng","Address":"block 6 lane 6 california","zipcode":"10021"}
I do not wish to hardcode the fields but to read them from the schema file; the idea is to have reusable code, something like looping over the schema file and the text file.
A = LOAD 'abc.txt' USING PigStorage(',') AS (rollno, Fname, Lname, qua, add1, add2, city, Zipcode);
B = FOREACH A GENERATE rollno, Fname, Lname, qua, CONCAT(add1, add2, city), Zipcode;
STORE B INTO 'first_table.json' USING JsonStorage();
Hope this helps.
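Since the Pig snippet hardcodes the field list, here is a minimal Python sketch that instead reads the field names from abc_schema.txt (file names as in the question; key order and the Zipcode casing follow the schema file rather than the sample output):
import json

with open('abc_schema.txt') as f:
    # each schema line looks like "rollno 1"; take the field name
    fields = [line.split()[0] for line in f if line.strip()]

with open('abc.txt') as f, open('first_table.json', 'w') as out:
    for line in f:
        rec = dict(zip(fields, line.rstrip('\n').split(',')))
        rec['rollno'] = rec['rollno'][:6]  # first 6 characters of rollno
        # club address1 | address2 | city into one Address field
        rec['Address'] = ' '.join(rec.pop(k).strip() for k in ('address1', 'address2', 'city'))
        out.write(json.dumps(rec) + '\n')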