Related
I am having some data coming from csv which has \n character in it and I expect neo4j to add a new line when assigning that string to some attribute in node. Apparently its not working. I can see \n character as it is added in the string.
How to make it work? Thanks in Advance.
Following is one such string example from CSV:
Combo 4 4 4 5 \n\nSpare Fiber Inventory. \nMultimode Individual fibers from 9927/9928 to FDB.\nNo available spares from either BTS to FDB - New conduits would be required\n\nFrom FDB to tower top. 9 of 9 Spares available on 2.5 riser cables.
My load command:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM 'file:///abc.csv' AS line
WITH line WHERE line.parent <> "" AND line.type = 'LSD' AND line.parent_type = 'XYZ'
This is a hack that I made to replace the occurrences of \n with a newline. The character \ is an escape character so it will replace \n with a new line in line 4. Do not remove line 5 and combine with line 4.
LOAD CSV WITH HEADERS
FROM 'file:///abc.csv' AS line
WITH line WHERE line.parent <> ""
WITH replace(line.parent,'\\n',"
") as parent
MERGE (p:Parent {parent: parent})
RESULT:
{
"identity": 16,
"labels": [
"Parent"
],
"properties": {
"parent": "Combo 4 4 4 5
Spare Fiber Inventory.
Multimode Individual fibers from 9927/9928 to FDB.
No available spares from either BTS to FDB - New conduits would be required
From FDB to tower top. 9 of 9 Spares available on 2.5 riser cables."
}
}
I am working on generating an HTML page using a CGI script in Perl.
I need filter some sequences in order to check whether they contain a specific pattern; if they contain it I need to print those sequences on my page with 50 bases per line, and highlight the pattern in the sequences. My sequences are in an hash called %hash; the keys are the names, the values are the actual sequences.
my %hash2;
foreach my $key (keys %hash) {
if ($hash{$key} =~ s!(aaagg)!<b>$1</b>!) {
$hash2{$key} = $hash{$key}
}
}
foreach my $key (keys %hash2) {
print "<p> <b> $key </b> </p>";
print "<p>$_</p>\n" for unpack '(A50)*', $hash2{$key};
}
This method "does" the job however if I highlight the pattern "aaagg" using this method I am messing up the unpacking of the line (for unpack '(A50)*'); because now the sequences contains the extra characters of the bold tags which are included in the unpacking count. This beside making the lines of different length it is also a big problem if the tag falls between 2 lines due to unpacking 50 characters, it basically remains open and everything after that is bold.
The script below uses a single randomly generated DNA sequence of length 243 (generated using http://www.bioinformatics.org/sms2/random_dna.html) and a variable length pattern.
It works by first recording the positions which need to be highlighted instead of changing the sequence string. The highlighting is inserted after the sequence is split into chunks of 50 bases.
The highlighting is done in reverse order to minimize bookkeeping busy work.
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
use YAML::XS;
my $PRETTY_WIDTH = 50;
# I am using bold-italic so the highlighting
# is visible on Stackoverflow, but in real
# life, this would be something like:
# my #PRETTY_MARKUP = ('<span class="highlighted-match">', '</span>');
my #PRETTY_MARKUP = ('<b><i>', '</i></b>');
use constant { BAŞ => 0, SON => 1, ROW => 0, COL => 1 };
my $sequence = q{ccggtgagacatccagttagttcactgagccgacttgcatcagtcatgcttttccccgtaatgagggccccatattcaggccgtcgtccggaattgtcttggatccggaatgcagcttttctcaccgcttgatgaacattcactgaatatctgacgccgcgaaaacagggtcactagcctgtttccggtcgcccgagaccggcgagtttgtggtatcgcgagcgcccccgggcggtagggtct};
my $wanted = 'c..?gg';
my #pos;
while ($sequence =~ /($wanted)/g) {
push #pos, [ pos($sequence) - length($1), pos($sequence) ];
}
print Dump \#pos;
my #output = unpack "(A$PRETTY_WIDTH)*", $sequence;
print Dump \#output;
while (my $pos = pop #pos) {
my #rc = map pos_to_rc($_, $PRETTY_WIDTH), #$pos;
substr($output[ $rc[$_][ROW] ], $rc[$_][COL], 0, $PRETTY_MARKUP[$_]) for SON, BAŞ;
}
print Dump \#output;
sub pos_to_rc {
my $r = int( $_[0] / $_[1] );
my $c = $_[0] - $r * $_[1];
[ $r, $c ];
}
Output:
C:\...\Temp> perl s.pl
---
- - 0
- 4
- - 76
- 80
- - 87
- 91
- - 97
- 102
- - 104
- 108
- - 165
- 170
- - 184
- 188
- - 198
- 202
- - 226
- 231
---
- ccggtgagacatccagttagttcactgagccgacttgcatcagtcatgct
- tttccccgtaatgagggccccatattcaggccgtcgtccggaattgtctt
- ggatccggaatgcagcttttctcaccgcttgatgaacattcactgaatat
- ctgacgccgcgaaaacagggtcactagcctgtttccggtcgcccgagacc
- ggcgagtttgtggtatcgcgagcgcccccgggcggtagggtct
---
- ccggtgagacatccagttagttcactgagccgacttgcatcagtcatgct
- tttccccgtaatgagggccccatattcaggccgtcgtccggaattgtctt
- ggatccggaatgcagcttttctcaccgcttgatgaacattcactgaatat
- ctgacgccgcgaaaacagggtcactagcctgtttccggtcgcccgagacc
- ggcgagtttgtggtatcgcgagcgcccccgggcggtagggtct
Especially since this turns out to have been a homework assignment, it is now up to you to take this and apply it to all sequences in your hash table.
I'm having trouble using grep to through some html code.
I'm trying to find similar strings to this
<td>product description here</td><td> $<font color='red'>0.25</font>
i'm trying to generalize formula to count each line that is under $0.25 the parts that will vary are the:
href='/go/12229' the number after /go/ will change but always be a number 5 digits long
the product description can be alphanumeric with spaces and special characters
and the price can be anything from 0.01 to 0.25
I've tried making formulas like the one below but it either does not work or returns nothing.
grep -c "href='/go/'[*] target="_blank" rel="nofollow">*</a></td><td> $<font color='red'>[0].[0-2][0-9]</font>"
I think it has to do with me not escaping special characters correctly, but i'm not sure.
Any help is appreciated.
Okay - this requires that each line be formated as in your example, but this should give you the link, description and prices where each line is between 0.01 and 0.25. The the contents of this code an put them in a file like "priceawk" and make it executable:
grep 'go\/[0-9]\{5\}' | awk -F"<" '
{
split( $7, price_arr, ">" )
if( price_arr[ 2 ] > 0.00 && price_arr[ 2 ] < 0.26 )
{
split( $3, link_arr, "'\''" )
split( link_arr[ 3 ], desc_arr, ">" )
printf( "%s %s %s\n", link_arr[ 2 ], desc_arr[ 2 ], price_arr[ 2 ] )
}
} '
Then use it like:
cat input | priceawk
With a test input file I made from your line, I get the following kinds of output:
/go/12229 product description here 0.25
/go/13455 find this line2 0.01
/go/12334 find this line3 0.23
/go/34455 find this line4 0.16
The printf() can be improved to give your output in a different form, with a more useful delimiter than the current space.
I have a large .csv file to to process and my elements are arranged randomly like this:
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
where fields LOCAL, REMOTE, MLOCAL or MREMOTE are displayed like:
when they are displayed as pairs (LOCAL/REMOTE) if 3rd field is MLOCAL, and 4th field is MREMOTE, then 5th and 7th field represent the value and date of MLOCAL, and 6th and 8th represent the value and date of MREMOTE
when they are displayed as single (only LOCAL or only REMOTE) then the 4th and 5th fields represent the value and date of field 3.
Now, I have split these rows using:
nawk 'BEGIN{
while (getline < "'"$filedata"'")
split($0,ft,",");
name=ft[1];
ID=ft[2]
?=ft[3]
?=ft[4]
....................
but because I can't find a pattern for the 3rd and 4th field I'm pretty stuck to continue to assign var names for each of the array elements in order to use them for further processing.
Now, I tried to use "case" statement but isn't working for awk or nawk (only in gawk is working as expected). I also tried this:
if ( ft[3] == "MLOCAL" && ft[4]!= "MREMOTE" )
{
MLOCAL=ft[3];
MLOCAL_qty=ft[4];
MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE )
{
MLOCAL=ft[3];
MREMOTE=ft[4];
MOCAL_qty=ft[5];
MREMOTE_qty=ft[6];
MOCAL_TIMESTAMP=ft[7];
MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL )
{
MREMOTE=ft[3];
MREMOTE_qty=ft[4];
MREMOTE_TIMESTAMP=ft[5];
..........................................
but it's not working as well.
So, if you have any idea how to handle this, I would be grateful to give me a hint in order to be able to find a pattern in order to cover all the possible situations from above.
EDIT
I don't know how to thank you for all this help. Now, what I have to do is more complex than I wrote above, I'll try to describe as simple as I can otherwise I'll make you guys pretty confused.
My output should be like following:
NAME,UNIQUE_ID,VOLUME_ALOCATED,MLOCAL_VALUE,MLOCAL_TIMESTMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP
(where MLOCAL_limit and LOCAL_limit are a subtract result between VOLUME_ALOCATED and MLOCAL_VALUE or LOCAL_VALUE)
So, in my output file, fields position should be arranged like:
4th field =MLOCAL_VALUE,5th field =MLOCAL_TIMESTMP,7th field=LOCAL_VALUE,
8th field=LOCAL_TIMESTAMP,10th field=MREMOTE_VALUE,11th field=MREMOTE_TIMESTAMP,12th field=REMOTE_VALUE,13th field=REMOTE_TIMESTAMP
Now, an example would be this:
for the following input: name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012
I should process this line and the output should be this:
name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT, ,,,56,18/10/2012,,
7th ,8th, 9th,12th, and 13th fields are empty because there is no info related to: LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,REMOTE_VALUE, and REMOTE_TIMESTAMP
OR
name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,9/12/2012
4th,5th,6th,7th,8th,9th,10thand ,11th, fields should be empty values because there is no info about: MLOCAL_VALUE,MLOCAL_TIMESTAMP,MLOCAL_LIMIT,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_LIMIT,MREMOTE_VALUE,MREMOTE_TIMESTAMP
VOLUME_ALLOCATED is retrieved from other csv file (called "info.csv") based on the ID field which is processed earlier in the script like:
info.csv
VOLUME_ALLOCATED,ID,CLIENT
5242881,64,subscriber
567743,24,visitor
data.csv
NAME,64,MLOCAL,341993,23/10/2012
NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012
Now, my code is this:
#! /usr/bin/bash
input="info.csv"
filedata="data.csv"
outfile="out"
nawk 'BEGIN{
while (getline < "'"$input"'")
{
split($0,ft,",");
volume=ft[1];
id=ft[2];
client=ft[3];
key=id;
volumeArr[key]=volume;
clientArr[key]=client;
}
close("'"$input"'");
while (getline < "'"$filedata"'")
{
gsub(/\$/,","); # substitute the $ separator with comma
split($0,ft,",");
volume=volumeArr[id]; # Get the volume from the volumeArr, using "id" as key
segment=clientArr[id]; # Get the client mode from the clientArr, using "id" as key
NAME=ft[1];
id=ft[2];
here I'm stuck, I can't find the right way to set the rest of the
fields since I don't know how to handle the 3rd and 4th fields.
? =ft[3];
? =ft[4];
Sorry, if I make you pretty confused but this is my current situation right now.
Thanks
You didn't provide the expected output from your sample input but here's a start to show how to get the values for the 2 different formats of input line:
$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
delete value # or use split("",value) if your awk cant delete arrays
if ($4 ~ /LOCAL|REMOTE/) {
value[$3] = $5
date[$3] = $7
value[$4] = $6
date[$4] = $8
}
else {
value[$3] = $4
date[$3] = $5
}
print
for (type in value) {
printf "%15s%15s%15s\n", type, value[type], date[type]
}
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
MREMOTE 56 18/10/2012
MLOCAL 33222 22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
MREMOTE 33222 22/10/2012
MLOCAL 56 18/10/2012
xxxxxx,xx,MLOCAL,*341993,22/10/2012*
MLOCAL *341993 22/10/2012*
xxxxxx,xx,MREMOTE,9356828,08/10/2012
MREMOTE 9356828 08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
REMOTE 15253 22/10/2012
LOCAL 19316 22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
REMOTE 1865871 22/10/2012
LOCAL 383666 22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
REMOTE 1180306134 19/10/2012
and if you post the expected output we could help you more.
Given the output of git ... --stat:
3 files changed, 72 insertions(+), 21 deletions(-)
3 files changed, 27 insertions(+), 4 deletions(-)
4 files changed, 164 insertions(+), 0 deletions(-)
9 files changed, 395 insertions(+), 0 deletions(-)
1 files changed, 3 insertions(+), 2 deletions(-)
1 files changed, 1 insertions(+), 1 deletions(-)
2 files changed, 57 insertions(+), 0 deletions(-)
10 files changed, 189 insertions(+), 230 deletions(-)
3 files changed, 111 insertions(+), 0 deletions(-)
8 files changed, 61 insertions(+), 80 deletions(-)
I wanted to produce the sum of the numeric columns but preserve the formatting of the line. In the interest of generality, I produced this awk script that automatically sums any numeric columns and produces a summary line:
{
for (i = 1; i <= NF; ++i) {
if ($i + 0 != 0) {
numeric[i] = 1;
total[i] += $i;
}
}
}
END {
# re-use non-numeric columns of last line
for (i = 1; i <= NF; ++i) {
if (numeric[i])
$i = total[i]
}
print
}
Yielding:
44 files changed, 1080 insertions(+), 338 deletions(-)
Awk has several features that simplify the problem, like automatic string->number conversion, all arrays as associative arrays, and the ability to overwrite auto-split positional parameters and then print the equivalent lines.
Is there a better language for this hack?
Perl - 47 char
Inspired by ChristopheD's awk solution. Used with the -an command-line switch. 43 chars + 4 chars for the command-line switch:
$i-=#a=map{($b[$i++]+=$_)||$_}#F}{print"#a"
I can get it to 45 (41 + -ap switch) with a little bit of cheating:
$i=0;$_="Ctrl-M#{[map{($b[$i++]+=$_)||$_}#F]}"
Older, hash-based 66 char solution:
#a=(),s#(\d+)(\D+)#$b{$a[#a]=$2}+=$1#gefor<>;print map$b{$_}.$_,#a
Ruby — 87
puts ' '+[*$<].map(&:split).inject{|i,j|[0,3,5].map{|k|i[k]=i[k].to_i+j[k].to_i};i}*' '
Python - 101 chars
import sys
print" ".join(`sum(map(int,x))`if"A">x[0]else x[0]for x in zip(*map(str.split,sys.stdin)))'
Using reduce is longer at 126 chars
import sys
print" ".join(reduce(lambda X,Y:[str(int(x)+int(y))if"A">x[0]else x for x,y in zip(X,Y)],map(str.split,sys.stdin)))
AWK - 63 characters
(in a bash script, $1 is the filename provided as command line argument):
awk -F' ' '{x+=$1;y+=$4;z+=$6}END{print x,$2,$3,y,$5,z,$7}' $1
One could of course also pipe the input in (would save another 3 characters when allowed).
This problem is not challenging or difficult... it is "cute" though.
Here is solution in Python:
import sys
r = []
for s in sys.stdin:
r = map(lambda x,y:(x or 0)+int(y) if y.isdigit() else y, r, s.split())
print ' '.join(map(str, r))
What does it do... it keeps tally in r while proceeding line by line. Splits the line, then for each element of the list, if it is a number, adds it to the tally or keeps it as string. At the end they all get re-mapped to string and merged with spaces in between to be printed.
Alternative, more "algebraic" implementation, if we did not care about reading all input at once:
import sys
def totalize(l):
try: r = str(sum(map(int,l)))
except: r = l[-1]
return r
print ' '.join(map(totalize, zip(*map(str.split, sys.stdin))))
What does this one do? totalize() takes a list of strings and tries to calculate sum of the numbers; if that fails, it simply returns the last one. zip() is fed with a matrix that is list of rows, each of them being list of column items in the row - zip transposes the matrix so it turns into list of column items and then totalize is invoked on each column and the results are joined as before.
At the expense of making your code slightly longer, I moved the main parsing into the BEGIN clause so the main clause is only processing numeric fields. For a slightly larger input file, I was able to measure a significant improvement in speed.
BEGIN {
getline
for (i = 1; i <= NF; ++i) {
# need to test for 0, too, in this version
if ($i == 0 || $i + 0 != 0) {
numeric[i] = 1;
total[i] = $i;
}
}
}
{
for (i in numeric) total[i] += $i
}
END {
# re-use non-numeric columns of last line
for (i = 1; i <= NF; ++i) {
if (numeric[i])
$i = total[i]
}
print
}
I made a test file using your data and doing paste file file file ... and cat file file file ... so that the result had 147 fields and 1960 records. My version took about 1/4 as long as yours. On the original data, the difference was not measurable.
JavaScript (Rhino) - 183 154 139 bytes
Golfed:
x=[n=0,0,0];s=[];readFile('/dev/stdin').replace(/(\d+)(\D+)/g,function(a,b,c){x[n]+=+b;s[n++]=c;n%=3});print(x[0]+s[0]+x[1]+s[1]+x[2]+s[2])
Readable-ish:
x=[n=0,0,0];
s=[];
readFile('/dev/stdin').replace(/(\d+)(\D+)/g,function(a,b,c){
x[n]+=+b;
s[n++]=c;
n%=3
});
print(x[0]+s[0]+x[1]+s[1]+x[2]+s[2]);
PHP 152 130 Chars
Input:
$i = "
3 files changed, 72 insertions(+), 21 deletions(-)
3 files changed, 27 insertions(+), 4 deletions(-)
4 files changed, 164 insertions(+), 0 deletions(-)
9 files changed, 395 insertions(+), 0 deletions(-)
1 files changed, 3 insertions(+), 2 deletions(-)
1 files changed, 1 insertions(+), 1 deletions(-)
2 files changed, 57 insertions(+), 0 deletions(-)
10 files changed, 189 insertions(+), 230 deletions(-)
3 files changed, 111 insertions(+), 0 deletions(-)
8 files changed, 61 insertions(+), 80 deletions(-)";
Code:
$a = explode(" ", $i);
foreach($a as $k => $v){
if($k % 7 == 0)
$x += $v;
if(3-$k % 7 == 0)
$y += $v;
if(5-$k % 7 == 0)
$z += $v;
}
echo "$x $a[1] $a[2] $y $a[4] $z $a[6]";
Output:
44 files changed, 1080 insertions(+), 338 deletions(-)
Note: explode() will require that there is a space char before the new line.
Haskell - 151 135 bytes
import Char
c a b|all isDigit(a++b)=show$read a+read b|True=a
main=interact$unwords.foldl1(zipWith c).map words.filter(not.null).lines
... but I'm sure it can be done better/smaller.
Lua, 140 bytes
I know Lua isn't the best golfing language, but compared by the size of the runtimes, it does pretty well I think.
f,i,d,s=0,0,0,io.read"*a"for g,a,j,b,e,c in s:gmatch("(%d+)(.-)(%d+)(.-)(%d+)(.-)")do f,i,d=f+g,i+j,d+e end print(table.concat{f,a,i,b,d,c})
PHP, 176 166 164 159 158 153
for($a=-1;$a<count($l=explode("
",$i));$r=explode(" ",$l[++$a]))for($b=-1;$b<count($r);$c[++$b]=is_numeric($r[$b])?$c[$b]+$r[$b]:$r[$b]);echo join(" ",$c);
This would, however, require the whole input in $i... A variant with $i replaced with $_POST["i"] so it would be sent in a textarea... Has 162 chars:
for($a=-1;$a<count($l=explode("
",$_POST["i"]));$r=explode(" ",$l[$a++]))for($b=0;$b<count($r);$c[$b]=is_numeric($r[$b])?$c[$b]+$r[$b]:$r[$b])$b++;echo join(" ",$c);
This is a version with
NO HARDCODED COLUMNS