How to increment counter in gnu awk? - csv

I want to be able to essentially print the line number along every printed line in the output after scanning and reworking an input csv file. The input csv file is comma separated, here's a sample
Timestamp,Email,Name,Year,Make,Model,Car_ID,Judge_ID,Judge_Name,Racer_Turbo,Racer_Supercharged,Racer_Performance,Racer_Horsepower,Car_Overall,Engine_Modifications,Engine_Performance,Engine_Chrome,Engine_Detailing,Engine_Cleanliness,Body_Frame_Undercarriage,Body_Frame_Suspension,Body_Frame_Chrome,Body_Frame_Detailing,Body_Frame_Cleanliness,Mods_Paint,Mods_Body,Mods_Wrap,Mods_Rims,Mods_Interior,Mods_Other,Mods_ICE,Mods_Aftermarket,Mods_WIP,Mods_Overall
8/5/2018 14:10,honoland13#japanpost.jp,Hernando,2015,Acura,TLX,48,J04,Bob,0,0,2,2,4,4,0,2,4,4,2,4,2,2,2,2,2,0,4,4,4,6,2,0,4
8/5/2018 15:11,nlighterness2q#umn.edu,Noel,2015,Jeep,Wrangler,124,J02,Carl,0,6,4,2,4,6,6,4,4,4,6,6,6,6,6,4,6,6,6,6,6,4,6,4,6
8/5/2018 17:10,eguest47#microsoft.com,Edan,2015,Lexus,Is250,222,J05,Adrian,0,0,0,0,0,0,0,0,6,6,6,0,0,6,6,6,0,0,0,0,0,0,0,0,4
8/5/2018 17:34,hchilley40#fema.gov,Hieronymus,1993,Honda,Civic eG,207,J06,Aaron,0,0,2,2,2,2,2,2,0,4,2,2,2,2,2,2,4,2,2,0,0,0,2,2,0
8/5/2018 14:30,nnowick3d#tuttocitta.it,Nickolas,2016,Ford,Mystang,167,J02,Carl,0,0,2,2,0,2,2,0,0,0,0,2,0,2,2,2,0,0,2,0,0,0,0,0,2
8/5/2018 16:12,mdearl39#amazon.co.uk,Martin,2013,Hyundai,Gen coupe,159,J04,Bob,0,0,2,0,0,0,2,0,0,0,0,2,0,2,2,0,2,0,2,0,0,0,0,0,0
8/5/2018 17:00,alynamg#blogtalkradio.com,Aldridge,2009,Infiniti,G37,20,J06,Aaron,2,0,2,2,0,0,2,0,0,2,2,2,2,2,2,2,2,2,4,2,2,0,2,0,2
What my code currently does is sift through the csv file, and pick out the car_id column, year, make, and model columns. Then it runs through every column from racer_turbo to the last, and for each row it adds up the values in those columns into a total value and prints that along side the other values (id, make, model, etc.). There is also a ranking column that precedes the other 5 when printed. Here is my code below.
BEGIN {
FS = ",";
OFS = "\t";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
}
FNR > 1 {
rank = 0;
total = 0;
if(NR > 1) {
for(i = 8; i < NF; i++) {
total += $i;
}
print ++rank,$7,$4,$5,$6,total;
}
}
END {
}
My current output is as follows
Ranking Car_ID Year Make Model Total
1 48 2015 Acura TLX 58
1 124 2015 Jeep Wrangler 118
1 222 2015 Lexus Is250 36
1 207 1993 Honda Civic eG 40
1 167 2016 Ford Mystang 18
1 159 2013 Hyundai Gen coupe 14
1 20 2009 Infiniti G37 36
1 178 2009 Honda Oddesy 66
My problem is that under the ranking column, it only shows 1 for each row, I need it to be able to increment starting at 1 and going down for as many lines as there are in the document. Right now as evident in my code, I have a rank variable that acts as a tracker that I want to increment up with each row printed, but it only prints 1 for each row. How can I fix that?
The expected output is this
Ranking Car_ID Year Make Model Total
1 48 2015 Acura TLX 58
2 124 2015 Jeep Wrangler 118
3 222 2015 Lexus Is250 36
4 207 1993 Honda Civic eG 40
5 167 2016 Ford Mystang 18
6 159 2013 Hyundai Gen coupe 14
7 20 2009 Infiniti G37 36
8 178 2009 Honda Oddesy 66
Please be advised, my machine is running version 4.0.2 of AWK.

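You do not need your own variable in this case. Your rank is reset to 0 on every record, so ++rank always prints 1. NR already counts the records read so far, so just subtract 1 from it (to account for the header line) to get the desired output: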
BEGIN {
FS = ",";
OFS = "\t";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
}
FNR > 1 {
total = 0;
if(NR > 1) {
for(i = 8; i < NF; i++) {
total += $i;
}
print NR-1,$7,$4,$5,$6,total;
}
}
As a side note: the END block is optional; a GNU AWK program does not have to have one.
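For comparison, here is a minimal Python sketch of the same idea: the rank comes from the loop's own counter (the counterpart of NR-1) instead of a variable that is reset on every record. The function name rank_rows and the shortened sample columns are assumptions made for illustration; the column slice mirrors the awk loop for (i = 8; i < NF; i++).

```python
import csv
import io


def rank_rows(csv_text):
    """Return (rank, car_id, year, make, model, total) tuples, one per data row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line, like the FNR > 1 pattern
    results = []
    # enumerate(..., start=1) keeps counting across rows, like NR-1 in awk
    for rank, fields in enumerate(reader, start=1):
        # awk fields 8..NF-1 are Python indexes 7..len-2;
        # cells that are not plain digits (J04, Bob) contribute 0
        total = sum(int(v) for v in fields[7:-1] if v.isdigit())
        results.append((rank, fields[6], fields[3], fields[4], fields[5], total))
    return results


# Hypothetical, shortened sample: only three score columns instead of 25
SAMPLE = (
    "Timestamp,Email,Name,Year,Make,Model,Car_ID,Judge_ID,Judge_Name,A,B,C\n"
    "t1,e1,Hernando,2015,Acura,TLX,48,J04,Bob,2,4,6\n"
    "t2,e2,Noel,2015,Jeep,Wrangler,124,J02,Carl,1,1,9\n"
)

for row in rank_rows(SAMPLE):
    print(*row, sep="\t")
```

Moving the counter out of the per-row work is the whole fix; whether it comes from NR, enumerate, or a variable initialised once before the loop is a matter of taste.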

Append information in the th tags to td rows

I am an economist struggling with coding and data scraping.
I am scraping data from the main (and only) table on this webpage (https://www.oddsportal.com/basketball/europe/euroleague-2013-2014/results/). I can retrieve all the information in the td HTML tags with Python Selenium by referring to the class of each element. The same goes for the th tags, where the date and the stage of the competition are stored. In my final dataset, I would like the information stored in the th tags to appear as two columns (date and stage of the competition) next to the other columns of the table. Basically, for each match, I would like to have the date and the stage of the competition on the match's own row, and not as the header of each group of matches.
The only solution I came up with is to index all the rows (with both th and td tags) and build a while loop to append the information in the th tags to the td rows whose index is lower than the next index for the th tag. Hope I made myself clear (if not I will try to give a more graphical explanation). However, I am not able to code such a logic construct due to my poor coding abilities. I do not know if I need two loops to iterate through different tags (td and th) and in case how to do that. If you have any easier solution, it is more than welcome!
Thanks in advance for the precious help!
code below:
from selenium import webdriver
import time
import pandas as pd
# Season to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016','2016-2017', '2017-2018', '2018-2019']
# Define empty data
data_keys = ["Season", "Match_Time", "Home_Team", "Away_Team", "Home_Odd", "Away_Odd", "Home_Score",
             "Away_Score", "OT", "N_Bookmakers"]
data = dict()
for key in data_keys:
    data[key] = list()
del data_keys
# Define 'driver' variable and launch browser
#path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
#path office pc
path = "C:/Users/aldi/Downloads/chromedriver.exe"
driver = webdriver.Chrome(path)
# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num)))* '0' + str(page_num)
        url ="https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)
        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break
        try:
            # Teams
            for el in driver.find_elements_by_class_name('name.table-participant'):
                el = el.text.strip().split(" - ")
                data["Home_Team"].append(el[0])
                data["Away_Team"].append(el[1])
                data["Season"].append(season_filt)
            # Scores
            for el in driver.find_elements_by_class_name('center.bold.table-odds.table-score'):
                el = el.text.split(":")
                if el[1][-3:] == " OT":
                    data["OT"].append(True)
                    el[1] = el[1][:-3]
                else:
                    data["OT"].append(False)
                data["Home_Score"].append(el[0])
                data["Away_Score"].append(el[1])
            # Match times
            for el in driver.find_elements_by_class_name("table-time"):
                data["Match_Time"].append(el.text)
            # Odds
            i = 0
            for el in driver.find_elements_by_class_name("odds-nowrp"):
                i += 1
                if i%2 == 0:
                    data["Away_Odd"].append(el.text)
                else:
                    data["Home_Odd"].append(el.text)
            # N_Bookmakers
            for el in driver.find_elements_by_class_name("center.info-value"):
                data["N_Bookmakers"].append(el.text)
            # TODO think of inserting the dates list in the dataframe even if it has a different size (19 rows and not 50)
        except:
            pass
driver.quit()
data = pd.DataFrame(data)
data.to_csv("data_odds.csv", index = False)
I would like to add this information to my dataset as two additional columns:
for el in driver.find_elements_by_class_name("first2.tl")[1:]:
    el = el.text.strip().split(" - ")
    data["date"].append(el[0])
    data["stage"].append(el[1])
There are a few things I would change here.
Don't overwrite variables. You store elements in your el variable, then you overwrite the element with your strings. It may work for you here, but that practice can get you into trouble later on, especially since you are iterating through those elements. It also makes the code hard to debug.
I know Selenium has ways to parse the HTML, but I personally find BeautifulSoup a bit easier and more intuitive if you are simply trying to pull data out of the HTML. So I went with BeautifulSoup's .find_previous() to get the th tags that precede the games, which gives you the date and stage content.
Lastly, I like to construct a list of dictionaries to make up the data frame. Each item in the list is a dictionary of key:value pairs, where the key is the column name and the value is the data. You do the opposite by creating a dictionary of lists. There is nothing wrong with that, but if the lists don't have the same length, you'll get an error when trying to create the dataframe. With my way, if a value is missing for whatever reason, the dataframe is still created, with a null or NaN for the missing data.
There may be more work you need to do with the code to go through the pages, but this gets you the data in the form you need.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import pandas as pd
from bs4 import BeautifulSoup
import re
# Season to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016','2016-2017', '2017-2018', '2018-2019']
# Define 'driver' variable and launch browser
path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(path)
rows = []
# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num)))* '0' + str(page_num)
        url ="https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)
        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break
        try:
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            table = soup.find('table', {'id':'tournamentTable'})
            trs = table.find_all('tr', {'class':re.compile('.*deactivate.*')})
            for each in trs:
                teams = each.find('td', {'class':'name table-participant'}).text.split(' - ')
                scores = each.find('td', {'class':re.compile('.*table-score.*')}).text.split(':')
                ot = False
                for score in scores:
                    if 'OT' in score:
                        # assignment, not comparison (ot == True would leave ot unchanged)
                        ot = True
                scores = [x.replace('\xa0OT','') for x in scores]
                matchTime = each.find('td', {'class':re.compile('.*table-time.*')}).text
                # Odds
                i = 0
                for each_odd in each.find_all('td',{'class':"odds-nowrp"}):
                    i += 1
                    if i%2 == 0:
                        away_odd = each_odd.text
                    else:
                        home_odd = each_odd.text
                n_bookmakers = soup.find('td',{'class':'center info-value'}).text
                # Nearest preceding header row holds "date - stage" for this match
                date_stage = each.find_previous('th', {'class':'first2 tl'}).text.split(' - ')
                date = date_stage[0]
                stage = date_stage[1]
                row = {'Season':season_filt,
                       'Home_Team':teams[0],
                       'Away_Team':teams[1],
                       'Home_Score':scores[0],
                       'Away_Score':scores[1],
                       'OT':ot,
                       'Match_Time':matchTime,
                       'Home_Odd':home_odd,
                       'Away_Odd':away_odd,
                       'N_Bookmakers':n_bookmakers,
                       'Date':date,
                       'Stage':stage}
                rows.append(row)
        except:
            pass
driver.quit()
data = pd.DataFrame(rows)
data.to_csv("data_odds.csv", index = False)
Output:
print(data.head(15).to_string())
    Season     Home_Team         Away_Team         Home_Score  Away_Score  OT     Match_Time  Home_Odd  Away_Odd  N_Bookmakers  Date         Stage
0   2013-2014  Real Madrid       Maccabi Tel Aviv  86          98          False  18:00       -667      +493      7             18 May 2014  Final Four
1   2013-2014  Barcelona         CSKA Moscow       93          78          False  15:00       -135      +112      7             18 May 2014  Final Four
2   2013-2014  Barcelona         Real Madrid       62          100         False  19:00       +134      -161      7             16 May 2014  Final Four
3   2013-2014  CSKA Moscow       Maccabi Tel Aviv  67          68          False  16:00       -278      +224      7             16 May 2014  Final Four
4   2013-2014  Real Madrid       Olympiacos        83          69          False  18:45       -500      +374      7             25 Apr 2014  Play Offs
5   2013-2014  CSKA Moscow       Panathinaikos     74          44          False  16:00       -370      +295      7             25 Apr 2014  Play Offs
6   2013-2014  Olympiacos        Real Madrid       71          62          False  18:45       +127      -152      7             23 Apr 2014  Play Offs
7   2013-2014  Maccabi Tel Aviv  Olimpia Milano    86          66          False  17:45       -217      +179      7             23 Apr 2014  Play Offs
8   2013-2014  Panathinaikos     CSKA Moscow       73          72          False  16:30       -106      -112      7             23 Apr 2014  Play Offs
9   2013-2014  Panathinaikos     CSKA Moscow       65          59          False  18:45       -125      +104      7             21 Apr 2014  Play Offs
10  2013-2014  Maccabi Tel Aviv  Olimpia Milano    75          63          False  18:15       -189      +156      7             21 Apr 2014  Play Offs
11  2013-2014  Olympiacos        Real Madrid       78          76          False  17:00       +104      -125      7             21 Apr 2014  Play Offs
12  2013-2014  Galatasaray       Barcelona         75          78          False  17:00       +264      -333      7             20 Apr 2014  Play Offs
13  2013-2014  Olimpia Milano    Maccabi Tel Aviv  91          77          False  18:45       -286      +227      7             18 Apr 2014  Play Offs
14  2013-2014  CSKA Moscow       Panathinaikos     77          51          False  16:15       -303      +247      7             18 Apr 2014  Play Offs
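The .find_previous() lookup used in the answer amounts to carrying the most recent group header forward onto every match row beneath it. A minimal pure-Python sketch of that logic, independent of Selenium and BeautifulSoup, might look like the following. The function name attach_headers and the simplified (kind, text) row representation are assumptions made for illustration, not part of the scraped page.

```python
def attach_headers(rows):
    """Attach the most recent header's date and stage to each match row.

    `rows` is a list of (kind, text) pairs in page order, where kind is
    "th" for a group header and "td" for a match row. This is a
    hypothetical, simplified stand-in for the scraped table.
    """
    date, stage = None, None
    matches = []
    for kind, text in rows:
        if kind == "th":
            # A header applies to every match row until the next header
            date, stage = text.split(" - ", 1)
        else:
            matches.append({"Match": text, "Date": date, "Stage": stage})
    return matches


sample = [
    ("th", "18 May 2014 - Final Four"),
    ("td", "Real Madrid - Maccabi Tel Aviv"),
    ("td", "Barcelona - CSKA Moscow"),
    ("th", "16 May 2014 - Final Four"),
    ("td", "Barcelona - Real Madrid"),
]
for match in attach_headers(sample):
    print(match)
```

This is also the single-loop version of the index-based idea from the question: no second loop over the th tags is needed, because the current header is simply remembered while walking the rows once.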

How can I sort output from variable?

I want to sort a comma-separated input csv file by a value computed in an extra column. Below is a sample of the input csv file
Timestamp,Email,Name,Year,Make,Model,Car_ID,Judge_ID,Judge_Name,Racer_Turbo,Racer_Supercharged,Racer_Performance,Racer_Horsepower,Car_Overall,Engine_Modifications,Engine_Performance,Engine_Chrome,Engine_Detailing,Engine_Cleanliness,Body_Frame_Undercarriage,Body_Frame_Suspension,Body_Frame_Chrome,Body_Frame_Detailing,Body_Frame_Cleanliness,Mods_Paint,Mods_Body,Mods_Wrap,Mods_Rims,Mods_Interior,Mods_Other,Mods_ICE,Mods_Aftermarket,Mods_WIP,Mods_Overall
8/5/2018 14:10,honoland13#japanpost.jp,Hernando,2015,Acura,TLX,48,J04,Bob,0,0,2,2,4,4,0,2,4,4,2,4,2,2,2,2,2,0,4,4,4,6,2,0,4
8/5/2018 15:11,nlighterness2q#umn.edu,Noel,2015,Jeep,Wrangler,124,J02,Carl,0,6,4,2,4,6,6,4,4,4,6,6,6,6,6,4,6,6,6,6,6,4,6,4,6
8/5/2018 17:10,eguest47#microsoft.com,Edan,2015,Lexus,Is250,222,J05,Adrian,0,0,0,0,0,0,0,0,6,6,6,0,0,6,6,6,0,0,0,0,0,0,0,0,4
8/5/2018 17:34,hchilley40#fema.gov,Hieronymus,1993,Honda,Civic eG,207,J06,Aaron,0,0,2,2,2,2,2,2,0,4,2,2,2,2,2,2,4,2,2,0,0,0,2,2,0
8/5/2018 14:30,nnowick3d#tuttocitta.it,Nickolas,2016,Ford,Mystang,167,J02,Carl,0,0,2,2,0,2,2,0,0,0,0,2,0,2,2,2,0,0,2,0,0,0,0,0,2
8/5/2018 16:12,mdearl39#amazon.co.uk,Martin,2013,Hyundai,Gen coupe,159,J04,Bob,0,0,2,0,0,0,2,0,0,0,0,2,0,2,2,0,2,0,2,0,0,0,0,0,0
8/5/2018 17:00,alynamg#blogtalkradio.com,Aldridge,2009,Infiniti,G37,20,J06,Aaron,2,0,2,2,0,0,2,0,0,2,2,2,2,2,2,2,2,2,4,2,2,0,2,0
What my code currently does is sift through the csv file, and pick out the car_id column, year, make, and model columns. Then it runs through every column from racer_turbo to the last, and for each row it adds up the values in those columns into a total value and prints that along side the other values (id, make, model, etc.). There is also a ranking column that precedes the other 5 when printed. Here is my code below.
BEGIN {
FS = ",";
OFS = "\t";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
}
{
rank;
total = 0;
if(NR > 1) {
for(i = 8; i < NF; i++) {
total += $i;
}
print ++rank,$7, $4, $5, $6, total;
}
rows[$5][total][$0]
}
END {
print "\n";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
ranking;
PROCINFO["sorted_in"] = "@ind_str_asc"
for (m in rows) {
n = asorti(rows[m], t, "@ind_num_desc");
n = (n>3) ? 3 : n
for(i = 1; i <= n; i++) for(s in rows[m][t[i]]) {
$0 = s;
$1 = ++r;
print ++ranking, $7, $4, $5, $6, total;
}
}
}
What I would like to do in the END block is print the output again, but this time rank the top three cars from each make using the total column that was computed in the preceding block of the code. However, when I run my code now, the output looks as follows
Ranking Car_ID Year Make Model Total
1 48 2015 Acura TLX 58
2 124 2015 Jeep Wrangler 118
3 222 2015 Lexus Is250 36
4 207 1993 Honda Civic eG 40
5 167 2016 Ford Mystang 18
6 159 2013 Hyundai Gen coupe 14
7 20 2009 Infiniti G37 36
...
Ranking Car_ID Year Make Model Total
1 113 2012 Acura Tsx sportwagon 10
2 112 2008 Acura TL 10
3 50 2015 Acura TLX 10
4 15 2014 Audi S4 10
5 18 2015 Audi S3 10
6 116 2008 Audi A4 10
7 2 2016 Bmw M2 10
8 172 2014 Bmw 4 10
9 28 1995 Bmw 318xi 10
...
See how in the total column on the second printed section it shows total is 10 for each printed car, instead of being the same values as they were in the first printed section for each respective car, and the highest 3 totals for each make being displayed.
Below is the expected output
Ranking Car_ID Year Make Model Total
1 48 2015 Acura TLX 58
2 124 2015 Jeep Wrangler 118
3 222 2015 Lexus Is250 36
4 207 1993 Honda Civic eG 40
5 167 2016 Ford Mystang 18
6 159 2013 Hyundai Gen coupe 14
7 20 2009 Infiniti G37 36
8 178 2009 Honda Oddesy 66
...
Ranking Car_ID Year Make Model Total
1 112 2008 Acura TL 110
2 50 2015 Acura TLX 102
3 127 2013 Acura Tsx 86
4 15 2014 Audi S4 120
5 18 2015 Audi S3 38
6 116 2008 Audi A4 28
7 2 2016 Bmw M2 24
8 172 2014 Bmw 4 22
9 111 2007 Bmw 328i 10
10 218 2010 Chevy Camaro 64
11 170 2014 Chevy Cruze 50
12 0 2015 Chevy Camaro 0
...
Is this salvagable with my current code? Or would a better approach be to create a separate awk file that will sort through the generated output and produce another file that is sorted by the top 3?
I'm running GNU AWK v4.0.2.
Assuming the Car_ID (hereinafter referred to as id) is unique across the rows, would you please try:
BEGIN {
FS = ","
OFS = "\t"
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
}
{
rank
total = 0
if (NR > 1) {
for (i = 8; i < NF; i++) {
total += $i
}
print ++rank, $7, $4, $5, $6, total
ttl[$5][$7] = total
row[$7] = $0
}
}
END {
print "\n"
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
ranking
id
PROCINFO["sorted_in"] = "@ind_str_asc"
for (m in ttl) {
n = asorti(ttl[m], t, "@val_num_desc")
n = (n>3) ? 3 : n
for (i = 1; i <= n; i++) {
id = t[i]
total = ttl[m][id]
$0 = row[id]
print ++ranking, $7, $4, $5, $6, total
}
}
}
I have slightly modified the data structure, assigning the id as the main key. Then I created a 2-D array ttl, which holds the value total keyed by make and id. In the END loop, we can retrieve the input data using the id.
As a side note, your original data structure uses total as an index. If multiple rows with the same make happen to have the same value of total, one of the index entries will be overwritten.
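The same grouping can be sketched in Python to show why keying by id rather than by total matters: each car stays a separate entry, so equal totals within a make cannot collide. The function name top_per_make and the tuple layout are assumptions made for illustration; the sample values are taken from the outputs shown in the question.

```python
from collections import defaultdict


def top_per_make(cars, limit=3):
    """Rank the `limit` highest totals within each make.

    cars: iterable of (car_id, year, make, model, total) tuples.
    Returns (rank, car_id, year, make, model, total) tuples.
    """
    by_make = defaultdict(list)
    for car in cars:
        by_make[car[2]].append(car)  # group whole rows, so ties are kept
    ranked = []
    for make in sorted(by_make):  # makes in ascending order
        # highest totals first, then keep at most `limit` cars per make
        best = sorted(by_make[make], key=lambda c: c[4], reverse=True)[:limit]
        ranked.extend(best)
    # number the output rows consecutively across all makes
    return [(rank,) + car for rank, car in enumerate(ranked, start=1)]


cars = [
    ("48", "2015", "Acura", "TLX", 58),
    ("112", "2008", "Acura", "TL", 110),
    ("50", "2015", "Acura", "TLX", 102),
    ("127", "2013", "Acura", "Tsx", 86),
    ("15", "2014", "Audi", "S4", 120),
]
for row in top_per_make(cars):
    print(*row, sep="\t")
```

Note that the total travels with each car here, which avoids the bug in the original END block, where the variable total still held the value from the last input record.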

How do I convert my loop output to a .txt file?

My current code is:
count1 = 0
for i in range(30):
    if i%26 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9, i+10]
        count1 += 1
        print([count1])
        print(*b, sep=' ')
    elif (i-10)%26 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9]
        count1 += 1
        print([count1])
        print(*b, sep= ' ')
    elif (i-16)%32 == 0:
        b = [i+1, i+2, i+3, i+4, i+5, i+6, i+7, i+8, i+9, i+10]
        count1 += 1
        print([count1])
        print(*b, sep= ' ')
which produces lines:
[1]
1 2 3 4 5 6 7 8 9 10
[2]
11 12 13 14 15 16 17 18 19
[3]
17 18 19 20 21 22 23 24 25 26
[4]
27 28 29 30 31 32 33 34 35 36
I'd like to output these lines in a simple text file. I'm familiar with the open and write functions, but do not know how to apply them to my specific example.
Thanks!
On GNU/Linux systems, run the program in the console and append > followed by the name of the output file.
Example:
Assuming that you are in the directory which contains the executable:
./[name of the program] > [name of the file]
./helloworld > helloworld.txt
This will save all the text the program prints to the console in a text file.
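If you would rather write the file from within Python, as the question suggests with open and write, one possible sketch is to pass the open file to print through its file= argument. The function name write_blocks and the size variable are assumptions made for illustration; the loop conditions are the ones from the question.

```python
def write_blocks(path):
    """Write the numbered blocks from the question's loop to `path`."""
    count = 0
    with open(path, "w") as out:
        for i in range(30):
            # same three conditions, in the same order, as the original loop
            if i % 26 == 0:
                size = 10
            elif (i - 10) % 26 == 0:
                size = 9
            elif (i - 16) % 32 == 0:
                size = 10
            else:
                continue
            count += 1
            block = range(i + 1, i + size + 1)
            # print() accepts file=, so the original calls barely change
            print([count], file=out)
            print(*block, sep=" ", file=out)
```

Calling write_blocks("output.txt") would then create the text file in the current directory, and the with statement closes it even if an error occurs part-way through.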

Regression by year and companyID to save coefficients

I am trying to run regressions by companyID and year, and save the coefficients for each firm-year model as new variables in a new column right beside the other columns. There is an additional wrinkle: I have panel data for 1990-2010 and want to run each regression using t to t-4 only (i.e., for 2001, use only 1998-2001 years of data, and for 1990 only the data of 1990, and so on). I am new to using foreach loops and I found some prior coding on the web. I have tried to adapt it to my situation, but there are two issues:
the output is staying blank
I have not figured out how to use the rolling four-year data periods.
Here is the code I tried. Any suggestions would be much appreciated.
use paneldata.dta // the dataset I am working in
generate coeff . //empty variable for coefficient
foreach x of local levels {
forval z = 1990/2010
{
capture reg excess_returns excess_market
replace coeff = _b[fyear] & _b[CompanyID] if e(sample) }
}
So below is a short snapshot of what the data looks like;
CompanyID  Re_Rf  Rm_Rf  Year
10         2      2      1990
10         3      2      1991
15         3      2      1991
15         4      2      1992
15         5      2      1993
21         4      2      1990
21         4      2      1991
34         3      1      1990
34         3      1      1991
34         4      1      1992
34         2      1      1993
34         3      1      1994
34         4      1      1995
34         2      1      1996
 
Re_Rf = excess_returns 
Rm_Rf = excess_market 
I want to run the following regression:
reg excess_returns excess_market
There is a good discussion on Statalist, but I think this answer may be helpful for learning about loops and how Stata syntax works.
The code I would use is as follows:
generate coeff = . //empty variable for coefficient
// put the values of CompanyID into a local macro called levels
qui levelsof CompanyID, local(levels)
foreach co of local levels {
forval yr = 1994/2010 {
// run the regression with the condition that year is between yr
// and yr-3 (which is what you write in your example)
// and the CompanyID is the same as in the regression
qui reg Re_Rf Rm_Rf if fyear <= `yr' & fyear >= `yr'-3 & CompanyID== `co'
// now replace coeff with the coefficient on Rm_Rf under the same
// conditions as above, but only for year yr
replace coeff = _b[Rm_Rf] if fyear == `yr' & CompanyID == `co'
}
}
This is a potentially dangerous thing to do if you do not have a balanced panel. If you are worried about this, there may be a way to deal with it using capture or changing the fyear loop to include something like:
levelsof fyear if CompanyID == `co', local(yr_level)
foreach yr of local yr_level { ...
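To make the windowing logic concrete, here is a rough Python sketch of the same rolling, per-company regression. It is not a replacement for the Stata code: it assumes a single regressor, so the slope is just cov(x, y) / var(x), and the function name rolling_betas and the row layout are assumptions made for illustration. Only years that actually exist for a company are visited, which is the unbalanced-panel concern mentioned above.

```python
def rolling_betas(rows, window=4):
    """Slope of excess_return on excess_market over a trailing window.

    rows: list of (company_id, year, excess_return, excess_market).
    Returns {(company_id, year): slope}, using that company's data for
    year-window+1 .. year. The slope is None when it is not identified
    (fewer than two observations or no variation in the regressor).
    """
    betas = {}
    for company, year, _, _ in rows:
        # (x, y) pairs for this company inside the trailing window
        sample = [(x, y) for c, yr, y, x in rows
                  if c == company and year - window + 1 <= yr <= year]
        n = len(sample)  # at least 1: the current row is always included
        mean_x = sum(x for x, _ in sample) / n
        mean_y = sum(y for _, y in sample) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in sample)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
        betas[(company, year)] = sxy / sxx if n >= 2 and sxx > 0 else None
    return betas


# Hypothetical data: one company, five years
data = [
    (1, 2000, 100, 1),
    (1, 2001, 4, 2),
    (1, 2002, 6, 3),
    (1, 2003, 8, 4),
    (1, 2004, 10, 5),
]
for key, beta in sorted(rolling_betas(data).items()):
    print(key, beta)
```

In the snapshot from the question, Rm_Rf does not vary within a company, so a slope would not be identified there; the sketch returns None in that case rather than raising an error.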

How HTML works in awk command in shell scripting?

I have a script called "main.ksh" which returns "output.txt" file and I am sending that file via mail (list contains 50+ records, I just give 3 records for example).
mail output I am getting is: (10 cols)
DATE FEED FILE_NAME JOB_NAME SCHEDULED TIME SIZE COUNT STATUS
Dec 17 INVEST iai guxmow080 TUE-SAT 02:03 0.4248 4031 On_Time
Dec 17 SECURITIES amltxn gdcpl3392 TUE-SAT 02:03 0.0015 9 Delayed
Dec 17 CONNECTED amlbene gdcpl3392 TUE-SAT 02:03 0.0001 1 No_Records
output with perfect coloring: (6 cols only)
DATE FEED FILE_NAME JOB_NAME SCHEDULED TIME SIZE COUNT STATUS
Dec 17 INVEST iai guxmow080 On_Time(green color)
Dec 17 SECURITIES amltxn gdcpl3392 Delayed(red color)
Dec 17 CONNECTED amlbene gdcpl3392 No_Records(yellow color)
I am implementing coloring for the Delayed, On_Time and No_Records values of the status field, and I wrote the script below, which gives me the second output shown above.
awk 'BEGIN {
print "<html>" \
"<body bgcolor=\"#333\" text=\"#f3f3f3\">" \
"<pre>"
}
NR == 1 { print $0 }
NR > 1 {
if ($NF == "Delayed") color="red"
else if ($NF == "On_time") color="green"
else if ($NF == "No_records") color="yellow"
else color="#003abc"
Dummy=$0
sub("[^ ]+$","",Dummy)
print Dummy "<span style=\"color:" color (bold ? ";font-weight:bold" : "")(size ? ";font-size:size" : "") (italic ? ";font-style:italic" : "") "\">" $NF "</span>"
}
END {
print "</pre>" \
"</body>" \
"</html>"
}
' output.txt > output.html
Four columns are skipped automatically.
| date | feed_names | file_names | job_names | scheduled_time| timestamp| size| count| status |
Dec 19 ISS_BENEFICIAL_OWNERS_FEED amlcpbo_iss_20161219.txt gdcpl3392_uxmow080_ori_isz_dat WEEK_DAYS 00:03 9.3734 34758 On_Time
Dec 19 ISS_INVESTORS_FEED amlinvest_iss_20161219.txt gdcpl3392_uxmow080_ori_isz_dat WEEK_DAYS 00:01 0.0283 82 On_Time
Dec 19 ISS_TRANSACTIONS_FEED amltran_iss_1_20161219.txt gdcpl3392_uxmow080_ori_isz_dat WEEK_DAYS 00:12 14.022 36532 DELAYED
Dec 19 ISS_TRANSACTIONS_FEED amltran_iss_5_20161219.txt gdcpl3392_uxmow080_ori_isz_dat WEEK_DAYS 00:23 0.0010 3 DELAYED
Dec 19 IBS_CUSTOMER_FEED ibscust_aml_***_20161219.txt gdcpl3392_uxmow080_ori_sfp_ibc WEEK_DAYS (11 _out_of_11) -NA- ARRIVED
Dec 19 IBS_DDA_NOSTRO_ACCOUNT_FEED ibsacct_aml_***_20161219.txt gdcpl3392_uxmow080_ori_sfp_ibc WEEK_DAYS (44 _out_of_44) -NA- ARRIVED
Dec 19 GP__TRANSACTIONS_FEED amltrans__20161219.txt gdcpl3392_uxmow080_ori_sfp_glo WEEK_DAYS (3 _out_of_30) -NA- ARRIVED
But when I try to print in sequential order using the command below,
awk '{printf("%-5s%s\t%-33s%-35s%-39s%s\t%s%-3s\t%s\t%s\n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10)}' output.txt
I get the output in a sequential format, but 4 columns are skipped. Kindly suggest!
| date | feed_names | file_names | job_names | scheduled_time| timestamp| size| count| status |
Dec 19 ISS_BENEFICIAL_OWNERS_FEED amlcpbo_iss_20161219.txt gdcpl3392_uxmow080_ori_isz_dat On_Time
Dec 19 ISS_INVESTORS_FEED amlinvest_iss_20161219.txt gdcpl3392_uxmow080_ori_isz_dat On_Time
Dec 19 ISS_TRANSACTIONS_FEED amltran_iss_1_20161219.txt gdcpl3392_uxmow080_ori_isz_dat DELAYED
Dec 19 ISS_TRANSACTIONS_FEED amltran_iss_5_20161219.txt gdcpl3392_uxmow080_ori_isz_dat DELAYED
Dec 19 IBS_CUSTOMER_FEED ibscust_aml_***_20161219.txt gdcpl3392_uxmow080_ori_sfp_ibc ARRIVED
Dec 19 IBS_DDA_NOSTRO_ACCOUNT_FEED ibsacct_aml_***_20161219.txt gdcpl3392_uxmow080_ori_sfp_ibc ARRIVED
Dec 19 GP__TRANSACTIONS_FEED amltrans__20161219.txt gdcpl3392_uxmow080_ori_sfp_glo YET_TO_RECEIVE