Creating multiple new columns using one function in Pandas - function

I know that this is probably very simple but I have been trying to work this out for a while, and I need it for quite a few functions.
I have a DataFrame with 2 columns, both with share price data.
I would like to compute 2 new columns in a new DataFrame called 'returns', with each column named the same as in the original (i.e. 'AAPL' and 'GOOG').
I use this procedure to get the original data and create the 'data' dataframe:
names = ['AAPL', 'GOOG']
def get_data(stock, start, end):
    return web.get_data_yahoo(stock, start, end)['Adj Close']
data = pd.DataFrame({n: get_data(n, '1/1/2009', '6/1/2012') for n in names})
I know that the returns could be generated using (from the pandas library):
returns = pd.DataFrame(index=data.index)
returns['*COLUMN A*'] = data['*COLUMN A*'].pct_change()
However I am guessing that I need to use some sort of loop to iterate over either 'names' or the columns but I cannot get anything to work.
Any help would be greatly appreciated. I am sorry if I have been rather vague, but this is my first question and I have searched for 30 minutes through the forum :)

You can call pct_change on the entire DataFrame; it is applied to every column and the column names are preserved:
In [15]: df = DataFrame(np.random.randint(20,size=20).reshape(10,2),
columns=['AAPL','GOOG'],index=date_range('20130101',periods=10))+50
In [16]: df
Out[16]:
AAPL GOOG
2013-01-01 53 54
2013-01-02 66 64
2013-01-03 50 59
2013-01-04 53 57
2013-01-05 67 65
2013-01-06 61 55
2013-01-07 68 52
2013-01-08 64 65
2013-01-09 62 62
2013-01-10 66 50
In [17]: 100*df.pct_change()
Out[17]:
AAPL GOOG
2013-01-01 NaN NaN
2013-01-02 24.528302 18.518519
2013-01-03 -24.242424 -7.812500
2013-01-04 6.000000 -3.389831
2013-01-05 26.415094 14.035088
2013-01-06 -8.955224 -15.384615
2013-01-07 11.475410 -5.454545
2013-01-08 -5.882353 25.000000
2013-01-09 -3.125000 -4.615385
2013-01-10 6.451613 -19.354839
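Applied to the question's setup, the same idea can be sketched as follows. The prices here are made-up placeholders standing in for the downloaded 'Adj Close' data, so only the shape of the result matters:

```python
import pandas as pd

# Hypothetical prices standing in for the downloaded 'Adj Close' series.
data = pd.DataFrame(
    {"AAPL": [50.0, 55.0, 44.0], "GOOG": [100.0, 90.0, 99.0]},
    index=pd.date_range("2013-01-01", periods=3),
)

# pct_change works column by column, so no loop over `names` is needed.
# The result keeps the index and the column names of `data`.
returns = data.pct_change()
print(returns)
```

The first row is NaN because there is no earlier price to compare against.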

5 point average in SSRS

I am trying to put a 5-point average in my chart. I added a trendline, but it looks like this:
Then I created a new series to calculate the average, and it looks like this:
But I would like to show this as a 5-point average. How can I do this?
This answer is based on my experience with Excel, not reporting services, but it is probably the same problem.
Your chart is probably a scatter plot rather than a line chart (note: this is Excel terminology). A scatter plot does not have an intrinsic ordering in the data. A line chart does.
The solution (for a scatter plot) is simply to sort the data by the x-values. The same will probably work for you. If you are pulling the data from a database, then order by can accomplish this. Otherwise, you can sort the data in the application.
Using this post as a starting point you can see that it is possible to calculate a moving average for a chart using the SQL query that pulls the data from the database.
For example, using this table in my database called mySalesTable
myDate sales myDate sales myDate sales
---------- ------ ---------- ------ ---------- ------
01/01/2015 456 16/01/2015 546 31/01/2015 658
02/01/2015 487 17/01/2015 12 01/02/2015 121
03/01/2015 245 18/01/2015 62 02/02/2015 654
04/01/2015 812 19/01/2015 516 03/02/2015 261
05/01/2015 333 20/01/2015 1 04/02/2015 892
06/01/2015 449 21/01/2015 65 05/02/2015 982
07/01/2015 827 22/01/2015 15 06/02/2015 218
08/01/2015 569 23/01/2015 656 07/02/2015 212
09/01/2015 538 24/01/2015 25 08/02/2015 312
10/01/2015 455 25/01/2015 549 09/02/2015 21
11/01/2015 458 26/01/2015 261
12/01/2015 542 27/01/2015 21
13/01/2015 549 28/01/2015 21
14/01/2015 432 29/01/2015 61
15/01/2015 685 30/01/2015 321
You can pull out this data, and create a Moving average based on the last 5 dates by using the following query for your dataset
SELECT mst.myDate, mst.sales, avg(mst_past.sales) AS moving_average
FROM mySalesTable mst
JOIN mySalesTable as mst_past
ON mst_past.myDate
BETWEEN DATEADD(D, -4, mst.myDate) AND mst.myDate
GROUP BY mst.myDate, mst.sales
ORDER BY mst.myDate ASC
This effectively joins, for each row, a sub-table consisting of the previous 4 dates and the current date, and computes the average of sales over those dates, outputting it as the column moving_average.
You can then chart both these fields as normal, to give the following output (with the data table so you can see the actual calculated moving average):
Hopefully this will help you. Please let me know if you require further assistance
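For readers who want to check the numbers outside the database, the same 5-day window can be sketched in pandas. This is only an illustration on the first six rows of mySalesTable, and it assumes the dates are in dd/mm/yyyy form, i.e. consecutive days starting on 1 January 2015:

```python
import pandas as pd

# First six rows of mySalesTable (dates read as dd/mm/yyyy).
sales = pd.Series(
    [456, 487, 245, 812, 333, 449],
    index=pd.date_range("2015-01-01", periods=6, freq="D"),
    name="sales",
)

# A time-based window of "5D" covers the current date and the 4 days
# before it, which mirrors BETWEEN DATEADD(D, -4, myDate) AND myDate.
# Early rows are averaged over however many dates are available.
moving_average = sales.rolling("5D").mean()
print(pd.DataFrame({"sales": sales, "moving_average": moving_average}))
```

As in the SQL version, the first four rows are averages over fewer than five dates.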

Mysql search for exact number in field

In my MySQL DB, I have a column containing values like this:
63 61 57 52 50 47 46 44 43 34 33 27 23 22 21 10 5 3 2 1
Those numbers are separated by tab.
I can't get the right result from a simple query like this:
SELECT * FROM mytable WHERE mycolumn = 63
I'm not sure if "=" is the right approach; I've also tried LIKE, IN and even FIND_IN_SET.
I need some help :)

strange output after appending a column

I cbind a column "class" to a data frame to get a new tdm1 with tdm1 <- cbind(tdm1, class), and that works fine.
the content of class looks like this
1 715
2 715
3 707
4 705
5 704
6 701
7 701
...
After the cbind, I wanted to look at the class column using tdm1[,ncol(tdm1)]. After the correct values for the entire column, I got "35 Levels: 156 174 205 250 295 324 335 340 343 345 348 349 361 370 375 381 382 428 439 451 455 701 704 705 706 ... 72". It looks like a summary of the column values, and I don't know where it came from. This additional information makes my later knn classification behave strangely. How do I get rid of it?
Your object is a factor. Calling ?factor reveals:
factor returns an object of class "factor" which has a set of integer
codes the length of x with a "levels" attribute of mode character and
unique (!anyDuplicated(.))
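The levels attribute being printed to your dismay reflects all the unique values contained within the object you are printing. To get rid of it, convert the factor to numeric. Go through as.character first, because as.numeric applied directly to a factor returns the internal integer codes rather than the original values:
as.numeric(as.character(tdm1[,ncol(tdm1)]))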
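The distinction between a factor's codes and its labels can be illustrated with a rough pandas analogue. This is not R and not part of the original answer; `pd.Categorical` is used here only because it stores data the same way, as integer codes pointing into a list of categories:

```python
import pandas as pd

# A categorical column holding class labels as strings, similar to an R factor.
cls = pd.Categorical(["715", "715", "707", "705"])

# The internal codes are positions in the sorted categories, not the labels.
codes = cls.codes.tolist()

# Converting via strings recovers the original numbers
# (the analogue of as.numeric(as.character(x)) in R).
values = pd.to_numeric(pd.Series(cls).astype(str)).tolist()

print(codes)
print(values)
```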

Grab HTML table using XML

I am trying to read an html table using the package XML, but even though it looks easy, I haven’t managed to do it. I tried everything, but the names of the columns are always fixed by R as V1, V2, V3,…
This is the code:
require(XML)
tbl <- readHTMLTable("http://facedata.ornl.gov/ornl/npp_98-08.html",
header = c("year","ring","CO2", "stem","root","leaf","fine root", "NPP"),
skip.rows=c(1,2),colClasses=c(rep("factor",3),rep("numeric",5)))
Many thanks for your help
The first row of the table is causing trouble. It may be easiest to remove it:
library(XML)
appURL <- "http://facedata.ornl.gov/ornl/npp_98-08.html"
doc <- htmlParse(appURL)
removeNodes(doc["//table/tr[1]"]) # remove the first row with the troublesome header
myTable <- readHTMLTable(doc, which = 1)
> head(myTable)
Year Plot CO2 Stem Coarse Root Leaf Fine Root Total NPP
1 1998 1 elev 1540 127 362 168 2197
2 1998 2 elev 1487 139 418 175 2219
3 1998 3 amb 1085 112 333 231 1762
4 1998 4 amb 1204 113 368 185 1870
5 1998 5 amb 1136 109 382 56 1683
6 1999 1 elev 1218 98 475 295 2086

Querying a table to get values based on no of digits of a parameter?

Considering the following table
I have a large table from which I can query to get the following table
type no of times type occurs
101 450
102 562
103 245
111 25
112 28
113 21
Now suppose I wanted to get a table which shows me the sum of no of times type occurs
for type starting with 1 then starting with 10,11,12,13.......19 then starting with 2, 20,21, 22, 23...29 and so on.
Something like this
1 1331 10 1257
11 74
12 ..
13 ..
.. ..
2 ... 20 ..
21 ..
Hope I am clear
Thanks
You really have two different queries:
SELECT [type]\100 AS TypePart, Count(t.type) AS CountOftype
FROM t
GROUP BY [type]\100;
And:
SELECT [type]\100 AS TypePart, [type] Mod 100 AS TypeEnd,
Count(t.type) AS CountOftype
FROM t
GROUP BY [type]\100, [type] Mod 100;
Where t is the name of the table.
With the first query I am getting something like this:
utypPart CountOftype
1 29
2 42
3 46
4 50
5 26
6 45
7 33
9 1
It gives me how many utyp values start with 1, 2 and so on,
but what I want is the sum of the number of times those types occur for each utyp prefix.
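The follow-up suggests the aggregate should be a sum over the per-type counts rather than a count of rows. A minimal pandas sketch of that grouping, using the six rows from the question and assuming every type is a three-digit integer (the column name `n` is made up for the illustration):

```python
import pandas as pd

# The summary table from the question: type and number of times it occurs.
counts = pd.DataFrame(
    {
        "type": [101, 102, 103, 111, 112, 113],
        "n": [450, 562, 245, 25, 28, 21],
    }
)

# Integer division by 100 keeps the first digit, by 10 the first two digits.
by_first_digit = counts.groupby(counts["type"] // 100)["n"].sum()
by_first_two = counts.groupby(counts["type"] // 10)["n"].sum()

print(by_first_digit)
print(by_first_two)
```

With this data the totals come out as 1331 for types starting with 1, 1257 for those starting with 10 and 74 for those starting with 11, matching the figures in the question's desired output.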