Imbed matplotlib figure into iPython HTML - html
I want to dynamically write and display HTML with a code cell in Jupyter Notebook. The objective is to generate the HTML to display table, div, img tags in some way I choose. I want to capture img data and place it where I want in this auto generated HTML.
So far I've figured out that I can do the following:
from IPython.core.display import HTML
HTML("<h1>Hello</h1>")
and get:
Hello
That's great. However, I want to be able to do this:
HTML("<h1>Hello</h1><hr/><img src='somestring'/>")
and get something similar to a Hello with a horizontal line and an image below it, where the image is the same one as below.
import pandas as pd
import numpy as np
np.random.seed(314)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['x', 'y'])
df.plot.scatter(0, 1)
The result should look like this:
Question
What do I replace 'somestring' with in order to implement this? And more to the point, how do I get it via Python?
I would have imagined there was an attribute on a figure object that would hold a serialized version of the image, but I can't find it.
After some digging around, I found a solution. Credit to Dmitry B. for pointing me in the right direction.
Solution
from IPython.core.display import HTML
import binascii
from StringIO import StringIO
import matplotlib.pyplot as plt
# open IO object
sio = StringIO()
# generate random DataFrame
np.random.seed(314)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['x', 'y'])
# initialize figure and axis
fig, ax = plt.subplots(1, 1)
# plot DataFrame
ax.scatter(df.iloc[:, 0], df.iloc[:, 1]);
# print raw canvas data to IO object
fig.canvas.print_png(sio)
# convert raw binary data to base64
# I use this to embed in an img tag
img_data = binascii.b2a_base64(sio.getvalue())
# keep img tag outer html in its own variable
# (b2a_base64 appends a newline, so strip it before building the tag)
img_html = '<img src="data:image/png;base64,{}">'.format(img_data.strip())
HTML("<h1>Hello</h1><hr/>"+img_html)
I end up with:
from IPython.core.display import Image
import io
import matplotlib.pyplot as plt
s = io.BytesIO()
# make your figure here
plt.savefig(s, format='png', bbox_inches="tight")
plt.close()
Image(s.getvalue())
Let's say you have base64-encoded image data:
img_data =
""
Then, to have it rendered inside an IPython cell, you simply do:
from IPython.core.display import Image
Image(data=img_data)
I'm going to build on what others (piRSquared) answered, because it didn't work for me with Jupyter and Python 3. I wrote the following function, which takes any plot function I define, calls it, and captures the output without displaying it in Jupyter. I use this to build custom HTML machine learning reports based on the many model iterations I execute using Livy and Spark.
from IPython.display import HTML, display_html
from io import BytesIO
import base64
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# pandas_input_pdf is assumed to be a pandas DataFrame defined earlier in the notebook

def capturePlotHTML(plotFunction):
    # open IO object
    sio3 = BytesIO()
    plotFunction()
    plt.savefig(sio3, format='png')
    sio3.seek(0)
    data_uri = base64.b64encode(sio3.read()).decode('ascii')
    html_out = '<html><head></head><body>'
    html_out += '<img src="data:image/png;base64,{0}" align="left">'.format(data_uri)
    html_out += '</body></html>'
    # prevents plot from showing in output
    plt.close()
    return HTML(html_out)

# Plot Wrappers
# Advanced Wrapper for more complex visualizations (seaborn, etc)
class plotRegline:
    def __init__(self):
        # could also pass in name as arg like this: def __init__(self, name):
        reg_line_prepped_pdf = pandas_input_pdf
        sns.lmplot(x='predicted', y='actual', data=reg_line_prepped_pdf, fit_reg=True, height=3, aspect=2).fig.suptitle("Regression Line")

# Basic Wrapper for simple matplotlib visualizations
def plotTsPred():
    ts_plot_prepped_pdf = pandas_input_pdf
    ts_plot_prepped_pdf.index = pd.to_datetime(ts_plot_prepped_pdf.DAYDATECOLUMN)
    ts_plot_prepped_pdf = ts_plot_prepped_pdf.drop(columns=["DAYDATECOLUMN"])
    ts_plot_prepped_pdf.plot(title="Predicted Vs Actual -- Timeseries Plot -- Days", figsize=(25,6))

# building the plots and capturing the outputs
regline_html = capturePlotHTML(plotRegline)
ts_plot_day_html = capturePlotHTML(plotTsPred)
# could be a list of any number of HTML objects
html_plots = [regline_html, ts_plot_day_html]
# display_html renders the objects immediately and returns None,
# so call it in whichever cell should show the results
display_html(*html_plots)
The answer by piRSquared no longer works with Python 3. I had to change it to:
from IPython.core.display import HTML
import binascii
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# open IO object
bio = BytesIO()
# generate random DataFrame
np.random.seed(314)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['x', 'y'])
# initialize figure and axis
fig, ax = plt.subplots(1, 1);
# plot DataFrame
ax.scatter(df.iloc[:, 0], df.iloc[:, 1]);
# print raw canvas data to IO object
fig.canvas.print_png(bio)
plt.close(fig)
# convert raw binary data to base64
# I use this to embed in an img tag
img_data = binascii.b2a_base64(bio.getvalue()).decode()
# keep img tag outer html in its own variable
# (b2a_base64 appends a newline, so strip it before building the tag)
img_html = '<img src="data:image/png;base64,{}">'.format(img_data.strip())
HTML("<h1>Hello</h1><hr/>"+img_html)
Specifically, I import from io, not StringIO, and I use BytesIO rather than StringIO. I needed to decode the bytes into a string for inserting into the HTML. I also added the required imports of numpy and pandas for the example plot to work, and added plt.close(fig) so that you don't end up with two figures in the output.
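The Python 3 changes described here can be folded into a small reusable helper. This is only a sketch: the name figure_to_img_tag is made up for illustration, and it uses base64.b64encode (which, unlike binascii.b2a_base64, does not append a newline) together with fig.savefig rather than fig.canvas.print_png.

```python
import base64
from io import BytesIO

import matplotlib.pyplot as plt


def figure_to_img_tag(fig):
    """Render `fig` to PNG in memory and return an <img> tag with a base64 data URI.

    Hypothetical helper, not part of matplotlib or IPython.
    """
    buffer = BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # keep the notebook from showing the figure a second time
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return '<img src="data:image/png;base64,{}">'.format(encoded)


fig, ax = plt.subplots()
ax.scatter([0, 1, 2], [2, 0, 1])
img_html = figure_to_img_tag(fig)
# In a notebook cell: HTML("<h1>Hello</h1><hr/>" + img_html)
```

Returning a plain string rather than an HTML object makes it easy to place the tag inside a larger generated page (tables, divs) before wrapping the whole thing in HTML(...).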
If you want to show the results of DataFrame.plot in an iPython cell, try this:
import pandas as pd
import numpy as np
%matplotlib inline
np.random.seed(314)
df = pd.DataFrame(np.random.randn(1000, 2), columns=['x', 'y'])
df.plot.scatter(0, 1)
Related
Pandas parallel URL downloads with pd.read_html
I know I can download a csv file from a web page by doing:
import pandas as pd
import numpy as np
from io import StringIO
URL = "http://www.something.com"
data = pd.read_html(URL)[0].to_csv(index=False, header=True)
file = pd.read_csv(StringIO(data), sep=',')
Now I would like to do the above for more URLs at the same time, like when you open different tabs in your browser. In other words, I want a way to parallelize this when I have different URLs, instead of looping through them or doing it one at a time.
So, I thought of having a series of URLs inside a dataframe, and then creating a new column which contains the 'data' strings, one for each URL.
list_URL = ["http://www.something.com", "http://www.something2.com", "http://www.something3.com"]
df = pd.DataFrame(list_URL, columns=['URL'])
df['data'] = pd.read_html(df['URL'])[0].to_csv(index=False, header=True)
But it gives me the error:
cannot parse from 'Series'
Is there a better syntax, or does this mean I cannot do this in parallel for more than one URL?
You could try it like this:
import pandas as pd
URLS = [
    "https://en.wikipedia.org/wiki/Periodic_table#Presentation_forms",
    "https://en.wikipedia.org/wiki/Planet#Planetary_attributes",
]
df = pd.DataFrame(URLS, columns=["URL"])
df["data"] = df["URL"].map(
    lambda x: pd.read_html(x)[0].to_csv(index=False, header=True)
)
print(df)
# Output
URL data
0 https://en.wikipedia.org/wiki/Periodic_t... 0\r\nPart of a series on the\r\nPeriodic...
1 https://en.wikipedia.org/wiki/Planet#Pla... 0\r\n"The eight known planets of the Sol...
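Note that Series.map calls the function once per URL, one after another, so the answer fixes the error but does not download in parallel. One possible way to overlap the downloads is a thread pool from the standard library. The sketch below is an assumption about how that could look, not part of the original answer; fetch_all and first_table_as_csv are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Apply `fetch` to every URL concurrently and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the order of `urls`, whichever download finishes first
        return list(pool.map(fetch, urls))


def first_table_as_csv(url):
    """Download `url` and return its first HTML table as CSV text."""
    import pandas as pd  # imported here so fetch_all itself needs only the standard library
    return pd.read_html(url)[0].to_csv(index=False, header=True)


# Usage in the notebook (needs network access, so it is left commented out):
# df["data"] = fetch_all(df["URL"].tolist(), first_table_as_csv)
```

Threads suit this case because the work is dominated by waiting on the network; the result list lines up with the DataFrame rows because the input order is preserved.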
How can I convert a JSON file from the labelme interface to a PNG or other image format file?
When I label images with the labelme interface, I get a JSON file as output, but after labeling I need the result in an image format such as PNG, BMP, or JPEG. Can anyone suggest some code?
import json
from PIL import Image
with open('your.json') as f:
    data = json.load(f)
# Load the file path from the json
imgpath = data['yourkey']
# Place the image path into the open method
img = Image.open(imgpath)
Based on the tutorial in the original repository, you can use labelme_json_to_dataset <<JSON_PATH>> -o <<OUTPUT_FOLDER_PATH>>.
To run it from Python / Jupyter, you can use:
import os
def labelme_json_to_dataset(json_path):
    os.system("labelme_json_to_dataset "+json_path+" -o "+json_path.replace(".","_"))
If you need to do it for multiple images, just call the function in a loop.
Based on the issue, the behavior of labelme_json_to_dataset can be reimplemented using either labelme2voc.py or labelme2coco.py.
You could also use another implementation such as labelme2Datasets.
You can also implement your own modification of labelme_json_to_dataset using the labelme library. Basically, you use label_file = labelme.LabelFile(filename=filename) followed by img = labelme.utils.img_data_to_arr(label_file.imageData).
An example of such a process would look like this:
import glob
import os
import os.path as osp
import labelme
from PIL import Image
def labelme2images(input_dir, output_dir, force=False, save_img=False, new_size=False):
    """
    new_size_width, new_size_height = new_size
    """
    if save_img:
        # the original called a helper that was not shown, _makedirs(path=..., force=force);
        # os.makedirs is used here instead, so `force` is currently unused
        os.makedirs(osp.join(output_dir, "images"), exist_ok=True)
    if new_size:
        new_size_width, new_size_height = new_size
    print("Generating dataset")
    filenames = glob.glob(osp.join(input_dir, "*.json"))
    for filename in filenames:
        # base name
        base = osp.splitext(osp.basename(filename))[0]
        label_file = labelme.LabelFile(filename=filename)
        img = labelme.utils.img_data_to_arr(label_file.imageData)
        h, w = img.shape[0], img.shape[1]
        if save_img:
            if new_size:
                # PIL's resize expects (width, height)
                img_pil = Image.fromarray(img).resize((new_size_width, new_size_height))
            else:
                img_pil = Image.fromarray(img)
            img_pil.save(osp.join(output_dir, "images", base + ".jpg"))
I got an error in PyCharm after trying to run test code for deep learning
# USAGE
# python train_simple_nn.py --dataset animals --model output/simple_nn.model --label-bin output/simple_nn_lb.pickle --plot output/simple_nn_plot.png
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to input dataset of images")
ap.add_argument("-m", "--model", required=True, help="path to output trained model")
ap.add_argument("-l", "--label-bin", required=True, help="path to output label binarizer")
ap.add_argument("-p", "--plot", required=True, help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
# initialize the data and labels
print("[INFO] loading images...")
data = []
labels = []
# grab the image paths and randomly shuffle them
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# loop over the input images
for imagePath in imagePaths:
    # load the image, resize the image to be 32x32 pixels (ignoring
    # aspect ratio), flatten the image into 32x32x3=3072 pixel image
    # into a list, and store the image in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (32, 32)).flatten()
    data.append(image)
    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-2]
    labels.append(label)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
I found some test code for studying deep learning and tried to run it in PyCharm, but I got this error message. I don't understand what the parser function is doing here. Could you explain the code and the error?
---error I got in PyCharm---
C:\Users\giyeo\anaconda3\envs\tf\python.exe "D:/GiyeonLee/09. Machine Learning/Pycharm/Tutorial/keras-tutorial/train_simple_nn.py"
2020-07-06 13:56:28.409237: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
usage: train_simple_nn.py [-h] -d DATASET -m MODEL -l LABEL_BIN -p PLOT
train_simple_nn.py: error: the following arguments are required: -d/--dataset, -m/--model, -l/--label-bin, -p/--plot
Process finished with exit code 2
Thanks for reading my question.
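The error message itself comes from argparse: every argument in the script is declared with required=True, and the script was started without any of them, so the parser prints the usage line and exits with code 2. The sketch below reproduces that behavior with a reduced copy of the parser (only two of the four arguments, chosen for brevity) and passes the command line as a list, which is an illustration device and not how the original script is invoked.

```python
import argparse

# same style of parser as in the script, reduced to two of its arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True, help="path to input dataset of images")
ap.add_argument("-p", "--plot", required=True, help="path to output accuracy/loss plot")

# parse_args() reads sys.argv by default; passing a list simulates the command line
args = vars(ap.parse_args(["--dataset", "animals", "--plot", "output/simple_nn_plot.png"]))
print(args["dataset"], args["plot"])

# with nothing supplied, argparse prints the usage message and exits with code 2
try:
    ap.parse_args([])
except SystemExit as exc:
    missing_exit_code = exc.code
```

So the script has to be started with the arguments shown in its USAGE comment. In PyCharm that typically means entering them in the script's run configuration rather than pressing Run with no parameters.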
Can't load dataset into ipython. UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 1: invalid continuation byte
I'm fairly new to using IPython, so I'm still getting confused quite easily. Here is my code so far. After loading, I have to display only the first 5 rows of the file.
# Import useful packages for data science
from IPython.display import display, HTML
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Load concerts.csv
path1 = 'C:\\Users\\Cathal\\Documents\\concerts.csv'
concerts = pd.read_csv(path1)
Thanks in advance for any help.
Try
concerts = pd.read_csv(path1, encoding='utf8')
If that doesn't work, try
concerts = pd.read_csv(path1, encoding="ISO-8859-1")
Since the error already comes from the 'utf-8' codec, the second option is the more likely fix.
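The "try one encoding, then the next" advice can be written as a small loop over the raw bytes of the file. This is a sketch with a hypothetical helper name (first_working_encoding) and a made-up sample; the candidate list mirrors the two encodings suggested in the answer.

```python
def first_working_encoding(raw, candidates=("utf-8", "ISO-8859-1")):
    """Return the first encoding in `candidates` that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# made-up sample: byte 0xcd followed by an ASCII letter is not valid UTF-8
sample = b"\xcdndice,Total\n1,2\n"
print(first_working_encoding(sample))
```

With a real file, the bytes could come from open(path1, "rb").read(), and the returned name can be passed as encoding= to pd.read_csv. Keep in mind that ISO-8859-1 maps every possible byte to a character, so it never raises; a successful decode does not prove the text is displayed correctly.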
Plotting candlestick data from a dataframe in Python
I would like to create a daily candlestick plot from data I downloaded from Yahoo using pandas. I'm having trouble figuring out how to use the candlestick matplotlib function in this context. Here is the code:
# The following example downloads stock data from Yahoo and plots it.
from pandas.io.data import get_data_yahoo
import matplotlib.pyplot as plt
from matplotlib.pyplot import subplots, draw
from matplotlib.finance import candlestick
symbol = "GOOG"
data = get_data_yahoo(symbol, start = '2013-9-01', end = '2013-10-23')[['Open','Close','High','Low','Volume']]
ax = subplots()
candlestick(ax,data['Open'],data['High'],data['Low'],data['Close'])
Thanks
Andrew.
Using bokeh:
import io
from math import pi
import pandas as pd
from bokeh.plotting import figure, show, output_file
df = pd.read_csv(
    io.BytesIO(
        b'''Date,Open,High,Low,Close
2016-06-01,69.6,70.2,69.44,69.76
2016-06-02,70.0,70.15,69.45,69.54
2016-06-03,69.51,70.48,68.62,68.91
2016-06-04,69.51,70.48,68.62,68.91
2016-06-05,69.51,70.48,68.62,68.91
2016-06-06,70.49,71.44,69.84,70.11
2016-06-07,70.11,70.11,68.0,68.35'''
    )
)
df["Date"] = pd.to_datetime(df["Date"])
inc = df.Close > df.Open
dec = df.Open > df.Close
w = 12*60*60*1000
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title = "Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3
p.segment(df.Date, df.High, df.Date, df.Low, color="black")
p.vbar(df.Date[inc], w, df.Open[inc], df.Close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.Date[dec], w, df.Open[dec], df.Close[dec], fill_color="#F2583E", line_color="black")
output_file("candlestick.html", title="candlestick.py example")
show(p)
Code above forked from here: http://docs.bokeh.org/en/latest/docs/gallery/candlestick.html
I don't have enough reputation to comment on @randall-goodwin's answer, but for pandas 0.16.2 the line:
# convert the datetime64 column in the dataframe to 'float days'
data.Date = mdates.date2num(data.Date)
must be:
data.Date = mdates.date2num(data.Date.dt.to_pydatetime())
because matplotlib does not support the numpy datetime64 dtype.
I stumbled across a great pastebin entry, http://pastebin.com/ne7Fjdiq, that does this well. I too was having trouble getting the calling syntax right. It usually comes down to transforming your data in simple ways to get the function to work. My issue was with the datetime: there must be something wrong with the format of my data. Once I replaced the Date series with range(maxdata), it worked.
data = pandas.read_csv('data.csv', parse_dates={'Timestamp': ['Date', 'Time']}, index_col='Timestamp')
ticks = data.ix[:, ['Price', 'Volume']]
bars = ticks.Price.resample('1min', how='ohlc')
barsa = bars.fillna(method='ffill')
fig = plt.figure()
fig.subplots_adjust(bottom=0.1)
ax = fig.add_subplot(111)
plt.title("Candlestick chart")
volume = ticks.Volume.resample('1min', how='sum')
value = ticks.prod(axis=1).resample('1min', how='sum')
vwap = value / volume
Date = range(len(barsa))
#Date = matplotlib.dates.date2num(barsa.index)#
DOCHLV = zip(Date , barsa.open, barsa.close, barsa.high, barsa.low, volume)
matplotlib.finance.candlestick(ax, DOCHLV, width=0.6, colorup='g', colordown='r', alpha=1.0)
plt.show()
Here is the solution:
from pandas.io.data import get_data_yahoo
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
from matplotlib import ticker as mticker
from matplotlib.finance import candlestick_ohlc
import datetime as dt
symbol = "GOOG"
data = get_data_yahoo(symbol, start = '2014-9-01', end = '2015-10-23')
data.reset_index(inplace=True)
data['Date']=mdates.date2num(data['Date'].astype(dt.date))
fig = plt.figure()
ax1 = plt.subplot2grid((1,1),(0,0))
plt.ylabel('Price')
ax1.xaxis.set_major_locator(mticker.MaxNLocator(6))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
candlestick_ohlc(ax1,data.values,width=0.2)
I found this question when I too was looking for how to use candlestick with a pandas dataframe returned from one of the DataReader services like get_data_yahoo. I eventually figured it out. One of the keys was this other question, answered by Wes McKinney and RJRyV. Here is that link: Pandas convert dataframe to array of tuples
The key was to read the candlestick.py function definition to determine how it expected to receive the data. The date needed to be converted first, then the entire dataframe needed to be converted to an array of tuples.
Here is the final code that worked for me. Maybe there is some other candlestick chart out there somewhere that works directly on a pandas dataframe returned from one of the stock quote services. That would be very nice.
# Imports
from pandas.io.data import get_data_yahoo
from datetime import datetime, timedelta
import matplotlib.dates as mdates
from matplotlib.pyplot import subplots, draw
from matplotlib.finance import candlestick
import matplotlib.pyplot as plt
# get the data on a symbol (gets last 1 year)
symbol = "TSLA"
data = get_data_yahoo(symbol, datetime.now() - timedelta(days=365))
# drop the date index from the dataframe
data.reset_index(inplace = True)
# convert the datetime64 column in the dataframe to 'float days'
data.Date = mdates.date2num(data.Date)
# make an array of tuples in the specific order needed
dataAr = [tuple(x) for x in data[['Date', 'Open', 'Close', 'High', 'Low']].to_records(index=False)]
# construct and show the plot
fig = plt.figure()
ax1 = plt.subplot(1,1,1)
candlestick(ax1, dataAr)
plt.show()
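The answers above depend on matplotlib.finance and pandas.io.data, both of which have since been removed from their libraries, so the code may not run on current installations. As a hedged alternative, the same "convert the date, then draw one wick and one body per row" idea can be sketched with plain matplotlib calls. The function name plot_candles is hypothetical, and the sample values are copied from the bokeh answer in this thread rather than downloaded.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# small hand-written OHLC sample (values copied from the bokeh answer)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-06-01", "2016-06-02", "2016-06-03", "2016-06-06", "2016-06-07"]),
    "Open": [69.6, 70.0, 69.51, 70.49, 70.11],
    "High": [70.2, 70.15, 70.48, 71.44, 70.11],
    "Low": [69.44, 69.45, 68.62, 69.84, 68.0],
    "Close": [69.76, 69.54, 68.91, 70.11, 68.35],
})


def plot_candles(ax, ohlc, width=0.6):
    """Draw one wick (Low-High line) and one body (Open-Close bar) per row."""
    # convert the dates to matplotlib's float day numbers
    x = mdates.date2num([ts.to_pydatetime() for ts in ohlc["Date"]])
    # green when the close is at or above the open, red otherwise
    colors = ["g" if c >= o else "r" for o, c in zip(ohlc["Open"], ohlc["Close"])]
    ax.vlines(x, ohlc["Low"], ohlc["High"], colors="black", linewidth=1)
    body_bottom = ohlc[["Open", "Close"]].min(axis=1)
    body_height = (ohlc["Close"] - ohlc["Open"]).abs()
    bars = ax.bar(x, body_height, width, bottom=body_bottom, color=colors)
    ax.xaxis_date()  # show the float day numbers as dates on the x axis
    return bars


fig, ax = plt.subplots()
bars = plot_candles(ax, df)
fig.autofmt_xdate()
```

The width of 0.6 is in days because the x values are day numbers; intraday data would need a correspondingly smaller width.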