Having trouble understanding the predictions array in classification model evaluation - deep-learning

I'm working on a sarcasm detector with the BERT model (binary classification). Currently, I'm having trouble with the model evaluation as I don't really understand the predictions array. The model should output 1 for sarcastic and 0 for not sarcastic, but the predictions don't look like that. Please let me know if more code is needed. Thank you!
model:
from transformers import BertForSequenceClassification, AdamW, BertConfig

# Load BertForSequenceClassification, the pretrained BERT model with a single
# linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",         # Use the 12-layer BERT model, with an uncased vocab.
    num_labels=2,                # The number of output labels -- 2 for binary classification.
                                 # You can increase this for multi-class tasks.
    output_attentions=False,     # Whether the model returns attention weights.
    output_hidden_states=False,  # Whether the model returns all hidden states.
    attention_probs_dropout_prob=0.25,
    hidden_dropout_prob=0.25,
)

# Tell pytorch to run this model on the GPU.
model.cuda()
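The evaluation code below also refers to eval_input_ids, eval_dataloader, and device, which are not shown in the post. Purely for context, here is a minimal sketch of how they might be built; the names eval_sentences and eval_labels are assumptions, not from the original code:
import torch
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Assumption: eval_sentences is a list of strings, eval_labels a list of 0/1 ints.
encoded = tokenizer(eval_sentences, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
eval_input_ids = encoded["input_ids"]
eval_masks = encoded["attention_mask"]

eval_data = TensorDataset(eval_input_ids, eval_masks, torch.tensor(eval_labels))
eval_dataloader = DataLoader(eval_data, sampler=SequentialSampler(eval_data), batch_size=32)

device = torch.device("cuda")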
evaluation:
from sklearn.metrics import confusion_matrix
import seaborn as sn
import pandas as pd

print('Predicting labels for {:,} test sentences...'.format(len(eval_input_ids)))

# Put model in evaluation mode
model.eval()

predictions, true_labels = [], []

# Iterate over test data
for batch in eval_dataloader:
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels = batch
    # Telling the model not to compute or store gradients, saving memory and
    # speeding up prediction
    with torch.no_grad():
        # Forward pass, calculate logit predictions.
        result = model(b_input_ids,
                       token_type_ids=None,
                       attention_mask=b_input_mask,
                       return_dict=True)
    logits = result.logits
    # Move logits and labels to CPU
    logits = logits.detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()
    # Store predictions and true labels (one array per batch)
    predictions.append(logits)
    true_labels.append(label_ids)
true_labels[1]
predictions[1]
output:
array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1,
0, 1, 1, 0, 0, 0, 0, 1, 1, 1]) <-- true_labels[1]
array([[ 2.9316974 , -2.855342 ],
[ 3.4540875 , -3.3177233 ],
[ 2.7424026 , -2.6472614 ],
[-3.4326897 , 3.330751 ],
[ 3.7238903 , -3.7757814 ],
[-3.208891 , 3.175109 ],
[ 3.0500402 , -2.8103237 ],
[ 3.8333693 , -3.9073608 ],
[-3.2779126 , 3.231213 ],
[ 1.484127 , -1.2610332 ],
[ 3.686339 , -3.7582958 ],
[-2.1883147 , 2.205132 ],
[-3.274582 , 3.2254982 ],
[-1.606854 , 1.6213335 ],
[ 3.7080388 , -3.6854186 ],
[-2.351147 , 2.365543 ],
[-3.7317555 , 3.4833894 ],
[ 3.2413306 , -3.2116275 ],
[ 3.7413723 , -3.7767386 ],
[-3.6293464 , 3.4446163 ],
[ 3.7779078 , -3.9025154 ],
[-3.5576923 , 3.403335 ],
[ 3.6226897 , -3.6370063 ],
[-3.7081888 , 3.4720154 ],
[ 1.1533121 , -0.8105195 ],
[ 1.0573612 , -0.69238156],
[ 3.4189024 , -3.4764926 ],
[-0.13847755, 0.450572 ],
[ 3.7248163 , -3.7781181 ],
[-3.2015219 , 3.1719215 ],
[-2.1409311 , 2.1202204 ],
[-3.470693 , 3.358798 ]], dtype=float32) <-- predictions[1]

There are two values because you have two classes (0 = not sarcastic, 1 = sarcastic). These values are logits, which, when fed into a softmax function, give the probability of each class. If you want to know whether a sample is classified as sarcasm or not, just take the class with the highest logit. Since each entry of predictions is a NumPy array of shape (batch_size, 2), that is an argmax over the second axis:
preds = predictions[1].argmax(axis=1)
print(preds)
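Building on that, a minimal sketch of the full evaluation, assuming predictions and true_labels are the per-batch lists filled in the loop above (the class names used in the heatmap are assumptions):
import numpy as np
import pandas as pd
import seaborn as sn
from sklearn.metrics import confusion_matrix

# Stack the per-batch arrays into single arrays.
all_logits = np.concatenate(predictions, axis=0)   # shape (n_samples, 2)
all_labels = np.concatenate(true_labels, axis=0)   # shape (n_samples,)

# The class with the highest logit is the predicted label (0 or 1).
pred_labels = np.argmax(all_logits, axis=1)

# Optional: softmax if you want probabilities rather than raw logits.
probs = np.exp(all_logits) / np.exp(all_logits).sum(axis=1, keepdims=True)

print('Accuracy:', (pred_labels == all_labels).mean())

# Confusion matrix, using the imports from the question.
cm = confusion_matrix(all_labels, pred_labels)
sn.heatmap(pd.DataFrame(cm, index=['not sarcastic', 'sarcastic'],
                        columns=['not sarcastic', 'sarcastic']),
           annot=True, fmt='d')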

Related

Using 'extendData' in a 'dcc.Interval' event will only update my graph when I'm browsing another Chrome tab

I created a very simple (one page) application in Dash that appends random data to a plotly chart using a dcc.Interval component and the extendData method (I'd like to have x values max).
The program worked like a charm, until I tried to port it to a multi-page application:
I used the following example:
https://github.com/facultyai/dash-bootstrap-components/blob/main/examples/python/templates/multi-page-apps/responsive-collapsible-sidebar/sidebar.py
and replaced:
elif pathname == "/page-1":
    return html.P("This is the content of page 1. Yay!")
with:
import page_1
...
elif pathname == "/page-1":
    return page_1.layout
My page_1.py contains the following code:
from dash import dcc, html
import dash_bootstrap_components as dbc
import plotly.graph_objs as go

layout = dbc.Card(dbc.CardBody([
    html.H4('Live Feed'),
    dcc.Graph(
        id='live-update-graph',
        figure=go.Figure({'data': [
            {'x': [], 'y': []},
            {'x': [], 'y': []},
            {'x': [], 'y': []},
            {'x': [], 'y': []}
        ]}),
    ),
    dcc.Interval(
        id='interval-component',
        interval=0.1 * 1000,  # in milliseconds
        n_intervals=0
    )
]))
I put my Callback in my app.py file:
import datetime
import random
import numpy as np

@app.callback(Output('live-update-graph', 'extendData'),
              Input('interval-component', 'n_intervals'))
def update_graph_live(n):
    # Collect some data
    y1 = np.random.normal(loc=10, scale=10)
    y2 = y1 + random.randint(-5, 5)
    y3 = y2 + random.randint(-10, 60)
    y4 = y3 + random.randint(-40, 2)
    # extendData format: [new points per trace, indices of the traces to extend, max points kept]
    return [{'x': [[datetime.datetime.now()]] * 4, 'y': [[y1], [y2], [y3], [y4]]},
            [0, 1, 2, 3], 300]
...
if __name__ == '__main__':
    app.run_server(debug=True)
Unfortunately, my chart only updates when I'm browsing another tab in Chrome, not when I'm actually viewing it.
I have another page with some other components and an associated callback declared in my app.py file as:
@app.callback(
    Output("result-code", "children"),
    Input("slider", "value"),
)
def create_python_script(slider):
    markdown = markdown_start
    markdown += '''
msg = {{
    "slider_value": {slider}
}}'''.format(slider=slider)
    markdown += markdown_end
    return markdown
And my Markdown component is updated in real-time, no problem with that.
Here is a copy of my callback status:
[Screenshot: Callback status in Dash]
My developer console shows every incoming message on the front-end side:
{
"multi": true,
"response": {
"live-update-graph": {
"extendData": [
{
"x": [
[
"2023-02-13T16:58:37.533426"
],
[
"2023-02-13T16:58:37.533426"
],
[
"2023-02-13T16:58:37.533426"
],
[
"2023-02-13T16:58:37.533426"
]
],
"y": [
[
-4.26648933108117
],
[
-3.2664893310811696
],
[
-8.26648933108117
],
[
-9.26648933108117
]
]
},
[
0,
1,
2,
3
],
300
]
}
}
}
Am I doing something wrong?
Thanks in advance!
I was using http://localhost:8888 instead of http://127.0.0.1:8888 to connect to my web app and somehow didn't see it, and that was preventing the chart from being updated.

Dash Tabulator : "movableRowsConnectedTables" is not working

I’m trying to use the "movableRowsConnectedTables" built-in functionality as explained in the tabulator.js examples
It doesn’t seem to work as expected:
import dash
from dash import html
import dash_bootstrap_components as dbc
import dash_tabulator

columns = [
    {"title": "Name",
     "field": "name"}
]

options_from = {
    'movableRows': True,
    'movableRowsConnectedTables': "tabulator_to",
    'movableRowsReceiver': "add",
    'movableRowsSender': "delete",
    'height': 200,
    'placeholder': 'No more Rows'
}

options_to = {
    'movableRows': True,
    'height': 200,
    'placeholder': 'Drag Here'
}

data = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

layout = html.Div(
    [
        dbc.Row(
            [
                dbc.Col(
                    [html.Header('DRAG FROM HERE'),
                     dash_tabulator.DashTabulator(
                         id='tabulator_from',
                         columns=columns,
                         options=options_from,
                         data=data,
                     ),
                     ], width=6
                ),
                dbc.Col(
                    [html.Header('DROP HERE'),
                     dash_tabulator.DashTabulator(
                         id='tabulator_to',
                         columns=columns,
                         options=options_to,
                         data=[]
                     ),
                     ], width=6
                )
            ]
        )
    ]
)

app = dash.Dash(external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = dbc.Container(layout, fluid=True)

if __name__ == '__main__':
    app.run_server(debug=True)
Is it also possible to get callbacks when elements are dropped?
It would be great to have this functionality inside Dash!
I'm not familiar with dash_tabulator, but the table you're sending to also needs a 'movableRowsConnectedTables': "tabulator_from" option; see the sketch below.
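For what it's worth, a minimal sketch of the receiving table's options under that assumption (the key comes from the tabulator.js movable-rows documentation; I haven't verified it against dash_tabulator):
options_to = {
    'movableRows': True,
    # Point back at the source table, as suggested in the answer above.
    'movableRowsConnectedTables': "tabulator_from",
    'height': 200,
    'placeholder': 'Drag Here'
}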

Reading Parquet dataset using PyArrow filter is not working

I want to implement the query below using a PyArrow filter:
'(salary == 150280.17 or country == "Finland" ) and (first_name == "Amanda" or last_name == "Gray")'
dataset = pq.ParquetDataset(
    parquit_file,
    use_legacy_dataset=False,
    filters=[
        ([("salary", "==", 150280.17)], [("country", "==", "Canada")]),
        ([("first_name", "==", "Amanda")], [("last_name", "==", "Gray")]),
    ]
)
dataset.read().to_pandas()
but it is giving me this error:
ValueError: not enough values to unpack (expected 3, got 1)
The filters should be a list[tuple] or a list[list[tuple]]:
dataset = pq.ParquetDataset(
    parquet_file,
    use_legacy_dataset=False,
    filters=[
        [
            ("salary", "==", 150280.17),
            ("country", "==", "Canada"),
        ],
        [
            ("first_name", "==", "Amanda"),
            ("last_name", "==", "Gray"),
        ]
    ]
)
dataset.read().to_pandas()
You had an extra [].
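One caveat worth adding, not part of the original answer: in the list[list[tuple]] form the outer list is OR-ed and each inner list is AND-ed, so the corrected snippet expresses "(salary and country) or (first_name and last_name)" rather than the "(... or ...) and (... or ...)" query from the question; that query would have to be expanded into four AND groups, or written as a compute expression, which reasonably recent pyarrow versions accept directly. A sketch, assuming parquet_file is the same path as above:
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# (salary == 150280.17 or country == "Finland") and
# (first_name == "Amanda" or last_name == "Gray")
expr = (
    ((ds.field("salary") == 150280.17) | (ds.field("country") == "Finland"))
    & ((ds.field("first_name") == "Amanda") | (ds.field("last_name") == "Gray"))
)

df = pq.read_table(parquet_file, filters=expr).to_pandas()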

Add a # to beginning of each key in Json Python2.7

I'm trying to add a "#" to the beginning of each key of a JSON object (which I got from RabbitMQ API calls).
Here is my attempt:
#!/bin/python

# Libraries import
import requests
import json
import sys
import os

# Define URLs
overview = "/api/overview"
nodes = "/api/nodes"
queues = "/api/queues"

# Get credentials from file
with open('/credentials') as json_file:
    data = json.load(json_file)
    user = data['user']
    passwd = data['pass']

# Test which URL we want to call
if ''.join(sys.argv[1]) == "overview":
    commande = overview
if ''.join(sys.argv[1]) == "queues":
    commande = queues
if ''.join(sys.argv[1]) == "nodes":
    commande = nodes

def append(mydict):
    return dict(map(lambda (key, value): ("#" + str(key), value), mydict.items()))

def transform(multileveldict):
    new = append(multileveldict)
    for key, value in new.items():
        if isinstance(value, dict):
            new[key] = transform(value)
    return new

def upper_keys(x):
    if isinstance(x, list):
        return [upper_keys(v) for v in x]
    elif isinstance(x, dict):
        return dict((k.upper(), upper_keys(v)) for k, v in x.iteritems())
    else:
        return x

# Main
response = requests.get("http://localhost:15672" + commande, auth=(user, passwd))
if response.ok:
    json_data = json.loads(response.content)
    json = json.dumps(upper_keys(json_data), indent=4)
    print(json)
Here is the JSON that I get in "response.content" :
[
{
"NODE": "rabbit#server567",
"EXCLUSIVE": false,
"NAME": "test-01",
"SYNCHRONISED_SLAVE_NODES": [],
"SLAVE_NODES": [],
"AUTO_DELETE": false,
"VHOST": "/",
"ARGUMENTS": {},
"TYPE": "classic",
"DURABLE": false
},
{
"NODE": "rabbit#server567",
"EXCLUSIVE": false,
"NAME": "test-02",
"SYNCHRONISED_SLAVE_NODES": [],
"SLAVE_NODES": [],
"AUTO_DELETE": false,
"VHOST": "/",
"ARGUMENTS": {},
"TYPE": "classic",
"DURABLE": false
},
{
"NODE": "rabbit#server567",
"EXCLUSIVE": false,
"NAME": "test-03",
"SYNCHRONISED_SLAVE_NODES": [],
"SLAVE_NODES": [],
"AUTO_DELETE": false,
"VHOST": "/",
"ARGUMENTS": {},
"TYPE": "classic",
"DURABLE": false
},
{
"MESSAGES_UNACKNOWLEDGED_RAM": 0,
"RECOVERABLE_SLAVES": null,
"CONSUMERS": 0,
"REDUCTIONS": 9700519,
"AUTO_DELETE": false,
"MESSAGE_BYTES_PAGED_OUT": 0,
"MESSAGE_BYTES_UNACKNOWLEDGED": 0,
"REDUCTIONS_DETAILS": {
"RATE": 0.0
},
"MESSAGE_BYTES": 0,
"MESSAGES_UNACKNOWLEDGED": 0,
"CONSUMER_UTILISATION": null,
"EXCLUSIVE": false,
"VHOST": "/",
"GARBAGE_COLLECTION": {
"MAX_HEAP_SIZE": 0,
"MIN_HEAP_SIZE": 233,
"FULLSWEEP_AFTER": 65535,
"MINOR_GCS": 15635,
"MIN_BIN_VHEAP_SIZE": 46422
},
"MESSAGES_DETAILS": {
"RATE": 0.0
},
"SLAVE_NODES": [
"rabbit#server567"
],
"MESSAGE_BYTES_PERSISTENT": 0,
"POLICY": "ha-all",
"MESSAGES_PAGED_OUT": 0,
"NODE": "rabbit#server566",
"HEAD_MESSAGE_TIMESTAMP": null,
"DURABLE": false,
"MESSAGES_READY_RAM": 0,
"STATE": "running",
"ARGUMENTS": {},
"EFFECTIVE_POLICY_DEFINITION": {
"HA-MODE": "all"
},
"MESSAGES_READY": 0,
"MESSAGES_RAM": 0,
"MESSAGE_BYTES_READY": 0,
"SINGLE_ACTIVE_CONSUMER_TAG": null,
"NAME": "test-04",
"MESSAGES_PERSISTENT": 0,
"BACKING_QUEUE_STATUS": {
"MIRROR_SENDERS": 0,
"Q1": 0,
"Q3": 0,
"Q2": 0,
"Q4": 0,
"AVG_ACK_EGRESS_RATE": 0.0,
"MIRROR_SEEN": 0,
"LEN": 0,
"TARGET_RAM_COUNT": "infinity",
"MODE": "default",
"NEXT_SEQ_ID": 0,
"DELTA": [
"delta",
"undefined",
0,
0,
"undefined"
],
"AVG_ACK_INGRESS_RATE": 0.0,
"AVG_EGRESS_RATE": 0.0,
"AVG_INGRESS_RATE": 0.0
},
"MESSAGES": 0,
"IDLE_SINCE": "2020-10-16 13:50:50",
"OPERATOR_POLICY": null,
"SYNCHRONISED_SLAVE_NODES": [
"rabbit#server567"
],
"MEMORY": 10556,
"EXCLUSIVE_CONSUMER_TAG": null,
"MESSAGES_READY_DETAILS": {
"RATE": 0.0
},
"TYPE": "classic",
"MESSAGES_UNACKNOWLEDGED_DETAILS": {
"RATE": 0.0
},
"MESSAGE_BYTES_RAM": 0
}
]
Here, I made every key uppercase and can display it as JSON, but I can't find a way to add this "#" to the beginning of each key.
PS: I'm new to Python development.
Thank you very much!
Since you mentioned that you have successfully converted every key in the dictionary to an upper-case key, why not reuse that method and change the part that upper-cases the key so that it prepends "#" instead:
# the one you provided
def upper_keys(x):
    if isinstance(x, list):
        return [upper_keys(v) for v in x]
    elif isinstance(x, dict):
        return dict((k.upper(), upper_keys(v)) for k, v in x.iteritems())
    else:
        return x

# the modified method
def prepend_hash_keys(x):
    if isinstance(x, list):
        return [prepend_hash_keys(v) for v in x]
    elif isinstance(x, dict):
        # this part changed from k.upper() to "#" + k
        return dict(("#" + k, prepend_hash_keys(v)) for k, v in x.iteritems())
    else:
        return x
Your transform function actually works fine (for Python 2), you just forgot to actually call it! Instead, you call only upper_keys, but not transform:
json = json.dumps(upper_keys(json_data), indent=4) # where's transform?
If you use both one after the other (order does not matter) it should work:
json = {"nested": {"dict": {"with": {"lowercase": "keys"}}}}
print(transform(upper_keys(json)))
# {'#NESTED': {'#DICT': {'#WITH': {'#LOWERCASE': 'keys'}}}}
However, both transform and upper_keys can be simplified a lot using dictionary comprehensions (also available in Python 2), and you can combine both in one function:
def transform_upper(d):
    if isinstance(d, dict):
        return {"#" + k.upper(): transform_upper(v) for k, v in d.items()}
    else:
        return d
print(transform_upper(json))
# {'#NESTED': {'#DICT': {'#WITH': {'#LOWERCASE': 'keys'}}}}
From the look of it, you already tried something like that in your append() function.
If you modify it a bit to something like this, it may do what you are looking for:
mydict = {
    'name': 1,
    'surname': 2
}

def append(mydict):
    new_dict = {}
    for key, val in mydict.items():
        new_dict['#' + key] = val
    return new_dict

print(append(mydict))
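Not part of the original answers, but worth noting: some of the snippets above rely on Python 2-only syntax (iteritems(), tuple-unpacking lambdas). A sketch of the same idea that also runs on Python 3:
def prepend_hash_keys(x):
    # Recursively walk lists and dicts, prefixing every dict key with "#".
    if isinstance(x, list):
        return [prepend_hash_keys(v) for v in x]
    elif isinstance(x, dict):
        return {"#" + k: prepend_hash_keys(v) for k, v in x.items()}
    else:
        return x

print(prepend_hash_keys({"name": 1, "nested": {"surname": 2}}))
# {'#name': 1, '#nested': {'#surname': 2}}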

Convert json fetched into dataframe using R

I have JSON like the one below, which I fetch from a URL:
{
"info" : {
"1484121600" : [
212953175.053333,212953175.053333,null
],
"1484125200" : [
236203014.133333,236203014.133333,236203014.133333
],
"1484128800" : [
211414832.968889,null,211414832.968889
],
"1484132400" : [
208604573.791111,208604573.791111,208604573.791111
],
"1484136000" : [
231358374.288889,231358374.288889,231358374.288889
],
"1484139600" : [
210529301.097778,210529301.097778,210529301.097778
],
"1484143200" : [
212009682.04,null,212009682.04
],
"1484146800" : [
232364759.566667,232364759.566667,232364759.566667
],
"1484150400" : [
218138788.524444,218138788.524444,218138788.524444
],
"1484154000" : [
218883301.282222,218883301.282222,null
],
"1484157600" : [
237874583.771111,237874583.771111,237874583.771111
],
"1484161200" : [
216227081.924444,null,216227081.924444
],
"1484164800" : [
227102054.082222,227102054.082222,null
]
},
"summary" : "data",
"end" : 1484164800,
"start": 1484121600
}
I'm fetching this JSON from a URL using the jsonlite package in R, like below:
library(jsonlite)
input_data <- fromJSON(url)
timeseries <- input_data[['info']] # till here code is fine
abc <- data.frame(ds = names(timeseries[[1]]),
                  y = unlist(timeseries[[1]]), stringsAsFactors = FALSE)
(Something is wrong in the line above.)
I need to convert the data in the timeseries variable into a data frame whose index column is the epoch time. The number of columns in the data frame depends on the number of values in each array; all arrays are guaranteed to have the same number of values, but that number is not fixed (it can be 1, 2, and so on). In the example above the array size is 3 for every entry.
For example, the data frame should look like:
index y1 y2 y3
1484121600 212953175.053333 212953175.053333 null
1484125200 236203014.133333 236203014.133333 236203014.133333
Please suggest how I can do this in R. I'm new to it.
JSON with only one item in each array:
{
"info": {
"1484121600": [
212953175.053333
],
"1484125200": [
236203014.133333
],
"1484128800": [
211414832.968889
],
"1484132400": [
208604573.791111
],
"1484136000": [
231358374.288889
],
"1484139600": [
210529301.097778
],
"1484143200": [
212009682.04
],
"1484146800": [
232364759.566667
],
"1484150400": [
218138788.524444
],
"1484154000": [
218883301.282222
],
"1484157600": [
237874583.771111
],
"1484161200": [
216227081.924444
],
"1484164800": [
227102054.082222
]
},
"summary": "data",
"end": 1484164800,
"start": 1484121600
}
Consider binding the list of JSON values into a matrix with sapply(), then transposing columns to rows with t(), and finally converting to a data frame with data.frame():
abc <- data.frame(t(sapply(timeseries, c)))
colnames(abc) <- gsub("X", "y", colnames(abc))
abc
# y1 y2 y3
# 1484121600 212953175 212953175 NA
# 1484125200 236203014 236203014 236203014
# 1484128800 211414833 NA 211414833
# 1484132400 208604574 208604574 208604574
# 1484136000 231358374 231358374 231358374
# 1484139600 210529301 210529301 210529301
# 1484143200 212009682 NA 212009682
# 1484146800 232364760 232364760 232364760
# 1484150400 218138789 218138789 218138789
# 1484154000 218883301 218883301 NA
# 1484157600 237874584 237874584 237874584
# 1484161200 216227082 NA 216227082
# 1484164800 227102054 227102054 NA