InfluxDB: CSV import problem, missing values

This is my first post on Stack Overflow, and I hope I'm doing it properly.
I'm new to InfluxDB and Telegraf, but as part of a project I want to import metrics into InfluxDB. The metrics come from network equipment and are exported by a network manager as .csv files.
I receive the files organized per IP address and per day:
#IP/
  2020-05-YY/
    CSV1
    CSV2
  2020-05-ZZ/
  ...
The structure of the CSV file is as follows:
TimeStamp, NetworkElement_Name, Typeofmeasurement, then columns that depend on the measurement (in the following example, the memory usage of each card).
Here is an example :
TimeStamp,NetworkElement_Name,Typeofmeasurement,Object,memAbsoluteUsage
2020-05-05T20:00:00+02:00,router1,CPU/Memory Usage,card1,1075
2020-05-05T20:00:00+02:00,router1,CPU/Memory Usage,card2,832
Each file exists twice for the same timestamp, but with a different "NetworkElement_Name", "Object" and value.
On the Telegraf side, I created one ".conf" file per imported CSV, as follows:
[[inputs.file]]
files = ["/metric/data/data/clean_data/**/**/router_cpuMemUsage.csv"]
data_format = "csv"
csv_header_row_count = 1
csv_skip_rows = 0
csv_skip_columns = 0
csv_delimiter = ","
csv_column_types = ["string","string","string","string","int"]
csv_measurement_column = "Typeofmeasurement"
csv_timestamp_column = "TimeStamp"
csv_timestamp_format = "2006-01-02T15:04:05-07:00"
[[outputs.influxdb]]
database = "router Metrics"
The data seems to be imported... but I realized that some values are missing.
I have difficulty understanding and explaining the problem, but I can't get all the values recorded at a given time.
The query returns:
> SELECT * FROM "CPU/Memory Usage" WHERE "NE Name" =~ /router1/ ORDER BY DESC LIMIT 5
name: CPU/Memory Usage
time                 NE Name Object ID Object Type      Time Stamp                memUsage
----                 ------- --------- -----------      ----------                --------
2020-05-07T06:45:00Z router1 card1     CPU/Memory Usage 2020-05-07T08:45:00+02:00 1075
2020-05-07T06:30:00Z router1 card1     CPU/Memory Usage 2020-05-07T08:30:00+02:00 1075
2020-05-07T06:15:00Z router1 card1     CPU/Memory Usage 2020-05-07T08:15:00+02:00 1075
2020-05-07T06:00:00Z router1 card1     CPU/Memory Usage 2020-05-07T08:00:00+02:00 1075
2020-05-07T05:45:00Z router1 card1     CPU/Memory Usage 2020-05-07T07:45:00+02:00 1075
I only get the information for card1, not card2; and if I remove the WHERE clause, for the same timestamp I still don't have all the information: the data for router2 is missing.
The values present for one router at a given timestamp are not present for the other.
I have trouble understanding where the problem comes from.
If one of you has an idea :)
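One likely explanation (my reading, not stated in the question): InfluxDB treats points with the same measurement, tag set and timestamp as a single point, so later rows silently overwrite earlier ones. In the configuration above no CSV column is declared as a tag, so the card1/card2 and router1/router2 rows that share a timestamp collapse into one point. A sketch of a possible fix, using Telegraf's csv_tag_columns parser option with the column names from the example:

```toml
[[inputs.file]]
  files = ["/metric/data/data/clean_data/**/**/router_cpuMemUsage.csv"]
  data_format = "csv"
  csv_header_row_count = 1
  csv_delimiter = ","
  csv_column_types = ["string","string","string","string","int"]
  csv_measurement_column = "Typeofmeasurement"
  # Declare the distinguishing columns as tags so that rows sharing a
  # timestamp become distinct series instead of overwriting each other.
  csv_tag_columns = ["NetworkElement_Name","Object"]
  csv_timestamp_column = "TimeStamp"
  csv_timestamp_format = "2006-01-02T15:04:05-07:00"
```

This is only a sketch against the file layout shown above; whether these are the right tag columns depends on which column combinations must stay unique per timestamp.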


Understanding the output of a pyomo model, number of solutions is zero?

I am using this code to create and solve a simple problem:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression

model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
                                linear_coefs=model.coefs,
                                linear_vars=[model.x[i] for i in model.N])

def caprule(m):
    return m.linexp <= 50

model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr=model.linexp, sense=maximize)
results = SolverFactory('glpk', executable='/usr/bin/glpsol').solve(model)
results.write()
And this is the output:
# ==========================================================
# = Solver Results                                         =
# ==========================================================
# ----------------------------------------------------------
#   Problem Information
# ----------------------------------------------------------
Problem:
- Name: unknown
  Lower bound: 50.0
  Upper bound: 50.0
  Number of objectives: 1
  Number of constraints: 2
  Number of variables: 5
  Number of nonzeros: 5
  Sense: maximize
# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
  Termination condition: optimal
  Statistics:
    Branch and bound:
      Number of bounded subproblems: 0
      Number of created subproblems: 0
  Error rc: 0
  Time: 0.09727835655212402
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
  number of solutions displayed: 0
It says the number of solutions is 0, and yet it does solve the problem:
print(list(model.x[i]() for i in model.N))
Will output this:
[1.0, 1.0, 1.0, 1.0]
Which is a correct answer to the problem. What am I missing?
The interface between Pyomo and GLPK sometimes (always?) seems to return 0 for the number of solutions. I'm assuming there is some issue in the generalized interface between the Pyomo core module and the various solvers it talks to. When I use the glpk and cbc solvers on this, both report the number of solutions as zero, so perhaps those solvers simply don't fill in that data element in the generalized interface; somebody with more experience of the data blob returned from the solver may know precisely. That said, the main thing to look at is the termination condition, which I've found to be reliably accurate. It reports optimal.
I suspect that you have some mixed code from another model in your example. When I fix a typo or two (you missed the pyo prefix on a few things), it solves fine and gives the correct objective value as 9. I'm not sure where 50 came from in your output.
(slightly cleaned up) Code:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression

model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
                                linear_coefs=model.coefs,
                                linear_vars=[model.x[i] for i in model.N])

def caprule(m):
    return m.linexp <= 50

model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr=model.linexp, sense=pyo.maximize)

solver = pyo.SolverFactory('glpk')  # , executable='/usr/bin/glpsol'
results = solver.solve(model)
print(results)
model.obj.pprint()
model.obj.display()
Output:
Problem:
- Name: unknown
  Lower bound: 9.0
  Upper bound: 9.0
  Number of objectives: 1
  Number of constraints: 2
  Number of variables: 5
  Number of nonzeros: 5
  Sense: maximize
Solver:
- Status: ok
  Termination condition: optimal
  Statistics:
    Branch and bound:
      Number of bounded subproblems: 0
      Number of created subproblems: 0
  Error rc: 0
  Time: 0.00797891616821289
Solution:
- number of solutions: 0
  number of solutions displayed: 0

obj : Size=1, Index=None, Active=True
    Key  : Active : Sense    : Expression
    None :   True : maximize : x[1] + x[2] + 3*x[3] + 4*x[4]
obj : Size=1, Index=None, Active=True
    Key  : Active : Value
    None :   True :   9.0
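Since the model is tiny (four binary variables), the reported objective of 9 can be sanity-checked outside Pyomo by brute force. This is just an illustration of the model's arithmetic, not part of the original answer:

```python
from itertools import product

# Objective and constraint share the same linear expression:
# x1 + x2 + 3*x3 + 4*x4, with each x_i binary and the expression <= 50.
coefs = [1, 1, 3, 4]

best = None
for xs in product([0, 1], repeat=len(coefs)):
    value = sum(c * x for c, x in zip(coefs, xs))
    if value <= 50 and (best is None or value > best):
        best = value

print(best)  # 9: the constraint never binds, so every x_i is set to 1
```

All 16 assignments satisfy the constraint, so the maximum is simply the sum of the coefficients, matching the solver's bound of 9.0.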

odoo 9 migrate binary field db to filestore

In an Odoo 9 custom module, the attachment=True parameter was added to a binary field only later, so from that point on new records are stored in the filestore.
For old records, attachment=True was not in effect, so no entry was created in the ir.attachment table and nothing was saved to the filesystem.
I would like to know how to migrate the binary field values of the old records into the filestore. How can I create/insert the corresponding ir_attachment rows based on the old records' binary field values? Is there any script available?
You have to include the Postgres bin path in pg_path in your configuration file. This will restore the filestore that contains the binary fields:
pg_path = D:\fx\upsynth_Postgres\bin
I'm sure that you no longer need a solution to this as you asked 18 months ago, but I have just had the same issue (many gigabytes of binary data in the database) and this question came up on Google so I thought I would share my solution.
When you set attachment=True the binary column will remain in the database, but the system will look in the filestore instead for the data. This left me unable to access the data from the Odoo API so I needed to retrieve the binary data from the database directly, then re-write the binary data to the record using Odoo and then finally drop the column and vacuum the table.
Here is my script, which is inspired by this solution for migrating attachments, but this solution will work for any field in any model and reads the binary data from the database rather than from the Odoo API.
import xmlrpclib
import psycopg2

username = 'your_odoo_username'
pwd = 'your_odoo_password'
url = 'http://ip-address:8069'
dbname = 'database-name'
model = 'model.name'
field = 'field_name'
dbuser = 'postgres_user'
dbpwd = 'postgres_password'
dbhost = 'postgres_host'

conn = psycopg2.connect(database=dbname, user=dbuser, password=dbpwd, host=dbhost, port='5432')
cr = conn.cursor()

# Get the uid
sock_common = xmlrpclib.ServerProxy('%s/xmlrpc/common' % url)
uid = sock_common.login(dbname, username, pwd)
sock = xmlrpclib.ServerProxy('%s/xmlrpc/object' % url)

def migrate_attachment(res_id):
    # 1. get the binary data straight from the database
    cr.execute("SELECT %s from %s where id=%s" % (field, model.replace('.', '_'), res_id))
    data = cr.fetchall()[0][0]
    # 2. re-write the attachment through the Odoo API
    if data:
        sock.execute(dbname, uid, pwd, model, 'write', [res_id], {field: str(data)})
        return True
    else:
        return False

# Select the records and migrate them one by one:
records = sock.execute(dbname, uid, pwd, model, 'search', [])
cnt = len(records)
print cnt
i = 0
for res_id in records:
    status = migrate_attachment(res_id)
    print 'Migrated ID %s (attachment %s of %s) [Contained data: %s]' % (res_id, i, cnt, status)
    i += 1

cr.close()
print "done ..."
Afterwards, drop the column and vacuum the table in psql.

Saving timestamp on IBrokers package

I am having some issues accessing the timestamp data in the IBrokers package.
Here is an example of the data I get:
                    AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.WAP AAPL.hasGaps AAPL.Count
2015-01-09 17:59:00       112    112.04   111.95        112        6043  112.011            0       2240
So when I run data[,0] I get
2015-01-09 17:59:00
The problem is that later on when I try to save that into a MySQL table I get the following error:
Error in dimnames(cd) <- list(as.character(index(x)), colnames(x)) :
'dimnames' applied to non-array
It looks like data[,0] does not simply contain the timestamp.
When I do a summary of the variable ts which contains data[,0] I get:
Error in `colnames<-`(`*tmp*`, value = c("ts.1", "ts.0")) :
'names' attribute [2] must be the same length as the vector [1]
Any tip on how to access the timestamp or convert the contents of ts to char so I can insert it into the DB will be appreciated.
EDIT:
dput() output
structure(c(112, 112.04, 111.95, 112, 6043, 112.011, 0, 2240), .Dim = c(1L,
8L), index = structure(1420837140, tzone = "", tclass = c("POSIXct",
"POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "", .Dimnames = list(NULL,
c("AAPL.Open", "AAPL.High", "AAPL.Low", "AAPL.Close", "AAPL.Volume",
"AAPL.WAP", "AAPL.hasGaps", "AAPL.Count")), class = c("xts",
"zoo"), from = "20150112 02:52:24", to = "20150112 02:53:24", src = "IB", updated = structure(33434342.12435, class = c("POSIXct",
"POSIXt")))
As suggested by @JoshuaUlrich in the comments to my question, the answer was in the zoo package.
Full documentation on zoo can be found here.
In my particular case, by including the zoo library and simply doing:
time(data[,0])
I solved the "Error in dimnames()" error.
Hope it helps someone else.

How to convert data from a custom format to CSV?

I have a file whose content is as below. I have only included a few records here, but there are around 1000 records in a single file:
Record type : GR
address : 62.5.196
ID : 1926089329
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
inID : 101
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:51:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 61.5.196
ID : 1926089327
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 63.5.196
ID : 1926089328
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
My goal is to convert this to a CSV or txt file like below:
Record type| address |ID | time | Time zone| address [1] | PN ID
GR |61.5.196 |1926089329 |Sun Aug 10 09:53:47 2014 |+ 16200 seconds |61.5.196 |412 1
Any guidance on the best way to start this would be great. The sample I provided should give a clear idea, but in words: I want to read the header of each record once and put the records' data under the output header.
Thanks for your time, and for any help or suggestions.
What you're doing is creating an Extract/Transform script (the ET part of an ETL). I don't know which language you're intending to use, but essentially any language can be used. Personally, unless this is a massive file, I'd recommend Python as it's easy to grok and easy to write with the included csv module.
First, you need to understand the format thoroughly.
How are records separated?
How are fields separated?
Are there any fields that are optional?
If so, are the optional fields important, or do they need to be discarded?
Unfortunately, this is all headwork: there's no magical code solution to make this easier. Then, once you have figured out the format, you'll want to start writing code. This is essentially a series of data transformations:
Read the file.
Split it into records.
For each record, transform the fields into an appropriate data structure.
Serialize the data structure into the CSV.
If your file is larger than memory, this can become more complicated; instead of reading and then splitting, for example, you may want to read the file sequentially and create a Record object each time the record delimiter is detected. If your file is even larger, you might want to use a language with better multithreading capabilities to handle the transformation in parallel; but those are more advanced than it sounds like you need to go at the moment.
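The four steps above can be sketched in Python with the included csv module, as this answer suggests. This is only an illustrative sketch against the sample data in the question: the field list, the dashed-line record separator, and the "key : value" line format are my reading of that sample, and a real file may need adjustments.

```python
import csv
import io
import re

# Trimmed sample in the question's format; extra lines (Container
# markers, timerecorded, ...) are present to show they are ignored.
SAMPLE = """Record type : GR
address : 62.5.196
ID : 1926089329
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
inID : 101
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:51:47 2014
--------------------------------------------------------------------
Record type : GR
address : 61.5.196
ID : 1926089327
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
"""

# Columns wanted in the output, in order (from the question).
FIELDS = ["Record type", "address", "ID", "time", "Time zone",
          "address [1]", "PN ID"]

def parse_records(text):
    """Split on full-line dash separators, then parse 'key : value' lines."""
    records = []
    for chunk in re.split(r"^-{10,}\s*$", text, flags=re.M):
        record = {}
        for line in chunk.splitlines():
            if ":" not in line:
                continue  # skip Container markers and blank lines
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records

def to_csv(records):
    """Serialize records to a pipe-delimited CSV, dropping extra keys."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="|",
                            extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()

print(to_csv(parse_records(SAMPLE)))
```

Note that splitting on the first ":" keeps values such as "09:53:47" intact, and extrasaction="ignore" silently drops fields like inID that are not in the chosen header.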
This is a simple PHP script that will read a text file containing your data and write a csv file with the results. If you are on a system which has command line PHP installed, just save it to a file in some directory, copy your data file next to it renaming it to "your_data_file.txt" and call "php whatever_you_named_the_script.php" on the command line from that directory.
<?php
$text = file_get_contents("your_data_file.txt");
$matches;
preg_match_all("/Record type[\s\v]*:[\s\v]*(.+?)address[\s\v]*:[\s\v]*(.+?)ID[\s\v]*:[\s\v]*(.+?)time[\s\v]*:[\s\v]*(.+?)Time zone[\s\v]*:[\s\v]*(.+?)address \[1\][\s\v]*:[\s\v]*(.+?)PN ID[\s\v]*:[\s\v]*(.+?)/su", $text, $matches, PREG_SET_ORDER);
$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    if (fputcsv($csv_file, array("Record type","address","ID","time","Time zone","address [1]","PN ID"), "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    foreach ($matches as $match) {
        $clean_values = array();
        for ($i = 1; $i < 8; $i++) {
            $clean_values[] = trim($match[$i]);
        }
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
This script assumes that your data records are always formatted similarly to the examples you have posted and that all values are always present. If the data file may have exceptions to those rules, the script probably has to be adapted accordingly. But it should give you an idea of how this can be done.
Update
Adapted the script to deal with the full format provided in the updated question. The regular expression now matches single data lines (extracting their values) as well as the record separator made up of dashes. The loop has changed a bit and now fills a buffer array field by field until a record separator is encountered.
<?php
$text = file_get_contents("your_data_file.txt");
// this will match whole lines
// only if they either start with an alpha-num character
// or are completely made of dashes (record separator)
// it also extracts the values of data lines one by one
$regExp = '/(^\s*[a-zA-Z0-9][^:]*:(.*)$|^-+$)/m';
$matches;
preg_match_all($regExp, $text, $matches, PREG_SET_ORDER);
$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    // in case the number or order of fields changes, adapt this array as well
    $column_headers = array(
        "Record type",
        "address",
        "ID",
        "time",
        "Time zone",
        "address [1]",
        "PN ID",
        "inID",
        "timerecorded",
        "Uplink data volume",
        "Downlink data volume",
        "Change condition"
    );
    if (fputcsv($csv_file, $column_headers, "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    $clean_values = array();
    foreach ($matches as $match) {
        // first entry will contain the whole line
        // remove surrounding whitespace
        $whole_line = trim($match[0]);
        if (strpos($whole_line, '-') !== 0) {
            // this match starts with something else than -
            // so it must be a data field, store the extracted value
            $clean_values[] = trim($match[2]);
        } else {
            // this match is a record separator, write csv line and reset buffer
            if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
                echo "could not write data to csv file\n";
            }
            $clean_values = array();
        }
    }
    if (!empty($clean_values)) {
        // there was no record separator at the end of the file
        // write the last entry that is still in the buffer
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
Doing the data extraction using regular expressions is one possible method mostly useful for simple data formats with a clear structure and no surprises. As syrion pointed out in his answer, things can get much more complicated. In that case you might need to write a more sophisticated script than this one.

RMySQL dbWriteTable adding columns to table (dynamically?)

I just started using the R package called RMySQL in order to get around some memory limitations on my computer. I am trying to take a matrix with 100 columns in R (called data.df), then make a new table on an SQL database that has "100 choose 2" (=4950) columns, where each column is a linear combination of two columns from the initial matrix. So far I have something like this:
countnumber <- 1
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "myDB")
temp <- as.data.frame(data.df[, 1] - data.df[, 2])
colnames(temp) <- paste(pairs[[countnumber]][1], pairs[[countnumber]][2], sep = "")
dbWriteTable(con, "spreadtable", temp, row.names = T, overwrite = T)

for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    if (!((i == 1) && (j == 2))) { # this part excludes the first iteration already taken care of
      temp <- as.data.frame(data.df[, i] - data.df[, j])
      colnames(temp) <- "hola"
      dbWriteTable(con, "spreadtable", value = temp, append = TRUE, overwrite = FALSE, row.names = FALSE)
      countnumber <- countnumber + 1
    }
  }
}
I've also tried toying around with the "field.types" argument of RMySQL::dbWriteTable(), which was suggested at RMySQL dbWriteTable with field.types. Sadly it hasn't helped me out too much.
Questions:
Is making your own sql database a valid solution to the memory-bound nature of R, even if it has 4950 columns?
Is the dbWriteTable() the proper function to be using here?
Assuming the answer is "yes" to both of the previous questions...why isn't this working?
Thanks for any help.
[EDIT]: code with error output:
names <- as.data.frame(index)
names <- t(names)
#dim(names) is 1 409
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "taylordatabase")
dbGetQuery(con, dbBuildTableDefinition(MySQL(), name="spreadtable", obj=names, row.names = F))
#I would prefer these to be double types with 8 decimal spaces instead of text
#dim(temp) is 1 409
temp <- as.data.frame(data.df[,1] - (ratios[countnumber]*data.df[,2]))
temp <- t(temp)
temp <- as.data.frame(temp)
dbWriteTable(con, name = "spreadtable", temp, append = T)
The table is created successfully in the database (I will change variable type later), but the dbWriteTable() line produces the error:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'row_names' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
If I make a slight change, I get a different error message:
dbWriteTable(con, name = "spreadtable", temp, append = T, row.names = F)
and
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'X2011_01_03' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
I just want to use "names" as a set of column labels; they were initially dates. The actual data should be "temp".
Having a table with 4950 columns is OK in itself; the problem is which columns you actually need.
If you always "SELECT *", you will eventually exhaust your system memory with a table that wide.
Why not share the error message if you have encountered any problems?