I am using InfluxDB 0.9, graphing with Grafana, mysql plugin installed and working.
I need a few key metrics for our system:
innodb_buffer_read_hit_ratio = (1 - innodb_buffer_pool_reads / innodb_buffer_pool_read_requests) * 100
innodb_buffer_usage = (1 - innodb_buffer_pool_pages_free / innodb_buffer_pool_pages_total) * 100
After reading through the docs, I find that an inner join is no longer possible in InfluxDB 0.9. What are my options? Change to another time-series DB? Install 0.8?
Functions and mathematical operators can only be applied to field values in the same measurement. If innodb_buffer_pool_reads and innodb_buffer_pool_read_requests are fields in the same measurement, that query will work (although only on a recent nightly build or the eventual 0.10.0 release, due to https://github.com/influxdb/influxdb/issues/4046).
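For illustration, assuming both counters are written as fields of a single measurement (a measurement named mysql in a database named telegraf below; both names are assumptions about your collector), the ratio can be computed in the query itself. A minimal sketch with the influxdb Python client:

# Minimal sketch; measurement and database names are assumptions about the collector setup.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="telegraf")

# Field arithmetic in SELECT needs a recent nightly / 0.10.0, as noted above.
query = (
    "SELECT (1 - innodb_buffer_pool_reads / innodb_buffer_pool_read_requests) * 100 "
    "AS innodb_buffer_read_hit_ratio FROM mysql WHERE time > now() - 1h"
)
for point in client.query(query).get_points():
    print(point["time"], point["innodb_buffer_read_hit_ratio"])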
Related
I'm using Apache Hive.
I created a table in Hive (similar to an external table) and loaded data into it using the LOAD DATA LOCAL INPATH './Desktop/loc1/kv1.csv' OVERWRITE INTO TABLE adih; command.
While I am able to retrieve simple data from the Hive table adih (e.g. select * from adih, select c_code from adih limit 1000, etc.), Hive gives me errors when I ask for data involving slight computations (e.g. select count(*) from adih, select distinct(c_code) from adih).
The Hive CLI output is as follows:
hive> select distinct add_user from adih;
Query ID = latize_20161031155801_8922630f-0455-426b-aa3a-6507aa0014c6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1477889812097_0006, Tracking URL = http://latize-data1:20005/proxy/application_1477889812097_0006/
Kill Command = /opt/hadoop-2.7.1/bin/hadoop job -kill job_1477889812097_0006
[6]+ Stopped $HIVE_HOME/bin/hive
Hive stops displaying any further logs / actions beyond the last line of "Kill Command"
Not sure where I have gone wrong; many answers on Stack Overflow tend to point back to YARN configs (my environment config is detailed below).
I have the full log as well, but it contains more than 30,000 characters (over the Stack Overflow limit).
My hadoop environment is configured as follows -
1 Name Node & 1 Data Node. Each has 20 GB of RAM with sufficient ROM. Have allocated 13 GB of RAM for the yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb each with the mapreduce.map.memory.mb being set as 4 GB and the mapreduce.reduce.memory.mb being set as 12 GB. Number of reducers is currently set to default (-1). Also, Hive is configured to run with a MySQL DB (rather than Derby).
You should set appropriate values for the properties shown in your trace, e.g. edit these properties in hive-site.xml:
<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>67108864</value>
</property>
Looks like you have set mapred.reduce.tasks = -1, which makes Hive refer to its config to decide the number of reduce tasks.
You are getting an error because the number of reducers is missing from the Hive config.
Try setting it using the command below:
hive> SET mapreduce.job.reduces=XX;
As per official documentation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (< no. of nodes > * < no. of maximum containers per node >).
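As a quick sanity check of that rule of thumb (the node and container counts below are purely illustrative, not taken from the question):

# Back-of-the-envelope reducer count from the rule of thumb above.
nodes = 2                      # illustrative
max_containers_per_node = 3    # depends on yarn.nodemanager.resource.memory-mb vs. container size

low  = 0.95 * nodes * max_containers_per_node   # one wave, all reducers busy
high = 1.75 * nodes * max_containers_per_node   # more, smaller reducers for better load balancing
print(f"target number of reducers: roughly {low:.0f} to {high:.0f}")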
I managed to get Hive and MR to work - increased the memory configurations for all the processes involved:
Increased the RAM allocated to YARN Scheduler and maximum RAM allocated to the YARN Nodemanager (in yarn-site.xml), alongside increasing the RAM allocated to the Mapper and Reducer (in mapred-site.xml).
Also incorporated parts of the answers by #Sathiyan S and #vmorusu - set the hive.exec.reducers.bytes.per.reducer property to 1 GB of data, which directly affects the number of reducers that Hive uses (through application of its heuristic techniques).
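For reference, the estimate Hive makes when mapred.reduce.tasks is -1 boils down to roughly the following (a simplified sketch of the heuristic, not Hive's actual code):

import math

def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    # Rough sketch: input size divided by hive.exec.reducers.bytes.per.reducer,
    # capped at hive.exec.reducers.max.
    return min(max_reducers, max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

# With bytes.per.reducer at 1 GB, a 10 GB input would be planned with about 10 reducers.
print(estimate_reducers(10 * 1024**3, 1 * 1024**3, 999))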
I'm using dygraphs to display weather data collected every ten minutes. One of the data points is snow depth (in meters). Once in a while the depth is wrong: 0 meters, where the previous and next readings are 0.9 meters. It's winter at the moment, and I've been on location to verify that 0.9 is correct.
With 47 data points at 0.9 m and one at 0 m, the standard deviation is approx. 0.13 (using ministat on FreeBSD).
I've looked through the dygraphs documentation but can't find a way to ignore values like 0 when the standard deviation is above a certain threshold.
This page in the dygraphs documentation has three examples of how to deal with standard deviation, but I just want to ignore the 0, not use it with the errorBars or customBars option, and the data is not in fractions.
The option rollPeriod is not applicable since it merely averages x data points.
I fetch the weather data in XML format from a third party every ten minutes in a cron job, parse the values, and store them in PostgreSQL. Another cron job then selects the last 48 data points and redirects the data to a CSV file, which dygraphs consumes.
Can I have dygraphs ignore 0 if standard deviation is above a threshold?
I can get the standard deviation with ministat or another utility in the last cron job and remove the 0 from the CSV file using sed/awk, but I would prefer dygraphs to do that.
var g5 = new Dygraph(
  document.getElementById("snow_depth"),
  "snow.csv",
  {
    legend: 'always',
    title: 'Snødjupne',
    labels: ["", "djupne"],
    ylabel: 'Meter'
  }
);
First three lines of the CSV file:
2016-02-23 01:50:00+00,0.91
2016-02-23 02:00:00+00,0
2016-02-23 02:10:00+00,0.9
You should clean your data before you chart it! dygraphs is a charting tool, not a data processing library.
The best approach is to load your data via AJAX, parse it, filter out the zeros, and feed it into dygraphs using its native format.
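If you prefer to keep the existing cron/CSV pipeline instead, another option is to drop the stray zero readings while writing snow.csv. A minimal Python sketch (the raw-export file name and the 0.1 threshold are assumptions; the two-column layout is taken from the question):

import csv
import statistics

# Read the raw export (timestamp, depth) produced by the cron job.
with open("snow_raw.csv", newline="") as f:
    rows = [(ts, float(depth)) for ts, depth in csv.reader(f)]

depths = [d for _, d in rows]

# Only filter when the spread is suspiciously large, e.g. a stray 0 among ~0.9 m readings.
# The 0.1 threshold is an arbitrary choice.
if statistics.pstdev(depths) > 0.1:
    rows = [(ts, d) for ts, d in rows if d != 0]

with open("snow.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)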
Is there a MySQL variable or monitoring counter that tells how many writes per second are being recorded?
Can I use some variable values and compute the same result?
Let's say I need to plot a graph of this dynamically. What should I be doing?
I'm looking for command-line options, not GUI-based monitoring tools.
I have a mixed TokuDB and InnoDB use case, so something that is not storage-engine specific would be better.
( Com_insert + Com_delete + Com_delete_multi +
Com_replace + Com_update + Com_update_multi ) / Uptime
gives you "writes/sec" since startup. This is from the point of view of the user issuing queries (such as INSERT).
Or did you want "rows written / sec"?
Or "disk writes / sec"?
The values for the above expression come from either SHOW GLOBAL STATUS or the equivalent place in information_schema.
If you want "writes in the last 10 minutes", capture the counters 10 minutes ago and now, then subtract to get the change and finally divide by the elapsed seconds.
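A minimal sketch of that two-sample approach in Python (the connection parameters are placeholders, and PyMySQL is just one of several client libraries you could use):

import time
import pymysql

WRITE_COUNTERS = ("Com_insert", "Com_delete", "Com_delete_multi",
                  "Com_replace", "Com_update", "Com_update_multi")

def sample(conn):
    # Sum the write-related Com_* counters from SHOW GLOBAL STATUS.
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'Com_%'")
        status = dict(cur.fetchall())
    return sum(int(status[name]) for name in WRITE_COUNTERS)

conn = pymysql.connect(host="localhost", user="monitor", password="secret")
interval = 10  # seconds between samples
before = sample(conn)
time.sleep(interval)
after = sample(conn)
print(f"writes/sec over the last {interval}s: {(after - before) / interval:.2f}")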
There are several GUIs that will do that arithmetic and much more. Consider MonYog ($$), MySQL Enterprise Monitor ($$$), cacti, etc.
I am new to Cassandra and have just set up a Cassandra cluster (version 1.2.8) with 5 nodes, and I have created several keyspaces and tables on it. However, I found that all data is stored on one node (in the output below, I have replaced IP addresses with node numbers manually):
Datacenter: 105
==========
Address  Rack  Status  State   Load       Owns      Token
                                                    4
node-1   155   Up      Normal  249.89 KB  100.00%   0
node-2   155   Up      Normal  265.39 KB  0.00%     1
node-3   155   Up      Normal  262.31 KB  0.00%     2
node-4   155   Up      Normal  98.35 KB   0.00%     3
node-5   155   Up      Normal  113.58 KB  0.00%     4
In their cassandra.yaml files, I use all default settings except cluster_name, initial_token, endpoint_snitch, listen_address, rpc_address, seeds, and internode_compression. Below I list the non-IP-address fields I modified:
endpoint_snitch: RackInferringSnitch
rpc_address: 0.0.0.0
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "node-1, node-2"
internode_compression: none
All nodes use the same seeds.
Could you tell me where I might have gone wrong in the config? Please feel free to let me know if any additional information is needed to figure out the problem.
Thank you!
If you are starting with Cassandra 1.2.8, you should try using the vnodes feature. Instead of setting initial_token, uncomment the # num_tokens: 256 line in cassandra.yaml and leave initial_token blank, or comment it out. Then you don't have to calculate token positions. Each node will randomly assign itself 256 tokens, and your cluster will be mostly balanced (within a few %). Using vnodes also means that you don't have to "rebalance" your cluster every time you add or remove nodes.
See this blog post for a full description of vnodes and how they work:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
Your token assignment is the problem here. The assigned token determines the node's position in the ring and the range of data it stores. When you generate tokens, the aim is to spread them over the entire range from 0 to (2^127 - 1). Tokens aren't IDs like in MySQL Cluster, where you have to increment them sequentially.
There is a tool on GitHub that can help you calculate tokens based on the size of your cluster.
Read this article to gain a deeper understanding of tokens, and if you want to understand the meaning of the numbers that are generated, check this article out.
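For completeness, the arithmetic behind those token generators is simple: divide the partitioner's range evenly among the nodes. A sketch for a 5-node cluster using the 0 to 2^127 - 1 range mentioned above (this is the RandomPartitioner range; Murmur3Partitioner, the 1.2 default, uses a different range, so the formula would change):

# Evenly spaced initial_token values for the RandomPartitioner range [0, 2**127).
def tokens(node_count, ring_size=2**127):
    return [i * ring_size // node_count for i in range(node_count)]

for node, token in enumerate(tokens(5), start=1):
    print(f"node-{node}: initial_token = {token}")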
You should provide a replication_factor when creating a keyspace:
CREATE KEYSPACE demodb
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 3};
If you use DESCRIBE KEYSPACE x in cqlsh, you'll see which replication_factor is currently set for your keyspace (I assume the answer is 1).
More details here
I have the following problem: I want to measure the gst_efficiency and gld_efficiency for my CUDA application using nvprof. The documentation distributed with CUDA 5.0 tells me to compute them using the following formulas for devices with compute capability 2.0-3.0:
gld_efficiency = 100 * gld_requested_throughput / gld_throughput
gst_efficiency = 100 * gst_requested_throughput / gst_throughput
For the required metrics the following formulas are given:
gld_throughput = ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gst_throughput = ((l2_subp0_write_requests + l2_subp1_write_requests) * 32 - (l1_local_ld_miss * 128)) / gputime
gld_requested_throughput = (gld_inst_8bit + 2 * gld_inst_16bit + 4 * gld_inst_32bit + 8 * gld_inst_64bit + 16 * gld_inst_128bit) / gputime
gst_requested_throughput = (gst_inst_8bit + 2 * gst_inst_16bit + 4 * gst_inst_32bit + 8 * gst_inst_64bit + 16 * gst_inst_128bit) / gputime
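To make the bookkeeping concrete, this is how the pieces combine once the raw event counts have been collected (a sketch with made-up placeholder counts, not real measurements):

# Placeholder event counts; in practice these come from nvprof runs.
ev = {
    "global_load_hit": 1000, "l1_local_ld_miss": 100,
    "l2_subp0_read_requests": 800, "l2_subp1_read_requests": 800,
    "gld_inst_8bit": 0, "gld_inst_16bit": 0, "gld_inst_32bit": 30000,
    "gld_inst_64bit": 0, "gld_inst_128bit": 0,
}
gputime = 1.0  # same unit in both throughputs, so it cancels out of the ratio

gld_throughput = (128 * ev["global_load_hit"]
                  + (ev["l2_subp0_read_requests"] + ev["l2_subp1_read_requests"]) * 32
                  - ev["l1_local_ld_miss"] * 128) / gputime
gld_requested_throughput = (ev["gld_inst_8bit"] + 2 * ev["gld_inst_16bit"]
                            + 4 * ev["gld_inst_32bit"] + 8 * ev["gld_inst_64bit"]
                            + 16 * ev["gld_inst_128bit"]) / gputime

gld_efficiency = 100 * gld_requested_throughput / gld_throughput
print(f"gld_efficiency = {gld_efficiency:.1f}%")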
Since no formula is given for the quantities used, I assume these are events that can be counted by nvprof. But some of the events do not seem to be available on my GTX 460 (I also tried a GTX 560 Ti). I pasted the output of nvprof --query-events.
Any ideas what's going wrong or what I'm misinterpreting?
EDIT:
I don't want to use the CUDA Visual Profiler, since I'm trying to analyse my application for different parameters. I therefore want to run nvprof with multiple parameter configurations, recording multiple events (each one in its own run) and then output the data in tables. I have this automated already and working for other metrics (e.g. instructions issued) and want to do the same for load and store efficiency. This is why I'm not interested in solutions involving nvvp. By the way, for my application nvvp fails to calculate the metrics required for store efficiency, so it doesn't help me at all in this case.
I'm glad somebody had the same issue :) I was trying to do the very same thing and couldn't use the Visual Profiler, because I wanted to profile like 6000 different kernels.
The formulas on the NVIDIA site are poorly documented - actually the variables can be:
a) events
b) other metrics
c) different variables dependent on the GPU you have
However, a LOT of the metrics there either have typos or are phrased a bit differently in nvprof than they are on the site. Also, the variables are not tagged, so you can't tell just by looking whether they are a), b) or c). I used a script to grep them and then had to fix them by hand. Here is what I found (summarized as a lookup table after the list):
1) "l1_local/global_ld/st_hit/miss"
These have "load"/"store" in nvprof instead of "ld"/"st" on site.
2) "l2_ ...whatever... _requests"
These have "sector_queries" in nvprof instead of "requests".
3) "local_load/store_hit/miss"
These have "l1_" in additionally in the profiler - "l1_local/global_load/store_hit/miss"
4) "tex0_cache_misses"
This one has "sector" in it in the profiler - "tex0_cache_sector_misses"
5) "tex_cache_sector_queries"
Missing "0" - so "tex0_cache_sector_queries" in the nvprof.
Finally, the variables:
1) "#SM"
The number of streaming multiprocessors. Get via cudaDeviceProp.
2) "gputime"
Obviously, the execution time on GPU.
3) "warp_size"
The size of warp on your GPU, again get via cudaDeviceProp.
4) "max_warps_per_sm"
Number of blocks executable on an SM * #SM * warps per block. I guess.
5) "elapsed_cycles"
Found this:
https://devtalk.nvidia.com/default/topic/518827/computeprof-34-active-cycles-34-counter-34-active-cycles-34-value-doesn-39-t-make-sense-to-/
But I'm still not entirely sure I get it.
Hopefully this helps you and some other people who encounter the same problem :)