I have some results from the tabulate command in Stata.
However, the numbers are displayed with too many digits, such as 0.4988995.
I want to change the number of digits in the output, for example 0.499 instead of 0.4988995.
Is there any way to reduce the number of digits displayed?
It is not necessary to generate new variables or use other commands such as tabdisp. The tabulate command respects a variable's format.
Consider the following toy example:
sysuse auto, clear
format mpg
variable name display format
-----------------------------
mpg %8.0g
-----------------------------
tabulate rep78 foreign, summarize(mpg) nofreq
Means and Standard Deviations of Mileage (mpg)
Repair |
Record | Car type
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 21 . | 21
| 4.2426407 . | 4.2426407
-----------+----------------------+----------
2 | 19.125 . | 19.125
| 3.7583241 . | 3.7583241
-----------+----------------------+----------
3 | 19 23.333333 | 19.433333
| 4.0856221 2.5166115 | 4.1413252
-----------+----------------------+----------
4 | 18.444444 24.888889 | 21.666667
| 4.5856055 2.7131368 | 4.9348699
-----------+----------------------+----------
5 | 32 26.333333 | 27.363636
| 2.8284271 9.367497 | 8.7323849
-----------+----------------------+----------
Total | 19.541667 25.285714 | 21.289855
| 4.7533116 6.3098562 | 5.8664085
Consequently, you just need to set the desired format beforehand:
format mpg %8.3g
tabulate rep78 foreign, summarize(mpg) nofreq
Means and Standard Deviations of Mileage (mpg)
Repair |
Record | Car type
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 21 . | 21
| 4.24 . | 4.24
-----------+----------------------+----------
2 | 19.1 . | 19.1
| 3.76 . | 3.76
-----------+----------------------+----------
3 | 19 23.3 | 19.4
| 4.09 2.52 | 4.14
-----------+----------------------+----------
4 | 18.4 24.9 | 21.7
| 4.59 2.71 | 4.93
-----------+----------------------+----------
5 | 32 26.3 | 27.4
| 2.83 9.37 | 8.73
-----------+----------------------+----------
Total | 19.5 25.3 | 21.3
| 4.75 6.31 | 5.87
There is not a single switch to do this, just various devices.
Here is one:
. sysuse auto, clear
(1978 Automobile Data)
. tabulate for rep78, summarize(mpg) nost nofreq
Means of Mileage (mpg)
| Repair Record 1978
Car type | 1 2 3 4 5 | Total
-----------+-------------------------------------------------------+----------
Domestic | 21 19.125 19 18.444444 32 | 19.541667
Foreign | . . 23.333333 24.888889 26.333333 | 25.285714
-----------+-------------------------------------------------------+----------
Total | 21 19.125 19.433333 21.666667 27.363636 | 21.289855
. egen mean = mean(mpg), by(for rep78)
. tabdisp for rep78, c(mean) format(%2.1f)
----------------------------------------------
| Repair Record 1978
Car type | 1 2 3 4 5 .
----------+-----------------------------------
Domestic | 21.0 19.1 19.0 18.4 32.0 23.3
Foreign | 23.3 24.9 26.3 14.0
----------------------------------------------
Note further that tabstat yields summarize-like results but with an option format().
I'm trying to get data that is on multiple rows into a single row by order of importance.
I was working with multiple tables and was able to pull all the data I need into one table, so I am now working with a single table where the data I need is spread over multiple rows. For example, a person can have more than one role. However, the roles have an order of importance, so I added an order-of-importance column to the file I'm working with.
The file I'm working with looks like this:
ID  | FIRST   | LAST  | ROLE   | ORDER OF IMPORTANCE
116 | Jamie   | Ansto | PARAL  | 5
116 | Jamie   | Ansto | FMREMP | 11
153 | Alan    | Rond  | PAR    | 3
153 | Alan    | Rond  | PARAL  | 5
155 | Maureen | Aron  | GP     | 4
155 | Maureen | Aron  | PARAL  | 5
38  | William | Dry   | STU    | 8
175 | Nathan  | Gong  | OTH    | 10
175 | Nathan  | Gong  | FMRSTU | 13
175 | Nathan  | Gong  | FR     | 14
308 | Bridget | Abad  | PAR    | 3
308 | Bridget | Abad  | EMP    | 7
370 | Matt    | Bodie | BD     | 1
370 | Matt    | Bodie | AL     | 2
What I need is a file that has all the codes associated with one person on the same row in the order of their importance.
I want to end up with something that looks like this:
ID  | FIRST   | LAST  | CODE1 | CODE2  | CODE3 | CODE4
116 | Jamie   | Ansto | PARAL | FMREMP |       |
153 | Alan    | Rond  | PAR   | PARAL  |       |
155 | Maureen | Aron  | GP    | PARAL  |       |
381 | William | Dry   | STU   |        |       |
175 | Nathan  | Gong  | OTH   | FMRSTU | FR    |
308 | Bridget | Abad  | PAR   | EMP    |       |
370 | Matt    | Bodie | BD    | AL     |       |
I tried using GROUP_CONCAT, but it didn't give me the results in the order I wanted. Any help would be appreciated.
Thanks,
MG
You can do something like this:
SELECT `ID`, `FIRST`, `LAST`,
GROUP_CONCAT(`ROLE` ORDER BY `ORDER_OF_IMPORTANCE` SEPARATOR ' ') AS `ROLES`
FROM `table1` GROUP BY `ID`, `FIRST`, `LAST`;
The SEPARATOR ' ' clause gives you a result like OTH FMRSTU FR. If you remove it and only use GROUP_CONCAT(ROLE ORDER BY ORDER_OF_IMPORTANCE), the default comma separator is used and the result looks like OTH,FMRSTU,FR instead.
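If separate CODE1, CODE2, ... columns are needed rather than one concatenated string, the reshaping can also be done outside the database. The following is a minimal Python sketch, not part of the answer above; the function name pivot_roles and the in-memory row layout are assumptions made purely for illustration.

```python
from collections import defaultdict

def pivot_roles(rows):
    """Turn (id, first, last, role, importance) rows into one row per
    person, with the roles listed in ascending order of importance."""
    people = {}                 # id -> (first, last), in first-seen order
    roles = defaultdict(list)   # id -> [(importance, role), ...]
    for pid, first, last, role, importance in rows:
        people.setdefault(pid, (first, last))
        roles[pid].append((importance, role))
    result = []
    for pid, (first, last) in people.items():
        ordered = [role for _, role in sorted(roles[pid])]
        result.append((pid, first, last, *ordered))
    return result

# Hypothetical input, deliberately out of order to show the sorting.
rows = [
    (116, "Jamie", "Ansto", "FMREMP", 11),
    (116, "Jamie", "Ansto", "PARAL", 5),
    (175, "Nathan", "Gong", "FR", 14),
    (175, "Nathan", "Gong", "OTH", 10),
    (175, "Nathan", "Gong", "FMRSTU", 13),
]
pivoted = pivot_roles(rows)
for row in pivoted:
    print(" | ".join(str(field) for field in row))
```

Each output tuple has a variable length, so a person with two roles simply has no CODE3 or CODE4 entry.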
I have the following query:
SELECT `Time`,
`Resolution`,
HOUR(TIMEDIFF(`Resolution`,`Time`)),
TIMEDIFF(`Resolution`,`Time`),
datediff(`Resolution`,`Time`)
FROM Cases;
To debug, I also select TIMEDIFF without HOUR, just to see whether the result differs, and I use DATEDIFF as a cross-check.
The result of the query is:
+---------------------+---------------------+-------------------------------------+-------------------------------+-------------------------------+
| Time | Resolution | HOUR(TIMEDIFF(`Resolution`,`Time`)) | TIMEDIFF(`Resolution`,`Time`) | datediff(`Resolution`,`Time`) |
+---------------------+---------------------+-------------------------------------+-------------------------------+-------------------------------+
| 2017-01-10 13:35:00 | 2017-01-24 10:52:00 | 333 | 333:17:00 | 14 |
| 2017-01-12 15:53:00 | 2017-02-21 16:06:00 | 838 | 838:59:59 | 40 |
| 2017-01-18 09:19:00 | 2017-01-18 13:39:00 | 4 | 04:20:00 | 0 |
| 2017-01-23 09:00:00 | 2017-01-23 15:08:00 | 6 | 06:08:00 | 0 |
| 2017-01-24 08:49:00 | 2017-02-20 14:34:00 | 653 | 653:45:00 | 27 |
The query actually returns more rows, but the relevant one is the second result: 838 hours, which translates to 34.92 days, say 35, whereas DATEDIFF gives 40. When you do the calculation yourself, it really is 40 days (12 January to 21 February).
All other 21 results are correct.
Any idea why? Is this a bug in MySQL?
All responses are highly appreciated.
Use
TIMESTAMPDIFF(HOUR,`Time`, `Resolution`)
instead.
It also negates the need to use HOUR().
https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_timestampdiff
The result returned by TIMEDIFF() is limited to the range allowed for TIME values. https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_timediff
TIME values may range from -838:59:59 to 838:59:59. https://dev.mysql.com/doc/refman/5.5/en/time.html
So you're getting the maximum possible value.
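The arithmetic behind the clamped value can be checked outside MySQL. Below is a small Python sketch that uses the two timestamps from the second result row; the clamping with min() only imitates the documented TIME range and is not how MySQL is implemented internally.

```python
from datetime import datetime, timedelta

# Upper bound of the MySQL TIME type, as quoted above.
TIME_MAX = timedelta(hours=838, minutes=59, seconds=59)

start = datetime(2017, 1, 12, 15, 53)   # `Time`
end = datetime(2017, 2, 21, 16, 6)      # `Resolution`

true_diff = end - start                 # the real elapsed time
clamped = min(true_diff, TIME_MAX)      # what a TIME value can hold

true_hours = int(true_diff.total_seconds() // 3600)
clamped_hours = int(clamped.total_seconds() // 3600)

print(true_diff.days, true_hours, clamped_hours)
```

The real difference is 40 days and 13 minutes, i.e. 960 whole hours, which is the value TIMESTAMPDIFF(HOUR, ...) is expected to report, while anything that goes through a TIME value stops at 838.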
My company ran a series of TV ads and we're measuring the impact by changes in our website traffic. I would like to determine the cost per session we saw generated, based on the cost of each ad.
The trouble is that the table this query references contains duplicate rows, so my current cost_per_session isn't calculated correctly.
What I have so far:
client_net_cleared = cost of ad
ad_time, media_outlet, & program = combined are a unique identifier for each ad
diff = assumed sessions generated by ad
SELECT DISTINCT tadm.timestamp AS ad_time
, tadm.media_outlet AS media_outlet
, tadm.program AS program
, tadm.client_net_cleared AS client_net_cleared
, SUM(tadm.before_ad_sum) AS before_ad_sessions
, SUM(tadm.after_ad_sum) AS after_ad_sessions
, (SUM(tadm.after_ad_sum) - SUM(tadm.before_ad_sum)) AS diff
, CASE WHEN tadm.client_net_cleared = 0 THEN null
WHEN (SUM(tadm.after_ad_sum) - SUM(tadm.before_ad_sum)) <1 THEN null
ELSE (tadm.client_net_cleared/(SUM(tadm.after_ad_sum) - SUM(tadm.before_ad_sum)))
END AS cost_per_session
FROM tableau.km_tv_ad_data_merged tadm
GROUP BY ad_time,media_outlet,program,client_net_cleared
Sample data:
ad_time | media_outlet | program | client_net_cleared | before_ad_sessions | after_ad_sessions | diff | cost_per_session
---------------------|---------------|----------------|--------------------|--------------------|--------------------|------|-----------------
2016-12-09 22:55:00 | DIY | | 970 | 55 | 72 | 17 | 57.05
2016-12-11 02:22:00 | E! | E! News | 388 | 25 | 31 | 6 | 64.66
2016-12-19 21:15:00 | Cooking | The Best Thing | 428 | 70 | 97 | 27 | 15.85
2016-12-22 14:01:00 | Oxygen | Next Top Model | 285 | 95 | 148 | 53 | 5.37
2016-12-09 22:55:00 | DIY | | 970 | 55 | 72 | 17 | 57.05
2016-12-04 16:13:00 | Headline News | United Shades | 1698 | 95 | 137 | 42 | 40.42
What I need:
Only count one instance of each ad when calculating cost_per_session.
EDIT: Fixed the query, had a half completed row where I was failing at doing this before asking the question. :)
Get rid of the DISTINCT in SELECT DISTINCT in the first line of your query. It makes no sense in a GROUP BY query.
If your rows are entirely duplicate, try deduplicating the table before you put it into the GROUP BY grinder by replacing
FROM tableau.km_tv_ad_data_merged tadm
with
FROM ( SELECT DISTINCT timestamp, media_outlet, program,
client_net_cleared,
before_ad_sum, after_ad_sum
FROM tableau.km_tv_ad_data_merged
) tadm
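To see the effect of deduplicating before aggregating, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the table name ads and the three raw rows are made up for illustration; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (timestamp TEXT, media_outlet TEXT, program TEXT,"
    " client_net_cleared REAL, before_ad_sum REAL, after_ad_sum REAL)"
)
raw_rows = [
    ("2016-12-09 22:55:00", "DIY", "", 970, 55, 72),
    ("2016-12-09 22:55:00", "DIY", "", 970, 55, 72),  # exact duplicate
    ("2016-12-11 02:22:00", "E!", "E! News", 388, 25, 31),
]
conn.executemany("INSERT INTO ads VALUES (?, ?, ?, ?, ?, ?)", raw_rows)

# Deduplicate in a derived table first, then aggregate.
query = """
SELECT tadm.timestamp, tadm.media_outlet, tadm.program,
       SUM(tadm.after_ad_sum) - SUM(tadm.before_ad_sum) AS diff
FROM ( SELECT DISTINCT timestamp, media_outlet, program,
              client_net_cleared, before_ad_sum, after_ad_sum
       FROM ads ) tadm
GROUP BY tadm.timestamp, tadm.media_outlet, tadm.program,
         tadm.client_net_cleared
ORDER BY tadm.timestamp
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Without the inner SELECT DISTINCT, the DIY ad would be summed twice and its diff would come out as 34 instead of 17.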
I recently built my first database with MySQL. I use it to log data from ten different sensors every ten minutes. Each sensor has a unique ID, and all sensors are read out at the same time so that ten entries get the same timestamp. The database looks like this:
sensor_id | timestamp | sensor_value |
1 | 2016-06-13 20:40:00 | 19.1 |
2 | 2016-06-13 20:40:00 | 20.1 |
3 | 2016-06-13 20:40:00 | 21.5 |
.
.
.
10 | 2016-06-13 20:40:00 | 18.7 |
1 | 2016-06-13 20:50:00 | 19.4 |
2 | 2016-06-13 20:50:00 | 20.2 |
3 | 2016-06-13 20:50:00 | 22.1 |
.
.
.
10 | 2016-06-13 20:50:00 | 17.9 |
.
.
.
Now I would like to export the data in such a way that I get a row for each timestamp with ten following columns containing the values of the ten sensors:
| 1 | 2 | 3 | ... | 10 |
2016-06-13 20:40:00 | 19.1 | 20.1 | 21.5 | ... | 18.7 |
2016-06-13 20:50:00 | 19.4 | 20.2 | 22.1 | ... | 17.9 |
.
.
.
I tried to use GROUP_CONCAT and almost got what I was looking for, but this gives me all the sensor values in one column as a comma-separated list:
timestamp | GROUP_CONCAT(sensor_value) |
2016-06-13 20:40:00 | 19.1,20.1,21.5,...,18.7 |
2016-06-13 20:50:00 | 19.4,20.2,22.1,...,17.9 |
.
.
.
Unfortunately, sometimes one of the sensors fails to deliver its value and no entry is added to my database. In that case there are only nine values with the same timestamp, and the comma-separated list cannot tell me which sensor is missing. That is why I need one column per unique sensor ID. Is there a way to achieve this?
I tried to work it out by browsing Stack Overflow, but since I am fairly new to MySQL and databases I did not manage to resolve my problem without posting a new question. If it has been asked and answered before I am sorry and would be happy if someone redirected me in the right direction.
Thanks!
Just use conditional aggregation:
select timestamp,
max(case when sensor_id = 1 then sensor_value end) as sensor_1,
max(case when sensor_id = 2 then sensor_value end) as sensor_2,
. . .
from t
group by timestamp;
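Here is a small runnable sketch of this conditional aggregation, using Python's sqlite3 module as a stand-in for MySQL. Only three sensors and two timestamps are included, and sensor 2 is deliberately missing at 20:50 to show what a failed reading looks like; building the column list with a loop is a convenience added here and is not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (sensor_id INTEGER, timestamp TEXT, sensor_value REAL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2016-06-13 20:40:00", 19.1),
        (2, "2016-06-13 20:40:00", 20.1),
        (3, "2016-06-13 20:40:00", 21.5),
        (1, "2016-06-13 20:50:00", 19.4),
        (3, "2016-06-13 20:50:00", 22.1),  # sensor 2 failed to report
    ],
)

# One MAX(CASE ...) column per sensor ID; the IDs are trusted constants.
sensor_ids = [1, 2, 3]
columns = ",\n       ".join(
    f"MAX(CASE WHEN sensor_id = {i} THEN sensor_value END) AS sensor_{i}"
    for i in sensor_ids
)
query = (
    f"SELECT timestamp,\n       {columns}\n"
    "FROM t GROUP BY timestamp ORDER BY timestamp"
)
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

A sensor with no row for a given timestamp yields NULL (None in Python) in its own column, so the gap stays attributable to a specific sensor.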
I want to get the sum of the data for every 5 minutes.
I have 15 motes.
Suppose that in the first 5 minutes only some motes are queried, and in the next 5 minutes a different set of motes is queried.
In the second 5 minutes I also need the data of the motes that were not queried in that interval.
For example, in the first 5 minutes mote IDs 1, 2, 3, 4, 9, 12, 14 are queried, and in the second 5 minutes mote IDs 1, 5, 6, 7, 9, 13, 14 are queried.
In the second 5 minutes I need values for the motes that were not queried as well. Is it possible to take their data from the previous 5 minutes?
moteid2 | 28 | 2012-09-25 17:45:43 | |
moteid4 | 65 | 2012-09-25 17:45:49 | |
moteid3 | 66 | 2012-09-25 17:45:51 | |
moteid6 | 25 | 2012-09-25 17:45:56 | |
moteid5 | 29 | 2012-09-25 17:45:58 | |
moteid7 | 30 | 2012-09-25 17:46:05 | |
moteid4 | 95 | 2012-09-25 17:50:29 | |
moteid6 | 56 | 2012-09-25 17:50:35 | |
moteid5 | 58 | 2012-09-25 17:50:36 | |
moteid4 | 126 | 2012-09-25 17:55:08 |
In the first 5 minutes moteid2 and moteid3 are queried, but in the next 5 minutes they are not. Even if they are not queried, I want their previously queried values to be carried over.
I'm assuming the table name is motes and that its columns are called motesid and date. In that case the following query lists every distinct motesid that is present somewhere in the table but has not been queried in the last 5 minutes:
select distinct m.motesid
from motes m
where not exists (
select *
from motes m1
where
m1.motesid = m.motesid and
m1.date > NOW() - INTERVAL 5 MINUTE
)
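The carry-forward itself can also be done in application code. The following is a rough Python sketch, not part of the answer above: it assumes that windows start on clock multiples of 5 minutes, that readings arrive as (mote, value, timestamp) tuples, and the function name window_sums is made up. The input is the sample data from the question.

```python
from datetime import datetime

def window_sums(readings, minutes=5):
    """Return {window_start: sum of the latest known value of every mote}.

    A mote that is not queried in a window keeps its previous value.
    Windows that contain no reading at all are not emitted.
    """
    last_value = {}   # mote -> most recent value seen so far
    sums = {}         # window start -> carried-forward sum
    for mote, value, stamp in sorted(readings, key=lambda r: r[2]):
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        window = ts.replace(minute=ts.minute - ts.minute % minutes, second=0)
        last_value[mote] = value
        # Overwritten on every reading, so the final entry for a window
        # reflects the state after its last reading.
        sums[window] = sum(last_value.values())
    return sums

readings = [
    ("moteid2", 28, "2012-09-25 17:45:43"),
    ("moteid4", 65, "2012-09-25 17:45:49"),
    ("moteid3", 66, "2012-09-25 17:45:51"),
    ("moteid6", 25, "2012-09-25 17:45:56"),
    ("moteid5", 29, "2012-09-25 17:45:58"),
    ("moteid7", 30, "2012-09-25 17:46:05"),
    ("moteid4", 95, "2012-09-25 17:50:29"),
    ("moteid6", 56, "2012-09-25 17:50:35"),
    ("moteid5", 58, "2012-09-25 17:50:36"),
    ("moteid4", 126, "2012-09-25 17:55:08"),
]
sums = window_sums(readings)
for window, total in sums.items():
    print(window, total)
```

In the 17:50 window, for example, moteid2, moteid3 and moteid7 contribute their 17:45 values because they were not queried again.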