I have the following problem. I'm running MySQL server 5.1.37 on Ubuntu 9.10 x86 on Amazon. For data storage I use an EBS volume formatted as ext3.
From time to time the following happens: MySQL starts processing about 10~20 queries, and processing them takes more than 300 seconds (these queries use filesort). During that time no other transaction can be executed.
I've checked CPU wait, and here is what it shows:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
24 9 66 0 0 0| 0 76k| 258k 989k| 0 0 |5970 3014
23 1 75 0 0 1|4096B 28k| 229k 1536k| 0 0 |3249 2308
19 6 74 0 0 0|4096B 316k| 209k 609k| 0 0 |4943 2542
19 17 62 0 0 2|4096B 36k| 230k 718k| 0 0 |5482 2520
21 19 57 2 0 2| 16k 800k| 271k 860k| 0 0 |6549 2923
23 27 44 5 0 1| 480k 40k| 288k 979k| 0 0 |4140 2682
12 0 86 1 0 0| 256k 48k| 237k 771k| 0 0 |3404 2627
22 1 75 0 0 1|8192B 60k| 285k 908k| 0 0 |4009 2786
54 21 19 3 0 2|4096B 3384k| 287k 1556k| 0 0 |3962 2284
49 24 24 1 0 2|4096B 928k| 285k 2795k| 0 0 |3257 2005
61 19 17 2 0 2|8192B 36k| 215k 577k| 0 0 |3246 1922
40 49 8 0 0 3| 0 40k| 312k 905k| 0 0 |3282 1732
56 23 20 1 0 1|4096B 188k| 247k 897k| 0 0 |3102 2238
39 19 27 16 0 0|4096B 77M| 265k 819k| 0 0 |5147 3075
35 35 12 16 0 1|4096B 56M| 259k 1052k| 0 0 |4656 2739
36 27 8 28 0 1|4096B 59M| 259k 1139k| 0 0 |5549 2821
27 13 36 23 0 1|4096B 64M| 251k 1218k| 0 0 |4207 2540
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
26 4 13 57 0 1|4096B 66M| 275k 681k| 0 0 |5205 3291
22 6 27 43 0 1|4096B 52M| 237k 684k| 0 0 |4906 2602
14 3 24 58 0 0|4096B 46M| 278k 1058k| 0 0 |6448 3687
19 3 34 43 0 2| 32k 51M| 233k 685k| 0 0 |5006 2652
27 3 9 61 0 1|4096B 51M| 294k 800k| 0 0 |4428 2384
17 3 30 50 0 1|4096B 42M| 243k 699k| 0 0 |5334 2830
40 18 0 42 0 0| 0 89M| 247k 840k| 0 0 |4698 2977
31 18 11 39 0 2|4096B 42M| 238k 1269k| 0 0 |4270 2474
17 3 13 66 0 0|4096B 49M| 260k 773k| 0 0 |5153 3100
21 2 14 62 0 1|8192B 46M| 269k 948k| 0 0 |6762 3581
24 2 35 39 0 0|4096B 39M| 256k 777k| 0 0 |5313 2761
15 2 10 72 0 1|4096B 49M| 237k 797k| 0 0 |5312 3018
19 4 22 55 0 0|8192B 47M| 307k 1034k| 0 0 |5508 3278
41 3 15 40 0 1|8192B 47M| 293k 727k| 0 0 |5630 3303
16 2 26 54 0 1|4096B 56M| 282k 1750k| 0 0 |5016 2781
17 3 12 67 0 2|8192B 43M| 238k 824k| 0 0 |5751 3147
14 11 50 24 0 1|4096B 39M| 247k 1105k| 0 0 |4454 2389
41 3 20 35 0 1| 0 58M| 152k 481k| 0 0 |4009 2958
52 2 4 41 0 1|4096B 59M| 211k 621k| 0 0 |5449 2846
31 2 0 66 0 1| 0 52M| 255k 1476k| 0 0 |5167 2693
36 2 24 36 0 2| 12k 49M| 311k 888k| 0 0 |4537 2563
47 7 2 43 0 2|4096B 50M| 231k 750k| 0 0 |4083 2165
40 4 6 50 0 0|4096B 86M| 211k 819k| 0 0 |4768 2875
29 5 2 65 0 0| 0 79M| 180k 580k| 0 0 |4271 4461
40 3 0 57 0 0|4096B 58M| 238k 1489k| 0 0 |4366 4480
27 8 26 38 0 1|4096B 33M| 301k 984k| 0 0 |4439 2838
11 2 9 78 0 1|4096B 24M| 230k 646k| 0 0 |4894 4504
10 3 14 72 0 0|4096B 21M| 183k 549k| 0 0 |4066 3952
14 3 27 57 0 0| 0 64M| 147k 339k| 0 0 |3479 2860
10 2 19 69 0 0|4096B 51M| 112k 452k| 0 0 |2847 2300
9 4 18 69 0 0|4096B 37M| 131k 443k| 0 0 |2923 2004
4 2 49 45 0 0|4096B 31M| 97k 230k| 0 0 |2163 1545
1 2 73 24 0 0| 0 33M| 49k 130k| 0 0 |1425 824
1 0 71 28 0 0| 0 26M| 36k 86k| 0 0 |1426 910
0 0 55 45 0 0| 0 32M| 32k 148k| 0 0 |1334 695
4 0 64 32 0 0| 0 39M| 14k 39k| 0 0 |1262 406
0 2 38 60 0 0| 0 44M| 13k 44k| 0 0 |1136 382
1 1 82 16 0 0| 0 47M| 25k 70k| 0 0 |1228 584
1 3 69 27 0 0|4096B 46M| 23k 60k| 0 0 |1576 599
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
3 1 70 27 0 0|4096B 43M| 22k 54k| 0 0 |1065 574
1 1 33 65 0 0| 0 46M|6124B 17k| 0 0 |1190 345
1 1 49 50 0 0| 0 47M| 11k 22k| 0 0 |1258 444
2 11 23 64 0 0| 56k 58M|9749B 47k| 0 0 |1143 379
1 1 64 34 0 0| 0 51M| 198B 5914B| 0 0 |1048 234
0 1 63 36 0 0| 0 58M| 662B 1278B| 0 0 | 976 454
1 0 81 18 0 0| 0 50M| 426B 6022B| 0 0 |1304 600
0 1 70 29 0 0| 0 43M| 132B 1868B| 0 0 |1150 210
1 1 79 19 0 0| 0 51M| 198B 5914B| 0 0 | 986 246
1 2 30 66 0 0| 0 54M| 246B 420B| 0 0 |1150 288
1 0 49 50 0 0| 0 55M| 659B 6752B| 0 0 |1038 280
1 2 37 60 0 0| 0 47M| 66B 354B| 0 0 |1191 227
0 0 80 19 0 0| 0 43M| 561B 6044B| 0 0 |1129 256
5 13 44 38 0 0| 0 49M|1558B 19k| 0 0 |1225 243
3 6 48 42 0 0| 0 52M| 705B 6022B| 0 0 | 948 327
What could cause such a behavior? Are there any techniques to avoid this?
You're showing a high amount of IO_WAIT status on the CPU (65%). It's possible that you're just pulling too much out of the disks. Try running iostat and seeing what the disk activity is like (namely transactions per second).
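To put a number on the I/O wait rather than eyeballing it, the `wai` column can be averaged over the stall period. The following is only a minimal sketch: the five rows are copied from the dstat output in the question, and the helper names `parse_cpu` and `mean_iowait` are made up for illustration.

```python
# Rows copied from the dstat output in the question (start of the stall period).
DSTAT_ROWS = """\
26 4 13 57 0 1|4096B 66M| 275k 681k| 0 0 |5205 3291
22 6 27 43 0 1|4096B 52M| 237k 684k| 0 0 |4906 2602
14 3 24 58 0 0|4096B 46M| 278k 1058k| 0 0 |6448 3687
19 3 34 43 0 2| 32k 51M| 233k 685k| 0 0 |5006 2652
27 3 9 61 0 1|4096B 51M| 294k 800k| 0 0 |4428 2384
"""

CPU_COLUMNS = ("usr", "sys", "idl", "wai", "hiq", "siq")


def parse_cpu(line):
    """Return the CPU columns of one dstat row as a dict (hypothetical helper)."""
    cpu_field = line.split("|")[0]  # everything before the first '|' is CPU usage
    return dict(zip(CPU_COLUMNS, map(int, cpu_field.split())))


def mean_iowait(text):
    """Average the 'wai' percentage over all non-empty rows."""
    rows = [parse_cpu(line) for line in text.splitlines() if line.strip()]
    return sum(row["wai"] for row in rows) / len(rows)


print(mean_iowait(DSTAT_ROWS))
```

Running the same calculation over the rows before the write burst (where `wai` is mostly 0 to 5) makes the contrast obvious.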
However, you mention 10 to 20 queries. Are these queries doing any writing at all? Are they using a transaction? If the answer to either is yes, then you're locking because of the transaction lock in MySQL. If that's the case, your problem is that you need to either figure a way to remove the transaction, or make the queries much more efficient.
A good test would be to create another database on the server. Then run your queries and try to query against the different database. If it works, it's due to transaction locks. If it doesn't, it's likely the disk or some other leak from the VM...
The biggest suspect here is the performance of the EBS volumes: the CPUs may be waiting for I/Os to complete. The next question is what is causing the I/O requests.
This question might be better answered on ServerFault.
http://www.mysqlperformanceblog.com/2011/02/21/death-match-ebs-versus-ssd-price-performance-and-qos/
The IO performance of EBS is poor (I recently benchmarked EBS on a Small instance as being half as fast as my laptop's hard drive). However, you can improve it significantly by striping multiple EBS volumes into a software RAID configuration.
http://alestic.com/2009/06/ec2-ebs-raid
EBS comes with a lot of its own limitations. If you are running your instance in the US region, it's better to switch to optimized EBS to make the I/O faster. I was also running a self-managed MySQL but later switched to RDS, which gives much better performance than EBS.
I made a program that converts csv files into sdf files. Those files are supposed to go into another converter that turns them into something called a "Nist MS Library". The problem is that the converter rejects my file with the message "No spectra have been converted", and I don't understand why.
The files seem identical to me and I think I'm missing something about the specific file extension.
I'm really sorry if this doesn't belong here, I will delete the post if this is the case, but I really do not know where to ask.
I tried making the "mass spectral peaks" integers, then floats, deleting them, and substituting values that I knew for sure were accepted by the Nist converter, but nothing seems to work.
Below are two molecules: the first one is mine, which is not accepted; the second one is accepted by the program.
Coumarin
No Structure
0 0 0 0 0 0 0 0 0 0 0
> <NAME>
Coumarin
> <INCHIKEY>
> <FORMULA>
> <MW>
> <CASNO>
91645
> <ID>
2
> <COMMENT>
SAFC Cat. n. W526509\nColumn: SLB-5ms part#28471-U; Supelcowax-10 part#24079; Equity-1 part#28046-U;\nwww.sigmaaldrich.com |RI:1438|
> <SYNONYMS>
Coumarin
> <NUM PEAKS>
140
> <MASS SPECTRAL PEAKS>
39 1
39 233
40 38
40 0
40 0
41 0
41 1
41 5
42 2
42 2
43 35
43 0
43 12
44 26
45 20
45 67
46 7
46 4
46 2
46 1
47 0
48 4
49 32
49 3
50 5
50 166
51 183
52 20
53 35
54 7
55 1
56 0
58 0
59 2
59 23
60 6
60 11
61 89
62 213
63 503
64 164
64 13
65 15
65 7
66 9
68 4
71 0
71 1
72 2
73 1
73 19
74 0
74 32
75 39
76 14
77 11
77 1
78 0
78 0
79 0
79 9
79 0
80 2
80 1
81 0
81 0
81 1
82 0
82 0
83 0
83 0
84 6
84 0
85 17
86 2
86 2
87 28
88 7
89 523
90 581
91 25
91 48
92 40
92 36
93 4
93 5
94 1
94 0
97 2
98 6
98 0
99 2
99 0
100 0
100 1
101 5
102 3
103 0
103 0
103 0
104 0
105 0
105 0
106 0
106 0
106 0
107 0
108 0
109 0
110 0
110 0
111 0
111 0
112 0
112 0
113 0
116 0
117 6
117 0
118 1000
119 94
120 34
120 10
121 4
121 0
122 0
122 0
131 0
135 0
145 1
146 0
146 547
147 58
148 5
183 0
246 0
334 0
351 0
359 0
382 0
> <RI value>
1430.1
$$$$
ETHYL HYDROSULFIDE
(C) 2015 John Wiley & Sons, Inc.
CAS rn = 75081, Library ID = 1
3 2 0 0 0 0 0 0 0 0999 V2000
0.0000 0.2061 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7146 -0.2061 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
-0.7146 -0.2061 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 3 1 0 0 0 0
M END
> <NAME>
Ethyl hydrosulfide
> <SYNONYMS>
Ethanethiol
$:28DNJIEGIFACGWOD-UHFFFAOYSA-N
$:29n=703/0/1 p=620/0/1
> <FORMULA>
C2H6S
> <MW>
62
> <CASNO>
75081
> <ID>
1
> <COMMENT>
WileyID="LM_FFNSC3_1" RI1="703 (SLB-5MS (Hydro))" RI2="392 (SLB-5MS (FAMEs))" RI3="620 (Supelcowax-10 (FAMEs)" RI4="568 (Supelcowax-10 (FAEEs)" Contributor="Prof. L. Mondello (Chromaleont s.r.l./Univ. Messina, Italy)"
> <NUM PEAKS>
21
> <MASS SPECTRAL PEAKS>
44 20
45 235
46 147
47 727
48 20
49 32
50 2
51 2
52 2
53 2
54 2
55 2
56 16
57 84
58 115
59 84
60 16
61 155
62 999
63 36
64 44
$$$$
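One visible difference between the two records is that the rejected one repeats m/z values (39, 40 and 41 each appear more than once) and lists many zero-abundance peaks, while the accepted one has exactly one peak per m/z. Whether that is what the converter objects to is an assumption and not something confirmed here. As a rough sketch, this is how the peak list could be collapsed before writing the file; `collapse_peaks` and `format_peak_block` are hypothetical helper names, and keeping the largest abundance per m/z is just one possible choice.

```python
def collapse_peaks(peaks):
    """Keep one (m/z, abundance) pair per m/z and drop zero-abundance peaks.

    Assumption: when an m/z value is repeated, the largest abundance is kept.
    """
    best = {}
    for mz, abundance in peaks:
        if abundance > best.get(mz, 0):
            best[mz] = abundance
    return sorted(best.items())


def format_peak_block(peaks):
    """Render the peak fields in the same layout as the records above."""
    lines = ["> <NUM PEAKS>", str(len(peaks)), "> <MASS SPECTRAL PEAKS>"]
    lines += [f"{mz} {abundance}" for mz, abundance in peaks]
    return "\n".join(lines)


# First eight peaks of the Coumarin record from the question.
sample = [(39, 1), (39, 233), (40, 38), (40, 0), (40, 0), (41, 0), (41, 1), (41, 5)]
cleaned = collapse_peaks(sample)
print(format_peak_block(cleaned))
```

Note that the NUM PEAKS value is recomputed from the cleaned list, so it always matches the number of peak lines that follow.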
This is the webpage I am scraping: http://laxreports.sportlogiq.com/nll/GS2200.html
Below is the code for the spider I created:
import scrapy


class MatchesSpider(scrapy.Spider):
    name = 'matches'
    allowed_domains = ['laxreports.sportlogiq.com']
    start_urls = ['http://laxreports.sportlogiq.com/nll/GS2200.html']

    def parse(self, response):
        tables = response.xpath('//table')
        print(tables)
        table = tables[0].xpath('//tbody')
I see 22 tables that have been selected for this XPath expression but my problem is that I don't fully understand how to select each individual table and extract its contents.
I am a beginner in scrapy and after searching online for a solution all I see is how to select the tables using the class or ID which in this case is not an option.
You can do that using only pandas
Code:
import pandas as pd
dfs = pd.read_html('https://laxreports.sportlogiq.com/nll/GS2200.html')
df = dfs[10]#.to_csv('d.csv', index = False)
print(df)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12
0 # Name G A +/- PIM S SOFF LB T CT FO TOF
1 2 W.Malcom 0 0 0 0 1 1 1 4 0 - 11:28
2 3 T.Edwards 0 0 -2 2 0 0 8 1 2 7-18 20:28
3 4 J.Sullivan 0 0 -3 2 0 0 3 0 0 - 15:29
4 11 T.Stuart 0 0 -3 0 0 0 4 1 1 - 21:09
5 14 W.Jeffrey 0 1 -1 0 0 0 9 2 1 - 19:17
6 16 R.Lee 2 1 2 0 9 4 6 6 1 - 23:13
7 17 C.Wardle 2 0 1 2 5 3 4 2 2 - 20:55
8 18 R.Hope (A) 0 0 -2 2 0 0 11 0 0 - 22:02
9 20 J.Ruest 3 2 3 0 8 1 3 2 0 - 24:16
10 23 J.Gilles 0 0 -1 0 0 0 4 0 3 - 14:44
11 27 S.Carnegie 0 0 -1 0 0 0 3 0 0 - 12:19
12 37 D.Coates (C) 0 0 0 0 1 0 1 0 0 1-1 2:31
13 51 E.McLaughlin 0 5 2 0 7 3 5 7 0 - 21:41
14 55 D.Kinnear 0 1 2 0 2 0 2 1 0 0-2 10:14
15 67 K.Killen 1 1 0 0 6 1 4 2 0 - 16:42
16 82 J.Cupido (A) 0 1 -1 0 3 0 4 1 0 - 20:52
17 86 J.Lintz 0 1 -1 0 0 0 4 0 1 - 19:26
18 30 T.Carlson 0 0 NaN 0 0 0 0 0 0 - NaN
19 45 D.Ward 0 0 NaN 0 0 0 0 1 0 - NaN
20 NaN Totals: 8 13 NaN 8 42 13 76 30 11 8-21 NaN
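If you would rather stay with XPath, the key idea is to index into the list of tables and then use a path relative to that one table. In Scrapy, an expression that starts with `//` searches the whole document even when it is called on a sub-selector, so `tables[0].xpath('//tbody')` does not stay inside the first table; `tables[0].xpath('.//tr')` does. The sketch below shows the same "one table at a time, relative path" pattern using only the standard library's `xml.etree.ElementTree` so that it runs anywhere. The markup is a hypothetical, well-formed stand-in for the page, not the real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page: two tables with no class or id.
HTML = """
<html><body>
<table><tr><td>#</td><td>Name</td></tr><tr><td>2</td><td>W.Malcom</td></tr></table>
<table><tr><td>#</td><td>Name</td></tr><tr><td>16</td><td>R.Lee</td></tr></table>
</body></html>
"""


def extract_tables(markup):
    """Return every table as a list of rows, each row a list of cell texts."""
    root = ET.fromstring(markup)
    tables = []
    for table in root.findall(".//table"):   # all tables in the document
        rows = []
        for tr in table.findall(".//tr"):    # relative: only rows of this table
            rows.append([(cell.text or "").strip() for cell in tr.findall("./td")])
        tables.append(rows)
    return tables


tables = extract_tables(HTML)
print(tables[1])
```

Once each table is a plain list of rows, picking "the eleventh table" is just `tables[10]`, the same index the pandas answer uses.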
I'm trying to solve a MySQL problem without going crazy. Not sure if it is feasible or not.
The data come from a door/light sensor that detects whether a toilet is occupied. Whenever the door is closed or opened, I receive the door state plus a light reading. If the door is closed and light<10, I consider the toilet unoccupied; if the door is closed and light>10, it is occupied; and if the door is open, it is unoccupied.
Here is an example of my data :
id wc_id door_open light time
138 0 1 64 2018-10-10 12:28:51
139 0 0 58 2018-10-10 12:34:00
140 0 0 54 2018-10-10 12:34:38
141 0 1 68 2018-10-10 12:35:11
142 0 1 3 2018-10-10 12:35:36
143 0 0 60 2018-10-10 12:37:56
144 0 0 60 2018-10-10 12:37:57
145 0 0 57 2018-10-10 12:38:30
146 0 1 65 2018-10-10 12:43:53
147 0 1 3 2018-10-10 12:44:17
148 0 0 63 2018-10-10 13:10:55
149 0 0 59 2018-10-10 13:11:16
150 0 1 71 2018-10-10 13:12:09
151 0 1 4 2018-10-10 13:12:14
152 0 1 1 2018-10-10 13:15:07
153 0 0 62 2018-10-10 13:17:18
154 0 0 58 2018-10-10 13:18:01
155 0 1 68 2018-10-10 13:19:20
156 0 1 3 2018-10-10 13:19:56
157 0 1 42 2018-10-10 13:26:41
158 0 0 63 2018-10-10 13:26:44
159 0 0 58 2018-10-10 13:27:39
160 0 1 71 2018-10-10 13:27:40
161 0 1 3 2018-10-10 13:28:37
The goal is to end up with a series in which door_open strictly alternates between 0 and 1; two consecutive 0s or two consecutive 1s must not appear.
So I need to keep the first door_open=0 with light>10 that follows a door_open=1, and the first door_open=1 after a door_open=0, whatever the light value.
Is it possible with MySQL? I use MariaDB 10.3.9.
Thanks for your ideas.
The output should be like that :
id wc_id door_open light time
139 0 0 58 12:34:00
141 0 1 68 12:35:11
143 0 0 60 12:37:56
146 0 1 65 12:43:53
148 0 0 63 13:10:55
150 0 1 71 13:12:09
153 0 0 62 13:17:18
155 0 1 68 13:19:20
158 0 0 63 13:26:44
160 0 1 71 13:27:40
(I simplified the time, it's not really important here)
Here is a fiddle
This query should do what you want. It uses a MySQL variable to delay the value of door_open by 1 row, and then returns rows where door_open=0 with light>10 following a door_open=1, and first door_open=1 after door_open=0, whatever light value:
SELECT events.*, @door_open := door_open
FROM events
JOIN (SELECT @door_open := 1) do
WHERE @door_open = 0 AND door_open = 1 OR
      @door_open = 1 AND door_open = 0 AND light > 10
Output (from your fiddle data):
id toilet_id door_open light time @door_open := door_open
101 0 false 62 2018-10-10T11:39:31Z 0
103 0 true 69 2018-10-10T11:39:34Z 1
104 0 false 62 2018-10-10T11:42:16Z 0
106 0 true 68 2018-10-10T11:45:50Z 1
109 0 false 56 2018-10-10T12:13:11Z 0
Updated SQLFiddle
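To see why the variable trick produces an alternating series, here is the same filter replayed in plain Python over the sample rows from the question. It is only a sketch under one stated assumption: the variable is reassigned only for rows that pass the WHERE clause (MySQL documents the evaluation order of user variables within a statement as undefined, so this mirrors the observed behaviour rather than a guarantee). The function name `alternate` is made up.

```python
# (id, door_open, light) copied from the sample data in the question.
EVENTS = [
    (138, 1, 64), (139, 0, 58), (140, 0, 54), (141, 1, 68), (142, 1, 3),
    (143, 0, 60), (144, 0, 60), (145, 0, 57), (146, 1, 65), (147, 1, 3),
    (148, 0, 63), (149, 0, 59), (150, 1, 71), (151, 1, 4), (152, 1, 1),
    (153, 0, 62), (154, 0, 58), (155, 1, 68), (156, 1, 3), (157, 1, 42),
    (158, 0, 63), (159, 0, 58), (160, 1, 71), (161, 1, 3),
]


def alternate(events, light_threshold=10):
    """Return the ids of rows that form a strictly alternating 0/1 series."""
    kept = []
    last_state = 1  # mirrors the initial "SELECT @door_open := 1"
    for event_id, door_open, light in events:
        opened = last_state == 0 and door_open == 1
        closed = last_state == 1 and door_open == 0 and light > light_threshold
        if opened or closed:
            kept.append(event_id)
            last_state = door_open  # assumption: updated only for kept rows
    return kept


print(alternate(EVENTS))
# [139, 141, 143, 146, 148, 150, 153, 155, 158, 160]
```

The ids printed are the same ten rows listed in the expected output of the question.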
Here is a potential answer to my problem, after working on Nick's solution. I had to reorder my table (after deleting rows) to avoid the rows being processed out of order.
select es.id,
       es.idNext,
       es.toilet_id,
       es.time,
       es.nextTime,
       timediff(es.nextTime, es.time) AS duration
from (
    SELECT id, toilet_id, time,
           @door_open := door_open as door_open,
           lead(id, 1) OVER(ORDER BY id) idNext,
           lead(time, 1) OVER(ORDER BY id) nextTime
    FROM events e
    JOIN (SELECT @door_open := 1) do
    WHERE @door_open = 0 AND door_open = 1 OR
          @door_open = 1 AND door_open = 0 AND light > 20
) es
where
    es.door_open=0 and
    timediff(es.nextTime, es.time)>5
Next thing is to update the query to use a partition over toilet_id to separate data from each id.
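As a rough illustration of what the per-toilet version should compute, here is the pairing step in Python: events are grouped by toilet_id, and every "door closed" event is paired with the next event of the same toilet to get a duration. The rows are taken from the expected output in the question (with the date from the sample data). The name `occupied_periods` and the `min_seconds` parameter are hypothetical, and reading `timediff(...) > 5` as "more than five seconds" is an assumption.

```python
from datetime import datetime
from itertools import groupby

FMT = "%Y-%m-%d %H:%M:%S"

# (id, toilet_id, door_open, time): already-filtered, alternating events.
FILTERED = [
    (139, 0, 0, "2018-10-10 12:34:00"),
    (141, 0, 1, "2018-10-10 12:35:11"),
    (143, 0, 0, "2018-10-10 12:37:56"),
    (146, 0, 1, "2018-10-10 12:43:53"),
]


def occupied_periods(events, min_seconds=5):
    """Return (toilet_id, id, next_id, seconds) for each closed -> next pair."""
    periods = []
    ordered = sorted(events, key=lambda e: (e[1], e[0]))  # by toilet, then id
    for toilet_id, group in groupby(ordered, key=lambda e: e[1]):
        rows = list(group)
        for current, following in zip(rows, rows[1:]):
            if current[2] != 0:  # only a closed door starts an occupied period
                continue
            start = datetime.strptime(current[3], FMT)
            end = datetime.strptime(following[3], FMT)
            seconds = (end - start).total_seconds()
            if seconds > min_seconds:
                periods.append((toilet_id, current[0], following[0], seconds))
    return periods


print(occupied_periods(FILTERED))
```

Grouping before pairing is the part that corresponds to partitioning by toilet_id: a closed event is never paired with an event from a different toilet.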
Sorry for the unclear title. I want to add a column editor_article that holds editor names. A row gets editor names only if it is a parent article, i.e. its id_article appears as the parent_id of other rows; in that case editor_article lists the editors of those child rows. editor_article is '0' if id_section = 29 or if parent_id != 0. The names come from joining the editor column with t_kolom.id_editor.
tbl_name:t_article
id_section id_article parent_id editor
29 441 0 2
33 1093 18 2
33 18 0 0
29 3144 0 8
30 3136 0 0
31 3130 0 0
31 3140 3130 22
31 3141 3130 335
30 3142 3136 546
tbl_name:t_kolom
id_editor name
1 john
2 gerrard
3 lukas
8 anthony
22 jimmy
335 eric
546 tyas
And the expected output:
id_section id_article parent_id editor editor_article
29 441 0 2 0
33 1093 18 2 0
33 18 0 0 gerrard
29 3144 0 8 0
30 3136 0 0 tyas
31 3130 0 0 jimmy,eric
31 3140 3130 22 0
31 3141 3130 335 0
30 3142 3136 546 0
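To make the rule behind the expected output explicit, here is a small Python sketch that reproduces it from the two tables above. It is not a SQL answer, just the logic spelled out: collect the editor names of each article's children, then emit '0' for rows in section 29 or rows that have a parent. The function name is made up, and returning '0' for a parent article with no children is an assumption, since the sample data has no such row. In SQL the same shape would presumably be a self-join on parent_id = id_article plus GROUP_CONCAT over the joined names.

```python
# (id_section, id_article, parent_id, editor) from t_article.
ARTICLES = [
    (29, 441, 0, 2), (33, 1093, 18, 2), (33, 18, 0, 0), (29, 3144, 0, 8),
    (30, 3136, 0, 0), (31, 3130, 0, 0), (31, 3140, 3130, 22),
    (31, 3141, 3130, 335), (30, 3142, 3136, 546),
]

# id_editor -> name from t_kolom.
EDITORS = {1: "john", 2: "gerrard", 3: "lukas", 8: "anthony",
           22: "jimmy", 335: "eric", 546: "tyas"}


def editor_article(articles, editors):
    """Map id_article -> editor_article value as described in the question."""
    children = {}
    for _section, _article, parent_id, editor in articles:
        if parent_id != 0 and editor in editors:
            children.setdefault(parent_id, []).append(editors[editor])

    result = {}
    for section, article, parent_id, _editor in articles:
        if section == 29 or parent_id != 0:
            result[article] = "0"
        else:
            # Assumption: a parent with no children also gets '0'.
            result[article] = ",".join(children.get(article, [])) or "0"
    return result


for article, value in editor_article(ARTICLES, EDITORS).items():
    print(article, value)
```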
I'm trying to scrape the data from every table at the hockey-reference awards page. I can scrape the first table for the Hart Memorial Trophy, but when I try the rest of them, I end up with empty vectors. I used Selector Gadget and the rvest package to produce the following code.
library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
byng_node<-html_nodes(byng, "#byng_stats .right , #byng_stats a")
byng_text<-html_text(byng_node)
However, once I run this code, I get no data in the byng variables:
> byng_node
{xml_nodeset (0)}
> byng_text
character(0)
What's happening here? Does selector gadget not work for pages with multiple tables? Does it have nothing to do with that and there's something HTMLy I don't understand? Any help is greatly appreciated!
@neilfws was right: if you look at the source of the HTML page, you see that all but the first table are commented out, so rvest treats them as comments rather than as part of the document itself. Let's do a dirty hack and remove the character sequences that comment out our precious tables:
library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
# Remove commenting sequences
byng <- gsub("<!--", "", byng)
byng <- gsub("-->", "", byng)
byng<-read_html(byng)
#Get tables as a list of dataframes
tables <- html_table(byng)
# Last table
tables[7]
[[1]]
Scoring Scoring Scoring Scoring Goalie Stats Goalie Stats
1 Place Player Age Tm Pos Votes Vote% 1st 2nd 3rd 4th 5th G A PTS +/- W L
2 1 Connor McDavid 20 EDM C 762 94.07 141 18 3 0 0 30 70 100 27
3 2 Sidney Crosby 29 PIT C 526 64.94 20 142 0 0 0 44 45 89 17
4 3 Nicklas Backstrom 29 WSH C 127 15.68 1 2 116 0 0 23 63 86 17
5 4 Mark Scheifele 23 WPG C 21 2.59 0 0 21 0 0 32 50 82 18
6 5 Auston Matthews 19 TOR C 10 1.23 0 0 10 0 0 40 29 69 2
7 6 Evgeni Malkin 30 PIT C 4 0.49 0 0 4 0 0 33 39 72 18
8 7 John Tavares 26 NYI C 2 0.25 0 0 2 0 0 28 38 66 4
9 8 Jonathan Toews 28 CHI C 1 0.12 0 0 1 0 0 21 37 58 7
10 8 Brad Marchand 28 BOS C 1 0.12 0 0 1 0 0 39 46 85 18
11 8 Ryan Kesler 32 ANA C 1 0.12 0 0 1 0 0 22 36 58 8
12 8 Ryan Getzlaf 31 ANA C 1 0.12 0 0 1 0 0 15 58 73 7
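The effect of stripping the comment markers can be checked in isolation. The sketch below uses Python's standard-library `html.parser` on a tiny made-up page, so it is not the real hockey-reference markup; only the `byng_stats` id is borrowed from the selector in the question, and `first_table` is a hypothetical id. A parser does not report tags that sit inside `<!-- ... -->`, so the second table only becomes visible after the markers are removed.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags; anything inside an HTML comment is ignored."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1


def count_tables(markup):
    parser = TableCounter()
    parser.feed(markup)
    parser.close()
    return parser.count


# Made-up page: the second table is wrapped in a comment, as described above.
PAGE = """
<div><table id="first_table"><tr><td>1</td></tr></table></div>
<div><!--
<table id="byng_stats"><tr><td>2</td></tr></table>
--></div>
"""

uncommented = PAGE.replace("<!--", "").replace("-->", "")
print(count_tables(PAGE), count_tables(uncommented))  # 1 2
```

The blanket replacement also uncomments anything else that was commented out on the page, which is why it is a hack rather than a clean fix.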