[00:06:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:07:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[00:12:53] !log Running namespaceDupes.php --fix via foreachwiki in screen session on terbium
[00:13:01] Logged the message, Master
[00:23:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:25:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:31:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:32:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[00:33:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:35:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:38:10] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:40:10] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:43:10] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:45:10] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:48:10] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:50:09] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:53:10] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[00:55:09] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[00:57:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:59:18] New review: Tim Starling; "Fenari doesn't appear to have a separate partition for /tmp, so this wouldn't help there." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57774
[00:59:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time
[01:08:09] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds
[01:09:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 8 seconds
[01:23:23] New review: Tim Starling; "Looks good, thanks for that. Sorry about the delay in reviewing this." [operations/debs/lucene-search-2] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/56354
[01:23:28] Change merged: Tim Starling; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/56354
[01:39:07] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 223 seconds
[01:40:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[01:53:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 181 seconds
[01:55:07] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds
[02:05:30] !log LocalisationUpdate completed (1.22wmf2) at Mon Apr 22 02:05:30 UTC 2013
[02:05:38] Logged the message, Master
[02:08:56] !log LocalisationUpdate completed (1.22wmf1) at Mon Apr 22 02:08:56 UTC 2013
[02:09:03] Logged the message, Master
[02:09:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 217 seconds
[02:14:59] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 22 02:14:58 UTC 2013
[02:15:05] Logged the message, Master
[02:15:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 30 seconds
[02:23:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 196 seconds
[02:26:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 6 seconds
[02:29:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 186 seconds
[02:30:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 30 seconds
[02:33:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 210 seconds
[02:36:22] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[02:36:23] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[02:38:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[02:41:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 16 seconds
[02:48:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[02:50:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 seconds
[02:53:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 192 seconds
[02:58:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[02:59:42] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds
[03:13:39] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 196 seconds
[03:15:39] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 23 seconds
[03:27:09] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[03:38:39] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[03:43:39] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[03:44:40] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 15 seconds
[03:53:39] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[03:58:39] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[04:01:39] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 15 seconds
[04:14:37] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 218 seconds
[04:16:07] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours
[04:18:42] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[04:23:37] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[04:24:37] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 28 seconds
[04:28:37] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 211 seconds
[04:30:37] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 16 seconds
[04:33:37] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 196 seconds
[04:35:56] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 30 seconds
[04:38:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 183 seconds
[04:40:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[04:49:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 208 seconds
[04:50:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[04:56:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:57:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[05:00:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:03:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[05:08:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 183 seconds
[05:10:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[05:13:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[05:15:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[05:28:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[05:29:15] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 3 seconds
[05:33:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 183 seconds
[05:35:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[05:49:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 225 seconds
[05:50:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[05:57:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:58:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[05:59:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 seconds
[05:59:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[06:08:14] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[06:08:54] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[06:08:54] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[06:08:54] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[06:10:14] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[06:18:18] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[06:20:14] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[06:28:15] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[06:30:14] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[06:33:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 184 seconds
[06:35:09] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 4 seconds
[07:10:31] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[07:55:23] lo
[07:57:04] New patchset: Legoktm; "Convert logbot to use ircbot.SingleServerIRCBot for auto-reconnection." [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/60240
[08:01:18] New patchset: Legoktm; "Convert logbot to use ircbot.SingleServerIRCBot for auto-reconnection." [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/60240
[08:10:09] New review: MZMcBride; "Looks good to me." [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59371
[08:16:44] :D
[08:17:18] Susan: If it looks good, why did't you +1?
[08:17:20] yet another repo I wasn't aware of
[08:44:36] New review: Hashar; "What about the production entries?" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/60231
[10:13:22] New review: Faidon; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60187
[11:14:24] New patchset: Mark Bergsma; "Rename dysprosium's backend to -sda and -sdb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60253
[11:15:17] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60253
[12:15:08] New patchset: Mark Bergsma; "Double the backend weight on dysprosium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60257
[12:16:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60257
[12:21:03] New patchset: Mark Bergsma; "Make weights for esams fe -> eqiad be and eqiad fr -> eqiad be equal" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60258
[12:21:36] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60258
[12:36:49] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[12:36:49] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[12:57:22] New patchset: Mark Bergsma; "Set dysprosium backend load at 4x" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60260
[12:57:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60260
[13:03:39] PROBLEM - Apache HTTP on mw1159 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:03:59] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out
[13:04:09] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out
[13:04:29] PROBLEM - RAID on ms-be6 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:04:30] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out
[13:04:30] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out
[13:04:30] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out
[13:04:30] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out
[13:04:39] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection timed out
[13:05:20] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection timed out
[13:05:30] Hi
[13:05:38] Is there a way of doing a status check
[13:05:40] ?
[13:05:46] I'm getting some unexpected 503 erorrs
[13:05:50] mark: I think that's you
[13:06:05] http://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/Goody_Two-Shoes_%281881%29.djvu/page81-1410px-Goody_Two-Shoes_%281881%29.djvu.jpg
[13:06:07] 503
[13:06:09] swift req/s have dived
[13:06:15] dived?
[13:06:20] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+frontend+proxies
[13:06:55] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&tab=v&vn=swift+backend+storage
[13:07:08] backends are melting
[13:07:26] io wait through the roof
[13:07:29] New patchset: Mark Bergsma; "Revert "Double the backend weight on dysprosium"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60261
[13:07:40] Good afternoon?
[13:07:49] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60261
[13:07:52] I'm based in the UK
[13:07:58] New patchset: Mark Bergsma; "Revert "Set dysprosium backend load at 4x"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60262
[13:08:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60262
[13:09:20] apergos: do you recall off the top of your head which ones are c2100s/r720xd + h310/r720xd + h710?
[13:09:35] Qcoder00: it's better to ask in #wikimedia-tech where there isn't operational stuff going on
[13:09:44] I don't know which ones now have the new ssds and controllers, no
[13:10:00] ms be 2, 4 and 1? maybe are the three left that are c2100s
[13:10:04] one of them shoul dcome out today
[13:10:19] ok, I'll have a look to know for sure
[13:10:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:10:39] that's probaby best
[13:10:43] *probably
[13:11:11] there's a big disparity between the load of some of them vs. the others, I'd like to make sure that's the h310 and not some other thing
[13:11:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[13:12:10] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.297 second response time
[13:12:21] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.045 second response time
[13:12:21] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.048 second response time
[13:12:21] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[13:12:26] ms-be2 & ms-be9 are depooled, ms-be12 is 66%, right?
[13:12:30] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.045 second response time
[13:12:31] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 67996 bytes in 0.152 second response time
[13:12:40] New patchset: Odder; "(bug 44308) Add new namespaces and aliases on zhwikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60263
[13:12:50] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.043 second response time
[13:13:30] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.045 second response time
[13:14:43] apergos: where's ms-be1?
[13:15:20] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[13:15:32] what do you mean, where is it?
[13:15:49] down for 11 days, still pooled...
[13:15:58] wtf
[13:16:18] well it's not scheduled to be down. how did I not see a notice about it?
[13:16:27] ms-be9 is down for 10 days, depooled
[13:16:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:16:35] yes, ms-be9 is expected.
[13:16:42] ms-be2 will be in that state later today.
[13:16:52] 10 days?
[13:16:52] ms-be1 is not due for that
[13:16:52] yes. more than 10
[13:17:16] why that long?
[13:17:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.167 second response time
[13:17:21] why is this? because dell wants there servers. yesterday. and the last four can come out without them waiting for us to put the new ones back in
[13:17:31] because if they wait it will take twice as long.
[13:17:36] *their servers
[13:17:42] gah no good typing today
[13:18:13] but I would like to know what happened to ms-be1, I will ask chris if he knows anything
[13:18:20] we have 25% of the cluster depooled or down, plus another server at 66%
[13:18:32] not very reasonable load either
[13:19:13] the one at 66 can and should go to 100 before anything else happens
[13:19:37] chris?
[13:19:41] but we only have one more server left if that's the case (if ms-be1 has been gone that long, its partitions are already replicated elsehwere now)
[13:19:46] did you attempt to powercycle and failed?
[13:19:47] or should I?
[13:19:52] no, I haven't done anything yet
[13:20:03] I don't want to bring it back up yet til I find out what's going on
[13:20:11] since it's scheduled to be replaced
[13:20:35] I would just as soon let it stay down and put the new one in, and get th old one out of here
[13:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[13:24:01] can we restore the cluster to a healthy state asap? :)
[13:25:26] unfortunately nothing can be done on this cluster asap, it's all tediously slow. I can ask for a replacement to be racked for ms-be2 or 1 (since 1 is down) but it will take several days regardless for traffic to migrate to it
[13:26:22] I bet ms-be11 and 12 have the new controller and ssds
[13:26:40] why would you not have a replacement box racked when it's down anyway?
[13:26:59] because steve might be doing other things
[13:27:10] oh for christ sake
[13:27:28] what?
[13:27:50] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[13:27:56] I don't know if there is something ready to go for ms-be2 right now or not, I haven't been coordinating it
[13:28:03] that's whay I would have asked chris
[13:28:32] you're the one doing this replacement isn't it?
[13:28:51] well tbh chris is the one getting the heat from dell
[13:29:05] not steve, not me
[13:30:05] so, we currently have
[13:31:08] 3 boxes depooled or down, 6 boxes with a H310 which can't take much load, 1 being a C2100 and 2 with H710 which are the only ones sane, but one of them is at 66%
[13:32:20] I suggest: a) increase 66% to 100% now, b) replace ms-be1 now and put ms-be1/2/9 with H710s back in the cluster
[13:33:24] are you doing (a) or should I?
[13:33:42] and let's ping steve when he joins irc
[13:34:06] we should ping chris when he joins, he is the one coordinating the racking and replacement
[13:35:25] I'm not sure if you realize, we just had swift melt a few moments ago for a slight increase of traffic
[13:36:00] if you increase from 66 to 100 now instead of waiting for the new box to be racked later today, we are not going to gain much. it takes several days for that data to move around
[13:37:11] no, I was not watching here, and irritatingly I did not hear my phone
[13:37:43] New patchset: Krinkle; "Pester IRC as well when a draft is published" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50044
[13:37:53] I'm not sure what will happen if a box fails for a random reason now
[13:38:15] let's make something to fix this asap
[13:38:27] s/make/do/ even
[13:39:31] and boy, it's just amazing how crappy the h310s are
[13:39:59] well, the quickest (which means however that dell will wait longer for their servers) is to bring ms-be1 back on line as it is. it will be somewht out of sync, but it will have most of the data
[13:40:16] even c2100s are so much better
[13:40:17] there's literally nothing else "quick" that can be done
[13:40:42] yeah, I am seeing the graphs for the boxes with the h310s
[13:40:44] astoundingly bad
[13:44:32] I don't understand why you decided to remove c2100s without replacing them
[13:44:50] i.e. why we have ms-be9 down for 10 days
[13:44:58] (or more as you said)
[13:46:00] because in the past this cluster had a lot of head room, and dell ha been getting very pushy about getting their gear back, and putting new servers back in as the old one comes out makes everything take twice as long
[13:47:23] it's definitely my bad tha I didn't see ms-be1 out, though. still don't know how that happened.
[13:47:36] hmm
[13:47:37] ipmitool> chassis power status
[13:47:37] Error: Unable to establish LAN session
[13:47:37] Unable to get Chassis Power Status
[13:47:47] that's after
[13:47:49] root@sockpuppet:~# ipmtool -U root -H ms-be1 shell
[13:47:55] ganglia is all red with the h310s having an increased load, not sure where you see the headroom
[13:48:01] ms-be1.mgmt, not ms-be1
[13:48:06] grrr
[13:48:28] apparently there isn't any now
[13:48:44] ?
[13:48:45] time passes, spare capacity gets used
[13:55:22] RECOVERY - Host ms-be1 is UP: PING OK - Packet loss = 0%, RTA = 26.78 ms
[14:01:12] apergos: the new 720 (ms-be2 replacement) should already be on the rack. Once Steve gets in to the DC he will set it up
[14:01:19] ok
[14:01:24] I see you are doing the backread
[14:01:27] yes
[14:01:50] so it looks like the hope that we could pull out the remaining 2 boxes without replacing as we go, is dead in the water
[14:02:08] dell will not like it but in fact their h310s and their c2100s are the reason we are in this fix
[14:02:34] you might remind them of the h310s :-P
[14:03:33] okay...not worried about Dell
[14:03:48] in the meantime however, ms-be2 is ready to be powered off
[14:03:56] the only bright spot of new in the whole thing
[14:04:00] *news
[14:04:46] ok...good
[14:09:25] New patchset: Mark Bergsma; "Set dysprosium backend weight to 6x" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60273
[14:10:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60273
[14:10:29] i'm gonna push the limit again
[14:10:48] You, could someone check https://gerrit.wikimedia.org/r/#/c/59969/2 out pretty please?
[14:10:57] s/You/Yo/
[14:12:24] Coren: I saw it, I don't have any objections -- whitespace is weird with 4 spaces AND tabs though
[14:12:48] paravoid: Ah, right, my default vim settings for C.
[14:13:05] Coren: but you should probably get a review from someone who knows a bit more about tool labs
[14:13:07] paravoid: That shouldn't be hard to fix.
[14:13:09] ryan for example :)
[14:16:13] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours
[14:18:34] PROBLEM - Apache HTTP on mw1155 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:18:34] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:18:39] argh
[14:19:23] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:19:47] New patchset: Mark Bergsma; "Set dysprosium backend load at 3x" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60275
[14:19:58] perhaps today is a bad day to work :P
[14:20:06] heh
[14:20:13] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[14:20:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60275
[14:20:23] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.045 second response time
[14:20:24] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time
[14:20:48] this isn't the first time you've done this test though, is it?
[14:21:09] I think ms-be1/2 11/10 days ago must have been the tipping point
[14:22:32] it is
[14:22:41] before was frontend
[14:22:42] now it's backend
[14:22:48] so any misses hit swift directly
[14:23:02] oh
[14:26:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:28:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[14:43:07] New patchset: Reedy; "Debugging for EducationProgram" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60279
[14:43:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:44:05] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60279
[14:44:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[14:44:38] !log reedy synchronized wmf-config/InitialiseSettings.php
[14:44:46] Logged the message, Master
[14:51:53] anyone could look into change 50044?
[14:52:02] been lurking there for a time now
[14:53:05] ori-l: ping
[14:53:51] it looks fine, but I can't actually merge it, since I'm not ops
[14:53:55] ah
[14:54:25] I'll poke someone later on today (PST) if nobody else picks it up
[14:54:25] always difficult to know whom exactly are ops
[14:55:05] ...
[14:55:17] New patchset: Cmjohnson; "Updating dhcpd files for ms-be9" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60280
[14:55:44] https://meta.wikimedia.org/wiki/Sysadmins
[14:55:48] Slightly out of date though
[14:56:02] ah
[14:56:05] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60280
[14:56:06] New review: MZMcBride; "What happened here? This changeset seems to have been approved, but never merged." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8438
[14:57:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:57:49] where does is show that it was approved?
[14:58:04] apergos: "Patch Set 1: Looks good to me, approved"
[14:58:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[14:58:36] unless tim typed that by hand just to fnuck with everyone after a year
[14:58:58] dunno, but I do see that it needs
[14:59:00] well everything
[14:59:11] verified, code review *and* rebased
[14:59:42] I don't know what it looked like a year ago, but surely if it had been +2 then, it would show up in the chart
[15:00:15] although I guess it was verified and that doesn't show up either
[15:00:30] apergos: per comment, tim aprooved it back then
[15:00:46] New review: Reedy; "Certainly won't merge cleanly now." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8438
[15:00:46] yeah I see his comment
[15:01:59] I wonder if deomon oughta look at that (but what he's going to be able to say after a year of gerrit hiccups, I dunno)
[15:01:59] New review: Jeremyb; "Not sure why it's not merged yet (and I barely even remember the history...)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/8438
[15:02:56] apergos: or we (you) could just rebase/review/verify it and let it go to the mist of history
[15:02:58] ツ
[15:03:41] I could rebase it and let it be verified but I would not review it right this sec
[15:04:17] apergos: so, what's the update for swift now?
[15:04:28] ms-be1 is up
[15:04:35] apergos: if you have time, could you review https://gerrit.wikimedia.org/r/#/c/50044/ ?
[15:04:40] I saw that, you changed your mind it seems :)
[15:04:47] Reedy: AzaToth: manifests/admins.pp is better usually but also not necessarily up to date
[15:04:52] huh?
[15:05:10] no, you wanted (justfiably so) something quick and that's the only thing that coul dbe done quickly
[15:05:40] ms-be9 and ms-be2 will be ready to start going back in tongiht or early tomorrow but that is a process which will take 2 weeks
[15:05:54] anyways that is the status,
[15:06:05] jeremyb_: I would assume the admins themself would have a correct list of all admins ツ
[15:06:43] once those are back in and of course ms-be12 to 100 then it should be stable for me to pull out another c2100
[15:06:50] that's the plan
[15:06:57] AzaToth: well there's the private puppet repo's root authorized_keys
[15:07:23] true
[15:07:28] AzaToth: and manifests/admins.pp and the list of people that have the root password
[15:07:38] there's not really any other lists i think
[15:07:43] you have root passwords?
[15:07:52] it hasn't been too long since the password was changed
[15:08:07] AzaToth: i assume it's just for use on serial console...
[15:08:15] oh
[15:09:15] and I assume it's a 128+ character randon string? ツ
[15:09:41] taking a week to input manually
[15:10:31] i think not. i think it's something that could be typed in less than a minute. (otherwise what's the point?)
[15:10:50] (really just speculation...)
[15:11:42] Anybody aware of issues with loading stuff from bits.wikimedia.org on specific user pages on Commons in Firefox? I can reproduce, and it's weird.
[15:12:13] * andre__ probably better off writing an email to ops@
[15:12:58] * apergos wonders who is on rt now anyways
[15:15:21] anyway, an reivew of 50044 would be perfect
[15:27:42] AzaToth: errrr, you're not looking for a wikimedia root. and not asking in the right place
[15:27:49] AzaToth: try #mediawiki-i18n
[15:36:13] mark, yt?
[15:39:04] yes
[15:40:18] <^demon|busy> jeremyb_: I want https://gerrit.wikimedia.org/r/#/c/8120/ off my review list. Can I abandon?
[15:41:22] ugh... i need to not do much wikimedia stuff for a couple days. need to get other stuff done before puppet camp!
[15:41:40] oh, that one
[15:42:35] ^demon|busy: i'll look at it tomorrow night
[15:42:40] <^demon|busy> Okie dokie.
[15:44:00] mark, I'd like to continue with new caching rollout - when you'll be available?
[15:44:16] isn't that scheduled for tomorrow?
[15:45:53] so I'm asking if you'll be there, or we need to find another time:)
[15:46:04] I can be there, although the window is a bit late for me ;)
[15:47:11] mark, we can do it outside of PST business houurs
[15:47:22] that works too
[15:47:24] e.g. now:P
[15:47:28] fine with me
[15:48:30] rolling out everywhere?
[15:48:34] yes
[15:48:47] yay
[15:48:49] now would be excellent
[15:48:53] then maybe I can go to the datacenter tomorrow :P
[15:49:01] only this rollout was in the way
[15:49:26] New patchset: Jgreen; "drush consistent-user and lockdown scheme" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60285
[15:50:07] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60285
[15:50:15] MaxSem: ready in 5 mins
[15:50:42] New patchset: MaxSem; "Enable $wgMFVaryResources on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60286
[15:50:53] let's start with this ^^^ it's about 50% of mobile traffic
[15:51:50] hehe
[15:51:50] ok
[15:52:30] if it works, I'll flip the rest shortly
[15:52:44] i'm ready
[15:54:28] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60286
[15:54:33] New patchset: Ori.livneh; "Create self-standing IPython Notebook Puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60187
[15:54:33] New patchset: Ori.livneh; "Use Upstart rather than supervisor to manage IPython" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60094
[15:54:44] paravoid: you put me up to it, it's all your fault :P
[15:54:52] :)
[15:55:02] I wasn't sure if it's a good idea
[15:55:07] I'm happy to make the call either way tbh
[15:55:17] for you to make the call I mean
[15:55:59] well, i thought it was worth a try, so i gave it a shot, and i like what i ended up with
[15:56:31] it's a big change from PS3, though, so there may be new issues. but i don't mind making additional patchsets/fixes if you don't mind reviewing.
[15:56:52] !log maxsem synchronized wmf-config/InitialiseSettings.php '$wgMFVaryResources on enwiki, https://gerrit.wikimedia.org/r/#/c/60286/'
[15:57:00] Logged the message, Master
[15:58:21] PROBLEM - DPKG on analytics1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:22] PROBLEM - DPKG on analytics1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:22] PROBLEM - DPKG on analytics1020 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:31] PROBLEM - DPKG on analytics1026 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:31] PROBLEM - DPKG on analytics1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:31] PROBLEM - DPKG on analytics1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:58:39] mark, deployed ^^^ headers look fine
[15:58:57] yes
[15:59:11] 6434.69 RxHeader Vary: Accept-Encoding,X-Device,Cookie,X-Carrier,X-Subdomain,X-Images
[15:59:12] 3522.08 RxHeader Vary: Accept-Encoding,X-WAP,Cookie,X-Carrier,X-Subdomain,X-Images
[15:59:51] cache hit rate going up :)
[16:00:04] 55% on cp1042 backend now, up from about 38-40%
[16:00:09] eek
[16:01:31] RECOVERY - DPKG on analytics1026 is OK: All packages OK
[16:01:34] huh
[16:01:37] what an odd pattern: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Mobile+caches+eqiad&h=cp1042.eqiad.wmnet&v=214235900&m=varnish.cache_hit&jr=&js=&vl=N%2Fs&ti=Cache+hits
[16:02:05] !log reedy synchronized php-1.22wmf2/extensions/Wikibase
[16:02:13] Logged the message, Master
[16:02:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:02:51] i wonder if that's the 300s frontend cache ttl cap
[16:03:05] we may want to remove that soon, it seems pointless to me anyway
[16:03:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[16:04:23] why is it only on cp1042?
[16:05:04] good question
[16:05:37] the client request graph looks similar
[16:05:53] the graphs on other mobile caches are also weird, in a different way for each of them
[16:06:04] yes
[16:06:39] anyway, feel free to continue
[16:06:43] I can debug this later
[16:07:39] New patchset: MaxSem; "Enable $wgMFVaryResources everywhere" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60288
[16:08:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60288
[16:08:19] RECOVERY - DPKG on analytics1011 is OK: All packages OK
[16:08:28] RECOVERY - DPKG on analytics1017 is OK: All packages OK
[16:09:08] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[16:09:09] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[16:09:09] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[16:10:28] !log maxsem synchronized wmf-config/InitialiseSettings.php 'New caching everywhere'
[16:10:35] Logged the message, Master
[16:10:36] here we go
[16:10:43] X-Wap vary header now on top
[16:12:08] * MaxSem wonders when it will be visible in ganglia
[16:12:18] might take a while
[16:13:42] the Cookie header could just as well be killing the caching
[16:13:49] New patchset: Jgreen; "fundraising/drupal sudoers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60289
[16:14:07] New patchset: Ottomata; "Putting milimetric back in analytics icinga contact group (I have added him him in private repo)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60290
[16:14:33] Change merged: Ottomata; [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/60290 [16:15:23] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60289 [16:15:59] there are still a few Vary headers with X-Device coming in [16:15:59] mhm, a quick hack would be to replace alpha/beta cookies with a header and vary by it [16:17:14] might be from sites with no mobile redirection configured, e.g. commons [16:18:37] it's mostly sessions [16:18:48] most of the other crap gets filtered out by the frontend VCL [16:23:46] what's our hit rate? [16:23:55] essentially unchanged [16:24:57] still around 55% [16:24:58] ? [16:25:11] it goes between 38-55% on that box [16:25:14] as if the change didn't happen [16:26:39] of course the entire cache needs to be replaced for this to really help [16:28:58] all right, we will wait then [16:29:03] thanks mark [16:29:08] thank you as well [16:29:15] we'll see how much this does [16:29:20] and we'll improve on it further if needed [16:30:09] the variance on the Cookie header is really all sessions [16:31:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:32:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [16:36:12] New review: Andrew Bogott; "Thanks for the cleanup!" 
[operations/debs/adminbot] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/59371 [16:36:12] Change merged: Andrew Bogott; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59371 [16:36:54] New review: Andrew Bogott; "I will merge and package this as soon as legoktm tells me "I have tried this and it works!"" [operations/debs/adminbot] (master) C: 1; - https://gerrit.wikimedia.org/r/60240 [16:37:09] !log added abaso to wmf-deployment group on gerrit (new mediawiki deployer) [16:37:16] Logged the message, Master [16:38:43] hmm some app server didn't get the update [16:39:40] hahaha [16:39:42] If you find yourself frequently using a custom format string and don't want to [16:39:42] specify it every run, just modify the default format string in config.h and [16:39:43] recompile httpry. [16:39:46] thank you, that's helpful [16:42:53] !log mark synchronized wmf-config/InitialiseSettings.php [16:43:01] Logged the message, Master [16:43:16] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 190 seconds [16:45:16] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 11 seconds [16:48:57] ah, I am officially off rt, yay [16:49:14] hehe [16:50:43] and i'm on and already restarting irc bots :p [16:50:45] !log restarting wikibugs on mchenry [16:50:51] Logged the message, Master [16:50:58] and in the bot moving discussion ,hah:) [16:52:04] Change abandoned: Demon; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/42791 [17:01:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:02:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [17:09:22] * AaronSchulz looks around fluorine [17:09:24] grep -P 'Too many connections' dberror.log | grep -P '\w{3} \d{1,2} \d\d:\d\d' -o | uniq -c [17:09:35] notpeter: I wonder what that spike was about [17:11:13] PROBLEM - 
Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:13:05] hiii pavaroid, you around? got time to talk about kafka deb? [17:14:59] which spike? [17:15:57] apergos: did you run the same thing AaronSchulz ran? [17:17:11] what did Aaron run? [17:17:15] (I doubt it) [17:17:18] 16003 Apr 22 14:38 [17:17:53] 22 17:09:24 < AaronSchulz> grep -P 'Too many connections' dberror.log | grep -P '\w{3} \d{1,2} \d\d:\d\d' -o | uniq -c [17:18:03] I'm runniing a long query, read only, against a db in tampa. that's all [17:18:12] one thread. [17:19:11] that's pretty spikey [17:20:22] apergos: ah, that's why there's so much data going out of pmtpa mysql node :) [17:20:55] let me guess... he's dumping wikidata logs? [17:21:16] yeah sorry notpeter if you were wondering [17:21:25] no worries :) [17:21:34] so the live dumps is against db55 [17:21:41] but I've been doing testing against db73 [17:21:53] trying with a flati file instead of gzip, or no php [17:21:58] and it's all the same result: [17:22:04] New patchset: Aude; "enable data transclusion and site link widget for enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60292 [17:22:09] query starts speedily enough and then over time slows to a crawl etf [17:22:13] *wtf [17:22:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:23:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [17:24:40] New review: Aude; "while nothing bad would happen if this was deployed before enwiki switchover to wmf/1.22wmf2, I pref..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60292 [17:26:45] mutante: hey, since you're on RT duty..... ;) https://rt.wikimedia.org/Ticket/Display.html?id=4991 [17:27:29] and so it begins... [17:28:26] greg-g: you already have the editor flag though? 
or are you saying you need admin to edit sidebar [17:28:46] mutante: should be contentadmin not admin proabably [17:28:46] wikitech has an editor flag thing, basic account is read-only [17:28:55] probably* [17:29:09] whatever you decide that allows me to edit https://wikitech.wikimedia.org/wiki/MediaWiki:Sidebar (which I can't right now) ;) [17:29:13] <^demon|busy> Basic account isn't read-only since the merge with labsconsole. [17:29:21] <^demon|busy> jeremyb_ is right about contentadmin vs admin. [17:29:49] oh, so true, i was still thinking "old wikitech" [17:29:54] :) [17:29:55] thx [17:30:39] ^demon|busy: i take it back... i can't edit that page [17:30:46] as a contentadmin [17:31:04] must be editinterface [17:31:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [17:32:51] <^demon|busy> Oh, I assumed contentadmin had editinterface. [17:36:26] New patchset: Jforrester; "Deploy VisualEditor alpha opt-in to 14 further Wikipedias" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60293 [17:36:52] New review: Jforrester; "Not before Thursday, please. :-)" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/60293 [17:37:13] !log dist-upgrading some mw machines, should be low/no impact [17:37:20] Logged the message, Mistress of the network gear. [17:37:30] shit, i shouldn't have said that in the log, now the site will explode [17:37:46] :-) [17:38:12] LeslieCarr: Actually, it's worse: now it will explode for completely unrelated reasons but there will always remain a lingering suspicion that it was your upgrade. :-) [17:38:17] um… !newlog terror! fire falling from the skies! 
[17:38:20] :) [17:38:26] New review: Andrew Bogott; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59969 [17:38:57] AaronSchulz: [17:39:00] if I had to guess [17:39:03] I could say that those errors [17:39:09] are somehting being changed on wikidata [17:39:16] and then that kcking off a shitload of reparse jobs [17:39:30] and then those all hammering on the s5 slaves [17:40:11] greg-g: what's your wiki name again? [17:41:32] mutante: "Greg Grossmeier" [17:42:05] yeah, cn: Greg Grossmeier [17:42:28] `ldaplist -l passwd gjg` [17:42:54] AaronSchulz: although gdash doesn't show crazy spike [17:42:59] although might have been very short [17:43:15] anyway, something wanted to make a bazillion connections for wikidata [17:43:20] greg-g: https://wikitech.wikimedia.org/w/index.php?title=Special%3ALog&type=rights&user=&page=&year=&month=-1&hide_patrol_log=1 how about now [17:43:20] jobrunner is my guess [17:43:29] the jq log is indeed unremarkable there [17:46:31] mutante: works, thanks! [17:47:00] :) [17:47:17] New review: Dzahn; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [17:48:20] notpeter: the "no working slave" errors disturb me, since they are user visible [17:48:21] New review: Dzahn; "we should add the start script:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [17:49:29] AaronSchulz: Slavery was abolished in 1865 [17:52:16] Reedy: http://en.wikipedia.org/wiki/Prison%E2%80%93industrial_complex [17:52:48] AaronSchulz: yeah, our way of dealing with lagged slaves is problematic [17:53:03] as it's totally fine... except during high load... :/ [17:55:21] notpeter: well, in the all slaves lagged case, we just serve stale data and go into read-only mode [17:55:26] New review: Demon; "If we're using the puppetized ircecho, shouldn't that not be necessary? We'd be able to do `service ..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [17:55:37] the error I'm talking about is when non are available for connection [17:55:52] there is no easy way to handle that case ;) [17:56:25] AaronSchulz: fair enough [17:56:26] Reedy: not 1863? [17:56:52] Near enough [17:57:13] ok :) [18:03:56] notpeter: no profiling indicates wikidata spikes around then either [18:04:24] hrm, ok [18:04:46] then perhaps my wild guessing is off :) [18:05:03] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: enwiki to 1.22wmf2 [18:05:09] Logged the message, Master [18:05:21] New review: Lwelling; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60043 [18:05:36] ^demon|busy: what about the -feed vs. -dev thing ?:p [18:05:51] thinks they should now both join -dev, right [18:06:14] * AaronSchulz watches wikidatawiki flood exception.log [18:06:19] i'll merge your change [18:07:00] but..let's just have both in one or the other [18:07:31] mutante: could I ask for an review of https://gerrit.wikimedia.org/r/#/c/50044/? [18:07:34] AaronSchulz: Ouch [18:07:51] <^demon|busy> mutante: Yes, -dev. [18:07:54] huh? [18:10:36] AzaToth: looks ok, but since it's gerrit stuff i'd like to see a +1 by ^demon|busy [18:11:00] +1ed [18:11:07] PROBLEM - DPKG on mw10 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:11:47] PROBLEM - DPKG on mw1060 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:11:55] that's me, fyi [18:12:07] kthx [18:12:13] <^demon|busy> mutante: Lemme re-review that. 
[18:13:34] AaronSchulz: Apparently they stopped [18:14:18] PROBLEM - Apache HTTP on mw1060 is CRITICAL: Connection refused [18:14:47] RECOVERY - DPKG on mw1060 is OK: All packages OK [18:15:07] RECOVERY - DPKG on mw10 is OK: All packages OK [18:16:18] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [18:16:34] New review: Demon; "Hook itself is fine, but needs to be added to manifests/gerrit.pp like the other hook files." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/50044 [18:19:38] !log rebooting nonresponsive sq33 [18:19:45] Logged the message, Mistress of the network gear. [18:20:17] PROBLEM - Apache HTTP on mw114 is CRITICAL: Connection refused [18:20:18] PROBLEM - DPKG on mw1124 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:18] PROBLEM - DPKG on mw1121 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:18] PROBLEM - DPKG on mw11 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:27] PROBLEM - DPKG on mw1128 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:37] PROBLEM - Apache HTTP on mw111 is CRITICAL: Connection refused [18:20:37] PROBLEM - Apache HTTP on mw113 is CRITICAL: Connection refused [18:20:37] PROBLEM - DPKG on mw1105 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:37] PROBLEM - DPKG on mw1107 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:20:39] all me [18:20:57] PROBLEM - DPKG on mw1102 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:07] PROBLEM - Apache HTTP on mw1132 is CRITICAL: Connection refused [18:21:07] PROBLEM - Apache HTTP on mw1144 is CRITICAL: Connection refused [18:21:07] PROBLEM - Apache HTTP on mw1134 is CRITICAL: Connection refused [18:21:07] PROBLEM - Apache HTTP on mw1130 is CRITICAL: Connection refused [18:21:07] PROBLEM - Apache HTTP on mw1122 is CRITICAL: Connection refused [18:21:08] PROBLEM - DPKG on mw1106 is CRITICAL: DPKG 
CRITICAL dpkg reports broken packages [18:21:08] PROBLEM - DPKG on mw1104 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:17] PROBLEM - Apache HTTP on mw1120 is CRITICAL: Connection refused [18:21:17] PROBLEM - Apache HTTP on mw1143 is CRITICAL: Connection refused [18:21:18] PROBLEM - Apache HTTP on mw1149 is CRITICAL: Connection refused [18:21:18] PROBLEM - Apache HTTP on mw1133 is CRITICAL: Connection refused [18:21:18] PROBLEM - DPKG on mw1100 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:18] PROBLEM - DPKG on mw1103 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:18] PROBLEM - Apache HTTP on mw1141 is CRITICAL: Connection refused [18:21:19] PROBLEM - DPKG on mw1101 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:19] RECOVERY - DPKG on mw1124 is OK: All packages OK [18:21:20] PROBLEM - Apache HTTP on mw1113 is CRITICAL: Connection refused [18:21:20] PROBLEM - Apache HTTP on mw1119 is CRITICAL: Connection refused [18:21:21] PROBLEM - Apache HTTP on mw1112 is CRITICAL: Connection refused [18:21:21] PROBLEM - Apache HTTP on mw1123 is CRITICAL: Connection refused [18:21:22] PROBLEM - Apache HTTP on mw1121 is CRITICAL: Connection refused [18:21:22] PROBLEM - Apache HTTP on mw1138 is CRITICAL: Connection refused [18:21:23] PROBLEM - Apache HTTP on mw1139 is CRITICAL: Connection refused [18:21:23] RECOVERY - DPKG on mw1121 is OK: All packages OK [18:21:24] PROBLEM - Apache HTTP on mw1142 is CRITICAL: Connection refused [18:21:24] PROBLEM - Apache HTTP on mw1140 is CRITICAL: Connection refused [18:21:25] PROBLEM - DPKG on mw1109 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:27] PROBLEM - Apache HTTP on mw1146 is CRITICAL: Connection refused [18:21:27] RECOVERY - DPKG on mw1128 is OK: All packages OK [18:21:27] PROBLEM - Apache HTTP on mw1131 is CRITICAL: Connection refused [18:21:28] PROBLEM - Apache HTTP on mw1135 is CRITICAL: Connection refused [18:21:28] PROBLEM - Apache HTTP on mw1137 is 
CRITICAL: Connection refused [18:21:28] PROBLEM - DPKG on mw1108 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:21:29] ^demon|busy: so i got a rebasing/path conflict on the wikibugs change, i am looking at manual rebase fix. do you know if "$ircecho_logbase =" is something you removed on purpose? [18:21:37] PROBLEM - Apache HTTP on mw1111 is CRITICAL: Connection refused [18:21:37] RECOVERY - Apache HTTP on mw111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.118 second response time [18:21:37] RECOVERY - Apache HTTP on mw113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.135 second response time [18:21:47] PROBLEM - Apache HTTP on mw1148 is CRITICAL: Connection refused [18:21:47] PROBLEM - Apache HTTP on mw1125 is CRITICAL: Connection refused [18:21:57] PROBLEM - Apache HTTP on mw1116 is CRITICAL: Connection refused [18:21:57] PROBLEM - Apache HTTP on mw1145 is CRITICAL: Connection refused [18:21:57] PROBLEM - Apache HTTP on mw1147 is CRITICAL: Connection refused [18:21:57] PROBLEM - Apache HTTP on mw1129 is CRITICAL: Connection refused [18:22:07] PROBLEM - Apache HTTP on mw1124 is CRITICAL: Connection refused [18:22:07] PROBLEM - Apache HTTP on mw1110 is CRITICAL: Connection refused [18:22:18] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.089 second response time [18:22:18] RECOVERY - Apache HTTP on mw1120 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.130 second response time [18:22:18] RECOVERY - Apache HTTP on mw1143 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.132 second response time [18:22:18] RECOVERY - Apache HTTP on mw114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.154 second response time [18:22:18] RECOVERY - Apache HTTP on mw1141 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.134 second response time [18:22:18] RECOVERY - Apache HTTP on mw1133 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 747 bytes in 1.482 second response time [18:22:18] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time [18:22:19] RECOVERY - Apache HTTP on mw1119 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [18:22:19] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time [18:22:20] RECOVERY - Apache HTTP on mw1138 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time [18:22:20] RECOVERY - Apache HTTP on mw1139 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [18:22:21] RECOVERY - Apache HTTP on mw1121 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.121 second response time [18:22:21] RECOVERY - Apache HTTP on mw1123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.127 second response time [18:22:22] RECOVERY - Apache HTTP on mw1140 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [18:22:22] RECOVERY - Apache HTTP on mw1142 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.086 second response time [18:22:28] RECOVERY - Apache HTTP on mw1146 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.083 second response time [18:22:28] RECOVERY - Apache HTTP on mw1135 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.065 second response time [18:22:28] RECOVERY - Apache HTTP on mw1137 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time [18:22:28] RECOVERY - Apache HTTP on mw1131 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.089 second response time [18:22:37] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.074 second response time [18:22:47] RECOVERY - Apache HTTP on mw1125 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 747 bytes in 0.059 second response time [18:22:47] RECOVERY - Apache HTTP on mw1148 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time [18:22:57] RECOVERY - Apache HTTP on mw1145 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time [18:22:57] RECOVERY - Apache HTTP on mw1129 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.073 second response time [18:22:58] RECOVERY - Apache HTTP on mw1116 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.084 second response time [18:22:58] RECOVERY - Apache HTTP on mw1147 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.082 second response time [18:23:07] RECOVERY - Apache HTTP on mw1134 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.054 second response time [18:23:08] RECOVERY - Apache HTTP on mw1132 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.069 second response time [18:23:08] RECOVERY - Apache HTTP on mw1130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time [18:23:08] RECOVERY - Apache HTTP on mw1124 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [18:23:08] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [18:23:08] RECOVERY - Apache HTTP on mw1144 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.082 second response time [18:23:08] RECOVERY - DPKG on mw1104 is OK: All packages OK [18:23:09] RECOVERY - Apache HTTP on mw1122 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.569 second response time [18:23:18] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 190 seconds [18:23:27] RECOVERY - DPKG on mw1108 is OK: All packages OK [18:23:36] <^demon|busy> mutante: I don't remember, tbh...I'm not seeing that removal on the patch though... 
[18:23:43] RECOVERY - DPKG on mw1105 is OK: All packages OK [18:23:43] RECOVERY - DPKG on mw1107 is OK: All packages OK [18:23:57] RECOVERY - DPKG on mw1102 is OK: All packages OK [18:24:07] RECOVERY - DPKG on mw1106 is OK: All packages OK [18:24:17] RECOVERY - DPKG on mw1100 is OK: All packages OK [18:24:17] RECOVERY - DPKG on mw1103 is OK: All packages OK [18:24:17] RECOVERY - DPKG on mw1101 is OK: All packages OK [18:24:18] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 30 seconds [18:24:18] RECOVERY - DPKG on mw1109 is OK: All packages OK [18:24:18] RECOVERY - DPKG on mw11 is OK: All packages OK [18:25:28] !log still doing package upgrades on mw* machines [18:25:35] Logged the message, Mistress of the network gear. [18:26:18] New review: Ottomata; "(1 comment)" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [18:26:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:26:28] New patchset: Ottomata; "Initial debian packaging using git-buildpackage" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [18:27:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [18:27:47] PROBLEM - DPKG on mw24 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:27:57] PROBLEM - DPKG on mw26 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:18] PROBLEM - DPKG on mw28 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:18] PROBLEM - DPKG on mw23 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:18] PROBLEM - DPKG on mw29 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:25] New review: Ottomata; "Just in case you don't see the inline comments:" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [18:28:27] PROBLEM - DPKG on mw25 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:28] PROBLEM - DPKG on 
mw21 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:28] PROBLEM - DPKG on mw22 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:28] PROBLEM - DPKG on mw20 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:29:17] RECOVERY - DPKG on mw28 is OK: All packages OK [18:29:18] RECOVERY - DPKG on mw23 is OK: All packages OK [18:29:27] RECOVERY - DPKG on mw25 is OK: All packages OK [18:29:28] RECOVERY - DPKG on mw21 is OK: All packages OK [18:29:28] RECOVERY - DPKG on mw22 is OK: All packages OK [18:29:28] RECOVERY - DPKG on mw20 is OK: All packages OK [18:29:47] PROBLEM - Apache HTTP on mw22 is CRITICAL: Connection refused [18:29:48] RECOVERY - DPKG on mw24 is OK: All packages OK [18:29:57] PROBLEM - Apache HTTP on mw24 is CRITICAL: Connection refused [18:29:57] PROBLEM - Apache HTTP on mw28 is CRITICAL: Connection refused [18:29:57] RECOVERY - DPKG on mw26 is OK: All packages OK [18:30:07] PROBLEM - Apache HTTP on mw23 is CRITICAL: Connection refused [18:30:08] PROBLEM - Apache HTTP on mw29 is CRITICAL: Connection refused [18:30:18] PROBLEM - Apache HTTP on mw21 is CRITICAL: Connection refused [18:30:18] RECOVERY - DPKG on mw29 is OK: All packages OK [18:30:18] PROBLEM - Apache HTTP on mw26 is CRITICAL: Connection refused [18:30:18] PROBLEM - Apache HTTP on mw20 is CRITICAL: Connection refused [18:30:27] PROBLEM - Apache HTTP on mw25 is CRITICAL: Connection refused [18:30:47] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.133 second response time [18:30:57] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.132 second response time [18:30:57] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.132 second response time [18:31:07] RECOVERY - Apache HTTP on mw23 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.131 second response time [18:31:07] RECOVERY - Apache HTTP on mw29 is OK: HTTP 
OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.117 second response time [18:31:18] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.126 second response time [18:31:18] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.118 second response time [18:31:18] RECOVERY - Apache HTTP on mw20 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.111 second response time [18:31:27] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.112 second response time [18:31:36] New patchset: Dzahn; "Finish puppetizing wikibugs bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [18:31:46] leslie, causer of icinga mayhem! [18:32:17] there's your impact! [18:32:30] * Isarra hugs LeslieCarr. [18:32:51] This made my day. I'm not sure why, but it really did. [18:32:51] eh, site didn't die, so that's ok :) [18:32:54] <^demon|busy> mutante: Actually, I don't think irc_infile is necessary anymore... [18:33:01] ^demon|busy: fixed path conflict, by including all options. note how it now has $ircecho_logbase, and $ircecho_logs and _infile .. 
https://gerrit.wikimedia.org/r/#/c/53973/4/manifests/misc/wikibugs.pp [18:33:18] PROBLEM - DPKG on mw39 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:18] PROBLEM - DPKG on mw34 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:27] PROBLEM - DPKG on mw3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:28] PROBLEM - DPKG on mw36 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:33] the one on multiple lines looks nicer [18:33:37] yeah, don't use infile iirc [18:33:37] <^demon|busy> Yeah, infile is old from before the refactor :) [18:33:37] PROBLEM - DPKG on mw32 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:45] kk [18:33:47] PROBLEM - DPKG on mw35 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:47] PROBLEM - DPKG on mw31 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:57] PROBLEM - DPKG on mw38 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:34:40] <^demon|busy> Yeah, just double checked. So that whole line can just be removed. 
[18:34:47] RECOVERY - DPKG on mw35 is OK: All packages OK [18:34:47] RECOVERY - DPKG on mw31 is OK: All packages OK [18:34:54] cool, doing so [18:34:57] RECOVERY - DPKG on mw38 is OK: All packages OK [18:34:58] <^demon|busy> So the changes to that file are the ensure => directory, s/script/bin/ and s/File/User/ [18:34:58] New patchset: Dzahn; "Finish puppetizing wikibugs bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [18:35:28] RECOVERY - DPKG on mw36 is OK: All packages OK [18:35:37] RECOVERY - DPKG on mw32 is OK: All packages OK [18:35:37] RECOVERY - DPKG on mw39 is OK: All packages OK [18:35:47] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60292 [18:35:51] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53973 [18:36:07] RECOVERY - DPKG on mw34 is OK: All packages OK [18:37:27] RECOVERY - DPKG on mw3 is OK: All packages OK [18:37:41] RobH: sq33 has a pci express error (i'm guessing new mobo needed or some shit) - for those old 1950's are we fixing or decommissioning ? [18:38:02] !log reedy synchronized wmf-config/InitialiseSettings.php [18:38:08] Logged the message, Master [18:38:36] !log changing #mediawiki to #wikimedia-dev in wikibugs startup script (just in case, should not even be needed by puppetized ircecho) [18:38:43] Logged the message, Master [18:38:50] notpeter: What is the Echo thing you needed to talk to me about? [18:38:56] (It says so in the Etherpad) [18:39:01] Is this the cron job thing still? 
[18:39:06] RoanKattouw: there's some kinda maint cron [18:39:09] yeah [18:39:17] mutante: then change it to the wrong value and save but don't boot the bot and see if a puppet run corrects you [18:39:23] it is currently just running on testwiki [18:39:26] OK [18:39:28] PROBLEM - DPKG on mw46 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:28] PROBLEM - DPKG on mw42 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:31] needs to run on whatever echo is running on [18:39:33] So what exactly needs to be done there? [18:39:35] Oh OK [18:39:37] PROBLEM - DPKG on mw4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:38] PROBLEM - DPKG on mw45 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:38] PROBLEM - DPKG on mw48 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:48] So the only thing that needs to happen is it needs to be selective about exactly the right wikis? [18:39:55] I emailed Benny about how to do that [18:39:55] I believe so [18:39:57] PROBLEM - DPKG on mw41 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:40:07] But I guess given what the Etherpad said he's kind of busy right now [18:40:07] PROBLEM - DPKG on mw40 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:40:11] err: /Stage[main]/Base::Standard-packages/Package[httpry]/ensure: change from purged to latest failed: [18:40:12] And I didn't know that going in [18:40:16] E: Couldn't find package httpry [18:40:17] yeah [18:40:23] at /var/lib/git/operations/puppet/manifests/base.pp:349 [18:40:31] Oh hah he did work on it [18:40:33] https://gerrit.wikimedia.org/r/#/c/60036/ [18:40:36] RoanKattouw: how is this handled for other things? [18:40:42] (Also, crap, we made it to 60k?!) 
[18:40:54] notpeter: Promise you won't slap me first [18:40:58] RECOVERY - DPKG on mw41 is OK: All packages OK [18:41:01] and then I'll tell you how it's done for AFT ;) [18:41:08] RECOVERY - DPKG on mw40 is OK: All packages OK [18:41:23] I can make no promises, but you're remote, so it might work out ok ;) [18:41:27] haha [18:41:28] RECOVERY - DPKG on mw46 is OK: All packages OK [18:41:28] RECOVERY - DPKG on mw42 is OK: All packages OK [18:41:30] I'm coming back soon enough [18:41:32] Anyways [18:41:37] RECOVERY - DPKG on mw45 is OK: All packages OK [18:41:38] RECOVERY - DPKG on mw48 is OK: All packages OK [18:41:53] mutante: that maybe a mark thing [18:42:26] notpeter: SSH into fenari as root, run crontab -u catrope -e, and wee [18:42:27] p [18:42:37] mutante: (httpry. idk about dns) [18:42:38] ^demon|busy: Misc::Irc::Wikibugs/Git::Clone[wikibugs]/Exec[git_clone_wikibugs]/returns: change from notrun to 0 failed: git clone https://gerrit.wikimedia.org/r/p/wikimedia/bugzilla/wikibugs.git /var/lib/wikibugs/bin returned 129 instead of one of [0] :/ [18:43:02] RoanKattouw: that seems sub-optimal ;) [18:43:06] ah, lemme create /var/lib/wikibugs [18:43:06] Yes :) [18:43:16] mutante: not by hand! [18:43:18] so yeah, let's kill both of these :) [18:43:20] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60036 [18:43:22] Notice how every wiki has *its own entry* in *my personal crontab* [18:43:31] oh, i noticed :) [18:43:37] RECOVERY - DPKG on mw4 is OK: All packages OK [18:43:37] lets' get that onto terbium? [18:43:38] So I'm now merging Benny's commit that adds an echowikis.dblist file [18:43:45] !log sq33 is going to be decomissioned… long live…. 
nothing else in tampa [18:43:45] ok, cool [18:43:55] jeremyb_: right.ok [18:43:58] So then the cron job can run "foreachwikiindblist /path/to/echowikis.dblist /path/to/script" [18:44:21] And yeah we should use the same strategy to move the AFT stuff to terbium [18:44:22] New patchset: Demon; "Deprecate $name param to systemuser in favor of $title" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60302 [18:44:35] New patchset: Ori.livneh; "Create self-standing IPython Notebook Puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60187 [18:44:36] New patchset: Ori.livneh; "Use Upstart rather than supervisor to manage IPython" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60094 [18:44:59] RoanKattouw: if you write a patchset to do that for aft, I'll deploy on terbium [18:45:10] OK I will do that today at some point [18:45:12] After lunch [18:45:21] So, about Echo [18:45:24] Cause that's more urgent [18:45:33] Do they have a puppetized cron job right now? And if yes, where? [18:45:48] yes [18:46:19] RoanKattouw: misc::maintenance::echo_mail_batch [18:46:32] in manifests/misc/maintenance.pp [18:46:38] Found it [18:47:01] Runs on terbium, right? 
[18:47:58] I'm putting in a change to fix it shortly [18:48:05] First deploying echowiki.dblist onto terbium [18:48:30] !log catrope synchronized echowikis.dblist [18:48:37] Logged the message, Master [18:48:40] RoanKattouw: not sure where it runs at the moment [18:48:50] tim switched a bunch,but a couple haad to be pushed back [18:49:16] New review: Demon; "Need to test this on labs first, so nobody get trigger happy ;-)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60302 [18:49:19] OK well [18:49:26] New patchset: Catrope; "Use echowikis.dblist for the Echo cron job" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60303 [18:49:33] As long as /usr/local/apache/common exists there (which it really damn well should), it'll be fine [18:49:38] I just had to check the path real quick [18:49:41] Anyway there you go ---^^ [18:50:59] notpeter: I gotta run now but if you merge that puppet change of mine up there it should start running on all 3 Echo wikis, and the folks running it can adjust those themselves by changing the dblist file [18:51:05] I'll do the AFT stuff later [18:51:16] But lunch comes first [18:51:18] :) [18:51:36] ( notpeter ---^^ ) [18:52:25] RoanKattouw_away: ok! cool [18:59:50] New review: Lwelling; "(1 comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/60043 [19:02:15] New review: Lwelling; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/60043 [19:10:19] PROBLEM - ircecho_service_running on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [19:11:24] lo [19:16:37] New review: Erosen; "I've regularly use pandas features which depend on python-tz python-xlrd, so those would be nice. P..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116 [19:19:07] New patchset: Reedy; "Enable Extension:Score everywhere" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60309 [19:19:33] I think I'll hijack the rest of my deployment window [19:20:09] New patchset: Reedy; "enwiki to 1.22wmf2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60310 [19:20:21] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60310 [19:21:31] New patchset: AzaToth; "Pester IRC as well when a draft is published" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50044 [19:21:39] ^demon|busy: done ↑ [19:22:37] I hope I got the puppet syntaxt correct [19:22:43] <^demon|busy> +1, lgtm. [19:23:21] when I see "lgtm" I only see "lmgtfy" [19:26:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:27:11] New patchset: Reedy; "Enable Extension:Score everywhere" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60309 [19:27:17] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60309 [19:27:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time [19:28:23] !log reedy synchronized wmf-config/InitialiseSettings.php 'Enable Score extension on the cluster' [19:28:30] Logged the message, Master [19:43:21] New patchset: Reedy; "(bug 40759) Let Proofread Page setup namespaces for fi.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48877 [19:43:34] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/48877 [19:47:01] !log reedy synchronized wmf-config/InitialiseSettings.php [19:47:08] Logged the message, Master [19:47:18] New patchset: Reedy; "(bug 44164) Add 'Portal' and 'Author' namespaces to iswikisource" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59614 [19:47:27] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59614 [19:48:05] Reedy: :* [19:48:12] New patchset: Reedy; "pngcrush everything" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59638 [19:48:24] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59638 [19:48:59] New review: Reedy; "reedy@fenari:/home/wikipedia/common$ git pull" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59638 [19:49:39] New patchset: Reedy; "(bug 47325) Rights configuration on es.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59712 [19:49:44] !log reedy synchronized docroot [19:49:48] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59712 [19:49:51] Logged the message, Master [19:50:18] New patchset: Reedy; "(bug 47337) Flagged Revisions configuration for ru.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59732 [19:50:23] !log reedy synchronized images/ [19:50:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59732 [19:50:29] Logged the message, Master [19:51:10] New patchset: Reedy; "(bug 46944) Allow users to save books to userspace on enwikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59756 [19:51:19] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59756 [19:51:24] New patchset: Reedy; "(bug 46431) Update Apple touch icon for en.wiktionary." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59771 [19:51:34] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59771 [19:52:06] New patchset: Reedy; "(bug 44899) Namespace setup for Korean Wikiversity" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [19:52:16] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59786 [19:52:26] New patchset: Reedy; "(bug 46846) Localise project namespaces for dv.wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59998 [19:52:34] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59998 [19:52:39] New patchset: Reedy; "(bug 45638) Remove patrol from autopatrolled on itwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60106 [19:52:50] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60106 [19:53:16] New patchset: Reedy; "(bug 46534) Add namespace aliases for uz.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60002 [19:53:24] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60002 [19:53:36] New patchset: Reedy; "(bug 44308) Add new namespaces and aliases on zhwikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60263 [19:53:44] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60263 [19:53:48] lol [19:54:11] That's a lot of e-mails that I just got from you, Reedy. 
[19:54:27] !log reedy synchronized wmf-config/ [19:54:34] Logged the message, Master [19:54:56] !log Running namespaceDupes on uzwiki [19:55:03] Logged the message, Master [19:55:31] New patchset: MaxSem; "Create a role for Solr in Labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60312 [19:56:47] !log Running namespaceDupes on zhwikibooks [19:56:54] Logged the message, Master [19:57:34] New patchset: Reedy; "Remove $wgMemCachedInstanceSize" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60231 [19:58:01] New patchset: Reedy; "Remove $wgMemCachedInstanceSize" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60231 [19:58:15] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60231 [20:01:59] New patchset: MaxSem; "Create a role for Solr in Labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60312 [20:04:27] Reedy: I'm assuming that there were no problems with running NamespaceDupes.php for those wikis? [20:04:46] 1 for zhwikibooks [20:04:50] ... 17286 (0,"Wikijunior:自我保护") -> (110,"自我保护") [[Wikijunior:自我保护]] [20:04:50] ... *** cannot resolve automatically; page exists with ID 17285 *** [20:05:12] New review: Helder.wiki; "Wohooo!!! :-)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60309 [20:05:34] New review: coren; "Simple enough." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/60312 [20:05:35] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60312 [20:06:39] Coren, thanks! [20:06:58] Reedy: there's also kowikiversity [20:07:20] MaxSem: I only saw when I pushed, but try to remember to have a newline at the end of files in general. :-) [20:08:27] Reedy: just sayin' since you didn't log it above, dunno. 
[20:08:35] I never do [20:08:45] Oh [20:08:47] I missed one [20:10:51] :) [20:11:00] New patchset: Demon; "Strip "(N comments)" lines from IRC output" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60317 [20:11:22] RECOVERY - Host ms-be9 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [20:11:32] Reedy: so what happens with that zhwikibooks page? I've never seen the script being run, so :) [20:11:43] Stays in limbo usually [20:11:53] !log Running namespaceDupes on kowikiversity [20:12:01] Logged the message, Master [20:13:22] PROBLEM - Disk space on ms-be9 is CRITICAL: Connection refused by host [20:13:22] PROBLEM - swift-container-updater on ms-be9 is CRITICAL: Connection refused by host [20:13:31] PROBLEM - swift-container-replicator on ms-be9 is CRITICAL: Connection refused by host [20:13:41] PROBLEM - swift-container-server on ms-be9 is CRITICAL: Connection refused by host [20:13:51] PROBLEM - swift-object-replicator on ms-be9 is CRITICAL: Connection refused by host [20:13:52] PROBLEM - RAID on ms-be9 is CRITICAL: Connection refused by host [20:13:52] PROBLEM - swift-account-auditor on ms-be9 is CRITICAL: Connection refused by host [20:13:52] PROBLEM - swift-object-auditor on ms-be9 is CRITICAL: Connection refused by host [20:13:52] PROBLEM - swift-account-reaper on ms-be9 is CRITICAL: Connection refused by host [20:14:01] PROBLEM - SSH on ms-be9 is CRITICAL: Connection refused [20:14:01] PROBLEM - swift-object-updater on ms-be9 is CRITICAL: Connection refused by host [20:14:01] PROBLEM - swift-object-server on ms-be9 is CRITICAL: Connection refused by host [20:14:02] PROBLEM - swift-account-replicator on ms-be9 is CRITICAL: Connection refused by host [20:14:11] PROBLEM - swift-account-server on ms-be9 is CRITICAL: Connection refused by host [20:14:11] PROBLEM - DPKG on ms-be9 is CRITICAL: Connection refused by host [20:14:12] PROBLEM - swift-container-auditor on ms-be9 is CRITICAL: Connection refused by host [20:15:44] oh lol. 
[20:16:37] it does look funny, I have to say: https://zh.wikibooks.org/wiki/Wikijunior:%E8%87%AA%E6%88%91%E4%BF%9D%E6%8A%A4 [20:16:50] zomg Score ate our ms-be9 [20:17:41] funny loop. [20:18:07] PROBLEM - Host ms-be9 is DOWN: PING CRITICAL - Packet loss = 100% [20:18:42] ms-be9 is me [20:20:23] cmjohnson1: I always thought you were a human, not a dell server. [20:20:29] but, we're all surprised sometimes [20:20:50] hahaha...don't judge a book by its cover [20:21:32] Reedy: can you merge https://gerrit.wikimedia.org/r/#/c/57683/ too, please? it's been waiting 17 days already :) [20:22:53] cmjohnson1: mind => blown [20:23:11] RECOVERY - Host ms-be9 is UP: PING OK - Packet loss = 0%, RTA = 26.66 ms [20:29:22] New patchset: Pyoungmeister; "db55 and db73 => mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60319 [20:35:33] New patchset: Reedy; "(bug 46712) Set a different favicon for iswiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [20:37:06] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60319 [20:37:31] PROBLEM - NTP on ms-be9 is CRITICAL: NTP CRITICAL: No response from NTP server [20:38:25] New patchset: Dzahn; "ensure /var/lib/wikibugs dir exists (home dir for user)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60324 [20:42:01] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [20:42:32] New review: Demon; "Does managehome not handle this yet, or is that param still totally useless?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60324 [20:43:28] PROBLEM - mysqld processes on db73 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [20:43:41] New review: Dzahn; "i think it would have done it if the user was newly created, but it did not because the user alread..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/60324 [20:44:38] New review: Demon; "Bah, you're right. That's bug http://projects.puppetlabs.com/issues/7002." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/60324 [20:45:30] <^demon|busy> mutante: That's a dumb bug :\ [20:45:48] <^demon|busy> You'd totally hope something like that would work. [20:47:28] RECOVERY - mysqld processes on db73 is OK: PROCS OK: 1 process with command name mysqld [20:47:42] ^demon|busy: so yea, either it's that additional file definition in puppet or .. we just ignore the situation when we apply new puppet classes to existing hosts and i do it manual anways.. for that reason [20:47:58] but jeremyb_ you also stopped me :) [20:48:16] New patchset: MaxSem; "Enable GeoData in Labs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60325 [20:48:33] <^demon|busy> mutante: Additional definition is fine. Then you can require on the directory, which is sometimes useful. [20:49:48] New patchset: Dzahn; "ensure /var/lib/wikibugs dir exists (home dir for user)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60324 [20:51:26] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [20:55:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60324 [20:57:30] anyone want to doulbe check https://gerrit.wikimedia.org/r/#/c/60327/ ? [20:57:38] oh nm [20:57:41] jenkins finally finished up [20:58:44] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [20:59:34] grr why doesn't puppet lint grab those.. 
[21:01:23] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [21:03:22] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [21:04:35] damn you parser validate and your not working on my machine for god knows what reason! [21:05:16] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [21:06:08] huzzah! now mutante can you check out ? [21:06:47] New patchset: Ori.livneh; "Create self-standing IPython Notebook Puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60187 [21:06:47] New patchset: Ori.livneh; "Use Upstart rather than supervisor to manage IPython" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60094 [21:06:48] New patchset: Ori.livneh; "Update README" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60332 [21:06:48] New patchset: Ori.livneh; "Provide 'certfile' and 'password' parameters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60333 [21:07:13] chu chu [21:08:41] PROBLEM - mysqld processes on db55 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:09:34] erm, a bit more gerrit spam coming your way [21:11:03] New patchset: Ori.livneh; "Update README" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60332 [21:12:59] mutante: :-) [21:13:12] PROBLEM - DPKG on db55 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:17:11] RECOVERY - DPKG on db55 is OK: All packages OK [21:17:52] New patchset: Dzahn; "wikibugs - also need /var/lib/wikibugs/bin summarize needed directories, set File defaults, retab 2-space softtabs, align arrows to make puppet-lint like 
it more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:20:33] New patchset: Dzahn; "wikibugs - also need /var/lib/wikibugs/bin summarize needed directories, set File defaults, retab 2-space softtabs, align arrows to make puppet-lint like it more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:20:41] PROBLEM - Host db55 is DOWN: PING CRITICAL - Packet loss = 100% [21:21:34] ar [21:21:38] New patchset: Dzahn; "wikibugs - also need /var/lib/wikibugs/bin summarize needed directories, set File defaults, retab 2-space softtabs, align arrows to make puppet-lint like it more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:22:21] RECOVERY - Host db55 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [21:27:20] New patchset: MaxSem; "$wgGeoDataUpdatesViaJob is not going to be reenabled" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60337 [21:27:26] New patchset: Dzahn; "wikibugs - also need /var/lib/wikibugs/bin summarize needed directories, set File defaults, retab 2-space softtabs, align arrows to make puppet-lint like it more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:32:00] New patchset: Dzahn; "wikibugs - also need /var/lib/wikibugs/bin summarize needed directories, set File defaults, retab 2-space softtabs, align arrows to make puppet-lint like it more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:33:15] New review: Dzahn; "for realz, now it should git clone :p" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/60336 [21:33:16] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60336 [21:35:24] New patchset: Isarra; "Add high-res version of wikidata favicon" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60338 [21:40:51] !log upgrading even more mw servers in tampa [21:40:59] Logged the message, Mistress of the 
network gear. [21:43:09] PROBLEM - DPKG on mw59 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:12] LeslieCarr: there's one thing i notice: before it said to handle node 31-36 but just do the ganglia_aggregator=true for 31-35. so that is also the part of the fix, right [21:43:18] PROBLEM - DPKG on mw58 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:18] PROBLEM - DPKG on mw55 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:19] PROBLEM - DPKG on mw52 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:19] PROBLEM - DPKG on mw5 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:28] PROBLEM - DPKG on mw51 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:38] yeah, turns out 31-35 were all decom'ed [21:43:39] hehe [21:43:48] PROBLEM - DPKG on mw53 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:48] PROBLEM - DPKG on mw56 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:54] New review: Dzahn; "puppet-lint does not report any Tab errors on those that you retabbed" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/60327 [21:44:08] PROBLEM - DPKG on mw54 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:44:48] RECOVERY - DPKG on mw56 is OK: All packages OK [21:45:08] RECOVERY - DPKG on mw54 is OK: All packages OK [21:45:08] RECOVERY - DPKG on mw59 is OK: All packages OK [21:45:18] RECOVERY - DPKG on mw58 is OK: All packages OK [21:45:19] RECOVERY - DPKG on mw55 is OK: All packages OK [21:45:19] RECOVERY - DPKG on mw52 is OK: All packages OK [21:45:28] RECOVERY - DPKG on mw51 is OK: All packages OK [21:45:48] RECOVERY - DPKG on mw53 is OK: All packages OK [21:47:19] RECOVERY - DPKG on mw5 is OK: All packages OK [21:50:08] PROBLEM - DPKG on mw57 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:51:17] mutante: look good as far as you checked out or are you checking out anyting more ? [21:51:23] New review: MZMcBride; "Why?" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54505 [21:51:38] PROBLEM - DPKG on mw67 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:51:48] PROBLEM - DPKG on mw64 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:04] LeslieCarr: it looks go when it comes to the Tabs, i still get other stuff, like quoting, but i didn't expect you want to change it all at once [21:52:08] PROBLEM - DPKG on mw65 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:09] RECOVERY - DPKG on mw57 is OK: All packages OK [21:52:09] PROBLEM - DPKG on mw60 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:13] naw [21:52:18] PROBLEM - DPKG on mw61 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:18] PROBLEM - DPKG on mw63 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:19] PROBLEM - DPKG on mw66 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:19] PROBLEM - Apache HTTP on mw57 is CRITICAL: Connection refused [21:52:19] PROBLEM - DPKG on mw62 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:20] i got a bunch of the quotes :) though not every last thing [21:52:29] New review: Brion VIBBER; "Looks good to me. Slight blur on some of the lines on the 32x32 version but it looks plenty clean en..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/60338 [21:52:38] PROBLEM - DPKG on mw69 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:52:51] New patchset: Lcarr; "decom'ing sq33, fixing role/cache.pp puppet and ganglia.pp puppet ot match standards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [21:53:01] i hate all the rebasing [21:53:08] PROBLEM - DPKG on mw68 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:53:18] RECOVERY - DPKG on mw61 is OK: All packages OK [21:53:19] RECOVERY - DPKG on mw63 is OK: All packages OK [21:53:19] RECOVERY - DPKG on mw66 is OK: All packages OK [21:53:19] RECOVERY - Apache HTTP on mw57 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.110 second response time [21:53:19] RECOVERY - DPKG on mw62 is OK: All packages OK [21:53:38] PROBLEM - Apache HTTP on mw63 is CRITICAL: Connection refused [21:53:38] RECOVERY - DPKG on mw69 is OK: All packages OK [21:53:38] PROBLEM - Apache HTTP on mw69 is CRITICAL: Connection refused [21:53:38] RECOVERY - DPKG on mw67 is OK: All packages OK [21:53:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60327 [21:53:48] PROBLEM - Apache HTTP on mw65 is CRITICAL: Connection refused [21:53:48] PROBLEM - Apache HTTP on mw64 is CRITICAL: Connection refused [21:53:48] RECOVERY - DPKG on mw64 is OK: All packages OK [21:53:58] PROBLEM - Apache HTTP on mw67 is CRITICAL: Connection refused [21:53:59] PROBLEM - Apache HTTP on mw60 is CRITICAL: Connection refused [21:54:08] PROBLEM - Apache HTTP on mw68 is CRITICAL: Connection refused [21:54:08] RECOVERY - DPKG on mw65 is OK: All packages OK [21:54:09] RECOVERY - DPKG on mw68 is OK: All packages OK [21:54:09] PROBLEM - Apache HTTP on mw66 is CRITICAL: Connection refused [21:54:09] PROBLEM - Apache HTTP on mw62 is CRITICAL: Connection refused [21:54:09] RECOVERY - DPKG on mw60 is OK: All packages OK [21:54:19] PROBLEM - Apache HTTP on mw61 
is CRITICAL: Connection refused [21:55:47] of course, all that was me… again [21:55:58] RECOVERY - Apache HTTP on mw67 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.174 second response time [21:55:59] RECOVERY - Apache HTTP on mw60 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.359 second response time [21:56:08] RECOVERY - Apache HTTP on mw68 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.132 second response time [21:56:08] RECOVERY - Apache HTTP on mw66 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.149 second response time [21:56:08] RECOVERY - Apache HTTP on mw62 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.135 second response time [21:56:11] hashar: lol, no .. "/Exec[git_clone_wikibugs]/returns: destination directory '/var/lib/wikibugs/bin' already exists." [21:56:19] RECOVERY - Apache HTTP on mw61 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.338 second response time [21:56:28] mutante: yeah git clone does not like that :D [21:56:31] i thought it failed because it didnt :p [21:56:40] exist [21:56:40] RECOVERY - Apache HTTP on mw63 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.117 second response time [21:56:40] RECOVERY - Apache HTTP on mw69 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.139 second response time [21:56:48] RECOVERY - Apache HTTP on mw65 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.118 second response time [21:56:49] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.175 second response time [21:58:54] New review: MZMcBride; "Cross-reference: New review: MZMcBride; "Motherfucking autolinker." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54505 [22:05:38] PROBLEM - DPKG on mw7 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:05:57] so many upgrades…. so little dpkg ? 
[22:06:08] PROBLEM - DPKG on mw70 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:06:09] PROBLEM - DPKG on mw71 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:06:18] PROBLEM - DPKG on mw74 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:06:19] PROBLEM - DPKG on mw73 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:07:17] RECOVERY - DPKG on mw74 is OK: All packages OK [22:07:17] RECOVERY - DPKG on mw73 is OK: All packages OK [22:07:39] New review: Bsitu; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60043 [22:08:06] RECOVERY - DPKG on mw70 is OK: All packages OK [22:08:06] RECOVERY - DPKG on mw71 is OK: All packages OK [22:09:06] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 238 seconds [22:09:16] RECOVERY - DPKG on mw7 is OK: All packages OK [22:12:44] New patchset: Dzahn; "wikibugs - no don't define /var/lib/wikibugs/bin, git::clone doesn't like it if that already exists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60341 [22:13:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 18 seconds [22:13:46] RECOVERY - mysqld processes on db55 is OK: PROCS OK: 1 process with command name mysqld [22:13:53] New review: Dzahn; "sudo git clone :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/60341 [22:13:53] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60341 [22:16:56] PROBLEM - DPKG on mw83 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:17:06] PROBLEM - DPKG on mw86 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:17:07] PROBLEM - DPKG on mw85 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:17:16] PROBLEM - DPKG on mw87 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:17:59] PROBLEM - DPKG on mw8 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:18:00] RECOVERY - DPKG on mw83 is OK: All packages OK 
[22:18:08] RECOVERY - DPKG on mw86 is OK: All packages OK [22:18:08] RECOVERY - DPKG on mw85 is OK: All packages OK [22:18:17] RECOVERY - DPKG on mw87 is OK: All packages OK [22:19:29] New patchset: Pyoungmeister; "db65 and db72 -> mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60342 [22:20:57] RECOVERY - DPKG on mw8 is OK: All packages OK [22:22:43] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60342 [22:27:47] PROBLEM - mysqld processes on db72 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:37:02] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [22:37:02] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [22:37:42] RECOVERY - mysqld processes on db72 is OK: PROCS OK: 1 process with command name mysqld [22:41:37] Hi ! new developer with WMF in SF. I signed up for a gerrit account (username 'EBernhardson (WMF)') earlier today and uploaded an ssh key. Wondering what else I might need to do before gerrit accepts my git commands [22:41:56] New review: MZMcBride; "Apparently related to the Wikimedia privacy policy: New patchset: Asher; "moving mysql query digest collection (for ishmael) from db9 to db1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60345 [22:42:51] ebernhardson, you'll need to be added to bastion and given shell access, and probably added to the wmf LDAP group [22:42:55] Ryan_Lane, ^^ [22:43:22] oh sorry of course bastion is labs.. but you still need LDAP I think [22:43:41] actually, you just need an account [22:43:46] and upload your key [22:43:53] wmf group just gives more privileges [22:44:08] Ryan_Lane: hmm, i've done that but gerrit rejects my attempts to connect with git [22:44:11] ebernhardson: what group are you in? 
they should be onboarding you ;) [22:44:18] ebernhardson: that's because you should be using ssh [22:44:43] Ryan_Lane: EE, Terry just kinda threw me in here, Ryan (next office over) suggested i ask here :) [22:45:00] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60345 [22:45:22] I can help [22:45:25] Ryan_Lane: ok, perhaps the docs they gave me are bad, it suggests that to start out grab the code by cloning from gerrit on a special port [22:45:58] yeah. it's ssh over that port :) [22:47:15] Ryan_Lane: I walked over and am helping him out, though the process may expose the fact that I forgot a lot of this stuff myself too, so we may poke you again :P [22:47:36] legoktm: I don't do code review. [22:47:42] PROBLEM - mysqld processes on db65 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:47:55] Oh, ok. [22:47:56] ori-l: heh [22:48:15] ori-l: I'm fairly certain offloading developer onboarding to ops is a bad idea :) [22:48:53] I'm here to help if needed, though [22:48:54] New patchset: Pyoungmeister; "db44 -> mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60347 [22:50:27] Ryan_Lane, are you sure? notpeter will teach them all about making private stuff public for the greater good! ;) [22:50:39] :D [22:50:52] you mean stabbing people for doing that? :) [22:51:18] I'm liking notpeter's Puppet syntax for commit messages [22:52:59] ebernhardson@wikimedia.org [22:53:03] ^^ IGNORE [22:53:26] wii ebernhardson [22:53:29] hah [22:53:49] New patchset: Pyoungmeister; "Make Echo daily cron run against the wikis defined in echowikis.dblist" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60043 [22:53:50] ebernhardson: hah. good luck on that. 
it's in public logs now [22:55:12] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60043 [22:56:41] !log mysql query logs / ishmael migrated from db9 to db1001 [22:56:42] RECOVERY - mysqld processes on db65 is OK: PROCS OK: 1 process with command name mysqld [22:56:48] Logged the message, Master [23:20:17] PROBLEM - mysqld processes on db44 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:20:18] New patchset: Dzahn; "wikibugs - fix log file path - quick fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60356 [23:20:47] Change abandoned: Catrope; "Taken care of in https://gerrit.wikimedia.org/r/#/c/60043/2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60303 [23:21:13] New review: Dzahn; "the script just has a hardcoded open (OUT, ">>/var/wikibugs/wikibugs.log");" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/60356 [23:21:13] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60356 [23:23:07] PROBLEM - DPKG on db44 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:28:36] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [23:30:07] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 190 seconds [23:30:07] RECOVERY - DPKG on db44 is OK: All packages OK [23:32:26] PROBLEM - Host db44 is DOWN: PING CRITICAL - Packet loss = 100% [23:33:07] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 21 seconds [23:34:03] New patchset: Lwelling; "Fix cron for Echo notification email digests/summaries" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60358 [23:37:35] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60358 [23:37:38] RECOVERY - Host db44 is UP: PING OK - Packet loss = 0%, RTA = 26.67 ms [23:39:38] PROBLEM - Disk space on db44 is CRITICAL: 
Connection refused by host [23:39:39] PROBLEM - SSH on db44 is CRITICAL: Connection refused [23:39:48] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: Connection refused by host [23:39:49] PROBLEM - MySQL Recent Restart on db44 is CRITICAL: Connection refused by host [23:39:58] PROBLEM - Full LVS Snapshot on db44 is CRITICAL: Connection refused by host [23:39:59] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: Connection refused by host [23:39:59] PROBLEM - MySQL Slave Running on db44 is CRITICAL: Connection refused by host [23:40:09] PROBLEM - DPKG on db44 is CRITICAL: Connection refused by host [23:40:09] PROBLEM - MySQL disk space on db44 is CRITICAL: Connection refused by host [23:40:09] PROBLEM - MySQL Idle Transactions on db44 is CRITICAL: Connection refused by host [23:44:08] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 238 seconds [23:47:08] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 12 seconds [23:47:58] PROBLEM - Host db44 is DOWN: PING CRITICAL - Packet loss = 100% [23:49:59] RECOVERY - MySQL Slave Running on db44 is OK: OK replication [23:50:09] RECOVERY - Host db44 is UP: PING OK - Packet loss = 0%, RTA = 26.69 ms [23:50:09] RECOVERY - MySQL disk space on db44 is OK: DISK OK [23:50:09] RECOVERY - DPKG on db44 is OK: All packages OK [23:50:09] RECOVERY - MySQL Idle Transactions on db44 is OK: OK longest blocking idle transaction sleeps for seconds [23:50:38] RECOVERY - SSH on db44 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [23:50:39] RECOVERY - Disk space on db44 is OK: DISK OK [23:50:48] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay seconds [23:50:48] RECOVERY - MySQL Recent Restart on db44 is OK: OK seconds since restart [23:50:58] RECOVERY - Full LVS Snapshot on db44 is OK: OK no full LVM snapshot volumes [23:50:59] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay seconds [23:51:23] !log reimaging db44 [23:51:30] Logged 
the message, notpeter [23:52:00] New review: Reedy; "And the default 180 day was better?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54505 [23:52:06] oh, no I'm not.... [23:52:59] PROBLEM - Host db44 is DOWN: PING CRITICAL - Packet loss = 100% [23:53:02] New review: awjrichards; "This should go out in conjunction with Change-Id: I68485a0b70028d322b92b25864465346f9cdc5c7 - becaus..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59553 [23:55:48] New patchset: Dzahn; "remove wikibugs from node mchenry for now until issues with the puppetized version are fixed. deliberately not reverting all changes to the class though, we'll still want them" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60359 [23:56:55] New patchset: Pyoungmeister; "db55 new snapshot host while db44 dead" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60360 [23:57:18] RECOVERY - Host db44 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [23:57:40] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60347 [23:57:40] New patchset: Dzahn; "remove wikibugs from node mchenry for now until issues with the puppetized version are fixed. deliberately not reverting all changes to the class though, we'll still want them" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60359 [23:59:37] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/60360