[00:03:16] (03PS1) 10Mwalker: Enable CentralNotice CrossWiki Hiding [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92817 [00:04:37] yurik_: MaxSem : what paravoid said: +1 to a post-mortem, plz [00:04:53] shoor [00:05:01] what ori-l said :) [00:05:33] whoever :) [00:05:45] yurik_: MaxSem , send to engineering@ [00:05:49] greg-g: technically its not MaxSem at all [00:06:02] greg-g, why not wikitech? [00:06:06] postmortems aren't to assign blame [00:06:33] in theory :) [00:06:55] right, just figure out what happened, he helped diagnose and such [00:07:00] AaronSchulz: shush :) [00:07:31] MaxSem: mostly that it's just deploy specific and where they've gone before... I don't have a policy against wikitech [00:08:20] (unless it includes private info, of course) [00:08:21] my point is that engineering stuff should be on wikitech and that volunteers might be interested, too [00:08:29] sure [00:08:54] but, cc engineering@ since not all wmf engineers (and thus deployers) read all of wikitech-l :) [00:14:24] We have a blame wheel. [00:14:27] Let's have a look. [00:14:47] Domas. [00:17:04] sounds legit [00:17:22] We lost WP:BLAME. [00:17:26] To WikiBlame. [00:17:33] But we got WP:BLAMEWHEEL. [00:17:43] https://en.wikipedia.org/wiki/Wikipedia:BLAMEWHEEL [00:18:58] (03PS1) 10Dr0ptp4kt: Further constrain W0 X-CS setting to mobile Wikipedia, for now. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 [00:19:11] greg-g, MaxSem so whole should i send it to? [00:19:57] yurik_: engineering@ and wikitech-l (I can't think of anything you might have said that isn't ok to share publicly) [00:20:44] yurik_, I'm already writing it [00:20:54] MaxSem: i thought i was :) [00:21:09] since its zero, not MF [00:21:13] okay [00:21:17] up to you really [00:21:29] (03CR) 10Faidon Liambotis: [C: 04-1] "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 (owner: 10Dr0ptp4kt) [00:21:34] draft discarded:P [00:21:58] heh [00:22:08] MaxSem: https://etherpad.wikimedia.org/p/zerocrash [00:22:10] you can both write it and see who gets the most upboats? [00:22:11] (03CR) 10Dr0ptp4kt: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 (owner: 10Dr0ptp4kt) [00:22:49] gnite [00:22:58] that's a good handle and/or sci fi novel name :) [00:23:02] g'night paravoid [00:23:29] greg-g: LD is 4pm in each timezone, right? [00:23:51] ori-l: ....... [00:23:52] (03PS2) 10Dr0ptp4kt: Further constrain W0 X-CS setting to mobile Wikipedia, for now. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 [00:24:22] btw, I'm tethering, and in and out of connectivity, so if I don't answer just $#*@(!!@#$ CARRIER LOST [00:24:24] (03CR) 10Dr0ptp4kt: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 (owner: 10Dr0ptp4kt) [00:39:32] MaxSem: is that you typing in etherpad? [00:39:40] in pink [00:39:45] yup [00:39:59] MaxSem: ok, are we done? [00:40:48] I'd like more copyediting [00:41:10] do we have an english major around? [00:42:07] it wouldn't surprise me [00:42:24] * greg-g looks, but I'm slow, tethering on 3g-ish [00:44:15] 1 [00:44:15] An error occured while loading the pad [00:44:15] Error: Permission denied to access property 'valueOf' in https://etherpad.wikimedia.org/p/zerocrash (line 514) [00:44:18] 0 [00:44:41] WFM [00:45:33] probably just tethering issues [00:46:32] !log running MONITOR for 60 seconds on redis instance on mc1002 [00:46:47] Logged the message, Master [00:50:48] yurik_, how does it look now? 
[00:51:04] MaxSem: yurik_ looks good to me, who cares about grammemmer [00:51:38] yurik_, fire away [00:52:04] looks good [00:52:06] sending [00:53:51] sent [00:54:10] we can sleep now [00:54:17] oh, max wanted to send it to public [01:01:38] 23:03pm UTC, heh. [01:40:02] (03PS3) 10Reedy: Remove defining of MEDIAWIKI constant, done in WebStart.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92800 [01:40:07] (03CR) 10Reedy: [C: 032] Remove defining of MEDIAWIKI constant, done in WebStart.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92800 (owner: 10Reedy) [01:40:23] (03Merged) 10jenkins-bot: Remove defining of MEDIAWIKI constant, done in WebStart.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92800 (owner: 10Reedy) [01:42:42] (03PS1) 10Reedy: Wrap extrac2.php in ob_start()/ob_flush() [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92828 [02:12:24] (03PS2) 10MZMcBride: Wrap extrac2.php in ob_start()/ob_flush() [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92828 (owner: 10Reedy) [02:13:01] (03CR) 10MZMcBride: "You may want to note in the file why you've added these lines. Though I suppose git blame covers that." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92828 (owner: 10Reedy) [02:25:43] !log LocalisationUpdate completed (1.23wmf1) at Thu Oct 31 02:25:43 UTC 2013 [02:26:01] Logged the message, Master [02:47:38] !log LocalisationUpdate completed (1.22wmf22) at Thu Oct 31 02:47:38 UTC 2013 [02:47:54] Logged the message, Master [03:03:24] PROBLEM - MySQL Replication Heartbeat on db57 is CRITICAL: CRIT replication delay 306 seconds [03:04:04] PROBLEM - MySQL Slave Delay on db57 is CRITICAL: CRIT replication delay 326 seconds [03:07:06] RECOVERY - MySQL Slave Delay on db57 is OK: OK replication delay 107 seconds [03:07:26] RECOVERY - MySQL Replication Heartbeat on db57 is OK: OK replication delay 40 seconds [03:26:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 31 03:26:30 UTC 2013 [03:26:46] Logged the message, Master [04:29:52] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [04:33:02] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:35:02] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [04:38:02] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [04:59:32] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:00:22] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [05:05:35] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:06:35] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 1 logical drive(s), 4 physical drive(s) [05:07:45] PROBLEM - DPKG on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:08:45] RECOVERY - DPKG on arsenic is OK: All packages OK [05:11:35] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:12:35] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [06:58:51] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [07:17:34] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [07:18:34] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [07:31:24] PROBLEM - Puppet freshness on tmh1 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw10 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw1011 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw1054 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw1211 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw1129 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw1213 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:24] PROBLEM - Puppet freshness on mw23 is CRITICAL: No successful Puppet run in the last 10 hours [07:46:25] PROBLEM - Puppet freshness on mw60 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on antimony is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on mw1206 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on labstore3 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on mw42 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on srv242 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on mw1208 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:24] PROBLEM - Puppet freshness on mw55 is CRITICAL: No successful Puppet run in the last 10 hours [07:47:25] PROBLEM - Puppet freshness on mw75 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on hume is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw1049 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw1084 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw116 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw1180 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw13 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:24] PROBLEM - Puppet freshness on mw74 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:25] PROBLEM - Puppet freshness on snapshot1002 is CRITICAL: No successful Puppet run in the last 10 hours [07:48:25] PROBLEM - Puppet freshness on sodium is CRITICAL: No successful Puppet run in the last 10 hours [07:48:26] PROBLEM - Puppet freshness on srv299 is CRITICAL: No successful Puppet run in the last 10 hours [07:49:24] PROBLEM - Puppet freshness on mw1030 is CRITICAL: No successful Puppet run in the last 10 hours [07:49:24] PROBLEM - Puppet freshness on mw1165 is CRITICAL: No successful Puppet run in the last 10 hours [07:49:24] PROBLEM - Puppet freshness on mw1181 is CRITICAL: No successful Puppet run in the last 10 hours 
[07:49:24] PROBLEM - Puppet freshness on srv291 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on mw1053 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on mw39 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on mw1198 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on mw1116 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on mw1074 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:24] PROBLEM - Puppet freshness on srv272 is CRITICAL: No successful Puppet run in the last 10 hours [07:51:24] PROBLEM - Puppet freshness on brewster is CRITICAL: No successful Puppet run in the last 10 hours [07:51:24] PROBLEM - Puppet freshness on mw33 is CRITICAL: No successful Puppet run in the last 10 hours [07:51:24] PROBLEM - Puppet freshness on mw64 is CRITICAL: No successful Puppet run in the last 10 hours [07:52:24] PROBLEM - Puppet freshness on mw1001 is CRITICAL: No successful Puppet run in the last 10 hours [07:52:24] PROBLEM - Puppet freshness on mw53 is CRITICAL: No successful Puppet run in the last 10 hours [07:52:24] PROBLEM - Puppet freshness on mw1185 is CRITICAL: No successful Puppet run in the last 10 hours [07:52:24] PROBLEM - Puppet freshness on mw70 is CRITICAL: No successful Puppet run in the last 10 hours [07:52:24] PROBLEM - Puppet freshness on mw1022 is CRITICAL: No successful Puppet run in the last 10 hours [07:53:24] PROBLEM - Puppet freshness on mw111 is CRITICAL: No successful Puppet run in the last 10 hours [07:53:24] PROBLEM - Puppet freshness on mw87 is CRITICAL: No successful Puppet run in the last 10 hours [07:53:24] PROBLEM - Puppet freshness on srv285 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1086 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1093 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1112 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1139 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1090 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw115 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:24] PROBLEM - Puppet freshness on mw1215 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:25] PROBLEM - Puppet freshness on srv280 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:25] PROBLEM - Puppet freshness on mw1219 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:26] PROBLEM - Puppet freshness on mw1203 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:26] PROBLEM - Puppet freshness on mw1220 is CRITICAL: No successful Puppet run in the last 10 hours [07:54:27] PROBLEM - Puppet freshness on terbium is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet freshness on mw1110 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet freshness on mw113 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet freshness on mw1135 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet 
freshness on mw1158 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet freshness on mw92 is CRITICAL: No successful Puppet run in the last 10 hours [07:55:24] PROBLEM - Puppet freshness on srv298 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:24] PROBLEM - Puppet freshness on mw1073 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:24] PROBLEM - Puppet freshness on mw25 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:24] PROBLEM - Puppet freshness on mw1199 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:24] PROBLEM - Puppet freshness on srv293 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:24] PROBLEM - Puppet freshness on srv251 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:21] err: Failed to apply catalog: Could not find dependency Systemuser[pybal-check] for File[/var/lib/pybal-check/.ssh] at /etc/puppet/modules/applicationserver/manifests/pybal_check.pp:20 [07:57:24] PROBLEM - Puppet freshness on mw1019 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on mw1058 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on mw1070 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on mw1085 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on mw12 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on snapshot3 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:24] PROBLEM - Puppet freshness on mw1157 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:25] PROBLEM - Puppet freshness on srv235 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:25] PROBLEM - Puppet freshness on srv274 is CRITICAL: No successful Puppet run in the last 10 hours [07:57:26] PROBLEM - Puppet freshness on srv268 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:24] PROBLEM - Puppet freshness on mw1102 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:24] PROBLEM - Puppet freshness on mw1184 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:24] PROBLEM - Puppet freshness on mw1191 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:24] PROBLEM - Puppet freshness on srv239 is CRITICAL: No successful Puppet run in the last 10 hours [07:58:24] PROBLEM - Puppet freshness on mw96 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw1035 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw1013 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw1036 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw63 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw69 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on mw1096 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:24] PROBLEM - Puppet freshness on searchidx2 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:25] PROBLEM - Puppet freshness on srv254 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:25] PROBLEM - Puppet freshness on mw71 is CRITICAL: No successful 
Puppet run in the last 10 hours [07:59:26] PROBLEM - Puppet freshness on tmh1002 is CRITICAL: No successful Puppet run in the last 10 hours [07:59:43] (03CR) 10Ori.livneh: "err: Failed to apply catalog: Could not find dependency Systemuser[pybal-check] for File[/var/lib/pybal-check/.ssh] at /etc/puppet/modules" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92339 (owner: 10Andrew Bogott) [08:00:24] PROBLEM - Puppet freshness on mw1192 is CRITICAL: No successful Puppet run in the last 10 hours [08:00:24] PROBLEM - Puppet freshness on mw1216 is CRITICAL: No successful Puppet run in the last 10 hours [08:00:24] PROBLEM - Puppet freshness on mw32 is CRITICAL: No successful Puppet run in the last 10 hours [08:00:24] PROBLEM - Puppet freshness on mw86 is CRITICAL: No successful Puppet run in the last 10 hours [08:00:24] PROBLEM - Puppet freshness on mw18 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:24] PROBLEM - Puppet freshness on mw1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:24] PROBLEM - Puppet freshness on mw1067 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:24] PROBLEM - Puppet freshness on mw1028 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:24] PROBLEM - Puppet freshness on mw1109 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on mw1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on mw31 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on snapshot2 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on mw1012 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on mw73 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on srv248 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:24] PROBLEM - Puppet freshness on srv265 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:25] PROBLEM - Puppet freshness on srv277 is CRITICAL: No successful Puppet run in the last 10 hours [08:02:28] (03PS1) 10Ori.livneh: Systemuser -> Generic::Systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/92838 [08:02:52] yep, there's more of them too [08:03:20] there's twemproxy upstart, [08:03:24] PROBLEM - Puppet freshness on mw101 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw1160 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw103 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw1187 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw77 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw54 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:24] PROBLEM - Puppet freshness on mw94 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:25] PROBLEM - Puppet freshness on snapshot1003 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:25] PROBLEM - Puppet freshness on srv295 is CRITICAL: No successful Puppet run in the last 10 hours [08:03:45] oh yeah [08:03:47] i'll update the patch [08:03:54] thank you [08:04:20] I should replace twemproxy with my packages at some point [08:04:24] PROBLEM - Puppet 
freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:24] PROBLEM - Puppet freshness on mw51 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:24] PROBLEM - Puppet freshness on srv287 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:24] PROBLEM - Puppet freshness on tmh2 is CRITICAL: No successful Puppet run in the last 10 hours [08:04:26] I'll remind the author to test puppet on a few hosts for changesets of this magnitude [08:04:30] I was waiting for a Debian upload but upstream wasn't very friendly [08:06:12] bummer [08:06:24] PROBLEM - Puppet freshness on mw1002 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:24] PROBLEM - Puppet freshness on mw1042 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:24] PROBLEM - Puppet freshness on mw1052 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:24] PROBLEM - Puppet freshness on mw1065 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:24] PROBLEM - Puppet freshness on mw1099 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:32] yeah thanks icinga-wm [08:08:05] (03PS2) 10Ori.livneh: Systemuser -> Generic::Systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/92838 [08:08:24] PROBLEM - Puppet freshness on mw1111 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:24] PROBLEM - Puppet freshness on mw1125 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:24] PROBLEM - Puppet freshness on mw119 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:24] PROBLEM - Puppet freshness on mw1190 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:24] PROBLEM - Puppet freshness on mw38 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:25] PROBLEM - Puppet freshness on tin is CRITICAL: No successful Puppet run in the last 10 hours [08:09:24] PROBLEM - Puppet freshness on mw1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:24] PROBLEM - Puppet freshness on mw1056 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:24] PROBLEM - Puppet freshness on mw1146 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:24] PROBLEM - Puppet freshness on mw30 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:24] PROBLEM - Puppet freshness on srv246 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:25] PROBLEM - Puppet freshness on srv249 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:01] * paravoid hates systemuser [08:10:07] i know, heheh [08:10:22] but i hate upstart_service more :P [08:10:24] PROBLEM - Puppet freshness on mw1057 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:24] PROBLEM - Puppet freshness on mw1081 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:24] PROBLEM - Puppet freshness on mw1087 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:24] PROBLEM - Puppet freshness on mw1159 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:24] PROBLEM - Puppet freshness on mw1171 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:25] PROBLEM - Puppet freshness on mw1188 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:25] PROBLEM - Puppet freshness on mw1210 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:26] PROBLEM - Puppet freshness on mw58 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:26] PROBLEM - 
Puppet freshness on mw8 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:27] PROBLEM - Puppet freshness on snapshot1 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:27] PROBLEM - Puppet freshness on srv276 is CRITICAL: No successful Puppet run in the last 10 hours [08:10:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [08:11:09] apergos: does that look ok? [08:11:24] PROBLEM - Puppet freshness on mw1032 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:24] PROBLEM - Puppet freshness on mw57 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:24] PROBLEM - Puppet freshness on srv255 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:24] PROBLEM - Puppet freshness on srv273 is CRITICAL: No successful Puppet run in the last 10 hours [08:11:58] I don't know if it's all of em but let's get it out there and test on a host, see what's left [08:12:24] PROBLEM - Puppet freshness on mw1024 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:24] PROBLEM - Puppet freshness on mw1033 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:24] PROBLEM - Puppet freshness on mw106 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:24] PROBLEM - Puppet freshness on mw1122 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:24] PROBLEM - Puppet freshness on mw1201 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:25] PROBLEM - Puppet freshness on mw2 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:25] PROBLEM - Puppet freshness on mw35 is CRITICAL: No successful Puppet run in the last 10 hours [08:12:26] PROBLEM - Puppet freshness on mw98 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:24] PROBLEM - Puppet freshness on mw1010 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:24] PROBLEM - Puppet freshness on mw1091 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:24] PROBLEM - Puppet freshness on mw112 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:24] PROBLEM - Puppet freshness on srv262 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:24] PROBLEM - Puppet freshness on srv267 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:25] PROBLEM - Puppet freshness on mw1142 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:25] PROBLEM - Puppet freshness on srv269 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:05] (03CR) 10Ori.livneh: [C: 032] Systemuser -> Generic::Systemuser [operations/puppet] - 10https://gerrit.wikimedia.org/r/92838 (owner: 10Ori.livneh) [08:14:24] PROBLEM - Puppet freshness on mw1027 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:24] PROBLEM - Puppet freshness on mw1066 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:24] PROBLEM - Puppet freshness on mw1107 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:24] PROBLEM - Puppet freshness on mw1143 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:24] PROBLEM - Puppet freshness on mw1204 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:25] PROBLEM - Puppet freshness on mw28 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:24] PROBLEM - Puppet freshness on labstore1 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:24] PROBLEM - Puppet freshness on mw1021 is CRITICAL: No successful Puppet run in the last 10 hours 
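The Systemuser -> Generic::Systemuser change (92838) being merged above boils down to using fully qualified resource names now that the old generic-definitions defines live in the generic module; the pybal-check failure quoted earlier is what an unqualified reference produces. A minimal sketch of the shape of that fix, built around the resources named in the error message — the systemuser parameters shown here are illustrative assumptions, not copied from pybal_check.pp:

    # Sketch only: parameter values are assumed, not the actual manifest.
    generic::systemuser { 'pybal-check':
        name => 'pybal-check',
        home => '/var/lib/pybal-check',
    }

    file { '/var/lib/pybal-check/.ssh':
        ensure  => directory,
        # Was Systemuser['pybal-check']; the unqualified reference no longer
        # resolves once the define is part of the generic module.
        require => Generic::Systemuser['pybal-check'],
    }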
[08:15:24] PROBLEM - Puppet freshness on mw1104 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:24] PROBLEM - Puppet freshness on mw1131 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:24] PROBLEM - Puppet freshness on mw1155 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:25] PROBLEM - Puppet freshness on mw1154 is CRITICAL: No successful Puppet run in the last 10 hours [08:15:38] ./manifests/role/labsmediaserver.pp [08:15:41] hahaha [08:15:41] err: Failed to apply catalog: Could not find dependency Upstart_job[twemproxy] for Service[twemproxy] at /etc/puppet/modules/mediawiki/manifests/twemproxy.pp:12 [08:15:50] speaking of the devil [08:15:53] yeah, that's what I was pointing out. separate patch [08:16:10] anyways lambsmediaserver still missing this qualified classname [08:16:10] oh, oops [08:16:24] PROBLEM - Puppet freshness on mw1047 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:24] PROBLEM - Puppet freshness on mw1128 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:24] PROBLEM - Puppet freshness on mw1137 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:24] PROBLEM - Puppet freshness on mw1194 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:24] PROBLEM - Puppet freshness on mw61 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:24] or systemuser [08:16:25] PROBLEM - Puppet freshness on mw93 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:25] PROBLEM - Puppet freshness on srv261 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:26] PROBLEM - Puppet freshness on srv283 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:26] PROBLEM - Puppet freshness on tmh1001 is CRITICAL: No successful Puppet run in the last 10 hours [08:16:26] *for [08:17:09] typing is hard... especially in the morning [08:17:24] PROBLEM - Puppet freshness on labstore2 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:24] PROBLEM - Puppet freshness on mw1020 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:24] PROBLEM - Puppet freshness on mw1075 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:24] PROBLEM - Puppet freshness on mw1078 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:24] PROBLEM - Puppet freshness on mw1179 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:25] PROBLEM - Puppet freshness on mw45 is CRITICAL: No successful Puppet run in the last 10 hours [08:17:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:18:24] PROBLEM - Puppet freshness on mw1083 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:24] PROBLEM - Puppet freshness on mw1094 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:24] PROBLEM - Puppet freshness on mw1169 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:24] PROBLEM - Puppet freshness on mw1136 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:24] PROBLEM - Puppet freshness on mw95 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:25] PROBLEM - Puppet freshness on mw99 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:25] PROBLEM - Puppet freshness on srv264 is CRITICAL: No successful Puppet run in the last 10 hours [08:19:24] PROBLEM - Puppet freshness on mw1138 is CRITICAL: No successful Puppet run in the last 10 hours [08:19:24] PROBLEM - Puppet freshness on mw117 is CRITICAL: No successful Puppet run in the last 10 hours [08:19:24] PROBLEM - Puppet freshness on mw26 is CRITICAL: No successful Puppet run in the last 10 hours [08:19:24] PROBLEM - Puppet freshness on mw36 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:03] ori-l: do you still need udpprofile (RT 5882) or is it going to be obsoleted by the statsd integration? [08:20:24] PROBLEM - Puppet freshness on mw1040 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:24] PROBLEM - Puppet freshness on mw1062 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:24] PROBLEM - Puppet freshness on mw109 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:24] PROBLEM - Puppet freshness on mw107 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:24] PROBLEM - Puppet freshness on mw1132 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:25] PROBLEM - Puppet freshness on mw1218 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:25] PROBLEM - Puppet freshness on mw123 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:26] PROBLEM - Puppet freshness on mw40 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:26] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:26] (03PS1) 10Ori.livneh: Upstart_job -> Generic::Upstart_job [operations/puppet] - 10https://gerrit.wikimedia.org/r/92839 [08:20:27] PROBLEM - Puppet freshness on srv296 is CRITICAL: No successful Puppet run in the last 10 hours [08:20:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [08:21:01] paravoid: it's going to be obsoleted by the statsd integration, but it depends on how fast you want to kill professor [08:21:24] PROBLEM - Puppet freshness on mw1031 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:24] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:24] PROBLEM - Puppet freshness on mw1080 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:24] PROBLEM - Puppet freshness on mw1134 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:24] PROBLEM - Puppet freshness on mw1178 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:59] it might take me 2-3 weeks [08:22:13] that's fine [08:22:24] PROBLEM - Puppet freshness on mw1059 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:24] PROBLEM - Puppet freshness on mw114 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:24] PROBLEM - Puppet freshness on mw15 is CRITICAL: No 
successful Puppet run in the last 10 hours [08:22:24] PROBLEM - Puppet freshness on mw1200 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:24] PROBLEM - Puppet freshness on mw52 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:25] PROBLEM - Puppet freshness on srv240 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:25] PROBLEM - Puppet freshness on srv259 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:26] PROBLEM - Puppet freshness on srv270 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:26] PROBLEM - Puppet freshness on srv282 is CRITICAL: No successful Puppet run in the last 10 hours [08:22:28] (changeset looks good ori) [08:22:45] i should've updated the rt ticket, sorry [08:22:51] (03CR) 10Ori.livneh: [C: 032] Upstart_job -> Generic::Upstart_job [operations/puppet] - 10https://gerrit.wikimedia.org/r/92839 (owner: 10Ori.livneh) [08:22:52] thanks [08:23:24] PROBLEM - Puppet freshness on mw1 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:24] PROBLEM - Puppet freshness on mw1026 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:24] PROBLEM - Puppet freshness on mw1045 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:24] PROBLEM - Puppet freshness on mw1082 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:24] PROBLEM - Puppet freshness on mw1141 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:25] PROBLEM - Puppet freshness on mw1145 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:25] PROBLEM - Puppet freshness on mw1174 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:26] PROBLEM - Puppet freshness on mw59 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:26] PROBLEM - Puppet freshness on mw65 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:27] PROBLEM - Puppet freshness on mw91 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:27] PROBLEM - Puppet freshness on srv271 is CRITICAL: No successful Puppet run in the last 10 hours [08:23:37] no worries [08:24:24] PROBLEM - Puppet freshness on mw1009 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:24] PROBLEM - Puppet freshness on mw1120 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:24] PROBLEM - Puppet freshness on mw1060 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:24] PROBLEM - Puppet freshness on mw3 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:24] PROBLEM - Puppet freshness on mw7 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:25] PROBLEM - Puppet freshness on srv243 is CRITICAL: No successful Puppet run in the last 10 hours [08:24:34] RECOVERY - Puppet freshness on srv271 is OK: puppet ran at Thu Oct 31 08:24:27 UTC 2013 [08:25:53] good, I see an appserver able to apply the catalog [08:26:04] RECOVERY - Puppet freshness on mw1180 is OK: puppet ran at Thu Oct 31 08:26:02 UTC 2013 [08:26:08] i tried it on srv271 and it failed it restart twemproxy [08:26:13] ugh [08:26:29] log: https://dpaste.de/ZjbU/raw/ [08:26:33] ah yep see the same thing [08:26:38] mw1180 [08:27:08] well crapla [08:27:55] is this going to kill every app server now? 
[08:28:16] well none of them will be able to restart it [08:28:52] oh, but it's actually running [08:28:55] yes [08:29:03] yeah, so it's one failed puppet run, then ok [08:29:22] still a bug but not quite the ticking timebomb i thought it was for a moment [08:29:39] no, when these refreshes fail they leave the old X running (lucky us) [08:29:51] * apergos reruns puppet to see if that clears up [08:30:24] PROBLEM - Puppet freshness on arsenic is CRITICAL: No successful Puppet run in the last 10 hours [08:30:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:30:38] I don wonder why refresh didn't work properly, that's not exciting [08:30:44] RECOVERY - Puppet freshness on tmh1 is OK: puppet ran at Thu Oct 31 08:30:43 UTC 2013 [08:30:57] anyways, second run is successful. so in two hours... :-/ [08:31:04] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:31:18] * apergos glares at arsenic [08:31:55] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [08:32:24] PROBLEM - Puppet freshness on mw1043 is CRITICAL: No successful Puppet run in the last 10 hours [08:32:24] PROBLEM - Puppet freshness on mw1167 is CRITICAL: No successful Puppet run in the last 10 hours [08:32:24] RECOVERY - Puppet freshness on mw1167 is OK: puppet ran at Thu Oct 31 08:32:23 UTC 2013 [08:32:25] RECOVERY - Puppet freshness on mw1043 is OK: puppet ran at Thu Oct 31 08:32:23 UTC 2013 [08:33:19] Upstart_job makes the upstart job "work" for things that expect an init.d script [08:33:26] and Puppet manages it as an init.d script [08:33:47] but Puppet actually has excellent Upstart integration if you just provider => 'upstart' the service [08:34:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [08:35:24] ah yeah, this is it: file { "/etc/init.d/${title}": ensure => "/lib/init/upstart-job"; } [08:36:38] it's actually a simpler issue though [08:36:54] these resources don't even map to Service types [08:37:04] there's just a 'start fooservice' exec [08:37:08] with no onlyif / unless check [08:37:24] RECOVERY - Puppet freshness on arsenic is OK: puppet ran at Thu Oct 31 08:37:20 UTC 2013 [08:37:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
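For the twemproxy case being debugged here, the two approaches under discussion look roughly like this — a sketch only, not the production manifest. The existing Upstart_job define fakes an init.d script by pointing /etc/init.d/<job> at /lib/init/upstart-job (per the snippet quoted above), while the alternative ori-l mentions is to let Puppet manage the job natively with the upstart service provider; ensure => running is an assumption here:

    # What the existing define does: make the Upstart job answer to
    # init.d-style tooling via a symlink.
    file { '/etc/init.d/twemproxy':
        ensure => '/lib/init/upstart-job',
    }

    # The alternative: a real Service resource using Puppet's Upstart provider,
    # which gives Puppet proper start/restart/refresh semantics for the job.
    service { 'twemproxy':
        ensure   => running,
        provider => 'upstart',
    }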
[08:39:05] adding this to the exec on line 13 of upstart_job.pp should do it: unless => "status ${title} | grep -q start/running", [08:39:31] that would be nice [08:39:39] very nice indeed [08:42:24] PROBLEM - Puppet freshness on mw1041 is CRITICAL: No successful Puppet run in the last 10 hours [08:42:24] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: No successful Puppet run in the last 10 hours [08:42:24] RECOVERY - Puppet freshness on mw1041 is OK: puppet ran at Thu Oct 31 08:42:16 UTC 2013 [08:45:44] RECOVERY - Puppet freshness on mw10 is OK: puppet ran at Thu Oct 31 08:45:41 UTC 2013 [08:45:44] RECOVERY - Puppet freshness on mw116 is OK: puppet ran at Thu Oct 31 08:45:41 UTC 2013 [08:45:54] RECOVERY - Puppet freshness on mw60 is OK: puppet ran at Thu Oct 31 08:45:47 UTC 2013 [08:45:54] RECOVERY - Puppet freshness on mw1213 is OK: puppet ran at Thu Oct 31 08:45:52 UTC 2013 [08:45:55] RECOVERY - Puppet freshness on mw1211 is OK: puppet ran at Thu Oct 31 08:45:52 UTC 2013 [08:45:55] RECOVERY - Puppet freshness on mw1054 is OK: puppet ran at Thu Oct 31 08:45:52 UTC 2013 [08:46:04] RECOVERY - Puppet freshness on mw23 is OK: puppet ran at Thu Oct 31 08:45:57 UTC 2013 [08:46:04] RECOVERY - Puppet freshness on mw1011 is OK: puppet ran at Thu Oct 31 08:46:02 UTC 2013 [08:46:04] RECOVERY - Puppet freshness on mw1129 is OK: puppet ran at Thu Oct 31 08:46:02 UTC 2013 [08:46:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [08:46:34] RECOVERY - Puppet freshness on labstore3 is OK: puppet ran at Thu Oct 31 08:46:32 UTC 2013 [08:46:44] RECOVERY - Puppet freshness on mw75 is OK: puppet ran at Thu Oct 31 08:46:42 UTC 2013 [08:46:44] RECOVERY - Puppet freshness on mw42 is OK: puppet ran at Thu Oct 31 08:46:42 UTC 2013 [08:46:44] RECOVERY - Puppet freshness on mw55 is OK: puppet ran at Thu Oct 31 08:46:42 UTC 2013 [08:46:54] RECOVERY - Puppet freshness on srv242 is OK: puppet ran at Thu Oct 31 08:46:52 UTC 2013 [08:46:55] RECOVERY - Puppet freshness on mw1206 is OK: puppet ran at Thu Oct 31 08:46:52 UTC 2013 [08:46:55] RECOVERY - Puppet freshness on antimony is OK: puppet ran at Thu Oct 31 08:46:52 UTC 2013 [08:47:04] RECOVERY - Puppet freshness on mw1208 is OK: puppet ran at Thu Oct 31 08:47:02 UTC 2013 [08:47:35] (03PS1) 10Ori.livneh: Make Upstart_job check service status before attempting to start it [operations/puppet] - 10https://gerrit.wikimedia.org/r/92840 [08:47:54] RECOVERY - Puppet freshness on mw74 is OK: puppet ran at Thu Oct 31 08:47:47 UTC 2013 [08:47:54] RECOVERY - Puppet freshness on srv299 is OK: puppet ran at Thu Oct 31 08:47:47 UTC 2013 [08:47:55] RECOVERY - Puppet freshness on mw13 is OK: puppet ran at Thu Oct 31 08:47:52 UTC 2013 [08:48:04] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Thu Oct 31 08:47:57 UTC 2013 [08:48:04] RECOVERY - Puppet freshness on sodium is OK: puppet ran at Thu Oct 31 08:47:57 UTC 2013 [08:48:04] RECOVERY - Puppet freshness on mw1049 is OK: puppet ran at Thu Oct 31 08:47:57 UTC 2013 [08:48:04] RECOVERY - Puppet freshness on mw1084 is OK: puppet ran at Thu Oct 31 08:47:57 UTC 2013 [08:48:14] RECOVERY - Puppet freshness on snapshot1002 is OK: puppet ran at Thu Oct 31 08:48:07 UTC 2013 [08:48:14] RECOVERY - Puppet freshness on hume is OK: puppet ran at Thu Oct 31 08:48:07 UTC 2013 [08:48:44] RECOVERY - Puppet freshness on srv291 is OK: puppet ran at Thu Oct 31 08:48:42 UTC 2013 [08:48:54] RECOVERY - Puppet freshness on mw1165 is OK: puppet ran at Thu Oct 31 08:48:47 UTC 2013 [08:48:54] RECOVERY - Puppet freshness on 
mw1181 is OK: puppet ran at Thu Oct 31 08:48:47 UTC 2013 [08:48:54] RECOVERY - Puppet freshness on mw1030 is OK: puppet ran at Thu Oct 31 08:48:47 UTC 2013 [08:49:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:49:44] RECOVERY - Puppet freshness on srv272 is OK: puppet ran at Thu Oct 31 08:49:42 UTC 2013 [08:49:44] RECOVERY - Puppet freshness on mw39 is OK: puppet ran at Thu Oct 31 08:49:42 UTC 2013 [08:49:54] RECOVERY - Puppet freshness on mw1198 is OK: puppet ran at Thu Oct 31 08:49:47 UTC 2013 [08:49:54] RECOVERY - Puppet freshness on mw1116 is OK: puppet ran at Thu Oct 31 08:49:52 UTC 2013 [08:49:55] RECOVERY - Puppet freshness on mw1074 is OK: puppet ran at Thu Oct 31 08:49:52 UTC 2013 [08:50:04] RECOVERY - Puppet freshness on mw1053 is OK: puppet ran at Thu Oct 31 08:49:57 UTC 2013 [08:50:24] RECOVERY - Puppet freshness on brewster is OK: puppet ran at Thu Oct 31 08:50:22 UTC 2013 [08:50:37] so one thing I wonder is when refresh = 'restart the service' and when it is 'start only if not running' [08:50:44] RECOVERY - Puppet freshness on mw33 is OK: puppet ran at Thu Oct 31 08:50:38 UTC 2013 [08:50:44] RECOVERY - Puppet freshness on mw64 is OK: puppet ran at Thu Oct 31 08:50:43 UTC 2013 [08:51:31] well, we have to differentiate between 'service' (unix) and 'Service[]', the puppet type [08:51:42] refreshing Service types restarts them [08:51:44] RECOVERY - Puppet freshness on mw53 is OK: puppet ran at Thu Oct 31 08:51:38 UTC 2013 [08:51:44] RECOVERY - Puppet freshness on mw70 is OK: puppet ran at Thu Oct 31 08:51:38 UTC 2013 [08:51:54] RECOVERY - Puppet freshness on mw1001 is OK: puppet ran at Thu Oct 31 08:51:48 UTC 2013 [08:51:54] RECOVERY - Puppet freshness on mw1185 is OK: puppet ran at Thu Oct 31 08:51:53 UTC 2013 [08:51:57] but this isn't a Service type, it's an exec that calls initctl [08:52:04] RECOVERY - Puppet freshness on mw1022 is OK: puppet ran at Thu Oct 31 08:51:58 UTC 2013 [08:52:06] so puppet has no concept of 'restart' to apply [08:52:32] it's just execute or don't execute [08:53:04] RECOVERY - Puppet freshness on mw111 is OK: puppet ran at Thu Oct 31 08:52:58 UTC 2013 [08:53:04] RECOVERY - Puppet freshness on mw1219 is OK: puppet ran at Thu Oct 31 08:52:58 UTC 2013 [08:53:04] RECOVERY - Puppet freshness on mw87 is OK: puppet ran at Thu Oct 31 08:53:03 UTC 2013 [08:53:04] RECOVERY - Puppet freshness on srv285 is OK: puppet ran at Thu Oct 31 08:53:03 UTC 2013 [08:53:24] RECOVERY - Puppet freshness on mw1093 is OK: puppet ran at Thu Oct 31 08:53:23 UTC 2013 [08:53:34] RECOVERY - Puppet freshness on mw1139 is OK: puppet ran at Thu Oct 31 08:53:28 UTC 2013 [08:53:44] RECOVERY - Puppet freshness on srv280 is OK: puppet ran at Thu Oct 31 08:53:43 UTC 2013 [08:53:54] RECOVERY - Puppet freshness on mw115 is OK: puppet ran at Thu Oct 31 08:53:48 UTC 2013 [08:53:55] RECOVERY - Puppet freshness on mw1203 is OK: puppet ran at Thu Oct 31 08:53:53 UTC 2013 [08:53:55] RECOVERY - Puppet freshness on mw1090 is OK: puppet ran at Thu Oct 31 08:53:53 UTC 2013 [08:53:55] RECOVERY - Puppet freshness on mw1220 is OK: puppet ran at Thu Oct 31 08:53:53 UTC 2013 [08:54:04] RECOVERY - Puppet freshness on mw1215 is OK: puppet ran at Thu Oct 31 08:53:58 UTC 2013 [08:54:04] RECOVERY - Puppet freshness on mw1086 is OK: puppet ran at Thu Oct 31 08:53:58 UTC 2013 [08:54:04] RECOVERY - Puppet freshness on mw1112 is OK: puppet ran at Thu Oct 31 08:54:03 UTC 2013 [08:54:14] RECOVERY - Puppet freshness on terbium is OK: puppet ran at Thu Oct 31 08:54:08 UTC 2013 
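Putting the diagnosis above together with change 92840 ("Make Upstart_job check service status before attempting to start it"): because the define wraps a bare exec rather than a Service resource, a refresh can only re-run the start command, so the command itself needs a guard. An approximate sketch of the relevant part of upstart_job.pp with the proposed unless check — the define's real parameters and surrounding logic are omitted or assumed:

    define generic::upstart_job() {
        # Only the start exec is sketched; the real define also installs the
        # job file and takes parameters that are not shown here.
        exec { "start ${title}":
            command => "/sbin/start ${title}",
            path    => '/sbin:/usr/sbin:/bin:/usr/bin',  # path is an assumption
            # Proposed guard: skip the start when Upstart already reports the
            # job as start/running, so refreshing a running job no longer fails.
            unless  => "status ${title} | grep -q start/running",
        }
    }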
[08:54:44] RECOVERY - Puppet freshness on mw92 is OK: puppet ran at Thu Oct 31 08:54:43 UTC 2013 [08:54:44] RECOVERY - Puppet freshness on srv268 is OK: puppet ran at Thu Oct 31 08:54:43 UTC 2013 [08:54:54] RECOVERY - Puppet freshness on srv298 is OK: puppet ran at Thu Oct 31 08:54:48 UTC 2013 [08:54:54] RECOVERY - Puppet freshness on mw113 is OK: puppet ran at Thu Oct 31 08:54:53 UTC 2013 [08:55:04] RECOVERY - Puppet freshness on mw1110 is OK: puppet ran at Thu Oct 31 08:54:53 UTC 2013 [08:55:04] RECOVERY - Puppet freshness on mw1158 is OK: puppet ran at Thu Oct 31 08:54:58 UTC 2013 [08:55:14] RECOVERY - Puppet freshness on mw1135 is OK: puppet ran at Thu Oct 31 08:55:03 UTC 2013 [08:55:54] RECOVERY - Puppet freshness on srv251 is OK: puppet ran at Thu Oct 31 08:55:44 UTC 2013 [08:55:54] RECOVERY - Puppet freshness on mw25 is OK: puppet ran at Thu Oct 31 08:55:44 UTC 2013 [08:55:54] RECOVERY - Puppet freshness on srv293 is OK: puppet ran at Thu Oct 31 08:55:44 UTC 2013 [08:55:54] RECOVERY - Puppet freshness on mw1199 is OK: puppet ran at Thu Oct 31 08:55:49 UTC 2013 [08:56:04] RECOVERY - Puppet freshness on mw1073 is OK: puppet ran at Thu Oct 31 08:55:54 UTC 2013 [08:56:44] RECOVERY - Puppet freshness on snapshot3 is OK: puppet ran at Thu Oct 31 08:56:34 UTC 2013 [08:56:54] RECOVERY - Puppet freshness on srv274 is OK: puppet ran at Thu Oct 31 08:56:44 UTC 2013 [08:56:54] RECOVERY - Puppet freshness on srv235 is OK: puppet ran at Thu Oct 31 08:56:44 UTC 2013 [08:56:54] RECOVERY - Puppet freshness on mw1085 is OK: puppet ran at Thu Oct 31 08:56:49 UTC 2013 [08:57:04] RECOVERY - Puppet freshness on mw1157 is OK: puppet ran at Thu Oct 31 08:56:54 UTC 2013 [08:57:04] RECOVERY - Puppet freshness on mw12 is OK: puppet ran at Thu Oct 31 08:56:54 UTC 2013 [08:57:04] RECOVERY - Puppet freshness on mw1070 is OK: puppet ran at Thu Oct 31 08:56:54 UTC 2013 [08:57:04] RECOVERY - Puppet freshness on mw1058 is OK: puppet ran at Thu Oct 31 08:56:54 UTC 2013 [08:57:04] RECOVERY - Puppet freshness on mw1019 is OK: puppet ran at Thu Oct 31 08:56:54 UTC 2013 [08:57:05] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:57:14] RECOVERY - Puppet freshness on mw1102 is OK: puppet ran at Thu Oct 31 08:57:09 UTC 2013 [08:57:44] RECOVERY - Puppet freshness on mw96 is OK: puppet ran at Thu Oct 31 08:57:39 UTC 2013 [08:57:54] RECOVERY - Puppet freshness on mw1191 is OK: puppet ran at Thu Oct 31 08:57:49 UTC 2013 [08:58:04] RECOVERY - Puppet freshness on srv239 is OK: puppet ran at Thu Oct 31 08:57:59 UTC 2013 [08:58:14] RECOVERY - Puppet freshness on mw1184 is OK: puppet ran at Thu Oct 31 08:58:04 UTC 2013 [08:58:14] PROBLEM - twemproxy process on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:58:24] RECOVERY - Puppet freshness on tmh1002 is OK: puppet ran at Thu Oct 31 08:58:14 UTC 2013 [08:58:44] RECOVERY - Puppet freshness on srv254 is OK: puppet ran at Thu Oct 31 08:58:39 UTC 2013 [08:58:44] PROBLEM - DPKG on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:58:51] apergos: ori-l : twemproxy process on arsenic is CRITICAL :D (well ssh dead as well) [08:58:54] RECOVERY - Puppet freshness on mw71 is OK: puppet ran at Thu Oct 31 08:58:44 UTC 2013 [08:58:55] RECOVERY - Puppet freshness on mw69 is OK: puppet ran at Thu Oct 31 08:58:44 UTC 2013 [08:58:55] RECOVERY - Puppet freshness on mw63 is OK: puppet ran at Thu Oct 31 08:58:44 UTC 2013 [08:59:04] RECOVERY - Puppet freshness on mw1096 is OK: puppet ran at Thu Oct 31 08:58:55 UTC 2013 [08:59:04] RECOVERY - Puppet freshness on mw1013 is OK: puppet ran at Thu Oct 31 08:58:55 UTC 2013 [08:59:04] RECOVERY - Puppet freshness on mw1035 is OK: puppet ran at Thu Oct 31 08:58:55 UTC 2013 [08:59:04] RECOVERY - Puppet freshness on mw1036 is OK: puppet ran at Thu Oct 31 08:58:55 UTC 2013 [08:59:14] RECOVERY - twemproxy process on arsenic is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [08:59:24] RECOVERY - Puppet freshness on searchidx2 is OK: puppet ran at Thu Oct 31 08:59:20 UTC 2013 [08:59:44] RECOVERY - Puppet freshness on mw86 is OK: puppet ran at Thu Oct 31 08:59:40 UTC 2013 [08:59:48] arsenic has a problem from yesterday, there is a ticket [08:59:54] RECOVERY - Puppet freshness on mw32 is OK: puppet ran at Thu Oct 31 08:59:45 UTC 2013 [08:59:54] RECOVERY - Puppet freshness on mw1216 is OK: puppet ran at Thu Oct 31 08:59:50 UTC 2013 [08:59:54] RECOVERY - Puppet freshness on mw1192 is OK: puppet ran at Thu Oct 31 08:59:50 UTC 2013 [08:59:54] RECOVERY - Puppet freshness on mw18 is OK: puppet ran at Thu Oct 31 08:59:50 UTC 2013 [08:59:56] (that same problem) [09:00:04] RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Thu Oct 31 08:59:55 UTC 2013 [09:00:12] I will look at it again later today [09:00:54] RECOVERY - Puppet freshness on mw1005 is OK: puppet ran at Thu Oct 31 09:00:50 UTC 2013 [09:01:14] RECOVERY - Puppet freshness on mw1067 is OK: puppet ran at Thu Oct 31 09:01:05 UTC 2013 [09:01:14] RECOVERY - Puppet freshness on mw1028 is OK: puppet ran at Thu Oct 31 09:01:05 UTC 2013 [09:01:23] apergos: okk. Also do you have any idea why puppet freshness is spammed there ? [09:01:44] RECOVERY - Puppet freshness on snapshot2 is OK: puppet ran at Thu Oct 31 09:01:36 UTC 2013 [09:01:54] RECOVERY - Puppet freshness on srv277 is OK: puppet ran at Thu Oct 31 09:01:46 UTC 2013 [09:01:54] RECOVERY - Puppet freshness on mw73 is OK: puppet ran at Thu Oct 31 09:01:46 UTC 2013 [09:01:55] RECOVERY - Puppet freshness on mw1012 is OK: puppet ran at Thu Oct 31 09:01:51 UTC 2013 [09:02:04] RECOVERY - Puppet freshness on mw1006 is OK: puppet ran at Thu Oct 31 09:01:56 UTC 2013 [09:02:04] RECOVERY - Puppet freshness on srv265 is OK: puppet ran at Thu Oct 31 09:02:01 UTC 2013 [09:02:04] RECOVERY - Puppet freshness on srv248 is OK: puppet ran at Thu Oct 31 09:02:01 UTC 2013 [09:02:14] RECOVERY - Puppet freshness on mw31 is OK: puppet ran at Thu Oct 31 09:02:06 UTC 2013 [09:02:14] PROBLEM - twemproxy process on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:02:14] RECOVERY - Puppet freshness on mw77 is OK: puppet ran at Thu Oct 31 09:02:11 UTC 2013 [09:02:31] I didn't get that question, hashar [09:02:44] RECOVERY - Puppet freshness on mw101 is OK: puppet ran at Thu Oct 31 09:02:41 UTC 2013 [09:02:44] RECOVERY - Puppet freshness on mw94 is OK: puppet ran at Thu Oct 31 09:02:41 UTC 2013 [09:02:44] RECOVERY - Puppet freshness on mw103 is OK: puppet ran at Thu Oct 31 09:02:41 UTC 2013 [09:02:54] RECOVERY - Puppet freshness on mw54 is OK: puppet ran at Thu Oct 31 09:02:46 UTC 2013 [09:02:54] RECOVERY - Puppet freshness on srv295 is OK: puppet ran at Thu Oct 31 09:02:46 UTC 2013 [09:02:55] RECOVERY - Puppet freshness on mw1187 is OK: puppet ran at Thu Oct 31 09:02:51 UTC 2013 [09:03:04] RECOVERY - Puppet freshness on snapshot1003 is OK: puppet ran at Thu Oct 31 09:02:56 UTC 2013 [09:03:04] RECOVERY - Puppet freshness on mw1160 is OK: puppet ran at Thu Oct 31 09:02:56 UTC 2013 [09:03:10] apergos: I am referring to icinga-wm spamming puppet freshness recovery messages :] [09:03:30] (03PS2) 10Odder: (bug 56384) Configure $wgImportSources for dewikisource [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92797 [09:03:34] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Thu Oct 31 09:03:31 UTC 2013 [09:03:44] RECOVERY - DPKG on arsenic is OK: All packages OK [09:03:44] RECOVERY - Puppet freshness on srv287 is OK: puppet ran at Thu Oct 31 09:03:42 UTC 2013 [09:03:54] RECOVERY - Puppet freshness on tmh2 is OK: puppet ran at Thu Oct 31 09:03:47 UTC 2013 [09:03:54] RECOVERY - Puppet freshness on mw51 is OK: puppet ran at Thu Oct 31 09:03:47 UTC 2013 [09:03:54] RECOVERY - Puppet freshness on mw1164 is OK: puppet ran at Thu Oct 31 09:03:52 UTC 2013 [09:03:54] RECOVERY - Puppet freshness on mw1217 is OK: puppet ran at Thu Oct 31 09:03:52 UTC 2013 [09:03:54] RECOVERY - Puppet freshness on mw1176 is OK: puppet ran at Thu Oct 31 09:03:52 UTC 2013 [09:03:55] RECOVERY - Puppet freshness on mw1117 is OK: puppet ran at Thu Oct 31 09:03:52 UTC 2013 [09:04:04] RECOVERY - Puppet freshness on mw1100 is OK: puppet ran at Thu Oct 31 09:03:57 UTC 2013 [09:04:04] RECOVERY - Puppet freshness on mw1088 is OK: puppet ran at Thu Oct 31 09:03:57 UTC 2013 [09:04:04] RECOVERY - Puppet freshness on mw1099 is OK: puppet ran at Thu Oct 31 09:03:57 UTC 2013 [09:04:23] ah. 
yes, this is due to the conversion of generic to a module, there were a couple of changes overlooked [09:04:52] ori submitted patches and in about another 1.5 hours everything should be clear (successful runs), in about .5 hour we should be spam free [09:04:54] RECOVERY - Puppet freshness on mw122 is OK: puppet ran at Thu Oct 31 09:04:37 UTC 2013 [09:04:54] RECOVERY - Puppet freshness on mw82 is OK: puppet ran at Thu Oct 31 09:04:37 UTC 2013 [09:04:54] RECOVERY - Puppet freshness on mw46 is OK: puppet ran at Thu Oct 31 09:04:42 UTC 2013 [09:05:04] RECOVERY - Puppet freshness on srv238 is OK: puppet ran at Thu Oct 31 09:04:57 UTC 2013 [09:05:04] RECOVERY - Puppet freshness on mw20 is OK: puppet ran at Thu Oct 31 09:04:57 UTC 2013 [09:05:04] RECOVERY - Puppet freshness on srv297 is OK: puppet ran at Thu Oct 31 09:04:57 UTC 2013 [09:05:04] RECOVERY - Puppet freshness on mw1065 is OK: puppet ran at Thu Oct 31 09:05:02 UTC 2013 [09:05:10] generic-definitions, I mean [09:05:24] RECOVERY - Puppet freshness on mw1123 is OK: puppet ran at Thu Oct 31 09:05:17 UTC 2013 [09:05:24] RECOVERY - Puppet freshness on mw1144 is OK: puppet ran at Thu Oct 31 09:05:17 UTC 2013 [09:05:24] RECOVERY - Puppet freshness on mw1042 is OK: puppet ran at Thu Oct 31 09:05:17 UTC 2013 [09:05:24] RECOVERY - Puppet freshness on mw1052 is OK: puppet ran at Thu Oct 31 09:05:17 UTC 2013 [09:05:44] RECOVERY - Puppet freshness on srv286 is OK: puppet ran at Thu Oct 31 09:05:37 UTC 2013 [09:05:44] RECOVERY - Puppet freshness on srv258 is OK: puppet ran at Thu Oct 31 09:05:42 UTC 2013 [09:05:44] RECOVERY - Puppet freshness on mw14 is OK: puppet ran at Thu Oct 31 09:05:42 UTC 2013 [09:05:54] RECOVERY - Puppet freshness on srv284 is OK: puppet ran at Thu Oct 31 09:05:48 UTC 2013 [09:05:54] RECOVERY - Puppet freshness on mw62 is OK: puppet ran at Thu Oct 31 09:05:48 UTC 2013 [09:05:54] RECOVERY - Puppet freshness on mw1175 is OK: puppet ran at Thu Oct 31 09:05:48 UTC 2013 [09:05:54] RECOVERY - Puppet freshness on mw1002 is OK: puppet ran at Thu Oct 31 09:05:53 UTC 2013 [09:06:04] RECOVERY - Puppet freshness on mw1114 is OK: puppet ran at Thu Oct 31 09:05:58 UTC 2013 [09:06:04] RECOVERY - Puppet freshness on mw1126 is OK: puppet ran at Thu Oct 31 09:06:03 UTC 2013 [09:06:24] PROBLEM - DPKG on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[09:06:44] RECOVERY - Puppet freshness on srv275 is OK: puppet ran at Thu Oct 31 09:06:38 UTC 2013 [09:06:54] RECOVERY - Puppet freshness on mw1162 is OK: puppet ran at Thu Oct 31 09:06:48 UTC 2013 [09:07:04] RECOVERY - Puppet freshness on mw1044 is OK: puppet ran at Thu Oct 31 09:06:54 UTC 2013 [09:07:34] RECOVERY - Puppet freshness on tin is OK: puppet ran at Thu Oct 31 09:07:29 UTC 2013 [09:07:54] RECOVERY - Puppet freshness on mw38 is OK: puppet ran at Thu Oct 31 09:07:44 UTC 2013 [09:07:55] RECOVERY - Puppet freshness on mw1190 is OK: puppet ran at Thu Oct 31 09:07:49 UTC 2013 [09:07:55] RECOVERY - Puppet freshness on mw1125 is OK: puppet ran at Thu Oct 31 09:07:49 UTC 2013 [09:08:04] RECOVERY - Puppet freshness on mw1111 is OK: puppet ran at Thu Oct 31 09:07:54 UTC 2013 [09:08:44] RECOVERY - Puppet freshness on srv246 is OK: puppet ran at Thu Oct 31 09:08:39 UTC 2013 [09:08:54] RECOVERY - Puppet freshness on srv249 is OK: puppet ran at Thu Oct 31 09:08:49 UTC 2013 [09:09:04] RECOVERY - Puppet freshness on mw30 is OK: puppet ran at Thu Oct 31 09:08:54 UTC 2013 [09:09:14] RECOVERY - Puppet freshness on mw1146 is OK: puppet ran at Thu Oct 31 09:09:04 UTC 2013 [09:09:14] RECOVERY - Puppet freshness on mw1004 is OK: puppet ran at Thu Oct 31 09:09:09 UTC 2013 [09:09:14] RECOVERY - Puppet freshness on mw1081 is OK: puppet ran at Thu Oct 31 09:09:09 UTC 2013 [09:09:24] RECOVERY - Puppet freshness on mw1056 is OK: puppet ran at Thu Oct 31 09:09:14 UTC 2013 [09:09:24] RECOVERY - Puppet freshness on mw1057 is OK: puppet ran at Thu Oct 31 09:09:14 UTC 2013 [09:09:24] RECOVERY - DPKG on arsenic is OK: All packages OK [09:09:24] RECOVERY - Puppet freshness on mw1159 is OK: puppet ran at Thu Oct 31 09:09:19 UTC 2013 [09:09:44] RECOVERY - Puppet freshness on mw119 is OK: puppet ran at Thu Oct 31 09:09:34 UTC 2013 [09:09:44] RECOVERY - Puppet freshness on snapshot1 is OK: puppet ran at Thu Oct 31 09:09:34 UTC 2013 [09:09:44] RECOVERY - Puppet freshness on srv276 is OK: puppet ran at Thu Oct 31 09:09:39 UTC 2013 [09:09:44] RECOVERY - Puppet freshness on mw8 is OK: puppet ran at Thu Oct 31 09:09:39 UTC 2013 [09:09:54] RECOVERY - Puppet freshness on mw1188 is OK: puppet ran at Thu Oct 31 09:09:50 UTC 2013 [09:09:54] RECOVERY - Puppet freshness on mw1210 is OK: puppet ran at Thu Oct 31 09:09:50 UTC 2013 [09:09:54] RECOVERY - Puppet freshness on mw1171 is OK: puppet ran at Thu Oct 31 09:09:50 UTC 2013 [09:10:04] RECOVERY - Puppet freshness on mw58 is OK: puppet ran at Thu Oct 31 09:09:55 UTC 2013 [09:10:04] RECOVERY - Puppet freshness on mw1087 is OK: puppet ran at Thu Oct 31 09:09:55 UTC 2013 [09:10:44] RECOVERY - Puppet freshness on srv255 is OK: puppet ran at Thu Oct 31 09:10:35 UTC 2013 [09:10:44] RECOVERY - Puppet freshness on mw57 is OK: puppet ran at Thu Oct 31 09:10:40 UTC 2013 [09:10:44] RECOVERY - Puppet freshness on srv273 is OK: puppet ran at Thu Oct 31 09:10:40 UTC 2013 [09:10:54] RECOVERY - Puppet freshness on mw1032 is OK: puppet ran at Thu Oct 31 09:10:50 UTC 2013 [09:11:44] RECOVERY - Puppet freshness on mw2 is OK: puppet ran at Thu Oct 31 09:11:35 UTC 2013 [09:11:44] RECOVERY - Puppet freshness on mw106 is OK: puppet ran at Thu Oct 31 09:11:40 UTC 2013 [09:11:54] RECOVERY - Puppet freshness on mw98 is OK: puppet ran at Thu Oct 31 09:11:50 UTC 2013 [09:11:54] RECOVERY - Puppet freshness on mw35 is OK: puppet ran at Thu Oct 31 09:11:50 UTC 2013 [09:12:04] RECOVERY - Puppet freshness on mw1201 is OK: puppet ran at Thu Oct 31 09:12:00 UTC 2013 [09:12:04] RECOVERY - Puppet freshness on mw1033 
is OK: puppet ran at Thu Oct 31 09:12:00 UTC 2013 [09:12:04] RECOVERY - Puppet freshness on mw1024 is OK: puppet ran at Thu Oct 31 09:12:00 UTC 2013 [09:12:14] RECOVERY - Puppet freshness on mw1122 is OK: puppet ran at Thu Oct 31 09:12:05 UTC 2013 [09:12:44] RECOVERY - Puppet freshness on mw112 is OK: puppet ran at Thu Oct 31 09:12:40 UTC 2013 [09:12:44] RECOVERY - Puppet freshness on srv269 is OK: puppet ran at Thu Oct 31 09:12:40 UTC 2013 [09:12:44] RECOVERY - Puppet freshness on srv267 is OK: puppet ran at Thu Oct 31 09:12:40 UTC 2013 [09:12:44] RECOVERY - Puppet freshness on srv262 is OK: puppet ran at Thu Oct 31 09:12:40 UTC 2013 [09:12:55] RECOVERY - Puppet freshness on mw1010 is OK: puppet ran at Thu Oct 31 09:12:50 UTC 2013 [09:13:14] RECOVERY - Puppet freshness on mw1142 is OK: puppet ran at Thu Oct 31 09:13:05 UTC 2013 [09:13:14] RECOVERY - Puppet freshness on mw1091 is OK: puppet ran at Thu Oct 31 09:13:05 UTC 2013 [09:13:54] RECOVERY - Puppet freshness on mw1204 is OK: puppet ran at Thu Oct 31 09:13:45 UTC 2013 [09:13:54] RECOVERY - Puppet freshness on mw28 is OK: puppet ran at Thu Oct 31 09:13:50 UTC 2013 [09:14:04] RECOVERY - Puppet freshness on mw1107 is OK: puppet ran at Thu Oct 31 09:14:00 UTC 2013 [09:14:04] RECOVERY - Puppet freshness on mw1143 is OK: puppet ran at Thu Oct 31 09:14:00 UTC 2013 [09:14:04] RECOVERY - Puppet freshness on mw1027 is OK: puppet ran at Thu Oct 31 09:14:00 UTC 2013 [09:14:04] RECOVERY - Puppet freshness on mw1066 is OK: puppet ran at Thu Oct 31 09:14:00 UTC 2013 [09:14:34] RECOVERY - Puppet freshness on labstore1 is OK: puppet ran at Thu Oct 31 09:14:30 UTC 2013 [09:14:54] RECOVERY - Puppet freshness on mw1131 is OK: puppet ran at Thu Oct 31 09:14:46 UTC 2013 [09:14:55] RECOVERY - Puppet freshness on mw1155 is OK: puppet ran at Thu Oct 31 09:14:51 UTC 2013 [09:14:55] RECOVERY - Puppet freshness on mw1104 is OK: puppet ran at Thu Oct 31 09:14:51 UTC 2013 [09:14:55] RECOVERY - Puppet freshness on mw1154 is OK: puppet ran at Thu Oct 31 09:14:51 UTC 2013 [09:15:04] RECOVERY - Puppet freshness on mw1021 is OK: puppet ran at Thu Oct 31 09:15:01 UTC 2013 [09:15:24] PROBLEM - DPKG on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:15:44] RECOVERY - Puppet freshness on mw93 is OK: puppet ran at Thu Oct 31 09:15:41 UTC 2013 [09:15:44] RECOVERY - Puppet freshness on srv261 is OK: puppet ran at Thu Oct 31 09:15:41 UTC 2013 [09:15:44] RECOVERY - Puppet freshness on srv283 is OK: puppet ran at Thu Oct 31 09:15:41 UTC 2013 [09:15:44] RECOVERY - Puppet freshness on mw61 is OK: puppet ran at Thu Oct 31 09:15:41 UTC 2013 [09:15:54] RECOVERY - Puppet freshness on mw1194 is OK: puppet ran at Thu Oct 31 09:15:51 UTC 2013 [09:16:14] RECOVERY - Puppet freshness on mw1047 is OK: puppet ran at Thu Oct 31 09:16:06 UTC 2013 [09:16:14] RECOVERY - Puppet freshness on mw1128 is OK: puppet ran at Thu Oct 31 09:16:06 UTC 2013 [09:16:14] RECOVERY - Puppet freshness on tmh1001 is OK: puppet ran at Thu Oct 31 09:16:11 UTC 2013 [09:16:18] spam [09:16:24] RECOVERY - Puppet freshness on mw1137 is OK: puppet ran at Thu Oct 31 09:16:16 UTC 2013 [09:16:34] RECOVERY - Puppet freshness on labstore2 is OK: puppet ran at Thu Oct 31 09:16:26 UTC 2013 [09:16:44] RECOVERY - Puppet freshness on mw45 is OK: puppet ran at Thu Oct 31 09:16:41 UTC 2013 [09:16:52] (03CR) 10PleaseStand: [C: 04-1] "I'm not convinced this change is necessary." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92828 (owner: 10Reedy) [09:16:54] RECOVERY - Puppet freshness on mw1179 is OK: puppet ran at Thu Oct 31 09:16:51 UTC 2013 [09:16:55] RECOVERY - Puppet freshness on mw1078 is OK: puppet ran at Thu Oct 31 09:16:51 UTC 2013 [09:16:55] RECOVERY - Puppet freshness on mw1020 is OK: puppet ran at Thu Oct 31 09:16:51 UTC 2013 [09:16:55] RECOVERY - Puppet freshness on mw1075 is OK: puppet ran at Thu Oct 31 09:16:51 UTC 2013 [09:17:44] RECOVERY - Puppet freshness on mw99 is OK: puppet ran at Thu Oct 31 09:17:41 UTC 2013 [09:17:44] RECOVERY - Puppet freshness on mw95 is OK: puppet ran at Thu Oct 31 09:17:41 UTC 2013 [09:17:44] RECOVERY - Puppet freshness on srv264 is OK: puppet ran at Thu Oct 31 09:17:41 UTC 2013 [09:17:54] RECOVERY - Puppet freshness on mw1083 is OK: puppet ran at Thu Oct 31 09:17:51 UTC 2013 [09:18:04] RECOVERY - Puppet freshness on mw1136 is OK: puppet ran at Thu Oct 31 09:18:02 UTC 2013 [09:18:04] RECOVERY - Puppet freshness on mw1169 is OK: puppet ran at Thu Oct 31 09:18:02 UTC 2013 [09:18:14] RECOVERY - Puppet freshness on mw1094 is OK: puppet ran at Thu Oct 31 09:18:07 UTC 2013 [09:18:44] RECOVERY - Puppet freshness on mw117 is OK: puppet ran at Thu Oct 31 09:18:37 UTC 2013 [09:18:44] RECOVERY - Puppet freshness on mw26 is OK: puppet ran at Thu Oct 31 09:18:42 UTC 2013 [09:18:44] RECOVERY - Puppet freshness on mw36 is OK: puppet ran at Thu Oct 31 09:18:42 UTC 2013 [09:18:54] RECOVERY - Puppet freshness on mw1138 is OK: puppet ran at Thu Oct 31 09:18:52 UTC 2013 [09:19:44] RECOVERY - Puppet freshness on mw109 is OK: puppet ran at Thu Oct 31 09:19:43 UTC 2013 [09:19:44] RECOVERY - Puppet freshness on mw107 is OK: puppet ran at Thu Oct 31 09:19:43 UTC 2013 [09:19:44] RECOVERY - Puppet freshness on mw123 is OK: puppet ran at Thu Oct 31 09:19:43 UTC 2013 [09:19:44] RECOVERY - Puppet freshness on mw40 is OK: puppet ran at Thu Oct 31 09:19:43 UTC 2013 [09:19:54] RECOVERY - Puppet freshness on srv296 is OK: puppet ran at Thu Oct 31 09:19:48 UTC 2013 [09:19:54] RECOVERY - Puppet freshness on snapshot4 is OK: puppet ran at Thu Oct 31 09:19:48 UTC 2013 [09:19:54] RECOVERY - Puppet freshness on mw1218 is OK: puppet ran at Thu Oct 31 09:19:53 UTC 2013 [09:20:04] RECOVERY - Puppet freshness on mw1040 is OK: puppet ran at Thu Oct 31 09:19:58 UTC 2013 [09:20:04] RECOVERY - Puppet freshness on mw1062 is OK: puppet ran at Thu Oct 31 09:20:03 UTC 2013 [09:20:14] RECOVERY - Puppet freshness on mw1132 is OK: puppet ran at Thu Oct 31 09:20:08 UTC 2013 [09:20:54] RECOVERY - Puppet freshness on mw1178 is OK: puppet ran at Thu Oct 31 09:20:48 UTC 2013 [09:20:54] RECOVERY - Puppet freshness on mw1080 is OK: puppet ran at Thu Oct 31 09:20:53 UTC 2013 [09:20:54] RECOVERY - Puppet freshness on mw1134 is OK: puppet ran at Thu Oct 31 09:20:53 UTC 2013 [09:20:55] RECOVERY - Puppet freshness on mw1031 is OK: puppet ran at Thu Oct 31 09:20:53 UTC 2013 [09:21:04] RECOVERY - Puppet freshness on mw1072 is OK: puppet ran at Thu Oct 31 09:21:03 UTC 2013 [09:21:44] RECOVERY - Puppet freshness on mw114 is OK: puppet ran at Thu Oct 31 09:21:38 UTC 2013 [09:21:44] RECOVERY - Puppet freshness on srv270 is OK: puppet ran at Thu Oct 31 09:21:43 UTC 2013 [09:21:44] RECOVERY - Puppet freshness on srv240 is OK: puppet ran at Thu Oct 31 09:21:43 UTC 2013 [09:21:44] RECOVERY - Puppet freshness on srv259 is OK: puppet ran at Thu Oct 31 09:21:43 UTC 2013 [09:21:44] RECOVERY - Puppet freshness on srv282 is OK: puppet ran at Thu Oct 31 09:21:43 UTC 2013 [09:21:54] 
RECOVERY - Puppet freshness on mw1141 is OK: puppet ran at Thu Oct 31 09:21:53 UTC 2013 [09:22:04] RECOVERY - Puppet freshness on mw15 is OK: puppet ran at Thu Oct 31 09:21:58 UTC 2013 [09:22:04] RECOVERY - Puppet freshness on mw65 is OK: puppet ran at Thu Oct 31 09:22:03 UTC 2013 [09:22:04] RECOVERY - Puppet freshness on mw52 is OK: puppet ran at Thu Oct 31 09:22:03 UTC 2013 [09:22:14] RECOVERY - Puppet freshness on mw1200 is OK: puppet ran at Thu Oct 31 09:22:08 UTC 2013 [09:22:14] RECOVERY - Puppet freshness on mw1145 is OK: puppet ran at Thu Oct 31 09:22:08 UTC 2013 [09:22:14] RECOVERY - Puppet freshness on mw1059 is OK: puppet ran at Thu Oct 31 09:22:13 UTC 2013 [09:22:14] RECOVERY - Puppet freshness on mw1045 is OK: puppet ran at Thu Oct 31 09:22:13 UTC 2013 [09:22:54] RECOVERY - Puppet freshness on mw91 is OK: puppet ran at Thu Oct 31 09:22:44 UTC 2013 [09:22:54] RECOVERY - Puppet freshness on mw1 is OK: puppet ran at Thu Oct 31 09:22:44 UTC 2013 [09:22:54] RECOVERY - Puppet freshness on mw59 is OK: puppet ran at Thu Oct 31 09:22:49 UTC 2013 [09:23:04] RECOVERY - Puppet freshness on mw1174 is OK: puppet ran at Thu Oct 31 09:22:54 UTC 2013 [09:23:04] RECOVERY - Puppet freshness on mw1082 is OK: puppet ran at Thu Oct 31 09:22:59 UTC 2013 [09:23:14] RECOVERY - Puppet freshness on mw1026 is OK: puppet ran at Thu Oct 31 09:23:09 UTC 2013 [09:23:44] RECOVERY - Puppet freshness on srv243 is OK: puppet ran at Thu Oct 31 09:23:39 UTC 2013 [09:23:44] RECOVERY - Puppet freshness on mw3 is OK: puppet ran at Thu Oct 31 09:23:40 UTC 2013 [09:23:54] RECOVERY - Puppet freshness on mw7 is OK: puppet ran at Thu Oct 31 09:23:45 UTC 2013 [09:23:54] RECOVERY - Puppet freshness on mw1060 is OK: puppet ran at Thu Oct 31 09:23:50 UTC 2013 [09:23:54] RECOVERY - Puppet freshness on mw1120 is OK: puppet ran at Thu Oct 31 09:23:50 UTC 2013 [09:24:04] RECOVERY - Puppet freshness on mw1009 is OK: puppet ran at Thu Oct 31 09:23:55 UTC 2013 [09:28:04] RECOVERY - twemproxy process on arsenic is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [09:28:14] RECOVERY - DPKG on arsenic is OK: All packages OK [09:28:24] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [09:28:54] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [09:33:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:34:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [09:37:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:50:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [09:51:04] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:51:54] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [09:53:34] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:04:34] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [10:07:31] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:11:31] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [10:13:08] (03CR) 10Reedy: "Suggesting that the notice being sent is causing these followup warnings to happen?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92828 (owner: 10Reedy) [10:15:31] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[10:17:05] (03PS1) 10ArielGlenn: remove virt1002/3 (rt #3687 renamed), virt1009 [operations/dns] - 10https://gerrit.wikimedia.org/r/92850 [10:17:31] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [10:19:01] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:20:01] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:22:04] ah arsenic is in swap hell. nice :-/ [10:24:21] PROBLEM - DPKG on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:24:22] don't know what to shoot over there, there's several hogs [10:25:21] RECOVERY - DPKG on arsenic is OK: All packages OK [10:25:31] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:26:35] (03PS1) 10Hashar: contint: rm /var/lib/jenkins/tmpfs no more needed [operations/puppet] - 10https://gerrit.wikimedia.org/r/92853 [10:26:51] apergos: would you mind merging in https://gerrit.wikimedia.org/r/92853 it is a cleanup change for contint. [10:27:03] apergos: will do the puppet run / manual cleanup on the server myself :] [10:27:12] still looking at arsenic, I'll be with you in a few minutes though [10:27:23] take your time, not urgent [10:27:46] I 'll do it [10:28:00] \O/ [10:28:10] (03CR) 10Akosiaris: [C: 032] contint: rm /var/lib/jenkins/tmpfs no more needed [operations/puppet] - 10https://gerrit.wikimedia.org/r/92853 (owner: 10Hashar) [10:28:40] done [10:28:42] 512M of memory freed! [10:28:45] ah thanks [10:28:47] thank you [10:29:13] !log jenkins / gallium : got rid of the old /var/lib/jenkins/tmpfs 512MB mount ( {{gerrit|92853}} + manual amount ) [10:29:29] Logged the message, Master [10:33:01] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:37:01] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:38:17] (03PS6) 10Akosiaris: Modularizing puppetmaster [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 [10:39:31] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [10:42:31] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:44:31] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [10:46:01] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:47:01] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:50:01] PROBLEM - SSH on arsenic is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:51:01] RECOVERY - SSH on arsenic is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [10:52:51] PROBLEM - SSH on gallium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:53:11] PROBLEM - zuul_service_running on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:53:41] PROBLEM - jenkins_service_running on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:55:01] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/local/bin/zuul-server [10:55:41] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [10:58:11] PROBLEM - zuul_service_running on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:58:15] oops [10:58:16] wtf [10:58:37] ah that is "just" the box being overloaded [10:58:41] PROBLEM - jenkins_service_running on gallium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
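The NRPE socket timeouts on arsenic above are the visible symptom of the swap problem: once a box is swapping, even trivial check commands stall. The triage step described here, working out "what to shoot" among several hogs, amounts to ranking processes by resident memory. A small sketch of that step, reading VmRSS out of /proc; it is offered only as an illustration of the idea, not a script that was run on arsenic:

    #!/usr/bin/env python
    # List the biggest memory consumers on a Linux host by resident set
    # size, the kind of triage used when a box is deep in swap.
    import os

    def top_memory_hogs(limit=10):
        hogs = []
        for pid in filter(str.isdigit, os.listdir('/proc')):
            try:
                with open('/proc/%s/status' % pid) as f:
                    fields = dict(line.split(':', 1) for line in f if ':' in line)
                rss_kb = int(fields['VmRSS'].split()[0])
                name = fields['Name'].strip()
            except (IOError, OSError, KeyError, ValueError):
                continue  # process exited, kernel thread, or unreadable
            hogs.append((rss_kb, pid, name))
        return sorted(hogs, reverse=True)[:limit]

    if __name__ == '__main__':
        for rss_kb, pid, name in top_memory_hogs():
            print('%8d kB  pid %-6s %s' % (rss_kb, pid, name))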
[10:59:01] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/local/bin/zuul-server [10:59:31] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [10:59:41] RECOVERY - SSH on gallium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [11:13:44] !log jenkins compressing all console logs [11:13:58] Logged the message, Master [11:23:54] !log added python-gear package by hashar to apt.wikimedia.org [11:24:08] Logged the message, Master [11:24:18] \O/ [11:37:16] (03PS2) 10Mark Bergsma: Connect to the system backend ip instead of a (volatile) LVS ip [operations/puppet] - 10https://gerrit.wikimedia.org/r/92539 [11:37:23] (03CR) 10jenkins-bot: [V: 04-1] Connect to the system backend ip instead of a (volatile) LVS ip [operations/puppet] - 10https://gerrit.wikimedia.org/r/92539 (owner: 10Mark Bergsma) [11:43:18] (03CR) 10Mark Bergsma: [C: 032 V: 032] Connect to the system backend ip instead of a (volatile) LVS ip [operations/puppet] - 10https://gerrit.wikimedia.org/r/92539 (owner: 10Mark Bergsma) [11:53:02] (03PS1) 10Mark Bergsma: Treat HTTP status 400/413 specially [operations/puppet] - 10https://gerrit.wikimedia.org/r/92864 [11:54:15] (03CR) 10Mark Bergsma: [C: 032] Treat HTTP status 400/413 specially [operations/puppet] - 10https://gerrit.wikimedia.org/r/92864 (owner: 10Mark Bergsma) [12:04:49] (03PS1) 10Mark Bergsma: Allow OPTIONS requests for CORS [operations/puppet] - 10https://gerrit.wikimedia.org/r/92866 [12:06:42] (03CR) 10Mark Bergsma: [C: 032] Allow OPTIONS requests for CORS [operations/puppet] - 10https://gerrit.wikimedia.org/r/92866 (owner: 10Mark Bergsma) [12:13:04] (03PS1) 10Mark Bergsma: Missing ) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92867 [12:14:16] (03CR) 10Mark Bergsma: [C: 032] Missing ) [operations/puppet] - 10https://gerrit.wikimedia.org/r/92867 (owner: 10Mark Bergsma) [12:18:10] (03CR) 10QChris: [C: 04-1] "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 (owner: 10Dr0ptp4kt) [12:44:12] (03CR) 10Mark Bergsma: [C: 032] Change ulsfo upload-lb IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92533 (owner: 10Mark Bergsma) [12:49:56] (03PS1) 10Mark Bergsma: Add ulsfo text-lb & login-lb A/AAAA records [operations/dns] - 10https://gerrit.wikimedia.org/r/92869 [12:50:41] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo text-lb & login-lb A/AAAA records [operations/dns] - 10https://gerrit.wikimedia.org/r/92869 (owner: 10Mark Bergsma) [12:52:05] (03CR) 10Mark Bergsma: [C: 032] Repartition ulsfo LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92342 (owner: 10Mark Bergsma) [12:53:04] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [12:53:04] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [12:56:04] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:56:04] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:59:44] !log added python-pbr backported from saucy to apt.wikimedia.org [13:00:02] Logged the message, Master [13:14:30] PROBLEM - Host cp4009 is DOWN: PING CRITICAL - Packet loss = 100% [13:14:30] PROBLEM - Host cp4008 is DOWN: PING CRITICAL - Packet loss = 100% [13:16:00] RECOVERY - Host cp4008 is UP: PING OK - Packet loss = 0%, RTA = 75.17 ms [13:16:10] RECOVERY - HTTPS on cp4008 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:16:10] RECOVERY - Host cp4009 is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [13:16:30] RECOVERY - HTTPS on cp4009 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:16:33] i'm rebooting those [13:18:10] PROBLEM - Host cp4010 is DOWN: PING CRITICAL - Packet loss = 100% [13:18:10] PROBLEM - Varnish HTTP text-backend on cp4009 is CRITICAL: Connection refused [13:19:20] RECOVERY - Host cp4010 is UP: PING OK - Packet loss = 0%, RTA = 75.01 ms [13:20:00] RECOVERY - HTTPS on cp4010 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:20:50] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [13:21:50] RECOVERY - HTTPS on cp4016 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:22:00] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 74.99 ms [13:29:20] PROBLEM - Host cp4017 is DOWN: PING CRITICAL - Packet loss = 100% [13:30:00] RECOVERY - HTTPS on cp4017 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:30:10] RECOVERY - Host cp4017 is UP: PING OK - Packet loss = 0%, RTA = 75.01 ms [13:30:20] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [13:30:40] PROBLEM - NTP on cp4009 is CRITICAL: NTP CRITICAL: Offset unknown [13:31:20] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 75.06 ms [13:31:30] RECOVERY - HTTPS on cp4018 is OK: OK - Certificate will expire on 01/20/2016 12:00. [13:31:45] (03PS1) 10Cmjohnson: Removing decom'd server bayes from netboot.cfg, and download/exports file [operations/puppet] - 10https://gerrit.wikimedia.org/r/92870 [13:33:08] (03PS1) 10Cmjohnson: Removing dns entries for bayes [operations/dns] - 10https://gerrit.wikimedia.org/r/92871 [13:35:40] RECOVERY - NTP on cp4009 is OK: NTP OK: Offset 0.001048564911 secs [13:35:51] (03CR) 10Cmjohnson: [C: 032] Removing decom'd server bayes from netboot.cfg, and download/exports file [operations/puppet] - 10https://gerrit.wikimedia.org/r/92870 (owner: 10Cmjohnson) [13:38:39] (03CR) 10Cmjohnson: [C: 032] Removing dns entries for bayes [operations/dns] - 10https://gerrit.wikimedia.org/r/92871 (owner: 10Cmjohnson) [13:39:08] !log dns update [13:39:24] Logged the message, Master [13:44:10] PROBLEM - NTP on cp4017 is CRITICAL: NTP CRITICAL: Offset unknown [13:45:00] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [13:45:10] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [13:48:10] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:48:10] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:49:10] RECOVERY - NTP on cp4017 is OK: NTP OK: Offset -0.0004098415375 secs [13:49:57] (03PS1) 10Mark Bergsma: Add ulsfo text caches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92872 [13:50:51] (03CR) 10Mark Bergsma: [C: 032] Add ulsfo text caches [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92872 (owner: 10Mark Bergsma) [13:51:51] !log mark synchronized wmf-config/squid.php 'Update cache list with ulsfo text caches' [13:52:07] Logged the message, Master [13:52:40] (03CR) 10Andrew Bogott: "Haven't read closely yet, but a few general thoughts:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 (owner: 10Akosiaris) [13:55:10] RECOVERY - Varnish HTTP text-backend on cp4009 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.150 second response time [14:03:23] is there anyone arround who loves Ganglia and can help me work through an issue I'm having with python script we've got collecting data from Elasticsearch? [14:10:07] manybubbles: hello :) [14:10:11] world [14:10:30] hashar: how is your afternoon going? [14:10:41] feeling useless [14:10:48] that isn't good [14:11:05] but got two new python packages included on apt.wm.o thx to akosiaris :-D [14:11:37] manybubbles: what is wrong with your script ? do you get a labs instance to test it out ? [14:12:18] hashar: it looks like it works on and off. mostly off in production [14:12:24] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=testsearch1001.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [14:12:32] everything is flat almost forever [14:13:41] hashar: it stops being flat when I run gmond in the foreground [14:13:45] then I get real updates [14:13:56] but I'm not sure that is 100% always happening either [14:15:22] * jeremyb echoes the labs question [14:15:26] manybubbles: sounds to me the plugin is working properly [14:15:31] but it is not pooling fast enough [14:15:44] hashar: that'll leave things flat? [14:15:53] looking at the graph [14:16:14] There was an error collecting ganglia data (127.0.0.1:8654): XML error: Invalid document end at 1 [14:16:15] bah [14:16:30] eh? [14:17:02] * hashar applies for a real estate job [14:17:13] see house, show it, sell, rinse. Much simpler [14:17:23] yeah ... right [14:17:25] looks like ganglia is dead hehe [14:17:28] you forgot repeat? [14:17:51] jeremyb: yeah rinse & repeat obviously [14:17:58] is there any repeat in France? Cause here we got 0 repeat rate [14:18:16] now I am confused [14:18:29] do you mean whether we have the equivalent of "to repeat" in french ? [14:19:19] no I mean if real estate brokers in France can "repeat" the process of selling a house [14:19:40] in Greece they almost certainly cant. No buyers in the market [14:19:55] (03PS1) 10Mark Bergsma: Add monitoring for the ulsfo text-lb LVS service [operations/puppet] - 10https://gerrit.wikimedia.org/r/92874 [14:20:44] ohh [14:20:56] the french market is a bit stalled right now [14:21:15] if I wanted a house, there is are thousands of them waiting around me [14:21:41] but at 400k € entry price, I am not sure who can buy them beside folks already having sold their previous house [14:22:00] I am pretty sure the prices will eventually drop next spring when people realize the price asked is simply too hard [14:22:16] also most real estate also are in renting, and that is a sustained flow of money coming in [14:22:29] this is not a stalled market. That is an overpriced market. 
[14:22:29] you usually pay 1 month of renting price to the real estate agent to cover up the paper work etc [14:22:31] no houses below 400k? that's fucked up ;) [14:22:43] houses start around 100k here [14:22:56] Sounds like Spain and their real estate market. [14:22:56] ganglia back up [14:23:05] there are house at 250-300k mark. They are just not feating my needs [14:23:13] that's different [14:23:19] like, I need a swimming pool. [14:23:26] sounds like your spoiled ;) [14:23:30] you're [14:23:38] manybubbles: ganglia back up :-] [14:23:38] ahahaha [14:23:55] (03CR) 10Mark Bergsma: [C: 032] Add monitoring for the ulsfo text-lb LVS service [operations/puppet] - 10https://gerrit.wikimedia.org/r/92874 (owner: 10Mark Bergsma) [14:24:17] akosiaris: most of western Europe has/had a real estate bubble beside Germany [14:24:35] in germany, their system is really specific, houses belonging mostly to mutual funds or something like that [14:24:58] hashar: cool. good luck with the pool, btw [14:25:01] plus they have a low demographic, and a house in eastern germany is like 1k € , that introduce a bias in the germany average house price [14:25:04] hashar: feeding* [14:25:17] manybubbles: my landlord is washing it for me :-] [14:25:30] hashar: I can see how you wouldn't want to move [14:25:37] though, isn't it cold now? [14:25:43] hashar: or meeting* really [14:26:31] so to close the topic of real estate, it is a matter of math. My flat is 750€/month, buying it is 250k€ + 1500€ tax per year minimal. It is not worth it [14:27:03] manybubbles: I am wondering how often ganglia polls on your host. [14:27:32] manybubbles: and possibly, the ganglia RRD is configured to repeat the last known value if it does not receive anything. [14:27:57] hashar: let me keep digging [14:27:59] wat ? [14:28:06] it should not do that [14:28:20] RRD have n/a for a reason [14:28:36] that is mrtg talk :P [14:28:46] manybubbles: the es_merges_current_size metrics seems to vary nicely though [14:29:15] manybubbles: the heap size, I guess it stays at some maximal once it has been filled [14:29:24] hashar: not this one, not [14:29:29] *no*, I mean [14:29:44] the heap size is the current heap utilization it should constantly go up and down [14:29:50] almost too much to be useful, actually [14:30:06] akosiaris: maybe that is the graph rendering so. I am pretty sure you can instruct the graph to reuse the last known values when a data point is N/A. [14:33:00] hashar: well if you don't use the standard graphing libraries of rrdtool you could, it just doesn't make much sense. [14:33:06] (03PS1) 10Mark Bergsma: Send OC text traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/92877 [14:34:25] mark: bblack: there's no current way to send only part of a country to a DC? I'm thinking at the moment about e.g. part of North America to ulsfo [14:34:35] (03PS2) 10Mark Bergsma: Send OC text traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/92877 [14:34:43] there is [14:37:59] i just tried an IP that maxmind used to (~5 years ago) say was in the middle of the country (kansas? STL?) 
and now it's correct (NYC) at https://www.maxmind.com/en/geoip_demo [14:40:25] manybubbles: and I guess the es_http_current_open should surely vary as well :/ [14:42:17] hashar: yeah [14:43:26] hashar: I just restarted gmond and that caused everything to update [14:49:51] (03CR) 10Mark Bergsma: [C: 032] Send OC text traffic to ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/92877 (owner: 10Mark Bergsma) [14:52:16] !log Sending text (all wikis) traffic from OC to ulsfo [14:52:34] Logged the message, Master [14:57:12] w00t! [14:59:56] manybubbles: ohhhhh [15:00:10] manybubbles: have a look at syslog messages maybe ? [15:04:11] mark: would restarting varnish clear out all cached content? (in beta, i could use a clear out of the bits cache) [15:04:24] on bits yes [15:04:26] most others not [15:04:34] the backends use persistent storage [15:04:39] frontends and bits don't [15:05:07] thx [15:09:19] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:10:09] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 1 logical drive(s), 4 physical drive(s) [15:28:55] (03PS1) 10Odder: (bug 56398) Update logo for wikimania2014wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92882 [15:52:17] <^d> apergos: That's one way to ping me :p [15:52:28] well you weren't on irc earlier [15:52:38] so I couldn't really do much about it except stare in horror :-D [15:53:24] so the story is, this morning arsenic started whining, it was obvious from the graphs that it was swapdeath, [15:53:44] <^d> Yeah I left a bunch of scripts running overnight. [15:53:53] I tried sigstopping some processes (some had already oom but the remaining ones just gobble up all the memory over time) [15:54:09] shot one or two (I have a record if you care) in hopes that would be enough [15:54:18] but it wasn't, there was another oom later of the biggest hog [15:54:19] <^d> I've got records of everything, no worries. [15:54:24] and now you are up to date. [15:54:24] <^d> I can kick things off from where the left off. [15:54:30] ok. [15:54:38] <^d> *they, even [15:54:43] I would really only run 2 and see how those to [15:54:44] do [15:54:53] <^d> Well they ran fine when I had ~8 running. [15:55:09] well you want to check in an hour and see memory consumption [15:55:13] <^d> And then we enabled HT, so I was able to run around 15-16 and be on the edge of swapping yesterday before I went to bed. [15:55:14] <^d> :) [15:55:16] because it increases and at some point [15:55:26] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Miscellaneous+eqiad&h=arsenic.eqiad.wmnet&tab=m&vn=&mc=2&z=small&metric_group=ALLGROUPS [15:55:27] that. [15:55:40] <^d> Yeah I know. I've been watching the same graph. [15:56:23] <^d> Are you attached to my screen with pid 604? [15:56:27] no [15:56:32] I am not attached to any screens [15:56:38] <^d> Hmm, silly arsenic :) [15:56:41] :-D [15:57:30] so I would recommend then that you either cont those processes that are left or shoot em [15:57:46] no point in leaving them around, they hold memory (or swap) [15:58:03] <^d> Yeah I'm going to in a second. 
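Background for the flatlining Elasticsearch graphs discussed above: gmond loads Python metric modules that expose metric_init()/metric_cleanup() plus a per-metric callback which the daemon polls on its own schedule, which is why a module can look fine when gmond is run in the foreground yet keep reporting stale values under the long-running daemon. A bare-bones sketch of that interface follows (written for Python 2, which is what gmond embedded at the time); the stats URL, JSON path and metric name are assumptions chosen for illustration, not the module that was actually misbehaving on testsearch1001:

    # Minimal gmond Python metric module skeleton using the
    # metric_init / call_back protocol gmond expects.
    import json
    import urllib2

    STATS_URL = 'http://localhost:9200/_nodes/_local/stats'  # assumed endpoint

    def fetch_open_http_connections(name):
        """Callback gmond invokes each polling interval; must return fast."""
        try:
            stats = json.load(urllib2.urlopen(STATS_URL, timeout=2))
            node = list(stats['nodes'].values())[0]
            return int(node['http']['current_open'])
        except Exception:
            return 0  # report something sane rather than hanging gmond

    def metric_init(params):
        return [{
            'name': 'es_http_current_open',
            'call_back': fetch_open_http_connections,
            'time_max': 60,
            'value_type': 'uint',
            'units': 'connections',
            'slope': 'both',
            'format': '%u',
            'description': 'Open HTTP connections reported by Elasticsearch',
            'groups': 'elasticsearch',
        }]

    def metric_cleanup():
        pass

    if __name__ == '__main__':
        # Allows testing outside gmond, much like debugging in the foreground.
        for metric in metric_init({}):
            print('%s = %s' % (metric['name'], metric['call_back'](metric['name'])))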
[15:58:05] k [15:58:11] good luck [16:11:11] (03CR) 10Akosiaris: "Before anything else, I would like to point out that this modularization was mostly done to help me move forward faster with the multiple " [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 (owner: 10Akosiaris) [16:19:34] (03CR) 10Andrew Bogott: "Labs instances do not import puppet::self::master directly, rather they import puppetmaster::self. I'm going to see if I can safely modif" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 (owner: 10Akosiaris) [16:23:28] (03PS1) 10Mark Bergsma: Remove the payments LVS service in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/92890 [16:29:07] !log reedy synchronized php-1.23wmf2 'Staging php-1.23wmf2' [16:29:25] Logged the message, Master [16:30:18] !log reedy synchronized docroot and w [16:30:35] Logged the message, Master [16:34:29] (03PS1) 10Mark Bergsma: Add donate-lb.ulsfo.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/92894 [16:35:24] (03CR) 10Mark Bergsma: [C: 032] Add donate-lb.ulsfo.wikimedia.org [operations/dns] - 10https://gerrit.wikimedia.org/r/92894 (owner: 10Mark Bergsma) [16:36:33] (03CR) 10Jgreen: [C: 031] Remove the payments LVS service in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/92890 (owner: 10Mark Bergsma) [16:36:56] (03CR) 10Mark Bergsma: [C: 032] Remove the payments LVS service in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/92890 (owner: 10Mark Bergsma) [16:37:36] (03CR) 10Akosiaris: "Great. Migrating the VMs to using role::puppet::self means we can finally drop the puppetmaster::self class and then maybe move puppet::se" [operations/puppet] - 10https://gerrit.wikimedia.org/r/91353 (owner: 10Akosiaris) [16:40:31] (03PS2) 10BBlack: Fixed incorrect check against empty string in Varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/91813 (owner: 10Yurik) [16:41:44] (03CR) 10BBlack: [C: 032] Fixed incorrect check against empty string in Varnish [operations/puppet] - 10https://gerrit.wikimedia.org/r/91813 (owner: 10Yurik) [16:45:41] (03PS1) 10Mark Bergsma: Remove ulsfo per-project text LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92896 [16:46:36] (03PS2) 10Mark Bergsma: Remove ulsfo per-project text LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92896 [16:47:29] (03PS2) 10BBlack: Optimized the number of req.http.host checks performed [operations/puppet] - 10https://gerrit.wikimedia.org/r/91569 (owner: 10Yurik) [16:47:50] (03PS1) 10Reedy: Add version specific extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92897 [16:48:00] (03PS7) 10MarkTraceur: Add three Multimedia extensions to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 [16:48:16] (03CR) 10Mark Bergsma: [C: 032] Remove ulsfo per-project text LVS service IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/92896 (owner: 10Mark Bergsma) [16:49:14] (03CR) 10Reedy: [C: 032] Add version specific extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92897 (owner: 10Reedy) [16:49:24] (03CR) 10BBlack: [C: 032] Optimized the number of req.http.host checks performed [operations/puppet] - 10https://gerrit.wikimedia.org/r/91569 (owner: 10Yurik) [16:49:26] (03Merged) 10jenkins-bot: Add version specific extension-list [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92897 (owner: 10Reedy) [16:50:00] bblack: how do you decide what to optimize? do you profile varnish? 
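The change reviewed above ("Optimized the number of req.http.host checks performed") is about doing cheap host comparisons first so that most requests never reach the more expensive regex logic. The same pattern, sketched outside VCL with made-up hostnames rather than the actual Varnish rules:

    # Generic illustration of the ordering optimization: cheap exact-match
    # checks first, regexes only for the leftovers. Hostnames and patterns
    # are invented; this is not the VCL from the change under review.
    import re

    EXACT_MOBILE_HOSTS = frozenset([
        'en.m.wikipedia.org',
        'de.m.wikipedia.org',
    ])
    MOBILE_HOST_RE = re.compile(r'^([a-z0-9-]+)\.m\.wikipedia\.org$')

    def is_mobile_host(host):
        # Fast path: a set lookup covers the high-volume hosts.
        if host in EXACT_MOBILE_HOSTS:
            return True
        # Slow path: only the remaining minority of requests pay for the regex.
        return MOBILE_HOST_RE.match(host) is not None

    if __name__ == '__main__':
        for h in ('en.m.wikipedia.org', 'fr.m.wikipedia.org', 'en.wikipedia.org'):
            print('%s -> %s' % (h, is_mobile_host(h)))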
[16:50:45] (03PS1) 10Reedy: Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92898 [16:50:50] jeremyb: I'm just reviewing, it's not my optimization, it's Yuri's :) [16:51:07] (03CR) 10Reedy: [C: 032] Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92898 (owner: 10Reedy) [16:51:13] but it seems basically sane to do simple equality checks which might be high-volume first, and let them exclude more-complicated regex substitutions [16:51:16] (03Merged) 10jenkins-bot: Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92898 (owner: 10Reedy) [16:52:02] bblack: sure. i was just wondering if there was an easy way to find stuff that would make a big difference [16:52:09] (03PS8) 10Reedy: Add three Multimedia extensions to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 (owner: 10MarkTraceur) [16:52:24] (03CR) 10Reedy: [C: 032] Add three Multimedia extensions to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 (owner: 10MarkTraceur) [16:52:27] <3 [16:52:27] !log reedy synchronized wmf-config/ [16:52:36] (03Merged) 10jenkins-bot: Add three Multimedia extensions to config [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92793 (owner: 10MarkTraceur) [16:52:38] jeremyb: without access to a replayable set of representative production traffic, it's all guesswork :) [16:52:41] Logged the message, Master [16:58:56] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [17:05:50] (03CR) 10Manybubbles: [C: 031] "Where do these numbers come from? I'm all for raising the numbers but do we have a way of measuring where we are now, how long people are" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92808 (owner: 10Chad) [17:19:15] (03PS1) 10Mark Bergsma: donate.* needs MX records as well [operations/dns] - 10https://gerrit.wikimedia.org/r/92910 [17:19:46] (03CR) 10Mark Bergsma: [C: 032] donate.* needs MX records as well [operations/dns] - 10https://gerrit.wikimedia.org/r/92910 (owner: 10Mark Bergsma) [17:22:25] Reedy: we're getting some more fatals now: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception>ype=stack&glegend=show&aggregate=1&embed=1 [17:22:38] http://ur1.ca/fyspx [17:23:28] we know [17:23:33] see #wikimedia-tech [17:25:47] ah, sorry [17:26:41] trying to figure out the cause [17:26:47] or how to fix [17:27:02] and why it's appearing just now [17:30:09] (03CR) 10Chad: [C: 04-1] "Basically I made them up based on what we already do. It's far too high for the number of servers we have in the ES cluster." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92808 (owner: 10Chad) [17:30:17] (03PS1) 10Dzahn: add stat1 and bast4001 to misc servers groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/92912 [17:45:35] (03PS3) 10Dr0ptp4kt: Further constrain W0 X-CS setting to mobile Wikipedia, for now. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 [17:47:05] (03CR) 10Dr0ptp4kt: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/92818 (owner: 10Dr0ptp4kt) [17:54:13] I'm pretty sure this isn't just wikidata [17:54:22] Cirrus does seem to be brining it up too [17:54:34] test.wikidata [17:54:56] a search on cawiki did it [17:55:13] so did a search on enwikisource [17:55:19] so it seems something related to cirrus somehow [17:59:19] !log demon synchronized wmf-config/ 'Cluster to known good state' [17:59:32] Logged the message, Master [18:00:52] I can still cause the problem, at least [18:01:03] by typing in the search box on a cirrus wiki [18:01:12] which ones? [18:01:17] which have it default? [18:01:20] mediawiki.org? [18:01:53] !log demon synchronized w/ 'Cluster to known good state' [18:02:08] Logged the message, Master [18:02:16] aude: I'm using cawiki [18:02:24] but mediawiki.org should do [18:02:37] (03CR) 10Dzahn: [C: 032] add stat1 and bast4001 to misc servers groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/92912 (owner: 10Dzahn) [18:03:16] ok [18:05:52] greg-g: are you sure you want to do LDs on a slots schedule? I think it would have been very safe if we simply allowed "ok, one is done, mandatory 5 min fatalmonitoring time" [18:06:17] the only reason i even noticed yesterdays issue is because i kept fatalmonitor open after my deploy [18:06:48] and about 5-10 min later i noticed that fatal errors was climbing [18:07:19] (could have looked earlier but haven't trained my eyes on checking "error" vs "warning" thing [18:07:26] (03PS1) 10Cmjohnson: Removing mgmt dns entries for decom'd servers in sdtpa alsted, amaranth, gilman, durant [operations/dns] - 10https://gerrit.wikimedia.org/r/92917 [18:09:19] also, is there a problem with our blog? http://blog.wikimedia.org/2013/10/24/airtel-wikipedia-zero-text-trial/ <-- there are 3 pages of comments, but i can't see any of them [18:10:11] (03CR) 10Cmjohnson: [C: 032] Removing mgmt dns entries for decom'd servers in sdtpa alsted, amaranth, gilman, durant [operations/dns] - 10https://gerrit.wikimedia.org/r/92917 (owner: 10Cmjohnson) [18:11:14] !log dns update [18:11:29] Logged the message, Master [18:14:16] so why is only Cirrus freaking out about this? do other extensions use the pool counter differently? [18:14:28] can we just try syncing PoolCounter client again? [18:14:39] maybe something went bad [18:15:28] it *looks* like it is there on the servers I'm spot checking [18:15:43] yurik, they're only trackbacks - you don't see them because they're not approved (and will not be) [18:15:52] maybe corrupt on one [18:16:02] !log hashar synchronized php-1.23wmf1/extensions/PoolCounter/ 'making sure all apaches got all the files.' [18:16:16] Logged the message, Master [18:16:25] !log hashar synchronized php-1.22wmf22/extensions/PoolCounter/ 'making sure all apaches got all the files.' [18:16:41] Logged the message, Master [18:16:50] MaxSem: are you saying that out of 3 pages of comments there is not a single "valid" one? [18:17:02] yes! [18:17:05] ouch [18:17:11] what has web come to! [18:17:27] at least we've removed trackbacks from MW [18:17:38] a tiny bit of sanity... 
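For context on the Cirrus fatals being debugged here: PoolCounter is MediaWiki's mechanism for capping how many expensive operations of one kind (article parses, search queries) run at once, queueing or rejecting the overflow so a slow backend cannot tie up every Apache worker. The toy, single-process sketch below only illustrates that idea; the real thing is a shared network daemon with per-pool worker and queue limits, and the pool name and numbers here are invented for the example:

    # Toy illustration of the pool-counter idea: cap concurrent work per
    # pool key and reject the overflow instead of letting slow operations
    # pile up. In-memory and single-process only.
    import threading
    from contextlib import contextmanager

    class PoolFullError(Exception):
        pass

    class ToyPoolCounter(object):
        def __init__(self, workers):
            self._lock = threading.Lock()
            self._workers = workers
            self._active = {}

        @contextmanager
        def acquire(self, pool_key):
            with self._lock:
                if self._active.get(pool_key, 0) >= self._workers:
                    raise PoolFullError('pool %s is full' % pool_key)
                self._active[pool_key] = self._active.get(pool_key, 0) + 1
            try:
                yield
            finally:
                with self._lock:
                    self._active[pool_key] -= 1

    if __name__ == '__main__':
        counter = ToyPoolCounter(workers=2)  # invented limit
        with counter.acquire('CirrusSearch-Search'):
            pass  # the expensive search would run here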
[18:17:44] aude: still broken [18:17:57] hmmmm [18:18:18] ^d: I suppose the difference between the way cirrus uses the pool counter and the others do is that cirrus declares the config in its own file rather than in that pool counter config file [18:18:26] maybe that file isn't getting included for some reason [18:18:54] manybubbles: huh [18:18:59] interesting theory [18:19:18] I mean, we're sure it is doing it "wrong" now. [18:19:25] I wonder what requesttracker will do if i replace a queue status options and there are tickets in now no longer available status... [18:19:44] manybubbles: chad is agreeing with you verbally right now [18:20:32] greg-g: cool. so my guess is we should move the cirrus pool counter stuff in with the rest which should put out the fire [18:20:34] !log hashar synchronized wmf-config/InitialiseSettings.php 'touch' [18:20:45] and then figure out what is going on with that file not getting included [18:20:51] Logged the message, Master [18:21:19] manybubbles: that being the patch that chad has but not merged yet? [18:21:43] greg-g: yeah [18:22:07] so folks were trying the patch? [18:22:08] he thinks the limit are too high but I'm not as concerned about that at this point [18:22:23] (03PS4) 10Chad: Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92808 [18:22:26] manybubbles: right, so he's going to merge that and deploy it [18:22:28] incoming! [18:22:45] cool. that should be fine [18:22:46] (03CR) 10Chad: [C: 032 V: 032] Move Cirrus poolcounter settings to where they belong, update values to reflect production more [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92808 (owner: 10Chad) [18:22:49] and make the bleeding stop [18:22:58] just bleeping stop all the bleeping! [18:23:59] !log demon synchronized wmf-config/ 'Proper poolcounter config for Cirrus' [18:24:00] gah [18:24:06] ok, not synced yet [18:24:17] Logged the message, Master [18:24:18] oh [18:24:21] different error! [18:24:22] ahhh [18:24:32] HP fatal error in /usr/local/apache/common-local/wmf-config/CommonSettings.php line 1862: [18:24:32] I don't get the error when searching any more [18:24:35] bad [18:24:40] (03PS1) 10Cmjohnson: Removing unused servers from netboot.cfg and dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/92919 [18:24:41] require_once() [<a href='function.require'>function.require</a>]: Failed opening required '/usr/local/apache/common-local/php-1.23wmf1/extensions/BetaFeatures/BetaFeatures.php' (include_path='/usr/local/apache/common-local/php-1.23wmf1/extensions/TimedMediaHandler/handlers/OggHandler/PEAR/File_Ogg:/usr/local/apache/common-local/php-1.23wmf1:/usr/local/lib/php:/usr/share/php') [18:24:41] so pool counter get initialized by cirrus search before the pool counter get included ? [18:24:52] ^d: [18:25:01] that's on testwikidata [18:25:11] so actually, .... [18:25:12] There shouldn't be BetaFeatures in wmf1 [18:25:14] hashar: we're not sure at all :( [18:25:17] You can't pin this on me [18:25:27] it's not switched to wmf2, but is set to have it in initialise settings [18:25:34] manybubbles: where should it be? 
[18:25:37] er marktraceur [18:25:38] so mediawiki.org now has a bunch of ftals [18:25:39] not manybubbles [18:25:42] greg-g: wmf2 [18:25:48] oh rightg [18:25:50] -g [18:25:53] greg-g: grr [18:25:54] Failed opening required '/usr/local/apache/common-local/php-1.23wmf1/extensions/BetaFeatures/BetaFeatures.php' [18:25:56] greg-g -g [18:26:02] I guess beta features is not available in 1.22wmf22 [18:26:15] brb [18:26:21] they are only in 1.23wmf2 [18:26:25] Yeah [18:26:28] which is not enabled anywhere yet [18:26:57] * marktraceur shouldn't have written the config patch yet maybe? [18:27:07] I blame Greg for everything [18:27:17] marktraceur: reasonable here [18:27:40] i assume someone is fixing the enabling of beta features etc [18:27:49] i would set it false everywhere for now [18:28:24] Yeah, roll back the config change maybe [18:28:26] oh hey, Ryan_Lane's here, we can just blame him, I hear that's common. [18:28:44] (03PS1) 10Hashar: mediawikiwiki: wmgUseBetaFeatures => false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92920 [18:28:46] Then we can move forward [18:28:49] gonna disable betaFeatures for mediawiki [18:28:52] if someone can confirm [18:29:00] hashar: disable it everywhere and also [18:29:03] multimediaviewer [18:29:07] <^d> I'm doing it. [18:29:09] and the third thing [18:29:11] ok [18:29:16] ^d: ok chad [18:29:24] (03Abandoned) 10Hashar: mediawikiwiki: wmgUseBetaFeatures => false [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92920 (owner: 10Hashar) [18:29:39] we really need to get some integration tests for wmf-config :( [18:29:47] Indeed [18:29:49] I tried once last year, the code is not easily testable [18:29:50] third thing is commons metadata [18:29:56] !log demon synchronized wmf-config/ [18:29:58] Yeah [18:30:08] they should all be false everywhere [18:30:09] Logged the message, Master [18:30:11] \O/ [18:30:22] ok, better [18:30:25] ^d: and now I feel useless. congratulations [18:30:40] greg-g: heh [18:30:49] what am I being blamed for? [18:31:17] !log removing decommissioned servers alsted, amaranth, durant, spence, williams from dsh group misc-servers on tin [18:31:27] now https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=MediaWiki+errors&vl=errors+%2F+sec&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=fatal|exception>ype=stack&glegend=show&aggregate=1&embed=1 [18:31:31] Logged the message, Master [18:31:31] is crazy [18:31:54] Ryan_Lane: site outage ;) [18:31:56] to see the cirrus fatals in context helps though [18:31:58] ah [18:32:03] greg-g: i have texted sam to keep him informed. He should be around soon [18:32:31] hm. I didn't get any pages [18:33:07] Ryan_Lane: it is mw related :-D merely a bunch of fatals [18:33:08] Ryan_Lane: mediawiki fatals [18:33:15] ah [18:33:24] * yurik wasn't here [18:33:48] well I am off for real now [18:33:52] ^d: thank you :-] [18:33:57] hashar: thank you! [18:34:23] * yurik is rubbernecking fatalmonitor [18:35:04] hard to tell but looks like maybe they are gone? (hopeful) [18:35:13] they are [18:35:18] just the 180s timeout now [18:35:20] which is usual [18:35:29] * hashar waves [18:36:35] hmmm [18:37:10] did disabling those extensions fix it? [18:37:23] (03PS1) 10Andrew Bogott: Remove a bunch of dangling commas in param lists. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92921 [18:37:30] (03CR) 10jenkins-bot: [V: 04-1] Remove a bunch of dangling commas in param lists. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/92921 (owner: 10Andrew Bogott) [18:37:42] aude: that and redoing how cirrus does its poolcounter config [18:37:48] ok [18:37:55] makes sense [18:38:02] aude: https://gerrit.wikimedia.org/r/92808 [18:38:09] yeah [18:38:23] although strange that the old poolcounter settings worked before [18:38:52] unless we were hitting a limit, but can't see hwo that is related to not being able to find the pool counter client class [18:41:27] aude: I'm really not sure what is up with that. I mean, we're doing it better now, but I'm worried that that means that sometimes the pool counter is simply turned off for some hosts. [18:41:32] ^d: does ^ make sense? [18:42:05] the extension turned off? [18:42:08] odd [18:43:02] silly robla and his flood [18:43:11] (03PS2) 10Andrew Bogott: Remove a bunch of dangling commas in param lists. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92921 [18:43:20] aude: anyway, that error we were seeing looked like the extension was turned off [18:43:29] right [18:43:44] i don't understand how that can be [18:44:19] Maybe Reedy can figure it out when he gets back. better him than me, I suppose. [18:44:26] probably [18:44:44] as long as it's okay now, then not as much hurry [18:45:08] (03CR) 10Andrew Bogott: [C: 032] Remove a bunch of dangling commas in param lists. [operations/puppet] - 10https://gerrit.wikimedia.org/r/92921 (owner: 10Andrew Bogott) [18:48:40] <^d> manybubbles, aude: I'm going to have lunch and think. [18:48:51] ^d: sounds good [18:48:55] we're safe [18:48:57] ok [18:49:00] no hurry [18:49:36] (03CR) 10Cmjohnson: [C: 032] Removing unused servers from netboot.cfg and dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/92919 (owner: 10Cmjohnson) [18:52:30] (03PS1) 10MarkTraceur: Enable three multimedia extensions on beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92922 [18:52:39] greg-g, ^d, etc. ^^ [18:52:50] hey manybubbles, I wanted to check about the progress of elastic - can you say now that we're 100% sticking with it?:P [18:53:02] marktraceur: ^d went out for a smoke... [18:53:05] MaxSem: yes [18:53:13] wee:) [18:53:14] greg-g: lunch, he claimed [18:53:29] or something [18:53:40] I should do that too [18:53:44] Bollocks [18:53:47] alright, we're holding off on deploys until after lunch [18:53:58] * valhallasw prepares stroopwafels and coffee for when people get back from lunch [18:54:11] * marktraceur goes to get japacurry then [18:54:42] marktraceur: btw, not your fault for not having it on beta before hand, this was an extremely abbreviated push out (ie: final code reviews the day before) thus not allowing time for time on beta. Really, that's a product timeline mistake [18:54:51] and... he just left [18:55:04] stuff happens [18:55:21] wikidata community is patient and this is the risk of trying new stuff like elastic [18:57:57] greg-g: Multimedia will be having a retrospective as part of next week's team meeting if you'd like to attend :) [19:10:01] (03PS2) 10Ori.livneh: Remove references to 'olivneh' account from node defs [operations/puppet] - 10https://gerrit.wikimedia.org/r/92267 [19:14:32] bd808: sure thing :) [19:14:45] bd808: I'm 60% self blame on this one [19:15:02] Blame is lame. Learning is awesome [19:15:20] cause knowledge is power [19:15:35] Yo Joe! [19:15:50] knowledge of blaming: precious [19:17:03] bd808: fine, I'm doing 60% of the learning [19:17:06] ;) [19:21:37] greg-g, are the production problems from earlier resolved? 
[19:22:03] superm401: yes [19:27:24] superm401: though we are where we were at 8am thismorning (ie: wmf2 hasn't been deployed yet) [19:28:19] greg-g, thanks, noted. Wasn't sure if https://www.mediawiki.org/wiki/MediaWiki_1.23/Roadmap was up to date and didn't check on the cluster yet. [19:29:08] superm401: looks up-to-date [19:33:53] greg-g: That patch still not out, though [19:34:10] marktraceur: the betalabs one? [19:34:13] Or merged [19:34:14] Yeah [19:34:26] merge it and it should sync automatically [19:34:31] * marktraceur does [19:34:41] Self merging skeeves me out [19:34:44] yeah... [19:34:47] get someone to do it [19:34:53] at least look at it [19:34:54] That's why I pinged you! [19:34:56] Sync automatically? [19:34:57] fine! [19:35:02] superm401: betalabs [19:35:20] pulls from master every 3 minutes or so [19:35:37] Ah, sorry, I misread as/imagined BetaFeatures getting deployed to prod as soon as the submodule was merged. [19:35:55] superm401: Continuous deploy, man [19:36:04] #nevergonnahappen [19:36:06] we're already there already [19:36:06] marktraceur: if only [19:36:33] greg-g: Well. Three-minute deploy cycle on beta. But one week on prod. [19:36:41] I was joking [19:37:51] greg-g: So the plan is to test on beta, then do the config patch for prod, and send it out once Reedy is in a stable wifi orbit? [19:38:09] marktraceur: yessir [19:38:13] 'kay [19:38:31] I may as well write the second patch now, then, while I'm waitin' [19:38:54] sure sure [19:40:18] Oh, huh [19:40:28] The patch for the three extensions isn't reverted [19:40:34] So...no problem then [19:41:16] marktraceur: they were disabled everywhere [19:41:20] the config is still there [19:41:30] Awesome [19:42:02] i don't think that config change is on gerrit yet [19:42:38] $wmgUseBetaFeatures = false for everywhere [19:43:27] ...well that's pretty bloody silly, but OK [19:44:09] yeah [19:46:07] greg-g: Wait, will the extension code be automatically pushed out there? [19:53:54] (03PS1) 10Dzahn: make Special:EntityData redirects for wikidata 303 ..instead of 302 [operations/apache-config] - 10https://gerrit.wikimedia.org/r/92925 [19:54:09] marktraceur: i'm..... not 100% sure [19:57:38] (03PS1) 10Cmjohnson: Removing several decom'd servers from dsh group pmtpa and ALL [operations/puppet] - 10https://gerrit.wikimedia.org/r/92926 [20:00:21] (03CR) 10Cmjohnson: [C: 032] Removing several decom'd servers from dsh group pmtpa and ALL [operations/puppet] - 10https://gerrit.wikimedia.org/r/92926 (owner: 10Cmjohnson) [20:01:03] :/ [20:01:17] greg-g: I guess we can wing it and fix as needed [20:01:59] Jeff_Green: expected behaviour that "indium" can be pinged from iron but "lutetium" can't? [20:02:07] both frack [20:02:44] * jeremyb gets out his periodic table [20:02:49] mutante: it's not entirely surprising, but I'm not sure where it;s blocked offhand [20:03:40] Jeff_Green: know off-hand if one of them is decom or spare? [20:03:55] re: RT-6123 latest comment [20:03:55] they are both active frack servers [20:03:57] jeremyb: They're alloys, so you're screwed there [20:04:04] Or...I think they are [20:04:10] elements! [20:04:11] Jeff_Green: ok, thanks, just removing from some checklist then [20:04:31] Oh, lol, I fail at science. [20:04:43] marktraceur: tyvm [20:05:01] marktraceur: 3 points from uhh, gryffindor, err something. [20:06:07] activemq makes me want to dig a hole and throw my laptop in it. 
[20:09:38] Reasonable reaction [20:10:01] cmjohnson1: 'amaranth' says 'gone', but it's toolserver [20:10:12] i dunno if this can die [20:10:23] 152.80.208.in-addr.arpa:226 1H IN PTR amaranth.toolserver.org. [20:10:23] 152.80.208.in-addr.arpa:234 1H IN PTR web.amaranth.toolserver.org. [20:10:34] or it's just a server naming clash [20:10:52] hrm...may be a clash. [20:11:33] needs nosy to confirm or something [20:11:41] greg-g: how do you feel about another minor deploy? :) [20:11:41] i'm gonna just skip it for now [20:11:48] 31 20:11:38 -!- There is no such nick nosy [20:12:08] greg-g: https://gerrit.wikimedia.org/r/#/c/92824/ [20:12:12] jeremyb: $toolserver_admin [20:12:57] i guess we shouldt kill it just yet :) [20:13:05] greg-g: apparently we are including extra CSS & JS into too many non-zero feature phones [20:13:07] web.amaranth.toolserver.org = https://jira.toolserver.org/secure/Dashboard.jspa [20:13:13] it's their ticket system [20:13:56] web.amaranth.toolserver.org. 3386 IN A 208.80.152.234 [20:14:00] tampa.. [20:15:50] 181 ; 208.80.152.224/27 sandbox vlan [20:16:20] okay....that ip does not show up in the check [20:16:51] all traces of amaranth in our puppet cfgs are gone [20:17:12] ok, yea, because we dont puppetize the toolserver boxes [20:17:24] maybe there really was another amaranth in the past [20:17:35] it's quite possible [20:18:12] yurik: not yet :/ [20:23:08] (yes there was) [20:23:46] * marktraceur stares intently at greg-g in the digital world too [20:34:47] (03PS1) 10Dzahn: remove scandium CNAME for bast1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/92976 [20:37:51] greg-g: Can has merge? Or update? [20:38:11] mark: ah gotcha. is the sandbox vlan 208.80.152.224/27 going to die when Tampa dies / should that toolserver stuff be listed as related to shutdown? [20:38:32] marktraceur: beta's not loading for me [20:38:33] even though the servers are in esams [20:39:07] Is for me [20:39:08] ^d: yo, wanna do the wmf2 rollout in the next bit? we're blocking the betafeatures and other deploys today :) and since you sit close to me :) [20:40:36] mutante: maybe this changed in ~3-6 months but AFAIK, amaranth is their tunnel to internal WMF stuff. e.g. for mysql binlog replication [20:41:02] greg-g: http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page seems fine [20:41:09] I confirm it was, no idea now [20:42:11] mutante: so, it doesn't matter so much what the IP is but they need some way to get access to replication (possibly in eqiad) and if the method changes then coordinate with them on the switch [20:42:25] jeremyb: i see, ok. just looked at all 208.80.152.x and listing what's left [20:42:43] jeremyb: yea, that qualifies as "needs a migration ticket" then [20:42:44] ok [20:43:04] i don't expect tool-labs to have completely replaced toolserver.org by then [20:43:08] or do i [20:43:15] that's kinda irrelevant [20:43:30] the question is whether people expect the toolserver replicas to be maintianed [20:43:33] maintained* [20:43:35] nods [20:44:00] hmm that would be "should there be replicas in tool-labs" [20:44:23] there are already [20:44:30] they're not exactly the same [20:44:33] marktraceur: ah, there it is (took a while) [20:44:47] marktraceur: where be your beta config change/ [20:44:47] ? 
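The amaranth detective work above can be reproduced with two quick lookups; DNS output naturally changes over time, so treat these as the checks rather than the answers:

    dig +short -x 208.80.152.234               # reverse record; per the zone grep above, web.amaranth.toolserver.org.
    dig +short web.amaranth.toolserver.org A   # forward record, pointing back into the Tampa 208.80.152.224/27 range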
[20:44:50] but they're close enough that i think most people could use them [20:45:05] greg-g: https://gerrit.wikimedia.org/r/92922 [20:45:13] mutante: yes [20:45:18] not sure about perf (there were some changes and tuning, idk if people still complain about that) [20:45:44] issues like cross-database joins.... [20:46:55] mark: kk, creating RT [20:51:16] (03PS1) 10Chad: group0 wikis to 1.23wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92977 [20:51:48] <^d> greg-g: ^ [20:53:31] (03CR) 10Greg Grossmeier: [C: 031] group0 wikis to 1.23wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92977 (owner: 10Chad) [20:53:40] ^d: pong [20:54:09] (03CR) 10Chad: [C: 032] group0 wikis to 1.23wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92977 (owner: 10Chad) [20:54:20] (03Merged) 10jenkins-bot: group0 wikis to 1.23wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92977 (owner: 10Chad) [20:54:28] * greg-g crosses fingers [20:54:35] * greg-g does a rain dance [20:54:44] * greg-g throws salt over his right shoulder [20:55:01] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf2 [20:55:17] (03CR) 10Hashar: [C: 032] "What can possibly goes wrong beside some fatals errors in prod? Nothing :-] Lets play on beta." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92922 (owner: 10MarkTraceur) [20:55:20] Logged the message, Master [20:55:28] (03Merged) 10jenkins-bot: Enable three multimedia extensions on beta [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92922 (owner: 10MarkTraceur) [20:55:34] PHP fatal error in /usr/local/apache/common-local/wmf-config/CommonSettings.php line 2717: [20:55:37] require() [function.require]: Failed opening required '/usr/local/apache/common-local/php-1.23wmf2/../wmf-config/ExtensionMessages-1.23wmf2.php' (include_path='/usr/local/apache/common-local/php-1.23wmf2/extensions/TimedMediaHandler/handlers/OggHandler/PEAR/File_Ogg:/usr/local/apache/common-local/php-1.23wmf2:/usr/local/lib/php:/usr/share/php') [20:55:41] wtf [20:55:44] again [20:55:50] ok [20:55:58] i don't think all the scap completed [20:56:12] localisation cache build [20:56:14] mw.o down for me [20:56:18] http://www.mediawiki.org/wiki/MediaWiki_on_IRC [20:56:19] revert time [20:56:23] chad's on it [20:56:51] Needs update.php, greg-g [20:56:57] cc hashar [20:57:10] ^d's on it [20:57:57] marktraceur, update.php on prod is fun. so much fun that you'll not survive it;) [20:58:06] >.< [20:58:09] https://j.mp/wmfatal [20:58:16] time to have a shortlink for that [20:58:19] hehe [20:58:21] thanks aude [20:59:32] hold on, chad misplaces his glasses [20:59:37] (joke) [20:59:38] hah [20:59:51] It would seem I have just stumnbled on this also [21:01:42] should be just a moment [21:02:14] beta update.php in progress https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases/ [21:02:37] hashar: is that automatic or do we trigger it? 
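The fatal above is the classic symptom of a new wmf branch whose ExtensionMessages file was never generated (normally scap builds it during the l10n step). A sketch of the manual rebuild, assuming core's mergeMessageFileList.php and the wmf-config extension list; paths and the wiki name are illustrative and would be run from the staging checkout on the deploy host:

    mwscript mergeMessageFileList.php --wiki=enwiki \
        --list-file=wmf-config/extension-list \
        --output=wmf-config/ExtensionMessages-1.23wmf2.php   # the file the require() above could not find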
[21:02:43] every hour [21:02:52] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: [21:03:01] together with pullign from gerrit [21:03:05] I haven't found a nice way to detect new database changes so I do the 24 times per day brute force method [21:03:05] Logged the message, Master [21:03:09] ok [21:03:12] hashar: Getting 403s on the spinner gif, causing a fatal JS error in MultimediaViewer [21:03:14] so not synched [21:03:17] the pulling from gerrit is every 6 minutes [21:03:20] ok [21:03:33] doc: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated [21:03:40] but the update.php can be triggered [21:03:41] yay mediawiki.org is back ;p [21:03:51] aude yeah any job can be triggered manually if needed [21:03:51] or just happened to be a new hour? [21:03:54] ok [21:04:17] I need to tweak the update script to convert it as a deamon listening for gerrit changes [21:04:26] "GET http://en.wikipedia.beta.wmflabs.org/w/resources/jquery/images/spinner-large.gif 403 (Forbidden)" [21:04:27] * aude nods [21:04:29] and thus update the mediawiki/extensions.git whenever needed [21:04:31] lacking time [21:04:42] so 6 minutes brute force is good enough for now, it get the job done. [21:04:50] yeah [21:05:20] something we could do is run update.php in dry run (not actually changing anything) to detect whether there is any potential change pending, then actually execute them if there are [21:05:26] i should fill bugs for all of that probably [21:06:36] Uhhhh...maybe the problem is that resources aren't on that server [21:07:01] I'm not sure at this point, can someone weigh in? [21:07:32] marktraceur: should be on bits.beta.wmflabs.org [21:07:41] hashar: Crap. [21:07:44] sorry [21:07:46] * marktraceur needs to fix the bug in MMV [21:08:11] that prevents a bug in production. Congratulations! [21:09:16] Crap crap crap [21:09:20] (03PS1) 10Chad: Revert "Add version specific extension-list" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92980 [21:11:19] (03PS1) 10Chad: Revert "group0 wikis to 1.23wmf2" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92982 [21:11:39] revert revert revert [21:11:53] (03PS1) 10Chad: Revert "Enable three multimedia extensions on beta" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92983 [21:15:04] (03CR) 10Chad: [C: 032] Revert "group0 wikis to 1.23wmf2" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92982 (owner: 10Chad) [21:15:10] (03CR) 10Chad: [C: 032] Revert "Add version specific extension-list" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92980 (owner: 10Chad) [21:15:10] mwalker: hey, how important are you CN patches? 
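The hourly "brute force" database update hashar describes amounts to running the updater across every beta wiki; a minimal sketch, assuming the foreachwiki wrapper available on the beta host (flags illustrative):

    # --quick only skips update.php's warning countdown; detecting pending schema changes
    # without applying them (hashar's dry-run idea above) would need extra work.
    foreachwiki update.php --quick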
[21:15:15] (03CR) 10Chad: [C: 032] Revert "Enable three multimedia extensions on beta" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92983 (owner: 10Chad) [21:16:01] greg-g: in the grand scheme of things; not all that important [21:16:07] I'd like to get them out today [21:16:10] but I can also wait till monday [21:17:20] (03Merged) 10jenkins-bot: Revert "group0 wikis to 1.23wmf2" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92982 (owner: 10Chad) [21:17:38] (03Merged) 10jenkins-bot: Revert "Add version specific extension-list" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92980 (owner: 10Chad) [21:17:50] greg-g: I'm also going to be around for a long while today; and they aren't that risky -- so I can also deploy after the LD [21:18:10] * aude hates to ask, but we also have tiny updates for wikibase [21:18:21] doesn't have to be today but we don't want to wait days [21:18:27] (03Merged) 10jenkins-bot: Revert "Enable three multimedia extensions on beta" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92983 (owner: 10Chad) [21:18:35] whenever things are sorted :) [21:19:03] mwalker: aude can ya'll do them tomorrow? [21:19:08] greg-g: yes [21:19:16] *gasp* deploy on a friday! [21:19:26] I know I know :) [21:19:28] ours are tiny but important [21:19:31] * greg-g nods [21:19:44] tomorrow we'll get things back to normal as much as possible [21:19:50] ie: back on track [21:20:02] k [21:20:06] greg-g: in the spirit of obeying the no friday deploys except for really important things; I'll move my slot to monday [21:21:10] our fix is already in 1.23wmf2, so it's only needed for 1.23wmf1 [21:21:17] mwalker: cool [21:21:26] aude: ahh, cool, that's easy for tomorrow [21:21:29] as it was there at time the new branch was made [21:21:37] we'll see it on test.wikidata [21:21:38] !log demon Started syncing Wikimedia installation... : Unbreak cluster. Like most things, the sequel wasn't as good as the original [21:21:50] and then get it onto wikidata tomorrow [21:21:52] Logged the message, Master [21:22:31] alright, time to go home [21:24:07] aude: g'night [21:24:20] I have to head out and take care of some things, I'll be online later tonight [21:26:15] marktraceur: I cant really figure out what broke from backscroll; but I see a lot of you in it -- are you to blame for my inability to deploy today? or is it just greg-g taking halloween precautions? [21:26:40] mwalker: Everything got cocked up and only part of it was my fault, but none of the bigger things AFAIK [21:27:03] greg-g may have decided to blame me somewhat, but I'm not sure [21:27:24] nah; greg-g is innocent in the blaming; I just picked a random friendly person :) [21:28:04] 'kay [21:28:11] though; I probably could have picked one with fewer nerf armaments! 
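Mechanically, the revert storm above is just Gerrit's standard flow; a sketch (the commit hash is a placeholder, and git-review is the usual upload path):

    git revert <sha-of-bad-change>    # produces a commit titled 'Revert "..."', as seen above
    git review                        # push it to Gerrit for CR+2 / V+2
    # reverting that commit later yields the nested 'Revert "Revert "..."' subjects that follow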
[21:28:14] mwalker: Blame everybody on this side of 3 for your inability [21:28:14] * mwalker ducks [21:29:01] (03PS1) 10Chad: Revert "Revert "Enable three multimedia extensions on beta"" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92987 [21:29:09] (03CR) 10Chad: [C: 032 V: 032] Revert "Revert "Enable three multimedia extensions on beta"" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92987 (owner: 10Chad) [21:29:25] hmm; now we're approaching the territory where we possibly need an effigy to blame for such random acts of brokenness [21:29:25] (03PS1) 10Chad: Revert "Add three Multimedia extensions to config" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92988 [21:29:33] (03CR) 10Chad: [C: 032 V: 032] Revert "Add three Multimedia extensions to config" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92988 (owner: 10Chad) [21:29:36] I cant offer up the awesome possum because its... awesome [21:29:44] * Elsie blinks. [21:30:00] <^d> I swear to jesus f'ing christ. [21:30:02] <^d> I need a drink. [21:30:58] What happened to Sam? :-( [21:31:13] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:31:17] <^d> Fell in the ocean, best I can tell [21:31:24] Oh dear. [21:31:25] what happened to prod? [21:31:31] Same fate. [21:31:44] <^d> ori-l: Reverted the wrong change, fixing now. [21:31:51] <^d> And then I refuse to touch this crap again today :p [21:32:48] (03PS1) 10Dzahn: remove bellin/blondel references, they don't exist [operations/puppet] - 10https://gerrit.wikimedia.org/r/92989 [21:33:31] !log demon synchronized wmf-config/InitialiseSettings.php 'Fixing for the last time' [21:33:46] Logged the message, Master [21:33:47] !log demon synchronized wmf-config/CommonSettings.php 'Fixing for the last time' [21:33:56] oops [21:34:02] Logged the message, Master [21:34:21] Yeah, mw.org is back up. :) [21:34:24] Thanks, ^d [21:36:03] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [21:38:43] !log demon Finished syncing Wikimedia installation... : Unbreak cluster. Like most things, the sequel wasn't as good as the original [21:38:55] Logged the message, Master [21:39:13] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:39:16] ^d: Not sure (un)breaking the site is the best time to make jokes. Perhaps after? :-) [21:40:13] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [21:43:05] Elsie, the killguy. [21:43:11] *killjoy [21:43:21] I'll take either. [21:44:45] And killgal? [21:45:08] templates/misc/udpmxircecho.py.erb [21:45:20] 53 # Get oper mode if we are connecting to browne [21:45:37] if c.get_server_name().endswith(".wikimedia.org") [21:46:08] 'browne' is very gone [21:49:27] (03PS1) 10Dzahn: remove 'khaldun' remnant. [operations/dns] - 10https://gerrit.wikimedia.org/r/92991 [22:06:10] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:07:10] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [22:14:10] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
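The "!log demon synchronized ..." entries above are written automatically by the deploy tooling; what ^d actually ran looks roughly like this (a sketch, assuming the sync-file wrapper on the deployment host):

    sync-file wmf-config/InitialiseSettings.php 'Fixing for the last time'
    sync-file wmf-config/CommonSettings.php 'Fixing for the last time'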
[22:17:10] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [22:17:28] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Bye bye MW 1.22 [22:17:44] Logged the message, Master [22:17:51] (03PS1) 10Reedy: All wikipedias to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92998 [22:18:06] (03CR) 10Reedy: [C: 032] All wikipedias to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92998 (owner: 10Reedy) [22:18:09] That's the easy one [22:18:37] (03Merged) 10jenkins-bot: All wikipedias to 1.23wmf1 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/92998 (owner: 10Reedy) [22:18:59] mwalker: were you still going to LD? [22:19:10] Reedy: Hey so what's going on with wmf2? Any chance I can sneak in https://gerrit.wikimedia.org/r/#/c/92915/ ? [22:19:11] ori-l: no; I moved myself to monday [22:19:14] Reedy: stop [22:19:21] (I asked earlier but you were off line) [22:19:22] mwalker: ok, thanks [22:19:35] RoanKattouw: we don't feel good about prod atm :/ [22:19:39] things are a bit WTF-y [22:19:44] can it wait? [22:19:47] Yeah, sure [22:19:51] thanks, sorry [22:20:08] It's just that it's in wmf2 so if that hasn't been properly deployed yet I figured it could be snuck into it before it gets deployed [22:20:10] PROBLEM - RAID on arsenic is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:21:04] it sounds safe, but we're trying to write up a definitive account of what went wrong today and it's proving to be a bit tricky because there are many moving pieces [22:23:10] RECOVERY - RAID on arsenic is OK: OK: no RAID installed [22:30:46] could somebody with the powers restart parsoid on wtp1011? [22:31:01] it does not seem to be doing much work [22:31:05] gwicke: i can [22:31:34] gwicke: done [22:31:40] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [22:31:42] !log restarted parsoid on wtp1011 [22:31:46] there it goes [22:31:58] Logged the message, RobH [22:32:08] RobH: awesome, thanks! [22:32:13] welcome [22:51:53] Will wmf2 still be deployed, or is this on hold until some TBA moment? [22:59:06] <^demon> valhallasw: I doubt it'll happen this afternoon. [22:59:13] <^demon> I'm sure as heck not touching things again. [22:59:14] <^demon> :) [23:00:03] heh. OK. [23:00:54] is there a specific time slot typically used for shifted deployments (the same 18-20 UTC window?) -- I need to time a deployment of the gerrit patch uploader with the mw.o deployment to make sure OAuth doesn't break. [23:01:54] valhallasw: These deployments don't usually get postponed so I don't know what would "typically" happen, but you can ask greg-g what will happen in this case [23:02:47] <^demon> greg-g's somewhere between the office and home. [23:03:10] In that case I'll go for sleep + F5'ing mw.o every now and then tomorrow. [23:03:18] thanks for the info. [23:07:55] valhallasw: FWIW it sounds like we won't deploy to mw.o tomorrow. I'd guess Monday. The "official" plan will get sent out soon, I'm sure. [23:08:01] Hey valhallasw. [23:12:07] marktraceur: I'll F5 gmail instead then ;-) [23:12:20] Elsie: Hey. I'm actually heading to bed now. [23:12:32] Heh [23:12:40] so please prod me tomorrow :-) [23:22:44] (03PS1) 10Dzahn: remove constable from netboot/dhcp/dsh [operations/puppet] - 10https://gerrit.wikimedia.org/r/93004 [23:24:49] I just was saying hello. But perhaps I'll do it again tomorrow. 
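The "rebuilt wikiversions.cdb and synchronized wikiversions files" log entries above correspond to the version-bump workflow of the time; a sketch, assuming the wikiversions.dat source file and the sync-wikiversions wrapper as then used on the deploy host:

    $EDITOR wikiversions.dat                          # map each wiki/dblist to php-1.23wmfN
    sync-wikiversions 'All wikipedias to 1.23wmf1'    # compiles wikiversions.cdb and pushes it to the Apaches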
[23:26:24] (03PS2) 10Dzahn: remove constable from netboot/dhcp/dsh [operations/puppet] - 10https://gerrit.wikimedia.org/r/93004 [23:30:14] !log reedy synchronized wmf-config/ 'touch' [23:30:34] Logged the message, Master [23:31:06] (03PS1) 10Cmjohnson: Removing dns entries for constable [operations/dns] - 10https://gerrit.wikimedia.org/r/93005 [23:31:10] !log reedy synchronized php-1.23wmf1/resources 'touch' [23:31:22] Logged the message, Master [23:32:44] !log reedy synchronized php-1.23wmf1/extensions/MobileFrontend 'touch' [23:32:58] Logged the message, Master [23:32:59] (03CR) 10Cmjohnson: [C: 032] Removing dns entries for constable [operations/dns] - 10https://gerrit.wikimedia.org/r/93005 (owner: 10Cmjohnson) [23:33:00] (03PS3) 10Dzahn: remove constable from netboot/dhcp/dsh [operations/puppet] - 10https://gerrit.wikimedia.org/r/93004 [23:33:32] !log dns update [23:33:46] Logged the message, Master [23:34:24] (03CR) 10Dzahn: [C: 032] remove constable from netboot/dhcp/dsh [operations/puppet] - 10https://gerrit.wikimedia.org/r/93004 (owner: 10Dzahn) [23:36:06] (03PS1) 10Yurik: Add more zero values to analytics header [operations/puppet] - 10https://gerrit.wikimedia.org/r/93006 [23:41:57] !log delete dsh groups 'nagios' and 'misc-servers' from tin/fenari. they are gone from puppet and unused (just not actively deleted) [23:42:00] cmjohnson1: ^ done [23:42:02] ttyl then [23:42:11] Logged the message, Master [23:42:14] cool ttyl [23:54:13] (03CR) 10Cmjohnson: [C: 032] remove 'khaldun' remnant. [operations/dns] - 10https://gerrit.wikimedia.org/r/92991 (owner: 10Dzahn) [23:55:37] !log dns update [23:55:53] Logged the message, Master [23:56:28] (03CR) 10Cmjohnson: [C: 032] remove scandium CNAME for bast1001 [operations/dns] - 10https://gerrit.wikimedia.org/r/92976 (owner: 10Dzahn)
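On the dsh-group cleanup logged above: dsh fans one command out to every host listed in a group file (e.g. under /etc/dsh/group/), which is why stale groups such as 'nagios' and 'misc-servers' are worth deleting once their members are decommissioned. Illustrative usage (group name as commonly referenced at the time):

    dsh -g mediawiki-installation -M -- uptime    # -M prefixes each line of output with the hostname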