[00:01:31] hardware-requests, operations: Re-image osmium and repurpose into VE testing host - https://phabricator.wikimedia.org/T87215#992351 (RobH)
[00:01:35] (PS3) KartikMistry: Publish article to Main namespace for cawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/186358
[00:02:04] (PS4) Giuseppe Lavagetto: [WMF] New package with additional patches and fixes to the ini files and to the upstart/init scripts [debs/hhvm] - https://gerrit.wikimedia.org/r/185187
[00:02:19] (CR) KartikMistry: [C: -1] "Not to be merged until we finalized decision on this." [mediawiki-config] - https://gerrit.wikimedia.org/r/186358 (owner: KartikMistry)
[00:02:31] (CR) Giuseppe Lavagetto: [C: 2] [WMF] New package with additional patches and fixes to the ini files and to the upstart/init scripts [debs/hhvm] - https://gerrit.wikimedia.org/r/185187 (owner: Giuseppe Lavagetto)
[00:02:41] (CR) Giuseppe Lavagetto: [V: 2] [WMF] New package with additional patches and fixes to the ini files and to the upstart/init scripts [debs/hhvm] - https://gerrit.wikimedia.org/r/185187 (owner: Giuseppe Lavagetto)
[00:02:44] (PS2) KartikMistry: cxserver: enable no->nn language pair [puppet] - https://gerrit.wikimedia.org/r/186522
[00:04:23] (CR) KartikMistry: [C: -1] "Not to merge right now! :)" [puppet] - https://gerrit.wikimedia.org/r/186522 (owner: KartikMistry)
[00:06:46] chasemp: Are you fiddling with phab email config atm?
[00:07:13] (PS4) Giuseppe Lavagetto: virt: use role, hiera [puppet] - https://gerrit.wikimedia.org/r/185153
[00:07:26] Coren: He's talking to twentyafterfour atm, so I guess not
[00:07:42] <_joe_> andrewbogott: would you care to review https://gerrit.wikimedia.org/r/#/c/185153/?
[00:07:51] sure
[00:07:55] <_joe_> Reedy: you're our eye outside the ops den
[00:07:55] Seems reasonable.
[00:08:12] Reedy: aka "Snitch"
[00:08:48] hardware-requests, operations: Re-image osmium and repurpose into VE testing host - https://phabricator.wikimedia.org/T87215#992357 (mark) Approved to reuse osmium for this.
[00:09:24] (PS3) Gage: logstash: Update apache2 parsing pattern [puppet] - https://gerrit.wikimedia.org/r/184112 (owner: BryanDavis)
[00:09:41] ori: mind poking at https://gerrit.wikimedia.org/r/#/c/183568/ ? should be an easy win
[00:10:21] (CR) Gage: [C: 2] logstash: Update apache2 parsing pattern [puppet] - https://gerrit.wikimedia.org/r/184112 (owner: BryanDavis)
[00:11:45] (CR) Andrew Bogott: [C: 2] virt: use role, hiera [puppet] - https://gerrit.wikimedia.org/r/185153 (owner: Giuseppe Lavagetto)
[00:11:56] _joe_: looks right to me; shall I merge and test right now?
[00:12:24] <_joe_> andrewbogott: go on
[00:15:35] (PS3) Gage: logstash: remove support for most udp2log events [puppet] - https://gerrit.wikimedia.org/r/185482 (owner: BryanDavis)
[00:17:44] (CR) Gage: [C: 2] logstash: remove support for most udp2log events [puppet] - https://gerrit.wikimedia.org/r/185482 (owner: BryanDavis)
[00:17:55] <^d> bd808: 7 unassigned, 4 initializing
[00:19:04] operations, ops-eqiad: testing ticket for emails - https://phabricator.wikimedia.org/T87481#992371 (RobH) NEW a:RobH
[00:20:13] (CR) Alexandros Kosiaris: [C: -1] WIP: cxserver: Add Yandex MT support (3 comments) [puppet] - https://gerrit.wikimedia.org/r/186538 (owner: KartikMistry)
[00:23:09] !log begin re-imaging osmium
[00:23:19] Logged the message, Master
[00:23:50] hardware-requests, operations: Re-image osmium and repurpose into VE testing host - https://phabricator.wikimedia.org/T87215#992384 (yuvipanda) Am re-imaging osmium now
[00:24:06] hardware-requests, operations: Re-image osmium and repurpose into VE testing host - https://phabricator.wikimedia.org/T87215#992385 (yuvipanda) a:RobH>yuvipanda
[00:24:19] robh: just noticed ^ was assigned to you, should’ve asked before I took it. is that ok?
[00:25:10] PROBLEM - Host osmium is DOWN: PING CRITICAL - Packet loss = 100%
[00:25:35] hey guys, can you merge https://gerrit.wikimedia.org/r/#/c/186420/ please?
[00:25:52] gah
[00:26:51] RECOVERY - Host osmium is UP: PING OK - Packet loss = 0%, RTA = 2.44 ms
[00:29:32] (PS3) Giuseppe Lavagetto: services: create sca role [puppet] - https://gerrit.wikimedia.org/r/185154
[00:30:28] (CR) Giuseppe Lavagetto: [C: 2] services: create sca role [puppet] - https://gerrit.wikimedia.org/r/185154 (owner: Giuseppe Lavagetto)
[00:34:31] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: puppet fail
[00:36:06] (PS2) KartikMistry: WIP: cxserver: Add Yandex MT support [puppet] - https://gerrit.wikimedia.org/r/186538
[00:36:43] (Abandoned) Se4598: Disable compact personal bar (beta feature) on wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/185081 (https://phabricator.wikimedia.org/T86831) (owner: Se4598)
[00:36:45] (CR) jenkins-bot: [V: -1] WIP: cxserver: Add Yandex MT support [puppet] - https://gerrit.wikimedia.org/r/186538 (owner: KartikMistry)
[00:37:22] (PS1) Giuseppe Lavagetto: sca: move hiera file to the correct location (d'oh) [puppet] - https://gerrit.wikimedia.org/r/186544
[00:37:28] <_joe_> akosiaris: ^^
[00:37:43] (CR) Giuseppe Lavagetto: [C: 2] sca: move hiera file to the correct location (d'oh) [puppet] - https://gerrit.wikimedia.org/r/186544 (owner: Giuseppe Lavagetto)
[00:37:54] (CR) Giuseppe Lavagetto: [V: 2] sca: move hiera file to the correct location (d'oh) [puppet] - https://gerrit.wikimedia.org/r/186544 (owner: Giuseppe Lavagetto)
[00:38:33] akosiaris: base::firewall broke at least one person’s workflow. someone was sshing from tin to wtp* hosts and was wondering why it was no longer working :)
[00:39:20] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:43:17] Reedy: there?
[00:43:22] se4598: ja
[00:43:54] you merged https://gerrit.wikimedia.org/r/#/c/185116 but it's still showing up for me in the beta section?
[00:45:57] Yeah..
[00:46:09] I was wondering if there's a cleanup script for betafeatures when disabled
[00:46:11] James_F|Away: ^^
[00:46:45] Or, marktraceur
[00:47:53] Hm.
[00:48:22] ah, below is a 'wmgVectorBetaPersonalBar' still set
[00:48:27] maybe this ^?
[00:48:28] heh
[00:48:33] Not sure
[00:48:38] I guess there's no benefit from removing the row in the database
[00:48:50] The search one disappeared at least
[00:49:28] se4598: Looks phishy at least
[00:49:47] Let me kill it
[00:50:22] (PS1) Reedy: Remove wmgVectorBetaPersonalBar [mediawiki-config] - https://gerrit.wikimedia.org/r/186549
[00:50:39] !log reedy Synchronized wmf-config/: Kill wmgVectorBetaPersonalBar (duration: 00m 08s)
[00:50:50] Logged the message, Master
[00:50:56] se4598: That looks to have removed it
[00:51:14] (CR) Reedy: [C: 2] Remove wmgVectorBetaPersonalBar [mediawiki-config] - https://gerrit.wikimedia.org/r/186549 (owner: Reedy)
[00:51:31] Reedy: they're in -labs.php also
[00:51:37] if we want that
[00:52:14] (PS2) Giuseppe Lavagetto: base::resolving: get rid of the global domain_search variable [puppet] - https://gerrit.wikimedia.org/r/185912
[00:53:22] Hmm
[00:53:30] I'm not sure about beta... Presumably it should go too
[00:53:50] (PS3) KartikMistry: WIP: cxserver: Add Yandex MT support [puppet] - https://gerrit.wikimedia.org/r/186538
[00:53:58] (PS1) BBlack: set up varnish instance defaults file even under systemd [puppet] - https://gerrit.wikimedia.org/r/186550
[00:54:52] (PS1) Ottomata: Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551
[00:55:09] (PS2) MaxSem: Replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[00:55:34] (CR) jenkins-bot: [V: -1] Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551 (owner: Ottomata)
[00:56:08] (PS2) BBlack: set up varnish instance defaults file even under systemd [puppet] - https://gerrit.wikimedia.org/r/186550
[00:56:34] (Merged) jenkins-bot: Remove wmgVectorBetaPersonalBar [mediawiki-config] - https://gerrit.wikimedia.org/r/186549 (owner: Reedy)
[00:56:51] (PS2) Ottomata: Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551
[00:57:07] (CR) BBlack: [C: 2] set up varnish instance defaults file even under systemd [puppet] - https://gerrit.wikimedia.org/r/186550 (owner: BBlack)
[00:57:35] (CR) jenkins-bot: [V: -1] Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551 (owner: Ottomata)
[00:59:26] (PS3) Ottomata: Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551
[01:00:06] (CR) jenkins-bot: [V: -1] Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551 (owner: Ottomata)
[01:01:15] (PS4) Ottomata: Create geowiki module [puppet] - https://gerrit.wikimedia.org/r/186551
[01:05:10] Phabricator, operations: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#992484 (Aklapper) p:Triage>High
[01:05:32] (CR) Ottomata: [C: -1] "Will wait until monday to merge this." [puppet] - https://gerrit.wikimedia.org/r/186551 (owner: Ottomata)
[01:05:57] !log restarting Jenkins (deadlock on deployment-bastion slave)
[01:06:04] Logged the message, Master
[01:07:17] (CR) Se4598: "this was not enough, Reedy removed it with https://gerrit.wikimedia.org/r/186549 (may be still present on beta-labs)" [mediawiki-config] - https://gerrit.wikimedia.org/r/185116 (https://phabricator.wikimedia.org/T85541) (owner: Jforrester)
[01:09:43] hardware-requests, operations: Re-image osmium and repurpose into VE testing host - https://phabricator.wikimedia.org/T87215#992502 (yuvipanda) Open>Resolved Aaaand done :)
[01:10:50] se4598: https://phabricator.wikimedia.org/T87489#992506
[01:11:36] (PS3) Yuvipanda: Replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:11:51] (PS4) Yuvipanda: admin: replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:12:02] (CR) Yuvipanda: [C: 2] "Verified in person" [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:12:17] WMF-NDA-Requests, operations: Grant Nikerabbit access to WMF-NDA group - https://phabricator.wikimedia.org/T86632#992518 (Qgil) @aklapper, can you help here?
[01:12:42] (CR) jenkins-bot: [V: -1] admin: replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:13:19] (CR) Yuvipanda: "recheck" [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:13:35] (CR) Yuvipanda: admin: replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:13:40] (CR) Yuvipanda: [C: 2] admin: replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[01:15:06] WMF-NDA-Requests, operations, WMF-Legal: Grant WMF-NDA access to Stas in Phabricator - https://phabricator.wikimedia.org/T85170#992524 (Qgil) Not the deprecated legalpad.wm.org, and not the process we have in place here, since these are for volunteer developers. WMF employees sign an NDA when they join the F...
[01:15:21] WMF-NDA-Requests, operations: Grant Nikerabbit access to WMF-NDA group - https://phabricator.wikimedia.org/T86632#992526 (Reedy) Think we need to ping legal to confirm he definitely has (for the paper trail), then ops can actually grant it
[01:15:58] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: puppet fail
[01:17:46] WMF-NDA-Requests, operations, WMF-Legal: Grant WMF-NDA access to Stas in Phabricator - https://phabricator.wikimedia.org/T85170#992530 (Dzahn) I'm not sure i understand why it makes a difference if they are volunteer or employee. The problem seems the same, we need a way to check if somebody signed an NDA and...
[01:33:38] RECOVERY - puppet last run on db2028 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[01:36:06] WMF-NDA-Requests, operations, WMF-Legal: Grant WMF-NDA access to Stas in Phabricator - https://phabricator.wikimedia.org/T85170#992584 (Qgil) The current Legalpad document is called "Trusted Volunteer Access & Confidentiality Agreement" and it has references to "Volunteer" all over. I have no idea what is the...
[01:49:44] operations: mysql boxes not in ganglia - https://phabricator.wikimedia.org/T87209#992622 (Reedy) p:Triage>High
[01:49:56] operations: mysql boxes not in ganglia - https://phabricator.wikimedia.org/T87209#985982 (Reedy)
[02:00:27] operations: Add Erik Bernhardson to the mwreview group - https://phabricator.wikimedia.org/T87492#992653 (EBernhardson) NEW
[02:02:44] Labs, operations: Add Erik Bernhardson to the mwreview group - https://phabricator.wikimedia.org/T87492#992662 (Reedy)
[02:09:30] (PS2) Mattflaschen: Make flow-bot grantable/removable on enwiki, testwiki, test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/181120 (https://phabricator.wikimedia.org/T76793)
[02:10:38] (PS3) Mattflaschen: Make flow-bot grantable/removable on enwiki, testwiki, test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/181120 (https://phabricator.wikimedia.org/T86403)
[02:17:04] !log l10nupdate Synchronized php-1.25wmf14/cache/l10n: (no message) (duration: 00m 02s)
[02:17:08] !log LocalisationUpdate completed (1.25wmf14) at 2015-01-24 02:17:08+00:00
[02:17:21] Logged the message, Master
[02:17:24] Logged the message, Master
[02:25:39] (PS1) BryanDavis: logstash: fix exception-json messages [puppet] - https://gerrit.wikimedia.org/r/186563
[02:30:07] !log l10nupdate Synchronized php-1.25wmf15/cache/l10n: (no message) (duration: 00m 02s)
[02:30:11] !log LocalisationUpdate completed (1.25wmf15) at 2015-01-24 02:30:10+00:00
[02:30:14] Logged the message, Master
[02:30:17] Logged the message, Master
[02:34:03] (PS2) Ori.livneh: logstash: fix exception-json messages [puppet] - https://gerrit.wikimedia.org/r/186563 (owner: BryanDavis)
[02:34:12] (CR) Ori.livneh: [C: 2 V: 2] logstash: fix exception-json messages [puppet] - https://gerrit.wikimedia.org/r/186563 (owner: BryanDavis)
[02:54:57] (PS2) Jforrester: Add the Citoid extension to use the citoid service to Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/186001
[02:55:20] (CR) Jforrester: "Ib238012e is now merged. Should be good to go." [mediawiki-config] - https://gerrit.wikimedia.org/r/186001 (owner: Jforrester)
[02:55:50] (PS3) Ori.livneh: Add the Citoid extension to use the citoid service to Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/186001 (owner: Jforrester)
[02:55:58] (CR) Ori.livneh: [C: 2] Add the Citoid extension to use the citoid service to Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/186001 (owner: Jforrester)
[03:01:05] (Merged) jenkins-bot: Add the Citoid extension to use the citoid service to Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/186001 (owner: Jforrester)
[03:01:28] I had a Beta feature for new navigation (User page, sandbox etc) in a drop down menu. Has that been dropped ?
[03:02:45] NotASpy: Yes.
[03:03:01] NotASpy: De-deployed today because it had a bug with ULS that wasn't going to get fixed soon enough.
[03:03:25] aww. I liked it. Will it be back at some point ?
[03:04:05] NotASpy: I hope so, but I don't think there's a developer assigned to it, so it'll take some time and/or luck.
[03:04:31] oh well, back to the old look for now. Thanks
[03:04:51] Sorry.
[03:04:51] still, I get my clock back, so it's not all bad.
[03:05:11] :-)
[03:05:26] (and I was saying thanks for letting me know, not being sarcastic)
[03:05:33] * James_F nods.
[03:12:18] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[03:40:29] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[05:33:42] what's going on? I keep getting PHP fatal error:
[05:33:42] Class undefined: ProfilerSectionOnly
[05:47:10] i was able to reproduce this error by hitting https://en.wikipedia.org/wiki/Special:Random about 20 times. icinga currently shows 2 mw10xx hosts giving http 500. moments ago it showed 4 hosts in that state. *looks*
[05:48:10] just went up to 5. this is probably not within my realm of expertise.
[05:48:59] (CR) Glaisher: "causes T87497. please revert until 1.25wmf15 is deployed everywhere." [mediawiki-config] - https://gerrit.wikimedia.org/r/183545 (owner: Aaron Schulz)
[05:49:27] jgage: ^
[05:49:38] that needs to be reverted
[05:50:46] thanks. i've only made one change to that repo and i screwed up merging it. let me review the procedure (he said, hoping someone else would appear..)
[05:52:06] hmph https://wikitech.wikimedia.org/wiki/SWAT_deploys doesn't actually say how to do it
[05:57:33] pretty sure there's a page about it
[05:58:03] jgage: https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment#Change_wiki_configuration
[05:58:44] was just looking at https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Small_changes:_sync_individual_files
[06:00:27] i really don't think it's a good idea for me to do this for the first time at 10pm friday with nobody else around
[06:00:40] nor does it seem like it was a good idea to merge that patch at 3pm friday, but what do i know
[06:01:17] friday deploys are evil
[06:01:39] hey
[06:01:44] hi
[06:01:51] wanna help unfuck an http 500? :)
[06:02:13] (PS1) Faidon Liambotis: Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like" [mediawiki-config] - https://gerrit.wikimedia.org/r/186578 (https://phabricator.wikimedia.org/T87497)
[06:02:30] (PS2) Faidon Liambotis: Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like" [mediawiki-config] - https://gerrit.wikimedia.org/r/186578 (https://phabricator.wikimedia.org/T87497)
[06:02:39] (CR) Faidon Liambotis: [C: 2] Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like" [mediawiki-config] - https://gerrit.wikimedia.org/r/186578 (https://phabricator.wikimedia.org/T87497) (owner: Faidon Liambotis)
[06:02:43] (Merged) jenkins-bot: Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like" [mediawiki-config] - https://gerrit.wikimedia.org/r/186578 (https://phabricator.wikimedia.org/T87497) (owner: Faidon Liambotis)
[06:02:50] thanks paravoid
[06:04:25] !log faidon Synchronized wmf-config/StartProfiler.php: fix for T87497/r186578 (duration: 00m 06s)
[06:04:35] Logged the message, Master
[06:04:50] thanks paravoid :)
[06:15:09] hrm 500s are still occurring according to icinga
[06:15:34] yes, debugging
[06:22:24] !log faidon Synchronized wmf-config: touched config (duration: 00m 07s)
[06:22:31] Logged the message, Master
[06:23:59] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail
[06:27:10] !log mass-restarting hhvm across the cluster
[06:27:16] Logged the message, Master
[06:28:29] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail
[06:28:39] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:49] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:50] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:19] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:49] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:58] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:19] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[06:32:09] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:19] PROBLEM - puppet last run on db1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:13] should be fixed now
[06:33:26] hhvm wasn't picking up the change, it's a few of us debugging here at the hotel
[06:33:55] (PS1) Giuseppe Lavagetto: hhvm: disable stat_cache [puppet] - https://gerrit.wikimedia.org/r/186579
[06:34:06] gah figuring out quoting for salt cmd.run is such a pain. how do i run dpkg-query -W -f=\'${Package}\t${Version}\n' coreutils ?
[06:34:18] or more generally how do i check a package version across the cluster
[06:42:49] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:43:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[06:45:09] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:45:49] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:46:29] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:47:29] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:29] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:47:39] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:48:29] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:54:40] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:11:09] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:24:01] operations, Wikimedia-Git-or-Gerrit: Gerrit ssh key changed and does not match puppet - https://phabricator.wikimedia.org/T87287#992903 (Mattflaschen) >>! In T87287#989130, @akosiaris wrote: > The key mentioned in the puppet repo is used for gerrit replication and is not gerrit's key, so it would not match an...
[07:24:13] operations, Wikimedia-Git-or-Gerrit: Gerrit ssh key changed and does not match puppet - https://phabricator.wikimedia.org/T87287#992904 (Mattflaschen) Open>Invalid a:Mattflaschen
[07:51:29] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:09:08] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[09:18:45] (PS5) Yuvipanda: admin: replace phuedx's key [puppet] - https://gerrit.wikimedia.org/r/186420 (owner: Phuedx)
[09:21:06] Beta-Cluster, operations: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220#992980 (yuvipanda) a:yuvipanda
[09:51:28] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:09:08] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[14:50:09] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail
[15:10:09] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[15:29:47] (CR) Ricordisamoa: "Yes, it has bugs, that's why it was a beta feature." [mediawiki-config] - https://gerrit.wikimedia.org/r/185116 (https://phabricator.wikimedia.org/T85541) (owner: Jforrester)
[15:44:09] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.136:9200/_cluster/health error while fetching: Request timed out.
[15:44:19] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.137:9200/_cluster/health error while fetching: Request timed out.
[15:46:28] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 125, initializing_shards: 0, number_of_data_nodes: 3
[15:46:38] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 125, initializing_shards: 0, number_of_data_nodes: 3
[16:11:45] (CR) Lydia Pintscher: "The problem was that users couldn't connect the bugs it caused to the beta feature. They had no way of knowing that it is the cause of the" [mediawiki-config] - https://gerrit.wikimedia.org/r/185116 (https://phabricator.wikimedia.org/T85541) (owner: Jforrester)
[16:42:42] <_joe_> it's kinda weird to wake up and passenger o'clock is so long gone
[17:09:40] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[17:15:49] (CR) Reedy: "And just because something has (seemingly major) bugs, doesn't mean that deploying it as a beta feature really is a sensible solution" [mediawiki-config] - https://gerrit.wikimedia.org/r/185116 (https://phabricator.wikimedia.org/T85541) (owner: Jforrester)
[17:27:19] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[17:34:23] (PS1) Glaisher: Standardize the name of interface editor group [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731)
[17:35:54] (CR) Glaisher: [C: -1] "Needs to be coordinated with the communities so that the groups are empty before merging this." [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[17:40:10] (CR) Technical 13: [C: -1] "See https://phabricator.wikimedia.org/T85731" [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[17:40:22] _joe_: heh, it is always DNS o'clock in labs
[17:42:03] (CR) Glaisher: "I don't understand why that should be an issue here because local groups only have interface editing-related rights unlike the global grou" [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[17:50:49] Labs, operations: Add Erik Bernhardson to the mwreview group - https://phabricator.wikimedia.org/T87492#993285 (Dzahn) p:Triage>Normal
[18:19:15] Labs, operations: Add Erik Bernhardson to the mwreview group - https://phabricator.wikimedia.org/T87492#993292 (yuvipanda) Open>Resolved a:yuvipanda Done.
[19:11:10] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: puppet fail
[19:19:13] (CR) Vogone: "In case this is indeed going to be done, the patch looks fine to me." [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[19:30:58] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:16:32] (PS1) Yuvipanda: scap: Move 'common_scripts' into scripts class [puppet] - https://gerrit.wikimedia.org/r/186597 (https://phabricator.wikimedia.org/T87221)
[20:16:34] (PS1) Yuvipanda: scap: Move scap master code into own class [puppet] - https://gerrit.wikimedia.org/r/186598 (https://phabricator.wikimedia.org/T87221)
[20:16:48] ori: ^ if you wanna have a looksie sometime :)
[20:20:48] (PS1) Yuvipanda: scap: Clean up absent'd lint related files / packages [puppet] - https://gerrit.wikimedia.org/r/186599
[20:25:32] (PS1) Yuvipanda: scap: Move l10nupdate into module [puppet] - https://gerrit.wikimedia.org/r/186600 (https://phabricator.wikimedia.org/T87221)
[20:27:34] (PS2) Yuvipanda: scap: Move 'common_scripts' into scripts class [puppet] - https://gerrit.wikimedia.org/r/186597 (https://phabricator.wikimedia.org/T87221)
[20:27:36] (PS2) Yuvipanda: scap: Move scap master code into own class [puppet] - https://gerrit.wikimedia.org/r/186598 (https://phabricator.wikimedia.org/T87221)
[20:27:38] (PS2) Yuvipanda: scap: Clean up absent'd lint related files / packages [puppet] - https://gerrit.wikimedia.org/r/186599
[20:27:40] (PS2) Yuvipanda: scap: Move l10nupdate into module [puppet] - https://gerrit.wikimedia.org/r/186600 (https://phabricator.wikimedia.org/T87221)
[20:34:15] (PS1) Yuvipanda: scap: Move rsync proxies into module [puppet] - https://gerrit.wikimedia.org/r/186601 (https://phabricator.wikimedia.org/T87221)
[20:34:21] all of these are wildly untested, btw
[20:40:05] (PS1) Yuvipanda: logging: Move fatalmonitor into role::mediawiki::logging [puppet] - https://gerrit.wikimedia.org/r/186603 (https://phabricator.wikimedia.org/T87221)
[20:41:06] (CR) He7d3r: "Please remember to update pages like" [mediawiki-config] - https://gerrit.wikimedia.org/r/186593 (https://phabricator.wikimedia.org/T85731) (owner: Glaisher)
[20:41:34] YuviPanda: my toy was moved :/
[20:42:14] matanya: what was your toy?
[20:42:16] fatalmonitor?
[20:42:54] compact personal header (a.k.a winter)
[20:43:01] oh?
[20:43:09] I’ve no idea what that’s about, sorry :(
[20:44:04] YuviPanda: http://unicorn.wmflabs.org/winter/
[20:44:25] see uppper right corner. it was a beta feature, and now gone :/
[20:44:33] oh
[20:44:49] I’m not sure how or why. Reedy knows perhaps?
[20:45:48] maybe
[20:46:02] will get no answer until monday most likely
[20:46:14] yup
[20:46:17] or maybe even longer, perhaps
[20:46:19] monday is dev summit
[20:46:25] I’m on IRC only because my feet are hurting
[20:50:52] Beta-Cluster, operations: Renumber apache user/group to uid=48 - https://phabricator.wikimedia.org/T78076#993380 (yuvipanda) I've been talking to @faidon and @Joe about this over the last few days, hopefully we'll find a way to fix this before end of coming week.
[20:53:09] Analytics, operations, ops-core: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#973506 (Tnegrin) Otto, Oliver, Erik -- This is a heads-up that these log messages are going away. My understanding was that we were using them to count HTTPS requests but we use a different mechanism on...
[20:56:39] operations: Kill network.pp - https://phabricator.wikimedia.org/T87519#993386 (yuvipanda) NEW
[20:58:52] YuviPanda: https://gerrit.wikimedia.org/r/#/c/185116/ found the root cause
[21:00:06] ay
[21:00:07] ah
[21:00:07] right
[21:33:41] (CR) BryanDavis: "From beta logs:" (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/186319 (owner: Jdlrobson)
[21:42:45] (PS1) BryanDavis: beta: Change ProfilerSimpleText to ProfilerXhprof [mediawiki-config] - https://gerrit.wikimedia.org/r/186604
[21:44:27] (CR) BryanDavis: "Should fix "Fatal error: Class undefined: ProfilerSimpleText in /srv/mediawiki/php-master/includes/profiler/Profiler.php on line 80" in be" [mediawiki-config] - https://gerrit.wikimedia.org/r/186604 (owner: BryanDavis)
[21:47:41] (CR) Aaron Schulz: [C: 1] beta: Change ProfilerSimpleText to ProfilerXhprof [mediawiki-config] - https://gerrit.wikimedia.org/r/186604 (owner: BryanDavis)
[21:49:24] (CR) BryanDavis: "Using `|` as the regex start/stop char and for alternation is triggering this error in beta:" [mediawiki-config] - https://gerrit.wikimedia.org/r/186392 (owner: Reedy)
[21:54:05] (PS1) BryanDavis: Fix regex pattern delimiters [mediawiki-config] - https://gerrit.wikimedia.org/r/186607
[22:22:38] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:23:35] Scrum-of-Scrums, RESTBase, operations, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#993438 (GWicke)
[22:26:38] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[22:26:58] !log Full restart of logstash elasticsearch cluster
[22:27:11] Logged the message, Master
[22:27:48] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 15 threshold =0.1% breach: {status: yellow, number_of_nodes: 3, unassigned_shards: 11, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 111, initializing_shards: 4, number_of_data_nodes: 3}
[22:27:48] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 15 threshold =0.1% breach: {status: yellow, number_of_nodes: 3, unassigned_shards: 11, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 111, initializing_shards: 4, number_of_data_nodes: 3}
[22:28:58] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 126, initializing_shards: 0, number_of_data_nodes: 3
[22:28:58] RECOVERY - ElasticSearch health check for shards on logstash1001 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 42, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 126, initializing_shards: 0, number_of_data_nodes: 3
[22:38:19] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[22:39:58] !log Emptied logstash redis lists on all 3 hosts
[22:40:06] Logged the message, Master
[22:40:09] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[22:42:25] Wikidata, Analytics, wikidata-query-service, operations, Services, MediaWiki-General-or-Unknown: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#993443 (GWicke) >>! In T84923#968636, @JanZerebecki wrote: > http://www.fedmsg.com might fit this need. It is used/developed by Fed...
[23:04:42] Scrum-of-Scrums, RESTBase, operations, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#993464 (GWicke)
[23:05:07] Scrum-of-Scrums, RESTBase, operations, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#21142 (GWicke)
[23:14:53] Scrum-of-Scrums, RESTBase, operations, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#993470 (GWicke)
[23:15:31] Scrum-of-Scrums, RESTBase, operations, Services: Restbase deployment - https://phabricator.wikimedia.org/T1228#21142 (GWicke)