[00:00:04] RoanKattouw, ^d: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150212T0000). [00:00:24] I'll do it [00:00:29] It's just my patch anyway [00:00:46] twentyafterfour: Are you still doing things or am I clear to SWAT? [00:04:46] (03PS1) 10Dzahn: add index.html for static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/190118 (https://phabricator.wikimedia.org/T85140) [00:06:04] 3WMF-Legal, operations, Engineering-Community: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1033026 (10MBrar.WMF) @Qgil We are finalizing the NDA language now. The way it is formatted now requires someone from WMF to fill in an appendix with a sentence or two about,... [00:07:24] 3operations, Phabricator, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1033028 (10Dzahn) [00:07:53] 3operations, Phabricator, Wikimedia-Bugzilla: Bugzilla HTML static version and database dump - https://phabricator.wikimedia.org/T1198#1033029 (10Dzahn) [00:12:00] (03CR) 10Aaron Schulz: [C: 032] Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188586 (owner: 10Aaron Schulz) [00:12:14] (03Merged) 10jenkins-bot: Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188586 (owner: 10Aaron Schulz) [00:12:34] (03PS2) 10Dzahn: add index.html for static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/190118 (https://phabricator.wikimedia.org/T85140) [00:13:02] !log aaron Synchronized wmf-config/StartProfiler.php: Use ProfilerSectionOnly to handle DB/filebackend entries (duration: 00m 05s) [00:13:10] Logged the message, Master [00:14:58] (03CR) 10Dzahn: [C: 032] add index.html for static Bugzilla [puppet] - 
10https://gerrit.wikimedia.org/r/190118 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [00:15:21] !log catrope Synchronized php-1.25wmf16/resources/lib/oojs-ui: SWAT (duration: 00m 06s) [00:15:25] Logged the message, Master [00:18:40] 3operations, Phabricator, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1033063 (10Dzahn) >>! In T85140#1003690, @jayvdb wrote: > Also the bug activity is not included the static version > This is very important Activities are now available. see http://s... [00:23:31] (03PS1) 10Dzahn: load mod_rewrite on static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/190126 (https://phabricator.wikimedia.org/T85140) [00:25:55] (03CR) 10Dzahn: [C: 032] load mod_rewrite on static Bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/190126 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [00:29:44] This looks broken, like I need someone to rm the build dir for us? https://integration.wikimedia.org/ci/job/mwext-DonationInterface-testextension-zend/422/console [00:30:21] 00:22:53 oojs/oojs-ui: 0.7.0 installed, 0.6.6 required. [00:30:21] 00:22:53 Error: your composer.lock file is not up to date, run "composer update" to install newer dependencies [00:30:32] yep. thx [00:30:47] ori: Isn't it great? :-( [00:31:15] ori: (It'll be fixed when https://gerrit.wikimedia.org/r/#/c/190119/ merges, once RoanKattouw stops DoSing Jenkins. :-)) [00:31:29] hehe okay thanks for the status! [00:33:15] !log catrope Synchronized php-1.25wmf17/resources/lib/oojs-ui: SWAT (duration: 00m 08s) [00:33:22] Logged the message, Master [00:33:31] Well and once Jenkins actually starts reporting results back to Gerrit, which it hasn't been doing [00:35:02] ‘Error uploading file mwstore://local-backend/local-public/6/6b/Brion_Vibber_-_MediaWiki's big _code & usability push_-_2009.pdf’ [00:35:39] * andrewbogott blames brion [00:46:41] RoanKattouw: swat away [00:47:00] twentyafterfour: Thanks. 
I'm already done :D [00:58:22] (03PS1) 10Dzahn: static-bz: rewrite /show_bug.cgi to static HTML [puppet] - 10https://gerrit.wikimedia.org/r/190132 (https://phabricator.wikimedia.org/T85140) [00:58:43] (03PS1) 10Gilles: Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) [00:58:53] (03CR) 10jenkins-bot: [V: 04-1] Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) (owner: 10Gilles) [01:04:17] doesn't this: ^/show_bug.cgi?id=([0-9]+)$ looks like it would match this: GET /show_bug.cgi?id=13823 [01:05:48] mutante: With escaping? [01:07:22] James_F: nope. ^/show_bug\.cgi\?id=([0-9]+)$ [01:09:14] mutante: ^/show_bug\.cgi\?id=([0-9]+) works for me. [01:09:21] mutante: But the $ makes it break. [01:09:43] oh? but it ends in the number? [01:09:52] Yeah, I don't understand either. [01:10:01] thanks, trying [01:11:04] hrmm, not yet [01:11:11] Obviously with ^ it matches "/show_bug.cgi?id=13823" but not "GET /show_bug.cgi?id=13823" [01:11:34] Not sure what the request format is. :-) [01:11:36] yea, it shouldn't , the GET is just from Apache access log [01:11:49] i just need to match the URL part behind the server name [01:12:27] Oh. [01:12:47] James_F: ah, duh, we can't match the parameter part, of course [01:12:53] from ? on [01:13:03] Oh, hah. [01:14:22] need to use %{QUERY_STRING} [01:15:10] (03CR) 10Dzahn: [C: 04-1] "doesn't match. can't match query string. need to use %{QUERY_STRING}" [puppet] - 10https://gerrit.wikimedia.org/r/190132 (https://phabricator.wikimedia.org/T85140) (owner: 10Dzahn) [01:24:36] @replag [01:24:36] Krinkle: No replag currently. See also "replag all". 
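The regex confusion above boils down to an Apache mod_rewrite rule: a RewriteRule pattern is matched against the URL path only, never the query string, so `\?id=...` can never match; the id has to be captured from `%{QUERY_STRING}` in a RewriteCond. A minimal sketch of what the static-Bugzilla rule could look like (the `/bug/%1.html` target path is an invented example, not the actual layout from the change under review):

```apache
# RewriteRule patterns never see the query string, so "\?id=" cannot match.
# Capture the id from %{QUERY_STRING} instead and reference it as %1.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^/show_bug\.cgi$ /bug/%1.html? [L,R=301]
```

The trailing `?` in the substitution discards the original query string so it is not re-appended to the redirect target (on Apache 2.4 the `QSD` flag does the same).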
[01:24:38] http://bots.wmflabs.org/dump/%23wikimedia-operations.htm [01:24:38] @info enwiki [01:24:38] Krinkle: [enwiki: s1] db1052: 10.64.32.22, db1051: 10.64.32.21, db1055: 10.64.32.25, db1057: 10.64.32.27, db1065: 10.64.48.20, db1066: 10.64.48.21, db1072: 10.64.48.27, db1073: 10.64.48.28 [01:24:44] http://bots.wmflabs.org/dump/%23wikimedia-operations.htm [01:24:44] @info s2 [01:24:44] Krinkle: [s2] db1024: 10.64.16.13, db1018: 10.64.16.7, db1036: 10.64.16.25, db1054: 10.64.32.24, db1060: 10.64.32.30, db1063: 10.64.48.16, db1067: 10.64.48.22 [01:24:48] @replag all [01:24:49] Krinkle: [s1] db1038: 0s, db1019: 0s, db1015: 0s, db1027: 0s, db1035: 0s, db1044: 0s; [s2] db1038: 0s, db1019: 0s, db1015: 0s, db1027: 0s, db1035: 0s, db1044: 0s; [s3] db1038: 0s, db1019: 0s, db1015: 0s, db1027: 0s, db1035: 0s, db1044: 0s [01:24:50] Krinkle: [s4] db1040: 0s, db1042: 0s, db1053: 0s, db1056: 0s, db1059: 0s, db1064: 0s, db1068: 0s, db1070: 0s; [s5] db1058: 0s, db1045: 0s, db1026: 0s, db1021: 0s, db1049: 0s, db1071: 0s; [s6] db1038: 0s, db1019: 0s, db1015: 0s, db1027: 0s, db1035: 0s, db1044: 0s; [s7] db1038: 0s, db1019: 0s, db1015: 0s, db1027: 0s, db1035: 0s, db1044: 0s [01:25:58] wow... haven't seen that bot in ahwile [01:31:45] (03PS1) 10Andrew Bogott: Add a shinken test that verifies that wikitech-static is up and running. 
[puppet] - 10https://gerrit.wikimedia.org/r/190138 [01:35:36] 3operations: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1033344 (10Andrew) 3NEW [01:50:08] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667 [01:58:22] 3operations: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1033397 (10Andrew) [02:00:18] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667 [02:05:43] (03CR) 10Andrew Bogott: [C: 032] Add a shinken test that verifies that wikitech-static is up and running. [puppet] - 10https://gerrit.wikimedia.org/r/190138 (owner: 10Andrew Bogott) [02:09:58] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet last ran 2 days ago [02:10:15] 3operations, Analytics-Kanban, Analytics-Cluster: Increase and monitor Hadoop NameNode heapsize - https://phabricator.wikimedia.org/T89245#1033403 (10kevinator) p:5Triage>3High [02:10:27] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [02:23:26] !log l10nupdate Synchronized php-1.25wmf16/cache/l10n: (no message) (duration: 00m 01s) [02:23:33] Logged the message, Master [02:24:34] !log LocalisationUpdate completed (1.25wmf16) at 2015-02-12 02:23:30+00:00 [02:24:38] Logged the message, Master [02:24:49] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [02:33:47] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Puppet last ran 2 days ago [02:34:19] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet last ran 2 days ago [02:34:35] Should we be using tmpwatch? 
[02:34:48] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [02:35:18] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [02:40:57] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 01s) [02:41:04] Logged the message, Master [02:42:04] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-12 02:41:01+00:00 [02:42:08] Logged the message, Master [02:53:07] !log breaking wikitech-static on purpose to test the shinken alert [02:53:12] Logged the message, Master [02:59:19] Interesting, tmpreaper's security warning about an attack where setuid is the victim, and replacing its tmp file after it's removed by reaper, and then creating your own and using SIGCONT to proceed setuid [02:59:21] Fascinating [03:00:38] 3operations: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1033440 (10Dzahn) maybe just check the mtime of one of the files in the filesystem, to make sure files have been written on the remote host without having to go through actual wiki. there's check_file_age on... [03:02:18] !log restarting wikitech-static. shinken works! [03:02:26] Logged the message, Master [03:10:34] andrewbogott: is virt1000 firewall in puppet? [03:10:47] springle: yes [03:11:37] andrewbogott: want to poke a hole to allow virt1000 mysql to be monitored from db1011 (the tendril box). where should i look in puppet? 
modules/openstack/manifests/firewall.pp [03:12:26] (Sorry, just found it) [03:12:31] it might help us track down some OOMs [03:12:34] tnx [03:12:40] poorly labeled but that’s the firewall for the controller [03:13:26] (03PS1) 10Ori.livneh: vbench: code clean-ups [puppet] - 10https://gerrit.wikimedia.org/r/190145 [03:13:30] springle: there are almost certainly some obsolete rules in there, feel free to clean up if you’re so moved [03:13:42] :) [03:13:54] (03PS2) 10Ori.livneh: vbench: code clean-ups [puppet] - 10https://gerrit.wikimedia.org/r/190145 [03:14:00] (03CR) 10Ori.livneh: [C: 032 V: 032] vbench: code clean-ups [puppet] - 10https://gerrit.wikimedia.org/r/190145 (owner: 10Ori.livneh) [03:15:17] andrewbogott: integration-slave1006 shows df: ‘/public/backups’: Stale file handle [03:15:25] We're not using that, but thought you might wanna know. [03:15:52] Krinkle: that’s because of an out-of-date puppetmaster [03:16:19] j [03:16:20] oK [03:16:29] shouldn’t matter anyway [03:22:33] (03PS1) 10Springle: Poke mysql holes in virt1000 firewall for iron and monitoring. [puppet] - 10https://gerrit.wikimedia.org/r/190146 [03:27:49] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [03:31:51] (03PS1) 10Dzahn: wikitech - add ferm rules for http/https [puppet] - 10https://gerrit.wikimedia.org/r/190147 [03:31:54] hmm. it seems that git deploy has not updated the submodule in cxserver. [03:35:19] kart_: (from #wikimedia-releng) What's the context? Is this during a manual deployment? [03:35:40] Doing git pull and submodule update needs to be done manually on tin between deploy start and deploy sync [03:36:04] Did it not update the submodule on the servers it syncs to? [03:36:08] (03CR) 10Andrew Bogott: [C: 031] Poke mysql holes in virt1000 firewall for iron and monitoring. [puppet] - 10https://gerrit.wikimedia.org/r/190146 (owner: 10Springle) [03:36:30] Krinkle: yep. submodule is not updated.
[03:36:40] kart_: On tin or on the target server(s)? [03:36:46] target [03:36:55] tin is updated. [03:37:06] That sounds like a bug. [03:37:08] Yeah :) [03:37:14] It's working for integration/slave-scripts though [03:37:32] But we stopped using submodules some time ago [03:37:37] Krinkle: workaround? I'll file bug report. [03:37:43] Don't know. [03:37:58] Krinkle: manually update on target? (well, bad idea) [03:38:28] kart_: Whatever you do, !log it. [03:38:36] kart_: I'd revert deployment, file a bug, and log the action. [03:38:43] And scream on engineering-l [03:39:22] Krinkle: how to revert? :) [03:40:20] kart_: If it didn't update the submodule, you won't have to worry about undoing that on the target server. So you're only worried about the main git tree. Which is trivially reverted by checking out the previously deployed commit before your change, then updating the submodule locally on tin to match that commit. [03:40:28] kart_: and then another deploy start/sync. [03:41:59] You should be aware of which commit(s) you're deploying, but if not, git-deploy nicely creates tags to track this. Running eg. `git log --oneline --decorate --graph` will show the previous tag somewhere, which you can check out. [03:42:06] Or git reflog if you can't find it [03:42:53] Noted. Thanks a lot. [03:42:57] Filing bug [03:43:05] Let me know how it goes, I've not done this so far. [03:44:51] 3operations, Phabricator: Mysql search issues flagged by Phabricator setup - https://phabricator.wikimedia.org/T89274#1033490 (10Springle) +1 to all three options, imo. The ARIA engine uses the same fulltext stopword list as MyISAM did, which is fairly long[1]. We also need to increase the aria_pagecache_buffer... [03:45:41] (03CR) 10Springle: [C: 032] Poke mysql holes in virt1000 firewall for iron and monitoring.
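The revert recipe Krinkle describes can be demonstrated in miniature: git-deploy tags each sync, so rolling back is just checking out the previous tag. The toy repo, file, and tag names below are invented for illustration; on tin you would additionally run `git submodule update --init --recursive` after the checkout, then another `git deploy start`/`git deploy sync`.

```shell
# Toy repo standing in for a git-deploy checkout; all names are invented.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.email demo@example.org
git config user.name demo
echo "good" > state.txt
git add state.txt
git commit -qm "deploy 1"
git tag deploy-1                    # git-deploy creates a tag like this per sync
echo "broken" > state.txt
git commit -qam "deploy 2 (bad)"    # the deployment being reverted
git checkout -q deploy-1            # back to the last good deployed state
cat state.txt                       # prints "good"
```

In the real case you would find the previous tag with `git log --oneline --decorate --graph` (or `git reflog`) rather than knowing it up front.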
[puppet] - 10https://gerrit.wikimedia.org/r/190146 (owner: 10Springle) [03:46:18] Krinkle: https://phabricator.wikimedia.org/T89328 [03:46:48] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [03:47:16] kart_: cool [03:47:40] kart_: Was this a scheduled deployment? In that case do log that it was reverted with ref to the bug. [04:08:16] (03PS1) 10Dzahn: remove heavily outdated ipmi entries in esams [dns] - 10https://gerrit.wikimedia.org/r/190150 [04:09:16] (03PS2) 10Dzahn: remove heavily outdated ipmi entries in esams [dns] - 10https://gerrit.wikimedia.org/r/190150 [04:09:45] !log sign puppet cert dbproxy1003, first run [04:09:51] Logged the message, Master [04:10:39] (03PS3) 10Dzahn: remove old ipmi entries in esams [dns] - 10https://gerrit.wikimedia.org/r/190150 [04:15:23] (03PS1) 10Springle: assign dbproxy1003 to m3 [puppet] - 10https://gerrit.wikimedia.org/r/190151 [04:16:12] 3operations: install/deploy dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86958#1033520 (10Springle) dbproxy1003 signed and deployed [04:16:43] (03CR) 10Springle: [C: 032] assign dbproxy1003 to m3 [puppet] - 10https://gerrit.wikimedia.org/r/190151 (owner: 10Springle) [04:32:28] (03PS1) 10Springle: move m1-master CNAME to dbproxy1001 [dns] - 10https://gerrit.wikimedia.org/r/190152 [04:32:54] (03CR) 10Springle: [C: 032] move m1-master CNAME to dbproxy1001 [dns] - 10https://gerrit.wikimedia.org/r/190152 (owner: 10Springle) [04:38:48] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: puppet fail [04:39:17] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: puppet fail [04:42:08] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: puppet fail [04:42:27] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: puppet fail [04:42:38] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: puppet fail [04:42:48] PROBLEM - puppet last run on elastic1026 is CRITICAL: 
CRITICAL: puppet fail [04:42:58] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: puppet fail [04:42:58] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [04:43:18] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [04:43:18] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: puppet fail [04:43:18] PROBLEM - puppet last run on cp3015 is CRITICAL: CRITICAL: puppet fail [04:43:18] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: puppet fail [04:43:29] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: puppet fail [04:43:38] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: puppet fail [04:43:38] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: puppet fail [04:43:38] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: puppet fail [04:43:38] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [04:43:48] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [04:43:48] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: puppet fail [04:43:48] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: puppet fail [04:43:57] PROBLEM - puppet last run on mw1059 is CRITICAL: CRITICAL: puppet fail [04:43:58] PROBLEM - puppet last run on wtp1019 is CRITICAL: CRITICAL: puppet fail [04:43:58] PROBLEM - puppet last run on mw1005 is CRITICAL: CRITICAL: puppet fail [04:43:58] PROBLEM - puppet last run on analytics1017 is CRITICAL: CRITICAL: puppet fail [04:43:58] PROBLEM - puppet last run on mw1006 is CRITICAL: CRITICAL: puppet fail [04:44:07] PROBLEM - puppet last run on es2005 is CRITICAL: CRITICAL: puppet fail [04:44:08] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: puppet fail [04:44:18] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: puppet fail [04:44:19] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: puppet fail [04:44:22] hmm [04:44:28] PROBLEM - puppet last run on 
mw1012 is CRITICAL: CRITICAL: puppet fail [04:44:29] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail [04:44:38] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail [04:44:38] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail [04:44:38] PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: puppet fail [04:44:39] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail [04:44:48] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: puppet fail [04:44:48] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: puppet fail [04:44:48] PROBLEM - puppet last run on ms-be1004 is CRITICAL: CRITICAL: puppet fail [04:44:48] PROBLEM - puppet last run on analytics1025 is CRITICAL: CRITICAL: puppet fail [04:44:48] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: puppet fail [04:44:58] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: puppet fail [04:45:03] does puppet master need a kick again? 
[04:45:07] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: puppet fail [04:45:08] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [04:45:08] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: puppet fail [04:45:08] PROBLEM - puppet last run on db2019 is CRITICAL: CRITICAL: puppet fail [04:45:08] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: puppet fail [04:45:08] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [04:45:17] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [04:45:17] PROBLEM - puppet last run on analytics1041 is CRITICAL: CRITICAL: puppet fail [04:45:18] PROBLEM - puppet last run on dbstore2001 is CRITICAL: CRITICAL: puppet fail [04:45:18] actually i broke it [04:45:18] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: puppet fail [04:45:27] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: puppet fail [04:45:27] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [04:45:27] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [04:45:28] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [04:45:28] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: puppet fail [04:45:28] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: puppet fail [04:45:38] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: puppet fail [04:45:38] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: puppet fail [04:45:38] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: puppet fail [04:45:38] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: puppet fail [04:45:47] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: puppet fail [04:45:48] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [04:45:48] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail [04:45:48] PROBLEM - puppet last run on mw1026 
is CRITICAL: CRITICAL: puppet fail [04:45:57] PROBLEM - puppet last run on mw1187 is CRITICAL: CRITICAL: puppet fail [04:45:57] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [04:45:58] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: puppet fail [04:45:58] PROBLEM - puppet last run on gold is CRITICAL: CRITICAL: puppet fail [04:45:58] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: puppet fail [04:45:58] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: puppet fail [04:45:59] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: puppet fail [04:46:08] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: puppet fail [04:46:08] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: puppet fail [04:46:08] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [04:46:08] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [04:46:08] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail [04:46:09] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: puppet fail [04:46:17] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: puppet fail [04:46:17] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: puppet fail [04:46:18] PROBLEM - puppet last run on heze is CRITICAL: CRITICAL: puppet fail [04:46:18] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [04:46:18] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: puppet fail [04:46:18] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: puppet fail [04:46:28] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: puppet fail [04:46:28] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail [04:46:28] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: puppet fail [04:46:28] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: puppet fail [04:46:28] PROBLEM - 
puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [04:46:29] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: puppet fail [04:46:29] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: puppet fail [04:46:29] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: puppet fail [04:46:29] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: puppet fail [04:46:38] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: puppet fail [04:46:38] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: puppet fail [04:46:47] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: puppet fail [04:46:48] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: puppet fail [04:46:49] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: puppet fail [04:46:57] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail [04:46:58] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: puppet fail [04:46:58] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: puppet fail [04:46:58] PROBLEM - puppet last run on capella is CRITICAL: CRITICAL: puppet fail [04:46:59] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: puppet fail [04:47:18] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: puppet fail [04:47:19] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: puppet fail [04:47:19] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: puppet fail [04:47:19] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: puppet fail [04:47:27] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: puppet fail [04:47:27] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: puppet fail [04:47:27] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: puppet fail [04:47:28] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: puppet fail [04:47:28] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: puppet fail [04:47:37] PROBLEM - puppet last run 
on db1034 is CRITICAL: CRITICAL: puppet fail [04:47:38] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: puppet fail [04:47:38] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: puppet fail [04:47:38] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: puppet fail [04:47:48] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: puppet fail [04:47:57] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: puppet fail [04:47:57] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: puppet fail [04:47:58] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: puppet fail [04:47:59] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: puppet fail [04:48:07] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: puppet fail [04:48:07] PROBLEM - puppet last run on elastic1022 is CRITICAL: CRITICAL: puppet fail [04:48:07] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: puppet fail [04:48:08] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: puppet fail [04:48:17] PROBLEM - puppet last run on labcontrol2001 is CRITICAL: CRITICAL: puppet fail [04:48:18] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: puppet fail [04:48:27] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: puppet fail [04:48:28] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [04:48:28] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: puppet fail [04:48:37] PROBLEM - puppet last run on amssq48 is CRITICAL: CRITICAL: puppet fail [04:48:37] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail [04:48:38] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: puppet fail [04:48:38] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: puppet fail [04:48:38] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [04:48:48] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: puppet fail [04:48:57] PROBLEM - puppet last 
run on db1051 is CRITICAL: CRITICAL: puppet fail [04:48:57] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: puppet fail [04:48:58] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: puppet fail [04:48:58] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [04:48:58] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: puppet fail [04:48:58] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: puppet fail [04:48:58] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: puppet fail [04:48:59] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: puppet fail [04:48:59] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [04:49:07] PROBLEM - puppet last run on virt1003 is CRITICAL: CRITICAL: puppet fail [04:49:07] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: puppet fail [04:49:08] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: puppet fail [04:49:08] PROBLEM - puppet last run on amssq60 is CRITICAL: CRITICAL: puppet fail [04:49:08] PROBLEM - puppet last run on wtp1012 is CRITICAL: CRITICAL: puppet fail [04:49:17] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: puppet fail [04:49:17] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: puppet fail [04:49:18] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: puppet fail [04:49:18] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [04:49:18] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [04:49:18] PROBLEM - puppet last run on nembus is CRITICAL: CRITICAL: puppet fail [04:49:18] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: puppet fail [04:49:19] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: puppet fail [04:49:19] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: puppet fail [04:49:27] PROBLEM - puppet last run on polonium is CRITICAL: CRITICAL: puppet fail [04:49:28] PROBLEM - puppet last run on 
db1003 is CRITICAL: CRITICAL: puppet fail
[04:49:28] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: puppet fail
[… icinga-wm repeated the identical "puppet last run is CRITICAL: CRITICAL: puppet fail" alert for well over a hundred further hosts between 04:49:28 and 04:54:07 …]
[04:51:40] should come back...
[04:52:29] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[04:53:20] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Feb 12 04:52:16 UTC 2015 (duration 52m 15s)
[04:53:26] Logged the message, Master
[04:54:29] !log broke puppet db grant.
fixed puppet db grant
[04:54:33] Logged the message, Master
[04:58:08] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[… matching icinga-wm RECOVERY notices followed for the remaining hosts between 04:58:08 and 05:09:48 as Puppet runs succeeded again …]
[05:04:59] (03CR) 10Cenarium: "No it would not since $wgFlaggedRevsRestrictionLevels = array( '', 'autoconfirmed', 'review' );. It checks for autoconfirmed and not autor" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 (owner: 10Cenarium)
[05:09:48] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [05:09:48] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [05:09:48] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [05:09:48] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [05:09:49] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [05:09:49] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [05:09:49] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on analytics1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [05:09:58] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [05:09:59] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [05:09:59] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [05:10:00] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is 
currently enabled, last run 59 seconds ago with 0 failures [05:10:00] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [05:10:01] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [05:10:01] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [05:10:07] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [05:10:08] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [05:10:08] RECOVERY - puppet last run on plutonium is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:10:08] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:10:08] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [05:10:08] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [05:10:08] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:10:09] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [05:10:09] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [05:10:21] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [05:10:27] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [05:10:27] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[05:10:28] RECOVERY - puppet last run on virt1010 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [05:10:28] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [05:10:28] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [05:10:37] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:10:38] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [05:10:38] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:10:38] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [05:10:39] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [05:10:47] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [05:10:47] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [05:10:48] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [05:10:48] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [05:10:48] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:10:48] RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [05:10:49] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:10:57] RECOVERY - puppet last run on mw1030 is OK: OK: 
Puppet is currently enabled, last run 32 seconds ago with 0 failures [05:10:58] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:10:58] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [05:10:58] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [05:11:08] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [05:11:17] RECOVERY - puppet last run on titanium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [05:11:18] RECOVERY - puppet last run on elastic1014 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [05:11:18] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [05:11:18] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [05:11:18] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [05:11:18] RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [05:11:18] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:11:19] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [05:11:19] RECOVERY - puppet last run on es1002 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [05:11:20] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [05:11:27] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures 
[05:11:27] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [05:11:28] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [05:11:28] RECOVERY - puppet last run on elastic1002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:11:28] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [05:11:28] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [05:11:28] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [05:11:29] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [05:11:38] RECOVERY - puppet last run on es2007 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [05:11:38] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:11:47] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [05:11:48] RECOVERY - puppet last run on elastic1005 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [05:11:48] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:11:48] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on protactinium 
is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [05:11:59] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [05:12:00] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [05:12:00] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:01] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [05:12:01] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [05:12:08] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [05:12:09] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:18] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [05:12:18] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [05:12:18] RECOVERY - puppet last run on analytics1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:18] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [05:12:18] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 58 seconds ago 
with 0 failures [05:12:18] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [05:12:18] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [05:12:27] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [05:12:28] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [05:12:28] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:37] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:37] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [05:12:38] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:38] RECOVERY - puppet last run on haedus is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [05:12:38] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [05:12:38] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [05:12:47] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [05:12:48] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:12:48] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [05:12:58] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [05:13:28] RECOVERY - puppet last run on cp3012 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [05:39:03] (03PS1) 10GWicke: Remove preheat_kernel_page_cache option [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/190156 [06:18:49] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 1 failures [06:27:49] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:47] PROBLEM - puppet last run on db2018 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:58] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:08] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:48] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:58] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:59] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 2 failures [06:30:38] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:48] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [06:37:57] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:45:48] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:45:58] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:45:58] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:46:58] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [06:46:59] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [06:48:09] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is 
currently enabled, last run 31 seconds ago with 0 failures [06:49:58] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [07:16:35] (03PS3) 10Giuseppe Lavagetto: mediawiki: rewrite /w/wiki.phtml on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/189697 [07:23:06] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: rewrite /w/wiki.phtml on HHVM [puppet] - 10https://gerrit.wikimedia.org/r/189697 (owner: 10Giuseppe Lavagetto) [07:33:08] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [07:33:38] <_joe__> I know [07:33:48] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [07:34:18] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [07:34:49] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [07:41:17] 3operations: Scribunto_LuaInterpreterNotFoundError in production - https://phabricator.wikimedia.org/T88942#1033606 (10Joe) 5Open>3Resolved [08:10:53] (03CR) 10Cenarium: "I've checked FlaggedRevs code and the permission hook checks for the restriction on top of the 'autoreview' userright, and not as I though" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 (owner: 10Cenarium) [08:23:21] good morning [08:23:39] I am going to make the puppet-lint Jenkins job voting and thus vote -1 whenever it detects an error [08:23:43] (not warnings though) [08:27:41] 3operations: Make Puppet repository pass lenient and strict lint checks - https://phabricator.wikimedia.org/T87132#1033657 (10hashar) The Jenkins job (puppet-lint lenient) now complains whenever a puppet-lint error is detected. Huge thanks to everyone that made this possible. Still have to fix up or ignore the... 
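The icinga-wm recovery flood above is easier to scan once reduced to a plain host list. A minimal sketch, assuming the channel log has been saved to a text file; the `/tmp/ops.log` sample below is synthetic, not the real log:

```shell
# Build a small synthetic sample of the channel log; a real file would be
# wherever your IRC client or logbot writes it.
cat > /tmp/ops.log <<'EOF'
[05:08:28] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[05:08:28] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:18:49] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Puppet has 1 failures
EOF

# Hosts that recovered, one per line, de-duplicated:
sed -n 's/.*RECOVERY - puppet last run on \([^ ]*\) is OK.*/\1/p' /tmp/ops.log | sort -u
```

An analogous pattern over the PROBLEM lines lists the hosts that alerted in the first place.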
[08:28:02] !log puppet-lint now complains on error (not warnings) \O/ {{bug:T87132}} [08:28:09] Logged the message, Master [08:29:59] hashar: nice! [09:25:57] (03CR) 10Alexandros Kosiaris: [C: 032] icinga: use DNS, not IP in wikidata check [puppet] - 10https://gerrit.wikimedia.org/r/189475 (owner: 10Alexandros Kosiaris) [09:26:20] (03CR) 10Alexandros Kosiaris: [C: 032] Make corp LDAP mirror alerts paging [puppet] - 10https://gerrit.wikimedia.org/r/188528 (owner: 10Alexandros Kosiaris) [09:26:51] (03PS2) 10Alexandros Kosiaris: Remove unused dictd package declarations in cxserver [puppet] - 10https://gerrit.wikimedia.org/r/187185 [09:26:56] ^demon|lunch: sure, why not :) https://phabricator.wikimedia.org/T37611#1031305 is what prompted me to put it out there [09:30:51] (03CR) 10Alexandros Kosiaris: [C: 032] Remove unused dictd package declarations in cxserver [puppet] - 10https://gerrit.wikimedia.org/r/187185 (owner: 10Alexandros Kosiaris) [09:32:28] (03CR) 10Alexandros Kosiaris: "ping?" [puppet] - 10https://gerrit.wikimedia.org/r/187006 (owner: 10Alexandros Kosiaris) [09:35:07] 3Wikimedia-General-or-Unknown, operations: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#1033712 (10Jalexander) I'd really like to get this moving if possible, I'm starting to get more and more complaints both at the privacy mailing address (the big... [09:53:58] 3operations: Monitor the up-to-date status of wikitech-static - https://phabricator.wikimedia.org/T89323#1033718 (10fgiunchedi) how does wikitech-static generation work? is there a job on wikitech that generates a static copy to be pushed to an external host or it gets pulled from the external host? 
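The puppet-lint gating announced above (errors fatal, warnings tolerated) can be sketched as follows. This is illustrative only: the sample lint output is made up, and the real Jenkins job drives puppet-lint itself rather than grepping its text:

```shell
# Hypothetical puppet-lint output; ERROR lines should block, WARNING lines
# should not (matching the "complains on error, not warnings" policy).
lint_output='manifests/site.pp - WARNING: line has more than 80 characters on line 12
manifests/site.pp - ERROR: trailing whitespace found on line 30'

errors=$(printf '%s\n' "$lint_output" | grep -c 'ERROR:')
if [ "$errors" -gt 0 ]; then
  echo "vote -1"    # at least one lint error: block the change
else
  echo "vote +1"    # warnings alone do not block
fi
```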
[09:56:26] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Remove preheat_kernel_page_cache option [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/190156 (owner: 10GWicke) [09:57:58] * godog shakes fists at submodules [09:58:30] hello there [09:58:35] hey hashar [09:58:51] could use some sysadmin help. gallium has a CPU with 100% Wait time and I can't find the process causing it [09:58:56] https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=gallium.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+eqiad [09:59:09] top shows 12% wa [09:59:36] hashar: try iotop [09:59:54] ahhh [10:00:02] find / -ignore_readdir_race ( -fstype NFS -o -fstype nfs~dia$\)\|\(^/var/lib/schroot/mount$\) ) .... [10:00:19] because of /etc/cron.daily/locate [10:00:25] you are such a hacker godog :] [10:00:49] haha luckily iotop is available nowadays [10:01:07] now I am wondering why locate is enabled on gallium :D [10:02:00] !log es-tool fast-restart on elastic1005 [10:02:08] Logged the message, Master [10:06:21] (03PS1) 10Giuseppe Lavagetto: nutcracker: revert to using a template for nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/190176 [10:06:51] history.log.1.gz:Commandline: apt-get install locate [10:06:51] history.log.1.gz:Install: locate:amd64 (4.4.2-4ubuntu1) [10:06:52] \O/ [10:07:12] Start-Date: 2015-01-30 00:31:39 [10:07:12] Commandline: apt-get install locate [10:07:13] damn [10:09:40] !log gallium: uninstalling locate package from gallium. 
Has been installed on 2015-01-30 00:31:39 apparently manually by root@iron.wikimedia.org [10:09:46] Logged the message, Master [10:12:40] !log gallium and lanthanum: dpkg --purge locate [10:12:43] Logged the message, Master [10:12:53] (03PS1) 10Matanya: swift: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190177 [10:16:37] (03PS1) 10Matanya: hadoop: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190178 [10:18:59] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: puppet fail [10:20:49] (03PS1) 10Matanya: nova: lint selector [puppet] - 10https://gerrit.wikimedia.org/r/190179 [10:25:52] (03PS1) 10Alexandros Kosiaris: Include admin in role::ve [puppet] - 10https://gerrit.wikimedia.org/r/190180 [10:27:00] (03CR) 10Alexandros Kosiaris: [C: 032] Include admin in role::ve [puppet] - 10https://gerrit.wikimedia.org/r/190180 (owner: 10Alexandros Kosiaris) [10:37:21] 3Ops-Access-Requests: Sudo for Roan on osmium - https://phabricator.wikimedia.org/T89038#1033737 (10akosiaris) I have to go ahead with https://gerrit.wikimedia.org/r/#/c/190180/ since catalogcompiler told me DZahn's change was a noop. 
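hashar's detective work above went in two steps: `iotop -obn1` (one non-interactive batch pass showing only processes doing I/O) fingered the `updatedb` run from `/etc/cron.daily/locate`, and the apt history logs showed when the package had been installed. A sketch of the second step, using a synthetic log file; on a real host the files live under `/var/log/apt/history.log*` (often gzipped, hence the `zgrep` seen in the channel):

```shell
# Synthetic stand-in for /var/log/apt/history.log, mirroring the entry
# quoted in the channel above.
cat > /tmp/apt-history.log <<'EOF'
Start-Date: 2015-01-30 00:31:39
Commandline: apt-get install locate
Install: locate:amd64 (4.4.2-4ubuntu1)
End-Date: 2015-01-30 00:31:41
EOF

# When was the package installed, and by what command? The two preceding
# context lines of the Install record carry the date and the command line.
grep -B2 '^Install: locate' /tmp/apt-history.log
# 'dpkg --purge locate' then removes the package and its cron.daily job together.
```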
[10:38:19] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:39:13] (03PS2) 10Giuseppe Lavagetto: nutcracker: revert to using a template for nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/190176 [10:40:36] (03PS5) 10Alexandros Kosiaris: add chromium-admins to visual editor role [puppet] - 10https://gerrit.wikimedia.org/r/189611 (https://phabricator.wikimedia.org/T89038) (owner: 10Dzahn) [10:40:57] (03CR) 10Giuseppe Lavagetto: [C: 032] nutcracker: revert to using a template for nutcracker config [puppet] - 10https://gerrit.wikimedia.org/r/190176 (owner: 10Giuseppe Lavagetto) [10:50:30] (03PS1) 10Filippo Giunchedi: restbase: switch to new partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) [10:50:56] akosiaris: ^ [10:52:56] (03PS2) 10Filippo Giunchedi: swift: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190177 (owner: 10Matanya) [10:53:03] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190177 (owner: 10Matanya) [10:53:40] matanya: looks good, thanks! 
[10:53:47] sure :) [10:55:24] (03CR) 10Alexandros Kosiaris: [C: 032] restbase: switch to new partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [11:00:28] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [11:01:28] (03PS2) 10Filippo Giunchedi: restbase: switch to new partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) [11:01:34] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: switch to new partitioning scheme [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [11:02:24] (03PS6) 10Alexandros Kosiaris: add chromium-admins to visual editor role [puppet] - 10https://gerrit.wikimedia.org/r/189611 (https://phabricator.wikimedia.org/T89038) (owner: 10Dzahn) [11:02:41] (03CR) 10Alexandros Kosiaris: [C: 04-1] "A couple of problems with this one." 
[puppet] - 10https://gerrit.wikimedia.org/r/189611 (https://phabricator.wikimedia.org/T89038) (owner: 10Dzahn) [11:03:50] (03CR) 10Alexandros Kosiaris: [C: 032] add chromium-admins to visual editor role [puppet] - 10https://gerrit.wikimedia.org/r/189611 (https://phabricator.wikimedia.org/T89038) (owner: 10Dzahn) [11:05:29] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [11:06:18] 3Ops-Access-Requests: Sudo for Roan on osmium - https://phabricator.wikimedia.org/T89038#1033820 (10akosiaris) 5Open>3Resolved a:3akosiaris All changes merged, change shepherded in production, resolving [11:09:08] (03PS2) 10Gilles: Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) [11:11:48] (03PS1) 10Giuseppe Lavagetto: mediawiki: add memcached on mc1017 to the pool as mc_shard_17 [puppet] - 10https://gerrit.wikimedia.org/r/190183 [11:16:31] <_joe_> Ok I'm going to merge this, if something fails I'll revert. but it shouldn't. [11:17:15] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: add memcached on mc1017 to the pool as mc_shard_17 [puppet] - 10https://gerrit.wikimedia.org/r/190183 (owner: 10Giuseppe Lavagetto) [11:20:02] <_joe_> ach shit. 
[11:21:07] PROBLEM - nutcracker process on mw1201 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 108 (nutcracker), command name nutcracker [11:22:01] (03PS1) 10Giuseppe Lavagetto: Revert "mediawiki: add memcached on mc1017 to the pool as mc_shard_17" [puppet] - 10https://gerrit.wikimedia.org/r/190184 [11:22:08] RECOVERY - nutcracker process on mw1201 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [11:22:17] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Revert "mediawiki: add memcached on mc1017 to the pool as mc_shard_17" [puppet] - 10https://gerrit.wikimedia.org/r/190184 (owner: 10Giuseppe Lavagetto) [11:28:43] !log es-tool restart-fast on elastic1006 [11:28:53] Logged the message, Master [11:41:23] (03PS1) 10Merlijn van Deen: 'learn more': exclude from tab order and open in new window [puppet] - 10https://gerrit.wikimedia.org/r/190186 (https://phabricator.wikimedia.org/T89339) [11:42:29] (03PS2) 10Merlijn van Deen: 'learn more': exclude from tab order and open in new window [puppet] - 10https://gerrit.wikimedia.org/r/190186 (https://phabricator.wikimedia.org/T89339) [11:47:18] 3WMF-Legal, operations, Engineering-Community: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1033905 (10Qgil) @MBrar.WMF Legalpad doesn't offer any customization. The main use case is signing a Contributors License Agreement: a piece of text with one checkbox. We ca... 
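The brief "nutcracker process" flap above comes from an Icinga process-count check (the Nagios `check_procs` plugin counts processes by UID and command name, and 0 matches is critical). A simplified stand-in for what that check asserts; the UID 108 and the command name come from the alert text, everything else is illustrative:

```shell
# Count processes whose command name is 'nutcracker' and whose UID is 108.
# ps -C filters by command name (procps/Linux); awk keeps only UID 108.
count=$(ps -C nutcracker -o uid= 2>/dev/null | awk '$1 == 108' | wc -l)

# check_procs-style verdict: exactly one such process is healthy.
if [ "$count" -eq 1 ]; then
  echo "PROCS OK: 1 process with UID = 108 (nutcracker)"
else
  echo "PROCS CRITICAL: $count processes with UID = 108 (nutcracker)"
fi
```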
[11:51:23] (03PS1) 10Giuseppe Lavagetto: mediawiki: add mc1017 to the nutcracker pool [puppet] - 10https://gerrit.wikimedia.org/r/190189 [11:51:48] (03CR) 10Alexandros Kosiaris: [C: 032] nova: lint selector [puppet] - 10https://gerrit.wikimedia.org/r/190179 (owner: 10Matanya) [11:55:08] 3operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#915542 (10Joe) [11:55:08] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet has 1 failures [11:56:30] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Blocked until T89345 is resolved." [puppet] - 10https://gerrit.wikimedia.org/r/190189 (owner: 10Giuseppe Lavagetto) [12:03:01] (03PS1) 10Filippo Giunchedi: restbase: adjust partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/190190 (https://phabricator.wikimedia.org/T76986) [12:03:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: adjust partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/190190 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [12:05:24] 3operations, Phabricator: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1033959 (10JanZerebecki) These items from the description of this Task seem to not be mentioned on the wiki page, neither is their confidential counterpart: Mod... [12:06:04] (03PS1) 10Matanya: toollabs: selector out of resource [puppet] - 10https://gerrit.wikimedia.org/r/190192 [12:07:51] (03PS1) 10Matanya: wikimetrics: 4 digit file mode [puppet] - 10https://gerrit.wikimedia.org/r/190193 [12:08:25] anyone else to look at: https://phabricator.wikimedia.org/T89328 ? 
[12:08:30] akosiaris: ^^ [12:13:28] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:14:18] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [12:14:54] kart_: yeah, I am aware, gtg to lunch now, I'll help after that [12:20:10] akosiaris: cool. [12:20:24] akosiaris: ping me when back :) [12:23:13] (03PS1) 10Matanya: logging: lint role [puppet] - 10https://gerrit.wikimedia.org/r/190195 [12:26:29] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [12:32:39] (03CR) 10Aklapper: [C: 031] 'learn more': exclude from tab order and open in new window [puppet] - 10https://gerrit.wikimedia.org/r/190186 (https://phabricator.wikimedia.org/T89339) (owner: 10Merlijn van Deen) [12:33:29] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [12:34:17] (03PS3) 10Giuseppe Lavagetto: base: add the service_unit init wrapper [puppet] - 10https://gerrit.wikimedia.org/r/189753 [12:35:41] <_joe_> paravoid: would you take a look now? I think it's more usable now, but comments welcome. (And in this case writing tests was useful!!)
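The "HTTP 5xx req/min" alerts above fire when too large a fraction of recent graphite datapoints sit above a critical level (500 req/min for the critical state, 250 for recovery). A sketch of that percentage computation with made-up datapoints; the real check queries graphite rather than a shell variable:

```shell
# Fourteen synthetic per-minute 5xx rates; two of them exceed the
# critical threshold of 500.
points="120 130 510 140 125 135 620 110 115 105 100 95 90 85"

# Percentage of datapoints above the threshold, formatted like the alert text.
echo "$points" | tr ' ' '\n' | awk -v crit=500 '
  { n++; if ($1 > crit) above++ }
  END { printf "%.2f%% of data above the critical threshold [%d]\n",
               100 * above / n, crit }'
# prints: 14.29% of data above the critical threshold [500]
```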
[12:38:09] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:05:59] (03PS1) 10Filippo Giunchedi: restbase: fix partman swap and ordering [puppet] - 10https://gerrit.wikimedia.org/r/190196 [13:06:12] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] restbase: fix partman swap and ordering [puppet] - 10https://gerrit.wikimedia.org/r/190196 (owner: 10Filippo Giunchedi) [13:11:28] PROBLEM - DPKG on labmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:15:47] RECOVERY - DPKG on labmon1001 is OK: All packages OK [13:24:24] 3operations, ops-eqiad: relocate/wire/setup dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86957#1034056 (10faidon) p:5Triage>3Normal Yes, please do this. [13:32:03] 3operations: NetEase/YouDao company seeks guidance for setting up local mirror of wikipedia - https://phabricator.wikimedia.org/T89137#1034078 (10Amire80) [13:35:48] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [13:39:44] 3Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1034085 (10zeljkofilipin) labs UID: 2580 preferred shell user name: zfilipin ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0EYlgOW63HKlQz7OduxiH1NM4NPxgjxci8qznJdlo75f54mFT308wFlI7DYL/JFkv6yoELEu/xzazRybWu7... 
[13:49:48] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [13:54:04] (03PS1) 10Zfilipin: Add account for Željko Filipin [puppet] - 10https://gerrit.wikimedia.org/r/190202 (https://phabricator.wikimedia.org/T87597) [13:54:08] \O/ [13:54:30] (03PS1) 10Se4598: set wgTranslationNotificationsAlwaysHttpsInEmail to true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190203 [13:54:36] (03PS2) 10Hashar: Add account for Željko Filipin [puppet] - 10https://gerrit.wikimedia.org/r/190202 (https://phabricator.wikimedia.org/T87597) (owner: 10Zfilipin) [13:56:27] (03CR) 10Se4598: "example: https://de.wikipedia.org/w/index.php?oldid=137069753&diff=prev" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190203 (owner: 10Se4598) [13:58:06] (03CR) 10Hashar: [C: 031] "I have paired with Zeljkof. Can confirm the key it is also pasted on T87597." [puppet] - 10https://gerrit.wikimedia.org/r/190202 (https://phabricator.wikimedia.org/T87597) (owner: 10Zfilipin) [13:58:39] 3operations, Wikidata, Datasets-General-or-Unknown: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1034116 (10ArielGlenn) I ran a series of tests locally and also checked production output. I can verify that the transform is actually applied, the output looks good to... [14:01:38] (03CR) 10Alexandros Kosiaris: [C: 032] Add account for Željko Filipin [puppet] - 10https://gerrit.wikimedia.org/r/190202 (https://phabricator.wikimedia.org/T87597) (owner: 10Zfilipin) [14:02:58] 3Ops-Access-Requests, Continuous-Integration: Make sure relevant RelEng people have access to gallium (Chris M, Dan, Mukunda, Zeljko) - https://phabricator.wikimedia.org/T85936#1034144 (10akosiaris) [14:02:59] 3Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1034142 (10akosiaris) 5Open>3Resolved Changed merged. 
Welcome to the production cluster :-) [14:03:46] 3Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1034157 (10zeljkofilipin) Thanks! :) [14:05:32] thanks Dr. Kosiaris ! [14:05:36] well treated patient [14:07:03] Dr. Musso, I must say you have been especially valuable in this case. Your pairing up had marvelous results :-) [14:07:31] and Miss Puppet nurse properly applied the protocol [14:08:40] lets see how ssh client behave now [14:09:44] You guys are so cute [14:09:49] yeah [14:09:57] we are professional you know [14:10:00] and on duty [14:10:08] unlike people being on vacations on some beach :D [14:10:27] I'm not on any beach [14:10:36] Am in a hippie town [14:12:35] akosiaris: would you mind ssh bast1001.wikimedia.org sudo puppet agent -tv ? [14:12:42] akosiaris: need the account to be created there :] [14:14:34] Yuvi|Vacation: which town? [14:14:45] (03CR) 10Alexandros Kosiaris: [C: 032] apache: remove wikiversity.org from portal alias [puppet] - 10https://gerrit.wikimedia.org/r/189195 (https://phabricator.wikimedia.org/T88776) (owner: 10John F. Lewis) [14:15:24] !log Manually logged a missing cross-wiki rights log change entry on meta "Avraham changed group membership for User:Bencmq@zhwiki from bureaucrat, check user and administrator to bureaucrat and administrator (requested)". See T89205 for details [14:15:30] Logged the message, Master [14:21:17] hashar: hmmm something is not right [14:21:56] :( [14:22:10] maybe he needs to be added to some other group to have the account created on bastion [14:26:06] <^demon|lunch> godog: Reason I asked is because it doesn't follow our foo/bar/baz naming hierarchy :p [14:26:30] ^demon|lunch: lunch ? Are you in the middle of the atlantic ? :D [14:26:36] good morning [14:26:39] <^demon|lunch> I never changed my nick yesterday :p [14:27:03] why isn't ITIL CMDB reporting such inconsistency [14:27:09] ^d: haha what would be a good place in there? 
[14:27:33] <^d> I dunno, possibly nowhere :p [14:28:13] <^d> We've overloaded the word "tools" these days, which makes it hard to know what anyone is talking about :p [14:28:19] hashar: ITIL CMDB is suffering from Bogon emissions right now [14:28:41] * hashar calls toll free service desk to get an incident ticket filed and elevate its priority [14:28:47] (or was it severity ) [14:28:49] or maybe impact [14:28:53] hmm [14:28:54] bad ether in the cables [14:28:56] rm ~/itil [14:29:15] incident ticket misfiled. Happy to be of service, please feel free to call again [14:30:20] hey kart_, yt? [14:30:52] (03PS2) 10Hashar: Update objectcache logging settings for I8a8e278e6f028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187730 (https://phabricator.wikimedia.org/T89345) (owner: 10Legoktm) [14:32:39] ottomata: ? [14:32:39] ^d: that's why it is called -utils :P [14:33:22] kart_: hiya [14:33:24] mind if I poke at this? [14:33:26] https://phabricator.wikimedia.org/T89328 [14:33:27] ? [14:33:41] can I do a deploy, or would that hurt something? [14:33:52] i'll only deploy the currently checked out commit on tin [14:33:56] ottomata: yes please do, I am supposed to help but I am in the middle of something else [14:34:27] <^d> godog: Touché. I was reading it as wmf-tools. [14:34:32] <^d> I should have my coffee before thinking [14:35:26] kart_: so far, on tin, i moved cxserver/deploy away to deploy.orig, and then ran puppet [14:35:41] puppet re-cloned cxserver/deploy there [14:35:42] and now [14:35:47] checkout-submodules = true [14:36:09] i wonder, did you set checkout_submodules to true in puppet after tin had already cloned it for the first time? [14:36:11] (03PS4) 10Giuseppe Lavagetto: base: add the service_unit init wrapper [puppet] - 10https://gerrit.wikimedia.org/r/189753 [14:36:29] maybe git-deploy does something dumb where it doesn't try to set that value except on first clone [14:36:36] ottomata: look like that case.
[14:36:49] kart_: try deploying, let's see what it does [14:36:57] ottomata: it is worse than that [14:37:10] there used to be 2 repos [14:37:22] and now they are merged into one [14:37:45] and while at SF I had to fight a lot with git deploy to make it correctly checkout the submodules [14:37:45] (03PS2) 10Ottomata: hadoop: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190178 (owner: 10Matanya) [14:38:06] bwerrrrr [14:38:19] <_joe_> hey ottomata [14:38:22] hiya [14:38:25] <_joe_> how was your vacation? [14:38:29] oh man so gooooood [14:38:35] did you win? [14:38:35] akosiaris: you said earlier something was not right. Was it referring to zfilipin account creation on bast1001 ? [14:38:38] <_joe_> ok, enough :P [14:38:44] hahahah [14:38:59] ha, no, but we did ok! [14:39:00] akosiaris: should I try deploy now? Or you're my doctor? :) [14:39:09] and swam in the waves between games :) [14:39:28] kart_: ha, uh, i'm not sure what akosiaris is saying, but sure! i'd try it. [14:39:29] unless [14:39:32] do you mind if I try it? [14:39:33] i want to watch it [14:39:39] <_joe_> ottomata: I don't remember, was it costa rica or puerto rico? [14:39:44] puerto rico [14:39:44] hashar: yes [14:39:51] anything I can help with ? [14:39:58] <_joe_> ottomata: did you go to arecibo then? [14:40:05] it was so easy to travel there (as an american). tourism was easy too, nobody pushing anything [14:40:06] kart_: ottomata: yes please do try to deploy [14:40:10] naw [14:40:23] <_joe_> (http://en.wikipedia.org/wiki/Arecibo_Observatory) [14:40:33] flew into san juan, biked to fajardo, ferry to culebra, little puddle jumper plane to vieques, back to san juan for tournament [14:40:45] yeah, a guy next to me on the plane was telling me about that [14:40:48] i would loooove to go see that [14:40:49] ottomata: isn't it good to come back and see a stupid patch from me ? :P [14:40:51] next time! [14:40:54] matanya: yes! 
[14:40:59] :) [14:41:10] jouncebot: next [14:41:10] In 1 hour(s) and 18 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150212T1600) [14:41:24] (03CR) 10Ottomata: [C: 032] hadoop: minor lint [puppet] - 10https://gerrit.wikimedia.org/r/190178 (owner: 10Matanya) [14:41:47] ottomata: go ahead [14:42:33] (03PS1) 10Alexandros Kosiaris: Followup commit for I051a7a7070136b79 [puppet] - 10https://gerrit.wikimedia.org/r/190210 (https://phabricator.wikimedia.org/T87597) [14:44:11] (03CR) 10Alexandros Kosiaris: [C: 032] Followup commit for I051a7a7070136b79 [puppet] - 10https://gerrit.wikimedia.org/r/190210 (https://phabricator.wikimedia.org/T87597) (owner: 10Alexandros Kosiaris) [14:44:41] ottomata: I merge the minor lint as well on palladium [14:44:44] merged* [14:44:55] danke, i was about to merge another, but then was looking at this deploy instead [14:44:56] thanks [14:45:00] ottomata: just check src missing in deploy? :) [14:45:27] ok, kart_, i think i got it. [14:45:31] akosiaris: kart_: [14:45:38] i made puppet reclone on tin [14:45:44] then, i manually ran git submodule update --init [14:45:45] on tin [14:45:53] because, http://tin.eqiad.wmnet/cxserver/deploy/.git/modules/src [14:45:55] didn't exist [14:45:58] until I ran that [14:46:09] after I did, git deploy worked fine [14:46:19] ottomata: I fear that. [14:46:34] that's what gwicke was suggesting but I was fearing :) [14:46:38] so, bad news: git deploy with submodules is funky. [14:46:46] ottomata: yep. [14:46:48] good news, I think you can fix without being root [14:46:54] mv deploy deploy.bad [14:46:57] (wait for puppet to run) [14:47:00] cd deploy [14:47:04] git submodule update --init [14:47:08] git deploy start; git deploy sync [14:47:13] hmmm [14:47:21] noted. [14:47:23] I remember one extra step [14:47:24] ah yes [14:47:37] akosiaris: thank you!
[14:47:53] ah yes [14:48:09] kart_: i'll comment and close ticket [14:48:09] ottomata: IIRC if you try to clone the submodule now from the repo [14:48:14] you will fail [14:48:14] ottomata: can you please restart cxserver once done or let me know. [14:48:28] clone submodule from the repo? [14:48:37] you need to also run git update-server-info [14:48:46] akosiaris: git deploy sync says all is well, and i see the submodule checked out on the client [14:48:47] it's what git deploy does [14:48:54] on tin? [14:48:59] git deploy is not to be trusted tbh [14:49:00] (03PS2) 10Giuseppe Lavagetto: mediawiki: add mc1017 to the nutcracker pool [puppet] - 10https://gerrit.wikimedia.org/r/190189 [14:49:08] anyway, back up a step [14:49:09] so [14:49:12] k [14:49:24] I'll try and explain what git deploy does [14:49:33] tell me if I am not making any sense [14:49:39] akosiaris: i knew this once...since i helped implement the submodule part [14:49:49] oh, you did ? [14:49:54] ok [14:49:56] yeah, but i barely remember it [14:50:12] so, we probably have to patch git deploy in various parts [14:50:16] if we are to keep it [14:50:22] cause it is misbehaving in various ways [14:50:28] it always has, afaik [14:50:29] ottomata: I did. Thanks. [14:50:52] kart_: thanks [14:50:55] hashar: don't mention it [14:51:02] ottomata: thanks btw for looking into this [14:51:03] ottomata: git deploy is: https://github.com/mislav/git-deploy [14:51:04] ? [14:51:38] Should we update then? git deploy log - will be useful and not available in our git deploy. [14:51:52] ottomata: what I was referring to is that on hosts .gitmodules is rewritten by git-deploy as [14:51:56] - url = https://gerrit.wikimedia.org/r/p/mediawiki/services/cxserver [14:51:57] + url = http://tin.eqiad.wmnet/cxserver/deploy/.git/modules/src [14:52:16] noticed that too.
[14:52:41] but unless git update-server-info is run on tin in the (checked-out) submodule, that submodule is not cloneable [14:52:56] so git deploy fails to clone it on the various hosts :-( [14:52:59] PROBLEM - cxserver on sca1001 is CRITICAL: Connection refused [14:53:07] heh.. cx is not happy [14:53:15] hm, aye [14:53:19] blaj [14:53:36] Error: EACCES, permission denied 'log' [14:54:30] configuration error AFAICT [14:55:05] ok, ran puppet, it's ok now [14:55:09] RECOVERY - cxserver on sca1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.018 second response time [14:55:33] I am wondering though why the config file vanished [14:56:37] akosiaris: where was it? [14:56:49] /srv/deployment/cxserver/deploy/src [14:56:54] akosiaris: can you document git update-server-info to ticket? [14:57:02] akosiaris: that's puppet, right? [14:57:11] kart_: I'd rather we fixed it instead of documenting it [14:57:17] I consider it a bug [14:57:28] akosiaris: better. [14:57:46] need to gather some good info and open a bug task about it [14:57:50] I have proposed 2 deployments in a week for cxserver [14:58:02] probably to expose more bugs :D [14:58:12] heh [14:58:38] time to make bug mash [14:59:10] so, akosiaris, i think git update-server-info will be run, if the submodules stuff is all straight when puppet first clones it [14:59:32] if a submodule is added to a repo after the initial clone [14:59:33] i think it will not be [14:59:46] ottomata: could be. Still a bug though [15:00:04] well, "unexpected behaviour" [15:00:22] for sure [15:00:57] anyway, mind updating/resolving https://phabricator.wikimedia.org/T89328 and let's gather some actual info into another ticket [15:01:10] HHmmmm, actually. i think it's buggy either way. [15:01:24] because, even after my reclone [15:01:28] the submodule was not cloned [15:01:34] until i manually ran git submodule update [15:01:36] --init [15:02:23] and, if it isn't cloned, then update-server-info won't run [15:04:01] quite possibly.
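The recovery akosiaris and ottomata describe above (re-clone, `git submodule update --init`, then `git update-server-info` so the submodule becomes cloneable over dumb HTTP) can be reproduced on a throwaway repository. This is a hedged sketch with made-up repo names, not the real cxserver/deploy layout on tin:

```shell
# Demo of the failure mode discussed above: a plain clone records the
# submodule but leaves its directory empty until 'git submodule update
# --init' runs, and 'git update-server-info' then makes the checked-out
# submodule servable over dumb HTTP (which git-deploy's rewritten
# .gitmodules URLs rely on). All names here are illustrative.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d) && cd "$tmp"

# Build a "sub" repo and a "main" repo that embeds it as a submodule.
git init -q sub
git -C sub commit -q --allow-empty -m 'submodule content'
git init -q main
git -C main -c protocol.file.allow=always submodule --quiet add "$tmp/sub" src
git -C main commit -q -m 'add src submodule'

# A plain clone leaves src/ empty, mirroring the state tin was in after
# puppet re-cloned the deploy repo.
git clone -q main deploy
test ! -e deploy/src/.git && echo 'src empty after plain clone'

# The manual fix from the discussion:
cd deploy
git -c protocol.file.allow=always submodule --quiet update --init
git -C src update-server-info   # writes info/refs under .git/modules/src
test -e src/.git && echo 'src checked out'
```

Once `info/refs` exists under the module's git directory, other hosts can fetch the submodule from the rewritten `.git/modules/src`-style URL, which is what the failing deploys needed.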
[15:04:16] We need to actually track that bug down or it's going to keep bugging us [15:06:27] ottomata: I will poke you before next deployment then ;) [15:06:54] Dinner time. Should be around after that in case git deploy decide to dive. [15:07:31] kart_: i think deployment should work now [15:07:35] its just the initial setup that is funky [15:07:50] just try it first, if it doesn't work go ahead and poke one of us [15:11:22] (03PS1) 10coren: Labs: Increase labnet1001 conntrack tables [puppet] - 10https://gerrit.wikimedia.org/r/190214 [15:12:10] (03PS2) 10coren: Labs: Increase labnet1001 conntrack tables [puppet] - 10https://gerrit.wikimedia.org/r/190214 (https://phabricator.wikimedia.org/T72076) [15:12:13] (03PS2) 10Ottomata: logging: lint role [puppet] - 10https://gerrit.wikimedia.org/r/190195 (owner: 10Matanya) [15:12:39] (03CR) 10BBlack: [C: 031] "On a human reading of it, this looks like it would cover everything I'm thinking of. Nice work :)" [puppet] - 10https://gerrit.wikimedia.org/r/189753 (owner: 10Giuseppe Lavagetto) [15:13:40] (03CR) 10Alexandros Kosiaris: [C: 04-1] cxserver: Use different registry for Beta and Production (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [15:13:47] kart_: ^ [15:14:21] Augh, labnet1001 doesn't even show in ganglia in the first place. Hm. Because it's also a ganglia_aggregator? [15:15:38] Hm. No. virt1000 also and that one works. 
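Background on Coren's conntrack patch above: the netfilter connection-tracking table has a fixed ceiling, and when a NAT gateway such as labnet1001 fills it, the kernel drops new connections. The usual remedy is a sysctl bump; the fragment below is purely illustrative, and the value is not the one in the actual change:

```
# Example /etc/sysctl.d/ fragment (illustrative value only): raise the
# conntrack ceiling so the kernel stops emitting
# "nf_conntrack: table full, dropping packet" under connection load.
net.netfilter.nf_conntrack_max = 262144
```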
[15:17:34] (03CR) 10Ottomata: [C: 032] logging: lint role [puppet] - 10https://gerrit.wikimedia.org/r/190195 (owner: 10Matanya) [15:18:33] 3Ops-Access-Requests: Create shell access for Zeljko - RelEng rights - https://phabricator.wikimedia.org/T87597#1034282 (10hashar) Confirmed with Zeljkof that everything works for him :-) [15:23:47] 3operations, Phabricator: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1034299 (10chasemp) Rearranged with some more details there. Not sure how deep to go on internals but everything there seems relevant to normal usage I think. [15:24:45] Coren: i hope i set it up right when i started that abandoned tools RfC on meta. I pinged you in it. [15:25:45] T13|needsCoffee: I commented on it. You probably want to advertise it on labs-l, on enwiki and in a couple other strategic places where tool writers congregate. [15:27:19] Ahh, I don't tie your personal account and WMF account together in my head. I'll post to those places today. :) [15:29:44] I know dewp has a lot of tool/bot people as well, can I convince someone here that is fluent in the language to post in the appropriate forums on dewp? [15:30:03] scfc is probably your best bet there. [15:33:09] 3operations: NetEase/YouDao company seeks guidance for setting up local mirror of wikipedia - https://phabricator.wikimedia.org/T89137#1034326 (10ArielGlenn) Adding content from an email received today from Brent at NetEase: "We have assigned people who is responsible for direct communicating with your leaders... [15:34:47] akosiaris: back. looking.
[15:40:53] (03PS18) 10KartikMistry: cxserver: Use different registry for Beta and Production [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) [15:41:58] kart_: btw, it is still giving that ordered_json error [15:42:05] trying to figure out why [15:42:22] 3Project-Creators, operations: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1034354 (10Aklapper) What is specifically left to do / discuss here to close this ticket? Can this be closed as resolved? Ops-FR is covered in T89160... [15:43:18] (03PS3) 10Rush: 'learn more': exclude from tab order and open in new window [puppet] - 10https://gerrit.wikimedia.org/r/190186 (https://phabricator.wikimedia.org/T89339) (owner: 10Merlijn van Deen) [15:44:18] (03CR) 10Rush: [C: 032] 'learn more': exclude from tab order and open in new window [puppet] - 10https://gerrit.wikimedia.org/r/190186 (https://phabricator.wikimedia.org/T89339) (owner: 10Merlijn van Deen) [15:46:16] (03PS1) 10KartikMistry: CX: Update wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190218 [15:46:25] akosiaris: ah. format is okay? - ca: v/s ca: [15:47:13] ? [15:47:35] 3Project-Creators, operations: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1034359 (10faidon) A few of my suggestions/requests above haven't been implemented. We have no tags for DBA work yet — I pinged @Springle about it this on Monday, I'll re-p... [15:47:51] oh, it's yaml... I haven't checked yet for yaml syntax errors [15:49:12] 3operations, Phabricator: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1034361 (10JanZerebecki) That makes it clear for me. Thx. The one remaining issue I see in this task is the naming of "MediaWiki security bug". [15:49:53] akosiaris: I think it was okay last time. 
[15:51:52] marktraceur: manybubbles and ^d seem to be missing this morning, so it looks like it's you or me for SWAT. [15:52:26] gi11es, _joe_, Krenair: Ping for SWAT in about 8 minutes [15:52:37] anomie: pong [15:52:54] hey [15:53:08] I can swat as well [15:53:22] <_joe_> anomie: pong [15:53:23] Krenair: Oh, do you want to then? [15:53:45] I don't mind, if you want to do it that's fine [15:54:03] I have no particular desire to do it today [15:54:23] Nor do I really [15:54:32] ok I'll do it [15:57:01] Is it just me or is meta crawling really slow today? [15:57:27] seems fine to me [16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150212T1600). [16:00:12] (03PS3) 10Alex Monk: Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) (owner: 10Gilles) [16:00:19] (03CR) 10Alex Monk: [C: 032] Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) (owner: 10Gilles) [16:00:23] (03Merged) 10jenkins-bot: Readjust Media Viewer sampling factors [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190133 (https://phabricator.wikimedia.org/T89150) (owner: 10Gilles) [16:00:31] (03PS1) 10Ottomata: Ensure temporary /etc/hosts cnames for hadoop names are absent now that dns change has propagated [puppet] - 10https://gerrit.wikimedia.org/r/190219 [16:01:05] Hm, should have backported the upload fix for this morning [16:01:12] Or did we do it last night [16:01:37] Nope, guess not [16:01:43] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/190133/ (duration: 00m 07s) [16:01:48] Logged the message, Master [16:01:49] (03CR) 10Ottomata: [C: 032] Ensure temporary /etc/hosts cnames for hadoop names are absent now that dns change 
has propagated [puppet] - 10https://gerrit.wikimedia.org/r/190219 (owner: 10Ottomata) [16:02:51] PROBLEM - Redis on mc1018 is CRITICAL: Connection refused [16:03:20] PROBLEM - configured eth on mc1018 is CRITICAL: Connection refused by host [16:03:31] PROBLEM - dhclient process on mc1018 is CRITICAL: Connection refused by host [16:03:37] where is logmsgbot [16:03:41] PROBLEM - puppet last run on mc1018 is CRITICAL: Connection refused by host [16:03:50] PROBLEM - salt-minion processes on mc1018 is CRITICAL: Connection refused by host [16:03:56] gi11es, please test [16:04:13] (my irc bouncer just randomly broke :() [16:04:22] PROBLEM - DPKG on mc1018 is CRITICAL: Connection refused by host [16:04:22] aww [16:04:40] PROBLEM - Disk space on mc1018 is CRITICAL: Connection refused by host [16:04:41] PROBLEM - Memcached on mc1018 is CRITICAL: Connection refused [16:05:01] PROBLEM - RAID on mc1018 is CRITICAL: Connection refused by host [16:06:02] Krenair: testing [16:06:23] (03PS2) 10Ottomata: Re-enable varnishkafka for bits again [puppet] - 10https://gerrit.wikimedia.org/r/186641 (owner: 10QChris) [16:06:39] Krenair|temp: logmsgbot is a service on neon. /usr/local/bin/dologmsg sends stuff to neon via netcat. The python? script there drops it in this channel. [16:06:42] (03CR) 10Hashar: [C: 04-1] "It would probably be safer to split that change per module. This way it is easier to apply on production and has less impact in case somet" [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [16:06:56] Coren: I asked Guillaume to add it to the next tech news which should be the best way to get it in all the right places on various wikis and have posted it to wikitech-l and labs-l. Any place else you can think of I should mention it?
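bd808's sketch of the !log path above (dologmsg pipes the message to a listener on neon via netcat, and a script there relays it into the channel) can be mimicked locally. This is a toy stand-in, not the production setup: a FIFO replaces the TCP hop and every name in it is invented:

```shell
# Toy model of the dologmsg -> logmsgbot relay described above.
# A FIFO stands in for the netcat/TCP connection so the demo is
# self-contained; nothing here reflects real hostnames or ports.
set -e
tmp=$(mktemp -d)
mkfifo "$tmp/logmsg"

# "neon" side: wait for one message and relay it to the channel (stdout).
relay() { read -r msg < "$tmp/logmsg" && printf 'logmsgbot: %s\n' "$msg"; }

# "dologmsg" side: send one line, as dologmsg would via netcat.
result=$( relay & printf '%s\n' '!log example sync (duration: 00m 05s)' > "$tmp/logmsg"; wait )

echo "$result"   # prints: logmsgbot: !log example sync (duration: 00m 05s)
```

In production the write side would be a `nc <host> <port>` pipe and the read side a long-running bot, but the one-line-in, one-relayed-line-out shape is the same.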
[16:07:07] bd808, I was just wondering why it was being so slow to log the message :) [16:08:24] (03CR) 10Ottomata: [C: 032] Re-enable varnishkafka for bits again [puppet] - 10https://gerrit.wikimedia.org/r/186641 (owner: 10QChris) [16:08:26] Krenair|temp: Ah. You have to channel your inner Reedy. "ffs logmsgbot" sounds about right :) [16:08:35] :D [16:08:44] !log re-enabling bits varnishkafka instances [16:08:50] Logged the message, Master [16:09:01] RECOVERY - salt-minion processes on mc1018 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:09:11] RECOVERY - Redis on mc1018 is OK: TCP OK - 0.006 second response time on port 6379 [16:09:12] * JimConsultant finds someone to throw CentralNotice bugs at [16:09:20] RECOVERY - RAID on mc1018 is OK: OK: no disks configured for RAID [16:09:29] is that an fr-tech thing? [16:09:40] RECOVERY - configured eth on mc1018 is OK: NRPE: Unable to read output [16:09:41] RECOVERY - DPKG on mc1018 is OK: All packages OK [16:09:50] RECOVERY - dhclient process on mc1018 is OK: PROCS OK: 0 processes with command name dhclient [16:09:51] RECOVERY - Disk space on mc1018 is OK: DISK OK [16:10:00] RECOVERY - puppet last run on mc1018 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:10:07] how's it going gi11es? 
[16:10:13] (03PS19) 10Alexandros Kosiaris: cxserver: Use different registry for Beta and Production [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [16:10:19] Krenair|temp: I don't see a K4 :( [16:10:22] am doing 3 things at once, hold on [16:11:01] RECOVERY - Memcached on mc1018 is OK: TCP OK - 0.002 second response time on port 11211 [16:11:26] based on https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/CentralNotice,n,z you could try AndyRussG, Ejegg, Awight [16:11:53] (03PS2) 10Hashar: contint: migrate to require_package() [puppet] - 10https://gerrit.wikimedia.org/r/188034 [16:11:58] !log es-tool restart-fast on elastic1007 [16:11:59] (03PS4) 10Hashar: contint: install Java 8 on Trusty servers [puppet] - 10https://gerrit.wikimedia.org/r/183222 (https://phabricator.wikimedia.org/T85964) [16:12:02] Logged the message, Master [16:12:24] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [puppet] - 10https://gerrit.wikimedia.org/r/163814 (owner: 10Hashar) [16:13:35] Krenair|temp: Hmmm? [16:13:45] JimConsultant, ^ :) [16:13:54] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "Titan is more or less DOA, so we definitely don't need this untested java 8 package anywhere" [puppet] - 10https://gerrit.wikimedia.org/r/183222 (https://phabricator.wikimedia.org/T85964) (owner: 10Hashar) [16:14:06] AndyRussG: https://phabricator.wikimedia.org/T89258 [16:14:17] CentralNotice is yelling at hhvm.log really really noisy [16:14:23] <_joe_> Krenair|temp: ping me when it's my turn... I do have a few things to do [16:14:38] Am waiting for gi11es, sorry _joe_ [16:14:46] <_joe_> JimConsultant: it's like that since this morning at least [16:14:50] Krenair: almost there [16:14:53] <_joe_> this morning in europe I mean [16:15:03] <_joe_> Krenair|temp: yeah no rush at all :) [16:15:09] _joe_: I know. 
[16:15:17] I noticed yesterday afternoon US time I think [16:15:20] <_joe_> also, it's a fatal [16:15:29] Actually yesterday morning [16:16:37] AndyRussG: Anyway, bad bug is bad. Someone who knows CentralNotice needs to poke it (sooner rather than later) [16:17:48] JimConsultant: Krenair|temp: thank you! I'm looking [16:17:55] Thanks! [16:18:54] We did a CentralNotice deploy on Tuesday... :/ [16:19:10] And it went live to the wikipedias on the deploy train yesterday [16:19:46] Krenair: the sql query I'm running to confirm this is mighty slow [16:19:55] :/ [16:22:46] Krenair: fix confirmed [16:22:46] thank you! sorry it took so long [16:22:46] ok, _joe_ [16:22:47] (03PS3) 10Alex Monk: Update objectcache logging settings for I8a8e278e6f028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187730 (https://phabricator.wikimedia.org/T89345) (owner: 10Legoktm) [16:22:47] <_joe_> Krenair|temp: here I am [16:22:47] (03CR) 10Alex Monk: [C: 032] Update objectcache logging settings for I8a8e278e6f028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187730 (https://phabricator.wikimedia.org/T89345) (owner: 10Legoktm) [16:22:47] (03Merged) 10jenkins-bot: Update objectcache logging settings for I8a8e278e6f028 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/187730 (https://phabricator.wikimedia.org/T89345) (owner: 10Legoktm) [16:22:47] PROBLEM - configured eth on restbase1004 is CRITICAL: Connection refused by host [16:22:48] PROBLEM - dhclient process on restbase1004 is CRITICAL: Connection refused by host [16:22:56] PROBLEM - puppet last run on restbase1004 is CRITICAL: Connection refused by host [16:23:05] PROBLEM - salt-minion processes on restbase1004 is CRITICAL: Connection refused by host [16:23:59] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/187730/ (duration: 00m 05s) [16:24:04] Logged the message, Master [16:24:06] PROBLEM - DPKG on restbase1004 is CRITICAL: Connection refused by host [16:24:06] PROBLEM - Disk space on 
restbase1004 is CRITICAL: Connection refused by host [16:24:09] _joe_, please test [16:24:35] <_joe_> Krenair: I see errors flowing again, so it's cool [16:24:36] PROBLEM - RAID on restbase1004 is CRITICAL: Connection refused by host [16:24:50] <_joe_> (I know this sounds crazy) [16:25:24] <_joe_> whoa no Krenair [16:25:29] <_joe_> we need to revert :/ [16:25:34] hm [16:25:36] <_joe_> there was one problematic bit [16:25:50] <_joe_> du -sh sql-bagostuff.log [16:31:28] JimConsultant: we shouldn't be getting requests to Special:BannerRandom any more [16:31:39] (03CR) 10Cmjohnson: [C: 032] Adding dhcp entries for restbase1005/6 [puppet] - 10https://gerrit.wikimedia.org/r/190228 (owner: 10Cmjohnson) [16:31:44] AndyRussG: Cached stuff from JS? [16:31:53] JimConsultant: it's been months... [16:31:57] (03PS1) 10Giuseppe Lavagetto: Make noisy sqlbagofstuff logging silent again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190230 [16:32:10] I am watching exception.log on fluorine, I see a lot of DBQueryError from wikigrok "Deadlock found when trying to get lock; try restarting transaction" [16:32:21] <_joe_> Krenair: https://gerrit.wikimedia.org/r/#/c/190230/1 [16:32:57] (03CR) 10Alex Monk: [C: 032] Make noisy sqlbagofstuff logging silent again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190230 (owner: 10Giuseppe Lavagetto) [16:33:02] (03Merged) 10jenkins-bot: Make noisy sqlbagofstuff logging silent again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190230 (owner: 10Giuseppe Lavagetto) [16:33:29] <_joe_> Krenair: it should do the trick I hope [16:33:49] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/190230/ (duration: 00m 07s) [16:33:54] (03PS1) 10BryanDavis: logstash: Support MediaWiki logs via Syslog [puppet] - 10https://gerrit.wikimedia.org/r/190231 [16:33:55] Logged the message, Master [16:34:17] bah [16:34:19] ignore that log [16:34:20] AndyRussG: Well, obviously something is calling it. Can we fix it? 
[16:35:11] !log krenair Synchronized wmf-config: trying that last sync again, I forgot to actually run the merge (duration: 00m 06s) [16:35:11] JimConsultant: In theory, but it's brittle deprecated code, so my first question is what is calling it and how [16:35:14] Logged the message, Master [16:35:19] <_joe_> Krenair: I'd say we're ok [16:35:26] ok, now try _joe_ :) [16:35:31] I was stupid [16:35:33] (03Abandoned) 10Cenarium: enwiki FlaggedRevs: Remove 'autoreview' from 'autoconfirmed', check the former for PC2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189661 (owner: 10Cenarium) [16:35:35] Just looking at our config here [16:35:57] <_joe_> Krenair: yeah I've seen the logs appearing [16:36:01] <_joe_> when you synced [16:36:19] ok, so it's fine now? [16:36:20] (03CR) 10GWicke: restbase: switch to new partitioning scheme (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [16:36:26] <_joe_> Krenair: yes! [16:36:52] great. certainly everything else looks fine to me [16:37:03] JimConsultant: historically it was called with a bunch of params [16:37:30] (03PS3) 10Alex Monk: Add alias for previous project namespace (fawikibooks) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188887 (https://phabricator.wikimedia.org/T60655) (owner: 10Mjbmr) [16:37:41] (03PS1) 10Mjbmr: Unifying talk namespaces for fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190232 [16:38:00] AndyRussG: $wgCentralBannerDispatcher = "//{$wmfHostnames['meta']}/wiki/Special:BannerRandom"; [16:38:10] Looks like it's being used to me? [16:38:35] bd808: can you help fix https://phabricator.wikimedia.org/T89345 ASAP? 
[16:38:45] (03CR) 10Alex Monk: [C: 032] Add alias for previous project namespace (fawikibooks) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188887 (https://phabricator.wikimedia.org/T60655) (owner: 10Mjbmr) [16:38:48] <_joe_> paravoid: it's fixed [16:38:50] (03Merged) 10jenkins-bot: Add alias for previous project namespace (fawikibooks) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188887 (https://phabricator.wikimedia.org/T60655) (owner: 10Mjbmr) [16:38:58] <_joe_> paravoid: give me the time to close the ticket :) [16:39:00] JimConsultant: it's there but that variable shouldn't be used... and I did see that the correct (new) banner loading system was being used following our deploy [16:39:20] 3operations, Incident-20150205-SiteOutage, ops-eqiad: Split memcached in eqiad across multiple racks/rows - https://phabricator.wikimedia.org/T83551#1034547 (10Joe) [16:39:50] (03PS3) 10Giuseppe Lavagetto: mediawiki: add mc1017 to the nutcracker pool [puppet] - 10https://gerrit.wikimedia.org/r/190189 [16:39:52] (03CR) 10Mjbmr: "This also needs to run namespaceDupes.php" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190232 (owner: 10Mjbmr) [16:40:02] paravoid: I think it was just fixed in swat [16:40:11] ok [16:40:12] AndyRussG: I see it in master being used in a couple of places [16:40:22] (granted I'm unfamiliar with this code) [16:40:28] (03CR) 10Giuseppe Lavagetto: [C: 032] "Logging is enabled again, so let's go!" 
[puppet] - 10https://gerrit.wikimedia.org/r/190189 (owner: 10Giuseppe Lavagetto) [16:40:28] JimConsultant: requests now use $wgCentralSelectedBannerDispatcher [16:40:56] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/188887/ (duration: 00m 07s) [16:41:01] Logged the message, Master [16:41:27] Ah, I'm seeing it now in the js file [16:41:32] JimConsultant: The code that used SpecialBannerRandom is turned off via $wgCentralNoticeChooseBannerOnClient = true; [16:41:47] We left the code in there just in case we had to revert back to the old system [16:41:55] JimConsultant: But it should never be called [16:42:15] And when did that config change go out? Tuesday? [16:42:32] JimConsultant: you can see that it's Special:BannerLoader that's being called by opening the network view of developer tools on a browser and visiting a wiki as a targeted user [16:42:36] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:42:45] JimConsultant: banners generally are definitely being loaded correctly [16:42:56] i.e. 
not with the old SpecialBannerRandom [16:43:02] (03PS2) 10BryanDavis: logstash: Support MediaWiki logs via Syslog [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) [16:43:30] JimConsultant: no, that config change is months old, from November, I think [16:43:38] hmmm [16:43:46] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:43:46] The whole December campaign was run on the new system [16:43:46] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:43:55] PROBLEM - puppet last run on mw1040 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:43:55] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:43:55] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:43:55] PROBLEM - puppet last run on mw1080 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:43:55] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [16:44:04] JimConsultant: I think we really need to find some request logs [16:44:15] ffs icinga-wm [16:44:47] JimConsultant: and in fact we should already remove the old BannerRandom code and config [16:44:49] <_joe_> !log triggering a puppet run to insert mc1017 in the nutcracker pool [16:44:54] Logged the message, Master [16:45:41] (03CR) 10Filippo Giunchedi: restbase: switch to new partitioning scheme (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [16:46:05] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:46:52] JimConsultant: for example, log out, open the browser network tab, and go here: https://en.wikipedia.org/wiki/Main_Page?country=IL [16:47:26] You'll see the fundraising banner for Israel is loaded using 
Special:BannerLoader, i.e. the new system (as of last November) [16:47:30] Yeah, I'm seeing the correct requests. [16:47:36] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:47:36] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:47:36] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:47:36] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:47:36] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:47:36] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:47:46] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:47:47] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:47:47] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:47:56] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 21 
seconds ago with 0 failures [16:47:56] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:47:56] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:48:05] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:48:06] RECOVERY - puppet last run on mw1255 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:48:06] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:48:06] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:48:06] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:48:06] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:48:16] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:48:16] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:48:16] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:48:16] RECOVERY - puppet last run on mw1236 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:48:16] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:48:17] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:48:17] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:48:25] RECOVERY - puppet last run on mw1237 is 
OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:48:25] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:48:25] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:48:25] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:48:25] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:48:26] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:48:26] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:48:27] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:48:27] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:48:28] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:48:28] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [16:48:29] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:48:29] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:48:30] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:48:35] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:48:35] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures 
[16:48:35] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:48:35] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:48:36] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:48:36] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [16:48:47] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:48:56] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:48:56] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:48:56] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:48:56] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:48:57] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:48:59] JimConsultant: are you able to pull request logs? 
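Pulling the sampled request logs is the obvious way to settle whether anything still hits the deprecated endpoint. A sketch of the filter involved — the two sample lines below are invented stand-ins, not the real sampled-log format, and the actual log location on the log host is assumed, not shown:

```shell
# Count hits on the deprecated Special:BannerRandom endpoint in a
# (made-up) slice of sampled request-log lines. Against the real logs
# you would replace the printf fixture with the sampled log file.
printf '%s\n' \
  'meta.wikimedia.org GET /wiki/Special:BannerRandom?uselang=en 200' \
  'meta.wikimedia.org GET /wiki/Special:BannerLoader?banner=example 200' \
  | grep -c 'Special:BannerRandom'
```

A zero count over a day of sampled traffic would support removing the old BannerRandom code path, as AndyRussG suggests below; a nonzero count would justify hunting for the stale caller first.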
[16:49:05] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:49:05] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:49:06] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:49:10] AndyRussG: Just api requests [16:49:15] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:49:16] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:49:16] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:49:26] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:49:35] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:49:36] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:49:36] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:49:36] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:49:37] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:49:45] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:49:45] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:49:45] RECOVERY - puppet last run on mw1193 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:49:45] RECOVERY - puppet last run on mw1206 
is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:49:46] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:49:46] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:49:46] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:49:47] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:49:48] JimConsultant: Hmmm... whom to ask... [16:49:50] <_joe_> sorry that's me :/ [16:49:51] AndyRussG: Let's at least get a stack trace. [16:49:55] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:49:55] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:49:56] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:49:56] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:49:56] RECOVERY - puppet last run on mw1169 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:49:56] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:49:56] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:49:57] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:49:57] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:50:01] I'll write up a hot patch [16:50:05] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently 
enabled, last run 8 seconds ago with 0 failures [16:50:05] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:50:05] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:50:06] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:50:06] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:50:11] JimConsultant: yeah that would also be interesting... [16:50:15] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [16:50:16] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:50:17] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:50:17] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:50:18] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:50:18] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 
55 seconds ago with 0 failures [16:50:25] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:50:25] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:50:25] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:50:26] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:50:26] RECOVERY - puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:50:26] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:50:26] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:50:27] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:50:27] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:50:28] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:50:35] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [16:50:35] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [16:50:36] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:50:36] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:50:36] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:50:36] RECOVERY - puppet last run on mw1201 is 
OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:50:36] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:50:37] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:50:37] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:50:45] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:50:46] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [16:50:46] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:50:46] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:50:46] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:50:46] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:50:55] RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:50:55] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:50:55] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:50:56] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:50:56] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:51:06] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures 
[16:51:21] Krenair: Um, /srv/mediawiki-staging/ is set to a sha1 instead of being on a branch? [16:51:26] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:51:26] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:51:33] bah [16:51:35] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:51:35] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:51:36] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:51:36] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:51:44] that's probably my fault after I reverted that thing for _joe_ earlier [16:51:45] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:51:45] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:51:45] RECOVERY - puppet last run on mw1113 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:51:46] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:51:46] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [16:51:46] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:51:46] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:51:47] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:51:47] RECOVERY 
- puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:51:48] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:51:55] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:51:56] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:51:57] RECOVERY - puppet last run on mw1108 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:51:57] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [16:51:58] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:51:58] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:51:59] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [16:51:59] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:52:00] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 37 
seconds ago with 0 failures [16:52:06] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:52:06] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:52:06] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:52:06] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [16:52:06] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:52:16] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:52:16] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:52:25] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:52:25] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:52:25] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:52:25] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:52:25] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [16:52:26] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:52:27] JimConsultant, ok, check now [16:52:45] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [16:52:45] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:52:49] thanks 
for noticing that [16:52:55] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:52:55] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:52:56] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:52:56] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:52:56] RECOVERY - puppet last run on mw1094 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [16:53:06] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:53:06] RECOVERY - puppet last run on mw1058 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:53:06] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:53:06] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:53:06] RECOVERY - puppet last run on mw1093 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:53:15] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:53:15] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:53:16] RECOVERY - puppet last run on mw1103 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [16:53:16] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [16:53:25] RECOVERY - puppet last run on mw1086 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:53:26] RECOVERY - puppet last run on mw1070 is OK: OK: 
Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:53:26] RECOVERY - puppet last run on mw1090 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:53:26] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:53:26] RECOVERY - puppet last run on mw1073 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:53:35] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:53:35] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:53:36] RECOVERY - puppet last run on mw1071 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:53:36] RECOVERY - puppet last run on mw1095 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:53:36] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:53:36] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:53:36] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:37] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:53:45] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:45] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [16:53:45] RECOVERY - puppet last run on mw1083 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:46] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:46] 
RECOVERY - puppet last run on mw1043 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:53:46] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:53:47] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:53:47] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:56] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:56] RECOVERY - puppet last run on mw1028 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [16:53:56] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [16:53:56] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:53:56] Krenair: Much, thanks [16:53:56] RECOVERY - puppet last run on mw1105 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [16:53:57] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [16:54:05] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:54:06] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:54:06] RECOVERY - puppet last run on mw1035 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:54:10] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034569 (10JAllemandou) 3NEW a:3Ottomata [16:54:11] !log demon Synchronized php-1.25wmf16/extensions/CentralNotice/includes/BannerChooser.php: live hack for debugging (duration: 00m 06s) [16:54:15] 
RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [16:54:15] RECOVERY - puppet last run on mw1037 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [16:54:16] RECOVERY - puppet last run on mw1020 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:54:16] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [16:54:16] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:54:16] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [16:54:16] RECOVERY - puppet last run on mw1010 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [16:54:17] RECOVERY - puppet last run on mw1033 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:54:17] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [16:54:17] Logged the message, Master [16:54:26] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [16:54:26] RECOVERY - puppet last run on mw1016 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:54:26] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:54:35] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:54:35] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:54:35] RECOVERY - puppet last run on mw1026 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:54:36] RECOVERY - puppet last run on mw1005 is 
OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:54:36] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:54:36] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [16:54:36] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:54:45] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:54:55] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [16:54:56] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:54:56] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:54:57] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [16:54:57] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:56:31] andrewbogott_afk, fyi - exception log spam - 2015-02-12 16:55:09 silver labswiki: [06343869] [no req] Scribunto_LuaInterpreterNotFoundError from line 233 of /srv/mediawiki/php-1.25wmf16/extensions/Scribunto/engines/LuaSandbox/Engine.php: The luasandbox extension is not present, this engine cannot be used. [16:56:32] AndyRussG: And of course my live hack is giving us nothing. [16:56:58] JimConsultant: hmm? [16:57:11] !log demon Synchronized php-1.25wmf16/extensions/CentralNotice/includes/BannerChooser.php: rm live hack for debugging (duration: 00m 05s) [16:57:14] Logged the message, Master [16:57:24] * AndyRussG trembles at the thought... [16:57:31] I just recently got deployer creds... 
heh [16:58:18] I removed the type hint and told it to throw an exception if $allocContext was null [16:58:24] But I got no exceptions! [17:00:04] I think we can fix this easily enough [17:00:15] (03PS20) 10Alexandros Kosiaris: cxserver: Use different registry for Beta and Production [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [17:00:27] Krenair: can you report a bug for the wikigrok log spam one? (I just barely saw it before icinga-wm made my eyes bleed) [17:00:36] JimConsultant: more than the annoying messages, I'm worried about whether there is something funky happening [17:01:23] 3operations, ops-eqiad: decom cp1037,cp1038,cp1039,cp1040 - https://phabricator.wikimedia.org/T87800#1034586 (10Cmjohnson) a:3Cmjohnson claiming to do on-site work [17:01:31] How frequent were the messages? Maybe there is some JS error that is causing some clients to run the code for the old system? Or maybe someone is bombarding our servers with free BannerRandom requests? [17:02:18] JimConsultant: (of course, since I'm not the one who has to see the messages... 
heh ;p ) [17:02:40] They're drowning out everything else [17:02:46] greg-g, I'm actually looking into another log spam entry right now :p [17:02:46] relating to a specific user script on enwiki [17:02:47] Even OOM, which are so noisy they're just background usually [17:03:10] PROBLEM - configured eth on restbase1005 is CRITICAL: Connection refused by host [17:03:22] PROBLEM - dhclient process on restbase1005 is CRITICAL: Connection refused by host [17:03:23] Krenair: :) [17:03:31] PROBLEM - puppet last run on restbase1005 is CRITICAL: Connection refused by host [17:03:40] PROBLEM - salt-minion processes on restbase1005 is CRITICAL: Connection refused by host [17:04:41] PROBLEM - DPKG on restbase1005 is CRITICAL: Connection refused by host [17:04:50] PROBLEM - Disk space on restbase1005 is CRITICAL: Connection refused by host [17:05:11] PROBLEM - RAID on restbase1005 is CRITICAL: Connection refused by host [17:05:25] AndyRussG: What if we just rip it all out? [17:05:34] All the random stuff? [17:05:40] Then no errors and no requests :p [17:06:15] * AndyRussG squirms [17:06:51] JimConsultant: I dunno, CentralNotice is pretty mission critical, so I'd be much, much happier getting to the bottom of this :) [17:07:18] awight: Boo! https://phabricator.wikimedia.org/T89258 [17:08:42] (03PS1) 10Giuseppe Lavagetto: mediawiki: add memcached host mc1018 as shard18 [puppet] - 10https://gerrit.wikimedia.org/r/190234 [17:09:02] AndyRussG: oh no! Must be my BannerLoader param hack? [17:09:08] (03PS1) 10Cmjohnson: Removing mgmt dns for broken dbproxy1008. replacing and renaming it with cp1038 [dns] - 10https://gerrit.wikimedia.org/r/190235 [17:09:38] awight: I don't think so! Just checked the code... We're not calling BannerRandom anywhere I can see [17:10:55] There must be lots of spooky legacy stuff floating around... 
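[Editor's note: AndyRussG's "Just checked the code... We're not calling BannerRandom anywhere I can see" above amounts to a repo-wide grep. A self-contained reconstruction sketch follows; the one-file "checkout" is faked here so it runs anywhere, and the config line is the real hit awight pastes shortly afterwards.]

```shell
# Hypothetical sketch: grep a mediawiki-config checkout for stale
# Special:BannerRandom references. The repo layout is a stand-in; the
# CommonSettings.php line is the one quoted in the channel.
repo=/tmp/mwconfig-sketch
mkdir -p "$repo/wmf-config"
cat > "$repo/wmf-config/CommonSettings.php" <<'EOF'
$wgCentralBannerDispatcher = "//{$wmfHostnames['meta']}/wiki/Special:BannerRandom";
EOF
grep -rn 'BannerRandom' "$repo/wmf-config"
```

Against the real repo this would be `grep -rn 'BannerRandom' wmf-config/`, which is presumably how the `CommonSettings.php:1450` hit below was found.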
[17:10:57] robh: ^^ [17:11:04] I see that I broke that variable in CentralNotice commit bc023091fe4c788a8695e78d768151ee7692b11f [17:11:13] (03CR) 10Giuseppe Lavagetto: [C: 032] "let's go." [puppet] - 10https://gerrit.wikimedia.org/r/190234 (owner: 10Giuseppe Lavagetto) [17:11:37] AndyRussG: point taken, though--let's kill Special:BannerRandom [17:11:39] cmjohnson: i take it the other one was kaput? [17:11:43] oh well! [17:11:47] awight: after Tuesday's deploy we did check banner loading, it's all happening on Special:BannerLoader [17:11:49] yep..idrac failure [17:11:58] just detail in ticket and i'll steal and use it again later today, thanks =] [17:12:00] awight: maybe we could try to figure out what's calling it? [17:12:09] yep [17:12:15] at least the system that dies is one of dozens [17:12:16] which is nice. [17:12:22] helpful [17:12:23] and now we have more memory to upgrade other systems! [17:12:32] AndyRussG: actually, ./wmf-config/CommonSettings.php:1450: $wgCentralBannerDispatcher = "//{$wmfHostnames['meta']}/wiki/Special:BannerRandom"; [17:12:46] Krenair: btw: https://phabricator.wikimedia.org/T89359 [17:12:55] Krenair: re that wikigrok error [17:13:04] awight: yes that's the old config variable. It's now on wgCentralSelectedBannerDispatcher [17:13:17] <_joe_> !log adding mc1018 to the nutcracker pool, this time without forcing a puppet run [17:13:24] Logged the message, Master [17:13:56] awight: wgCentralBannerDispatcher should only get any calls if the banner isn't chosen on the client [17:15:23] <_joe_> Krenair: are you done with SWAT?
[17:15:31] <_joe_> I have a couple of sync-files to do [17:15:31] https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCentralNotice.git/82eb579bba58ed5050863642b51cc4b69b9b87fa/modules%2Fext.centralNotice.bannerController%2FbannerController.js#L182 [17:15:39] As per our wmf_deploy branch [17:16:14] awight: after the deploy on Tuesday we did check this stuff :) [17:16:18] 3operations, ops-eqiad: relocate/wire/setup dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86957#1034629 (10Cmjohnson) I took the old cp1038 (WMF3097) renamed it, swapped the ssds with old dbproxy1008 (other ssds need to be wiped (https://phabricator.wikimedia.org/T87800). Re-labeled and... [17:16:30] 3operations, ops-eqiad: relocate/wire/setup dbproxy1003 through dbproxy1011 - https://phabricator.wikimedia.org/T86957#1034630 (10Cmjohnson) a:5Cmjohnson>3RobH [17:16:34] _joe_, yeah [17:16:38] was done ages ago, sorry [17:16:44] <_joe_> Krenair: no no that's good [17:16:51] <_joe_> I have to sync-file opsy stuff [17:17:48] awight: JimConsultant: couldn't we pls ask someone for some request logs to see if Special:BannerRandom is being called somehow? It shouldn't be, but I'd really like to be sure [17:17:51] AndyRussG: This is making me think another thing, that we should be watching the cluster error logs any time we deploy CN... I wonder if we have access? [17:17:57] (03PS2) 10Giuseppe Lavagetto: sessions: add redis server on mc1017 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190001 [17:18:08] awight: yeah also good point... [17:18:14] awight: You do. fluorine:/a/mw-log/ [17:18:19] woot! [17:18:34] (comes with deploy access) [17:18:42] AndyRussG: you agree we can start dismantling the bad old BannerRandom stuff? [17:18:46] Ah good to know :) [17:18:50] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: puppet fail [17:18:54] awight: I agree but first I want to find out what borked [17:18:57] ... 
we can still watch request logs even if the endpoint is gone :) [17:19:17] Krenair: JimConsultant twentyafterfour etc: there now exists https://phabricator.wikimedia.org/tag/wikimedia-log-errors/ [17:19:25] I just saw :/ [17:19:46] greg-g: ooh, a personal rogue's gallery! :) [17:19:47] Did you see https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#New_projects ? [17:19:54] Krenair: yep [17:20:02] (03PS3) 10Giuseppe Lavagetto: sessions: add redis server on mc1017 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190001 [17:20:22] Krenair: https://phabricator.wikimedia.org/T89292 [17:20:41] (03CR) 10Giuseppe Lavagetto: [C: 032] sessions: add redis server on mc1017 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190001 (owner: 10Giuseppe Lavagetto) [17:20:48] (03CR) 10Cmjohnson: [C: 032] Removing mgmt dns for broken dbproxy1008. replacing and renaming it with cp1038 [dns] - 10https://gerrit.wikimedia.org/r/190235 (owner: 10Cmjohnson) [17:22:15] greg-g, ah I am clearly just not on top of my emails :) [17:22:15] bah, 40 unread [17:22:15] (almost all otrs or phabricator of course, but still) [17:22:38] :) [17:24:04] Krenair: is there a way to see if there are calls to Special:BannerRandom in the request logs? [17:24:15] paravoid: Hi! also, ^ ? [17:24:30] hi? [17:25:16] paravoid: tldr: we're getting requests to /wiki/Special:BannerRandom on meta. We should not be. Can we see where that traffic is coming from? [17:25:41] !log oblivian Synchronized wmf-config/session.php: Adding mc1017 to the sessions redis pool (duration: 00m 05s) [17:25:45] Request logs? I don't know if I have access to those. [17:25:45] paravoid: Hi! yes what JimConsultant said Uh, on the topic of CentralNotice emergencies... [17:25:48] Logged the message, Master [17:26:10] paravoid: Trying to run down the super-spammy CentralNotice fatals we're getting. 
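[Editor's note: awight's "watch the cluster error logs any time we deploy CN" idea, with the fluorine:/a/mw-log/ path JimConsultant mentions above, reduces to a grep over hhvm.log. A minimal sketch, using a local stand-in file rather than the real log so it is self-contained; the fatal text is the one from T89258.]

```shell
# Sketch: count the CentralNotice fatals in an hhvm.log-style file.
# On fluorine this would be /a/mw-log/hhvm.log; we use a stand-in here.
log=/tmp/hhvm-sketch.log
cat > "$log" <<'EOF'
Fatal error: Argument 1 passed to BannerChooser::__construct() must be an instance of AllocationContext, null given in .../CentralNotice/includes/BannerChooser.php on line 41
EOF
grep -c 'BannerChooser::__construct' "$log"
```

The live-watch equivalent during a deploy would be `tail -f /a/mw-log/hhvm.log | grep --line-buffered CentralNotice`.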
[17:26:13] (03CR) 10Alexandros Kosiaris: [C: 04-1] cxserver: Use different registry for Beta and Production (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [17:26:22] if you do, wouldn't that just be part of the requested URL? [17:26:55] Krenair: it would be in the request URL, it's an old background request that shouldn't be happening anymore [17:27:14] awight: BTW see also my messages about Special:BannerRandom, if we do a flash CN deploy today.. [17:27:29] AndyRussG: the sampling? Good idea! [17:27:39] err, Special:RecordImpression? [17:27:51] awight: Yes! ejegg|away's idea... [17:27:52] I checked on erbium, I don't see any recent hits [17:27:56] awight: and thanks! [17:28:16] i see just three hits (on the 1:1000 sampled logs) in the whole day actually [17:28:32] but for more log fiddling, analytics is probably your best bet [17:28:51] fwiw those three are [17:28:52] paravoid: interesting [17:28:59] http://meta.wikimedia.org/wiki/Special:BannerRandom?uselang=de&sitename=Wikipedia&project=wikipedia&anonymous=true&bucket=1&country=DE&device=desktop&slot=21&debug= [17:29:04] referer: http://de.wikipedia.org/wiki/Magnum/Episodenliste [17:29:10] http://meta.wikimedia.org/wiki/Special:BannerRandom?uselang=en&sitename=Wikipedia&project=wikipedia&anonymous=true&bucket=1&country=XX&device=android&slot=3&debug= [17:29:16] referer: http://en.m.wikipedia.org/wiki/A_Flea_in_Her_Ear [17:29:19] http://meta.m.wikimedia.org/wiki/Special:BannerRandom?uselang=en&sitename=Wikipedia&project=wikipedia&anonymous=true&bucket=1&country=IR&device=android&slot=6&debug= [17:29:25] referer: http://en.m.wikipedia.org/wiki/Main_Page [17:29:46] So real traffic and referrers, not someone doing bogus traffic, ok [17:30:04] paravoid: It's currently > 50% of the content of hhvm.log [17:30:12] maybe? 
who knows [17:30:13] So yeah, I think sampled isn't useful for us [17:30:25] paravoid: not the same IP addresses, of course, right? [17:30:42] (03CR) 10KartikMistry: cxserver: Use different registry for Beta and Production (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [17:30:46] no [17:30:55] the third one is with a Mobile Safari UA [17:31:04] despite the device=android in the URL [17:31:19] oh wait, that's probably just chrome for mobile [17:31:24] akosiaris: I've to go now. Tired :/ [17:31:25] (03CR) 10Rush: logstash: Support MediaWiki logs via Syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [17:31:37] paravoid: OK thanks... [17:31:42] really appreciated! [17:32:22] (03CR) 10Alexandros Kosiaris: cxserver: Use different registry for Beta and Production (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/188796 (https://phabricator.wikimedia.org/T88793) (owner: 10KartikMistry) [17:32:23] JimConsultant: do you have a specific time when the errors started? [17:32:43] why do you think it's meta? [17:32:43] Lemme check [17:32:48] kart_: OK, have a nice night [17:32:50] (03PS2) 10Giuseppe Lavagetto: sessions: add redis server on mc1018 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190002 [17:33:09] btw, the patch now compiles against production finally [17:33:11] (03CR) 10Giuseppe Lavagetto: [C: 032] sessions: add redis server on mc1018 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190002 (owner: 10Giuseppe Lavagetto) [17:33:27] paravoid: I mean the banners just run from meta. [17:33:31] Traffic could be on any wiki [17:34:03] I don't understand? [17:34:26] paravoid: this URL is the old, supposedly now deactivated way of retrieving banners [17:34:27] who's oblivian?
[17:34:33] !log oblivian Synchronized wmf-config/session.php: Adding mc1018 to the sessions redis pool (duration: 00m 07s) [17:34:35] greg-g: _joe_ [17:34:42] Logged the message, Master [17:34:47] <_joe_> greg-g: what's the issue? [17:34:56] just didn't recognize the username [17:34:56] JimConsultant/AndyRussG: [17:34:57] root@erbium:/a/log/webrequest# grep -c Special:BannerRandom sampled-1000.tsv.log [17:35:00] 3 [17:35:09] <_joe_> greg-g: oh ok [17:35:13] _joe_: welcome to deploys :) [17:35:14] <_joe_> that's my shell name [17:35:25] <_joe_> greg-g: nah just deploying boring opsy stuff [17:35:28] akosiaris, godog, mobrovac, earldouglas: summarized action items in http://etherpad.wikimedia.org/p/ServiceOps [17:35:40] _joe_: don't worry, I won't add you to SWAT :P [17:35:40] <_joe_> greg-g: things that would belong to puppet to be honest :) [17:35:47] * greg-g nods [17:36:05] paravoid: it's a background call that was made when a user visited a page from any wiki, but only _to_ meta [17:36:24] 3 hits in the whole (sampled) log for any wiki [17:36:29] paravoid: a similar system is currently in place, but the calls should be going to Special:BannerLoader instead of Special:BannerRandom [17:36:32] greg-g: cool, thanks! [17:36:39] no, that was gwicke [17:36:42] :) [17:36:50] wrong g! [17:37:10] ;) [17:38:11] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [17:38:19] too many g [17:39:42] AndyRussG, awight: First time we started getting this was "Feb 10 23:38:41" from mw1020 [17:40:06] JimConsultant: that's UTC, right? [17:40:07] (So yeah, tuesday, just before midnight UTC) [17:40:09] Yeah [17:40:52] JimConsultant: OK that does coincide with our CN deploy. 
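[Editor's note: the erbium grep above runs against a 1:1000 sampled log, so the back-of-envelope extrapolation looks like this. The hit count and sample rate are from the log; the arithmetic is ours, and the mismatch with "many fatals per second" is exactly why the sampled figure looked suspect.]

```shell
# 3 hits in a 1:1000 sampled daily log extrapolates to ~3000 real
# requests/day, roughly 0.03/s -- nowhere near the fatal rate in hhvm.log.
sampled_hits=3
sample_rate=1000
echo "~$((sampled_hits * sample_rate)) requests/day"   # prints: ~3000 requests/day
```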
https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0February.C2.A010 [17:41:23] krenair@silver:/etc/php5$ diff apache2/conf.d cli/conf.d -r [17:41:23] Only in apache2/conf.d: fss.ini [17:41:23] Only in apache2/conf.d: luasandbox.ini [17:41:23] Only in apache2/conf.d: wikidiff2.ini [17:42:08] JimConsultant: I was gonna say (tho maybe this doesn't make sense given the above) maybe this is like a pre-emptive HHVM warning? Could that be why your hotpatch got nothing? [17:42:36] 3RESTBase, Services, operations, Scrum-of-Scrums: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1034679 (10mobrovac) Some notes from the Services-Ops meeting available on [this etherpad](http://etherpad.wikimedia.org/p/ServiceOps) [17:42:42] PROBLEM - RAID on restbase1006 is CRITICAL: Connection refused by host [17:43:02] Causing Scribunto_LuaInterpreterNotFoundError exception.log spam from labswiki jobs which run under CLI on php5 boxes [17:43:07] AndyRussG: No, it'd fatal in any version [17:43:11] PROBLEM - configured eth on restbase1006 is CRITICAL: Connection refused by host [17:43:22] PROBLEM - dhclient process on restbase1006 is CRITICAL: Connection refused by host [17:43:24] (as opposed to hhvm where we apparently have a separate system for jobs) [17:43:24] mutante, bd808, andrewbogott_afk, etc. [17:43:32] PROBLEM - puppet last run on restbase1006 is CRITICAL: Connection refused by host [17:43:42] PROBLEM - salt-minion processes on restbase1006 is CRITICAL: Connection refused by host [17:44:05] JimConsultant: I mean, is it possible that the errors are not actually the code being run, but some analysis HHVM does on the code? [17:44:11] No [17:44:15] (03CR) 10BryanDavis: logstash: Support MediaWiki logs via Syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [17:44:18] JimConsultant: OK thanks [17:45:17] Krenair: blerg. I thought andrewbogott_afk fixed that via puppet.
Let me see if I can find the patch. [17:45:31] Probably for apache2 only [17:46:01] AndyRussG: fwiw, I think this is a run-time error and not just a lint thing, cos: Fatal error: Argument 1 passed to BannerChooser::__construct() must be an instance of AllocationContext, null given in /srv/mediawiki/php-1.25wmf16/extensions/CentralNotice/includes/BannerChooser.php on line 41 [17:46:33] Krenair: https://gerrit.wikimedia.org/r/#/c/189774/ -- -2'd by _joe_ [17:46:53] and he didn't add the cli links either [17:47:20] awight: Yeah. What's wrong is pretty clear from looking at the code. [17:47:41] Can a root hotfix the cli symlinks? [17:47:42] awight: and where could it be coming from? It's filling up the HHVM log, but there are practically no requests to Special:BannerRandom [17:47:53] !! [17:47:59] (03CR) 10BryanDavis: Puppetize a few symlinks that are hotfixed on silver (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/189774 (owner: 10Andrew Bogott) [17:48:05] awight: exactly [17:48:09] Krenair: Yup. That's the help you need [17:48:59] akosiaris: Can you help fix silver by making symlinks in /etc/php5/cli/conf.d that match the ones in /etc/php5/apache2/conf.d ? [17:49:03] awight: SpecialBannerRandom tries to use $this->allocContext which it doesn't have since the refactor [17:49:05] awight: as per paravoid, "3 hits in the whole (sampled) log"... I don't think that's enough hits to fill up the hhvm log? [17:49:07] Hence null gets passed [17:49:27] that's from BannerRandom, not BannerLoader [17:49:30] and no, it's not [17:50:28] AndyRussG: yeah, it happens many times per second [17:51:07] JimConsultant: agreed. I'm sweating over a patch to remove the remaining cruft: https://gerrit.wikimedia.org/r/190239 [17:51:43] JimConsultant: awight: I don't agree with removing cruft until we know why that code is running [17:52:31] apologies to JimConsultant for having to see the logspam...
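[Editor's note: the silver hotfix bd808 asks akosiaris for is just mirroring apache2's conf.d symlinks into cli/conf.d so CLI jobs load luasandbox. A hypothetical sketch against a scratch directory rather than the real /etc/php5 (the three .ini names come from Krenair's diff above; the lasting fix was the puppet change under review).]

```shell
# Hypothetical hotfix sketch: mirror the apache2 PHP module links for the
# CLI SAPI. Uses a scratch dir, not the real /etc/php5 on silver.
base=/tmp/php5-sketch
mkdir -p "$base/mods-available" "$base/apache2/conf.d" "$base/cli/conf.d"
for ini in fss.ini luasandbox.ini wikidiff2.ini; do
    touch "$base/mods-available/$ini"
    ln -sf "../../mods-available/$ini" "$base/apache2/conf.d/$ini"
    ln -sf "../../mods-available/$ini" "$base/cli/conf.d/$ini"   # the missing links
done
ls "$base/cli/conf.d"
```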
[17:52:47] JimConsultant: could you paste somewhere the exact hotfix you tried that turned off the errors? [17:52:55] (03PS1) 10Awight: Remove references to Special:BannerRandom and server-side choice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190241 [17:53:07] I just removed the typehint from BannerChooser [17:53:24] I wouldn't want to do that forever though :) [17:53:36] awight: ^ OK yes I do agree with the config cruftcleansing :) [17:53:41] AndyRussG: yep, I'm still on-board with that, but meanwhile I'll get the boat ready :) [17:53:58] awight: K cool, yeah thanks much :) [17:54:19] (03CR) 10Awight: [C: 04-2] "Please don't deploy until CentralNotice will default to "client choice"." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190241 (owner: 10Awight) [17:54:50] (03CR) 10Awight: "ok. confirmed that this is safe to deploy." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190241 (owner: 10Awight) [17:58:13] We want that out now? [17:59:36] AndyRussG: you okay with deploying the config thing now? ^^ [17:59:49] awight: one sec [18:01:11] PROBLEM - Host restbase1003 is DOWN: PING CRITICAL - Packet loss = 100% [18:01:52] that's me [18:03:02] awight: yea the config change looks fine, deploying it might even give us some insight [18:04:15] RECOVERY - Host restbase1003 is UP: PING OK - Packet loss = 0%, RTA = 2.18 ms [18:08:28] JimConsultant: if you feel like deploying the config patch, please do! [18:08:38] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034706 (10akosiaris) p:5Triage>3Normal [18:08:38] Can do, one moment [18:08:47] awesome, thanks! 
[18:08:55] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1034708 (10akosiaris) p:5Triage>3Normal [18:09:15] (03CR) 10Chad: [C: 032] Remove references to Special:BannerRandom and server-side choice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190241 (owner: 10Awight) [18:09:24] (03Merged) 10jenkins-bot: Remove references to Special:BannerRandom and server-side choice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190241 (owner: 10Awight) [18:11:18] !log demon Synchronized wmf-config/CommonSettings.php: Remove random banner references (duration: 00m 05s) [18:11:24] Logged the message, Master [18:11:35] JimConsultant: thanks! [18:11:38] yw [18:13:20] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034714 (10akosiaris) Hello Joseph, Since you are new there is a one time procedure you need to follow to be granted access. Please review, and reply back with the following: - Read, comprehend,... [18:13:32] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034715 (10akosiaris) @Tnegrin, please approve [18:14:40] 3operations: deploy db2043-2066 - https://phabricator.wikimedia.org/T89365#1034718 (10RobH) 3NEW a:3RobH [18:14:44] apergos: Can you take a peek at https://gerrit.wikimedia.org/r/#/c/190214/2 ? [18:20:18] (03CR) 10Nikerabbit: [C: 031] CX: Update wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190218 (owner: 10KartikMistry) [18:20:18] awight, AndyRussG: Didn't expect it to really, but didn't make the errors go away [18:20:18] (03PS1) 10BryanDavis: beta: switch logstash transport from redis to syslog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190246 (https://phabricator.wikimedia.org/T88870) [18:20:18] (03CR) 10BryanDavis: [C: 04-1] "Can't go to beta until the associated MediaWiki change is merged. 
Beta will also need Icd6060b7ac5c44c7af6e57878bec600693ee5301 to be eith" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190246 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [18:20:38] JimConsultant: yeah that's also exactly what I expected [18:20:47] Coren: I'm kinda checked out at this point, and I don't really know the conntrack stuff at all [18:21:08] JimConsultant: banners are still up, BTW, as far as I can see, nothing bad happened [18:21:09] but I could look at it tomorrow morning if you haven't found someone better [18:21:24] apergos: No worries; it's hotfixed in anyways so there is no rush. [18:21:37] 3RESTBase: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1034732 (10GWicke) 3NEW [18:22:32] JimConsultant: it's pretty clear that _something_ happened when we did the deploy [18:22:42] * Coren facepalms. [18:23:24] AndyRussG: Yes, I think we've established that :) [18:23:27] akosiaris: Can you take a peek at https://gerrit.wikimedia.org/r/#/c/190214/2 ? [18:23:32] * apergos snickers [18:23:49] JimConsultant: AndyRussG I missed that--what did happen? [18:24:06] People I regularly talk to should be allowed reserved initials. :-) [18:24:11] awight: the log errors started exactly then [18:24:19] nope. I've had this nick a long long time [18:24:47] AndyRussG: ok. I thought you meant the config deploy just now. [18:25:02] awight: ah no, all clear there :) sorry [18:25:26] * AndyRussG screws head on a little tighter [18:25:59] don't mind me, I'm being ridiculous and trying to back out the entire legacy bannerrandom thing. [18:26:19] Would you rather I split the patch so we only deploy the line which disables Special:BannerRandom? [18:26:20] awight: that's not ridiculous... [18:26:46] I dunno what to do yet :/ [18:26:53] My instinct is to dig the entire hole, but please stop me if you have misgivings :) [18:27:52] I think we need a patch now that fixes the immediate fatal behavior.
However we do that is optional. [18:28:10] Then we can follow up with a "remove the stuff entirely" or an "actually fix it" once someone has figured out the root cause [18:28:19] Scratching our heads at the root cause is getting us nowhere [18:29:15] JimConsultant: is there no way to get a stack trace? [18:29:36] JimConsultant: awight: here is a bug that looks siiiimilar, maybe? https://phabricator.wikimedia.org/T75474 [18:29:37] I tried [18:29:46] Some kind of HHVM tuning? [18:29:48] 3operations, Phabricator: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1034741 (10Aklapper) Thanks Chase for working on this, this is damn good! Only thing I'm also wondering is if enough people understand what MediaWiki is in "Me... [18:30:01] Not really. [18:30:11] PROBLEM - Host restbase1004 is DOWN: PING CRITICAL - Packet loss = 100% [18:30:13] AndyRussG: why is that related? [18:30:16] I mean similar in that it's accessing a property that doesn't exist. [18:30:20] But not related [18:30:31] I'm sure hhvm does a little bit of static analysis but the fatal error indicates run-timeliness [18:30:36] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034744 (10Tnegrin) approved [18:30:54] This is all runtime. [18:31:09] JimConsultant: awight: I'm just really concerned that this dead code is being called and we have no idea how. That's messed up [18:31:14] seriously. [18:31:27] and apparently not via web request, although it's on all the app servers [18:31:43] Yeah! that's just weirding me out [18:32:01] It's obviously not as dead as we think :) [18:32:12] So what can we do to alleviate the problem until that root cause is figured out?
[18:32:14] Agreed that we should dry out the gremlins [18:32:22] RECOVERY - Host restbase1004 is UP: PING OK - Packet loss = 0%, RTA = 2.58 ms [18:32:23] I did a grep through all code in my local deploy staging repo for SpecialBannerRandom, nothing there [18:32:33] JimConsultant: we're working on a patch ASAP to disable that special page [18:32:35] Anyone wanna see if it's somehow there on tin? [18:32:49] Let's summon... ori? yt? [18:33:35] ori: hi! We have this: https://phabricator.wikimedia.org/T89258 ... but that code is supposed to be fully dead and never run.. any thoughts? [18:33:55] awight: Thank you [18:35:11] ori: AndyRussG ^^ and we hear there are virtually no web requests to that URL, yet many fatal errors per second are appearing in hhvm.log. [18:37:56] JimConsultant: what about a hotfix that sends a stack trace into the log or somewhere? You could just turn it off and on reeaaaly fast [18:38:08] awight: ^ [18:38:14] I tried that and got no stack trace. [18:38:18] whoa [18:38:38] I see how it gets called [18:38:40] It's very clear now [18:38:44] JimConsultant: I mean, you took out the type hint [18:38:45] ? [18:39:00] hey csteipp [18:39:13] * AndyRussG nearly falls off edge of seat.. [18:39:16] Well, sorta. SpecialBannerRandom::execute() gets called [18:39:21] (that's unclear) [18:39:31] PROBLEM - Host restbase1004 is DOWN: PING CRITICAL - Packet loss = 100% [18:39:37] JimConsultant: what is calling it? If that's a webrequest, why do the requests not appear in the log? (according to rumor) [18:39:42] JimConsultant: yeah see that should never happen [18:39:57] AndyRussG: well if there's a web request, it should happen [18:40:06] awight: right but no web requests [18:40:18] ipso facto... [18:40:23] Maybe we should be double-checking that fact? [18:40:37] Something else could be calling execute() on a special page possibly. Dunno why [18:41:26] JimConsultant: right, so we gotta track that down [18:41:46] rogue browser tests :)?
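[Editor's note: "many fatal errors per second" above can be quantified by bucketing log lines per minute. A sketch, assuming syslog-style timestamps like the "Feb 10 23:38:41" first occurrence quoted earlier; the sample lines below are fabricated for the demo.]

```shell
# Bucket fatal lines per minute to put a number on the error rate.
log=/tmp/fatal-rate-sketch.log
cat > "$log" <<'EOF'
Feb 10 23:38:41 mw1020: Fatal error: BannerChooser::__construct() ...
Feb 10 23:38:42 mw1033: Fatal error: BannerChooser::__construct() ...
Feb 10 23:39:05 mw1020: Fatal error: BannerChooser::__construct() ...
EOF
# substr($3, 1, 5) truncates "23:38:41" to the minute "23:38"
awk '/BannerChooser/ {print $1, $2, substr($3, 1, 5)}' "$log" | sort | uniq -c
```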
[18:42:01] I have an idea for a debug point, sec [18:42:02] nah that would be a web request [18:42:05] running at 100/sec on the cluster! [18:43:16] I'm grepping staging on tin [18:44:09] !log demon Synchronized php-1.25wmf16/extensions/CentralNotice/special/SpecialBannerRandom.php: live hack (duration: 00m 08s) [18:44:15] Logged the message, Master [18:44:30] THERE we go [18:44:44] O_o [18:45:40] https://phabricator.wikimedia.org/P289 [18:45:47] Much better info as to when it's being called [18:46:05] hehehe [18:46:32] fwiw, throwing the exception there shuts up the fatal [18:46:36] So that's clearly it [18:47:01] 3operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1034772 (10RobH) 3NEW a:3RobH [18:47:01] !log demon Synchronized php-1.25wmf16/extensions/CentralNotice/special/SpecialBannerRandom.php: rm live hack, have our data (duration: 00m 06s) [18:47:01] So if it's being called from web requests, why are there rumored to not be any request logs to this endpoint? [18:47:04] Logged the message, Master [18:47:27] 3operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1034772 (10RobH) Placement discussion is still ongoing (no answer yet) in IRC. I'm keeping this assigned to me until we make some decisions on using row D. [18:47:38] (03PS1) 10GWicke: Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/190254 [18:47:43] lastlog requuest [18:47:47] doh [18:47:51] awight: JimConsultant: can we get the request URL? [18:48:04] AndyRussG: it's in there, /wiki/Special:BannerRandom?uselang=en&sitename=Wikipedia&project=wikipedia&anonymous=true&bucket=0&country=ID&device=desktop&slot=20 [18:48:05] you did in that stacktrace [18:48:14] And it was metawiki [18:48:18] (obvs) [18:48:27] godog, akosiaris: submodule update @ https://gerrit.wikimedia.org/r/190254 [18:48:37] Right sorry.. [18:48:53] Maybe the referer? 
[18:48:59] Not in this log [18:49:00] So ahem, can we take another look at the weblogs for Special:BannerRandom? [18:49:19] Anyway, I think request logs aren't all that interesting... [18:49:26] I think we need the unsampled logs then [18:49:37] JimConsultant: well, it's just that we heard that there were only 3 requests to this endpoint [18:49:41] JimConsultant: has it been going down at all since the config change? [18:49:43] that seems either impossible or very bad. [18:49:57] awight: I think it's most likely cached JS on anons calling the old code path [18:50:05] Maybe Special:BannerRandom is also filtered from the logs when they're sampled? [18:50:35] JimConsultant: but that was 4 months ago... and it started when we did the deploy [18:50:37] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/190254 (owner: 10GWicke) [18:50:43] AndyRussG: oooh, we did have a udp2log grep on that forever, now [18:51:00] afaik though, that doesn't remove the requests from the general log. [18:51:20] gwicke: so it'll be always two changes with submodules eh? sigh [18:52:40] So it must be the sampled logs are wrong and there _are_ millions of these things flying in per second [18:53:00] AndyRussG: CSS/JS can potentially get cached for up to 90 days. [18:53:09] (it's why we keep old branches around on bits for awhile) [18:53:28] That's 3 months right there [18:54:00] JimConsultant: still doesn't explain why it started on Tuesday [18:54:12] AndyRussG: cos I broke the S:BannerRandom endpoint... [18:54:24] previously it would have succeeded [18:54:28] That ^ [18:54:43] (03CR) 10Jsahleen: "Can't really comment on this because I don't know the scope of users that would be affected by the change. It seems trivial, but if it bre" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190203 (owner: 10Se4598) [18:55:34] s/90/30/ in theory for varnish cache. 
Not sure if we send longer cache duration to the browser directly [18:55:58] godog: sadly, yes [18:56:09] awight: JimConsultant: hmmm... so, this is ancient cached JS continuing to call the now-broken endpoint? [18:56:21] awight: what was that sha of that patch again? [18:56:29] godog: we have been doing the same with the deploy repos; it gets a bit old after a while [18:56:59] but don't have a magic better solution either [18:57:08] Krenair: what's up? [18:57:17] AndyRussG: Yes, that's my thought [18:57:18] apart from some automated process to always pull the latest [18:57:36] bd808: Is it really only 30 now? Eh, it was 90 [18:57:38] * JimConsultant shrugs [18:57:51] AndyRussG: bc023091fe4c788a8695e78d768151ee7692b11f [18:58:41] csteipp, have been watching the exception logs and I've seen a load coming out of WebRequest->checkUrlExtension() [18:58:59] JimConsultant: I'm pretty sure 30 but I'd have to dig up the old email I wrote that explained it in gory detail. I figured it all out when I was advocating for weekly dead branch purging [18:59:06] shall I pm you the details? [18:59:11] gwicke: the magic better solution is no submodules :P [18:59:24] it's trivial to reproduce the error [18:59:24] Krenair: Yeah [19:00:06] 3operations, Phabricator: Mysql search issues flagged by Phabricator setup - https://phabricator.wikimedia.org/T89274#1034816 (10Chad) [19:00:52] awight: JimConsultant: the new banner choosing system config shows up on the mediawiki-config repo on Nov. 25, here: fcd536d2cfe4b18d1a0f64856a0947160ffd999b [19:00:56] bd808: Heh, even if it's 30, my point remains..."longer than you'd think" [19:00:58] JimConsultant: Yup, found the email. 30 days in varnish for anons (at best; could be evicted sooner) [19:01:26] godog: that has its downsides too.. but yeah. 
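The thread above hinges on whether the sampled request logs can be trusted (the rumor of "virtually no web requests" to Special:BannerRandom versus fatals at 100/sec). A minimal sketch of the cross-check being discussed, counting Special:BannerRandom hits in a sampled versus an unsampled request log; the log paths are placeholders, not the actual udp2log locations:

```shell
# Count Special:BannerRandom hits in a request log.
# Prints 0 if the file is missing or unreadable.
count_hits() {
    if [ -r "$1" ]; then
        grep -c 'Special:BannerRandom' "$1"
    else
        echo 0
    fi
}

# Paths below are illustrative assumptions only.
sampled=$(count_hits /a/log/webrequest/sampled-1000.tsv)
unsampled=$(count_hits /a/log/webrequest/all.tsv)
echo "sampled=$sampled unsampled=$unsampled"
```

A large gap between the two counts would confirm the suspicion voiced above that the endpoint is filtered (or badly under-represented) in the sampled stream.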
[19:01:57] I wrote about it in great detail in an email titled "Cleaning up old MW branches on tin and apaches" to ops-l about a year ago [19:02:26] * bd808 should clean that up and put it on a wiki somewhere [19:03:34] 3operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1034823 (10RobH) a:5RobH>3faidon Update: I'd like to use row D, and thus keep our consistency of what service types go in what racks throughout the codfw buildout. That consistency is as follows: A1:... [19:07:06] PROBLEM - NTP on restbase1005 is CRITICAL: NTP CRITICAL: Offset unknown [19:08:15] 3RESTBase: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1034833 (10GWicke) [19:12:24] bd808: I remember that email; I was rooting for the cleanup [19:13:08] 90 days is quite a long time to keep old branches around [19:13:32] 3RESTBase, Ops-Access-Requests: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1034843 (10GWicke) [19:14:10] JimConsultant: I'm diving into some changes to the JS code that went out with this deploy. I'm guessing that some JS error is making the code sometimes think that it should revert to the old system when it shouldn't [19:15:38] 3operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1034853 (10RobH) @papaul: Don't take this task quite yet, but you CAN go ahead and rack the db systems that will fit into C6-codfw. The issue is where to put the remainder. [19:16:33] JimConsultant: Is it OK to let the log errors continue for a few hours while we try some stuff and deploy a possible fix? 
[19:16:40] Yeah [19:16:45] Important thing is it's being worked on :) [19:18:47] !log demon Synchronized php-1.25wmf16/extensions/CentralNotice/special/SpecialBannerRandom.php: rm live hack leftovers, now being worked on (duration: 00m 05s) [19:18:56] Logged the message, Master [19:18:57] Ok, I cleaned all of my debug hacks out [19:20:21] JimConsultant: OK thanks!! I'll keep u posted :) [19:20:32] PROBLEM - Host restbase1005 is DOWN: PING CRITICAL - Packet loss = 100% [19:20:41] that's me [19:21:00] (03PS4) 10Dzahn: misc-web-lb changes to support servermon.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/188389 (https://phabricator.wikimedia.org/T88427) (owner: 10RobH) [19:21:02] RECOVERY - Host restbase1005 is UP: PING WARNING - Packet loss = 80%, RTA = 1.53 ms [19:21:22] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1034863 (10JAllemandou) Hi Alexandros, I read, comprehend, and sign the Acknowledgement of Wikimedia Server Access Responsibilities document a few hours ago :) I have also already signed up on wik... [19:21:28] 3operations, Continuous-Integration: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1034864 (10hashar) [19:26:49] (03PS5) 10Dzahn: misc-web-lb changes to support servermon.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/188389 (https://phabricator.wikimedia.org/T88427) (owner: 10RobH) [19:31:57] 3operations, Phabricator: Mysql search issues flagged by Phabricator setup - https://phabricator.wikimedia.org/T89274#1034896 (10thiemowmde) +1 to the word length (I just reported this, see T89369) and +1 to the operator. To be honest I would classify the fact that a search engine defaults to OR as a bug nowaday... [19:43:23] (03CR) 10Dzahn: "this was quite a rebase because meanwhile other backends had been added and this attempted to also sort them alphabetically. 
i removed tha" [puppet] - 10https://gerrit.wikimedia.org/r/188389 (https://phabricator.wikimedia.org/T88427) (owner: 10RobH) [19:44:10] PROBLEM - DPKG on restbase1006 is CRITICAL: Connection refused by host [19:44:39] PROBLEM - Disk space on restbase1006 is CRITICAL: Connection refused by host [19:44:48] (03CR) 10Dzahn: [C: 032] misc-web-lb changes to support servermon.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/188389 (https://phabricator.wikimedia.org/T88427) (owner: 10RobH) [19:49:07] paravoid: WRT the logs you checked out earlier, is it possible that Special:RandomBanner was somehow filtered out in the sampling process, and that it's much more commonly requested than it seemed? Would it be possible to check unsampled logs, by chance, especially from Tuesday onward? thanks in advance! [19:49:20] 3operations, Datasets-General-or-Unknown: dumps.wikimedia.org seems super-slow right now - https://phabricator.wikimedia.org/T45647#1034952 (10Nemo_bis) 5Resolved>3Open Happening again. Henrik Abelsson wrote: > I don't know, but I get similar speeds both from the (colocated) server running stats.grok.se and... 
[19:51:34] 3Ops-Access-Requests: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1034958 (10greg) 3NEW [19:51:54] 3operations, Datasets-General-or-Unknown: dumps.wikimedia.org seems super-slow right now - https://phabricator.wikimedia.org/T45647#1034965 (10ggellerman) [19:54:40] (03PS1) 10RobH: setting db2043-db2070 [dns] - 10https://gerrit.wikimedia.org/r/190272 [19:55:48] (03CR) 10RobH: [C: 032] setting db2043-db2070 [dns] - 10https://gerrit.wikimedia.org/r/190272 (owner: 10RobH) [19:57:26] (03CR) 10Kaldari: [C: 031] Enable gather extension on en beta labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189863 (owner: 10Robmoen) [19:58:26] (03PS1) 10Dzahn: servermon: enforce https when behind misc-web [puppet] - 10https://gerrit.wikimedia.org/r/190276 [19:59:19] (03PS2) 10Dzahn: servermon: enforce https when behind misc-web [puppet] - 10https://gerrit.wikimedia.org/r/190276 (https://phabricator.wikimedia.org/T88427) [20:01:03] 3Ops-Access-Requests: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1034999 (10RobH) It appears that @thcipriani isn't yet a user on our cluster, so we'll need to follow the steps on: https://wikitech.wikimedia.org/wiki/Requesting_shell_access Pl...
[20:02:15] (03CR) 10Dzahn: [C: 032] "we do the same thing in several other places already, straight copy from kibana/doc/and others" [puppet] - 10https://gerrit.wikimedia.org/r/190276 (https://phabricator.wikimedia.org/T88427) (owner: 10Dzahn) [20:04:06] (03CR) 10Dzahn: [C: 032] switch servermon to misc-web [dns] - 10https://gerrit.wikimedia.org/r/188723 (https://phabricator.wikimedia.org/T88427) (owner: 10Dzahn) [20:05:06] !log moving servermon behind misc-web [20:05:13] Logged the message, Master [20:09:24] 3operations, ops-eqiad: decom cp1037,cp1038,cp1039,cp1040 - https://phabricator.wikimedia.org/T87800#1035010 (10Cmjohnson) The disks have been wiped, updated racktables and server spares. [20:09:33] (03PS1) 10Dzahn: servermon: include Apache mod_headers [puppet] - 10https://gerrit.wikimedia.org/r/190280 (https://phabricator.wikimedia.org/T88427) [20:10:35] 3Ops-Access-Requests: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1035015 (10RobH) p:5Triage>3Normal [20:10:47] (03CR) 10Dzahn: [C: 032] servermon: include Apache mod_headers [puppet] - 10https://gerrit.wikimedia.org/r/190280 (https://phabricator.wikimedia.org/T88427) (owner: 10Dzahn) [20:10:54] (03PS1) 10Cmjohnson: Removing old cp1037-1040 mgmt entries, the asset tags remain [dns] - 10https://gerrit.wikimedia.org/r/190282 [20:11:19] (03PS2) 10Dzahn: servermon: include Apache mod_headers [puppet] - 10https://gerrit.wikimedia.org/r/190280 (https://phabricator.wikimedia.org/T88427) [20:12:08] (03CR) 10Cmjohnson: [C: 032] Removing old cp1037-1040 mgmt entries, the asset tags remain [dns] - 10https://gerrit.wikimedia.org/r/190282 (owner: 10Cmjohnson) [20:13:04] 3operations, ops-eqiad: decom cp1037,cp1038,cp1039,cp1040 - https://phabricator.wikimedia.org/T87800#1035022 (10Cmjohnson) Removed cp10xx.mgmt and left the asset tags [20:13:15] 3operations, ops-eqiad: decom cp1037,cp1038,cp1039,cp1040 - https://phabricator.wikimedia.org/T87800#1035023 
(10Cmjohnson) 5Open>3Resolved [20:13:39] 3operations, ops-eqiad: dysprosium failed idrac - https://phabricator.wikimedia.org/T88129#1035024 (10Cmjohnson) the idrac license we have doesn't work. This will require a call to Dell. [20:17:37] (03PS1) 10Ori.livneh: vbench: control for garbage collection [puppet] - 10https://gerrit.wikimedia.org/r/190285 [20:20:45] (03PS2) 10Ori.livneh: vbench: control for garbage collection [puppet] - 10https://gerrit.wikimedia.org/r/190285 [20:21:20] (03CR) 10Ori.livneh: [C: 032 V: 032] vbench: control for garbage collection [puppet] - 10https://gerrit.wikimedia.org/r/190285 (owner: 10Ori.livneh) [20:21:41] 3operations: Move servermon.wikimedia.org behind misc-web - https://phabricator.wikimedia.org/T88427#1035031 (10Dzahn) servermon.wikimedia.org is an alias for misc-web-lb.eqiad.wikimedia.org. .. * About to connect() to servermon.wikimedia.org port 443 (#0) .. * Connected to servermon.wikimedia.org (208.80.154.2... [20:21:44] 3Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1035032 (10Cmjohnson) Great News! The old cp box that just broke (idrac related) has 4 sticks of 8GB RAM that will work. Let me know when you want to add/swap. [20:22:02] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035042 (10RobH) @nuria While this may have been an oversight, we still have set policies to follow on these, outlined here: https://wikitech.wikimedia.org/wiki/Requesting_shell_access#Escalating_Existing_Shel... 
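The servermon HTTPS enforcement being rolled out and spot-checked with curl above can be made a repeatable check. A sketch under the assumption that the response headers are captured separately (e.g. via `curl -sI`), so the check itself needs no network access; the hostname in the usage comment is just the case from this log:

```shell
# Succeed iff the given response headers show a 301 redirect whose
# Location is an https:// URL, i.e. the proto-enforcement behaviour
# expected once a service sits behind misc-web.
# Assumed usage (requires network access):
#   is_https_redirect "$(curl -sI http://servermon.wikimedia.org)"
is_https_redirect() {
    printf '%s\n' "$1" | grep -q '^HTTP/[0-9.]* 301' &&
    printf '%s\n' "$1" | grep -qi '^Location: https://'
}
```

The same one-liner would apply to librenms or any other service moved behind misc-web later.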
[20:22:26] 3operations: Migrate racktables to servermon - https://phabricator.wikimedia.org/T88424#1035046 (10Dzahn) [20:22:28] 3operations: Move servermon.wikimedia.org behind misc-web - https://phabricator.wikimedia.org/T88427#1035045 (10Dzahn) 5Open>3Resolved [20:30:57] (03PS1) 10Dzahn: servermon: turn RewriteEngine on for proto redirect [puppet] - 10https://gerrit.wikimedia.org/r/190298 (https://phabricator.wikimedia.org/T88427) [20:31:35] (03CR) 10Dzahn: [C: 032] servermon: turn RewriteEngine on for proto redirect [puppet] - 10https://gerrit.wikimedia.org/r/190298 (https://phabricator.wikimedia.org/T88427) (owner: 10Dzahn) [20:34:33] 3operations: Move servermon.wikimedia.org behind misc-web - https://phabricator.wikimedia.org/T88427#1035059 (10Dzahn) curl -vvv http://servermon.wikimedia.org .. < HTTP/1.1 301 Moved Permanently < Server: Apache/2.2.22 (Ubuntu) < Vary: X-Forwarded-Proto,Accept-Encoding < Location: https://servermon.wikimedia.or... [20:36:32] 3operations: Move servermon.wikimedia.org behind misc-web - https://phabricator.wikimedia.org/T88427#1035076 (10Dzahn) @akosiaris now same for librenms? [20:38:58] PROBLEM - Disk space on einsteinium is CRITICAL: DISK CRITICAL - free space: /home/smalyshev/morespace 3776 MB (3% inode=99%): [20:39:38] please ignore einsteinium diskspace warnings - it's ok [20:41:22] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035102 (10Nuria) Signed document. FYI, I am not really looking for wide privileges here, just the bare minimum to do my work. It is unfortunate that Eventlogging requires sudo to do operations like tail lo...
[20:49:15] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035117 (10Tnegrin) approved by manager [20:51:21] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035123 (10RobH) I think we have everything we need to process this, pending the rubberstamping during Monday ops meeting. (Since this is simply mirroring the access @nuria has on another system, I don't expec... [20:52:05] 3Project-Creators, operations, OTRS: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1035125 (10Krenair) I did #otrs [20:57:06] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035143 (10RobH) I'm also not seeing where @nuria's rights for sudo are being assigned yet. eventlogging-roots: gid: 739 description: Full root on EventLogging servers. members: [] privileges... [20:58:14] nuria: im totally not trying to make your request hard, but it seems i dunno how the heck you have sudo on the one and not the other ;D [20:58:26] (i still expect to get this handled, im actually just trying to put in a patchset so on monday its just merged) [20:58:27] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035144 (10Nuria) Yes, I have sudo on vanadium, I'm making usage of it right now to tail logs. [20:58:52] robh: no worries, scrutinizing access requests is a GOOD thing [20:59:11] yea i just dont see where the system applies your current sudo [20:59:15] =P [20:59:39] i'll dig through the role files, perhaps its declared in an odd place [20:59:47] robh: I get it, just letting you know that i'm not in a quest for power, for me the less privileges i have, the better [21:00:00] robh: eventlogging admins? [21:00:06] robh: is there such a thing?
[21:00:35] yep [21:00:37] but you arent in the array [21:00:45] and it applies to both hafnium and vanadium equally [21:00:50] its why its so odd you have one and not the other. [21:01:26] so yea, my asking all this shit now isnt me saying you cannot have it. you obviously have it on half the systems you need so i support you getting it properly distributed across them both [21:01:36] im just trying to figure out the implementation [21:01:48] its just odd that it works at all for you tbh [21:02:54] i think maybe you have it in place cuz its old cruft from a previous implementation of the role [21:03:02] and when it was updated, you just werent copied over appropriately [21:03:14] I could test this now, but i rather wait and test on monday before i push the change [21:03:20] so i dont break your access today ;D [21:04:31] (the test would be to pull your sudoer file and then rerun puppet ;) [21:04:33] (03CR) 10GWicke: restbase: switch to new partitioning scheme (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190182 (https://phabricator.wikimedia.org/T76986) (owner: 10Filippo Giunchedi) [21:04:48] so i'm interested enough that im going to claim your ticket to implement on monday [21:06:53] 3Ops-Access-Requests: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1035162 (10RobH) a:3RobH Ops note: It seems that nuria should be added to eventlogging-roots, which will give sudo on JUST these two systems (vanadium, hafnium). However, there is no puppet reference that... [21:08:22] robh: wait maybe i got sudo there now?
[21:08:26] robh: andrewbogott_afk and I talked about this before [21:08:30] and he logged a ticket that basically said [21:08:33] robh: i just tried recently and could ssh but had no sudo [21:08:41] nuria has perms in some places that seem to not have made it into puppet correctly [21:08:55] he was trying to figure out the correct perms for millimetric at the time [21:09:05] but I don't know where the train of investigation ended [21:09:10] I guess wherever you are seeing things :) [21:10:05] it makes me wonder how many other permissions may not have been properly cleaned up [21:10:07] and worries me [21:10:33] in this case, nuria's permissions are not a big deal, trusted user, etc... but that doesnt mean it doesnt bug me that i cannot see where it was implemented, heh [21:10:53] ... can i just say this is far more interesting than dead hard disks ;D [21:11:16] (which my past roles had me handle more of, this is far more fun) [21:11:27] robh: confirming, NO sudo in hafnium -as of yet- [21:11:55] yea, so you have a file in /etc/sudoer/d on vanadium. I am convinced it was put into place by some older puppetized versions of user implementation [21:12:10] but considering how wonky user implmentation was before the admin module implementation, this isnt surprising. [21:12:20] robh: you probably checked that ages ago, or it isn't even possible in your puppet-setup, but is there any chance, someone just added him to the /etc/sudoers manually? 
;) [21:12:33] Trminator: its totally possible [21:12:46] i think that is what happened before on another box [21:12:49] i was giving my coworkers the benefit of the doubt and doing it in puppet a long time ago ;D [21:13:07] but either way im making a task now to audit the /etc/sudoers across cluster and compare to data.yml [21:13:23] so when i find them going forward im totally going to kill them with fire [21:13:40] nuria: not yours, by benefit of being patient 0 you get a pass until we fix it properly on monday ;D [21:13:42] robh: well, clean way would be to tell puppet to clear out etc sudoer.d before adding new stuff? ;) [21:14:02] Trminator: agreed 100%, but when we first pushed the new admin module we avoided removing user accounts with it [21:14:09] since we were transitioning systems to it [21:14:11] ahh, yea [21:14:20] makes sense :) [21:14:20] robh: no rush on my end though, so i can wait until you guys fix it properly [21:14:24] its been discussed in past in place ops meeting in january that we may want to start enforcing and removing [21:14:31] robh: as long as my sudo on vanadium remains [21:14:50] nuria: yep, im not taking it away, and im only waiting until monday for the sudo addition due to following my own policy ;D [21:15:01] well, just remove the sudo, and check who comes screaming? ;) [21:15:31] (not talking about nuria but generally any non-puppet added sudo) [21:16:15] yea, thats what i plan to do, just not this week [21:16:25] i rather just clear the action with my team before i generate potential upset users [21:16:32] users being those with shell. [21:17:12] (03PS1) 10Chad: Add a few more aliases to my .gitconfig [puppet] - 10https://gerrit.wikimedia.org/r/190350 [21:18:00] robh: ofc. [21:18:38] but man i like the idea. [21:18:41] heh [21:19:33] someone able to do a super easy puppet merge? https://gerrit.wikimedia.org/r/190350 is just a dotfile for me [21:21:14] I dont do work for consultants!
[21:21:16] ;D [21:21:32] doing now [21:21:47] (03CR) 10RobH: [C: 032] Add a few more aliases to my .gitconfig [puppet] - 10https://gerrit.wikimedia.org/r/190350 (owner: 10Chad) [21:22:19] JimConsultant: live on palladium [21:22:24] ty sir [21:22:38] welcome [21:22:52] praise = blame ?:) [21:23:36] Depending on my mood :p [21:24:35] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1035215 (10Ottomata) Joseph will need to be in the following groups: - analytics-admins - analytics-privatedata-users - statistics-privatedata-users - statistics-users - statistics-web-users - stat... [21:36:26] 3Ops-Access-Requests: Requesting access to ANALYTICS RESOURCES for joal - https://phabricator.wikimedia.org/T89357#1035277 (10RobH) As the analytics-admins and statistics-admins grants limited user sudo rights, this should be listed for approval during the operations meeting (per https://wikitech.wikimedia.org/w... [21:50:04] 3operations, hardware-requests, Wikimedia-Logstash: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1035355 (10RobH) 3NEW a:3RobH [21:50:39] 3hardware-requests, operations, Wikimedia-Logstash: Production hardware for Logstash service - https://phabricator.wikimedia.org/T84958#1035376 (10RobH) [21:51:22] 3operations, hardware-requests, Wikimedia-Logstash: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1035355 (10RobH) [21:51:23] 3hardware-requests, operations, Wikimedia-Logstash: Allocate temporary Elasticsearch nodes from spares pool for Logstash - https://phabricator.wikimedia.org/T87460#1035379 (10RobH) [21:52:21] w00t! 
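The audit robh floats above (compare /etc/sudoers.d across the cluster against puppet's data.yml) could start as something like the sketch below. The one-file-per-user naming convention and both paths are assumptions for illustration, and this is a blunt grep-level check rather than a real parse of either format:

```shell
# Print one "unmanaged: <name>" line per sudoers.d entry whose name
# does not appear anywhere in the puppet admin data file.
audit_sudoers() {
    sudoers_dir=$1
    data_yml=$2
    for f in "$sudoers_dir"/*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        grep -q "$name" "$data_yml" || echo "unmanaged: $name"
    done
}
```

On a real host this might run as `audit_sudoers /etc/sudoers.d /path/to/admin/data.yaml` (hypothetical paths); anything it prints is a candidate for the kill-with-fire cleanup discussed above.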
[21:55:07] 3hardware-requests, operations, Wikimedia-Logstash: Production hardware for Logstash service - https://phabricator.wikimedia.org/T84958#1035393 (10RobH) [21:55:08] 3operations, hardware-requests, Wikimedia-Logstash: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1035391 (10RobH) 5Open>3stalled The RT ticket for this order is: https://rt.wikimedia.org/Ticket/Display.html?id=9199 I am setting this task to stalled while the actual procur... [21:56:23] (03PS1) 10Ori.livneh: vbench: add devwiki configuration [puppet] - 10https://gerrit.wikimedia.org/r/190356 [22:00:21] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [22:00:48] (03CR) 10Ori.livneh: [C: 032] vbench: add devwiki configuration [puppet] - 10https://gerrit.wikimedia.org/r/190356 (owner: 10Ori.livneh) [22:02:32] isn't devwiki overlapping with wikitech? [22:03:33] (03PS1) 10Amire80: Enable EducationProgram in the Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190357 [22:04:15] (03CR) 10Dzahn: "what's the bug for this one? is this for the dev portal page?" [puppet] - 10https://gerrit.wikimedia.org/r/190356 (owner: 10Ori.livneh) [22:04:35] (03PS1) 10Cmjohnson: Adding new dns entries for mc1007-1016 (Do not merge until _joe_ reviews) [dns] - 10https://gerrit.wikimedia.org/r/190358 [22:05:19] mutante: it's a private test rig [22:05:35] not a public wiki [22:06:37] (03CR) 10Dzahn: "yep, kind of expected that.
though when i split i "why don't you also do ..."" [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [22:07:01] ori: sounds mysterious:) [22:07:40] mutante: i can show you if you like :) [22:10:03] (03Abandoned) 10Ottomata: Add ananthrk to statistics-users group [puppet] - 10https://gerrit.wikimedia.org/r/183047 (owner: 10Ottomata) [22:11:22] (03PS2) 10Amire80: Enable EducationProgram in the Hebrew Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190357 [22:13:56] (03CR) 10Dzahn: "you are right though, back to "per role" patches i guess. can't easily prove it right with puppet compiler either because i would have to " [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [22:14:01] (03Abandoned) 10Dzahn: fix all 'variable not enclosed by {}' [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [22:14:56] greg-g, hi, was there a big data consumption spike between december 23rd and january 16th? milimetric? [22:15:01] (03CR) 10Dzahn: "that said, if i made as many patches as this needed per role i'm pretty sure it would be disliked as well" [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [22:15:29] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [22:19:45] (03CR) 10Dzahn: [C: 04-2] "include role::nova::manager on virt1000 which isn't correct anymore" [puppet] - 10https://gerrit.wikimedia.org/r/189779 (owner: 10Dzahn) [22:19:48] (03Abandoned) 10Dzahn: site.pp: fix lint errors/warns (puppet-lint 1.1) [puppet] - 10https://gerrit.wikimedia.org/r/189779 (owner: 10Dzahn) [22:19:49] (03CR) 10Dzahn: [C: 031] "just renamed the project in phab.
like non-capitalized better anyways:)" [puppet] - 10https://gerrit.wikimedia.org/r/189140 (https://phabricator.wikimedia.org/T88842) (owner: 10Dzahn) [22:24:29] 3operations, ops-codfw: take a look at fdb2001 (in fundraising rack) and see whether it actually has a bad hdd - https://phabricator.wikimedia.org/T89407#1035444 (10Jgreen) 3NEW [22:26:06] Hi! Anyone want to help figure out more about the CentralNotice hhvm.log issue? [22:30:58] (03PS2) 10Dzahn: mediawiki: add codfw monitoring groups [puppet] - 10https://gerrit.wikimedia.org/r/188895 (https://phabricator.wikimedia.org/T86894) [22:33:55] (03CR) 10Dzahn: "why can't something outside a module reference a file in a module? is that really worse than having icinga files in completely unrelated " [puppet] - 10https://gerrit.wikimedia.org/r/187087 (owner: 10Dzahn) [22:39:49] (03CR) 10Dzahn: "bump? any reviews?" [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [22:41:53] (03CR) 10BryanDavis: "Upstream change merged but beta still needs the associated puppet changes" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190246 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [22:43:42] (03CR) 10Thcipriani: [C: 031] "Seems like this code makes the same change in production as I845730650a7861e6713783c01a5ac699a9f4db09 made on testing. Worked fine."
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/190246 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [22:48:50] (03PS3) 10Dzahn: phab: direct_comments_allowed for Domains tickets [puppet] - 10https://gerrit.wikimedia.org/r/189140 (https://phabricator.wikimedia.org/T88842) [22:51:44] (03CR) 10Dzahn: [C: 032] phab: direct_comments_allowed for Domains tickets [puppet] - 10https://gerrit.wikimedia.org/r/189140 (https://phabricator.wikimedia.org/T88842) (owner: 10Dzahn) [22:55:30] !log restarting phab for config change [22:55:35] Logged the message, Master [22:59:49] 3Project-Creators, operations, OTRS: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1035535 (10Springle) +1 to a DBA or Databases tag. Although given our future includes various databases like MariaDB, Cassandra and Titan-er-some-graphdb, perhaps ta... [23:01:25] (03CR) 10Dzahn: [C: 032] Add a mobile subdomain for wikitech [dns] - 10https://gerrit.wikimedia.org/r/189761 (https://phabricator.wikimedia.org/T87633) (owner: 10MaxSem) [23:02:57] 3Project-Creators, operations, OTRS: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1035537 (10hashar) [23:05:46] is this normal/ok in the mediawiki-config repo? [23:05:52] php: broken symbolic link to `php-1.25wmf15' [23:05:59] p: broken symbolic link to `php' [23:06:07] wmf-deployment: broken symbolic link to `php' [23:08:28] (03CR) 10Hashar: "Instead maybe you can have each patch to fix several errors but only impact a single module ?" [puppet] - 10https://gerrit.wikimedia.org/r/189898 (owner: 10Dzahn) [23:09:57] mutante, is wikitech now behind the generic varnishes (text/mobile)?
[23:10:03] MaxSem: no [23:10:07] (03PS1) 10Dzahn: enable MobileFrontend on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190373 [23:10:18] MaxSem: still has public IP, just moved to silver [23:10:32] mmmm [23:10:46] yea, i dunno, i also thought it was supposed to be "like cluster" [23:11:13] MW will just redirect to desktop domain without hostname rewrites in varnish [23:11:43] (03CR) 10Hashar: "Asked Nik on T85964" [puppet] - 10https://gerrit.wikimedia.org/r/183222 (https://phabricator.wikimedia.org/T85964) (owner: 10Hashar) [23:11:45] yea, good, that's why i merged it. i figured it doesnt hurt either way [23:11:51] yup:P [23:11:54] but enables us to do more [23:12:14] worst case, revert and flush teh varnish:P [23:13:51] mutante, if there's no HTTP cache in front of it, we can just enable PHP autodetection:P [23:20:36] mutante, weeeeeee http://wikitech.m.wikimedia.org/ [23:20:51] that's apache [23:21:18] 3hardware-requests, ops-codfw, operations: Procure and setup rbf2001-2002 - https://phabricator.wikimedia.org/T86897#1035612 (10RobH) [23:21:20] 3operations, ops-codfw: reclaim rbf2002/WMF5833 back to spare, allocate WMF5845 as rbf2002 - https://phabricator.wikimedia.org/T88380#1035610 (10RobH) 5Open>3Resolved No clue, it works now, so resolved. [23:22:54] MaxSem: :) no error [23:24:08] 3operations, ops-codfw: rack/wire/initial setup of db2043-db2070 - https://phabricator.wikimedia.org/T89368#1035618 (10Papaul) ok will start with C6 tomorrow [23:25:13] preparing a config change.... [23:28:51] 3operations, ops-codfw: take a look at fdb2001 (in fundraising rack) and see whether it actually has a bad hdd - https://phabricator.wikimedia.org/T89407#1035629 (10Papaul) Yes there is a bad drive. I will contact HP tomorrow for replacement.
[23:29:50] (03PS1) 10MaxSem: Enable PHP-based autodetection on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190380 [23:29:57] mutante, ^ [23:30:08] kaldari, ^^^ :P [23:33:07] (03CR) 10Kaldari: [C: 031] Enable PHP-based autodetection on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190380 (owner: 10MaxSem) [23:36:41] (03PS1) 10Ori.livneh: vbench: update shell wrapper [puppet] - 10https://gerrit.wikimedia.org/r/190382 [23:37:37] (03PS2) 10Ori.livneh: vbench: update shell wrapper [puppet] - 10https://gerrit.wikimedia.org/r/190382 [23:38:23] 3operations: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1035650 (10Halfak) 3NEW [23:38:54] (03CR) 10Ori.livneh: [C: 032] vbench: update shell wrapper [puppet] - 10https://gerrit.wikimedia.org/r/190382 (owner: 10Ori.livneh) [23:50:51] (03PS1) 10Dzahn: phab: direct_comments from wikimedia.org for 'domains' [puppet] - 10https://gerrit.wikimedia.org/r/190383 [23:58:58] mutante, are you ok with deploying MF this SWAT?