[00:08:49] legoktm: I've got a great idea. Write a script to make 150 mw-config commits, each just changing one require_once() to wfLoadExtension() and not depending on each other. The backlog will shame us into merging as many of them as possible, and the not-yet-ready outstanding ones will shame us into making them ready. :-)
[00:09:02] :(
[00:09:36] RECOVERY - grafana.wikimedia.org on krypton is OK: HTTP OK: HTTP/1.1 200 OK - 1485 bytes in 0.007 second response time
[00:10:17] !log apache restart on krypton
[00:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:10:25] RECOVERY - HTTP on krypton is OK: HTTP OK: HTTP/1.1 200 OK - 1485 bytes in 0.002 second response time
[00:11:54] mutante: :)
[00:12:48] (03CR) 10Dzahn: "not needed anymore - fixed by https://gerrit.wikimedia.org/r/#/c/230682/" [puppet] - 10https://gerrit.wikimedia.org/r/230664 (owner: 10Dzahn)
[00:12:58] (03Abandoned) 10Dzahn: Revert "grafana: add role to krypton (VM)" [puppet] - 10https://gerrit.wikimedia.org/r/230664 (owner: 10Dzahn)
[00:13:18] ori: :) thanks
[00:13:38] (03CR) 10Faidon Liambotis: [C: 032] Add A/PTR for mr1-codfw and msw1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230696 (owner: 10Faidon Liambotis)
[00:16:30] (03CR) 10Dzahn: [C: 031] "@krypton:/etc/apache2/sites-enabled# curl localhost 2>/dev/null | grep body" [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008) (owner: 10Dzahn)
[00:18:54] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1526267 (10Krenair)
[00:19:57] (03PS2) 10Dzahn: misc-web: switch grafana to backend krypton [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008)
[00:21:08] (03PS4) 10Ori.livneh: Enforce a hard limit on RestbaseUpdateJobOnDependencyChange retries [puppet] - 10https://gerrit.wikimedia.org/r/226901 (https://phabricator.wikimedia.org/T73853) (owner: 10GWicke)
[00:21:17] (03CR) 10Ori.livneh: [C: 032 V: 032] Enforce a hard limit on RestbaseUpdateJobOnDependencyChange retries [puppet] - 10https://gerrit.wikimedia.org/r/226901 (https://phabricator.wikimedia.org/T73853) (owner: 10GWicke)
[00:23:07] (03PS1) 10Tim Starling: Enable ParsoidBatchAPI everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230708
[00:24:18] (03CR) 10Tim Starling: [C: 04-2] "Can be deployed once the ParsoidBatchAPI source tree is available in all active deployment branches." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230708 (owner: 10Tim Starling)
[00:25:40] (03CR) 10Dzahn: "i get:" [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008) (owner: 10Dzahn)
[00:33:05] (03PS1) 10Dzahn: OTRS: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230709
[00:36:36] PROBLEM - Disk space on cp1054 is CRITICAL: DISK CRITICAL - free space: / 342 MB (3% inode=88%)
[00:38:50] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: stat1002 access for tgr - https://phabricator.wikimedia.org/T108417#1526317 (10Dzahn) p:5Triage>3Normal
[00:39:02] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: stat1002 access for tgr - https://phabricator.wikimedia.org/T108417#1526320 (10Dzahn) a:3ArielGlenn
[00:39:02] bblack: cp1054 disk ^ ?
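legoktm's opening suggestion is mechanical enough to actually script. A minimal sketch of the idea in Python, assuming a local mw-config checkout; the config path, regex, and branch naming are illustrative, not the real wmf-config layout:

```python
#!/usr/bin/env python3
"""Sketch of legoktm's idea: one independent commit per conversion of
require_once() to wfLoadExtension(). Paths, regex and branch names are
hypothetical, not the actual mw-config layout."""
import re
import subprocess

CONFIG = 'wmf-config/CommonSettings.php'  # hypothetical path
LOAD_RE = re.compile(r'require_once\( "\$IP/extensions/(\w+)/\1\.php" \);')

with open(CONFIG) as f:
    base = f.read()

for m in LOAD_RE.finditer(base):
    call, ext = m.group(0), m.group(1)
    # Branch each change off master so the commits don't depend on each
    # other and can be merged in any order.
    subprocess.check_call(['git', 'checkout', '-B', 'load-' + ext,
                           'origin/master'])
    with open(CONFIG, 'w') as f:
        f.write(base.replace(call, "wfLoadExtension( '%s' );" % ext))
    subprocess.check_call(['git', 'commit', '-a', '-m',
                           'Use wfLoadExtension() for %s' % ext])
    # Each branch would then be pushed to Gerrit as its own change.
```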
[00:39:40] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Access to stat1002 for csteipp - https://phabricator.wikimedia.org/T108227#1526322 (10Dzahn) p:5Triage>3Normal
[00:39:51] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Access to stat1002 for csteipp - https://phabricator.wikimedia.org/T108227#1526324 (10Dzahn) a:3ArielGlenn
[00:42:29] 10Ops-Access-Reviews: Analytics-users membership for csteipp - https://phabricator.wikimedia.org/T108351#1526326 (10Dzahn) a:3ArielGlenn
[00:48:36] RECOVERY - carbon-cache too many creates on graphite1001 is OK Less than 1.00% above the threshold [500.0]
[00:50:41] (03PS1) 10Faidon Liambotis: Add pybal-testsvc.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/230713
[00:51:32] 6operations, 10Traffic: Evaluate Apache Traffic Server - https://phabricator.wikimedia.org/T96853#1526345 (10Dzahn) "comparision: Serving small static files with Nginx, Varnish, G-WAN, Lighthttpd, Apache Traffic Server" https://x443.wordpress.com/2012/07/07/comparision-serving-small-static-files-with-nginx-va...
[00:52:36] (03CR) 10Ori.livneh: [C: 031] "weee" [dns] - 10https://gerrit.wikimedia.org/r/230713 (owner: 10Faidon Liambotis)
[00:54:52] (03CR) 10Faidon Liambotis: [C: 032] Add pybal-testsvc.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/230713 (owner: 10Faidon Liambotis)
[01:00:01] (03CR) 10Dzahn: "in hiera it would be public, i would assume it needs to be private in the private puppet repo and then read from there like we do for othe" [puppet] - 10https://gerrit.wikimedia.org/r/230549 (https://phabricator.wikimedia.org/T108610) (owner: 10Yurik)
[01:01:05] RECOVERY - Disk space on cp1054 is OK: DISK OK
[01:06:44] 6operations, 5Patch-For-Review: move grafana from zirconium to a VM - https://phabricator.wikimedia.org/T105008#1526397 (10Dzahn) that was fixed by https://gerrit.wikimedia.org/r/#/c/230682/ so the role is applied on krypton but "Could not contact Elasticsearch. Please ensure that Elasticsearch is reachable...
[01:08:26] (03PS1) 10BryanDavis: logging: Only send info and higher to logstash by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719
[01:09:56] (03CR) 10BryanDavis: logging: Only send info and higher to logstash by default (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719 (owner: 10BryanDavis)
[01:17:36] (03CR) 10Yurik: "Existing osm_importer password is pulled via" [puppet] - 10https://gerrit.wikimedia.org/r/230549 (https://phabricator.wikimedia.org/T108610) (owner: 10Yurik)
[01:20:08] !log ori@tin Synchronized php-1.26wmf17/includes/resourceloader/ResourceLoader.php: I2089b21fc: Revert resourceloader: Add must-revalidate to Cache-Control (duration: 00m 12s)
[01:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:05:45] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:07:45] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 18880 bytes in 0.044 second response time
[02:08:19] flappy flap flap
[02:08:25] page =P
[02:23:33] !log l10nupdate@tin Synchronized php-1.26wmf17/cache/l10n: l10nupdate for 1.26wmf17 (duration: 06m 48s)
[02:23:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:59] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf17) at 2015-08-11 02:26:58+00:00
[02:27:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:54:06] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:56:06] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 18879 bytes in 1.054 second response time
[03:44:46] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 6.67% of data above the critical threshold [500.0]
[03:49:34] (03PS1) 10BBlack: strongswan: decrease log spam [puppet] - 10https://gerrit.wikimedia.org/r/230725
[03:50:50] (03CR) 10BBlack: [C: 032] strongswan: decrease log spam [puppet] - 10https://gerrit.wikimedia.org/r/230725 (owner: 10BBlack)
[03:52:56] (03PS1) 10BBlack: bugfix for aaa20e4ce [puppet] - 10https://gerrit.wikimedia.org/r/230726
[03:53:13] (03CR) 10BBlack: [C: 032 V: 032] bugfix for aaa20e4ce [puppet] - 10https://gerrit.wikimedia.org/r/230726 (owner: 10BBlack)
[03:54:47] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[03:56:46] PROBLEM - puppet last run on cp1055 is CRITICAL puppet fail
[03:57:47] PROBLEM - puppet last run on cp3045 is CRITICAL puppet fail
[03:58:26] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail
[03:58:27] PROBLEM - puppet last run on cp3012 is CRITICAL puppet fail
[03:58:45] those will self-correct
[03:58:55] I think! :)
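The mobile-lb.eqiad.wikimedia.org_ipv6 flaps above are an HTTPS check timing out once and passing again two minutes later. A sketch of what such a probe boils down to, forcing the IPv6 path to match the check name; the timeout and the bare status-line check are assumptions, not the actual Icinga plugin:

```python
#!/usr/bin/env python3
"""Sketch of an HTTPS health probe like the flapping LVS check above.
The hostname comes from the alert; everything else is assumed."""
import socket
import ssl
import time

HOST = 'mobile-lb.eqiad.wikimedia.org'

def probe(host, timeout=10):
    # Resolve explicitly over IPv6, matching the "_ipv6" check name.
    family, type_, proto, _, addr = socket.getaddrinfo(
        host, 443, socket.AF_INET6, socket.SOCK_STREAM)[0]
    start = time.time()
    ctx = ssl.create_default_context()
    with socket.socket(family, type_, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(('GET / HTTP/1.1\r\nHost: %s\r\n'
                         'Connection: close\r\n\r\n' % host).encode())
            status_line = tls.recv(4096).split(b'\r\n', 1)[0]
    return status_line, time.time() - start

print(probe(HOST))
```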
[04:00:26] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures
[04:02:05] RECOVERY - puppet last run on cp3045 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:02:56] RECOVERY - puppet last run on cp1055 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:04:46] RECOVERY - puppet last run on cp3012 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:36:41] (03PS3) 10BBlack: rename varnish backends more-explicitly [puppet] - 10https://gerrit.wikimedia.org/r/230687
[04:36:52] (03PS4) 10BBlack: rename varnish backends more-explicitly [puppet] - 10https://gerrit.wikimedia.org/r/230687
[04:37:50] (03CR) 10BBlack: [C: 032] rename varnish backends more-explicitly [puppet] - 10https://gerrit.wikimedia.org/r/230687 (owner: 10BBlack)
[04:40:28] (03PS1) 10BBlack: Revert "Revert "cache::config: replace lvs IP refs with service hostnames"" [puppet] - 10https://gerrit.wikimedia.org/r/230728
[04:41:21] (03PS2) 10BBlack: Revert "Revert "cache::config: replace lvs IP refs with service hostnames"" [puppet] - 10https://gerrit.wikimedia.org/r/230728
[04:43:26] PROBLEM - puppet last run on cp1069 is CRITICAL Puppet has 1 failures
[04:44:39] (03PS1) 10EBernhardson: fix incorrect whitespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230729
[04:44:55] PROBLEM - puppet last run on cp3035 is CRITICAL Puppet has 2 failures
[04:45:27] PROBLEM - puppet last run on cp4007 is CRITICAL Puppet has 1 failures
[04:46:27] PROBLEM - puppet last run on cp2007 is CRITICAL Puppet has 1 failures
[04:46:56] RECOVERY - puppet last run on cp3035 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:47:26] PROBLEM - puppet last run on cp3034 is CRITICAL Puppet has 2 failures
[04:47:26] PROBLEM - puppet last run on cp2021 is CRITICAL Puppet has 1 failures
[04:48:26] PROBLEM - puppet last run on cp2023 is CRITICAL Puppet has 1 failures
[04:48:26] RECOVERY - puppet last run on cp2007 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[04:48:57] PROBLEM - puppet last run on cp3010 is CRITICAL Puppet has 2 failures
[04:49:05] PROBLEM - puppet last run on cp2026 is CRITICAL Puppet has 1 failures
[04:49:16] PROBLEM - puppet last run on cp3040 is CRITICAL Puppet has 2 failures
[04:49:26] PROBLEM - puppet last run on cp3047 is CRITICAL Puppet has 2 failures
[04:49:35] RECOVERY - puppet last run on cp4007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:49:56] PROBLEM - puppet last run on cp2011 is CRITICAL Puppet has 1 failures
[04:50:15] PROBLEM - puppet last run on cp3042 is CRITICAL Puppet has 2 failures
[04:50:16] PROBLEM - puppet last run on cp3018 is CRITICAL Puppet has 2 failures
[04:50:26] PROBLEM - puppet last run on cp2003 is CRITICAL Puppet has 1 failures
[04:50:30] on the bulk of them it's a race condition that will work itself out...
[04:51:15] PROBLEM - puppet last run on cp2016 is CRITICAL Puppet has 1 failures
[04:51:27] PROBLEM - puppet last run on cp2014 is CRITICAL Puppet has 1 failures
[04:51:36] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 1 failures
[04:51:56] PROBLEM - puppet last run on cp2010 is CRITICAL Puppet has 1 failures
[04:52:06] PROBLEM - puppet last run on cp4019 is CRITICAL Puppet has 2 failures
[04:52:16] PROBLEM - puppet last run on cp3030 is CRITICAL Puppet has 2 failures
[04:52:16] PROBLEM - puppet last run on cp1058 is CRITICAL Puppet has 1 failures
[04:52:45] PROBLEM - puppet last run on cp4015 is CRITICAL Puppet has 1 failures
[04:53:16] PROBLEM - puppet last run on cp4017 is CRITICAL Puppet has 1 failures
[04:53:16] PROBLEM - puppet last run on cp3006 is CRITICAL Puppet has 2 failures
[04:53:36] PROBLEM - puppet last run on cp4005 is CRITICAL Puppet has 1 failures
[04:54:16] PROBLEM - puppet last run on cp3009 is CRITICAL Puppet has 2 failures
[04:54:25] PROBLEM - puppet last run on cp2019 is CRITICAL Puppet has 1 failures
[04:54:26] PROBLEM - puppet last run on cp2022 is CRITICAL Puppet has 1 failures
[04:54:35] PROBLEM - puppet last run on cp2015 is CRITICAL Puppet has 1 failures
[04:56:16] PROBLEM - puppet last run on cp3046 is CRITICAL Puppet has 2 failures
[04:56:17] PROBLEM - puppet last run on cp3013 is CRITICAL Puppet has 2 failures
[04:56:17] PROBLEM - puppet last run on cp3031 is CRITICAL Puppet has 2 failures
[04:56:55] PROBLEM - puppet last run on cp3045 is CRITICAL Puppet has 2 failures
[04:56:56] PROBLEM - puppet last run on cp2017 is CRITICAL Puppet has 1 failures
[04:59:06] PROBLEM - puppet last run on cp3016 is CRITICAL Puppet has 2 failures
[04:59:26] PROBLEM - puppet last run on cp3039 is CRITICAL Puppet has 2 failures
[04:59:39] PROBLEM - puppet last run on cp3012 is CRITICAL Puppet has 2 failures
[04:59:46] PROBLEM - puppet last run on cp4009 is CRITICAL Puppet has 1 failures
[05:00:46] PROBLEM - puppet last run on cp2002 is CRITICAL Puppet has 1 failures
[05:01:06] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 2 failures
[05:01:45] PROBLEM - puppet last run on cp3048 is CRITICAL Puppet has 2 failures
[05:01:46] PROBLEM - puppet last run on cp2013 is CRITICAL Puppet has 1 failures
[05:01:56] PROBLEM - puppet last run on cp1057 is CRITICAL Puppet has 1 failures
[05:02:05] PROBLEM - puppet last run on cp3007 is CRITICAL Puppet has 2 failures
[05:02:05] PROBLEM - puppet last run on cp3017 is CRITICAL Puppet has 2 failures
[05:02:35] PROBLEM - puppet last run on cp2001 is CRITICAL Puppet has 1 failures
[05:02:36] PROBLEM - puppet last run on cp4016 is CRITICAL Puppet has 1 failures
[05:02:37] PROBLEM - puppet last run on cp4010 is CRITICAL Puppet has 1 failures
[05:03:16] PROBLEM - puppet last run on cp3036 is CRITICAL Puppet has 2 failures
[05:03:56] PROBLEM - puppet last run on cp3032 is CRITICAL Puppet has 2 failures
[05:04:06] PROBLEM - puppet last run on cp3003 is CRITICAL Puppet has 2 failures
[05:05:15] PROBLEM - puppet last run on cp4018 is CRITICAL Puppet has 2 failures
[05:05:15] PROBLEM - puppet last run on cp4013 is CRITICAL Puppet has 2 failures
[05:05:55] PROBLEM - puppet last run on cp2009 is CRITICAL Puppet has 1 failures
[05:06:45] PROBLEM - puppet last run on cp3015 is CRITICAL Puppet has 2 failures
[05:06:56] PROBLEM - puppet last run on cp2004 is CRITICAL Puppet has 1 failures
[05:07:16] PROBLEM - puppet last run on cp3049 is CRITICAL Puppet has 2 failures
[05:07:56] PROBLEM - puppet last run on cp2024 is CRITICAL Puppet has 1 failures
[05:08:17] PROBLEM - puppet last run on cp1070 is CRITICAL Puppet has 1 failures
[05:09:16] PROBLEM - puppet last run on cp4006 is CRITICAL Puppet has 1 failures
[05:09:36] PROBLEM - puppet last run on cp4020 is CRITICAL Puppet has 2 failures
[05:09:46] PROBLEM - puppet last run on cp2005 is CRITICAL Puppet has 1 failures
[05:10:05] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 2 failures
[05:10:36] PROBLEM - puppet last run on cp2008 is CRITICAL Puppet has 1 failures
[05:10:46] PROBLEM - puppet last run on cp3038 is CRITICAL Puppet has 2 failures
[05:10:55] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 2 failures
[05:10:55] PROBLEM - puppet last run on cp1056 is CRITICAL Puppet has 1 failures
[05:11:16] PROBLEM - puppet last run on cp4008 is CRITICAL Puppet has 2 failures
[05:11:46] PROBLEM - puppet last run on cp4012 is CRITICAL Puppet has 2 failures
[05:11:55] PROBLEM - puppet last run on cp3004 is CRITICAL Puppet has 2 failures
[05:11:56] PROBLEM - puppet last run on cp2020 is CRITICAL Puppet has 1 failures
[05:12:05] RECOVERY - puppet last run on cp3034 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:12:06] RECOVERY - puppet last run on cp2021 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:12:16] PROBLEM - puppet last run on cp3041 is CRITICAL Puppet has 2 failures
[05:12:46] PROBLEM - puppet last run on cp3033 is CRITICAL Puppet has 2 failures
[05:13:25] PROBLEM - puppet last run on cp3005 is CRITICAL Puppet has 2 failures
[05:13:47] PROBLEM - puppet last run on cp4011 is CRITICAL Puppet has 2 failures
[05:13:55] PROBLEM - puppet last run on cp3043 is CRITICAL Puppet has 2 failures
[05:14:16] PROBLEM - puppet last run on cp3044 is CRITICAL Puppet has 2 failures
[05:14:56] RECOVERY - puppet last run on cp3018 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures
[05:15:05] RECOVERY - puppet last run on cp2023 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:15:21] (03PS1) 10BBlack: cache_misc: convert backends to director-style [puppet] - 10https://gerrit.wikimedia.org/r/230730
[05:15:36] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures
[05:15:36] RECOVERY - puppet last run on cp2026 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:15:47] RECOVERY - puppet last run on cp2016 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures
[05:15:56] RECOVERY - puppet last run on cp3040 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures
[05:16:06] RECOVERY - puppet last run on cp3047 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:16:37] RECOVERY - puppet last run on cp2011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:16:46] RECOVERY - salt-minion processes on labstore1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:16:55] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:16:55] RECOVERY - puppet last run on cp1058 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[05:17:06] RECOVERY - puppet last run on cp2003 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures
[05:17:25] RECOVERY - puppet last run on cp4015 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures
[05:17:56] RECOVERY - puppet last run on cp3006 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[05:18:15] RECOVERY - puppet last run on cp2014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:18:16] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:18:27] (03CR) 10BBlack: [C: 032] cache_misc: convert backends to director-style [puppet] - 10https://gerrit.wikimedia.org/r/230730 (owner: 10BBlack)
[05:18:36] RECOVERY - puppet last run on cp2010 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures
[05:18:46] RECOVERY - puppet last run on cp4019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:18:56] RECOVERY - puppet last run on cp3030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:19:05] RECOVERY - puppet last run on cp2022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:19:56] RECOVERY - puppet last run on cp4017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:20:06] RECOVERY - puppet last run on cp1069 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures
[05:20:16] RECOVERY - puppet last run on cp4005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:20:16] (03PS3) 10BBlack: Revert "Revert "cache::config: replace lvs IP refs with service hostnames"" [puppet] - 10https://gerrit.wikimedia.org/r/230728
[05:20:56] RECOVERY - puppet last run on cp3031 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures
[05:20:57] RECOVERY - puppet last run on cp3013 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[05:20:57] RECOVERY - puppet last run on cp3009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:21:05] RECOVERY - puppet last run on cp2019 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[05:21:12] (03CR) 10BBlack: [C: 032] Revert "Revert "cache::config: replace lvs IP refs with service hostnames"" [puppet] - 10https://gerrit.wikimedia.org/r/230728 (owner: 10BBlack)
[05:21:16] RECOVERY - puppet last run on cp2015 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures
[05:21:35] RECOVERY - puppet last run on cp2017 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures
[05:22:46] PROBLEM - salt-minion processes on labstore1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:22:57] RECOVERY - puppet last run on cp3046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:27:07] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 2 failures
[05:27:07] PROBLEM - puppet last run on cp1056 is CRITICAL Puppet has 1 failures
[05:27:15] PROBLEM - puppet last run on cp2002 is CRITICAL Puppet has 1 failures
[05:27:16] PROBLEM - puppet last run on cp2004 is CRITICAL Puppet has 1 failures
[05:27:33] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 11 05:27:33 UTC 2015 (duration 27m 32s)
[05:27:35] PROBLEM - puppet last run on cp4018 is CRITICAL Puppet has 2 failures
[05:27:36] PROBLEM - puppet last run on cp4006 is CRITICAL Puppet has 1 failures
[05:27:36] PROBLEM - puppet last run on cp4008 is CRITICAL Puppet has 2 failures
[05:27:36] PROBLEM - puppet last run on cp4013 is CRITICAL Puppet has 2 failures
[05:27:36] PROBLEM - puppet last run on cp3036 is CRITICAL Puppet has 2 failures
[05:27:37] PROBLEM - puppet last run on cp3049 is CRITICAL Puppet has 2 failures
[05:27:37] RECOVERY - puppet last run on cp3045 is OK Puppet is currently enabled, last run 5 minutes ago with 0 failures
[05:27:37] PROBLEM - puppet last run on cp3005 is CRITICAL Puppet has 2 failures
[05:27:37] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 2 failures
[05:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:27:46] PROBLEM - puppet last run on cp4020 is CRITICAL Puppet has 2 failures
[05:27:46] RECOVERY - puppet last run on cp3016 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:28:06] PROBLEM - puppet last run on cp2005 is CRITICAL Puppet has 1 failures
[05:28:06] PROBLEM - puppet last run on cp4012 is CRITICAL Puppet has 2 failures
[05:28:06] PROBLEM - puppet last run on cp4011 is CRITICAL Puppet has 2 failures
[05:28:07] PROBLEM - puppet last run on cp3043 is CRITICAL Puppet has 2 failures
[05:28:07] PROBLEM - puppet last run on cp3048 is CRITICAL Puppet has 2 failures
[05:28:07] PROBLEM - puppet last run on cp3004 is CRITICAL Puppet has 2 failures
[05:28:07] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 4 minutes ago with 0 failures
[05:28:15] PROBLEM - puppet last run on cp2020 is CRITICAL Puppet has 1 failures
[05:28:15] PROBLEM - puppet last run on cp2024 is CRITICAL Puppet has 1 failures
[05:28:15] PROBLEM - puppet last run on cp2013 is CRITICAL Puppet has 1 failures
[05:28:15] PROBLEM - puppet last run on cp2009 is CRITICAL Puppet has 1 failures
[05:28:16] PROBLEM - puppet last run on cp1057 is CRITICAL Puppet has 1 failures
[05:28:16] PROBLEM - puppet last run on cp3032 is CRITICAL Puppet has 2 failures
[05:28:17] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 2 failures
[05:28:17] RECOVERY - puppet last run on cp3012 is OK Puppet is currently enabled, last run 4 minutes ago with 0 failures
[05:28:26] RECOVERY - puppet last run on cp4009 is OK Puppet is currently enabled, last run 3 minutes ago with 0 failures
[05:28:26] PROBLEM - puppet last run on cp3044 is CRITICAL Puppet has 2 failures
[05:28:27] PROBLEM - puppet last run on cp3041 is CRITICAL Puppet has 2 failures
[05:28:27] PROBLEM - puppet last run on cp3007 is CRITICAL Puppet has 2 failures
[05:28:27] PROBLEM - puppet last run on cp3017 is CRITICAL Puppet has 2 failures
[05:28:27] PROBLEM - puppet last run on cp3003 is CRITICAL Puppet has 2 failures
[05:28:27] PROBLEM - puppet last run on cp1070 is CRITICAL Puppet has 1 failures
[05:28:56] PROBLEM - puppet last run on cp2008 is CRITICAL Puppet has 1 failures
[05:28:56] PROBLEM - puppet last run on cp2001 is CRITICAL Puppet has 1 failures
[05:28:56] PROBLEM - puppet last run on cp4016 is CRITICAL Puppet has 1 failures
[05:29:06] PROBLEM - puppet last run on cp4010 is CRITICAL Puppet has 1 failures
[05:29:06] PROBLEM - puppet last run on cp3038 is CRITICAL Puppet has 2 failures
[05:29:06] PROBLEM - puppet last run on cp3033 is CRITICAL Puppet has 2 failures
[05:29:06] PROBLEM - puppet last run on cp3015 is CRITICAL Puppet has 2 failures
[05:29:07] RECOVERY - puppet last run on cp1056 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures
[05:29:19] that the puppet check re-emits old failures when you flip the local disable switch makes it even more spammy :P
[05:29:37] RECOVERY - puppet last run on cp4018 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:29:37] RECOVERY - puppet last run on cp4013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:29:39] salting puppets all over the place now just to clean it all faster
[05:30:16] RECOVERY - puppet last run on cp2024 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures
[05:30:25] RECOVERY - puppet last run on cp2013 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[05:30:26] RECOVERY - puppet last run on cp1057 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:30:26] RECOVERY - puppet last run on cp3032 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures
[05:30:37] RECOVERY - puppet last run on cp3044 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[05:30:37] RECOVERY - puppet last run on cp3003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:31:17] RECOVERY - puppet last run on cp4010 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[05:31:17] RECOVERY - puppet last run on cp3033 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures
[05:31:26] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures
[05:31:36] RECOVERY - puppet last run on cp2002 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[05:31:36] RECOVERY - puppet last run on cp2004 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[05:31:55] RECOVERY - puppet last run on cp4006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:31:55] RECOVERY - puppet last run on cp4008 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[05:31:56] RECOVERY - puppet last run on cp3005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:32:16] RECOVERY - puppet last run on cp2005 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[05:32:26] RECOVERY - puppet last run on cp3048 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures
[05:32:26] RECOVERY - puppet last run on cp2009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:32:46] RECOVERY - puppet last run on cp3041 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures
[05:32:46] RECOVERY - puppet last run on cp3017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:32:46] RECOVERY - puppet last run on cp1070 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[05:33:16] RECOVERY - puppet last run on cp2008 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures
[05:33:25] RECOVERY - puppet last run on cp3015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:33:56] RECOVERY - puppet last run on cp3049 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:33:56] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:06] RECOVERY - puppet last run on cp4020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:25] RECOVERY - puppet last run on cp4011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:26] RECOVERY - puppet last run on cp4012 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures
[05:34:26] RECOVERY - puppet last run on cp3043 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
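"Salting puppets" above means forcing puppet runs through salt so the stale failure states clear faster than the normal agent interval would. A minimal sketch, assuming a salt master that can target the cache hosts; the glob, batch size and agent invocation are illustrative, not the actual command used:

```python
#!/usr/bin/env python3
"""Sketch of the "salting puppets" remark above: batch-run the puppet
agent across cache hosts via salt. Target glob and batch size are
assumptions."""
import subprocess

TARGET = 'cp*'   # illustrative glob for the cache hosts
BATCH = '10%'    # keep load sane; don't hit every host at once

subprocess.check_call([
    'salt', '-b', BATCH, TARGET,
    'cmd.run', 'puppet agent --test --color=false',
])
```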
[05:34:36] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:34:47] RECOVERY - puppet last run on cp3007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:35:15] RECOVERY - puppet last run on cp2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:35:16] RECOVERY - puppet last run on cp4016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:35:25] RECOVERY - puppet last run on cp3038 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures
[05:35:56] RECOVERY - puppet last run on cp3036 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:36:26] RECOVERY - puppet last run on cp2020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:36:26] RECOVERY - puppet last run on cp3004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:45:20] 6operations, 6Analytics-Backlog, 10Analytics-EventLogging, 10Traffic: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1526649 (10awight) > That probably made sense in 2006, when the article that SO post is based on was...
[05:56:47] 6operations, 6Analytics-Backlog, 10Analytics-EventLogging, 10Traffic: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1526651 (10BBlack) Even if browsers allow >2K URLs, they seem like a poor idea in general. Even a 1...
[06:00:00] (03PS2) 10BBlack: tlsproxy: multi_accept off [puppet] - 10https://gerrit.wikimedia.org/r/230553
[06:02:55] (03PS3) 10BBlack: tlsproxy: multi_accept off [puppet] - 10https://gerrit.wikimedia.org/r/230553
[06:03:38] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: multi_accept off [puppet] - 10https://gerrit.wikimedia.org/r/230553 (owner: 10BBlack)
[06:12:20] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1526653 (10BBlack)
[06:21:00] (03CR) 10Matanya: [C: 031] OTRS: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230709 (owner: 10Dzahn)
[06:21:21] 6operations, 10MediaWiki-General-or-Unknown: Stop a poolcounter server fail from being a SPOF for the service and the api (and the site) - https://phabricator.wikimedia.org/T105378#1526661 (10Joe)
[06:30:45] PROBLEM - puppet last run on mc2015 is CRITICAL Puppet has 1 failures
[06:31:17] PROBLEM - puppet last run on lvs1003 is CRITICAL Puppet has 1 failures
[06:31:36] PROBLEM - puppet last run on pybal-test2002 is CRITICAL puppet fail
[06:31:47] PROBLEM - puppet last run on db2055 is CRITICAL Puppet has 1 failures
[06:32:46] PROBLEM - puppet last run on mw2145 is CRITICAL Puppet has 1 failures
[06:32:46] PROBLEM - puppet last run on db1045 is CRITICAL Puppet has 1 failures
[06:33:06] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 1 failures
[06:33:25] PROBLEM - puppet last run on mw2018 is CRITICAL Puppet has 1 failures
[06:33:26] PROBLEM - puppet last run on mw1110 is CRITICAL Puppet has 1 failures
[06:33:55] PROBLEM - puppet last run on mw2158 is CRITICAL Puppet has 1 failures
[06:33:56] PROBLEM - puppet last run on mw2129 is CRITICAL Puppet has 1 failures
[06:37:40] puppet o'clock!
[06:53:44] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1526679 (10BBlack) Probably the most important question (since I haven't really looked at UrlShortener) is: are there subdomains involved, or just `...
[06:55:26] RECOVERY - puppet last run on lvs1003 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:56:55] RECOVERY - puppet last run on mw2145 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:56:56] RECOVERY - puppet last run on db1045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:56] RECOVERY - puppet last run on mc2015 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:57:16] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:57:35] RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:57:35] RECOVERY - puppet last run on mw2018 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] RECOVERY - puppet last run on db2055 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:05] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:56] RECOVERY - puppet last run on pybal-test2002 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:06:19] (03PS1) 10BBlack: no varnish::instance uses "backends" directly anymore [puppet] - 10https://gerrit.wikimedia.org/r/230733
[07:10:27] 6operations, 10ContentTranslation-cxserver, 6Language-Engineering, 10MediaWiki-extensions-ContentTranslation, and 2 others: Apertium leaves a ton of stale processes, consumes all the available - https://phabricator.wikimedia.org/T107270#1526686 (10Arrbee) a:3KartikMistry
[07:15:34] 6operations, 6Analytics-Backlog, 10Analytics-EventLogging, 10Traffic: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1526691 (10Tgr) So why don't we just use POST? `sendBeacon` actually does that, we just abuse it cur...
[07:31:52] 6operations, 6Analytics-Backlog, 10Analytics-EventLogging, 10Traffic: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1526718 (10BBlack) Probably because beacon is used with the analytics pipeline rather than the appse...
[07:51:22] (03PS1) 10ArielGlenn: add iridum to dumps rsync clients [puppet] - 10https://gerrit.wikimedia.org/r/230734
[07:53:56] (03CR) 10ArielGlenn: [C: 032] add iridum to dumps rsync clients [puppet] - 10https://gerrit.wikimedia.org/r/230734 (owner: 10ArielGlenn)
[08:30:37] 6operations, 10MediaWiki-General-or-Unknown: Stop a poolcounter server fail from being a SPOF for the service and the api (and the site) - https://phabricator.wikimedia.org/T105378#1526743 (10fgiunchedi) a:5fgiunchedi>3Joe @joe has kindly agreed to investigate this, he's been already bouncing ideas with @t...
[08:31:16] (03CR) 10Jcrespo: [C: 031] Change test for log_type to a list [software] - 10https://gerrit.wikimedia.org/r/230645 (owner: 10coren)
[08:43:59] (03CR) 10Filippo Giunchedi: [C: 031] "thanks Bryan for the explanation!" [puppet] - 10https://gerrit.wikimedia.org/r/230233 (https://phabricator.wikimedia.org/T100735) (owner: 10BryanDavis)
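Tgr's point on T91347 above is that moving the payload out of the query string into a POST body (which `sendBeacon` already supports) makes the ~1014-byte URL truncation irrelevant. A server-side sketch of the same idea; the endpoint URL and payload are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch of Tgr's suggestion above: POST the event instead of packing
it into the query string. The endpoint and schema are hypothetical."""
import json
import urllib.request

EVENT = {
    'schema': 'ExampleSchema',
    'event': {'field': 'x' * 2000},  # would overflow a 1014-byte URL
}

req = urllib.request.Request(
    'https://example.wikimedia.org/beacon',  # hypothetical endpoint
    data=json.dumps(EVENT).encode(),
    headers={'Content-Type': 'application/json'},
    method='POST',
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```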
[08:45:01] (03CR) 10Filippo Giunchedi: "also to clarify, since logstash pushes directly to statsd we can avoid per-host stats for now since the host where it comes from isn't int" [puppet] - 10https://gerrit.wikimedia.org/r/230233 (https://phabricator.wikimedia.org/T100735) (owner: 10BryanDavis)
[08:47:53] (03PS2) 10ArielGlenn: Add Chris Steipp to analytics-users [puppet] - 10https://gerrit.wikimedia.org/r/230142 (https://phabricator.wikimedia.org/T108227) (owner: 10Andrew Bogott)
[08:49:13] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Access to stat1002 for csteipp - https://phabricator.wikimedia.org/T108227#1526758 (10ArielGlenn) updated patchset. I miscounted days after manager approval so I guess it's tomorrow that this can go out.
[08:49:43] Anyone from ops around?
[08:50:37] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1526759 (10fgiunchedi) >>! In T103335#1524694, @brion wrote: > Sample command line for VP9->ogv co...
[08:51:45] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1526761 (10ArielGlenn) @BBlack, is this something I can hand to you?
[08:51:55] (03CR) 10Alexandros Kosiaris: [C: 04-1] "2 linting comments inline, also move this to the role, not in the module. Other services might want to use the same module and configure f" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/227960 (https://phabricator.wikimedia.org/T104964) (owner: 10Muehlenhoff)
[08:53:51] 7Blocked-on-Operations, 6operations, 5Continuous-Integration-Isolation: Backport python-os-client-config 1.3.0-1 from Debian Sid to jessie-wikimedia - https://phabricator.wikimedia.org/T104967#1526764 (10ArielGlenn) @hashar can you clarify?
[08:56:50] (03PS1) 10Alexandros Kosiaris: ganeti: move role from manifests/ into the role module [puppet] - 10https://gerrit.wikimedia.org/r/230735
[08:56:56] (03CR) 10Alexandros Kosiaris: [C: 031] etherpad: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230686 (owner: 10Dzahn)
[08:57:37] (03CR) 10Alexandros Kosiaris: "Just noting btw, that thanks to mod_access_compat (enabled by default?), etherpad is already on jessie. LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/230686 (owner: 10Dzahn)
[08:59:00] <_joe_> akosiaris: we should really get to the point where we disable mod_access_compat btw
[08:59:03] (03CR) 10Alexandros Kosiaris: [C: 031] "Looks fine to me, there is though a syntax error (missing comma)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/230601 (owner: 10Chad)
[08:59:08] <_joe_> I'm sure it's a perf penalty
[08:59:50] (03CR) 10Alexandros Kosiaris: [C: 032] access: stat1002 access for tgr [puppet] - 10https://gerrit.wikimedia.org/r/230510 (owner: 10Matanya)
[08:59:54] (03PS3) 10Alexandros Kosiaris: access: stat1002 access for tgr [puppet] - 10https://gerrit.wikimedia.org/r/230510 (owner: 10Matanya)
[09:00:19] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] access: stat1002 access for tgr [puppet] - 10https://gerrit.wikimedia.org/r/230510 (owner: 10Matanya)
[09:00:35] _joe_: yup, obviously
[09:00:42] I just never noticed it on etherpad
[09:00:52] also etherpad needs mpm_event
[09:01:02] needs is an overstatement
[09:01:11] but it would be nice to try it
[09:01:21] <_joe_> akosiaris: let's do it then
[09:01:32] <_joe_> why event and not worker, btw?
[09:02:23] I like event more ?
[09:02:33] no seriously both are obviously better than prefork
[09:02:58] but event is webscale!!!
[09:03:56] RECOVERY - puppet last run on ms-be2009 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[09:04:16] seriously, for etherpad there shouldn't be a major difference between the 2
[09:05:04] <_joe_> when you say "$x is webscale" it goes with more exclamation marks and at least one "1"
[09:05:09] but the way etherpad clients (the javascript) works event might actually be just slightly better off
[09:05:39] occasional requests and the like
[09:05:47] _joe_: oh yes, you are right
[09:05:54] but event is webscale!!!!!1111
[09:05:57] better ?
[09:05:59] <_joe_> yes
[09:09:15] !log reboot ms-be2009, cpu soft lockup
[09:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:16:40] (03PS1) 10ArielGlenn: dataset: add redirect for fundraising data and link on web page [puppet] - 10https://gerrit.wikimedia.org/r/230738
[09:18:47] 6operations, 10Wikimedia-Fundraising: Add /fundraising to dumps.wikimedia.org - https://phabricator.wikimedia.org/T42847#1526793 (10ArielGlenn) https://gerrit.wikimedia.org/r/#/c/230738/ for fundraising. not clear what people want to happen with frdata.wm.o
[09:21:48] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: stat1002 access for tgr - https://phabricator.wikimedia.org/T108417#1526795 (10ArielGlenn) 5Open>3Resolved this was merged a little ahead of time but no matter. closing.
[09:22:26] PROBLEM - OCG health on ocg1001 is CRITICAL ocg_job_status 482593 msg: ocg_render_job_queue 3423 msg (=3000 critical)
[09:22:55] PROBLEM - OCG health on ocg1003 is CRITICAL ocg_job_status 483239 msg: ocg_render_job_queue 3646 msg (=3000 critical)
[09:24:05] PROBLEM - OCG health on ocg1002 is CRITICAL ocg_job_status 485316 msg: ocg_render_job_queue 4731 msg (=3000 critical)
[09:33:30] (03PS1) 10ArielGlenn: dumps mirrors rsync conf: remove/update useless comments [puppet] - 10https://gerrit.wikimedia.org/r/230739
[09:34:08] 6operations, 10ops-codfw: ms-be2009 - RAID degraded / failed disk - https://phabricator.wikimedia.org/T107877#1526805 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi thanks @papaul! I'm assuming this is a new disk, anyways after clearing the raid array and mounting the fs the machine crashed, upon reboot t...
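The OTRS and etherpad "make compatible with Apache 2.4" patches, and _joe_'s wish to disable mod_access_compat, revolve around the 2.2-to-2.4 authorization syntax change: `Order allow,deny` plus `Allow from all` becomes `Require all granted`. A rough sketch of a one-shot rewrite over a config tree, covering only the two most common patterns; the path is illustrative:

```python
#!/usr/bin/env python3
"""Sketch: rewrite the common Apache 2.2 access directives to their 2.4
equivalents, the change behind the "compatible with Apache 2.4" patches
above. Only the two simplest patterns are handled; anything fancier
(Allow from <net>, Satisfy, etc.) needs human eyes."""
import pathlib
import re

REWRITES = [
    (re.compile(r'^\s*Order\s+allow,deny\s*\n\s*Allow\s+from\s+all\s*$',
                re.IGNORECASE | re.MULTILINE),
     '    Require all granted'),
    (re.compile(r'^\s*Order\s+deny,allow\s*\n\s*Deny\s+from\s+all\s*$',
                re.IGNORECASE | re.MULTILINE),
     '    Require all denied'),
]

for conf in pathlib.Path('/etc/apache2/sites-available').glob('*.conf'):
    text = conf.read_text()
    new = text
    for pattern, replacement in REWRITES:
        new = pattern.sub(replacement, new)
    if new != text:
        conf.write_text(new)
        print('rewrote', conf)
```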
[09:34:39] (03CR) 10ArielGlenn: [C: 032] dumps mirrors rsync conf: remove/update useless comments [puppet] - 10https://gerrit.wikimedia.org/r/230739 (owner: 10ArielGlenn)
[09:44:21] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1526823 (10fgiunchedi) upgrade plan, starting today: * upgrade row A machines, (restbase100[127]) with `sudo apt-get install cassandra` * check regressions, http://grafana.wikimedia.org/#/...
[09:45:26] 6operations, 10Citoid, 6Security, 6Security-Team, and 2 others: http://citoid.wikimedia.org/ should force HTTPS - https://phabricator.wikimedia.org/T108632#1526828 (10mobrovac) >>! In T108632#1526127, @BBlack wrote: > If you think we can flip the switch now, I'm all for it. I gather from this that if we f...
[09:46:15] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1526829 (10fgiunchedi) ``` root@carbon:~# reprepro --noskipold --restrict cassandra update aptmethod 'http' seems to have a obsoleted redirect handling which causes reprepro to request fil...
[09:50:43] 6operations, 10Datasets-General-or-Unknown: Find docs on dataset mirrors - https://phabricator.wikimedia.org/T107510#1526843 (10ArielGlenn) 5Open>3Resolved that file is dead and at this point wouldn't have useful information in it. if there are complaints about the mirrors, I know that the administrator of...
[09:52:36] 6operations: install / setup tungsten for temp use - wikimania 2015 video transcoding - https://phabricator.wikimedia.org/T106563#1526847 (10ArielGlenn) where are we on this?
[09:56:22] PROBLEM - Host google is DOWN: /bin/ping6 -n -U -w 15 -c 5 google.com
[09:56:35] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 8.67 ms
[09:56:50] !log switched routing-system autonomous-system to eqiad's subAS on cr1-eqiad/cr2-eqiad
[09:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:06:57] (03PS2) 10Filippo Giunchedi: update collector version [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/230582 (https://phabricator.wikimedia.org/T101764) (owner: 10Eevans)
[10:06:58] 6operations, 10Beta-Cluster, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1526856 (10ArielGlenn) @greg so what's the decision; there's also https://phabricator.wikimedia.org/T75919 and https://phabricator.wikimedia.or...
[10:08:26] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, I've updated the code review not to remove the old version because that's racy with puppet in https://gerrit.wikimedia.org/r/#/c/230" [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/230582 (https://phabricator.wikimedia.org/T101764) (owner: 10Eevans)
[10:17:24] 6operations, 6Discovery, 10MediaWiki-Search, 7Monitoring: Search service monitoring should fail if search results only return exact matches and suggestions don't work - https://phabricator.wikimedia.org/T101914#1526896 (10ArielGlenn) it looks like they want to check a request of the form e.g. http://en.wik...
[10:24:13] 6operations, 6Release-Engineering, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1526914 (10ArielGlenn) adding @jcrespo to this to...
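godog's upgrade plan above proceeds one row-A node at a time, which is exactly what the `!log upgrade cassandra on restbase100x` entries that follow show. A sketch of that rolling loop; the host list matches the plan's restbase100[127] shorthand, while the ssh invocations and the Up/Normal check are assumptions, not the actual procedure:

```python
#!/usr/bin/env python3
"""Sketch of the rolling Cassandra upgrade described above: upgrade one
node, wait for it to rejoin, move on. Commands and the "UN" (Up/Normal)
nodetool check are assumptions."""
import subprocess
import time

ROW_A = ['restbase1001', 'restbase1002', 'restbase1007']

def ssh(host, cmd):
    return subprocess.run(['ssh', host, cmd], capture_output=True,
                          text=True).stdout

for host in ROW_A:
    subprocess.run(['ssh', host, 'sudo apt-get install -y cassandra'],
                   check=True)
    ip = ssh(host, 'hostname -i').strip()
    # Wait until nodetool reports this node Up/Normal ("UN") again.
    while not any(line.startswith('UN') and ip in line
                  for line in ssh(host, 'nodetool status').splitlines()):
        time.sleep(30)
    print(host, 'upgraded and rejoined')
```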
[10:25:31] !log upgrade cassandra on restbase1001
[10:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:29:30] 6operations, 6Release-Engineering, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1526919 (10ArielGlenn) who might be able to take o...
[10:30:37] 6operations, 10hardware-requests: eqiad: 1 hardware access request for labs on real hardware - https://phabricator.wikimedia.org/T106731#1526926 (10ArielGlenn) @yuvipanda: can you describe network expertise you need?
[10:31:25] !log upgrade cassandra on restbase1002
[10:31:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:34:17] 6operations, 6Services: SCA: Move logs to /srv/ - https://phabricator.wikimedia.org/T107900#1526928 (10ArielGlenn) do we still want to do this?
[10:35:02] 6operations, 6Release-Engineering, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1526931 (10jcrespo) @ArielGlenn I already talked t...
[10:35:56] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1526933 (10fgiunchedi) cosmetic issue output contains `%s` spotted while looking at the logs, benign ``` restbase1002:~$ grep %s /var/log/cassandra/system.log INFO [MemtableFlushWriter:...
[10:38:15] Forwarding stuff from #wikimedia-tech from the last 45 minutes:
[10:38:19] ContentTranslation-servers seems dead, anything known about that?
[10:38:21] Hello. Are there any issues? 208.80.152.0/22 seems to have dropped off the routing table.
[10:38:37] !log upgrade cassandra on restbase1007
[10:38:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:40:01] andre__: the latter issue is related to a recent network maint, should be fully recovered. no idea for the former
[10:44:44] (03PS2) 10ArielGlenn: remove now obselete snapshot hosts sudoers file [puppet] - 10https://gerrit.wikimedia.org/r/230524
[10:45:21] !log general maintenance on db1042 (restart, upgrade, db reconstruction)
[10:45:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:45:49] (03CR) 10ArielGlenn: [C: 032] remove now obselete snapshot hosts sudoers file [puppet] - 10https://gerrit.wikimedia.org/r/230524 (owner: 10ArielGlenn)
[10:47:56] I'm about to perform a bigdelete at enwiki for a page with +10k revids
[10:48:00] so you're aware
[10:49:59] 7Puppet, 6operations: Clean up files/snapshot/sudoers.snapshot - https://phabricator.wikimedia.org/T107479#1526984 (10ArielGlenn) 5Open>3Resolved it's gone.
[10:52:41] 6operations, 6Release-Engineering, 7Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1526992 (10ArielGlenn) yeah I saw and already remo...
[11:09:22] godog, ah, thanks. will forward that!
[11:24:55] akosiaris, hi, do you know if repl is done?
[11:28:39] 6operations, 6Services: SCA: Move logs to /srv/ - https://phabricator.wikimedia.org/T107900#1527035 (10mobrovac) >>! In T107900#1508656, @GWicke wrote: > ... and [for good reasons](https://wikitech.wikimedia.org/wiki/Incident_documentation/20140211-Parsoid). I think it would be preferable to do the same for ot...
[11:46:25] RECOVERY - OCG health on ocg1003 is OK ocg_job_status 554308 msg: ocg_render_job_queue 479 msg
[11:46:50] akosiaris: can you look at apertium-apy service?
[11:47:25] RECOVERY - OCG health on ocg1002 is OK ocg_job_status 554359 msg: ocg_render_job_queue 0 msg
[11:47:47] RECOVERY - OCG health on ocg1001 is OK ocg_job_status 554393 msg: ocg_render_job_queue 0 msg
[11:49:05] kart_: not sure what you mean. look at what ?
[11:49:18] yurik: no it's not done yet
[11:49:31] akosiaris: anything with it? (ie sca has some issues?)
[11:50:33] kart_: not that I know of
[11:51:20] akosiaris: okay
[11:51:50] akosiaris: It will be great if I can have access to /var/log/apertium till we move to service-runner :)
[11:52:36] 6operations, 10ContentTranslation-cxserver, 6Language-Engineering, 10MediaWiki-extensions-ContentTranslation, and 2 others: Apertium leaves a ton of stale processes, consumes all the available - https://phabricator.wikimedia.org/T107270#1527098 (10Unhammer) I've now implemented the above mentioned option -...
[11:53:16] kart_: wanna file a task ?
[11:58:37] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1527104 (10mark) Looking at the current quotes in the spreadsheet, I think it seems best to move forward with quote 712030866 for 3 instances. We could order with an extra drive carrier (...
[12:03:52] akosiaris: sure.
[12:05:53] 6operations: Access to /var/log/apertium for Kartik - https://phabricator.wikimedia.org/T108678#1527114 (10KartikMistry) 3NEW a:3akosiaris
[12:06:02] akosiaris: ^
[12:12:02] (03PS2) 10BBlack: no varnish::instance uses "backends" directly anymore [puppet] - 10https://gerrit.wikimedia.org/r/230733
[12:12:57] (03CR) 10BBlack: [C: 032] no varnish::instance uses "backends" directly anymore [puppet] - 10https://gerrit.wikimedia.org/r/230733 (owner: 10BBlack)
[12:18:46] (03PS1) 10Faidon Liambotis: Allocate neighbor block for cr2-ulsfo<->cr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230764
[12:19:18] (03CR) 10Faidon Liambotis: [C: 032] Allocate neighbor block for cr2-ulsfo<->cr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230764 (owner: 10Faidon Liambotis)
[12:19:38] (03PS5) 10Faidon Liambotis: Repurpose s/cr2-eqiad/cr1-eqord/ for link with codfw [dns] - 10https://gerrit.wikimedia.org/r/220811
[12:19:42] (03CR) 10Faidon Liambotis: [C: 032] Repurpose s/cr2-eqiad/cr1-eqord/ for link with codfw [dns] - 10https://gerrit.wikimedia.org/r/220811 (owner: 10Faidon Liambotis)
[12:30:51] (03PS8) 10Alexandros Kosiaris: Added tilerator service, granted kartotherian OSM DB read access [puppet] - 10https://gerrit.wikimedia.org/r/229727 (https://phabricator.wikimedia.org/T105074) (owner: 10Yurik)
[12:34:08] (03CR) 10Yurik: Added tilerator service, granted kartotherian OSM DB read access (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/229727 (https://phabricator.wikimedia.org/T105074) (owner: 10Yurik)
[12:35:39] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: HTTPS for internal service traffic - https://phabricator.wikimedia.org/T108580#1527177 (10BBlack)
[12:43:54] 6operations, 10RESTBase, 10RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1527183 (10fgiunchedi) >>! In T95253#1524978, @GWicke wrote: > We talked about this at the last hardware planning meeting. The consensus was to keep things simple for...
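T107270 ("Apertium leaves a ton of stale processes") above is the classic orphaned-children problem: a request times out but the pipeline of helper processes it spawned lives on. The generic fix, of the sort Unhammer's truncated comment alludes to, is to run each pipeline in its own process group and kill the group as a unit; a sketch, with the command and timeout purely illustrative of the technique, not apertium-apy's actual code:

```python
#!/usr/bin/env python3
"""Sketch: run a subprocess pipeline in its own process group and kill
the whole group on timeout, so no stale workers linger (cf. T107270)."""
import os
import signal
import subprocess

def run_with_timeout(cmd, timeout=10):
    # start_new_session=True puts the child (and its own children) in a
    # new process group that we can signal as a unit.
    proc = subprocess.Popen(cmd, start_new_session=True,
                            stdout=subprocess.PIPE)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)  # reap the group
        proc.wait()
        raise

print(run_with_timeout(['echo', 'translated text']))
```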
[12:59:19] 6operations: codfw misc cluster ganglia not working - https://phabricator.wikimedia.org/T108680#1527201 (10BBlack) 3NEW
[13:04:13] (03CR) 10coren: [C: 032] Change test for log_type to a list [software] - 10https://gerrit.wikimedia.org/r/230645 (owner: 10coren)
[13:04:25] (03CR) 10coren: [V: 032] Change test for log_type to a list [software] - 10https://gerrit.wikimedia.org/r/230645 (owner: 10coren)
[13:09:41] (03PS1) 10ArielGlenn: dumps: get rid of one more eval.php call, correct usage message [puppet] - 10https://gerrit.wikimedia.org/r/230767
[13:10:44] (03CR) 10ArielGlenn: [C: 032] dumps: get rid of one more eval.php call, correct usage message [puppet] - 10https://gerrit.wikimedia.org/r/230767 (owner: 10ArielGlenn)
[13:11:33] 6operations, 10Traffic, 7HTTPS: Getting ssl_error_inappropriate_fallback_alert very rarely - https://phabricator.wikimedia.org/T108579#1527216 (10DaBPunkt) >>! In T108579#1524034, @BBlack wrote: > @dabpunkt can you provide details on the client software (browser version, OS version, etc?) and any local softw...
[13:20:13] (03PS1) 10Alexandros Kosiaris: ganeti: assign cluster variable [puppet] - 10https://gerrit.wikimedia.org/r/230768
[13:21:56] (03CR) 10Alexandros Kosiaris: Added tilerator service, granted kartotherian OSM DB read access (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/229727 (https://phabricator.wikimedia.org/T105074) (owner: 10Yurik)
[13:33:05] 6operations, 10Traffic, 7HTTPS: Getting ssl_error_inappropriate_fallback_alert very rarely - https://phabricator.wikimedia.org/T108579#1527225 (10BBlack) Based on the actual error message, I don't think the issue is coming from our servers in any case. There are various FF bug reports linked to this error t...
[13:42:02] (03PS2) 10Alexandros Kosiaris: ganeti: move role from manifests/ into the role module [puppet] - 10https://gerrit.wikimedia.org/r/230735
[13:42:08] (03CR) 10Alexandros Kosiaris: [C: 032] ganeti: move role from manifests/ into the role module [puppet] - 10https://gerrit.wikimedia.org/r/230735 (owner: 10Alexandros Kosiaris)
[13:42:13] (03CR) 10Alexandros Kosiaris: [V: 032] ganeti: move role from manifests/ into the role module [puppet] - 10https://gerrit.wikimedia.org/r/230735 (owner: 10Alexandros Kosiaris)
[13:43:45] (03PS2) 10Alexandros Kosiaris: ganeti: assign cluster variable [puppet] - 10https://gerrit.wikimedia.org/r/230768
[13:43:51] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] ganeti: assign cluster variable [puppet] - 10https://gerrit.wikimedia.org/r/230768 (owner: 10Alexandros Kosiaris)
[13:45:29] (03PS1) 10Alexandros Kosiaris: Revert "ganeti: move role from manifests/ into the role module" [puppet] - 10https://gerrit.wikimedia.org/r/230770
[13:46:09] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Revert "ganeti: move role from manifests/ into the role module" [puppet] - 10https://gerrit.wikimedia.org/r/230770 (owner: 10Alexandros Kosiaris)
[13:46:52] (03PS1) 10BBlack: define puppet ganglia stuff for caches @ codfw [puppet] - 10https://gerrit.wikimedia.org/r/230771
[13:47:36] PROBLEM - puppet last run on fluorine is CRITICAL puppet fail
[13:47:36] PROBLEM - puppet last run on cp2023 is CRITICAL puppet fail
[13:47:45] PROBLEM - puppet last run on wtp2020 is CRITICAL puppet fail
[13:47:45] PROBLEM - puppet last run on mw2188 is CRITICAL puppet fail
[13:47:45] PROBLEM - puppet last run on ganeti1002 is CRITICAL puppet fail
[13:47:46] PROBLEM - puppet last run on ms-be1006 is CRITICAL puppet fail
[13:47:46] PROBLEM - puppet last run on mw2114 is CRITICAL puppet fail
[13:47:46] PROBLEM - puppet last run on copper is CRITICAL puppet fail
[13:47:55] PROBLEM - puppet last run on mc2016 is CRITICAL puppet fail
[13:47:56] PROBLEM - puppet last run on mw1010 is CRITICAL puppet fail
[13:47:56] PROBLEM - puppet last run on mw2075 is CRITICAL puppet fail
[13:47:56] PROBLEM - puppet last run on ms-fe2004 is CRITICAL puppet fail
[13:47:56] PROBLEM - puppet last run on ms-be2004 is CRITICAL puppet fail
[13:47:57] PROBLEM - puppet last run on mw1066 is CRITICAL puppet fail
[13:48:05] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::salt::minions for cp2023.codfw.wmnet on node cp2023.codfw.wmnet
[13:48:06] PROBLEM - puppet last run on cp3010 is CRITICAL puppet fail
[13:48:06] PROBLEM - puppet last run on lvs1002 is CRITICAL puppet fail
[13:48:06] PROBLEM - puppet last run on lvs1005 is CRITICAL puppet fail
[13:48:16] PROBLEM - puppet last run on calcium is CRITICAL puppet fail
[13:48:17] PROBLEM - puppet last run on mw2105 is CRITICAL puppet fail
[13:48:17] PROBLEM - puppet last run on achernar is CRITICAL puppet fail
[13:48:17] PROBLEM - puppet last run on analytics1003 is CRITICAL puppet fail
[13:48:17] PROBLEM - puppet last run on pybal-test2001 is CRITICAL puppet fail
[13:48:25] PROBLEM - puppet last run on helium is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on wtp2001 is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on db2039 is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on mw2109 is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on bast1001 is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on mw2015 is CRITICAL puppet fail
[13:48:26] PROBLEM - puppet last run on mw2004 is CRITICAL puppet fail
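The failure storm above starts minutes after the ganeti role move is merged (and the move was promptly reverted), with the catalog error pointing at a class the autoloader can no longer find. A sketch of a pre-merge sanity check for that failure mode; the include regex and the layout assumptions are simplifications of the real repo:

```python
#!/usr/bin/env python3
"""Sketch: check that every role::a::b class mentioned in site.pp maps
to a file the puppet autoloader can find, the failure mode behind
"Could not find class role::salt::minions" above. Layout assumptions
are simplifications of the real repository."""
import pathlib
import re

site = pathlib.Path('manifests/site.pp').read_text()
for cls in sorted(set(re.findall(r'\brole::([a-z0-9_]+(?:::[a-z0-9_]+)*)',
                                 site))):
    # Autoloader layout: class role::a::b lives in
    # modules/role/manifests/a/b.pp. Legacy manifests/role/*.pp files
    # don't follow this mapping, which is how moving one role file can
    # strand unrelated classes.
    rel = cls.replace('::', '/')
    if not pathlib.Path('modules/role/manifests/%s.pp' % rel).exists():
        print('not autoloadable: role::%s' % cls)
```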
on mw2004 is CRITICAL puppet fail [13:48:27] PROBLEM - puppet last run on ms-be2006 is CRITICAL puppet fail [13:48:27] PROBLEM - puppet last run on ganeti2004 is CRITICAL puppet fail [13:48:28] PROBLEM - puppet last run on ganeti2001 is CRITICAL puppet fail [13:48:36] PROBLEM - puppet last run on cp3040 is CRITICAL puppet fail [13:48:37] PROBLEM - puppet last run on db1050 is CRITICAL puppet fail [13:48:37] PROBLEM - puppet last run on mw1046 is CRITICAL puppet fail [13:48:45] PROBLEM - puppet last run on cp4002 is CRITICAL puppet fail [13:48:45] PROBLEM - puppet last run on cp3021 is CRITICAL puppet fail [13:48:45] PROBLEM - puppet last run on mw1069 is CRITICAL puppet fail [13:48:46] PROBLEM - puppet last run on db1031 is CRITICAL puppet fail [13:48:46] PROBLEM - puppet last run on db1033 is CRITICAL puppet fail [13:48:46] PROBLEM - puppet last run on iodine is CRITICAL puppet fail [13:48:46] PROBLEM - puppet last run on mw1173 is CRITICAL puppet fail [13:48:47] PROBLEM - puppet last run on elastic1004 is CRITICAL puppet fail [13:48:47] PROBLEM - puppet last run on cp3047 is CRITICAL puppet fail [13:48:55] PROBLEM - puppet last run on einsteinium is CRITICAL puppet fail [13:48:56] PROBLEM - puppet last run on mw1091 is CRITICAL puppet fail [13:48:56] PROBLEM - puppet last run on mw1153 is CRITICAL puppet fail [13:49:05] PROBLEM - puppet last run on lvs2004 is CRITICAL puppet fail [13:49:05] PROBLEM - puppet last run on cp2011 is CRITICAL puppet fail [13:49:05] PROBLEM - puppet last run on mw1068 is CRITICAL puppet fail [13:49:06] PROBLEM - puppet last run on mw1241 is CRITICAL puppet fail [13:49:06] PROBLEM - puppet last run on db2059 is CRITICAL puppet fail [13:49:06] PROBLEM - puppet last run on db1040 is CRITICAL puppet fail [13:49:06] PROBLEM - puppet last run on mw2087 is CRITICAL puppet fail [13:49:07] PROBLEM - puppet last run on mw1027 is CRITICAL puppet fail [13:49:07] PROBLEM - puppet last run on mw1235 is CRITICAL puppet fail [13:49:08] PROBLEM - puppet last run on mw1021 is CRITICAL puppet fail [13:49:08] PROBLEM - puppet last run on mw1205 is CRITICAL puppet fail [13:49:15] PROBLEM - puppet last run on mw2123 is CRITICAL puppet fail [13:49:15] PROBLEM - puppet last run on mw2117 is CRITICAL puppet fail [13:49:15] PROBLEM - puppet last run on mw1143 is CRITICAL puppet fail [13:49:16] PROBLEM - puppet last run on mw1150 is CRITICAL puppet fail [13:49:16] PROBLEM - puppet last run on mw2212 is CRITICAL puppet fail [13:49:16] PROBLEM - puppet last run on db2045 is CRITICAL puppet fail [13:49:16] PROBLEM - puppet last run on elastic1018 is CRITICAL puppet fail [13:49:17] PROBLEM - puppet last run on mw2113 is CRITICAL puppet fail [13:49:17] PROBLEM - puppet last run on mw1025 is CRITICAL puppet fail [13:49:18] PROBLEM - puppet last run on lvs1004 is CRITICAL puppet fail [13:49:18] PROBLEM - puppet last run on ms-be1015 is CRITICAL puppet fail [13:49:19] PROBLEM - puppet last run on mw2019 is CRITICAL puppet fail [13:49:19] PROBLEM - puppet last run on db1022 is CRITICAL puppet fail [13:49:25] PROBLEM - puppet last run on db1066 is CRITICAL puppet fail [13:49:26] PROBLEM - puppet last run on lvs3003 is CRITICAL puppet fail [13:49:26] PROBLEM - puppet last run on cp3042 is CRITICAL puppet fail [13:49:26] PROBLEM - puppet last run on cp3018 is CRITICAL puppet fail [13:49:26] PROBLEM - puppet last run on labcontrol1001 is CRITICAL puppet fail [13:49:36] PROBLEM - puppet last run on mw1189 is CRITICAL puppet fail [13:49:36] PROBLEM - puppet last run on analytics1035 is CRITICAL puppet 
fail [13:49:46] PROBLEM - puppet last run on mw2134 is CRITICAL puppet fail [13:49:46] PROBLEM - puppet last run on mw2163 is CRITICAL puppet fail [13:49:46] PROBLEM - puppet last run on mw2176 is CRITICAL puppet fail [13:49:46] PROBLEM - puppet last run on mw2083 is CRITICAL puppet fail [13:49:46] PROBLEM - puppet last run on mw2079 is CRITICAL puppet fail [13:49:46] PROBLEM - puppet last run on mw1092 is CRITICAL puppet fail [13:49:47] PROBLEM - puppet last run on ms-fe1002 is CRITICAL puppet fail [13:49:55] PROBLEM - puppet last run on mw1003 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on db2047 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw2070 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw1166 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw2030 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw1213 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw1107 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on elastic1027 is CRITICAL puppet fail [13:49:57] PROBLEM - puppet last run on mw2184 is CRITICAL puppet fail [13:49:58] PROBLEM - puppet last run on wtp2016 is CRITICAL puppet fail [13:49:58] PROBLEM - puppet last run on es2010 is CRITICAL puppet fail [13:49:59] PROBLEM - puppet last run on mw2039 is CRITICAL puppet fail [13:49:59] PROBLEM - puppet last run on cp2026 is CRITICAL puppet fail [13:50:00] PROBLEM - puppet last run on db2054 is CRITICAL puppet fail [13:50:05] PROBLEM - puppet last run on ms-fe1001 is CRITICAL puppet fail [13:50:06] PROBLEM - puppet last run on elastic1008 is CRITICAL puppet fail [13:50:06] PROBLEM - puppet last run on analytics1030 is CRITICAL puppet fail [13:50:07] PROBLEM - puppet last run on tmh1001 is CRITICAL puppet fail [13:50:18] PROBLEM - puppet last run on cp2016 is CRITICAL puppet fail [13:50:18] PROBLEM - puppet last run on mw2127 is CRITICAL puppet fail [13:50:18] PROBLEM - puppet last run on mw1204 is CRITICAL puppet fail [13:50:19] PROBLEM - puppet last run on mw1118 is CRITICAL puppet fail [13:50:19] PROBLEM - puppet last run on db1034 is CRITICAL puppet fail [13:50:19] PROBLEM - puppet last run on mw1155 is CRITICAL puppet fail [13:50:19] PROBLEM - puppet last run on db1002 is CRITICAL puppet fail [13:50:20] PROBLEM - puppet last run on db1021 is CRITICAL puppet fail [13:50:21] PROBLEM - puppet last run on mw2082 is CRITICAL puppet fail [13:50:21] PROBLEM - puppet last run on db2065 is CRITICAL puppet fail [13:50:25] PROBLEM - puppet last run on cp2014 is CRITICAL puppet fail [13:50:25] PROBLEM - puppet last run on stat1002 is CRITICAL Puppet last ran 6 hours ago [13:50:26] PROBLEM - puppet last run on mw2196 is CRITICAL puppet fail [13:50:26] PROBLEM - puppet last run on mw2143 is CRITICAL puppet fail [13:50:26] PROBLEM - puppet last run on ms-fe2003 is CRITICAL puppet fail [13:50:26] PROBLEM - puppet last run on ruthenium is CRITICAL puppet fail [13:50:26] PROBLEM - puppet last run on mw2084 is CRITICAL puppet fail [13:50:27] PROBLEM - puppet last run on mw2131 is CRITICAL puppet fail [13:50:27] PROBLEM - puppet last run on mw2093 is CRITICAL puppet fail [13:50:45] PROBLEM - puppet last run on db1027 is CRITICAL puppet fail [13:50:46] PROBLEM - puppet last run on mw1137 is CRITICAL puppet fail [13:50:46] PROBLEM - puppet last run on cp4014 is CRITICAL puppet fail [13:50:46] PROBLEM - puppet last run on mw1154 is CRITICAL puppet fail [13:50:46] PROBLEM - puppet last run on mw1131 is CRITICAL puppet fail [13:50:46] PROBLEM - 
puppet last run on etherpad1001 is CRITICAL puppet fail [13:50:46] PROBLEM - puppet last run on mw1054 is CRITICAL puppet fail [13:50:56] PROBLEM - puppet last run on mc1012 is CRITICAL puppet fail [13:51:05] PROBLEM - puppet last run on mw1253 is CRITICAL puppet fail [13:51:06] PROBLEM - puppet last run on cp2010 is CRITICAL puppet fail [13:51:06] PROBLEM - puppet last run on mw2142 is CRITICAL puppet fail [13:51:07] PROBLEM - puppet last run on analytics1031 is CRITICAL puppet fail [13:51:15] PROBLEM - puppet last run on labsdb1005 is CRITICAL puppet fail [13:51:15] PROBLEM - puppet last run on mw2182 is CRITICAL puppet fail [13:51:15] PROBLEM - puppet last run on mw2110 is CRITICAL puppet fail [13:51:15] PROBLEM - puppet last run on es1004 is CRITICAL puppet fail [13:51:16] PROBLEM - puppet last run on dbproxy1001 is CRITICAL puppet fail [13:51:16] PROBLEM - puppet last run on mw1179 is CRITICAL puppet fail [13:51:16] PROBLEM - puppet last run on db1051 is CRITICAL puppet fail [13:51:17] PROBLEM - puppet last run on mw1047 is CRITICAL puppet fail [13:51:17] PROBLEM - puppet last run on mw1129 is CRITICAL puppet fail [13:51:18] PROBLEM - puppet last run on logstash1002 is CRITICAL puppet fail [13:51:18] PROBLEM - puppet last run on mw1211 is CRITICAL puppet fail [13:51:19] PROBLEM - puppet last run on mw2055 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on wtp2010 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on mw2090 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on mw2096 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on mw2130 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on mw1194 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on mw2049 is CRITICAL puppet fail [13:51:26] PROBLEM - puppet last run on cp1071 is CRITICAL puppet fail [13:51:27] PROBLEM - puppet last run on mw1020 is CRITICAL puppet fail [13:51:35] PROBLEM - puppet last run on mw1075 is CRITICAL puppet fail [13:51:46] RECOVERY - puppet last run on cp2023 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:51:46] PROBLEM - puppet last run on cp2003 is CRITICAL puppet fail [13:51:46] PROBLEM - puppet last run on mw1011 is CRITICAL puppet fail [13:51:47] PROBLEM - puppet last run on palladium is CRITICAL puppet fail [13:51:47] PROBLEM - puppet last run on mw1128 is CRITICAL puppet fail [13:51:55] PROBLEM - puppet last run on cp1066 is CRITICAL puppet fail [13:51:55] PROBLEM - puppet last run on db1005 is CRITICAL puppet fail [13:51:56] PROBLEM - puppet last run on db1068 is CRITICAL puppet fail [13:51:57] PROBLEM - puppet last run on mw2056 is CRITICAL puppet fail [13:51:57] PROBLEM - puppet last run on mw2168 is CRITICAL puppet fail [13:52:06] PROBLEM - puppet last run on mw2092 is CRITICAL puppet fail [13:52:06] PROBLEM - puppet last run on mw2047 is CRITICAL puppet fail [13:52:07] PROBLEM - puppet last run on cp4015 is CRITICAL puppet fail [13:52:17] PROBLEM - puppet last run on mw1208 is CRITICAL puppet fail [13:52:22] bblack, so commit related? [13:52:26] PROBLEM - puppet last run on polonium is CRITICAL puppet fail [13:52:35] PROBLEM - puppet last run on db2007 is CRITICAL puppet fail [13:52:36] PROBLEM - puppet last run on cp1058 is CRITICAL puppet fail [13:52:55] PROBLEM - puppet last run on db1016 is CRITICAL puppet fail [13:53:28] jynus: akosiaris reverted the offending commit (although I still can't understand how it caused this). I've re-run one node manually and it was fixed. 
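For context, the change reverted above is a pure file move; under Puppet's module autoloader the relocated file should resolve to exactly the same class name. A minimal sketch of the before/after layout (the class body is illustrative, not the actual ganeti role):

```
# Before: manifests/role/ganeti.pp, only visible because site.pp imports
# manifests/role/*.pp when the master parses its entry point.
# After: modules/role/manifests/ganeti.pp, where the autoloader resolves
# the name role::ganeti to role/manifests/ganeti.pp on demand.
class role::ganeti {
    include ::ganeti   # body unchanged by the move; contents illustrative
}
```

On paper the two locations are interchangeable, which is why the cluster-wide compile failures above are so surprising.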
[13:53:45] it's just going to take a while for them all to re-run and recover I think [13:54:02] and I have no idea why that commit failed [13:54:06] perfect, I do not care about the error, but do not want to duplicate work [13:54:41] or that it was global [13:54:45] weird... [13:55:00] yeah it's somehow interrelated with hieradata's lookup of the $cluster variable [13:55:03] I think [13:55:13] I hope not [13:55:18] cause if it is ... [13:55:26] that still doesn't completely explain it, but I think it's related [13:55:33] https://gerrit.wikimedia.org/r/#/c/220085/ [13:55:40] this also exhibits the same exact problem [13:55:46] which is why I haven't merged it [13:55:58] probably the role/manifests/etherpad.pp ? [13:56:13] cause the others already there are under various directories [13:56:28] but where should I put the file then ? and how should I name it ? [13:56:35] akosiaris, indeed it is strange [13:56:50] I think that's just a catalyst, not the real problem [13:57:14] possibly the custom role.rb thing is interrelated with this somehow too [13:57:25] and my poor catalog compiler is broken again so I can't test [13:57:32] bblack: hmm could be [13:58:36] but also, ganeti is one of the few that set "cluster:" in hieradata per-DC instead of in common/ [13:58:51] by that I mean: [13:58:51] hieradata/role/common/wdqs.yaml:cluster: wdqs [13:58:51] hieradata/role/eqiad/ganeti.yaml:cluster: ganeti [13:59:00] that could be some factor in this somehow too [14:00:16] bblack: no, that showed up before I merged the cluster change [14:00:20] it's unrelated [14:02:20] (03PS1) 10Alexandros Kosiaris: Revert "Revert "ganeti: move role from manifests/ into the role module"" [puppet] - 10https://gerrit.wikimedia.org/r/230773 [14:07:58] is it possible it's temporary and would've fixed itself later regardless? as in, some kind of FS sync issue where a catalog compilation sees both files missing or both files existing temporarily when the role moves paths, and that confuses/kills puppet role lookup for a short window? [14:08:15] similar to what we see when we move a template or file path, race condition for a bit and then it cleans up [14:10:56] 6operations, 10RESTBase, 10RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1527327 (10mobrovac) >>! In T95253#1524978, @GWicke wrote: > The exact startup solution remains tbd, but one option might be to leverage http://0pointer.de/blog/projec...
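If the race hypothesized above is real, the usual mitigation is to make the working-copy update atomic: stage the new tree out of band, then swap it into place with a single rename(), so a catalog compile in flight sees either the old tree or the new one and never a half-moved role file. A sketch of the idea, assuming the live checkout is reached through a symlink (paths are illustrative, not the actual puppetmaster layout):

```
# Bring a staging copy of the repo up to date, off to the side:
git -C /srv/puppet.stage pull --ff-only
# Build a new symlink pointing at the staged tree...
ln -sfn /srv/puppet.stage /srv/puppet.new
# ...and rename() it over the live one; rename is atomic, so readers see
# the old target or the new target, nothing in between:
mv -Tf /srv/puppet.new /srv/puppet
```

This only helps if every reader resolves paths through the swapped link, and it would not address the separate point raised below about imports being cached inside the running master process.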
[14:13:02] RECOVERY - puppet last run on db1031 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [14:13:42] RECOVERY - puppet last run on wtp2020 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:13:52] RECOVERY - puppet last run on calcium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:03] RECOVERY - puppet last run on achernar is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [14:14:03] (03PS12) 10Giuseppe Lavagetto: puppet-compiler: first commit [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/228849 (https://phabricator.wikimedia.org/T96802) [14:14:11] RECOVERY - puppet last run on iodine is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:11] RECOVERY - puppet last run on ganeti2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:12] RECOVERY - puppet last run on db1033 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [14:14:12] RECOVERY - puppet last run on elastic1004 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [14:14:22] RECOVERY - puppet last run on ms-be2006 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [14:14:22] RECOVERY - puppet last run on helium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:31] RECOVERY - puppet last run on db2039 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [14:14:32] RECOVERY - puppet last run on mw2015 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:14:32] RECOVERY - puppet last run on db1050 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:14:32] RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [14:14:42] RECOVERY - puppet last run on ms-be1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:14:51] RECOVERY - puppet last run on copper is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:14:52] RECOVERY - puppet last run on mw1091 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [14:15:02] RECOVERY - puppet last run on mc2016 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:15:02] RECOVERY - puppet last run on cp3040 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [14:15:02] RECOVERY - puppet last run on cp2026 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:15:02] RECOVERY - puppet last run on ms-fe2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:03] RECOVERY - puppet last run on ganeti1002 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [14:15:03] RECOVERY - puppet last run on cp3018 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:15:03] RECOVERY - puppet last run on mw2075 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:15:12] RECOVERY - puppet last run on mw1066 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [14:15:12] RECOVERY - puppet last run on ms-fe1001 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [14:15:22] RECOVERY - puppet last run on db1040 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [14:15:30] <_joe_> what happened? 
[14:15:31] RECOVERY - puppet last run on mw2188 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:15:31] RECOVERY - puppet last run on elastic1018 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [14:15:31] RECOVERY - puppet last run on elastic1008 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:15:31] RECOVERY - puppet last run on cp3047 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [14:15:32] RECOVERY - puppet last run on mw1173 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [14:15:33] RECOVERY - puppet last run on analytics1035 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:15:33] RECOVERY - puppet last run on labcontrol1001 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [14:15:39] <_joe_> I was not looking at the chat, sorry [14:15:41] RECOVERY - puppet last run on fluorine is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:42] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:42] RECOVERY - puppet last run on lvs1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:51] RECOVERY - puppet last run on db2045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:51] RECOVERY - puppet last run on cp4002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:51] RECOVERY - puppet last run on lvs1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:52] RECOVERY - puppet last run on einsteinium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:52] RECOVERY - puppet last run on mw1189 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:15:53] RECOVERY - puppet last run on cp2016 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:15:53] RECOVERY - puppet last run on mw1253 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [14:16:01] RECOVERY - puppet last run on mw2004 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:16:02] RECOVERY - puppet last run on db1002 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:16:02] RECOVERY - puppet last run on mw1155 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [14:16:02] RECOVERY - puppet last run on analytics1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:03] RECOVERY - puppet last run on ganeti2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:11] RECOVERY - puppet last run on mw2082 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:16:12] RECOVERY - puppet last run on mw2105 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:12] RECOVERY - puppet last run on mw2019 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:16:12] RECOVERY - puppet last run on mw1027 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:16:17] _joe_: strange puppet bug with a commit, reverted [14:16:21] RECOVERY - puppet last run on db1022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:21] RECOVERY - puppet last run on mw2087 is OK Puppet is currently enabled, last run 50 
seconds ago with 0 failures [14:16:21] RECOVERY - puppet last run on cp3021 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:21] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [14:16:21] RECOVERY - puppet last run on cp2014 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [14:16:21] RECOVERY - puppet last run on mw1046 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [14:16:22] RECOVERY - puppet last run on lvs3003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:22] RECOVERY - puppet last run on mw1153 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:23] RECOVERY - puppet last run on mw2117 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:16:31] RECOVERY - puppet last run on cp2011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:31] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [14:16:31] RECOVERY - puppet last run on db1066 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:31] RECOVERY - puppet last run on bast1001 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:16:31] RECOVERY - puppet last run on wtp2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:32] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [14:16:32] RECOVERY - puppet last run on mw2109 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:33] RECOVERY - puppet last run on mw1241 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:16:33] RECOVERY - puppet last run on mw1205 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:34] RECOVERY - puppet last run on mw2003 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [14:16:34] RECOVERY - puppet last run on es1004 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [14:16:36] <_joe_> which commit specifically? 
[14:16:39] <_joe_> I hate icinga [14:16:41] RECOVERY - puppet last run on labsdb1005 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [14:16:41] RECOVERY - puppet last run on mw1150 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [14:16:42] RECOVERY - puppet last run on ms-fe1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:52] RECOVERY - puppet last run on ms-be1015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:52] RECOVERY - puppet last run on mw1025 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:16:52] RECOVERY - puppet last run on db1068 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [14:16:52] RECOVERY - puppet last run on mw1092 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [14:16:53] RECOVERY - puppet last run on mw1069 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:53] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [14:16:53] RECOVERY - puppet last run on mw1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:16:54] RECOVERY - puppet last run on mw2114 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [14:17:01] RECOVERY - puppet last run on db2047 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:17:01] RECOVERY - puppet last run on mw1107 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:02] RECOVERY - puppet last run on elastic1027 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:02] RECOVERY - puppet last run on mw1010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:03] RECOVERY - puppet last run on etherpad1001 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:17:03] RECOVERY - puppet last run on db2054 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:04] RECOVERY - puppet last run on es2010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:04] RECOVERY - puppet last run on lvs1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:11] RECOVERY - puppet last run on ms-be2004 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:17:11] RECOVERY - puppet last run on wtp2016 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:17:11] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [14:17:11] RECOVERY - puppet last run on mw1068 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:12] RECOVERY - puppet last run on mw1143 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:12] RECOVERY - puppet last run on palladium is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [14:17:18] _joe_: https://gerrit.wikimedia.org/r/230735 , which led to this on all nodes: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::salt::minions for cp2023.codfw.wmnet on node cp2023.codfw.wmnet [14:17:21] RECOVERY - puppet last run on mw2083 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:21] RECOVERY - puppet last run on mw2176 is OK Puppet is
currently enabled, last run 58 seconds ago with 0 failures [14:17:22] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:31] RECOVERY - puppet last run on db1051 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:32] RECOVERY - puppet last run on wtp1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:32] RECOVERY - puppet last run on mw2212 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:32] RECOVERY - puppet last run on analytics1030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:32] RECOVERY - puppet last run on lvs2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:42] RECOVERY - puppet last run on mw1104 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:42] RECOVERY - puppet last run on mw1154 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:42] RECOVERY - puppet last run on tmh1001 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:17:48] the commit in question moved an unrelated role class from manifests/role/foo to modules/role/manifests/ [14:17:52] RECOVERY - puppet last run on cp2010 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:17:52] RECOVERY - puppet last run on cp1066 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:52] RECOVERY - puppet last run on logstash1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:17:52] RECOVERY - puppet last run on mw1021 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:17:52] RECOVERY - puppet last run on db1005 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:17:53] RECOVERY - puppet last run on mw1131 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:01] RECOVERY - puppet last run on mw1047 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [14:18:01] RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:02] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [14:18:02] RECOVERY - puppet last run on mw2163 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:02] RECOVERY - puppet last run on mw2049 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [14:18:02] RECOVERY - puppet last run on mw1204 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:18:02] RECOVERY - puppet last run on mw1118 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:11] RECOVERY - puppet last run on db1034 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:11] RECOVERY - puppet last run on dbproxy1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:11] RECOVERY - puppet last run on mw2127 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:12] RECOVERY - puppet last run on db1021 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:12] RECOVERY - puppet last run on labvirt1009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:12] RECOVERY - puppet last run on cp2003 is OK Puppet is currently 
enabled, last run 1 minute ago with 0 failures [14:18:12] RECOVERY - puppet last run on polonium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:13] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:21] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [14:18:21] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:21] RECOVERY - puppet last run on mw2142 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [14:18:21] RECOVERY - puppet last run on mw1054 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:22] RECOVERY - puppet last run on mw1075 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [14:18:22] RECOVERY - puppet last run on cp1071 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [14:18:22] RECOVERY - puppet last run on analytics1031 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:23] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:31] RECOVERY - puppet last run on mw1137 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:32] RECOVERY - puppet last run on wtp2010 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:32] RECOVERY - puppet last run on pybal-test2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:32] RECOVERY - puppet last run on mw2182 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [14:18:32] RECOVERY - puppet last run on mw2196 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:32] RECOVERY - puppet last run on cp1058 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:32] RECOVERY - puppet last run on mw2143 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [14:18:33] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:41] RECOVERY - puppet last run on mw1020 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [14:18:41] RECOVERY - puppet last run on mw2084 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:41] RECOVERY - puppet last run on mw2131 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [14:18:42] RECOVERY - puppet last run on db2007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:42] RECOVERY - puppet last run on mw1194 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [14:18:42] RECOVERY - puppet last run on mw2067 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:42] RECOVERY - puppet last run on mw2055 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [14:18:43] RECOVERY - puppet last run on mw1211 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:43] RECOVERY - puppet last run on mw2134 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [14:18:51] RECOVERY - puppet last run on mc1012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:51] RECOVERY - puppet last run on mw1011 is OK Puppet is currently 
enabled, last run 39 seconds ago with 0 failures [14:18:52] RECOVERY - puppet last run on db1027 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:01] RECOVERY - puppet last run on mw1129 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:01] RECOVERY - puppet last run on mw1213 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:11] RECOVERY - puppet last run on mw2070 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:19:12] RECOVERY - puppet last run on mw2030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:12] RECOVERY - puppet last run on mw2056 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:12] RECOVERY - puppet last run on mw2168 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [14:19:12] RECOVERY - puppet last run on mw2110 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:12] RECOVERY - puppet last run on db1016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:12] RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [14:19:13] RECOVERY - puppet last run on mw2039 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:13] RECOVERY - puppet last run on mw2047 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:21] RECOVERY - puppet last run on mw1179 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:21] RECOVERY - puppet last run on mw2090 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:22] RECOVERY - puppet last run on cp4015 is OK Puppet is currently enabled, last run 2 minutes ago with 0 failures [14:19:32] RECOVERY - puppet last run on mw1128 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:19:32] RECOVERY - puppet last run on mw2130 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:20:02] RECOVERY - puppet last run on mw1208 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:20:11] <_joe_> bblack: it was the same on all nodes? [14:20:19] <_joe_> because I think that might be a race condition [14:20:32] <_joe_> you have "import(roles/*.pp)" [14:20:39] <_joe_> in site.pp [14:21:07] yeah that was my last thought too [14:21:14] <_joe_> meh [14:21:16] 14:07 < bblack> is it possible it's temporary and would've fixed itself later regardless? as in, some kind of FS sync issue where a catalog compilation sees both files missing or both files existing temporarily when the role moves paths, and that confuses/kills puppet role lookup for a short window? 
[14:21:21] 14:08 < bblack> similar to what we see when we move a template or file path, race condition for a bit and then it cleans up [14:22:01] <_joe_> yeah seems plausible [14:22:06] <_joe_> lemme check a random host [14:25:04] <_joe_> Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Role class role::mediawiki::appserver not found at /etc/puppet/manifests/site.pp:1925 on node mw2090.codfw.wmnet [14:25:13] <_joe_> so yeah our hypothesis seems almost correct [14:28:57] <_joe_> and I see just a few threads having that problem, it seems [14:30:07] <_joe_> ok, I stand corrected - I fear we need to restart puppet every time we want to move away any of the role files from manifest/roles [14:32:27] <_joe_> akosiaris: ^^ [14:34:30] 6operations, 10ops-codfw: ms-be2009 - RAID degraded / failed disk - https://phabricator.wikimedia.org/T107877#1527418 (10Papaul) @ fgiunchedi no it wasn't a new drive. I just pulled the drive out and plugged it back in. [14:35:31] argh ? [14:37:16] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1527437 (10Eevans) >>! In T107949#1526823, @fgiunchedi wrote: > upgrade plan, starting today: > * upgrade row A machines, (restbase100[127]) with `nodetool flush && sudo apt-get -o Dpkg::O... [14:37:41] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1527439 (10Eevans) >>! In T107949#1526933, @fgiunchedi wrote: > cosmetic issue output contains `%s` spotted while looking at the logs, benign > > ``` > > restbase1002:~$ grep %s /var/log... [14:38:56] _joe_: really it's just another aspect of the same problem we see with the files/templates, etc. Just worse. there's no transactionality to the git filesystem updates on the master -> active threads running for clients, etc [14:42:03] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1527473 (10Eevans) >>! In T107949#1526823, @fgiunchedi wrote: > upgrade plan, starting today: > * upgrade row A machines, (restbase100[127]) with `nodetool flush && sudo apt-get -o Dpkg::O... [14:43:57] (03PS1) 10Alexandros Kosiaris: Introduce mobileapps.svc.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/230780 (https://phabricator.wikimedia.org/T105538) [14:50:48] (03CR) 10Eevans: [C: 031] "> LGTM, I've updated the code review not to remove the old version because that's racy with puppet" [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/230582 (https://phabricator.wikimedia.org/T101764) (owner: 10Eevans) [14:51:19] Is there a bug open for the non-atomic updates? Should be solvable with some symlink magic [14:55:35] <_joe_> bblack: i think it's actually worse [14:55:49] <_joe_> bblack: imports are set in stone in the ruby process [14:55:57] 6operations, 10ContentTranslation-Deployments, 3LE-CX6-Sprint 2: Access to /var/log/apertium for Kartik - https://phabricator.wikimedia.org/T108678#1527517 (10KartikMistry) [14:55:59] <_joe_> I am pretty sure of that [14:57:33] (03PS1) 10Alexandros Kosiaris: lvs: remove the old unused osm lvs_service [puppet] - 10https://gerrit.wikimedia.org/r/230783 [15:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150811T1500). Please do the needful. [15:06:20] No SWAT changes this morning?
[15:07:24] bd808: don't jinx it [15:07:43] :) I've got something that can go [15:08:13] (03CR) 10BryanDavis: "> (ideally we'd load balance across hosts anyways)" [puppet] - 10https://gerrit.wikimedia.org/r/230233 (https://phabricator.wikimedia.org/T100735) (owner: 10BryanDavis) [15:08:32] (03CR) 10Alexandros Kosiaris: [C: 032] lvs: remove the old unused osm lvs_service [puppet] - 10https://gerrit.wikimedia.org/r/230783 (owner: 10Alexandros Kosiaris) [15:09:42] (03PS1) 10Alexandros Kosiaris: postgres: Enable streaming replication [puppet] - 10https://gerrit.wikimedia.org/r/230785 [15:10:16] (03CR) 10BryanDavis: [C: 032] logging: Only send info and higher to logstash by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719 (owner: 10BryanDavis) [15:10:24] (03Merged) 10jenkins-bot: logging: Only send info and higher to logstash by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719 (owner: 10BryanDavis) [15:11:53] !log bd808@tin Synchronized wmf-config/logging.php: logging: Only send info and higher to logstash by default (4388a84) 1/2 (duration: 00m 12s) [15:11:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:12:27] !log bd808@tin Synchronized wmf-config/InitialiseSettings.php: logging: Only send info and higher to logstash by default (4388a84) 2/2 (duration: 00m 12s) [15:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:14:25] hmmm... do jobrunners not pick up wmf-config changes rapidly? [15:15:29] (03CR) 10Yuvipanda: [C: 031] labstore: add timers for backups [puppet] - 10https://gerrit.wikimedia.org/r/230569 (https://phabricator.wikimedia.org/T106474) (owner: 10coren) [15:15:35] Coren: ^ +1'd it [15:16:43] * Coren goes to test now. Yeay. [15:16:55] (03PS2) 10coren: labstore: add timers for backups [puppet] - 10https://gerrit.wikimedia.org/r/230569 (https://phabricator.wikimedia.org/T106474) [15:17:24] (03CR) 10coren: [C: 032] "LGTM. Let's see if it also looks good to systemd." [puppet] - 10https://gerrit.wikimedia.org/r/230569 (https://phabricator.wikimedia.org/T106474) (owner: 10coren) [15:17:28] !log bd808@tin Synchronized wmf-config/InitialiseSettings.php: Touched wmf-config/InitialiseSettings.php (duration: 00m 13s) [15:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:17:56] 6operations, 10Beta-Cluster, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1527574 (10greg) @ArielGlenn: there's an NDA's task at T97593 which Brandon is driving. [15:22:51] (03CR) 10BryanDavis: "I am still seeing debug level logs in Logstash after syncing this and a var_dump() of $wgMWLoggerDefaultSpi is still showing the use of th" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719 (owner: 10BryanDavis) [15:24:17] bd808: wave when you are ready with logstash to merge the relevant puppet changes btw [15:24:26] YuviPanda: The backups are now enabled. [15:24:40] * Coren watches the alerts, which should go critical now. [15:25:19] godog: thanks. give me a few minutes to try and puzzle out why my wmf-config change isn't doing what I'd hoped [15:27:13] Aaaah. No. They /never/ ran so they don't have a last run time. [15:28:09] bd808: np, is it not picking up the change at all? [15:29:03] godog: apparently not [15:29:31] uhhhh...
maybe I forgot to rebase [15:29:54] * bd808 facepalms [15:29:56] yup [15:30:28] (03PS2) 10Alexandros Kosiaris: postgres: stream WALs while doing pg_basebackup [puppet] - 10https://gerrit.wikimedia.org/r/230785 [15:30:36] Coren: :) I guess we will have to wait? [15:30:43] !log bd808@tin Synchronized wmf-config/logging.php: logging: Only send info and higher to logstash by default (4388a84) 1/2 (actually rebased this time) (duration: 00m 11s) [15:30:45] (03PS3) 10Alexandros Kosiaris: postgres: stream WALs while doing pg_basebackup [puppet] - 10https://gerrit.wikimedia.org/r/230785 [15:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:30:51] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] postgres: stream WALs while doing pg_basebackup [puppet] - 10https://gerrit.wikimedia.org/r/230785 (owner: 10Alexandros Kosiaris) [15:31:04] (03PS1) 10Alexandros Kosiaris: new_wmf_service: fix bug with wrong function name [puppet] - 10https://gerrit.wikimedia.org/r/230787 [15:31:06] (03PS1) 10Alexandros Kosiaris: Introducing mobileapps role and puppet module [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) [15:31:08] (03PS1) 10Alexandros Kosiaris: Assign mobileapps service to sca cluster [puppet] - 10https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) [15:31:09] !log bd808@tin Synchronized wmf-config/InitialiseSettings.php: logging: Only send info and higher to logstash by default (4388a84) 2/2 (actually rebased this time) (duration: 00m 11s) [15:31:10] (03PS1) 10Alexandros Kosiaris: Setup LVS for mobileapps service on sca cluster [puppet] - 10https://gerrit.wikimedia.org/r/230790 (https://phabricator.wikimedia.org/T105538) [15:31:17] YuviPanda: I'm starting a manual run of replicate-others to see if that updates the last run - but I think that the timer might only track the last time /it/ started it. We'll soon see. [15:31:24] that's better [15:31:37] Ok [15:31:41] (03CR) 10Alexandros Kosiaris: "probably needs config.yaml.erb updated ?" [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [15:31:48] YuviPanda: Not as useful as using the log like my first draft, but doesn't require elevated privileges so it's a reasonable compromise. [15:32:04] (03CR) 10Alexandros Kosiaris: [C: 04-1] "some minor editing needed to assign to scb" [puppet] - 10https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [15:32:10] (03CR) 10BryanDavis: "*facepalm* I fetched to tin but didn't rebase before running sync-file. Nothing to see here. 
:)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230719 (owner: 10BryanDavis) [15:32:37] (03PS1) 10Faidon Liambotis: Allocate neighbor blocks for cr1/2-codfw<->mr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230791 [15:32:39] (03PS1) 10Faidon Liambotis: Add AAAA/PTR for mr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230792 [15:32:45] Ok [15:34:47] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-others/snapshot is not accessible: Permission denied [15:35:51] Coren: ^ [15:36:00] We should figure outwhy this is happening [15:36:00] (03PS2) 10Alexandros Kosiaris: new_wmf_service: fix bug with wrong function name [puppet] - 10https://gerrit.wikimedia.org/r/230787 [15:36:06] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] new_wmf_service: fix bug with wrong function name [puppet] - 10https://gerrit.wikimedia.org/r/230787 (owner: 10Alexandros Kosiaris) [15:36:15] * Coren grumbles. [15:36:22] !log Disabled puppet on logstash100[1-3] in preparation for upgrade to 1.5.3 [15:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:36:35] YuviPanda: I'll have to dig in the check source. There's obviously a bug in it. [15:36:48] godog: I'm ready for you to merge puppet patches when you can get to it [15:37:10] YuviPanda: Annoyingly though, it's part of the icinga default. [15:37:13] Coren: is there a task for it? If not can you create one? [15:37:22] bd808: yup, I'll merge the three patches from https://phabricator.wikimedia.org/T99735#1525049 [15:37:33] perfect [15:37:36] YuviPanda: Nope. I'll create one now. [15:38:00] Thanks [15:38:39] (03PS18) 10Filippo Giunchedi: Update configuration for logstash 1.5.3 [puppet] - 10https://gerrit.wikimedia.org/r/226991 (https://phabricator.wikimedia.org/T99735) (owner: 10BryanDavis) [15:38:51] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Update configuration for logstash 1.5.3 [puppet] - 10https://gerrit.wikimedia.org/r/226991 (https://phabricator.wikimedia.org/T99735) (owner: 10BryanDavis) [15:39:08] (03PS27) 10Filippo Giunchedi: labs: new role::logstash::stashbot class [puppet] - 10https://gerrit.wikimedia.org/r/227175 (owner: 10BryanDavis) [15:39:15] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] labs: new role::logstash::stashbot class [puppet] - 10https://gerrit.wikimedia.org/r/227175 (owner: 10BryanDavis) [15:39:29] (03PS4) 10Filippo Giunchedi: logstash: Enable doc_values in template mapping [puppet] - 10https://gerrit.wikimedia.org/r/230250 (https://phabricator.wikimedia.org/T74930) (owner: 10BryanDavis) [15:39:35] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] logstash: Enable doc_values in template mapping [puppet] - 10https://gerrit.wikimedia.org/r/230250 (https://phabricator.wikimedia.org/T74930) (owner: 10BryanDavis) [15:40:03] 6operations, 10ops-codfw: es2007 degraded RAID - disk failure - https://phabricator.wikimedia.org/T108592#1527624 (10Papaul) @jcrespo please check and see if you stay have the same error. Thanks. [15:40:50] bd808: kicking puppet on tin [15:42:01] godog: if that works as hoped, you should see it create /srv/deployment/logstash/plugins [15:42:41] 10Ops-Access-Requests, 6operations, 7LDAP: Add WMF engineer VolkerE to ldap/wmf group - https://phabricator.wikimedia.org/T107985#1527630 (10greg) [15:43:28] 10Ops-Access-Requests, 6operations, 7LDAP: Add WMF engineer VolkerE to ldap/wmf group - https://phabricator.wikimedia.org/T107985#1527636 (10greg) Sorry for the late reply. RelEng doesn't manage LDAP. 
Did something indicate that you should ask us? Old documentation? [15:43:51] greg-g: The process for that has mostly been "ask ostriches" [15:43:57] bd808: ish, puppet needs to run on palladium first [15:44:03] Which is an absolutely terrible process. [15:44:04] :) [15:44:06] (03PS1) 10Alexandros Kosiaris: new_wmf_service.py: fix icinga monitoring [puppet] - 10https://gerrit.wikimedia.org/r/230795 [15:44:24] ostriches: I was just about to say the same thing on the ticket [15:44:40] godog: oh? for the salt master? [15:44:54] hello, sorry for the intrusion and the silly qs; is this request correct? can someone take a look? https://phabricator.wikimedia.org/T107992 [15:45:03] bd808: yeah, otherwise repo_config isn't updated [15:45:25] supernino: Seems fine, yeah. [15:45:40] bd808: you should be good to go [15:46:33] godog: looks good. thanks [15:46:34] (03CR) 10Alexandros Kosiaris: [C: 032] new_wmf_service.py: fix icinga monitoring [puppet] - 10https://gerrit.wikimedia.org/r/230795 (owner: 10Alexandros Kosiaris) [15:46:38] (03PS2) 10Alexandros Kosiaris: new_wmf_service.py: fix icinga monitoring [puppet] - 10https://gerrit.wikimedia.org/r/230795 [15:46:50] (03CR) 10Alexandros Kosiaris: [V: 032] new_wmf_service.py: fix icinga monitoring [puppet] - 10https://gerrit.wikimedia.org/r/230795 (owner: 10Alexandros Kosiaris) [15:46:54] 7Puppet, 6operations, 6Release-Engineering, 6Services, 7service-runner: Create a standard puppet module for service-runner services - https://phabricator.wikimedia.org/T89901#1527648 (10greg) [15:47:12] ok thanks ostriches, who usually takes care of these things? [15:47:13] !log Trebuchet deploy of logstash/plugins: Add logstash-filter-prune 0.1.5 (36144b2) [15:47:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:47:39] supernino: magical elves! Or Krenair and Reedy and the releng folks [15:47:55] ostriches: for ldap? [15:47:57] * ostriches wonders if he's a magical elf or a releng folk. [15:47:59] greg-g: Yes. [15:48:03] ostriches: :( [15:48:08] lol elves are cool [15:48:35] I don't add users to groups in LDAP. [15:48:50] I probably shouldn't be able to. [15:48:53] Krenair: No, but you would do something like supernino's question, T107992 [15:49:04] Two conversations going on :p [15:49:15] oh, right [15:49:21] !log upgrading logstash on logstash1001 [15:49:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:49:29] yes [15:50:12] bd808: np, let me know if sth comes up [15:50:21] 6operations: Investigate why Icinga's check_disk panics on snapshot mounts - https://phabricator.wikimedia.org/T108694#1527659 (10coren) 3NEW a:3coren [15:50:35] !log nuking db1002-db1007 on icinga [15:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:50:55] 6operations: Investigate why Icinga's check_disk panics on snapshot mounts - https://phabricator.wikimedia.org/T108694#1527667 (10coren) p:5Triage>3High Set to high priority as this causes inappropriate critical icinga alerts. [15:51:03] everything seems to be going ok, but just in case [15:52:33] 10Ops-Access-Requests, 6operations, 7LDAP: Add WMF engineer VolkerE to ldap/wmf group - https://phabricator.wikimedia.org/T107985#1527683 (10demon) 5Open>3Resolved a:3demon Done.
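On the check_disk task just filed (T108694): the stock monitoring-plugins check_disk can be told to skip a mount rather than go critical when the path is unreadable. Something along these lines would exclude the transient snapshot mounts (thresholds and the exact regex are illustrative, not the deployed check command):

```
# Ignore any mount whose path matches the regex, e.g. the short-lived
# /run/lock/storage-replicate-*/snapshot mounts created by the backup timers:
/usr/lib/nagios/plugins/check_disk -w 6% -c 3% -i '/snapshot$'

# Or exclude whole filesystem types that should never be disk-checked:
/usr/lib/nagios/plugins/check_disk -w 6% -c 3% -X tmpfs -X devtmpfs
```

-i (--ignore-ereg-path) and -X (--exclude-type) are long-standing check_disk options; whether the shared Icinga default command can carry such an exclusion is presumably part of what the task has to settle.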
[15:53:13] 6operations: Investigate why Icinga's check_disk panics on snapshot mounts - https://phabricator.wikimedia.org/T108694#1527692 (10fgiunchedi) see also discussion in {T104975}, possibly a filesystem we shouldn't be checking anyway [15:53:14] mutante: shouldn't that ^ (LDAP) be done by someone in ops vs "ask chad"? /me shrugs [15:53:36] mutante: er, mis-timing, that == adding someone to LDAP [15:53:53] It should be done by ldap-admins. [15:54:01] ostriches is one of them, so are Reedy and robla [15:54:56] In practice, anyone in ops can do it too [15:55:17] I'd just love to merge processes and get rid of random "ask $X" processes [15:55:37] ldap is just fubar'd [15:55:54] see also: new employee onboarding and ex-employee off-boarding pain felt by ops/oit [15:56:32] I don't think wikitech ldap is on any on/offboarding documents at all. [15:56:45] It should be. [15:56:49] right. [15:56:49] It's basically "find out you need it and ask until someone points you to the right person" [15:56:54] Which is bad, right [15:56:56] At least offboarding. [15:57:00] aka: 'not a process' ;) [15:57:00] there is a ticket open in phab now to redefine this process I think [15:57:07] godog: found a problem. In https://gerrit.wikimedia.org/r/#/c/227175/27/manifests/role/kibana.pp,unified I messed up the ldap_bindpass [15:57:09] chasemp: yeah, probably at least part of it [15:57:12] I'd prefer to get T62412 fixed before putting it on onboarding docs [15:57:15] godog: I'll make a patch to fix [15:57:19] (03PS5) 10Ori.livneh: logstash: Count MediaWiki log events with statsd [puppet] - 10https://gerrit.wikimedia.org/r/230233 (https://phabricator.wikimedia.org/T100735) (owner: 10BryanDavis) [15:57:19] 6operations, 10ops-eqiad, 7Database, 5Patch-For-Review: Remove db1002-db1007 from production - https://phabricator.wikimedia.org/T105768#1527724 (10jcrespo) * Removed from icinga * Puppet certs revoked * Salt keys revoked [15:57:43] Krenair: heh :) [15:57:49] bd808: oh ok [15:58:37] Krenair: I forget where I made the comment (must have been IRC as it's not on that ticket) but I was very surprised when I found out I had +2 :) [15:59:20] oh right, meeting time [15:59:56] godog: hmmm... I think I may need help to fix correctly [16:00:04] bd808: Respected human, time to deploy Logstash cluster updates (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150811T1600). Please do the needful. [16:00:14] godog: I did this 'role::kibana::ldap_bindpass: "%{scope('passwords::ldap::production::proxypass')} [16:00:14] "' in hieradata/role/common/logstash.yaml [16:00:34] T62412 is a silly meta bug [16:00:45] which apparently doesn't work (probably because the private passwords aren't in scope at the right time) [16:01:03] godog: so maybe the best fix would be to set that in the private hiera to the right value [16:01:15] and remove from the non-private [16:01:16] bd808: failing on logstash1001 ? [16:01:19] yeah [16:01:35] 6operations, 10ops-codfw: es2007 degraded RAID - disk failure - https://phabricator.wikimedia.org/T108592#1527733 (10jcrespo) 5Open>3Resolved a:3jcrespo Error went away. @Papaul Did you change it or did it go away on its own?
``` Firmware state: Rebuild ``` [16:01:40] It applies as an empty string and that keeps apache2 from starting [16:01:52] Krenair: for the record, I don't think that's a blocker for getting the onboarding process written down more clearly, we can always change after [16:03:13] (03CR) 10BryanDavis: "Problem with hiera settings and apache2 config noted inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/227175 (owner: 10BryanDavis) [16:03:51] greg-g, okay, but we shouldn't start giving out ldap/wmf automatically to new engineers as part of onboarding until it's fixed [16:04:02] godog: the hack thing I could do would be to pull the param out and set the variable inline with a scoped lookup [16:04:43] bd808: ah yeah, like graphite does $ldap_bindpass = $passwords::ldap::production::proxypass so it'll be in scope in the template [16:05:07] yeah. that's how it worked before my patch tried to move it all to hiera [16:05:46] (03PS2) 10Alexandros Kosiaris: Introducing mobileapps role and puppet module [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) [16:05:48] (03PS2) 10Alexandros Kosiaris: Assign mobileapps service to sca cluster [puppet] - 10https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) [16:05:50] (03PS2) 10Alexandros Kosiaris: Setup LVS for mobileapps service on sca cluster [puppet] - 10https://gerrit.wikimedia.org/r/230790 (https://phabricator.wikimedia.org/T105538) [16:06:17] bd808: nod, I'm not deep enough in the puppet rabbithole to tell for sure why that doesn't work [16:06:39] k. I'll make the quick and dirty fix patch then [16:08:00] 6operations, 10ops-codfw: EQDFW/EQORD Deployment Prep Task - https://phabricator.wikimedia.org/T91077#1527748 (10RobH) [16:09:16] (03CR) 10Faidon Liambotis: [C: 032] Allocate neighbor blocks for cr1/2-codfw<->mr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230791 (owner: 10Faidon Liambotis) [16:09:26] (03CR) 10Faidon Liambotis: [C: 032] Add AAAA/PTR for mr1-codfw [dns] - 10https://gerrit.wikimedia.org/r/230792 (owner: 10Faidon Liambotis) [16:10:40] (03PS1) 10BryanDavis: logstash: fix ldap_bindpass [puppet] - 10https://gerrit.wikimedia.org/r/230798 [16:10:46] godog: ^ [16:11:16] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] logstash: fix ldap_bindpass [puppet] - 10https://gerrit.wikimedia.org/r/230798 (owner: 10BryanDavis) [16:11:31] bd808: looks good, merged! [16:11:51] * bd808 tries it out [16:12:25] godog: that worked :) [16:12:36] 6operations, 10ops-codfw: ms-be2003.codfw.wmnet: slot=1 dev=sdb failed - https://phabricator.wikimedia.org/T108561#1527753 (10Papaul) @Filippo drive replacement complete [16:13:06] Krenair: you mean "we should stop" ;) [16:13:28] bd808: \o/ [16:14:04] greg-g, right now they have to know what to get and ask for it, right? [16:15:13] PROBLEM - RAID on db2023 is CRITICAL 1 failed LD(s) (Degraded) [16:16:33] !log logstash upgrade on logstash1001 complete [16:16:34] RECOVERY - RAID on es2007 is OK optimal, 1 logical, 2 physical [16:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:17:39] ^oh, one goes up, another down [16:18:19] they are team-tagging!
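The ldap_bindpass fix merged above (230798) follows the graphite precedent bd808 cites at 16:04:43: drop the hiera interpolation, which resolved to an empty string and kept apache2 from starting, and bind the secret from the private passwords class at class scope so the template can see it. Roughly, with the class layout abbreviated (a sketch, not the literal patch):

```
class role::kibana {
    include ::passwords::ldap::production

    # In scope for any template rendered by this class; the hiera form
    # "%{scope('passwords::ldap::production::proxypass')}" came out empty.
    $ldap_bindpass = $passwords::ldap::production::proxypass
}
```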
[16:20:13] !log logstash upgrade on logstash1002 complete [16:20:16] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] update collector version [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/230582 (https://phabricator.wikimedia.org/T101764) (owner: 10Eevans) [16:20:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:23:14] !log logstash upgrade on logstash1003 complete [16:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:23:31] (03PS1) 10Alexandros Kosiaris: Get scb up to par with sca [puppet] - 10https://gerrit.wikimedia.org/r/230800 [16:23:53] godog: done with the logstash upgrades. Let's wait 10-15 minutes to make sure nothing melts and then start on the elasticsearch update [16:24:21] godog: you could schedule the icinga watch downtime now from T108040 if you want [16:25:24] (03PS1) 10Faidon Liambotis: Fix mr1-codfw AAAA to match PTR [dns] - 10https://gerrit.wikimedia.org/r/230802 [16:25:46] godog: the check that needs to be silenced is "ElasticSearch health check for shards" for all 6 hosts in the logstash_eqiad group [16:25:59] (03CR) 10Faidon Liambotis: [C: 032] Fix mr1-codfw AAAA to match PTR [dns] - 10https://gerrit.wikimedia.org/r/230802 (owner: 10Faidon Liambotis) [16:27:14] RECOVERY - Disk space on labstore1002 is OK: DISK OK [16:27:15] bd808: yup, that's muted until 19.30 UTC [16:27:48] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1527783 (10Eevans) a:3Eevans [16:27:58] godog: awesome, thanks [16:30:19] greg-g: should be "ask an LDAP admin" which has ops plus a couple others, but via phab would make a difference for process [16:30:23] Krenair: I didn't [16:30:34] ? [16:30:39] explicitly ask for +2 [16:30:40] oh [16:30:56] bd808: np, we'll need to look at es-tool and jessie, I think it might run out of the box [16:30:58] greg-g: but you were hanging out with the cool kids (mw-core) [16:33:04] 6operations, 10RESTBase, 10RESTBase-Cassandra: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1527795 (10Eevans) [16:33:09] (03CR) 10Dzahn: "oh, interesting re: "mod_access_compat", well, it did not seem to be default everywhere, saw at least one host break because of Apache 2.2" [puppet] - 10https://gerrit.wikimedia.org/r/230686 (owner: 10Dzahn) [16:33:26] (03PS1) 10Alex Monk: Set wgNamespaceRobotPolicies on itwiki's NS_USER to noindex,follow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230804 (https://phabricator.wikimedia.org/T107992) [16:34:13] PROBLEM - Last backup of the others filesystem on labstore1002 is CRITICAL - Last run was over 1:00:00 ago [16:34:18] bd808: indeed :) [16:34:54] godog: things look stable. I'm going to start the elasticsearch 1.7.1 update now [16:36:33] !log upgraded elasticsearch to 1.7.1 on logstash1001 [16:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:37:00] the onboarding process shouldn't even be defined as limited to 1 or 2 teams.
to actually work it needs to be a WMF-wide thing, from when a contract gets signed until they have all they need to work, so at least HR + Facilities + IT + Ops + TeamTheyWorkIn, theoretically Finances [16:37:21] !log upgraded elasticsearch to 1.7.1 on logstash1002 [16:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:38:20] !log upgraded elasticsearch to 1.7.1 on logstash1003 [16:38:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:38:36] (03PS4) 10Dzahn: etherpad: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230686 [16:39:55] 6operations, 10Traffic: Fix Varnish TTLs across the board - https://phabricator.wikimedia.org/T108612#1527843 (10BBlack) Ok I've dug into this some (read varnish source code to confirm behavior there, re-read our VCL, stared at lots of parsed varnish logs, etc) and it's not as bad as I initially thought. Most... [16:40:50] (03CR) 10Dzahn: [C: 032] etherpad: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230686 (owner: 10Dzahn) [16:40:59] (03CR) 10Ori.livneh: "> If that is a high traffic service we might still want to exclude it from connection tracking, though." [puppet] - 10https://gerrit.wikimedia.org/r/223844 (https://phabricator.wikimedia.org/T104970) (owner: 10Dzahn) [16:41:23] (03PS1) 10Alex Monk: Allow ptwiki bureaucrats to remove sysop+bureaucrat rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230805 (https://phabricator.wikimedia.org/T107661) [16:41:44] (03PS1) 10BBlack: varnish: director->backends is now always an array [puppet] - 10https://gerrit.wikimedia.org/r/230806 [16:42:13] 6operations, 6Collaboration-Team-Backlog, 10Flow, 10MediaWiki-Redirects, 3Reading-Web: Flow url doesn't redirect to mobile - https://phabricator.wikimedia.org/T107108#1527852 (10Krenair) [16:42:16] !log restarted Apache on Etherpad [16:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:42:26] (03PS1) 10BBlack: cache_(text|upload): remove runtime_params conditional [puppet] - 10https://gerrit.wikimedia.org/r/230807 [16:42:28] (03PS1) 10BBlack: cache_(text|upload): frontend default_ttl => 30d [puppet] - 10https://gerrit.wikimedia.org/r/230808 (https://phabricator.wikimedia.org/T108612) [16:42:30] (03PS1) 10BBlack: cache_mobile: def_ttl 30d [puppet] - 10https://gerrit.wikimedia.org/r/230809 (https://phabricator.wikimedia.org/T108612) [16:42:38] 6operations, 6Collaboration-Team-Backlog, 10Flow, 10MediaWiki-Redirects, 3Reading-Web: Flow url doesn't redirect to mobile - https://phabricator.wikimedia.org/T107108#1486874 (10Krenair) Moved from #Wikimedia-site-requests to #operations since that's in the puppet repository, not mediawiki-config [16:42:47] !log upgraded elasticsearch to 1.7.1 on logstash1004; logstash-2015.08.11 shard recovering [16:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:46:29] 6operations, 6Collaboration-Team-Backlog, 10Flow, 10MediaWiki-Redirects, 3Reading-Web: Flow url doesn't redirect to mobile - https://phabricator.wikimedia.org/T107108#1527871 (10Dzahn) Yes, in the puppet repository but in the mediawiki module. Since it's config you might argue for it to be in mediawiki-c... [16:46:55] (03PS1) 10coren: Labs: put the real check interval for backups [puppet] - 10https://gerrit.wikimedia.org/r/230810 [16:47:07] YuviPanda: ^^ extra easy changeset to +1?
[16:47:24] (03CR) 10jenkins-bot: [V: 04-1] Labs: put the real check interval for backups [puppet] - 10https://gerrit.wikimedia.org/r/230810 (owner: 10coren) [16:47:31] 6operations, 10ops-codfw: RAID disk failure on db2023 - https://phabricator.wikimedia.org/T108701#1527873 (10jcrespo) 3NEW [16:47:44] Oh d'uh [16:47:55] Coren: line 35 looks wrong [16:47:59] Coren: :) [16:48:08] (03PS2) 10coren: Labs: put the real check interval for backups [puppet] - 10https://gerrit.wikimedia.org/r/230810 [16:48:09] Yeah, copypasta failz [16:48:19] 6operations, 10ops-codfw: RAID disk failure on db2023 - https://phabricator.wikimedia.org/T108701#1527884 (10jcrespo) @Papaul, let's wait for now: it says "Rebuild", it may fix itself like the other one. [16:48:49] 7Blocked-on-Operations, 6Collaboration-Team-Backlog, 10Flow, 3Collaboration-Team-Current, and 2 others: Separate reference tables by wiki - https://phabricator.wikimedia.org/T107204#1527887 (10DannyH) p:5High>3Unbreak! [16:48:54] ACKNOWLEDGEMENT - RAID on db2023 is CRITICAL 1 failed LD(s) (Degraded) Jcrespo T108701 [16:48:55] YuviPanda: You noticed the properly triggering 1h crit? [16:49:23] Coren: nice! [16:49:33] (and I haven't, no) [16:49:35] but whee [16:51:10] (03PS3) 10Mobrovac: Introducing mobileapps role and puppet module [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [16:51:54] (03CR) 10jenkins-bot: [V: 04-1] Introducing mobileapps role and puppet module [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [16:52:54] RECOVERY - puppet last run on ms-be2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:53:20] (03PS4) 10Mobrovac: Introducing mobileapps role and puppet module [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [16:56:30] (03CR) 10Mobrovac: [C: 031] "Looks GTG to me." [puppet] - 10https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [16:59:30] (03CR) 10Mobrovac: [C: 04-1] "The plan is to have it only on SCB, so I don't understand why we would want to assign it to SCA at all." [puppet] - 10https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) (owner: 10Alexandros Kosiaris) [17:04:47] 6operations, 10ops-codfw: ms-be2003.codfw.wmnet: slot=1 dev=sdb failed - https://phabricator.wikimedia.org/T108561#1527930 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi rebuilding, thanks @papaul ! ``` /dev/sdb1 1.9T 4.2G 1.9T 1% /srv/swift-storage/sdb1 ``` [17:06:14] 6operations, 10ops-codfw: ms-be2003.codfw.wmnet: slot=1 dev=sdb failed - https://phabricator.wikimedia.org/T108561#1527938 (10Papaul) You're welcome.
[17:11:02] (03PS2) 10BBlack: cache_(text|upload): remove runtime_params conditional [puppet] - 10https://gerrit.wikimedia.org/r/230807 [17:11:10] (03CR) 10BBlack: [C: 032 V: 032] cache_(text|upload): remove runtime_params conditional [puppet] - 10https://gerrit.wikimedia.org/r/230807 (owner: 10BBlack) [17:11:20] (03PS2) 10BBlack: cache_(text|upload): frontend default_ttl => 30d [puppet] - 10https://gerrit.wikimedia.org/r/230808 (https://phabricator.wikimedia.org/T108612) [17:13:14] (03PS2) 10Dzahn: OTRS: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230709 [17:13:18] !log logstash cluster recovered after upgrade of elasticsearch on logstash1004 [17:13:22] (03CR) 10BBlack: [C: 032] cache_(text|upload): frontend default_ttl => 30d [puppet] - 10https://gerrit.wikimedia.org/r/230808 (https://phabricator.wikimedia.org/T108612) (owner: 10BBlack) [17:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:14:04] !log log event volume in logstash dropped dramatically at 16:49; investigating [17:14:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:16:45] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1527982 (10brion) Made an attempt to sidestep ffmpeg2theora by using ffmpeg for conversion and ogg... [17:20:51] (03PS2) 10BBlack: cache_mobile: def_ttl 30d [puppet] - 10https://gerrit.wikimedia.org/r/230809 (https://phabricator.wikimedia.org/T108612) [17:21:50] (03CR) 10BBlack: [C: 032] cache_mobile: def_ttl 30d [puppet] - 10https://gerrit.wikimedia.org/r/230809 (https://phabricator.wikimedia.org/T108612) (owner: 10BBlack) [17:22:03] RECOVERY - RAID on db2023 is OK optimal, 1 logical, 2 physical [17:22:34] (03PS2) 10BBlack: define puppet ganglia stuff for caches @ codfw [puppet] - 10https://gerrit.wikimedia.org/r/230771 [17:22:41] (03CR) 10BBlack: [C: 032 V: 032] define puppet ganglia stuff for caches @ codfw [puppet] - 10https://gerrit.wikimedia.org/r/230771 (owner: 10BBlack) [17:25:07] (03PS13) 10Giuseppe Lavagetto: puppet-compiler: first commit [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/228849 (https://phabricator.wikimedia.org/T96802) [17:27:20] !log logstash event volume recovered after restarting all 3 logstash services [17:27:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:28:25] !log upgrading elasticsearch on logstash1005 [17:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:29:09] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "- Diff detection is broken with the latest catalog diff puppet face" [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/228849 (https://phabricator.wikimedia.org/T96802) (owner: 10Giuseppe Lavagetto) [17:29:46] !log upgraded elasticsearch to 1.7.1 on logstash1005; logstash-2015.08.11 shard recovering [17:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:30:19] (03PS3) 10coren: Labs: put the real check interval for backups [puppet] - 10https://gerrit.wikimedia.org/r/230810 [17:35:51] (03CR) 10coren: [C: 032] Labs: put the real check interval for backups [puppet] - 10https://gerrit.wikimedia.org/r/230810 (owner: 10coren) [17:38:54] RECOVERY - Last backup of the others filesystem on labstore1002 is OK - Last run successful [17:39:37] YuviPanda: ^^ 
[17:43:03] 6operations: Investigate why Icinga's check_disk panics on snapshot mounts - https://phabricator.wikimedia.org/T108694#1528103 (10coren) Good catch, @fgiunchedi - clearly the same issue. Merging. [17:43:07] !log log event volume in logstash dropped dramatically again; seems to correlate with final recovery of logstash-2015.08.11 shard [17:43:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:43:22] 6operations: Investigate why Icinga's check_disk panics on snapshot mounts - https://phabricator.wikimedia.org/T108694#1528104 (10coren) [17:43:24] 6operations, 5Continuous-Integration-Isolation, 7Icinga, 7Monitoring, 7Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1528106 (10coren) [17:45:38] 6operations, 5Patch-For-Review: move grafana from zirconium to a VM - https://phabricator.wikimedia.org/T105008#1528115 (10Dzahn) 10:08 "Could not contact Elasticsearch. Please ensure that Elasticsearch is reachable from your browser." 10:09 and the way i tested is: 10:09 ssh -D... [17:50:19] 6operations, 5Continuous-Integration-Isolation, 7Icinga, 7Monitoring, 7Nodepool: flapping "permission denied" disk space alarm for temporary image on labnodepool1001 - https://phabricator.wikimedia.org/T104975#1528152 (10coren) We get the same issue on the labstore* with the required exclusion being `/va... [18:00:04] twentyafterfour greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150811T1800). Please do the needful. [18:00:48] twentyafterfour: don't trust logstash/kibana to tell you what's broken/working right now. [18:01:34] !log logstash cluster recovered after upgrade of elasticsearch on logstash1005 [18:01:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:02:15] !log upgrading elasticsearch on logstash1006 [18:02:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:03:44] !log upgraded elasticsearch to 1.7.1 on logstash1006; logstash-2015.08.11 shard recovering [18:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:04:37] bd808: I don't usually trust it ;) [18:05:01] fine then ;) [18:06:57] !log logstash cluster recovered after upgrade of elasticsearch on logstash1006 [18:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:07:20] godog: I'm done!
Thanks for your help [18:07:50] (03PS1) 1020after4: 1.26wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230819 [18:07:52] (03PS1) 1020after4: delete 1.26wmf10 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230820 [18:08:04] (03CR) 1020after4: [C: 032] 1.26wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230819 (owner: 1020after4) [18:08:10] (03Merged) 10jenkins-bot: 1.26wmf18 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230819 (owner: 1020after4) [18:08:20] (03CR) 1020after4: [C: 032] delete 1.26wmf10 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230820 (owner: 1020after4) [18:08:26] (03Merged) 10jenkins-bot: delete 1.26wmf10 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230820 (owner: 1020after4) [18:14:58] (03PS1) 10Dzahn: grafana: needs to load Apache mod_proxy_http too [puppet] - 10https://gerrit.wikimedia.org/r/230824 [18:15:18] !log twentyafterfour@tin Started scap: sync new branch 1.26wmf18 and update testwiki [18:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:16:38] 6operations, 5Patch-For-Review: move grafana from zirconium to a VM - https://phabricator.wikimedia.org/T105008#1528327 (10Dzahn) >>! In T105008#1528115, @Dzahn wrote: > 10:33 it's related to mod_proxy, figuring it out > 10:34 3945] AH01144: No protocol handler was valid for the URL /grafan... [18:16:52] (03PS1) 10Mforns: Change percentage in EventLogging validation alert [puppet] - 10https://gerrit.wikimedia.org/r/230825 (https://phabricator.wikimedia.org/T108339) [18:18:14] (03PS2) 10Dzahn: grafana: needs to load Apache mod_proxy_http too [puppet] - 10https://gerrit.wikimedia.org/r/230824 [18:19:08] (03PS3) 10Dzahn: grafana: needs to load Apache mod_proxy_http too [puppet] - 10https://gerrit.wikimedia.org/r/230824 [18:19:21] (03PS4) 10Dzahn: grafana: needs to load Apache mod_proxy_http too [puppet] - 10https://gerrit.wikimedia.org/r/230824 (https://phabricator.wikimedia.org/T105008) [18:19:47] (03CR) 10Dzahn: [C: 032] grafana: needs to load Apache mod_proxy_http too [puppet] - 10https://gerrit.wikimedia.org/r/230824 (https://phabricator.wikimedia.org/T105008) (owner: 10Dzahn) [18:21:30] !log logstash log event volume back to normal levels following elasticsearch upgrade [18:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:47] (03CR) 10Dzahn: [C: 031] "works now after issues fixed with I0e8f2e7aaf1ae7e178b9" [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008) (owner: 10Dzahn) [18:23:54] (03PS3) 10Dzahn: misc-web: switch grafana to backend krypton [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008) [18:27:28] 6operations, 10RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1528368 (10Eevans) >>! In T107949#1527439, @Eevans wrote: >>>! In T107949#1526933, @fgiunchedi wrote: >> cosmetic issue output contains `%s` spotted while looking at the logs, benign >> >... 
[18:27:45] (03CR) 10Dzahn: [C: 032] misc-web: switch grafana to backend krypton [puppet] - 10https://gerrit.wikimedia.org/r/230660 (https://phabricator.wikimedia.org/T105008) (owner: 10Dzahn) [18:29:53] (03PS1) 10Dzahn: grafana: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/230827 (https://phabricator.wikimedia.org/T104946) [18:30:16] (03PS2) 10Dzahn: grafana: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/230827 (https://phabricator.wikimedia.org/T104946) [18:31:57] ah, the switched varnishes [18:32:09] gotta use a bunch of ssh-keygen -f [18:32:23] and 4 boxen now for misc-web [18:34:17] 6operations, 7network: smokeping loss of ping for codfw rows - https://phabricator.wikimedia.org/T108715#1528390 (10fgiunchedi) 3NEW [18:36:36] paravoid: ^ https://phabricator.wikimedia.org/T108715 [18:36:40] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1528411 (10chasemp) @bblack @faidon @mmodell I have talked this over with a few people gracious enough to lend me their ear but t... [18:37:13] !log grafana switched to node krypton (jessie/VM) [18:37:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:37:30] bd808: no problem! glad it was smooth [18:38:10] ottomata2, hi, any updates on the hadoop? https://gerrit.wikimedia.org/r/#/c/230535/ [18:39:28] 6operations, 5Patch-For-Review, 7Tracking: tracking: move all misc services from zirconium to a VM - https://phabricator.wikimedia.org/T104946#1528423 (10Dzahn) [18:39:30] 6operations, 5Patch-For-Review: move grafana from zirconium to a VM - https://phabricator.wikimedia.org/T105008#1528421 (10Dzahn) 5Open>3Resolved 11:38 < mutante> !log grafana switched to node krypton (jessie/VM) [18:40:14] mutante: that made me realize it'd be nice if we could cc phab tickets with !log [18:40:48] godog: yes, that would be nice [18:41:07] morebots could mail phab i suppose [18:41:08] I am a logbot running on tools-exec-1210. [18:41:08] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [18:41:08] To log a message, type !log <msg>. [18:42:08] 6operations: move grafana from zirconium to a VM - https://phabricator.wikimedia.org/T105008#1528439 (10Dzahn) [18:42:51] (03CR) 10Dzahn: [C: 032] grafana: remove role from zirconium [puppet] - 10https://gerrit.wikimedia.org/r/230827 (https://phabricator.wikimedia.org/T104946) (owner: 10Dzahn) [18:44:45] !log twentyafterfour@tin scap failed: OSError [Errno 1] Operation not permitted: '/srv/mediawiki-staging/wikiversions.php' (duration: 29m 27s) [18:44:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:45:25] 6operations: reclaim zirconium - https://phabricator.wikimedia.org/T105510#1528451 (10Dzahn) [18:45:53] ori: ^ [18:46:14] twentyafterfour: was it the chmod that failed? [18:47:11] bd808: yes [18:47:15] os.chmod(php_file, 0664) [18:47:34] -rw-rw-r-- 1 ori wikidev 33069 Aug 11 18:44 wikiversions.php [18:47:57] I would have to own it to chmod, right? [18:48:19] we do the same thing for the cdb... [18:48:28] but it is moved into place [18:49:08] created, then moved instead of chmod on the existing file?
[18:49:26] yeah see https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/tasks.py#L163-L164 [18:49:58] the problem here is chmod on a file owned by another deployer [18:50:21] you can hot patch it by taking the chmod() on line 178 out entirely [18:50:26] 6operations, 5Patch-For-Review, 7Tracking: tracking: move all misc services from zirconium to a VM - https://phabricator.wikimedia.org/T104946#1528450 (10Dzahn) 5Open>3Resolved [18:50:57] but it should probably follow the tmp_file, rename, chmod model used for the cdb [18:52:41] yeah [18:52:42] (03PS1) 10Dzahn: zirconium: decom, rm from site.pp,DHCP,netboot [puppet] - 10https://gerrit.wikimedia.org/r/230828 (https://phabricator.wikimedia.org/T105510) [18:53:10] !log twentyafterfour@tin Started scap: again: sync new branch 1.26wmf18 and update testwiki [18:53:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:54:37] (03CR) 10Dzahn: [C: 032] zirconium: decom, rm from site.pp,DHCP,netboot [puppet] - 10https://gerrit.wikimedia.org/r/230828 (https://phabricator.wikimedia.org/T105510) (owner: 10Dzahn) [18:57:50] 6operations, 5Patch-For-Review: reclaim zirconium - https://phabricator.wikimedia.org/T105510#1528492 (10Dzahn) root@palladium:~# puppet cert clean zirconium.wikimedia.org Notice: Revoked certificate with serial 346 Notice: Removing file Puppet::SSL::Certificate zirconium.wikimedia.org at '/var/lib/puppet/ser... [18:58:08] !log twentyafterfour@tin Finished scap: again: sync new branch 1.26wmf18 and update testwiki (duration: 04m 58s) [18:58:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:59:39] 6operations, 10Continuous-Integration-Infrastructure, 6Multimedia, 5Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1528512 (10brion) Ok I've got a provisional patch for ffmpeg2theora master, which gets a local bui... [18:59:44] 6operations, 10MediaWiki-extensions-TimedMediaHandler, 6Multimedia: Support VP9 in TMH (Unable to decode) - https://phabricator.wikimedia.org/T55863#1528513 (10brion) Ok I've got a provisional patch for ffmpeg2theora master, which gets a local build of ffmpeg2theora working in MediaWiki-Vagrant for me. https... [19:01:25] (03PS1) 10Dzahn: decom zirconium [dns] - 10https://gerrit.wikimedia.org/r/230830 (https://phabricator.wikimedia.org/T105510) [19:02:12] (03CR) 10Dzahn: "just wait a little while, make sure it's gone from icinga and shut it down first" [dns] - 10https://gerrit.wikimedia.org/r/230830 (https://phabricator.wikimedia.org/T105510) (owner: 10Dzahn) [19:04:16] (03PS1) 10Alexandros Kosiaris: maps: Include tuning.conf in slaves as well [puppet] - 10https://gerrit.wikimedia.org/r/230832 [19:05:20] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1528555 (10mmodell) >>! In T100519#1528411, @chasemp wrote: > 2) Put iridium in a public VLAN setting up SSH to go through LVS incomi... 
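The tmp_file/rename/chmod model bd808 describes above boils down to: write the new content to a temp file you own, chmod that temp file, then rename() it over the destination. chmod() requires ownership of the target, but rename() only needs write access to the containing directory, which is why this sidesteps the "file owned by another deployer" failure that broke scap here. A minimal sketch of the pattern in Python; the atomic_write() name and signature are made up for illustration and this is not the actual scap code:

```python
import os
import tempfile


def atomic_write(path, data, mode=0o664):
    """Replace path with new content without chmod()ing a file we may not own.

    Write to a temp file in the same directory (so the final rename stays on
    one filesystem and is atomic), set the mode on the temp file we created,
    then rename it over the destination.
    """
    dirname = os.path.dirname(path) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as tmp_file:
            tmp_file.write(data)
        os.chmod(tmp_path, mode)   # always works: mkstemp made us the owner
        os.rename(tmp_path, path)  # atomic; only needs directory write access
    except OSError:
        os.unlink(tmp_path)
        raise
```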
[19:05:40] (03PS1) 10BryanDavis: Fix wikiversions compilation problem [tools/scap] - 10https://gerrit.wikimedia.org/r/230833 [19:05:46] twentyafterfour: ^ [19:05:56] (03CR) 10Alexandros Kosiaris: [C: 032] maps: Include tuning.conf in slaves as well [puppet] - 10https://gerrit.wikimedia.org/r/230832 (owner: 10Alexandros Kosiaris) [19:06:13] 6operations, 5Patch-For-Review: reclaim zirconium - https://phabricator.wikimedia.org/T105510#1528570 (10Dzahn) once the name zirconium is gone from DNS, this hardware can still be accessed as wmf3427.mgmt.eqiad.wmnet. [19:07:06] 6operations, 5Patch-For-Review: reclaim zirconium - https://phabricator.wikimedia.org/T105510#1528578 (10Dzahn) a:3Dzahn [19:07:50] akosiaris, the backup process seems to have finished! [19:07:57] (slaves repl) [19:10:32] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1528619 (10mmodell) summary of my goals, roughly in order of importance: # Get phabricator's git repos exposed over ssh, with a stro... [19:11:39] (03PS1) 10Alexandros Kosiaris: maps: Fix typo introduced in 5daab4f [puppet] - 10https://gerrit.wikimedia.org/r/230838 [19:12:41] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1528625 (10BBlack) >>! In T100519#1528411, @chasemp wrote: > The model of having SSH (for git) set up as a service in LVS and termina... [19:13:19] (03CR) 1020after4: [C: 032] Fix wikiversions compilation problem [tools/scap] - 10https://gerrit.wikimedia.org/r/230833 (owner: 10BryanDavis) [19:14:58] (03CR) 10Ori.livneh: "Yikes, thanks." [tools/scap] - 10https://gerrit.wikimedia.org/r/230833 (owner: 10BryanDavis) [19:15:36] (03Merged) 10jenkins-bot: Fix wikiversions compilation problem [tools/scap] - 10https://gerrit.wikimedia.org/r/230833 (owner: 10BryanDavis) [19:15:49] twentyafterfour: What's the status of the wmf18 roll-out? Do I have time to sneak in an unbreak now fix for the wmf18 branch? [19:17:24] (03CR) 10Alexandros Kosiaris: [C: 032] maps: Fix typo introduced in 5daab4f [puppet] - 10https://gerrit.wikimedia.org/r/230838 (owner: 10Alexandros Kosiaris) [19:18:00] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1528677 (10BBlack) >>! In T100519#1528619, @mmodell wrote: > summary of my goals, roughly in order of importance: > > # Get phabrica... [19:19:08] RoanKattouw: it's on testwiki [19:19:37] RoanKattouw: so, sneak it in by all means [19:21:30] yurik: yeah, it's fixed now [19:21:35] 6operations, 6Discovery, 10Maps, 10Traffic, and 2 others: Set up standard HTTPS Termination -> 2layer caching for maps service - https://phabricator.wikimedia.org/T105076#1528698 (10Yurik) [19:21:37] and with that I am going to bed [19:21:43] akosiaris, i can't connect [19:21:59] ? [19:22:09] twentyafterfour: OK, going to do that now, Jenkins volente [19:22:28] akosiaris, never mind, works now! [19:22:30] awesome!!! [19:22:33] thanks!!!!!!!!!!!!!!!! [19:22:41] gnight :) [19:25:43] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1528724 (10GWicke) As discussed on the mail thread, my proposal is to go with 6 nodes with 8 cores, 96G RAM and 4 Samsung SSDs each. 
This variant gives us the best cost / performance rati... [19:26:39] !log catrope@tin Synchronized php-1.26wmf18/extensions/Flow/modules/editor/editors/visualeditor/mw.flow.ve.Target.js: Fix missing editor switcher (duration: 00m 12s) [19:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:26:58] twentyafterfour: OK, done. Thanks! [19:28:10] !log ori@tin Synchronized php-1.26wmf17/includes/resourceloader/ResourceLoader.php: I2089b21fc: ResourceLoader: make "cacheReport" option false by default (duration: 00m 11s) [19:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:28:24] !log ori@tin Synchronized php-1.26wmf18/includes/resourceloader/ResourceLoader.php: I2089b21fc: ResourceLoader: make "cacheReport" option false by default (duration: 00m 13s) [19:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:33:03] thcipriani: I applaud your vim live it thing [19:34:17] Negative24: heh, thanks. Nice github stalking. Some guy on the internet actually made vinyl stickers of that version of the vim logo and sent me a bunch. Made the whole project worthwhile :) [19:35:04] thcipriani: I actually saw a link of it on http://vimcasts.org/blog/2013/02/habit-breaking-habit-making/ then saw it was you :P WMF is everywhere! [19:35:38] but your github is awesome as well [19:36:17] 6operations: codfw misc cluster ganglia not working - https://phabricator.wikimedia.org/T108680#1528766 (10Dzahn) a:3Dzahn [19:36:57] so testwiki is 503, trying to figure out the cause [19:40:12] ah so that's why nothing's working. It's always great when it's not just me [19:42:27] 2015-08-11 19:42:01 mw1017 testwiki fatal ERROR: [f40e9261] /wiki/Main_Page ErrorException from line 267 of /srv/mediawiki/php-1.26wmf18/includes/exception/MWExceptionHandler.php: Fatal Error: Cannot pass parameter 2 by reference {"exception":"[Exception ErrorException] (/srv/mediawiki/php-1.26wmf18/includes/exception/MWExceptionHandler.php:267) Fatal Error: Cannot pass parameter 2 by reference\n[stacktrace]\n#0 [internal func [19:42:28] tion]: MWExceptionHandler::handleFatalError()\n#1 {main}\n"} [19:43:40] 6operations: codfw misc cluster ganglia not working - https://phabricator.wikimedia.org/T108680#1528775 (10Dzahn) 12:38 < mutante> bblack: http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&s=by+name&c=Miscellaneous%2520codfw&tab=m&vn=&hide-hf=false 12:38 < mutante> the ganglia data is coming in no... [19:43:58] 6operations: codfw misc cluster ganglia not working - https://phabricator.wikimedia.org/T108680#1528776 (10Dzahn) 5Open>3Resolved [19:44:29] bd808: ^^ [19:55:21] twentyafterfour: i think it might be I7ea050a2eabba635f2aadb4e33b6f8fbfb1b01a8 [19:57:04] yeah [19:57:08] i fixed it, i'll submit a patch [19:57:29] ori: awesome thanks [19:59:06] AaronSchulz: yt? [19:59:25] hm [19:59:59] AaronSchulz: it's http://stackoverflow.com/a/9716982/582542 [20:00:18] so the question is, is the proper fix to create $dummy = null; and pass that instead of a null literal (that's what I live-hacked on mw1017, and it works) [20:00:21] or to pass $casToken [20:01:20] I suspect you meant to do the latter, no? [20:01:34] * AaronSchulz is trying to find the lines in question [20:01:59] AaronSchulz: includes/objectcache/MultiWriteBagOStuff.php L66 [20:02:21] also did you mean to clobber $flags like that?
[20:04:56] * AaronSchulz looks at what $casToken did before [20:06:04] (03PS1) 10Alex Monk: Kill ee-prototype.wikipedia.beta.wmflabs.org [puppet] - 10https://gerrit.wikimedia.org/r/230854 (https://phabricator.wikimedia.org/T107397) [20:06:32] 6operations, 6Discovery, 10SEO, 3Discovery-Analysis-Sprint: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1528884 (10Deskana) Taking out of the sprint, because I'm clearly not finding time to get around to this. [20:06:37] 6operations, 6Discovery, 10SEO: Get Oliver Keyes access to Google Webmaster Tools for all Wikimedia domains - https://phabricator.wikimedia.org/T101157#1528885 (10Deskana) [20:07:33] yeah the actual cas value doesn't matter for that class [20:07:46] so I can make that <<$value = $cache->get( $key, $casToken, $flags );>> [20:08:11] phpstorm actually sees that error when the $caches var doc is fixed too [20:09:08] AaronSchulz: https://gerrit.wikimedia.org/r/#/c/230853/ [20:10:44] !log ori@tin Synchronized php-1.26wmf18/includes/objectcache/MultiWriteBagOStuff.php: 0acfe6a5bb: Fix argument handling in MultiWriteBagOStuff::get() (duration: 00m 12s) [20:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:11:09] twentyafterfour: all yours [20:13:49] (03PS1) 1020after4: group0 wikis to 1.26wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230913 [20:14:05] (03CR) 1020after4: [C: 032] group0 wikis to 1.26wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230913 (owner: 1020after4) [20:14:10] (03Merged) 10jenkins-bot: group0 wikis to 1.26wmf18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230913 (owner: 1020after4) [20:18:54] TypeError: not all arguments converted during string formatting [20:19:07] when running sync-wikiversions [20:19:20] TypeError: not all arguments converted during string formatting [20:19:24] er, [20:19:33] err_msg = 'ExtensionMessages not found in {}' % ext_msg [20:19:47] isn't the {} syntax only applicable to string.format() [20:20:37] yes [20:20:41] it should be %s [20:21:03] that's what I thought [20:21:05] ok fixing it [20:21:44] it's your bug :P [20:21:45] I50adefa19c0e3916d25703d78934b743a5f64da7 [20:22:06] * ori only responsible for 1 of 3 deployment bugs \o/ [20:22:18] greg-g: are you not proud? 
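The sync-wikiversions TypeError above is easy to reproduce: '{}' placeholders belong to str.format(), while the % operator only substitutes %-style conversions such as %s, so with no %s in the template the supplied argument is left unconverted and Python raises. A quick illustration (the path value is made up):

```python
ext_msg = '/srv/example/ExtensionMessages.php'  # illustrative path only

try:
    # the buggy pattern: a format()-style template used with the % operator
    msg = 'ExtensionMessages not found in {}' % ext_msg
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting

# either style works on its own; just don't mix them
msg = 'ExtensionMessages not found in %s' % ext_msg
msg = 'ExtensionMessages not found in {}'.format(ext_msg)
```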
* twentyafterfour is getting better at python [20:31:16] (03PS1) 1020after4: use %s not {} for string templating [tools/scap] - 10https://gerrit.wikimedia.org/r/230914 [20:32:15] (03CR) 1020after4: [C: 032] use %s not {} for string templating [tools/scap] - 10https://gerrit.wikimedia.org/r/230914 (owner: 1020after4) [20:34:35] (03Merged) 10jenkins-bot: use %s not {} for string templating [tools/scap] - 10https://gerrit.wikimedia.org/r/230914 (owner: 1020after4) [20:42:05] queste diavolerie moderne ("these modern devilries") [20:59:31] (03PS1) 1020after4: fix method naming mismatch [tools/scap] - 10https://gerrit.wikimedia.org/r/230917 [21:00:32] (03CR) 1020after4: [C: 032] fix method naming mismatch [tools/scap] - 10https://gerrit.wikimedia.org/r/230917 (owner: 1020after4) [21:00:53] (03Merged) 10jenkins-bot: fix method naming mismatch [tools/scap] - 10https://gerrit.wikimedia.org/r/230917 (owner: 1020after4) [21:02:30] !log deployed scap fixes for my dumb mistakes [21:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:03:09] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf18 [21:03:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:06:32] ori: well done, 2 out of 3 "not me!" ain't bad! [21:06:44] (sorry, was afk for longer than I planned) [21:07:10] greg-g: 1.26wmf18 is {{done}} [21:07:18] (and I updated the wiki) [21:07:21] 6operations, 3Discovery-Maps-Sprint: Postgres replication is not working - https://phabricator.wikimedia.org/T108545#1529194 (10Yurik) 5Open>3Resolved a:3Yurik Awesome, works, thanks! [21:09:03] twentyafterfour: word [21:24:14] (03PS1) 10Hoo man: Add ssh key for new notebook [puppet] - 10https://gerrit.wikimedia.org/r/230920 [21:26:31] 7Blocked-on-Operations, 10Beta-Cluster, 6Collaboration-Team-Backlog, 5Patch-For-Review: Decide what to do with ee_prototypewiki in beta - https://phabricator.wikimedia.org/T107397#1529284 (10Krenair) [21:28:35] (03CR) 10Lucie Kaffee: [C: 031] "This is really Marius, I am sitting next to him. Absolutely legit." [puppet] - 10https://gerrit.wikimedia.org/r/230920 (owner: 10Hoo man) [21:32:04] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1529324 (10Legoktm) >>! In T108649#1526679, @BBlack wrote: > Probably the most important question (since I haven't really looked at UrlShortener) is... [21:33:47] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1529366 (10Legoktm) [21:42:09] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1529450 (10ori) >>! In T108649#1526679, @BBlack wrote: > 1) We could add another SAN onto our unified list for w.wiki > 2) We could simply get a new... [21:42:23] PROBLEM - puppet last run on mw1018 is CRITICAL Puppet has 1 failures [21:46:11] (03PS1) 10BryanDavis: logstash: normalize "level" fields across log types [puppet] - 10https://gerrit.wikimedia.org/r/230922 [21:51:26] (03PS2) 10BryanDavis: logstash: normalize "level" fields across log types [puppet] - 10https://gerrit.wikimedia.org/r/230922 [22:04:18] (03CR) 10BryanDavis: [C: 04-1] "Cherry-picked to beta cluster. Case normalization working but gelf level changes not applied."
[puppet] - 10https://gerrit.wikimedia.org/r/230922 (owner: 10BryanDavis) [22:05:46] 6operations, 7network: smokeping loss of ping for codfw rows - https://phabricator.wikimedia.org/T108715#1529617 (10faidon) 5Open>3Resolved a:3faidon Yup, I set up an overly aggressive security policy that blocked pings among other things. Thanks for noticing! Should be fixed now. [22:06:33] (03PS1) 10EBernhardson: Introduce new labs role for vagrant+lxc [puppet] - 10https://gerrit.wikimedia.org/r/230928 [22:07:14] RECOVERY - puppet last run on mw1018 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [22:16:58] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1529683 (10faidon) >>! In T100519#1528677, @BBlack wrote: >>>! In T100519#1528619, @mmodell wrote: >> summary of my goals, roughly in... [22:22:30] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1529705 (10BBlack) >>! In T108649#1529324, @Legoktm wrote: > SAN = https://en.wikipedia.org/wiki/SubjectAltName ? I don't understand the details and... [22:24:32] (03PS1) 10Ori.livneh: Introduce ConfigurationObserver class [debs/pybal] - 10https://gerrit.wikimedia.org/r/230931 [22:25:52] paravoid: ^ (when you're done with the Other Thing.) [22:26:34] PROBLEM - mysqld processes on db1042 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:26:36] (03PS3) 10BryanDavis: logstash: normalize "level" fields across log types [puppet] - 10https://gerrit.wikimedia.org/r/230922 [22:29:14] PROBLEM - DPKG on db1042 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [22:29:46] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1529708 (10BBlack) Basically, yeah. I ran down a similar plan with @Chasemp and I think he's working on some patches for it. Howeve... [22:33:03] (03CR) 10BryanDavis: [C: 04-1] "Updated default mapping in beta cluster and prod. Will retry tomorrow with new mapping in place." [puppet] - 10https://gerrit.wikimedia.org/r/230922 (owner: 10BryanDavis) [22:33:25] RECOVERY - DPKG on db1042 is OK: All packages OK [22:36:47] 6operations, 10Traffic, 10Wikimedia-General-or-Unknown, 7HTTPS: Set up "w.wiki" domain for usage with UrlShortener - https://phabricator.wikimedia.org/T108649#1529720 (10BBlack) In the interest of full disclosure of options, there's a middle-ground option where we get a simple separate cert for w.wiki, and... [22:49:13] RECOVERY - mysqld processes on db1042 is OK: PROCS OK: 1 process with command name mysqld [22:57:44] is there a way to see disk io load in ganglia? https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=maps+Cluster+codfw&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=2&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [22:59:16] yurik: ganglia, no, graphite, yes [22:59:17] http://graphite.wikimedia.org/render/?width=586&height=308&_salt=1439333942.041&target=servers.maps-*.iostat.sda.io [22:59:57] there are lots of metrics available, see servers/maps-test2001/iostat hierarchy in graphite.wikimedia.org [23:00:04] RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150811T2300). Please do the needful. 
[23:00:04] Krenair matt_flaschen: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:36] Present [23:00:48] ok [23:01:12] ori, thx, is there a view there to combine all maps servers? [23:01:16] (03CR) 10Alex Monk: [C: 032] Set wgNamespaceRobotPolicies on itwiki's NS_USER to noindex,follow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230804 (https://phabricator.wikimedia.org/T107992) (owner: 10Alex Monk) [23:01:32] neither here nor there, but FWIW the tidy extension we're currently using uses HNI and is pretty clean and small https://github.com/wikimedia/mediawiki-php-tidy/blob/hni/ext_tidy.cpp [23:01:49] mischan [23:01:51] (03Merged) 10jenkins-bot: Set wgNamespaceRobotPolicies on itwiki's NS_USER to noindex,follow [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230804 (https://phabricator.wikimedia.org/T107992) (owner: 10Alex Monk) [23:02:02] "Adsum" ("present") [23:02:06] (03CR) 10BryanDavis: Introduce new labs role for vagrant+lxc (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/230928 (owner: 10EBernhardson) [23:02:33] Coren, gwicke, twentyafterfour: Anyone deploying? [23:02:56] yurik: if you find a metric that is useful, click on graph data, then edit, and replace e.g. "servers.maps-test2001.iostat.sda.io" with "servers.maps-test*.iostat.sda.io" [23:03:04] graphite supports wildcards that way [23:03:34] ty [23:04:04] yurik: once you identify the metrics you find useful, it's easy to create a persistent view of these metrics in the form of a dashboard in graphite [23:04:11] *a dashboard in grafana [23:05:15] no? [23:05:16] ok [23:05:37] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/230804/ (duration: 00m 12s) [23:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:05:56] (03CR) 10Alex Monk: [C: 032] Allow ptwiki bureaucrats to remove sysop+bureaucrat rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230805 (https://phabricator.wikimedia.org/T107661) (owner: 10Alex Monk) [23:06:21] (03Merged) 10jenkins-bot: Allow ptwiki bureaucrats to remove sysop+bureaucrat rights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230805 (https://phabricator.wikimedia.org/T107661) (owner: 10Alex Monk) [23:08:25] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/230805/ (duration: 00m 12s) [23:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:09:58] matt_flaschen, want to do your one? [23:10:01] or do you want me to? [23:10:09] Krenair, sure, I can. [23:10:34] You know this code far better than I do :) [23:11:09] Krenair, it's just a submodule bump. I will start the script tonight, but not immediately. [23:12:28] Would someone in ops be able to do https://gerrit.wikimedia.org/r/#/c/230854/ ?
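The wildcard targets ori describes also work against Graphite's render API directly with format=json, which is handy for quick ad-hoc checks outside the browser. A minimal sketch, assuming the render endpoint is reachable from wherever this runs and that the maps-test* iostat metric path from the discussion still exists:

```python
import json
import urllib.request

# One series comes back per host matched by the wildcard, e.g.
# servers.maps-test2001.iostat.sda.io, servers.maps-test2002.iostat.sda.io, ...
url = ('http://graphite.wikimedia.org/render/'
       '?target=servers.maps-test*.iostat.sda.io'
       '&from=-1h&format=json')

with urllib.request.urlopen(url) as resp:
    series = json.loads(resp.read().decode('utf-8'))

for s in series:
    # each datapoint is a [value, timestamp] pair; value is None for gaps
    values = [v for v, _ in s['datapoints'] if v is not None]
    avg = sum(values) / len(values) if values else 0.0
    print('%-50s avg io: %.2f' % (s['target'], avg))
```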
YuviPanda, ^ [23:19:46] !log mattflaschen@tin Synchronized php-1.26wmf18/extensions/Flow/: Sync Flow 1.26wmf18 for memory leaks (duration: 00m 14s) [23:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:22:34] PROBLEM - OCG health on ocg1001 is CRITICAL ocg_job_status 555290 msg: ocg_render_job_queue 3120 msg (=3000 critical) [23:22:34] PROBLEM - OCG health on ocg1003 is CRITICAL ocg_job_status 555301 msg: ocg_render_job_queue 3132 msg (=3000 critical) [23:23:44] PROBLEM - OCG health on ocg1002 is CRITICAL ocg_job_status 557785 msg: ocg_render_job_queue 4758 msg (=3000 critical) [23:24:41] we ok? ^ [23:25:16] ocg issues = PDF downloads might not work [23:25:30] not the end of the world, unless you're an info-en volunteer [23:25:45] in which case it might be the end of your inbox [23:26:38] well, we have CRITICAL warnings, we either care and respond or we don't and we don't warn [23:26:49] * greg-g is hardline sometimes [23:27:40] https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Ch%C3%A2teau+de+Louveciennes&collection_id=1e53e473afb8b2337e06e50f39ea1603ae0b032b&writer=rdf2latex is stuck for me, so it's probably broken [23:28:01] gwicke, know anything about that? ^ [23:28:07] cscott's not around this week. [23:28:41] or subbu|gardening [23:28:46] arlolra is not in this channel [23:29:06] dammit [23:29:12] and just as I switched to -parsoid to ask he quit [23:29:30] Krenair, SWAT is done, right? [23:29:41] I guess so [23:29:51] I was hoping to get rid of ee-prototype [23:30:03] But that can't really happen properly without ops [23:30:29] ping apergos [23:31:01] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=PDF+servers+eqiad&m=cpu_report&s=descending&mc=2&g=cpu_report [23:31:01] Krenair: IIRC there was some work going on to decommission some of the OCG servers [23:31:11] because beta sites stuff is all in operations/puppet :( [23:31:30] since there doesn't seem to be anyone around with OCG-specific expertise, here's what I suggest: [23:31:50] - high CPU probably means they're all stuck on some job that is causing pathological performance [23:32:03] - clearing the queue sucks but better than having the service be totally down [23:32:10] - restarting the service plausibly clears the queue [23:32:13] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [23:32:16] .:. we should restart the service [23:32:21] no, the queue is in redis [23:32:27] and hooked up in intricate ways [23:32:35] hrm [23:32:42] (+1 to ".:.") [23:32:44] clearing the redis instances used completely might help [23:32:56] did someone just press something on an ocg box? [23:33:06] looking at the logs could help too [23:33:19] I'd propose doing that first [23:33:44] -rw-r--r-- 1 syslog adm 0 Aug 10 06:25 ocg.log [23:33:44] -rw-r--r-- 1 syslog adm 109444181 Aug 10 06:25 ocg.log-20150810.gz [23:33:53] and, indeed, there are a lot of logs [23:34:18] my example url above just started working [23:34:48] there are "Bundle completed successfully!"
messages in the logs [23:35:06] yeah [23:35:10] ('ocg' in logstash) [23:35:16] there is quite a lot of backlog, but it is making progress [23:35:16] sometimes I get "Progress: 0.00% Status: Waiting for job runner to pick up render job" [23:35:17] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=PDF+servers+eqiad&h=&tab=m&vn=&hide-hf=false&m=ocg_job_queue&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=descending [23:35:22] a couple of times it has just worked [23:35:24] so i suggest we leave it [23:35:26] and let it recover [23:35:31] http://ganglia.wikimedia.org/latest/stacked.php?m=ocg_job_queue&c=PDF%20servers%20eqiad&r=hour&st=1439336084&host_regex= [23:37:19] I still regret that I didn't manage to convince Matt to go for a simple stateless design [23:39:01] https://wikitech.wikimedia.org/wiki/OCG#Pruning_the_queue [23:39:44] worth trying, imho ^^ [23:40:58] it has nearly recovered [23:41:37] yeah, dropping quickly now [23:42:14] RECOVERY - OCG health on ocg1002 is OK ocg_job_status 565794 msg: ocg_render_job_queue 96 msg [23:42:56] (03PS1) 10Dzahn: admin: mailman-admins on fermium, not just users [puppet] - 10https://gerrit.wikimedia.org/r/230946 (https://phabricator.wikimedia.org/T108349) [23:43:04] RECOVERY - OCG health on ocg1001 is OK ocg_job_status 565972 msg: ocg_render_job_queue 0 msg [23:43:04] RECOVERY - OCG health on ocg1003 is OK ocg_job_status 565972 msg: ocg_render_job_queue 0 msg [23:44:22] greg-g: ^ [23:45:20] ori: yay [23:45:35] (03CR) 10Dzahn: [C: 032] admin: mailman-admins on fermium, not just users [puppet] - 10https://gerrit.wikimedia.org/r/230946 (https://phabricator.wikimedia.org/T108349) (owner: 10Dzahn) [23:47:23] 10Ops-Access-Reviews, 5Patch-For-Review: John Lewis sudo as 'list' on mailman staging VM - https://phabricator.wikimedia.org/T108349#1529960 (10Dzahn) [fermium:/etc/sudoers.d] $ id johnflewis uid=2744(johnflewis) gid=500(wikidev) groups=500(wikidev),756(mailman-users),757(mailman-admins) [fermium:/etc/sudoers... [23:47:57] 10Ops-Access-Reviews, 5Patch-For-Review: John Lewis sudo as 'list' on mailman staging VM - https://phabricator.wikimedia.org/T108349#1529961 (10Dzahn) 5Open>3Resolved [23:48:32] 10Ops-Access-Reviews: John Lewis sudo as 'list' on mailman staging VM - https://phabricator.wikimedia.org/T108349#1518752 (10Dzahn) [23:48:45] 10Ops-Access-Requests, 10Ops-Access-Reviews, 6operations: John Lewis sudo as 'list' on mailman staging VM - https://phabricator.wikimedia.org/T108349#1529968 (10Dzahn) [23:52:13] (03PS3) 10Dzahn: OTRS: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230709 [23:53:22] (03CR) 10Dzahn: [C: 032] OTRS: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230709 (owner: 10Dzahn) [23:55:44] (03CR) 10Dzahn: "[iodine:/etc/apache2/sites-available] $ sudo apache2ctl configtest" [puppet] - 10https://gerrit.wikimedia.org/r/230709 (owner: 10Dzahn) [23:59:03] (03PS3) 10Dzahn: dbtree: make compatible with Apache 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/230693