[00:05:15] <icinga-wm>	 RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[00:12:05] <icinga-wm>	 RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[00:14:05] <icinga-wm>	 PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:17:25] <icinga-wm>	 RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[00:42:05] <icinga-wm>	 RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[01:30:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 83022.462849 Seconds
[01:30:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 83336.743009 Seconds
[01:30:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 83032.225885 Seconds
[01:32:05] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 83134.498687 Seconds
[01:32:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 83457.481693 Seconds
[01:32:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 83463.405082 Seconds
[01:35:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[01:39:05] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 83554.484763 Seconds
[01:44:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[01:47:05] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 84034.577108 Seconds
[01:50:25] <icinga-wm>	 PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:53:16] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 49.749438 Seconds
[01:53:16] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 49.769887 Seconds
[01:53:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 4.448113 Seconds
[01:53:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 55.061666 Seconds
[01:54:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 46.48539 Seconds
[01:54:15] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 54.699507 Seconds
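The maps replicas above flip between delays of tens of thousands of seconds and exactly 0.0. The log does not show how the check computes "Rep Delay"; one common PostgreSQL pattern that would produce this shape reports 0 whenever the replica has replayed everything it has received, and otherwise reports the time since the last replayed commit. A minimal Python sketch of that assumed logic (the function and parameter names are hypothetical):

```python
from datetime import datetime


def replication_delay_seconds(received_lsn: int, replayed_lsn: int,
                              now: datetime, last_replay_commit: datetime) -> float:
    """Hypothetical lag metric modelled on a common PostgreSQL check pattern.

    received_lsn / replayed_lsn: WAL positions (as integers) that the replica
    has received and replayed. last_replay_commit: commit time of the last
    transaction the replica replayed.
    """
    if received_lsn == replayed_lsn:
        # Everything received has been replayed: report zero, even if the
        # last replayed commit is hours old.
        return 0.0
    # WAL is still pending: report wall-clock time since the last replayed
    # commit, which is large if that commit happened long ago.
    return (now - last_replay_commit).total_seconds()
```

Under this assumption, a replica whose last replayed commit is about 23 hours old reads roughly 83,000 seconds while WAL is pending and 0.0 as soon as replay catches up, which is consistent with the alternation on maps1004.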
[02:11:05] <icinga-wm>	 PROBLEM - Disk space on labtestcontrol2001 is CRITICAL: DISK CRITICAL - free space: / 347 MB (3% inode=69%)
[02:18:25] <icinga-wm>	 RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[02:24:18] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.18) (duration: 09m 39s)
[02:24:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:50:16] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 2134.795114 Seconds
[02:50:16] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 2134.800085 Seconds
[02:51:16] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 0.0 Seconds
[02:51:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 2083.863499 Seconds
[02:52:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[02:54:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 2374.8852 Seconds
[02:56:35] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 0.0 Seconds
[02:57:05] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 2426.033831 Seconds
[02:57:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 0.0 Seconds
[02:58:26] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 2619.697084 Seconds
[02:59:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[02:59:35] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 2691.360625 Seconds
[03:00:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 2623.749258 Seconds
[03:00:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 0.0 Seconds
[03:00:35] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 0.0 Seconds
[03:01:15] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 2674.096896 Seconds
[03:01:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[03:02:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 2859.822896 Seconds
[03:03:15] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[03:03:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 2919.656697 Seconds
[03:03:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 0.0 Seconds
[03:03:35] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 2931.275564 Seconds
[03:04:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 2863.770715 Seconds
[03:04:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 0.0 Seconds
[03:05:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[03:05:35] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 0.0 Seconds
[03:10:35] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 3351.445726 Seconds
[03:11:25] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 3399.89602 Seconds
[03:11:35] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 0.0 Seconds
[03:12:25] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 0.0 Seconds
[03:58:21] <wikibugs>	 (03CR) 10Yuvipanda: [C: 032] "This was built and deployed" [debs/kubernetes] - 10https://gerrit.wikimedia.org/r/345909 (owner: 10Yuvipanda)
[04:03:15] <icinga-wm>	 PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:12:55] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=4963.20 Read Requests/Sec=5542.10 Write Requests/Sec=19.80 KBytes Read/Sec=22240.80 KBytes_Written/Sec=192.40
[04:13:25] <icinga-wm>	 PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:17:55] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.50 Read Requests/Sec=3.90 Write Requests/Sec=53.70 KBytes Read/Sec=18.40 KBytes_Written/Sec=348.00
[04:29:15] <icinga-wm>	 PROBLEM - puppet last run on elastic1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:30:15] <icinga-wm>	 RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[04:38:45] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:41:25] <icinga-wm>	 RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[04:43:45] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 18 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:57:15] <icinga-wm>	 RECOVERY - puppet last run on elastic1050 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
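The puppet "Catalog fetch fail" alerts in this log tend to clear a fixed interval after they fire. A small parser that pairs PROBLEM and RECOVERY lines per (service, host) makes that pattern easy to check. This is a sketch: the line format is taken from the icinga-wm messages here, and the regex is an assumption that only covers lines of the form "<service> on <host> is ...".

```python
import re
from datetime import datetime

# Matches icinga-wm lines as they appear in this log, e.g.
# "[00:14:05] <icinga-wm>\t PROBLEM - puppet last run on analytics1033 is CRITICAL: ..."
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d{2}:\d{2}:\d{2})\] <icinga-wm>\s+"
    r"(?P<kind>PROBLEM|RECOVERY) - (?P<service>.+?) on (?P<host>\S+) is "
)


def outage_durations(lines):
    """Pair each PROBLEM with the next RECOVERY for the same (service, host).

    Returns (service, host, seconds) tuples. The log timestamps carry no
    date, so this assumes each pair falls on the same day. RECOVERY lines
    without an earlier PROBLEM (e.g. alerts that started before the log)
    are skipped.
    """
    open_since = {}
    durations = []
    for line in lines:
        m = ALERT_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S")
        key = (m.group("service"), m.group("host"))
        if m.group("kind") == "PROBLEM":
            # A repeated PROBLEM does not reset the start time.
            open_since.setdefault(key, ts)
        elif key in open_since:
            start = open_since.pop(key)
            durations.append((key[0], key[1], (ts - start).total_seconds()))
    return durations
```

Run over the whole log, every catalog-fetch failure that recovers within it (analytics1033 through restbase-dev1001) clears after 27 to 29 minutes.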
[05:27:15] <icinga-wm>	 PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:49:15] <wikibugs_>	 (03CR) 10Giuseppe Lavagetto: [C: 032] Fix typo in discovery name [switchdc] - 10https://gerrit.wikimedia.org/r/345868 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans)
[05:50:59] <wikibugs_>	 (03PS2) 10Giuseppe Lavagetto: swift: use discovery url for thumb server [puppet] - 10https://gerrit.wikimedia.org/r/345804
[05:53:55] <_joe_>	 !log powercycling mw2256, unresponsive to ping, blank console
[05:54:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:15] <icinga-wm>	 RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[05:55:49] <wikibugs>	 (03PS5) 10Tim Starling: Deploy ParserMigration extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/344276 (https://phabricator.wikimedia.org/T141586)
[05:55:55] <icinga-wm>	 RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[05:58:05] <icinga-wm>	 PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:59:50] <marostegui>	 !log Resume pt-table-checksum on wikidata - T161294
[05:59:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:56] <stashbot>	 T161294: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294
[06:12:23] <marostegui>	 !log Remove partitions from metawiki.pagelinks (s7) on codfw master (db2029) this will generate lag on codfw - T153300
[06:12:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:30] <stashbot>	 T153300: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300
[06:12:33] <wikibugs>	 06Operations: etcd cluster in codfw has raft consensus issues - https://phabricator.wikimedia.org/T162013#3149754 (10Joe)
[06:12:46] <wikibugs_>	 06Operations, 15User-Joe: etcd cluster in codfw has raft consensus issues - https://phabricator.wikimedia.org/T162013#3149766 (10Joe) p:05Triage>03Low
[06:18:25] <icinga-wm>	 RECOVERY - etcdmirror-conftool-eqiad-wmnet service on conf2002 is OK: OK - etcdmirror-conftool-eqiad-wmnet is active
[06:18:33] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] db-codfw,db-eqiad.php: Remove db1057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345856 (https://phabricator.wikimedia.org/T160435) (owner: 10Marostegui)
[06:19:46] <wikibugs>	 (03Merged) 10jenkins-bot: db-codfw,db-eqiad.php: Remove db1057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345856 (https://phabricator.wikimedia.org/T160435) (owner: 10Marostegui)
[06:19:56] <wikibugs>	 (03CR) 10jenkins-bot: db-codfw,db-eqiad.php: Remove db1057 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345856 (https://phabricator.wikimedia.org/T160435) (owner: 10Marostegui)
[06:20:24] <wikibugs_>	 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 13Patch-For-Review, and 6 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#3149770 (10Joe) Status as of now:  - DNS based discovery is live and functioning for most things,...
[06:21:25] <icinga-wm>	 PROBLEM - etcdmirror-conftool-eqiad-wmnet service on conf2002 is CRITICAL: CRITICAL - Expecting active but unit etcdmirror-conftool-eqiad-wmnet is failed
[06:21:32] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1057 entry - T160435 (duration: 00m 54s)
[06:21:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:39] <stashbot>	 T160435: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435
[06:23:25] <wikibugs>	 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#3149779 (10Joe)
[06:25:01] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1057 entry - T160435 (duration: 00m 44s)
[06:25:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:45] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[06:27:05] <icinga-wm>	 RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[06:30:45] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 18 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
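The IPv6 ping check for ulsfo flaps twice between 20 and 18 failed probes out of 279, on either side of the "alerts on 19" threshold. A minimal sketch of that threshold rule, assuming a count at or above the threshold is critical (the function name is hypothetical; only the message format is taken from the log):

```python
def atlas_ping_status(failed: int, total: int, alert_on: int) -> str:
    """Format an Icinga-style status line for a RIPE Atlas ping check.

    Assumption: "alerts on N" means N or more failed probes is CRITICAL.
    The log only shows 20 failures (CRITICAL) and 18 (OK), so whether
    exactly 19 failures alerts is not confirmed by it.
    """
    state = "CRITICAL" if failed >= alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"
```

With the threshold at roughly 6.8% of the 279 probes, a swing of two probes is enough to flip the state, which fits the short PROBLEM/RECOVERY pairs seen here.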
[06:33:57] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346102
[06:36:15] <icinga-wm>	 RECOVERY - Check systemd state on conf2002 is OK: OK - running: The system is fully operational
[06:36:25] <icinga-wm>	 RECOVERY - etcdmirror-conftool-eqiad-wmnet service on conf2002 is OK: OK - etcdmirror-conftool-eqiad-wmnet is active
[06:36:56] <icinga-wm>	 RECOVERY - Etcd replication lag on conf2002 is OK: HTTP OK: HTTP/1.1 200 OK - 149 bytes in 0.074 second response time
[06:38:22] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346102 (owner: 10Marostegui)
[06:39:01] <wikibugs>	 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 13Patch-For-Review, and 6 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#3149782 (10Joe)
[06:39:04] <wikibugs_>	 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 13Patch-For-Review, and 6 others: DNS: dynamically generate entries for service discovery - https://phabricator.wikimedia.org/T156100#3149781 (10Joe) 05Open>03Resolved
[06:39:05] <icinga-wm>	 RECOVERY - Disk space on labtestcontrol2001 is OK: DISK OK
[06:39:35] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346102 (owner: 10Marostegui)
[06:39:44] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346102 (owner: 10Marostegui)
[06:40:54] <_joe_>	 !log manually restarted replication for etcd
[06:41:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:21] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1070 to compress it - T153743 (duration: 00m 44s)
[06:41:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:27] <stashbot>	 T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743
[06:42:55] <wikibugs>	 06Operations, 07RfC, 06Services (watching), 15User-Joe, and 2 others: [RFC] Define the on-disk and live structure of etcd pool data - https://phabricator.wikimedia.org/T100793#3149786 (10Joe) 05Open>03declined
[06:43:34] <wikibugs_>	 06Operations, 07RfC, 06Services (watching), 15User-Joe, and 2 others: [RFC] Define the on-disk and live structure of etcd pool data - https://phabricator.wikimedia.org/T100793#1320581 (10Joe) This has been practically superseded by so many specific tickets it doesn't really make much sense anymore.
[06:51:35] <icinga-wm>	 RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:51:54] <marostegui>	 !log Deploy InnoDB compression on dewiki - db1070 - T150438
[06:52:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:01] <stashbot>	 T150438: Meta ticket: Deploy InnoDB compression where possible - https://phabricator.wikimedia.org/T150438
[07:00:55] <icinga-wm>	 PROBLEM - Check systemd state on copper is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:03:16] <moritzm>	 !log installing gnutls security updates on trusty
[07:03:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:12] <_joe_>	 !log removing stale files on copper for docker, all local images will be wiped away
[07:06:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:24] <wikibugs>	 (03PS2) 10Muehlenhoff: Change email address for Yuvi [puppet] - 10https://gerrit.wikimedia.org/r/344133
[07:08:53] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Change email address for Yuvi [puppet] - 10https://gerrit.wikimedia.org/r/344133 (owner: 10Muehlenhoff)
[07:09:35] <icinga-wm>	 PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 25 seconds ago with 1 failures. Failed resources (up to 3 shown): Service[docker]
[07:09:43] <wikibugs>	 (03PS2) 10Muehlenhoff: Install jessie systems with Linux 4.9 by default [puppet] - 10https://gerrit.wikimedia.org/r/345314 (https://phabricator.wikimedia.org/T154934)
[07:14:43] <moritzm>	 !log switched default kernel for jessie installations to Linux 4.9
[07:14:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:55] <icinga-wm>	 RECOVERY - Check systemd state on copper is OK: OK - running: The system is fully operational
[07:25:12] <_joe_>	 !log rebooting copper to clean up at least partially the docker mess
[07:25:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:51] <marostegui>	 !log Deploy alter table dbstore2001 (s7) on revision table to unify PK and indexes - T160390
[07:25:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:57] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[07:27:35] <icinga-wm>	 RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:32:15] <icinga-wm>	 PROBLEM - Apache HTTP on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:32:15] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:32:16] <icinga-wm>	 PROBLEM - HHVM rendering on mw1261 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:33:25] <icinga-wm>	 PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:35:17] <wikibugs>	 (03CR) 10Marostegui: [C: 031] "I would test this manually first for some days with manual tables, before letting it run by itself" [puppet] - 10https://gerrit.wikimedia.org/r/345646 (https://phabricator.wikimedia.org/T124307) (owner: 10Ottomata)
[07:39:04] <logmsgbot>	 !log elukey@puppetmaster1001 conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
[07:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:16] <elukey>	 moritzm: --^
[07:39:51] <moritzm>	 elukey: please don't restart yet
[07:40:05] <elukey>	 sure sure, hhvm-dump-debug in /tmp/hhvm.60283.bt.
[07:40:07] <moritzm>	 I repooled it to let it crash to collect more information
[07:40:09] <moritzm>	 thanks :-)
[07:40:14] <elukey>	 ahhh sorry!
[07:40:23] <moritzm>	 did you restart?
[07:40:27] <elukey>	 nono
[07:40:34] <moritzm>	 ok, great
[07:40:48] <elukey>	 I just wanted to remove traffic since I didn't see you online
[07:41:04] <_joe_>	 elukey: traffic is removed from pybal already
[07:41:54] <elukey>	 _joe_ sure but there is the case that the host might show intermittent failures, just wanted to be sure :)
[07:44:13] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: certspotter: Silence the cronspam [puppet] - 10https://gerrit.wikimedia.org/r/346103
[07:53:32] <wikibugs>	 (03Abandoned) 10Giuseppe Lavagetto: salt: add conftool module [puppet] - 10https://gerrit.wikimedia.org/r/295202 (owner: 10Giuseppe Lavagetto)
[08:02:25] <icinga-wm>	 RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[08:15:04] <wikibugs_>	 (03PS1) 10Gehel: maps - collect OSM sync lag to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/346106
[08:23:05] <icinga-wm>	 PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:25:49] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346108 (https://phabricator.wikimedia.org/T160390)
[08:29:38] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346108 (https://phabricator.wikimedia.org/T160390) (owner: 10Marostegui)
[08:30:16] <wikibugs_>	 (03PS2) 10Gehel: maps - collect OSM sync lag to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/346106 (https://phabricator.wikimedia.org/T160011)
[08:30:40] <wikibugs_>	 (03PS1) 10Volans: Puppet: do not deactivate hosts in PuppetDB automatically [puppet] - 10https://gerrit.wikimedia.org/r/346110 (https://phabricator.wikimedia.org/T159163)
[08:34:15] <icinga-wm>	 RECOVERY - Apache HTTP on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.672 second response time
[08:34:15] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.673 second response time
[08:34:15] <icinga-wm>	 RECOVERY - HHVM rendering on mw1261 is OK: HTTP OK: HTTP/1.1 200 OK - 77987 bytes in 0.925 second response time
[08:34:36] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346108 (https://phabricator.wikimedia.org/T160390) (owner: 10Marostegui)
[08:34:48] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346108 (https://phabricator.wikimedia.org/T160390) (owner: 10Marostegui)
[08:38:25] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1086 - T160390 (duration: 00m 44s)
[08:38:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:31] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[08:39:16] <wikibugs_>	 (03PS3) 10Alexandros Kosiaris: maps - collect OSM sync lag to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/346106 (https://phabricator.wikimedia.org/T160011) (owner: 10Gehel)
[08:39:25] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] maps - collect OSM sync lag to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/346106 (https://phabricator.wikimedia.org/T160011) (owner: 10Gehel)
[08:39:28] <wikibugs_>	 (03CR) 10Volans: "Puppet compiler results:" [puppet] - 10https://gerrit.wikimedia.org/r/346110 (https://phabricator.wikimedia.org/T159163) (owner: 10Volans)
[08:40:37] <wikibugs>	 06Operations, 07Puppet, 13Patch-For-Review: PuppetDB is auto-deactivating hosts - https://phabricator.wikimedia.org/T159163#3150000 (10Volans) @Joe @akosiaris, actually looks like this is a NOOP on the puppetmasters, but a change on just the puppetdb hosts:  ``` $ sudo cumin --dry-run 'R:class = puppetmaster...
[08:41:14] <wikibugs>	 (03CR) 10Gehel: [C: 032] maps - collect OSM sync lag to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/346106 (https://phabricator.wikimedia.org/T160011) (owner: 10Gehel)
[08:42:26] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 031] "LGTM but keep in mind this will cause IRC alert spam" [puppet] - 10https://gerrit.wikimedia.org/r/346110 (https://phabricator.wikimedia.org/T159163) (owner: 10Volans)
[08:42:48] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: certspotter: Silence the cronspam [puppet] - 10https://gerrit.wikimedia.org/r/346103
[08:42:54] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] certspotter: Silence the cronspam [puppet] - 10https://gerrit.wikimedia.org/r/346103 (owner: 10Alexandros Kosiaris)
[08:43:35] <wikibugs>	 06Operations, 07Puppet, 13Patch-For-Review: PuppetDB is auto-deactivating hosts - https://phabricator.wikimedia.org/T159163#3150002 (10Volans) Ops, I read the previous message as it required a restart of puppetmasters, not puppetdb, sorry for the misunderstanding.
[08:45:47] <ema>	 akosiaris: there's T159137 open FYI
[08:45:48] <stashbot>	 T159137: certspotter: Error retrieving STH from log - https://phabricator.wikimedia.org/T159137
[08:46:26] <marostegui>	 !log Deploy alter table db1086 (s7) on revision table to unify PK and indexes - T160390
[08:46:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:34] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[08:49:08] <wikibugs_>	 (03PS1) 10Elukey: Fix Redis Hiera configuration for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/346112
[08:49:36] <elukey>	 hashar: --^
[08:49:56] <elukey>	 do you prefer that I only fix the inconsistency or that https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep will be copied over?
[08:51:05] <icinga-wm>	 RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[08:53:01] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 031] "this was a brainfart on my side the other day." [puppet] - 10https://gerrit.wikimedia.org/r/346112 (owner: 10Elukey)
[08:53:24] <ema>	 akosiaris: mmh, it looks like https://gerrit.wikimedia.org/r/346103 would also silence valid certspotter cron emails?
[08:55:47] <wikibugs>	 (03CR) 10Elukey: [C: 032] Fix Redis Hiera configuration for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/346112 (owner: 10Elukey)
[08:57:25] <akosiaris>	 ema: hmm
[08:57:40] <akosiaris>	 maybe I should revert indeed
[08:57:47] <_joe_>	 yeah that was my doubt as well
[08:57:54] <_joe_>	 I wanted to suggest to use a log file
[08:57:56] <akosiaris>	 looks like the point of that cron IS to e-mail out
[08:58:07] <ema>	 akosiaris: it is :)
[08:58:10] <akosiaris>	 yeah reverting, need to rethink
[08:58:17] <akosiaris>	 thanks for pointing out
[08:58:46] <ema>	 sure
[08:59:47] <wikibugs_>	 06Operations, 06Analytics-Kanban, 06WMDE-Analytics-Engineering, 13Patch-For-Review, 15User-Addshore: /a/mw-log/archive/api on stat1002 no longer being populated - https://phabricator.wikimedia.org/T160888#3113734 (10Addshore) >>! In T160888#3140077, @elukey wrote: > @Addshore: I am going to close this ta...
[09:01:15] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Revert "certspotter: Silence the cronspam" [puppet] - 10https://gerrit.wikimedia.org/r/346116
[09:03:36] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] Revert "certspotter: Silence the cronspam" [puppet] - 10https://gerrit.wikimedia.org/r/346116 (owner: 10Alexandros Kosiaris)
[09:03:41] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: Revert "certspotter: Silence the cronspam" [puppet] - 10https://gerrit.wikimedia.org/r/346116
[09:03:43] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "certspotter: Silence the cronspam" [puppet] - 10https://gerrit.wikimedia.org/r/346116 (owner: 10Alexandros Kosiaris)
[09:03:47] <wikibugs_>	 (03Abandoned) 10Giuseppe Lavagetto: swift: use discovery url for thumb server [puppet] - 10https://gerrit.wikimedia.org/r/345804 (owner: 10Giuseppe Lavagetto)
[09:07:05] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 875.72 seconds
[09:07:44] <marostegui>	 ^ checking
[09:09:04] <marostegui>	 create table select….Queried about 650140000
[09:09:08] <marostegui>	 from research
[09:22:26] <_joe_>	 marostegui: ahah
[09:22:35] <icinga-wm>	 PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:23:01] <volans>	 marostegui: my bet is on quarterly report ;)
[09:27:16] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: base::puppet: add disable-puppet script [puppet] - 10https://gerrit.wikimedia.org/r/346118
[09:28:33] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] base::puppet: add disable-puppet script [puppet] - 10https://gerrit.wikimedia.org/r/346118 (owner: 10Giuseppe Lavagetto)
[09:30:30] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: base::puppet: add disable-puppet script [puppet] - 10https://gerrit.wikimedia.org/r/346118
[09:36:05] <icinga-wm>	 PROBLEM - puppet last run on restbase-dev1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:45:32] <jynus>	 the problem is that CREATE ... SELECT or INSERT...SELECT should only be run on transactional tables
[09:46:10] <jynus>	 the size is not that important, that server is ok to do that
[09:50:35] <icinga-wm>	 RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[10:05:06] <icinga-wm>	 RECOVERY - puppet last run on restbase-dev1001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[10:09:05] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 200.00 seconds
[10:16:56] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 031] Swift-proxy: use discovery everywhere for rewrites [puppet] - 10https://gerrit.wikimedia.org/r/345860 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans)
[10:20:54] <wikibugs>	 06Operations, 07Documentation: Improve SSH access information in onboarding documentation - https://phabricator.wikimedia.org/T160941#3150151 (10MoritzMuehlenhoff) p:05Triage>03Normal a:03MoritzMuehlenhoff I'll take care of that.
[10:27:44] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove access credentials for csteipp [puppet] - 10https://gerrit.wikimedia.org/r/346125
[10:28:05] <icinga-wm>	 PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:29:06] <wikibugs>	 06Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#3150178 (10faidon) After a few back and forths and a lot of supporting documentation, we've passed the verification step of the process and we moved on to the next step as of today:  > At this...
[10:30:53] <wikibugs>	 (03PS2) 10Volans: Swift-proxy: use discovery everywhere for rewrites [puppet] - 10https://gerrit.wikimedia.org/r/345860 (https://phabricator.wikimedia.org/T160178)
[10:38:14] <volans>	 !log upgrading swift-proxy in eqiad to use discovery URLs
[10:38:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:31] <wikibugs_>	 (03CR) 10Volans: [C: 032] Swift-proxy: use discovery everywhere for rewrites [puppet] - 10https://gerrit.wikimedia.org/r/345860 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans)
[10:41:35] <wikibugs_>	 (03PS2) 10Faidon Liambotis: ssh: update comments to remove precise mentions [puppet] - 10https://gerrit.wikimedia.org/r/345834
[10:41:38] <wikibugs>	 (03PS2) 10Faidon Liambotis: puppet: remove fail() guard for precise [puppet] - 10https://gerrit.wikimedia.org/r/345835
[10:41:39] <wikibugs_>	 (03PS3) 10Faidon Liambotis: mediawiki: kill more precise references [puppet] - 10https://gerrit.wikimedia.org/r/345546
[10:41:42] <wikibugs>	 (03PS3) 10Faidon Liambotis: hhvm: kill a precise reference [puppet] - 10https://gerrit.wikimedia.org/r/345547
[10:41:44] <wikibugs_>	 (03PS3) 10Faidon Liambotis: apache: remove precise (and Apache 2.2) support [puppet] - 10https://gerrit.wikimedia.org/r/345548
[10:41:46] <wikibugs>	 (03PS1) 10Faidon Liambotis: Remove Apache <IfVersion < 2.4> across the tree [puppet] - 10https://gerrit.wikimedia.org/r/346128
[10:42:36] <wikibugs_>	 (03CR) 10Faidon Liambotis: [V: 032 C: 032] ssh: update comments to remove precise mentions [puppet] - 10https://gerrit.wikimedia.org/r/345834 (owner: 10Faidon Liambotis)
[10:42:54] <wikibugs_>	 (03CR) 10Faidon Liambotis: [C: 032] puppet: remove fail() guard for precise [puppet] - 10https://gerrit.wikimedia.org/r/345835 (owner: 10Faidon Liambotis)
[10:43:20] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
[10:43:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:25] <wikibugs>	 06Operations: videoscalers (mw1168, mw1169) - high load / overheating - https://phabricator.wikimedia.org/T161918#3147165 (10MoritzMuehlenhoff) This ticket has (at least) two independant tasks: (1) Fine-tuning the video scaler queue and (2) applying thermal paste to the video scalers (which turned out to be effe...
[10:43:52] <wikibugs>	 06Operations, 10ops-ulsfo: decommission backup4001 - https://phabricator.wikimedia.org/T161904#3150208 (10MoritzMuehlenhoff) p:05Triage>03Normal
[10:45:26] <wikibugs_>	 06Operations: Migrate all jessie hosts to Linux 4.9 - https://phabricator.wikimedia.org/T162029#3150214 (10MoritzMuehlenhoff)
[10:50:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 031] releases: remove the precise suite [puppet] - 10https://gerrit.wikimedia.org/r/345838 (owner: 10Faidon Liambotis)
[10:50:49] <wikibugs>	 (03PS1) 10Joal: Update analytics-cluster refinery cron regularity [puppet] - 10https://gerrit.wikimedia.org/r/346129
[10:50:55] <joal>	 elukey: --^
[10:52:12] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Update analytics-cluster refinery cron regularity [puppet] - 10https://gerrit.wikimedia.org/r/346129 (owner: 10Joal)
[10:53:38] <hashar>	 jouncebot: refresh
[10:53:44] <jouncebot>	 I refreshed my knowledge about deployments.
[10:53:46] <hashar>	 jouncebot: next
[10:53:46] <jouncebot>	 In 2 hour(s) and 6 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T1300)
[10:57:05] <icinga-wm>	 RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[11:01:11] <wikibugs_>	 06Operations, 07HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3150238 (10MoritzMuehlenhoff) p:05Triage>03High a:03MoritzMuehlenhoff
[11:01:44] <wikibugs>	 06Operations, 10hardware-requests: EQIAD: (4) hardware access request for ganeti - https://phabricator.wikimedia.org/T161702#3150240 (10MoritzMuehlenhoff) p:05Triage>03Normal
[11:01:46] <wikibugs_>	 06Operations, 10hardware-requests: COFW: (2) hardware access request for ganeti - https://phabricator.wikimedia.org/T161701#3150241 (10MoritzMuehlenhoff) p:05Triage>03Normal
[11:01:53] <wikibugs>	 06Operations, 10hardware-requests: CODFW: (4) hardware access request for kubernetes - https://phabricator.wikimedia.org/T161700#3150242 (10MoritzMuehlenhoff) p:05Triage>03Normal
[11:02:06] <icinga-wm>	 PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:04:15] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
[11:04:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:50] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
[11:05:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:31] <wikibugs_>	 (03PS2) 10Joal: Update analytics-cluster refinery cron [puppet] - 10https://gerrit.wikimedia.org/r/346129
[11:06:40] <joal>	 elukey: patched --^
[11:07:40] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Update analytics-cluster refinery cron [puppet] - 10https://gerrit.wikimedia.org/r/346129 (owner: 10Joal)
[11:08:56] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
[11:09:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:12] <wikibugs_>	 06Operations, 06Performance-Team, 10Traffic: What happened 2017-03-09 04:00 - 06:00 UTC - https://phabricator.wikimedia.org/T160109#3150245 (10MoritzMuehlenhoff) 05Open>03Resolved a:03MoritzMuehlenhoff I think the immediate question as to what happened 2017-03-09 04:00 - 06:00 UTC has been resolved, so...
[11:10:51] <elukey>	 joal: jenkins is upset by your alignment of => :)
[11:11:17] <joal>	 elukey: I actually didn't do that, I used existing patch
[11:12:31] <elukey>	 joal: nono I mean that the arrow in monthday => '1' is not aligned with the rest of the cron block
[11:12:39] <elukey>	 (the other arrows)
[11:12:45] <elukey>	 so puppet lint is not happy :)
[11:13:01] <joal>	 elukey: Ahhhh !
[11:13:04] <joal>	 elukey: patching
[11:13:24] * joal is ignorant in puppet - and in linting even more
[11:13:29] <joal>	 git st
[11:13:32] <joal>	 oop
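elukey's explanation above concerns puppet-lint's arrow-alignment rule: inside one resource block, every `=>` is expected in the same column. The following is a minimal Python sketch of that idea, not puppet-lint's actual implementation. It assumes one `attribute => value` pair per line and the convention that arrows sit one space after the longest attribute name; `misaligned_arrows` and the sample cron block are hypothetical and do not reproduce the Gerrit change being discussed.

```python
import re

# Simplified stand-in for puppet-lint's arrow-alignment rule (not its real code).
# Assumes one "attribute => value" pair per line within a single resource block.
PARAM_LINE = re.compile(r"^(\s*)(\w+)(\s*)=>")


def misaligned_arrows(block: str) -> list[int]:
    """Return the 1-based line numbers whose '=>' is not in the expected column."""
    params = []  # (line number, column of '=>', minimum column the arrow needs)
    for line_no, line in enumerate(block.splitlines(), start=1):
        match = PARAM_LINE.match(line)
        if match:
            indent, name, _gap = match.groups()
            arrow_col = match.end() - 2  # index where '=>' starts
            needed_col = len(indent) + len(name) + 1  # one space after the name
            params.append((line_no, arrow_col, needed_col))
    if not params:
        return []
    # Assumed convention: every arrow sits one space after the longest name.
    expected = max(needed for _, _, needed in params)
    return [line_no for line_no, col, _ in params if col != expected]


# Illustrative block only: the monthday arrow is one column too far right.
EXAMPLE_CRON = """cron { 'example-job':
    command  => '/usr/bin/true',
    user     => 'hdfs',
    minute   => '15',
    monthday  => '1',
}"""

print(misaligned_arrows(EXAMPLE_CRON))  # → [5]
```

Removing the extra space after `monthday` would make the function return an empty list, which mirrors what fixed the lint failure in the log.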
[11:14:20] <wikibugs_>	 (03PS3) 10Joal: Update analytics-cluster refinery cron [puppet] - 10https://gerrit.wikimedia.org/r/346129
[11:16:17] <wikibugs_>	 (03CR) 10Elukey: [C: 032] Update analytics-cluster refinery cron [puppet] - 10https://gerrit.wikimedia.org/r/346129 (owner: 10Joal)
[11:18:03] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1007.eqiad.wmnet
[11:18:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:35] <joal>	 Thanks elukey 
[11:21:42] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1007.eqiad.wmnet
[11:21:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:18] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1008.eqiad.wmnet
[11:28:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:26] <logmsgbot>	 !log joal@tin Started deploy [analytics/refinery@cc73c40]: (no justification provided)
[11:28:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:46] <wikibugs>	 (03Abandoned) 10Addshore: Add wmde ldap group to grafana [puppet] - 10https://gerrit.wikimedia.org/r/333024 (https://phabricator.wikimedia.org/T161484) (owner: 10Addshore)
[11:31:05] <icinga-wm>	 RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:31:45] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1008.eqiad.wmnet
[11:31:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:49] <logmsgbot>	 !log joal@tin Finished deploy [analytics/refinery@cc73c40]: (no justification provided) (duration: 07m 23s)
[11:35:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:33] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove access credentials for csteipp [puppet] - 10https://gerrit.wikimedia.org/r/346125
[11:39:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Remove access credentials for csteipp [puppet] - 10https://gerrit.wikimedia.org/r/346125 (owner: 10Muehlenhoff)
[11:40:42] <wikibugs>	 06Operations: videoscalers (mw1168, mw1169) - high load / overheating - https://phabricator.wikimedia.org/T161918#3150330 (10elukey) Yes there is some work to do for 1), I'll take care of it in a separate code review. For this particular issue, namely the videoscalers alarming, I am not sure what fixed it, since...
[11:45:32] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
[11:45:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:35] <wikibugs>	 06Operations, 10Pybal, 10Traffic: Unhandled pybal ValueError: need more than 1 value to unpack - https://phabricator.wikimedia.org/T143078#3150356 (10ema) 05Open>03Resolved a:03ema Confirmed, upgrading twisted to 16.2.0 fixed this.
[11:49:48] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1006.eqiad.wmnet
[11:49:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:06] <wikibugs_>	 (03CR) 10Liuxinyu970226: [C: 04-1] "I'm sorry, but currently we should not handle khw here, see https://lists.wikimedia.org/pipermail/langcom/2017-April/001207.html" [puppet] - 10https://gerrit.wikimedia.org/r/343584 (https://phabricator.wikimedia.org/T160865) (owner: 10Dereckson)
[11:51:38] <wikibugs>	 (03CR) 10Liuxinyu970226: [C: 04-1] RESTBase: add kbp. and khw.wikipedia (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343584 (https://phabricator.wikimedia.org/T160865) (owner: 10Dereckson)
[11:52:51] <wikibugs_>	 06Operations, 06Office-IT, 07LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#3150359 (10MoritzMuehlenhoff) @mmodell : Disabled mail accounts should be a problem independant of disabled @wikimedia.org accounts, can you describe how Phabricator handles those?...
[11:53:26] <Dereckson>	 Hi
[11:54:52] <wikibugs>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150362 (10Urbanecm)
[11:55:13] <wikibugs_>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150376 (10Urbanecm) p:05Triage>03Unbreak! Breaking change => UBN!
[11:55:30] <Urbanecm>	 Hi all, may somebody have a look at T162035 ?
[11:55:30] <stashbot>	 T162035: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035
[11:56:09] <wikibugs>	 (03Abandoned) 10Dereckson: RESTBase: add kbp. and khw.wikipedia [puppet] - 10https://gerrit.wikimedia.org/r/343584 (https://phabricator.wikimedia.org/T160865) (owner: 10Dereckson)
[11:57:03] <elukey>	 Urbanecm: sorry to ask but what should be the image displayed? 
[11:57:34] <elukey>	 I can see something but not sure if it is the right one or not
[11:57:46] <Urbanecm>	 elukey, I can't understand. You can have a look at https://cs.wikipedia.org/w/index.php?title=Wikipedie:Pod_l%C3%ADpou_(technika)&oldid=14873106#Dal.C5.A1.C3.AD_problematick.C3.BD_obr.C3.A1zek what it displays now. 
[11:58:17] <Urbanecm>	 elukey, ah, you want correct image. It should be https://upload.wikimedia.org/wikipedia/commons/thumb/5/57/Status_iucn3.1_LC_cs.svg/201px-Status_iucn3.1_LC_cs.svg.png (but of course 1 px smaller)
[11:58:27] <Urbanecm>	 I've linked the image in the task BTW
[11:59:18] <wikibugs_>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150400 (10Urbanecm)
[11:59:34] <elukey>	 Urbanecm: yes thanks, I can see your link correctly then on my browser (I am going through the esams cache though)
[11:59:56] <elukey>	 what error do you see? 
[11:59:58] <Urbanecm>	 elukey, I live in EU. Do I use another cluster? I don't know shortcuts. 
[12:00:10] <wikibugs>	 (03PS3) 10Hashar: Test for throttle rules: parameters logic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318301 (owner: 10Dereckson)
[12:00:22] <Urbanecm>	 ERR_CONTENT_DECODING_FAILED
[12:00:48] <Urbanecm>	 Last time I saw this, the server was sending a compressed header even though the content wasn't compressed (or vice versa, I can't remember exactly...)
[12:00:58] <Urbanecm>	 elukey, ^^^
[12:01:42] <Urbanecm>	 BTW when I try to download it using wget, it downloads correctly...
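Urbanecm's hypothesis above is that the response headers and the body disagree about compression. Browsers typically report the "gzip declared, raw bytes sent" case as a content-decoding error, which matches both the Chrome and Firefox messages quoted in this log. One way to test the hypothesis is to check whether the body really looks like a gzip stream. A minimal Python sketch, assuming only gzip matters; `body_matches_encoding` is a hypothetical helper, not part of any Wikimedia tooling, and it was not what the traffic team actually ran.

```python
import gzip
import zlib
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream begins with these two bytes


def body_matches_encoding(content_encoding: Optional[str], body: bytes) -> bool:
    """Hypothetical check: does a response body agree with its Content-Encoding?

    Only gzip is handled; any other value (or a missing header) is treated
    as "not compressed".
    """
    declared_gzip = (content_encoding or "").strip().lower() == "gzip"
    looks_gzip = body.startswith(GZIP_MAGIC)
    if declared_gzip != looks_gzip:
        # Header and body disagree, e.g. gzip declared but raw bytes sent.
        return False
    if declared_gzip:
        try:
            gzip.decompress(body)  # also rejects truncated or corrupt streams
        except (OSError, EOFError, zlib.error):
            return False
    return True


png = b"\x89PNG\r\n\x1a\n" + bytes(8)  # stand-in PNG signature plus padding
print(body_matches_encoding("gzip", png))                 # False: header says gzip, body is raw
print(body_matches_encoding(None, gzip.compress(png)))    # False: body is gzip, no header
print(body_matches_encoding("gzip", gzip.compress(png)))  # True
print(body_matches_encoding(None, png))                   # True
```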
[12:02:10] <ema>	 Urbanecm: confirmed, I can also reproduce the problem
[12:02:30] <elukey>	 good :)
[12:02:54] <Urbanecm>	 elukey, BTW one other user reported it at a wiki page at cswiki (and I converted the report to Phabricator). 
[12:02:58] <ema>	 Content Encoding Error The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression.
[12:04:08] <elukey>	 let's update the task's description with more info :)
[12:06:09] <Urbanecm>	 elukey, should I update it? Or was it for somebody else?
[12:06:35] <wikibugs_>	 (03CR) 10Hashar: [C: 032] Test for throttle rules: parameters logic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318301 (owner: 10Dereckson)
[12:06:45] <wikibugs>	 (03CR) 10Hashar: [C: 032] "Thanks for the follow up!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318301 (owner: 10Dereckson)
[12:07:32] <wikibugs_>	 06Operations, 07Puppet, 06Discovery, 06Maps, 03Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3150410 (10MoritzMuehlenhoff) p:05Triage>03Normal
[12:07:34] <elukey>	 Urbanecm: if you could (including browsers that you tried etc..) it would be great
[12:07:46] <Urbanecm>	 elukey, okay, working on it
[12:07:58] <wikibugs>	 (03Merged) 10jenkins-bot: Test for throttle rules: parameters logic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318301 (owner: 10Dereckson)
[12:08:07] <wikibugs>	 (03CR) 10jenkins-bot: Test for throttle rules: parameters logic [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318301 (owner: 10Dereckson)
[12:08:43] <hashar>	 ^^^ I have rebased on tin.eqiad.wmnet. That's only a test file
[12:13:57] <wikibugs>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150426 (10Urbanecm)
[12:14:05] <Urbanecm>	 ema, elukey, updated ^^^^
[12:14:14] <elukey>	 thanks :)
[12:14:16] <Urbanecm>	 yw
[12:14:46] <Reedy>	 Isn't that bug a dupe?
[12:15:12] <Reedy>	 https://phabricator.wikimedia.org/T161836
[12:16:26] <elukey>	 Reedy: I think those are two separate issues
[12:16:46] <wikibugs>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150431 (10Urbanecm)
[12:22:42] <wikibugs>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Solve missing 200px size of File:Status_iucn3.1_LC_cs.svg - https://phabricator.wikimedia.org/T162035#3150438 (10Aklapper) p:05Unbreak!>03High The file is not missing, it just has a wrong type and cannot be rendered ([text/html]...
[12:23:16] <wikibugs_>	 06Operations, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150440 (10Aklapper)
[12:27:20] <Urbanecm>	 Reedy, I think so too. In the bug you linked you receive 404 but there an image with MIME type text/html (with no visible reason)
[12:28:51] <ema>	 !log banning 200px-Status_iucn3.1_LC_cs.svg.png from esams frontends T162035
[12:28:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:57] <stashbot>	 T162035: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035
[12:29:00] <wikibugs_>	 (03PS1) 10Bmansurov: enwiki: Temporarily disable Wikidata descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805)
[12:29:16] <icinga-wm>	 PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:29:49] <elukey>	 Urbanecm: can  you re check if you still see the issue?
[12:31:05] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1028 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:31:21] <elukey>	 this is me
[12:31:30] <Urbanecm>	 elukey, yes, confirmed. 
[12:31:39] <elukey>	 thanks! 
[12:31:45] <Urbanecm>	 elukey, but when I refreshed, the problem has solved. 
[12:31:50] <Urbanecm>	 Maybe caching of errors?
[12:32:26] <elukey>	 ema just banned the item from the esams cache (the one that you are hitting)
[12:32:54] <Urbanecm>	 Thank you ema !
[12:33:00] <wikibugs_>	 06Operations, 10Traffic, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150458 (10ema)
[12:33:16] <wikibugs_>	 06Operations, 10Traffic, 10Wikimedia-Site-requests, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150362 (10Nemo_bis) Same as T162033?
[12:33:46] <wikibugs_>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150463 (10Nemo_bis)
[12:33:48] <wikibugs>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150362 (10stjn) https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_Cross_of_Burgund...
[12:34:05] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1028 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:34:07] <wikibugs>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150467 (10Urbanecm) It seems like it.
[12:37:22] <wikibugs_>	 06Operations, 10Traffic: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3150472 (10MoritzMuehlenhoff) Text looks good to me, but two points: - "(upgrade or use Firefox!)" is somethat confusing since people might think an updated IE would be av...
[12:37:44] <elukey>	 !log reimage analytics10[29,30,31] to Debian Jessie
[12:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:21] <wikibugs_>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150475 (10Urbanecm)
[12:51:57] <wikibugs_>	 06Operations, 06Analytics-Kanban, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3150500 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analytics1029.eqiad.wmnet', 'analytics1030....
[12:54:53] <ema>	 Urbanecm: thank you!
[12:55:07] <Urbanecm>	 ema, you're welcome!
[12:56:46] <hashar>	 ema: volans can we proceed with the mediawiki SWAT?
[12:57:00] <hashar>	 or should we hold due to the PNG/thumb madness?
[12:57:35] <icinga-wm>	 RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[12:57:45] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[12:58:28] <zeljkof>	 hashar: are you doing the swat?
[12:58:42] <hashar>	 i can 
[12:58:53] <Urbanecm>	 I'm here :)
[12:59:22] <volans>	 hashar: not sure, still under investigation, I would probably hold a bit if not urgent but ask ema ;)
[12:59:34] <zeljkof>	 hashar: I can too, if the patches look good to you
[12:59:43] <hashar>	 lets review the patches
[13:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T1300).
[13:00:04] <jouncebot>	 Urbanecm and bmansurov: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:11] <bmansurov>	 here
[13:00:12] <hashar>	 pretty sure none of them are related to thumb nailing ema :}
[13:00:30] <wikibugs_>	 (03PS2) 10Hashar: enwiki: Temporarily disable Wikidata descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805) (owner: 10Bmansurov)
[13:00:51] <ema>	 hashar: yeah go ahead
[13:00:54] <hashar>	 zeljkof: can you deploy https://gerrit.wikimedia.org/r/#/c/346136/ for bmansurov please ?
[13:00:58] <hashar>	 I am reviewing the other patches
[13:01:05] <zeljkof>	 hashar: sure
[13:01:13] <wikibugs>	 (03CR) 10Hashar: [C: 031] enwiki: Temporarily disable Wikidata descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805) (owner: 10Bmansurov)
[13:01:25] <icinga-wm>	 PROBLEM - dhclient process on thumbor1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:02:15] <icinga-wm>	 RECOVERY - dhclient process on thumbor1001 is OK: PROCS OK: 0 processes with command name dhclient
[13:02:32] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805) (owner: 10Bmansurov)
[13:02:45] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
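The two IPv6 ping alerts for ulsfo above hint at how that check compares its numbers: 20 failed probes out of 279 was CRITICAL, while 19 out of 279 was OK with "alerts on 19". That suggests the alert fires only when failures strictly exceed the threshold. The sketch below reconstructs just that comparison from those two data points; it is an inference, not the real check's code, and `atlas_ping_status` is a hypothetical name.

```python
def atlas_ping_status(failed: int, total: int, alert_threshold: int) -> str:
    """Reconstructed from two alerts only: CRITICAL once failures exceed the threshold."""
    if not 0 <= failed <= total:
        raise ValueError("failed probes must be between 0 and total")
    state = "CRITICAL" if failed > alert_threshold else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_threshold})"


print(atlas_ping_status(20, 279, 19))  # CRITICAL - failed 20 probes of 279 (alerts on 19)
print(atlas_ping_status(19, 279, 19))  # OK - failed 19 probes of 279 (alerts on 19)
```

Under this reading, the recovery did not mean every probe succeeded; the failure count only had to drop back to the threshold.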
[13:03:34] <wikibugs_>	 (03Merged) 10jenkins-bot: enwiki: Temporarily disable Wikidata descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805) (owner: 10Bmansurov)
[13:03:47] <wikibugs_>	 (03CR) 10jenkins-bot: enwiki: Temporarily disable Wikidata descriptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346136 (https://phabricator.wikimedia.org/T161805) (owner: 10Bmansurov)
[13:03:58] <wikibugs>	 (03PS2) 10Hashar: Add NS100 (Portal) to ladwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345647 (https://phabricator.wikimedia.org/T161843) (owner: 10Urbanecm)
[13:04:00] <wikibugs_>	 (03PS2) 10Hashar: Add rollback user group in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345964 (https://phabricator.wikimedia.org/T161946) (owner: 10Urbanecm)
[13:04:02] <wikibugs>	 (03PS2) 10Hashar: Optimalize all not-optimalized logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346057 (https://phabricator.wikimedia.org/T161999) (owner: 10Urbanecm)
[13:04:04] <wikibugs_>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346144
[13:04:07] <wikibugs>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346144
[13:04:13] <hashar>	 all the rest looks fine and I rebased them
[13:05:08] <zeljkof>	 bmansurov: can 346136 be tested at mwdebug1002?
[13:05:22] <bmansurov>	 zeljkof, let me see
[13:05:32] <zeljkof>	 bmansurov: will be there in a minute
[13:05:54] <bmansurov>	 ok
[13:06:43] <zeljkof>	 bmansurov: the patch is at mwdebug1002, please test
[13:06:51] <bmansurov>	 zeljkof, testing
[13:07:59] <Dereckson>	 Hello
[13:08:16] <Dereckson>	 I've a cherry pick for wmf28 to add.
[13:08:25] <bmansurov>	 zeljkof, working! thanks for deploying.
[13:08:41] <zeljkof>	 bmansurov: ok, deploying to cluster then
[13:08:52] <bmansurov>	 ok
[13:09:35] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:346136|enwiki: Temporarily disable Wikidata descriptions (T161805)]] (duration: 00m 45s)
[13:09:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:44] <stashbot>	 T161805: Turn tagline wikidata descriptions off in enwiki  - https://phabricator.wikimedia.org/T161805
[13:09:49] <hashar>	 Dereckson: can you add it to https://wikitech.wikimedia.org/wiki/Deployments#Monday.2C.C2.A0April.C2.A003 please ?:)
[13:09:58] <zeljkof>	 bmansurov: deployed, please check on production
[13:10:03] <hashar>	 Dereckson: i will CR+2 it
[13:10:41] <bmansurov>	 zeljkof, tested, working.
[13:10:44] <Dereckson>	 Urbanecm: please always use [config] for anything in operations/mediawiki-config, as the current use of this task is to indicate the place we want to go to deploy it actually
[13:10:47] <bmansurov>	 zeljkof, thanks again.
[13:11:01] <zeljkof>	  bmansurov: great, thanks for deploying with #releng ;)
[13:11:19] <Urbanecm>	 Dereckson, okay.
[13:11:21] <bmansurov>	 ;)
[13:11:26] <Dereckson>	 hashar: added, thanks
[13:11:26] <zeljkof>	 Urbanecm: around for swat?
[13:11:34] <Urbanecm>	 zeljkof, yeah
[13:11:57] <zeljkof>	 hashar: can I continue with Urbanecm's patches? did you review them?
[13:12:33] <Dereckson>	 Urbanecm: so for example config = /srv/
[13:12:44] * Dereckson pressed enter too soon
[13:12:49] <Dereckson>	 Urbanecm: so for example config = /srv/
[13:13:02] <Dereckson>	 copy paste issues day
[13:13:20] <wikibugs>	 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Use SSL certificates with discovery entry for elasticsearch - https://phabricator.wikimedia.org/T162037#3150511 (10Gehel)
[13:13:55] <Urbanecm>	 Dereckson, I see two same posts. 
[13:14:01] <wikibugs_>	 (03PS1) 10Gehel: elasticsearch - create LVS service for relforge [dns] - 10https://gerrit.wikimedia.org/r/346146 (https://phabricator.wikimedia.org/T162037)
[13:14:19] <Urbanecm>	 But okay, I'll use [config] for every patch in operations/mediawiki-config
[13:14:34] <Dereckson>	 Urbanecm: yes, so what I wanted to say is the tag allows to determine the working directory: for example config = /srv/mediawiki-staging, the root deploy directory, wmf18 = /srv/mediawiki-staging/php-1.29.0-wmf.18, wmf19 = /srv/mediawiki-staging/php-1.29.0-wmf.19 etc.
[13:14:52] <Urbanecm>	 Now I understand. 
[13:14:55] <Urbanecm>	 Thank you
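Dereckson's rule above maps the tag on a SWAT entry to a working directory on the deployment host: `[config]` (operations/mediawiki-config) means the root, /srv/mediawiki-staging, and a branch tag such as `wmf18` means /srv/mediawiki-staging/php-1.29.0-wmf.18. A minimal Python sketch of that mapping follows. It assumes the `1.29.0` version prefix from his examples (other release cycles would use a different one), and `staging_dir_for_tag` is a hypothetical name, not part of scap or any deploy tool.

```python
import re
from pathlib import PurePosixPath

STAGING_ROOT = PurePosixPath("/srv/mediawiki-staging")
# Assumption: version prefix copied from the examples in the log; it changes
# with each MediaWiki release cycle.
DEFAULT_MW_VERSION = "1.29.0"
BRANCH_TAG = re.compile(r"wmf(\d+)")


def staging_dir_for_tag(tag: str, mw_version: str = DEFAULT_MW_VERSION) -> PurePosixPath:
    """Map a SWAT tag such as '[config]' or 'wmf18' to its directory on the deploy host."""
    tag = tag.strip(" []").lower()
    if tag == "config":
        return STAGING_ROOT  # operations/mediawiki-config lives at the root
    match = BRANCH_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognised deploy tag: {tag!r}")
    return STAGING_ROOT / f"php-{mw_version}-wmf.{match.group(1)}"


print(staging_dir_for_tag("[config]"))  # /srv/mediawiki-staging
print(staging_dir_for_tag("wmf18"))     # /srv/mediawiki-staging/php-1.29.0-wmf.18
```

Rejecting unknown tags with an error, rather than guessing, matches the point of the convention: the deployer should never have to infer where a patch belongs.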
[13:15:23] <moritzm>	 !log upgrading restbase-dev* to Linux 4.9
[13:15:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:14] <wikibugs>	 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch, 13Patch-For-Review: Use SSL certificates with discovery entry for elasticsearch - https://phabricator.wikimedia.org/T162037#3150531 (10Gehel) I think that the transfer_to_es job is using a specific node instead of the service to simplify fir...
[13:18:41] <wikibugs_>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345647 (https://phabricator.wikimedia.org/T161843) (owner: 10Urbanecm)
[13:18:44] <wikibugs>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150533 (10Urbanecm) https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Ambox_currentevent.s...
[13:18:46] <wikibugs_>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345964 (https://phabricator.wikimedia.org/T161946) (owner: 10Urbanecm)
[13:19:07] <hashar>	 lets do Urbanecm patches
[13:19:14] <wikibugs_>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346057 (https://phabricator.wikimedia.org/T161999) (owner: 10Urbanecm)
[13:19:25] <Urbanecm>	 zeljkof, Dereckson, hashar: May somebody ban https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Ambox_currentevent.svg/48px-Ambox_currentevent.svg.png from esams cache? T162035 This is template-icon template which is frequently used. 
[13:19:26] <stashbot>	 T162035: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035
[13:19:52] <hashar>	 Urbanecm: the thumbs have an issue right now
[13:19:57] <wikibugs_>	 (03Merged) 10jenkins-bot: Add NS100 (Portal) to ladwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345647 (https://phabricator.wikimedia.org/T161843) (owner: 10Urbanecm)
[13:20:06] <wikibugs_>	 (03Merged) 10jenkins-bot: Add rollback user group in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345964 (https://phabricator.wikimedia.org/T161946) (owner: 10Urbanecm)
[13:20:09] <wikibugs>	 06Operations, 10Traffic, 10media-storage, 15User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150536 (10ema) Note that the issue is pretty widespread, I'm seeing lots of requests affected by t...
[13:20:12] <wikibugs>	 (03CR) 10jenkins-bot: Add NS100 (Portal) to ladwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345647 (https://phabricator.wikimedia.org/T161843) (owner: 10Urbanecm)
[13:20:21] <zeljkof>	 hashar: should I continue with swat?
[13:20:29] <Urbanecm>	 hashar, when I can expect fixing? And by what is the problem caused?
[13:20:32] <wikibugs_>	 (03Merged) 10jenkins-bot: Optimalize all not-optimalized logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/346057 (https://phabricator.wikimedia.org/T161999) (owner: 10Urbanecm)
[13:20:42] <hashar>	 Urbanecm: I have no idea I am not looking into it
[13:20:52] <hashar>	 zeljkof: I CR+2 all three patches,  pushing them to mwdebug1001
[13:20:54] <Urbanecm>	 hashar, ok.
[13:21:11] <zeljkof>	 hashar: ok, so you are taking over swat then?
[13:21:31] <hashar>	 not really
[13:21:51] <hashar>	 zeljkof: guess you can baby sit https://gerrit.wikimedia.org/r/#/c/346058/  for Dereckson :}
[13:22:08] <wikibugs_>	 (03CR) 10jenkins-bot: Add rollback user group in fawikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345964 (https://phabricator.wikimedia.org/T161946) (owner: 10Urbanecm)
[13:22:53] <zeljkof>	 hashar: sure, should I deploy it now, or will you let me know when you are done'
[13:22:55] <zeljkof>	 ?
[13:23:02] <hashar>	 deploy it
[13:23:05] <hashar>	 that can be done in parallel
[13:23:15] <icinga-wm>	 PROBLEM - puppet last run on db1081 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:23:17] <zeljkof>	 yes, looks like completely unrelated file
[13:23:22] <hashar>	 Urbanecm: so your 3 patches are on mwdebug1001
[13:23:34] <hashar>	 Urbanecm: will do the namespace check for ladwiki
[13:23:35] <Urbanecm>	 hashar, okay, I'll test them
[13:23:42] <Urbanecm>	 hashar, ok
[13:23:54] <zeljkof>	 hashar:  ok, merging and deploying 
[13:24:09] <zeljkof>	 Dereckson: can 346058 be tested at mwdebug1002?
[13:25:49] <zeljkof>	 (once it is there)
[13:25:51] <Dereckson>	 yes I think
[13:26:03] <zeljkof>	 ok, will ping you in a few minutes
[13:26:11] <Dereckson>	 if not, that would mean a full scap is required like for other l10n changes
[13:26:24] <Urbanecm>	 hashar, working
[13:27:12] <hashar>	 !log terbium: scap pull for ladwiki namespace additions
[13:27:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:42] <hashar>	 Urbanecm: syncing.  Though I am holding the static/images/project-logos/*.png thing for now
[13:29:51] <Urbanecm>	 hashar, okay
[13:29:54] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Add NS100 (Portal) to ladwiki, Add rollback user group in fawikisource (duration: 00m 47s)
[13:29:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:04] <zeljkof>	 Dereckson: the patch is at mwdebug1002
[13:30:08] <zeljkof>	 please test
[13:33:29] <Dereckson>	 hi MatmaRex 
[13:33:30] <Dereckson>	 zeljkof: testing
[13:33:42] <wikibugs_>	 (PS1) Gehel: relforge - add LVS entry [puppet] - https://gerrit.wikimedia.org/r/346148 (https://phabricator.wikimedia.org/T162037)
[13:33:57] <wikibugs_>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3150576 (MoritzMuehlenhoff) After further debugging I now believe that stat_cache is still broken in 3.18 and causing these deadlocks. To confirm, I've d...
[13:34:00] <MatmaRex>	 hi
[13:34:04] <MatmaRex>	 did i break something?
[13:34:50] <_joe_>	 MatmaRex: how exactly?
[13:35:39] <MatmaRex>	 i got pinged on this channel, so i assume i must've broken something :D
[13:36:05] <wikibugs>	 (PS1) Volans: Revert "Swift-proxy: use discovery everywhere for rewrites" [puppet] - https://gerrit.wikimedia.org/r/346149
[13:36:08] <MatmaRex>	 (just joking)
[13:36:22] <wikibugs>	 (PS2) Volans: Revert "Swift-proxy: use discovery everywhere for rewrites" [puppet] - https://gerrit.wikimedia.org/r/346149
[13:37:36] <wikibugs>	 Operations, Analytics-Kanban, WMDE-Analytics-Engineering, Patch-For-Review, User-Addshore: /a/mw-log/archive/api on stat1002 no longer being populated - https://phabricator.wikimedia.org/T160888#3150579 (Ottomata) > we might want to open another one to track down how to move away api-logs fro...
[13:37:36] <Dereckson>	 zeljkof: working
[13:37:47] <zeljkof>	 Dereckson: ok, deploying to cluster then
[13:38:40] <logmsgbot>	 !log zfilipin@tin Synchronized php-1.29.0-wmf.18/extensions/cldr/: SWAT: [[gerrit:346058|Translate Atikamekw language name in French]] (duration: 00m 51s)
[13:38:43] <Dereckson>	 thanks
[13:38:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:52] <zeljkof>	 Dereckson: deployed, please test on production
[13:39:09] <zeljkof>	 hashar: is everything deployed now? can we close the eu swat?
[13:39:18] <wikibugs_>	 Operations, Goal, kubernetes: Prepare to service applications from kubernetes - https://phabricator.wikimedia.org/T162039#3150583 (akosiaris)
[13:39:18] <Dereckson>	 works also in prod
[13:39:22] <hashar>	 Urbanecm: will have to sync the static/images/project-logos later on.  There is an ongoing thumb issue
[13:39:23] <Dereckson>	 so no need for full scap, good news
[13:39:29] <hashar>	 Urbanecm: I am watching it though
[13:40:02] <Urbanecm>	 hashar, ack
[13:40:08] <wikibugs>	 (PS6) Matthias Mullie: Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[13:41:08] <zeljkof>	 hashar: ping me if there is anything else to deploy, I'm around but doing other stuff, looks like all patches for today are deployed
[13:41:31] <wikibugs>	 Operations, Goal, kubernetes: Eliminate SPOFs in the existing eqiad infrastructure - https://phabricator.wikimedia.org/T162040#3150598 (akosiaris)
[13:42:19] <hashar>	 zeljkof: yeah all done but one
[13:42:35] <hashar>	 which I will take care of once the thumb/png issue is resolved
[13:44:37] <wikibugs_>	 Operations, Goal, kubernetes: Expand the infrastructure to codfw - https://phabricator.wikimedia.org/T162041#3150620 (akosiaris)
[13:47:45] <wikibugs>	 Operations: Migrate all jessie hosts to Linux 4.9 - https://phabricator.wikimedia.org/T162029#3150639 (MoritzMuehlenhoff) p:Triage>Normal
[13:49:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[13:50:12] <wikibugs_>	 Operations, Patch-For-Review: Package the next LTS kernel (4.9) - https://phabricator.wikimedia.org/T154934#3150651 (MoritzMuehlenhoff) Open>Resolved The new kernel is available on apt.wikimedia.org and is used by default on jessie installations. Closing, the migration of existing jessie installa...
[13:50:33] <wikibugs_>	 Operations, Release-Engineering-Team, Services, Goal, kubernetes: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3150656 (akosiaris)
[13:51:22] <icinga-wm>	 RECOVERY - puppet last run on db1081 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[13:51:50] <wikibugs>	 Operations, Goal, kubernetes: Define a process to keep images up-to-date on similar standards as the rest of production - https://phabricator.wikimedia.org/T162043#3150672 (akosiaris)
[13:51:54] <elukey>	 cmjohnson1: hi! do you have a minute?
[13:52:10] <cmjohnson1>	 Hi elukey sure
[13:52:53] <elukey>	 thanks! I tried to reimage analytics1030.eqiad.wmnet and for some reason it is now stuck while booting, and powercycle/hardreset does not work
[13:53:22] <cmjohnson1>	 okay, you tried via mgmt?
[13:54:40] <wikibugs>	 Operations, Goal, kubernetes: Design and implement a Kubernetes-based staging environment. (stretch) - https://phabricator.wikimedia.org/T162045#3150713 (akosiaris)
[13:54:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
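The ripe-atlas-ulsfo check flips to CRITICAL at 20 failed probes and back to OK at 19, with "(alerts on 19)" in both messages. That pattern is consistent with a strict greater-than comparison against the threshold. A minimal sketch of that rule, assuming (inferred from these messages, not from the check's source) that the check compares the failed-probe count to the threshold with `>`; the function name is hypothetical:

```python
def ripe_atlas_status(failed: int, total: int, alert_on: int) -> str:
    """Hypothetical re-creation of the alert text.

    Assumes CRITICAL only when failed probes are strictly greater than
    the 'alerts on' threshold, which matches 20 -> CRITICAL and
    19 -> OK in the log, but is an inference, not the real check.
    """
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


print(ripe_atlas_status(20, 279, 19))  # CRITICAL - failed 20 probes of 279 (alerts on 19)
print(ripe_atlas_status(19, 279, 19))  # OK - failed 19 probes of 279 (alerts on 19)
```

With a threshold sitting exactly at the recovery value, a single probe flapping is enough to toggle the alert, which fits the repeated PROBLEM/RECOVERY pairs for this check later in the log.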
[13:56:48] <wikibugs_>	 Operations, hardware-requests: EQIAD: (4) hardware access request for ganeti - https://phabricator.wikimedia.org/T161702#3150731 (akosiaris)
[13:56:51] <wikibugs>	 Operations, Goal, kubernetes: Eliminate SPOFs in the existing eqiad infrastructure - https://phabricator.wikimedia.org/T162040#3150730 (akosiaris)
[13:57:27] <wikibugs_>	 Operations, hardware-requests: CODFW: (4) hardware access request for kubernetes - https://phabricator.wikimedia.org/T161700#3150733 (akosiaris)
[13:57:30] <wikibugs>	 Operations, Goal, kubernetes: Expand the infrastructure to codfw - https://phabricator.wikimedia.org/T162041#3150732 (akosiaris)
[13:58:40] <marostegui>	 zeljkof: so swat not done yet then?
[13:59:13] <zeljkof>	 marostegui: I think hashar is waiting for something to be resolved before pushing one last thing
[13:59:23] <zeljkof>	 but swat time is up anyway in a minute
[14:01:39] <wikibugs_>	 Operations, netops: asw-a1-codfw spontaneous reboot - https://phabricator.wikimedia.org/T159464#3150744 (faidon) Open>Resolved a:faidon Logs didn't show anything and it hasn't happened in a month. Let's resolve for now.
[14:01:58] <marostegui>	 zeljkof: sure, I can wait until you guys are fine with it :)
[14:02:38] <zeljkof>	 marostegui: I think our time is up, if you have deployment scheduled, wait a few minutes if hashar replies, if not, go ahead
[14:02:53] <hashar>	 marostegui: yeah it is done
[14:03:07] <hashar>	 well not completely, still have to sync some project logos  but that is not an issue 
[14:03:12] <marostegui>	 Ah, great! Thanks guys :)
[14:03:20] <marostegui>	 Seriously, I can wait, if you guys prefer me to wait
[14:03:40] <Zppix>	 jouncebot:  now
[14:03:40] <jouncebot>	 No deployments scheduled for the next 2 hour(s) and 56 minute(s)
[14:07:15] <wikibugs>	 (PS3) Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/346144
[14:08:53] <wikibugs_>	 Operations, ops-eqiad, Analytics-Kanban, DC-Ops: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3150761 (elukey)
[14:09:13] <hashar>	 marostegui: you can deploy just fine :-}
[14:09:44] <hashar>	 the only thing I am holding is https://gerrit.wikimedia.org/r/#/c/346057/   which updates a few images in static/images/project-logos/
[14:10:00] <wikibugs>	 Operations, Analytics-Kanban, User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3150775 (elukey) Analytics1030 is refusing to boot, opened a phab task: https://phabricator.wikimedia.org/T162046
[14:12:30] <marostegui>	 hashar: ok! thank you :)
[14:13:10] <wikibugs>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/346144 (owner: Marostegui)
[14:14:43] <wikibugs>	 (PS1) Ema: cache_upload: unset Content-Type on 304 responses [puppet] - https://gerrit.wikimedia.org/r/346157 (https://phabricator.wikimedia.org/T162035)
[14:15:41] <wikibugs>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/346144 (owner: Marostegui)
[14:17:05] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1086 - T160390 (duration: 00m 51s)
[14:17:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:11] <stashbot>	 T160390: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390
[14:17:50] <wikibugs>	 (CR) BBlack: [C: 1] cache_upload: unset Content-Type on 304 responses [puppet] - https://gerrit.wikimedia.org/r/346157 (https://phabricator.wikimedia.org/T162035) (owner: Ema)
[14:17:52] <wikibugs_>	 (PS3) Giuseppe Lavagetto: base::puppet: add puppet helper scripts [puppet] - https://gerrit.wikimedia.org/r/346118
[14:18:51] <wikibugs_>	 (PS2) Alexandros Kosiaris: elasticsearch: Fix ERB instance variable notation [puppet] - https://gerrit.wikimedia.org/r/345845
[14:18:56] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] elasticsearch: Fix ERB instance variable notation [puppet] - https://gerrit.wikimedia.org/r/345845 (owner: Alexandros Kosiaris)
[14:20:01] <wikibugs>	 (PS2) Ema: cache_upload: unset Content-Type on 304 responses [puppet] - https://gerrit.wikimedia.org/r/346157 (https://phabricator.wikimedia.org/T162035)
[14:20:02] <icinga-wm>	 PROBLEM - puppet last run on dbmonitor2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:20:14] <wikibugs>	 (CR) Ema: [V: 2 C: 2] cache_upload: unset Content-Type on 304 responses [puppet] - https://gerrit.wikimedia.org/r/346157 (https://phabricator.wikimedia.org/T162035) (owner: Ema)
[14:20:29] <wikibugs_>	 (PS1) ArielGlenn: fix bug introduced in local variable cleanup for recombine jobs [dumps] - https://gerrit.wikimedia.org/r/346158
[14:22:41] <wikibugs_>	 Operations, Release-Engineering-Team, Goal, Services (designing), and 2 others: Prepare and maintain base container images - https://phabricator.wikimedia.org/T162042#3150814 (mobrovac)
[14:23:44] <wikibugs_>	 (PS1) Marostegui: db-eqiad.php: Depool db1015 [mediawiki-config] - https://gerrit.wikimedia.org/r/346160 (https://phabricator.wikimedia.org/T159319)
[14:25:18] <wikibugs_>	 (CR) ArielGlenn: [C: 2] fix bug introduced in local variable cleanup for recombine jobs [dumps] - https://gerrit.wikimedia.org/r/346158 (owner: ArielGlenn)
[14:26:24] <wikibugs_>	 Operations, Traffic: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3150818 (BBlack)  Depending on the context I've been flipping between whether we're talking about just 3DES or both of the non-FS ciphers, sorry.  In current weekly stat...
[14:26:28] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@905a845]: fix stub recombines, broken by too agressive 'cleanup' of local vars
[14:26:30] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@905a845]: fix stub recombines, broken by too agressive 'cleanup' of local vars (duration: 00m 02s)
[14:26:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:26:58] <wikibugs_>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1015 [mediawiki-config] - https://gerrit.wikimedia.org/r/346160 (https://phabricator.wikimedia.org/T159319) (owner: Marostegui)
[14:28:21] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1015 [mediawiki-config] - https://gerrit.wikimedia.org/r/346160 (https://phabricator.wikimedia.org/T159319) (owner: Marostegui)
[14:29:53] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1015 - T159319 (duration: 00m 44s)
[14:29:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:33:20] <wikibugs_>	 (PS1) Hoo man: Don't set removed Wikibase client settings [mediawiki-config] - https://gerrit.wikimedia.org/r/346161
[14:33:28] <wikibugs>	 (PS1) Muehlenhoff: Set wireshark-common in debconf to avoid setuid prompt [puppet] - https://gerrit.wikimedia.org/r/346162
[14:33:40] <wikibugs_>	 (PS2) Muehlenhoff: Set wireshark-common in debconf to avoid setuid prompt [puppet] - https://gerrit.wikimedia.org/r/346162
[14:38:33] <wikibugs>	 Operations, ops-codfw: Plug in ex-graphite2001 SSDs to recover coal data - https://phabricator.wikimedia.org/T161538#3150855 (Papaul) a:Papaul>RobH Flash Drive in place
[14:45:33] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on ocg1001 - https://phabricator.wikimedia.org/T161158#3150869 (faidon) a:Cmjohnson>Dzahn Someone, unfortunately, needs to follow the process outlined here: https://wikitech.wikimedia.org/wiki/OCG#Decommissioning_a_host  @Dzahn, can I ask you to have a look at...
[14:48:02] <icinga-wm>	 RECOVERY - puppet last run on dbmonitor2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[14:49:32] <ema>	 !log cache_upload: ban all objects with content-type: text/html T162035
[14:49:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:39] <stashbot>	 T162035: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035
[14:50:02] <wikibugs>	 Operations, Puppet, Discovery, Maps, Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3150878 (Gehel) The simplest possible test I can think of is `sudo puppet apply -e "service {'pos...
[14:54:50] <marostegui>	 !log Deploy alter table to unify revision table across all the s3 wikis on db1015 - T159319
[14:54:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:42] <Urbanecm>	 ema, are you sure they were banned? I can still download some.
[14:56:27] <ema>	 Urbanecm: in progress :)
[14:56:28] <bblack>	 it takes a little bit to execute the ban on all the affected hosts, I'm not sure if he's done yet
[14:58:41] <wikibugs>	 Operations, Office-IT, LDAP: Remove disabled users from internal mailing lists - https://phabricator.wikimedia.org/T161004#3150898 (Aklapper) Please also check the discussion in {T100400}.
[15:00:03] <ema>	 Urbanecm: done
[15:02:05] <wikibugs_>	 (Abandoned) Volans: Revert "Swift-proxy: use discovery everywhere for rewrites" [puppet] - https://gerrit.wikimedia.org/r/346149 (owner: Volans)
[15:05:08] <wikibugs_>	 (PS1) Hashar: contint: PHP packages cleanup [puppet] - https://gerrit.wikimedia.org/r/346165
[15:05:32] <Urbanecm>	 ema, it doesn't seem to be done. For example https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Ambox_currentevent.svg/48px-Ambox_currentevent.svg.png is still bad. 
[15:05:51] <paladox>	 works for me.
[15:06:02] <Urbanecm>	 paladox, which cluster do you use?
[15:06:15] <paladox>	 What do you mean cluster?
[15:06:16] <Reedy>	 esama
[15:06:18] <Reedy>	 *esams
[15:06:23] <paladox>	 yeh, europe
[15:06:37] <Reedy>	 Urbanecm: that WFM from europe too
[15:06:45] <Urbanecm>	 Thank you both. 
[15:06:45] <ema>	 Urbanecm: what's the value of the X-Cache response header you get with the bad response? 
[15:07:00] <Urbanecm>	 ema, the bad response isn't sent anymore, sorry
[15:07:09] <Urbanecm>	 It works now...
[15:07:10] <ema>	 \o/
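The symptom being chased in T162035 is a thumbnail whose URL ends in `.png` but whose response carries a `text/html` Content-Type. A minimal sketch of a header check that flags such responses, given a URL and a dict of response headers; the function name and the extension-to-type table are illustrative assumptions (only the PNG case comes from the log), and fetching the headers is left out:

```python
import posixpath
from urllib.parse import urlparse

# Illustrative mapping: only ".png" -> "image/png" is taken from the log.
EXPECTED_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def is_bad_thumbnail(url: str, headers: dict) -> bool:
    """Hypothetical check: True when the URL's extension implies an image
    type but the Content-Type header (parameters stripped) differs."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    expected = EXPECTED_TYPES.get(ext)
    if expected is None:
        return False  # not a URL this check knows how to judge
    # HTTP header names are case-insensitive, so normalise the keys.
    lowered = {k.lower(): v for k, v in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    return content_type != expected  # a missing header also counts as bad


url = ("https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/"
       "Ambox_currentevent.svg/48px-Ambox_currentevent.svg.png")
print(is_bad_thumbnail(url, {"Content-Type": "text/html; charset=UTF-8"}))  # True
print(is_bad_thumbnail(url, {"Content-Type": "image/png"}))  # False
```

When such a response is seen, recording its X-Cache header alongside, as ema asked, tells you which cache hosts served it.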
[15:07:17] <wikibugs_>	 (CR) Hashar: "That follow up https://gerrit.wikimedia.org/r/#/c/325877/ .  A bit nicer since mediawiki::packages::php5 is now included explicitly.   Onc" [puppet] - https://gerrit.wikimedia.org/r/346165 (owner: Hashar)
[15:14:02] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1064 is CRITICAL: CRITICAL: expiry mailbox lag is 791658
[15:17:33] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 1008349
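The "expiry mailbox lag" alerts measure how far Varnish's expiry thread is behind the objects handed to it. A minimal sketch, assuming the lag is the difference between the varnishstat counters `MAIN.exp_mailed` and `MAIN.exp_received`; the warning and critical thresholds are hypothetical placeholders, since the log only shows values from roughly 520,000 to 1,000,000 being CRITICAL and 0 being OK, and the counter values in the example are made up:

```python
# Hypothetical thresholds: the real check's values are not shown in this log.
WARNING_LAG = 30_000
CRITICAL_LAG = 100_000


def mailbox_lag(counters: dict) -> int:
    """Objects mailed to the expiry thread but not yet received by it
    (assumed definition: exp_mailed - exp_received)."""
    return counters["MAIN.exp_mailed"] - counters["MAIN.exp_received"]


def check_mailbox_lag(counters: dict, warning: int = WARNING_LAG,
                      critical: int = CRITICAL_LAG) -> str:
    """Return an icinga-style status line for the computed lag."""
    lag = mailbox_lag(counters)
    if lag >= critical:
        state = "CRITICAL"
    elif lag >= warning:
        state = "WARNING"
    else:
        state = "OK"
    return f"{state}: expiry mailbox lag is {lag}"


# Made-up counters chosen so the difference matches the cp1064 alert.
sample = {"MAIN.exp_mailed": 10_791_658, "MAIN.exp_received": 10_000_000}
print(check_mailbox_lag(sample))  # CRITICAL: expiry mailbox lag is 791658
```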
[15:24:12] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:25:12] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
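The graphite 5xx alerts report the share of datapoints above a threshold rather than a single value. 11.11% is exactly 1/9, which suggests one datapoint out of a nine-point window crossed 1000 reqs/min. A minimal sketch of that calculation, assuming the check evaluates a short window of datapoints, skips nulls, and counts values strictly above the threshold; the window contents are invented for illustration:

```python
def percent_above(datapoints, threshold):
    """Share of non-null datapoints strictly above threshold, in percent."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


# Hypothetical 9-sample window of 5xx reqs/min, one sample above 1000.
window = [310.0, 295.0, 402.0, 1180.0, 350.0, 330.0, 298.0, 315.0, 340.0]
pct = percent_above(window, 1000.0)
print(f"{pct:.2f}% of data above the critical threshold [1000.0]")
# 11.11% of data above the critical threshold [1000.0]
```

Read this way, the alert says one minute in the window spiked, not that the whole window was bad, which matches bblack's later observation that the 503 rate was fairly small.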
[15:25:52] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1050 is CRITICAL: CRITICAL: expiry mailbox lag is 564916
[15:27:54] <ema>	 503s in upload ulsfo
[15:28:02] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1062 is CRITICAL: CRITICAL: expiry mailbox lag is 733803
[15:28:55] <bblack>	 hmmm
[15:29:22] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp1073 is CRITICAL: CRITICAL: expiry mailbox lag is 766144
[15:29:32] <icinga-wm>	 PROBLEM - Varnish HTTP upload-backend - port 3128 on cp4015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:30:15] <bblack>	 related to the ban or the change?
[15:30:32] <ema>	 I don't think so
[15:30:47] <bblack>	 why the slew of mailbox issues on cp1?
[15:31:13] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:31:15] <bblack>	 seems likely related to the execution of the ban at least (perhaps that stalls out reaping space, or affects the pattern of it)
[15:32:29] <ema>	 Varnish HTTP upload-backend - port 3128 on cp4015 is CRITICAL <- this seems a false positive? port 3128 is fine there
[15:32:46] <bblack>	 it may be intermittent
[15:32:55] <bblack>	 or we may have a ulsfo<->codfw/eqiad networking issue
[15:32:55] <_joe_>	 ema: it's a timeout on a request 
[15:33:12] <bblack>	 (which could cause that icinga port issue and the 5xx)
[15:33:32] <icinga-wm>	 PROBLEM - Varnish HTTP upload-backend - port 3128 on cp4013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:34:12] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:34:32] <icinga-wm>	 RECOVERY - Varnish HTTP upload-backend - port 3128 on cp4015 is OK: HTTP OK: HTTP/1.1 200 OK - 181 bytes in 8.971 second response time
[15:34:32] <icinga-wm>	 PROBLEM - Varnish HTTP upload-backend - port 3128 on cp4014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:35:23] <icinga-wm>	 RECOVERY - Varnish HTTP upload-backend - port 3128 on cp4014 is OK: HTTP OK: HTTP/1.1 200 OK - 180 bytes in 2.034 second response time
[15:36:32] <icinga-wm>	 RECOVERY - Varnish HTTP upload-backend - port 3128 on cp4013 is OK: HTTP OK: HTTP/1.1 200 OK - 180 bytes in 2.416 second response time
[15:37:49] <bblack>	 network graphs looked reasonable in librenms
[15:38:04] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151086 (stjn) Another one to the pile: https://upload.wikimedia.org/wikipe...
[15:38:09] <bblack>	 but all the cp1 mailboxes backlogging at the same time is suspect.  maybe somehow that caused some indirect fallout
[15:38:38] <bblack>	 the 5xx rate is tapering off from initial peak, but still not back to normal
[15:38:48] <ema>	 the lagging cp1s are not throwing 503s though
[15:39:16] <bblack>	 yeah but ulsfo's backend requests on misses ultimately end up requesting from those same daemons/storage
[15:39:35] <wikibugs>	 Operations, Puppet, Discovery, Maps, Interactive-Sprint: Puppet fails with "Could not find init script for 'postgresql@9.4-main'" on maps / labs server - https://phabricator.wikimedia.org/T161893#3151105 (Gehel) It looks like puppet is using `systemctl('list-unit-files', '--type', 'service',...
[15:40:03] <ema>	 bblack: we should see those requests with varnishlog on the cp1* machines tho, right?
[15:40:11] <bblack>	 yeah
[15:40:15] <ema>	 varnishlog -q 'RespStatus ~ 503' gives no output
[15:40:40] <bblack>	 maybe they're just slow and ulsfo is giving up?
[15:41:09] <ema>	 mmh
[15:41:12] <bblack>	 the 503 rate is fairly small
[15:41:55] <bblack>	 the initial peak was 16.6/s with GETs at ~9k/s
[15:43:04] <hoo>	 !log Updated email for "Lucie Kaffee" on wikitech from work address (wikimedia.de) to known volunteer address (upon request)
[15:43:08] <bblack>	 6.2% of reqs should leave ulsfo towards further-back
[15:43:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:43:52] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] package_builder: remove precise support [puppet] - https://gerrit.wikimedia.org/r/345836 (owner: Faidon Liambotis)
[15:43:58] <wikibugs_>	 (PS2) Alexandros Kosiaris: package_builder: remove precise support [puppet] - https://gerrit.wikimedia.org/r/345836 (owner: Faidon Liambotis)
[15:43:59] <bblack>	 so very napkin-math estimate is that even the 16.6/s peak of 503s there on ulsfo only represented ~3% of the ulsfo->[codfw,eqiad,app] traffic failing
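bblack's napkin math can be reproduced directly: about 9k GETs/s at ulsfo, 6.2% of which leave ulsfo for the backends further back, against a peak of 16.6 503s/s. A minimal sketch of the arithmetic, making the same simplifying assumption the estimate makes, namely that every 503 at ulsfo belongs to a request that left ulsfo:

```python
get_rate = 9_000        # GET requests/s at ulsfo (approximate, from the log)
miss_fraction = 0.062   # share of requests leaving ulsfo for codfw/eqiad/app
peak_503_rate = 16.6    # 503s/s at the initial peak

# Requests per second that actually go further back than ulsfo.
backend_rate = get_rate * miss_fraction          # about 558 req/s
# Fraction of that backend-bound traffic that failed at the peak.
failing_share = peak_503_rate / backend_rate     # about 0.0297

print(f"{backend_rate:.0f} req/s leave ulsfo; {failing_share:.1%} of them failed")
# 558 req/s leave ulsfo; 3.0% of them failed
```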
[15:44:02] <wikibugs>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] package_builder: remove precise support [puppet] - https://gerrit.wikimedia.org/r/345836 (owner: Faidon Liambotis)
[15:45:32] <ema>	 bblack: cp4015 has a fairly large mailbox lag
[15:46:51] <ema>	 --  FetchError     http first read error: EOF
[15:47:06] <ema>	 not too frequently, but they're happening ^
[15:48:17] <bblack>	 yeah so if mailbox lag is generally getting worse than it has been, my two top hypotheses would be either the ttl/keep change affecting it negatively (I would've assumed positively, but who knows really) or the ban
[15:48:40] <bblack>	 but given the timing and functional correlation, I'd suspect the ban has some impact on the storage purging process
[15:48:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[15:49:04] <ema>	 ripe-atlas-ulsfo perhaps related? 
[15:49:10] <bblack>	 or maybe we are saturating network somehow?
[15:50:10] <wikibugs_>	 (PS4) Andrew Bogott: Keystone 2fa:  Use the wikitech API rather than checking the db directly. [puppet] - https://gerrit.wikimedia.org/r/345231
[15:50:29] <ema>	 varnish-backend hitrate went down significantly 
[15:50:30] <ema>	 https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=15&fullscreen&orgId=1&var-server=cp4015&var-datasource=ulsfo%20prometheus%2Fops
[15:51:10] <ema>	 maybe unsetting Content-Type has side effects?
[15:51:35] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3150362 (Joe) @stjn I cannot reproduce your case and we should've fixed the...
[15:51:44] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] Keystone 2fa:  Use the wikitech API rather than checking the db directly. [puppet] - https://gerrit.wikimedia.org/r/345231 (owner: Andrew Bogott)
[15:53:39] <wikibugs>	 (CR) Alexandros Kosiaris: "just did the manual cleanup on copper" [puppet] - https://gerrit.wikimedia.org/r/345836 (owner: Faidon Liambotis)
[15:53:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[15:54:22] <bblack>	 ema: it could also be that there's no functionally-bad side-effect, but that the affected (temporary bad html content type) objects were more numerous than we thought, and this purged a whole lot of things
[15:54:53] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151162 (Urbanecm) No problem for me. Try to clear cache of your browser. T...
[15:54:54] <wikibugs>	 (CR) Alexandros Kosiaris: [C: 2] url_downloader: convert to profile/role [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[15:55:13] <ema>	 bblack: the number of cached objects in varnish-be didn't change much so I'd say that's not the case
[15:55:23] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] "damn, needs manual rebase" [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[15:55:43] <bblack>	 ok
[15:56:30] <ema>	 but the lagging mailbox issue triggers additional backend connections so that might explain the additional network traffic
[15:57:02] <bblack>	 maybe check on whether the bans are still running anywhere or already done? (e.g. cumin "varnishadm ban.list")
[15:57:21] <bblack>	 and yeah maybe try reverting the CT unset
[15:58:06] <bblack>	 I'd think once they're done executing (scanning storage to purge objects) their impact would go away, if any
[15:58:57] <ema>	 there's stuff like:
[15:58:58] <ema>	 Present bans:                                                                   
[15:59:01] <ema>	 1491231520.525840 4320833 C
[15:59:06] <ema>	 but the CT bans are gone
[15:59:17] <bblack>	 pl
[15:59:21] <bblack>	 err, "ok"
[15:59:30] <bblack>	 I think there's always a base entry in there with C
[15:59:56] <ema>	 on some hosts there are multiple C entries, not sure what that means
[15:59:59] <bblack>	 whereas e.g. when I looked earlier (as you were executing them initially), some caches would show:
[16:00:02] <bblack>	 1491231520.153275 3433299 -  obj.http.content-type == text/html; charset=UTF-8
[16:00:04] <bblack>	 1491231100.964850 848910 C  
[16:00:07] <bblack>	 1490739431.603784     0 C  
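The `varnishadm ban.list` output pasted above has one ban per line: a timestamp, a count, a flag, and, for bans still being processed, the ban expression. A minimal parser for that layout, assuming the flag `C` marks a completed ban and `-` one still pending, and treating the second column as an opaque count (commonly described as the number of cached objects still referring to the ban, but that reading is an assumption here). The exact format may differ between Varnish versions. It could be used to confirm, as ema did by eye, that no Content-Type ban is still pending:

```python
from dataclasses import dataclass


@dataclass
class Ban:
    timestamp: float
    count: int        # second column; meaning assumed, see lead-in
    completed: bool   # True when the flag column is "C"
    spec: str         # ban expression, empty for completed entries here


def parse_ban_list(text: str) -> list[Ban]:
    bans = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Present bans"):
            continue
        # At most 4 fields, so the ban expression keeps its inner spaces.
        parts = line.split(None, 3)
        ts, count, flag = parts[0], parts[1], parts[2]
        spec = parts[3] if len(parts) > 3 else ""
        bans.append(Ban(float(ts), int(count), flag == "C", spec))
    return bans


# The lines bblack pasted earlier in this log.
sample = """Present bans:
1491231520.153275 3433299 -  obj.http.content-type == text/html; charset=UTF-8
1491231100.964850 848910 C
1490739431.603784     0 C
"""
pending = [b.spec for b in parse_ban_list(sample) if not b.completed]
print(pending)  # ['obj.http.content-type == text/html; charset=UTF-8']
```

Run against the later output ema saw (only `C` entries), `pending` would be empty, matching "the CT bans are gone".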
[16:01:16] <wikibugs_>	 (PS7) Alexandros Kosiaris: url_downloader: convert to profile/role [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[16:03:05] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151195 (stjn) Yes, the same ERR_CONTENT_DECODING_FAILED even with disabled...
[16:04:45] <wikibugs_>	 (PS1) Andrew Bogott: keystone.conf:  Define labs_osm_host [puppet] - https://gerrit.wikimedia.org/r/346171
[16:06:37] <ottomata>	 cmjohnson1: any update on those new hadoop servers?
[16:06:44] <wikibugs>	 (CR) Andrew Bogott: [C: 2] keystone.conf:  Define labs_osm_host [puppet] - https://gerrit.wikimedia.org/r/346171 (owner: Andrew Bogott)
[16:06:52] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 1] "https://puppet-compiler.wmflabs.org/6006/ says NOOP" [puppet] - https://gerrit.wikimedia.org/r/344729 (owner: Dzahn)
[16:06:52] <icinga-wm>	 PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:07:13] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:07:48] <bblack>	 ulsfo 5xx just dropped back to zero now
[16:07:54] <ema>	 yeah
[16:07:59] <bblack>	 odd! :)
[16:09:05] <ema>	 hitrate up again https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=15&fullscreen&orgId=1&var-server=cp4015&var-datasource=ulsfo%20prometheus%2Fops
[16:09:12] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:09:52] <cmjohnson1>	 ottomata: they're still in the boxes...barring other misc tasks that take me away from them they're my primary focus this week. Do you have preferred racking instructions?
[16:14:02] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1064 is OK: OK: expiry mailbox lag is 0
[16:15:12] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4015 is CRITICAL: CRITICAL: expiry mailbox lag is 567254
[16:16:03] <wikibugs>	 Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3151240 (Nuria)
[16:18:02] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4014 is CRITICAL: CRITICAL: expiry mailbox lag is 521563
[16:18:23] <ema>	 bblack: and now cp4* mailboxes lagging, fun!
[16:19:07] <wikibugs>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151248 (Vachovec1) I can confirm ERR_CONTENT_DECODING_FAILED for https://u...
[16:20:12] <icinga-wm>	 PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:22:08] <ottomata>	 cmjohnson1: other than evenly distributed in different rows
[16:22:08] <ottomata>	 nope
[16:22:10] <ottomata>	 thank you!
[16:22:40] <cmjohnson1>	 okay..that works
[16:26:17] <wikibugs>	 Operations, Analytics, Analytics-Cluster, hardware-requests: EQIAD: 6 Nodes for Kafka refresh/upgrade - https://phabricator.wikimedia.org/T161636#3151276 (Nuria) p:Triage>Normal
[16:28:37] <wikibugs>	 Operations, User-Elukey, Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3151306 (elukey) The issue with QUIT seems more subtle, namely only sometimes the RST happens after a QUIT...
[16:28:43] <wikibugs_>	 (PS1) Giuseppe Lavagetto: mediawiki::cron: general encapsulation for mediawiki cronjobs [puppet] - https://gerrit.wikimedia.org/r/346173
[16:30:57] <wikibugs_>	 (CR) jerkins-bot: [V: -1] mediawiki::cron: general encapsulation for mediawiki cronjobs [puppet] - https://gerrit.wikimedia.org/r/346173 (owner: Giuseppe Lavagetto)
[16:32:22] <icinga-wm>	 PROBLEM - puppet last run on analytics1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:34:52] <icinga-wm>	 RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[16:38:16] <wikibugs_>	 Operations, Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3151383 (Nuria)
[16:43:53] <wikibugs>	 Operations, User-Elukey, Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3151420 (elukey) Returning to the main timeout issue, it seems to me that the next step is trying to find...
[16:47:33] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4013 is CRITICAL: CRITICAL: expiry mailbox lag is 552977
[16:48:12] <icinga-wm>	 RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[16:52:02] <icinga-wm>	 PROBLEM - Host analytics1030 is DOWN: PING CRITICAL - Packet loss = 100%
[16:52:12] <icinga-wm>	 PROBLEM - salt-minion processes on analytics1031 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[16:52:22] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 49614.215709 Seconds
[16:52:22] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 49615.110857 Seconds
[16:52:22] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 49615.966464 Seconds
[16:53:13] <wikibugs_>	 Operations: Weak digest algorithm (SHA1) used to sign InRelease on apt.wikimedia.org - https://phabricator.wikimedia.org/T132325#2194482 (faidon) First off, I'm surprised that sid's apt worked with the jessie-wikimedia suite, since jessie-wikimedia is signed with a weak DSA key that shouldn't be accepted by...
[16:55:22] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[16:55:22] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[16:55:22] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[16:55:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[16:57:21] <wikibugs>	 Operations, DC-Ops, Labs: Move labstore1002 and labstore1002-array1 and labstore1002-array2 to different rack (currently in C3) - https://phabricator.wikimedia.org/T158913#3151459 (madhuvishy) Hi @Cmjohnson, apologies for the delay here, we were working through the possibilities of what the next step...
[16:57:23] <elukey>	 setting downtime for an1030
[16:58:50] <cmjohnson1>	 elukey: powering it back on
[16:59:37] <elukey>	 ah nice cmjohnson1, did you manage to fix it?
[17:00:05] <jouncebot>	 gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T1700).
[17:00:12] <icinga-wm>	 PROBLEM - Check Varnish expiry mailbox lag on cp4006 is CRITICAL: CRITICAL: expiry mailbox lag is 558262
[17:00:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
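The RIPE Atlas IPv6 ping alert for ulsfo flips between CRITICAL at 20 failed probes and OK at 19 failed probes, both reported with "alerts on 19", and it keeps flapping for the rest of this log. Below is a minimal Python sketch of the threshold rule this implies. The function name is hypothetical, and the strict greater-than comparison is an assumption inferred only from the alert text, not taken from the check's actual source.

```python
def atlas_ping_status(failed: int, total: int, alert_on: int) -> str:
    """Hypothetical reconstruction of the ripe-atlas-ulsfo IPv6 ping check.

    Assumption inferred from the alert text only: with "alerts on 19",
    19 failed probes is OK and 20 is CRITICAL, so the check is taken to go
    CRITICAL only when the failure count strictly exceeds the threshold.
    """
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


# A failure count hovering right at the threshold makes the alert flap.
for failed in (20, 19, 21):
    print(atlas_ping_status(failed, 279, 19))
```

Because the failure count sits almost exactly on the threshold, a single probe changing state is enough to toggle between PROBLEM and RECOVERY, which matches the repeated pairs seen here.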
[17:01:22] <icinga-wm>	 RECOVERY - puppet last run on analytics1050 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[17:01:24] <logmsgbot>	 !log gehel@tin Started deploy [wdqs/wdqs@d7c367a]: (no justification provided)
[17:01:30] <wikibugs>	 (PS2) Madhuvishy: tools: Deprecate precise_reminder role and clean up related script [puppet] - https://gerrit.wikimedia.org/r/342658 (https://phabricator.wikimedia.org/T149214)
[17:01:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:54] <logmsgbot>	 !log gehel@tin Finished deploy [wdqs/wdqs@d7c367a]: (no justification provided) (duration: 01m 29s)
[17:02:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:43] <cmjohnson1>	 elukey: it looks like there is going to be a problem that will require Dell...
[17:03:55] <gehel>	 SMalyshev: wdqs deployment completed, tests looking good...
[17:04:09] <wikibugs_>	 (Abandoned) Madhuvishy: tools: Deprecate precise_reminder role and clean up related script [puppet] - https://gerrit.wikimedia.org/r/342658 (https://phabricator.wikimedia.org/T149214) (owner: Madhuvishy)
[17:06:06] <elukey>	 cmjohnson1: ack, thanks :)
[17:06:07] <wikibugs_>	 (PS2) Giuseppe Lavagetto: mediawiki::cron: general encapsulation for mediawiki cronjobs [puppet] - https://gerrit.wikimedia.org/r/346173
[17:07:04] <wikibugs>	 (CR) jerkins-bot: [V: -1] mediawiki::cron: general encapsulation for mediawiki cronjobs [puppet] - https://gerrit.wikimedia.org/r/346173 (owner: Giuseppe Lavagetto)
[17:07:08] <wikibugs_>	 (PS1) Madhuvishy: nfsclient: Enable lookupcache by default for all nfs client instances [puppet] - https://gerrit.wikimedia.org/r/346177 (https://phabricator.wikimedia.org/T136712)
[17:07:23] <wikibugs>	 Operations, DBA: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070#3151523 (faidon)
[17:08:15] <SMalyshev>	 gehel: great, thanks!
[17:08:24] <gehel>	 SMalyshev: you're welcome!
[17:16:22] <icinga-wm>	 PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:27:05] <wikibugs>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3151575 (MoritzMuehlenhoff) mw1261 is stable with stat_cache=false for ten hours of production traffic now. I've reported this back to upstream along wit...
[17:27:44] <wikibugs>	 (Draft1) Paladox: Gerrit: Disable md5 in ssh [puppet] - https://gerrit.wikimedia.org/r/346180
[17:27:59] <wikibugs_>	 (PS2) Paladox: Gerrit: Disable md5 in ssh [puppet] - https://gerrit.wikimedia.org/r/346180
[17:28:15] <paladox>	 mutante ^^ :)
[17:29:23] <wikibugs>	 (CR) Paladox: "I have no idea what eddsa is called in mac." [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[17:30:12] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4006 is OK: OK: expiry mailbox lag is 0
[17:32:49] <wikibugs>	 (CR) Dzahn: "disabling MD5 is good. is it enabled now though? the eddsa host key thing should be unrelated to the MAC choice" [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[17:33:47] <wikibugs>	 (CR) Paladox: "> disabling MD5 is good. is it enabled now though? the eddsa host key" [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[17:38:52] <wikibugs_>	 (CR) Paladox: "Here is the output of ssh -vvv to gerrit.wikimedia.org" [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[17:39:16] <wikibugs>	 (PS3) Giuseppe Lavagetto: mediawiki::cron: general encapsulation for mediawiki cronjobs [puppet] - https://gerrit.wikimedia.org/r/346173
[17:40:08] <wikibugs_>	 (CR) Thcipriani: [C: 1] "So the error output in the puppet compiler seems to be complaining about conftool. I don't think this is being caused by this change since" [puppet] - https://gerrit.wikimedia.org/r/345377 (https://phabricator.wikimedia.org/T160185) (owner: MarkTraceur)
[17:44:22] <icinga-wm>	 RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[17:56:04] <wikibugs>	 Operations, Ops-Access-Requests: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151609 (BBlack)
[17:56:58] <XioNoX>	 hey, that's me!
[17:57:21] <wikibugs_>	 (PS1) BBlack: Add ayounsi shell account in ops [puppet] - https://gerrit.wikimedia.org/r/346182 (https://phabricator.wikimedia.org/T162073)
[17:57:32] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4013 is OK: OK: expiry mailbox lag is 40
[17:57:42] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[17:57:57] <wikibugs_>	 (CR) Giuseppe Lavagetto: [C: -1] "good work, see a few inline comments." (6 comments) [puppet] - https://gerrit.wikimedia.org/r/342248 (https://phabricator.wikimedia.org/T147718) (owner: Gehel)
[17:58:02] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4014 is OK: OK: expiry mailbox lag is 34808
[17:58:08] <wikibugs>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151626 (BBlack)
[17:58:27] <wikibugs>	 (CR) Muehlenhoff: [C: 1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/346182 (https://phabricator.wikimedia.org/T162073) (owner: BBlack)
[17:58:44] <wikibugs_>	 (CR) BBlack: [C: 2] Add ayounsi shell account in ops [puppet] - https://gerrit.wikimedia.org/r/346182 (https://phabricator.wikimedia.org/T162073) (owner: BBlack)
[18:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T1800). Please do the needful.
[18:00:05] <jouncebot>	 DatGuy: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[18:01:07] <wikibugs>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151609 (Dzahn) added to ops mailing list
[18:02:12] <Dereckson>	 Hello, I can SWAT.
[18:02:15] <Dereckson>	 DatGuy: ping?
[18:02:19] <wikibugs_>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151609 (faidon) Added to all 34 network devices (cr, asw/csw, msw, mr, pfw).
[18:02:42] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[18:03:26] <wikibugs_>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151651 (BBlack)
[18:03:35] <wikibugs>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151609 (BBlack)
[18:05:13] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp4015 is OK: OK: expiry mailbox lag is 183
[18:06:29] <wikibugs>	 (PS1) Dzahn: ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183
[18:07:35] <wikibugs_>	 (CR) jerkins-bot: [V: -1] ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183 (owner: Dzahn)
[18:07:46] <wikibugs_>	 (CR) Dzahn: [C: -1] ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183 (owner: Dzahn)
[18:07:48] <wikibugs>	 Operations, Ops-Access-Requests, Traffic: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151661 (BBlack)
[18:08:16] <wikibugs>	 (CR) Dzahn: [C: -2] ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183 (owner: Dzahn)
[18:09:03] <wikibugs_>	 (PS1) Dzahn: icinga: allow command execution for Ayounsi [puppet] - https://gerrit.wikimedia.org/r/346184 (https://phabricator.wikimedia.org/T162073)
[18:09:58] <wikibugs>	 (CR) Dereckson: "Initially planned for 2017-04-03 morning SWAT, but the change author wasn't available." [mediawiki-config] - https://gerrit.wikimedia.org/r/346043 (https://phabricator.wikimedia.org/T161804) (owner: DatGuy)
[18:10:52] <wikibugs_>	 (CR) Dereckson: "Initially planned for 2017-04-03 morning SWAT, but the change author wasn't available." [mediawiki-config] - https://gerrit.wikimedia.org/r/346044 (https://phabricator.wikimedia.org/T161593) (owner: DatGuy)
[18:13:25] <wikibugs>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151686 (Dzahn) added to root@ mail alias (prepare for incoming wave of mails :p)
[18:14:25] <DatGuy>	 ah crikey
[18:14:30] <DatGuy>	 Dereckson, thought it was tomorrow
[18:15:10] <DatGuy>	 is it too late?
[18:15:18] <Dereckson>	 No, we can deploy them now :)
[18:15:25] <DatGuy>	 alright
[18:15:31] <Dereckson>	 Any preference order?
[18:16:15] <DatGuy>	 you mean order of merges?
[18:17:18] <Dereckson>	 yes
[18:17:21] <DatGuy>	 not really
[18:17:25] <wikibugs>	 (PS3) Dereckson: Convert reference lists to 'responsive' on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/346043 (https://phabricator.wikimedia.org/T161804) (owner: DatGuy)
[18:17:32] <wikibugs_>	 (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/346043 (https://phabricator.wikimedia.org/T161804) (owner: DatGuy)
[18:18:36] <Dereckson>	 DatGuy: do you already have the X-Wikimedia-Debug extension installed?
[18:18:41] <DatGuy>	 yep
[18:18:44] <DatGuy>	 it's mw1002 right?
[18:18:53] <Dereckson>	 mwdebug1002 indeed
[18:19:05] <Dereckson>	 (but we're still waiting for Zuul to find a free slot to run tests)
[18:20:33] <wikibugs_>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151689 (Dzahn) added Icinga contact in private repo (with just email notification method for now, no phone number / paging just yet)
[18:22:04] <wikibugs>	 (CR) Dzahn: [C: 1] "contact exists now (private repo)" [puppet] - https://gerrit.wikimedia.org/r/346184 (https://phabricator.wikimedia.org/T162073) (owner: Dzahn)
[18:22:58] <wikibugs>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151694 (BBlack) Added to other email aliases in private repo as well: dns-admin, peering, ripe-updates
[18:23:31] <wikibugs>	 (Merged) jenkins-bot: Convert reference lists to 'responsive' on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/346043 (https://phabricator.wikimedia.org/T161804) (owner: DatGuy)
[18:23:40] <bblack>	 mutante: hold that commit for icinga cmd access, so we can test his +2 on it?
[18:23:54] <DatGuy>	 alright, checking now
[18:24:21] <Dereckson>	 DatGuy: not yet on mwdebug1002 (but will in 20 seconds)
[18:24:39] <Dereckson>	 DatGuy: ok, live now
[18:24:40] <mutante>	 bblack: sure, good idea
[18:25:23] <wikibugs>	 (PS2) Dereckson: Configure Babel for elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/346044 (https://phabricator.wikimedia.org/T161593) (owner: DatGuy)
[18:25:30] <wikibugs_>	 (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/346044 (https://phabricator.wikimedia.org/T161593) (owner: DatGuy)
[18:27:05] <DatGuy>	 hewiki looks good, but only checked one page
[18:27:33] <Dereckson>	 a page with references?
[18:27:56] <wikibugs>	 (Merged) jenkins-bot: Configure Babel for elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/346044 (https://phabricator.wikimedia.org/T161593) (owner: DatGuy)
[18:28:04] <DatGuy>	 yep
[18:28:26] <Dereckson>	 So that's good :)
[18:28:39] <Dereckson>	 Syncing
[18:29:14] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Convert reference lists to 'responsive' on hewiki (T161804) (duration: 00m 52s)
[18:29:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:21] <stashbot>	 T161804: Convert reference lists over to `responsive` on hewiki - https://phabricator.wikimedia.org/T161804
[18:29:36] <wikibugs>	 (CR) Ayounsi: [C: 2] icinga: allow command execution for Ayounsi [puppet] - https://gerrit.wikimedia.org/r/346184 (https://phabricator.wikimedia.org/T162073) (owner: Dzahn)
[18:30:32] <Dereckson>	 DatGuy: Babel change live on mwdebug1002.eqiad.wmnet
[18:31:40] <wikibugs>	 (PS2) Ayounsi: icinga: allow command execution for Ayounsi [puppet] - https://gerrit.wikimedia.org/r/346184 (https://phabricator.wikimedia.org/T162073) (owner: Dzahn)
[18:32:52] <DatGuy>	 http://imgur.com/a/Txu2Y babel looks good on elwikisource
[18:35:32] <DatGuy>	 ping Dereckson
[18:35:45] <Dereckson>	 pong
[18:35:55] <DatGuy>	 pang
[18:36:02] <DatGuy>	 babel good to go
[18:36:10] <DatGuy>	 :)
[18:36:11] <Dereckson>	 Yes I've seen it, will sync in a few moments
[18:36:23] <DatGuy>	 alright, cheers
[18:37:01] <Dereckson>	 syncing
[18:37:39] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Configure Babel for elwikisource (T161593) (duration: 00m 44s)
[18:37:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:46] <stashbot>	 T161593: Configure extension:Babel for el.wikisource - https://phabricator.wikimedia.org/T161593
[18:38:24] <DatGuy>	 great, thanks for still merging the changes even though I was absent
[18:38:38] <Dereckson>	 Thanks for the changes :)
[18:39:11] <DatGuy>	 I've notified the communities it's live.
[18:41:14] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[18:41:59] <DatGuy>	 also, Dereckson, may I also make gerrit patches for https://phabricator.wikimedia.org/T161529 or is it only people with specific access?
[18:42:29] <paladox>	 DatGuy: you can create patches for it
[18:42:38] <paladox>	 i helped to create a wikipedia before.
[18:42:52] <paladox>	 I forgot the name of the wikipedia though as it was last year
[18:44:49] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[18:45:51] <DatGuy>	 alright, thanks
[18:46:09] <icinga-wm>	 PROBLEM - HP RAID on dbstore2001 is CRITICAL: CHECK_NRPE: Socket timeout after 50 seconds.
[18:46:55] <wikibugs>	 (PS2) Rush: nfsclient: Enable lookupcache by default for all nfs client instances [puppet] - https://gerrit.wikimedia.org/r/346177 (https://phabricator.wikimedia.org/T136712) (owner: Madhuvishy)
[18:46:57] <wikibugs_>	 (CR) Rush: [C: 1] nfsclient: Enable lookupcache by default for all nfs client instances [puppet] - https://gerrit.wikimedia.org/r/346177 (https://phabricator.wikimedia.org/T136712) (owner: Madhuvishy)
[18:47:49] <icinga-wm>	 RECOVERY - HP RAID on dbstore2001 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, Controller, Battery/Capacitor
[18:48:34] <wikibugs>	 (CR) Madhuvishy: [C: 2] nfsclient: Enable lookupcache by default for all nfs client instances [puppet] - https://gerrit.wikimedia.org/r/346177 (https://phabricator.wikimedia.org/T136712) (owner: Madhuvishy)
[18:48:41] <Dereckson>	 paladox: tcy? ady?
[18:48:46] <Dereckson>	 jam?
[18:48:49] <paladox>	 tcy i think
[18:48:50] <wikibugs>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151772 (BBlack)
[18:50:39] <paladox>	 Did gerrit just become faster?
[18:50:45] <Dereckson>	 Let's hope this one is actionable; two Wikipedia requests weren't: one because language engineering wasn't happy with the translation progress, and one because there wasn't enough of a community (according to an en. block and a CU on Incubator, the only active contributor used sockpuppets to give the impression there were 3)
[18:50:47] <paladox>	 It looks to be very fast now.
[18:51:10] <Dereckson>	 I'll see about creating pa.wikisource on Thursday
[18:56:18] <icinga-wm>	 PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:56:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[18:56:55] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151794 (ema) Our understanding of the problem so far is that some of our s...
[18:59:44] <wikibugs>	 (PS1) Andrew Bogott: toolschecker:  Test ldap by checking ou=groups instead of ou=projects [puppet] - https://gerrit.wikimedia.org/r/346187 (https://phabricator.wikimedia.org/T126758)
[19:01:32] <wikibugs>	 (CR) Andrew Bogott: [C: 2] toolschecker:  Test ldap by checking ou=groups instead of ou=projects [puppet] - https://gerrit.wikimedia.org/r/346187 (https://phabricator.wikimedia.org/T126758) (owner: Andrew Bogott)
[19:01:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:07:58] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1062 is OK: OK: expiry mailbox lag is 0
[19:08:28] <icinga-wm>	 PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:10:25] <hashar>	 jouncebot: next
[19:10:25] <jouncebot>	 In 0 hour(s) and 49 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T2000)
[19:10:29] <hashar>	 jouncebot: refresh
[19:10:32] <jouncebot>	 I refreshed my knowledge about deployments.
[19:10:46] <wikibugs_>	 (PS1) Andrew Bogott: toolschecker:  The group is 'project-testlabs,' not 'testlabs' [puppet] - https://gerrit.wikimedia.org/r/346189 (https://phabricator.wikimedia.org/T126758)
[19:12:02] <wikibugs_>	 (CR) Andrew Bogott: [C: 2] toolschecker:  The group is 'project-testlabs,' not 'testlabs' [puppet] - https://gerrit.wikimedia.org/r/346189 (https://phabricator.wikimedia.org/T126758) (owner: Andrew Bogott)
[19:13:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 21 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:15:50] <mutante>	 !log phabricator/ops: adding ayounsi to WMF-NDA (project 61) and acl*operations-team (project 29) (T162073) 
[19:15:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:57] <stashbot>	 T162073: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073
[19:16:28] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: mailman issue for ops team? - https://phabricator.wikimedia.org/T162080#3151923 (Legoktm)
[19:16:41] <hashar>	 I am going to sync some project logos
[19:16:44] <andrewbogott>	 !log in testlabs, deleted ou=projects,dc=wikimedia,dc=org and ou=roles,dc=wikimedia,dc=org as per T126758
[19:16:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:51] <stashbot>	 T126758: Clean up after ldap->mysql keystone migration - https://phabricator.wikimedia.org/T126758
[19:18:54] <logmsgbot>	 !log hashar@tin Synchronized static/images/project-logos: Optimize a few project logos - T161999 (duration: 00m 44s)
[19:19:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:02] <stashbot>	 T161999: Make sure all logos are optimalized - https://phabricator.wikimedia.org/T161999
[19:21:26] <hashar>	 !log Finished deployment of project-logos optimization for T161999 / https://gerrit.wikimedia.org/r/#/c/346057/ .  And purged the related logos
[19:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:22:38] <wikibugs_>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151981 (Dzahn) [x] phabricator permissions to see NDA and Ops restricted tickets  I did the same steps that were done by @Aklapper in T144496#2601909.  - https...
[19:23:09] <wikibugs>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3151984 (Dzahn)
[19:23:13] <wikibugs_>	 Operations, Traffic, media-storage, Patch-For-Review, User-Urbanecm: Specific PNG thumbnail delivered as [text/html] instead of [image/png] and hence not rendered in browser - https://phabricator.wikimedia.org/T162035#3151985 (ema) There is probably another type of bug responsible for the ERR...
[19:24:18] <icinga-wm>	 RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[19:28:38] <wikibugs>	 (PS3) Andrew Bogott: Add skin, language, and variant to user_properties_anon [puppet] - https://gerrit.wikimedia.org/r/344302 (https://phabricator.wikimedia.org/T152043) (owner: Reedy)
[19:28:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[19:32:42] <wikibugs>	 (CR) Andrew Bogott: [C: 2] Add skin, language, and variant to user_properties_anon [puppet] - https://gerrit.wikimedia.org/r/344302 (https://phabricator.wikimedia.org/T152043) (owner: Reedy)
[19:36:28] <icinga-wm>	 RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[19:54:37] <wikibugs>	 (CR) Andrew Bogott: "An important thing to keep in mind is that ldaplist doesn't currently work correctly for many searches, due to query limits." [puppet] - https://gerrit.wikimedia.org/r/337842 (https://phabricator.wikimedia.org/T114063) (owner: Hashar)
[19:55:48] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1050 is OK: OK: expiry mailbox lag is 0
[20:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T2000). Please do the needful.
[20:00:29] <subbu>	 no parsoid deploy today
[20:03:28] <icinga-wm>	 PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:06:30] <wikibugs_>	 (PS1) Subramanya Sastry: Delink new parsoid-vd test runs from updates to parsoid git repo [puppet] - https://gerrit.wikimedia.org/r/346196
[20:06:35] <wikibugs>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3152168 (hashar) {T89912} has some related clues, specially a debugging session T89912#1286874 which mentions concurrent_hash_map and the StatCache holdi...
[20:07:19] <wikibugs_>	 Operations, HHVM: HHVM lock-ups - https://phabricator.wikimedia.org/T89912#1048540 (hashar) HHVM 3.18 has a similar deadlock that happens after just a few hours of live traffic. T161684  Most probably the same deadlock.
[20:08:45] <wikibugs_>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3139527 (hashar) And I mentioned it somewhere else, the statcache got enabled via T75706. At the time that has cut system CPU by half.
[20:10:47] <wikibugs_>	 (CR) Subramanya Sastry: "https://github.com/wikimedia/integration-visualdiff/blob/e5ce302e8ab51303d8d2fc49f6463815e9c3adee/testreduce/client.scripts.js#L56-L60 is " [puppet] - https://gerrit.wikimedia.org/r/346196 (owner: Subramanya Sastry)
[20:13:19] <wikibugs_>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3152193 (MoritzMuehlenhoff) The current performance loss seems less significant though (e.g. compare mw1261 with HHVM 3.18 and stat_cache disabled to mw1...
[20:15:41] <wikibugs_>	 (CR) Ottomata: [C: 1] Create a separate sysctl configuration for setting conntrack settings [puppet] - https://gerrit.wikimedia.org/r/319071 (owner: Muehlenhoff)
[20:19:28] <icinga-wm>	 RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[20:24:37] <wikibugs>	 Operations, Labs: Investigate alternative RAID strategies for labstore1001/2 - https://phabricator.wikimedia.org/T162090#3152197 (madhuvishy)
[20:26:17] <wikibugs_>	 Operations, HHVM: HHVM 3.18 deadlocks after 4-6 hours (stuck in in HPHP::Treadmill::getAgeOldestRequest() ) - https://phabricator.wikimedia.org/T161684#3139527 (Joe) >>! In T161684#3152193, @MoritzMuehlenhoff wrote: > The current performance loss seems less significant though (e.g. compare mw1261 with HH...
[20:27:18] <icinga-wm>	 RECOVERY - salt-minion processes on analytics1031 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[20:27:18] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0]
[20:29:25] <wikibugs_>	 (CR) Chad: "Is this an actual problem we're solving?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[20:30:21] <wikibugs_>	 (PS3) Paladox: Gerrit: Disable md5 in ssh [puppet] - https://gerrit.wikimedia.org/r/346180
[20:30:51] <wikibugs>	 Operations, Ops-Access-Requests, Traffic, Patch-For-Review: Ops Onboarding for Arzhel Younsi - https://phabricator.wikimedia.org/T162073#3152222 (ayounsi) @Muehlenhoff, here is my public GPG key for pwstore.  ``` -----BEGIN PGP PUBLIC KEY BLOCK-----  mQGiBEtGU7gRBADRV1Z96fsxR6riZOD1bL3PVhyKntVakX...
[20:31:06] <wikibugs_>	 (CR) Paladox: ">" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[20:31:29] <wikibugs>	 (CR) Paladox: Gerrit: Disable md5 in ssh (1 comment) [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[20:34:28] <icinga-wm>	 PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:42:18] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:44:26] <logmsgbot>	 !log bsitzmann@tin Started deploy [mobileapps/deploy@20ab197]: Update mobileapps to fdd4e31
[20:44:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:30] <wikibugs>	 (CR) Mobrovac: [C: +1] "+1 for this change, but the referenced function should really take into account the fact that it may be trying to open a file that doesn't" [puppet] - https://gerrit.wikimedia.org/r/346196 (owner: Subramanya Sastry)
[20:47:31] <logmsgbot>	 !log bsitzmann@tin Finished deploy [mobileapps/deploy@20ab197]: Update mobileapps to fdd4e31 (duration: 03m 05s)
[20:47:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:12] <wikibugs>	 (CR) Subramanya Sastry: "I initially added a puppet config to initialize the file on ruthenium ... but then backed it out since it is overly defensive code for wha" [puppet] - https://gerrit.wikimedia.org/r/346196 (owner: Subramanya Sastry)
[20:52:13] <wikibugs_>	 Operations, Labs, Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3152276 (Andrew) Oddly, labs instances seem to be getting their dhcp leases from install1001:  lease {   interface "eth0";   fixed-address 10.68.21.59;...
[20:55:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[21:00:04] <jouncebot>	 dapatrick, bawolff, and Reedy: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T2100).
[21:01:28] <icinga-wm>	 PROBLEM - puppet last run on rdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:03:28] <icinga-wm>	 RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[21:07:09] <wikibugs_>	 (PS3) Ottomata: Improvements to eventlogging_sync.sh script [puppet] - https://gerrit.wikimedia.org/r/345646 (https://phabricator.wikimedia.org/T124307)
[21:09:40] <wikibugs_>	 (CR) Ottomata: "Ook, I've added a -s <slave-database> option to make testing this easier." [puppet] - https://gerrit.wikimedia.org/r/345646 (https://phabricator.wikimedia.org/T124307) (owner: Ottomata)
[21:10:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[21:22:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[21:27:38] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 374
[21:27:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[21:28:08] <icinga-wm>	 PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[21:29:28] <icinga-wm>	 RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[21:30:09] <paladox>	 chasemp andrewbogott ^^
[21:30:25] <wikibugs>	 Operations, Stashbot: [IDEA] Backup bot for morebots - https://phabricator.wikimedia.org/T148694#3152351 (Zppix) Just cleaning up some tasks ive authored, are we done with this task are we still discussing here?
[21:30:29] <chasemp>	 tx paladox
[21:30:39] <paladox>	 your welcome :)
[21:31:12] <chasemp>	 andrewbogott: we just met our 7 limit
[21:31:25] <chasemp>	 andrewbogott: I'm going to clean house to keep the metrics going
[21:31:37] <wikibugs_>	 (PS4) Madhuvishy: WIP tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (owner: Rush)
[21:37:08] <icinga-wm>	 RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[21:38:18] <icinga-wm>	 PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:38:36] <wikibugs_>	 (CR) Dzahn: [C: +1] Gerrit: Disable md5 in ssh [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[21:57:18] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:59:39] <paladox>	 "“We’ve decided to reorganise our operations further and therefore, as of 21 March 2014, services provided from all our European websites will be provided by Yahoo! EMEA, in Ireland."
[21:59:40] <paladox>	 woops
[21:59:44] <paladox>	 wrong place
[22:00:08] <icinga-wm>	 PROBLEM - Host lvs2002 is DOWN: PING CRITICAL - Packet loss = 100%
[22:00:38] <icinga-wm>	 PROBLEM - puppet last run on db2068 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:00:48] <icinga-wm>	 PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:02:05] <wikibugs_>	 (CR) Paladox: "@BBlack hi could you review please?" [puppet] - https://gerrit.wikimedia.org/r/346180 (owner: Paladox)
[22:04:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:06:18] <icinga-wm>	 RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[22:06:24] <mutante>	 !log power cycling lvs2002, it was down and console showed nothing
[22:06:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:09:28] <icinga-wm>	 RECOVERY - Check Varnish expiry mailbox lag on cp1073 is OK: OK: expiry mailbox lag is 294349
[22:09:39] <icinga-wm>	 RECOVERY - Host lvs2002 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[22:09:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:16:16] <wikibugs>	 (CR) Volans: "Approach looks good, just few comments inline, nothing really blocking" (18 comments) [puppet] - https://gerrit.wikimedia.org/r/346118 (owner: Giuseppe Lavagetto)
[22:16:55] <volans>	 elukey: the "few" was for you ;) ^^^
[22:24:12] <wikibugs>	 Operations, Stashbot: [IDEA] Backup bot for morebots - https://phabricator.wikimedia.org/T148694#3152537 (Krinkle) Open>declined
[22:28:38] <icinga-wm>	 RECOVERY - puppet last run on db2068 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[22:29:48] <icinga-wm>	 RECOVERY - puppet last run on mw2232 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[22:34:34] <wikibugs_>	 (PS1) Subramanya Sastry: ruthenium: increase parsoid-vd clients from 4 to 6 [puppet] - https://gerrit.wikimedia.org/r/346209
[22:35:22] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=no; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
[22:35:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:37:28] <icinga-wm>	 PROBLEM - puppet last run on mw1207 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:37:45] <logmsgbot>	 !log volans@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,name=ms-fe1005.eqiad.wmnet
[22:37:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:07] <volans>	 !log completed restart of swift-proxies in eqiad, ms-fe1005 was missing due to swiftrepl stuck/running
[22:39:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:41:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:41:56] <wikibugs>	 Operations, ops-codfw, Traffic: lvs2002 random shut down - https://phabricator.wikimedia.org/T162099#3152568 (Dzahn)
[22:46:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[22:48:15] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: Reset admin password for wikimania-program mailing list - https://phabricator.wikimedia.org/T162080#3152584 (Aklapper) p:Triage>High
[22:57:23] <wikibugs_>	 Operations, Wikimedia-Mailing-lists: Reset admin password for wikimania-program mailing list - https://phabricator.wikimedia.org/T162080#3151892 (Dzahn) @eyoung Hi, you should have received an automatic mail from mailman with a new randomly generated password that you can use to login. Best, Daniel  --...
[23:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170403T2300). Please do the needful.
[23:00:04] <jouncebot>	 TimStarling and Niharika: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:03:52] <wikibugs>	 (PS2) Dzahn: ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183
[23:04:05] <thcipriani>	 I can SWAT. TimStarling Niharika ping me when you're available.
[23:04:14] <Niharika>	 thcipriani: Hi! I'm here. 
[23:05:14] <wikibugs_>	 (PS2) Thcipriani: Test LoginNotify on Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/345726 (https://phabricator.wikimedia.org/T158878) (owner: Niharika29)
[23:05:21] <wikibugs>	 (CR) Thcipriani: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/345726 (https://phabricator.wikimedia.org/T158878) (owner: Niharika29)
[23:06:22] <thcipriani>	 Niharika: hello :) so after this merges I'll sync it out to production to make sure everything merged is deployed, but it will deploy to beta cluster on the next beta-code-update run
[23:06:28] <icinga-wm>	 RECOVERY - puppet last run on mw1207 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[23:06:48] <wikibugs_>	 (Merged) jenkins-bot: Test LoginNotify on Beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/345726 (https://phabricator.wikimedia.org/T158878) (owner: Niharika29)
[23:06:53] <thcipriani>	 which...I just realized is stuck :)
[23:06:57] <thcipriani>	 will fix after swat
[23:07:17] <Niharika>	 thcipriani: Okay!
[23:09:37] <TimStarling>	 thcipriani: I'm here now
[23:10:30] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:345726|Test LoginNotify on Beta cluster]] T158878 (duration: 00m 46s)
[23:10:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:10:37] <stashbot>	 T158878: Test LoginNotify Extension on Beta Cluster - https://phabricator.wikimedia.org/T158878
[23:10:50] <thcipriani>	 Niharika: ^ sync'd live, will fix beta deploy shortly :)
[23:11:05] <Niharika>	 thcipriani: Thank you. 
[23:11:21] <saper>	 hi Niharika :)
[23:11:32] <Niharika>	 Hey saper. How're you doing?
[23:11:57] <thcipriani>	 TimStarling: hello! Looks like we need to do a full scap for this one so I'll get that cracking.
[23:12:04] <TimStarling>	 yes
[23:12:37] <wikibugs_>	 (PS6) Thcipriani: Deploy ParserMigration extension [mediawiki-config] - https://gerrit.wikimedia.org/r/344276 (https://phabricator.wikimedia.org/T141586) (owner: Tim Starling)
[23:12:58] <wikibugs>	 (CR) Thcipriani: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/344276 (https://phabricator.wikimedia.org/T141586) (owner: Tim Starling)
[23:13:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[23:13:59] <wikibugs_>	 (Merged) jenkins-bot: Deploy ParserMigration extension [mediawiki-config] - https://gerrit.wikimedia.org/r/344276 (https://phabricator.wikimedia.org/T141586) (owner: Tim Starling)
[23:14:28] <icinga-wm>	 PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:46] <thcipriani>	 TimStarling: ok, so I think I'm going to run a full scap without the commonsettings.php change, then after the full scap we can test with the commonsettings.php change on mwdebug1002 and then go all in. Sound sane?
[23:16:07] <TimStarling>	 sounds good
[23:16:39] <TimStarling>	 sounds commendably cautious
[23:17:11] <wikibugs>	 (CR) Mobrovac: "The CPUs are at 80% though and each diff run seems to take at least 10% more. Do we really want to do it? Perhaps try with 5 first?" [puppet] - https://gerrit.wikimedia.org/r/346209 (owner: Subramanya Sastry)
[23:18:34] <wikibugs_>	 (PS3) Dzahn: ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183
[23:18:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[23:19:19] <logmsgbot>	 !log thcipriani@tin Started scap: SWAT: [[gerrit:344276|Deploy ParserMigration extension]] T141586 (l10nupdate only)
[23:19:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:25] <stashbot>	 T141586: Deploy ParserMigration extension - https://phabricator.wikimedia.org/T141586
[23:23:58] <wikibugs_>	 (CR) Subramanya Sastry: "It is because of parsoid-rt and parsoid-vd kicking off at the same time. Plus testreduce picks the worst failures to retry first and crash" [puppet] - https://gerrit.wikimedia.org/r/346209 (owner: Subramanya Sastry)
[23:25:24] <wikibugs>	 Operations, User-Elukey, Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3152656 (aaron) In $wmgRedisQueueBaseConfig in wmf-config/jobqueue.php I see the timeout is currently 0.3....
[23:40:48] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[23:41:43] <logmsgbot>	 !log thcipriani@tin Finished scap: SWAT: [[gerrit:344276|Deploy ParserMigration extension]] T141586 (l10nupdate only) (duration: 22m 24s)
[23:41:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:41:50] <stashbot>	 T141586: Deploy ParserMigration extension - https://phabricator.wikimedia.org/T141586
[23:42:28] <icinga-wm>	 RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[23:43:36] <thcipriani>	 ^ TimStarling ok so l10n should be up-to-date, I pulled the updated commonsettings.php on mwdebug1002, check please
[23:45:48] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 279 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[23:46:34] <wikibugs>	 Operations, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529#3152678 (Dereckson)
[23:51:08] <wikibugs>	 (CR) jenkins-bot: Deploy ParserMigration extension [mediawiki-config] - https://gerrit.wikimedia.org/r/344276 (https://phabricator.wikimedia.org/T141586) (owner: Tim Starling)
[23:52:18] <TimStarling>	 thcipriani: looks fine
[23:52:29] <thcipriani>	 TimStarling: ok, going live everywhere
[23:53:00] <TimStarling>	 I enabled the user preference and tested the new editor
[23:54:01] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:344276|Deploy ParserMigration extension]] T141586 (for real) (duration: 00m 44s)
[23:54:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:08] <stashbot>	 T141586: Deploy ParserMigration extension - https://phabricator.wikimedia.org/T141586
[23:54:35] <TimStarling>	 looks good
[23:55:24] <wikibugs>	 (PS4) Dzahn: ssh: avoid hardcoded hostname for yubiauth, add to Hiera [puppet] - https://gerrit.wikimedia.org/r/346183
[23:57:07] <thcipriani>	 cool, thanks for checking, logs seem ok too :)
[23:57:51] <wikibugs_>	 (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - https://gerrit.wikimedia.org/r/346144 (owner: Marostegui)
[23:58:47] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1015 [mediawiki-config] - https://gerrit.wikimedia.org/r/346160 (https://phabricator.wikimedia.org/T159319) (owner: Marostegui)
[23:59:38] <wikibugs_>	 (CR) jenkins-bot: Optimalize all not-optimalized logos [mediawiki-config] - https://gerrit.wikimedia.org/r/346057 (https://phabricator.wikimedia.org/T161999) (owner: Urbanecm)