[00:05:28] <icinga-wm>	 PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:33:28] <icinga-wm>	 RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[00:41:18] <icinga-wm>	 PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:09:18] <icinga-wm>	 RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[02:16:59] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.9) (duration: 06m 11s)
[02:17:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:21] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jan 30 02:21:21 UTC 2017 (duration 4m 22s)
[02:21:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:48] <icinga-wm>	 PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:58:48] <icinga-wm>	 RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[03:00:38] <icinga-wm>	 PROBLEM - puppet last run on wtp1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:05:28] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 1814.10332 Seconds
[03:06:28] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 44.669994 Seconds
[03:21:18] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 820.13 seconds
[03:26:18] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 253.20 seconds
[03:28:38] <icinga-wm>	 RECOVERY - puppet last run on wtp1014 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[03:59:50] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team: Enable the git:// protocole on gerrit - https://phabricator.wikimedia.org/T156597#2980855 (demon) Open>declined There's zero reason to do this and just adds complexity we'll have to support.
[04:03:28] <icinga-wm>	 PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:17:28] <icinga-wm>	 PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:31:28] <icinga-wm>	 RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[04:45:28] <icinga-wm>	 RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[05:14:37] <wikibugs>	 (PS1) Legoktm: toollabs: Install mktorrent [puppet] - https://gerrit.wikimedia.org/r/334962 (https://phabricator.wikimedia.org/T155470)
[06:01:59] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.876 second response time
[06:02:59] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.807 second response time
[06:03:59] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.613 second response time
[06:06:59] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.867 second response time
[06:07:59] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.308 second response time
[06:09:59] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.307 second response time
[06:25:59] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.203 second response time
[06:28:59] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.765 second response time
[06:29:59] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.190 second response time
[06:30:59] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.533 second response time
[06:31:39] <icinga-wm>	 PROBLEM - Host mw1236 is DOWN: PING CRITICAL - Packet loss = 100%
[06:44:59] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 1.159 second response time
[06:45:59] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.748 second response time
[06:48:19] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:53:49] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:57:05] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.190 second response time
[06:58:05] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.805 second response time
[07:03:09] <wikibugs>	 Operations, MediaWiki-Configuration, Performance-Team, Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2980969 (Joe) >>! In T149617#2977429, @Legoktm wrote: > @joe could you also upload the slides...
[07:20:05] <icinga-wm>	 PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2358
[07:22:44] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2968039 (ops-monitoring-bot) Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1072.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage...
[07:24:05] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.426 second response time
[07:24:54] <friendly12345>	 _joe_: Re: Dynamic configuration - I assume this precludes any form of credentials being dynamic
[07:25:05] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.428 second response time
[07:25:05] <icinga-wm>	 RECOVERY - check_mysql on frdb2001 is OK: Uptime: 312777 Threads: 1 Questions: 3097182 Slow queries: 1425 Opens: 2119 Flush tables: 1 Open tables: 561 Queries per second avg: 9.902 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[07:26:09] <_joe_>	 friendly12345: credentials are configuration, not state. Hence, it's not suited for being set at runtime
[07:26:15] <_joe_>	 friendly12345: https://commons.wikimedia.org/wiki/File:Integrating_MediaWiki_(and_other_services)_with_dynamic_configuration.webm :)
[07:29:05] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.287 second response time
[07:29:11] <wikibugs>	 Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2980984 (Gilles)
[07:29:55] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.254 second response time
[07:30:05] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.781 second response time
[07:30:55] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.388 second response time
[07:31:09] <wikibugs>	 Operations, DBA, Phabricator, Release-Engineering-Team, Upstream: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373#2980985 (Marostegui) I have upgraded db2012 to 10.0.29-2 (actually done a full upgrade) and the A...
[07:35:37] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Record CVE ID fixed in earlier 4.4.x kernel [debs/linux44] - https://gerrit.wikimedia.org/r/334666 (owner: Muehlenhoff)
[07:35:55] <wikibugs>	 (PS3) Muehlenhoff: Switch app servers in codfw to systemd-timesyncd [puppet] - https://gerrit.wikimedia.org/r/332976 (https://phabricator.wikimedia.org/T150257)
[07:40:22] <wikibugs>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2980990 (Joe) https://commons.wikimedia.org/wiki/File:Asynchronous_processing_on_the_WMF_cluster.pdf is the uploaded file.
[07:46:28] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2980992 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1072.eqiad.wmnet'] ```  Of which those **FAILED**: ``` set(['db1072.eqiad.wmnet']) ```
[07:47:09] <wikibugs>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2980993 (Joe) Open>Resolved
[07:50:43] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2980995 (Marostegui) >>! In T156226#2980992, @ops-monitoring-bot wrote: > Completed auto-reimage of hosts: > ``` > ['db1072.eqiad.wmnet'] > ``` >  > Of which those **FAILED**: > ``` > set(['db1072....
[07:52:30] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/334973 (https://phabricator.wikimedia.org/T156226)
[07:55:07] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2980998 (Marostegui) And from the reimage command outout:  ``` sudo -E wmf-auto-reimage -p T156226 db1072.eqiad.wmnet START To monitor the full log: tail -F /var/log/wmf-auto-reimage/201701300722_m...
[07:56:28] <wikibugs>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/334973 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[07:57:05] <icinga-wm>	 PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:57:53] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/334973 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[07:58:21] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Depool db1073 [mediawiki-config] - https://gerrit.wikimedia.org/r/334973 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[08:01:06] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1073 - T156226 (duration: 02m 45s)
[08:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:01:12] <stashbot>	 T156226: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226
[08:02:20] <wikibugs>	 Operations, MediaWiki-Configuration, Performance-Team, Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2981020 (Joe) >>! In T149617#2980969, @Joe wrote: >>>! In T149617#2977429, @Legoktm wrote: >>...
[08:03:35] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2968039 (MoritzMuehlenhoff) I've seen this once or twice during the app server reimages as well. IIRC it was related to a race in adding the salt key and difficult to fix in the current design of w...
[08:03:55] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Switch app servers in codfw to systemd-timesyncd [puppet] - https://gerrit.wikimedia.org/r/332976 (https://phabricator.wikimedia.org/T150257) (owner: Muehlenhoff)
[08:04:53] <marostegui>	 !log Stop mysql db1073 to use it to clone db1072 - T156226
[08:04:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:42] <moritzm>	 !log switched application servers in codfw to systemd-timesyncd
[08:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:43] <wikibugs>	 (CR) Nemo bis: dumps: Modernize design of the index page (1 comment) [puppet] - https://gerrit.wikimedia.org/r/334856 (https://phabricator.wikimedia.org/T155697) (owner: Ladsgroup)
[08:08:46] <marostegui>	 mw1236.eqiad.wmnet down?
[08:10:17] <marostegui>	 racadm serveraction powerstatus
[08:10:17] <marostegui>	 Server power status: OFF
[08:10:18] <marostegui>	 indeed
[08:10:45] <elukey>	 nothing on the SAL afaics
[08:10:54] <marostegui>	 yeah, there is nothing there
[08:11:24] <elukey>	 probably better to depool it explicitly
[08:11:32] <elukey>	 so it will not be part of scap dsh
[08:12:07] <marostegui>	 never done that before :)
[08:12:19] <elukey>	 doing it now :)
[08:12:26] <marostegui>	 oh, thanks!
[08:12:33] <marostegui>	 maybe you can teach me how to? 
[08:12:55] <elukey>	 sure sure, I was going to copy paste commands
[08:13:01] <elukey>	 checked its status
[08:13:01] <marostegui>	 <3
[08:13:02] <elukey>	 confctl --quiet select 'name=mw1236.eqiad.wmnet' get
[08:13:05] <elukey>	 then
[08:13:28] <wikibugs>	 Operations, DBA, Patch-For-Review: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226#2981031 (Marostegui) >>! In T156226#2981023, @MoritzMuehlenhoff wrote: > I've seen this once or twice during the app server reimages as well. IIRC it was related to a race in adding the salt key an...
[08:13:33] <elukey>	 confctl --quiet select 'name=mw1236.eqiad.wmnet' --action set/pooled=inactive
[08:13:42] <elukey>	 (maybe without --quiet)
[08:13:47] <elukey>	 (so it will log)
[08:14:03] <elukey>	 and finally puppet run on tin to make sure that scap's dsh is updated
[08:14:22] <elukey>	 no sorry the inactive command is wrong :P
[08:14:24] <elukey>	 let me check
[08:14:28] <marostegui>	 haha :)
[08:14:59] <wikibugs>	 Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2981032 (Gilles) Accept headers and Vary: Accept are missing from the current task description.  Standardizing entry point to thumb.php (and I guess...
[08:15:27] <elukey>	 confctl --quiet select 'name=mw1236.eqiad.wmnet' set/pooled=inactive
[08:15:31] <elukey>	 no --action
[08:15:33] <elukey>	 bad history
[08:15:33] <elukey>	 :)
[08:15:58] <marostegui>	 noted! :)
[08:15:59] <marostegui>	 thanks
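[editor's note] The depool sequence elukey describes above (check the host's status, then set pooled=inactive without `--action`) can be restated as a small sketch. This is only an illustration under stated assumptions: the helper name `depool_commands` is hypothetical, it uses only the confctl arguments that appear in the chat, and it builds the command lines without running them. The follow-up puppet run on tin is left out because the chat does not give a command for it.

```python
import shlex


def depool_commands(host, quiet=False):
    """Return the two confctl invocations from the chat as argv lists.

    Hypothetical helper for illustration only; nothing is executed.
    quiet defaults to False because the chat suggests dropping --quiet
    so that the change gets logged.
    """
    base = ["confctl"] + (["--quiet"] if quiet else [])
    selector = f"name={host}"
    return [
        base + ["select", selector, "get"],                  # check current status
        base + ["select", selector, "set/pooled=inactive"],  # corrected form, no --action
    ]


if __name__ == "__main__":
    for argv in depool_commands("mw1236.eqiad.wmnet"):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings avoids quoting problems with the selector if the commands are later passed to something like `subprocess.run`.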
[08:19:06] <elukey>	 !log set mw1236.eqiad.wmnet pooled=inactive because powered off (no mentions on the SAL, still trying to find why)
[08:19:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:52] <wikibugs>	 (CR) Muehlenhoff: [C: 1] "Looks fine to me" [puppet] - https://gerrit.wikimedia.org/r/334719 (https://phabricator.wikimedia.org/T156529) (owner: Dzahn)
[08:22:14] <elukey>	 ah there you go "07:31  <icinga-wm> PROBLEM - Host mw1236 is DOWN: PING CRITICAL - Packet loss = 100%"
[08:22:19] <elukey>	 went down a couple of hours ago
[08:25:05] <icinga-wm>	 RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[08:25:44] <elukey>	 let's see if it powers up
[08:26:27] <elukey>	 ERROR: Timeout while waiting for server to perform requested power action.
[08:26:44] <elukey>	 opening a phab task.. 
[08:28:34] <marostegui>	 thanks :)
[08:30:01] <wikibugs>	 Operations, ops-eqiad: mw1236 powered down and not able to powerup - https://phabricator.wikimedia.org/T156610#2981044 (elukey)
[08:30:05] <icinga-wm>	 PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:30:09] <wikibugs>	 Operations, ops-eqiad: mw1236 powered down and not able to powerup - https://phabricator.wikimedia.org/T156610#2981057 (elukey) p:Triage>Normal
[08:35:05] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.450 second response time
[08:36:05] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.583 second response time
[08:45:27] <elukey>	 !log restarting aqs on aqs100[4567] to pick up NSS updates
[08:45:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:05] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.479 second response time
[08:52:05] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.605 second response time
[08:54:39] <moritzm>	 !log installing NSS security updates on kafka and Hadoop clusters
[08:54:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:12] <wikibugs>	 (PS7) Elukey: Refactor role memcached in multiple profiles [puppet] - https://gerrit.wikimedia.org/r/333880
[08:58:07] <icinga-wm>	 RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[09:03:34] <wikibugs>	 (PS1) Zhuyifei1999: kubernetesbackend: change absolute kubectl path to '/usr/bin/kubectl' [software/tools-webservice] - https://gerrit.wikimedia.org/r/334978 (https://phabricator.wikimedia.org/T156605)
[09:05:47] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[09:05:51] <marostegui>	 !log Start slaves from s1 to s7 on dbstore2001 - T156373
[09:05:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:57] <stashbot>	 T156373: During Phabricator upgrade on 2017-01-26, all m3 replica dbs crashed at the same time - https://phabricator.wikimedia.org/T156373
[09:06:18] <marostegui>	 !log Upgrade db2012 to 10.0.29-2 (this was done a couple of hours ago, but for the record) - T156373
[09:06:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:07] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.763 second response time
[09:11:28] <wikibugs>	 Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2782449 (fgiunchedi) WRT deployment strategy note that ATM all thumb accesses after varnish go through our [[https://github.com/wikimedia/operations-...
[09:12:07] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.782 second response time
[09:16:20] <wikibugs>	 (PS2) Elukey: Enable AQS aqs1007-b cassandra instance [puppet] - https://gerrit.wikimedia.org/r/334753 (https://phabricator.wikimedia.org/T155654)
[09:17:07] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 1.025 second response time
[09:19:07] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.180 second response time
[09:19:28] <moritzm>	 !log installing tcpdump security updates
[09:19:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:07] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.924 second response time
[09:25:29] <wikibugs>	 (CR) Elukey: [C: 2] Enable AQS aqs1007-b cassandra instance [puppet] - https://gerrit.wikimedia.org/r/334753 (https://phabricator.wikimedia.org/T155654) (owner: Elukey)
[09:25:51] <elukey>	 !log bootstrapping new cassandra instance (aqs1007-b) on AQS - https://gerrit.wikimedia.org/r/#/c/334753/
[09:25:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:07] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.465 second response time
[09:33:16] <wikibugs>	 (CR) ArielGlenn: [C: 2] Make md5sums.txt files compatible with md5sum --check [dumps] - https://gerrit.wikimedia.org/r/328219 (https://phabricator.wikimedia.org/T69886) (owner: Awight)
[09:34:46] <wikibugs>	 (CR) ArielGlenn: [V: 2 C: 2] Make md5sums.txt files compatible with md5sum --check [dumps] - https://gerrit.wikimedia.org/r/328219 (https://phabricator.wikimedia.org/T69886) (owner: Awight)
[09:36:30] <wikibugs>	 (CR) Hashar: "There is a nice crazy change in modules/confluent/manifests/kafka/mirror/instance.pp   which I am not sure what would be the end result:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/334317 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[09:37:07] <logmsgbot>	 !log ariel@tin Starting deploy [dumps/dumps@4a9e952]: proper md5sum format for adds/changes dumps
[09:37:09] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@4a9e952]: proper md5sum format for adds/changes dumps (duration: 00m 02s)
[09:37:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:07] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[09:42:27] <elukey>	 grrr
[09:43:19] <wikibugs>	 (PS1) ArielGlenn: Turn off centralauth table dumps. [puppet] - https://gerrit.wikimedia.org/r/334985 (https://phabricator.wikimedia.org/T153633)
[09:44:07] <godog>	 !log upgrade to thumbor 0.1.33 - T151066
[09:44:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:12] <stashbot>	 T151066: Implement PoolCounter support in Thumbor - https://phabricator.wikimedia.org/T151066
[09:47:05] <wikibugs>	 (CR) Hashar: "I have seen it in other changes, I am not a huge fan of adding trailing commas to hash that have a single element.  Specially puppet-lint " (1 comment) [puppet] - https://gerrit.wikimedia.org/r/334309 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[09:48:47] <wikibugs>	 (PS2) Filippo Giunchedi: Revert "Remove broken Thumbor IP throttling from configuration" [puppet] - https://gerrit.wikimedia.org/r/334252 (owner: Gilles)
[09:51:30] <wikibugs>	 (CR) Filippo Giunchedi: [V: 2 C: 2] Revert "Remove broken Thumbor IP throttling from configuration" [puppet] - https://gerrit.wikimedia.org/r/334252 (owner: Gilles)
[09:51:51] <godog>	 ci is backed up, self +2
[09:52:57] <icinga-wm>	 PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[09:54:01] <wikibugs>	 (PS1) Marostegui: db-eqiad.php: Repool db1072,db1073 with less load [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226)
[09:54:24] <wikibugs>	 (CR) Marostegui: [C: -2] "wait for the servers to catch up" [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[09:57:23] <wikibugs>	 (PS2) Marostegui: db-eqiad.php: Repool db1073 with less load [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226)
[09:58:18] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] Enable Prometheus JMX exporter on Cassandra nodes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[09:58:49] <godog>	 _joe_: I had a question for you re: conftool(-data) in ^
[09:58:51] <gehel>	 !log upgrade and restart nginx on relforge cluster
[09:58:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:08] <wikibugs>	 Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2981357 (Gilles) I don't think that the existing requirements are to be touched at first. I.e. thumbnails would still be width-based only.  Some adap...
[09:59:10] <wikibugs>	 (CR) Marostegui: [C: 2] "db1073 caught up" [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[10:00:43] <gehel>	 !log upgrade and restart nginx on elasticsearch codfw cluster
[10:00:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:13] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.64.0.237:9042 on aqs1007 is CRITICAL: connect to address 10.64.0.237 and port 9042: Connection refused
[10:01:44] <wikibugs>	 (Merged) jenkins-bot: db-eqiad.php: Repool db1073 with less load [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[10:01:53] <wikibugs>	 (CR) jenkins-bot: db-eqiad.php: Repool db1073 with less load [mediawiki-config] - https://gerrit.wikimedia.org/r/334987 (https://phabricator.wikimedia.org/T156226) (owner: Marostegui)
[10:03:50] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1073 with less weight - T156226 (duration: 00m 49s)
[10:03:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:54] <_joe_>	 godog: looking
[10:03:55] <stashbot>	 T156226: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226
[10:06:23] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] Enable Prometheus JMX exporter on Cassandra nodes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[10:07:02] <godog>	 _joe_: thanks!
[10:07:03] <icinga-wm>	 PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:10:12] <elukey>	 silencing aqs1007-b! Sorry for the delay
[10:10:53] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[10:11:55] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b CQL 10.64.0.237:9042 on aqs1007 is CRITICAL: connect to address 10.64.0.237 and port 9042: Connection refused Elukey bootstrapping cassandra
[10:17:52] <wikibugs>	 (CR) Ema: [V: 2 C: 2] etcd.py: log a warning on empty responses from etcd [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/334369 (https://phabricator.wikimedia.org/T134893) (owner: Ema)
[10:18:56] <wikibugs>	 (PS2) Addshore: Add twocolconflict to wgBetaFeaturesWhitelist [mediawiki-config] - https://gerrit.wikimedia.org/r/332904 (https://phabricator.wikimedia.org/T150184)
[10:19:06] <addshore>	 James_F: ^^
[10:21:03] <icinga-wm>	 RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[10:21:18] <wikibugs>	 (CR) Jforrester: ">" [mediawiki-config] - https://gerrit.wikimedia.org/r/332904 (https://phabricator.wikimedia.org/T150184) (owner: Addshore)
[10:21:28] <James_F>	 addshore: Sorry.
[10:21:50] <addshore>	 James_F: that's what we did for the RevisionSlider ;)
[10:22:02] <addshore>	 we were simply aiming for the same :P
[10:22:11] <wikibugs>	 Operations, Performance-Team, Thumbor: Thumbor resource consumption is spiky - https://phabricator.wikimedia.org/T151851#2981440 (Gilles)
[10:22:14] <James_F>	 addshore: Yeah, it was wrong then and it's wrong now.
[10:23:10] <wikibugs>	 Operations, Performance-Team, Thumbor: Thumbor resource consumption is spiky - https://phabricator.wikimedia.org/T151851#2829906 (Gilles) @fgiunchedi please open read rights for me on the nginx access logs on thumbor100*  This way I can look for some kind of pattern in the requests around the time th...
[10:23:37] <addshore>	 James_F: ack, will check it then!
[10:24:15] <James_F>	 Sorry, didn't see the gerrit reply from last month until now otherwise I'd have replied then.
[10:24:20] <James_F>	 Don't want to mess you around.
[10:24:44] <wikibugs>	 (PS2) Ema: Use caller function module name as default log prefix [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/334567
[10:27:13] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.326 second response time
[10:28:13] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.972 second response time
[10:28:38] <wikibugs>	 (03PS1) 10Urbanecm: Create Wikiprojekti namespace on fiwiki and enable VE in it [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334997 (https://phabricator.wikimedia.org/T156621)
[10:32:26] <wikibugs>	 (03CR) 10Ema: Use caller function module name as default log prefix (032 comments) [debs/pybal] (1.13) - 10https://gerrit.wikimedia.org/r/334567 (owner: 10Ema)
[10:32:45] <wikibugs>	 (03CR) 10Jforrester: [C: 031] Create Wikiprojekti namespace on fiwiki and enable VE in it [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334997 (https://phabricator.wikimedia.org/T156621) (owner: 10Urbanecm)
[10:33:04] <wikibugs>	 (03CR) 10Jforrester: "(Otherwise please consider this a +1 from me.)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332904 (https://phabricator.wikimedia.org/T150184) (owner: 10Addshore)
[10:33:40] <wikibugs>	 (03CR) 10Addshore: "Brilliant!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332904 (https://phabricator.wikimedia.org/T150184) (owner: 10Addshore)
[10:36:03] <icinga-wm>	 RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[10:40:34] <wikibugs>	 (03PS2) 10Ema: cache: remove varnish_version4 from hiera and salt [puppet] - 10https://gerrit.wikimedia.org/r/334043
[10:41:19] <wikibugs>	 (03CR) 10Ema: [V: 032 C: 032] cache: remove varnish_version4 from hiera and salt [puppet] - 10https://gerrit.wikimedia.org/r/334043 (owner: 10Ema)
[10:50:12] <wikibugs>	 (03CR) 10Gehel: [C: 031] "LGTM, trivial enough change..." [puppet] - 10https://gerrit.wikimedia.org/r/334293 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[10:56:07] <wikibugs>	 (03CR) 10Muehlenhoff: "Two additional comments. Looking at the rules file, it can be reduced to a few lines by moving to dh, this would simplify things a lot. Se" (032 comments) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox)
[10:56:25] <wikibugs>	 (03PS2) 10ArielGlenn: Turn off centralauth table dumps. [puppet] - 10https://gerrit.wikimedia.org/r/334985 (https://phabricator.wikimedia.org/T153633)
[10:57:49] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] Turn off centralauth table dumps. [puppet] - 10https://gerrit.wikimedia.org/r/334985 (https://phabricator.wikimedia.org/T153633) (owner: 10ArielGlenn)
[11:08:57] <wikibugs>	 (03PS6) 10Juniorsys: Linting fixes (Multiple modules) [puppet] - 10https://gerrit.wikimedia.org/r/334276 (https://phabricator.wikimedia.org/T93645)
[11:10:11] <wikibugs>	 (03PS5) 10Juniorsys: librenms/locales/logstash/lshell linting changes [puppet] - 10https://gerrit.wikimedia.org/r/334293 (https://phabricator.wikimedia.org/T93645)
[11:11:13] <icinga-wm>	 PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.787 second response time
[11:12:13] <icinga-wm>	 RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.135 second response time
[11:12:21] <wikibugs>	 (03PS5) 10Juniorsys: deployment: Linting fixes [puppet] - 10https://gerrit.wikimedia.org/r/334278 (https://phabricator.wikimedia.org/T93645)
[11:20:27] <wikibugs>	 (03CR) 10Hashar: [C: 031] librenms/locales/logstash/lshell linting changes [puppet] - 10https://gerrit.wikimedia.org/r/334293 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[11:23:59] <gehel>	 !log upgrade and restart nginx on elasticsearch eqiad cluster
[11:24:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:21] <wikibugs>	 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 13Patch-For-Review, and 2 others: Create an etcd cluster in codfw - https://phabricator.wikimedia.org/T156009#2981585 (10Joe) The cluster in codfw is installed and tested to work correctly with conftool. The performance of the cluster using nginx as a...
[11:32:10] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team: Enable the git:// protocole on gerrit - https://phabricator.wikimedia.org/T156597#2981610 (10Aklapper) > Theres no steps as this is a feature request. For future reference, please always provide **reasons why** you request something.
[11:35:33] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1018 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:37:03] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1019 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:38:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1020 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:39:03] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1019 is OK: SSL OK - Certificate elastic1019.eqiad.wmnet valid until 2021-03-15 20:20:18 +0000 (expires in 1505 days)
[11:39:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1021 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:40:22] <wikibugs>	 (03PS1) 10Muehlenhoff: More email addresses [puppet] - 10https://gerrit.wikimedia.org/r/335005
[11:41:18] <wikibugs>	 (03PS8) 10Giuseppe Lavagetto: Generalize entities definitions [software/conftool] - 10https://gerrit.wikimedia.org/r/288609 (https://phabricator.wikimedia.org/T155823)
[11:41:20] <wikibugs>	 (03PS6) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823)
[11:43:03] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1023 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:44:03] <icinga-wm>	 PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:44:33] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1018 is OK: SSL OK - Certificate elastic1018.eqiad.wmnet valid until 2021-03-15 20:19:15 +0000 (expires in 1505 days)
[11:44:47] <wikibugs>	 (03PS9) 10Giuseppe Lavagetto: Generalize entities definitions [software/conftool] - 10https://gerrit.wikimedia.org/r/288609 (https://phabricator.wikimedia.org/T155823)
[11:44:49] <wikibugs>	 (03PS7) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823)
[11:45:02] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Generalize entities definitions (0310 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/288609 (https://phabricator.wikimedia.org/T155823) (owner: 10Giuseppe Lavagetto)
[11:45:36] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] More email addresses [puppet] - 10https://gerrit.wikimedia.org/r/335005 (owner: 10Muehlenhoff)
[11:45:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1025 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:49:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1028 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:50:17] <wikibugs>	 (03PS1) 10Elukey: Add aqs1008-a to the AQS Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654)
[11:50:54] <wikibugs>	 (03PS1) 10ArielGlenn: sample uwsgi app that would produce json status output for dumps [dumps/statusapi] - 10https://gerrit.wikimedia.org/r/335007 (https://phabricator.wikimedia.org/T147177)
[11:51:10] <wikibugs>	 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Wikimedia-Multiple-active-datacenters: Assess SCB@CODFW  preparedness for the DC switchover - https://phabricator.wikimedia.org/T156361#2981625 (10akosiaris) Turns out the CPU increase mentioned above is not the result of some bug or otherwise malfunction/chan...
[11:51:14] <wikibugs>	 (03PS2) 10Elukey: Add aqs1008-a to the AQS Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654)
[11:51:33] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1029 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:51:50] <_joe_>	 uh?
[11:52:01] <_joe_>	 oh ok
[11:52:03] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1023 is OK: SSL OK - Certificate elastic1023.eqiad.wmnet valid until 2021-03-15 20:24:40 +0000 (expires in 1505 days)
[11:52:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1030 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:53:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1020 is OK: SSL OK - Certificate elastic1020.eqiad.wmnet valid until 2021-03-15 20:21:21 +0000 (expires in 1505 days)
[11:53:53] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1031 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:54:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1032 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:54:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1021 is OK: SSL OK - Certificate elastic1021.eqiad.wmnet valid until 2021-08-31 15:30:11 +0000 (expires in 1674 days)
[11:54:44] <apergos>	 one user. meh
[11:56:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1033 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:57:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1034 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:59:03] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1035 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[11:59:43] <icinga-wm>	 PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:00:33] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1036 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[12:00:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1028 is OK: SSL OK - Certificate elastic1028.eqiad.wmnet valid until 2021-08-31 16:12:42 +0000 (expires in 1674 days)
[12:01:53] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1037 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[12:01:53] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1031 is OK: SSL OK - Certificate elastic1031.eqiad.wmnet valid until 2021-03-15 20:33:49 +0000 (expires in 1505 days)
[12:02:42] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/5276/aqs1008.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654) (owner: 10Elukey)
[12:05:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1033 is OK: SSL OK - Certificate elastic1033.eqiad.wmnet valid until 2021-06-21 12:10:36 +0000 (expires in 1603 days)
[12:05:52] <wikibugs>	 06Operations: Review of ferm services without srange - https://phabricator.wikimedia.org/T149804#2981638 (10MoritzMuehlenhoff) p:05Triage>03Normal
[12:05:53] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1040 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[12:05:53] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - search-https_9243 - Could not depool server elastic1046.eqiad.wmnet because of too many down!
[12:06:33] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1029 is OK: SSL OK - Certificate elastic1029.eqiad.wmnet valid until 2021-08-31 18:02:18 +0000 (expires in 1674 days)
[12:06:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1025 is OK: SSL OK - Certificate elastic1025.eqiad.wmnet valid until 2021-03-15 20:26:54 +0000 (expires in 1505 days)
[12:06:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1030 is OK: SSL OK - Certificate elastic1030.eqiad.wmnet valid until 2021-03-15 20:32:44 +0000 (expires in 1505 days)
[12:07:43] <icinga-wm>	 PROBLEM - Elasticsearch HTTPS on elastic1041 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[12:07:51] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove otto from piwik-roots [puppet] - 10https://gerrit.wikimedia.org/r/335010 (https://phabricator.wikimedia.org/T142836)
[12:07:53] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy
[12:08:12] * gehel is looking at elasticsearch ...
[12:08:33] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1036 is OK: SSL OK - Certificate elastic1036.eqiad.wmnet valid until 2021-06-21 12:10:51 +0000 (expires in 1603 days)
[12:08:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1034 is OK: SSL OK - Certificate elastic1034.eqiad.wmnet valid until 2021-06-21 12:10:41 +0000 (expires in 1603 days)
[12:08:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1032 is OK: SSL OK - Certificate elastic1032.eqiad.wmnet valid until 2021-06-21 08:40:25 +0000 (expires in 1602 days)
[12:08:43] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1041 is OK: SSL OK - Certificate elastic1041.eqiad.wmnet valid until 2021-06-21 13:36:01 +0000 (expires in 1603 days)
[12:08:53] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1037 is OK: SSL OK - Certificate elastic1037.eqiad.wmnet valid until 2021-06-21 12:10:56 +0000 (expires in 1603 days)
[12:08:53] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1040 is OK: SSL OK - Certificate elastic1040.eqiad.wmnet valid until 2021-06-21 13:35:56 +0000 (expires in 1603 days)
[12:09:03] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1035 is OK: SSL OK - Certificate elastic1035.eqiad.wmnet valid until 2021-06-21 12:10:46 +0000 (expires in 1603 days)
[12:09:15] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: ores::redis: Enable diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/334663
[12:11:03] <icinga-wm>	 RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[12:11:04] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove aaron from logstash-roots [puppet] - 10https://gerrit.wikimedia.org/r/335011 (https://phabricator.wikimedia.org/T142836)
[12:13:22] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove otto from aqs-admins [puppet] - 10https://gerrit.wikimedia.org/r/335012 (https://phabricator.wikimedia.org/T142836)
[12:18:11] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove elukey from analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/335013 (https://phabricator.wikimedia.org/T142836)
[12:20:27] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove aaron from contint-users [puppet] - 10https://gerrit.wikimedia.org/r/335014 (https://phabricator.wikimedia.org/T142836)
[12:21:16] <wikibugs>	 06Operations, 10hardware-requests: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2981648 (10akosiaris)
[12:21:30] <wikibugs>	 (03CR) 10Elukey: [C: 032] Remove elukey from analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/335013 (https://phabricator.wikimedia.org/T142836) (owner: 10Muehlenhoff)
[12:21:35] <wikibugs>	 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Wikimedia-Multiple-active-datacenters: Assess SCB@CODFW  preparedness for the DC switchover - https://phabricator.wikimedia.org/T156361#2972283 (10akosiaris)
[12:21:37] <wikibugs>	 06Operations, 10hardware-requests: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2981648 (10akosiaris)
[12:22:00] <wikibugs>	 06Operations, 10hardware-requests: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2981648 (10akosiaris) p:05Triage>03High
[12:26:02] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] "https://puppet-compiler.wmflabs.org/5277/ NOOP, merging" [puppet] - 10https://gerrit.wikimedia.org/r/334663 (owner: 10Alexandros Kosiaris)
[12:26:09] <wikibugs>	 (03PS3) 10Alexandros Kosiaris: ores::redis: Enable diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/334663
[12:26:12] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ores::redis: Enable diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/334663 (owner: 10Alexandros Kosiaris)
[12:27:43] <icinga-wm>	 RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[12:28:02] <_joe_>	 thanks akosiaris :)
[12:28:11] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] "Er, wrong paste. https://puppet-compiler.wmflabs.org/5278/oresrdb1001.eqiad.wmnet/ says it's fine, merging" [puppet] - 10https://gerrit.wikimedia.org/r/334663 (owner: 10Alexandros Kosiaris)
[12:37:32] <wikibugs>	 06Operations, 10ops-codfw, 15User-Elukey: codfw:rack/setup mc2019-mc2036 - https://phabricator.wikimedia.org/T155755#2981676 (10elukey) I quickly tested salt/puppet to help out and everything seems to be working as expected except mc2033 and mc2034 (powered down?).
[12:38:30] <zeljkof>	 hashar: as usual on Mondays, today's eu swat is full https://wikitech.wikimedia.org/wiki/Deployments#Monday.2C.C2.A0January.C2.A030
[12:43:13] <icinga-wm>	 PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:44:47] <wikibugs>	 06Operations, 10hardware-requests: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2981696 (10mark) @RobH: let's get this out ASAP, we need to have this in place before the switchover, start of April.
[12:54:30] <hashar>	 zeljkof: yeah they look quite straightforward
[12:54:32] <wikibugs>	 06Operations, 13Patch-For-Review: Cross-validation of account data - https://phabricator.wikimedia.org/T142836#2981702 (10MoritzMuehlenhoff) "volans" and "elukey" have been removed from the nda group since they're WMF staff and present in the "wmf" group already.
[12:55:13] <zeljkof>	 hashar: except the last one
[12:55:30] <zeljkof>	 it links to commit in core in master, already merged o.O
[12:58:03] <icinga-wm>	 PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:09:20] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "There are a few similar pending requests all tracked from T51357.  I would rather NOT land this until it is reviewed/confirmed by someone " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334787 (https://phabricator.wikimedia.org/T155892) (owner: 10Urbanecm)
[13:10:45] <wikibugs>	 (03CR) 10Hashar: [C: 031] Remove flaggedrevs-protect-review page protection from enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334511 (https://phabricator.wikimedia.org/T156448) (owner: 10Urbanecm)
[13:11:34] <wikibugs>	 (03CR) 10Hashar: [C: 031] Enable SandboxLink on gdwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334157 (https://phabricator.wikimedia.org/T156281) (owner: 10Urbanecm)
[13:11:38] <wikibugs>	 (03CR) 10Hashar: [C: 031] Enable SandboxLink on tgwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[13:12:13] <icinga-wm>	 RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[13:14:22] <wikibugs>	 (03PS4) 10Hashar: Enable expiring user groups on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333652 (owner: 10TTO)
[13:14:51] <wikibugs>	 (03CR) 10Hashar: [C: 032] "You are more than welcome to add beta cluster only changes to the SWAT, though most of the time we just merge them on request :]" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333652 (owner: 10TTO)
[13:15:15] <tto>	 hashar: Thanks, I was never quite sure what to do with beta cluster only ones :)
[13:16:02] <tto>	 There's a list of inclusion criteria but no list of "what you don't need to use SWAT for"
[13:16:19] <wikibugs>	 (03Merged) 10jenkins-bot: Enable expiring user groups on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333652 (owner: 10TTO)
[13:16:22] <tto>	 and "who to ping about things you don't need to use SWAT for"
[13:17:52] <wikibugs>	 (03CR) 10jenkins-bot: Enable expiring user groups on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/333652 (owner: 10TTO)
[13:18:51] <hashar>	 tto: well you can add beta cluster config changes to SWAT for sure :D
[13:19:03] <hashar>	 tto: or just poke deployers to land them at anytime :]
[13:19:27] <hashar>	 if you add it to swat, you have a guarantee it will land without further action
[13:19:33] <hashar>	 so it is all fine :]
[13:19:46] <tto>	 I'll remember that for future. Especially as first SWAT of the week is always close to full; others will be happier if unnecessary changes are done outside SWAT :)
[13:26:13] <icinga-wm>	 RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[13:33:08] <wikibugs>	 (03PS4) 10ArielGlenn: dumps: Modernize design of the index page [puppet] - 10https://gerrit.wikimedia.org/r/334856 (https://phabricator.wikimedia.org/T155697) (owner: 10Ladsgroup)
[13:35:45] <wikibugs>	 (03CR) 10ArielGlenn: "Because this file is processed as a template with % substitutions, lone % in css properties must be replaced with %%; I've done so in this" [puppet] - 10https://gerrit.wikimedia.org/r/334856 (https://phabricator.wikimedia.org/T155697) (owner: 10Ladsgroup)
[13:40:32] <wikibugs>	 (03PS2) 10Muehlenhoff: Remove elukey from analytics-admins [puppet] - 10https://gerrit.wikimedia.org/r/335013 (https://phabricator.wikimedia.org/T142836)
[13:47:43] <icinga-wm>	 PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:52:33] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db2012 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[13:53:33] <wikibugs>	 (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654) (owner: 10Elukey)
[13:54:31] <wikibugs>	 (03PS1) 10Gehel: maps - increase replication frequency to each minute on maps-test cluster [puppet] - 10https://gerrit.wikimedia.org/r/335022 (https://phabricator.wikimedia.org/T145534)
[13:56:04] <wikibugs>	 (03PS2) 10Gehel: maps - increase replication frequency to each minute on maps-test cluster [puppet] - 10https://gerrit.wikimedia.org/r/335022 (https://phabricator.wikimedia.org/T145534)
[13:57:40] <wikibugs>	 (03CR) 10Gehel: [C: 032] maps - increase replication frequency to each minute on maps-test cluster [puppet] - 10https://gerrit.wikimedia.org/r/335022 (https://phabricator.wikimedia.org/T145534) (owner: 10Gehel)
[14:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1400). Please do the needful.
[14:00:04] <jouncebot>	 Urbanecm, tto, and MatmaRex: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:06] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Add aqs1008-a to the AQS Cassandra cluster [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654) (owner: 10Elukey)
[14:00:39] <zeljkof>	 hashar: want to do swat today? I am in the middle of webdriverio patch...
[14:01:04] <hashar>	 zeljkof: will do
[14:01:11] <zeljkof>	 hashar: thanks!
[14:01:26] <zeljkof>	 I'm around if you need help ;)
[14:01:33] <addshore>	 me too ;)
[14:01:41] <wikibugs>	 (03PS2) 10Hashar: Enable SandboxLink on gdwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334157 (https://phabricator.wikimedia.org/T156281) (owner: 10Urbanecm)
[14:01:43] <wikibugs>	 (03PS2) 10Hashar: Enable SandboxLink on tgwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[14:01:46] <hashar>	 doing the sandboxlinks for Urbanecm
[14:02:01] <wikibugs>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334157 (https://phabricator.wikimedia.org/T156281) (owner: 10Urbanecm)
[14:02:03] <wikibugs>	 (03CR) 10Hashar: [C: 031] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[14:02:54] <Urbanecm>	 zeljkof, late but around today :)
[14:03:37] <wikibugs>	 (03Merged) 10jenkins-bot: Enable SandboxLink on gdwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334157 (https://phabricator.wikimedia.org/T156281) (owner: 10Urbanecm)
[14:03:40] <Urbanecm>	 zeljkof, late but around today :)
[14:03:48] <wikibugs>	 (03CR) 10jenkins-bot: Enable SandboxLink on gdwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334157 (https://phabricator.wikimedia.org/T156281) (owner: 10Urbanecm)
[14:03:53] <icinga-wm>	 PROBLEM - puppet last run on ganeti1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[14:04:29] <wikibugs>	 (03CR) 10Hashar: [C: 032] Enable SandboxLink on tgwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[14:04:47] <Urbanecm>	 I'm a bit confused. Who is SWAtter for today? zeljkof? Or hashar? Or somebody else?
[14:04:50] <hashar>	 Urbanecm: Enable SandboxLink on gdwiki  is enabled
[14:04:57] <hashar>	 Urbanecm: I will do it
[14:05:03] <Urbanecm>	 At prod?
[14:05:11] <Urbanecm>	 hashar, ok
[14:05:27] <zeljkof>	 Urbanecm: it's hashar
[14:05:45] <Urbanecm>	 zeljkof, thank you too :)
[14:05:51] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Restore db1073 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335023 (https://phabricator.wikimedia.org/T156226)
[14:06:09] <wikibugs>	 (03Merged) 10jenkins-bot: Enable SandboxLink on tgwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[14:06:17] <wikibugs>	 (03CR) 10jenkins-bot: Enable SandboxLink on tgwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334551 (https://phabricator.wikimedia.org/T156473) (owner: 10Urbanecm)
[14:06:29] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable SandboxLink on gdwiki - T156281 (duration: 00m 48s)
[14:06:32] <wikibugs>	 (03PS2) 10Hashar: Remove flaggedrevs-protect-review page protection from enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334511 (https://phabricator.wikimedia.org/T156448) (owner: 10Urbanecm)
[14:06:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:34] <stashbot>	 T156281: Enable SandboxLink on gdwiki - https://phabricator.wikimedia.org/T156281
[14:06:44] <wikibugs>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334511 (https://phabricator.wikimedia.org/T156448) (owner: 10Urbanecm)
[14:07:05] <Urbanecm>	 hashar, gdwiki working now (I thought it was already enabled, not that you were going to enable it)
[14:07:21] <hashar>	 tgwiki as well
[14:07:28] <Urbanecm>	 thx
[14:08:10] <wikibugs>	 (03Merged) 10jenkins-bot: Remove flaggedrevs-protect-review page protection from enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334511 (https://phabricator.wikimedia.org/T156448) (owner: 10Urbanecm)
[14:08:13] <hashar>	 the enwiki flaggedrevs / PC2 change I have no idea how to test it :(
[14:08:25] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable SandboxLink on tgwiki - T156473 (duration: 00m 40s)
[14:08:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:29] <stashbot>	 T156473: Enable SandboxLink on tg.wikipedia - https://phabricator.wikimedia.org/T156473
[14:08:40] <wikibugs>	 (03CR) 10Elukey: "Thanks! Will wait for aqs1007-b to be fully bootstrapped before proceeding :)" [puppet] - 10https://gerrit.wikimedia.org/r/335006 (https://phabricator.wikimedia.org/T155654) (owner: 10Elukey)
[14:08:41] <Urbanecm>	 hashar, ask local sysops?
[14:09:02] <hashar>	 I am going to just push it
[14:09:21] <Urbanecm>	 Okay, let's push it and I'll ask for confirmation in the task. Do you agree?
[14:09:29] <hashar>	 yup
[14:09:35] <Urbanecm>	 Ok
[14:09:44] <hashar>	 the Sandbox link, I believe we should just enable it on all wiki
[14:09:45] <hashar>	 s
[14:09:51] <wikibugs>	 (03CR) 10jenkins-bot: Remove flaggedrevs-protect-review page protection from enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334511 (https://phabricator.wikimedia.org/T156448) (owner: 10Urbanecm)
[14:09:53] <elukey>	 hashar: I was about to ask some info for the new rake stuff, I promise that I'll try to review the patches today/tomorrow
[14:10:15] <hashar>	 elukey: sure thing!  Poke me any time for more details / if you want a demo / crash course or whatever :]
[14:10:30] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/flaggedrevs.php: Remove flaggedrevs-protect-review page protection from enwiki - T156448 (duration: 00m 41s)
[14:10:30] <hashar>	 elukey: I will be more than happy to demo it over a hangout session with screen sharing :]
[14:10:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:34] <stashbot>	 T156448: Remove flaggedrevs-protect-review (PC2) page protection option from the English Wikipedia - https://phabricator.wikimedia.org/T156448
[14:10:36] <hashar>	 Urbanecm: [config] 334787 Increase default thumb size to 250px at nowiki  
[14:10:37] <wikibugs>	 (03PS1) 10Elukey: Add aqs1007 to AQS's conftool data [puppet] - 10https://gerrit.wikimedia.org/r/335024 (https://phabricator.wikimedia.org/T155654)
[14:10:43] <Urbanecm>	 hashar, I agree. But shouldn't we notify the communities at least?
[14:10:57] <hashar>	 Urbanecm: I have skipped that one. Not quite sure what are the technical impacts of changing the thumbsize. That is really all a technical debt
[14:11:13] <hashar>	 Urbanecm: we would probably want to change the thumbsize for all wikis
[14:11:29] <wikibugs>	 (03PS2) 10Hashar: Create namespace alias وگ for NS_PROJECT in fawikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334510 (https://phabricator.wikimedia.org/T156451) (owner: 10Urbanecm)
[14:11:35] <wikibugs>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334510 (https://phabricator.wikimedia.org/T156451) (owner: 10Urbanecm)
[14:12:39] <Urbanecm>	 hashar, okay, you don't 100% it, you skipped it but the task exist since 2013...
[14:12:59] <hashar>	 Urbanecm: yeah that is an old topic :/
[14:13:13] <wikibugs>	 (03Merged) 10jenkins-bot: Create namespace alias وگ for NS_PROJECT in fawikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334510 (https://phabricator.wikimedia.org/T156451) (owner: 10Urbanecm)
[14:13:28] <Urbanecm>	 I missed the "trust" word...
[14:13:28] <hashar>	 I blocked such updates a few years ago because I could not assess the impact on the Wikimedia thumbnailing infrastructure
[14:13:31] <wikibugs>	 (03CR) 10jenkins-bot: Create namespace alias وگ for NS_PROJECT in fawikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334510 (https://phabricator.wikimedia.org/T156451) (owner: 10Urbanecm)
[14:13:55] <hashar>	 but thumbnailing has changed a lot, so maybe we can just change the default size for everyone.  I am not sure really
[14:14:21] <Urbanecm>	 hashar, okay. Should I mark the task as declined? And/or abandon the change?
[14:14:27] <hashar>	 keep it open
[14:14:31] <hashar>	 until the parent is figured out
[14:14:35] <Urbanecm>	 Ok
[14:14:38] <hashar>	 potentially you can raise it on wikitech-l 
[14:14:45] <hashar>	 so appropriate people look at the impact
[14:14:54] <hashar>	 and take a decision as to what should be the default thumbsize
[14:15:02] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Create namespace alias وگ for NS_PROJECT in fawikiquote - T156451 (duration: 00m 40s)
[14:15:03] <Urbanecm>	 Ok, I'll write there.
[14:15:05] <hashar>	 nowadays maybe it is not a big deal for the cache infra
[14:15:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:06] <stashbot>	 T156451: Create namespace alias وگ for NS_PROJECT in fawikiquote - https://phabricator.wikimedia.org/T156451
[14:15:43] <icinga-wm>	 RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[14:16:30] <zhuyifei1999_>	 um I don't think MatmaRex is here right now
[14:22:28] <hashar>	 Urbanecm: I quite messed up some pages on fawikiquote :(
[14:22:58] <hashar>	 I really need to learn farsi
[14:23:00] <Urbanecm>	 hashar, will it be easy to demess them?
[14:23:04] <Urbanecm>	 What is farsi?
[14:23:11] <hashar>	 persian
[14:23:14] <hashar>	 the language from Iran
[14:23:24] <Urbanecm>	 Yeah, thanks
[14:23:40] <hashar>	 Iran is quite a fascinating country and more or less a unique culture in the middle east
[14:24:16] <hashar>	 well I guess the bulk of the work is done for fawikiquote, we can keep it open
[14:24:27] <hashar>	 I am subscribed to the task so I can follow up / help if people ask questions
[14:24:27] <Urbanecm>	 I am quite familiar with the geography of our planet. I just thought it was something technical, some way to fix it :D
[14:24:58] <wikibugs>	 (03PS2) 10Hashar: Enable RSS extension at metawiki, enable one feed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334864 (https://phabricator.wikimedia.org/T155830) (owner: 10Urbanecm)
[14:25:03] <wikibugs>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334864 (https://phabricator.wikimedia.org/T155830) (owner: 10Urbanecm)
[14:25:05] <Urbanecm>	 Hm, I don't understand you. If the bulk of the work is done, why should it stay open? Should I do anything?
[14:25:22] <hashar>	 Urbanecm: there are seven pages that conflicted
[14:25:49] <hashar>	 having the same name in the NS_MAIN and the new NS_PROJECT 
[14:26:22] <wikibugs>	 (03Merged) 10jenkins-bot: Enable RSS extension at metawiki, enable one feed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334864 (https://phabricator.wikimedia.org/T155830) (owner: 10Urbanecm)
[14:26:27] <Urbanecm>	 Maybe we should just list them in the task or a paste and ask them to fix them.
[14:26:36] <hashar>	 I did :]
[14:26:39] <Urbanecm>	 Good :)
[14:26:49] <Urbanecm>	 BTW are they accessible?
[14:27:48] <wikibugs>	 (03CR) 10jenkins-bot: Enable RSS extension at metawiki, enable one feed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334864 (https://phabricator.wikimedia.org/T155830) (owner: 10Urbanecm)
[14:28:27] <Urbanecm>	 Ignore my question, I read the task :)
[14:29:39] <hashar>	 [config] 334864 Enable RSS extension at metawiki, enable one feed
[14:29:39] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable RSS extension at metawiki, enable one feed - T155830 (duration: 00m 42s)
[14:29:41] <hashar>	 that one works
[14:29:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:43] <stashbot>	 T155830: Enable "Wikimedia DE Policy News Update" RSS feed on meta.wikimedia.org - https://phabricator.wikimedia.org/T155830
[14:29:44] <hashar>	 validated it myself
[14:29:51] <Urbanecm>	 thx
[14:30:00] <hashar>	 tto: ] 333652 Enable expiring user groups on beta
[14:30:10] <hashar>	 tto: I have deployed that one an hour or so ago so it is all set :D
[14:30:14] <hashar>	 Urbanecm: thank you for all those changes!
[14:30:21] <tto>	 It is working indeed :)
[14:30:29] <Urbanecm>	 hashar, thanks for deploying all those changes!
[14:31:53] <icinga-wm>	 RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[14:33:24] <wikibugs>	 06Operations, 13Patch-For-Review: Cross-validation of account data - https://phabricator.wikimedia.org/T142836#2981868 (10MoritzMuehlenhoff) "dcausses" and "matmarex" have been removed from the nda group since they're WMF staff and present in the "wmf" group already.
[14:45:53] <logmsgbot>	 !log hashar@tin Synchronized php-1.29.0-wmf.9/languages/Language.php: translateBlockExpiry: Duration is block expiry minus current time - T156453 (duration: 00m 42s)
[14:45:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:45:57] <stashbot>	 T156453: BlockLogFormatter formats relative timestamps with duration since Unix epoch - https://phabricator.wikimedia.org/T156453
[14:51:45] <wikibugs>	 06Operations, 06Discovery, 06Maps, 10Traffic, 03Interactive-Sprint: Rate-limit browsers without referers - https://phabricator.wikimedia.org/T154704#2981917 (10Gehel) This is worth discussing with our #traffic team. @BBlack, @ema: what is your point of view on rate-limiting browser without referer?  Varn...
[14:56:49] <wikibugs>	 (03PS2) 10DCausse: [WIP] Configure A/B test for CrossProject search results sidebar [mediawiki-config] - 10https://gerrit.wikimedia.org/r/334673 (https://phabricator.wikimedia.org/T149806)
[15:01:13] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2981946 (10Marostegui)
[15:02:18] <wikibugs>	 (03PS2) 10Elukey: Add aqs1007 to AQS's conftool data [puppet] - 10https://gerrit.wikimedia.org/r/335024 (https://phabricator.wikimedia.org/T155654)
[15:21:46] <wikibugs>	 (03CR) 10Elukey: [C: 032] Add aqs1007 to AQS's conftool data [puppet] - 10https://gerrit.wikimedia.org/r/335024 (https://phabricator.wikimedia.org/T155654) (owner: 10Elukey)
[15:23:41] <wikibugs>	 (03PS6) 10Rush: Tools: Disable automatic backups of aptly repositories [puppet] - 10https://gerrit.wikimedia.org/r/328031 (https://phabricator.wikimedia.org/T150726) (owner: 10Tim Landscheidt)
[15:28:03] <icinga-wm>	 PROBLEM - DPKG on cp3020 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:31:03] <icinga-wm>	 RECOVERY - DPKG on cp3020 is OK: All packages OK
[15:33:53] <wikibugs>	 (03CR) 10Rush: [C: 032] Tools: Disable automatic backups of aptly repositories [puppet] - 10https://gerrit.wikimedia.org/r/328031 (https://phabricator.wikimedia.org/T150726) (owner: 10Tim Landscheidt)
[15:35:23] <wikibugs>	 06Operations, 10ArchCom-RfC, 06Commons, 10MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2982093 (10Fjalapeno) Thank you all for pushing this forward! This is really great for the reading platforms where we have had lots of ambiguity with t...
[15:37:01] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-eqiad.php: Restore db1073 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335023 (https://phabricator.wikimedia.org/T156226) (owner: 10Marostegui)
[15:37:33] <wikibugs>	 (03PS12) 10Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475
[15:37:39] <wikibugs>	 (03CR) 10Paladox: [C: 031] Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox)
[15:38:03] <wikibugs>	 (03CR) 10Paladox: [C: 031] Gerrit: Add a systemd init script fro gerrit (032 comments) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox)
[15:38:25] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Restore db1073 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335023 (https://phabricator.wikimedia.org/T156226) (owner: 10Marostegui)
[15:38:33] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Restore db1073 original weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335023 (https://phabricator.wikimedia.org/T156226) (owner: 10Marostegui)
[15:39:54] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1073 with its original weight - T156226 (duration: 00m 52s)
[15:39:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:59] <stashbot>	 T156226: Reimage and clone db1072 - https://phabricator.wikimedia.org/T156226
[15:42:08] <logmsgbot>	 !log hashar@tin Synchronized php-1.29.0-wmf.9/extensions/timeline/Timeline.body.php: debug log EasyTimeline error - T138036 (duration: 00m 46s)
[15:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:12] <stashbot>	 T138036: proc line:  2959: warning: points must have either 4 or 2 values per line - https://phabricator.wikimedia.org/T138036
[15:57:15] <Amir1>	 apergos: hey, should we wait for more feedback or wait some more? tell me when it's okay to merge it
[16:00:04] <jouncebot>	 legoktm and arlolra: Respected human, time to deploy Linter deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1600). Please do the needful.
[16:01:16] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: Degraded RAID on db2011 - https://phabricator.wikimedia.org/T153740#2982158 (10Papaul) Disk replacement complete on slot 11
[16:01:34] <apergos>	 Amir1: which?
[16:01:55] <Amir1>	 apergos: the dumps UI patch
[16:02:06] <jynus>	 thank you, pap*ul
[16:05:04] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: Degraded RAID on db2011 - https://phabricator.wikimedia.org/T153740#2982163 (10Marostegui) Thanks - It is getting rebuilt  ``` root@db2011:/usr/local/bin# megacli -PDRbld -ShowProg -PhysDrv [32:11] -aALL  Rebuild Progress on Device at Enclosure 32, Slot 11 Completed 44% in 1...
[16:06:29] <apergos>	 Amir1: did you see Nemo_bis'  comment about the font? Can we use open fonts?
[16:06:33] <icinga-wm>	 PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:06:37] <apergos>	 and I would leave it a couple days
[16:07:05] <apergos>	 but I'll be merging it before the end of the week unless we get a whole bunch of people adding changes to it (pretty unlikely)
[16:09:48] <wikibugs>	 06Operations, 10hardware-requests: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2982181 (10RobH) a:03RobH I'll create the required sub-tasks today.
[16:10:47] <Amir1>	 apergos: Yup, I'll fix it. Thanks!
[16:11:24] <apergos>	 yw, thanks for the fix!
[16:14:23] <wikibugs>	 (03PS1) 10Marostegui: db-codfw.php: Depool db2034 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335048 (https://phabricator.wikimedia.org/T156478)
[16:16:34] <marostegui>	 legoktm can I deploy mediawiki config? I saw you have a deployment window now, so I will wait for you guys :)
[16:20:17] <wikibugs>	 06Operations, 10ops-codfw: Codfw: Missing mgmt dns for db2025-db2027 - https://phabricator.wikimedia.org/T156342#2982238 (10Papaul) a:05Marostegui>03Papaul
[16:23:13] <icinga-wm>	 RECOVERY - MegaRAID on db2011 is OK: OK: optimal, 1 logical, 2 physical
[16:23:25] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: Degraded RAID on db2011 - https://phabricator.wikimedia.org/T153740#2982242 (10Marostegui) Rebuilt finished successfully   ```                 Device Present                 ================ Virtual Drives    : 1   Degraded        : 0   Offline         : 0 Physical Devices...
[16:23:41] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: Degraded RAID on db2011 - https://phabricator.wikimedia.org/T153740#2982243 (10Marostegui) 05Open>03Resolved a:03Papaul
[16:26:43] <icinga-wm>	 PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:27:55] <legoktm>	 marostegui: yes go for it
[16:28:02] <marostegui>	 thanks!
[16:28:09] <legoktm>	 I'll start my scap once you're done
[16:28:09] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2034 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335048 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[16:28:29] <marostegui>	 thanks, it should take no time :)
[16:29:10] <wikibugs>	 (03PS1) 10Legoktm: Add Linter to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335049
[16:34:26] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs2003.codfw.wmnet
[16:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:33] <icinga-wm>	 RECOVERY - puppet last run on ms-be1023 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[16:35:15] <wikibugs>	 06Operations, 10ops-codfw, 06Discovery, 10Wikidata, and 2 others: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline. - https://phabricator.wikimedia.org/T124627#2982316 (10Gehel)
[16:35:18] <wikibugs>	 06Operations, 10ops-codfw, 06Discovery, 10Wikidata, and 2 others: rack/setup/install wdqs2003 - https://phabricator.wikimedia.org/T152644#2982313 (10Gehel) 05Open>03Resolved wdqs2003 has completed data import, it is now pooled
[16:36:13] <wikibugs>	 (03CR) 10Marostegui: [C: 032] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335048 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[16:38:12] <legoktm>	 marostegui: I think jenkins is just behind...
[16:39:24] <wikibugs>	 (03Merged) 10jenkins-bot: db-codfw.php: Depool db2034 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335048 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[16:39:32] <wikibugs>	 (03CR) 10jenkins-bot: db-codfw.php: Depool db2034 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335048 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[16:40:59] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2034 for maintenance - T156478 (duration: 00m 40s)
[16:41:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:03] <stashbot>	 T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478
[16:41:03] <marostegui>	 legoktm: you are good to go :-)
[16:41:15] <legoktm>	 thanks!
[16:41:18] <wikibugs>	 (03CR) 10Legoktm: [C: 032] Add Linter to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335049 (owner: 10Legoktm)
[16:43:49] <wikibugs>	 (03Merged) 10jenkins-bot: Add Linter to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335049 (owner: 10Legoktm)
[16:44:04] <wikibugs>	 (03CR) 10jenkins-bot: Add Linter to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335049 (owner: 10Legoktm)
[16:46:27] <logmsgbot>	 !log legoktm@tin Started scap: Build l10n cache for linter
[16:46:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:03] <icinga-wm>	 PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:54:16] <wikibugs>	 06Operations, 06Labs, 10netops: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#2982398 (10faidon) p:05Unbreak!>03High Looks stable for now, lowering priority.
[16:54:43] <icinga-wm>	 RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[16:55:31] <legoktm>	 ostriches: thcipriani: uh, scap is complaining about stuff
[16:55:32] <legoktm>	 16:55:03 Started cache_git_info
[16:55:32] <legoktm>	 16:55:03 Unable to find remote tracking branch/tag for /srv/mediawiki-staging/php-1.29.0-wmf.9/extensions/Popups
[16:55:39] <legoktm>	 and then every extension
[16:56:05] <legoktm>	 I assume cache_git_info breaking isn't the end of the world so I'm letting it continue for now
[17:00:04] <jouncebot>	 godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1700). Please do the needful.
[17:00:04] <jouncebot>	 Krenair: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[17:04:13] <thcipriani>	 legoktm: yup. That's fixed in master/release/scap release that is ready to go, probably get it out later today.
[17:04:32] <legoktm>	 thcipriani: ok, so it's fine to ignore?
[17:04:58] <thcipriani>	 legoktm: yes, please do ignore for now. Thanks for the ping about it though.
[17:05:19] <marostegui>	 !log Shutdown mysql and poweroff db2034 for maintenance - T156478
[17:05:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:05:25] <stashbot>	 T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478
[17:09:10] <logmsgbot>	 !log legoktm@tin Finished scap: Build l10n cache for linter (duration: 22m 43s)
[17:09:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:48] <legoktm>	 (I'm done)
[17:12:42] <wikibugs>	 (03PS1) 10Legoktm: [WIP] Enable Linter on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335052
[17:13:47] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops, 10hardware-requests: Decommission neptunium - https://phabricator.wikimedia.org/T122101#2982443 (10RobH)
[17:13:49] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops, 13Patch-For-Review: Decommission plutonium - https://phabricator.wikimedia.org/T118586#2982444 (10RobH)
[17:13:51] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops, 10hardware-requests: Decommission calcium - https://phabricator.wikimedia.org/T116790#2982446 (10RobH)
[17:13:54] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops, 10hardware-requests, 13Patch-For-Review: Decommission rubidium - https://phabricator.wikimedia.org/T118213#2982445 (10RobH)
[17:13:56] <wikibugs>	 06Operations, 10hardware-requests: eqiad out of warranty spares to decommission - approval request - https://phabricator.wikimedia.org/T120679#2982442 (10RobH) 05Open>03Resolved
[17:14:52] <wikibugs>	 (03PS1) 10Marostegui: db-codfw,db-eqiad.php: Update db2034 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335053 (https://phabricator.wikimedia.org/T156478)
[17:19:41] <wikibugs>	 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 06Services (watching), 15User-mobrovac, 07Wikimedia-Multiple-active-datacenters: Assess SCB@CODFW  preparedness for the DC switchover - https://phabricator.wikimedia.org/T156361#2982461 (10mobrovac)
[17:21:05] <icinga-wm>	 RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[17:22:05] <wikibugs>	 06Operations, 10hardware-requests, 06Services (watching), 15User-mobrovac: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2982466 (10mobrovac)
[17:23:47] <wikibugs>	 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 13Patch-For-Review, and 6 others: DNS: dynamically generate entries for service discovery - https://phabricator.wikimedia.org/T156100#2982469 (10BBlack) We should probably divorce the RO/RW distinction from the core design here.  Not all services...
[17:25:48] <wikibugs>	 06Operations, 13Patch-For-Review: Upgrade fluorine to trusty/jessie - https://phabricator.wikimedia.org/T123728#2982472 (10fgiunchedi)
[17:26:58] <wikibugs>	 (03PS1) 10Papaul: DNS: Change db2034 production dns.Server has been moved from row C to Row A Bug:T156478 [dns] - 10https://gerrit.wikimedia.org/r/335054
[17:29:51] <wikibugs>	 (03CR) 10Marostegui: [C: 031] "Looks good" [dns] - 10https://gerrit.wikimedia.org/r/335054 (owner: 10Papaul)
[17:41:39] <Pchelolo>	 !log update RESTBase to cd2b5e019: staging
[17:41:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:45:11] <icinga-wm>	 PROBLEM - Restbase root url on xenon is CRITICAL: connect to address 10.64.0.200 and port 7231: Connection refused
[17:45:11] <icinga-wm>	 PROBLEM - restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.200, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7fd88a82b990: Failed to establish a new connection: [Errno 111] Connection refused,))
[17:45:48] <Pchelolo>	 ^^ this is OK, it's expected - new keyspaces are being created
[17:46:11] <icinga-wm>	 RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15500 bytes in 0.028 second response time
[17:46:11] <icinga-wm>	 RECOVERY - restbase endpoints health on xenon is OK: All endpoints are healthy
[17:49:01] <wikibugs>	 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 13Patch-For-Review, and 6 others: DNS: dynamically generate entries for service discovery - https://phabricator.wikimedia.org/T156100#2982595 (10GWicke) > if specific services needs a split into "active/passive RW + active/active RO", we can solv...
[17:51:01] <icinga-wm>	 PROBLEM - Restbase root url on restbase-dev1001 is CRITICAL: connect to address 10.64.0.35 and port 7231: Connection refused
[17:51:12] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.35, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f54b7b5c950: Failed to establish a new connection: [Errno 111] Connection refused,))
[17:52:01] <icinga-wm>	 RECOVERY - Restbase root url on restbase-dev1001 is OK: HTTP OK: HTTP/1.1 200 - 15500 bytes in 0.010 second response time
[17:52:11] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase-dev1001 is OK: All endpoints are healthy
[17:52:41] <wikibugs>	 (03CR) 10Tim Landscheidt: [V: 032 C: 031] "I had the same idea while AFK, and my patience saved me work :-); thanks.  I tested this successfully by patching /usr/lib/python2.7/dist-" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/334978 (https://phabricator.wikimedia.org/T156605) (owner: 10Zhuyifei1999)
[17:53:50] <wikibugs>	 06Operations, 06Analytics-Kanban: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#2982604 (10Milimetric) Thanks, @jcrespo, I didn't see the ping, it looks like Phabricator had some notification issues.  The idea with this service is that it wouldn't take time away from ops, s...
[17:56:27] <twentyafterfour>	 Something caused MW error rate to skyrocket ~3 hours ago:  https://grafana.wikimedia.org/dashboard/db/production-logging?from=now-4h&to=now
[17:57:16] <twentyafterfour>	 er ~1 hour ago
[17:58:30] <wikibugs>	 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2982608 (10Papaul) @Robh we about to move db2034 in row c rack C6 to row A rack 5. I will like for you please if you have time to make some changes on the both switches ....
[17:59:32] <bd808>	 twentyafterfour: DBReplication channel looks to be the culprit -- https://logstash.wikimedia.org/goto/1da8da5ef2a75d1505730e15bec9e036
[17:59:57] <bd808>	 something is lagging I'd guess
[18:00:04] <jouncebot>	 Niharika and bd808: Dear anthropoid, the time has come. Please deploy Wikimania scholarships deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1800).
[18:00:04] <jouncebot>	 gehel: Respected human, time to deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1800). Please do the needful.
[18:00:04] <jouncebot>	 gehel: A patch you scheduled for Weekly Wikidata query service deployment window is about to be deployed. Please be available during the process.
[18:01:31] <wikibugs>	 (03CR) 10Marostegui: [C: 032] db-codfw,db-eqiad.php: Update db2034 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335053 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[18:02:54] <wikibugs>	 (03Merged) 10jenkins-bot: db-codfw,db-eqiad.php: Update db2034 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335053 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[18:03:04] <wikibugs>	 (03CR) 10jenkins-bot: db-codfw,db-eqiad.php: Update db2034 IP [mediawiki-config] - 10https://gerrit.wikimedia.org/r/335053 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui)
[18:04:09] <gehel>	 !log rolling restart of nginx and wdqs for updates
[18:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:14] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Change db2034 IP - T156478 (duration: 00m 40s)
[18:04:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:20] <stashbot>	 T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478
[18:05:04] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Change db2034 IP - T156478 (duration: 00m 40s)
[18:05:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:31] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:06:35] <icinga-wm>	 PROBLEM - DPKG on wdqs1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:35] <icinga-wm>	 PROBLEM - WDQS HTTP Port on wdqs2001 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:35] <icinga-wm>	 PROBLEM - Check systemd state on wdqs2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:06:41] <icinga-wm>	 PROBLEM - WDQS HTTP Port on wdqs1002 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:42] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs2001 is CRITICAL: connect to address 10.192.32.148 and port 80: Connection refused
[18:06:51] <icinga-wm>	 PROBLEM - WDQS HTTP Port on wdqs2002 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[18:06:51] <icinga-wm>	 PROBLEM - DPKG on wdqs1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:52] <icinga-wm>	 PROBLEM - DPKG on wdqs2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:06:53] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs2002 is CRITICAL: connect to address 10.192.48.65 and port 80: Connection refused
[18:06:53] <icinga-wm>	 PROBLEM - WDQS HTTP on wdqs2002 is CRITICAL: connect to address 10.192.48.65 and port 80: Connection refused
[18:07:01] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - wdqs_80 - Could not depool server wdqs2002.codfw.wmnet because of too many down!
[18:07:11] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - wdqs_80 - Could not depool server wdqs2002.codfw.wmnet because of too many down!
[18:07:18] <gehel>	 Oops, that's me failing my nginx upgrade on wdqs, I'm on it
[18:07:48] <Pchelolo>	 !log update RESTBase to cd2b5e019: canary on restbase1007
[18:07:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:31] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1002 is OK: OK - running: The system is fully operational
[18:08:32] <icinga-wm>	 RECOVERY - DPKG on wdqs1001 is OK: All packages OK
[18:08:38] <wikibugs>	 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2982644 (10RobH) >>! In T156478#2982608, @Papaul wrote: > @Robh we about to move db2034 in row c rack C6 to row A rack 5. I will like for you please if you have time to m...
[18:08:41] <icinga-wm>	 RECOVERY - WDQS HTTP Port on wdqs1002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:08:51] <icinga-wm>	 RECOVERY - DPKG on wdqs1002 is OK: All packages OK
[18:09:31] <icinga-wm>	 RECOVERY - WDQS HTTP Port on wdqs2001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:09:32] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2001 is OK: OK - running: The system is fully operational
[18:09:41] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs2001 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
[18:09:51] <icinga-wm>	 RECOVERY - WDQS HTTP Port on wdqs2002 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 80
[18:09:51] <icinga-wm>	 RECOVERY - DPKG on wdqs2001 is OK: All packages OK
[18:09:52] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs2002 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
[18:09:53] <icinga-wm>	 RECOVERY - WDQS HTTP on wdqs2002 is OK: HTTP OK: HTTP/1.1 200 OK - 10479 bytes in 0.073 second response time
[18:10:01] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy
[18:10:13] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1007 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.223, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[18:10:41] <icinga-wm>	 PROBLEM - Restbase root url on restbase1007 is CRITICAL: connect to address 10.64.0.223 and port 7231: Connection refused
[18:11:22] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1001.codfw.wmnet
[18:11:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:51] <moritzm>	 !log upgrading firejail on scb cluster
[18:11:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:27] <Niharika>	 !log updated scholarships Fixed some bugs with the login form
[18:12:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:12:59] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1002.codfw.wmnet
[18:13:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:11] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy
[18:14:48] <wikibugs>	 06Operations, 06Analytics-Kanban: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#2982674 (10jcrespo) I recently packaged and puppetized [[ http://proxysql.com/ | ProxySQL ]]: https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/proxysql/manifests/init.p...
[18:15:42] <wikibugs>	 (03PS2) 10Zhuyifei1999: kubernetesbackend: change absolute kubectl path to '/usr/bin/kubectl' [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/334978 (https://phabricator.wikimedia.org/T156605)
[18:18:56] <gehel>	 !log nginx upgrade and wdqs restart complete - sorry for the noise
[18:18:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:13] <wikibugs>	 (03CR) 10Mobrovac: [C: 031] graphoid/gridengine/grub/haproxy/hhvm lint fixes [puppet] - 10https://gerrit.wikimedia.org/r/334319 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[18:28:46] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on db1072 is CRITICAL: CRITICAL slave_sql_state could not connect
[18:28:53] <marostegui>	 I thought I had downtimed that :(
[18:29:09] <jynus>	 I think that is depooled
[18:29:12] <marostegui>	 it is
[18:29:18] <marostegui>	 i see I only downtimed the lag
[18:29:19] <volans>	 ok, so no worries :D
[18:29:28] <wikibugs>	 (PS1) Andrew Bogott: novaproxy:  Specify ssl_settings of [] if not using ssl. [puppet] - https://gerrit.wikimedia.org/r/335064
[18:29:36] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on db1072 is CRITICAL: CRITICAL slave_io_state could not connect
[18:29:40] <jynus>	 alerts based on etcd
[18:29:46] <jynus>	 will check if a server is pooled
[18:29:53] <jynus>	 and avoid criticals in that case
[18:30:05] <jynus>	 amirite?
[18:30:09] <gehel>	 marostegui: at least you are less verbose than I am :)
[18:30:13] <marostegui>	 XDD
[18:30:44] <jynus>	 gehel, wait until a master goes down, and 20 servers page at the same time
[18:31:22] <gehel>	 jynus: yeah, but that never happens...
[18:31:26] <jynus>	 ha
[18:31:29] <wikibugs>	 (PS1) Yuvipanda: labs: Fix novaproxy not working when use_ssl is false [puppet] - https://gerrit.wikimedia.org/r/335065
[18:31:29] <marostegui>	 nooooo
[18:31:31] <jynus>	 so young
[18:31:32] <marostegui>	 don't say that :(
[18:31:36] <jynus>	 so innocent
[18:32:16] <wikibugs>	 (CR) Andrew Bogott: [C: 1] labs: Fix novaproxy not working when use_ssl is false [puppet] - https://gerrit.wikimedia.org/r/335065 (owner: Yuvipanda)
[18:32:40] <wikibugs>	 (PS3) ArielGlenn: Move default config into a file [dumps] - https://gerrit.wikimedia.org/r/43156 (owner: Awight)
[18:33:06] <wikibugs>	 (Abandoned) Andrew Bogott: novaproxy:  Specify ssl_settings of [] if not using ssl. [puppet] - https://gerrit.wikimedia.org/r/335064 (owner: Andrew Bogott)
[18:33:13] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labs: Fix novaproxy not working when use_ssl is false [puppet] - https://gerrit.wikimedia.org/r/335065 (owner: Yuvipanda)
[18:35:41] <icinga-wm>	 RECOVERY - Restbase root url on restbase1007 is OK: HTTP OK: HTTP/1.1 200 - 15500 bytes in 0.007 second response time
[18:36:11] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1007 is OK: All endpoints are healthy
[18:36:30] <godog>	 !log upload scap 3.5.0-1 - T127762
[18:36:32] <godog>	 thcipriani: ^
[18:36:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:36:35] <stashbot>	 T127762: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762
[18:36:44] <thcipriani>	 godog: awesome :)
[18:37:17] <thcipriani>	 godog: scap version/config update: https://gerrit.wikimedia.org/r/#/c/334677/
[18:37:53] <wikibugs>	 (CR) Awight: "Hi!  I just saw your comment from November...  I agree, how about a xmldumps-backup/conf directory which would hold default config?" [dumps] - https://gerrit.wikimedia.org/r/43156 (owner: Awight)
[18:38:13] <Pchelolo>	 !log update RESTBase to cd2b5e019: canary on restbase2001
[18:38:16] <stashbot_>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:57] <godog>	 thcipriani: ok! going with that
[18:39:47] <thcipriani>	 godog: cool. Possible to force a puppet run on tin with that? I'd like to test it there as soon as it's available.
[18:39:56] <wikibugs>	 (PS2) Filippo Giunchedi: Scap: Bump version to 3.5.0-1 [puppet] - https://gerrit.wikimedia.org/r/334677 (https://phabricator.wikimedia.org/T127762) (owner: Thcipriani)
[18:40:38] <godog>	 thcipriani: yep, I've upgraded scap manually on tin in the meantime
[18:40:55] <thcipriani>	 ah, cool. Lemme give that a shot real quick.
[18:41:22] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Scap: Bump version to 3.5.0-1 [puppet] - https://gerrit.wikimedia.org/r/334677 (https://phabricator.wikimedia.org/T127762) (owner: Thcipriani)
[18:42:25] <thcipriani>	 ok, running a test sync on tin of README
[18:42:26] <wikibugs>	 (CR) ArielGlenn: "That's going to be the only file in there. Let's see what else is in here: doc, samples.  Not loving it." [dumps] - https://gerrit.wikimedia.org/r/43156 (owner: Awight)
[18:42:35] <Pchelolo>	 !log update RESTBase to cd2b5e019
[18:42:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:36] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on db1072 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:43:42] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2982801 (Papaul) @Marostegui server is now in A5. Just waiting for https://gerrit.wikimedia.org/r/#/c/335054/ to be merge.
[18:43:56] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on db1072 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:46:20] <logmsgbot>	 !log nuria@tin Started deploy [eventlogging/analytics@4b28b14]: (no justification provided)
[18:46:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:24] <logmsgbot>	 !log nuria@tin Finished deploy [eventlogging/analytics@4b28b14]: (no justification provided) (duration: 00m 04s)
[18:46:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:47:50] <wikibugs>	 (CR) Jcrespo: [C: 2] DNS: Change db2034 production dns.Server has been moved from row C to Row A Bug:T156478 [dns] - https://gerrit.wikimedia.org/r/335054 (owner: Papaul)
[18:47:54] <wikibugs>	 (PS2) Jcrespo: DNS: Change db2034 production dns.Server has been moved from row C to Row A Bug:T156478 [dns] - https://gerrit.wikimedia.org/r/335054 (owner: Papaul)
[18:48:24] <thcipriani>	 !log mediawiki deployments momentarily
[18:48:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:00] <jynus>	 ^I assume you mean "blocking?"
[18:50:15] <nuria>	 !log rollback deployment to eventlogging
[18:50:15] <thcipriani>	 jynus: yes, sorry
[18:50:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:50:28] <godog>	 you've accidentally mediawiki deployments
[18:50:41] <icinga-wm>	 PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: processor/client-side-01
[18:53:10] <logmsgbot>	 !log nuria@tin Started deploy [eventlogging/analytics@4b28b14]: (no justification provided)
[18:53:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:53:22] <logmsgbot>	 !log nuria@tin Finished deploy [eventlogging/analytics@4b28b14]: (no justification provided) (duration: 00m 11s)
[18:53:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:38] <thcipriani>	 !log unlocking mediawiki deployments for test
[18:56:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:48] <icinga-wm>	 RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning.
[19:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T1900). Please do the needful.
[19:00:29] <thcipriani>	 SWAT will have to be on hold if there are patches, still testing new scap version
[19:02:57] <wikibugs>	 (PS3) Dzahn: admin: fix log file perms for dc-ops on jessie [puppet] - https://gerrit.wikimedia.org/r/334719 (https://phabricator.wikimedia.org/T156529)
[19:05:12] <wikibugs>	 (CR) Dzahn: [C: 2] admin: fix log file perms for dc-ops on jessie [puppet] - https://gerrit.wikimedia.org/r/334719 (https://phabricator.wikimedia.org/T156529) (owner: Dzahn)
[19:07:30] <ostriches>	 thcipriani: No patches, you're clear
[19:08:48] <icinga-wm>	 PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[scap]
[19:10:09] <wikibugs>	 Operations, Graphite, Patch-For-Review: provide aggregated cluster data with graphite, similar to ganglia - https://phabricator.wikimedia.org/T119520#2982910 (fgiunchedi) Open>declined Declining, this functionality is now provided by Prometheus
[19:10:13] <wikibugs>	 Operations, Analytics, Analytics-Cluster, Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2982912 (DarTar) a:DarTar>ellery
[19:12:40] <wikibugs>	 Operations, Puppet, Horizon, Labs: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#2757207 (greg) UBN! for over a week?
[19:15:54] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#2982951 (jcrespo) Merged. Virtual console is busy (I assume by yourself), so I do not have visibility of the state of the server right now.
[19:17:48] <icinga-wm>	 PROBLEM - MD RAID on relforge1001 is CRITICAL: CRITICAL: State: degraded, Active: 6, Working: 6, Failed: 2, Spare: 0
[19:17:48] <wikibugs>	 Operations, Puppet, Horizon, Labs: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#2982953 (Andrew) p:Unbreak!>Normal
[19:17:49] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on relforge1001 is CRITICAL: CRITICAL: State: degraded, Active: 6, Working: 6, Failed: 2, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T156663
[19:17:53] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on relforge1001 - https://phabricator.wikimedia.org/T156663#2982954 (ops-monitoring-bot)
[19:18:41] <wikibugs>	 Operations, hardware-requests, Services (watching), User-mobrovac: Site: 2 hardware access request for SCB@CODFW - https://phabricator.wikimedia.org/T156631#2982960 (RobH) a:RobH>mark I currently have a total of 5 spare pool systems in codfw.  Of these 5, 3 of them may meet the specificat...
[19:23:13] <wikibugs>	 Operations, Monitoring: ganglia graphs should not have "N" as units - https://phabricator.wikimedia.org/T81659#2982979 (fgiunchedi) Open>declined We're replacing Ganglia with Prometheus
[19:31:37] <wikibugs>	 (PS4) ArielGlenn: Move default config into a file [dumps] - https://gerrit.wikimedia.org/r/43156 (owner: Awight)
[19:41:19] <matanya>	 Urbanecm: did you get the block fix swatted ?
[19:42:08] <thcipriani>	 matanya: no swat at the moment
[19:42:11] <wikibugs>	 (CR) ArielGlenn: "After an irc chat with awight, here's the compromise no one likes :-D" [dumps] - https://gerrit.wikimedia.org/r/43156 (owner: Awight)
[19:42:16] <thcipriani>	 fixing something with scap, sorry :(
[19:44:33] <matanya>	 thanks thcipriani
[19:44:47] <wikibugs>	 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review: Monitor Certificate Transparency (CT) logs - https://phabricator.wikimedia.org/T155807#2983086 (10faidon) This is the example output from just a few moments ago: ``` faidon@einsteinium:~$ certspotter  9a5646f8202e95c8df870ee1d36267fddd70ab5471060509b58f...
[19:44:50] <gehel>	 !log deploying latest wdqs gui
[19:44:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:33] <logmsgbot>	 !log gehel@tin Started deploy [wdqs/wdqs@81442a0]: (no justification provided)
[19:46:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:56] <logmsgbot>	 !log gehel@tin Finished deploy [wdqs/wdqs@81442a0]: (no justification provided) (duration: 01m 23s)
[19:47:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:11] <gehel>	 SMalyshev: ^
[19:52:12] <ostriches>	 Yay, love my new logging shame :p
[20:00:04] <jouncebot>	 tgr, dr0ptp4kt, and bblack: Dear anthropoid, the time has come. Please deploy Zero (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T2000).
[20:00:15] <dr0ptp4kt>	 here
[20:00:23] <wikibugs>	 Operations: upgrade netmon1001 to jessie - https://phabricator.wikimedia.org/T125020#2983163 (RobH)
[20:00:26] <wikibugs>	 Operations, hardware-requests: hardware request for netmon1001 - https://phabricator.wikimedia.org/T156040#2962228 (RobH) Open>stalled I've created a task (T156667) for the quotation for a replacement system.
[20:01:45] <tgr>	 bblack: OK to do the JsonConfig update now?
[20:02:14] <godog>	 please wait, some scap debugging on, cc thcipriani 
[20:02:34] <godog>	 tgr: ^
[20:11:18] <icinga-wm>	 PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:12:01] <Pchelolo>	 !log update RESTBase to 501ea47edc in staging
[20:12:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:06] <Pchelolo>	 FYI: Started an html dump on xenon
[20:30:49] <wikibugs>	 Operations, Patch-For-Review: fix log reading permissions for dc-ops admin group - https://phabricator.wikimedia.org/T156529#2983355 (Dzahn) Open>Resolved confirmed by papaul on install1001, and install2001 i see the same sudo rules, so should be resolved.
[20:39:18] <icinga-wm>	 RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[20:40:19] <wikibugs>	 Operations, Analytics, Analytics-Cluster, Research-and-Data, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2983412 (RobH) Please note that since this is no longer an active hardware request, I'm going to remove the #project so we don't get used...
[20:40:23] <tgr>	 dr0ptp4kt: I'll reschedule
[20:40:51] <dr0ptp4kt>	 tgr: thx
[20:43:00] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@9addcd0]: Bump max_age to 18h for T156411
[20:43:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:05] <stashbot>	 T156411: Compute the trending articles over a period of 24h rather than 1h - https://phabricator.wikimedia.org/T156411
[20:44:26] <wikibugs>	 (CR) Yuvipanda: [C: 2] kubernetesbackend: change absolute kubectl path to '/usr/bin/kubectl' [software/tools-webservice] - https://gerrit.wikimedia.org/r/334978 (https://phabricator.wikimedia.org/T156605) (owner: Zhuyifei1999)
[20:44:50] <marktraceur>	 dapatrick: Can I get you to say "yes" on https://phabricator.wikimedia.org/T132063 please?
[20:45:04] <wikibugs>	 (Merged) jenkins-bot: kubernetesbackend: change absolute kubectl path to '/usr/bin/kubectl' [software/tools-webservice] - https://gerrit.wikimedia.org/r/334978 (https://phabricator.wikimedia.org/T156605) (owner: Zhuyifei1999)
[20:45:19] <marktraceur>	 Or, you know, "you're a terrible human being and this needs more work", whatever works
[20:45:40] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@9addcd0]: Bump max_age to 18h for T156411 (duration: 02m 39s)
[20:45:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:03] <wikibugs>	 (PS3) Thcipriani: Scap: Bump version to 3.5.1-1 [puppet] - https://gerrit.wikimedia.org/r/334677 (https://phabricator.wikimedia.org/T127762)
[20:46:18] <icinga-wm>	 PROBLEM - trendingedits endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=6699): Max retries exceeded with url: /?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7fa2da0aa890: Failed to establish a new connection: [Errno 111] Connection refused,))
[20:47:09] <icinga-wm>	 RECOVERY - trendingedits endpoints health on scb1001 is OK: All endpoints are healthy
[20:47:12] <mobrovac>	 known ^
[20:50:23] <logmsgbot>	 !log thcipriani@tin Synchronized README: test scap (duration: 00m 43s)
[20:50:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:38] <godog>	 !log uploaded scap 3.5.1-1
[20:50:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:57] <icinga-wm>	 ACKNOWLEDGEMENT - trendingedits endpoints health on scb1004 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.48.29, port=6699): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) Marko Obrovac configuration issues, working on it
[20:52:13] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Scap: Bump version to 3.5.1-1 [puppet] - https://gerrit.wikimedia.org/r/334677 (https://phabricator.wikimedia.org/T127762) (owner: Thcipriani)
[20:54:48] <icinga-wm>	 RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[20:54:59] <dapatrick>	 marktraceur, Done.
[20:55:20] <marktraceur>	 dapatrick: Super, thanks a lot!
[20:55:34] <marktraceur>	 I think I still need to get a review for the extension, but at least this part is done...
[20:55:47] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@5735f00]: Bump memory limit and heartbeat timeout
[20:55:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:57:35] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@5735f00]: Bump memory limit and heartbeat timeout (duration: 01m 48s)
[20:57:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:22] <wikibugs>	 Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2983488 (Tgr)
[21:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T2100).
[21:06:05] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@5735f00]: (no justification provided)
[21:06:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:12] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@5735f00]: (no justification provided) (duration: 03m 07s)
[21:09:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:00] <bearND>	 no mobileapps deploy today
[21:10:01] <logmsgbot>	 !log mobrovac@tin Started deploy [trending-edits/deploy@5735f00]: (no justification provided)
[21:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:14] <logmsgbot>	 !log mobrovac@tin Finished deploy [trending-edits/deploy@5735f00]: (no justification provided) (duration: 03m 13s)
[21:13:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:22] <wikibugs>	 Operations, Multimedia, Wikimedia-Site-requests, Performance: Choose a sensible set of thumbnail sizes for Special:Preferences - https://phabricator.wikimedia.org/T106640#2983536 (Quiddity) There's a larger list of options (not including the one above) at https://www.mediawiki.org/wiki/Requests_f...
[21:42:53] <wikibugs>	 (PS1) EBernhardson: [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140
[21:43:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140 (owner: EBernhardson)
[21:47:35] <wikibugs>	 Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2983643 (Tgr)
[21:48:31] <wikibugs>	 (PS2) EBernhardson: [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140
[21:49:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140 (owner: EBernhardson)
[21:51:48] <icinga-wm>	 PROBLEM - carbon-cache@c service on graphite1003 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@c is failed
[21:51:48] <icinga-wm>	 PROBLEM - Check systemd state on graphite1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:54:48] <icinga-wm>	 RECOVERY - carbon-cache@c service on graphite1003 is OK: OK - carbon-cache@c is active
[21:54:48] <icinga-wm>	 RECOVERY - Check systemd state on graphite1003 is OK: OK - running: The system is fully operational
[21:54:48] <icinga-wm>	 PROBLEM - puppet last run on graphite1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:00:04] <jouncebot>	 dapatrick, bawolff, and Reedy: Dear anthropoid, the time has come. Please deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170130T2200).
[22:01:09] <wikibugs>	 (PS3) EBernhardson: [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140
[22:01:16] <wikibugs>	 (CR) jerkins-bot: [V: -1] [WIP] Drop mediawiki logs in HDFS after 90 days [puppet] - https://gerrit.wikimedia.org/r/335140 (owner: EBernhardson)
[22:07:48] <icinga-wm>	 PROBLEM - puppet last run on elastic1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:22:18] <icinga-wm>	 PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 63776 MB (12% inode=99%)
[22:22:48] <icinga-wm>	 RECOVERY - puppet last run on graphite1002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[22:30:51] <matanya>	 thcipriani: would be nice if you can update https://wikitech.wikimedia.org/wiki/How_to_deploy_code
[22:31:09] * thcipriani looks
[22:32:48] <icinga-wm>	 PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py]
[22:33:14] <thcipriani>	 matanya: will do, thanks for pointing it out. Most commands should still work the same, I'll get rid of references for scap sync-dir though.
[22:33:37] <matanya>	 yeah, that was my main point, thanks for the release
[22:33:48] <icinga-wm>	 PROBLEM - puppet last run on rcs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:34:48] <icinga-wm>	 RECOVERY - puppet last run on elastic1033 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[22:39:18] <icinga-wm>	 RECOVERY - Disk space on elastic1019 is OK: DISK OK
[22:55:48] <icinga-wm>	 PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py]
[22:58:05] <ostriches>	 thcipriani: Relatedly... https://wikitech.wikimedia.org/wiki/How_to_deploy_code#A_note_on_JavaScript_and_CSS seems like a completely useless section
[22:58:15] <wikibugs>	 Operations, Wikimedia-Stream: Error on RCSteam server startup for the "flash policy server" - https://phabricator.wikimedia.org/T153770#2983908 (Krinkle)
[22:58:49] <thcipriani>	 I've pointed folks at that section in the past
[22:59:35] <ostriches>	 People think we have to re-minify CSS/JS by hand somewhere?
[22:59:39] <ostriches>	 Ok...nvm then....
[23:00:25] <ostriches>	 matanya: For what it's worth, `scap sync-dir` still works, it's just a back-compat alias and is hidden from scap's help
[23:00:33] <ostriches>	 (hidden option)
[23:00:46] <ostriches>	 But yes, doc improvements to remove references are good :)
[23:00:50] <wikibugs>	 Operations, Wikimedia-Stream: rcstream service - gevent dependency incompatibility - https://phabricator.wikimedia.org/T153773#2890586 (Krinkle)
[23:01:15] <matanya>	 ostriches: i am just nagging around as usual ;) 
[23:01:33] <ostriches>	 Oh no worries, I'm just adding noise/background :)
[23:01:48] <icinga-wm>	 RECOVERY - puppet last run on rcs1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[23:01:55] <matanya>	 we can build a band
[23:03:46] <MatmaRex>	 ostriches: thcipriani: 2011 ;) https://wikitech.wikimedia.org/w/index.php?title=How_to_deploy_code&diff=48117&oldid=48116
[23:04:21] <ostriches>	 Yes, I remember $wgStyleVersion :)
[23:04:47] <ostriches>	 "there is no need to e.g manually do a "build" (to re-minify/re-cache static files)" -- style version didn't really do that :)
[23:05:01] <ostriches>	 Eh, cache busted, I suppose
[23:05:05] <ostriches>	 But minify feels wrong there
[23:05:08] <ostriches>	 Oh well, I'm nitpicking
[23:05:31] <ostriches>	 Also: I miss pre-RL days :(
[23:06:15] <MatmaRex>	 ostriches: since this was edited by neilk, i have extra context for you: UploadWizard used to have manually minified JS and CSS code.
[23:06:29] <MatmaRex>	 (neilk worked on it)
[23:06:58] <ostriches>	 Silly uploadwizard
[23:07:01] <ostriches>	 :D
[23:08:27] <wikibugs>	 (CR) Krinkle: "+2 for stream.wm.o re-use. However this means it'll have to route after DNS/LVS, at the Varnish level. Since there appears to be a separat" [puppet] - https://gerrit.wikimedia.org/r/322954 (https://phabricator.wikimedia.org/T143925) (owner: Ottomata)
[23:13:58] <wikibugs>	 (CR) Krinkle: "(meant +1 :))" [puppet] - https://gerrit.wikimedia.org/r/322954 (https://phabricator.wikimedia.org/T143925) (owner: Ottomata)
[23:34:18] <icinga-wm>	 PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:58:59] <wikibugs>	 Operations, ops-codfw, hardware-requests: decomission db2015 - https://phabricator.wikimedia.org/T149102#2984081 (RobH)