[00:02:39] <icinga-wm>	 RECOVERY - puppet last run on ganeti1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[00:07:05] <wikibugs>	 (PS3) ArielGlenn: Clean up temp files from page content dumps before retry [dumps] - https://gerrit.wikimedia.org/r/336849
[00:07:19] <icinga-wm>	 RECOVERY - puppet last run on db1075 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[00:25:19] <icinga-wm>	 PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:29:39] <icinga-wm>	 PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:38:19] <icinga-wm>	 PROBLEM - puppet last run on nitrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:53:20] <icinga-wm>	 RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[00:57:39] <icinga-wm>	 RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[01:07:19] <icinga-wm>	 RECOVERY - puppet last run on nitrogen is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[02:01:19] <icinga-wm>	 PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:19:48] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.12) (duration: 07m 53s)
[02:19:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:07] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Feb 20 02:25:07 UTC 2017 (duration 5m 19s)
[02:25:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[03:07:49] <icinga-wm>	 PROBLEM - puppet last run on labvirt1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:14:39] <icinga-wm>	 PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:20:29] <icinga-wm>	 PROBLEM - SSH on bast3001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:20:49] <icinga-wm>	 PROBLEM - DPKG on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:49] <icinga-wm>	 PROBLEM - Disk space on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:49] <icinga-wm>	 PROBLEM - dhclient process on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:49] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:50] <icinga-wm>	 PROBLEM - Check systemd state on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:50] <icinga-wm>	 PROBLEM - puppet last run on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:51] <icinga-wm>	 PROBLEM - configured eth on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:20:51] <icinga-wm>	 PROBLEM - salt-minion processes on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:21:09] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 791.44 seconds
[03:21:39] <icinga-wm>	 RECOVERY - Check size of conntrack table on bast3001 is OK: OK: nf_conntrack is 0 % full
[03:21:40] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on bast3001 is OK: OK ferm input default policy is set
[03:21:40] <icinga-wm>	 RECOVERY - Check systemd state on bast3001 is OK: OK - running: The system is fully operational
[03:21:40] <icinga-wm>	 RECOVERY - Disk space on bast3001 is OK: DISK OK
[03:21:40] <icinga-wm>	 RECOVERY - DPKG on bast3001 is OK: All packages OK
[03:21:40] <icinga-wm>	 RECOVERY - dhclient process on bast3001 is OK: PROCS OK: 0 processes with command name dhclient
[03:21:40] <icinga-wm>	 RECOVERY - salt-minion processes on bast3001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[03:21:41] <icinga-wm>	 RECOVERY - configured eth on bast3001 is OK: OK - interfaces up
[03:21:41] <icinga-wm>	 RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 35 minutes ago with 0 failures
[03:22:19] <icinga-wm>	 RECOVERY - SSH on bast3001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[03:29:09] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 259.28 seconds
[03:29:29] <icinga-wm>	 PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:36:49] <icinga-wm>	 RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[03:42:40] <icinga-wm>	 RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[03:57:29] <icinga-wm>	 RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[04:17:19] <icinga-wm>	 PROBLEM - puppet last run on es1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:45:19] <icinga-wm>	 RECOVERY - puppet last run on es1014 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[04:54:39] <icinga-wm>	 PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:34:39] <icinga-wm>	 PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:52:40] <icinga-wm>	 RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:03:39] <icinga-wm>	 RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:45:19] <icinga-wm>	 PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:14:19] <icinga-wm>	 RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[07:20:00] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Depool db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338707 (https://phabricator.wikimedia.org/T132416)
[07:23:29] <icinga-wm>	 PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.179 second response time
[07:25:29] <wikibugs>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338707 (https://phabricator.wikimedia.org/T132416) (owner: Marostegui)
[07:27:09] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Depool db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338707 (https://phabricator.wikimedia.org/T132416) (owner: Marostegui)
[07:27:18] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Depool db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338707 (https://phabricator.wikimedia.org/T132416) (owner: Marostegui)
[07:28:34] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2048 - T132416 (duration: 00m 41s)
[07:28:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:41] <stashbot>	 T132416: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416
[07:29:58] <marostegui>	 !log Deploy alter table on db2048 enwiki.revision - T132416
[07:30:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:42] <wikibugs>	 (PS1) Marostegui: db-codfw.php: Update ticket for db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338709
[07:36:36] <wikibugs>	 (CR) Marostegui: [C: 2] db-codfw.php: Update ticket for db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338709 (owner: Marostegui)
[07:37:42] <wikibugs>	 (Merged) jenkins-bot: db-codfw.php: Update ticket for db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338709 (owner: Marostegui)
[07:37:50] <wikibugs>	 (CR) jenkins-bot: db-codfw.php: Update ticket for db2048 [mediawiki-config] - https://gerrit.wikimedia.org/r/338709 (owner: Marostegui)
[07:39:47] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Update ticket number for db2048 depool reason (duration: 00m 44s)
[07:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:29] <icinga-wm>	 RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.165 second response time
[08:16:19] <wikibugs>	 (CR) ArielGlenn: [C: 2] Clean up temp files from page content dumps before retry [dumps] - https://gerrit.wikimedia.org/r/336849 (owner: ArielGlenn)
[08:17:08] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@d50e129]: cleanup tmp files before checkpoint file rerun
[08:17:10] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@d50e129]: cleanup tmp files before checkpoint file rerun (duration: 00m 02s)
[08:17:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:39] <icinga-wm>	 PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:41:30] <marostegui>	 !log Increase 100G dbstore1002 lv /dev/mapper/tank-data
[08:41:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:09] <gehel>	 !log restarting diamond on wdqs1002 after initial data import
[08:47:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:29] <icinga-wm>	 PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:37] <wikibugs>	 (PS4) Marostegui: mariadb: Add gtid_domain_id to s6 [puppet] - https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418)
[09:02:39] <icinga-wm>	 RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[09:10:13] <wikibugs>	 (PS1) Muehlenhoff: Blacklist kernel modules for DCCP protocol [puppet] - https://gerrit.wikimedia.org/r/338720
[09:15:33] <wikibugs>	 (CR) Jcrespo: [C: 1] mariadb: Add gtid_domain_id to s6 [puppet] - https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418) (owner: Marostegui)
[09:20:03] <wikibugs>	 (CR) Marostegui: "Compiles fine after the file path change: https://puppet-compiler.wmflabs.org/5506/" [puppet] - https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418) (owner: Marostegui)
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on restbase-dev1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on restbase-dev1001 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 1, Spare: 0 Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a CQL 10.64.0.36:9042 on restbase-dev1001 is CRITICAL: connect to address 10.64.0.36 and port 9042: Connection refused Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a SSL 10.64.0.36:7001 on restbase-dev1001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-a service on restbase-dev1001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b CQL 10.64.0.37:9042 on restbase-dev1001 is CRITICAL: connect to address 10.64.0.37 and port 9042: Connection refused Filippo Giunchedi raid degraded - T157425
[09:20:04] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b SSL 10.64.0.37:7001 on restbase-dev1001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused Filippo Giunchedi raid degraded - T157425
[09:20:05] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b service on restbase-dev1001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-b is failed Filippo Giunchedi raid degraded - T157425
[09:20:05] <icinga-wm>	 ACKNOWLEDGEMENT - puppet last run on restbase-dev1001 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 15 minutes ago with 3 failures. Failed resources (up to 3 shown): Service[cassandra-b],Service[cassandra-a],File[/srv/log/restbase/syslog.log] Filippo Giunchedi raid degraded - T157425
[09:21:29] <wikibugs>	 (CR) Hashar: [C: -1] "I would instead point to Wikitech which has far more informations: https://wikitech.wikimedia.org/wiki/Jouncebot" (1 comment) [wikimedia/bots/jouncebot] - https://gerrit.wikimedia.org/r/338700 (owner: Zppix)
[09:25:29] <icinga-wm>	 RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:25:52] <wikibugs>	 (CR) Marostegui: [C: 2] mariadb: Add gtid_domain_id to s6 [puppet] - https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418) (owner: Marostegui)
[09:28:26] <wikibugs>	 (PS1) Jcrespo: Add python3-mysql for the mariadb client servers [puppet/mariadb] - https://gerrit.wikimedia.org/r/338721
[09:31:32] <wikibugs>	 (PS2) Jcrespo: Add python3-pymysql for the mariadb client servers [puppet/mariadb] - https://gerrit.wikimedia.org/r/338721
[09:32:38] <wikibugs>	 (CR) Marostegui: [C: 1] Add python3-pymysql for the mariadb client servers [puppet/mariadb] - https://gerrit.wikimedia.org/r/338721 (owner: Jcrespo)
[09:32:58] <wikibugs>	 (CR) Jcrespo: [C: 2] Add python3-pymysql for the mariadb client servers [puppet/mariadb] - https://gerrit.wikimedia.org/r/338721 (owner: Jcrespo)
[09:33:13] <marostegui>	 !log Manually deploy gtid_domain_id on s6 hosts - T149418
[09:33:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:18] <stashbot>	 T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[09:35:02] <wikibugs>	 (PS1) Jcrespo: Rebase mariadb module to the latest version [puppet] - https://gerrit.wikimedia.org/r/338722
[09:37:04] <wikibugs>	 (CR) Jcrespo: [C: 2] Rebase mariadb module to the latest version [puppet] - https://gerrit.wikimedia.org/r/338722 (owner: Jcrespo)
[09:40:57] <wikibugs>	 (CR) Gehel: [C: 1] "It very much makes sense to not lint external code..." [puppet] - https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894) (owner: Hashar)
[09:41:48] <wikibugs>	 (CR) Hashar: "And I have another patch to let us run the syntax checks with Puppet 4.x  :-}" [puppet] - https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894) (owner: Hashar)
[09:46:40] <moritzm>	 !log upgrading mediawiki servers in codfw to HHVM 3.12.14
[09:46:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:52:10] <wikibugs>	 (CR) Zfilipin: [C: 1] build: allow usage of a different puppet version [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[09:52:33] <hashar>	 :}
[10:00:05] <wikibugs>	 (PS4) Hashar: syntax: ignore stdlib Puppet 4 manifests [puppet] - https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894)
[10:00:07] <wikibugs>	 (PS3) Hashar: build: allow usage of a different puppet version [puppet] - https://gerrit.wikimedia.org/r/338633
[10:01:13] <wikibugs>	 (PS1) Urbanecm: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/338726 (https://phabricator.wikimedia.org/T158432)
[10:01:40] <wikibugs>	 (CR) Hashar: "I made it so the PuppetSyntax ignores are only set for Puppet below 4.   With the follow up change https://gerrit.wikimedia.org/r/#/c/3386" [puppet] - https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894) (owner: Hashar)
[10:01:58] <Urbanecm>	 hashar, can you deploy 338726 please?
[10:02:08] <wikibugs>	 (PS1) ArielGlenn: fix prefetch setup for retries of content file dump steps [dumps] - https://gerrit.wikimedia.org/r/338728
[10:02:10] <Urbanecm>	 Or anybody else
[10:02:17] <hashar>	 Urbanecm: url ?  and not right now sorry
[10:02:21] <hashar>	 in a meeting
[10:02:31] <Urbanecm>	 https://gerrit.wikimedia.org/r/338726
[10:02:42] <Urbanecm>	 It's a last-minute throttle rule for T158432
[10:02:42] <stashbot>	 T158432: Lift IP registration cap for an event on 2017-02-20 [IP address currently unknown] - https://phabricator.wikimedia.org/T158432
[10:02:44] <Urbanecm>	 hashar, ^
[10:03:57] <Urbanecm>	 Oh, sorry, I didn't notice the "not right". I'll try to find anyone else now...
[10:04:04] <Urbanecm>	 *someone
[10:04:31] <hashar>	 ah easy
[10:05:58] <Urbanecm>	 So would you? Or what should I do?
[10:06:02] <Urbanecm>	 If something
[10:08:00] <tabbycat>	 maybe people should face que consequences of not being diligent enough to request those IP cap lifts in due time
[10:08:12] <tabbycat>	 just sayin'
[10:08:38] <tabbycat>	 s/que/the
[10:11:17] <wikibugs>	 (PS1) Muehlenhoff: Removed LDAP access for siddarth11 [puppet] - https://gerrit.wikimedia.org/r/338731
[10:11:20] <wikibugs>	 (PS2) ArielGlenn: fix prefetch setup for retries of content file dump steps [dumps] - https://gerrit.wikimedia.org/r/338728
[10:11:57] <Urbanecm>	 tabbycat, yeah, they should. They should give IPs in due time at least. But how to tell them...
[10:12:23] <wikibugs>	 (PS1) Ema: Release 4.1.5-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/338732
[10:13:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] Release 4.1.5-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/338732 (owner: Ema)
[10:15:58] <ema>	 hashar: mmh there seems to be something wrong with the debian-glue jenkins job ^
[10:16:01] <ema>	 https://integration.wikimedia.org/ci/job/debian-glue/620/consoleText
[10:16:09] <ema>	 E: Unknown operation: --buildresult
[10:16:53] <wikibugs>	 (PS1) Filippo Giunchedi: Revert "diamond: switch to graphite2001" [puppet] - https://gerrit.wikimedia.org/r/338733 (https://phabricator.wikimedia.org/T157022)
[10:18:57] <wikibugs>	 (CR) Hashar: [C: 2] Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/338726 (https://phabricator.wikimedia.org/T158432) (owner: Urbanecm)
[10:19:20] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Update to 4.4.49 [debs/linux44] - https://gerrit.wikimedia.org/r/338358 (owner: Muehlenhoff)
[10:20:26] <wikibugs>	 (Merged) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/338726 (https://phabricator.wikimedia.org/T158432) (owner: Urbanecm)
[10:20:34] <wikibugs>	 (CR) jenkins-bot: Add new throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/338726 (https://phabricator.wikimedia.org/T158432) (owner: Urbanecm)
[10:24:04] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/throttle.php: Add new throttle rule - T158432 (duration: 00m 49s)
[10:24:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:11] <stashbot>	 T158432: Lift IP registration cap for an event on 2017-02-20 [IP address currently unknown] - https://phabricator.wikimedia.org/T158432
[10:24:21] <hashar>	 Urbanecm: done !
[10:24:47] <wikibugs>	 (CR) ArielGlenn: [C: 2] fix prefetch setup for retries of content file dump steps [dumps] - https://gerrit.wikimedia.org/r/338728 (owner: ArielGlenn)
[10:25:56] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Removed LDAP access for siddarth11 [puppet] - https://gerrit.wikimedia.org/r/338731 (owner: Muehlenhoff)
[10:26:31] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@dee43ca]: fix prefetch on retries of partially complete page content dumps
[10:26:33] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@dee43ca]: fix prefetch on retries of partially complete page content dumps (duration: 00m 02s)
[10:26:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:10] <wikibugs>	 (PS1) Marostegui: mariadb: Add gtid_domain_id to s2 [puppet] - https://gerrit.wikimedia.org/r/338734
[10:28:22] <wikibugs>	 (CR) Gehel: [C: 1] "makes sense to me..." [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[10:28:35] <wikibugs>	 (CR) Marostegui: [C: -1] "Wait a few days till we are sure s6 is fine" [puppet] - https://gerrit.wikimedia.org/r/338734 (owner: Marostegui)
[10:36:22] <wikibugs>	 (PS2) Marostegui: mariadb: Add gtid_domain_id to s2 [puppet] - https://gerrit.wikimedia.org/r/338734 (https://phabricator.wikimedia.org/T149418)
[10:38:36] <wikibugs>	 (CR) Marostegui: [C: -1] "Compiles fine and only changes s2 hosts https://puppet-compiler.wmflabs.org/5507/" [puppet] - https://gerrit.wikimedia.org/r/338734 (https://phabricator.wikimedia.org/T149418) (owner: Marostegui)
[10:43:40] <icinga-wm>	 PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[10:54:05] <moritzm>	 !log rolling restart of nginx on remaining mediawiki servers in eqiad to pick up openssl update
[10:54:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:12] <wikibugs>	 (PS2) Filippo Giunchedi: Revert "diamond: switch to graphite2001" [puppet] - https://gerrit.wikimedia.org/r/338733 (https://phabricator.wikimedia.org/T157022)
[10:59:37] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] Revert "diamond: switch to graphite2001" [puppet] - https://gerrit.wikimedia.org/r/338733 (https://phabricator.wikimedia.org/T157022) (owner: Filippo Giunchedi)
[11:00:11] <godog>	 !log switch diamond traffic to graphite1001 - T157022
[11:00:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:16] <stashbot>	 T157022: Suspected faulty SSD on graphite1001 - https://phabricator.wikimedia.org/T157022
[11:01:33] <wikibugs>	 (CR) Volans: [C: -1] "A small bug and a couple of other comments inline." (3 comments) [software/conftool] - https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) (owner: Giuseppe Lavagetto)
[11:04:49] <wikibugs>	 (PS2) Ema: Release 4.1.5-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/338732
[11:04:58] <wikibugs>	 (CR) Hashar: build: allow usage of a different puppet version (1 comment) [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[11:05:07] <wikibugs>	 (PS1) Addshore: Enable TwoColConflict extension on arwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/338738 (https://phabricator.wikimedia.org/T158493)
[11:11:39] <icinga-wm>	 RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[11:13:01] <wikibugs>	 (PS21) Volans: Cumin: allow connection to the targets [puppet] - https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588)
[11:15:20] <tabbycat>	 Urbanecm: simply "declined - can't be done, please provide the data at least 10 days in advance"
[11:15:22] <tabbycat>	 ;)
[11:23:36] <wikibugs>	 (PS1) Filippo Giunchedi: cache: move graphite/performance to graphite1001 [puppet] - https://gerrit.wikimedia.org/r/338745 (https://phabricator.wikimedia.org/T157022)
[11:32:48] <wikibugs>	 (PS1) ArielGlenn: make empty list check shorter and clearer [dumps] - https://gerrit.wikimedia.org/r/338747
[11:39:17] <wikibugs>	 (CR) Harej: [C: 1] Bump timeout to 1 minute [puppet] - https://gerrit.wikimedia.org/r/338473 (https://phabricator.wikimedia.org/T158184) (owner: Smalyshev)
[11:49:48] <wikibugs>	 (CR) Hashar: "recheck" [software/cumin] - https://gerrit.wikimedia.org/r/338382 (https://phabricator.wikimedia.org/T154588) (owner: Volans)
[11:50:48] <wikibugs>	 (CR) Hashar: "recheck" [software/cumin] (debian) - https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: Volans)
[11:52:24] <wikibugs>	 (CR) Zfilipin: [C: 1] build: allow usage of a different puppet version (1 comment) [puppet] - https://gerrit.wikimedia.org/r/338633 (owner: Hashar)
[12:00:47] <wikibugs>	 (PS2) Filippo Giunchedi: cache: move graphite/performance to graphite1001 [puppet] - https://gerrit.wikimedia.org/r/338745 (https://phabricator.wikimedia.org/T157022)
[12:00:49] <wikibugs>	 (PS1) Filippo Giunchedi: udpmirror: encode line before sending [puppet] - https://gerrit.wikimedia.org/r/338750
[12:03:04] <wikibugs>	 (PS2) Filippo Giunchedi: udpmirror: encode line before sending [puppet] - https://gerrit.wikimedia.org/r/338750
[12:05:58] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] udpmirror: encode line before sending [puppet] - https://gerrit.wikimedia.org/r/338750 (owner: Filippo Giunchedi)
[12:07:23] <wikibugs>	 (PS1) MarcoAurelio: Configuration changes for wikitech.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/338751 (https://phabricator.wikimedia.org/T158516)
[12:11:23] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 1] Blacklist kernel modules for DCCP protocol [puppet] - https://gerrit.wikimedia.org/r/338720 (owner: Muehlenhoff)
[12:12:27] <wikibugs>	 (PS2) Muehlenhoff: ldap::client::utils: Move to require_package [puppet] - https://gerrit.wikimedia.org/r/338320
[12:12:58] <wikibugs>	 (PS2) Muehlenhoff: Blacklist kernel modules for DCCP protocol [puppet] - https://gerrit.wikimedia.org/r/338720
[12:14:59] <icinga-wm>	 PROBLEM - Disk space on graphite1001 is CRITICAL: DISK CRITICAL - free space: / 918 MB (2% inode=97%)
[12:15:37] <volans>	 godog: FYI ^^^
[12:16:54] <godog>	 sigh, thanks volans 
[12:17:02] <volans>	 21G    daemon.log
[12:17:03] <volans>	 21G    syslog
[12:17:08] <volans>	 :(
[12:17:16] <volans>	 it's full :/
[12:17:28] <tabbycat>	 delete?
[12:17:39] <volans>	 seems Too many open files godog 
[12:17:51] <godog>	 yeah I'm looking too
[12:18:23] <wikibugs>	 (CR) Muehlenhoff: [C: 2] Blacklist kernel modules for DCCP protocol [puppet] - https://gerrit.wikimedia.org/r/338720 (owner: Muehlenhoff)
[12:19:59] <icinga-wm>	 RECOVERY - Disk space on graphite1001 is OK: DISK OK
[12:20:56] <godog>	 !log remove syslog from graphite1001, bump max open files for carbon-c-relay
[12:21:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:19] <wikibugs>	 (PS3) Filippo Giunchedi: cache: move graphite/performance to graphite1001 [puppet] - https://gerrit.wikimedia.org/r/338745 (https://phabricator.wikimedia.org/T157022)
[12:26:21] <wikibugs>	 (PS1) Filippo Giunchedi: graphite: increase maximum open files for frontend carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/338753
[12:26:27] <wikibugs>	 (PS1) ArielGlenn: for page content dumps, for each numbered part do either ranges or whole dump [dumps] - https://gerrit.wikimedia.org/r/338754 (https://phabricator.wikimedia.org/T158517)
[12:30:26] <wikibugs>	 (CR) Filippo Giunchedi: [V: 2 C: 2] graphite: increase maximum open files for frontend carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/338753 (owner: Filippo Giunchedi)
[12:31:15] <wikibugs>	 (CR) jerkins-bot: [V: -1] for page content dumps, for each numbered part do either ranges or whole dump [dumps] - https://gerrit.wikimedia.org/r/338754 (https://phabricator.wikimedia.org/T158517) (owner: ArielGlenn)
[12:32:09] <icinga-wm>	 PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 60807 MB (12% inode=99%)
[12:37:11] <wikibugs>	 (PS1) DCausse: Elastic 5.2.1 plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/338756
[12:39:05] <wikibugs>	 (CR) Faidon Liambotis: [C: 2] Add debian/ directory for packaging (1 comment) [software/cumin] (debian) - https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: Volans)
[12:40:51] <wikibugs>	 (CR) DCausse: [C: -1] Elastic 5.2.1 plugins [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/338756 (owner: DCausse)
[12:41:54] <wikibugs>	 (Merged) jenkins-bot: Add debian/ directory for packaging [software/cumin] (debian) - https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: Volans)
[12:45:09] <icinga-wm>	 PROBLEM - Disk space on elastic1018 is CRITICAL: DISK CRITICAL - free space: /srv 62276 MB (12% inode=99%)
[12:45:31] <tabbycat>	 godog: ^?
[12:53:17] <wikibugs>	 (PS2) ArielGlenn: for page content dumps, for each numbered part do either ranges or whole dump [dumps] - https://gerrit.wikimedia.org/r/338754 (https://phabricator.wikimedia.org/T158517)
[12:56:03] <wikibugs>	 (CR) Faidon Liambotis: [C: -1] "This code is ugly, as it lookupvars() the facts twice. You should integrate the check better with the rest of the code." [puppet] - https://gerrit.wikimedia.org/r/308882 (owner: Hashar)
[12:56:29] <wikibugs>	 (PS2) Faidon Liambotis: Move the Diamond NTP collector to ntp::daemon [puppet] - https://gerrit.wikimedia.org/r/338333 (owner: Muehlenhoff)
[12:56:42] <wikibugs>	 (CR) Faidon Liambotis: [C: 2] Move the Diamond NTP collector to ntp::daemon [puppet] - https://gerrit.wikimedia.org/r/338333 (owner: Muehlenhoff)
[12:59:18] <wikibugs>	 (PS2) Faidon Liambotis: mirrors: update archvsync to 20170204 [puppet] - https://gerrit.wikimedia.org/r/338383
[13:00:47] <wikibugs>	 (CR) Faidon Liambotis: [V: 2 C: 2] mirrors: update archvsync to 20170204 [puppet] - https://gerrit.wikimedia.org/r/338383 (owner: Faidon Liambotis)
[13:03:53] <Urbanecm>	 jouncebot, next
[13:03:53] <jouncebot>	 No deployments scheduled for the forseeable future!
[13:04:50] <moritzm>	 !log installing jasper security updates
[13:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:52] <wikibugs>	 (PS2) Faidon Liambotis: apt.w.o: redirect / to wikitech article [puppet] - https://gerrit.wikimedia.org/r/330140 (owner: Ema)
[13:05:58] <wikibugs>	 (CR) Faidon Liambotis: [V: 2 C: 2] apt.w.o: redirect / to wikitech article [puppet] - https://gerrit.wikimedia.org/r/330140 (owner: Ema)
[13:06:26] <wikibugs>	 (CR) Volans: Add schema support (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) (owner: Giuseppe Lavagetto)
[13:08:09] <icinga-wm>	 RECOVERY - Disk space on elastic1018 is OK: DISK OK
[13:11:59] <wikibugs>	 (03PS6) 10Faidon Liambotis: Linting changes for docker/etcd/kubernetes profiles [puppet] - 10https://gerrit.wikimedia.org/r/334303 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:12:14] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Linting changes for docker/etcd/kubernetes profiles [puppet] - 10https://gerrit.wikimedia.org/r/334303 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:13:59] <wikibugs>	 (03CR) 10Volans: "recheck" [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans)
[13:14:45] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Add schema support (033 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) (owner: 10Giuseppe Lavagetto)
[13:15:02] <wikibugs>	 (03PS7) 10Faidon Liambotis: Linting changes for docker/etcd/kubernetes profiles [puppet] - 10https://gerrit.wikimedia.org/r/334303 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:15:49] <wikibugs>	 (03PS1) 10Gehel: elasticsearch - reimage elastic10(25|28|29|30) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338761 (https://phabricator.wikimedia.org/T151326)
[13:16:49] <icinga-wm>	 PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:17:40] <wikibugs>	 (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic10(25|28|29|30) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338761 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel)
[13:17:46] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic10(25|28|29|30).eqiad.wmnet
[13:17:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:05] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040538 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1025.eqiad.wmnet'] ``` The...
[13:21:27] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040539 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1028.eqiad.wmnet'] ``` The...
[13:21:31] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040540 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1029.eqiad.wmnet'] ``` The...
[13:21:52] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040541 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1030.eqiad.wmnet'] ``` The...
[13:24:29] <icinga-wm>	 PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:26:30] <wikibugs>	 (03PS1) 10Faidon Liambotis: aptrepo: remove most external sources from precise [puppet] - 10https://gerrit.wikimedia.org/r/338762
[13:27:00] <wikibugs>	 (03CR) 10Faidon Liambotis: [V: 032 C: 032] aptrepo: remove most external sources from precise [puppet] - 10https://gerrit.wikimedia.org/r/338762 (owner: 10Faidon Liambotis)
[13:28:13] <wikibugs>	 (03PS2) 10Hashar: [throttle] New rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338128 (https://phabricator.wikimedia.org/T158312) (owner: 10Urbanecm)
[13:29:07] <hashar>	 Urbanecm: can you check my rebase / conflict fix on https://gerrit.wikimedia.org/r/338128 please ?
[13:29:09] <hashar>	 and I will deploy it
[13:30:02] <Urbanecm>	 Yep, working on it. 
[13:30:04] <wikibugs>	 06Operations: Manage apt sources via puppet? - https://phabricator.wikimedia.org/T158562#3040563 (10MoritzMuehlenhoff)
[13:31:19] <icinga-wm>	 PROBLEM - Disk space on elastic1023 is CRITICAL: DISK CRITICAL - free space: /srv 60914 MB (12% inode=99%)
[13:33:03] <hashar>	 Urbanecm: triple checked and it looks good to me
[13:34:43] <wikibugs>	 (03CR) 10Faidon Liambotis: "LGTM to me, would love to see a CR from Luca/Andrew." [puppet] - 10https://gerrit.wikimedia.org/r/334317 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:35:01] <Urbanecm>	 Yeah, looks good. 
[13:35:06] <wikibugs>	 (03CR) 10Hashar: [C: 032] [throttle] New rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338128 (https://phabricator.wikimedia.org/T158312) (owner: 10Urbanecm)
[13:35:10] <marostegui>	 !log Transferring dbstore1001:/srv/backups (the last 2 backups) to dbstore2001:/srv/backup/dbstore1001 - T153768
[13:35:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:16] <stashbot>	 T153768: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768
[13:36:42] <wikibugs>	 (03CR) 10Faidon Liambotis: [C: 031] "LGTM, anyone else?" [puppet] - 10https://gerrit.wikimedia.org/r/334303 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:36:57] <wikibugs>	 (03Merged) 10jenkins-bot: [throttle] New rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338128 (https://phabricator.wikimedia.org/T158312) (owner: 10Urbanecm)
[13:37:42] <wikibugs>	 (03CR) 10jenkins-bot: [throttle] New rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338128 (https://phabricator.wikimedia.org/T158312) (owner: 10Urbanecm)
[13:40:38] <hashar>	 tested on mwdebug1001
[13:40:48] <Urbanecm>	 hashar, throttle rules are testable?
[13:41:02] <hashar>	 well 
[13:41:08] <hashar>	 just making sure that the site is not fatalling out :}
[13:41:11] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/throttle.php: [throttle] New rule - T158312 (duration: 00m 42s)
[13:41:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:17] <stashbot>	 T158312: Lift registration cap from an IP on en.wp for event on 2017-03-08 - https://phabricator.wikimedia.org/T158312
[13:41:23] <Urbanecm>	 Understood.
[13:41:23] <hashar>	 Urbanecm: deployed!
[13:41:28] <Urbanecm>	 Thanks for the deploy!
[13:43:49] <icinga-wm>	 RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[13:45:39] <icinga-wm>	 PROBLEM - Check systemd state on mx2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[13:45:49] <icinga-wm>	 PROBLEM - Exim SMTP on mx2001 is CRITICAL: connect to address 208.80.153.45 and port 25: Connection refused
[13:46:40] <icinga-wm>	 RECOVERY - Check systemd state on mx2001 is OK: OK - running: The system is fully operational
[13:46:49] <icinga-wm>	 RECOVERY - Exim SMTP on mx2001 is OK: OK - Certificate mail.wikimedia.org will expire on Mon 23 Oct 2017 06:01:00 PM UTC.
[13:48:29] <icinga-wm>	 RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 4 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:49:29] <moritzm>	 !log installing remaining lcms security updates
[13:49:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:08] <wikibugs>	 (03CR) 10DCausse: "PS9 compiler output: https://puppet-compiler.wmflabs.org/5509" [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson)
[13:52:54] <gehel>	 !log resetting ownership of new .wsp files for wdqs1002 on graphite[12]001
[13:53:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:05] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040642 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1028.eqiad.wmnet'] ```  and were **ALL** successful.
[13:53:29] <gehel>	 godog: ^ I found 2 files with strange ownership on graphite servers (owned by root instead of _graphite)
[13:54:07] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040655 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1030.eqiad.wmnet'] ```  Of which those **FAILED**: ``` set(['elastic1030.eqi...
[13:54:09] <phuedx>	 hashar: are there any swats going on today?
[13:54:21] <phuedx>	 (sorry for the direct ping, but you seem active ;) )
[13:54:34] <hashar>	 phuedx: yeah I was talking about it in -releng
[13:54:44] <hashar>	 supposedly there is no deployment on a US holiday
[13:54:49] <phuedx>	 ah
[13:55:01] <phuedx>	 no worries
[13:55:02] <hashar>	 but I guess if they are very trivial  we can do it :)
[13:55:02] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040673 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1025.eqiad.wmnet'] ```  and were **ALL** successful.
[13:55:09] <gehel>	 godog: I assume this is linked to the disk space issue (those metrics were created around that time)
[13:55:14] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3040676 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1029.eqiad.wmnet'] ```  and were **ALL** successful.
[13:55:25] <hashar>	 the one I pushed was a throttling rule, which is well covered with tests and imho can be done at any time
[13:55:39] <icinga-wm>	 PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:55:52] <hashar>	 phuedx: so depends on the patch you wanna push and how confident we feel about pushing it now :}
[13:56:15] <addshore>	 hashar: ahhh, is that why there is no deployment calendar?
[13:56:22] <hashar>	 I guess
[13:56:26] <phuedx>	 hashar: i've just come back off of holiday and the folk responsible for the change are on holiday today ;)
[13:56:36] <phuedx>	 i also work a short monday
[13:56:41] <hashar>	 I think usually greg updates the [[Deployments]] page on Friday
[13:56:48] <phuedx>	 so, those things considered, i'll hold off until tomo
[13:57:14] <hashar>	 phuedx: sounds better :)  and we can pair it together tomorrow
[13:57:32] <wikibugs>	 (03CR) 10Faidon Liambotis: [C: 04-1] "Yeah, I had a closer look: this won't work. GnuTLS expects a cipher string different than OpenSSL's, so our cipher list won't work. The ma" [puppet] - 10https://gerrit.wikimedia.org/r/335232 (owner: 10BBlack)
[13:57:34] <phuedx>	 \o/
[13:58:38] <wikibugs>	 (03CR) 10Faidon Liambotis: [C: 031] Linting fixes (multiple modules) [puppet] - 10https://gerrit.wikimedia.org/r/334317 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys)
[13:59:05] <wikibugs>	 (03CR) 10Ema: [C: 032] Release 4.1.5-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/338732 (owner: 10Ema)
[13:59:11] <wikibugs>	 (03CR) 10Ema: [V: 032 C: 032] Release 4.1.5-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/338732 (owner: 10Ema)
[14:00:07] <wikibugs>	 (03PS3) 10Muehlenhoff: ldap::client::utils: Move to require_package [puppet] - 10https://gerrit.wikimedia.org/r/338320
[14:03:33] <icinga-wm>	 PROBLEM - puppet last run on wtp1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:03:33] <icinga-wm>	 PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.200 second response time
[14:06:58] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] ldap::client::utils: Move to require_package [puppet] - 10https://gerrit.wikimedia.org/r/338320 (owner: 10Muehlenhoff)
[14:08:23] <icinga-wm>	 RECOVERY - Disk space on elastic1023 is OK: DISK OK
[14:10:16] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic10(25|28|29|30).eqiad.wmnet
[14:10:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:33] <icinga-wm>	 RECOVERY - puppet last run on analytics1028 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[14:25:28] <wikibugs>	 (03PS1) 10Gehel: elasticsearch - reimage elastic10(26|31|36|40) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338768 (https://phabricator.wikimedia.org/T151326)
[14:25:56] <wikibugs>	 (03Abandoned) 10Gehel: jessie installs: adding rootdelay=90 to kernel options [puppet] - 10https://gerrit.wikimedia.org/r/337804 (https://phabricator.wikimedia.org/T149845) (owner: 10Gehel)
[14:28:08] <wikibugs>	 (03PS1) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[14:30:06] <ema>	 !log varnish 4.1.5-1wm1 uploaded to apt.w.o
[14:30:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:33] <icinga-wm>	 RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.232 second response time
[14:30:40] <wikibugs>	 (03CR) 10Volans: "From a quick look around seems with few lines is possible to get the equivalence of the ciphers based on the hex IDs." [puppet] - 10https://gerrit.wikimedia.org/r/335232 (owner: 10BBlack)
[14:31:33] <icinga-wm>	 RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[14:31:33] <icinga-wm>	 PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:31:47] <wikibugs>	 (03PS10) 10Gehel: Update elasticsearch module for es5 compatability [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson)
[14:32:00] <ema>	 !log upgrading pinkunicorn to varnish 4.1.5-1wm1
[14:32:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:38] <wikibugs>	 (03CR) 10Gehel: [C: 031] Elastic 5.2.1 plugins [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/338756 (owner: 10DCausse)
[14:39:47] <wikibugs>	 (03CR) 10Gehel: [C: 032] Update elasticsearch module for es5 compatability [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson)
[14:41:24] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: Degraded RAID on relforge1001 - https://phabricator.wikimedia.org/T156663#3040850 (10Gehel) @Cmjohnson any news on that disk?
[14:42:30] <wikibugs>	 (03PS2) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[14:43:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:43:33] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 34 seconds ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:43:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:43:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2022 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:44:11] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: relforge is still on elasticsearch 2.x for a few days [puppet] - 10https://gerrit.wikimedia.org/r/338774
[14:44:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:44:32] <gehel>	 puppet failures above are mine... checking
[14:44:43] <icinga-wm>	 PROBLEM - puppet last run on elastic1034 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:44:53] <icinga-wm>	 PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:45:09] <godog>	 gehel: shouldn't be linked no, what files?
[14:45:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:45:45] <wikibugs>	 (03PS3) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[14:46:12] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: do not manage plugin directory yet [puppet] - 10https://gerrit.wikimedia.org/r/338775
[14:46:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:46:43] <icinga-wm>	 PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:46:51] <gehel>	 godog: just a sec, fixing my crap...
[14:47:01] <wikibugs>	 (03CR) 10Gehel: [V: 032 C: 032] elasticsearch: do not manage plugin directory yet [puppet] - 10https://gerrit.wikimedia.org/r/338775 (owner: 10Gehel)
[14:47:10] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "Cherry picked PS3 on the CI puppet master.  Play testing it on the instance saucelabs-01.integration.eqiad.wmflabs" [puppet] - 10https://gerrit.wikimedia.org/r/338770 (owner: 10Hashar)
[14:47:33] <icinga-wm>	 PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:47:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:47:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2033 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:48:34] <icinga-wm>	 PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:48:35] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: do not manage plugin dir yet [puppet] - 10https://gerrit.wikimedia.org/r/338776
[14:48:44] <wikibugs>	 (03CR) 10Gehel: [V: 032 C: 032] elasticsearch: do not manage plugin dir yet [puppet] - 10https://gerrit.wikimedia.org/r/338776 (owner: 10Gehel)
[14:49:19] <ema>	 !log cp2002, cp4008: libssl1.1 upgraded to 1.1.0e-1+wmf1 and libevent-2.0-5 upgraded to 2.0.21-stable-2+deb8u1
[14:49:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:49:24] <icinga-wm>	 PROBLEM - puppet last run on elastic1040 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:49:34] <icinga-wm>	 PROBLEM - puppet last run on elastic1049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:49:53] <icinga-wm>	 PROBLEM - puppet last run on elastic2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:50:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1045 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/elasticsearch/plugins]
[14:50:43] <icinga-wm>	 PROBLEM - puppet last run on elastic2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:50:44] <icinga-wm>	 PROBLEM - puppet last run on elastic2032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:50:53] <icinga-wm>	 RECOVERY - puppet last run on elastic2008 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[14:51:23] <icinga-wm>	 PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:51:27] <wikibugs>	 (03PS2) 10Gehel: elasticsearch: relforge is still on elasticsearch 2.x for a few days [puppet] - 10https://gerrit.wikimedia.org/r/338774
[14:51:33] <icinga-wm>	 RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[14:51:52] <wikibugs>	 (03PS4) 10Filippo Giunchedi: cache: move graphite/performance to graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/338745 (https://phabricator.wikimedia.org/T157022)
[14:53:19] <gehel>	 godog: the graphite .wsp ownership was probably a mistake during the move of those metrics last week. It was fine on other servers, and as this one was being reimaged (actually importing data), I did not check it until today
[14:55:05] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1002.eqiad.wmnet
[14:55:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:09] <godog>	 gehel: ok! let me know what metric/file is affected if you see the same again
[14:56:23] <gehel>	 godog: yep!
[14:56:35] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] cache: move graphite/performance to graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/338745 (https://phabricator.wikimedia.org/T157022) (owner: 10Filippo Giunchedi)
[14:56:58] <gehel>	 godog: I'm confident that this was me making a mistake while moving those. No need to dig further.
[14:58:08] <godog>	 gehel: ack, FWIW to avoid similar things I usually su -s /bin/bash _graphite
[14:58:36] <gehel>	 godog: yeah, that's a solution...
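Editor's note: the exchange above concerns .wsp files that ended up owned by root instead of _graphite. As a minimal sketch of how such files could be spotted after the fact, the hypothetical helper below walks a directory tree and lists files whose owner differs from the expected one. The function name `find_misowned` and the `.wsp` default suffix are illustrative assumptions, not a tool used in the channel.

```python
import os
from pathlib import Path


def find_misowned(root, expected_uid, suffix=".wsp"):
    """Return sorted paths under `root` that end in `suffix` and are
    owned by a uid other than `expected_uid`.

    Hypothetical helper for illustration only; not part of any
    Graphite or Wikimedia tooling.
    """
    misowned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = Path(dirpath) / name
            # lstat() inspects the entry itself and does not follow symlinks.
            if path.lstat().st_uid != expected_uid:
                misowned.append(str(path))
    return sorted(misowned)
```

On a Unix host the expected uid could be looked up with `pwd.getpwnam("_graphite").pw_uid` and passed in together with the whisper data directory. The path of that directory is not given in the log, so it is left to the caller.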
[14:58:58] <wikibugs>	 (03PS3) 10Gehel: elasticsearch: relforge is still on elasticsearch 2.x for a few days [puppet] - 10https://gerrit.wikimedia.org/r/338774
[14:59:35] <wikibugs>	 06Operations, 13Patch-For-Review: Upgrade fluorine to trusty/jessie - https://phabricator.wikimedia.org/T123728#3040931 (10fgiunchedi)
[15:00:33] <icinga-wm>	 RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[15:02:06] <wikibugs>	 (03PS1) 10Gehel: elasticsearch: force the creation of the plugins directory symlink [puppet] - 10https://gerrit.wikimedia.org/r/338781
[15:03:30] <wikibugs>	 (03CR) 10Gehel: [C: 032] elasticsearch: relforge is still on elasticsearch 2.x for a few days [puppet] - 10https://gerrit.wikimedia.org/r/338774 (owner: 10Gehel)
[15:05:34] <icinga-wm>	 RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[15:09:23] <icinga-wm>	 PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:10:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2022 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[15:11:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1047 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[15:11:53] <icinga-wm>	 RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[15:12:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[15:12:33] <icinga-wm>	 RECOVERY - puppet last run on elastic1034 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[15:12:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2017 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[15:12:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[15:14:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[15:14:43] <icinga-wm>	 RECOVERY - puppet last run on elastic1046 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[15:15:33] <icinga-wm>	 RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[15:15:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2033 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[15:15:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2004 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[15:16:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[15:17:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1045 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[15:17:24] <icinga-wm>	 RECOVERY - puppet last run on elastic1040 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[15:17:33] <icinga-wm>	 RECOVERY - puppet last run on elastic1049 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[15:18:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2013 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[15:19:43] <icinga-wm>	 RECOVERY - puppet last run on elastic2032 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[15:20:23] <icinga-wm>	 RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[15:30:24] <wikibugs>	 06Operations, 10DBA, 10procurement: New DBs purchase: codfw and eqiad final figures - https://phabricator.wikimedia.org/T158580#3040989 (10Marostegui)
[15:41:23] <icinga-wm>	 RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[16:05:18] <wikibugs>	 (03PS4) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[16:18:23] <icinga-wm>	 PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:20:06] <wikibugs>	 (03PS13) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823)
[16:28:27] <wikibugs>	 (03PS5) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[16:28:37] <wikibugs>	 (03PS1) 10KartikMistry: Deploy Compact Language Links in Swedish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338790 (https://phabricator.wikimedia.org/T157114)
[16:31:53] <wikibugs>	 (03PS1) 10Reedy: Disable DisableAccount on two wikis were no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067)
[16:36:14] <wikibugs>	 (03Abandoned) 10Ema: Allow misc directors to specify url path conditions as well as Host conditions [puppet] - 10https://gerrit.wikimedia.org/r/322964 (owner: 10Ottomata)
[16:39:54] <wikibugs>	 06Operations, 10DBA: Puppetize grants for mysql backups on dbstore hosts - https://phabricator.wikimedia.org/T111929#3041133 (10jcrespo)
[16:39:57] <wikibugs>	 (03PS2) 10Reedy: Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067)
[16:40:32] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic10(26|31|36|40).eqiad.wmnet
[16:40:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:37] <wikibugs>	 (03PS2) 10Gehel: elasticsearch - reimage elastic10(26|31|36|40) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338768 (https://phabricator.wikimedia.org/T151326)
[16:41:08] <wikibugs>	 06Operations: Restructure our internal repositories further - https://phabricator.wikimedia.org/T158583#3041135 (10MoritzMuehlenhoff)
[16:47:12] <wikibugs>	 (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic10(26|31|36|40) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338768 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel)
[16:47:23] <icinga-wm>	 RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[16:47:50] <wikibugs>	 (03PS1) 10Jcrespo: Add python3-tabulate package and labsdb password for clients-only [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/338793 (https://phabricator.wikimedia.org/T146149)
[16:56:36] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041166 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1040.eqiad.wmnet'] ``` The...
[16:57:33] <icinga-wm>	 PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating
[17:04:10] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: rebase module to the latest version and separate labs pass [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900)
[17:04:57] <wikibugs>	 (03PS2) 10Jcrespo: Add python3-tabulate package and labsdb password for clients-only [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/338793 (https://phabricator.wikimedia.org/T146149)
[17:06:31] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: rebase module to the latest version and separate labs pass [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900)
[17:07:32] <wikibugs>	 (03CR) 10Marostegui: [C: 031] Add python3-tabulate package and labsdb password for clients-only [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/338793 (https://phabricator.wikimedia.org/T146149) (owner: 10Jcrespo)
[17:16:10] <wikibugs>	 (03CR) 10Marostegui: [C: 031] "Looks good: https://puppet-compiler.wmflabs.org/5510/" [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900) (owner: 10Jcrespo)
[17:16:12] <wikibugs>	 (03CR) 10Jcrespo: [V: 032 C: 032] Add python3-tabulate package and labsdb password for clients-only [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/338793 (https://phabricator.wikimedia.org/T146149) (owner: 10Jcrespo)
[17:16:28] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: rebase module to the latest version and separate labs pass [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900)
[17:18:19] <wikibugs>	 (03PS4) 10Jcrespo: mariadb: rebase module to the latest version and separate labs pass [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900)
[17:20:17] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041210 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1040.eqiad.wmnet'] ```  and were **ALL** successful.
[17:22:46] <wikibugs>	 (03PS1) 10Jcrespo: Add labsdb root pass fake string to make puppet compiler work [labs/private] - 10https://gerrit.wikimedia.org/r/338800 (https://phabricator.wikimedia.org/T104900)
[17:24:44] <wikibugs>	 (03CR) 10Marostegui: [C: 031] Add labsdb root pass fake string to make puppet compiler work [labs/private] - 10https://gerrit.wikimedia.org/r/338800 (https://phabricator.wikimedia.org/T104900) (owner: 10Jcrespo)
[17:25:26] <wikibugs>	 (03CR) 10Jcrespo: [V: 032 C: 032] Add labsdb root pass fake string to make puppet compiler work [labs/private] - 10https://gerrit.wikimedia.org/r/338800 (https://phabricator.wikimedia.org/T104900) (owner: 10Jcrespo)
[17:27:03] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041214 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1036.eqiad.wmnet'] ``` The...
[17:27:07] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041215 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1031.eqiad.wmnet'] ``` The...
[17:29:27] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] "It removes the socket on all servers https://puppet-compiler.wmflabs.org/5511/db2034.codfw.wmnet/ , but I think that is something we want," [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900) (owner: 10Jcrespo)
[17:30:34] <wikibugs>	 (03CR) 10Jcrespo: [V: 032 C: 032] mariadb: rebase module to the latest version and separate labs pass [puppet] - 10https://gerrit.wikimedia.org/r/338797 (https://phabricator.wikimedia.org/T104900) (owner: 10Jcrespo)
[17:36:54] <wikibugs>	 (03PS1) 10Filippo Giunchedi: role: install apache mod_proxy_http [puppet] - 10https://gerrit.wikimedia.org/r/338803
[17:36:56] <wikibugs>	 (03PS1) 10Filippo Giunchedi: uwsgi: parametrize service settings [puppet] - 10https://gerrit.wikimedia.org/r/338804
[17:36:58] <wikibugs>	 (03PS1) 10Filippo Giunchedi: coal: disable uwsgi autoload [puppet] - 10https://gerrit.wikimedia.org/r/338805
[17:42:45] <icinga-wm>	 PROBLEM - keystone http on labtestcontrol2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 78 bytes in 0.073 second response time
[17:47:29] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041300 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1026.eqiad.wmnet'] ``` The...
[17:49:39] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041304 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1031.eqiad.wmnet'] ```  and were **ALL** successful.
[17:52:46] <Pchelolo>	 !log update change-prop to 30873ebd5
[17:52:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:00] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041321 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1036.eqiad.wmnet'] ```  and were **ALL** successful.
[17:54:28] <logmsgbot>	 !log ppchelko@tin Started deploy [changeprop/deploy@30873eb]: Update change-prop to 30873ebd5: enabling DNS caching for T158338
[17:54:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:34] <stashbot>	 T158338: Set up DNS caching for node services - https://phabricator.wikimedia.org/T158338
[17:56:10] <logmsgbot>	 !log ppchelko@tin Finished deploy [changeprop/deploy@30873eb]: Update change-prop to 30873ebd5: enabling DNS caching for T158338 (duration: 01m 41s)
[17:56:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:00] <wikibugs>	 (03Abandoned) 10Paladox: Up max_execution to 15 from 10 in phabricator/php.ini.erb [puppet] - 10https://gerrit.wikimedia.org/r/335714 (https://phabricator.wikimedia.org/T125357) (owner: 10Paladox)
[17:57:03] <wikibugs>	 (03Abandoned) 10Paladox: Up post_max_size to 50M in phabricator's php.ini file [puppet] - 10https://gerrit.wikimedia.org/r/335717 (owner: 10Paladox)
[17:57:25] <wikibugs>	 (03PS1) 10Volans: Improvements in the metadata and package setup [software/cumin] - 10https://gerrit.wikimedia.org/r/338808 (https://phabricator.wikimedia.org/T154588)
[17:57:49] <wikibugs>	 (03PS14) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823)
[17:57:52] <wikibugs>	 (03PS14) 10Paladox: Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475
[17:58:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5514/" [puppet] - 10https://gerrit.wikimedia.org/r/338805 (owner: 10Filippo Giunchedi)
[17:59:02] <wikibugs>	 (03PS2) 10Filippo Giunchedi: role: install apache mod_proxy_http [puppet] - 10https://gerrit.wikimedia.org/r/338803
[17:59:10] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] role: install apache mod_proxy_http [puppet] - 10https://gerrit.wikimedia.org/r/338803 (owner: 10Filippo Giunchedi)
[18:04:34] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 031] syntax: ignore stdlib Puppet 4 manifests [puppet] - 10https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar)
[18:04:41] <icinga-wm>	 PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py]
[18:08:57] <wikibugs>	 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3041361 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1026.eqiad.wmnet'] ```  and were **ALL** successful.
[18:14:36] <wikibugs>	 (03PS1) 10Jcrespo: [WIP] Create scripts for batch sql execution [puppet] - 10https://gerrit.wikimedia.org/r/338809
[18:15:13] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-2] "Not intended for puppet deploy." [puppet] - 10https://gerrit.wikimedia.org/r/338809 (owner: 10Jcrespo)
[18:15:40] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] [WIP] Create scripts for batch sql execution [puppet] - 10https://gerrit.wikimedia.org/r/338809 (owner: 10Jcrespo)
[18:17:21] <icinga-wm>	 PROBLEM - Disk space on elastic1023 is CRITICAL: DISK CRITICAL - free space: /srv 61935 MB (12% inode=99%)
[18:18:49] <wikibugs>	 (03PS2) 10Jcrespo: [WIP] Create scripts for batch sql execution [puppet] - 10https://gerrit.wikimedia.org/r/338809
[18:29:04] <wikibugs>	 (03PS1) 10Gehel: elasticsearch - reimage elastic10(27|32|37|41) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338811 (https://phabricator.wikimedia.org/T151326)
[18:29:24] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic10(26|31|36|40).eqiad.wmnet
[18:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:11] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic10(27|32|37|41).eqiad.wmnet
[18:30:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:25] <icinga-wm>	 RECOVERY - Disk space on elastic1023 is OK: DISK OK
[18:30:49] <tabbycat>	 Reedy: looks like that disableaccount sh*t is going to give us a headache
[18:31:00] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-2] "This simplifies this problem (at the same time that enforces TLS usage):" [puppet] - 10https://gerrit.wikimedia.org/r/338809 (owner: 10Jcrespo)
[18:31:06] <tabbycat>	 iirc what the extension did is to remove the user credentials
[18:32:15] <Reedy>	 I think it still does
[18:32:25] <Reedy>	 Was the group adding done later?
[18:32:48] <Reedy>	 		// While we're not actually turning the user into a "system" user, it
[18:32:48] <Reedy>	 		// has the same end result: all passwords and other authentication
[18:32:48] <Reedy>	 		// credentials removed or set to something invalid, email blanked,
[18:32:48] <Reedy>	 		// token invalidated, and existing sessions dropped. So let's just use
[18:32:48] <Reedy>	 		// that if possible instead of duplicating all the code.
[18:33:39] <tabbycat>	 Reedy: if you disable an account with that extension, the extension added the inactive group (which is default for those on private.dblist)
[18:33:55] <tabbycat>	 the inactive group existed way before though
[18:34:08] <tabbycat>	 and we've been adding blocked accounts to that group too
[18:34:28] <tabbycat>	 I guess there's no problems removing the users from that group?
[18:34:37] <Reedy>	 Probably not
[18:34:40] <tabbycat>	 it does not have any rights attached
[18:35:00] <Reedy>	 Can just do it from the db if we don't care about the log entries
[18:35:17] <tabbycat>	 maybe for when the extension gets removed
[18:35:32] <Reedy>	 Yeah, certainly makes sense there
[18:35:50] <tabbycat>	 so we avoid things like https://phabricator.wikimedia.org/T158413 reedy
[18:36:43] <Reedy>	 It's a mess
[18:36:51] <Reedy>	 All the more reason to get it removed
[18:36:56] <wikibugs>	 (03PS6) 10Hashar: contint: slave role for Saucelabs jobs [puppet] - 10https://gerrit.wikimedia.org/r/338770
[18:38:10] <wikibugs>	 06Operations: Restructure our internal repositories further - https://phabricator.wikimedia.org/T158583#3041423 (10faidon) p:05Triage>03High
[18:38:26] <wikibugs>	 (03CR) 10Hashar: [V: 031 C: 031] "Cherry picked against tip of production branch and on the CI Puppet master. I have provisioned the instances saucelabs-01 02 and 03 with t" [puppet] - 10https://gerrit.wikimedia.org/r/338770 (owner: 10Hashar)
[18:43:56] <tabbycat>	 Reedy: since I've got some time, I'll manually remove the 'inactive' flag for those accounts
[18:44:16] <tabbycat>	 so the script does not run twice on those accounts
[18:44:37] <tabbycat>	 well, on a second thought, no
[18:44:44] <tabbycat>	 too much work xD
[18:45:31] <Reedy>	 		// Try to update block if user is already blocked. Otherwise, attempt to insert a new one.
[18:45:31] <Reedy>	 		$success = $alreadyBlocked ? $block->update() : $block->insert();
[18:47:01] <Reedy>	 The script does remove people from the group when they've been "migrated"
[18:50:04] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] make empty list check shorter and clearer [dumps] - 10https://gerrit.wikimedia.org/r/338747 (owner: 10ArielGlenn)
[18:51:34] <wikibugs>	 (03CR) 10ArielGlenn: [C: 032] for page content dumps, for each numbered part do either ranges or whole dump [dumps] - 10https://gerrit.wikimedia.org/r/338754 (https://phabricator.wikimedia.org/T158517) (owner: 10ArielGlenn)
[19:06:03] <wikibugs>	 (03PS1) 10Tim Starling: Route PHP warnings from the handler into logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338820 (https://phabricator.wikimedia.org/T45086)
[19:06:41] <wikibugs>	 06Operations, 10Traffic, 07Mobile: Samsung Internet's desktop mode getting redirected to mobile site - https://phabricator.wikimedia.org/T158599#3041524 (10MaxSem)
[19:07:20] <logmsgbot>	 !log ariel@tin Started deploy [dumps/dumps@9757356]: fix retries of page content dumps with checkpoint, no dup ranges
[19:07:22] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@9757356]: fix retries of page content dumps with checkpoint, no dup ranges (duration: 00m 02s)
[19:07:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:07:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:13:24] <wikibugs>	 (03CR) 10BryanDavis: "Can we start with this just going to the udp2log aggregator until we had an idea of what the actual volume is?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338820 (https://phabricator.wikimedia.org/T45086) (owner: 10Tim Starling)
[19:15:32] <wikibugs>	 (03PS2) 10MarcoAurelio: Configuration changes for wikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338751 (https://phabricator.wikimedia.org/T158516)
[19:15:52] <wikibugs>	 (03CR) 10Gergő Tisza: "Is there any reason to use the -json channel? I thought we were trying to abandon them." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338820 (https://phabricator.wikimedia.org/T45086) (owner: 10Tim Starling)
[19:22:13] <wikibugs>	 (03PS2) 10Tim Starling: Route PHP warnings from the handler into logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338820 (https://phabricator.wikimedia.org/T45086)
[19:22:35] <wikibugs>	 (03PS2) 10MarcoAurelio: Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482)
[19:22:44] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:24:18] <wikibugs>	 (03PS3) 10MarcoAurelio: Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482)
[19:24:27] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:24:50] <tabbycat>	 ...
[19:29:09] <wikibugs>	 (03CR) 10Hashar: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:29:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:29:40] <tabbycat>	 hashar: maybe because the change it depends-on is still being processed?
[19:29:44] <hashar>	 tabbycat: that one is a glitch in the matrix
[19:29:56] <hashar>	 ah yeah there is a depends-on
[19:30:02] <hashar>	 that doesn't quite make any sense
[19:30:06] <hashar>	 just chain the patches in gerrit
[19:30:17] <hashar>	 git checkout master
[19:30:22] <hashar>	 git reset --hard origin/master
[19:30:23] <hashar>	 git-review -x 338751
[19:30:33] <hashar>	 git-review -x 338632
[19:30:38] <wikibugs>	 (03CR) 10MarcoAurelio: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:30:40] <hashar>	 amend the commit message to remove the Depends-On header
[19:30:44] <hashar>	 and send back
[19:30:46] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:30:53] <tabbycat>	 okay
[19:31:01] <hashar>	 Gerrit has built-in support for patches that depend on each other
[19:31:04] <hashar>	 just have to send them in the same chain
[19:31:14] <wikibugs>	 (03PS4) 10MarcoAurelio: Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482)
[19:31:21] <hashar>	 eg:   (master)  -> 338751 --> 338632
[19:31:35] <hashar>	 but that only works when the patches are all in the same repository
[19:31:43] <hashar>	 the depends-on: hack is for when changes are in different repos
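A minimal sketch of the "amend the commit message to remove the Depends-On header" step above, in Python. It assumes the header is a footer line of the form `Depends-On: <Change-Id>`. The function name `strip_depends_on` and the sample message are hypothetical; in practice the edit would be made by hand via `git commit --amend`.

```python
import re

# Matches a whole "Depends-On: ..." footer line (case-insensitive), including its newline.
DEPENDS_ON_RE = re.compile(r"^Depends-On:[^\n]*\n?", re.IGNORECASE | re.MULTILINE)


def strip_depends_on(message: str) -> str:
    """Return the commit message with every Depends-On footer line removed."""
    return DEPENDS_ON_RE.sub("", message)


# Hypothetical commit message; the Change-Id values are placeholders, not real ones.
example = (
    "Removing the 'shellmanagers' group from Wikitech\n"
    "\n"
    "Bug: T158482\n"
    "Depends-On: I0000000000000000000000000000000000000000\n"
    "Change-Id: I1111111111111111111111111111111111111111\n"
)

print(strip_depends_on(example))
```

The other footers (Bug, Change-Id) are left untouched, so Gerrit would still treat the amended commit as a new patch set of the same change.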
[19:31:57] <tabbycat>	 I've added the depends-on tag in the past w/o problems
[19:32:03] <tabbycat>	 not sure why it fails now
[19:32:14] <hashar>	 I am not sure what is going on in Zuul, but it seems it ignores the depends-on
[19:32:26] <tabbycat>	 + noob dev here, not complicated stuff please :)
[19:32:33] <hashar>	 yeah :}
[19:33:32] <hashar>	 what are you trying to achieve ?
[19:34:29] <paladox>	 hashar the zuul plugin will make it easier for gerrit to understand depends-on:.
[19:34:48] <paladox>	 by that i mean server side but also client side too
[19:34:49] <hashar>	 paladox: a Gerrit zuul plugin ?
[19:34:55] <paladox>	 hashar yep, let me find it
[19:35:02] <tabbycat>	 hashar: well, not that it really matters much, but I feel I should first have the change I marked as depends-on deployed first, then merge this one that was failing
[19:35:20] <paladox>	 hashar https://gerrit.googlesource.com/plugins/zuul/
[19:35:25] <paladox>	 https://gerrit.googlesource.com/plugins/zuul/+/master/src/main/resources/Documentation/about.md
[19:35:26] <hashar>	 tabbycat: so the easiest is to have them one after the other in your local git repo
[19:35:46] <hashar>	 tabbycat: send that to Gerrit and Gerrit will notice one depends on another automatically
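To illustrate how a chain such as (master) -> 338751 --> 338632 encodes the dependency, here is a small Python sketch. It assumes each change records only its parent, as a git commit does. The `PARENTS` mapping and the `merge_order` helper are hypothetical and are not part of Gerrit's API.

```python
# Each change points at its parent, mirroring how a git commit records its parent.
# None marks the tip of master. Change numbers are the ones from the discussion above.
PARENTS = {
    "338751": None,      # based directly on master
    "338632": "338751",  # stacked on top of 338751
}


def merge_order(change: str, parents: dict) -> list:
    """Walk parent links back to master; return changes in the order they must merge."""
    chain = []
    current = change
    while current is not None:
        chain.append(current)
        current = parents[current]
    chain.reverse()
    return chain


print(merge_order("338632", PARENTS))
```

Because the ordering falls out of the parent links alone, no extra header is needed when both changes live in the same repository.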
[19:36:06] <hashar>	 paladox: fun. Is openstack using it ?
[19:36:22] <paladox>	 hashar maybe, but not sure. It was developed by zaro.
[19:36:42] <wikibugs>	 (03PS3) 10MarcoAurelio: Configuration changes for wikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338751 (https://phabricator.wikimedia.org/T158516)
[19:36:56] <paladox>	 hashar actually nope, their gerrit version is too old to support the plugin
[19:37:01] <paladox>	 requires gerrit 2.13+
[19:37:01] <hashar>	 paladox: yeah he was sponsored by HPE to work on openstack iirc
[19:37:06] <paladox>	 oh
[19:37:20] <paladox>	 they are planning to update to gerrit 2.13 i think.
[19:37:33] <paladox>	 i did convert zuul to bazel format so should hopefully work on gerrit 2.14 too
[19:37:33] <hashar>	 maybe we should give it a try eventually
[19:37:43] * tabbycat testing again
[19:37:48] <hashar>	 then again, I am already swamped in a lot of various stuff :(
[19:37:51] <paladox>	 Yep
[19:38:17] <wikibugs>	 (03PS5) 10MarcoAurelio: Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482)
[19:38:25] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482) (owner: 10MarcoAurelio)
[19:38:41] <tabbycat>	 okay so it was not that the dependant change was un-rebased
[19:38:43] <tabbycat>	 rv.
[19:38:48] <wikibugs>	 06Operations, 10Traffic, 07Mobile: Samsung Internet's desktop mode getting redirected to mobile site - https://phabricator.wikimedia.org/T158599#3041524 (10revi) Interesting, my Samsung Galaxy A7 (2016)'s bundled Samsung Internet correctly handles 'request desktop version'. Maybe it's for Google Play version...
[19:39:01] <hashar>	 anyway I gotta escape
[19:39:02] <wikibugs>	 (03PS6) 10MarcoAurelio: Removing the 'shellmanagers' group from Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338632 (https://phabricator.wikimedia.org/T158482)
[19:39:03] <hashar>	 and hunt for some food
[19:39:17] <hashar>	 see you tomorrow!
[19:39:28] <revi>	 bye!
[19:48:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[19:48:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[20:19:35] <icinga-wm>	 RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active
[20:20:00] <gehel>	 !log reducing concurrent recoveries / relocations to 4 on elasticsearch eqiad
[20:20:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:35] <icinga-wm>	 PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating
[20:27:18] <wikibugs>	 (03PS1) 10Papaul: Add mgmt and production DNS for ms-be2028-msbe2039 Bug: T158337 [dns] - 10https://gerrit.wikimedia.org/r/338824 (https://phabricator.wikimedia.org/T158337)
[20:31:21] <gehel>	 !log taking threaddumps and restarting elastic1017 (high load)
[20:31:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:55] <icinga-wm>	 PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[20:36:57] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure, 10netops: jsduck publish error: index-pack died of signal 15 - https://phabricator.wikimedia.org/T158601#3041635 (10hashar) The jenkins jobs triggered by Zuul clones the repo from the zuul-merger instances on contint1001 / contint2001. They are being ser...
[20:37:41] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure, 10netops: git clone over EQIAD (wmflabs)  CODFW timeout due to low bandwidth (~250 KiB/s) - https://phabricator.wikimedia.org/T158601#3041638 (10hashar)
[20:39:45] <icinga-wm>	 PROBLEM - puppet last run on restbase1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:40:21] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure, 06Labs, 10netops: git clone over EQIAD (wmflabs)  CODFW timeout due to low bandwidth (~250 KiB/s) - https://phabricator.wikimedia.org/T158601#3041644 (10Paladox)
[20:41:11] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure, 06Labs, 10netops: git clone over EQIAD (wmflabs)  CODFW timeout due to low bandwidth (~250 KiB/s) - https://phabricator.wikimedia.org/T158601#3041581 (10Paladox) Is this happening to any other repos? Should we set this as normal or high priority?
[20:41:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[20:41:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite2001 is OK: OK: Less than 20.00% above the threshold [500.0]
[20:46:35] <icinga-wm>	 RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.029 second response time
[20:50:04] <wikibugs>	 06Operations, 10RESTBase, 06Services (later): enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#1641285 (10Pchelolo) I think we have to proceed on this. Right now, without local logging, if something breaks with Logstash (see T158602) we're losing all the logs completely, which...
[21:07:45] <icinga-wm>	 RECOVERY - puppet last run on restbase1016 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[21:10:36] <icinga-wm>	 PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:25] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[21:14:26] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[21:32:51] <wikibugs>	 (03PS3) 10Zppix: Update the realname from github repo url --> phabricator diffusion [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338700
[21:32:56] <wikibugs>	 (03CR) 10Zppix: [C: 031] Update the realname from github repo url --> phabricator diffusion [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338700 (owner: 10Zppix)
[21:38:35] <icinga-wm>	 RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[21:40:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[21:40:25] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite2001 is OK: OK: Less than 20.00% above the threshold [500.0]
[22:05:52] <wikibugs>	 (03CR) 10Hashar: [C: 031] Update the realname from github repo url --> phabricator diffusion [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338700 (owner: 10Zppix)
[22:22:25] <icinga-wm>	 PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:45:45] <icinga-wm>	 PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479
[22:46:45] <icinga-wm>	 RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3851141 keys, up 112 days 14 hours - replication_delay is 0
[22:46:55] <icinga-wm>	 PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[22:47:55] <icinga-wm>	 RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3850292 keys, up 112 days 14 hours - replication_delay is 0
[22:51:25] <icinga-wm>	 RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[23:18:55] <icinga-wm>	 PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.284 second response time
[23:24:05] <icinga-wm>	 RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.042 second response time
[23:25:04] <wikibugs>	 (03PS1) 10ArielGlenn: add api job handler, config file in yaml, siteinfo props jobs [dumps] - 10https://gerrit.wikimedia.org/r/338899 (https://phabricator.wikimedia.org/T38178)
[23:38:25] <icinga-wm>	 PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:38:35] <icinga-wm>	 RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exportd is active
[23:41:35] <icinga-wm>	 PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is activating